API


Speech recognition

Asynchronous recognition


Automated speech recognition.

Asynchronous recognition is suitable for transcribing audio and video recordings.

Main steps:
1. Upload the file to be recognized to our storage over HTTP.
2. Start the recognition task: in the request, pass the name of the uploaded file and the recognition parameters. The response contains the recognized text.

File upload request:

curl https://yazapishu.ru/api/upload.php \
--header "Authorization: <api_token>" \
-F "upload=@<file_name>"


where:
api_token - your unique API token, issued after registration and used for authorization.
file_name - the name of the file to recognize (may include the path to the file).
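The upload step can be wrapped in a small function that checks the file locally before sending it. This is a sketch, not part of the official API: the function name yz_upload and the API_TOKEN environment variable are hypothetical, and the extension list is copied from the supported formats listed at the end of this page.

```shell
#!/bin/sh
# Hypothetical helper: upload one file to yazapishu.ru for recognition.
# Expects the API token in the API_TOKEN environment variable (an assumption).
yz_upload() {
  file=$1
  # Refuse to send files that do not exist locally
  if [ ! -f "$file" ]; then
    printf 'File not found: %s\n' "$file" >&2
    return 1
  fi
  # Check the extension (case-insensitively) against the documented formats
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|mp4|avi|aac|m4a|ac3|flac|ogg|wma|mov|flv|3gp|asf|wmv|mkv|webm) ;;
    *)
      printf 'Unsupported format: %s\n' "$ext" >&2
      return 1
      ;;
  esac
  # Send the file as the "upload" form field, as in the curl example above
  curl -s https://yazapishu.ru/api/upload.php \
    --header "Authorization: $API_TOKEN" \
    -F "upload=@$file"
}
```

With API_TOKEN set to your token, yz_upload audio1.mp3 prints whatever the server returns for the upload. Both local checks fail fast, so no request is sent for a missing file or an unsupported extension.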


Request to start recognition:

curl https://yazapishu.ru/api.php \
--header "Authorization: <api_token>" \
--header "Content-Type: application/json" \
--data '{
"audio_name": "<file_name>",
"language": "<language_code>"
}'


Request parameters:
"audio_name": "<file_name>" - name of the uploaded file (only the file name with its extension, without a path; for example, audio1.mp3).
"language": "<language_code>" - recognition language (see the list below).
"speaker": "-l" - split the text by speaker (optional).
"timecod": "yes" - include timecodes in the text (optional).
"json": "yes" - return the result in JSON format (optional).
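When several optional parameters are combined, it can help to build the JSON body in one place. A minimal sketch, assuming the parameter names and values listed above; build_payload is a hypothetical helper name. The file name is reduced to its base name because the API expects only the name with extension. Values are inserted without JSON escaping, so file names containing double quotes or backslashes are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the JSON body for the recognition request.
# Usage: build_payload FILE LANGUAGE [speaker] [timecod] [json]
build_payload() {
  # The API expects only the file name with extension, not a path
  name=$(basename "$1")
  lang=$2
  shift 2
  body="\"audio_name\": \"$name\", \"language\": \"$lang\""
  # Each optional flag adds the corresponding documented parameter
  for opt in "$@"; do
    case $opt in
      speaker) body="$body, \"speaker\": \"-l\"" ;;
      timecod) body="$body, \"timecod\": \"yes\"" ;;
      json)    body="$body, \"json\": \"yes\"" ;;
      *) printf 'Unknown option: %s\n' "$opt" >&2; return 1 ;;
    esac
  done
  printf '{%s}\n' "$body"
}
```

The output can be passed directly to curl, for example with --data "$(build_payload audio1.mp3 ru speaker timecod)".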


Available recognition languages:
Russian - ru, English - en, English (British) - en_uk, English (American) - en_us, English (Australian) - en_au, Spanish - es, Italian - it, Chinese - zh, Korean - ko, German - de, Dutch - nl, Polish - pl, Portuguese - pt, Turkish - tr, French - fr, Finnish - fi, Japanese - ja
Example: "language": "ru"
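To catch typos in the language code before a request is sent, the code can be checked locally against the list above. This sketch hard-codes the documented codes, so it would need updating if the service adds languages; is_supported_language is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the language codes documented above.
# Codes are matched exactly (lowercase), as they appear in the list.
is_supported_language() {
  case $1 in
    ru|en|en_uk|en_us|en_au|es|it|zh|ko|de|nl|pl|pt|tr|fr|fi|ja) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_language en_au succeeds, while en_gb (not in the list) fails.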



Getting data in JSON format


Request to start recognition:


curl https://yazapishu.ru/api.php \
--header "Authorization: <api_token>" \
--header "Content-Type: application/json" \
--data '{
"audio_name": "<file_name>",
"language": "<language_code>",
"json": "yes"
}'


The result contains the full recognized text, the text split into utterances by speaker, and a list of recognized words.

The response is returned in JSON format with the following fields:


Key                     Type    Description
text                    string  Transcript of the audio file
words                   array   Array with information about each recognized word
words[i].text           string  Text of the i-th word in the transcript
words[i].start          number  Start of the i-th word in the audio file, in milliseconds
words[i].end            number  End of the i-th word in the audio file, in milliseconds
words[i].confidence     number  Confidence score for the recognition of the i-th word
words[i].speaker        string  Speaker who uttered the i-th word (only when speaker separation, "speaker": "-l", is enabled)
utterances              array   Array of speaker utterances
utterances[i].speaker   string  Speaker label of the i-th utterance
status                  string  Recognition status: completed or error
audio_duration          number  Duration of the audio
id                      number  Identifier of this recognition
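Because words[i].start and words[i].end are given in milliseconds, a word list with readable timecodes can be produced from the JSON response. A minimal sketch, assuming the jq command-line JSON processor is installed and that the millisecond values are integers; ms_to_timestamp and print_words are hypothetical helper names.

```shell
#!/bin/sh
# Convert an integer number of milliseconds to HH:MM:SS.mmm
ms_to_timestamp() {
  ms=$1
  h=$((ms / 3600000))
  m=$((ms % 3600000 / 60000))
  s=$((ms % 60000 / 1000))
  r=$((ms % 1000))
  printf '%02d:%02d:%02d.%03d\n' "$h" "$m" "$s" "$r"
}

# Read a JSON recognition result on stdin and print one word per line
# with its start time. Requires jq (an assumption, not part of the API).
print_words() {
  jq -r '.words[] | "\(.start)\t\(.text)"' |
    while IFS="$(printf '\t')" read -r start word; do
      printf '%s %s\n' "$(ms_to_timestamp "$start")" "$word"
    done
}
```

For example, ms_to_timestamp 83450 prints 00:01:23.450, and print_words < result.json lists the words of a result saved from a request with "json": "yes".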


Retrieving a completed recognition by ID:


curl https://yazapishu.ru/api/api_request.php \
--header "Authorization: <api_token>" \
--header "Content-Type: application/json" \
--data '{
"id": "<id>"
}'
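A stored result can be fetched again later by ID, for example using the id value from an earlier JSON response. A sketch, assuming the API token is in the API_TOKEN environment variable; id_payload and fetch_result are hypothetical names. The ID is checked to be numeric because the response table lists id as a number, and it is sent as a string, as in the request above.

```shell
#!/bin/sh
# Hypothetical helper: build the request body for a lookup by ID
id_payload() {
  # Reject empty or non-numeric IDs before sending anything
  case $1 in
    ''|*[!0-9]*)
      printf 'Invalid recognition ID: %s\n' "$1" >&2
      return 1
      ;;
  esac
  printf '{"id": "%s"}\n' "$1"
}

# Hypothetical helper: fetch a completed recognition by its ID
fetch_result() {
  payload=$(id_payload "$1") || return 1
  curl -s https://yazapishu.ru/api/api_request.php \
    --header "Authorization: $API_TOKEN" \
    --header "Content-Type: application/json" \
    --data "$payload"
}
```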


Example: running recognition from a shell script:


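#!/bin/sh
# Your API token (replace with the token issued to you after registration)
api_token='773fec3ac285bc7c9e9951ef7f7ddad8'
# Path to the recording; the API expects only the file name in audio_name
audio='myspeech.mp3'
audio_name=$(basename "$audio")

# Step 1: upload the file
curl -s https://yazapishu.ru/api/upload.php \
--header "Authorization: $api_token" \
-F "upload=@$audio"

# Step 2: run recognition in English with speaker separation and JSON output
text=$(curl -s https://yazapishu.ru/api.php \
--header "Authorization: $api_token" \
--header "Content-Type: application/json" \
--data '{
"audio_name": "'"$audio_name"'",
"language": "en",
"speaker": "-l",
"json": "yes"
}')

printf '%s\n' "$text"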


Supported audio and video file formats: mp3, wav, mp4, avi, aac, m4a, ac3, flac, ogg, wma, mov, flv, 3gp, asf, wmv, mkv, webm.

All recognition results will also be available in your personal account.

You can run up to 100 parallel recognitions.
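Several recordings can be processed concurrently while staying under the limit of 100 parallel recognitions. A minimal sketch in POSIX sh, assuming the API token is in API_TOKEN and using English as an example language; recognize_one and recognize_batch are hypothetical names. It starts jobs in batches and waits for each batch to finish, which is simpler than a sliding window but can leave slots idle while the slowest job in a batch completes. Each result is written next to the input file with a .json suffix.

```shell
#!/bin/sh
# Hypothetical helper: upload one file and run recognition, printing the JSON result
recognize_one() {
  curl -s https://yazapishu.ru/api/upload.php \
    --header "Authorization: $API_TOKEN" \
    -F "upload=@$1" > /dev/null || return 1
  name=$(basename "$1")
  curl -s https://yazapishu.ru/api.php \
    --header "Authorization: $API_TOKEN" \
    --header "Content-Type: application/json" \
    --data "{\"audio_name\": \"$name\", \"language\": \"en\", \"json\": \"yes\"}"
}

# Run recognize_one for every file, at most MAX jobs at a time (capped at 100)
recognize_batch() {
  max=$1
  shift
  if [ "$max" -gt 100 ]; then max=100; fi
  n=0
  for f in "$@"; do
    recognize_one "$f" > "$f.json" &
    n=$((n + 1))
    # When the batch is full, wait for all running jobs before continuing
    if [ "$n" -ge "$max" ]; then
      wait
      n=0
    fi
  done
  # Wait for the last, possibly partial, batch
  wait
}
```

For example, recognize_batch 10 *.mp3 processes all MP3 files in the current directory, ten at a time. Note that curl -s exits successfully even on HTTP error responses, so each .json file should still be checked for "status": "completed".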