Speech recognition integration via API (speech-to-text AI): implementation instructions

Asynchronous recognition

Automate speech recognition through the API.

Asynchronous recognition is suitable for transcribing audio and video recordings.

Main steps:
1. Upload the file to be recognized to our storage over HTTP.
2. Start the recognition task: in the request, pass the name of the uploaded file and the recognition parameters. The response contains the recognized text.

File upload code:

curl https://yazapishu.ru/api/upload.php \
  --header "Authorization: <api_token>" \
  -F "upload=@<file_name>"

where:
api_token - your unique user identifier (API token), issued after registration and used for authorization.
file_name - the name of the file to recognize (may include the path to the file).
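The upload step can be wrapped in a small shell function that fails early and visibly. This is a minimal sketch, not part of the service: the function name upload_file and the environment variable YAZAPISHU_TOKEN are hypothetical choices for this example. It checks that the file exists and that a token is set, and uses curl's --fail so that an HTTP error status (4xx/5xx) produces a non-zero exit code instead of silently printing an error page.

```shell
# Hypothetical helper: upload one local file (step 1).
# Assumption: the API token is stored in the environment variable YAZAPISHU_TOKEN.
upload_file() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "upload_file: no such file: $file" >&2
    return 1
  fi
  if [ -z "${YAZAPISHU_TOKEN:-}" ]; then
    echo "upload_file: YAZAPISHU_TOKEN is not set" >&2
    return 1
  fi
  # --fail: exit non-zero on HTTP errors; --silent --show-error: hide progress, keep error messages
  curl --silent --show-error --fail https://yazapishu.ru/api/upload.php \
    --header "Authorization: $YAZAPISHU_TOKEN" \
    -F "upload=@$file"
}
```

Usage, assuming the token is exported first: `export YAZAPISHU_TOKEN='<api_token>'; upload_file ./records/audio1.mp3 || echo "upload failed"`.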

Recognition launch code:

curl https://yazapishu.ru/api.php \
  --header "Authorization: <api_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "audio_name": "<file_name>",
    "language": "<language_cod>"
  }'

Request parameters ("speaker", "timecod", and "json" are additional, optional parameters):
"audio_name": "<file_name>" - the name of the uploaded file to recognize: only the file name with its extension, without a path (for example, audio1.mp3).
"language": "<language_cod>" - recognition language
"speaker": "-l" - split the text by speaker
"timecod": "yes" - include timecodes in the text
"json": "yes" - return the result in JSON format
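Building the request body by hand is where quoting mistakes creep in. The sketch below assembles the JSON body from a file name, a language code, and any of the optional parameters listed above. The function name build_request and its option keywords (speaker, timecod, json) are hypothetical. It strips any path with basename, because audio_name takes only the file name, and it does no general JSON escaping: it simply rejects names containing a double quote or backslash.

```shell
# Hypothetical helper: print the JSON body for https://yazapishu.ru/api.php
# Usage: build_request FILE LANGUAGE [speaker] [timecod] [json]
build_request() {
  name=$(basename "$1")   # audio_name must be the bare file name, without a path
  lang=$2
  shift 2
  # No JSON escaping is done here, so refuse characters that would break the body
  case $name in
    *\"*|*\\*) echo "build_request: unsupported character in file name: $name" >&2; return 1 ;;
  esac
  body="{\"audio_name\": \"$name\", \"language\": \"$lang\""
  for opt in "$@"; do
    case $opt in
      speaker) body="$body, \"speaker\": \"-l\"" ;;   # split the text by speaker
      timecod) body="$body, \"timecod\": \"yes\"" ;;  # include timecodes
      json)    body="$body, \"json\": \"yes\"" ;;     # return JSON
      *) echo "build_request: unknown option: $opt" >&2; return 1 ;;
    esac
  done
  printf '%s}\n' "$body"
}
```

For example, `curl https://yazapishu.ru/api.php --header "Authorization: <api_token>" --header "Content-Type: application/json" --data "$(build_request ./records/audio1.mp3 ru speaker json)"` sends `{"audio_name": "audio1.mp3", "language": "ru", "speaker": "-l", "json": "yes"}`.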

Available recognition languages:
Russian - ru, English - en, British English - en_uk, American English - en_us, Australian English - en_au, Spanish - es, Italian - it, Chinese - zh, Korean - ko, German - de, Dutch - nl, Polish - pl, Portuguese - pt, Turkish - tr, French - fr, Finnish - fi, Japanese - ja
Example: "language": "ru"
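A script can check a language code against this list before sending a request. A minimal sketch; the function name is_supported_language is hypothetical, and the codes are copied from the list above, so the function must be updated if the service adds languages.

```shell
# Hypothetical helper: succeed only for the language codes listed on this page.
is_supported_language() {
  case $1 in
    ru|en|en_uk|en_us|en_au|es|it|zh|ko|de|nl|pl|pt|tr|fr|fi|ja) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `is_supported_language "$lang" || { echo "unsupported language: $lang" >&2; exit 1; }`. Note that the codes use an underscore, so `en-GB` or `en-US` are rejected.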

Getting data in JSON format

Recognition launch code:

curl https://yazapishu.ru/api.php \
  --header "Authorization: <api_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "audio_name": "<file_name>",
    "language": "<language_cod>",
    "json": "yes"
  }'

The result contains the full recognized text, the text split into segments according to the request parameters (for example, by speaker), and a list of recognized words.

The response is JSON with the following fields:

Key	Type	Description
text	string	Transcript of the audio file
words	array	Information about each recognized word
words[i].text	string	Text of the i-th word in the transcript
words[i].start	number	Start of the i-th word in the audio file, in milliseconds
words[i].end	number	End of the i-th word in the audio file, in milliseconds
words[i].confidence	number	Confidence score for the recognition of the i-th word
words[i].speaker	string	Speaker who uttered the i-th word (when speaker separation, "speaker": "-l", is enabled)
utterances	array	Statements of the individual speakers
utterances[i].speaker	string	Speaker associated with the i-th statement
status	string	Recognition status: completed or error
audio_duration	number	Duration of the audio
id	number	Identifier of this recognition
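Because words[i].start and words[i].end are given in milliseconds, a small conversion helps when printing a transcript with timestamps. A minimal sketch, assuming the values are whole numbers of milliseconds; the function name ms_to_clock is hypothetical.

```shell
# Hypothetical helper: convert integer milliseconds to HH:MM:SS.mmm
ms_to_clock() {
  ms=$1
  h=$(( ms / 3600000 ))            # whole hours
  m=$(( ms % 3600000 / 60000 ))    # remaining minutes
  s=$(( ms % 60000 / 1000 ))       # remaining seconds
  printf '%02d:%02d:%02d.%03d\n' "$h" "$m" "$s" $(( ms % 1000 ))
}
```

For example, `ms_to_clock 83450` prints `00:01:23.450`. If the JSON response is saved to result.json and the jq tool is installed, each word can be listed with its start time: `jq -r '.words[] | "\(.start) \(.text)"' result.json | while read -r start word; do printf '%s %s\n' "$(ms_to_clock "$start")" "$word"; done`.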

Requesting a completed recognition by ID:

curl https://yazapishu.ru/api/api_request.php \
  --header "Authorization: <api_token>" \
  --header "Content-Type: application/json" \
  --data '{
    "id": "<id>"
  }'
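The same request can be wrapped in a function, together with a check of the returned status (completed or error). A minimal sketch: the names get_recognition and status_of and the variable YAZAPISHU_TOKEN are hypothetical. status_of extracts the field with sed, which is crude and assumes "status" appears once as a plain string value; a JSON tool such as jq (`jq -r .status`) is more robust where it is available.

```shell
# Hypothetical helper: fetch a recognition by the "id" returned in an earlier response.
# Assumption: the API token is stored in the environment variable YAZAPISHU_TOKEN.
get_recognition() {
  id=${1:-}
  if [ -z "$id" ]; then
    echo "usage: get_recognition ID" >&2
    return 1
  fi
  if [ -z "${YAZAPISHU_TOKEN:-}" ]; then
    echo "get_recognition: YAZAPISHU_TOKEN is not set" >&2
    return 1
  fi
  curl --silent --show-error --fail https://yazapishu.ru/api/api_request.php \
    --header "Authorization: $YAZAPISHU_TOKEN" \
    --header "Content-Type: application/json" \
    --data "{\"id\": \"$id\"}"
}

# Crude extraction of the "status" value from a JSON response string.
status_of() {
  printf '%s\n' "$1" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}
```

Usage: `result=$(get_recognition 42) && [ "$(status_of "$result")" = completed ] && printf '%s\n' "$result"`.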

Example of running a recognition from a shell script:

audio='myspeech.mp3'
token='773fec3ac285bc7c9e9951ef7f7ddad8'   # replace with your own API token

# Step 1: upload the file
curl https://yazapishu.ru/api/upload.php \
  --header "Authorization: $token" \
  -F "upload=@$audio"

# Step 2: start recognition; "audio_name" takes only the file name, without a path
name=$(basename "$audio")
text=$(curl https://yazapishu.ru/api.php \
  --header "Authorization: $token" \
  --header "Content-Type: application/json" \
  --data '{
    "audio_name": "'"$name"'",
    "language": "en",
    "speaker": "-l",
    "json": "yes"
  }')

printf '%s\n' "$text"

Supported audio and video formats for upload: mp3, wav, mp4, avi, aac, m4a, ac3, flac, ogg, wma, mov, flv, 3gp, asf, wmv, mkv, webm.
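To avoid uploading files the service will not accept, a script can check the extension first. A minimal sketch; the function name is_supported_format is hypothetical, the extension list is copied from the line above, and the check looks only at the file name, not at the actual file contents.

```shell
# Hypothetical helper: succeed if the file extension is in the supported list.
is_supported_format() {
  # ${1##*.} keeps the text after the last dot; tr makes the check case-insensitive
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    mp3|wav|mp4|avi|aac|m4a|ac3|flac|ogg|wma|mov|flv|3gp|asf|wmv|mkv|webm) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `for f in ./records/*; do is_supported_format "$f" || echo "skipping $f"; done`.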

All recognition results will also be available in your personal account.

You can run up to 100 parallel recognitions.
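When transcribing many files, a script can start several jobs at once while staying under this limit. A minimal sketch using shell background jobs; the function name run_parallel is hypothetical, and it runs jobs in batches (the next batch starts only after the whole current batch finishes), which is simpler but less efficient than a true job pool. Note that `wait` also waits for any other background jobs of the calling shell.

```shell
# Hypothetical helper: run CMD once per argument, at most MAX jobs at a time (MAX <= 100).
# Usage: run_parallel MAX CMD ARG...
run_parallel() {
  max=$1
  cmd=$2
  shift 2
  if [ "$max" -lt 1 ] || [ "$max" -gt 100 ]; then
    echo "run_parallel: MAX must be between 1 and 100" >&2
    return 1
  fi
  running=0
  for item in "$@"; do
    "$cmd" "$item" &          # start one job in the background
    running=$((running + 1))
    if [ "$running" -ge "$max" ]; then
      wait                    # let the current batch finish
      running=0
    fi
  done
  wait                        # wait for the last, possibly partial, batch
}
```

Usage: `run_parallel 10 my_transcribe ./records/*.mp3`, where my_transcribe is your own function or script that uploads and recognizes one file.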
