Continuous Speech Recognition Testing

Configure Speech Services in Botium Box

Speech Recognition (Speech-To-Text)

Botium Box gives you fine-grained control over how the speech service is used through the Custom Engine Configuration field. For example, to select your own customized Azure speech model:

{
  "speechConfig": {
    "endpointId": "xxx-yyyy-zzzzzzzzzz"
  }
}

It is possible to immediately test the speech service configuration by using your own microphone.

Speech Synthesis (Text-To-Speech)

Again, you can test the configuration by listening to a first example.

Custom Speech Engines

Prepare Chatbot for Speech Synthesis and Recognition

The configuration can be tested immediately with Say Hello or with the Live Chat. In the Live Chat you can either enter some text, for which speech will be synthesized, or record your own voice.

Prepare Test Set

Gathering Test Data

Labelled Reference Audio Files

F01-hello-how-are-you.wav;hello how are you
F02-hi-whats-up.wav;hi whats up
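The two lines above pair an audio file name with its expected transcription, separated by a semicolon. As a rough illustration of that layout (not Botium's own parser), the following Python sketch reads such lines into a lookup table. The function name `parse_transcript_lines` is hypothetical, and splitting on the first semicolon only is an assumption.

```python
def parse_transcript_lines(lines):
    """Map audio file name -> expected transcription."""
    expected = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: only the first semicolon separates name and text,
        # so a transcription could itself contain ';'.
        file_name, sep, text = line.partition(";")
        if not sep:
            raise ValueError(f"missing ';' separator: {line!r}")
        expected[file_name.strip()] = text.strip()
    return expected


sample = [
    "F01-hello-how-are-you.wav;hello how are you",
    "F02-hi-whats-up.wav;hi whats up",
]
references = parse_transcript_lines(sample)
print(references["F02-hi-whats-up.wav"])
```

A check like this can help catch malformed lines before the files are uploaded.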

Synthesizing Speech

When you now open the folder in the File Browser, you can see a set of audio files, one per line of text and voice, as well as one transcription file per audio file, ending in .txt.

Record Speech

Run Speech Recognition Test Session

Now click on Start Test Session Now to start the first test session. After a few minutes you can start inspecting the results.

You can see the list of audio files where the speech recognition matched the expected transcription, and the ones where it failed. You can drill into the results, listen to the audio files and view the result details at the JSON level (use the <> button).

Humanification: Adding Noise

Voice Effects Pipeline

Clone the Test Set from above and add “Noise” to its name. The new Test Set is initially empty, but has the same settings as the original.

Verify that the Audio File Usage field is indeed set to Use all audio files as Test Case input, and read transcription from file.

Apply Noise Effects

Now create a new Test Project for the new Test Set with noise, and start a first test session.

Again, you can inspect the test results after a few minutes to check if the applied noise makes the speech recognition struggle.
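To give a feeling for what a noise effect does to the audio, here is a minimal Python sketch that mixes Gaussian white noise into 16-bit PCM samples at a chosen signal-to-noise ratio. It is only an illustration under stated assumptions and not how Botium Box implements its voice effects. The function name `add_white_noise`, the 440 Hz test tone and the 10 dB setting are all hypothetical.

```python
import math
import random


def add_white_noise(samples, snr_db, seed=0):
    """Mix Gaussian white noise into 16-bit PCM samples at a target SNR (dB)."""
    if not samples:
        return []
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    noisy = []
    for s in samples:
        value = round(s + rng.gauss(0.0, sigma))
        noisy.append(max(-32768, min(32767, value)))  # clip to int16 range
    return noisy


# One second of a 440 Hz tone at 16 kHz, standing in for a recorded utterance.
rate = 16000
clean = [round(10000 * math.sin(2 * math.pi * 440 * n / rate)) for n in range(rate)]
noisy = add_white_noise(clean, snr_db=10)
```

Lower SNR values mean louder noise relative to the speech, which is typically where recognition quality starts to degrade.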

Bonus: Check Word Error Rate

Currently there is no dedicated Word Error Rate asserter in Botium Box, but the Generic JSONPath Asserter can be tailored to do this. In Botium Box, open the Registered Components section in the Settings menu. Register a new component there, name it Check WER and select Test Case Asserter as Component Type.

{
  "path": "$.response..[?(@.wer == 0)]"
}

If a word error rate of 0.1 or below is acceptable, this is the expression to use:

{
  "path": "$.response..[?(@.wer <= 0.1)]"
}
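For orientation, word error rate is commonly defined as the word-level edit distance (substitutions, deletions and insertions) between the expected and the recognized transcription, divided by the number of words in the expected transcription. The Python sketch below follows that common definition. It is not taken from Botium, whose `wer` value may be computed or normalized differently. The function names and the lowercasing step are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: case-insensitive comparison, words split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcription must not be empty")
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def passes(reference, hypothesis, max_wer=0.1):
    """Mirror the threshold idea: accept when WER is at or below max_wer."""
    return word_error_rate(reference, hypothesis) <= max_wer


print(word_error_rate("hello how are you", "hello who are you"))
```

With one wrong word out of four, the example yields a word error rate of 0.25, which would fail a 0.1 threshold.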

We now have to tell Botium Box to use this asserter in our tests. Save the registered component, and navigate to your Test Project. Open the Test Execution section in Settings, and add the Check WER component to the Involved Registered Component(s) field.

When you now run a test session, Botium no longer asserts on the exact transcription. Instead, it uses the Check WER asserter to fail a test case if the word error rate is above 0.1. You can again inspect the detailed results in the test session.

Bonus: Speech Recognition Regression Tests

In this CSV file, we are only interested in the columns named testCaseName (which contains the file name) and convoStepActual (which contains the transcription). Filter the convoStepSender column on the value bot and hide all columns except these two. Save the remaining data as transcript.csv in the folder with your audio files in Botium Box.

When you now switch the Audio File Usage field to Use all audio files as Test Case input, and read transcription from file, Botium will read the expected transcriptions from this file, and any change in the transcription of your reference files will show up in the test results.
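The filtering step described above can also be scripted instead of done by hand in a spreadsheet. The following Python sketch keeps only the bot rows and the two columns of interest. It assumes the exported file is comma-separated with a header row, and that transcript.csv uses the semicolon-separated layout of the labelled reference files. Both are assumptions, and the sample rows are made up for illustration.

```python
import csv
import io


def build_transcript_rows(results_csv_text):
    """Keep (testCaseName, convoStepActual) for rows sent by the bot."""
    reader = csv.DictReader(io.StringIO(results_csv_text))
    rows = []
    for row in reader:
        if row["convoStepSender"] == "bot":
            rows.append((row["testCaseName"], row["convoStepActual"]))
    return rows


def write_transcript(rows, out_file):
    """Write rows as 'file name;transcription' lines (assumed layout)."""
    writer = csv.writer(out_file, delimiter=";", lineterminator="\n")
    writer.writerows(rows)


# Hypothetical export content; real exports contain more columns.
export = (
    "testCaseName,convoStepSender,convoStepActual\n"
    "F01-hello-how-are-you.wav,me,\n"
    "F01-hello-how-are-you.wav,bot,hello how are you\n"
    "F02-hi-whats-up.wav,bot,hi what is up\n"
)
buffer = io.StringIO()
write_transcript(build_transcript_rows(export), buffer)
print(buffer.getvalue(), end="")
```

For a real export, the text would be read from the downloaded file and the result written to transcript.csv rather than to an in-memory buffer.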