If you have a hammer, every problem looks like a nail.
With Botium, we are currently defining the industry standard for testing chatbots. In our support and developer channels we regularly receive questions like:
- I have to test a WhatsApp chatbot, can you help me set up Appium for it?
- For our client I have to test a chatbot embedded in their app, can I test it with Botium?
- I have trouble testing the customer support chatbot on our website, Selenium says …
- … and so on
The conclusion to draw from these questions is that test engineers learned in the past how to test websites with Selenium and smartphone apps with Appium, and now they try to apply this valuable knowledge again. In doing so they neglect the fact that chatbots are a new kind of app that requires a new kind of tool (like Botium).
You can read about the most important differences in one of my previous blog posts.
With Selenium and Appium, we are talking about End-2-End testing (E2E) — simulating the full user experience on a graphical user interface. Those tests
- are extremely slow to execute, as they basically run in real time. Even a medium-size chatbot project typically needs a five-figure number of test cases for satisfactory test coverage, and running those tests in an E2E scenario will take hours in the best case
- require a high amount of computing resources or access to expensive browser/device cloud services
- are flaky as the required infrastructure is error-prone as well
- cannot provide a holistic view of the quality of the test object, as some important assertions, such as pure NLP performance, are technically not possible at all with E2E testing.
So here are my recommendations for test engineers on how to get started when asked to test a chatbot.
The most important metric for a chatbot is: is it able to hold a meaningful conversation with a client? In every chatbot project team there are conversation designers who, well, design the conversations that will make up the final user experience. The chatbot engine is trained (or coded) to provide the logic for these conversations.
And this is the place to start testing: make sure that the conversations are working as designed, from a content perspective. You can read more about conversation flow testing in the Botium docs.
One important skill to have is knowing BotiumScript, the scripting language for defining conversation flow test cases.
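To make the idea of a conversation flow test concrete, here is a minimal sketch in plain Python. It is not BotiumScript and does not use any Botium API: `toy_bot`, `run_conversation` and the sample conversation are hypothetical names and data invented for illustration. The sketch only shows the principle of replaying a designed conversation turn by turn and comparing each bot reply with the expected one.

```python
from typing import Callable, List, Tuple

# A conversation is an ordered list of (user utterance, expected bot reply) turns.
Conversation = List[Tuple[str, str]]


def toy_bot(message: str) -> str:
    """Hypothetical stand-in for the chatbot under test (assumption, not a real bot)."""
    text = message.lower()
    if "hello" in text:
        return "Hi, how can I help you?"
    if "order" in text:
        return "Sure, what is your order number?"
    return "Sorry, I did not understand that."


def run_conversation(bot: Callable[[str], str], convo: Conversation) -> List[str]:
    """Replay each user turn and collect one message per mismatching bot reply."""
    failures = []
    for step, (user_says, expected) in enumerate(convo, start=1):
        actual = bot(user_says)
        if actual != expected:
            failures.append(f"step {step}: expected {expected!r}, got {actual!r}")
    return failures


# A designed conversation, as a conversation designer might specify it.
order_status = [
    ("Hello", "Hi, how can I help you?"),
    ("Where is my order?", "Sure, what is your order number?"),
]

# An empty list means the conversation worked as designed.
print(run_conversation(toy_bot, order_status))
```

In a real project the bot function would call the chatbot engine directly rather than drive a graphical user interface, which is what keeps this kind of test fast.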
Testing the NLP engine
Most chatbots have some kind of natural language processing (NLP) component as part of the processing pipeline. It enables users to communicate with the chatbot in natural language, and that’s what actually makes it a chatbot. As a test engineer, it is your job to explore the limits of the NLP engine, and this requires basic knowledge of machine learning concepts, such as
- intents, entities and prediction confidence
- accuracy, sensitivity, specificity, precision, recall, F1-score
- confusion matrix
You can read about it in my blog series Quality Metrics for NLU/Chatbot Training Data.
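As a rough illustration of how these metrics relate, the following Python sketch computes a confusion matrix, overall accuracy, and per-intent precision, recall (also called sensitivity) and F1-score from a list of expected and predicted intents. The intent names and the sample predictions are made up for this example, and the helper functions are hypothetical, not part of Botium.

```python
from collections import Counter
from typing import Dict, List


def confusion_matrix(expected: List[str], predicted: List[str]) -> Counter:
    """Count how often each (expected intent, predicted intent) pair occurs."""
    return Counter(zip(expected, predicted))


def accuracy(expected: List[str], predicted: List[str]) -> float:
    """Share of utterances whose predicted intent matches the expected intent."""
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct / len(expected)


def intent_scores(expected: List[str], predicted: List[str], intent: str) -> Dict[str, float]:
    """Precision, recall and F1-score for a single intent (one-vs-rest)."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == intent and p == intent)
    fp = sum(1 for e, p in pairs if e != intent and p == intent)
    fn = sum(1 for e, p in pairs if e == intent and p != intent)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up test results: the intent each utterance should resolve to,
# and the intent the NLP engine actually predicted.
expected = ["greet", "greet", "order", "order", "order", "goodbye"]
predicted = ["greet", "order", "order", "order", "greet", "goodbye"]

matrix = confusion_matrix(expected, predicted)
print(matrix[("greet", "order")])  # "greet" utterances misclassified as "order"
print(accuracy(expected, predicted))
print(intent_scores(expected, predicted, "order"))
```

Off-diagonal cells of the confusion matrix, such as `("greet", "order")`, point to pairs of intents that the NLP engine tends to mix up and whose training data may need attention.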
Testing the end-user experience at the user interface level is an important part of a testing strategy. If you have done the previous steps right, you can be confident that the conversation flow and the NLP component are doing their work, so it is now time to add some user interface testing to the mix. The recommendation is to
- run a small number of test cases that cover all of the possible user interaction elements
- run those tests on a mix of representative browser versions / operating systems / smartphone devices, both virtual and physical
The good news is that here test engineers can shine with their existing knowledge of Selenium and Appium!
Read on in the Botium Wiki how to set this up with Botium!
Finally, there are also non-functional tests like performance tests and security tests to add to the test mix. In contrast to the other test types, those are typically done at certain milestones in the project.
A new generation of apps such as chatbots requires a new generation of testing tools, like Botium. Test engineers have to develop additional skills for testing conversational interfaces like chatbots.