Testing AI with Quality
An interview with Rik Marselis, Digital Assurance & Testing Expert at Sogeti Labs, Trainer for TMap, ISTQB & TPI.
How do you see the role of Quality Characteristics (QCs) in AI systems? And what are the Quality Characteristics specific to AI systems that do not overlap with those of traditional systems?
For every IT system, quality is determined by many different characteristics. Often, we distinguish between “functional” and “non-functional”. There are many non-functionals, such as performance, usability, and security. All quality characteristics are just as relevant for systems that include AI as for other systems. But when I researched the testing of AI systems, I found that the well-known characteristics, for example those of the ISO 25010 standard, did not cover all relevant aspects. So, we added “Intelligent behavior” (which covers topics like ability to learn and transparency of choices), “Morality” (to view the ethical side of AI implementations), and “Personality” (which, amongst other aspects, looks at mood and humor).
As an AI tester, how can one ensure that the above-mentioned QCs are achieved while testing an AI application?
Of course, there are many approaches and techniques for testing quality characteristics. I would like to go into detail on one very important one: transparency of choices, a sub-characteristic of intelligent behavior. I’m convinced that in the near future, in many situations, customers will ask their bank or insurance company to explain why it made a specific decision (for example, when an insurance company doesn’t pay a claim). In that case, the explanation “Because our AI decided so” won’t do. So, companies will need “explainable AI”. This field is currently being explored rapidly, for example by our own team, and the term “XAI” (eXplainable AI) pops up more and more often. Very briefly, XAI can be applied in two ways. It can be applied after the AI has made the decision, by tracing back how the decision was reached, for example based on a log. Or it can be done up front, by adding some functionality to the AI that makes it show extra information that helps to understand why decisions have been made. In the example of the insurance claim, the AI may report that the claim wasn’t paid because the original receipt of purchase of the article was not included. In that case, the customer knows how to solve it.
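The up-front variant can be illustrated with a minimal Python sketch. It is not the approach of any particular product: a simple rule-based function stands in for the AI, and the names `Claim`, `Decision`, and `decide_claim` are hypothetical. The point is only that the decision is returned together with the reasons behind it, so that a tester can check the explanation as well as the outcome.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    """Hypothetical insurance claim with only the fields this sketch needs."""
    claim_id: str
    policy_active: bool
    has_original_receipt: bool


@dataclass
class Decision:
    """A decision plus the human-readable reasons that led to it."""
    approved: bool
    reasons: List[str] = field(default_factory=list)


def decide_claim(claim: Claim) -> Decision:
    """Rule-based stand-in for the AI: collect every reason for rejection."""
    reasons = []
    if not claim.policy_active:
        reasons.append("The policy was not active.")
    if not claim.has_original_receipt:
        reasons.append("The original receipt of purchase was not included.")
    # The claim is approved only if no reason for rejection was found.
    return Decision(approved=not reasons, reasons=reasons)


decision = decide_claim(
    Claim(claim_id="C-1", policy_active=True, has_original_receipt=False)
)
print(decision.approved, decision.reasons)
```

A test for transparency of choices could then assert that every rejected claim carries at least one reason, and that the reason matches the input that caused the rejection.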
What is Cognitive QA and how does that help in testing?
Cognitive QA is a service of Sogeti that uses AI to support various testing tasks. One example is a real-time dashboard for which the AI gathers data from various test management tools to compile a concise overview of the current status of quality, risks, and progress. Another example is evaluating a huge test set to decide which test cases are relevant for regression testing and which can be skipped.
Since AI is also used in the testing lifecycle, are there any risks that can come with the usage of AI in testing?
Of course, AI brings risks just like any other tool that is used to support testing tasks. Currently, a major risk is expectations that are too high. People misunderstand the meaning of the words “artificial intelligence” and think that it will be magic. But in general, it is not much more than interpreting huge amounts of data and drawing conclusions from them. If the wrong training data is used or the wrong goals are set, then AI will not fulfil the expectations. So, as with any testing tool, it starts with defining the objectives and then carefully finding a tool that is capable of reaching those objectives.
Why is the release cycle in ‘Digital Testing’ shorter than in ‘Continuous Testing’? And how do we ensure QA with such short release cycles?
Thanks for this question. You have obviously read my book “Testing in the digital age; AI makes the difference”. In our book, we state that continuous testing takes less than days and digital testing takes less than minutes. Our experience is that when people use continuous testing in their DevOps pipeline, they often include a traditional regression test that slows down the deployment process, because the regression test still takes hours. In digital testing, there are two developments that make it possible to significantly shorten the duration of the test.
First, we use AI to make testing more efficient, for example by deciding per run which test cases may be skipped (for example, if a low-risk area was not changed, only a small subset of its test cases is run). Second, we really believe in Quality Forecasting. This means that we use AI to predict the evolution of the quality level of a system. This can be done by using data from previous test cycles together with data from monitoring the live operation of a system. If the AI forecasts a decrease in quality, the team can take measures before any customer notices a problem.
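The first development, deciding per run which test cases to skip, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the mechanism from the book: the selection rule is hand-written rather than learned, and the names `RegressionCase`, `select_tests`, `changed_areas`, and `high_risk_areas` are hypothetical. A real implementation would derive the risk and change information from data instead of fixed sets.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class RegressionCase:
    """Hypothetical regression test case tagged with the area it covers."""
    name: str
    area: str
    is_smoke: bool = False  # part of the small subset that always runs


def select_tests(
    cases: List[RegressionCase],
    changed_areas: Set[str],
    high_risk_areas: Set[str],
) -> List[RegressionCase]:
    """Pick the test cases to run for this pipeline run.

    Changed or high-risk areas get their full set of test cases.
    Unchanged low-risk areas only get their smoke tests.
    """
    selected = []
    for case in cases:
        if case.area in changed_areas or case.area in high_risk_areas:
            selected.append(case)
        elif case.is_smoke:
            selected.append(case)
    return selected


cases = [
    RegressionCase("login_basic", "login", is_smoke=True),
    RegressionCase("login_lockout", "login"),
    RegressionCase("payment_refund", "payment"),
    RegressionCase("report_export", "reporting"),
]
to_run = select_tests(cases, changed_areas={"reporting"}, high_risk_areas={"payment"})
print([case.name for case in to_run])
```

In this run, the unchanged low-risk login area contributes only its smoke test, while the changed reporting area and the high-risk payment area are tested in full.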
What approaches and/or testing methodologies are used when the outcomes/oracles are not known while testing AI applications?
Indeed, a big problem in testing AI systems, specifically continuously learning AI, is that a correct answer today may differ from a correct answer tomorrow. We have described several approaches; I’ll explain two. Tolerance is the first. This means we define boundaries between which an answer should fall. Input is another. Traditionally, testers focus on the output of systems. But since machine learning algorithms change their behaviour based on the input, testers should also have a look at the input. Of course, testers can’t sit next to the system all day and watch the input. However, testers can contribute to creating input filters that ensure that the AI only gets relevant and good input.
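Both approaches can be sketched briefly in Python. The example is illustrative only and rests on assumptions: the functions `within_tolerance` and `filter_input`, the `age` field, and the chosen boundaries are all hypothetical, and real boundaries and filter rules would come from the business context of the system under test.

```python
from typing import Dict, List


def within_tolerance(answer: float, lower: float, upper: float) -> bool:
    """Tolerance check: accept any answer between the agreed boundaries."""
    return lower <= answer <= upper


def filter_input(
    records: List[Dict[str, object]],
    min_age: int = 0,
    max_age: int = 120,
) -> List[Dict[str, object]]:
    """Input filter: pass on only records with a plausible numeric age.

    Records with a missing, non-numeric, or out-of-range age are dropped
    so that they never reach the learning algorithm.
    """
    accepted = []
    for record in records:
        age = record.get("age")
        is_number = isinstance(age, (int, float)) and not isinstance(age, bool)
        if is_number and min_age <= age <= max_age:
            accepted.append(record)
    return accepted


# Tolerance: the exact score may change as the model keeps learning,
# but it should stay between the agreed boundaries.
print(within_tolerance(0.72, lower=0.60, upper=0.80))

# Input filter: only the first record is relevant and good input.
records = [{"age": 34}, {"age": -5}, {"age": "unknown"}, {}]
print(filter_input(records))
```

With a tolerance check, a test no longer fails just because the system learned something new; it fails only when the answer drifts outside the boundaries that were agreed on.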
Where do you see ‘future testing’ heading?
Testers today already need to be able to use test tools. The more AI tools become available, the more testers will need to be capable of testing with these tools. The efficiency and effectiveness of testing will be further improved. Still, some manual exploratory testing will always be needed. Also, testers need to have a general understanding of machine learning and its pitfalls. Just like for other technologies, testers can only come to a solid assessment of the quality of a system if they understand which aspects of quality are relevant. Therefore, I think the quality characteristics, both the existing ones and those specifically added for AI, as discussed earlier in this interview, are crucial for testers to make a well-founded judgement of the quality of AI-powered systems.
What resources do you refer to for testing in general and for testing AI?
For me, both ISTQB (www.istqb.org) and TMap (www.tmap.net) are a good starting point. For both, many books and other materials have been published. Currently, I’m working on a new book in the TMap series, and based on that we’ll also do a complete overhaul of the tmap.net website early next year. Further, I like to visit testing conferences because that’s where you meet people who are working on the latest innovations in the testing profession, and they are eager to share their visions and experiences.
What was your experience being engaged in the syllabus writing for AiU?
To me, it was a pleasure to contribute to the Ai United syllabus. People who know me have seen that I’m also willing to help other testers improve their knowledge and skills. So, when this opportunity to spread knowledge on AI testing came by, I was very glad to take it. And I really like the result. I’m just about to get the results of the pilot courses, and I’m curious to see where the syllabus can be further improved before it’s quickly brought live.
What do you like doing in your free time?
Many people have seen me walking around at testing conferences carrying my Canon SLR with a huge Tamron 16-300 zoom lens. So yes, that gives away that, besides testing, my other hobby is photography. And it’s not only at conferences, of course. During vacations, my wife (who is also a keen photographer) and I take many pictures, and my wife always creates very nice books as a memento of our great road trips.