Testing Artificial Intelligence (AI) Systems

Artificial intelligence (AI) systems fall within the domain of scientific software. Scientific software helps in decision making, critical thinking, and cognitive analysis. In general, testing scientific software is tough, and so is testing AI systems. In reality, the scope of testing AI systems goes beyond functional and non-functional areas. We discuss the AI testing approach in this article along with some myths related to AI.

Why Test AI?

AI can also go wrong; it can fail! See tai Newsletter, June 2019. It is time to break the prevailing myths around AI. Some of the leading myths about AI which Gartner [6] addresses are:

Myth #1: AI works in the same way as the human brain

AI is not yet as mature as the human brain. The category of problems solved by AI systems is narrow: they are trained to perform a specific task and assist humans in solving pressing problems. Deep learning is a form of machine learning that emulates the human brain, but it has not yet reached the unique capability of the human brain.

Myth #2: Intelligent machines learn on their own

Can ‘AI’ learn and evolve on its own? The answer is an emphatic ‘No’. Human intervention is always needed to develop AI-based systems and solutions, even when it comes to upgrading or optimizing software. The need for humans will always be there. For example, humans label the data that is used in machine learning. This shows that AI as a technology will not take away the jobs of humans. Taking inputs from humans is at the core of AI technology.

Myth #3: AI is free of bias

Since AI systems learn from the training dataset provided by humans, there will always be an intrinsic bias in AI-based solutions. There is no way to eliminate this bias completely, but diversified datasets and diversity in the development and testing teams help to reduce selection and confirmation bias.

Because of these myths, people think that there is no need to test AI systems. The truth is that AI systems need to be tested, but with a different approach.

Why is Testing of AI Systems Different?

Since AI involves training the machines to learn, unlearn, and optimize specific tasks during their lifecycle, there is a specific need to train and test them ethically.
The development of AI systems is different from the development of traditional software. The differences appear not only in the software development life cycle (SDLC), but also in the software testing life cycle (STLC). ISO 25010 defines many non-functional quality characteristics, such as performance, security, reliability, robustness, and usability. However, AI systems need to be hammered on the anvil of ‘Ethics’ before being deployed in a real-time environment.

1. Ethical Behaviour

Research by Rik Marselis [2] shows that the established quality characteristics do not cover all the relevant aspects of testing AI systems. The extended quality characteristics that are specifically crucial for AI systems are:

  • Intelligent Behavior which signifies the ability to learn and comprehend with transparency of choices.
  • Morality as related to ethics, privacy, and human friendliness.
  • Personality which consists of the individual’s distinctive characteristics.

These quality characteristics can be ensured with the use of XAI (eXplainable AI) – where the developer/tester will have an answer to the question: why did the machine make a specific decision? XAI is a rapidly developing field. To explore more about XAI, stay tuned with our upcoming newsletters.

2. The Test Oracle Problem

The oracle problem remains one of the most challenging problems in software testing, and it is exacerbated in AI testing because the outcomes are nondeterministic. Without test oracle automation, humans need to determine whether a test outcome is correct or incorrect. Very little research has been done in this direction. In machine learning and AI algorithms, there are often no oracles without human intervention. For example, image recognition is typically solved with a supervised machine learning (ML) algorithm, which depends on humans correctly labelling the dataset used for training and testing. Metamorphic testing is used to mitigate the oracle problem. We will discuss this in detail in our upcoming newsletters.
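As a rough illustration of metamorphic testing: instead of asserting an exact expected output, the test checks a relation that must hold between the outputs for related inputs. The sketch below is an assumption-laden toy, not a method from the referenced material. The nearest-neighbour classifier, the function names, and the sample data are all hypothetical stand-ins for a real ML model.

```python
import math
import random


def nearest_neighbour_label(train, point):
    """Return the label of the training example closest to `point`.

    `train` is a list of ((x, y), label) pairs. This toy classifier is a
    hypothetical stand-in for the ML system under test.
    """
    closest = min(train, key=lambda item: math.dist(item[0], point))
    return closest[1]


def check_permutation_relation(train, point, seed=0):
    """Relation 1: shuffling the training set must not change the prediction.

    Assumes no two training points are exactly equidistant from `point`.
    """
    shuffled = train[:]
    random.Random(seed).shuffle(shuffled)
    return nearest_neighbour_label(train, point) == nearest_neighbour_label(shuffled, point)


def check_scaling_relation(train, point, factor=2.0):
    """Relation 2: scaling all coordinates by one positive factor must not
    change which training example is nearest, so the label stays the same."""
    scaled_train = [((x * factor, y * factor), label) for (x, y), label in train]
    scaled_point = (point[0] * factor, point[1] * factor)
    return nearest_neighbour_label(train, point) == nearest_neighbour_label(scaled_train, scaled_point)


# Illustrative data only; no expected label is needed for either check.
TRAIN = [((0.0, 0.0), "cat"), ((1.0, 1.0), "cat"), ((5.0, 5.0), "dog"), ((6.0, 5.0), "dog")]
QUERY = (4.0, 4.5)

if __name__ == "__main__":
    print("permutation relation holds:", check_permutation_relation(TRAIN, QUERY))
    print("scaling relation holds:", check_scaling_relation(TRAIN, QUERY))
```

Neither check needs a human to say what the correct label is; a violated relation signals a defect even when the true answer is unknown.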

3. Criticality of Input Data

The ‘input data’ with which ML models are trained and tested plays a major role in explaining the systems’ outcome. Some important points to consider are:

  • Generating corner cases in ML systems is tough and costly.
  • Testing ML systems depends, to quite an extent, on the imagination and creativity of the tester, who must consider every possible boundary case scenario.
  • Simulating ML systems does not always guarantee a foolproof outcome, as opposed to traditional software.



[1] Tom van de Ven, Rik Marselis and Humayun Shaukat. Testing in the Digital Age: AI makes the difference. Kleine Uil, Uitgeverij, 2018. ISBN 978 90 75414 87 5

[2] Testing Of Artificial Intelligence. https://www.sogeti.com/globalassets/global/downloads/reports/testing-ofartificial-intelligence_sogetireport_11_12_2017-.pdf

[3] EuroSTAR 2018 tutorial by Rik Marselis: Testing Intelligent Machines. https://www.slideshare.net/RikMarselis/eurostar-2018tutorial-rik-marselis-testingintelligent-machines

[4] Test your Machine Learning Algorithm with Metamorphic Testing. https://medium.com/trustableai/testing-aiwith-metamorphic-testing61d690001f5c

[5] Testing scientific software: A systematic literature review. https://www.sciencedirect.com/science/article/abs/pii/S0950584914001232

[6] Debunking the key myths around Artificial Intelligence: Gartner. https://content.techgig.com/debunking-the-key-myths-aroundartificial-intelligence-gartner/articleshow/68007651.cms

Sonika Bengani

Righteousness/Dharma (धर्मं) at tai

Interview with Rik Marselis

Testing AI with Quality

An interview with Rik Marselis, Digital Assurance & Testing Expert at Sogeti Labs, Trainer for TMap, ISTQB & TPI.

How do you see the role of Quality Characteristics (QCs) in AI systems? And, what are the Quality Characteristics specific to AI Systems that do not overlap traditional systems?

For every IT-system, quality is determined by many different characteristics. Often, we distinguish between “functional” and “non-functional”. These non-functionals are many, such as performance, usability, and security. All quality characteristics are just as relevant for systems that include AI as for other systems. But when I researched testing of AI-systems, I found that the well-known characteristics, for example, of the ISO 25010 standard, did not cover all relevant aspects. So, we added “Intelligent behavior” (which covers topics like ability to learn and transparency of choices), “Morality” (to view the ethical side of AI implementations), and “Personality” (that amongst other aspects looks at mood and humor).

As an AI tester, how can one ensure that above mentioned QCs are achieved while testing an AI application?

Of course, there are very many approaches and techniques to test for quality characteristics. I would like to go into detail on one of them, a very important one: the sub-characteristic “transparency of choices”, part of intelligent behavior. I’m convinced that in the near future, in many situations, customers will ask their bank or insurance company to explain why it made a specific decision (for example, if an insurance company doesn’t pay a claim). In that case, the explanation “Because our AI decided so” won’t do. So, companies will need “explainable AI”. This field is currently being explored rapidly, for example, by our own team, and the term “XAI” (eXplainable AI) pops up more and more often. Very briefly, XAI can be applied after the AI has made the decision, by tracing back, for example, based on a log. Or it can be done up-front, by adding some functionality to the AI to make it show extra information that helps understand why decisions have been made. In the example of the insurance claim, the AI may report that the original receipt of purchase of the article was not included and that the claim therefore wasn’t paid. In that case, the customer knows how to solve it.
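To make the up-front variant concrete, here is a minimal sketch of a decision function that returns its reasons together with the outcome. It is an illustration under stated assumptions and not Sogeti's implementation: the `Claim` and `Decision` types, the `decide_claim` function, and the rules and limit are hypothetical, and a simple rule set stands in for a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    amount: float
    has_original_receipt: bool
    policy_active: bool


@dataclass
class Decision:
    approved: bool
    reasons: list = field(default_factory=list)  # human-readable explanations


def decide_claim(claim, max_amount=1000.0):
    """Decide a claim and record every reason that counts against it.

    The rules and the `max_amount` limit are invented for illustration.
    """
    reasons = []
    if not claim.has_original_receipt:
        reasons.append("original receipt of purchase was not included")
    if not claim.policy_active:
        reasons.append("policy was not active on the claim date")
    if claim.amount > max_amount:
        reasons.append(f"claimed amount exceeds the limit of {max_amount:.2f}")
    # The claim is approved only when nothing counts against it.
    return Decision(approved=not reasons, reasons=reasons)


if __name__ == "__main__":
    decision = decide_claim(Claim(amount=250.0, has_original_receipt=False, policy_active=True))
    print(decision.approved, decision.reasons)
```

A tester can then check transparency directly, for example by asserting that every rejected claim carries at least one reason the customer could act on.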

What is Cognitive QA and how does that help in testing?

Cognitive QA is a service of Sogeti that uses AI to support various testing tasks. For example, creating a real-time dashboard for which the AI gathers data from various test management tools to compile a concise overview of the current status of quality, risks and progress. Another example is to evaluate a huge test-set and decide which test cases are relevant for regression testing and which test cases can be skipped.

Since AI is also used in the testing lifecycle, are there any risks that can come with the usage of AI in testing?

Of course, AI brings risks just like any other tool that is used to support testing tasks. Currently, a major risk is excessively high expectations. People misunderstand the meaning of the words “artificial intelligence” and think that it will be magic. But, in general, it is not much more than interpreting huge amounts of data and drawing conclusions from them. If the wrong training data is used or the wrong goals are set, then AI will not fulfil the expectations. So, as with any testing tool, it starts with defining the objectives and then carefully finding a tool that is capable of reaching those objectives.

Why is it so, that the release cycle in ‘Digital Testing’ is shorter than that in ‘Continuous Testing’? And, how do we ensure QA with such shorter release cycles?

Thanks for this question. You have obviously read my book “Testing in the Digital Age: AI makes the difference”. In our book, we state that continuous testing takes less than days and digital testing takes less than minutes. Our experience is that when people use continuous testing in their DevOps pipeline, they often include a traditional regression test that slows down the deployment process because it still takes hours. In digital testing, there are two developments that make it possible to significantly shorten the duration of the test.

First, we use AI to make testing more efficient, for example, by deciding per run which test cases may be skipped (for example, if a low-risk area was not changed, only a small subset of the test cases is run). Secondly, we really believe in Quality Forecasting. This means that we use AI to predict the evolution of the quality level of a system. This can be done by using data from previous test cycles together with data from monitoring the live operation of a system. If the AI forecasts a decrease in quality, the team can already take measures before any customer notices a problem.
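The per-run selection described above can be sketched as a simple filter. This is only an illustration under assumptions: in practice the risk scores would come from a trained model, whereas here they are hard-coded, and the `select_tests` function, the field names, and the threshold are all hypothetical.

```python
def select_tests(test_cases, changed_areas, risk_threshold=0.7):
    """Return the names of the tests to run in this cycle.

    A test is kept when its area was changed or when its risk score is at or
    above `risk_threshold`; low-risk tests for unchanged areas are skipped.
    """
    selected = []
    for test in test_cases:
        if test["area"] in changed_areas or test["risk"] >= risk_threshold:
            selected.append(test["name"])
    return selected


# Illustrative test inventory with made-up risk scores between 0 and 1.
TESTS = [
    {"name": "test_login", "area": "auth", "risk": 0.9},
    {"name": "test_profile_page", "area": "profile", "risk": 0.2},
    {"name": "test_checkout", "area": "payments", "risk": 0.8},
    {"name": "test_help_page", "area": "help", "risk": 0.1},
]

if __name__ == "__main__":
    # Only the "profile" area changed, so the low-risk help test is skipped.
    print(select_tests(TESTS, changed_areas={"profile"}))
```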

What approaches and/or testing methodologies are used when the outcomes/oracles are not known while testing AI applications?

Indeed, a big problem in testing AI systems, specifically for continuously learning AI, is that a correct answer today may differ from a correct answer tomorrow. We have described several approaches; I’ll explain two. Tolerance is the first. This means we define boundaries between which an answer should lie. Input is another. Traditionally, testers focus on the output of systems. But since machine learning algorithms change their behaviour based on the input, testers should also have a look at the input. Of course, testers can’t sit next to the system all day and watch the input. However, testers can contribute to creating input filters that ensure that the AI only gets relevant and good input.
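Both ideas can be sketched in a few lines. The following is a hedged illustration rather than the approach from the book: the function names, the bounds, and the `age` and `income` fields are assumptions chosen only to show the shape of a tolerance check and an input filter.

```python
import math


def within_tolerance(predicted, lower, upper):
    """Tolerance: accept any answer that lies between the agreed boundaries."""
    return lower <= predicted <= upper


def is_acceptable_input(record, required_fields=("age", "income")):
    """Input filter rule: every required field must be a non-negative number."""
    for name in required_fields:
        value = record.get(name)
        # Reject missing values, non-numeric values, and booleans.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if math.isnan(value) or value < 0:
            return False
    return True


def filter_inputs(records):
    """Keep only the records that the learning system is allowed to see."""
    return [record for record in records if is_acceptable_input(record)]


if __name__ == "__main__":
    # The answer may drift over time, but it must stay inside the boundaries.
    print(within_tolerance(0.82, lower=0.75, upper=0.95))
    raw = [
        {"age": 30, "income": 50000.0},
        {"age": -1, "income": 10.0},  # negative age is rejected
        {"age": 40},                  # missing income is rejected
    ]
    print(filter_inputs(raw))
```

The tolerance check replaces a single expected value with an accepted range, while the filter gives testers a concrete artefact to review and test instead of watching the input by hand.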

Where do you see ‘future testing’ heading?

Testers today already need to be able to use test tools. The more AI tools become available, the more testers will also need to be capable of testing using these tools. The efficiency and effectiveness of testing will be further improved. But some manual exploratory testing will always remain necessary. Also, testers need to have a general understanding of machine learning and its pitfalls. Just like for other technologies, testers can only come to a solid assessment of the quality of a system if they understand what kind of aspects of quality are relevant. Therefore, I think the quality characteristics, both existing and specifically added for AI, as discussed earlier in this interview, are crucial for testers to make a well-founded judgement of the quality of AI-powered systems.

What resources do you refer for testing in general and for testing AI?

For me, both ISTQB (www.istqb.org) and TMap (www.tmap.net) are a good starting point. For both, many books and other materials have been published. Currently, I’m working on a new book in the TMap series, and based on that we’ll also do a complete overhaul of the tmap.net website early next year. Further, I like to visit testing conferences because that’s where you meet people who are working on the latest innovations in the testing profession, and they are eager to share their visions and experiences.

What was your experience being engaged in the syllabus writing for AiU?

To me, it was a pleasure to contribute to the Ai United syllabus. People that know me have seen that I’m also willing to help other testers improve their knowledge and skills. So, when this opportunity to spread knowledge on AI testing came by, I was very glad to take it. And I really like the result. I’m just about to get the results of the pilot-courses, and I’m curious where the syllabus can be further improved before it’s quickly brought live.

What do you like doing in your free time?

Many people have seen me walking around at testing conferences carrying my Canon SLR with a huge Tamron 16-300 zoom lens. So yes, that gives away that, besides testing, my other hobby is photography. And it’s not only at conferences, of course. During vacations, my wife (who is also a keen photographer) and I take many pictures, and my wife always creates very nice books as a memory of the great road trips.

Interview with Ai-United

What is Ai-United?

AiU or Artificial Intelligence United (www.ai-united.org) is a group of international experts who are working to create certification standards in the area of Artificial Intelligence.

What kind of training and certification do you mainly focus on?

The first training of the AiU is Certified Tester in AI (CTAI), which focuses mainly on the inherent challenges and evolving roles of a tester in AI projects. There will be further courses, so please stay tuned for the growing roadmap.

Tell us something more about the certification, pre-requisites, outcomes and industry SIG involved in it.

In general, there are no mandatory requirements for AiU-CTAI; however, in order to get the most out of the training, we recommend some experience in software testing and/or development, and highly recommend completing the ISTQB Certified Tester Foundation Level certification before joining this course. Basic knowledge of a programming language such as Java, Python, or C++, as well as a general understanding of statistics, will also benefit you throughout the course.

Apart from being the first, what else do you think is unique about this certification?

AiU – Certified Tester in AI is focused on the role of the tester and has been created by experts from both the software testing and AI fields, who came together to create something that fits the demands of the global AI community. It has been reviewed by experts from various fields who are interested in AI, from over 30 countries across 5 continents, and who have provided important feedback. This is why we can proudly announce that this is the first global certification scheme of its kind, supporting the quality of testing in AI projects.

What is the vision and mission of this organization?

Artificial intelligence (AI) was founded as an academic discipline in 1956. It is only in recent years that AI and its constituent technology of Machine Learning (ML) have become commonplace in business and, in turn, integral parts of many IT projects. There are endless possibilities and uses for AI and ML which can bring incredible benefits; however, as with many new technologies, it is important that we understand them well so we can better consider potential ethical and negative implications. This is why, at AiU, we believe that the most important thing is knowledge. The mission of the organization is to enable comprehensive dissemination and evangelization of AI knowledge.

Who is partnering with you?

We are working with international experts to create content and have a review committee of 43 professionals from many relevant disciplines in nearly 30 countries, listed on the website.

How are training providers recognized?

The recognized training providers are listed on the AiU website. There is a lot of interest from the community in recognition of further training providers; however, we are waiting for the finalization of the syllabus, which will be completed very soon, so everyone can see the official final version before further recognitions are finalized.

Why is getting trained in AI skills important?

AI can be a terrifying topic for many people. It’s easy to see the many advantages but, at the same time, be afraid of some of the possible misuse scenarios and negative implications. For these reasons, Artificial Intelligence United finds it important to set quality standards in the AI field as soon as possible. Society needs as many people as possible to develop the critical thinking skills that are required, especially in projects using AI fundamentals, as these incredible advantages can come with potential risks which need to be properly taken into account while building these systems. The first course, AiU CTAI, focuses on the role of the tester, who is able not only to verify AI systems but also to consider how potential AI risks can be mitigated.

Why do you think standards are required in the area of Artificial Intelligence?

I believe that there is an undeniable need for setting standards in AI, as it is beginning to have implications in just about every area of our lives. There are situations that we can’t even imagine today which will become reality over the coming years and even months. Such proliferation necessitates the development of cost-effective tools and products that interoperate seamlessly while assuring confidence in functionality. This inevitably points to the need for standards.