Testing Artificial Intelligence (AI) Systems

Artificial Intelligence (AI) systems fall within the domain of scientific software. Scientific software helps in decision making, critical thinking, and cognitive analysis. Testing scientific software is hard in general, and testing AI systems is no easier. In fact, the scope of testing AI systems goes beyond the usual functional and non-functional areas. This article discusses an approach to testing AI, along with some myths about AI.

Why Test AI?

AI can go wrong too; it can fail! See tai Newsletter, June 2019. It is time to break the prevailing myths around AI. Some of the leading myths about AI that Gartner [6] addresses are:

Myth #1: AI works in the same way as the human brain

AI is not yet as mature as the human brain. The category of problems solved by AI systems is narrow: they are trained to perform a specific task and assist humans in solving pressing problems. Deep learning is a form of machine learning loosely modelled on the human brain, but it has not yet reached the brain’s unique capabilities.

Myth #2: Intelligent machines learn on their own

Can ‘AI’ learn and evolve on its own? The answer is an emphatic ‘No’. Human intervention is always needed to develop AI-based systems and solutions, even when it comes to upgrading or optimizing software. The need for humans will always be there. For example, humans label the data that is used in machine learning. This shows that AI as a technology will not take away the jobs of humans. Taking inputs from humans is at the core of AI technology.

Myth #3: AI is free of bias

Since AI systems learn from training datasets provided by humans, there will always be an intrinsic bias in AI-based solutions. There is no way to eliminate this bias completely, but diverse datasets and diversity in the development and testing teams help to reduce selection and confirmation bias.

Because of these myths, people think that there is no need to test AI systems. The truth is that AI systems need to be tested, but with a different approach.

Why is Testing of AI Systems Different?

Since AI involves training machines to learn, unlearn, and optimize specific tasks during their lifecycle, there is a specific need to train and test them ethically. The development of AI systems is different from the development of traditional software. The differences appear not only in the software development life cycle (SDLC), but also in the software testing life cycle (STLC). ISO 25010 lists many quality characteristics for the non-functional aspects of testing, such as performance, security, reliability, robustness, and usability. However, AI systems also need to be hammered on the anvil of ‘Ethics’ before being deployed in a real-time environment.

1. Ethical Behaviour

Research by Rik Marselis [2] shows that the standard quality characteristics do not cover all the relevant aspects of testing AI systems. The extended quality characteristics that are especially crucial for AI systems are:

  • Intelligent behaviour, which signifies the ability to learn and comprehend, with transparency of choices.
  • Morality, as related to ethics, privacy, and human friendliness.
  • Personality, which consists of the individual’s distinctive characteristics.

These quality characteristics can be ensured with the use of XAI (eXplainable AI), which gives the developer or tester an answer to the question: why did the machine make a specific decision? XAI is a rapidly developing field. To explore more about XAI, stay tuned for our upcoming newsletters.

2. The Test Oracle Problem

The oracle problem remains one of the most challenging problems in software testing, and it is exacerbated when testing AI, since the outcomes are nondeterministic. Without test oracle automation, humans need to determine whether the outcome of the testing process is correct or incorrect. Very little research has been done in this direction. In machine learning and in AI algorithms, there are often no oracles without human intervention. For example, image recognition is solved with a supervised machine learning (ML) algorithm, which begins with humans labelling the dataset correctly for training and testing. Metamorphic testing is used to mitigate the ‘oracle’ problem: instead of checking each output against a known correct answer, it checks that related inputs produce outputs that stand in an expected relation to each other. We will discuss this in detail in our upcoming newsletters.
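As a rough illustration of the idea, the sketch below applies metamorphic testing to a toy classifier. Everything in it is an assumption made for the example and does not come from the cited sources: the `knn_predict` function stands in for the model under test, the `TRAIN` data is made up, and the two relations (shuffling the training data, shifting all points by the same offset) are only two of many possible metamorphic relations. Note that the check never needs to know the correct label for a query.

```python
import random
from collections import Counter

# Hypothetical labelled data: ((x, y), label) pairs forming two clusters.
TRAIN = [
    ((1, 1), "cat"), ((1, 2), "cat"), ((2, 1), "cat"), ((2, 2), "cat"),
    ((8, 8), "dog"), ((8, 9), "dog"), ((9, 8), "dog"), ((9, 9), "dog"),
]


def knn_predict(train, query, k=3):
    """Toy k-nearest-neighbour classifier standing in for the model under test."""
    def squared_distance(item):
        (x, y), _label = item
        return (x - query[0]) ** 2 + (y - query[1]) ** 2

    nearest = sorted(train, key=squared_distance)[:k]
    votes = Counter(label for _point, label in nearest)
    return votes.most_common(1)[0][0]


def shift(point, dx, dy):
    """Move a point by the offset (dx, dy)."""
    return (point[0] + dx, point[1] + dy)


def metamorphic_violations(predict, train, queries, seed=0):
    """Return the (relation, query) pairs for which a metamorphic relation fails."""
    rng = random.Random(seed)
    violations = []
    for query in queries:
        baseline = predict(train, query)

        # Relation 1: the order of the training examples must not matter.
        shuffled = train[:]
        rng.shuffle(shuffled)
        if predict(shuffled, query) != baseline:
            violations.append(("shuffle", query))

        # Relation 2: shifting every point and the query by the same offset
        # preserves all distances, so the prediction must not change.
        dx, dy = rng.randint(-50, 50), rng.randint(-50, 50)
        shifted = [(shift(p, dx, dy), label) for p, label in train]
        if predict(shifted, shift(query, dx, dy)) != baseline:
            violations.append(("shift", query))
    return violations


if __name__ == "__main__":
    print(metamorphic_violations(knn_predict, TRAIN, [(0, 0), (3, 2), (10, 10)]))
```

An empty list means that no relation was violated for the chosen queries. A non-empty list points to the relation and the input that exposed an inconsistency, without a human having to label that input.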

3. Criticality of Input Data

The ‘input data’ with which ML models are trained and tested plays a major role in explaining the system’s outcome. Some important points to consider are:

  • Generating corner cases in ML systems is tough and costly.
  • Testing ML systems depends, to quite an extent, on the imagination and creativity of the tester, who has to consider every possible boundary-case scenario.
  • Unlike with traditional software, simulating ML systems does not always guarantee a foolproof outcome.
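To make the cost of corner cases concrete, here is a minimal sketch of boundary-value generation for numeric input features. The feature names and ranges in `FEATURE_RANGES` are hypothetical and chosen only for illustration, and boundary-value analysis is just one simple way of producing corner cases; it does not replace the tester's judgement described above.

```python
from itertools import product


def boundary_values(low, high, step):
    """Values just outside, on, and just inside each end of [low, high]."""
    return [low - step, low, low + step, high - step, high, high + step]


def corner_cases(feature_ranges):
    """Yield one dict per combination of boundary values across all features."""
    names = list(feature_ranges)
    per_feature = [boundary_values(*feature_ranges[name]) for name in names]
    for combination in product(*per_feature):
        yield dict(zip(names, combination))


# Hypothetical input features as (low, high, step); not taken from the article.
FEATURE_RANGES = {
    "pixel_intensity": (0, 255, 1),
    "image_width": (32, 4096, 1),
}

if __name__ == "__main__":
    cases = list(corner_cases(FEATURE_RANGES))
    print(len(cases), "corner cases, first:", cases[0])
```

With six boundary values per feature, n features produce 6^n combinations, which is one reason why exhaustive corner-case testing quickly becomes expensive.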



References

[1] Tom van de Ven, Rik Marselis and Humayun Shaukat. Testing in the Digital Age: AI makes the difference. Kleine Uil, Uitgeverij, 2018. ISBN 978 90 75414 87 5

[2] Testing Of Artificial Intelligence. https://www.sogeti.com/globalassets/global/downloads/reports/testing-ofartificial-intelligence_sogetireport_11_12_2017-.pdf

[3] EuroSTAR 2018 tutorial by Rik Marselis: Testing Intelligent Machines. https://www.slideshare.net/RikMarselis/eurostar-2018tutorial-rik-marselis-testingintelligent-machines

[4] Test your Machine Learning Algorithm with Metamorphic Testing. https://medium.com/trustableai/testing-aiwith-metamorphic-testing61d690001f5c

[5] Testing scientific software: A systematic literature review. https://www.sciencedirect.com/science/article/abs/pii/S0950584914001232

[6] Debunking the key myths around Artificial Intelligence: Gartner. https://content.techgig.com/debunking-the-key-myths-aroundartificial-intelligence-gartner/articleshow/68007651.cms

Sonika Bengani

Righteousness/Dharma (धर्मं) at tai
