Blog 17.12.2024

Testing AI applications: ensuring reliability and safety


Artificial intelligence (AI) and machine learning have revolutionised software development in recent years. While AI offers tremendous opportunities, it also introduces unique challenges to testing. Traditional testing methods alone are insufficient to ensure the quality and reliability of AI applications. 

Distinctive features of AI testing

The rapid growth of AI stems from several key factors: increased computational power, particularly through GPUs, the rise of big data, advances in AI algorithms, and the proliferation of open-source AI libraries and tools. These developments have enabled AI to be applied across nearly every domain but have simultaneously introduced new challenges to testing.

AI applications differ significantly from traditional software. They learn from data rather than following predefined rules, often feature complex internal logic, can produce variable outputs for the same input during different runs, and can evolve over time with new data.

Approaches to AI testing

Given these characteristics, AI testing requires new approaches. Data-centric testing is critical, as is a strong emphasis on ensuring the quality and diversity of test data. Continuous performance monitoring in production helps detect deviations early.

Bias detection and ethical testing have become central themes in AI testing. It is crucial to ensure that models function equitably across user groups and adhere to societal norms and values. Robustness testing helps testers understand model behaviour under unexpected or incorrect inputs, while explainability testing ensures that models can justify their decisions in an understandable manner.
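As a simple illustration of bias detection, the sketch below compares a model’s accuracy across demographic groups and flags large gaps; the column names, group column, and threshold are hypothetical assumptions made for this example.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

def accuracy_by_group(df: pd.DataFrame, group_col: str, threshold: float = 0.05):
    """Flag groups whose accuracy deviates notably from the overall accuracy.

    Assumes df has 'y_true' and 'y_pred' columns plus a demographic column
    (e.g. 'age_group'); the names and threshold are illustrative only.
    """
    overall = accuracy_score(df["y_true"], df["y_pred"])
    report = {}
    for group, rows in df.groupby(group_col):
        acc = accuracy_score(rows["y_true"], rows["y_pred"])
        report[group] = {
            "accuracy": acc,
            "gap": acc - overall,
            "flagged": abs(acc - overall) > threshold,
        }
    return overall, report

# Hypothetical usage:
# overall, report = accuracy_by_group(pd.read_csv("predictions.csv"), "age_group")
```

A flagged group does not automatically mean the model is unfair, but it tells testers and domain experts where to look more closely.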

Six practical tips for AI testing

1. Build diverse test data

Collect a wide and varied dataset that covers diverse scenarios and edge cases.

Example: When developing an AI model to analyse ECG readings, include data from patients of different age groups, health conditions (e.g., heart disease, diabetes), and environments (e.g., during activity or rest) to ensure reliable model performance across all use cases.
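A minimal sketch of how that coverage could be checked automatically, assuming a pandas DataFrame with hypothetical age_group, condition, and activity columns:

```python
import itertools
import pandas as pd

def report_coverage(df: pd.DataFrame, dimensions: list[str], min_samples: int = 50):
    """List scenario combinations that are missing or under-represented.

    The dimension names and the minimum sample count are assumptions made for
    this ECG example; adapt them to the actual dataset schema.
    """
    counts = df.groupby(dimensions).size().to_dict()
    all_combos = itertools.product(*(df[d].dropna().unique() for d in dimensions))
    gaps = []
    for combo in all_combos:
        n = counts.get(combo, 0)
        if n < min_samples:
            gaps.append((combo, int(n)))
    return gaps

# Hypothetical usage:
# gaps = report_coverage(ecg_df, ["age_group", "condition", "activity"])
# Each entry is a scenario the test set does not yet cover well enough.
```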

2. Leverage cross-validation

Use cross-validation to evaluate model performance on new data.

Example: For a facial recognition model, split the dataset into multiple parts (e.g., five folds), using each part alternately for testing while training on the others. This helps assess how well the model generalises to unseen data and prevents overfitting.
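A minimal sketch of five-fold cross-validation with scikit-learn; random data stands in for the real face-embedding features, so the shapes and the classifier choice are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Placeholder data: 500 samples of 128-dimensional "embeddings" and 5 identities.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 5, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", np.round(scores, 3))
print(f"Mean {scores.mean():.3f}, standard deviation {scores.std():.3f}")
# A large spread between folds hints at overfitting or unrepresentative data.
```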

3. Conduct A/B testing

Compare different model versions in production.

Example: For an e-commerce recommendation system, split users between the old and new model versions. Track metrics such as average cart value or click-through rates to determine whether the updated model improves business outcomes.
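The sketch below shows one way to decide whether a difference in click-through rate between the two variants is more than random noise, using a two-proportion z-test from statsmodels; the counts are made-up numbers for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results from an A/B split of the recommendation model:
# variant A = current model, variant B = new model.
clicks = [1340, 1475]          # users who clicked a recommendation
impressions = [25000, 25000]   # users exposed to each variant

stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
ctr_a = clicks[0] / impressions[0]
ctr_b = clicks[1] / impressions[1]

print(f"CTR A: {ctr_a:.2%}, CTR B: {ctr_b:.2%}, p-value: {p_value:.4f}")
if p_value < 0.05:
    print("The difference is unlikely to be chance; consider rolling out variant B.")
else:
    print("No statistically significant difference yet; keep collecting data.")
```

In practice the same comparison would be run on average cart value and other business metrics as well, not on click-through rate alone.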

4. Use adversarial testing

Test the model with deliberately challenging or misleading inputs.

Example: For autonomous vehicles, expose the object detection system to rare scenarios, such as reflections, fog, or unusual road signs. Adversarial tests might include traffic sign images altered with graffiti or distortions to evaluate model robustness against interference.
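A minimal sketch of such a robustness check: it perturbs test images with noise and occlusion and measures how much accuracy drops. The predict_fn interface and the image format are assumptions standing in for the real detection model and test set:

```python
import numpy as np

def add_noise(images: np.ndarray, sigma: float = 0.1) -> np.ndarray:
    """Add Gaussian noise, roughly simulating sensor grain or bad weather."""
    noisy = images + np.random.normal(0.0, sigma, images.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_occlusion(images: np.ndarray, size: int = 8) -> np.ndarray:
    """Black out a random square patch, roughly simulating stickers or graffiti."""
    out = images.copy()
    for img in out:
        y = np.random.randint(0, img.shape[0] - size)
        x = np.random.randint(0, img.shape[1] - size)
        img[y:y + size, x:x + size] = 0.0
    return out

def robustness_report(predict_fn, images: np.ndarray, labels: np.ndarray) -> None:
    """Compare accuracy on clean versus perturbed inputs.

    predict_fn(images) is assumed to return predicted class labels.
    """
    clean_acc = np.mean(predict_fn(images) == labels)
    print(f"clean: accuracy {clean_acc:.3f}")
    for name, attack in [("gaussian noise", add_noise), ("occlusion", add_occlusion)]:
        acc = np.mean(predict_fn(attack(images)) == labels)
        print(f"{name}: accuracy {acc:.3f} (drop {clean_acc - acc:.3f})")
```

A sharp drop under a particular perturbation tells the team which real-world conditions the model is not yet ready for.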

5. Implement continuous monitoring

Monitor the model’s performance and behaviour in production.

Example: For a customer service chatbot, track response accuracy, speed, and customer satisfaction. Watch for anomalies like inappropriate or nonsensical replies, especially when encountering new slang or jargon.   
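A minimal sketch of one monitoring building block: a rolling window over recent interactions that raises an alert when the resolution rate or response time drifts past a limit. The metric names and thresholds are assumptions chosen for illustration:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Interaction:
    resolved: bool           # did the bot answer the customer's question?
    response_seconds: float  # time to first reply

class ChatbotMonitor:
    """Track recent interactions and flag drops in quality (illustrative thresholds)."""

    def __init__(self, window: int = 500, min_resolution_rate: float = 0.8,
                 max_avg_response: float = 2.0):
        self.history = deque(maxlen=window)
        self.min_resolution_rate = min_resolution_rate
        self.max_avg_response = max_avg_response

    def record(self, interaction: Interaction) -> list[str]:
        """Store one interaction and return any alerts for the current window."""
        self.history.append(interaction)
        alerts = []
        if len(self.history) == self.history.maxlen:
            rate = sum(i.resolved for i in self.history) / len(self.history)
            avg = sum(i.response_seconds for i in self.history) / len(self.history)
            if rate < self.min_resolution_rate:
                alerts.append(f"Resolution rate dropped to {rate:.1%}")
            if avg > self.max_avg_response:
                alerts.append(f"Average response time rose to {avg:.1f}s")
        return alerts

# Hypothetical usage:
# monitor = ChatbotMonitor()
# alerts = monitor.record(Interaction(resolved=False, response_seconds=3.4))
```

In a real system the alerts would feed the team’s normal observability tooling rather than be returned from a method call alone.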

6. Collaborate with experts across domains

Testing AI requires multidisciplinary expertise – leverage the knowledge of AI specialists, business professionals, and ethics experts.

Example: When testing a medical imaging AI, involve radiologists to evaluate how well the system detects tumours, and ethics experts to ensure no patient groups are disadvantaged by biases in the model.

The evolution of testing in the era of AI

AI has brought a paradigm shift to software development. Instead of traditional code-driven development, AI applications revolve around data collection, processing, and model training. This shift is also reflected in testing practices.

  • Data quality testing is now a central focus, ensuring training data is sufficient, diverse, and of high quality (see the sketch after this list).
  • Performance testing evaluates accuracy, speed, and resource utilisation in various scenarios.
  • Functional testing ensures the model performs reliably with unseen data and continues to do so after updates.
  • Integration testing remains essential to confirm that AI components work seamlessly within larger systems.
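As a concrete illustration of the data quality point above, the sketch below runs basic checks on a training dataset with pandas; the label column name and the thresholds are hypothetical assumptions:

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame, label_col: str,
                        max_missing: float = 0.05,
                        min_class_share: float = 0.10) -> list[str]:
    """Collect simple data quality findings; thresholds are illustrative only."""
    issues = []

    missing = df.isna().mean()
    for col, share in missing[missing > max_missing].items():
        issues.append(f"Column '{col}' has {share:.1%} missing values")

    duplicates = int(df.duplicated().sum())
    if duplicates:
        issues.append(f"{duplicates} duplicated rows found")

    class_share = df[label_col].value_counts(normalize=True)
    for label, share in class_share[class_share < min_class_share].items():
        issues.append(f"Class '{label}' covers only {share:.1%} of the data")

    return issues

# Hypothetical usage:
# issues = data_quality_report(pd.read_csv("training_data.csv"), label_col="diagnosis")
```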

Testing as the foundation of reliable AI

Testing AI applications is challenging but essential for developing reliable and safe systems. Traditional methods must be augmented with approaches that account for AI’s unique features. 

As AI evolves rapidly and its applications expand, testers must continuously update their skills and adapt to new methodologies. AI testing is an exciting and fast-developing field that offers testers a unique opportunity to shape the quality and trustworthiness of future technologies.

(This blog was created with the support of Claude-3.5-Sonnet AI and ChatGPT 4o.)




Kari Kakkonen

Head of Expertise Development Services

Kari is a passionate advocate for software testing and QA. He has almost 30 years of testing experience, over 15 years of agile experience, over 10 years of DevOps experience, and over 5 years of AI experience. He has worked in the ICT consulting, training, finance, insurance, pension insurance, public sector, embedded software, telecom, gaming, and industrial domains. He moved his training business to Gofore in 2024 and runs the successful Expertise Capability, which provides ISTQB, TMMi, test automation, and other software testing, agile, DevOps, and AI training services in Europe. He has trained over 3,000 people over the years. Kari also regularly assesses companies’ testing, agile, DevOps, and IT processes using TMMi, TPI Next, and other methodologies.

Kari is the 2021 EuroSTAR Testing Excellence Award winner, the 2021 Tester of the Year in Finland Award winner, and the 2023 DASA Exemplary DevOps Instructor Award winner (and a 2024 finalist). He is included in the Tivi 100 Most Influential People in IT 2010 listing. Kari holds an M.Sc. in Industrial Management from Aalto University.
