Validation of the Entire Machine Learning Pipeline

by Oliver Haase, August 21, 2020

How do you match ML with the regulatory requirements?

In machine learning, the complete process comprising all phases from data assembly to the deployment of the trained model is called the machine learning pipeline. Even though there are numerous variants, the basic building blocks of all ML pipelines are alike. For illustration, we use the CRISP-DM pipeline that consists of the following phases:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

Figure: CRISP-DM Pipeline

As the figure shows, the CRISP-DM pipeline is not strictly sequential: it provides several paths for returning to earlier phases.
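To make the non-sequential structure concrete, the phases and the steps between them can be modeled as a small transition table. The following is only a sketch: the backward steps encoded here follow the commonly published CRISP-DM reference diagram and are an assumption, since the figure itself is not reproduced in this text. The names Phase, ALLOWED_TRANSITIONS and is_valid_path are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"


# Assumption: forward and backward steps as drawn in the commonly
# published CRISP-DM diagram; adapt this table to your own process.
ALLOWED_TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING,
                               Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed step."""
    return all(nxt in ALLOWED_TRANSITIONS[cur]
               for cur, nxt in zip(path, path[1:]))
```

A project log that records the sequence of phases actually visited could be checked against such a table, for example is_valid_path([Phase.MODELING, Phase.DATA_PREPARATION, Phase.MODELING]) for a loop back from modeling to data preparation.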

The V-shaped software development process as defined in IEC 62304, on the other hand, looks as follows:

We have mapped CRISP-DM to the IEC 62304 software development process for a better understanding of the overall picture and the regulatory requirements. We will be happy to send you a free copy if you drop us a short note via our contact form.

How do you check compliance of your ML pipeline?

To bridge the gap between the rather traditional, high-level regulatory requirements and the peculiarities of machine learning development, the Interessengemeinschaft der Benannten Stellen für Medizinprodukte in Deutschland (IGNB), a consortium of virtually all notified bodies for medical devices in German-speaking countries, has drafted a checklist for auditors, manufacturers and other interested parties: Fragenkatalog Künstliche Intelligenz bei Medizinprodukten (in English, roughly "Questionnaire on Artificial Intelligence in Medical Devices"). This checklist reflects what the notified bodies, at least the German-speaking ones, currently expect of your ML development process and your documentation. If you are interested in an English translation, drop us a short note via our contact form and we will be happy to help.

How does this look in real life?

We have compiled a comprehensive set of validation documentation for a typical machine learning project – a diagnosis support system for heart disease – along the CRISP-DM pipeline. To ensure consistency of the documentation with the actual process and to simplify traceability, we have integrated the documentation into the development code. This documentation serves two purposes:

  • It exemplifies how you can structure your documentation to facilitate easy checking for regulatory compliance.
  • It helps you and your team avoid common gaps and pitfalls, both in the documentation and in the ML development process itself.
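As an illustration of what keeping documentation inside the development code can look like, the sketch below attaches a CRISP-DM phase and a requirement ID to each pipeline step and derives a traceability listing from the registered steps. It is not the documentation bundle described above. The decorator documented_step, the ID REQ-DP-001 and the function names are hypothetical and chosen only for this example.

```python
import inspect

# Registered steps as (phase, requirement ID, function), in definition order.
_REGISTRY = []


def documented_step(phase, requirement):
    """Register a pipeline step with its phase and requirement ID.

    Steps without a docstring are rejected, so code and documentation
    cannot drift apart unnoticed.
    """
    def decorator(func):
        if not inspect.getdoc(func):
            raise ValueError(f"{func.__name__} has no docstring")
        _REGISTRY.append((phase, requirement, func))
        return func
    return decorator


# Hypothetical step and requirement ID, for illustration only.
@documented_step(phase="Data Preparation", requirement="REQ-DP-001")
def drop_incomplete_records(records):
    """Remove records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def traceability_report():
    """Return one line per registered step: phase, requirement, name, summary."""
    lines = []
    for phase, requirement, func in _REGISTRY:
        summary = inspect.getdoc(func).splitlines()[0]
        lines.append(f"{phase} | {requirement} | {func.__name__} | {summary}")
    return "\n".join(lines)
```

Because the listing is generated from the same functions that run in the pipeline, a reviewer can follow each requirement to the code that implements it without consulting a separately maintained document.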

If you would like a sneak preview of our documentation bundle to see whether it helps your business, drop us a short note via our contact form and we will be happy to send you a free copy.