Reuse is a wonderful concept for reducing time, effort, and cost, and it can be leveraged at different levels and in different places in machine learning. For one, machine learning libraries and frameworks are a great source of code reuse. Next, the validation of ML libraries provides another reuse opportunity. And last but not least, machine learning models themselves can be reused and adapted to new classification tasks and new data.
This is exactly what transfer learning is about. In the context of medical device regulation, however, the question arises of how a pre-trained model can be validated for use in a medical device. In this article, I shed some light on this question. To get started, let’s briefly recap how transfer learning works.
Standing on the shoulders of giants
At this point in time, transfer learning is mainly used for image classification. In this domain, the predominant technology is deep neural networks, in particular convolutional nets. These networks often have dozens if not hundreds of hidden layers and millions of weights, and their training can be both very cost- and time-consuming. The aim of transfer learning is to reuse and adapt the resulting models for new, similar classification tasks and new, similar data. Luckily, there is a broad array of public pre-trained models whose prediction performance on public benchmark data sets is well studied and documented. If you use Keras for model development, for example, then at the time of writing you can choose among around 25 pre-trained models for image classification.
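As a brief illustration (assuming TensorFlow’s bundled Keras), instantiating one of these pre-trained models is a single call. MobileNet is an arbitrary choice here; the other architectures in Keras Applications work the same way.

```python
# Instantiate one of the image classifiers from Keras Applications.
# weights=None keeps this sketch self-contained (no weight download);
# pass weights="imagenet" to obtain the actual pre-trained model.
from tensorflow.keras import applications

model = applications.MobileNet(weights=None)
print(model.name, model.count_params())
```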
From general to specific
Simply put, a convolutional network is made up of two parts: a convolutional base and a classifier backend. The convolutional base comprises a set of convolutional and max-pooling layers that extract features from the input data. The first layers extract very general features such as edges and regions, whereas the layers further down the network extract more specific features, such as faces or buildings. The classifier backend, often a set of densely connected layers, makes a classification prediction based on the most specific features from the convolutional base.
Freeze and retrain
Adapting a pre-trained model means freezing some of its weights and retraining others. Freezing a weight means that the retraining process cannot change its value. Let’s start with the classifier backend: because this part of the model is highly specific to the exact classification task and to the most specific extracted features, it is always retrained. If the number of prediction classes differs between the original and the new classification task, then the output layer is not only retrained but replaced.
Now to the convolutional base: the boundary between frozen and retrained layers runs right through it. The layers that extract general features are always reused and thus frozen; otherwise, using a pre-trained model would make very little sense. For the increasingly specific layers, the design decision whether to freeze or retrain them depends on the similarity of the original and the new data, and on the size of the new training data set: the higher the data similarity, the more layers should be frozen; the bigger the new training data set, the more layers can be retrained. A good retraining process strikes a suitable balance between these considerations.
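Put together, the freeze-and-retrain step might look as follows in Keras. The cut-off index and the number of new classes are assumptions for the sake of the example, and weights=None stands in for the pre-trained weights (weights="imagenet") so the sketch runs without a download.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

NUM_CLASSES = 4     # number of classes in the new task (assumed)
FREEZE_UNTIL = 100  # boundary between frozen and retrained layers (a design decision)

# include_top=False drops the original classifier backend,
# keeping only the convolutional base
base = MobileNetV2(weights=None, include_top=False, input_shape=(96, 96, 3))

# Freeze the general feature extractors; the remaining, more specific
# layers of the base stay trainable
for layer in base.layers[:FREEZE_UNTIL]:
    layer.trainable = False

# Attach a new classifier backend for the new prediction classes
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

In practice, the right value of the cut-off index has to be found experimentally, guided by the data-similarity and data-set-size considerations above.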
Regulatory requirements for machine learning
Before we can reason about the regulatory requirements for pre-trained models, it helps to consider the general case of ML model development and then adapt these considerations to the special case of pre-trained models. For general ML model development, the following steps and concepts, together with their regulatory counterparts, need to be taken into consideration:
- Harmonized standard IEC 62304 regulates the development of software for medical devices. From the perspective of this standard, the machine learning development process is embedded in the software unit implementation phase. As such, it needs to follow state-of-the-art best practices.
- The IEC 62304 unit implementation phase is completed with unit testing. Because the goal of unit testing is to verify the correctness of the implemented software unit, unit testing translates to model evaluation for the machine learning software unit.
- The machine learning library or framework that is used to train the model – Keras, PyTorch, TensorFlow, or the like – is a software tool that is regulated by harmonized standard ISO 13485.
- The machine learning library or framework that runs in the medical device to make predictions is considered software of unknown provenance (SOUP). The requirements on the use of SOUP are given in IEC 62304.