As machine learning practitioners working in healthcare, it's important to understand the different types of data that we may encounter in our projects. One such type is left- and right-censored data.
Left-censored data refers to data where the true value of the variable of interest is not known, but it is known that the value is less than a certain threshold. An example of this in the healthcare sector is measurement data below the lower limit of detection for laboratory tests, such as blood glucose levels.
Right-censored data, on the other hand, refers to data where the true value of the variable of interest is not known, but it is known that the value is greater than a certain threshold. A common example in healthcare is survival data in cancer studies where the time of death is unknown for patients who are still alive at the time of data collection.
Treating this type of data naively can lead to suboptimal results, as it ignores the censoring altogether. For example, if we simply average right-censored survival times, we underestimate the true mean: every patient who is still alive at the end of the study is treated as if the event had occurred exactly at the censoring point.
Luckily, the solution to handling left- and right-censored data is relatively simple: we recommend using the "Tobit loss" when optimizing the model. The Tobit loss function is a modified version of the mean squared error loss. It takes the censoring into account and thus learns the true distribution of the data, rather than just the observed values, leading to more accurate predictions.
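For illustration, here is a minimal sketch of what such a Tobit-style loss could look like in PyTorch, assuming a fixed noise scale and boolean masks that mark the left- and right-censored samples (the function name and signature are our own, not taken from any library):

```python
import math
import torch

def tobit_loss(pred, target, left_censored=None, right_censored=None, sigma=1.0):
    """Tobit-style negative log-likelihood (illustrative sketch).

    pred           : model predictions (mean of the assumed Gaussian)
    target         : observed values; for censored samples this is the threshold
    left_censored  : boolean mask, True where the value is only known to be <= target
    right_censored : boolean mask, True where the value is only known to be >= target
    sigma          : assumed, fixed noise standard deviation
    """
    std_normal = torch.distributions.Normal(0.0, 1.0)
    z = (target - pred) / sigma

    # Uncensored samples: Gaussian negative log-likelihood, i.e. MSE up to constants.
    nll = 0.5 * z ** 2 + math.log(sigma)

    if left_censored is not None:
        # Left-censored: P(true value <= threshold) = Phi((threshold - pred) / sigma)
        nll = torch.where(left_censored, -std_normal.cdf(z).clamp_min(1e-12).log(), nll)
    if right_censored is not None:
        # Right-censored: P(true value >= threshold) = 1 - Phi((threshold - pred) / sigma)
        nll = torch.where(right_censored, -(1.0 - std_normal.cdf(z)).clamp_min(1e-12).log(), nll)
    return nll.mean()
```

For uncensored samples the term reduces to the familiar squared error, while censored samples only contribute the probability mass on the correct side of the threshold, which is exactly what removes the underestimation described above.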
Google feels invincible. It has had a monopoly on search engines for most of the time they have existed and is the gateway to the internet for most of us. But for a couple of weeks now, there has been a surprising competitor: ChatGPT.
ChatGPT is a large language model trained by the company OpenAI. It isn't a search engine per se, but instead directly gives answers to any question you throw at it. The answers are (usually) helpful and concise.
It remains to be seen if this threat is real. Google, at least, is taking it seriously.
How will our societies handle the rapid emergence of AI-generated media that are indistinguishable by eye? China just made a first move: From January 10th on, AI-generated media will be banned unless marked as such.
The European Commission has drafted a standardization request for harmonized standards in support of the EU AI Act. This is certainly good news and not too early.
The deadline for the standardization bodies (CEN and CENELEC) will be the end of January 2025. Unless drafts of the standards become available earlier, manufacturers of high-risk AI systems will have to prepare for the AI Act without guidance from standards.
Do You Want to Learn How to Build Sustainable AI Products for the Healthcare Sector? Then Join Professor Oliver Haase and Listen to His Hands-On Advice!
In this hybrid seminar, co-hosted by BIH and BCRT at the Digital Labs offices in the heart of Berlin Mitte, Prof. Oliver Haase will share his knowledge and expertise on developing AI-driven products for the healthcare field with everyone who is interested. Through his teaching activities and his own company, ValidateML, he has gained comprehensive insight into the hurdles of getting AI health products, e.g. health apps, placed on the market, including certification and regulatory challenges. You can hopefully learn a lot from his experience and apply it to your own product.
The seminar will be held in a hybrid format, both on-site in Berlin and remote! Grab a free ticket here.
Picture this: A tabular data classification model. Able to fit on a tiny dataset. 1-second training time, a single forward pass. Outperforms hyperparameter-tuned gradient boosting (SOTA performance). TabPFN represents an approach that might revolutionize how deep learning is applied in practice.
Remember how, not too long ago, we trained image classifiers from scratch? Now, we use pre-trained models in order to jump-start the training process. Remember how we used to search for a suitable learning rate by hand? Now, AutoML does that in a single button click, way faster and more intelligently than we do. Meta-learning is what's coming next.
Why should every model training start with a blank slate?
"That binary classifier I trained two months ago. That problem was similar. Would a similar architecture be suitable here, too? And the dataset my colleague next door works with has similar features. I should ask her which type of preprocessing worked well in her experiments."
In a nutshell, that's what TabPFN does. It is trained on a smartly assembled database of synthetic problem settings and equipped with a prior based on the principles of causality and simplicity.
This could be fantastic for medical device manufacturers, where labeled data is usually a bit sparse and access to pre-trained models is still not as easy as in other domains. We are optimistic that this approach can be scaled up to larger problems and image data in the future.
Exciting stuff coming from the AutoML Group in Freiburg / Hannover! Check out their blog post and pre-print paper here.
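To get a feel for the workflow: assuming the released tabpfn Python package exposes a scikit-learn-style classifier (please check the linked repository for the exact interface and installation instructions), usage could look roughly like this:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumed package and class name, see the linked repo

# A small tabular problem -- the regime TabPFN is designed for.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = TabPFNClassifier(device="cpu")  # assumed constructor argument
clf.fit(X_train, y_train)             # no gradient-based training: the data becomes the context
y_pred = clf.predict(X_test)          # predictions come from a single forward pass
print("accuracy:", accuracy_score(y_test, y_pred))
```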
Last Friday, the EC presidency published a new, fourth compromise text for the upcoming EU AI Act. This new text contains quite a few clarifications and editorial changes, but only a very limited number of substantial changes. It looks as if the member states are moving towards a not-too-far-away consensus. The more substantial changes include:
The United Nations System (the UN together with its funds, programmes, and specialized agencies) has very recently released its "Principles for the Ethical Use of Artificial Intelligence in the United Nations System". These principles will be mandatory for the development and use of any AI system within the UN system.
The speed at which various organizations publish almost identical sets of ethical or trustworthy AI principles and definitions is breathtaking. It would have been more helpful if the UN system had (1) simply reused an existing definition, and (2) stated how it intends to validate compliance with its principles.
The FDA's policy towards AI/ML-based products has changed drastically. In October 2021, it published 10 high-level guiding principles for the development of good machine learning practices (GMLP). As of today, the FDA has not officially released the GMLP themselves. It has, however, very consistently stated its new expectations in various (pre-)submission responses we have seen within the last few months. These expectations largely concern the quality of training and, in particular, validation data, and the avoidance of unwanted biases.
While the consistency of the FDA's feedback is good, it would be really helpful if the FDA made its expectations official, to avoid failed submissions and thus save time and money for manufacturers.
For manufacturers of AI/ML-based medical devices, one of the main obstacles is transparency / explainability of the system, XAI for short. Especially for deep neural networks, XAI is hard to achieve and a topic of ongoing research.
One technique to improve transparency / explainability is calibration. In a well-calibrated model, the score of the output neurons can be interpreted as confidence values of the model predictions. Evidently, the information that the model made its prediction with a confidence of, e.g., 89%, greatly contributes to the interpretability of the system.
We have compiled a short article that describes the idea and the effect of post-calibration. Post-calibration is particularly interesting because it adds another output layer to an existing model, yielding much better calibration than the original model provides. If you're interested, check out our article.
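As a generic illustration of the idea (not necessarily the exact method discussed in our article), post-hoc calibration can be bolted onto an already trained classifier with scikit-learn, for example via Platt scaling on a held-out calibration set:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data standing in for a real clinical dataset.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

# The existing, already trained model stays untouched.
base = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Post-calibration: fit a sigmoid (Platt) mapping on held-out data only.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv="prefit")
calibrated.fit(X_cal, y_cal)

# The calibrated scores can now be read as confidence values,
# e.g. "this prediction was made with 89% confidence".
probs = calibrated.predict_proba(X_cal[:5])
print(probs)
```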
The MDCG has issued a position paper on the "Transition to the MDR and IVDR". The paper recognizes that the limited capacities of the notified bodies "may lead to disruption of supply of devices needed for health systems and patients ...".
In the paper, the MDCG announces additional guidance for clinical evaluation. This is highly appreciated. The paper does not, however, mention the much-anticipated guidance for AI/ML-based devices. I would be positively surprised if it were to see the light of day before 2024.
The paper "Artificial intelligence in radiology: 100 commercially available products and their scientific evidence" concludes: "For 64/100 products, peer-reviewed evidence on its efficacy is lacking. Only 18/100 AI products have demonstrated (potential) clinical impact."
No wonder the FDA has raised its standards for certification.
The EU AI Act is moving forward: the upcoming Czech presidency plans to come up with a revised compromise text by July 20 and to collect comments on that text by September 2. This ambitious timeline hints at adoption by the parliament within this year or early next. We will stay tuned and keep you posted on the changes in the planned compromise text.
The FDA has provided an update on the impact the massive amount of COVID-19 related requests for Emergency Use Authorization (EUA) has had on premarket review times.
Since Jan 2020, the FDA has received the enormous number of 8,000 COVID-related EUA requests and pre-submissions, and has granted some 2,300 clearances.
Yesterday's update announces a step back towards normal with the reopening of acceptance of pre-submissions for non-COVID IVDs.
This is good news for medical device manufacturers seeking FDA clearance. The transparency with which the FDA communicates its review times is something manufacturers can only dream of in the EU MDR and IVDR realm.
The IMDRF has recently published the document "Machine Learning-enabled Medical Devices: Key Terms and Definitions". The intention of the document is to clarify common machine learning terms as a basis for further standardization work. Some of these definitions have, however, shortcomings with unintended negative impact. One of them is the definition of bias, adopted from ISO/IEC TR 24027:2021. It is a rather broad definition, stating that a bias is a "systematic difference in treatment of certain objects, people, or groups in comparison to others."
As the IMDRF document points out, this definition does not necessarily imply unfairness. It also covers treatment that has been optimized for specific sub-groups. According to this definition, personalized treatment is highly biased and, at the same time, highly desirable.
This is not an academic discussion, because current good machine learning practices all require the avoidance of biases. Upcoming standards and regulations will do the same. For notified bodies, bias avoidance is a high-priority topic when auditing ML-based medical devices. The IMDRF (and therefore ISO/IEC TR 24027:2021) definition of bias is not the right basis for this.
The FDA has issued a warning with regard to the intended use of certain AI-based brain imaging software. The obvious problem is that the affected software devices are intended for prioritization and triage, but doctors use them for diagnostic assistance.
The less obvious problem is, in my humble opinion, that categorizing imaging software by its intended purpose alone only helps to lower the quality requirements, at the risk of foreseeable misuse. It is quite natural for a doctor to use software for diagnostic support if it seems to work well. All the more so because examining brain images for vessel occlusion does not strike me as a task that calls for prioritization and triage in the first place, but rather for thorough diagnosis.
Lilian Edwards from the renowned Ada Lovelace Institute has written an excellent review of the proposed EU AI Act. While she generally appreciates the proposal as an "excellent starting point for a holistic approach to AI regulation", she also criticizes the following four shortcomings:
The paper is one of the best reads on the topic we've come across, and definitely time well spent for anyone interested in AI regulation.
The Biometrics Institute has published these definitions and illustrations with the aim of establishing a common understanding of some basic and highly relevant concepts. In particular, we appreciate the clear distinction between verification and identification, because the latter will be considered high-risk or even prohibited under the upcoming AI Act.