
AI in Healthcare: Med-PaLM 2 Surpasses Human Doctors in Medical Knowledge Assessment
Google's freshly minted medical AI just raised the bar yet again, scoring an impressive 86.5% on a battery of medical questions modeled after the US Medical Licensing Exam. It outperformed both fellow AIs and human doctors, not just in accuracy but also in the quality of its answers, as rated by a panel of physician judges.
The Special Sauce
- The AI, Med-PaLM 2, is a custom-tuned variant of Google's latest language model, PaLM 2, trained with large amounts of domain-specific medical data.
- The researchers used a new prompting method called "ensemble refinement." It builds on existing techniques such as "chain-of-thought" and "self-consistency" to boost the model's capacity for medical reasoning on multiple-choice questions.
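
To make the "self-consistency" idea concrete, here is a minimal sketch of its voting step: ask the model the same question several times and keep the most common answer. This is an illustration and not the researchers' code. The `make_fake_sampler` helper is a hypothetical stand-in for a real model call, and the sample count is an arbitrary choice. Ensemble refinement adds a further step, feeding the sampled reasoning back to the model for a refined answer, which this sketch does not attempt.

```python
from collections import Counter
from typing import Callable, List


def self_consistent_answer(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 11,
) -> str:
    """Sample several answers to one question and return the most common one."""
    votes: List[str] = [sample_answer(question) for _ in range(n_samples)]
    # most_common(1) returns [(answer, count)]; ties go to the answer seen first.
    return Counter(votes).most_common(1)[0][0]


def make_fake_sampler(canned: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model call: cycles through canned answers."""
    state = {"i": 0}

    def sample(_question: str) -> str:
        answer = canned[state["i"] % len(canned)]
        state["i"] += 1
        return answer

    return sample


# Five simulated reasoning paths that end in answers B, C, B, A, B.
sampler = make_fake_sampler(["B", "C", "B", "A", "B"])
print(self_consistent_answer(sampler, "Which option is correct?", n_samples=5))  # prints B
```

In a real setup, `sample_answer` would call the language model at a non-zero temperature with a chain-of-thought prompt and extract the final multiple-choice letter from each response.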
Outcomes: Impressive and Promising
- The AI scored 86.5% on the MedQA benchmark questions, setting a new record.
- Long-form answers were compared pairwise against human answers to the same questions. A panel of doctors preferred the AI answers to the human ones in 8 of 9 categories.
- Notably, while GPT-4 scored 78.6–81.4% on the MedQA questions, that was only on the multiple-choice ones; the researchers could not evaluate it on the long-form questions.
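
For readers curious how a pairwise comparison like this might be tallied, here is a rough sketch. The category names and ratings are invented toy data, not the study's, and `preference_rates` is a hypothetical helper. It assumes each rating records one category and which answer the rater preferred.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def preference_rates(ratings: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per category, the fraction of ratings that preferred the AI answer.

    Each rating is (category, winner), where winner is "ai" or "human".
    """
    wins: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for category, winner in ratings:
        totals[category] += 1
        if winner == "ai":
            wins[category] += 1
    return {category: wins[category] / totals[category] for category in totals}


# Toy ratings (hypothetical): three for "clarity", two for "safety".
toy_ratings = [
    ("clarity", "ai"),
    ("clarity", "ai"),
    ("clarity", "human"),
    ("safety", "human"),
    ("safety", "ai"),
]

rates = preference_rates(toy_ratings)
# A category counts as "AI preferred" when more than half its ratings favor the AI.
ai_preferred = [category for category, rate in rates.items() if rate > 0.5]
print(ai_preferred)  # prints ['clarity']
```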
Are Human Doctors Becoming Obsolete?
Not quite yet. Here are a few limitations of the study and of the AI:
- The study only tested performance on single questions, whereas most real-life medical interactions require significant back and forth between doctor and patient.
- Similarly, the solutions to many real-life medical scenarios require an understanding of cross-domain context, such as how a patient's home life might affect medication adherence. These are areas where AI still struggles.
- The doctors giving answers in the study weren't first given sample "ideal" answers, so they defaulted to their standard practice of giving short, to-the-point responses. The AI, in comparison, always gave detailed, fleshed-out answers, which were more likely to be rated higher by the review panel.
Future of AI in Medicine
In the foreseeable future, expect Large Language Models (LLMs) fine-tuned for specific domains to become ubiquitous. These models, whether open or proprietary, hold significant commercial potential because they offer more accurate and specialized knowledge than generic models.
AI is undoubtedly carving its niche in the medical field. We're only witnessing the initial stages, but it's a promising preview of an imminent AI-powered medical landscape. It's quite plausible that many of our future healthcare interactions could be with AI chatbots, helping to alleviate the pressure on our limited human doctor resources.
Even more exciting (to me at least) is the potential to drastically increase the speed and accuracy of medical diagnoses. Even the best doctor can't be an expert in every field, or take into account thousands of medical data points all at once. Once we start using trained AI models to handle the factual analysis, and then hand the results to a human physician for broad-scope reasoning, we could see a leap forward in overall quality of care that hasn't been seen since the advent of antibiotics.
I, for one, can't wait!