🚀 AI in Healthcare: Med-PaLM 2 Surpasses Human Doctors in Medical Knowledge Assessment 🩺

Avisha NessAiver
3 min read · May 19, 2023

Google's freshly minted medical AI just raised the bar yet again, scoring an impressive 86.5% on a battery of medical questions modeled after the US Medical Licensing Exam. It outperformed both fellow AIs and human doctors, not just in accuracy, but also in the quality of answers, as rated by a panel of human physician judges.

The Special Sauce

  • The AI, Med-PaLM 2, is a custom variant of Google's latest language model, PaLM 2, fine-tuned with large amounts of domain-specific medical data.
  • The researchers used a novel prompting method termed 'ensemble refinement,' which builds on techniques such as 'chain-of-thought' prompting and 'self-consistency' to boost the model's medical reasoning on multiple-choice questions (a rough sketch of these ideas follows this list).
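To make those prompting ideas a bit more concrete, here is a minimal Python sketch of chain-of-thought sampling with self-consistency voting on a multiple-choice question; the real ensemble refinement procedure additionally feeds the sampled reasoning back to the model for a refined answer before voting. The ask_llm stub, prompt template, and sample count are illustrative stand-ins, not Med-PaLM 2's actual interface or settings.

```python
import re
from collections import Counter

def ask_llm(prompt: str, temperature: float = 0.7) -> str:
    """Stub standing in for a real LLM API call (not Med-PaLM 2's actual interface)."""
    # Replace this with a call to whichever model/API you have access to.
    return "The presentation is most consistent with iron deficiency. Answer: C"

COT_TEMPLATE = (
    "You are answering a medical licensing exam question.\n"
    "Think through the problem step by step, then end with 'Answer: <letter>'.\n\n"
    "{question}"
)

def self_consistent_answer(question: str, n_samples: int = 11) -> str:
    votes = Counter()
    for _ in range(n_samples):
        # Sample an independent chain-of-thought reasoning path.
        completion = ask_llm(COT_TEMPLATE.format(question=question))
        match = re.search(r"Answer:\s*([A-E])", completion)
        if match:
            votes[match.group(1)] += 1
    # Self-consistency: take the plurality vote across reasoning paths.
    return votes.most_common(1)[0][0] if votes else "abstain"

print(self_consistent_answer("A 30-year-old woman presents with fatigue... (A)-(E)"))
```

Swapping the stub for a real API call is all it takes to try this voting scheme on your own question set.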

Outcomes: Impressive and Promising

  • The AI achieved 86.5% accuracy on the MedQA benchmark questions, setting a new record.
  • Long-form answers were tested pairwise against physician-written answers to the same questions. A panel of doctors preferred the AI answers to the human ones on 8 of 9 evaluation axes (a toy tally of this kind of pairwise scoring is sketched after this list).
  • Notably, while GPT-4 achieved 78.6–81.4% on the MedQA questions, that was only on the multiple-choice items; it was not evaluated on the long-form questions.
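For the curious, here is a toy Python illustration of tallying that kind of pairwise physician preference. The axis names and votes are invented for demonstration and are not the study's actual data.

```python
from collections import Counter, defaultdict

# Each record: (evaluation axis, which answer the physician reviewer preferred).
# Axis names and votes are invented for illustration only.
ratings = [
    ("better reflects medical consensus", "model"),
    ("better reflects medical consensus", "model"),
    ("better reading comprehension", "model"),
    ("omits less important information", "physician"),
    ("omits less important information", "tie"),
]

def tally(records):
    by_axis = defaultdict(Counter)
    for axis, preferred in records:
        by_axis[axis][preferred] += 1
    for axis, counts in sorted(by_axis.items()):
        total = sum(counts.values())
        print(f"{axis}: model preferred in {counts['model']}/{total} comparisons")

tally(ratings)
```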

Are Human Doctors Becoming Obsolete?

Not quite yet. Here are a few limitations of the study and of the AI itself:

  • The study only tested performance on standalone questions, whereas most real-life medical interactions involve significant back-and-forth between doctor and patient.
  • Similarly, many real-life medical scenarios require understanding cross-domain context, such as how a patient's home life might affect medication adherence. These are areas where AI still struggles.
  • The doctors answering questions in the study weren't first shown sample "ideal" answers, so they defaulted to their standard practice of giving short, to-the-point responses. The AI, by comparison, always gave detailed, fleshed-out answers, which the review panel was more likely to rate highly.

Future of AI in Medicine

In the foreseeable future, expect Large Language Models (LLMs) fine-tuned for specific domains to become ubiquitous. Whether open or proprietary, these models hold significant commercial potential because they offer more accurate, specialized knowledge than generic models.

AI is undoubtedly carving its niche in the medical field. We're only witnessing the initial stages, but it's a promising preview of an imminent AI-powered medical landscape. It's quite plausible that many of our future healthcare interactions could be with AI chatbots, helping to relieve the pressure on a limited supply of human doctors.

Even more exciting (to me at least) is the potential to drastically increase the speed and accuracy of medical diagnoses. Even the best doctor can't be an expert in every field or take thousands of medical data points into account all at once. Once trained AI models handle the factual analysis, with the results handed off to a human physician for broader-scope reasoning, we could see a qualitative leap in overall quality of care unlike anything since the advent of antibiotics.

I, for one, can't wait!

#AI #Medicine #Google #MedicalAI


Avisha NessAiver

CTO of Birya Biotech. Engineer, autodidact, self-hacker, coder, speaker, gamer. Spends too much time reading medical journals.