This is interesting because it runs counter to what many people think about current AI: that its performance is directly linked to the quality of its training data. Here the opposite is happening; it has poor training data and still outperforms humans. It’s not surprising that humans would do badly in this situation too; it’s hard to stay up to date on things you may only encounter once or twice in your entire career. It’s interesting to extrapolate from this observation, as it applies to many other fields.
One of the authors of the paper goes into more detail on Twitter.
I mean, recognition is literally the task that is always used for intro to machine learning: facial recognition and other biometrics, handwriting, object recognition. It isn’t a surprise that “AI” is able to outperform humans at this task, since AI can sometimes pick up features that are too subtle for us to notice. The problem is LLMs being hailed as truth machines or AGI. LLMs are to NLP what CNNs and GANs are to image processing tasks.
They should provide that instantly if the patient wants it (once the scan is developed). Add whatever disclaimers and waivers you want, but I wouldn’t mind an instant answer.
Or, just have it as part of the x-ray software.
Analysis determines this could be X, here’s a link to more info on this rare condition. Please confirm diagnosis and report.
We don’t need AI to make a diagnosis. It’s a tool. The health professional can be trained in its use, just like they are for any other test.
If you tell a professional that the answer is “B” while the professional had “A” in mind, you will have to convince them why “B” is the correct answer, or they will ignore your suggestion. I think a good LLM should be able to tell you which features it valued most in its reasoning. That would make it much easier to get used to as a tool.
I agree, while they are skeptical. However, research data over time should show sensitivity and specificity, just like for any other test.
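For anyone who hasn't run into those two terms: sensitivity is the share of actual cases the test catches, and specificity is the share of healthy cases it correctly clears. A minimal sketch in Python, where the function name and the counts are made up purely for illustration and are not from the paper:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: diseased cases the test flagged / missed
    tn/fp: healthy cases the test cleared / wrongly flagged
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical counts, for illustration only (not real study data)
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=900, fp=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same two numbers can be tracked for the model and for the human readers on the same cases, which is how you'd compare it to any other diagnostic test.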
If the conditions are rare, and the training data is poorly labeled, isn’t there a danger that these models could be overfitting?