VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence
- Reading impressions:
Recently, AI foundation models (FMs), such as GPT-4 [17] and SAM [18], have emerged and have the potential to transform many research and industrial domains [19, 20]. FMs are models trained on a broad range of data that can later be adapted, using their generalist intelligence, to solve a broad (rather than narrow) spectrum of tasks. This provides new opportunities to tackle the growing global ophthalmic challenges in a much more efficient, adaptable, and scalable way [21].
Albeit impressive, RETFound is still limited in the number of ophthalmic modalities it can process (only fundus photography and optical coherence tomography, OCT) and in the spectrum of clinical tasks it excels at (mainly ocular disease diagnosis and prognosis, as well as prediction of systemic diseases). In diagnosing diseases, RETFound still relies on modality-specific classifiers, which is inefficient when generalizing to a broader range of ophthalmic image modalities.
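One simple way to see why modality-specific classifiers scale poorly is to count classifier-head parameters. The sketch below is a toy illustration under stated assumptions, not RETFound's or VisionFM's actual architecture: it assumes a single linear head on a fixed-size feature vector, and the names `LinearHead`, `modality_specific_params`, and `shared_params`, the 1024-dimensional features, the 5 classes, and the extra modality names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearHead:
    """Hypothetical linear classifier applied to an encoder's feature vector."""
    in_dim: int
    num_classes: int

    def num_params(self) -> int:
        # Weight matrix (num_classes x in_dim) plus one bias per class.
        return self.in_dim * self.num_classes + self.num_classes


def modality_specific_params(modalities: list[str], in_dim: int, num_classes: int) -> int:
    """Total head parameters when every distinct modality gets its own classifier."""
    # The dict keeps one head per distinct modality name.
    heads = {m: LinearHead(in_dim, num_classes) for m in modalities}
    return sum(h.num_params() for h in heads.values())


def shared_params(in_dim: int, num_classes: int) -> int:
    """Head parameters when a single classifier is shared by all modalities."""
    return LinearHead(in_dim, num_classes).num_params()


if __name__ == "__main__":
    in_dim, num_classes = 1024, 5  # assumed feature size and label count
    two = ["fundus", "oct"]
    eight = two + [f"modality_{i}" for i in range(6)]  # hypothetical extra modalities
    print(modality_specific_params(two, in_dim, num_classes))    # 10250
    print(modality_specific_params(eight, in_dim, num_classes))  # 41000
    print(shared_params(in_dim, num_classes))                    # 5125
```

Under these assumptions the per-modality cost grows linearly with the number of modalities, and each new modality also needs its own labeled data to train its head. Parameter count is only one facet of the inefficiency the excerpt describes.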
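Reading impressions:
Unlike previous methods, this model's distinguishing feature is its focus on multi-modality and multi-tasking, i.e., a single model that can handle multiple modalities and multiple tasks.
After reading it, however, it is clearly not the most accurate method from a technical standpoint, because the baselines it compares against are fairly old, e.g., UNet. The paper's contribution therefore lies in its use of large-scale data and a new approach to model design, rather than in any particularly novel technique.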