E-POSTER DETAIL | 11th Asia Pacific Tele Ophthalmology Society Symposium

Title
Oculoplastic Surgeons vs. Large Language Models in the Non-surgical Management of Thyroid Eye Disease

Authors
Shiqi Hui, Dongmei Li

Presenting
Shiqi Hui

PURPOSE:
Thyroid eye disease (TED) presents heterogeneous clinical manifestations requiring nuanced, individualized management. Large language models (LLMs) have shown potential in clinical reasoning, but their consistency with physician decision-making in non-surgical TED management remains unclear.

METHODS:
A structured 19-item questionnaire covering medical, injection-based, and radiotherapeutic treatments for TED was distributed to 17 oculoplastic surgeons across the Asia-Pacific region. Responses were binarized (1 = selected, 0 = not selected) and compared with standardized outputs from GPT and Gemini. Jaccard and cosine similarity indices were calculated to quantify the alignment between physicians and LLMs.
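
As a minimal sketch of how the two similarity indices could be computed on binarized responses, the Python below compares one physician's 19-item answer vector with one LLM output. The vectors, variable names, and the handling of all-zero vectors are illustrative assumptions, not the study's data or code.

```python
import math

def jaccard(a, b):
    """Jaccard index of two equal-length binary vectors: |A and B| / |A or B|."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two vectors with no selections at all count as identical.
    return 1.0 if either == 0 else both / either

def cosine(a, b):
    """Cosine similarity of two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    # For 0/1 vectors the squared norm equals the number of selected items.
    norm_product = math.sqrt(sum(a) * sum(b))
    # Assumption: similarity is 0 when either vector has no selections.
    return 0.0 if norm_product == 0 else dot / norm_product

# Hypothetical 19-item responses (1 = selected, 0 = not selected).
physician = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
llm       = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]

print(f"Jaccard: {jaccard(physician, llm):.2f}")
print(f"Cosine:  {cosine(physician, llm):.2f}")
```

In this made-up pair, five items are selected by both and seven by at least one, so the Jaccard index is 5/7 (about 0.71); each vector has six selections, so the cosine similarity is 5/6 (about 0.83). Repeating the comparison for each physician would yield a per-model distribution of scores that can then be summarized.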

RESULTS:
Overall, GPT demonstrated higher alignment with physician responses than Gemini (mean Jaccard 0.67 ± 0.08 vs. 0.59 ± 0.09, p < 0.05). Both models closely matched physician consensus regarding glucocorticoid use and radiotherapy indications but diverged in immunosuppressant selection and injection-based strategies. Similarity varied by physician experience and patient volume.

CONCLUSIONS:
LLMs exhibit substantial consistency with clinical experts in guideline-based therapeutic decisions for TED but differ in experience-dependent domains. These findings highlight the potential of LLMs as supplementary tools in endocrine ophthalmology while underscoring the need for context-aware validation.