| Title |
|---|
| Oculoplastic Surgeons vs. Large Language Models in the Non-surgical Management of Thyroid Eye Disease |
| Authors |
|---|
| Shiqi Hui, Dongmei Li |
| Presenting |
|---|
| Shiqi Hui |
| PURPOSE: |
|---|
| Thyroid eye disease (TED) presents with heterogeneous clinical manifestations that require nuanced, individualized management. Large language models (LLMs) have shown potential in clinical reasoning, but how consistently they agree with physician decision-making in the non-surgical management of TED remains unclear. |
| METHODS: |
|---|
| A structured 19-item questionnaire covering medical, injection-based, and radiotherapeutic treatments for TED was distributed to 17 oculoplastic surgeons across the Asia-Pacific region. Responses were binarized (1 = selected, 0 = not selected) and compared with standardized outputs from GPT and Gemini. Jaccard and cosine similarity indices were calculated to quantify the alignment between physicians and LLMs. |
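
The two similarity indices named above can be illustrated with a minimal sketch on binarized response vectors. The abstract does not say whether similarity was computed per item or across the whole questionnaire, so the function names, the six-option vectors, and the handling of empty selections below are illustrative assumptions, not the authors' implementation or data.

```python
import math


def jaccard(a, b):
    """Jaccard index of two equal-length 0/1 vectors: |A and B| / |A or B|."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Assumption: two empty selections count as identical (1.0).
    return both / either if either else 1.0


def cosine(a, b):
    """Cosine similarity of two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: similarity is 0.0 when either vector selects nothing.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical responses over six answer options (1 = selected, 0 = not selected).
physician = [1, 1, 0, 1, 0, 0]
llm = [1, 0, 0, 1, 1, 0]

print(jaccard(physician, llm))  # 2 shared selections out of 4 chosen by either
print(cosine(physician, llm))   # 2 / (sqrt(3) * sqrt(3))
```

For binary vectors the cosine index is never lower than the Jaccard index, so reporting both gives a stricter and a more lenient view of the same overlap.
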
| RESULTS: |
|---|
| Overall, GPT aligned more closely with physician responses than Gemini (mean Jaccard index 0.67 ± 0.08 vs. 0.59 ± 0.09, p < 0.05). Both models closely matched physician consensus on glucocorticoid use and radiotherapy indications but diverged on immunosuppressant selection and injection-based strategies. Similarity also varied with physician experience and patient volume. |
| CONCLUSIONS: |
|---|
| LLMs exhibit substantial consistency with clinical experts in guideline-based therapeutic decisions for TED but differ in experience-dependent domains. These findings highlight the potential of LLMs as supplementary tools in endocrine ophthalmology while underscoring the need for context-aware validation. |