TY  - JOUR
AU  - Socrates, Vimig
AU  - Wright, Donald S
AU  - Huang, Thomas
AU  - Fereydooni, Soraya
AU  - Dien, Christine
AU  - Chi, Ling
AU  - Albano, Jesse
AU  - Patterson, Brian
AU  - Sasidhar Kanaparthy, Naga
AU  - Wright, Catherine X
AU  - Loza, Andrew
AU  - Chartash, David
AU  - Iscoe, Mark
AU  - Taylor, Richard Andrew
PY  - 2025
DA  - 2025/4/11
TI  - Identifying Deprescribing Opportunities With Large Language Models in Older Adults: Retrospective Cohort Study
JO  - JMIR Aging
SP  - e69504
VL  - 8
KW  - deprescribing
KW  - large language models
KW  - geriatrics
KW  - potentially inappropriate medication list
KW  - emergency medicine
KW  - natural language processing
KW  - calibration
AB  - Background: Polypharmacy, the concurrent use of multiple medications, is prevalent among older adults and associated with increased risks for adverse drug events including falls. Deprescribing, the systematic process of discontinuing potentially inappropriate medications, aims to mitigate these risks. However, the practical application of deprescribing criteria in emergency settings remains limited due to time constraints and criteria complexity. Objective: This study aims to evaluate the performance of a large language model (LLM)–based pipeline in identifying deprescribing opportunities for older emergency department (ED) patients with polypharmacy, using 3 different sets of criteria: Beers, Screening Tool of Older People’s Prescriptions, and Geriatric Emergency Medication Safety Recommendations. The study further evaluates LLM confidence calibration and its ability to improve recommendation performance. Methods: We conducted a retrospective cohort study of older adults presenting to an ED in a large academic medical center in the Northeast United States from January 2022 to March 2022. A random sample of 100 patients (712 total oral medications) was selected for detailed analysis. The LLM pipeline consisted of two steps: (1) filtering high-yield deprescribing criteria based on patients’ medication lists, and (2) applying these criteria using both structured and unstructured patient data to recommend deprescribing. Model performance was assessed by comparing model recommendations to those of trained medical students, with discrepancies adjudicated by board-certified ED physicians. Selective prediction, a method that allows a model to abstain from low-confidence predictions to improve overall reliability, was applied to assess the model’s confidence and decision-making thresholds. Results: The LLM was significantly more effective in identifying deprescribing criteria (positive predictive value: 0.83; negative predictive value: 0.93; McNemar test for paired proportions: χ²₁=5.985; P=.02) relative to medical students, but showed limitations in making specific deprescribing recommendations (positive predictive value=0.47; negative predictive value=0.93). Adjudication revealed that while the model excelled at identifying when there was a deprescribing criterion related to one of the patient’s medications, it often struggled with determining whether that criterion applied to the specific case due to complex inclusion and exclusion criteria (54.5% of errors) and ambiguous clinical contexts (eg, missing information; 39.3% of errors). Selective prediction only marginally improved LLM performance due to poorly calibrated confidence estimates. Conclusions: This study highlights the potential of LLMs to support deprescribing decisions in the ED by effectively filtering relevant criteria. However, challenges remain in applying these criteria to complex clinical scenarios, as the LLM demonstrated poor performance on more intricate decision-making tasks, with its reported confidence often failing to align with its actual success in these cases. The findings underscore the need for clearer deprescribing guidelines, improved LLM calibration for real-world use, and better integration of human–artificial intelligence workflows to balance artificial intelligence recommendations with clinician judgment.
SN  - 2561-7605
UR  - https://aging.jmir.org/2025/1/e69504
UR  - https://doi.org/10.2196/69504
DO  - 10.2196/69504
ID  - info:doi/10.2196/69504
ER  -