TY  - JOUR
AU  - West, Matthew
AU  - Cheng, You
AU  - He, Yingnan
AU  - Leng, Yu
AU  - Magdamo, Colin
AU  - Hyman, Bradley T
AU  - Dickson, John R
AU  - Serrano-Pozo, Alberto
AU  - Blacker, Deborah
AU  - Das, Sudeshna
PY  - 2025
DA  - 2025/3/31
TI  - Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study
JO  - JMIR Aging
SP  - e65178
VL  - 8
KW  - Alzheimer disease and related dementias
KW  - electronic health records
KW  - large language models
KW  - clustering
KW  - unsupervised learning
AB  - Background: Alzheimer disease and related dementias (ADRD) exhibit prominent heterogeneity. Identifying clinically meaningful ADRD subtypes is essential for tailoring treatments to specific patient phenotypes. Objective: We aimed to use unsupervised learning techniques on electronic health records (EHRs) from memory clinic patients to identify ADRD subtypes. Methods: We used pretrained embeddings of non-ADRD diagnosis codes (International Classification of Diseases, Ninth Revision) and large language model (LLM)–derived embeddings of clinical notes from patient EHRs. Hierarchical clustering of these embeddings was used to identify ADRD subtypes. Clusters were characterized regarding their demographic and clinical features. Results: We analyzed a cohort of 3454 patients with ADRD from a memory clinic at Massachusetts General Hospital, each with a specialist diagnosis. Clustering pretrained embeddings of the non-ADRD diagnosis codes in patient EHRs revealed the following 3 patient subtypes: one with skin conditions, another with psychiatric disorders and an earlier age of onset, and a third with diabetes complications. Similarly, using LLM-derived embeddings of clinical notes, we identified 3 subtypes of patients as follows: one with psychiatric manifestations and higher prevalence of female participants (prevalence ratio: 1.59), another with cardiovascular and motor problems and higher prevalence of male participants (prevalence ratio: 1.75), and a third one with geriatric health disorders. Notably, we observed significant overlap between clusters from both data modalities (χ²₄=89.4; P<.001). Conclusions: By integrating International Classification of Diseases, Ninth Revision codes and LLM-derived embeddings, our analysis delineated 2 distinct ADRD subtypes with sex-specific comorbid and clinical presentations, offering insights for potential precision medicine approaches.
SN  - 2561-7605
UR  - https://aging.jmir.org/2025/1/e65178
UR  - https://doi.org/10.2196/65178
DO  - 10.2196/65178
ID  - info:doi/10.2196/65178
ER  - 