%0 Journal Article
%@ 2561-7605
%I JMIR Publications
%V 8
%N
%P e65178
%T Unsupervised Deep Learning of Electronic Health Records to Characterize Heterogeneity Across Alzheimer Disease and Related Dementias: Cross-Sectional Study
%A West,Matthew
%A Cheng,You
%A He,Yingnan
%A Leng,Yu
%A Magdamo,Colin
%A Hyman,Bradley T
%A Dickson,John R
%A Serrano-Pozo,Alberto
%A Blacker,Deborah
%A Das,Sudeshna
%+ Massachusetts General Hospital, 65 Landsdowne Street, Cambridge, MA, 02139, United States, 1 617 768 8254, sdas5@mgh.harvard.edu
%K Alzheimer disease and related dementias
%K electronic health records
%K large language models
%K clustering
%K unsupervised learning
%D 2025
%7 31.3.2025
%9 Original Paper
%J JMIR Aging
%G English
%X Background: Alzheimer disease and related dementias (ADRD) exhibit prominent heterogeneity. Identifying clinically meaningful ADRD subtypes is essential for tailoring treatments to specific patient phenotypes. Objective: We aimed to use unsupervised learning techniques on electronic health records (EHRs) from memory clinic patients to identify ADRD subtypes. Methods: We used pretrained embeddings of non-ADRD diagnosis codes (International Classification of Diseases, Ninth Revision) and large language model (LLM)–derived embeddings of clinical notes from patient EHRs. Hierarchical clustering of these embeddings was used to identify ADRD subtypes. Clusters were characterized regarding their demographic and clinical features. Results: We analyzed a cohort of 3454 patients with ADRD from a memory clinic at Massachusetts General Hospital, each with a specialist diagnosis. Clustering pretrained embeddings of the non-ADRD diagnosis codes in patient EHRs revealed the following 3 patient subtypes: one with skin conditions, another with psychiatric disorders and an earlier age of onset, and a third with diabetes complications.
Similarly, using LLM-derived embeddings of clinical notes, we identified 3 subtypes of patients as follows: one with psychiatric manifestations and higher prevalence of female participants (prevalence ratio: 1.59), another with cardiovascular and motor problems and higher prevalence of male participants (prevalence ratio: 1.75), and a third one with geriatric health disorders. Notably, we observed significant overlap between clusters from both data modalities (χ²₄=89.4; P<.001). Conclusions: By integrating International Classification of Diseases, Ninth Revision codes and LLM-derived embeddings, our analysis delineated 2 distinct ADRD subtypes with sex-specific comorbid and clinical presentations, offering insights for potential precision medicine approaches.
%R 10.2196/65178
%U https://aging.jmir.org/2025/1/e65178
%U https://doi.org/10.2196/65178