Determinants of Visual Impairment Among Chinese Middle-Aged and Older Adults: Risk Prediction Model Using Machine Learning Algorithms

doi:10.2196/59810

¹School of Public Health, Shanghai University of Traditional Chinese Medicine, 1200 Cai Lun Road, Shanghai, China

²Monash e-Research Centre, Faculty of Engineering, Airdoc Research, Nvidia AI Technology Research Centre, Monash University, Melbourne, Australia

³Nutrition and Dietetics Program, Department of Individual, Family, and Community Education, University of New Mexico, Albuquerque, NM, United States

⁴Department of Social and Behavioral Health, School of Public Health, University of Nevada, Las Vegas, NV, United States

⁵Department of Internal Medicine, Kirk Kerkorian School of Medicine, University of Nevada, Las Vegas, NV, United States

⁶School of Translational Medicine, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, Victoria, Australia

⁷Artificial Intelligence and Modelling in Epidemiology Program, Melbourne Sexual Health Centre, Alfred Health, Carlton, Victoria, Australia

*these authors contributed equally

Corresponding Author:

Xianglong Xu, PhD

Background: Visual impairment (VI) is a prevalent global health issue, affecting over 2.2 billion people worldwide, with nearly half of the Chinese population aged 60 years and older being affected. Early detection of high-risk VI is essential for preventing irreversible vision loss among Chinese middle-aged and older adults. While machine learning (ML) algorithms exhibit significant predictive advantages, their application in predicting VI risk among the general middle-aged and older adult population in China remains limited.

Objective: This study aimed to predict VI and identify its determinants using ML algorithms.

Methods: We included 19,047 participants from 4 waves of the China Health and Retirement Longitudinal Study (CHARLS), conducted between 2011 and 2018. To visualize the prevalence of VI, we generated a geographical distribution map. Additionally, we constructed a model using indicators from a self-reported questionnaire, a physical examination, and blood biomarkers as predictors. Multiple ML algorithms, including gradient boosting machine, distributed random forest, the generalized linear model, deep learning, and stacked ensemble, were used for prediction. We plotted receiver operating characteristic and calibration curves to assess the predictive performance. Variable importance analysis was used to identify key predictors.

Results: Among all participants, 33.9% (6449/19,047) had VI. Qinghai, Chongqing, Anhui, and Sichuan showed the highest VI rates, while Beijing and Xinjiang had the lowest. The generalized linear model, gradient boosting machine, and stacked ensemble achieved acceptable area under curve values of 0.706, 0.710, and 0.715, respectively, with the stacked ensemble performing best. Key predictors included hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, high-density lipoprotein cholesterol, and arthritis or rheumatism.

Conclusions: Nearly one-third of middle-aged and older adults in China had VI. The prevalence of VI shows regional variations, but there are no distinct east-west or north-south distribution differences. ML algorithms demonstrate accurate predictive capabilities for VI. The combination of prediction models and variable importance analysis provides valuable insights for the early identification and intervention of VI among Chinese middle-aged and older adults.

JMIR Aging 2024;7:e59810

Keywords

visual impairment; China; middle-aged and elderly adults; machine learning; prediction model

Visual impairment (VI) represents a significant global public health challenge. From 1990 to 2019, VI rose in the ranking of disease burden for individuals aged 50‐74 years and for those aged 75 years and older, moving from the 20th and 16th positions to the 19th and 15th positions, respectively [GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. Oct 17, 2020;396(10258):1204-1222. [CrossRef] [Medline]1]. The global increase in VI prevalence is primarily attributed to cataracts and uncorrected refractive errors, which accounted for 55% of blindness cases and 77% of VI cases among adults aged 50 years and older in 2015 [Flaxman SR, Bourne RRA, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet Glob Health. Dec 2017;5(12):e1221-e1234. [CrossRef]2]. This trend is further exacerbated by population growth and aging [Flaxman SR, Bourne RRA, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet Glob Health. Dec 2017;5(12):e1221-e1234. [CrossRef]2]. According to the National Bureau of Statistics of China’s January 2022 report, adults aged 60 years and older accounted for 18.9% of the total population by the end of 2021.

Meanwhile, adults aged 65 years and older exceeded 200 million, representing 14.2% of the total population [NHC: China made solid progress in elderly care over past decade. The State Council of the People’s Republic of China. Sep 20, 2022. URL: https://english.www.gov.cn/statecouncil/ministries/202209/20/content_WS6329c182c6d0a757729e0446.html [Accessed 2023-11-22] 3]. It is projected that during the “14th Five-Year Plan” period, the total number of adults aged 60 years and older will surpass 300 million, accounting for over 20% of the population, indicating a transition into the moderate aging phase. By around 2035, China is anticipated to enter the severe aging phase [NHC: China made solid progress in elderly care over past decade. The State Council of the People’s Republic of China. Sep 20, 2022. URL: https://english.www.gov.cn/statecouncil/ministries/202209/20/content_WS6329c182c6d0a757729e0446.html [Accessed 2023-11-22] 3]. Older adults with VI are at a heightened risk of falls [Jin H, Zhou Y, Stagg BC, Ehrlich JR. Association between vision impairment and increased prevalence of falls in older US adults. J Am Geriatr Soc. May 2024;72(5):1373-1383. [CrossRef] [Medline]4], potentially leading to fractures and severe outcomes such as cerebral hemorrhage. Furthermore, VI can hinder social engagement among older adults, possibly giving rise to more profound mental health issues, including depression and anxiety [Almidani L, Miller R, Varadaraj V, Mihailovic A, Swenor BK, Ramulu PY. Vision impairment and psychosocial function in US adults. JAMA Ophthalmol. Apr 1, 2024;142(4):283-291. [CrossRef] [Medline]5]. As the population ages, the prevalence of VI is expected to increase dramatically. However, half of all VI cases are estimated to be preventable or treatable [Blindness and vision impairment. World Health Organization. 2023. URL: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment [Accessed 2023-11-29] 6]. 
Hence, bolstering screening efforts and enhancing risk prediction for VI is paramount in stemming the tide of this growing concern.

In response to the crises posed by VI and to improve public visual health, the China National Health Commission has released the 14th Five-Year National Eye Health Plan (2021‐2025). The plan focuses on enhancing eye health information platforms and promoting the harmonious integration of big data, artificial intelligence, and ophthalmology services to advance the early detection of eye diseases [Circular of the National Health Commission on the issuance of the “14th five-year plan” for National Eye Health (2021-2025) [Article in Chinese]. Central People’s Government of the People’s Republic of China. 2022. URL: https://www.gov.cn/zhengce/zhengceku/2022-01/17/content_5668951.htm [Accessed 2023-11-28] 7]. Through the development of machine learning (ML) prediction models for VI, the precise determination of VI risk and identification of influencing risk factors can be achieved. ML could offer new insights for early detection and timely intervention of retinopathy, as well as for the integrated management of ocular health in older adults, ultimately enhancing the overall eye health status of the population.

Artificial intelligence has experienced swift progress in recent years, resulting in extensive use of diverse ML algorithms in clinical research [Farah L, Borget I, Martelli N, Vallee A. Suitability of the current health technology assessment of innovative artificial intelligence-based medical devices: scoping literature review. J Med Internet Res. May 13, 2024;26:e51514. [CrossRef] [Medline]8,Alexander N, Aftandilian C, Guo LL, et al. Perspective toward machine learning implementation in pediatric medicine: mixed methods study. JMIR Med Inform. Nov 17, 2022;10(11):e40039. [CrossRef] [Medline]9]. Compared with traditional statistical methods, ML algorithms can handle more complex nonlinear relationships, interactions, and multiple covariances, significantly improving the predictive ability of artificial intelligence models [Rajula HSR, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas). Sep 8, 2020;56(9):455. [CrossRef]10,Xi Y, Wang H, Sun N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: a study involving 143,043 Chinese patients with hypertension. Front Cardiovasc Med. 2022;9:1025705. [CrossRef] [Medline]11].

Despite the advantages of ML algorithms, there is an absence of the use of ML algorithms to predict the risk of VI in the general middle-aged and older adult population in China. Previous studies on predicting VI have focused on various topics, including examining trends in the incidence of VI among populations [GBD 2019 Blindness and Vision Impairment Collaborators, Vision Loss Expert Group of the Global Burden of Disease Study. Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e130-e143. [CrossRef] [Medline]12,Fricke TR, Jong M, Naidoo KS, et al. Global prevalence of visual impairment associated with myopic macular degeneration and temporal trends from 2000 through 2050: systematic review, meta-analysis and modelling. Br J Ophthalmol. Jul 2018;102(7):855-862. [CrossRef] [Medline]13]; assessing the risk of VI in specific groups, such as those with congenital cytomegalovirus infection [Hsia Y, Lin YY, Wang BS, Su CY, Lai YH, Hsieh YT. Prediction of visual impairment in epiretinal membrane and feature analysis: a deep learning approach using optical coherence tomography. Asia Pac J Ophthalmol (Phila). 2023;12(1):21-28. [CrossRef] [Medline]14-Jin HD, Demmler-Harrison GJ, Miller J, et al. Cortical visual impairment in congenital cytomegalovirus infection. J Pediatr Ophthalmol Strabismus. May 22, 2019;56(3):194-202. [CrossRef] [Medline]16]; and further predicting the risk of developing particular types of VI [Liu TYA, Ling C, Hahn L, Jones CK, Boon CJ, Singh MS. Prediction of visual impairment in retinitis pigmentosa using deep learning and multimodal fundus images. Br J Ophthalmol. Oct 2023;107(10):1484-1489. [CrossRef] [Medline]17]. However, these studies have not focused on predicting the individual risk of VI among the general population. 
In current research on predicting individual risk of VI among the general population, three studies [Zhao Y, Wang A. Development and validation of a risk prediction model for visual impairment in older adults. Int J Nurs Sci. Jul 2023;10(3):383-390. [CrossRef] [Medline]18-Burkemper B, Torres M, Jiang X, McKean-Cowdin R, Varma R. Factors associated with visual impairment in Chinese American adults: the Chinese American eye study. Ophthalmic Epidemiol. Oct 2019;26(5):329-335. [CrossRef] [Medline]20] based on traditional statistical methods and two studies [Tham YC, Anees A, Zhang L, et al. Referral for disease-related visual impairment using retinal photograph-based deep learning: a proof-of-concept, model development study. Lancet Digit Health. Jan 2021;3(1):e29-e40. [CrossRef] [Medline]21,Chen W, Li R, Yu Q, et al. Early detection of visual impairment in young children using a smartphone-based deep learning system. Nat Med. Feb 2023;29(2):493-503. [CrossRef] [Medline]22] based on ML algorithms have achieved good predictive performance. Among these, two studies [Burkemper B, Torres M, Jiang X, McKean-Cowdin R, Varma R. Factors associated with visual impairment in Chinese American adults: the Chinese American eye study. Ophthalmic Epidemiol. Oct 2019;26(5):329-335. [CrossRef] [Medline]20,Tham YC, Anees A, Zhang L, et al. Referral for disease-related visual impairment using retinal photograph-based deep learning: a proof-of-concept, model development study. Lancet Digit Health. Jan 2021;3(1):e29-e40. [CrossRef] [Medline]21] lacked Chinese populations, with the studied populations being mainly from the United States and Singapore; one focused on Chinese children [Chen W, Li R, Yu Q, et al. Early detection of visual impairment in young children using a smartphone-based deep learning system. Nat Med. Feb 2023;29(2):493-503. [CrossRef] [Medline]22]; one was a single-center study [Zhao Y, Wang A. 
Development and validation of a risk prediction model for visual impairment in older adults. Int J Nurs Sci. Jul 2023;10(3):383-390. [CrossRef] [Medline]18]; and one [Zhao Y, Yu R, Sun C, et al. Nomogram model predicts the risk of visual impairment in diabetic retinopathy: a retrospective study. BMC Ophthalmol. Dec 8, 2022;22(1):36482340. [CrossRef]19] had a small sample size of 133 participants. To date, no research has yet been conducted on using ML algorithms to predict VI in Chinese middle-aged and older adults. Therefore, our objective was to develop an individual risk prediction model for VI, which could be used to assess the risk of VI among China’s general middle-aged and older population. Additionally, we aimed to identify key predictors of VI. Our findings could be used to provide personalized intervention guidance for health care professionals, aiming to reduce and delay the onset of retinal diseases among middle-aged and older adults.

Analytic Sample

The data used in our study originate from the China Health and Retirement Longitudinal Study (CHARLS), a longitudinal survey that represents a nationally diverse cohort of Chinese adults aged 45 years and older. This survey strives to establish a comprehensive public database documenting Chinese adults’ social, economic, and health statuses, thereby bolstering scientific investigations conducted by the National Development Institute of Peking University. The CHARLS project executed a nationwide baseline survey between 2011 and 2012, with subsequent follow-up visits occurring biennially [Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. Feb 2014;43(1):61-68. [CrossRef] [Medline]23]. The CHARLS baseline survey encompassed 450 villages and neighborhoods spread across 150 counties in China. The sampling process encompassed multiple levels, including counties, villages, households, and individuals, culminating in interviews with 10,257 households and broadly reflecting the general Chinese middle-aged and older adult populace. We used 25,538 observations from 4 waves of surveys between 2011 and 2018. After excluding participants younger than 45 years (n=4079) and those with missing information on self-reported vision conditions (n=2457), 19,047 participants were included in this study. More details of the sampling process are shown in Figure S1A in Multimedia Appendix 1.

Multimedia Appendix 1. Supplementary tables and figures containing further data on participant characteristics, visual impairment prevalence, predictive factors, model performance, hyperparameters, the sampling and study processes, and receiver operating characteristic curves of prediction models on the training dataset. DOCX File, 440 KB.

Predictors of VI

Drawing on existing literature and expert insights, 42 predictors were used for ML algorithm training. The predictors were categorized as follows: self-reported questionnaire, physical examination, and blood biomarkers. The self-reported questionnaire included (1) demographic factors (gender, age, and region), [Sun M, Bo Q, Lu B, Sun X, Zhou M. The association of sleep duration with vision impairment in middle-aged and elderly adults: evidence from the China Health and Retirement Longitudinal Study. Front Med (Lausanne). 2021;8:778117. [CrossRef] [Medline]24] (2) lifestyle (night sleep duration, smoking, and drinking), (3) health status factors (pain, weight change, health status during childhood, and self-expectations of health status), [Gu Y, Cheng H, Liu X, Dong X, Congdon N, Ma X. Prevalence of self-reported chronic conditions and poor health among older adults with and without vision impairment in China: a nationally representative cross-sectional survey. BMJ Open Ophthalmol. Mar 2023;8(1):e001211. [CrossRef] [Medline]25] (4) disease factors (depression, hearing impairment, hypertension, dyslipidemia, diabetes, liver disease, heart disease, stroke, kidney disease, stomach or other digestive disease, memory-related disease, arthritis or rheumatism, menopause, and prostatic diseases), (5) living environment factors (house structure, heating energy, cooking energy, and room temperature), and (6) socioeconomic factors (standard of living and education level). Measurement parameters included (1) physical examination data ([Shang X, Wu G, Wang W, et al. Associations of vision impairment and eye diseases with frailty in community-dwelling older adults: a nationwide longitudinal study in China. Br J Ophthalmol. Dec 19, 2022;108(2):310-316. 
[CrossRef] [Medline]26] hand grip strength, waist, and BMI) and (2) blood biomarker data (white blood cell, platelets, glycated hemoglobin, glucose, total cholesterol, triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol). The characteristics of the predictor distribution in the study are displayed in Table S1 in Multimedia Appendix 1.

Measurement of VI

Vision encompasses both far and near eyesight. VI in our study was assessed using the following questions from the CHARLS questionnaire: (1) “How well do you see things in the distance? For example, can you recognize a friend across the road (even with glasses on)? Is it excellent, very good, good, fair, or bad?” and (2) “How well do you see things close up? For example, can you read a newspaper with your glasses on? Is it excellent, very good, good, fair, or bad?” Respondents who answered “bad” to either question were categorized as having VI, while those who answered “excellent” to “fair” on both questions were considered to have no VI [Shang X, Zhu Z, Wang W, He M. Associations of vision impairment and eye diseases with memory decline over 4 years in China and the United States. Am J Ophthalmol. Aug 2021;228:16-26. [CrossRef] [Medline]27].
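The classification rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the study used R): the function name `has_visual_impairment` and the lowercase response strings are assumptions, and the lowest category is written as "bad" to match the wording of the quoted questions.

```python
# Response options as quoted in the two CHARLS questions, best to worst.
RESPONSES = ("excellent", "very good", "good", "fair", "bad")


def has_visual_impairment(distance: str, near: str) -> bool:
    """Return True if either vision item received the lowest rating."""
    answers = (distance.strip().lower(), near.strip().lower())
    for answer in answers:
        if answer not in RESPONSES:
            raise ValueError(f"unexpected response: {answer!r}")
    return "bad" in answers


# Hypothetical respondent: fair distance vision, bad near vision.
print(has_visual_impairment("fair", "bad"))
```

Under this rule, a respondent is counted as having VI when either distance or near vision is rated in the lowest category, so the outcome is a single binary label per participant.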

Statistical Analysis

We used R (version 4.3.1), developed by the R Core Team of the R Foundation for Statistical Computing, for statistical analysis and model development. Continuous variables were summarized using the median and IQR (25th and 75th percentiles), and categorical variables were summarized using the count (n) and proportion (%) for each category. We used the R H2O package to construct various ML predictive models for a dichotomous outcome of VI [Convergence of the world’s best predictive and generative AI for private, protected data. H2O.ai. URL: https://h2o.ai/ [Accessed 2024-07-01] 28]. H2O supports a wide range of ML models, including deep learning (DL), gradient boosting machine (GBM), distributed random forest (DRF), and more. In our study, we chose the generalized linear model (GLM) as our benchmark model, representing logistic regression. The study process is shown in Figure S1B in Multimedia Appendix 1.

According to the No Free Lunch Theorem, no algorithm can outperform a linear enumeration of the search space or a purely random search algorithm when performance is averaged over all possible problems. Thus, we compared multiple algorithms rather than relying on a single one. We split the dataset into training (n=14,286) and testing (n=4761) datasets at a 75:25 ratio. The training dataset was used to develop various models, including a GLM with regularization to prevent overfitting and enhance the model’s predictive accuracy. GBM uses decision trees as weak learners and boosts their predictions iteratively. DRF incorporates both random forest and extremely randomized tree approaches to ensure diversity and robustness in the ensemble. The DL model consists of a fully connected multilayer artificial neural network trained with backpropagation to capture complex nonlinear relationships. A stacked ensemble combines the predictions of these individual models as input features for the ensemble’s meta-learner. The meta-learner then outputs a final prediction based on the learned weights of each base model’s contribution, enhancing overall prediction performance.

The random forest algorithm was used to impute missing values, and the continuous variables were normalized. In this study, the ratio of positive to negative outcomes in the target variable was 1:2, indicating an imbalanced dataset. To address data imbalance, random oversampling of the minority class was initially used. Furthermore, to mitigate overfitting and enhance model generalization, external 5-fold cross-validation was implemented. However, the model’s performance did not improve compared with the no-resampling, blending mode. Therefore, we trained the stacked ensemble using the no-resampling and blending mode, plotted the receiver operating characteristic (ROC) curves, and constructed the confusion matrix.

We used the area under the curve (AUC) to evaluate the best model, with an acceptable AUC of 0.7‐0.8, a good AUC of 0.8‐0.9, and an excellent AUC of >0.9. We calibrated the probabilities predicted by the models to the actual occurrence level in the testing dataset using a logistic function and calculated the Brier score to assess the reliability of the prediction of VI. The Brier score takes values from 0 to 1; a model that always predicts a probability of 50% has a Brier score of 0.25. A Brier score between 0 and 0.25 therefore indicates predictions that are more informative than this baseline, and a score closer to 0 indicates better model performance. Additionally, we used models with acceptable AUCs for variable importance analysis. This enabled us to quantitatively assess the contribution of each feature towards model predictions, thereby allowing us to evaluate and compare the significance of various features.
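The two evaluation metrics described above can be illustrated with a short, self-contained Python sketch. It is not the authors' implementation (the study used the R H2O package): the function names, the toy data, and the handling of values that fall exactly on a band boundary are assumptions made for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_band(value):
    """Bands stated in the text: acceptable 0.7-0.8, good 0.8-0.9, excellent >0.9.

    Assigning exact boundary values to the higher band is an assumption.
    """
    if value > 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "acceptable"
    return "below acceptable"


# Toy outcomes and predicted probabilities, made up for illustration only.
y = [1, 0, 1, 0, 0, 1]
p = [0.8, 0.3, 0.6, 0.4, 0.7, 0.5]
print(brier_score(y, p), auc(y, p), auc_band(auc(y, p)))
```

A constant prediction of 0.5 gives a squared error of 0.25 for every participant regardless of outcome, which is where the 0.25 reference value for the Brier score comes from.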

Ethical Considerations

The Peking University Institutional Review Board (IRB) granted ethical approval for all waves of the CHARLS. The IRB approval number for the self-reported questionnaire (including physical examination measurements) was IRB00001052-11015; the IRB approval number for the biomarker collection was IRB00001052-11014.

Geographical Distribution of VI

Figure 1 presents the prevalence of VI by province in China, based on data from the 4 waves of the CHARLS conducted between 2011 and 2018. Qinghai, Chongqing, Anhui, and Sichuan provinces reported a high prevalence of VI, with rates exceeding 40% (69/153, 45.1%; 117/265, 44.2%; 390/930, 41.9%; and 654/1619, 40.4%, respectively). In contrast, Xinjiang and Beijing had a low prevalence of VI, with rates below 20% (23/116, 19.8% and 14/101, 13.8%, respectively). The remaining provinces, municipalities, and autonomous regions exhibited a moderate prevalence of VI, ranging from 20% to 40%. Additional information regarding the prevalence of VI in each province is presented in Table S2 in Multimedia Appendix 1.
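The grouping of provinces into high, moderate, and low prevalence can be expressed as a simple rule. The Python sketch below uses the case counts quoted in the text; the function name `prevalence_band` is hypothetical, and the treatment of rates exactly equal to 20% or 40% as moderate is an assumption.

```python
# (VI cases, participants) per province, as reported in the text.
counts = {
    "Qinghai": (69, 153),
    "Chongqing": (117, 265),
    "Anhui": (390, 930),
    "Sichuan": (654, 1619),
    "Xinjiang": (23, 116),
    "Beijing": (14, 101),
}


def prevalence_band(cases: int, total: int) -> str:
    """Classify prevalence as high (>40%), low (<20%), or moderate (20% to 40%)."""
    rate = cases / total
    if rate > 0.40:
        return "high"
    if rate < 0.20:
        return "low"
    return "moderate"


for province, (cases, total) in counts.items():
    print(province, prevalence_band(cases, total))
```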

**Figure 1.** The prevalence of VI by province in China, based on 4 waves of the China Health and Retirement Longitudinal Study (2011‐2018). VI: visual impairment.

Characteristics of the Study Participants

A total of 33.9% (6449/19,047) of participants reported VI, divided between the training dataset (n=4837) and the testing dataset (n=1612). Among the cases of VI, 58.8% (3795/6449) were female, and 41.2% (2654/6449) were male. The age group with the highest prevalence of VI was 55‐65 years, accounting for 39.1% (2520/6449), followed by those aged 65 years or older at 31.4% (2025/6449) and 45‐55 years at 29.5% (1904/6449). The selected characteristics of the study participants are shown in Table 1. The full characteristics of the predictor distribution in the study are shown in Table S1 in Multimedia Appendix 1.

Table 1. Selected characteristics of study participants among Chinese adults aged 45 years and older as drawn from the China Health and Retirement Longitudinal Study (2011‐2018; N=19,047).

Characteristic		Overall (N=19,047), n (%)	Non-VI^a (n=12,598), n (%)	VI (n=6449), n (%)	P value^b
Sex					<.001
	Female	9927 (52.1)	6132 (48.7)	3795 (58.8)
	Male	9120 (47.9)	6466 (51.3)	2654 (41.2)
Age (years)					<.001
	45‐55	7279 (38.2)	5375 (42.7)	1904 (29.5)
	55‐65	6794 (35.7)	4274 (33.9)	2520 (39.1)
	≥65	4974 (26.1)	2949 (23.4)	2025 (31.4)
Region					<.001
	East	6995 (36.7)	4486 (35.6)	2509 (38.9)
	Central	6315 (33.2)	4355 (34.6)	1960 (30.4)
	West	5737 (30.1)	3757 (29.8)	1980 (30.7)
Education level					<.001
	Less than elementary school	8368 (43.9)	4869 (38.6)	3499 (54.3)
	Elementary school	4113 (21.6)	2805 (22.3)	1308 (20.3)
	Middle school	3971 (20.8)	2909 (23.1)	1062 (16.5)
	High school or above	2595 (13.6)	2015 (16.0)	580 (9.0)
Standard of living					<.001
	Poor	2260 (11.9)	1182 (9.4)	1078 (16.7)
	Relatively poor	5824 (30.6)	3768 (29.9)	2056 (31.9)
	Average	10,420 (54.7)	7257 (57.6)	3163 (49.0)
	Relatively high	507 (2.7)	365 (2.9)	142 (2.2)
	Very high	36 (0.2)	26 (0.2)	10 (0.2)

^aVI: visual impairment.

^bPearson χ² test.
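As a worked example of the test named in footnote b, the Python sketch below computes a Pearson χ² statistic for the sex-by-VI counts in Table 1. It is an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, and whether the authors applied a continuity correction is not stated, so none is used here.

```python
import math

# Sex by VI status from Table 1: rows are female, male; columns are non-VI, VI.
observed = [[6132, 3795],
            [6466, 2654]]


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


chi2 = pearson_chi_square(observed)
# With 1 degree of freedom, the upper-tail P value equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.1f}; P < .001: {p_value < 0.001}")
```

The closed-form P value used here holds only for a 2 x 2 table (1 degree of freedom); tables with more rows, such as age group or education level, would need the general χ² distribution.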

VI Prediction

We applied the trained models to the testing dataset. The distribution of predictor variables between the testing and training datasets is detailed in Table S3 in Multimedia Appendix 1. The results indicate that the ensemble model demonstrates superior predictive performance compared with individual ML models. Three algorithms, namely the GLM, GBM, and stacked ensemble model (GBM-XGBoost-GLM-DL-DRF), achieved acceptable AUC values of 0.706, 0.710, and 0.715, respectively, with the ensemble model performing best. However, the DRF and DL models did not meet the acceptable AUC threshold of 0.70, achieving an AUC of 0.698. Detailed evaluation metrics for all models on the testing dataset are provided in Table S4 in Multimedia Appendix 1. Figure 2 depicts the ROC curves for all the models. The hyperparameters used in model training are summarized in Table S5 in Multimedia Appendix 1. ROC curves of all VI prediction models on the training dataset are shown in Figure S2 in Multimedia Appendix 1.

**Figure 2.** Receiver operating characteristic curves of all visual impairment prediction models on the testing dataset. AUC: area under the curve; DL: deep learning; DRF: distributed random forest; GBM: gradient boosting machine; GLM: generalized linear model; ROC: receiver operating characteristic; StackedEnsemble: GBM-XGBoost-GLM-DL-DRF.

Assessing the Efficacy of ML Models for VI Prediction

Figure 3 presents the calibration curves of all the models on the testing dataset. These curves illustrate the agreement between each model’s predicted probabilities and VI’s observed probabilities in the testing data. The results indicated that all models accurately predicted VI, as evidenced by their Brier scores being less than 0.25.

**Figure 3.** The calibration curves of all visual impairment prediction models on the testing dataset. DL: deep learning; DRF: distributed random forest; GBM: gradient boosting machine; GLM: generalized linear model; StackedEnsemble: GBM-XGBoost-GLM-DL-DRF.

Determinants of VI

Table 2 presents the importance of predictors for VI in models with acceptable AUCs. The top 10 important predictors for VI were identified by the GLM, including hearing impairment, pain, depression, hand grip strength, standard of living, education level, age, self-expectation of health status, night sleep duration, and arthritis or rheumatism. For GBM, the top predictors for VI were hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, hemoglobin, high-density lipoprotein cholesterol, and arthritis or rheumatism. Notably, hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, and arthritis or rheumatism have emerged as common predictors for these models.

Table 2. Top 10 predictors of visual impairment from the variable importance analysis of the generalized linear model (GLM) and gradient boosting machine (GBM), sorted in descending order of importance.

Rank	GLM	GBM
1	Hearing impairment	Hearing impairment
2	Pain	Self-expectations of health status
3	Depression	Pain
4	Hand grip strength	Age
5	Standard of living	Hand grip strength
6	Education level	Depression
7	Age	Night sleep duration
8	Self-expectations of health status	Hemoglobin
9	Night sleep duration	High-density lipoprotein cholesterol
10	Arthritis or rheumatism	Arthritis or rheumatism
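The predictors shared by both models can be read off Table 2 with a simple list comparison. The Python sketch below copies the two rankings from the table; the variable names are illustrative.

```python
glm_top10 = [
    "Hearing impairment", "Pain", "Depression", "Hand grip strength",
    "Standard of living", "Education level", "Age",
    "Self-expectations of health status", "Night sleep duration",
    "Arthritis or rheumatism",
]
gbm_top10 = [
    "Hearing impairment", "Self-expectations of health status", "Pain", "Age",
    "Hand grip strength", "Depression", "Night sleep duration", "Hemoglobin",
    "High-density lipoprotein cholesterol", "Arthritis or rheumatism",
]

# Keep the GBM ordering while retaining only predictors that GLM also ranks.
shared = [name for name in gbm_top10 if name in set(glm_top10)]
glm_only = [name for name in glm_top10 if name not in set(gbm_top10)]
gbm_only = [name for name in gbm_top10 if name not in set(glm_top10)]

print(len(shared), glm_only, gbm_only)
```

Eight predictors appear in both rankings; standard of living and education level are specific to the GLM list, while hemoglobin and high-density lipoprotein cholesterol are specific to the GBM list.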

Principal Results

To our knowledge, this study is the first attempt to predict the risk of VI among middle-aged and older adults in China using ML algorithms. The findings indicated that ML algorithms could accurately identify individuals at risk of VI among this demographic. Our results also showed that ensemble algorithms proved superior to individual ML models. Furthermore, we calculated the prevalence of VI and presented it through regional visualization. The results indicated the existence of regional disparities in the prevalence of VI among Chinese middle-aged and older adults, with varying rates across provinces. However, it is worth noting that we did not find any significant north-south or east-west directional differences in prevalence. Based on these results, gaining a more intuitive understanding of the regional distribution of VI is possible. Additionally, our prediction model can be leveraged to develop a risk assessment tool for early detection of VI. Predictors’ importance could help guide personalized early interventions for middle-aged and older individuals at risk of VI.

Comparison With Prior Work

Our study showed that GBM outperformed logistic regression in predicting VI among middle-aged and older Chinese adults. Previous ML studies on VI prediction have predominantly relied on single algorithms without comparing the predictive performance across multiple algorithms. In contrast, our work used various ML algorithms for VI prediction and assessed their comparative effectiveness. Furthermore, our findings highlighted the superior performance of ensemble algorithms over individual learning models despite an accuracy of 0.625. This is due to the imbalanced nature of our dataset, where accuracy may not fully reflect the model’s ability to predict VI. We chose AUC as the primary metric and found that the ensemble model had the highest AUC, indicating its overall superior ability to rank positive and negative samples.
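To illustrate why accuracy can be a misleading summary when classes are imbalanced, the Python sketch below computes the metrics of a trivial rule that labels every participant as non-VI, using the testing-set counts reported in this article (n=4761, of whom 1612 had VI). The function name is hypothetical and the rule is a didactic baseline, not one of the study's models.

```python
# Testing-set counts reported in this article.
n_test = 4761
n_vi_test = 1612


def always_negative_metrics(n_total: int, n_positive: int) -> dict:
    """Metrics for a trivial classifier that labels every participant as non-VI."""
    n_negative = n_total - n_positive
    return {
        "accuracy": n_negative / n_total,  # every non-VI participant is correct
        "sensitivity": 0.0,                # no VI case is ever detected
        "specificity": 1.0,                # no non-VI participant is flagged
    }


baseline = always_negative_metrics(n_test, n_vi_test)
print(f"accuracy={baseline['accuracy']:.3f}, sensitivity={baseline['sensitivity']:.1f}")
```

Such a rule detects no one with VI and, because it assigns the same score to everyone, has an AUC of 0.5. Accuracy depends on the chosen probability threshold and on class balance, whereas AUC summarizes ranking ability across all thresholds.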

The calibration curves showed satisfactory calibration but moderate discrimination, indicating the models’ ability to assess the overall risk for VI in the population effectively. This finding has significant implications for designing resource allocation strategies in clinical and public health settings. While the models’ discriminatory power was not exceptional, they could be used as supplementary tools in broader clinical or public health assessment frameworks, providing valuable insights for early screening of VI risk. Nevertheless, we acknowledged and considered the limitations of these models when applying them in practice. Consequently, our research contributed significantly to future investigations of ML algorithms for VI prediction by providing a comprehensive evaluation of multiple algorithms and emphasizing the advantages of ensemble methods. Additionally, our work offers technical guidance for the primary prevention of VI by identifying the most effective predictive models.

Our study has the following strengths compared with previous ML models for VI prediction. First, our model is more adept at identifying potentially modifiable risk factors. Unlike prior studies that primarily relied on image or video data as predictors [21,22], our approach incorporates easily accessible everyday information, such as lifestyle factors like night sleep duration. Second, our predictive model holds greater representativeness and general applicability for the middle-aged and older adult Chinese population. Previous domestic studies on VI prediction tended to focus on specific disease groups (stroke patients) or particular types of VI (anterior retinal VI) [14,15]. Our study, based on a national sample, includes individuals from the middle-aged and older adult demographics, and our results pertain to general types of VI.

We found that the important predictors of VI included hearing impairment, self-expectation of health status, pain, age, hand grip strength, depression, night sleep duration, high-density lipoprotein cholesterol, and arthritis or rheumatism. This finding aligns with those documented in prior research [24,25,40-42]. Notably, our study newly highlighted the crucial role of hearing impairment, self-expectation of health status, and pain in predicting VI. Although hearing impairment does not directly affect vision, its underlying causes may be associated with ocular or neurological disorders. For instance, neurofibromatosis occurring near the inner ear and optic nerve can potentially lead to concurrent hearing impairment and VI [43]. The eyes and ears share a common neuroectodermal origin and exhibit similar genetic networks [44]. When pathogenic mutations occur in these shared genes, they can concurrently affect the functions of both the eyes and ears, leading to dual sensory loss. For instance, defects in the development of inner ear hair cells and photoreceptor cells may underlie the pathogenesis of Usher syndrome, the most prevalent syndromic form of retinitis pigmentosa [45]. In addition, self-expectation of health status reveals one’s attitudes toward personal health. High expectations often lead to proactive health behaviors, such as regular ophthalmologic check-ups for vision issues. In contrast, lower expectations, possibly due to comorbid chronic conditions [25], may cause psychological stress affecting vision [46]. For example, depression is more prevalent in patients with VI [47]. Additionally, pain serves as a vital physiological indicator, revealing certain discomforts or underlying ailments in the body. Prolonged pain can keep an individual in a constant state of stress, disrupting the normal functions of the immune and endocrine systems, thereby indirectly impacting vision. Our findings enhance comprehension of the mechanisms underlying the development of VI, enabling early identification of high-risk groups and the implementation of targeted interventions for these individuals. Moreover, our findings provide a valuable reference for selecting variables in constructing VI prediction models. Nevertheless, to ensure the accuracy and credibility of our findings, further studies are required to validate these associations.

Limitations

This study has several limitations that should be acknowledged. First, the assessment of VI relied on self-reported data, which may be subject to recall bias or subjectivity in responses. The qualitative nature of the VI assessment responses lacks the numerical precision of a quantitative evaluation, potentially affecting the accuracy of the outcome variable. Second, while environmental and lifestyle factors were considered, VI is also influenced by genetic factors, which were not included in the model due to the absence of such information in the database. Incorporating genetic data could potentially enhance the model’s predictive performance. Finally, the data were sourced from the CHARLS, which only represents the Chinese population aged 45 years and older. Consequently, the model’s generalizability to other age groups or populations outside of China remains uncertain. Further validation studies are necessary to evaluate the model’s effectiveness in diverse populations across different countries and age ranges.

Conclusions

The prevalence of VI was notably high among middle-aged and older Chinese adults, with regional disparities but no significant north-south or east-west differences. To our knowledge, our study is the first to use ML algorithms in predicting VI among China’s general middle-aged and older population. The findings demonstrate that ML algorithms can accurately predict VI among this demographic and that ensemble algorithms outperform individual learning models. Variable importance analysis highlighted the importance of considering factors such as hearing impairment and individuals’ self-expectation of health status when predicting VI risk. By incorporating these predictors, our study facilitates the early identification of individuals at high risk for VI, enabling timely interventions and preventive measures to mitigate the development and progression of VI.

Acknowledgments

This document uses data from 4 waves of the China Health and Retirement Longitudinal Study (CHARLS) from 2011 to 2018. We thank the CHARLS research team and each respondent for contributing to this study. We are grateful for funding from the Traditional Chinese Medicine research project of the Shanghai Municipal Health Commission (2024QN108), Shanghai University of Traditional Chinese Medicine in 2021 (2021LK008), and Shanghai University of Traditional Chinese Medicine in 2024 (KECJ2024019).

Authors' Contributions

XX and LM conceived and designed the study. LM and XX cleaned the data and built the models and codes. LM wrote the first draft and edited the manuscript. LM and XX contributed to data cleaning. HS, HZ, and XX contributed to data validation and supervision. YZ, LL, MS, HS, HZ, and XX contributed to data interpretation and manuscript revision. All authors contributed to the preparation of the manuscript and approved the final manuscript.

Conflicts of Interest

None declared.

Multimedia Appendix 1

DOCX File, 440 KB

References

1. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. Oct 17, 2020;396(10258):1204-1222.
2. Flaxman SR, Bourne RRA, Resnikoff S, et al. Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. Lancet Glob Health. Dec 2017;5(12):e1221-e1234.
3. NHC: China made solid progress in elderly care over past decade. The State Council of the People’s Republic of China. Sep 20, 2022. URL: https://english.www.gov.cn/statecouncil/ministries/202209/20/content_WS6329c182c6d0a757729e0446.html [Accessed 2023-11-22]
4. Jin H, Zhou Y, Stagg BC, Ehrlich JR. Association between vision impairment and increased prevalence of falls in older US adults. J Am Geriatr Soc. May 2024;72(5):1373-1383.
5. Almidani L, Miller R, Varadaraj V, Mihailovic A, Swenor BK, Ramulu PY. Vision impairment and psychosocial function in US adults. JAMA Ophthalmol. Apr 1, 2024;142(4):283-291.
6. Blindness and vision impairment. World Health Organization. 2023. URL: https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment [Accessed 2023-11-29]
7. Circular of the National Health Commission on the issuance of the “14th five-year plan” for National Eye Health (2021-2025) [Article in Chinese]. Central People’s Government of the People’s Republic of China. 2022. URL: https://www.gov.cn/zhengce/zhengceku/2022-01/17/content_5668951.htm [Accessed 2023-11-28]
8. Farah L, Borget I, Martelli N, Vallee A. Suitability of the current health technology assessment of innovative artificial intelligence-based medical devices: scoping literature review. J Med Internet Res. May 13, 2024;26:e51514.
9. Alexander N, Aftandilian C, Guo LL, et al. Perspective toward machine learning implementation in pediatric medicine: mixed methods study. JMIR Med Inform. Nov 17, 2022;10(11):e40039.
10. Rajula HSR, Verlato G, Manchia M, Antonucci N, Fanos V. Comparison of conventional statistical methods with machine learning in medicine: diagnosis, drug development, and treatment. Medicina (Kaunas). Sep 8, 2020;56(9):455.
11. Xi Y, Wang H, Sun N. Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: a study involving 143,043 Chinese patients with hypertension. Front Cardiovasc Med. 2022;9:1025705.
12. GBD 2019 Blindness and Vision Impairment Collaborators, Vision Loss Expert Group of the Global Burden of Disease Study. Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e130-e143.
13. Fricke TR, Jong M, Naidoo KS, et al. Global prevalence of visual impairment associated with myopic macular degeneration and temporal trends from 2000 through 2050: systematic review, meta-analysis and modelling. Br J Ophthalmol. Jul 2018;102(7):855-862.
14. Hsia Y, Lin YY, Wang BS, Su CY, Lai YH, Hsieh YT. Prediction of visual impairment in epiretinal membrane and feature analysis: a deep learning approach using optical coherence tomography. Asia Pac J Ophthalmol (Phila). 2023;12(1):21-28.
15. Xu J, Wu Z, Nurnberger A, Sabel BA. Interhemispheric cortical network connectivity reorganization predicts vision impairment in stroke. Annu Int Conf IEEE Eng Med Biol Soc. Nov 2021:836-840.
16. Jin HD, Demmler-Harrison GJ, Miller J, et al. Cortical visual impairment in congenital cytomegalovirus infection. J Pediatr Ophthalmol Strabismus. May 22, 2019;56(3):194-202.
17. Liu TYA, Ling C, Hahn L, Jones CK, Boon CJ, Singh MS. Prediction of visual impairment in retinitis pigmentosa using deep learning and multimodal fundus images. Br J Ophthalmol. Oct 2023;107(10):1484-1489.
18. Zhao Y, Wang A. Development and validation of a risk prediction model for visual impairment in older adults. Int J Nurs Sci. Jul 2023;10(3):383-390.
19. Zhao Y, Yu R, Sun C, et al. Nomogram model predicts the risk of visual impairment in diabetic retinopathy: a retrospective study. BMC Ophthalmol. Dec 8, 2022;22(1):36482340.
20. Burkemper B, Torres M, Jiang X, McKean-Cowdin R, Varma R. Factors associated with visual impairment in Chinese American adults: the Chinese American eye study. Ophthalmic Epidemiol. Oct 2019;26(5):329-335.
21. Tham YC, Anees A, Zhang L, et al. Referral for disease-related visual impairment using retinal photograph-based deep learning: a proof-of-concept, model development study. Lancet Digit Health. Jan 2021;3(1):e29-e40.
22. Chen W, Li R, Yu Q, et al. Early detection of visual impairment in young children using a smartphone-based deep learning system. Nat Med. Feb 2023;29(2):493-503.
23. Zhao Y, Hu Y, Smith JP, Strauss J, Yang G. Cohort profile: the China Health and Retirement Longitudinal Study (CHARLS). Int J Epidemiol. Feb 2014;43(1):61-68.
24. Sun M, Bo Q, Lu B, Sun X, Zhou M. The association of sleep duration with vision impairment in middle-aged and elderly adults: evidence from the China Health and Retirement Longitudinal Study. Front Med (Lausanne). 2021;8:778117.
25. Gu Y, Cheng H, Liu X, Dong X, Congdon N, Ma X. Prevalence of self-reported chronic conditions and poor health among older adults with and without vision impairment in China: a nationally representative cross-sectional survey. BMJ Open Ophthalmol. Mar 2023;8(1):e001211.
26. Shang X, Wu G, Wang W, et al. Associations of vision impairment and eye diseases with frailty in community-dwelling older adults: a nationwide longitudinal study in China. Br J Ophthalmol. Dec 19, 2022;108(2):310-316.
27. Shang X, Zhu Z, Wang W, He M. Associations of vision impairment and eye diseases with memory decline over 4 years in China and the United States. Am J Ophthalmol. Aug 2021;228:16-26.
28. Convergence of the world’s best predictive and generative AI for private, protected data. H2O.ai. URL: https://h2o.ai/ [Accessed 2024-07-01]
29. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Computat. Apr 1997;1(1):67-82.
30. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. Presented at: 31st Conference on Neural Information Processing Systems (NIPS 2017); Dec 4-9, 2017; Long Beach, CA.
31. Rigatti SJ. Random forest. J Insur Med. 2017;47(1):31-39.
32. López OAM, López AM, Crossa J. Fundamentals of artificial neural networks and deep learning. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer International Publishing; 2022:379-425.
33. Jang HJ, Cho KO. Applications of deep learning for the analysis of medical data. Arch Pharm Res. Jun 2019;42(6):492-504.
34. Dey R, Mathur R. Ensemble learning method using stacking with base learner, a comparison. Presented at: International Conference on Data Analytics and Insights (ICDAI 2023); May 11-13, 2023; Kolkata, India.
35. Longadge R, Dongre S. Class imbalance problem in data mining review. arXiv. Preprint posted online on May 8, 2013.
36. Atabaki-Pasdar N, Ohlsson M, Viñuela A, et al. Predicting and elucidating the etiology of fatty liver disease: a machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Med. Jun 2020;17(6):e1003149.
37. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. Sep 2010;5(9):1315-1316.
38. Alba AC, Agoritsas T, Walsh M, et al. Discrimination and calibration of clinical prediction models: users’ guides to the medical literature. JAMA. Oct 10, 2017;318(14):1377-1384.
39. Brier GW. Verification of forecasts expressed in terms of probability. Mon Wea Rev. Jan 1950;78(1):1-3.
40. Wong PWF, Lau JKP, Choy BNK, et al. Sociodemographic, behavioral, and medical risk factors associated with visual impairment among older adults: a community-based pilot survey in Southern District of Hong Kong. BMC Ophthalmol. Sep 18, 2020;20(1):372.
41. Shang X, Wu G, Wang W, et al. Associations of vision impairment and eye diseases with frailty in community-dwelling older adults: a nationwide longitudinal study in China. Br J Ophthalmol. Jan 29, 2024;108(2):310-316.
42. Sun R, Huang D, Liu Z, et al. Prevalence, causes, and risk factors of presenting visual impairment and presenting blindness in adults presenting to an examination center in Suzhou, China. J Ophthalmol. 2022;2022:2885738.
43. Alshoabi SA. Neurofibromatosis type-2 presenting with vision impairment. Pak J Med Sci. 2023;39(2):611-615.
44. Thiery A, Buzzi AL, Streit A. Cell fate decisions during the development of the peripheral nervous system in the vertebrate head. Curr Top Dev Biol. 2020;139:127-167.
45. Fuster-García C, García-Bohórquez B, Rodríguez-Muñoz A, et al. Usher syndrome: genetics of a human ciliopathy. Int J Mol Sci. Jun 23, 2021;22(13):6723.
46. Zhang Q, Cao GY, Yao SS, et al. Self-reported vision impairment, vision correction, and depressive symptoms among middle-aged and older Chinese: findings from the China Health and Retirement Longitudinal Study. Int J Geriatr Psychiatry. Jan 2021;36(1):86-95.
47. Virgili G, Parravano M, Petri D, et al. The association between vision impairment and depression: a systematic review of population-based studies. J Clin Med. Apr 25, 2022;11(9):2412.

Abbreviations

AUC: area under the curve

CHARLS: China Health and Retirement Longitudinal Study

DL: deep learning

DRF: distributed random forest

GBM: gradient boosting machine

GLM: generalized linear model

IRB: Institutional Review Board

ML: machine learning

ROC: receiver operating characteristic

VI: visual impairment

Edited by Yun Jiang; submitted 23.04.24; peer-reviewed by Min-Zhe Zhang, Siqi Mao, Weilin Xu; final revised version received 17.07.24; accepted 13.08.24; published 09.10.24.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Aging, is properly cited. The complete bibliographic information, a link to the original publication on https://aging.jmir.org, as well as this copyright and license information must be included.
