This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Aging, is properly cited. The complete bibliographic information, a link to the original publication on http://aging.jmir.org, as well as this copyright and license information must be included.
Fall-risk assessment is complex. Based on current scientific evidence, a multifactorial approach, including the analysis of physical performance, gait parameters, and both extrinsic and intrinsic risk factors, is highly recommended. A smartphone-based app was designed to assess the individual risk of falling with a score that combines multiple fall-risk factors into one comprehensive metric using the previously listed determinants.
This study provides a descriptive evaluation of the designed fall-risk score as well as an analysis of the app’s discriminative ability based on real-world data.
Anonymous data from 242 seniors were analyzed retrospectively. Data were collected between June 2018 and May 2019 using the fall-risk assessment app. First, we provided a descriptive statistical analysis of the underlying dataset. Subsequently, multiple learning models (Logistic Regression, Gaussian Naive Bayes, Gradient Boosting, Support Vector Classification, and Random Forest Classification) were trained on the dataset to obtain optimal decision boundaries. The receiver operating characteristic (ROC) curve with its corresponding area under the curve (AUC) and sensitivity were the primary performance metrics used to assess the fall-risk score's ability to discriminate fallers from nonfallers. For completeness, specificity, precision, and overall accuracy were also provided for each model.
Out of 242 participants with a mean age of 84.6 years (SD 6.7), 139 (57.4%) reported no previous falls (nonfallers), while 103 (42.6%) reported a previous fall (fallers). The average fall-risk score was 29.5 points (SD 12.4). The performance metrics were as follows: Logistic Regression, AUC=0.90, sensitivity=100%, specificity=52%, accuracy=73%; Gaussian Naive Bayes, AUC=0.90, sensitivity=100%, specificity=52%, accuracy=73%; Gradient Boosting, AUC=0.85, sensitivity=88%, specificity=62%, accuracy=73%; Support Vector Classification, AUC=0.84, sensitivity=88%, specificity=67%, accuracy=76%; and Random Forest, AUC=0.84, sensitivity=88%, specificity=57%, accuracy=70%.
Descriptive statistics for the dataset were provided as comparison and reference values. The fall-risk score exhibited a high discriminative ability to distinguish fallers from nonfallers, irrespective of the learning model evaluated. The models had an average AUC of 0.86, an average sensitivity of 93%, and an average specificity of 58%. Average overall accuracy was 73%. Thus, the fall-risk app has the potential to support caretakers in easily conducting a valid fall-risk assessment. The fall-risk score's predictive accuracy will be further validated in a prospective trial.
Falls have a high prevalence among seniors, with 1 in 4 seniors aged 65 years and above experiencing a fall each year [
Due to demographic changes associated with an aging population, the number of falls among older adults is expected to rise considerably. A recent study even reported an increasing rate of death from falls: investigating data from people who died as a result of a fall, the researchers found that the death rate from falls increased by an average of 3.0% per year during 2007-2016 [
Fall-risk assessment is a complicated task. Current scientific evidence suggests that a multifactorial fall-risk assessment, including an analysis of mobility as well as extrinsic and intrinsic risk factors, is crucial [
Thus, a smartphone-based application, Lindera Mobilitätsanalyse (Lindera GmbH, Berlin, Germany), was developed to facilitate fall-risk assessment. As a stand-alone software, this app enables nursing staff to perform a structured fall-risk assessment that conforms to regulatory standards [
Further app-based, fall-risk assessment tools have been identified in the literature [
Fall-risk assessment tools should accurately discriminate fallers from nonfallers. Diagnostic accuracy relates to the fall-risk score’s ability to discriminate between faller and nonfaller status. The discriminative performance of fall-risk assessments has frequently been quantified using measures such as sensitivity, specificity, and the area under the curve (AUC). The validity of each assessment tool should be evaluated to interpret the results correctly. Currently, the diagnostic test accuracy of most existing fall-risk assessment tools appears to be modest [
This paper aimed to study the discriminative ability of the fall-risk score with the aid of learning models. These models were evaluated based on relevant performance metrics, such as the receiver operating characteristic curve and its area under the curve (AUC), using a real-world dataset containing subjects with and without a previous fall history.
The study was designed as a retrospective analysis of the Lindera user database. All study participants agreed to the collection of data presented in this publication by signing the terms and conditions for the use of Lindera as well as a written informed consent form. Lindera is compliant with the European Union General Data Protection Regulation. All data analyzed for the study were anonymized for statistical analysis.
The study sample consisted of seniors who completed a fall-risk assessment via the app between June 2018 and May 2019 and uploaded their data to the company’s user database. The app only provides analyses for customers who have signed a data processing contract. The company’s customers include nursing homes, outpatient nursing services, care support centers, and daycare institutions. Seniors were recruited and informed by nursing staff in these institutions.
To ensure data quality and homogeneity among the study population, only participants aged 65 and above were analyzed, as this is seen as a relevant cut-off age for a higher occurrence of falls [
Due to the nonexperimental, retrospective, and anonymized study design, no ethical approval was needed.
Nurses can analyze a senior’s mobility according to the Tinetti test criteria [
Every risk factor within the analysis is considered in the fall-risk score, which is a metric scale ranging from 0-100 points. Per validated fall-risk models that have shown a good diagnostic test accuracy [
The fall-risk score assessment was completed using an app named Lindera Mobilitätsanalyse. The nursing staff was able to download the app for iOS (App Store) or Android (Google Play Store) mobile devices. The app was free to download, but to get the analysis results, care providers and study participants had to sign a data processing contract with the company and a declaration of consent following data protection law. The collaborating care provider covered the analysis costs. In Germany, care institutions have a prevention budget that provides a legal basis for them to fund appropriate solutions.
All data analyzed in this study were entered by the app’s users and stored on company servers hosted by Deutsche Telekom and located in Bonn, Germany. The Chief Technology Officer of Lindera and backend employees had access to the database and extracted anonymized data for scientific evaluation. No identifiable patient information has been or will be shared.
All statistical analyses were conducted using Python version 3.6.8 (Python Software Foundation, Wilmington, United States) with the aid of the Pandas library version 0.24.2. All modeling research was done using the scikit-learn machine learning library for Python, version 0.20.3. Python is widely used for conducting statistical analyses [
The ability to discriminate between fallers and nonfallers using the fall score feature alone was analyzed, prioritizing a high sensitivity. One of the best performance metrics for quantifying the accuracy of medical diagnostic tests, like the one considered here, is the receiver operating characteristic (ROC) [
To determine the ROC for the two-class classification model, we first calculated the confusion matrix for a predefined test dataset. Secondary performance metrics, like sensitivity, specificity, accuracy, and precision, can be easily calculated from the confusion matrix. Detailed descriptions of the concepts of the ROC, the confusion matrix, and secondary performance metrics, with a clear focus on the sensitivity-specificity trade-off, can be found in the supplementary materials section (see
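As a minimal sketch of how the secondary metrics follow from the confusion matrix, the snippet below uses scikit-learn with invented labels and predictions (1 = faller, 0 = nonfaller); the counts are for demonstration only and are not taken from the study data:

```python
from sklearn.metrics import confusion_matrix

# Invented ground-truth labels and predictions (1 = faller, 0 = nonfaller).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields true negatives, false positives,
# false negatives, and true positives, in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                   # true positive rate (recall)
specificity = tn / (tn + fp)                   # true negative rate
precision = tp / (tp + fp)                     # positive predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, "
      f"precision={precision:.2f}, accuracy={accuracy:.2f}")
```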
In this study, we investigated and compared the following five models: Logistic Regression, Gaussian Naive Bayes, Gradient Boosting, Support Vector Classification, and Random Forest Classification. The primary reason for choosing these models was that they exhibit good selection capabilities over multiple model types and are well studied in applications of machine learning in the medical field [
The modeling pipeline was as follows. First, we partitioned the dataset into two subsets via a stratified random split. A total of 85% of the dataset went into a training-validation set (205 subjects) and 15% into a test set (37 subjects). We chose to perform a stratified split in order to ensure that the two classes had the same distribution in both subsets. Next, we performed a stratified k-fold cross-validation (with k=8 splits) [
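The pipeline described above can be sketched with scikit-learn as follows. The data are synthetic stand-ins (only the sample size of 242 mirrors the study), and all model hyperparameters are library defaults rather than the settings actually used:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the study data: one fall-score feature and a binary
# faller label. Only the sample size (242) mirrors the paper; values do not.
rng = np.random.default_rng(0)
X = rng.normal(loc=30, scale=12, size=(242, 1))
y = rng.integers(0, 2, size=242)

# Step 1: 85/15 stratified random split, preserving the class ratio
# in both the training-validation and the hold-out test subsets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42
)

# Step 2: stratified 8-fold cross-validation on the training-validation set,
# scored by ROC AUC.
models = {
    "LR": LogisticRegression(),
    "GNB": GaussianNB(),
    "GB": GradientBoostingClassifier(),
    "RF": RandomForestClassifier(),
    "SVC": SVC(),
}
cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X_trainval, y_trainval, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC {scores.mean():.2f} (SD {scores.std():.2f})")
```

With a test size of 15%, scikit-learn rounds the test set up, yielding the 205/37 split reported in the paper.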
The sample had a mean age of 84.6 years (SD 6.7), and 169/242 participants (69.8%) were female. A total of 139 seniors (57.4%) reported no previous falls (nonfallers), whereas 103 seniors (42.6%) reported at least one fall event in the last 12 months. There was no statistically significant difference in age (
The average fall-risk score was 29.5 points (SD 12.4). Fallers had an average fall-risk score of 36.7 (SD 11.6), while nonfallers had an average fall-risk score of 24.0 (SD 10.2). All analyzed subgroups showed a normal distribution (see
Fall-risk score histograms. Row A shows histograms for nonnormalized fall scores and Row B for standard-scaled fall scores. A1 and B1 show the nonfaller subgroup, A2 and B2 show the full dataset, and A3 and B3 show the faller subgroup.
We show the standard-scaled fall score distributions in
Skewness and kurtosis factors of the distributions are also shown in
The results of the k-fold stratified cross-validation are shown in
Results of the k-fold stratified cross-validation study (k=8).
Metric | LRa model | GNBb model | GBc model | RFd model | SVCe model | Overall average (SD) |
AUCf (SD) | 0.76 (0.09) | 0.76 (0.09) | 0.75 (0.09) | 0.74 (0.07) | 0.74 (0.09) | 0.75 (0.08) |
Sensitivity, % (SD) | 85.0 (4.0) | 85.0 (4.0) | 85.0 (5.0) | 85.0 (5.0) | 84.0 (4.0) | 85.0 (4.0) |
Specificity, % (SD) | 49.0 (10.0) | 49.0 (10.0) | 54.0 (17.0) | 50.0 (8.0) | 51.0 (15.0) | 50.0 (12.0) |
Accuracy, % (SD) | 71.0 (6.0) | 71.0 (6.0) | 68.0 (7.0) | 66.0 (8.0) | 71.0 (6.0) | 70.0 (7.0) |
Precision, % (SD) | 56.0 (5.0) | 56.0 (5.0) | 59.0 (8.0) | 56.0 (4.0) | 56.0 (6.0) | 57.0 (6.0) |
Cut-off probability (SD) | 0.31 (0.07) | 0.29 (0.07) | 0.38 (0.05) | 0.34 (0.06) | 0.27 (0.05) | 0.32 (0.06) |
Cut-off fall score points, mean (SD) | 25.3 (3.3) | 25.0 (3.0) | 29.5 (3.4) | 29.4 (6.0) | 27.4 (1.4) | 27.3 (3.4) |
aLR: Logistic Regression.
bGNB: Gaussian Naive Bayes.
cGB: Gradient Boosting.
dRF: Random Forest.
eSVC: Support Vector Classification.
fAUC: area under the curve.
In a final step, we considered the models with the best average cut-off probabilities as the optimal models. These optimal models were then trained on the full training-validation set (85% of the complete dataset), while test metrics were calculated on the remaining hold-out test set (15% of the complete dataset). Validation metrics for the individual models, together with the averages across all models, are presented in
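As a sketch of how the hold-out test metrics and a cut-off probability can be obtained with scikit-learn, the snippet below uses invented labels and predicted probabilities. Youden's J statistic is one standard criterion for choosing the cut-off, used here as an assumption; a criterion weighting sensitivity more heavily would match the paper's stated priority:

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Invented hold-out labels (1 = faller) and predicted probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9])

# The ROC curve sweeps the decision threshold; the AUC summarizes it.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

# Youden's J (sensitivity + specificity - 1) picks the threshold with the
# best combined true positive and true negative rates.
best_cutoff = thresholds[np.argmax(tpr - fpr)]
print(f"AUC={roc_auc:.3f}, cut-off probability={best_cutoff}")
```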
Results of the test set metrics for the final models.
Metric | LRa model | GNBb model | GBc model | RFd model | SVCe model | Overall average (SD) |
AUCf | 0.90 | 0.90 | 0.85 | 0.84 | 0.84 | 0.86 (0.03) |
Sensitivity, % | 100.0 | 100.0 | 88.0 | 88.0 | 88.0 | 93.0 (6.0) |
Specificity, % | 52.0 | 52.0 | 62.0 | 56.0 | 57.0 | 58.0 (5.0) |
Accuracy, % | 73.0 | 73.0 | 73.0 | 70.0 | 76.0 | 73.0 (2.0) |
Precision, % | 62.0 | 62.0 | 64.0 | 61.0 | 67.0 | 63.0 (2.0) |
aLR: Logistic Regression.
bGNB: Gaussian Naive Bayes.
cGB: Gradient Boosting.
dRF: Random Forest.
eSVC: Support Vector Classification.
fAUC: area under the curve.
Confusion matrix averaged over all five models.
ROC curves and corresponding AUCs for the five models and the average over all five models.
The study’s main finding was that the fall-risk score exhibited a high discriminative ability to distinguish fallers from nonfallers across all five models evaluated. The models had an average AUC of 0.86, an average sensitivity of 93%, an average specificity of 58%, and an average accuracy of 73%. As discussed in the methods section, AUCs near 1 (0.8-0.9) indicate very good separability of the models and their corresponding features [
Our results provide a descriptive evaluation of the designed fall-risk score for a sample of very elderly seniors with a mean age of 84.6 years (SD 6.7). This high average age may be because more than half of the sample (54.3%) were nursing home residents. A total of 14.1% were living in assisted living facilities, and 16.1% received ambulant care. Thus, a large share of the investigated population was in high need of care. The high percentage of fallers (42.5%) in the sample may also be attributable to these demographic characteristics. There is currently only limited data on fall rates among seniors of very high age. Rapp et al [
The average fall-risk score in this sample was 29.5 points (SD 12.4). The descriptive data analysis clearly shows that fallers had significantly higher fall-risk scores than nonfallers (
A large number of studies have evaluated the accuracy of fall-risk assessments [
Notably, all of our evaluated models achieved a specificity below 70%, meaning the score tends to overreport fall risk, which in turn could affect the fall prevention strategies recommended by the app. However, a lower specificity can be tolerated given the noninvasive nature of fall prevention strategies, which often address general health issues: falsely classifying someone as high risk is less detrimental than falsely classifying someone as low risk, which would result in falls not being prevented. The primary goal of a fall-risk assessment tool is to identify people at a high risk of falling in order to minimize the occurrence of falls. Accordingly, we conclude that a fall-risk assessment tool with high sensitivity achieves this primary goal, even when its specificity is low. Thus, although the specificity is not ideal, the overall performance of the fall-risk score and its sensitivity-specificity trade-off meet the specific requirements of a tool for fall prevention.
The available research on the accuracy of fall-risk assessment tools exhibits high interstudy heterogeneity [
To assist health care professionals in interpreting the fall-risk score, we suggest a cut-off value. In a precision-sensitivity study, a cut-off value of 27.5 points (SD 4.5) was shown to offer the best combination of sensitivity and specificity. Thus, seniors with a score higher than 27.5 points can be classified as having a high fall risk and should be prioritized in the implementation of prevention strategies. However, this cut-off value should be seen as merely a preliminary recommendation. Evaluations of larger sample sizes with prospective data may lead to further adjustments in the recommended cut-off score.
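As a trivial sketch, the suggested cut-off could be applied as follows. The threshold comes from the recommendation above, while the function name is our own illustration; the example scores are the faller and nonfaller subgroup averages reported earlier:

```python
# Preliminary cut-off suggested in the paper; subject to future adjustment.
FALL_RISK_CUTOFF = 27.5

def is_high_risk(fall_risk_score: float) -> bool:
    """Classify a 0-100 fall-risk score as high risk if it exceeds the cut-off."""
    return fall_risk_score > FALL_RISK_CUTOFF

print(is_high_risk(36.7))  # average faller score in this sample -> True
print(is_high_risk(24.0))  # average nonfaller score in this sample -> False
```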
This study’s limitations arise from its retrospective case-control study design, which makes it potentially vulnerable to selection bias. The potential for recall bias should also be considered. Recall bias refers to the increased likelihood that fallers will recall and report the presence of risk factors, whereas nonfallers are less likely to report risk factors [
Moreover, there is a discussion in the fall-risk literature about the self-reporting of falls. One-year retrospective self-reporting of falls has been found to result in a slight underreporting [
The digital assessment of fall risk has the potential to objectify and improve fall-risk assessment, reducing the subjectivity that human judgment introduces through biases, prior knowledge, experience, preferences, and a limited capacity to absorb information.
Various researchers have concluded that the validity of current fall-risk assessment tools is insufficient [
The descriptive statistics provided can be used as comparison and reference values for users of the fall-risk assessment app. The fall-risk score showed a high discriminative ability to distinguish fallers from nonfallers in all the evaluated models. On average, the models exhibited good accuracy, excellent sensitivity, moderate specificity, and good AUC values. The fall-risk assessment app has the potential to support nursing staff in performing valid, systematic, and objective fall-risk assessments that can be used to identify relevant risk factors and implement multifactorial prevention strategies. The fall-risk score’s predictive validity will be further validated in future prospective trials, including larger sample sizes based on a growing real-world database.
Example prevention plan.
Details of model-based statistics.
AUC: area under the curve
ROC: receiver operating characteristic
STRATIFY: St. Thomas's Risk Assessment Tool In Falling Elderly Inpatients
The authors would like to thank Hannah Scheiffele and Keri Hartman for proofreading and linguistic advice.
SR, AA, and SM are employees of Lindera.