Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Systematic Review and Framework for Safe Adoption

Artificial intelligence (AI) presents a solution by automating and streamlining these processes, potentially augmenting both efficiency and accuracy. However, the adoption of AI in breast cancer screening is not without challenges. Although there are over 20 Food and Drug Administration (FDA)–approved AI applications for breast imaging, their adoption and utilization in clinical settings remain highly variable and generally low [6].

Serene Goh, Rachel Sze Jen Goh, Bryan Chong, Qin Xiang Ng, Gerald Choon Huat Koh, Kee Yuan Ngiam, Mikael Hartman

J Med Internet Res 2025;27:e62941

Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking: Development and Usability Study

Retrieval-augmented generation (RAG) is a state-of-the-art technique that enhances LLMs by integrating external data retrieval, improving factual accuracy, and reducing costs [13]. By retrieving relevant information from external sources and incorporating it as contextual input, RAG effectively mitigates the issue of hallucinations in LLMs [14].

Hai Li, Jingyi Huang, Mengmeng Ji, Yuyi Yang, Ruopeng An

J Med Internet Res 2025;27:e66098

Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis

Accuracy for objective questions was calculated as the number of correctly answered questions divided by the total number of questions. For diagnosis and classification, accuracy was defined as the number of cases correctly diagnosed or triaged divided by the total number of cases. Specifically for open-ended questions, accuracy was determined based on the number of questions rated “good” or “accurate” on the accuracy scale divided by the total number of questions.

Ling Wang, Jinglin Li, Boyang Zhuang, Shasha Huang, Meilin Fang, Cunze Wang, Wen Li, Mohan Zhang, Shurong Gong

J Med Internet Res 2025;27:e64486
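
The accuracy definitions quoted in the excerpt above all reduce to a count of acceptable answers divided by a total. A minimal Python sketch follows; the function names, the rating labels, and the sample numbers are illustrative assumptions and are not taken from the article.

```python
def accuracy(correct: int, total: int) -> float:
    """Proportion of items counted as correct (questions or cases)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")
    return correct / total


def open_ended_accuracy(ratings, accepted=("good", "accurate")):
    """Accuracy for open-ended questions.

    A response counts as correct when its rating is one of the accepted
    labels. The label set is an assumption based on the excerpt's wording.
    """
    hits = sum(1 for rating in ratings if rating.lower() in accepted)
    return accuracy(hits, len(ratings))


# Hypothetical data: 45 of 60 objective questions answered correctly.
objective = accuracy(45, 60)

# Hypothetical ratings for four open-ended answers.
open_ended = open_ended_accuracy(["good", "poor", "accurate", "fair"])
```

The same `accuracy` helper would serve the diagnosis and classification case, with correctly diagnosed or triaged cases as the numerator and all cases as the denominator.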

Comparing Diagnostic Accuracy of Clinical Professionals and Large Language Models: Systematic Review and Meta-Analysis

Therefore, this study aims to comprehensively evaluate the performance and accuracy of LLMs in clinical diagnosis, providing references for their clinical application. This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis of Diagnostic Test Accuracy Studies (PRISMA-DTA) statement [7]. Specific details can be found in Checklist 1.

Guxue Shan, Xiaonan Chen, Chen Wang, Li Liu, Yuanjing Gu, Huiping Jiang, Tingqi Shi

JMIR Med Inform 2025;13:e64963

Assessing the Quality and Reliability of ChatGPT’s Responses to Radiotherapy-Related Patient Queries: Comparative Study With GPT-3.5 and GPT-4

However, despite being one of the most favored informational modalities, websites often require more content accuracy and better readability [1]. Recently, artificial intelligence (AI)–powered chatbots such as ChatGPT have signified a potential paradigm shift in how patients with cancer can access a vast amount of medical information [1,3,4].

Ana Grilo, Catarina Marques, Maria Corte-Real, Elisabete Carolino, Marco Caetano

JMIR Cancer 2025;11:e63677

Understanding the Relationship Between Ecological Momentary Assessment Methods, Sensed Behavior, and Responsiveness: Cross-Study Analysis

Despite these advantages, EMA implementation faces challenges, especially in the variability, completeness, and accuracy of participant responses to prompts. Factors such as distraction, self-awareness, boredom, time of day, and interruption burden [11] can impact participant responses. Addressing these issues is essential for maintaining the integrity of research findings. Furthermore, the design of notification strategies may dramatically impact response compliance and quality [12,13].

Diane Cook, Aiden Walker, Bryan Minor, Catherine Luna, Sarah Tomaszewski Farias, Lisa Wiese, Raven Weaver, Maureen Schmitter-Edgecombe

JMIR Mhealth Uhealth 2025;13:e57018

Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study

Therefore, a comprehensive evaluation of chatbots’ reliability and accuracy in addressing medical inquiries is essential to ensure their effective application in managing diseases like OMG [16]. Recent studies have explored the application of LLMs in ophthalmology. Jaskari et al [17] introduced a model named DR-GPT, designed to analyze fundus images, demonstrating that LLMs can be applied to unstructured medical report databases to aid in classifying diabetic retinopathy.

Bin Wei, Lili Yao, Xin Hu, Yuxiang Hu, Jie Rao, Yu Ji, Zhuoer Dong, Yichong Duan, Xiaorong Wu

J Med Internet Res 2025;27:e67883

Wrist-Worn and Arm-Worn Wearables for Monitoring Heart Rate During Sedentary and Light-to-Vigorous Physical Activities: Device Validation Study

Moreover, mean absolute error, mean absolute percentage error (MAPE), 5% accuracy (percentage of MAPE within a 5% range of the reference value), root-mean-squared error (RMSE), and ordinary least squares linear regression were used to evaluate accuracy.

Theresa Schweizer, Rahel Gilgen-Ammann

JMIR Cardio 2025;9:e67110
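
The error metrics named in the excerpt above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis code: the function name and sample readings are hypothetical, "5% accuracy" is interpreted here as the share of readings whose absolute percentage error is at most 5% of the reference value, and the regression step is omitted.

```python
import math


def error_metrics(device, reference, tolerance=0.05):
    """Compare device readings against reference readings.

    Assumes equal-length, non-empty sequences and nonzero reference
    values (reasonable for heart rate in beats per minute).
    """
    if len(device) != len(reference) or not device:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(device)
    abs_err = [abs(d - r) for d, r in zip(device, reference)]
    pct_err = [e / r for e, r in zip(abs_err, reference)]

    return {
        # Mean absolute error, in the unit of the readings.
        "mae": sum(abs_err) / n,
        # Mean absolute percentage error, in percent.
        "mape": 100 * sum(pct_err) / n,
        # Root-mean-squared error, in the unit of the readings.
        "rmse": math.sqrt(sum(e * e for e in abs_err) / n),
        # Assumed reading of "5% accuracy": percent of readings whose
        # absolute percentage error is within the tolerance.
        "within_tolerance": 100 * sum(1 for p in pct_err if p <= tolerance) / n,
    }


# Hypothetical heart-rate readings in beats per minute.
metrics = error_metrics([100, 110, 90, 100], [100, 100, 100, 100])
```

RMSE weights large deviations more heavily than mean absolute error, so reporting both gives a sense of whether the error is spread evenly or driven by a few outlying readings.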

Synthetic Data-Driven Approaches for Chinese Medical Abstract Sentence Classification: Computational Study

Notably, when trained on dataset #1, the SBERT-Doc SCAN algorithm emerges as the leading performer, securing an accuracy and F1-score of 0.8985 on the test dataset. This standout performance highlights the algorithm’s capability to classify medical domain data with high precision. Additionally, the SBERT-MEC algorithm also displays comparable performance on the same dataset, with an accuracy and F1-score of 0.8938, making it the second most effective algorithm in our evaluation.

Jiajia Li, Zikai Wang, Longxuan Yu, Hui Liu, Haitao Song

JMIR Form Res 2025;9:e54803