
Search Results (1 to 10 of 153 Results)


Token Probabilities to Mitigate Large Language Models Overconfidence in Answering Medical Questions: Quantitative Study

True positive rate, true negative rate, and accuracy rates above and below the optimal discrimination threshold were estimated with 95% CIs and compared using McNemar tests.

Raphaël Bentegeac, Bastien Le Guellec, Grégory Kuchcinski, Philippe Amouyel, Aghiles Hamroun

J Med Internet Res 2025;27:e64348
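
The statistics named in the Bentegeac et al. snippet above can be sketched as follows for binary outcomes. The excerpt does not say which CI method or McNemar variant was used, so Wilson score intervals and the exact (binomial) McNemar test are assumptions here. The McNemar helper expects paired outcomes on the same questions (for example, two models or two conditions answering identical items). All function names and data are hypothetical.

```python
import math


# Hypothetical helpers: the CI method and McNemar variant are assumptions.
def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def tpr_tnr(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """True positive rate (sensitivity) and true negative rate (specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def accuracy_by_threshold(
    probs: list[float], correct: list[bool], threshold: float
) -> tuple[float, float]:
    """Accuracy of answers whose token probability is >= vs < the threshold."""
    above = [c for p, c in zip(probs, correct) if p >= threshold]
    below = [c for p, c in zip(probs, correct) if p < threshold]

    def acc(xs: list[bool]) -> float:
        return sum(xs) / len(xs) if xs else math.nan

    return acc(above), acc(below)


def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Exact two-sided McNemar test on paired correct/incorrect outcomes."""
    # Only discordant pairs carry information in McNemar's test.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0
    k = min(only_a, only_b)
    # Two-sided binomial test of the discordant split against p = 0.5.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

Wilson intervals are used in this sketch because they behave better than the normal approximation when counts are small or proportions are close to 0 or 1.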


Performance of Open-Source Large Language Models in Psychiatry: Usability Study Through Comparative Analysis of Non-English Records and English Translations

Diagnostic accuracy was also evaluated by comparing the ground truth with diagnostic impressions provided by the model. Top-1 and top-2 diagnostic accuracy were calculated for both the Korean and English versions of the psychiatric notes. To further examine whether translation errors affected diagnostic performance, we divided the 200 translated notes into two groups based on translation quality.

Min-Gyu Kim, Gyubeom Hwang, Junhyuk Chang, Seheon Chang, Hyun Woong Roh, Rae Woong Park

J Med Internet Res 2025;27:e69857
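
The top-1 and top-2 diagnostic accuracy described in the Kim et al. snippet above can be sketched as follows. This assumes that each note yields a ranked list of diagnostic impressions and that a match is an exact string match after case and whitespace normalization. In the study itself, agreement with the ground truth may well have been judged differently. The function names and example diagnoses are hypothetical.

```python
def normalise(label: str) -> str:
    """Lower-case and collapse whitespace (an assumed matching rule)."""
    return " ".join(label.lower().split())


def top_k_accuracy(
    ground_truth: list[str], ranked_predictions: list[list[str]], k: int
) -> float:
    """Share of cases whose ground truth appears among the first k impressions."""
    if not ground_truth or len(ground_truth) != len(ranked_predictions):
        raise ValueError("inputs must be non-empty and equal in length")
    hits = sum(
        1
        for truth, preds in zip(ground_truth, ranked_predictions)
        if normalise(truth) in {normalise(p) for p in preds[:k]}
    )
    return hits / len(ground_truth)


# Hypothetical example: two notes, each with a ranked list of impressions.
truth = ["major depressive disorder", "schizophrenia"]
impressions = [
    ["Major depressive disorder", "bipolar disorder"],
    ["bipolar disorder", "schizophrenia"],
]
print(top_k_accuracy(truth, impressions, 1), top_k_accuracy(truth, impressions, 2))
```

The same function can be run separately on the Korean and English versions of each note so that the two sets of scores are directly comparable.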


Automatic Image Recognition Meal Reporting Among Young Adults: Randomized Controlled Trial

Although the existing design showed favorable accuracy and was generally well received by users, concerns remained about its accuracy and about the time needed to report an entire meal. Furthermore, in real-world dietary intake scenarios, voice reporting during meals was not always convenient. Consequently, we developed the latest version to improve on the existing design.

Prasan Kumar Sahoo, Sherry Yueh-Hsia Chiu, Yu-Sheng Lin, Chien-Hung Chen, Denisa Irianti, Hsin-Yun Chen, Mekhla Sarkar, Ying-Chieh Liu

JMIR Mhealth Uhealth 2025;13:e60070


Deep Learning Multi-Modal Melanoma Detection: Algorithm Development and Validation

The best-performing model was a combination of ResNet-50 and Inception-v3, with an accuracy of 80%. Most of these approaches aim to optimize models through transfer learning and various preprocessing techniques to increase accuracy.

Nithika Vivek, Karthik Ramesh

JMIR AI 2025;4:e66561


Proposal for Using AI to Assess Clinical Data Integrity and Generate Metadata: Algorithm Development and Validation

XGB achieved the highest overall performance, with an accuracy of 84.7% and an AUC-ROC score of 84.6%. Its F1-score of 84.0% and precision of 83.9% indicate consistently accurate predictions with few false positives. The SVM achieved an accuracy of 73.0%, comparable to that of LR, but showed an improved AUC-ROC score of 65.7%. Its F1-score of 67.1% reflects a slight improvement in predictive balance.

Caroline Bönisch, Christian Schmidt, Dorothea Kesztyüs, Hans A Kestler, Tibor Kesztyüs

JMIR Med Inform 2025;13:e60204
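
The metrics reported in the Bönisch et al. snippet above (accuracy, precision, F1-score, AUC-ROC) can be computed from binary labels and model scores, as in the dependency-free sketch below. The 0.5 decision threshold, the function name, and the toy data are assumptions. In practice, scikit-learn's `accuracy_score`, `precision_score`, `f1_score`, and `roc_auc_score` would normally be used instead.

```python
def binary_metrics(
    y_true: list[int], scores: list[float], threshold: float = 0.5
) -> dict[str, float]:
    """Accuracy, precision, recall, F1 and ROC AUC for a binary classifier.

    The 0.5 threshold is an assumed default, not taken from the study.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # ROC AUC as the probability that a random positive outscores a random
    # negative (ties count one half); O(n_pos * n_neg), fine for illustration.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc_roc": wins / (len(pos) * len(neg)),
    }


print(binary_metrics([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Because AUC-ROC does not depend on a decision threshold, two models can reach similar accuracy at a fixed threshold and still differ in AUC-ROC. The SVM and LR comparison in the snippet is a case of this.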


Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Systematic Review and Framework for Safe Adoption

Artificial intelligence (AI) presents a solution by automating and streamlining these processes, potentially augmenting both efficiency and accuracy. However, the adoption of AI in breast cancer screening is not without challenges. Although there are over 20 Food and Drug Administration (FDA)–approved AI applications for breast imaging, their adoption and utilization in clinical settings remain highly variable and generally low [6].

Serene Goh, Rachel Sze Jen Goh, Bryan Chong, Qin Xiang Ng, Gerald Choon Huat Koh, Kee Yuan Ngiam, Mikael Hartman

J Med Internet Res 2025;27:e62941


Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking: Development and Usability Study

Retrieval-augmented generation (RAG) is a state-of-the-art technique that enhances LLMs by integrating external data retrieval, thereby improving factual accuracy and reducing costs [13]. By retrieving relevant information from external sources and supplying it as contextual input, RAG effectively mitigates hallucinations in LLMs [14].

Hai Li, Jingyi Huang, Mengmeng Ji, Yuyi Yang, Ruopeng An

J Med Internet Res 2025;27:e66098
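
The retrieve-then-generate loop described in the Li et al. snippet above can be sketched as follows. This is a toy illustration, not the authors' pipeline. It ranks passages by bag-of-words cosine similarity (production RAG systems typically use dense embeddings and a vector index) and only assembles the prompt, leaving out the LLM call. All names, the prompt wording, and the example passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (stable on ties)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True
    )
    return ranked[:k]


def build_prompt(claim: str, passages: list[str]) -> str:
    """Inject retrieved passages as context ahead of the claim (hypothetical wording)."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Using only the evidence below, label the claim as TRUE or FALSE "
        "and cite the supporting passage numbers.\n\n"
        f"Evidence:\n{context}\n\nClaim: {claim}\nAnswer:"
    )


corpus = [
    "Vaccines do not alter human DNA.",
    "Masks reduce the spread of respiratory droplets.",
    "Handwashing with soap removes viruses from skin.",
]
claim = "COVID-19 vaccines alter your DNA."
print(build_prompt(claim, retrieve(claim, corpus, k=1)))
```

Grounding the model in retrieved text is the mechanism the snippet credits with reducing hallucinations. The instruction to use only the supplied evidence and to cite passage numbers makes the answers easier to check.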


Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis

Accuracy for objective questions was calculated as the number of correctly answered questions divided by the total number of questions. For diagnosis and classification, accuracy was defined as the number of cases correctly diagnosed or triaged divided by the total number of cases. For open-ended questions, accuracy was the number of questions rated “good” or “accurate” on the accuracy scale divided by the total number of questions.

Ling Wang, Jinglin Li, Boyang Zhuang, Shasha Huang, Meilin Fang, Cunze Wang, Wen Li, Mohan Zhang, Shurong Gong

J Med Internet Res 2025;27:e64486
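
The three accuracy definitions in the Wang et al. snippet above share one form: the number of items judged correct divided by the total. A hypothetical helper that takes a per-item correctness rule makes this explicit. The rating labels “good” and “accurate” come from the snippet. The other labels and all data are invented for illustration.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def accuracy(items: Sequence[T], is_correct: Callable[[T], bool]) -> float:
    """Number of items judged correct divided by the total number of items."""
    if not items:
        raise ValueError("items must be non-empty")
    return sum(1 for item in items if is_correct(item)) / len(items)


# Objective questions: model answer compared with the answer key (toy data).
answers_vs_key = [("A", "A"), ("B", "C"), ("D", "D")]
objective = accuracy(answers_vs_key, lambda pair: pair[0] == pair[1])

# Diagnosis or triage: predicted category vs reference category (toy data).
triage = [("urgent", "urgent"), ("routine", "urgent")]
diagnostic = accuracy(triage, lambda pair: pair[0] == pair[1])

# Open-ended questions: "good" or "accurate" ratings count as correct;
# "poor" and "fair" are hypothetical labels for the rest of the scale.
ratings = ["good", "poor", "accurate", "fair"]
open_ended = accuracy(ratings, lambda r: r in {"good", "accurate"})
```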