Search Results (1 to 10 of 348 Results)


Token Probabilities to Mitigate Large Language Models Overconfidence in Answering Medical Questions: Quantitative Study

Since the public release of OpenAI’s ChatGPT in 2023, use cases in medicine have flourished, from extracting information from large volumes of documents [4] to answering questions of patients [5]. While some models are getting bigger and more capable (OpenAI’s o1, Google’s Gemini 2.0, and Anthropic’s Claude), others are focusing on data privacy and portability (Mistral Small, Meta Llama, and Microsoft Phi Mini).

Raphaël Bentegeac, Bastien Le Guellec, Grégory Kuchcinski, Philippe Amouyel, Aghiles Hamroun

J Med Internet Res 2025;27:e64348


Evaluating ChatGPT’s Utility in Biologic Therapy for Systemic Lupus Erythematosus: Comparative Study of ChatGPT and Google Web Search

Although ChatGPT was originally developed neither for the health care domain [3] nor explicitly for answering medical questions [4], its content generation potential in the health care field is particularly noteworthy. Studies have found that ChatGPT plays a positive role in helping users gain health knowledge and answering medical inquiries [5,6]. Ayik et al [7] and Javaid et al [8] have indicated that ChatGPT can assist users in answering common questions in the health care domain.

Kai Li, Yunfei Peng, Luyi Li, Bo Liu, Zhijian Huang

JMIR Form Res 2025;9:e76458


Evaluating the Quality and Understandability of Radiology Report Summaries Generated by ChatGPT: Survey Study

Recent improvements in artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT (OpenAI), have emerged as a promising solution for generating accessible and context-appropriate textual summaries [2]. These models show promise in efficiently processing complex medical information and producing coherent summaries that minimize technical jargon while preserving essential clinical content [18,19].

Alexis Sunshine, Grace H Honce, Andrew L Callen, David A Zander, Jody L Tanabe, Samantha L Pisani Petrucci, Chen-Tan Lin, Justin M Honce

JMIR Form Res 2025;9:e76097


Performance of DeepSeek and GPT Models on Pediatric Board Preparation Questions: Comparative Evaluation

This study evaluates the performance of 3 leading LLMs (DeepSeek-R1 [DeepSeek AI, 2024], ChatGPT-4 [OpenAI, 2023], and ChatGPT-4.5 [OpenAI, 2024]) on a set of 2023 pediatric board examination preparation questions (2023 PREP Self-Assessment, American Academy of Pediatrics), a comprehensive resource containing case-based multiple-choice questions designed to simulate actual board examinations [3].

Masab Mansoor, Andrew Ibrahim, Ali Hamide

JMIR AI 2025;4:e76056


Identification and Categorization of the Top 100 Articles and the Future of Large Language Models: Thematic Analysis Using Bibliometric Analysis

Studies focused on just ChatGPT rather than LLMs as a whole, identified the most influential authors and countries for research on ChatGPT, and traced the rapid evolution of ChatGPT scholarship [5]. More recently, a bibliometric analysis in 2025 similarly identified the most productive institutions, in addition to countries and authors [6].

Ethan Bernstein, Anya Ramsamooj, Kelsey L Millar, Zachary C Lum

JMIR AI 2025;4:e68603


Placebo, Nocebo, and Machine Learning: How Generative AI Could Shape Patient Perception in Mental Health Care

Surprisingly, ChatGPT-4.0 not only proved difficult to distinguish from human therapists but was also rated higher on core therapeutic principles [19]. In one blinded experiment, physicians rated ChatGPT as 10 times more empathetic in written responses to patients’ queries on an online social media platform [17]. On the flip side, GenAI could also amplify nocebo effects by augmenting negative patient expectations.

Charlotte Blease

JMIR Ment Health 2025;12:e78663


Exploring Young Adults' Experiences and Beliefs in Asthma Medication Management: Pilot Qualitative Study Comparing Human and Multiple AI Thematic Analysis

After completing human analysis, the investigators performed thematic analysis with multiple AI platforms (Google Gemini, Microsoft Copilot, and OpenAI’s ChatGPT) to compare the final themes with investigator-derived themes. Specifically, OpenAI’s ChatGPT, Microsoft Copilot, and Google Gemini were provided the following prompts: “Please read through the following transcripts and perform a thematic analysis. First, generate codes and then categorize them into emergent themes.

Ruth Ndarake Jeminiwa, Caroline Popielaski, Amber King

JMIR Form Res 2025;9:e69892


Generative Artificial Intelligence Tools in Medical Research (GAMER): Protocol for a Scoping Review and Development of Reporting Guidelines

We searched PubMed, Web of Science, Embase, CINAHL, PsycINFO, and the first 200 results of Google Scholar using keywords such as “generative AI,” “chatbots,” “ChatGPT,” “large language model,” and “reporting guidelines.” We included existing AI-related reporting guidelines that address the use of GenAI tools in medical research. Studies were eligible if they focused on the application of GenAI tools in a medical context and provided reporting recommendations or considerations.

Xufei Luo, Yih Chung Tham, Mohammad Daher, Zhaoxiang Bian, Yaolong Chen, Janne Estill, GAMER Working Group

JMIR Res Protoc 2025;14:e64640


Assessing the Role of Large Language Models Between ChatGPT and DeepSeek in Asthma Education for Bilingual Individuals: Comparative Study

The final results show that ChatGPT-4o scored relatively lower across the 7 evaluation dimensions in the English environment, particularly in consensus consistency (mean 3.92, SD 0.27) and completeness (mean 3.85, SD 0.41). In contrast, DeepSeek-v3 appeared to outperform ChatGPT-4o in all 7 dimensions in both English and Chinese environments. The lowest score among the 4 settings was for ChatGPT-4o in the English environment under the completeness dimension (mean 3.85, SD 0.41).

Yaxin Liu, Fangfei Yu, Xiaofei Zhang, Xiaohan Tong, Kui Li, Weikuan Gu, Baiquan Yu

JMIR Med Inform 2025;13:e65365


Using Artificial Intelligence ChatGPT to Access Medical Information About Chemical Eye Injuries: Comparative Study

Since its release, ChatGPT has gained significant popularity and is rapidly becoming a common tool for individuals to seek various types of information via the web [7]. Currently, many studies are assessing the capabilities and potential applications of this chatbot, but the reliability of the information provided by ChatGPT still requires validation.

Layan Yousef Alharbi, Rema Rashed Alrashoud, Bader Shabib Alotaibi, Abdulaziz Meshal Al Dera, Raghad Saleh Alajlan, Reem Rashed AlHuthail, Dalal Ibrahim Alessa

JMIR Form Res 2025;9:e73642