LabQAR: A Manually Curated Dataset for Question Answering on Laboratory Test Reference Ranges and Interpretation

Balu Bhasuran
Qiao Jin
Angelique Deville
Yonghui Wu
Karim Hanna
Zhiyong Lu
Zhe He

Published on Jun 3, 2025

Abstract

Laboratory tests are crucial for diagnosing and managing health conditions, with reference ranges serving as the basis for result interpretation. These ranges vary with factors such as specimen type (e.g., blood, urine), gender, age, and conditions such as pregnancy, which makes automated interpretation challenging. Automated clinical decision support systems that interpret these values must account for such nuances to avoid misdiagnoses or incorrect clinical decisions. To this end, we present LabQAR (Laboratory Question Answering with Reference Ranges), a manually curated dataset comprising 550 lab test reference ranges derived from authoritative medical sources, encompassing 363 unique lab tests and including multiple-choice questions with annotations on reference ranges, specimen types, and other factors impacting interpretation. We also assess the performance of several large language models (LLMs), including LLaMA 3.1, GatorTronGPT, GPT-3.5, GPT-4, and GPT-4o, in predicting reference ranges and classifying results as normal, low, or high. The findings indicate that GPT-4o outperforms the other models, showcasing the potential of LLMs in clinical decision support.
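
The classification task mentioned in the abstract (labeling a result as normal, low, or high against a reference range that depends on specimen type and patient factors) can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual schema or the authors' evaluation code: the `ReferenceRange` fields, the `select_range` and `classify` helpers, the inclusive treatment of range bounds, and the numeric values for `example_test` are all hypothetical placeholders with no clinical meaning.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ReferenceRange:
    """Hypothetical record; the real LabQAR fields may differ."""
    test_name: str
    specimen: str               # e.g. "blood", "urine"
    low: float                  # lower bound (assumed inclusive)
    high: float                 # upper bound (assumed inclusive)
    unit: str
    sex: Optional[str] = None   # None means the range applies to any sex


def select_range(
    ranges: Sequence[ReferenceRange],
    test_name: str,
    specimen: str,
    sex: Optional[str] = None,
) -> Optional[ReferenceRange]:
    """Pick the most specific matching range: sex-specific first, then generic."""
    candidates = [
        r for r in ranges
        if r.test_name == test_name and r.specimen == specimen
    ]
    for r in candidates:
        if sex is not None and r.sex == sex:
            return r
    for r in candidates:
        if r.sex is None:
            return r
    return None


def classify(value: float, ref: ReferenceRange) -> str:
    """Label a result relative to the range, treating both bounds as normal."""
    if value < ref.low:
        return "low"
    if value > ref.high:
        return "high"
    return "normal"


# Placeholder values in arbitrary units; not real reference ranges.
RANGES = [
    ReferenceRange("example_test", "blood", 10.0, 20.0, "units"),
    ReferenceRange("example_test", "blood", 12.0, 22.0, "units", sex="male"),
    ReferenceRange("example_test", "urine", 1.0, 5.0, "units"),
]

male_ref = select_range(RANGES, "example_test", "blood", sex="male")
generic_ref = select_range(RANGES, "example_test", "blood")
```

With these placeholder ranges, the same value of 11.0 would be labeled "low" against the sex-specific range and "normal" against the generic one, which is the kind of context dependence the abstract identifies as the source of difficulty.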
