Structured Clinical Differential Reasoning in Small Language Models: A Four-Tier Schema Approach via QLoRA Fine-Tuning
By Gibran Khan Tareen, Mir Farhan Nawaz
Introduces HAI-DiffDx-0.5B, a QLoRA fine-tune of Qwen2.5-0.5B-Instruct trained to generate structured four-tier clinical differential diagnoses from natural language symptom descriptions, operationalising the clinical safety principle that high-stakes diagnoses must be actively considered even when statistically improbable.
Abstract
This paper introduces HAI-DiffDx-0.5B, a QLoRA fine-tune of Qwen2.5-0.5B-Instruct trained to generate structured four-tier clinical differential diagnoses from natural language symptom descriptions. We propose a five-field training schema that explicitly includes an "unlikely but serious" tier, operationalising the clinical safety principle that high-stakes diagnoses must be actively considered even when statistically improbable. We construct a curated dataset of 250 clinically validated examples spanning 10 medical domains: cardiac, neurological, abdominal, respiratory, infectious disease, dermatological, musculoskeletal, paediatric, endocrine, and psychiatric presentations. Training runs entirely on a consumer-grade NVIDIA GTX 1650 (4 GB VRAM) in under 35 minutes using QLoRA with 4-bit NF4 quantisation. On held-out test cases, the fine-tuned model achieves 56% schema adherence, compared to 4% for the untuned base model.
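To make the schema-adherence metric concrete, the following is a minimal sketch of how such a check could be scored. It is not the authors' evaluation code. The abstract names only the "unlikely but serious" tier and the recommended workup field, so the other three field names, the `Header: content` layout, and the function names `adheres_to_schema` and `adherence_rate` are all hypothetical. The sketch assumes an output adheres when every header appears exactly once, in order, with non-empty content.

```python
import re

# Hypothetical field headers. Only "Unlikely but serious" and
# "Recommended workup" are named in the source; the rest are placeholders.
SCHEMA_FIELDS = [
    "Most likely",
    "Possible",
    "Less likely",
    "Unlikely but serious",
    "Recommended workup",
]


def adheres_to_schema(output, fields=SCHEMA_FIELDS):
    """Return True if each header appears exactly once, in order, with content."""
    headers = []
    for name in fields:
        # A header is the field name at the start of a line, followed by a colon.
        pattern = rf"^[ \t]*{re.escape(name)}[ \t]*:"
        found = list(re.finditer(pattern, output, flags=re.IGNORECASE | re.MULTILINE))
        if len(found) != 1:
            return False  # missing or duplicated field
        headers.append(found[0])

    starts = [m.start() for m in headers]
    if starts != sorted(starts):
        return False  # fields present but out of order

    # Each field's content runs from the end of its header to the next header.
    ends = starts[1:] + [len(output)]
    return all(output[m.end():end].strip() for m, end in zip(headers, ends))


def adherence_rate(outputs):
    """Fraction of outputs that satisfy the schema (0.0 for an empty list)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(adheres_to_schema(o) for o in outputs) / len(outputs)
```

A check like this measures structure only. It says nothing about whether the listed diagnoses are clinically correct, which would have to be assessed separately.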
Key Findings
- 56% schema adherence on held-out test cases vs. 4% for the untuned base model
- The "unlikely but serious" field is learnable at sub-1B scale with minimal structural cost
- Model correctly surfaces dangerous diagnoses such as subarachnoid haemorrhage even when the primary diagnosis is incorrect
- An ablation study indicates that the recommended workup field acts as a generation stability anchor
- Trained on a consumer-grade NVIDIA GTX 1650 (4 GB VRAM) in under 35 minutes
- HAI-DiffDx-0.5B and its dataset are openly released on Hugging Face
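
As a rough illustration of why 4-bit quantisation matters on a 4 GB card, the sketch below runs the back-of-envelope memory arithmetic. It rests on several assumptions that do not come from the paper. The parameter count is rounded from the "0.5B" in the model name. The figure of 16 bytes per parameter for full fine-tuning with Adam in mixed precision is a common rule of thumb. Every weight is treated as stored at the stated width, and activations, LoRA adapter weights, quantisation constants, and layers kept in higher precision are ignored, so real usage will be higher.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for n_params values at the given bit width, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 0.5e9  # assumption: rounded from the "0.5B" in the model name
VRAM_GB = 4.0     # GTX 1650 capacity, as stated in the source

# Frozen base weights at 16-bit versus 4-bit precision.
fp16_weights_gb = weight_memory_gb(N_PARAMS, 16)
nf4_weights_gb = weight_memory_gb(N_PARAMS, 4)

# Rule-of-thumb assumption: full fine-tuning with Adam in mixed precision
# needs about 16 bytes (128 bits) per parameter for weights, gradients,
# and optimiser state.
full_finetune_gb = weight_memory_gb(N_PARAMS, 128)

print(f"fp16 weights:      {fp16_weights_gb:.2f} GB")
print(f"4-bit weights:     {nf4_weights_gb:.2f} GB")
print(f"full fine-tune:    {full_finetune_gb:.2f} GB")
print(f"fits in {VRAM_GB:.0f} GB VRAM: {full_finetune_gb <= VRAM_GB}")
```

Under these assumptions, full fine-tuning would exceed the card's memory before activations are counted. A frozen 4-bit base occupies a small fraction of it, which leaves room for the adapters and activations.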