Dual-Path Disagreement Resolution: Teaching Small Language Models to Argue and Self-Adjudicate via QLoRA Fine-Tuning
By Gibran Khan Tareen, Mir Farhan Nawaz
We introduce dual-path disagreement resolution, a schema for training compact models to generate competing answers, identify the logical conflict between them, and self-adjudicate to a final justified conclusion.
Abstract
We fine-tune Qwen2.5-0.5B-Instruct via QLoRA on a five-field dual-path disagreement schema, training the model to generate two competing answers, explicitly identify the logical conflict between them, and resolve it into a final justified conclusion. The model achieves 100% dual-path format adherence on held-out evaluation prompts. An ablation study confirms that the disagreement identification step is a functional reasoning scaffold, not a formatting detail, whose removal measurably degrades output quality despite producing a lower training loss.
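To make the training target concrete, the five-field schema described above can be sketched as a simple example formatter. This is a minimal illustration, not the authors' released data format: the field names (`question`, `path_a`, `path_b`, `disagreement`, `resolution`) and the example content are assumptions inferred from the abstract's description of two competing answers, an explicit conflict, and a final conclusion.

```python
# Hypothetical sketch of a five-field dual-path training example.
# Field names are assumptions; the paper's exact schema is not shown here.

def format_dual_path_example(question: str, path_a: str, path_b: str,
                             disagreement: str, resolution: str) -> str:
    """Render one training example in a dual-path disagreement layout."""
    return (
        f"Question: {question}\n"
        f"Path A: {path_a}\n"
        f"Path B: {path_b}\n"
        f"Disagreement: {disagreement}\n"
        f"Resolution: {resolution}"
    )

example = format_dual_path_example(
    question="Does 0.1 + 0.2 equal 0.3 exactly in IEEE-754 doubles?",
    path_a="Yes; decimal arithmetic gives 0.1 + 0.2 = 0.3.",
    path_b="No; 0.1 and 0.2 have no exact binary representation.",
    disagreement="Path A assumes exact decimal arithmetic, while Path B "
                 "accounts for binary rounding error.",
    resolution="Path B is correct: the sum is approximately 0.30000000000000004.",
)
print(example)
```

The key structural property is that the conflict is a field of its own, forcing the model to articulate why the two paths diverge before committing to an answer.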
Key Findings
- 100% structural (dual-path) format adherence on held-out evaluation prompts
- Dual-path disagreement identification is a functional reasoning scaffold, not just formatting
- Removing the disagreement step degrades output quality despite producing lower training loss
- Demonstrates meaningful reasoning behaviour in sub-1B models with minimal compute
- HAI-DualPath-0.5B released open-source on HuggingFace
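The "structural adherence" metric above could be operationalized as a simple per-output check. The following sketch is an assumption about how such a check might look: the section headers and the in-order, exactly-once requirement are illustrative, not the authors' published evaluation harness.

```python
import re

# Hypothetical adherence check for the dual-path output format.
# The required headers are assumptions, not the authors' exact specification.
REQUIRED_SECTIONS = ["Path A:", "Path B:", "Disagreement:", "Resolution:"]

def adheres_to_dual_path(text: str) -> bool:
    """True iff every required section appears exactly once, in order."""
    positions = []
    for header in REQUIRED_SECTIONS:
        hits = [m.start() for m in re.finditer(re.escape(header), text)]
        if len(hits) != 1:          # missing or duplicated section
            return False
        positions.append(hits[0])
    return positions == sorted(positions)  # sections must appear in order

good = ("Path A: yes.\nPath B: no.\n"
        "Disagreement: the premises differ.\nResolution: no.")
bad = "Path A: yes.\nResolution: yes."  # skips two required sections
```

Aggregating this boolean over all held-out completions would yield the reported adherence rate.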