Dual-Path Disagreement Resolution: Teaching Small Language Models to Argue and Self-Adjudicate via QLoRA Fine-Tuning
By Gibran Khan Tareen, Mir Farhan Nawaz
We introduce dual-path disagreement resolution, a schema for training compact models to generate competing answers, identify the logical conflict between them, and self-adjudicate to a final justified conclusion.
Abstract
We fine-tune Qwen2.5-0.5B-Instruct via QLoRA on a five-field dual-path disagreement schema, training the model to generate two competing answers, explicitly identify the logical conflict between them, and resolve it into a final justified conclusion. The model achieves 100% dual-path format adherence on held-out evaluation prompts. An ablation study confirms that the disagreement identification step is a functional reasoning scaffold, not a formatting detail, whose removal measurably degrades output quality despite producing a lower training loss.
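To make the training target concrete, the five-field schema described above can be sketched as a simple example formatter. This is a minimal illustration, not the authors' released data format: the field names (`question`, `path_a`, `path_b`, `disagreement`, `resolution`) and the example content are assumptions inferred from the abstract's description of two competing answers, an explicit conflict, and a final conclusion.

```python
# Hypothetical sketch of a five-field dual-path training example.
# Field names are assumptions; the paper's exact schema is not shown here.

def format_dual_path_example(question: str, path_a: str, path_b: str,
                             disagreement: str, resolution: str) -> str:
    """Render one training example in a dual-path disagreement layout."""
    return (
        f"Question: {question}\n"
        f"Path A: {path_a}\n"
        f"Path B: {path_b}\n"
        f"Disagreement: {disagreement}\n"
        f"Resolution: {resolution}"
    )

example = format_dual_path_example(
    question="Does 0.1 + 0.2 equal 0.3 exactly in IEEE-754 doubles?",
    path_a="Yes; decimal arithmetic gives 0.1 + 0.2 = 0.3.",
    path_b="No; 0.1 and 0.2 have no exact binary representation.",
    disagreement="Path A assumes exact decimal arithmetic, while Path B "
                 "accounts for binary rounding error.",
    resolution="Path B is correct: the sum is approximately 0.30000000000000004.",
)
print(example)
```

The key structural property is that the conflict is a field of its own, forcing the model to articulate why the two paths diverge before committing to an answer.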
Key Findings
- 100% structural (dual-path) format adherence on held-out evaluation prompts
- Dual-path disagreement identification is a functional reasoning scaffold, not just formatting
- Removing the disagreement step degrades output quality despite producing lower training loss
- Demonstrates meaningful reasoning behaviour in sub-1B models with minimal compute
- HAI-DualPath-0.5B released open-source on HuggingFace
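The "structural adherence" metric above could be operationalized as a simple per-output check. The following sketch is an assumption about how such a check might look: the section headers and the in-order, exactly-once requirement are illustrative, not the authors' published evaluation harness.

```python
import re

# Hypothetical adherence check for the dual-path output format.
# The required headers are assumptions, not the authors' exact specification.
REQUIRED_SECTIONS = ["Path A:", "Path B:", "Disagreement:", "Resolution:"]

def adheres_to_dual_path(text: str) -> bool:
    """True iff every required section appears exactly once, in order."""
    positions = []
    for header in REQUIRED_SECTIONS:
        hits = [m.start() for m in re.finditer(re.escape(header), text)]
        if len(hits) != 1:          # missing or duplicated section
            return False
        positions.append(hits[0])
    return positions == sorted(positions)  # sections must appear in order

good = ("Path A: yes.\nPath B: no.\n"
        "Disagreement: the premises differ.\nResolution: no.")
bad = "Path A: yes.\nResolution: yes."  # skips two required sections
```

Aggregating this boolean over all held-out completions would yield the reported adherence rate.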