
ERGO is a model-agnostic inference-time framework that preserves LLM performance in multi-turn dialogue by monitoring token-level entropy and, upon detecting spikes, rewriting degraded context into a distilled, noise-reduced representation.
Abstract
Large Language Models (LLMs) suffer significant performance degradation in multi-turn conversations when information is presented incrementally. Because multi-turn conversation characterizes everyday interaction with LLMs, this degradation poses a severe challenge to real-world usability. We hypothesize that abrupt increases in model uncertainty signal misalignment in multi-turn LLM interactions, and we exploit this insight to dynamically realign conversational context. We introduce ERGO (Entropy-guided Resetting for Generation Optimization), which continuously quantifies internal uncertainty via Shannon entropy over next-token distributions and triggers adaptive prompt consolidation when a sharp entropy spike is detected. By treating uncertainty as a first-class signal rather than a nuisance to be eliminated, ERGO embraces the variability inherent in language and modeling, both representing uncertainty and responding to it. On multi-turn tasks with incrementally revealed instructions, ERGO yields a 56.6% average performance gain over standard baselines, increases aptitude (peak performance capability) by 24.7%, and decreases unreliability (variability in performance) by 35.3%, demonstrating that uncertainty-aware interventions can improve both accuracy and reliability in conversational AI.
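To make the mechanism concrete, below is a minimal Python sketch of the monitor-and-reset loop described above. Only the core idea, Shannon entropy over next-token distributions triggering a context reset on a sharp spike, comes from the abstract; the `generate` callable, the mean-plus-k-standard-deviations spike rule, and the consolidation prompt wording are illustrative assumptions, not the paper's exact implementation.

```python
import math
from statistics import mean, pstdev

def turn_entropy(per_token_logprob_lists):
    """Average Shannon entropy (in nats) across the tokens generated in one
    turn. Each inner list holds log-probabilities of the top candidate
    tokens at that position (a truncated stand-in for the full
    next-token distribution)."""
    entropies = [-sum(math.exp(lp) * lp for lp in lps)
                 for lps in per_token_logprob_lists]
    return mean(entropies)

def is_entropy_spike(history, current, k=2.0, warmup=2):
    """Illustrative spike rule: fire when the current turn's entropy exceeds
    the running mean of earlier turns by k standard deviations. The paper
    detects 'sharp spikes'; this particular threshold is an assumption."""
    if len(history) < warmup:
        return False
    return current > mean(history) + k * pstdev(history)

# Hypothetical consolidation instruction; the paper's prompt may differ.
CONSOLIDATION_PROMPT = (
    "Rewrite the conversation so far as a single, self-contained task "
    "description that preserves every stated requirement."
)

def ergo_turn(generate, messages, entropy_history):
    """One ERGO step: generate a reply, measure its entropy, and on a spike
    replace the accumulated dialogue with a distilled prompt.
    `generate(messages)` stands in for any chat API that returns
    (reply_text, per_token_logprob_lists)."""
    reply, logprobs = generate(messages)
    h = turn_entropy(logprobs)
    if is_entropy_spike(entropy_history, h):
        summary, _ = generate(messages + [
            {"role": "user", "content": CONSOLIDATION_PROMPT}])
        messages = [{"role": "user", "content": summary}]  # reset context
    entropy_history.append(h)
    return reply, messages
```

Because the sketch only assumes per-token log-probabilities, it stays model-agnostic: any API or local model that exposes top-k logprobs can drive it.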
Average Performance Results
Model | FULL | SHARDED | ERGO | Relative Improvement (ERGO vs. SHARDED)
---|---|---|---|---
GPT-4o | 79.2 | 51.4 | 74.1 | +44.2%
GPT-4.1 | 83.6 | 56.6 | 77.0 | +36.0%
GPT-4o-mini | 73.8 | 44.3 | 71.8 | +62.1%
Phi-4 | 64.6 | 36.4 | 59.2 | +62.6%
LLaMA-3.1-8B | 46.0 | 28.7 | 50.9 | +77.4%
More Key Results
Average Performance Gain | Peak Capability Increase | Decrease in Unreliability
---|---|---
56.6% | 24.7% | 35.3%
BibTeX
@inproceedings{khalid2025ergo,
title={ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models},
author={Khalid, Haziq Mohammad and Jeyaganthan, Athikash and Do, Timothy and
Fu, Yicheng and O'Brien, Sean and Sharma, Vasu and Zhu, Kevin},
booktitle={Proceedings of the 2nd Workshop on Uncertainty-Aware NLP @ EMNLP 2025},
year={2025}
}