Nov 11, 2025
Elastic Speculation: Adaptive Draft Length and Confidence-Based Early Exit
Ben Zhao · Roy Zhao · Justin Huang
Elastic speculation delivers 30–50% lower latency and up to ~50% fewer speculative KV writes in our experiments, while preserving output quality.