What the post starts from
Queue rewrite result
Queue rewrite cut p95 publish latency from 11.2 s to 6.9 s (about 38% lower). Biggest gain: fewer duplicate retries.
Incident write-up
Retries amplified the backlog under load and masked the real bottleneck during peak queue pressure.
Latency chart
p95 latency before and after the queue rewrite across a 14-day window, with failure spikes removed.