Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape

1Department of Electrical and Computer Engineering, University of Alberta
2Division of Computer Science and Engineering, Louisiana State University
3Huawei Technologies, Canada
Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS'25)

We introduce Re-ttention, a training-free ultra-sparse attention reshape technique for text-to-video (T2V) and text-to-image (T2I) DiT models that recovers the full attention distribution even under extreme sparsity. By reshaping sparse attention with statistics cached across denoising steps, Re-ttention preserves full-attention visual quality for both T2V and T2I generation while skipping over 95% of the attention computation.

Abstract

Diffusion Transformers (DiT) have become the de facto models for generating high-quality visual content such as videos and images. A major bottleneck is the attention mechanism, whose complexity scales quadratically with resolution and video length. One natural way to lessen this burden is sparse attention, where only a subset of tokens or patches is included in the computation. However, existing techniques fail to preserve visual quality at extremely high sparsity levels and may even incur non-negligible compute overheads. To address this concern, we propose Re-ttention, which enables ultra-sparse attention for visual generation models by leveraging the temporal redundancy of diffusion models to overcome the probabilistic normalization shift within the attention mechanism. Specifically, Re-ttention reshapes attention scores based on the prior softmax distribution history in order to preserve the visual quality of full quadratic attention at very high sparsity levels. Experimental results on T2V/T2I models such as CogVideoX and the PixArt DiTs demonstrate that Re-ttention requires as few as 3.1% of the tokens during inference, outperforming contemporary methods such as FastDiTAttn, Sparse VideoGen and MInference.

We highlight the core mechanism behind Re-ttention: by reusing the stable ratio between sparse and full softmax denominators and caching the residual contributions from masked-out tokens, Re-ttention reconstructs the full attention distribution at ultra-high sparsity.
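The sketch below is a minimal PyTorch illustration of this idea, not the authors' released implementation; the single-head layout, the `keep_mask` variable, and the choice of caching one ratio and residual per query at a single reference step are our own simplifying assumptions. At a reference denoising step it runs full attention once, caches the per-query ratio between the sparse and full softmax denominators together with the residual contribution of the masked-out tokens, and at later steps rescales the sparse-attention output with these cached statistics.

```python
# Minimal sketch (assumptions noted above) of reshaping sparse attention
# with cached softmax statistics from a reference denoising step.
import torch

def full_and_cached_stats(q, k, v, keep_mask):
    """q, k, v: [n, d]; keep_mask: [n] bool marking tokens kept under sparsity."""
    scores = (q @ k.T) / q.shape[-1] ** 0.5           # [n, n] attention logits
    exp = scores.exp()
    d_full = exp.sum(-1, keepdim=True)                # full softmax denominator
    d_sparse = exp[:, keep_mask].sum(-1, keepdim=True)
    ratio = d_sparse / d_full                         # cached per-query ratio
    # residual: contribution of the masked-out tokens to the full output
    residual = (exp[:, ~keep_mask] / d_full) @ v[~keep_mask]
    out_full = (exp / d_full) @ v                     # full-attention output
    return out_full, ratio, residual

def sparse_with_reshape(q, k, v, keep_mask, ratio, residual):
    """Later step: sparse attention only, reshaped with the cached statistics."""
    scores = (q @ k[keep_mask].T) / q.shape[-1] ** 0.5
    probs_sparse = scores.softmax(-1)                 # normalized over kept tokens only
    # Rescale the sparse distribution back toward the full one, then add the
    # cached residual that stands in for the masked-out tokens.
    return ratio * (probs_sparse @ v[keep_mask]) + residual
```

At the reference step this reconstruction is exact, since `ratio * probs_sparse` recovers `exp / d_full` on the kept tokens; at later steps it remains a close approximation because attention scores change slowly across denoising steps, which is the temporal redundancy the method exploits.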

BibTeX

@inproceedings{chen2025rettention,
  title     = {Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape},
  author    = {Chen, Ruichen and Mills, Keith G. and Jiang, Liyao and Gao, Chao and Niu, Di},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025}
}