Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models

Diffusion models have shown promise in text generation, but often struggle to generate long, coherent, and contextually accurate text. Token-level diffusion does not model word-order dependencies explicitly and operates on short, fixed output windows, while passage-level diffusion struggles to learn robust representations for long-form text. To address these challenges, we propose Segment-Level Diffusion (SLD), a framework that enhances diffusion-based text generation through text segmentation, robust representation training with adversarial and contrastive learning, and improved latent-space guidance. By segmenting long-form outputs into multiple latent representations and decoding them with an autoregressive decoder, SLD simplifies diffusion predictions and improves scalability. Experiments on four datasets demonstrate that, when compared to other diffusion and autoregressive baselines, SLD achieves competitive or superior fluency, coherence, and contextual compatibility in automatic and human evaluations.

Keywords

47 Language, Communication and Culture, 4704 Linguistics

Journal Title

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Conference Name

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Journal ISSN

0736-587X

Volume Title

1

Publisher

Association for Computational Linguistics (ACL)

Publisher DOI

https://doi.org/10.18653/v1/2025.acl-long.210

Rights and licensing

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Sponsorship

European Research Council (GA 865958)

Collections

University of Cambridge Research Outputs (Articles and Conferences)