Applications: Domain-Specific DLM Use Cases

Diffusion Language Models (DLMs) bring the framework established in image generation over to discrete sequences. Their structural properties (parallelism, bidirectionality, iterative refinement, and the naturalness of editing) show real value precisely in domains where Autoregressive (AR) Large Language Models (LLMs) struggle. In this chapter, taking §7 of the survey (Li et al. 2025) as a backbone, we organize DLM applications into four areas: (1) code generation, (2) biology and science, (3) robotics (Vision-Language-Action, VLA), and (4) conventional Natural Language Processing (NLP).

The lineage of applications largely divides into two strands. The first takes general-purpose DLMs such as LLaDA (Nie et al. 2025) or Dream (Ye et al. 2025) as a base and applies fine-tuning or Reinforcement Learning (RL) on top; recent work in VLA and code generation follows this mainline. The second trains domain-specific DLMs from scratch, which is where the protein and small-molecule work sits. Both share the property of structurally exploiting at least one of the following: generation under partial constraints (infilling, motif scaffolding), throughput from parallel inference, or error correction via iterative refinement — each of which is hard to achieve in AR.

Code Generation

Code carries strong syntactic constraints and long-range dependencies, and rewrites and completions are frequent. Unlike natural language, which mostly tolerates left-to-right production, code intrinsically calls for non-sequential editing: writing a reference before its function definition, or fixing a function body to match a later return type. The global planning and iterative refinement of DLMs are well aligned with this property, and several DLMs that match or exceed AR scores have appeared recently.

DiffuCoder: a dedicated 7B masked DLM

DiffuCoder (S. Gong et al. 2025) is a 7B masked DLM trained specifically for code generation. The paper systematically analyzes DLM behavior on code generation and offers the following observations.

  • Flexibility of generation order: As temperature rises, the order in which positions are committed drifts away from left-to-right and a more lateral generation trajectory emerges. In AR, raising the temperature still leaves the positional order fixed left-to-right, whereas in DLMs temperature alters the commit order itself
  • Coupled-GRPO: A novel sampling scheme that constructs masked-noise candidates complementarily during training. By running two forward passes over the same sequence with different mask patterns, the variance of Group Relative Policy Optimization (GRPO) is suppressed, yielding clear improvements on HumanEval and MBPP
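The complementary-mask idea behind coupled-GRPO can be sketched in a few lines. This is an illustrative sketch of the sampling step only, not DiffuCoder’s actual implementation; names like `coupled_masks` are invented here:

```python
import numpy as np

rng = np.random.default_rng(0)

def coupled_masks(seq_len, mask_ratio=0.5, rng=rng):
    """Sample a mask pattern and its complement, so that every position
    is masked in exactly one of the two forward passes."""
    mask_a = rng.random(seq_len) < mask_ratio
    mask_b = ~mask_a          # complementary pattern
    return mask_a, mask_b

def apply_mask(tokens, mask, mask_id=-1):
    """Replace masked positions with the [MASK] token id."""
    noised = tokens.copy()
    noised[mask] = mask_id
    return noised

tokens = np.arange(10)        # toy token ids 0..9
m_a, m_b = coupled_masks(len(tokens))
x_a, x_b = apply_mask(tokens, m_a), apply_mask(tokens, m_b)

# Every position is masked in exactly one of the two passes, so each
# token contributes to exactly one loss term, lowering estimator variance.
assert np.all(m_a ^ m_b)
```

Because the two passes cover the sequence exactly once between them, no position is silently dropped from (or double-counted in) the reward estimate.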

DiffuCoder is the first full-scale demonstration that DLM-specific post-training recipes should be optimized as something distinct from AR-style RL recipes. For RL details, see Post-training (RL).

DCoLT: outcome-based RL for stronger reasoning

DCoLT (Huang et al. 2025) views the entire reverse diffusion process as non-linear lateral thinking and combines outcome-based RL (using only the final reward) with an unmasking policy module. With LLaDA as a base, it improves HumanEval by +19.5 points, reaching territory hard to attain with AR.

The important implication of DCoLT is that, in DLM RL, the trajectory of “which positions were unmasked at which steps” can itself be treated as a policy. While AR RL struggles with credit assignment for a single generation trajectory, DLMs pass through multiple intermediate states, giving rise to a new design dimension — at which stage of iterative refinement to attribute the reward.

DUS: inference-only dilated unmasking

The Dilated Unmasking Scheduler (DUS) (Luxembourg et al. 2025) is an inference-only method that requires no additional training. At each denoising step, it unmasks non-adjacent positions chosen so as to minimize an upper bound on the joint entropy gain.

  • Planner-free (no external planner network required)
  • Improves the speed-quality trade-off in code generation
  • Drops in on top of existing DLMs such as DiffuCoder

The motivation behind DUS is simple: simultaneously unmasking adjacent positions creates inter-dependencies strong enough to propagate errors, so it is entropy-wise safer to simultaneously fix distant positions with weaker correlation.
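The non-adjacency idea can be sketched with a simple dilated grouping. This is not DUS’s entropy-bound position selection itself, just a minimal illustration of the scheduling pattern, with hypothetical names:

```python
def dilated_schedule(seq_len, num_steps):
    """Partition positions into `num_steps` dilated groups: group g holds
    positions g, g+num_steps, g+2*num_steps, ...  Positions unmasked in
    the same step are therefore at least `num_steps` apart."""
    return [list(range(g, seq_len, num_steps)) for g in range(num_steps)]

# 8 positions revealed over 4 steps, never two adjacent ones at once.
groups = dilated_schedule(seq_len=8, num_steps=4)
print(groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Within each step, simultaneously fixed positions are far apart, so their conditional dependence (and hence the risk of jointly committing inconsistent tokens) is weaker.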

Mercury Coder: a commercial-class DLM

Mercury Coder (Labs et al. 2025) is a commercial DLM by Inception Labs that demonstrates the throughput advantage of DLMs in code generation.

  • On major code benchmarks (HumanEval, MBPP, etc.), up to 10× the throughput of speed-optimized AR models
  • Quality stays in a comparable range
  • Provided commercially as an API

Mercury — alongside Gemini Diffusion (Google DeepMind 2024) and Seed Diffusion (Song et al. 2025) — marks the transition of DLMs from research stage to commercial product. Code generation in particular is a domain where low latency and the generation of many completion candidates translate directly into value, so DLM parallelism feeds straight into product differentiation.

Biological and Scientific Applications

Biopolymers such as proteins, DNA, and small molecules behave according to global structure rather than local sequential order. The major tasks — motif scaffolding (designing the remainder given a functional motif), conditional folding (generating the rest conditioned on a partial sequence), and inverse folding (recovering a sequence from a structure) — are all forms of generating the remainder under partial observation, and align naturally with the infilling formulation of masked DLMs. The artificial choice of sequence order that AR requires can be structurally avoided in DLMs.

Protein language diffusion: the DPLM family

DPLM (X. Wang et al. 2024a) is a masked diffusion language model for protein sequences that achieves both generation and representation learning. The traditional dichotomy in which masked language model (MLM)-based protein models like ESM-2 are strong at representation while AR-style protein models are strong at generation is unified in a single stage by a DLM.

DPLM-2 (X. Wang et al. 2024b) is a multimodal extension of DPLM that discretely tokenizes 3D structural coordinates and enables joint generation of sequence and structure.

  • Sequence → structure (folding)
  • Structure → sequence (inverse folding)
  • Sequence + structure co-design

These are unified as a single model’s conditional infilling. This contrasts with AR, where the generation order between sequence and structure must be set artificially, making co-design intrinsically difficult.
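Schematically, the three tasks reduce to different mask patterns over one concatenated sequence-plus-structure token stream. The token names below are invented for illustration:

```python
MASK = "[M]"

def task_mask(seq_tokens, struct_tokens, task):
    """Folding, inverse folding, and co-design expressed as mask patterns
    over one concatenated [sequence | structure] token stream."""
    if task == "folding":            # sequence observed, structure masked
        return seq_tokens + [MASK] * len(struct_tokens)
    if task == "inverse_folding":    # structure observed, sequence masked
        return [MASK] * len(seq_tokens) + struct_tokens
    if task == "co_design":          # both masked: joint generation
        return [MASK] * (len(seq_tokens) + len(struct_tokens))
    raise ValueError(task)

seq = ["M", "K", "V"]          # amino-acid tokens (toy)
struct = ["s1", "s2", "s3"]    # discretized structure tokens (toy)
print(task_mask(seq, struct, "folding"))  # ['M', 'K', 'V', '[M]', '[M]', '[M]']
```

One denoiser trained on such mixtures serves all three tasks; no per-task generation order ever has to be chosen.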

MeMDLM (Goel et al. 2024) is a masked DLM built on top of ESM-2 and specialized for de novo design of transmembrane proteins. It is designed so that hydrophobic-pattern constraints characteristic of membrane proteins can be injected as sequence-level conditions into intermediate states of masked diffusion.

CFP-Gen (Yin et al. 2025) is a diffusion language model for Combinatorial Functional Protein generation that integrates multi-modal constraints over function, sequence, and structure. It achieves high success rates in multi-functional protein design and generates de novo sequences with activity comparable to natural proteins.

DSM (Hallee et al. 2025) applies LLaDA’s masked diffusion formulation to protein sequences, aiming — like DPLM — to combine generation and representation. The paper explicitly leaves LLaDA-inspired RL post-training as a direction for future extension.

Small-molecule generation: TransDLM and TGM-DLM

TransDLM (Xiong et al. 2024) addresses text-guided molecular optimization. The target property is described in natural language and used as a condition to edit existing molecules so as to satisfy the target. Doing the same in AR turns into a two-stage procedure — identify the edit site, then regenerate — which is prone to error propagation, whereas DLMs avoid this through simultaneous updates of masked regions.

TGM-DLM (H. Gong et al. 2024) is a text-guided molecule generation method that collectively and iteratively updates the token embeddings of SMILES strings, achieving generation performance that exceeds MolT5-Base without any additional data. Because SMILES grammar constraints (bracket matching, atom valence, etc.) act as long-range dependencies, bidirectional refinement is more advantageous than AR.

RL integration and special-purpose objectives: DRAKES, ForceGen

DRAKES (C. Wang et al. 2025) is an RL fine-tuning method for discrete diffusion models that backpropagates reward through discrete samples via the Gumbel-Softmax trick. The gap between continuous reward (binding affinity, functional activity, etc.) for DNA/protein design and discrete generated tokens is smoothly bridged by Gumbel-Softmax.
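The relaxation itself is simple: add Gumbel noise to the logits and replace the hard, non-differentiable argmax with a temperature-scaled softmax, so a reward computed on the relaxed sample is smooth in the logits. A NumPy sketch of the forward pass only (DRAKES’ actual training loop differs, and the reward here is a toy):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5, rng=rng):
    """Relaxed categorical sample: Gumbel noise plus a temperature-scaled
    softmax instead of a hard argmax."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()

logits = np.array([2.0, 0.5, -1.0])          # per-token logits at one position
reward_weights = np.array([0.1, 1.0, 0.2])   # toy differentiable reward

soft = gumbel_softmax(logits)
reward = soft @ reward_weights  # smooth in `logits`, so gradients can flow
```

As the temperature `tau` goes to zero the soft sample approaches a one-hot vector, recovering discrete sampling in the limit.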

ForceGen (Ni et al. 2024) generates de novo proteins that satisfy non-linear targets in mechanical unfolding (maximum load, extension, etc.). It is a rare example that conditions protein language diffusion on a mechanical objective and directly optimizes mechanical properties in sequence space.

Note: Why DLMs Are Structurally Advantageous in Biology

Motif scaffolding (fixing a known active site and designing the rest of the sequence) can be naturally written in a masked DLM as an initialization where specific positions are observed and the rest are [MASK]. Doing the same in AR requires either artificially designing a generation order that crosses the fixed region or implementing constrained decoding separately. Likewise, inverse folding (structure observation → sequence prediction) maps cleanly onto a formulation in which the full sequence is recovered by masked diffusion conditioned on structure.
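In code, the motif-scaffolding initialization is nothing more than clamping the observed positions. A toy sketch with a hypothetical helper:

```python
MASK = "[M]"

def scaffold_init(length, motif):
    """Masked-diffusion initial state for motif scaffolding: motif residues
    are observed and stay clamped through the reverse process; everything
    else starts as [MASK] and gets designed."""
    state = [MASK] * length
    for pos, residue in motif.items():
        state[pos] = residue
    return state

# Fix a toy active-site motif at positions 2-4 and design the rest.
state = scaffold_init(8, {2: "H", 3: "D", 4: "S"})
print(state)  # ['[M]', '[M]', 'H', 'D', 'S', '[M]', '[M]', '[M]']
```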

Robotics (Vision-Language-Action)

Vision-Language-Action (VLA) models are a framework in which visual observation → linguistic reasoning → generation of an action token sequence is performed by a single model. Actions, once discretely tokenized (binarizing gripper open/close, bucketing joint angles, etc.), can be treated like language, and stacking these on top of an LLM or Vision-Language Model (VLM) has become the standard approach. The reasons DLMs are well-suited to VLA are as follows.

  • Long-horizon future prediction is parallelizable: Tens of steps of future actions can be iteratively refined in a single batch
  • Visual subgoals, chain-of-thought (CoT), and actions can be generated jointly: They are all solvable in parallel as a single [MASK] sequence
  • Efficient handling of observations via prefix attention: Placing visual observations on the prompt side lets their KV cache be reused across denoising steps
  • Opportunity for error correction: While AR cannot recover from a mistaken action, a DLM can look back at earlier actions in later stages and re-mask them
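The discretization mentioned above (bucketing joint angles, binarizing the gripper) can be sketched as follows. The bin count and angle range are illustrative choices, not a specific model’s tokenizer:

```python
import numpy as np

def tokenize_action(joint_angles, gripper_open, num_bins=256,
                    low=-np.pi, high=np.pi):
    """Discretize a continuous action: each joint angle goes into one of
    `num_bins` uniform buckets, the gripper state becomes a 0/1 token."""
    angles = np.asarray(joint_angles, dtype=float)
    bins = ((angles - low) / (high - low) * num_bins).astype(int)
    bins = np.clip(bins, 0, num_bins - 1)
    return [int(b) for b in bins] + [int(gripper_open)]

tokens = tokenize_action([0.0, 1.57, -1.57], gripper_open=True)
print(tokens)  # [128, 191, 64, 1]
```

Once actions live in a small discrete vocabulary like this, they can sit in the same token sequence as language and image tokens and be denoised jointly.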

LLaDA-VLA: repurposing a general-purpose DLM for VLA

LLaDA-VLA (Y. Wen et al. 2025) is one of the earliest cases of fine-tuning a general-purpose DLM, LLaDA, into a VLA model. The two key tricks are:

  • Localized special-token classification: Since the action-token vocabulary is far smaller than the language vocabulary, classification over a limited vocabulary is performed only at action positions
  • Hierarchical action decoding: A hierarchy of high-level actions (move to / grasp, etc.) and low-level actions (concrete joint angles) is mapped onto the stages of iterative refinement
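The first trick amounts to masking the logits outside the small action vocabulary at action positions. A sketch with invented shapes and ids:

```python
import numpy as np

def localized_logits(logits, is_action_pos, action_vocab_ids):
    """At action positions, suppress everything except the (small) action
    vocabulary so classification happens over a limited label set."""
    out = logits.copy()
    mask = np.full(logits.shape[-1], -np.inf)
    mask[action_vocab_ids] = 0.0
    out[is_action_pos] += mask
    return out

vocab_size, seq_len = 100, 4
rng = np.random.default_rng(0)
logits = rng.normal(size=(seq_len, vocab_size))
action_ids = np.arange(90, 100)              # last 10 ids are action tokens (toy)
is_action = np.array([False, False, True, True])

restricted = localized_logits(logits, is_action, action_ids)
# argmax at action positions now always lands inside the action vocabulary
```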

It surpasses AR VLA baselines (such as OpenVLA) on both simulation and real robots, demonstrating that general-purpose DLMs are a strong base for VLA.

dVLA: multimodal joint generation with MMaDA as backbone

dVLA (J. Wen et al. 2025) takes MMaDA (Yang et al. 2025), a multimodal diffusion foundation model, as its backbone and jointly generates three modalities — visual subgoal image, textual CoT, and discretized action — by joint diffusion.

  • Visual subgoal: predicted image a few steps ahead
  • Textual CoT: rationale for the action (“reach for cup because…”)
  • Action: concrete joint command

These are arranged in a single token sequence and the whole is generated by masked diffusion. With prefix-attention masking and KV caching, it achieves inference for long-horizon manipulation tasks that is more efficient than AR.

UD-VLA: joint discrete diffusion of images and actions

UD-VLA (Unified Diffusion VLA) (J. Chen et al. 2025) proposes a Joint Discrete Denoising Diffusion Process that synchronously denoises future image tokens and action tokens in the same token space.

  • Image tokens and action tokens are treated identically under the same masked diffusion
  • Mutual constraints (“if I move like this, this is what I will see”) are expressed in a single denoising process
  • SOTA on benchmarks, with inference that is clearly faster than AR

The significance of UD-VLA is that it unifies the world model (next-state prediction) and the policy (next-action prediction). While AR designs typically place the world model and the policy in separate heads, DLMs can couple them through joint denoising.

Conventional NLP

Even before the rise of large-scale DLMs, diffusion-based natural language processing was explored broadly — classification, extraction, summarization, dialogue, machine translation, and more. While most of this work is on the legacy side, we pick out several representative cases where DLM structural advantages stand out.

Editing: EditText

EditText (Lee et al. 2025) is an SDEdit-based controllable coarse-to-fine text editing framework. It brings the SDEdit idea of “resume denoising from a mid-noise state to edit an image” into the text domain, combining it with self-conditioning to improve edit precision. Infilling and editing are essential strengths of masked DLMs and can be expressed more naturally than AR’s constrained editing (rewriting only a specific portion while preserving the rest).
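For a masked DLM, “resume denoising from a mid-noise state” translates to re-masking a fraction of an existing text’s tokens and handing the result back to the denoiser. A toy sketch; the function name and the `edit_strength` knob are illustrative, not EditText’s API:

```python
import numpy as np

rng = np.random.default_rng(0)
MASK_ID = -1

def sdedit_remask(tokens, edit_strength, protect=(), rng=rng):
    """SDEdit-style partial re-noising: re-mask a fraction `edit_strength`
    of positions (never the protected ones); the denoiser then regenerates
    only what was re-masked, leaving the rest of the text intact."""
    tokens = np.asarray(tokens).copy()
    editable = np.setdiff1d(np.arange(len(tokens)), np.asarray(protect, int))
    k = int(round(edit_strength * len(editable)))
    remask = rng.choice(editable, size=k, replace=False)
    tokens[remask] = MASK_ID
    return tokens

draft = np.arange(10, 20)     # toy token ids for an existing text
noised = sdedit_remask(draft, edit_strength=0.5, protect=[0, 1])
```

A small `edit_strength` yields a light touch-up; a large one approaches regeneration from scratch, which is exactly the coarse-to-fine control knob.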

Planning: PLANNER

PLANNER (Y. Zhang et al. 2023) combines a latent diffusion planning module with an autoregressive decoder for paragraph generation. Paragraph semantic embeddings are generated by diffusion in latent space, and the final text is produced by an AR decoder conditioned on them.

  • Latent diffusion captures global structure (“the theme and arc of the entire paragraph”)
  • AR ensures local fluency
  • Repetition and redundancy are suppressed

The hierarchical division “global plan via diffusion, local realization via AR” serves as one design pattern that leverages the structural advantages of DLMs.

Constrained generation: PoetryDiffusion

PoetryDiffusion (Hu et al. 2024) tackles poetry generation, handling simultaneous constraints over semantics and metrical structure.

  • Semantics are generated by a diffusion model
  • Meter is enforced by an independently trained metrical controller
  • The two are combined at inference time

Because metrical constraints (syllable count, rhyme patterns) depend on the global structure of the sequence, they are hard to satisfy with AR-style local decoding. The design of inserting an external controller into DLM iterative refinement can be applied as a general template for constrained generation.
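One simple way to insert such a controller is product-of-experts-style logit addition at each refinement step. A toy sketch; the vocabulary, scores, and weight are invented for illustration:

```python
import numpy as np

def controlled_logits(model_logits, constraint_scores, weight=2.0):
    """Inference-time combination: the DLM proposes token logits, an
    external controller scores how well each candidate satisfies the
    metrical constraint, and the two are summed before sampling."""
    return model_logits + weight * constraint_scores

vocab = ["day", "way", "night", "sun"]
model_logits = np.array([1.0, 0.2, 1.5, 0.8])    # semantics: prefers "night"
rhyme_scores = np.array([1.0, 1.0, -2.0, -2.0])  # meter: must rhyme with "say"

choice = vocab[int(np.argmax(controlled_logits(model_logits, rhyme_scores)))]
# with the controller active, the rhyming "day" wins over "night"
```

Because the controller is applied at every refinement step rather than once at the end, metrical violations can be corrected before they propagate through the rest of the poem.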

Dialogue: DiffusionDialog

DiffusionDialog (J. Xiang et al. 2024) addresses the one-to-many problem in dialogue generation (multiple valid responses for the same context) via diffusion over a continuous latent. While AR’s temperature sampling trades off diversity against quality, latent diffusion controls diversity at the sampling stage in the latent and quality at the decoding stage independently.

Machine translation: XDLM

XDLM (L. Chen et al. 2023) introduces a cross-lingual pre-training objective tailored to diffusion models, learning the inter-language mapping at the pretraining stage. The advantages of diffusion in machine translation lie in capturing long-range dependencies and in being able to refine the whole target while looking at the whole source.

Classification and extraction: ROIC-DM, DiffusionNER, IPAD

These are unconventional uses of DLMs in which the label space itself is diffused.

  • ROIC-DM (Yuan et al. 2024): In text classification, the class label is diffused. Improves adversarial robustness
  • DiffusionNER (Shen et al. 2023): Formulates Named Entity Recognition (NER) as boundary denoising. The start/end positions of entities are iteratively refined from random noise
  • IPAD (X. Xiang et al. 2025): Frames scene text recognition as conditional text generation, balancing recognition accuracy and inference speed via easy-first decoding

These move beyond the naive view of “DLM = text generation” and provide the broader perspective that arbitrary structured outputs can be generated by denoising. Outputs with discrete structure such as boundaries, labels, or selection sets all fall within the reach of DLMs.
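The boundary-denoising view, for instance, can be sketched as iterative refinement of (start, end) index pairs. The offset predictor below is a stand-in function for DiffusionNER’s learned network:

```python
import numpy as np

def refine_boundaries(spans, predict_delta, num_steps=3):
    """Boundary-denoising sketch: entity spans start as noisy (start, end)
    index pairs and are iteratively nudged by an offset predictor until
    they settle on the entity boundaries."""
    for _ in range(num_steps):
        spans = [(s + predict_delta(s, e)[0], e + predict_delta(s, e)[1])
                 for s, e in spans]
    return spans

GOLD = (3, 7)  # pretend the model has learned where the entity sits

def toy_delta(s, e):
    """Stand-in predictor: move each boundary one token toward GOLD."""
    return int(np.sign(GOLD[0] - s)), int(np.sign(GOLD[1] - e))

spans = refine_boundaries([(0, 9)], toy_delta)
print(spans)  # [(3, 7)]
```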

Other representatives

In summarization, DiffuSum (H. Zhang et al. 2023) treats extractive summarization as diffusion over sentence representations. This is also an example of structured-output generation in the sense of “diffusing the set of selected sentences.”

Cross-domain comparison

Table 1 summarizes representative methods in each domain and how the structural advantages of DLMs are concretely brought to bear.

Table 1: Representative DLM methods in each application domain and their structural advantages
| Domain | Representative method | Base / type | Main DLM advantage | Main result / notes |
| --- | --- | --- | --- | --- |
| Code | DiffuCoder (S. Gong et al. 2025) | dedicated 7B masked DLM | iterative refinement, non-sequential editing | HumanEval/MBPP improved via coupled-GRPO |
| Code | DCoLT (Huang et al. 2025) | LLaDA base + outcome RL | whole trajectory as policy | HumanEval +19.5 |
| Code | DUS (Luxembourg et al. 2025) | inference-only | joint entropy control | speed-quality improved, planner-free |
| Code | Mercury Coder (Labs et al. 2025) | commercial DLM | parallelism | 10× throughput vs. AR |
| Bio | DPLM (X. Wang et al. 2024a) | masked protein DLM | infilling, representation + generation | unifies sequence generation and representation |
| Bio | DPLM-2 (X. Wang et al. 2024b) | multimodal extension of DPLM | joint sequence + structure | unifies folding / inverse folding |
| Bio | MeMDLM (Goel et al. 2024) | ESM-2 fine-tune | domain specialization | de novo design of transmembrane proteins |
| Bio | CFP-Gen (Yin et al. 2025) | multimodal protein DLM | composite constraints | high success rate in multi-functional protein design |
| Bio | DSM (Hallee et al. 2025) | LLaDA-inspired | generation + representation | room for LLaDA-style RL |
| Bio | TGM-DLM (H. Gong et al. 2024) | text-guided SMILES | collective token updates | surpasses MolT5-Base |
| Bio | TransDLM (Xiong et al. 2024) | text-guided molecule | naturalness of editing | avoids error propagation |
| Bio | DRAKES (C. Wang et al. 2025) | RL fine-tune | reward backprop via Gumbel-Softmax | DNA/protein design |
| Bio | ForceGen (Ni et al. 2024) | protein language diffusion | non-linear mechanical objectives | de novo protein |
| Robotics | LLaDA-VLA (Y. Wen et al. 2025) | LLaDA base | hierarchical action, parallel inference | surpasses AR VLA baselines |
| Robotics | dVLA (J. Wen et al. 2025) | MMaDA backbone | joint vision + CoT + action | prefix attn + KV cache |
| Robotics | UD-VLA (J. Chen et al. 2025) | joint discrete diffusion | unifies world model + policy | SOTA, fast inference |
| NLP | EditText (Lee et al. 2025) | SDEdit + text | infilling, editing | coarse-to-fine control |
| NLP | PLANNER (Y. Zhang et al. 2023) | latent diffusion + AR | global plan | paragraph generation |
| NLP | PoetryDiffusion (Hu et al. 2024) | diffusion + metrical controller | constrained generation | semantics + meter |
| NLP | DiffusionDialog (J. Xiang et al. 2024) | latent diffusion | handles one-to-many | dialogue diversity |
| NLP | XDLM (L. Chen et al. 2023) | cross-lingual diffusion | bidirectional context | machine translation |
| NLP | ROIC-DM (Yuan et al. 2024) | diffuse the label | adversarial robustness | text classification |
| NLP | DiffusionNER (Shen et al. 2023) | boundary denoising | structured output | NER |
| NLP | IPAD (X. Xiang et al. 2025) | iterative parallel decoding | easy-first | scene text recognition |
| NLP | DiffuSum (H. Zhang et al. 2023) | diffuse sentence selection | generation over a selection set | extractive summarization |

State of commercialization

Commercial deployment of DLMs has rapidly accelerated through 2024-2025.

  • Mercury Coder (Labs et al. 2025): Inception Labs, commercial DLM specialized for code, 10× throughput vs. AR
  • Gemini Diffusion (Google DeepMind 2024): Google DeepMind, commercial offering of a general-purpose text DLM
  • Seed Diffusion (Song et al. 2025): ByteDance, DLM for code generation

All of these place throughput from DLM parallelism at the core of their product differentiation. In use cases where the inference cost of AR LLMs becomes a problem (coding assistants, real-time dialogue, batch processing), DLMs have emerged as a realistic option.

Tip: Why commercialization started with code

Code generation satisfies three conditions — (1) strong low-latency requirements, (2) direct value from generating many completion candidates, (3) a good match between DLMs and the combination of syntactic constraints and non-sequentiality — and was therefore chosen as the first commercialization area for DLMs. For text generation in general, the balance between AR’s fluency and cost sensitivity is already commercially optimized, making the barrier higher for DLMs to break in.

Future directions

The unresolved issues and research directions common across application domains are as follows.

  • Test-time scaling and reasoning: DLMs can improve quality by increasing the number of steps \(T\), but whether prolonging iterative refinement in reasoning tasks scales as a counterpart to AR’s chain-of-thought is not yet established. RL-based methods such as DCoLT are one answer
  • Absence of standard editing benchmarks: Infilling, fill-in-the-middle (FIM), and controllable editing are the largest structural advantages of DLMs, but DLM-specific benchmarks corresponding to HumanEval established on the AR side are scarce. Standardization of evaluation metrics such as those of EditText is desirable
  • Dedicated DLMs vs. general-purpose DLMs: In proteins and molecules, domain-specific DLMs (DPLM, TGM-DLM) deliver results, while in code and VLA, fine-tuning from general-purpose DLMs (LLaDA, MMaDA) does. Which direction grows further in the long term will be decided by competition between domain-specific data volume and the representational power of general-purpose bases
  • Multimodal extension: Multi-modality joint diffusion as in DPLM-2 or UD-VLA has only just begun, and there is much room to extend toward a diffusion foundation model that unifies image, audio, 3D, and action
  • RL standardization: Coupled-GRPO (DiffuCoder), outcome-based RL (DCoLT), Gumbel-Softmax reward backprop (DRAKES), and so on — RL for DLMs differs paper by paper. Establishment of a standard DLM RL recipe corresponding to AR’s RLHF / GRPO is awaited

References

Chen, Jian, Wei Song, Pu Ding, et al. 2025. “Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process.” arXiv Preprint arXiv:2511.01718. https://arxiv.org/abs/2511.01718.
Chen, Linyao, Aosong Feng, Boming Yang, and Zihui Li. 2023. “XDLM: Cross-Lingual Diffusion Language Model for Machine Translation.” arXiv Preprint arXiv:2307.13560. https://arxiv.org/abs/2307.13560.
Goel, Shrey, Vishrut Thoutam, Edgar Marroquin, et al. 2024. “MeMDLM: De Novo Membrane Protein Design with Masked Discrete Diffusion Protein Language Models.” NeurIPS 2024 Workshop on AI for New Drug Modalities. https://arxiv.org/abs/2410.16735.
Gong, Haisong, Qiang Liu, Shu Wu, and Liang Wang. 2024. “Text-Guided Molecule Generation with Diffusion Language Model.” Proceedings of the AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2402.13643.
Gong, Shansan, Ruixiang Zhang, Huangjie Zheng, et al. 2025. “DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation.” arXiv Preprint arXiv:2506.20639. https://arxiv.org/abs/2506.20639.
Google DeepMind. 2024. Gemini Diffusion. Product page. https://deepmind.google/technologies/gemini-diffusion/.
Hallee, Logan, Nikolaos Rafailidis, David Bichara, and Jason P. Gleghorn. 2025. “Diffusion Sequence Models for Enhanced Protein Representation and Generation.” arXiv Preprint arXiv:2506.08293. https://arxiv.org/abs/2506.08293.
Hu, Zhiyuan, Chumin Liu, Yue Feng, Anh Tuan Luu, and Bryan Hooi. 2024. “PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation.” Proceedings of the AAAI Conference on Artificial Intelligence. https://arxiv.org/abs/2306.08456.
Huang, Zemin, Zhiyang Chen, Zijun Wang, Tiancheng Li, and Guo-Jun Qi. 2025. “Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models.” arXiv Preprint arXiv:2505.10446. https://arxiv.org/abs/2505.10446.
Labs, Inception, Samar Khanna, Siddhant Kharbanda, et al. 2025. “Mercury: Ultra-Fast Language Models Based on Diffusion.” arXiv Preprint arXiv:2506.17298. https://arxiv.org/abs/2506.17298.
Lee, Che Hyun, Heeseung Kim, Jiheum Yeom, and Sungroh Yoon. 2025. “EditText: Controllable Coarse-to-Fine Text Editing with Diffusion Language Models.” arXiv Preprint arXiv:2502.19765. https://arxiv.org/abs/2502.19765.
Li, Tianyi, Mingda Chen, Bowei Guo, and Zhiqiang Shen. 2025. “A Survey on Diffusion Language Models.” arXiv Preprint arXiv:2508.10875. https://arxiv.org/abs/2508.10875.
Luxembourg, Omer, Haim Permuter, and Eliya Nachmani. 2025. “Plan for Speed: Dilated Scheduling for Masked Diffusion Language Models.” arXiv Preprint arXiv:2506.19037. https://arxiv.org/abs/2506.19037.
Ni, Bo, David L. Kaplan, and Markus J. Buehler. 2024. “ForceGen: End-to-End de Novo Protein Generation Based on Nonlinear Mechanical Unfolding Responses Using a Language Diffusion Model.” Science Advances 10 (6): eadl4000. https://www.science.org/doi/10.1126/sciadv.adl4000.
Nie, Shen, Fengqi Zhu, Zebin You, et al. 2025. “Large Language Diffusion Models.” arXiv Preprint arXiv:2502.09992. https://arxiv.org/abs/2502.09992.
Shen, Yongliang, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. 2023. “DiffusionNER: Boundary Diffusion for Named Entity Recognition.” Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. https://arxiv.org/abs/2305.13298.
Song, Yuxuan, Zheng Zhang, Cheng Luo, et al. 2025. “Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference.” arXiv Preprint arXiv:2508.02193. https://arxiv.org/abs/2508.02193.
Wang, Chenyu, Masatoshi Uehara, Yichun He, et al. 2025. “Fine-Tuning Discrete Diffusion Models via Reward Optimization with Applications to DNA and Protein Design.” International Conference on Learning Representations. https://arxiv.org/abs/2410.13643.
Wang, Xinyou, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. 2024a. “Diffusion Language Models Are Versatile Protein Learners.” International Conference on Machine Learning. https://arxiv.org/abs/2402.18567.
Wang, Xinyou, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, and Quanquan Gu. 2024b. “DPLM-2: A Multimodal Diffusion Protein Language Model.” arXiv Preprint arXiv:2410.13782. https://arxiv.org/abs/2410.13782.
Wen, Junjie, Min Zhu, Jiaqi Liu, et al. 2025. “dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought.” arXiv Preprint arXiv:2509.25681. https://arxiv.org/abs/2509.25681.
Wen, Yuqi, Hao Li, K. Gu, Yiwen Zhao, Tao Wang, and Mingxiu Sun. 2025. “LLaDA-VLA: Vision Language Diffusion Action Models.” arXiv Preprint arXiv:2509.06932. https://arxiv.org/abs/2509.06932.
Xiang, Jianxiang, Zhenhua Liu, Haodong Liu, Yin Bai, Jun Cheng, and Wentao Chen. 2024. “DiffusionDialog: A Diffusion Model for Diverse Dialog Generation with Latent Space.” LREC-COLING 2024. https://arxiv.org/abs/2404.06760.
Xiang, Xun, Zhaoqi Qiao, Xun Xu, and Yu Zhou. 2025. “IPAD: Iterative, Parallel, and Diffusion-Based Network for Scene Text Recognition.” International Journal of Computer Vision. https://arxiv.org/abs/2312.11923.
Xiong, Yida, Kun Li, Jiawei Zhang, Dan Lin, Yan Che, and Wenhu Hu. 2024. “Text-Guided Multi-Property Molecular Optimization with a Diffusion Language Model.” arXiv Preprint arXiv:2410.13597. https://arxiv.org/abs/2410.13597.
Yang, Ling, Ye Tian, Bowen Li, et al. 2025. “MMaDA: Multimodal Large Diffusion Language Models.” arXiv Preprint arXiv:2505.15809. https://arxiv.org/abs/2505.15809.
Ye, Jiacheng et al. 2025. “Dream: Diffusion Language Models.” arXiv Preprint.
Yin, Junbo, Chao Zha, Wenjia He, Chencheng Xu, and Xin Gao. 2025. “CFP-Gen: Combinatorial Functional Protein Generation via Diffusion Language Models.” International Conference on Machine Learning. https://arxiv.org/abs/2505.22869.
Yuan, Shilong, Wei Yuan, Hongzhi Yin, and Tieke He. 2024. “ROIC-DM: Robust Text Inference and Classification via Diffusion Model.” arXiv Preprint arXiv:2401.03514. https://arxiv.org/abs/2401.03514.
Zhang, Haopeng, Xiao Liu, and Jiawei Zhang. 2023. “DiffuSum: Generation Enhanced Extractive Summarization with Diffusion.” Findings of the Association for Computational Linguistics: ACL 2023. https://arxiv.org/abs/2305.01735.
Zhang, Yizhe, Jiatao Gu, Zhuofeng Wu, Shuangfei Zhai, Joshua M. Susskind, and Navdeep Jaitly. 2023. “PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model.” Advances in Neural Information Processing Systems. https://arxiv.org/abs/2306.02531.