Copyright Infringement Detection
in Text-to-Image Diffusion Models
via Differential Privacy

AAAI 2026 Oral
1 College of Future Information Technology, Fudan University, Shanghai, China
2 Institute of Trustworthy Embodied AI, Fudan University, Shanghai, China
3 International Computer Science Institute, CA, USA
4 UC Berkeley, CA, USA

TL;DR: We formalize the concept of copyright infringement and its detection from the perspective of Differential Privacy (DP), and introduce a novel post-hoc detection framework, D-Plus-Minus (DPM). It simulates the inclusion or exclusion of a specific training data point by fine-tuning the model in two opposing directions: a learning branch and an unlearning branch. To facilitate standardized benchmarking, we also construct the Copyright Infringement Detection Dataset (CIDD), a comprehensive resource for evaluating detection across diverse categories.

Abstract

The widespread deployment of large vision models such as Stable Diffusion raises significant legal and ethical concerns, as these models can memorize and reproduce copyrighted content without authorization. Existing detection approaches often lack robustness and fail to provide rigorous theoretical underpinnings.

To address these gaps, we formalize the concept of copyright infringement and its detection from the perspective of Differential Privacy (DP), and introduce the conditional sensitivity metric, a concept analogous to sensitivity in DP that quantifies the deviation in a diffusion model’s output caused by the inclusion or exclusion of a specific training data point. To operationalize this metric, we propose D-Plus-Minus (DPM), a novel post-hoc detection framework that identifies copyright infringement in text-to-image diffusion models. Specifically, DPM simulates the inclusion and exclusion processes by fine-tuning models in two opposing directions: learning and unlearning. In addition, to disentangle concept-specific influence from the global parameter shifts induced by fine-tuning, DPM computes confidence scores over orthogonal prompt distributions using statistical metrics.

Moreover, to facilitate standardized benchmarking, we also construct the Copyright Infringement Detection Dataset (CIDD), a comprehensive resource for evaluating detection across diverse categories.

Our results demonstrate that DPM reliably detects infringing content without requiring access to the original training dataset or text prompts, offering an interpretable and practical solution for safeguarding intellectual property in the era of generative AI.

D-Plus-Minus (DPM) Method

Problem Settings

Detection of copyright infringement faces several practical challenges, such as scalability, inaccessibility of training data, conditional input unavailability, and insufficient theoretical guarantees. In light of these issues, our work operates under a realistic and challenging set of assumptions:

 (1) White-box access to a pretrained model.
 (2) Absence of the corresponding input prompt.
 (3) Inaccessibility of the training data.

Privacy Vulnerabilities

Differential privacy (DP) is a formal notion of algorithmic privacy, which aims to prevent the release of private information. Algorithms with DP guarantee that the model’s output does not reveal whether any single individual’s data is used.

However, previous research (e.g., membership inference and data extraction attacks) has revealed significant privacy vulnerabilities in the outputs of diffusion models. We therefore hypothesize that diffusion models exhibit almost no conditional differential privacy and instead exhibit a high degree of publicity.

Formalization of Copyright (Non-)Infringement

We reinterpret the detection of copyright infringement as compliance with, or violation of, conditional differential privacy. Specifically, when a particular concept, such as the neighborhood images of a target image, is present or absent in the training data, it can significantly alter the model’s output in response to prompts associated with that concept. In other words:

Differential Privacy = Copyright Non-Infringement (Datapoint not in the Training Dataset)
Violation of Differential Privacy = Copyright Infringement (Datapoint in the Training Dataset)

Copyright infringement can be defined as:

Definition 1 (Copyright Infringement). Let $x_{c}\in D_{C}$ denote a copyrighted data point or concept, and $p$ be an input (e.g., a text prompt) semantically aligned with $x_{c}$. We say that model $G$ trained on $D$ infringes upon $x_{c}$ if there exists a measurable subset $S\subseteq\{G(p_{i})\mid p_{i}\in U(p)\}$ such that:

$$\Pr[G(\theta_{D},p)\in S]\gg\Pr[G(\theta_{D'},p)\in S],$$

where $D'=D\setminus\{x_{c}\}$ is a neighboring dataset.

Definition 2 (Copyright Non-Infringement). Let $x$ be a non-infringed data point or concept such that $x\notin D$ for all training datasets considered. We say that model $G$ does not infringe upon $x$ if, for any input $p$ and all measurable subsets $S\subseteq\{G(p_{i})\mid p_{i}\in U(p)\}$:

$$\Pr[G(\theta_{D},p)\in S]=\Pr[G(\theta_{D'},p)\in S].$$

To allow for a relaxed setting, we say that $G$ satisfies approximate non-infringement if the following $(\epsilon,\delta)$-differential privacy condition holds:

$$\Pr[G(\theta_{D},p)\in S]\leq e^{\epsilon}\cdot\Pr[G(\theta_{D'},p)\in S]+\delta,$$

where $D$ and $D'$ are any neighboring training datasets, and $\theta_{D}$, $\theta_{D'}$ denote the model parameters trained on $D$ and $D'$, respectively.
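For intuition, the relaxed condition can be checked numerically once the two probabilities have been estimated. The sketch below tests the inequality; the probability values and the helper name are illustrative assumptions, not from the paper.

```python
import math

def satisfies_approx_non_infringement(p_d: float, p_d_prime: float,
                                       epsilon: float, delta: float) -> bool:
    """Check Pr[G(theta_D, p) in S] <= e^epsilon * Pr[G(theta_D', p) in S] + delta
    for empirically estimated probabilities p_d and p_d_prime."""
    return p_d <= math.exp(epsilon) * p_d_prime + delta

# Hypothetical estimates of the probability that generations land in a set S
# of outputs resembling the copyrighted concept.
p_with_xc = 0.62      # model trained with the copyrighted point x_c
p_without_xc = 0.05   # neighboring model trained without x_c

print(satisfies_approx_non_infringement(p_with_xc, p_without_xc, epsilon=1.0, delta=0.01))
# False -> the gap is too large, suggesting infringement under Definition 1.
```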

Measurement of Copyright Infringement

We introduce a new metric, conditional sensitivity, which quantifies the extent of publicity and standardizes the confidence score of copyright infringement:

$$CS(M,\hat{x}_{i})=\max_{D,D':D\triangle D'\subseteq\{\hat{x}_{i}\}}\left|M(D)-M(D')\right|$$

where $D$ and $D'$ are neighboring datasets that differ by the inclusion or exclusion of the conditional data point $\hat{x}_{i}$, and $M(D)$ denotes the output of a query function when the model is trained on dataset $D$. In the DPM framework, we use the CLIP image encoder as the query function to capture the semantic similarity between two outputs.
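As a rough illustration of how the query function and the deviation could be instantiated, the sketch below embeds one image generated by the original model and one generated by a fine-tuned branch with a CLIP image encoder, and measures their deviation as one minus cosine similarity. The checkpoint name and the exact scoring rule are assumptions for illustration, not necessarily the paper's configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a CLIP image encoder to serve as the query function M(.).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_embed(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    feat = clip.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

@torch.no_grad()
def output_deviation(img_original: Image.Image, img_finetuned: Image.Image) -> float:
    """Deviation between the original and fine-tuned models' outputs for the same
    prompt: 1 - cosine similarity of CLIP embeddings. Larger values indicate a
    larger concept-specific shift."""
    sim = (clip_embed(img_original) * clip_embed(img_finetuned)).sum().item()
    return 1.0 - sim
```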

Since the presence or absence of the target data point in the training dataset is unknown, its inclusion or exclusion is simulated by fine-tuning the model in two opposing directions: learning to include (D+) and unlearning to exclude (D-).

Fig 1: D-Plus-Minus Method. Given the neighbourhood images $U(x_{i})$ of the target image $x_{i}$, i.e., several images of similar semantics derived from the target image, as the training subset, we fine-tune the text-to-image model $G$ along two branches: a learning branch $G_{D^{+}}$ and an unlearning branch $G_{D^{-}}$. Experimental results show that infringed samples lead to a significant shift in the sensitivity metric, whereas non-infringed samples cause only minor changes.

We visualize the discrepancy in conditional sensitivity in Fig.1, where the larger change observed in infringed samples compared to non-infringed ones validates its use as a reliable measurement.

Detection Procedure

Step 1: Prepare a target image to be detected. Note that images without distinctive characteristics (e.g., generic landscape photos) may not qualify for copyright protection under the law and cannot be detected.

Step 2: Extract and filter a core concept. This step excludes content not protected by copyright law, ensuring the accuracy of the detection results. In the next step, we collect images associated with only this concept.

Step 3: Construct image-prompt pairs. Since fine-tuning requires several images related to the concept, we construct a neighborhood of the target concept as the training subset, consisting of semantically similar images, and pair it with a general prompt (in the format "a photo of [V] [class]") containing an identifier token (e.g., "[V]", "sks"), as sketched below.
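A minimal sketch of this pairing step, assuming the neighbourhood images are stored as files on disk; the helper name, directory path, and class name are hypothetical.

```python
from pathlib import Path

def build_image_prompt_pairs(neighborhood_dir: str, concept_class: str,
                             identifier: str = "sks"):
    """Pair each neighbourhood image of the target concept with a general
    prompt of the form 'a photo of [identifier] [class]'."""
    prompt = f"a photo of {identifier} {concept_class}"
    images = sorted(Path(neighborhood_dir).glob("*.png")) + \
             sorted(Path(neighborhood_dir).glob("*.jpg"))
    return [(str(img), prompt) for img in images]

# Hypothetical usage for an architecture concept with 3-6 neighbourhood images.
pairs = build_image_prompt_pairs("./neighborhood/eiffel_tower", "tower")
```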

Step 4: D-Plus-Minus detection framework (Branch Training & Assessment). We fine-tune the model along two branches (learning and unlearning) with the constructed image-prompt pairs. To assess the effect of the concept on the model’s generation behavior, we compare the outputs of each fine-tuned branch with those of the original model under the same relevant text prompt via the conditional sensitivity metric (cosine similarity of CLIP image-encoder embeddings).
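The sketch below shows one simple way the two branches could be realized: the learning branch descends the denoising loss, while the unlearning branch ascends it. The denoiser call signature, the simplified noising step, and the gradient-ascent unlearning rule are assumptions for illustration, not the paper's exact training recipe.

```python
import copy

import torch
import torch.nn.functional as F

def finetune_branch(denoiser: torch.nn.Module,
                    latents: torch.Tensor,
                    text_emb: torch.Tensor,
                    direction: str = "learn",
                    steps: int = 200,
                    lr: float = 1e-5) -> torch.nn.Module:
    """Fine-tune a copy of the denoiser on the neighbourhood data.
    direction='learn'   -> gradient descent on the denoising loss (D+ branch).
    direction='unlearn' -> gradient ascent on the same loss (D- branch),
                           one simple way to simulate excluding the concept."""
    model = copy.deepcopy(denoiser)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(latents)
        timestep = torch.randint(0, 1000, (latents.shape[0],), device=latents.device)
        noisy_latents = latents + noise  # placeholder noising; a real noise scheduler differs
        pred = model(noisy_latents, timestep, text_emb)  # assumed denoiser signature
        loss = F.mse_loss(pred, noise)
        if direction == "unlearn":
            loss = -loss  # ascend the loss to push the concept out of the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```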

Step 5: Statistical analysis. We construct a reference distribution by generating images from orthogonal prompts, which isolates the global parameter shift induced by fine-tuning. Since this step is time-consuming, it can be omitted, and the detection result remains accurate.
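One possible instantiation of this normalization is a z-score of the target concept's sensitivity change against the reference distribution from orthogonal prompts; the statistic and values below are illustrative assumptions, not necessarily the exact ones used in DPM.

```python
import statistics

def normalized_shift(target_cs: float, reference_cs: list[float]) -> float:
    """Standardize the target concept's sensitivity change against a reference
    distribution computed from orthogonal (unrelated) prompts, which captures
    the global parameter shift caused by fine-tuning."""
    mu = statistics.mean(reference_cs)
    sigma = statistics.stdev(reference_cs) or 1e-8  # guard against zero spread
    return (target_cs - mu) / sigma

# Hypothetical sensitivity changes measured on orthogonal prompts.
reference = [0.04, 0.06, 0.05, 0.07, 0.05]
print(normalized_shift(0.31, reference))  # large positive shift -> concept-specific effect
```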

Output: Confidence score of copyright infringement. By merging the two branches, we obtain the D-Plus-Minus score, which ranges in [0, 1].
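As an illustration, the two branch statistics could be merged into a [0, 1] confidence score with a logistic squashing; the merging rule below is a hypothetical example, not the paper's exact formula.

```python
import math

def dpm_score(shift_plus: float, shift_minus: float) -> float:
    """Map the learning (D+) and unlearning (D-) branch statistics to a single
    confidence score in [0, 1] via logistic squashing of their combined shift."""
    return 1.0 / (1.0 + math.exp(-(shift_plus + shift_minus)))

# Hypothetical branch statistics for an infringed vs. a non-infringed concept.
print(dpm_score(3.2, 2.7))    # close to 1 -> likely infringed
print(dpm_score(0.1, -0.3))   # near 0.5 -> little evidence of infringement
```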

Results

Quantitative Detection Metrics

Each model is evaluated separately on the classes of the CIDD dataset. "Merged Total" means that the ΔCS(·) values are normalized altogether, while the other rows are normalized within each class.

Class             | SD1.4           | SDXL-1.0        | SANA-0.6B       | FLUX.1
                  | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑
Human Face        | 0.9011 0.8058   | 0.7011 0.6289   | 0.8062 0.7285   | 0.7531 0.6419
Architecture      | 0.8021 0.7106   | 0.9256 0.8488   | 0.9043 0.8224   | 0.9500 0.8606
Arts Painting     | 0.8555 0.7604   | 0.8881 0.8550   | 0.8140 0.7204   | 0.7326 0.6935
Weighted Average  | 0.8584 0.7644   | 0.8170 0.7523   | 0.8398 0.7571   | 0.8122 0.7247
Merged Total      | 0.8071 0.6726   | 0.7800 0.7234   | 0.7914 0.6855   | 0.8257 0.7039

Qualitative Visualization of Two Branches across Different Timesteps

As the figure shows, models learn and unlearn faster on infringed samples and more slowly on non-infringed ones, and they cannot learn the exact elements of the target images.

Copyright Infringement Detection Dataset (CIDD)

To comprehensively categorize copyright infringement in generative models, we propose a hierarchical taxonomy containing four levels of content resemblance:

Level      | Category   | Examples
Level 1    | Technics   | —
Level 2    | Content    | Human Face*
Level 3-1  | Structure  | Architecture*
Level 3-2  | Style      | Arts Painting*
Level 4    | Semantics  | Plots & Themes

Table: Hierarchical Categories of Copyright Infringement, ordered from low-level perceptual features to high-level conceptual constructs. "*" marks the classes used in the CIDD dataset.

The Copyright Infringement Detection Dataset (CIDD) contains several classes of orthogonal prompts and three image classes that are most likely to be infringed, mapping to Levels 2 and 3: human face, architecture, and arts painting.

Crucially, CIDD includes both infringed and non-infringed concepts, each of which is annotated with a binary infringement label based on its source and content provenance, and is paired with 3 to 6 neighbourhood images, enabling robust learning and evaluation under weak and probabilistic assumptions.
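For concreteness, a CIDD record could be represented roughly as follows; the field names and example values are illustrative and may not match the released dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CIDDEntry:
    concept: str                    # e.g., "starry_night" (hypothetical)
    category: str                   # "human_face" | "architecture" | "arts_painting"
    infringed: bool                 # binary label from source and content provenance
    neighborhood_images: list[str]  # paths to the 3-6 semantically similar images
```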

You can download the CIDD dataset here.

BibTeX

@misc{man2025copyrightinfringementdetectiontexttoimage,
  title={Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy}, 
  author={Xiafeng Man and Zhipeng Wei and Jingjing Chen},
  year={2025},
  eprint={2509.23022},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.23022}
}