The widespread deployment of large vision models such as Stable Diffusion raises significant legal and ethical concerns, as these models can memorize and reproduce copyrighted content without authorization. Existing detection approaches often lack robustness and fail to provide rigorous theoretical underpinnings.
To address these gaps, we formalize the concept of copyright infringement and its detection from the perspective of Differential Privacy (DP), and introduce the conditional sensitivity metric, a concept analogous to sensitivity in DP, which quantifies the deviation in a diffusion model's output caused by the inclusion or exclusion of a specific training data point. To operationalize this metric, we propose D-Plus-Minus (DPM), a novel post-hoc detection framework that identifies copyright infringement in text-to-image diffusion models. Specifically, DPM simulates inclusion and exclusion processes by fine-tuning models in two opposing directions: learning and unlearning. In addition, to disentangle concept-specific influence from the global parameter shifts induced by fine-tuning, DPM computes confidence scores over orthogonal prompt distributions using statistical metrics.
Moreover, to facilitate standardized benchmarking, we also construct the Copyright Infringement Detection Dataset (CIDD), a comprehensive resource for evaluating detection across diverse categories.
Our results demonstrate that DPM reliably detects infringing content without requiring access to the original training dataset or text prompts, offering an interpretable and practical solution for safeguarding intellectual property in the era of generative AI.
Detection of copyright infringement faces several practical challenges, such as scalability, inaccessibility of training data, conditional input unavailability, and insufficient theoretical guarantees. In light of these issues, our work operates under a realistic and challenging set of assumptions:
(1) white-box access to a pretrained model, (2) no access to the original training dataset, and (3) no access to the conditioning text prompts.

Differential privacy (DP) is a formal notion of algorithmic privacy that aims to prevent the release of private information. Algorithms satisfying DP guarantee that the model's output does not reveal whether any single individual's data was used.
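For reference, the standard formulation of $(\epsilon, \delta)$-differential privacy (written here in common notation rather than quoted from the paper) states that a randomized algorithm $\mathcal{A}$ satisfies $(\epsilon, \delta)$-DP if, for all neighboring datasets $D$ and $D'$ and every measurable set of outputs $S$:

$$\Pr[\mathcal{A}(D) \in S] \leq e^{\epsilon}\, \Pr[\mathcal{A}(D') \in S] + \delta.$$

Smaller $\epsilon$ and $\delta$ mean the output distribution barely changes when a single record is added or removed; this is the property that the definitions below adapt to the copyright setting.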
However, previous research (e.g., membership inference, data extraction) has revealed significant privacy vulnerabilities in the outputs of diffusion models. We therefore hypothesize that diffusion models exhibit almost no conditional differential privacy, but instead a substantial degree of conditional publicity.
We reinterpret the detection of copyright infringement as testing compliance with, or violation of, conditional differential privacy. Specifically, when a particular concept, such as the neighborhood images of a target image, is present or absent in the training data, it can significantly alter the model's output in response to prompts associated with that concept. In other words:
Differential Privacy = Copyright Non-Infringement (Datapoint not in the Training Dataset)
Violation of Differential Privacy = Copyright Infringement (Datapoint in the Training Dataset)
Copyright infringement can be defined as:
Definition 1 (Copyright Infringement). Let $c$ denote a copyrighted data point or concept, and let $x$ be an input (e.g., a text prompt) semantically aligned with $c$. We say that a model $M$ trained on dataset $D$ infringes upon $c$ if there exists a measurable subset $S$ of the output space such that:

$$\Pr\big[M_{\theta_D}(x) \in S\big] > e^{\epsilon}\, \Pr\big[M_{\theta_{D'}}(x) \in S\big] + \delta,$$

where $D'$ is a neighboring dataset.
Definition 2 (Copyright Non-Infringement). Let $c$ be a non-infringed data point or concept such that $c \notin D$ for all training datasets $D$ considered. We say that a model $M$ does not infringe upon $c$ if, for any input $x$ and all measurable subsets $S$ of the output space:

$$\Pr\big[M_{\theta_D}(x) \in S\big] = \Pr\big[M_{\theta_{D'}}(x) \in S\big].$$

To allow for a relaxed setting, we say that $M$ satisfies approximate non-infringement if the following $(\epsilon, \delta)$-differential privacy condition holds:

$$\Pr\big[M_{\theta_D}(x) \in S\big] \leq e^{\epsilon}\, \Pr\big[M_{\theta_{D'}}(x) \in S\big] + \delta,$$

where $D$ and $D'$ are any neighboring training datasets, and $\theta_D$, $\theta_{D'}$ denote the model parameters trained on $D$ and $D'$, respectively.
We introduce conditional sensitivity, a principal metric for quantifying the extent of publicity and standardizing the confidence score of copyright infringement:

$$\Delta_c f = \big\| f(D) - f(D') \big\|,$$

where $D$ and $D'$ are neighboring datasets that differ by the inclusion or exclusion of the conditional data point $c$, and $f(D)$ denotes the output of a query function $f$ when trained on dataset $D$. In the DPM framework, we use the CLIP image encoder as the query function to capture the semantic similarity between two outputs.
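As a concrete illustration (a minimal sketch, not the authors' released code), the conditional sensitivity between two sets of generations, e.g., from the original model and a fine-tuned branch under the same concept-related prompt, could be estimated as one minus the mean cosine similarity of their CLIP embeddings. The checkpoint and helper names below are assumptions:

```python
# Minimal sketch (not the authors' code): estimating conditional sensitivity
# between two sets of generated images with a CLIP image encoder.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_embed(images: list[Image.Image]) -> torch.Tensor:
    """Return L2-normalized CLIP image embeddings."""
    inputs = processor(images=images, return_tensors="pt")
    feats = clip.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

@torch.no_grad()
def conditional_sensitivity(images_a: list[Image.Image],
                            images_b: list[Image.Image]) -> float:
    """Deviation between two models' outputs for the same concept prompt,
    measured as 1 - mean pairwise cosine similarity of CLIP embeddings."""
    emb_a, emb_b = clip_embed(images_a), clip_embed(images_b)
    cosine = (emb_a * emb_b).sum(dim=-1).mean().item()
    return 1.0 - cosine
```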
Since the presence or absence of the target data point in the training dataset is unknown, the inclusion or exclusion of the conditional data point can be simulated by fine-tuning the model in two opposing directions: learning to include (D+) and unlearning to exclude (D-).
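A minimal sketch of the two fine-tuning directions, assuming a standard noise-prediction (diffusion) objective: the learning branch (D+) descends the loss on the concept's neighborhood images, while the unlearning branch (D-) ascends it. Gradient ascent here is only a stand-in for unlearning; the paper's exact objective may differ, and the function and argument names are illustrative:

```python
# Schematic single fine-tuning step for the D+ (learning) and D- (unlearning)
# branches; assumes a diffusers-style UNet with a standard noise-prediction loss.
import torch
import torch.nn.functional as F

def branch_step(unet, noisy_latents, timesteps, text_emb, target_noise,
                optimizer, direction: str = "plus"):
    """direction="plus":  D+ branch, descend the diffusion loss (learn the concept).
    direction="minus": D- branch, ascend the diffusion loss (a simple stand-in
    for unlearning; the paper's exact unlearning objective may differ)."""
    pred_noise = unet(noisy_latents, timesteps,
                      encoder_hidden_states=text_emb).sample
    loss = F.mse_loss(pred_noise, target_noise)
    if direction == "minus":
        loss = -loss  # gradient ascent pushes the concept out of the model
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```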
We visualize the discrepancy in conditional sensitivity in Fig. 1, where the larger change observed for infringed samples than for non-infringed ones validates its use as a reliable measure.
Step 1: Prepare a target image to be detected. Note that images without distinctive characteristics (e.g., generic landscape photos) may not be considered copyright-infringing under the law and therefore cannot be detected.
Step 2: Extract and filter a core concept. This step excludes content not protected by copyright law, ensuring the accuracy of the detection results. In the next step, we collect images associated with only this concept.
Step 3: Construct image-prompt pairs. Since fine-tuning requires several images related to the concept, we construct a neighborhood of the target concept as the training dataset, consisting of images semantically similar to the target, and specify a general prompt (in the format "a photo of [V] [class]") with an identifier token (e.g., "[V]", "sks").
Step 4: D-Plus-Minus detection framework (branch training & assessment). We fine-tune the model into two branches (learning and unlearning) on the constructed image-prompt pairs. To assess the effect of the concept on the model's generation behavior, we compare the outputs of each fine-tuned model against the original model under the same concept-related text prompt using the conditional sensitivity metric (cosine similarity of CLIP encoder embeddings).
Step 5: Statistical analysis. We construct a reference distribution by generating images from orthogonal prompts, which isolates the global parameter shift induced by fine-tuning from the concept-specific effect. Since this step is time-consuming, it can be omitted, and the detection result remains accurate.
Output: Confidence score of copyright infringement. By merging the two branches, we obtain the D-Plus-Minus score, which ranges in [0, 1]; an illustrative merging sketch is given below.
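The following is a minimal, hypothetical sketch of how the two branch sensitivities might be normalized against the orthogonal-prompt reference distribution (Step 5) and merged into a score in [0, 1]; the exact merging rule used in the paper may differ, and all names here are illustrative:

```python
# Hypothetical merging of the two branch sensitivities into a D-Plus-Minus
# confidence score in [0, 1]; the paper's exact formula may differ.
import numpy as np

def dpm_score(sens_plus, sens_minus, ref_plus=None, ref_minus=None):
    """sens_plus / sens_minus: conditional sensitivities of the D+ / D- branches
    for the target concept. ref_plus / ref_minus: optional arrays of sensitivities
    measured on orthogonal prompts (Step 5), used to factor out global parameter
    shifts via z-normalization. Higher output = more likely infringed."""
    def z_norm(value, reference):
        if reference is None or len(reference) < 2:
            return value
        reference = np.asarray(reference, dtype=float)
        return (value - reference.mean()) / (reference.std() + 1e-8)

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    z_plus = z_norm(sens_plus, ref_plus)
    z_minus = z_norm(sens_minus, ref_minus)
    # Average the two branches after squashing each into (0, 1).
    return float(0.5 * (sigmoid(z_plus) + sigmoid(z_minus)))
```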
| Class | SD1.4 AUC ↑ | SD1.4 SoftAcc ↑ | SDXL-1.0 AUC ↑ | SDXL-1.0 SoftAcc ↑ | SANA-0.6B AUC ↑ | SANA-0.6B SoftAcc ↑ | FLUX.1 AUC ↑ | FLUX.1 SoftAcc ↑ |
|---|---|---|---|---|---|---|---|---|
| Human Face | 0.9011 | 0.8058 | 0.7011 | 0.6289 | 0.8062 | 0.7285 | 0.7531 | 0.6419 |
| Architecture | 0.8021 | 0.7106 | 0.9256 | 0.8488 | 0.9043 | 0.8224 | 0.9500 | 0.8606 |
| Arts Painting | 0.8555 | 0.7604 | 0.8881 | 0.8550 | 0.8140 | 0.7204 | 0.7326 | 0.6935 |
| Weighted Average | 0.8584 | 0.7644 | 0.8170 | 0.7523 | 0.8398 | 0.7571 | 0.8122 | 0.7247 |
| Merged Total | 0.8071 | 0.6726 | 0.7800 | 0.7234 | 0.7914 | 0.6855 | 0.8257 | 0.7039 |
As the figure shows, models learn and unlearn infringed samples faster; on non-infringed samples they adapt more slowly and fail to capture the exact elements of the target images.
To comprehensively categorize copyright infringement in generative models, we propose a hierarchical taxonomy containing four levels of content resemblance:
| Level | Categories | Examples |
|---|---|---|
| Level 1 | Technics | — |
| Level 2 | Content | Human Face* |
| Level 3-1 | Structure | Architecture* |
| Level 3-2 | Style | Arts Painting* |
| Level 4 | Semantics | Plots & Themes |
Table: Hierarchical categories of copyright infringement, ordered from low-level perceptual features to high-level conceptual constructs. "*" marks the classes used in the CIDD dataset.
The Copyright Infringement Detection Dataset (CIDD) contains several classes of orthogonal prompts and three image classes that are most likely to be infringed, mapping to Levels 2 and 3: human face, architecture, and arts painting.



Crucially, CIDD includes both infringed and non-infringed concepts, each annotated with a binary infringement label based on its source and content provenance and paired with 3 to 6 neighborhood images, enabling robust learning and evaluation under weak and probabilistic assumptions.
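For intuition, a single CIDD entry might look like the following; this is a purely hypothetical illustration, and the released dataset's actual schema and field names may differ:

```python
# Hypothetical example of one CIDD record (not the dataset's actual schema).
cidd_example = {
    "concept": "portrait of a specific public figure",  # protected concept to be tested
    "category": "Human Face",            # one of: Human Face, Architecture, Arts Painting
    "infringement_label": 1,             # 1 = infringed, 0 = non-infringed
    "neighborhood_images": [             # 3-6 semantically similar images
        "images/concept_001/0.png",
        "images/concept_001/1.png",
        "images/concept_001/2.png",
    ],
    "prompt": "a photo of [V] person",   # general prompt with identifier token
}
```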
You can download the CIDD dataset here.
@misc{man2025copyrightinfringementdetectiontexttoimage,
title={Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy},
author={Xiafeng Man and Zhipeng Wei and Jingjing Chen},
year={2025},
eprint={2509.23022},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.23022}
}