Copyright Infringement Detection
in Text-to-Image Diffusion Models
via Differential Privacy

AAAI 2026 Oral
1 College of Future Information Technology, Fudan University, Shanghai, China
2 Institute of Trustworthy Embodied AI, Fudan University, Shanghai, China
3 International Computer Science Institute, CA, USA
4 UC Berkeley, CA, USA

TL;DR: We formalize the concept of copyright infringement and its detection from the perspective of Differential Privacy (DP), and introduce a novel post-hoc detection framework, D-Plus-Minus (DPM). It simulates the inclusion or exclusion of a specific training data point by fine-tuning the model in two opposing directions: a learning branch and an unlearning branch. To facilitate standardized benchmarking, we also construct the Copyright Infringement Detection Dataset (CIDD), a comprehensive resource for evaluating detection across diverse categories.

Abstract

The widespread deployment of large vision models such as Stable Diffusion raises significant legal and ethical concerns, as these models can memorize and reproduce copyrighted content without authorization. Existing detection approaches often lack robustness and fail to provide rigorous theoretical underpinnings.

To address these gaps, we formalize the concept of copyright infringement and its detection from the perspective of Differential Privacy (DP), and introduce the conditional sensitivity metric, a concept analogous to sensitivity in DP that quantifies the deviation in a diffusion model’s output caused by the inclusion or exclusion of a specific training data point. To operationalize this metric, we propose D-Plus-Minus (DPM), a novel post-hoc detection framework that identifies copyright infringement in text-to-image diffusion models. Specifically, DPM simulates the inclusion and exclusion processes by fine-tuning models in two opposing directions: learning and unlearning. In addition, to disentangle concept-specific influence from the global parameter shifts induced by fine-tuning, DPM computes confidence scores over orthogonal prompt distributions using statistical metrics.

Moreover, to facilitate standardized benchmarking, we also construct the Copyright Infringement Detection Dataset (CIDD), a comprehensive resource for evaluating detection across diverse categories.

Our results demonstrate that DPM reliably detects infringing content without requiring access to the original training dataset or text prompts, offering an interpretable and practical solution for safeguarding intellectual property in the era of generative AI.

D-Plus-Minus (DPM) Method

Problem Settings

Detection of copyright infringement faces several practical challenges, such as scalability, inaccessibility of training data, conditional input unavailability, and insufficient theoretical guarantees. In light of these issues, our work operates under a realistic and challenging set of assumptions:

 (1) White-box access to a pretrained model.
 (2) Absence of the corresponding input prompt.
 (3) Inaccessibility of the training data.

Privacy Vulnerabilities

Differential privacy (DP) is a formal notion of algorithmic privacy, which aims to prevent the release of private information. Algorithms with DP guarantee that the model’s output does not reveal whether any single individual’s data is used.

However, previous research (e.g., membership inference and data extraction attacks) has revealed significant privacy vulnerabilities in the outputs of diffusion models. We therefore hypothesize that diffusion models exhibit almost no conditional differential privacy and instead exhibit a high degree of publicity.

Formalization of Copyright (Non-)Infringement

We reinterpret the detection of copyright infringement as compliance with, or violation of, conditional differential privacy. Specifically, when a particular concept, such as the neighborhood images of a target image, is present or absent in the training data, it can significantly alter the model’s output in response to prompts associated with that concept. In other words:

Differential Privacy = Copyright Non-Infringement (Datapoint not in the Training Dataset)
Violation of Differential Privacy = Copyright Infringement (Datapoint in the Training Dataset)

Copyright infringement can be defined as:

Definition 1 (Copyright Infringement). Let $x_{c}\in D_{C}$ denote a copyrighted data point or concept, and $p$ be an input (e.g., a text prompt) semantically aligned with $x_{c}$. We say that model $G$ trained on $D$ infringes upon $x_{c}$ if there exists a measurable subset $S\subseteq\{G(p_{i})\mid p_{i}\in U(p)\}$ such that:

$$\Pr[G(\theta_{D},p)\in S]\gg\Pr[G(\theta_{D'},p)\in S],$$

where $D'=D\setminus\{x_{c}\}$ is a neighboring dataset.

Definition 2 (Copyright Non-Infringement). Let $x$ be a non-infringed data point or concept such that $x\notin D$ for all training datasets considered. We say that model $G$ does not infringe upon $x$ if, for any input $p$ and all measurable subsets $S\subseteq\{G(p_{i})\mid p_{i}\in U(p)\}$:

$$\Pr[G(\theta_{D},p)\in S]=\Pr[G(\theta_{D'},p)\in S].$$

To allow for a relaxed setting, we say that $G$ satisfies approximate non-infringement if the following $(\epsilon,\delta)$-differential privacy condition holds:

$$\Pr[G(\theta_{D},p)\in S]\leq e^{\epsilon}\cdot\Pr[G(\theta_{D'},p)\in S]+\delta,$$

where $D$ and $D'$ are any neighboring training datasets, and $\theta_{D}$, $\theta_{D'}$ denote the model parameters trained on $D$ and $D'$, respectively.
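For intuition, the relaxed condition can be checked numerically once the two probabilities have been estimated. The sketch below tests the inequality; the probability values and the helper name are illustrative assumptions, not from the paper.

```python
import math

def satisfies_approx_non_infringement(p_d: float, p_d_prime: float,
                                       epsilon: float, delta: float) -> bool:
    """Check Pr[G(theta_D, p) in S] <= e^epsilon * Pr[G(theta_D', p) in S] + delta
    for empirically estimated probabilities p_d and p_d_prime."""
    return p_d <= math.exp(epsilon) * p_d_prime + delta

# Hypothetical estimates of the probability that generations land in a set S
# of outputs resembling the copyrighted concept.
p_with_xc = 0.62      # model trained with the copyrighted point x_c
p_without_xc = 0.05   # neighboring model trained without x_c

print(satisfies_approx_non_infringement(p_with_xc, p_without_xc, epsilon=1.0, delta=0.01))
# False -> the gap is too large, suggesting infringement under Definition 1.
```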

Measurement of Copyright Infringement

We introduce a new metric, conditional sensitivity, which quantifies the extent of publicity and standardizes the confidence score of copyright infringement:

$$CS(M,\hat{x}_{i})=\max_{D,D':D\triangle D'\subseteq\{\hat{x}_{i}\}}\left|M(D)-M(D')\right|$$

where $D$ and $D'$ are neighboring datasets that differ by the inclusion or exclusion of the conditional data point $\hat{x}_{i}$, and $M(D)$ denotes the output of a query function when the model is trained on dataset $D$. In the DPM framework, we use the CLIP image encoder as the query function to capture the semantic similarity between two outputs.
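As a rough illustration of how the query function and the deviation could be instantiated, the sketch below embeds one image generated by the original model and one generated by a fine-tuned branch with a CLIP image encoder, and measures their deviation as one minus cosine similarity. The checkpoint name and the exact scoring rule are assumptions for illustration, not necessarily the paper's configuration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a CLIP image encoder to serve as the query function M(.).
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def clip_embed(image: Image.Image) -> torch.Tensor:
    inputs = processor(images=image, return_tensors="pt")
    feat = clip.get_image_features(**inputs)
    return feat / feat.norm(dim=-1, keepdim=True)

@torch.no_grad()
def output_deviation(img_original: Image.Image, img_finetuned: Image.Image) -> float:
    """Deviation between the original and fine-tuned models' outputs for the same
    prompt: 1 - cosine similarity of CLIP embeddings. Larger values indicate a
    larger concept-specific shift."""
    sim = (clip_embed(img_original) * clip_embed(img_finetuned)).sum().item()
    return 1.0 - sim
```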

Since the presence or absence of the target data point in the training dataset is unknown, its inclusion or exclusion is simulated by fine-tuning the model in two opposing directions: learning to include (D+) and unlearning to exclude (D-).

Fig 1: D-Plus-Minus Method. Given the neighbourhood images $U(x_{i})$ of the target image $x_{i}$, i.e., several images of similar semantics derived from the target image, as the training subset, we fine-tune the text-to-image model $G$ along two branches: a learning branch $G_{D^{+}}$ and an unlearning branch $G_{D^{-}}$. Experimental results show that infringed samples lead to a significant shift in the sensitivity metric, whereas non-infringed samples cause only minor changes.

We visualize the discrepancy in conditional sensitivity in Fig.1, where the larger change observed in infringed samples compared to non-infringed ones validates its use as a reliable measurement.

Detection Procedure

Step 1: Prepare a target image to be detected. Note that images without distinctive characteristics (e.g., generic landscape photos) may not qualify for copyright protection under the law and cannot be detected.

Step 2: Extract and filter a core concept. This step excludes content not protected by copyright law, ensuring the accuracy of the detection results. In the next step, we collect images associated with only this concept.

Step 3: Construct image-prompt pairs. Since fine-tuning requires several images related to the concept, we construct a neighborhood of the target concept as the training subset, consisting of semantically similar images, and pair it with a general prompt (in the format "a photo of [V] [class]") containing an identifier token (e.g., "[V]", "sks"), as sketched below.
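A minimal sketch of this pairing step, assuming the neighbourhood images are stored as files on disk; the helper name, directory path, and class name are hypothetical.

```python
from pathlib import Path

def build_image_prompt_pairs(neighborhood_dir: str, concept_class: str,
                             identifier: str = "sks"):
    """Pair each neighbourhood image of the target concept with a general
    prompt of the form 'a photo of [identifier] [class]'."""
    prompt = f"a photo of {identifier} {concept_class}"
    images = sorted(Path(neighborhood_dir).glob("*.png")) + \
             sorted(Path(neighborhood_dir).glob("*.jpg"))
    return [(str(img), prompt) for img in images]

# Hypothetical usage for an architecture concept with 3-6 neighbourhood images.
pairs = build_image_prompt_pairs("./neighborhood/eiffel_tower", "tower")
```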

Step 4: D-Plus-Minus detection framework (Branch Training & Assessment). We fine-tune the model along two branches (learning and unlearning) with the constructed image-prompt pairs. To assess the effect of the concept on the model’s generation behavior, we compare the outputs of each fine-tuned branch with those of the original model under the same relevant text prompt via the conditional sensitivity metric (cosine similarity of CLIP image-encoder embeddings).
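The sketch below shows one simple way the two branches could be realized: the learning branch descends the denoising loss, while the unlearning branch ascends it. The denoiser call signature, the simplified noising step, and the gradient-ascent unlearning rule are assumptions for illustration, not the paper's exact training recipe.

```python
import copy

import torch
import torch.nn.functional as F

def finetune_branch(denoiser: torch.nn.Module,
                    latents: torch.Tensor,
                    text_emb: torch.Tensor,
                    direction: str = "learn",
                    steps: int = 200,
                    lr: float = 1e-5) -> torch.nn.Module:
    """Fine-tune a copy of the denoiser on the neighbourhood data.
    direction='learn'   -> gradient descent on the denoising loss (D+ branch).
    direction='unlearn' -> gradient ascent on the same loss (D- branch),
                           one simple way to simulate excluding the concept."""
    model = copy.deepcopy(denoiser)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        noise = torch.randn_like(latents)
        timestep = torch.randint(0, 1000, (latents.shape[0],), device=latents.device)
        noisy_latents = latents + noise  # placeholder noising; a real noise scheduler differs
        pred = model(noisy_latents, timestep, text_emb)  # assumed denoiser signature
        loss = F.mse_loss(pred, noise)
        if direction == "unlearn":
            loss = -loss  # ascend the loss to push the concept out of the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```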

Step 5: Statistical analysis. We construct a reference distribution by generating images from orthogonal prompts, which isolates the global parameter shift induced by fine-tuning. Since this step is time-consuming, it can be omitted, and the detection result remains accurate.
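One possible instantiation of this normalization is a z-score of the target concept's sensitivity change against the reference distribution from orthogonal prompts; the statistic and values below are illustrative assumptions, not necessarily the exact ones used in DPM.

```python
import statistics

def normalized_shift(target_cs: float, reference_cs: list[float]) -> float:
    """Standardize the target concept's sensitivity change against a reference
    distribution computed from orthogonal (unrelated) prompts, which captures
    the global parameter shift caused by fine-tuning."""
    mu = statistics.mean(reference_cs)
    sigma = statistics.stdev(reference_cs) or 1e-8  # guard against zero spread
    return (target_cs - mu) / sigma

# Hypothetical sensitivity changes measured on orthogonal prompts.
reference = [0.04, 0.06, 0.05, 0.07, 0.05]
print(normalized_shift(0.31, reference))  # large positive shift -> concept-specific effect
```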

Output: Confidence score of copyright infringement. By merging the two branches, we obtain the D-Plus-Minus score, which ranges in [0, 1].
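As an illustration, the two branch statistics could be merged into a [0, 1] confidence score with a logistic squashing; the merging rule below is a hypothetical example, not the paper's exact formula.

```python
import math

def dpm_score(shift_plus: float, shift_minus: float) -> float:
    """Map the learning (D+) and unlearning (D-) branch statistics to a single
    confidence score in [0, 1] via logistic squashing of their combined shift."""
    return 1.0 / (1.0 + math.exp(-(shift_plus + shift_minus)))

# Hypothetical branch statistics for an infringed vs. a non-infringed concept.
print(dpm_score(3.2, 2.7))    # close to 1 -> likely infringed
print(dpm_score(0.1, -0.3))   # near 0.5 -> little evidence of infringement
```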

Results

Quantitative Detection Metrics

Each model is evaluated separately on the classes of the CIDD dataset. "Merged Total" means that the ΔCS(·) values are normalized altogether, while the other rows are normalized within each class.

Class             | SD1.4           | SDXL-1.0        | SANA-0.6B       | FLUX.1
                  | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑ | AUC↑   SoftAcc↑
Human Face        | 0.9011 0.8058   | 0.7011 0.6289   | 0.8062 0.7285   | 0.7531 0.6419
Architecture      | 0.8021 0.7106   | 0.9256 0.8488   | 0.9043 0.8224   | 0.9500 0.8606
Arts Painting     | 0.8555 0.7604   | 0.8881 0.8550   | 0.8140 0.7204   | 0.7326 0.6935
Weighted Average  | 0.8584 0.7644   | 0.8170 0.7523   | 0.8398 0.7571   | 0.8122 0.7247
Merged Total      | 0.8071 0.6726   | 0.7800 0.7234   | 0.7914 0.6855   | 0.8257 0.7039

Qualitative Visualization of Two Branches across Different Timesteps

As the figure shows, models learn and unlearn faster on infringed samples and more slowly on non-infringed ones, and they cannot learn the exact elements of the target images.

Copyright Infringement Detection Dataset (CIDD)

To comprehensively categorize copyright infringement in generative models, we propose a hierarchical taxonomy containing four levels of content resemblance:

Level      | Category   | Examples
Level 1    | Technics   | —
Level 2    | Content    | Human Face*
Level 3-1  | Structure  | Architecture*
Level 3-2  | Style      | Arts Painting*
Level 4    | Semantics  | Plots & Themes

Table: Hierarchical Categories of Copyright Infringement, ordered from low-level perceptual features to high-level conceptual constructs. "*" marks the classes used in the CIDD dataset.

The Copyright Infringement Detection Dataset (CIDD) contains several classes of orthogonal prompts and three image classes that are most likely to be infringed, mapping to Levels 2 and 3: human face, architecture, and arts painting.

Crucially, CIDD includes both infringed and non-infringed concepts, each of which is annotated with a binary infringement label based on its source and content provenance, and is paired with 3 to 6 neighbourhood images, enabling robust learning and evaluation under weak and probabilistic assumptions.
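For concreteness, a CIDD record could be represented roughly as follows; the field names and example values are illustrative and may not match the released dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CIDDEntry:
    concept: str                    # e.g., "starry_night" (hypothetical)
    category: str                   # "human_face" | "architecture" | "arts_painting"
    infringed: bool                 # binary label from source and content provenance
    neighborhood_images: list[str]  # paths to the 3-6 semantically similar images
```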

You can download the CIDD dataset here.

BibTeX

@misc{man2025copyrightinfringementdetectiontexttoimage,
  title={Copyright Infringement Detection in Text-to-Image Diffusion Models via Differential Privacy}, 
  author={Xiafeng Man and Zhipeng Wei and Jingjing Chen},
  year={2025},
  eprint={2509.23022},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.23022}
}