
How many Van Goghs does it take to Van Gogh? Finding the imitation threshold

Sahil Verma, Yanai Elazar / November 12, 2024

The Impact of Text-to-Image models on Copyright and Privacy

Modern text-to-image models are trained on large datasets of image-text pairs scraped from the internet. These datasets often include private, copyrighted, and licensed material. Training on such data enables the models to generate images containing this sensitive content, which may violate copyright law and individual privacy. This phenomenon is termed imitation – the generation of images whose content bears recognizable similarity to the training images.

For example, Figure 1 shows that as the number of training images of a celebrity increases, the generated images imitate their face more accurately. Similarly, the art styles of Piet Mondrian and Van Gogh are closely imitated in generated images when a text-to-image model is prompted with their names.

Figure 1: Examples of real celebrity images (top) and generated images from a text-to-image model (bottom) with increasing image counts from left to right (3, 273, 3K, 10K, and 90K, respectively)

Understanding the Imitation Phenomenon: A Closer Look at Celebrity Faces and Art Styles

In this work, we ask how many instances of a concept a text-to-image model needs to be trained on to imitate it, where concept refers to a specific person or a specific art style.

Establishing such an imitation threshold is useful for several reasons. First, it offers an empirical reference point for copyright-infringement and privacy-violation claims: if a concept’s prevalence in the training data falls below the threshold, such claims are less likely to hold. Second, it serves as a guiding principle for developers of text-to-image models who want to avoid such violations.

We call this problem FIT: Finding the Imitation Threshold, and provide a schematic overview of this problem in Figure 2.

The FIT problem: Defining the Imitation Threshold for Copyright and Privacy

To our knowledge, this is the first work to study the relation between the number of images of a concept in the training data and a text-to-image model’s ability to imitate it. The most direct way to measure the imitation threshold would be to train many models, each on a different number of images of a concept, and measure each model’s ability to imitate it – but this is prohibitively costly and resource-intensive. We instead propose MIMETIC^2, a method that estimates the threshold without the cost of training multiple models.

Figure 2: An overview of FIT, where we seek the imitation threshold – the point at which a model was exposed to enough instances of a concept that it can reliably imitate it.

Estimating the Imitation Threshold Without Expensive Training

For each domain (e.g., politicians), we start by collecting a large set of concepts (e.g., individual politicians) and use a text-to-image model to generate images for each concept. We then compute each concept’s imitation score by comparing its generated images to its training images, and estimate each concept’s frequency in the training data. Finally, sorting the concepts by increasing frequency, we estimate the imitation threshold for the domain using a change-detection algorithm. Figure 2 illustrates this process.
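To make the imitation-score step concrete, here is a minimal sketch in Python. It assumes the training and generated images of a concept have already been embedded with some feature extractor (e.g., a face-recognition or style encoder); the function name and the synthetic data are illustrative, not the authors’ implementation.

```python
import numpy as np

def imitation_score(train_embs: np.ndarray, gen_embs: np.ndarray) -> float:
    """Mean cosine similarity between each generated image and its closest
    training image of the same concept (higher = stronger imitation)."""
    train = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    gen = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    sims = gen @ train.T                   # (n_generated, n_training)
    return float(sims.max(axis=1).mean())  # best-match similarity, averaged

# Synthetic stand-ins for one concept: 50 training images and 10 generated
# images, each represented by a 512-dim embedding from some encoder.
rng = np.random.default_rng(0)
train_embs = rng.normal(size=(50, 512))
gen_embs = rng.normal(size=(10, 512))
print(imitation_score(train_embs, gen_embs))
```

In the full pipeline, this score is computed per concept and paired with the concept’s estimated frequency in the training data (e.g., how many captions mention its name) before the change-detection step.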

Figure 3: Similarity between the training and generated images for politicians in our experiments.

Table 1: Imitation thresholds for human face and art style imitation for the different text-to-image models and pretraining datasets we experiment with.

Key Findings: Fewer Images Than Expected to Reach the Imitation Threshold

Figure 3 shows the imitation scores of 400 politicians, sorted by increasing image count. The imitation score rises as the image count grows, and the change-detection algorithm places the imitation threshold at 234 images for this domain. Repeating the experiment across several domains and sets of concepts, we find that most concepts require between 200 and 600 images for the text-to-image model to imitate them (as Table 1 shows). While the specific number for each domain isn’t very informative on its own, the range – just a few hundred images for most concepts – is what stands out. We were surprised by how few images were needed to achieve imitation.
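The final step, locating the change point in the sorted imitation scores, can be sketched with a simple detector that picks the split maximizing the gap between the mean scores of the two groups. This is an illustrative stand-in, not necessarily the change-detection algorithm used in the work, and the toy data below is synthetic.

```python
import numpy as np

def imitation_threshold(counts, scores):
    """Given per-concept training-image counts and imitation scores, sort by
    count and return the count at the split that best separates 'low' from
    'high' imitation (largest gap between the two groups' mean scores)."""
    order = np.argsort(counts)
    counts, scores = np.asarray(counts)[order], np.asarray(scores)[order]
    best_gap, best_idx = -np.inf, 1
    for i in range(1, len(scores)):
        gap = scores[i:].mean() - scores[:i].mean()
        if gap > best_gap:
            best_gap, best_idx = gap, i
    return counts[best_idx]

# Toy data: imitation score jumps once a concept has roughly 200+ images.
rng = np.random.default_rng(1)
counts = np.sort(rng.integers(1, 2000, size=400))
scores = np.where(counts < 200, 0.2, 0.7) + rng.normal(0, 0.05, size=400)
print(imitation_threshold(counts, scores))  # expected to land near 200
```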

Future Directions: Balancing Model Performance and Data Privacy

As text-to-image models evolve, understanding the imitation threshold is crucial for mitigating legal risks related to copyright and privacy violations. Our results suggest that even a small number of images, just a few hundred, can be enough to trigger imitation in these models, underscoring the need for responsible data curation. Moving forward, strategies must be developed to ensure that models generate content ethically while maintaining their performance, a challenge that also extends to text-generation models.

For more details, please check out:
