Imagine a photocopier making copies of copies. Each generation gets a little blurrier, a little more degraded. That’s essentially what’s happening with generative AI models today, and this diagram maps out exactly how.

The Cycle Begins
It starts innocently enough. An AI model (Generation N) creates content—articles, images, code, whatever. This content gets posted online, where it mingles with everything else on the web. So far, so good.
The Contamination Point
Here’s where things get interesting. Web scrapers come along, hoovering up data to build training datasets for the next generation of AI. They can’t always tell what’s human-made and what’s AI-generated. So both get scooped up together.
The diagram highlights this as the critical “Dataset Composition” decision point—that purple node where synthetic and human data merge. With each cycle, the ratio shifts. More AI content, less human content. The dataset is slowly being poisoned by its own output.
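To make that ratio shift concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it is invented for illustration (steady human output, synthetic output that grows as more models publish, scrapers that collect everything indiscriminately); the only point is that the synthetic share of each new scrape creeps upward cycle after cycle.

```python
# Illustrative only: all volumes are made-up units, not measurements.
human_per_cycle = 100_000      # new human-made documents published each cycle (assumed constant)
synthetic_per_cycle = 100_000  # new AI-generated documents per model generation (assumed)

human_total, synthetic_total = 0, 0
for cycle in range(1, 6):
    human_total += human_per_cycle
    # Assumption: each cycle adds another generation of models publishing content.
    synthetic_total += synthetic_per_cycle * cycle
    synthetic_share = synthetic_total / (human_total + synthetic_total)
    print(f"Cycle {cycle}: synthetic share of scraped data = {synthetic_share:.0%}")
```

Under these made-up numbers the synthetic share climbs from 50% to 75% in five cycles. The exact figures don’t matter; the direction does.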
The Degradation Cascade
Train a new model (Generation N+1) on this contaminated data, and four things happen (a toy simulation after this list shows the mechanism):
- Accuracy drops: The model makes more mistakes
- Creativity diminishes: It produces more generic, derivative work
- Biases amplify: Whatever quirks existed get exaggerated
- Reliability tanks: You can’t trust the outputs as much
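Here is that toy simulation, a deliberately oversimplified sketch: each "model" is just a Gaussian fitted to samples drawn from the previous one. Real training looks nothing like this, but the qualitative effect the bullets describe shows up even here: rare, tail-of-the-distribution values get resampled away, and the fitted spread tends to drift and narrow as the generations stack up.

```python
# Toy sketch of recursive training: fit a Gaussian to data sampled from the previous fit.
# Assumptions: Generation 0 is a standard normal "human" distribution and each
# generation sees only 50 samples of the previous generation's output.
import numpy as np

rng = np.random.default_rng(42)
mean, std = 0.0, 1.0   # Generation 0: the original, human-derived distribution
n_samples = 50         # how much "scraped" data each new generation is trained on

for generation in range(1, 21):
    samples = rng.normal(mean, std, n_samples)  # content produced by the previous model
    mean, std = samples.mean(), samples.std()   # next model is fitted only to that content
    if generation % 5 == 0:
        print(f"Generation {generation}: mean={mean:+.3f}, std={std:.3f}")
```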
The Vicious Circle Closes
Now here’s the kicker: this degraded Generation N+1 model goes out into the world and creates more content, which gets scraped again, which trains Generation N+2, which is even worse. Round and round it goes, each loop adding another layer of synthetic blur.
The Human Data Squeeze
Meanwhile, clean human-generated data becomes the gold standard—and increasingly rare. The blue pathway in the diagram shows this economic reality. As AI floods the web with synthetic content, finding authentic human data becomes harder and more expensive. It’s basic supply and demand, except the supply is being drowned in synthetic noise.
Why This Matters
This isn’t just a theoretical problem. We’re watching it happen in real time. The diagram shows a self-reinforcing cycle with no natural brake. Unless we actively intervene (by filtering training data, marking AI content, or preserving human data sources), each generation of AI models will be trained on an increasingly polluted dataset.
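As a sketch of the first intervention, filtering training data, here is one way a scraping pipeline might screen records before they reach a training set. The Record class, its detector_score field, and the 0.5 threshold are hypothetical stand-ins for whatever provenance signal is actually available (watermark checks, a classifier, source metadata), and any real detector is imperfect, so this reduces contamination rather than eliminating it.

```python
# Hypothetical filtering step: keep only records a provenance detector judges
# likely human-written. The detector itself is assumed, not a real library.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    detector_score: float  # assumed 0..1 estimate that the text is AI-generated

def filter_for_training(records: list[Record], max_score: float = 0.5) -> list[Record]:
    """Drop records the (imperfect) detector flags as likely synthetic."""
    return [r for r in records if r.detector_score < max_score]

scraped = [
    Record("Field notes from a 1997 wetlands survey...", detector_score=0.08),
    Record("As an AI language model, I cannot browse the web...", detector_score=0.97),
]
clean = filter_for_training(scraped)
print(f"Kept {len(clean)} of {len(scraped)} scraped records for training")
```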
The arrows loop back on themselves for a reason. This is a feedback system, and feedback systems can spiral. Understanding this flow is the first step to breaking it.