The Vanishing Pool of “Easy Data” for AI: Ecological Implications and Regenerative Solutions

Note: I’m sharing part of the talk I gave at Universidad Panamericana, as part of the 26th Annual Convention of the Media Ecology Association. The panel was titled: “Aesthetics, Narrative, and Artificial Intelligence”.


Artificial intelligence is transforming our informational landscape at an unprecedented speed. Just like in ecological systems, this transformation is driven by an insatiable hunger for resources —in this case, data. The parallels between data extraction in AI and natural resource exploitation are striking. In this presentation, we explore how AI reshapes our cognitive environment and propose sustainable approaches to address the challenges it brings.

What Is “Easy Data” in AI?
“Easy data” refers to public, abundant, and low-friction datasets with high informational value —resources that require minimal processing before being used to train AI models. Examples include Wikipedia, Common Crawl, public domain texts, and open social media posts. These datasets are essential to the development of large language models, as they provide diverse linguistic patterns and foundational knowledge.

However, this supply is diminishing. According to a 2024 WIRED article and a study by the Data Provenance Initiative at MIT, roughly 25% of the highest-quality data in major datasets such as C4, RefinedWeb, and Dolma has become inaccessible. The reason? Many websites now block automated collection through robots.txt directives or have put their content behind paywalls, which limits its use in AI training. This affects not only big tech companies but also academic research and innovation.
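To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the crawler name `ExampleAIBot`, and the `example.org` URLs are all hypothetical, invented for illustration; real sites and real crawlers use their own names and rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not copied from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The (hypothetical) AI crawler is blocked from the whole site.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.org/articles/1")

# Any other crawler may read articles but not the /private/ area.
generic_allowed = parser.can_fetch("SomeOtherBot", "https://example.org/articles/1")
generic_private = parser.can_fetch("SomeOtherBot", "https://example.org/private/x")

print(ai_allowed, generic_allowed, generic_private)
```

Note that robots.txt is a request, not a technical barrier: it only works when the crawler chooses to honor it, which is part of why consent has become a central issue.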

The Data Gold Rush: Fading Out
The rapid deployment of AI systems has created unprecedented demand for data. Public sources are under increasing strain due to extraction pressures. In response, media outlets and online platforms have taken protective measures: modifying terms of service, erecting paywalls, or signing exclusive commercial agreements with AI developers. These moves are designed to safeguard intellectual property and ensure fair compensation.

This shift has led to what some experts, such as Shayne Longpre (MIT) and Yacine Jernite (Hugging Face), describe as an emerging consent crisis. The lack of clear, equitable agreements on data use has generated conflict between developers and content creators, diminishing the pool of accessible data and complicating the ethical landscape of AI development.

The Key to AI: Quality and Diversity of Sources
An AI model’s accuracy, versatility, and relevance are directly tied to the quality and diversity of its training data. Broad and up-to-date datasets from trustworthy sources —academic journals, verified news, official statistics, structured databases— enhance both performance and ethical reliability.

Conversely, training on limited or biased data may perpetuate misinformation, errors, and cultural insensitivity. Diverse sources enable models to be more culturally aware, reduce bias, and adapt better across contexts. But the scarcity of such data today is not just a technical challenge —it represents a rupture in the balance of the informational ecosystem.

AI as an Ecosystem Engineer
AI doesn’t just consume information; it actively reshapes the conditions under which information is created and shared. Like ecosystem engineers in nature, AI alters what knowledge is generated, how it circulates, and what gets prioritized. The unregulated extraction of massive datasets —text, image, audio— mirrors ecological overexploitation and leads to systemic strain.

This calls for a renewed analytical lens: we must ask not only what content AI produces, but also how the structures behind that content are formed and maintained. The extractive logic of data mining must give way to an ecological approach —one that sees data as part of a regenerative cycle of use, consent, and reciprocity.

Ecological Lessons: Over-Extraction and Feedback
A positive alternative lies in community-based data stewardship. Imagine a university, a local media collective, and a group of developers collaborating to train an AI model using curated texts and interviews. The data is labeled with context and consent; contributors receive credit and access to results. The model, in turn, supports the community through summaries, translations, or analytic tools.
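As a rough illustration of what "labeled with context and consent" could look like in practice, the sketch below models each contribution as a small record. The class name `Contribution`, its fields, and the helper functions are hypothetical choices made for this example, not an existing standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contribution:
    """One contributed text plus the labels a steward would keep with it (hypothetical schema)."""
    text: str
    contributor: str
    context: str                # e.g. where and why the text was produced
    consent_for_training: bool  # explicit permission from the contributor


def training_set(items):
    """Keep only the contributions whose authors consented to training use."""
    return [c for c in items if c.consent_for_training]


def credits(items):
    """List the contributors to acknowledge, alphabetically and without repeats."""
    return sorted({c.contributor for c in items})


# Invented sample data for illustration only.
items = [
    Contribution("Interview transcript A", "Local media collective", "2024 oral history project", True),
    Contribution("Course essay B", "University student", "Seminar assignment", False),
    Contribution("Interview transcript C", "Local media collective", "2024 oral history project", True),
]

usable = training_set(items)
acknowledged = credits(usable)
print(len(usable), acknowledged)
```

The point of the design is that consent and context travel with the data itself, so credit and access can be traced back to contributors instead of being lost at collection time.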

In ecosystems, balance is maintained through feedback loops. In AI, this could mean governance structures that give voice and value back to the authors of the content. The transition from data mining to data stewardship is essential —prioritizing transparency, traceability, and redistribution of benefits.

Conclusion: Toward a Regenerative Data Ecosystem
The dwindling availability of high-quality data reflects a broader informational crisis. The overexploitation of open sources has triggered a partial collapse of the knowledge commons, where access to reliable content is increasingly restricted.

This mirrors warnings from thinkers like McLuhan and Postman: technologies don’t just change what we know —they reshape what counts as knowledge, truth, and participation. AI reconfigures the cognitive environment. It transforms, filters, and selects what we see and what we don’t. Therefore, a deeper understanding is needed —one that goes beyond content and interrogates the structures producing it.

It is time to move from an extractive logic to one that is ethical, regenerative, and collaborative. Data stewardship offers a new paradigm: one grounded in consent, contextualization, and shared benefit. This is not just a technical fix —it’s a cultural and ecological necessity for the future of AI and the societies it increasingly influences.


Disinformation and political propaganda: An exploration of the risks of artificial intelligence

eme.23.2.105

ABSTRACT: A significant shift is currently underway in the disinformation industry. We are transitioning from the era of disinformation fuelled by fake news and social media to disinformation on a larger scale generated through artificial intelligence (AI). Therefore, the objective of this text is to analyse this disinformation phenomenon, catalysed by social media and AI, from the media ecology perspective. This work is divided into two parts. In the first part of the text, we analyse the disinformation phenomenon, highlighting the involvement of certain governments. In the second part of the text, we focus on recognizing the effects that can arise from the use of AI within the extensive landscape of the disinformation industry.


Disinformation and Political Propaganda in the Era of Artificial Intelligence

A significant shift is currently underway in the disinformation industry. We are transitioning from the era of disinformation fueled by fake news and social media to disinformation on a larger scale generated through artificial intelligence (AI). Therefore, the objective of this text is to analyze this disinformation phenomenon, catalyzed by social media and AI, from the media ecology perspective. This work is divided into two parts. In the first part of the text, we analyze the disinformation phenomenon, highlighting the involvement of certain governments. In the second part of the text, we focus on recognizing the effects that can arise from the use of AI within the extensive landscape of the disinformation industry.


This presentation stems from a paper I wrote with my colleagues and friends Octavio Islas and Amaia Arribas. The text will be published in the special issue on Artificial Intelligence of our journal Explorations in Media Ecology (EME).

You will be able to read the complete article soon.


Why Media Still Matter?

This is a great opportunity to hear Dr. Lance Strate and Dr. Thom Gencarelli discuss the relevance of media. In this video, these two esteemed academics delve into the impact of technologies such as the spoken and written word, the press, radio, television, the internet, and even artificial intelligence. Why is Media Ecology important as well? How does it relate to General Semantics? If you’re interested in these topics, I invite you to watch the video.


Exploring the Limits of Artificial Intelligence in the Artistic Domain

A few months ago, hardly anyone was talking about artificial intelligence, but now things are different. Many magazines have designated 2023 as the year of artificial intelligence. Even so, the challenges we are witnessing today with the introduction of artificial intelligence are not very different from those we saw years ago with the advent of computers, the Internet, and other devices. Artificial intelligence is being used to perform a variety of tasks that can undoubtedly bring great benefits to different fields. However, this intriguing technology still has limits. It lacks the human ability to understand and reflect on the meaning and cultural context of what it creates. Humans can think abstractly and find innovative solutions that go beyond the limits of the available data.

The Limits of Artificial Intelligence

This was part of my presentation at the 24th Annual Convention of the Media Ecology Association, which took place at Fordham University in New York City.

An AI language model can assist in the artistic field in several ways. Here are some things an AI model can do (OpenAI, 2023):

  1. Generate poetry and song lyrics: AI can compose poems in different styles and themes, as well as assist in writing song lyrics.
  2. Aid in creative writing: AI can provide ideas, suggestions, and help develop plots, characters, and dialogues for plays, novels, short stories, scripts, and more.
  3. Create visual descriptions: AI can help paint detailed and vivid mental images by describing landscapes, scenes, or any other visual elements.
  4. Devise stage or band names: If you need a creative stage name for yourself or your band, AI can assist in generating original and appealing ideas.
  5. Provide artistic information and knowledge: AI can answer questions related to artists, artistic movements, styles, famous artworks, music, and more.