In 2017, an experiment paired two AI bots, Alice and Bob, and trained them to negotiate over items of value. As training progressed, the bots began communicating in a shorthand of their own, leaving the engineers unable to follow the exchange. The overseeing team shut the experiment down, an outcome that raised concerns across the global AI community.
The MAD-ness Syndrome
A 2023 study conducted by researchers at leading universities highlights a possible explanation for what happened with Alice and Bob. The paper delves into GenAI's vulnerability to a phenomenon called MAD, short for Model Autophagy Disorder. This condition occurs when large language models (LLMs) rely too heavily on synthetic data—content generated by AI itself—during training.
As LLMs evolve by training on a mix of real and synthetic data, the most widely used models increasingly lean on the latter. When models feed on synthetic data in this way, they risk deteriorating over time: the biases and errors embedded in AI-generated content are amplified with each training round, the researchers explain, leading to reduced novelty, more hallucinations, and what they call MAD-ness.
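The core mechanism is a feedback loop: a model is retrained on its own outputs, so estimation errors compound instead of averaging out. The short Python sketch below (a toy illustration of the idea, not code from the cited study) makes this concrete with the simplest possible "model", a Gaussian refit each generation to samples drawn from the previous generation; with no fresh real data, its variance tends to shrink every round while its mean drifts.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_autophagy(n_samples=200, generations=50):
    # Generation 0: "real" data from a standard normal distribution.
    data = rng.normal(loc=0.0, scale=1.0, size=n_samples)
    for gen in range(generations):
        # "Train" the model: fit a Gaussian to the current data.
        mu, sigma = data.mean(), data.std()
        if gen % 10 == 0:
            print(f"generation {gen:2d}: mean={mu:+.3f}  std={sigma:.3f}")
        # The next generation trains only on synthetic data sampled
        # from the freshly fitted model -- no real data is ever added.
        data = rng.normal(mu, sigma, size=n_samples)
    return data

run_autophagy()
```

Running the sketch shows the fitted distribution slowly wandering away from the original one, a miniature version of the degradation the researchers describe for generative models at scale.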
Shankar J, a brain-computer interface researcher at a prestigious institute, points out that models trained with AI-generated content tend to reinforce their own biases. This leads to a range of problems, including reduced model accuracy and creativity. According to Shankar, these symptoms are early indicators of MAD-ness in GenAI models.
Growing Dependence on Synthetic Data
By the end of 2024, an estimated 60% of the data used for AI and analytics development is expected to be synthetic. Many well-known models, including GPT-3, GPT-4, and BERT, may therefore be at risk of experiencing MAD. The problem lies in how these models handle synthetic data: the more training iterations rely solely on such data, the faster a model's performance may decline, especially on complex tasks.
Researchers compare the phenomenon both to mathematical instabilities, like runaway feedback loops, and to biological conditions, such as mad cow disease, suggesting that overreliance on synthetic data could push GenAI systems towards self-destruction.
Declining Model Performance
Another research paper from 2023 adds further weight to these concerns. This study, co-authored by academics from multiple institutions, examines how training LLMs on AI-generated content over successive rounds of learning can erode data diversity. For instance, text may converge toward a narrower style or accumulate unnecessary details, while images may lose key features. By limiting the variety of information available for training, this trend could degrade the performance of future AI models.
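One way to picture this loss of diversity: when each generation of a model is estimated only from a finite sample of the previous generation's output, rare patterns that happen to be missed once can never reappear. The Python sketch below (again a toy illustration, not code from the paper) tracks how many categories of a long-tailed distribution survive repeated resample-and-refit rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

def surviving_categories(n_categories=1000, n_samples=2000, generations=20):
    # "Real" data: a long-tailed (Zipf-like) distribution over categories,
    # standing in for the variety of styles, topics, or image features.
    probs = 1.0 / np.arange(1, n_categories + 1)
    probs /= probs.sum()
    for gen in range(generations + 1):
        alive = np.count_nonzero(probs)
        if gen % 5 == 0:
            print(f"generation {gen:2d}: {alive} categories still produced")
        # Re-estimate the distribution from a finite sample of the current
        # model's output; categories that draw zero samples vanish for good.
        counts = rng.multinomial(n_samples, probs)
        probs = counts / counts.sum()

surviving_categories()
```

Because the tail categories are unlikely to be sampled, the number of categories the "model" can still produce falls generation after generation, mirroring the decline in stylistic and informational variety the study describes.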
As a result, the quality of GenAI-generated content—such as blogs, articles, and social media posts—appears to be on a downward trajectory. Experts note that AI-generated content often requires human intervention to maintain quality, particularly for SEO and other content-based functions. Websites and blogs that have relied heavily on AI-generated content have already experienced significant drops in traffic and rankings following recent search engine policy changes.
As GenAI continues to evolve, the risks of self-destruction through MAD-ness are becoming more apparent. Without sufficient safeguards, the very systems designed to assist us could undermine their own effectiveness. While AI holds great promise, the industry must address these potential pitfalls to ensure sustainable and reliable AI development.