
Musk agrees that AI training data has been exhausted, says synthetic data is the way forward

ChainCatcher reports, citing TechCrunch, that Elon Musk said during a live conversation with Stagwell Chairman Mark Penn that AI model training has largely exhausted the available real-world data: "We have exhausted the cumulative sum of human knowledge, which happened last year." Musk's view aligns with that of former OpenAI Chief Scientist Ilya Sutskever, who said at the NeurIPS machine learning conference that the AI industry has reached "peak data" and that the way models are developed may need to change.

Musk believes synthetic data will supplement real data, with AI learning on its own by generating data and then evaluating it. The approach is already in use at tech giants including Microsoft, Meta, OpenAI, and Anthropic, and models such as Microsoft's Phi-4 and Google's Gemma were trained on a mix of real and synthetic data. Gartner has predicted that by 2024 about 60% of the data used in AI and analytics projects would be synthetically generated.

One advantage of synthetic data is lower cost. For example, the AI startup Writer spent only about $700,000 to develop its Palmyra X 004 model, which was trained almost entirely on synthetic data, whereas a comparably sized OpenAI model costs about $4.6 million to develop. Synthetic data also carries risks, however, including reduced model creativity, more biased output, and potential model collapse. These risks are greatest when the data used to train the generating model is itself biased, because those biases carry over into the synthetic data it produces.