DataFi: A New Blue Ocean for the AI Data Economy in Web3


Data as an Asset: DataFi Is Opening a New Blue Ocean

The biggest topic in the AI community this month is undoubtedly Meta's aggressive talent recruitment, which has assembled a star-studded AI team made up largely of Chinese researchers. The team is led by Alexander Wang, the 28-year-old founder of Scale AI. Scale AI is currently valued at $29 billion and provides data services to clients including the U.S. military and AI giants such as OpenAI, Anthropic, and Meta. Its core business is supplying large volumes of accurately labeled data.

Scale AI stands out among numerous unicorns because it recognized early on the critical role of data in the AI industry. Computing power, models, and data are the three pillars of AI. If we compare a large model to a person, then the model architecture is the body, computing power is the food, and data is the knowledge and information.

As large language models have developed rapidly, the industry's focus has shifted from models to computing power. Today most models use the Transformer as their basic architecture, occasionally incorporating innovations such as MoE (Mixture of Experts) or MoRe, and major companies secure computing power either by building their own supercomputing clusters or by signing long-term agreements with cloud service providers. With architectures converging and compute increasingly secured, data is becoming the more prominent differentiator.

Scale AI focuses on building a solid data foundation for AI models. Its business covers not only mining existing data but also generating new data, and the company has assembled an AI training team of experts from various fields to provide high-quality training data for AI models.

Model training is divided into two stages: pre-training and fine-tuning. Pre-training is similar to the process of a baby learning to speak, requiring a large amount of information such as text and code gathered from the internet. Fine-tuning is akin to school education, with clear goals and directions, cultivating the model's specific abilities through carefully designed datasets.

Accordingly, the AI data sector mainly involves two types of datasets. The first consists of large volumes of data that need little processing, usually sourced from UGC platforms such as Reddit, Twitter, and GitHub, public literature databases, or private corporate databases. The second must be carefully designed and selected so that it cultivates specific capabilities in the model, which requires data cleaning, filtering, labeling, and human feedback.
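As a rough illustration of the cleaning and filtering mentioned above, the following Python sketch normalizes whitespace, drops very short fragments, and removes exact duplicates from a batch of scraped text. The function `clean_corpus`, the `min_words` threshold, and the sample strings are hypothetical; production pipelines typically add near-duplicate detection, language identification, and quality scoring on top of simple steps like these.

```python
import hashlib
import re


def clean_corpus(docs, min_words=5):
    """Hypothetical cleaning pass: normalize whitespace, drop short fragments,
    and remove exact duplicates (case-insensitive)."""
    seen = set()
    kept = []
    for doc in docs:
        # Collapse runs of whitespace (spaces, tabs, newlines) and trim the ends
        text = re.sub(r"\s+", " ", doc).strip()
        # Filter: skip fragments with fewer than min_words words
        if len(text.split()) < min_words:
            continue
        # Exact deduplication on a hash of the case-folded text
        digest = hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept


raw = [
    "Data is the   knowledge and information of a model.",
    "data is the knowledge and information of a model.",
    "lol",
    "Fine-tuning uses carefully designed datasets\nto build specific abilities.",
]
for doc in clean_corpus(raw):
    print(doc)
```

Here the second string is dropped as a case-insensitive duplicate of the first and "lol" is dropped as too short, leaving two cleaned documents.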

As model capabilities improve further, increasingly refined and specialized training data will become the key factor determining model performance. In the long run, AI data is an investment sector with a snowball effect: as groundwork accumulates, data assets can generate compound returns and their value keeps growing.


Web3 DataFi: The Chosen AI Data Oasis

Compared with companies that have built remote manual labeling workforces of hundreds of thousands of people across multiple countries, Web3 has natural advantages in the AI data field, which has given rise to the new concept of DataFi. Ideally, the advantages of Web3 DataFi include:

  1. Smart contracts ensure data sovereignty, security, and privacy.
  2. Natural geographic arbitrage: an open, distributed architecture attracts the best-suited workforce.
  3. Clear incentives and settlement, enabled by the blockchain.
  4. It lends itself to building a more efficient and open "one-stop" data market.

For ordinary users, DataFi is also the easiest type of decentralized AI project to participate in. Users can get involved through simple actions such as providing data, evaluating models, using AI tools for simple creative work, or trading data.


Promising Web3 DataFi Projects

Currently, multiple DataFi projects have obtained significant funding. Here are some representative projects:

  1. Sahara AI: Committed to building a decentralized AI super infrastructure and trading market.

  2. Yupp: AI model feedback platform that collects user feedback on model outputs.

  3. Vana: Converts user personal data into monetizable digital assets.

  4. Chainbase: Focused on on-chain data, covering over 200 blockchains.

  5. Sapien: Aims to transform human knowledge on a large scale into high-quality AI training data.

  6. Prisma X: Committed to becoming an open coordination layer for robots.

  7. Masa: One of the leading subnet projects in the Bittensor ecosystem.

  8. Irys: Focused on programmable data storage and computation.

  9. ORO: Enables ordinary people to contribute to AI.

  10. Gata: Positioned as a decentralized data layer.

The barriers to entry for these projects are generally low at the moment, but once a project gathers users and ecosystem stickiness, its platform advantages will compound quickly. Early-stage projects should therefore focus on incentives and user experience. At the same time, these platforms need to work out how to manage participants and ensure data quality, so that low-quality data does not crowd out high-quality contributions, a case of "bad money driving out good."
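One simple way a platform might check crowdsourced labels is majority voting: take the most common label for each item as the consensus, then flag contributors who rarely agree with it. The Python sketch below assumes submissions arrive as hypothetical `(contributor_id, item_id, label)` tuples; `aggregate_labels` and `min_agreement` are illustrative names, not the API of any project listed above. It is deliberately naive: flagged contributors still vote in the consensus, and a coordinated group could outvote honest labelers, so a real platform would likely combine it with other safeguards such as test items with known answers.

```python
from collections import Counter, defaultdict


def aggregate_labels(submissions, min_agreement=0.5):
    """Majority-vote consensus plus a per-contributor agreement check.

    submissions: list of (contributor_id, item_id, label) tuples (hypothetical format).
    Returns (consensus, flagged): the majority label for each item, and the
    contributors whose share of consensus-matching labels is below min_agreement.
    """
    # Group submitted labels by item
    by_item = defaultdict(list)
    for _, item, label in submissions:
        by_item[item].append(label)

    # Majority vote per item; on a tie, most_common keeps first-seen order
    consensus = {item: Counter(labels).most_common(1)[0][0]
                 for item, labels in by_item.items()}

    # Count how often each contributor matched the consensus
    matches, totals = Counter(), Counter()
    for contributor, item, label in submissions:
        totals[contributor] += 1
        if label == consensus[item]:
            matches[contributor] += 1

    flagged = {c for c in totals if matches[c] / totals[c] < min_agreement}
    return consensus, flagged


subs = [
    ("alice", "img1", "cat"), ("bob", "img1", "cat"), ("spam", "img1", "dog"),
    ("alice", "img2", "dog"), ("bob", "img2", "dog"), ("spam", "img2", "cat"),
]
consensus, flagged = aggregate_labels(subs)
print(consensus)  # {'img1': 'cat', 'img2': 'dog'}
print(flagged)    # {'spam'}
```

The agreement rate is one possible input for rewards: a settlement contract could, for example, withhold payment from flagged contributors, though how any particular DataFi project handles this is not covered here.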

In addition, increasing transparency is also a major challenge faced by current on-chain projects. Many projects still lack sufficient publicly available and traceable data, which is detrimental to the long-term healthy development of Web3 DataFi.

The path to large-scale adoption of DataFi has two parts: first, attracting enough individual users, who act both as a strong force for data collection and generation and as consumers in the AI economy; second, winning recognition from mainstream enterprises, which will be the main source of large data orders in the short term.

DataFi represents human intelligence cultivating machine intelligence over the long term, while smart contracts safeguard the returns on human labor, so that machine intelligence ultimately benefits humanity in return. For those who feel uncertain about the AI era or still hold blockchain ideals, participating in DataFi may be a timely choice.


This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.