Evolution of Blockchain Data Indexing: From Node to AI Full Chain Service

The Evolution of Blockchain Data Indexing Technology: From Raw Nodes to AI-Powered Full Chain Data Services

1. Introduction

Since the first wave of dApps emerged in 2017, the blockchain application ecosystem has grown steadily richer. Yet when we discuss decentralized applications, how often do we ask where the data they rely on actually comes from?

In 2024, AI and Web3 have become hot topics. In artificial intelligence, data is the lifeblood: it is what AI systems learn from and reason over. Without data, even the most sophisticated AI algorithm cannot demonstrate intelligence.

This article will delve into the development of blockchain data accessibility, analyze the evolution of data indexing, and compare the features of data service protocols such as The Graph, Chainbase, and Space and Time, with a particular focus on the innovations of the latter two in integrating AI technology.

2. Data Indexing from Complex to Simple: From Blockchain Nodes to Full-Chain Databases

2.1 Data Source: Blockchain Node

Blockchain is regarded as a decentralized ledger, with nodes as its infrastructure, responsible for recording, storing, and disseminating all on-chain transaction data. However, ordinary users face technical and cost challenges in building and maintaining their own nodes. Although theoretically anyone can run a node, in practice users often rely on third-party services.

To solve this problem, RPC node providers have emerged. They manage nodes and provide data through RPC endpoints, allowing users to access blockchain data without having to build their own nodes. Public RPC endpoints are free but have rate limits, while private RPC endpoints perform better but still have room for improvement in efficiency. The standardized API interfaces of node providers lower the barrier to data access, laying the foundation for subsequent data parsing and applications.
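As a rough illustration of what access through an RPC endpoint looks like, the sketch below builds a JSON-RPC 2.0 request for the standard Ethereum method `eth_getBlockByNumber`. The endpoint URL and the helper names `build_rpc_request` and `call_rpc` are assumptions made for this example; a real provider issues its own URL and may require an API key.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the URL issued by your RPC provider.
RPC_URL = "https://rpc.example.invalid"


def build_rpc_request(method: str, params: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body of the kind Ethereum nodes accept."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def call_rpc(url: str, payload: dict, timeout: float = 10.0) -> dict:
    """POST the payload to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Ask for the latest block; False requests transaction hashes, not full objects.
payload = build_rpc_request("eth_getBlockByNumber", ["latest", False])
print(json.dumps(payload))

# Uncomment once RPC_URL points at a real endpoint:
# print(call_rpc(RPC_URL, payload)["result"]["number"])
```

Each call returns one block or one value, which is why bulk analysis through RPC alone tends to require many round trips.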


2.2 Data Parsing: From Raw Data to Usable Data

The raw data provided by blockchain nodes is usually encoded (for example as hexadecimal strings and ABI-encoded byte sequences) rather than human-readable, which makes it hard to parse. For ordinary users and developers, handling this data directly requires substantial technical knowledge and computing resources.

The data parsing process is crucial as it transforms complex raw data into a format that is easy to understand and manipulate, allowing users to utilize this data more intuitively. The quality of the parsing directly affects the efficiency and effectiveness of data applications, making it a key link in the entire indexing process.
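To make the parsing step concrete, here is a minimal sketch that decodes one raw ERC-20 `Transfer` event log into readable fields using only the standard library. The topic hash is the well-known keccak256 of `Transfer(address,address,uint256)`; the sample addresses, the amount, and the helper names are invented for illustration. Production parsers typically rely on a contract's ABI rather than hand-written slicing.

```python
# keccak256("Transfer(address,address,uint256)"), the ERC-20 Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log: dict) -> dict:
    """Turn a raw log (hex topics and data) into from/to/value fields."""
    topics = log["topics"]
    # ERC-20 Transfer logs carry exactly three topics: signature, from, to.
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        raise ValueError("not an ERC-20 Transfer log")
    return {
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int(log["data"], 16),  # uint256 amount in the token's smallest unit
    }


# Made-up sample log shaped like an eth_getLogs result entry.
raw_log = {
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1_500_000, "064x"),
}

decoded = decode_transfer(raw_log)
print(decoded)
```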

2.3 The Evolution of Data Indexers

As the volume of blockchain data grows, so does the demand for indexers. Indexers organize on-chain data and load it into databases for convenient querying. They index blockchain data and expose it through SQL-like query languages and APIs such as GraphQL, greatly simplifying data retrieval.
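The core idea, organizing parsed on-chain records into a database so that they can be queried, can be sketched with Python's built-in `sqlite3` module. The table layout, the sample rows, and the function name `build_index` are assumptions for this example and do not reflect any particular indexer's schema.

```python
import sqlite3


def build_index(transfers: list[tuple]) -> sqlite3.Connection:
    """Load parsed transfer records into an in-memory table with a lookup index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers (block INTEGER, sender TEXT, recipient TEXT, value INTEGER)"
    )
    # The database index lets per-recipient queries avoid scanning every row.
    conn.execute("CREATE INDEX idx_recipient ON transfers (recipient)")
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", transfers)
    conn.commit()
    return conn


# Made-up records standing in for already-decoded on-chain events.
sample = [
    (100, "0xaaa", "0xbbb", 5),
    (101, "0xbbb", "0xccc", 3),
    (102, "0xaaa", "0xccc", 7),
]

index = build_index(sample)
received = index.execute(
    "SELECT recipient, SUM(value) FROM transfers GROUP BY recipient ORDER BY recipient"
).fetchall()
print(received)  # [('0xbbb', 5), ('0xccc', 10)]
```

Answering the same question through raw RPC calls would mean fetching and decoding every relevant block on each request.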

Different types of indexers optimize data retrieval methods:

  1. Full Node Indexer: Extracts data directly from full blockchain nodes, ensuring integrity and accuracy, but requires substantial storage and processing power.
  2. Lightweight Indexer: Relies on full nodes to retrieve specific data on demand, reducing storage requirements but potentially increasing query time.
  3. Specialized Indexer: Optimized for specific data types or specific blockchains, such as NFT data or DeFi transactions.
  4. Aggregated Indexer: Extracts data from multiple blockchains and sources, including off-chain information, and provides a unified query interface suited to multi-chain dApps.

Currently, Ethereum archive nodes occupy 3-13.5 TB of storage depending on the client, and this grows as the blockchain grows. To cope with such data volumes, mainstream indexing protocols support multi-chain indexing and offer customizable data parsing frameworks for different application needs, such as The Graph's "subgraph" framework.

Indexers significantly improve data indexing and query efficiency. Compared with traditional RPC endpoints, they can index large amounts of data efficiently and serve high-speed queries, and they allow complex queries, data filtering, and analysis. Some indexers also aggregate data sources from multiple blockchains, so a multi-chain dApp does not need to integrate a separate API per chain. By operating in a distributed manner, indexers can also offer stronger security and performance, reducing the risk of interruption that a centralized RPC provider may pose.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)

2.4 Full-Chain Database: Moving Toward Stream-First

Querying data through indexing nodes typically means that an API is the only way in. As projects grow, however, they often need more flexible data sources than a standardized API can offer. As application requirements become more complex, basic indexers and their standardized index formats struggle to satisfy increasingly diverse query needs, such as search, cross-chain access, or off-chain data mapping.

In modern data pipeline architecture, the "stream-first" approach has become a solution to the limitations of traditional batch processing, enabling real-time data ingestion, processing, and analysis. Blockchain data service providers are also moving towards building data streams, such as The Graph's Substreams, Goldsky's Mirror, and Chainbase and SubSquid's real-time data lakes.
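The difference from batch processing can be sketched with plain Python generators: each block is processed as it arrives and the aggregate is updated immediately, instead of being recomputed after a full batch has accumulated. The block structure and the function names below are invented for illustration and are not the interface of any of the products named above.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def block_stream(blocks: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for a live feed: yields blocks one at a time as they arrive."""
    for block in blocks:
        yield block


def running_volume(stream: Iterator[dict]) -> Iterator[tuple]:
    """Emit an updated per-address volume snapshot after every block."""
    totals: dict = defaultdict(int)
    for block in stream:
        for tx in block["transactions"]:
            totals[tx["to"]] += tx["value"]
        # Downstream consumers see fresh results without waiting for a batch job.
        yield block["number"], dict(totals)


# Made-up blocks standing in for parsed chain data.
blocks = [
    {"number": 1, "transactions": [{"to": "0xaaa", "value": 2}]},
    {"number": 2, "transactions": [{"to": "0xaaa", "value": 3}, {"to": "0xbbb", "value": 1}]},
]

snapshots = list(running_volume(block_stream(blocks)))
for number, totals in snapshots:
    print(number, totals)
```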

These services are designed to address the need for real-time transaction parsing and comprehensive query capabilities. They support application development and assist in on-chain data analysis through more advanced and mature data sources.

Reframing the challenges of on-chain data from the perspective of modern data pipelines lets us look at data management, storage, and delivery from a fresh angle. If we treat indexers such as subgraphs and Ethereum ETL as data streams within a pipeline rather than as final outputs, we can envision tailoring high-performance datasets to any business use case.

3. AI + Database? In-depth comparison of The Graph, Chainbase, Space and Time

3.1 The Graph

The Graph network provides multi-chain data indexing and query services through a decentralized network of nodes, making it easier for developers to index blockchain data and build decentralized applications. Its main product models are a data query execution market and a data indexing and caching market, both of which serve users' query needs.

Subgraphs are the fundamental data structure of The Graph network. A subgraph defines how to extract data from the blockchain and transform it into a queryable format. Anyone can create a subgraph, and multiple applications can reuse it, which improves data reusability and efficiency.
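As a hedged sketch of how an application might query a subgraph, the snippet below builds the body of a GraphQL request. The arguments `first`, `orderBy`, and `orderDirection` are standard in The Graph's query API, but the endpoint URL, the `transfers` entity, and its fields are hypothetical; a real subgraph exposes whatever entities its own schema defines.

```python
import json

# Hypothetical endpoint; each deployed subgraph has its own query URL.
SUBGRAPH_URL = "https://subgraph.example.invalid/query"

# The `transfers` entity and its fields are assumptions for illustration.
QUERY = """
query RecentTransfers($first: Int!) {
  transfers(first: $first, orderBy: blockNumber, orderDirection: desc) {
    id
    from
    to
    value
  }
}
"""


def build_graphql_body(query: str, variables: dict) -> bytes:
    """Serialize a GraphQL query and its variables as a JSON POST body."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


body = build_graphql_body(QUERY, {"first": 5})
print(body.decode("utf-8"))
```

The body would be sent as an HTTP POST with a `Content-Type: application/json` header; the response comes back as JSON under a top-level `data` key.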

The Graph network has four key roles: indexers, curators, delegators, and developers, who together provide data support for Web3 applications.

The Graph has moved to a fully decentralized network for hosting subgraphs, in which economic incentives among the different participants keep the system running.

The AutoAgora, Allocation Optimizer, and AgentC tools developed by Semiotic Labs improve the ecosystem in several ways, including dynamic pricing, optimal resource allocation, and natural language queries. By integrating AI, these tools make The Graph more intelligent and easier to use.

![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-cf9a002b9b094fbbe3be7f611001b5c1.webp)

3.2 Chainbase

Chainbase is a full-chain data network that integrates all blockchain data into one platform, making it easier for developers to build and maintain applications. Its features include:

  • Real-time Data Lake: Provides a dedicated real-time data lake for blockchain data streams, supporting instant data access.
  • Dual-chain Architecture: The execution layer is built on EigenLayer AVS and runs in parallel with a chain using the CometBFT consensus algorithm, enhancing cross-chain data programmability and composability.
  • Innovative Data Format Standard: Introduces the "manuscripts" data format standard to improve how data in the crypto industry is structured and used.
  • Crypto World Model: Uses AI model technology to build a model that can understand and predict blockchain transactions and interact with them, such as the base model Theia.

Chainbase's AI model Theia is based on NVIDIA's DoRA model. It combines on-chain and off-chain data with spatiotemporal activity to analyze crypto patterns and respond through causal reasoning, mining the potential value of on-chain data in depth.

AI makes Chainbase a more competitive intelligent data service provider, able to offer broader data insights and a more streamlined data processing workflow.

![Reading, Indexing to Analysis, Brief Overview of Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)

3.3 Space and Time

Space and Time (SxT) aims to create a verifiable computing layer that extends zero-knowledge proofs to a decentralized data warehouse, providing trusted data processing for smart contracts, large language models, and enterprises.

SxT introduces Proof of SQL, a zero-knowledge proof technique that makes SQL queries executed on a decentralized data warehouse tamper-proof and verifiable. Proof of SQL generates a cryptographic proof of the integrity and accuracy of each query result, so any verifier can independently confirm that neither the data nor the result has been tampered with.

SxT collaborates with Microsoft AI Innovation Lab to develop generative AI tools, enabling users to process blockchain data through natural language. Space and Time Studio allows users to input natural language queries, and the AI automatically converts them into SQL and executes the queries, presenting the final results.

![Reading, indexing to analysis, brief introduction to the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-97443cbd177ac4ffd1665da670ffbf12.webp)

4. Conclusion and Outlook

Blockchain data indexing technology has improved step by step: from raw node data, through data parsing and indexers, to AI-enabled full-chain data services. At each stage, data access has become more efficient and more accurate, and the user experience more intelligent.

Looking ahead, as technologies such as AI and zero-knowledge proofs mature, blockchain data services will become more intelligent and more secure. As infrastructure, they will continue to play an important role in supporting industry progress and innovation.

![Read, index to analyze, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-0742180b7da8a9dcddafc465a4dba9cb.webp)
