Evolution of Blockchain Data Indexing: From Node to AI Full Chain Service
The Evolution of Blockchain Data Indexing Technology: From Raw Nodes to AI-Powered Full Chain Data Services
1. Introduction
Since the first dApps emerged in 2017, the blockchain application ecosystem has grown steadily richer. Yet when we discuss decentralized applications, how often do we ask where the data they use actually comes from?
In 2024, AI and Web3 became hot topics. In artificial intelligence, data is the lifeblood of a system: it is what the model learns from and reasons over. Without data, even the most sophisticated AI algorithm cannot demonstrate intelligence.
This article will delve into the development of blockchain data accessibility, analyze the evolution of data indexing, and compare the features of data service protocols such as The Graph, Chainbase, and Space and Time, with a particular focus on the innovations of the latter two in integrating AI technology.
2. The Complexity and Simplicity of Data Indexing: From Blockchain Nodes to Full Chain Database
2.1 Data Source: Blockchain Node
Blockchain is regarded as a decentralized ledger, with nodes as its infrastructure, responsible for recording, storing, and disseminating all on-chain transaction data. However, ordinary users face technical and cost challenges in building and maintaining their own nodes. Although theoretically anyone can run a node, in practice users often rely on third-party services.
To solve this problem, RPC node providers have emerged. They manage nodes and provide data through RPC endpoints, allowing users to access blockchain data without having to build their own nodes. Public RPC endpoints are free but have rate limits, while private RPC endpoints perform better but still have room for improvement in efficiency. The standardized API interfaces of node providers lower the barrier to data access, laying the foundation for subsequent data parsing and applications.
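To make the RPC access pattern concrete, here is a minimal sketch of how a client might talk to an Ethereum-style JSON-RPC endpoint using only the Python standard library. The endpoint URL in the usage comment is a placeholder, and the helper names (`build_rpc_payload`, `parse_quantity`, `call_rpc`) are illustrative assumptions, not part of any provider's SDK.

```python
import json
from urllib import request


def build_rpc_payload(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as accepted by Ethereum-style nodes."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def parse_quantity(hex_value):
    """Convert a hex-encoded quantity such as '0x10' into an integer."""
    return int(hex_value, 16)


def call_rpc(endpoint_url, method, params):
    """POST the payload to an RPC endpoint and return the 'result' field."""
    body = json.dumps(build_rpc_payload(method, params)).encode("utf-8")
    req = request.Request(
        endpoint_url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["result"]


# Usage (needs network access and a real endpoint; the URL below is a placeholder):
# latest_block = parse_quantity(call_rpc("https://rpc.example.invalid", "eth_blockNumber", []))
```

Every lookup is a separate request of this shape, which is part of why answering a complex question over raw RPC tends to take many round trips.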
2.2 Data Parsing: From Raw Data to Usable Data
The raw data provided by blockchain nodes is usually encoded (for example, as hex strings and ABI-encoded payloads) rather than human-readable, which makes it difficult to parse. For ordinary users and developers, handling this data directly requires substantial technical knowledge and computing resources.
The data parsing process is crucial as it transforms complex raw data into a format that is easy to understand and manipulate, allowing users to utilize this data more intuitively. The quality of the parsing directly affects the efficiency and effectiveness of data applications, making it a key link in the entire indexing process.
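As an illustration of what parsing involves, the sketch below decodes a raw ERC-20 `Transfer` event log into readable fields. The topic constant is the standard Keccak-256 hash of `Transfer(address,address,uint256)`; the sample log itself, including all addresses and the value, is made up for the example, and `decode_transfer` is a hypothetical helper rather than any library's API.

```python
# Keccak-256 hash of the event signature "Transfer(address,address,uint256)".
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    """Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log):
    """Decode a raw ERC-20 Transfer log into a dict, or return None for other events."""
    topics = log["topics"]
    if len(topics) != 3 or topics[0].lower() != TRANSFER_TOPIC:
        return None
    return {
        "token": log["address"],
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
        "value": int(log["data"], 16),  # the non-indexed uint256 amount
    }


# Made-up sample log in the shape returned by eth_getLogs.
raw_log = {
    "address": "0x" + "aa" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 12 + "11" * 20,
        "0x" + "00" * 12 + "22" * 20,
    ],
    "data": "0x" + format(1500, "064x"),
}

decoded = decode_transfer(raw_log)
```

Here `decoded` holds the sender, recipient, and amount as plain values, which is the kind of output an indexer would then store.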
2.3 The Evolution of Data Indexers
As the amount of blockchain data increases, so does the demand for indexers. Indexers organize on-chain data and send it to databases for convenient querying. They index blockchain data and make it readily available through SQL-like query languages (such as GraphQL APIs), greatly simplifying the data retrieval process.
Different types of indexers optimize data retrieval methods:
Currently, Ethereum archive nodes occupy 3-13.5 TB of storage depending on the client, and this grows as the blockchain grows. To cope with data at this scale, mainstream indexing protocols support multi-chain indexing and provide customizable data parsing frameworks for different application needs, such as The Graph's "subgraph" framework.
Indexers significantly improve data indexing and query efficiency. Compared with traditional RPC endpoints, they can index large amounts of data and serve high-speed queries, and they support complex queries, data filtering, and analysis. Some indexers also aggregate data from multiple blockchains, so a multi-chain dApp does not have to integrate a separate API per chain. By operating in a distributed manner, indexers also offer stronger security and performance and reduce the risk of interruption that a centralized RPC provider may pose.
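The core idea of an indexer, loading decoded on-chain records into a database so they can be queried directly, can be sketched with Python's built-in `sqlite3` module. The table layout, the sample rows, and the helper names are assumptions made for illustration; production indexers use far larger schemas and storage that can hold full 256-bit values.

```python
import sqlite3


def build_index(transfers):
    """Load (block, sender, recipient, value) rows into an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers (block INTEGER, sender TEXT, recipient TEXT, value INTEGER)"
    )
    # A secondary index lets lookups by sender avoid a full table scan.
    conn.execute("CREATE INDEX idx_sender ON transfers (sender)")
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", transfers)
    conn.commit()
    return conn


def total_sent(conn, sender):
    """Sum all values sent by one address; returns 0 when the address is unknown."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM transfers WHERE sender = ?", (sender,)
    ).fetchone()
    return row[0]


# Made-up sample data; values are kept small because SQLite integers are 64-bit.
sample = [
    (100, "0xalice", "0xbob", 50),
    (101, "0xalice", "0xcarol", 25),
    (102, "0xbob", "0xalice", 10),
]
conn = build_index(sample)
```

A question such as "how much has this address sent in total" becomes a single query, `total_sent(conn, "0xalice")`, instead of a scan over many RPC responses.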
![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-587ce87f6dbedee4acec7d939fed6980.webp)
2.4 Full-Chain Database: Moving Toward Stream-First
Using index nodes to query data typically relies on APIs as the sole data portal. However, as projects expand, there is often a need for more flexible data sources, and standardized APIs struggle to meet this demand. With the increasing complexity of application requirements, primary data indexers and their standardized index formats find it difficult to satisfy diverse query needs, such as search, cross-chain access, or off-chain data mapping.
In modern data pipeline architecture, the "stream-first" approach has become a solution to the limitations of traditional batch processing, enabling real-time data ingestion, processing, and analysis. Blockchain data service providers are also moving towards building data streams, such as The Graph's Substreams, Goldsky's Mirror, and Chainbase and SubSquid's real-time data lakes.
These services are designed to address the need for real-time transaction parsing and comprehensive query capabilities. They support application development and assist in on-chain data analysis through more advanced and mature data sources.
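The difference between batch and stream-first processing can be shown with a small Python generator that updates its state as each event arrives instead of waiting for a complete batch. The event tuples and the `stream_balances` name are assumptions for illustration and do not reflect the API of Substreams, Mirror, or any other product mentioned above.

```python
from collections import defaultdict


def stream_balances(events):
    """Consume transfer events one at a time and yield balances after each event."""
    balances = defaultdict(int)
    for sender, recipient, value in events:
        balances[sender] -= value
        balances[recipient] += value
        # Downstream consumers see a fresh snapshot immediately, not at batch end.
        yield dict(balances)


# Made-up events; in a real pipeline this iterator would wrap a live data feed.
events = iter([("a", "b", 5), ("b", "c", 2)])
snapshots = list(stream_balances(events))
```

Each element of `snapshots` is the state after one event, so a consumer can react to the first transfer before the second has even been produced.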
Reframing on-chain data challenges from the perspective of modern data pipelines lets us look at data management, storage, and delivery from a fresh angle. If we treat subgraphs, Ethereum ETL, and other indexers as data flows in a pipeline rather than as final outputs, we can envision customizing high-performance datasets for any business use case.
3. AI + Database? In-depth comparison of The Graph, Chainbase, Space and Time
3.1 The Graph
The Graph network provides multi-chain data indexing and query services through a decentralized network of nodes, making it easier for developers to index blockchain data and build decentralized applications. Its main product models are a data query execution market and a data indexing and caching market, both of which serve users' query needs.
Subgraphs are the fundamental data structure of The Graph network. A subgraph defines how to extract data from the blockchain and transform it into a queryable format. Anyone can create a subgraph, and multiple applications can reuse it, which improves data reusability and efficiency.
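The extract-and-transform role of a subgraph can be approximated in a few lines of Python. Real subgraphs are defined with a manifest, a GraphQL schema, and AssemblyScript mapping handlers; the sketch below is only an analogy, and `MANIFEST`, `handle_transfer`, `run_mappings`, and the event fields are all hypothetical names.

```python
def handle_transfer(event, store):
    """Mapping handler: turn one raw event into a stored, queryable entity."""
    entity_id = f"{event['tx']}-{event['log_index']}"
    store.setdefault("Transfer", {})[entity_id] = {
        "id": entity_id,
        "from": event["from"],
        "to": event["to"],
        "value": event["value"],
    }


# The "manifest" declares which events are of interest and who handles them.
MANIFEST = {"Transfer": handle_transfer}


def run_mappings(events, manifest):
    """Route each event to its handler; events with no handler are ignored."""
    store = {}
    for event in events:
        handler = manifest.get(event["name"])
        if handler is not None:
            handler(event, store)
    return store


# Made-up events for illustration.
events = [
    {"name": "Transfer", "tx": "0xabc", "log_index": 0, "from": "0x1", "to": "0x2", "value": 7},
    {"name": "Approval", "tx": "0xabc", "log_index": 1},
]
store = run_mappings(events, MANIFEST)
```

Because the manifest and handlers are plain data and functions, the same definition can be reused by any application that wants the resulting entities, which mirrors the reuse property described above.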
The Graph network consists of four key roles: indexers, curators, delegators, and developers, working together to provide data support for web3 applications.
The Graph has shifted to a fully decentralized subgraph hosting service, in which economic incentives among the different participants keep the system running.
The AutoAgora, Allocation Optimizer, and AgentC tools developed by Semiotic Labs improve the ecosystem in several ways, including dynamic pricing, optimal resource allocation, and natural language queries. By integrating AI, these tools have made The Graph more intelligent and easier to use.
![Reading, indexing to analysis, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-cf9a002b9b094fbbe3be7f611001b5c1.webp)
3.2 Chainbase
Chainbase is a full-chain data network that integrates all blockchain data into one platform, making it easier for developers to build and maintain applications. Its features include:
Chainbase's AI model, Theia, is built on NVIDIA's DoRA model. It combines on-chain data, off-chain data, and spatiotemporal activity to analyze crypto patterns and respond through causal reasoning, mining the potential value of on-chain data in depth.
AI capabilities make Chainbase a more competitive intelligent data service provider, able to deliver broader data insights and to optimize its data processing pipeline.
![Reading, Indexing to Analysis, Brief Overview of Web3 Data Indexing Track](https://img-cdn.gateio.im/webp-social/moments-b343cab5112c1a3d52f4e72122ae0df2.webp)
3.3 Space and Time
Space and Time (SxT) aims to create a verifiable computing layer that brings zero-knowledge proofs to a decentralized data warehouse, providing trusted data processing for smart contracts, large language models, and enterprises.
SxT introduces Proof of SQL, a zero-knowledge proof technique that makes SQL queries executed on a decentralized data warehouse tamper-proof and verifiable. Proof of SQL generates a cryptographic proof of the integrity and accuracy of each query result, so any verifier can independently confirm that the data has not been tampered with.
SxT collaborates with the Microsoft AI Innovation Lab to develop generative AI tools that let users work with blockchain data through natural language. In Space and Time Studio, users enter a natural language query, and the AI converts it into SQL, executes it, and presents the results.
![Reading, indexing to analysis, brief introduction to the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-97443cbd177ac4ffd1665da670ffbf12.webp)
Conclusion and Outlook
Blockchain data indexing technology has evolved step by step: from raw node data sources, through data parsing and indexers, to AI-enabled full-chain data services. At each stage, data access has become more efficient and accurate, and the user experience more intelligent.
In the future, as technologies such as AI and zero-knowledge proofs mature, blockchain data services will become more intelligent and more secure. As infrastructure, they will continue to play an important role in supporting industry progress and innovation.
![Read, index to analyze, a brief overview of the Web3 data indexing track](https://img-cdn.gateio.im/webp-social/moments-0742180b7da8a9dcddafc465a4dba9cb.webp)