EthStorage Founder: Data Availability and Decentralized Storage
Introduction
This is the last installment of the Decentralized Rollup interview series. This episode explores rollup decentralization from the perspective of "Data Availability and Decentralized Storage". We invited Qi Zhou, founder of EthStorage, to discuss how DA can reuse the security properties of the Ethereum mainnet, EIP-4844 and danksharding, and how the security of different DA models compares. Dr. Zhou also explained how EthStorage will build on EIP-4844 in the next Ethereum upgrade.
Guest introduction
I am very happy to share some of our thinking on Ethereum DA technology and the decentralized storage we are building on top of it. I joined the Web3 industry full-time in 2018. Before that I worked as an engineer at large companies such as Google and Facebook, and I hold a PhD from the Georgia Institute of Technology. Since 2018 I have been following and working on Web3 infrastructure, largely because I had already worked on related problems at those companies, including distributed systems and distributed storage, and because I believe blockchains still have a lot of room for improvement in this area. The work we did from the beginning, such as execution sharding (Ethereum's sharding 1.0), the data sharding of Ethereum's sharding 2.0, and the subsequent data availability work, are all innovations built around Web3 infrastructure.
We have been closely following the Ethereum roadmap, studying it, and participating in and improving it in this community-driven way. At the end of last year we were honored to receive a grant from the Ethereum Foundation for our research on data availability sampling, helping the Foundation with theoretical and research work on danksharding, including how to recover data efficiently. At the same time, we are developing EthStorage, an Ethereum data layer built on Ethereum's DA technology, which uses Ethereum smart contracts to verify large-scale off-chain data storage. This is very meaningful for Ethereum, so I am happy to share today how EthStorage can build a data storage layer network on top of DA technology.
Interview section
Part 1: Discussion on DA Definition
How Data Availability (DA) Keeps Rollups Safe
First of all, while researching DA I found that many people do not really understand its definition, so I am glad to discuss it today. Before this, I also discussed DA, and the important role it plays in Ethereum L2, with many members of the Ethereum Foundation, such as Dankrad Feist.
I mentioned the basic working mechanism of an Ethereum rollup: move transaction execution from the chain to off-chain, and then use proofs (fraud proofs or validity proofs) to convince the L1 smart contract that the execution results are correct.
A very important core idea is that rollups want to reuse the security of the Ethereum network itself while greatly expanding Ethereum's computing capacity. As I just said, the expansion of computing capacity comes from taking on-chain computation off-chain, so the question is how Ethereum's security can be preserved at the same time.
Take Optimistic Rollups as an example: how do we ensure that someone can challenge a sequencer that misbehaves? It is essential to know exactly what the original off-chain transactions look like. If those original off-chain transactions are not available, I cannot find the original transaction records needed to challenge the sequencer on-chain. DA guarantees security precisely because it makes the data of every off-chain transaction available on-chain.
Expand block space
All transaction data must still be posted to the chain; even when no computation is required on-chain, rollups still generate a huge amount of transaction data. So the core problem to solve, as everyone can understand it, is how to expand the block space effectively. If you understand the structure of a blockchain, each block contains many transactions, and the space a block provides for this transaction content is what we call block space.
Currently each Ethereum block provides roughly 200 KB of space, and that number clearly cannot meet the needs of Ethereum's next stage of scaling. A quick calculation: divide 200 KB of block space by an average transaction size of about 100 bytes, and you get roughly 2,000 transactions per block. Divide 2,000 transactions by Ethereum's 12-second block time, and the TPS ceiling comes out to only one to two hundred. That is a very small number for the Ethereum scaling roadmap.
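As a minimal sketch of that back-of-the-envelope estimate (the block size and average transaction size are the rough figures quoted above, not exact protocol constants):

```python
# Back-of-the-envelope TPS estimate from the figures quoted above.
# These are rough, illustrative numbers, not protocol constants.
block_space_bytes = 200 * 1024   # ~200 KB of usable block space
avg_tx_size_bytes = 100          # ~100 bytes per transaction
block_time_seconds = 12          # Ethereum post-merge slot time

txs_per_block = block_space_bytes // avg_tx_size_bytes   # ~2,000 transactions
tps_ceiling = txs_per_block / block_time_seconds          # well under 200 TPS

print(f"~{txs_per_block} txs per block, ~{tps_ceiling:.0f} TPS ceiling")
```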
So what Ethereum L2 cares about is how to fit a large amount of data into the block space while preserving security. Then, whether a fraud proof or a validity proof is used, the data in Ethereum's block space can be reused for the corresponding checks, and ultimately Ethereum guarantees the correctness of the results of off-chain transaction execution. That is essentially the relationship between DA and Ethereum's security.
Understanding DA in terms of network bandwidth cost and storage cost
DA has two main costs: the cost of network bandwidth and the cost of storage.
In terms of network bandwidth, Bitcoin and Ethereum currently broadcast blocks over the P2P network by gossip, sending each new block to all P2P nodes to announce that it exists. The advantage of this approach is that it is very secure: every network node eventually receives a copy.
The downside is the large overhead in network bandwidth and latency. Ethereum produces a block every 12 seconds after the PoS upgrade, so if a block is too large and takes more than 12 seconds to propagate, blocks cannot be produced on time and the usable throughput of the network drops to an unacceptable level. You can think of DA as a solution to this bandwidth problem of putting large amounts of data on the blockchain.
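A minimal sketch of why block size is bounded by the slot time; the per-hop bandwidth and hop count below are assumed values for illustration only, not measured network figures:

```python
# Rough propagation-time estimate for a gossiped block.
# The bandwidth and hop-count figures are illustrative assumptions.
def propagation_seconds(block_bytes: int, hops: int = 5,
                        bandwidth_bytes_per_s: float = 10e6) -> float:
    """Time for a block to traverse `hops` gossip hops at the given per-hop bandwidth."""
    return hops * block_bytes / bandwidth_bytes_per_s

for size_kb in (200, 2_000, 32_000):          # 200 KB, 2 MB, 32 MB
    t = propagation_seconds(size_kb * 1024)
    verdict = "fits" if t <= 12 else "exceeds"
    print(f"{size_kb:>6} KB -> ~{t:.1f} s ({verdict} the 12 s slot)")
```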
The second cost is storage. The Ethereum Foundation has discussed this at length: in the core design, the block data uploaded through DA is not kept forever.
This leads to another question: if I put so much data on the chain but the Ethereum protocol discards it after a week or two, do we have better decentralized solutions for preserving that DA data?
That was one of our original motivations for designing EthStorage. First, many rollups need to keep their data for a longer period. Second, with this data available, DA can be used to build fully on-chain applications: fully on-chain NFTs, the front ends of many dApps, even large volumes of articles or comments people write on social networks. All of these can be uploaded through the DA network at a lower cost while obtaining the same security as Ethereum L1.
After researching Ethereum's DA technology and discussing it with many Ethereum core contributors, we concluded that Ethereum needs a storage layer here: a decentralized storage layer that does not require a protocol upgrade of Ethereum itself, what we call a modular storage layer, to solve the problem of long-term data storage.
Part 2: Discussion on different DA schemes
The relationship between EIP-4844 and Danksharding, and why EIP-4844 needs to be deployed
Proto-danksharding, also called EIP-4844, can be regarded as the next major upgrade of Ethereum. There is a very important reason for doing 4844: when the Ethereum Foundation estimated the sharding upgrade route, that is, the timeline for Danksharding, they thought the whole upgrade would take quite long, perhaps three to five years. That was around 2020 or 2021.
They also predicted that many rollups would soon be running on Ethereum, but the data interface Danksharding provides is completely different from the calldata interface rollups currently use. Because of the new interface, a large number of Ethereum applications would be unable to upgrade quickly and seamlessly obtain the benefits Danksharding brings.
At Devcon last year, Vitalik also mentioned that he hoped Ethereum could serve these Layer 2s better, letting them develop their contracts against the same interface Danksharding will use. When Danksharding ships, they can directly inherit its new benefits without upgrading their existing, already-tested contracts.
So EIP-4844 is essentially a greatly simplified version of Danksharding. It provides the same application interface as Danksharding, including a new opcode that returns the data hash, and a new data object, the Binary Large Object, or blob.
These data objects are designed to make rollups compatible in advance with the data structures Danksharding will provide; in other words, Danksharding will offer the same data hash and blob concepts. Through EIP-4844, these ideas are implemented ahead of time in the next Ethereum upgrade. So if you look at EIP-4844's design, its interfaces, the precompile and the newly added instruction, you can already get a rough picture of how applications will interact with Danksharding on Ethereum in the future.
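For concreteness, EIP-4844 defines the data hash exposed to contracts as a versioned hash of the blob's KZG commitment. A minimal Python sketch of that derivation follows; the 48-byte commitment here is a placeholder, not a real KZG commitment:

```python
import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(kzg_commitment: bytes) -> bytes:
    """EIP-4844 versioned hash: 0x01 || sha256(commitment)[1:]."""
    assert len(kzg_commitment) == 48   # compressed BLS12-381 G1 point
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(kzg_commitment).digest()[1:]

# Placeholder commitment for illustration only (not a valid KZG commitment).
fake_commitment = bytes(48)
print(kzg_to_versioned_hash(fake_commitment).hex())
```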
Here Ethereum is also thinking from the application's point of view: which upgrades can be made in advance so that applications can enjoy Ethereum's various scaling technologies without paying additional upgrade costs.
But there is one problem EIP-4844 does not solve: expanding the block space itself, which Danksharding does solve. The current Ethereum block space is about 200 KB; after Danksharding, the planned size in the specification is 32 MB, more than a 100-fold increase. So EIP-4844 by itself does not solve the blockchain's bandwidth problem at the block level.
How Danksharding solves the problem of block space expansion
Under the 4844 design, on-chain data is still broadcast the same way calldata is today, through the P2P network, and that broadcasting is ultimately limited by the physical bandwidth bottleneck of the P2P network. Danksharding changes the P2P broadcasting and uses data availability sampling, so that nodes do not need to download all of the block data yet can still be confident the block data is available for download.
In a sense it is a bit like a ZK approach: through data sampling, I know the network holds the 32 MB of block data per block that Danksharding introduces, but I do not need to download all 32 MB and store it locally. A node with enough bandwidth and storage can still do so, but an ordinary validator does not need to download all 32 MB of data.
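A simplified sketch of the sampling intuition: with erasure coding, a block whose data is less than half available cannot be reconstructed, so against a withholding attacker each random sample has at most a 1/2 chance of succeeding, and confidence after k successful samples is at least 1 - 2^-k. This toy model ignores the structure of the real 2D KZG scheme:

```python
# Simplified data-availability-sampling confidence model.
# Assumes a withholding attacker can make each random sample succeed with
# probability at most 1/2 (the erasure-coding threshold); the actual
# danksharding design uses a 2D KZG construction with more structure.
def confidence_after_samples(k: int) -> float:
    return 1 - 0.5 ** k

for k in (8, 16, 30, 75):
    print(f"{k:>3} samples -> confidence >= {confidence_after_samples(k):.10f}")
```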
Development experience on the EIP-4844 testnet
We recently ran our internal EIP-4844 testnet and deployed the corresponding contracts for testing; blob data upload, contract calls, and data verification have all been exercised. So once EIP-4844 goes live, we can deploy our contracts as soon as possible.
At the same time, through our ongoing cooperation with Ethereum developers and the contracts we have already built, we hope to provide reference material, learning resources, and tooling for the various rollups developing on Ethereum.
We have recently contributed a lot of code to the EIP-4844 tool set for Ethereum, including new smart contracts that support the new opcode, because Solidity does not yet support the data hash opcode. All of this work is being synchronized with developers at the Ethereum Foundation.
Applications and limitations of the Data Availability Committee (DAC)
More than 90% of the fees L2 users pay go to data availability. To reduce the cost of posting data, many L2 projects provide their own data layer through a Data Availability Committee (DAC): ZKSync launched ZKPorter, and Arbitrum built Arbitrum Nova.
A data committee introduces additional trust assumptions, so it cannot reach the same level of security as Ethereum. When projects select a data committee, they usually choose well-known data service providers or large companies to participate in preserving the data. But this draws many challenges and doubts, because it arguably violates the permissionless principle of decentralization, the idea that anyone can participate. In practice, most data committees consist of a few organizations very close to the Layer 2 project team.
Take Arbitrum Nova: the last time I looked, there were probably six or seven such nodes, data committee nodes running on Google's or Amazon's cloud that preserve the data and serve everything executed on Arbitrum Nova. The advantage is that Nova's execution cost is roughly one-thousandth of Ethereum's, because it does not need to write all the data to Ethereum Layer 1. But it is still fairly centralized, so higher-value applications will have more concerns: if tens or hundreds of millions of dollars are at stake, users must trust that the data committee's data remains available.
When we designed EthStorage, we deliberately avoided any notion of a data committee. We want anyone to be able to participate and become a data provider, using cryptographic proofs to show that they have actually stored the data. With the data committee model, in theory, even if I claim to have seven or eight committee nodes, there may be only one physical copy of the data, with seven or eight addresses all claiming to be able to serve it.
So how do you prove the data has enough physical copies to keep it safe? That is a very important innovation in EthStorage, and it is what we emphasized when presenting to the Ethereum Foundation's ESP (Ecosystem Support Program). EthStorage uses ZK cryptographic techniques to protect the nodes that serve Layer 2 data: they can join without permission, prove how many copies they actually store, and thereby better guarantee the safety of the data.
So I think a DAC is only a temporary answer to the cost of uploading data to Layer 1. We believe that with EthStorage's cryptographic techniques, combined with proof verification in Layer 1 contracts on Ethereum, we can provide a better data storage solution. With the launch of EIP-4844 on Ethereum, we will actively share these innovations, and the results of running them on the network, with everyone.
Difference between EthStorage and DAC
EthStorage is essentially an Ethereum storage rollup. Imagine a Layer 2 that implements not the Ethereum EVM but a very large database, a key-value database, which can reach 10 TB, hundreds of TB, or even thousands of TB, that is, a PB-scale database.
Then how do we ensure the data in this database gets the same security as Ethereum? The first step is to publish all of this large-scale data to Ethereum Layer 1 through DA, so that everyone can see the data is available in Ethereum's DA layer. But we cannot guarantee it remains retrievable forever, because Ethereum's DA will discard the data after roughly two to four weeks.
The second step comes after the data is uploaded: we also store it on our Layer 2 nodes. Unlike a DAC, our data storage nodes are permissionless; anyone can participate, prove their storage, and earn the corresponding rewards. This is done through a proof-of-storage mechanism we designed, which of course draws inspiration from proof-of-storage systems such as Filecoin and Arweave. However, we need a network and a proof system built for Ethereum's DA framework and Ethereum smart contracts to carry out the storage proofs. In that respect, we believe we make a unique contribution to the Ethereum ecosystem, and even to decentralized storage as a whole.
Mechanism of Proof of Storage
Basically, all proof-of-storage mechanisms, including Filecoin and Arweave, first encode the user's data. The encoding depends on the data provider's address: each provider has its own address and encodes the data against that address, storing a unique replica. For example, in a traditional centralized database or distributed system, the data "hello world" might be stored on four or five different physical machines, each holding the same "hello world". In EthStorage, whether there are five copies or twenty, each provider's "hello world" is encoded into different data according to that provider's address and then stored in different places.
The advantage is that we can use cryptographic mechanisms to prove that there really are this many different addresses, that is, different storage providers, each of which encoded the data and produced a storage proof over its encoded copy. Filecoin and Arweave are broadly similar, but they target static data, whereas we target the hot data of Ethereum DA, and we can verify through an Ethereum smart contract that this many physical copies of the data exist. In other words, for each encoded copy we prove it is stored in the network, and each encoded copy is different because it is derived from a different storage provider's address.
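A toy illustration of the unique-replica idea: each provider derives a different encoded copy of the same data from its own address. This is not EthStorage's actual encoding scheme; the hash choice and masking below are illustrative assumptions only:

```python
import hashlib

def encode_replica(data: bytes, provider_address: bytes) -> bytes:
    """Mask `data` with a hash-derived stream keyed by the provider's address.
    Toy scheme only; hashlib.sha3_256 stands in for a real, ZK-friendly encoding."""
    mask = bytearray()
    counter = 0
    while len(mask) < len(data):
        mask += hashlib.sha3_256(provider_address + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ m for b, m in zip(data, mask))

data = b"hello world"
alice = bytes.fromhex("11" * 20)   # placeholder 20-byte addresses
bob   = bytes.fromhex("22" * 20)

replica_a = encode_replica(data, alice)
replica_b = encode_replica(data, bob)
assert replica_a != replica_b                       # physically different copies
assert encode_replica(replica_a, alice) == data     # XOR masking is reversible
```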
So in the design process we optimize and improve on existing decentralized storage ideas, but we also have to do a lot of work specific to Ethereum's DA solution, including supporting modification of dynamic data and proving storage efficiently while keeping gas costs low in Ethereum contracts. There is a lot of cutting-edge technology and research to be done.
How EthStorage maintains permissionless Proof of Storage
Ethereum has a kind of node called an archive node, which stores the full history of all transactions in Ethereum, including the world state. A huge challenge under Danksharding is that the plan will generate roughly 80 TB of data per year, so after three to four years of operation Ethereum will have accumulated a few hundred TB of data, and it keeps growing. This poses real challenges for archive nodes, because there is no additional token economy to incentivize people to keep this data while running an archive node.
EthStorage first needs to solve the token-incentive problem of storing data permanently. Here we adopted a discounted cash flow model similar to Arweave's to fund the incentives, and made it efficient to execute inside a smart contract.
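A minimal sketch of the discounted cash flow intuition behind funding perpetual storage with a one-time payment; the cost and decline-rate figures are assumptions for illustration, not EthStorage's actual parameters:

```python
# One-time payment funding "perpetual" storage, assuming the cost per GB-year
# keeps declining at a fixed annual rate. All numbers are illustrative assumptions.
def perpetual_storage_cost(cost_per_gb_year: float, annual_decline: float) -> float:
    """Geometric series: cost * (1 + (1-d) + (1-d)^2 + ...) = cost / d."""
    return cost_per_gb_year / annual_decline

# e.g. $0.01 per GB-year today, with costs falling 10% per year
print(perpetual_storage_cost(0.01, 0.10))   # -> $0.10 per GB, paid up front
```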
The second aspect is the permissionless approach. Our incentive design encourages 10, 50, or even 100 nodes to store the data in the network, so any new node can contact any of them, synchronize the corresponding data, and then become a data storage party itself. There may also be further optimized designs for additional data incentives.
Third, if a storage node had to hold all the data at once, that could be hundreds of TB or even PB-scale in the long run, which is far too costly for a single node. So we go one step further with data sharding: an ordinary node only needs about 4 TB of capacity (our current design uses 4 TB, which may be raised to 8 TB in the future) to store part of the archived data in the network, and incentive mechanisms ensure that, taken together, all of the data is preserved across our Layer 2 network.
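A quick sanity check of how the 4 TB shard size relates to the projected data growth mentioned earlier; the 80 TB/year figure is the estimate quoted above, and the replica count per shard is an assumed illustrative parameter, not an EthStorage setting:

```python
# How many 4 TB shards (and storage nodes) the network needs as DA data accumulates.
import math

annual_data_tb = 80             # danksharding estimate quoted above
shard_size_tb = 4               # per-node capacity in the current design
replicas_per_shard = 8          # assumption, not an EthStorage parameter

for years in (1, 3, 5):
    total_tb = annual_data_tb * years
    shards = math.ceil(total_tb / shard_size_tb)
    print(f"year {years}: ~{total_tb} TB -> {shards} shards, "
          f"~{shards * replicas_per_shard} nodes at {replicas_per_shard} replicas/shard")
```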
So there are many problems here: the sheer volume of data faced by archive nodes, token incentives, decentralized access, and so on. We solve them automatically through smart contracts deployed on Ethereum Layer 1. For us, it is enough to provide a data network in which anyone with sufficient storage capacity can download the data, generate a storage proof, submit it to the Ethereum network, and earn the corresponding reward. Our entire contract design is essentially complete, and we have started debugging on Ethereum's 4844 devnet.