Why We Need a Blockchain-Native Store
As the number of users and transaction volume increase, the amount of data that each Validator and Full Node must manage continues to grow. For example, as of Nov 2024, Ethereum has nearly 290 million addresses (Figure 2), and an Ethereum Full Node must synchronize about 1TB of data (Figure 3).
This continuous state growth, known as state bloat, causes the following issues:
1. Performance Degradation: Most L1 and L2 networks incur significant costs in retrieving the state of target accounts and contracts during contract execution, and the retrieval efficiency decreases as the data volume increases. Larger data volumes prevent typical validators from caching all the state data required for transaction execution. Particularly in scenarios involving billions of users, most transaction executions require disk access, significantly increasing database read times, leading to performance degradation and higher latency. This is one of the reasons why most L1 and L2 networks perform worse in public environments than in experimental settings.
2. Network Scalability Limitations: Synchronizing an Ethereum node, for example, often takes several days or even weeks, which severely limits how easily new nodes can join the network.
3. Resource Waste: Using databases like LevelDB and RocksDB to store merklized state results in significant write amplification and bandwidth consumption, particularly in MPT and IAVL trees.
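The write amplification described in item 3 can be illustrated with a minimal sketch. It assumes a fixed-depth binary Merkle tree (not the actual MPT or IAVL layout) whose nodes are stored under their own hash in a dictionary that stands in for LevelDB or RocksDB. The class name and the `writes` counter are hypothetical and exist only for illustration.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class HashAddressedMerkleTree:
    """Fixed-depth binary Merkle tree; every node is stored under its own hash."""

    def __init__(self, depth: int):
        self.depth = depth
        self.db = {}     # stand-in for the key-value store: node hash -> node bytes
        self.writes = 0  # key-value writes issued by update()
        # Build the all-empty tree bottom-up (one distinct node per level).
        node = h(b"")
        self.db[node] = b""
        for _ in range(depth):
            body = node + node
            node = h(body)
            self.db[node] = body
        self.root = node

    def _put(self, body: bytes) -> bytes:
        key = h(body)
        self.db[key] = body
        self.writes += 1
        return key

    def update(self, index: int, value: bytes) -> None:
        # Walk from the root to the leaf, remembering the sibling at each level.
        siblings = []
        node = self.root
        for level in range(self.depth):
            body = self.db[node]
            left, right = body[:32], body[32:]
            went_right = (index >> (self.depth - 1 - level)) & 1
            if went_right:
                siblings.append((left, True))
                node = right
            else:
                siblings.append((right, False))
                node = left
        # Write the new leaf, then re-hash and re-write every ancestor.
        node = self._put(value)
        for sibling, went_right in reversed(siblings):
            body = sibling + node if went_right else node + sibling
            node = self._put(body)
        self.root = node


tree = HashAddressedMerkleTree(depth=20)
tree.update(5, b"balance=1")
print(tree.writes)  # 21: one leaf plus 20 ancestors for a single changed value
```

In this model a single changed leaf produces `depth + 1` new key-value pairs, and the old nodes remain in the store as the previous version, which is one way to picture why merklized state amplifies writes before the database's own compaction is even considered.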
Most current L1 and L2 solutions still rely on verifiable storage architectures, but their inefficient querying and merklization performance severely limit overall blockchain throughput. The combination of MPT (Merkle Patricia Tree) and LSM Store, for instance, faces three performance issues: Long I/O Paths, Hash-Based Addressing, and State Bloat.
As illustrated in Figure 5, the existing WorldState is stored in an LSM Store in a Merklized State data format, with its Root value anchored in the block header. The account and contract data required for each transaction execution are stored in the leaf nodes of the Merkle Tree. Indexing these requires loading all nodes along the path from the root to the leaf, resulting in excessively long data access paths, increased I/O latency, and reduced data reading efficiency.
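The root-to-leaf access path can be sketched as follows. This is a simplified model, not the real MPT encoding: it assumes a 16-way trie with a fixed path of `PATH_NIBBLES` hex digits, no extension nodes, JSON-encoded branches, and a dictionary in place of the LSM store. All function names are hypothetical.

```python
import hashlib
import json

PATH_NIBBLES = 8  # assumption: real MPT paths are 64 nibbles, shortened by extension nodes


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def account_path(address: str) -> str:
    # The trie path is derived from the hash of the account address.
    return sha(address.encode())[:PATH_NIBBLES]


def insert(trie: dict, address: str, value: bytes) -> None:
    node = trie
    path = account_path(address)
    for nibble in path[:-1]:
        node = node.setdefault(nibble, {})
    node[path[-1]] = value


def commit(node, db: dict) -> str:
    """Store the in-memory trie bottom-up under hash keys; return the node's key."""
    if isinstance(node, dict):
        children = {nibble: commit(child, db) for nibble, child in node.items()}
        encoded = b"B" + json.dumps(children, sort_keys=True).encode()
    else:
        encoded = b"L" + node
    key = sha(encoded)
    db[key] = encoded
    return key


def lookup(root: str, address: str, db: dict):
    """Return (value, number of store reads) for one account."""
    reads = 0
    key = root
    for nibble in account_path(address):
        branch = json.loads(db[key][1:])  # one store read per branch on the path
        reads += 1
        key = branch[nibble]
    leaf = db[key]  # final read for the leaf itself
    reads += 1
    return leaf[1:], reads


trie = {}
for name in ("alice", "bob", "carol"):
    insert(trie, name, f"balance:{name}".encode())
db = {}
root = commit(trie, db)
value, reads = lookup(root, "bob", db)
print(value, reads)  # b'balance:bob' 9
```

Every lookup costs `PATH_NIBBLES + 1` store reads in this model, regardless of how few accounts exist, and each of those reads may itself touch several levels of the underlying database.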
On the Ethereum mainnet, the Merklized State data volume ranges between 500GB and 1TB (after archiving). Storing this data in databases like RocksDB or LevelDB results in about five levels of compaction. Due to the randomness of hashing and the impact of the compaction mechanism, reading a single Merkle node may require tens of disk accesses. In actual testing, we found that loading a single account typically requires dozens to hundreds of disk accesses. Even more critically, this delays the execution of individual transactions, which is one reason why Parallel EVM cannot effectively utilize multi-core capabilities to improve throughput.
MPT and IAVL’s Merklized State Data typically use hash-based addressing. While this method efficiently manages multi-version and verifiable state data, it also results in high I/O bandwidth and read amplification, leading to increased system resource consumption (e.g., during compaction). This is one reason why the performance of many parallel EVMs differs significantly between small test data sets and real blockchain environments. Additionally, this model is not well-suited to pruning and other data management operations.
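One way to picture the cost of hash-based addressing is to compare how many storage blocks a batch of related accounts touches under the two key schemes. The sketch below is an illustrative model under stated assumptions: keys are laid out in sorted order (as in an LSM table), a block holds `BLOCK_SIZE` keys, and the account naming and sizes are invented for the example.

```python
import hashlib

BLOCK_SIZE = 64         # assumed number of keys per on-disk block
NUM_ACCOUNTS = 100_000  # assumed state size for the illustration


def blocks_touched(store_keys, wanted) -> int:
    """Count distinct blocks holding `wanted` when `store_keys` is stored in sorted order."""
    position = {key: i for i, key in enumerate(sorted(store_keys))}
    return len({position[key] // BLOCK_SIZE for key in wanted})


accounts = [f"acct-{i:09d}" for i in range(NUM_ACCOUNTS)]
hashed = {a: hashlib.sha256(a.encode()).hexdigest() for a in accounts}

# A batch of 100 neighbouring accounts, e.g. touched by one block of transactions.
batch = accounts[5000:5100]

sequential_blocks = blocks_touched(accounts, batch)
hashed_blocks = blocks_touched(hashed.values(), [hashed[a] for a in batch])
print(sequential_blocks, hashed_blocks)
```

With ordered keys the batch sits in two adjacent blocks, while the hashed keys scatter across many unrelated blocks, so far more data must be read to serve the same 100 accounts.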
As the number of users and the transaction volume grow, the space occupied by the world state also increases. Since the memory of Validators is limited, cache hit rates continue to drop, leading to declining read/write efficiency and further reducing the node’s “parallelism.” In Pharos’ tests, this issue becomes particularly apparent when the number of users reaches the billion level, posing a significant obstacle to the mass adoption of Web3.
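The relationship between state size and cache hit rate can be sketched with a small simulation. It assumes an LRU cache of fixed capacity and uniformly random account access, which is a pessimistic simplification (real workloads are skewed toward hot accounts); the function name and parameters are hypothetical.

```python
import random
from collections import OrderedDict


def hit_rate(state_size: int, cache_size: int, accesses: int = 50_000, seed: int = 0) -> float:
    """Fraction of accesses served from an LRU cache under uniform random access."""
    rng = random.Random(seed)
    cache = OrderedDict()
    hits = 0
    for _ in range(accesses):
        key = rng.randrange(state_size)
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / accesses


for size in (1_000, 10_000, 100_000, 1_000_000):
    print(size, round(hit_rate(size, cache_size=5_000), 3))
```

With the cache capacity held constant, the hit rate falls roughly in proportion to `cache_size / state_size` once the state no longer fits in memory, so each further order of magnitude of growth pushes more reads to disk.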
As state bloat worsens, the efficiency of starting new nodes continues to decline, further impacting network scale and decentralization.
To build a decentralized, verifiable, and secure blockchain network, we believe that the blockchain model must continue to adhere to the principles of authentication and multi-version storage, for the following reasons:
For light clients and dApps, quick access to Simple Payment Verification (SPV) enables trusted ledger information to be displayed, preventing malicious nodes from tampering with data.
For other nodes within the blockchain network, authentication allows for rapid anchoring of the world state, protecting against malicious nodes spreading fake ledger information, thus enhancing network security.
Authentication ensures that transaction records and account states cannot be tampered with, guaranteeing the authenticity and validity of every transaction, thereby increasing user trust in the network.
A version-based ledger design is more disk- and database-friendly, effectively reducing bandwidth and disk I/O pressure while improving transaction concurrency and throughput.
With the authentication feature, multiple full nodes across the network can quickly synchronize with the same world state, improving network robustness and decentralization.
For networks that may experience temporary forks, such as Ethereum, multi-version storage allows for quick rollbacks to previous versions, ensuring reliability and reducing the impact of forks on the network.
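The rollback behaviour of multi-version storage can be sketched as follows. This is a minimal in-memory model, assuming one version per committed block and a plain dictionary of per-key histories; `VersionedStore` and its methods are hypothetical names, not the Pharos Store API.

```python
class VersionedStore:
    """Keeps every committed value of each key, tagged with the block version."""

    def __init__(self):
        self.history = {}  # key -> list of (version, value), in ascending version order
        self.version = 0

    def commit(self, writes: dict) -> int:
        """Apply one block's writes as a new version and return its number."""
        self.version += 1
        for key, value in writes.items():
            self.history.setdefault(key, []).append((self.version, value))
        return self.version

    def get(self, key, version=None):
        """Read the value visible at `version` (default: latest)."""
        version = self.version if version is None else version
        for ver, value in reversed(self.history.get(key, [])):
            if ver <= version:
                return value
        return None

    def rollback(self, version: int) -> None:
        """Discard every write made after `version`, e.g. after a fork is abandoned."""
        for key in list(self.history):
            kept = [(ver, val) for ver, val in self.history[key] if ver <= version]
            if kept:
                self.history[key] = kept
            else:
                del self.history[key]
        self.version = version


store = VersionedStore()
store.commit({"alice": 100})             # block 1
store.commit({"alice": 70, "bob": 30})   # block 2, later orphaned by a fork
store.rollback(1)
print(store.get("alice"), store.get("bob"))  # 100 None
```

Because older versions are kept rather than overwritten, reverting to an earlier block is a matter of dropping the newer entries instead of recomputing state.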
Pharos believes that authentication and multi-version storage are critical for maintaining the reliability, decentralization, and fast consensus of blockchain networks, particularly in large-scale heterogeneous networks. Without these technologies, networks would face longer consensus delays, lower reliability, and reduced decentralization, making it difficult to meet the demands of decentralized applications at scale in Web3.
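The authentication property relied on by light clients can be illustrated with a Merkle inclusion proof. The sketch below assumes a plain binary Merkle tree over SHA-256 with a power-of-two number of leaves; it is not the MPT proof format used by Ethereum, and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    if len(leaves) == 0 or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch requires a power-of-two number of leaves")
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index: int):
    """Collect the sibling hash at each level, noting whether it sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof) -> bool:
    """Recompute the root from the leaf and its proof; needs no access to the full state."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


accounts = [b"alice:100", b"bob:30", b"carol:7", b"dave:0"]
levels = build_levels(accounts)
root = levels[-1][0]          # the value a block header would anchor
proof = prove(levels, 2)

print(verify(root, b"carol:7", proof))     # True
print(verify(root, b"carol:7000", proof))  # False: tampered balance is rejected
```

A client that trusts only the root can check any single account with a proof whose length grows with the tree depth rather than with the total state size, which is why a malicious node cannot present altered ledger data undetected.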
Moreover, as database, OS, compiler, and network technologies continue to advance, blockchain systems can significantly improve their scalability without sacrificing decentralization. With the aid of these technologies, blockchain networks can manage and process large-scale data more efficiently, optimizing resource utilization and achieving higher throughput and lower latency.
In conclusion, solving the blockchain storage problem is key to building a large-scale, usable Web3 infrastructure. To this end, Pharos has introduced and implemented the first Blockchain Native Store, as detailed in "Pharos Store: The Future of Blockchain Storage."