Blockchain technologies have recently gained popularity. Blockchains have become an integral part of operations in many business sectors and have ushered in the age of cryptocurrency, NFTs, and DeFi.
Yet, blockchains have a long way to go before they are widely used in business processes. The majority of the resistance to blockchain adoption can be attributed to a lack of understanding of the technology itself.
Check out our blog for a good place to start learning about blockchain and related topics.
A common question from users who want to work with blockchain technology is about data storage.
Because blockchains are digital records, it’s natural to wonder how much data you can and should store on them. In this article, we’ll go over how blockchain data storage works.
How Does a Blockchain Store Information?
Before we dive into the specifics of blockchain data storage, it’s important to understand what a blockchain is and how it stores information.
A blockchain is a distributed ledger of online transactions. Information about each transaction is recorded digitally in a database distributed among several computers in a network. Data can only be added to or read from a blockchain.
The blockchain is decentralized and does not require a third party to verify the authenticity of each transaction. This decentralized design has several advantages:
- No need for a central authority to manage the database
- Multiple copies of the database across several computers
- Validation of the datasets in each network node that guarantees that the information is not tampered with
- Immutable records that cannot be reversed or deleted
These features allow blockchains to be managed independently while ensuring the information within the database is both valid and secure. Some disadvantages of blockchains include:
- Slower access to data in situations where synchronization and verification algorithms are required.
- Individual nodes can temporarily hold inconsistent data until they synchronize with the rest of the network.
- Some blockchains rely on large networks of computers that solve cryptographic problems to add new blocks. When combined, these computer networks consume a significant amount of electric power.
Because blockchain databases are append-only and hard to tamper with, they are a great way to store information on assets and related transactions.
Blocks – the Key Units of a Blockchain
Transactions in a blockchain are stored in groups of records called blocks. Each block is filled with transaction records. Once a block is complete, it is linked to the previous one, forming a chain.
That’s where the name ‘blockchain’ starts to make sense. Every block is identified by a cryptographic hash, and each block records the hash of the block before it. Changing an earlier record changes the hashes that follow, which makes tampering easy to detect (there are more digital safeguards in place against malicious attackers, but this topic is out of the scope of this article).
The result is a chain of time-stamped blocks permanently linked together by their hashes. The chain forms a secure timeline of records that anyone who accesses the blockchain can use to check transaction receipts.
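The linking described above can be sketched in a few lines of Python. This is a simplified illustration, not how any particular blockchain is implemented: the block layout, the helper names (`block_hash`, `add_block`, `chain_is_valid`), and the sample transactions are all hypothetical, and real networks add consensus rules on top of the hash links.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain: list, transactions: list, timestamp: int) -> None:
    """Append a block that records the hash of the block before it."""
    # The first block has no predecessor, so it points at an all-zero hash.
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append(
        {"timestamp": timestamp, "transactions": transactions, "prev_hash": prev_hash}
    )


def chain_is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
add_block(chain, ["alice pays bob 5"], timestamp=1)
add_block(chain, ["bob pays carol 2"], timestamp=2)
add_block(chain, ["carol pays dave 1"], timestamp=3)
print(chain_is_valid(chain))  # True

# Editing an old record changes its hash, so the next block's link no longer matches.
chain[0]["transactions"] = ["alice pays bob 500"]
print(chain_is_valid(chain))  # False
```

Because each block stores the hash of its predecessor, changing one old record breaks every link after it, which is why altered history is easy to spot.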
For more information on how blocks work, check out this article.
Where is the Blockchain Stored?
Now that you understand what blockchains are, you’re probably thinking about where the information is stored and how large they can become.
The blockchain is stored on the computers in its network, which are called nodes. Each node keeps the data on its own hard drive or on a virtual server in a cloud computing network. Cloud hosting has the benefit of remote access, and the provider’s redundancy can add a layer of protection for the data.
There are two basic types of blockchain nodes.
Type 1: Full Nodes
These nodes contain the transaction information for the entire blockchain. Full nodes are essential for searching records across the whole chain. Think of these as the main servers in the network. Full nodes are also necessary for approving updates to the existing network.
In terms of storage, a full node contains all the blocks in the blockchain. Full nodes therefore take up a lot of storage space, especially if the blockchain is old and holds many transaction records.
Full nodes in modern blockchain networks have another variant called pruned nodes. Pruned nodes save disk space by storing only the most recent blocks. Data needed to recreate previous blocks is still stored in other locations. We have covered more about pruned nodes below.
Type 2: Light Nodes
Light nodes store only the data they need for recent transactions rather than the full history. If a user needs to validate older transactions, a light node has to request information from full nodes.
Light nodes have a couple of benefits. The first benefit is speed. Information can be quickly retrieved and validated from the nearest location.
The second benefit, as you can guess, is the reduced storage space.
Because light nodes do not store complete blocks in the blockchain, the storage requirements are reduced. However, light nodes maintain a record of the hash of each block in the blockchain to verify the sequence of previous blocks.
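To get a feel for how small that per-block record is, here is a rough estimate in Python. In Bitcoin, the record a light client keeps per block is the 80-byte block header. The block count below is an illustrative round number rather than a measured chain height, and the function name is hypothetical.

```python
HEADER_BYTES = 80               # size of one Bitcoin block header
ASSUMED_BLOCK_COUNT = 730_000   # illustrative round number, not a measured height


def headers_only_megabytes(block_count: int, header_bytes: int = HEADER_BYTES) -> float:
    """Storage needed to keep one header per block, in megabytes."""
    return block_count * header_bytes / 1_000_000


light_mb = headers_only_megabytes(ASSUMED_BLOCK_COUNT)
print(f"Headers only: {light_mb:.1f} MB")  # Headers only: 58.4 MB
```

Under these assumptions, a headers-only record fits in tens of megabytes, while a full copy of a mature chain runs to hundreds of gigabytes.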
Understanding Blockchain Data Storage
Blockchains are computer files at their core. An important distinction about blockchain data is that blocks store transaction records and not packets of data that you expect to see in typical digital assets (like documents or images). They are chunks of transaction metadata linked to each other by hash codes.
When we think about blockchain storage, we look at these things:
- Blockchain size
- Block size
- Block limits on pruned nodes
- Transaction size
Let’s go through each of these in detail.
Blockchain Size
A blockchain’s data is the record of all its blocks linked together over a timeline. Blockchain size increases as additional transactions add more blocks. Mature blockchain networks can take up hundreds of gigabytes of storage.
For example, the entire Bitcoin blockchain is a whopping 389 gigabytes as of April 2022. That’s an increase of over 60 GB in one year.
Ethereum, on the other hand, requires a client to download over 658 GB of data to synchronize with the blockchain database.
Block Size
The block size refers to the amount of data a single block can hold. Small blocks can be downloaded quickly but contain a small amount of transaction data.
Conversely, a larger block might take more time to download, but it records more transactions, which offsets the slower download in terms of overall throughput.
As more transactions are completed, blocks can quickly accumulate in a blockchain and increase its size. To manage network performance, blockchains set a target interval, called the block time, at which a new block is added on average. A shorter block time means blocks are added more often, which reduces the time a transaction waits to be confirmed.
Bitcoin has a small block size of 1 MB. But it takes about 10 minutes to add a new block. So while the database is lighter on storage, additional transactions can take time to be verified.
Litecoin (a fork of Bitcoin) has the same block size but a block time of 2.5 minutes. New blocks are added four times as often, so transactions in Litecoin are confirmed about four times faster.
Bitcoin Cash (another fork of Bitcoin) has a block limit of 32 MB but the same block time as Bitcoin. So it can complete a larger volume of transactions within the same timeframe.
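The effect of block size and block time on throughput can be worked out directly from the figures above. The sketch below computes an upper bound on the block data each chain can add per hour. The class and method names are hypothetical, and the result assumes every block is completely full, which real networks rarely achieve.

```python
from dataclasses import dataclass


@dataclass
class ChainParams:
    name: str
    block_size_mb: float    # maximum block size in megabytes
    block_time_min: float   # average minutes between blocks

    def max_mb_per_hour(self) -> float:
        """Upper bound on block data added per hour if every block is full."""
        blocks_per_hour = 60 / self.block_time_min
        return blocks_per_hour * self.block_size_mb


chains = [
    ChainParams("Bitcoin", 1, 10),
    ChainParams("Litecoin", 1, 2.5),
    ChainParams("Bitcoin Cash", 32, 10),
]

for params in chains:
    print(f"{params.name}: {params.max_mb_per_hour():.0f} MB/hour")
# Bitcoin: 6 MB/hour
# Litecoin: 24 MB/hour
# Bitcoin Cash: 192 MB/hour
```

Shortening the block time and enlarging the block both raise the ceiling. The trade-off is that either change also makes the chain grow faster on disk.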
Ethereum differs markedly from the above examples. In Ethereum, the block size is determined by the gas limit. Every operation in a transaction consumes gas, a unit that measures computational work, and the block gas limit caps the total gas that all the transactions in one block may consume.
A transaction that consumes more gas requires more work to complete. In other words, gas represents the cost of using the Ethereum network’s computers. The price paid per unit of gas varies with the transactional workload on the Ethereum network.
For example, it takes about 640,000 gas to store 1 kilobyte of data (at 20,000 gas per 32-byte word of storage). Ethereum has a current block limit of 15 million gas, expandable to 30 million based on demand. That means a block can hold at most about 46 KB of data stored this way.
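The arithmetic behind these figures can be checked with a short calculation. This sketch assumes the 20,000 figure is the gas charged for writing one new 32-byte word of storage, which is what makes 1 kilobyte (32 words) cost 640,000 gas. The function names are hypothetical, and a real block also spends gas on other things, so the result is an upper bound only.

```python
WORD_BYTES = 32          # assumed size of one storage word
GAS_PER_WORD = 20_000    # assumed gas cost to write one new storage word


def gas_per_kilobyte() -> int:
    """Gas needed to store 1 KB (1,024 bytes) under the assumptions above."""
    words_per_kb = 1024 // WORD_BYTES
    return words_per_kb * GAS_PER_WORD


def max_kilobytes(block_gas_limit: int) -> float:
    """Most data a block could store if all of its gas went to storage."""
    return block_gas_limit / gas_per_kilobyte()


print(gas_per_kilobyte())  # 640000
for limit in (15_000_000, 30_000_000):
    print(f"{limit:,} gas -> {max_kilobytes(limit):.1f} KB")
# 15,000,000 gas -> 23.4 KB
# 30,000,000 gas -> 46.9 KB
```

The 46 KB figure therefore corresponds to the 30 million gas ceiling. At the 15 million level, the same calculation gives roughly half that.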
Due to the varying nature of the costs factored into the Ethereum blockchain, block size varies significantly. Therefore, comparisons of Ethereum block sizes to block sizes in other blockchains can be misleading.
Block Limits on Pruned Nodes
As we mentioned before, pruned nodes store only the most recent blocks. Pruning significantly reduces the disk space a node needs.
For instance, a pruned Bitcoin node only keeps the complete information for the last 288 blocks, which is about two days of history at one block every 10 minutes. At 1 MB per block, that’s just 288 MB dedicated to block data. The node’s total size is somewhat larger because it still stores other data derived from the older blocks.
Similarly, a pruning solution for Ethereum exists, which retains up to 1,024 blocks in a node.
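The Bitcoin figures above follow from simple multiplication, sketched here in Python. The function names are hypothetical, and the calculation uses the 1 MB block size and 10-minute block time quoted earlier in this article. It covers block data only, not the other data a pruned node keeps.

```python
def pruned_block_storage_mb(blocks_kept: int, max_block_mb: float) -> float:
    """Upper bound on the block data a pruned node keeps, in megabytes."""
    return blocks_kept * max_block_mb


def history_covered_hours(blocks_kept: int, block_time_min: float) -> float:
    """How far back the retained blocks reach, in hours."""
    return blocks_kept * block_time_min / 60


print(pruned_block_storage_mb(288, 1))  # 288
print(history_covered_hours(288, 10))   # 48.0
```

The same two functions apply to any chain with a fixed block size limit. Ethereum’s block size varies with gas usage, so its 1,024 retained blocks do not translate into a single storage number this way.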
Transaction Size
Transaction size refers to the total number of bytes that must be transferred between a client and a node to complete a blockchain transaction. It depends on the data inputs and outputs exchanged between the computers involved in the transaction.
Should I Worry about Storage When Hosting a Blockchain?
The short answer is no. Internet speeds and disk storage capacity have both steadily increased over the years, while their costs have decreased.
You can easily buy drives with terabytes (TB) of capacity, which is enough to meet your blockchain storage needs for several years.
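As a rough check on that claim, the sketch below estimates how long a drive would last for a full Bitcoin node. It uses the 389 GB chain size and 60 GB yearly growth quoted earlier in this article. The function name is hypothetical, and the estimate assumes growth stays constant and the drive holds nothing else, neither of which is guaranteed.

```python
CHAIN_SIZE_GB = 389        # Bitcoin chain size quoted in this article (April 2022)
GROWTH_GB_PER_YEAR = 60    # yearly growth quoted in this article


def years_until_full(
    drive_gb: float,
    chain_gb: float = CHAIN_SIZE_GB,
    growth_gb_per_year: float = GROWTH_GB_PER_YEAR,
) -> float:
    """Years before the chain outgrows the drive, assuming constant growth."""
    free_gb = drive_gb - chain_gb
    if free_gb <= 0:
        return 0.0
    return free_gb / growth_gb_per_year


for drive_gb in (1_000, 2_000):
    print(f"{drive_gb} GB drive: about {years_until_full(drive_gb):.0f} years")
# 1000 GB drive: about 10 years
# 2000 GB drive: about 27 years
```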
With the nearly exponential growth in technology, developers and hardware manufacturers will most likely have devised numerous solutions to address blockchain data storage issues by the time you need to upgrade.