Advances in computer science have driven the development of technologies such as big data, blockchain [2, 3], and the Internet of Things [4-6], and have brought many convenient services to users. However, many problems remain, such as user data privacy leakage [8-10], low algorithm efficiency, and low search efficiency. Because traditional centralized institutions are not fully trustworthy, users’ data may be leaked. Blockchain, with its decentralized characteristics, can store data safely and protect users’ data privacy.
In 2008, Satoshi Nakamoto proposed Bitcoin in “Bitcoin: A Peer-to-Peer Electronic Cash System”, marking the emergence of blockchain technology. As the underlying technology of Bitcoin, blockchain has received extensive attention [15-18]. Blockchain is a distributed database technology characterized by decentralization [19-21], traceability, tamper resistance, and collective maintenance. It addresses a series of problems associated with centralized institutions, such as high cost, low efficiency, and low trust. However, because a blockchain is a chain structure, query efficiency decreases as the number of blocks grows. Take the Bitcoin blockchain as an example: as of June 7, 2021, the block height had reached 674,000, which means that a query for historical data may have to traverse hundreds of thousands of blocks. Such a query method cannot meet current query requirements.
Level-DB, the mainstream database in blockchain systems, is based on the LSM-tree storage structure, which lowers the read performance of the blockchain. In addition, Level-DB only supports simple Key-Value queries, not relational queries [25, 26]. When querying transactions, users can only traverse the chain in block order, which further reduces query efficiency. The blockchain system supports only queries keyed by transaction hash and does not support queries keyed by account hash, so the available query methods are limited. To address this problem, some current solutions move on-chain data to off-chain storage [28, 29] to improve query efficiency, but off-chain storage violates the decentralized nature of the blockchain. Third-party databases face trust issues and are exposed to single points of failure, data loss, and data tampering, so off-chain storage carries serious security risks. Therefore, improving on-chain retrieval efficiency while ensuring security is a current research hotspot.
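The limitation described above can be illustrated with a minimal sketch. It assumes a toy chain held in memory; the block layout, the field names (`hash`, `sender`, `receiver`), and both helper functions are hypothetical and do not reflect the Level-DB API or any real client. A lookup keyed by transaction hash is a single key-value read, whereas a lookup by account has to scan every block in order.

```python
from typing import Dict, List, Optional

# Hypothetical toy chain: a list of blocks, each block a list of transactions.
BLOCKS: List[List[Dict[str, str]]] = [
    [{"hash": "t1", "sender": "alice", "receiver": "bob"}],
    [{"hash": "t2", "sender": "carol", "receiver": "dave"}],
    [{"hash": "t3", "sender": "bob", "receiver": "alice"}],
]

# Key-value view keyed by transaction hash, standing in for the on-chain store.
KV_STORE: Dict[str, Dict[str, str]] = {
    tx["hash"]: tx for block in BLOCKS for tx in block
}


def query_by_tx_hash(tx_hash: str) -> Optional[Dict[str, str]]:
    """Single key lookup: the only query the key-value store answers directly."""
    return KV_STORE.get(tx_hash)


def query_by_account(account: str) -> List[str]:
    """No key exists for accounts, so every block is visited in block order."""
    found = []
    for block in BLOCKS:
        for tx in block:
            if account in (tx["sender"], tx["receiver"]):
                found.append(tx["hash"])
    return found


if __name__ == "__main__":
    print(query_by_tx_hash("t2"))
    print(query_by_account("alice"))
```

The cost of `query_by_account` grows with the total number of blocks, which is the behavior an on-chain account index aims to avoid.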
You et al. designed a hybrid index mechanism that supports blockchain transaction traceability based on the Ethereum state tree. In this mechanism, a hash pointer is embedded in each account transaction and points to the block that contains the account's previous transaction. By following these pointers, the Account Transaction Trace Chain (ATTC) can be traced quickly. The ATTC-based query method improves the query efficiency of account transactions, but for active accounts with long transaction chains, a long chain still has to be traversed. Moreover, users do not always want to retrieve all the historical transactions of an account, and it remains difficult to find target transactions in massive account data. To address this, we improve the ATTC-based query scheme and propose the SCATC index structure, which effectively overcomes the query shortcomings of the ATTC index structure. The main contributions of this paper are as follows:
1. We divide the transaction chain into subchains and connect different subchains with hash pointers to shorten the query path when querying early historical transactions. This scheme does not trade space for time: it reduces the time complexity without significantly increasing the space complexity.
2. We design a construction algorithm and a query algorithm for the SCATC index structure. The simulation results show that the SCATC-based query is more efficient when querying the early transactions of accounts.
3. We merge multiple transactions of an account in the same block into one, so that at most one index is built within each block for the same account. This reduces both the index construction cost and the storage overhead.
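The ideas in the contributions above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the SCATC construction or query algorithm itself: the fixed subchain length `SUBCHAIN_LEN`, the names `IndexEntry`, `AccountIndex`, `prev`, and `skip`, and the use of in-memory object references in place of hash pointers are all hypothetical. Each entry keeps an ATTC-style pointer to the account's previous entry, plus one pointer to the newest entry of the previous subchain; transactions of the same account in the same block share one entry.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SUBCHAIN_LEN = 4  # hypothetical number of index entries per subchain


@dataclass
class IndexEntry:
    height: int                    # block that holds the account's transactions
    tx_ids: List[str]              # all transactions of the account in this block
    prev: Optional["IndexEntry"]   # previous entry of the account (ATTC-style)
    skip: Optional["IndexEntry"]   # newest entry of the previous subchain
    position: int                  # 0-based position in the account's chain


class AccountIndex:
    """Toy per-account index; blocks are assumed to arrive in height order."""

    def __init__(self) -> None:
        self.head: Dict[str, IndexEntry] = {}

    def add(self, account: str, height: int, tx_id: str) -> None:
        head = self.head.get(account)
        if head is not None and head.height == height:
            head.tx_ids.append(tx_id)  # same block: merge, no new entry
            return
        position = 0 if head is None else head.position + 1
        if head is None:
            skip = None
        elif position % SUBCHAIN_LEN == 0:
            skip = head       # a new subchain starts here
        else:
            skip = head.skip  # same subchain, same jump target
        self.head[account] = IndexEntry(height, [tx_id], head, skip, position)

    def find(self, account: str, height: int,
             use_skip: bool = True) -> Tuple[Optional[List[str]], int]:
        """Return (tx_ids or None, number of pointer hops)."""
        node = self.head.get(account)
        hops = 0
        while node is not None and node.height > height:
            if use_skip and node.skip is not None and node.skip.height >= height:
                node = node.skip  # target lies in an older subchain: jump
            else:
                node = node.prev  # walk within the current subchain
            hops += 1
        if node is not None and node.height == height:
            return node.tx_ids, hops
        return None, hops


def build_demo() -> AccountIndex:
    index = AccountIndex()
    for i in range(1, 13):  # twelve blocks at heights 10, 20, ..., 120
        index.add("a", 10 * i, f"tx{i}")
    index.add("a", 120, "tx12b")  # second transaction in block 120
    return index


if __name__ == "__main__":
    demo = build_demo()
    print(demo.find("a", 10))                  # (['tx1'], 5)
    print(demo.find("a", 10, use_skip=False))  # (['tx1'], 11)
```

In this toy run, reaching the account's earliest transaction takes 5 hops with the subchain pointers and 11 hops with the plain back pointers, while each entry stores only one extra pointer.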
The paper is organized as follows. The “Related works” section introduces related work on blockchain data query; the “Preliminaries” section introduces preliminary knowledge of blockchain; the “SCATC index structure” section elaborates on the construction method and query algorithm of the SCATC index structure; the “Experiment and analysis” section presents the efficiency analysis and simulation experiments. The “Conclusions” section summarizes the paper.