The World Wide Web is one of the most commonly used platforms for computer applications today. While it contains a large amount of knowledge on a wide variety of topics, most of that knowledge is captured in unstructured text documents. These documents are designed to be understood by humans and are difficult for computer systems to use. The inventor of the World Wide Web, Tim Berners-Lee, famously argued that the information presented on the Web should also be accessible as raw data in a well-defined structured format, which would make the data more useful for other applications. He described the concept of the Semantic Web, where in addition to human-readable formats, the information would also be available in a structured form designed to be used by computers. To help reach this goal, the World Wide Web Consortium (W3C) defined a set of technologies that are recommended for developing the Semantic Web. Ontology representations are commonly used to model human knowledge in a way that is useful for computers. In computer science, ontologies are collections of entities, their properties, and the connections between them. Building and updating ontologies presents many challenges. One of the main challenges is storing an ontology in such a way that anyone can use it and contribute new knowledge. This could be achieved by setting up centralized servers. That, however, would require someone to constantly maintain and pay for the infrastructure. Once the maintenance stopped, the ontology would no longer be available and would be lost.
In recent years, we have seen a rise in the popularity of blockchain technologies. Public blockchains like Bitcoin (Nakamoto, 2009) use a peer-to-peer network and a set of cryptographic algorithms to store a list of transactions between their users. This way they can provide transparency, decentralization, and tamper-resistance, which are all features that would also be useful for managing public ontologies. There are different varieties of blockchain networks. On public blockchains, anyone can read and use the stored data. Private blockchains limit who can see and change their data. Since a private blockchain allows us to ensure that only trusted users get access, there is less need for strict security protocols. Hybrid blockchains combine concepts from both public and private blockchains. The idea of using blockchains for storing data was further expanded in 2015 when the Ethereum network (Buterin, 2014) enabled many other applications to take advantage of the technology using smart contracts. Smart contracts are programs published on the Ethereum blockchain that ensure the execution of the agreements described in them. In our project, we use smart contracts to store the data about each change to the ontology on the Ethereum blockchain.
While blockchains provide secure, transparent, and decentralized storage, they are not designed to store large amounts of data. This limitation comes from the fact that every client has to download all of the data that is stored on the blockchain. In order to store larger amounts of data in a decentralized way, we propose the use of the IPFS network (Benet, 2014). IPFS (InterPlanetary File System) is a peer-to-peer system that distributes storage among multiple computers in a network. The network is designed for storing larger files. We can access the files stored in the network using their content identifiers. Each identifier is composed of multiple parts: the encoding prefix, the version, the content type code, and the hash of the document. The identifiers are short enough to be stored on the blockchain using a smart contract. This way we can combine the two technologies to store larger files while keeping the security and transparency that the blockchain provides.
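To make the structure of a content identifier more concrete, the following minimal Python sketch assembles and parses a version 1 identifier using only the standard library. It is an illustration under stated assumptions, not the code used in our system: the function names `make_cid_v1` and `parse_cid_v1` are hypothetical, the content is treated as a single raw block (content type code `0x55`) hashed with SHA-256, and every field is assumed to fit in one byte. Files added through an IPFS client are typically chunked and wrapped in a different content type, so their identifiers will generally differ from the ones computed here.

```python
import base64
import hashlib

# Assumed constants for this sketch (values from the multiformats tables).
MULTIBASE_BASE32 = "b"   # encoding prefix: lowercase base32 without padding
CID_VERSION_1 = 0x01     # identifier version
CODEC_RAW = 0x55         # content type code: raw binary block
HASH_SHA2_256 = 0x12     # hash function code: SHA-256


def make_cid_v1(content: bytes) -> str:
    """Build a CIDv1 string for a single raw block (illustrative only)."""
    digest = hashlib.sha256(content).digest()
    # Multihash = hash function code + digest length + digest.
    multihash = bytes([HASH_SHA2_256, len(digest)]) + digest
    cid_bytes = bytes([CID_VERSION_1, CODEC_RAW]) + multihash
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return MULTIBASE_BASE32 + encoded


def parse_cid_v1(cid: str) -> dict:
    """Split a base32 CIDv1 string back into its parts (illustrative only)."""
    if not cid.startswith(MULTIBASE_BASE32):
        raise ValueError("only base32 ('b') identifiers are handled in this sketch")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that base32 expects
    raw = base64.b32decode(body)
    # Assumes each field is a single byte, which holds for the codes used here.
    version, codec, hash_code, hash_len = raw[0], raw[1], raw[2], raw[3]
    digest = raw[4:4 + hash_len]
    if len(digest) != hash_len:
        raise ValueError("digest is shorter than its declared length")
    return {
        "encoding_prefix": cid[0],
        "version": version,
        "codec": codec,
        "hash_code": hash_code,
        "digest": digest,
    }


if __name__ == "__main__":
    cid = make_cid_v1(b"example ontology change")
    print(cid)
    print(parse_cid_v1(cid))
```

Under these assumptions the identifier is 59 characters long and begins with `bafkrei`, which illustrates why it is small enough to be stored in a smart contract while the document itself stays in IPFS.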
In our work, we combine the blockchain with the technologies of the Semantic Web to produce a system for managing ontologies. Our system uses Ethereum smart contracts and the IPFS network to store an ontology in a completely decentralized way. This allows us to distribute a public ontology to its users without needing centralized servers. We can also enable users to submit their own changes to the ontology. All the changes to the ontology are tracked, so anyone can revert the ontology to a previous state and continue developing it as their own branch. The implementation and evaluation of the proposed system are publicly available in the source code repository (Gašperlin, 2021). The main contributions of our work are as follows:
The rest of the paper is organized as follows. In Section 2, we present previous work related to the Semantic Web, blockchain technologies, and the combination of the two. In Section 3, we present our transaction manager for ontology databases, which uses Ethereum and IPFS to track changes made to an ontology. In Section 4, we present the data flow and describe common usage patterns. In Section 5, we run a series of tests to determine the performance and cost of using our transaction manager. Finally, in Section 6, we comment on the results and propose some directions for future work.