Ceramic: Middleware built for Web3.0 social applications
Author: Chloe, IOSG Associate
Storage Protocols in the "Seed Seeking" Era: BitTorrent
BitTorrent may sound unfamiliar to many people. Mention downloading "seeds," however, and many will recall "seeking seeds" online a few years ago in order to play games or watch movies. "Seed" here is BitTorrent download jargon: a seed file (a .torrent file) is an index file that records metadata about the file to be downloaded, such as its storage location, size, download server address, and publisher's address.
In simple terms, BitTorrent is a P2P download protocol that is much more efficient than the traditional method of downloading from a website's server. To illustrate, picture the students in a class gathering to copy homework when only one student has done it, so everyone else has to copy from that single answer sheet. Each student copies at a different speed, and the more students want to copy, the slower the whole process becomes. The common workaround is for some students to copy the multiple-choice questions, some the fill-in-the-blank questions, and others the essay questions, and then for everyone to exchange their work, which significantly increases efficiency.
The principle of BitTorrent is similar to "copying homework." Each user who wants a file downloads it piece by piece, and while downloading, the user's computer also acts as a server, transmitting the pieces it already holds to other users. In other words, while we are downloading we are also uploading (the pieces that others fetch from our computer), so we contribute to the network even as we benefit from the downloads that others provide. Therefore, the more users are downloading a file, the more seeds there are, the faster the pieces spread, and the higher the download speed.
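The "copying homework" idea can be sketched in a few lines of Python. This is only a toy illustration, not the BitTorrent wire protocol: the names `Peer`, `split_into_pieces`, and `swarm_download` are hypothetical, there is no tracker, networking, or piece verification, and the sketch assumes at least one downloading peer. It shows the single seeder uploading each piece only once, while the peers supply the remaining copies to one another.

```python
class Peer:
    """A participant in the swarm (hypothetical helper for this sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pieces: dict[int, bytes] = {}  # piece index -> piece bytes
        self.uploaded = 0                   # number of pieces served to others

    def reassemble(self, num_pieces: int) -> bytes:
        # Join the pieces in index order to rebuild the original file.
        return b"".join(self.pieces[i] for i in range(num_pieces))


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def swarm_download(data: bytes, piece_size: int, peer_names: list[str]):
    pieces = split_into_pieces(data, piece_size)
    seeder = Peer("seeder")
    seeder.pieces = dict(enumerate(pieces))
    peers = [Peer(name) for name in peer_names]

    # Step 1: the seeder hands each piece to exactly one peer (round robin),
    # like one student copying only the multiple-choice answers.
    for index, piece in seeder.pieces.items():
        peers[index % len(peers)].pieces[index] = piece
        seeder.uploaded += 1

    # Step 2: peers fill their gaps from each other, never from the seeder.
    for peer in peers:
        for index in range(len(pieces)):
            if index not in peer.pieces:
                source = next(p for p in peers if index in p.pieces)
                peer.pieces[index] = source.pieces[index]
                source.uploaded += 1

    return seeder, peers
```

With three peers, the seeder uploads the file once in total, and the other two copies' worth of pieces come from the peers themselves, which is why adding downloaders adds capacity instead of only adding load.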
(Image source: IOSG Ventures)
Pioneers of Decentralized Storage: IPFS
Although BitTorrent significantly improves download efficiency, it still has shortcomings, and any discussion of how to address them leads to IPFS. Many readers are likely more familiar with IPFS because of the well-known decentralized storage projects associated with it: Filecoin is built as an incentive layer on top of IPFS, and Arweave, although it runs its own separate network, is often discussed alongside it. IPFS, short for InterPlanetary File System, is a network transmission protocol for the distributed storage and sharing of files.
Downloading with BitTorrent requires a seed file, and that seed file has to list the addresses of all the content to be downloaded before the download can proceed. One significant advantage of IPFS is that it stores data in a DAG (directed acyclic graph) structure and, building on that, uses content-based rather than address-based addressing to store and locate files. This means that to find a file we do not need to know where it is, only what content it contains. IPFS generates a unique hash value for each file (for example, QmSNssW5a9S3KVRCYMemjsTByrNNrtXFnxNYLfmDr9Vaan), and a user who wants to retrieve the file only needs to ask the IPFS network who has this hash (QmS…Vaan). Because identical content always produces the same hash, files with the same content are not stored redundantly. This approach saves storage and improves network performance.
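A minimal Python sketch of content addressing follows. It is a simplification under stated assumptions: the class `ContentStore` is hypothetical, and it keys each block by a plain SHA-256 hex digest, whereas real IPFS splits files into chunks linked in a DAG and encodes the hash as a CID such as the `Qm…` value above.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: a block's key is the hash of its bytes."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # The address is derived from the content itself, not from a location.
        key = hashlib.sha256(content).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self._blocks.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
key_first = store.put(b"hello ipfs")
key_again = store.put(b"hello ipfs")     # same bytes, same key, no new block
key_edited = store.put(b"hello ipfs!")   # one byte more, a different key
```

Storing the same bytes twice yields one key and one stored block, while even a one-character edit yields a different key, so the old key can never lead to the edited content.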
(Image source: researchgate.net)
Dynamic Storage Solutions: Ceramic
From the description above, attentive readers may notice a significant flaw in IPFS. Once a file is stored in IPFS, it cannot be modified in place, because changing the file's content changes its hash value, and users can no longer find the modified file using the original hash. This is a frequently criticized pain point of IPFS: it is not good at storing files that need to be updated frequently. There is therefore an urgent need for an efficient, decentralized solution for storing dynamic data.
Fortunately, exploration in this field has already begun. Readers who follow Web 3.0, SocialFi, or DID have probably heard of this project: Ceramic. Ceramic is a decentralized, open-source platform for creating, hosting, and sharing data, and many DID and social graph projects are built on it.
As mentioned earlier, IPFS performs well in storing static files, but it lacks computational and state management capabilities, making it unable to achieve more advanced database-like functions such as mutability, version control, access control, and programmable logic. The emergence of Ceramic has addressed these issues to some extent.
Efficient Version Control
In Ceramic, each piece of stored information is represented as an append-only "overlay" log (a log in which new entries are only ever added on top of earlier ones), referred to as a Stream. Conceptually, a Stream is similar to a Git repository. Git is an open-source distributed version control system that handles version management efficiently for projects of all sizes. It is currently the most popular version control software, used to store code, track revision history, merge changes, and revert to earlier versions, among other things.
Git handles data as a series of "snapshots," somewhat like viewing the version history of a document shared on Google Docs. Each time you commit an update or save the state of the data, Git takes a snapshot of all files at that moment and saves an index of that snapshot. If a file has not been modified, Git does not store it again but only keeps a link to the previously stored copy, which greatly improves efficiency.
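The snapshot behavior described above can be sketched as follows. This is a toy model, not Git's actual object format: `SnapshotRepo` is a hypothetical class, and it hashes raw file bytes with SHA-256, whereas Git adds an object header and stores trees and commits as separate objects.

```python
import hashlib


class SnapshotRepo:
    """Toy snapshot store: each commit is an index of path -> content hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}          # content hash -> file bytes
        self.snapshots: list[dict[str, str]] = []  # one index per commit

    def commit(self, files: dict[str, bytes]) -> int:
        index: dict[str, str] = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # An unmodified file hashes to an existing blob, so only the
            # link is recorded and the bytes are not stored a second time.
            self.blobs.setdefault(digest, content)
            index[path] = digest
        self.snapshots.append(index)
        return len(self.snapshots) - 1  # commit number

    def checkout(self, commit_number: int) -> dict[str, bytes]:
        index = self.snapshots[commit_number]
        return {path: self.blobs[digest] for path, digest in index.items()}


repo = SnapshotRepo()
first = repo.commit({"README": b"v1", "main.py": b"print('hi')"})
second = repo.commit({"README": b"v2", "main.py": b"print('hi')"})
```

After the two commits above, the repository holds three blobs rather than four, because `main.py` did not change, and `repo.checkout(first)` still returns the original `README`.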
(Image source: IOSG Ventures)
In fact, Git can also be used on top of IPFS to store dynamic data. However, developers then need to maintain a hash-log file in Git that maps each Git log entry to the corresponding IPFS hash, and they must either keep the two in sync manually or use the IPNS naming system to track updates. This is time-consuming, labor-intensive, and inefficient.
Ceramic adopts an append-only "overlay log" approach in which the StreamID does not change when the content changes. This makes it very convenient to store modified versions or revert to previous ones, since users do not have to keep track of constantly changing hash values. In addition, Ceramic is built as a new layer on top of other storage protocols, which makes it highly composable.
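The difference from plain content addressing can be sketched with a small Python model. This is not Ceramic's API: the `Stream` class, its method names, and the JSON-plus-SHA-256 hashing are assumptions made for illustration. The sketch assumes the identifier is computed once from the first log entry, so later updates extend the log without changing the ID.

```python
import hashlib
import json


def _digest(entry: dict) -> str:
    # Hypothetical helper: hash a log entry via its canonical JSON form.
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


class Stream:
    """Toy append-only log with an identifier that never changes."""

    def __init__(self, content: dict) -> None:
        genesis = {"prev": None, "content": content}
        self.stream_id = _digest(genesis)  # fixed at creation (assumption)
        self.log = [genesis]

    def update(self, content: dict) -> str:
        # Each new entry links to the previous one, as a Git commit does.
        entry = {"prev": _digest(self.log[-1]), "content": content}
        self.log.append(entry)
        return _digest(entry)

    def content(self, version: int = -1) -> dict:
        # version=-1 is the latest state; 0 is the original state.
        return self.log[version]["content"]


profile = Stream({"name": "alice"})
original_id = profile.stream_id
profile.update({"name": "alice", "bio": "hello"})
```

After the update, `profile.stream_id` still equals `original_id`, `profile.content()` returns the new profile, and `profile.content(0)` returns the first version, so readers keep a single stable reference while the data evolves.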
Users can choose where to store their data, including decentralized options such as Arweave and Filecoin as well as centralized options such as AWS, and in every case Ceramic can provide automated version control. Moreover, since each Stream stores only its own log rather than the data itself, Ceramic does not need a global ledger to synchronize state across the whole network, which gives it very high horizontal scalability.
(Image source: IOSG Ventures)
Convenient Authentication and Access Control
Beyond version control, Ceramic also provides very convenient authentication and access control. Anyone who wants to add new data to a Stream must first prove their identity; otherwise, the modification is rejected. Different Streams can require different authentication mechanisms, and Ceramic offers a very powerful built-in authentication mechanism: DID.
For example, there is 3ID DID for end users, Key DID for developers, NFT DID, which supports authentication using NFTs, and Safe DID for DAOs that require multi-signature authentication, all of which help keep data secure. Ceramic also gives Streams programmable logic, such as "if the state of Stream A changes, then Stream B can be accessed and updated."
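The rule that only an authenticated party may extend a Stream can be sketched as below. This is a deliberately simplified stand-in, not how Ceramic or any DID method works internally: the `ControlledStream` class and `sign` helper are hypothetical, and an HMAC with a shared secret replaces the public-key signatures that real DIDs rely on.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in for a DID signature (assumption: shared-secret HMAC).
    return hmac.new(key, payload, hashlib.sha256).digest()


class ControlledStream:
    """Toy stream that accepts updates only from its controller."""

    def __init__(self, controller_key: bytes) -> None:
        # A real system would store a public key or DID, never a secret.
        self._controller_key = controller_key
        self.log: list[bytes] = []

    def append(self, payload: bytes, signature: bytes) -> None:
        expected = sign(self._controller_key, payload)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, signature):
            raise PermissionError("update is not signed by the controller")
        self.log.append(payload)


stream = ControlledStream(controller_key=b"alice-secret")
update = b'{"bio": "hello"}'
stream.append(update, sign(b"alice-secret", update))  # accepted
```

An update signed with any other key raises `PermissionError` and leaves the log untouched, which mirrors the behavior described above: without verified identity, the data cannot be modified.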
Ceramic gives a significant boost to the construction of Web 3.0. Many DID and Web 3.0 social platform projects are already being developed on it. Notable examples include the social graph middleware platform CyberConnect, the Twitter-like Web 3.0 social platform Orbis, and the instant messaging platform The Convo Space. We look forward to the new possibilities that Ceramic's infrastructure can bring to the application layer.