Prism Tower on the Rainbow Bridge: Web3 and Middleware — Kafka
Source: Kai Wahner, Confluent
Compiled by: Masterdai
1: Background and Outlook
As early as the InfoQ international conference in 2016, Kai Wahner delivered a talk titled "Blockchain - The Next Big Thing in Middleware." At that time, he proposed an innovative reference architecture based on blockchain data and middleware, which is now widely adopted by many technology integrators and vendors.
However, he did not join any cryptocurrency startup. Instead, he joined Confluent in 2017 because he believed that "handling dynamic data of any scale through transactional and analytical workloads is a more important paradigm shift." Kai considers his decision to join Confluent to be correct, as he wrote in his blog, "Nowadays, most enterprises use Kafka as a replacement middleware for MQ, ETL, and ESB tools, or use serverless Kafka to achieve cloud-native iPaaS."
The story does not end there; the emergence and rapid adoption of blockchain, cryptocurrency, and NFTs have increased the demand for open-source infrastructure software such as middleware. For good reasons (scalability, reliability, real-time processing), many cryptocurrency market websites, blockchain monitoring infrastructures, and custodian banks are built on Kafka. The key to these customers' success lies in integrating blockchain and cryptocurrency data with enterprise software, databases, and data lakes.
**If we compare *Chainlink* to a rainbow bridge connecting on-chain networks and the off-chain world, then middleware like Kafka and Flink serves as the prism tower that refracts the rainbow light.**
2: What is Middleware
Middleware is software that sits between the operating system and the applications running on it, supporting communication and data management for distributed applications. It is sometimes called plumbing because it connects two applications so that data and database calls can easily be passed through the "pipe." Common types of middleware include database middleware, application server middleware, and message-oriented middleware, among others. In other words, it facilitates dialogue between one application and another, much like a mediator and translator at an international gathering. Kafka is a type of message-oriented middleware.
Apache Kafka
Kafka is an open-source distributed event streaming platform, originally created and open-sourced by LinkedIn in 2011.
Kafka enables users to build an end-to-end event streaming solution through three key functionalities:
Publish (write) and subscribe to (read) streams of events, including continuous import and export of data from other systems.
Persist and reliably store event streams.
Process streams of events as they occur or retrospectively.
Event streaming: capturing data from various data sources (databases, sensors, mobile devices, cloud services, applications) in the form of streams of events; storing these event streams durably for later retrieval; manipulating, processing, and reacting to the event streams in real time as well as retrospectively; and routing the events to different destination systems as needed. Event streaming thus ensures a continuous flow and interpretation of data, so that the right information is at the right place at the right time.
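As a minimal illustration of the publish and subscribe functionality above, here is a sketch using Kafka's official Java client; the broker address, topic name, key, and payload are placeholder assumptions for this example, not part of any particular deployment.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Publish (write) one event to a topic; consumers subscribe (read) to it independently.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>(
                "blockchain-events", "tx-0x123", "{\"type\":\"transfer\",\"value\":\"1.5 ETH\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception == null) {
                    // The event is now stored durably in the log and can be re-read later.
                    System.out.printf("Stored at partition %d, offset %d%n",
                        metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```

A consumer would subscribe to the same topic with a KafkaConsumer and read the stored events at its own pace, either as they arrive or retrospectively from the retained log.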
Use Cases for Apache Kafka
Messaging: Kafka can replace traditional message brokers. Message brokers are used for a variety of reasons, such as decoupling processing from data producers and buffering unprocessed messages.
Website Activity Tracking: Rebuilding the user activity tracking pipeline as a set of real-time publish-subscribe feeds. Site activity (page views, searches, clicks, and other user actions) is published to central topics, with one topic per activity type. These feeds can then be subscribed to for a range of use cases, including real-time processing, real-time monitoring, and loading into Hadoop or offline data warehouse systems for processing and reporting (a minimal producer sketch follows this list).
Log Aggregation: Using Kafka as a log aggregation solution. Log aggregation typically collects log files from servers and places them in a central location (a file server or HDFS) for processing. Kafka abstracts away the details of files and presents log or event data as a cleaner stream of messages, which enables lower-latency processing and easier support for multiple data sources and distributed data consumption.
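The activity-tracking sketch referenced above could look like the following; the topic names (page-views, searches, clicks) and the JSON payload are illustrative assumptions, not a prescribed schema.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityTracker {
    private final KafkaProducer<String, String> producer;

    public ActivityTracker(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    /** Routes each activity to its own topic; keying by user id keeps one user's events ordered. */
    public void track(String activityType, String userId, String jsonPayload) {
        String topic = switch (activityType) {   // one topic per activity type
            case "page_view" -> "page-views";
            case "search"    -> "searches";
            default          -> "clicks";
        };
        producer.send(new ProducerRecord<>(topic, userId, jsonPayload));
    }

    public void close() { producer.close(); }
}
```

Downstream, the same topics can feed real-time processing, monitoring, and batch loads into Hadoop or an offline warehouse, exactly as described above.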
3: The Relationship Between Kafka and Web3
The following diagram illustrates a classic Dapp architecture, but it gives many people a misconception: it looks as if a frontend that requests data from the Ethereum network is all that is needed to bridge on-chain and off-chain behavior. In actual development, things are usually more complex, and an additional layer sits between the blockchain backend and the frontend.
Within this layer, we can achieve the following:
Kafka performs large-scale real-time data computation on sidechain or off-chain data, and it integrates blockchain technology with other enterprise software components, including CRM, big data analytics, and any other custom business applications. In other words, Kafka serves as middleware for blockchain integration.
A more common use case in cryptocurrency enterprise architecture is using Kafka as a scalable real-time data hub between blockchain and enterprise applications. Here are some examples and a discussion of some technical use cases where Kafka can provide assistance.
3.1. Kafka as the Data Hub for the Metaverse
Communication between retailers and users must happen in real time, whether you are selling physical clothes or mobile phones, or negotiating with virtual merchants in an NFT marketplace. The following architecture supports virtual worlds at any scale in real time by orchestrating the information flow between various crypto and non-crypto applications.
3.2. Kafka's Role as a Data Hub in Cryptocurrency Transactions and Markets
Users execute Bitcoin transactions from a mobile wallet. Real-time applications monitor the off-chain data, correlate it, display it on dashboards, and send push notifications. A completely independent department later replays the historical events from the Kafka log in batch mode to run compliance checks with dedicated analytics tools.
The Kafka ecosystem offers many features that can utilize data from the blockchain and cryptocurrency world alongside data from traditional IT.
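The compliance scenario works because Kafka retains the event log, so an independent consumer group can replay history without disturbing the real-time consumers. A minimal sketch, assuming a wallet-transactions topic and a dedicated compliance-batch consumer group:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ComplianceReplay {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "compliance-batch");   // independent group: does not affect real-time consumers
        props.put("auto.offset.reset", "earliest");  // start from the oldest retained event
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("wallet-transactions"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Hand each historical transaction to the compliance-check tooling.
                    System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
                }
                // A real batch job would stop once it catches up to the end of the log.
            }
        }
    }
}
```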
4: Technical Use Cases Kafka Can Provide
Monitor the health of blockchain infrastructure, cryptocurrencies, and Dapps to avoid downtime, protect infrastructure, and make blockchain data accessible. ------Infrastructure Maintenance
Real-time data processing of DeFi, NFT, and other trading markets through Kafka Streams or ksqlDB (see the Kafka Streams sketch after this list). ------On-chain Data Analysis
Kafka as an integration channel for oracles: For example, from Chainlink to Kafka and then to IT infrastructure.
Handle backpressure through throttling and nonce management on the Kafka backbone (backpressure occurs when upstream producers produce faster than downstream consumers can consume, causing downstream buffers to overflow) and stream transactions to the chain. ------Gamefi and Socialfi
Simultaneously handle multiple chains: Parallel monitoring and correlation of transactions on Ethereum, Solana, and BSC blockchains. ------Cross-chain Correlation
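As a sketch of the Kafka Streams style of on-chain data processing referenced in the list above (ksqlDB would express the same logic as SQL): the topic names, the price threshold, and the "collection to price" record format are assumptions for illustration only.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class NftTradeAlerts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "nft-trade-alerts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Trades arrive as "collection -> price in ETH" pairs on the input topic (illustrative format).
        KStream<String, String> trades = builder.stream("nft-trades");
        trades
            .filter((collection, price) -> Double.parseDouble(price) >= 10.0) // keep only large trades
            .to("nft-trade-alerts"); // dashboards and notification services subscribe here

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```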
We can see that as the new generation of Gamefi, DeFi, Socialfi, and mobile Dapps grows, the old approach of merely ingesting on-chain data into databases and data lakes can no longer meet the demands of these new applications. Fully leveraging the characteristics of message middleware in the architecture can satisfy most users' needs for real-time performance and reliability.
5: Real Cases of Kafka in the Cryptocurrency and DeFi World
TokenAnalyst: Visualization of the cryptocurrency market
EthVM: Blockchain explorer and analysis engine
Kaleido: REST API gateway for blockchain and smart contracts
Chainlink: Oracle network for connecting smart contracts from the blockchain to the real world
5.1. TokenAnalyst -- Visualization of the Cryptocurrency Market
TokenAnalyst was an on-chain data analytics tool that provided visualizations and enterprise-grade API services. The project announced it was ceasing operations in May 2020, and most of the team joined Coinbase. Its architecture is still worth referencing: it used the Kafka stack (Connect, Streams, ksqlDB, Schema Registry) to integrate blockchain data from Bitcoin and Ethereum with its analytics tools.
"Accessing on-chain data requires nodes, which is not as straightforward as people imagine (data discrepancies between different versions and regions). Additionally, another challenge is keeping them in sync with the network. To achieve zero downtime and ensure the highest standards of data quality and reliability, we decided to use Kafka and the Confluent platform."
TokenAnalyst developed an internal solution called Ethsync. Each node runs an Ethsync instance that pushes data to its corresponding Kafka topic. As shown in the diagram below, multiple nodes are run for redundancy. Newly appended blocks pushed to the topic are initially accepted by clients as new valid blocks. However, due to the nature of blockchains, forks can occur (an alternative chain becomes longer and invalidates another chain), so previously valid blocks may later become invalid. (Note: many projects still run into this issue today.)
Block Confirmator Based on Kafka Streams: To prevent invalid blocks from being used in downstream aggregation calculations, TokenAnalyst developed a block confirmator component in Scala on top of Kafka Streams. It temporarily retains incoming blocks to handle reorganization scenarios and only propagates a block downstream once a certain number of confirmations (blocks mined on top of it) has been reached.
The confirmator not only solves the problem of identifying the canonical chain, it also emits confirmed blocks with exactly-once semantics by discarding blocks that have already been confirmed and registered. Because it is built on Kafka Streams, the component benefits from fault-tolerant recovery mechanisms and is well suited to rolling deployments in zero-downtime environments.
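TokenAnalyst's Scala implementation is not reproduced here, but the core idea can be sketched with the Kafka Streams Processor API (available in recent releases, 3.3+): buffer incoming blocks in a state store keyed by height, and only forward a block once a configured number of later blocks has been built on top of it. The topic names, confirmation depth, and string-encoded block payload are assumptions for this sketch.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.ProcessorSupplier;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class BlockConfirmer {

    static final int CONFIRMATIONS = 12;          // assumed confirmation depth
    static final String STORE = "pending-blocks"; // state store holding not-yet-confirmed blocks

    // Buffers blocks keyed by height; a block is forwarded only once
    // CONFIRMATIONS later blocks have been built on top of it.
    static class ConfirmerProcessor implements Processor<Long, String, Long, String> {
        private ProcessorContext<Long, String> context;
        private KeyValueStore<Long, String> pending;

        @Override
        public void init(ProcessorContext<Long, String> context) {
            this.context = context;
            this.pending = context.getStateStore(STORE);
        }

        @Override
        public void process(Record<Long, String> block) {
            // A reorg re-sends a height with different contents; the latest canonical block wins.
            pending.put(block.key(), block.value());

            long confirmedHeight = block.key() - CONFIRMATIONS;
            String confirmed = pending.get(confirmedHeight);
            if (confirmed != null) {
                context.forward(new Record<>(confirmedHeight, confirmed, block.timestamp()));
                pending.delete(confirmedHeight); // emit once, then forget
            }
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "block-confirmer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        StreamsBuilder builder = new StreamsBuilder();
        builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore(STORE), Serdes.Long(), Serdes.String()));

        ProcessorSupplier<Long, String, Long, String> confirmer = ConfirmerProcessor::new;
        builder.stream("raw-blocks", Consumed.with(Serdes.Long(), Serdes.String()))
               .process(confirmer, STORE)
               .to("confirmed-blocks", Produced.with(Serdes.Long(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The state store makes the buffering fault tolerant: on restart or rebalance, Kafka Streams restores the pending blocks from its changelog topic, which is what makes this pattern viable for zero-downtime rolling deployments.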
5.2. EthVM -- Blockchain Explorer and Analysis Engine
The beauty of public decentralized blockchains like Bitcoin and Ethereum lies in their transparency, with tamper-proof logs enabling blockchain explorers to monitor and analyze all transactions.
EthVM is an open-source Ethereum blockchain data processing and analytics engine powered by Apache Kafka. The tool supports blockchain auditing and decision-making: EthVM verifies transactions and smart contract executions, checks balances, and monitors gas prices. The infrastructure is built with Kafka Connect, Kafka Streams, and Schema Registry, and includes a client-side visual block explorer.
EthVM's choice of Kafka was inspired by Boerge Svingen's article on the Confluent blog, "Publishing with Apache Kafka at The New York Times," which describes how the iconic New York Times moved from a jumble of APIs, services, producers, and consumers to a log-based architecture driven by Kafka.
A blockchain is essentially a continuously growing list of records, merged into blocks and linked cryptographically—what programmers would call a linked list. In Kafka's terms, an Ethereum client is a producer—responsible for creating new entries in the log.
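To make "the Ethereum client is a producer" concrete, here is a hedged sketch that polls a node with the standard eth_getBlockByNumber JSON-RPC call and appends each block to a Kafka topic; the node endpoint, topic name, and naive polling loop are assumptions for illustration, not how EthVM actually ingests data.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BlockLogAppender {
    public static void main(String[] args) throws Exception {
        HttpClient rpc = HttpClient.newHttpClient();
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            long nextBlock = 0; // in practice, resume from the height already present in the topic
            while (true) {
                // Standard Ethereum JSON-RPC request; the node endpoint is a placeholder.
                String body = String.format(
                    "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"eth_getBlockByNumber\",\"params\":[\"0x%x\",true]}",
                    nextBlock);
                HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8545"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
                String blockJson = rpc.send(request, HttpResponse.BodyHandlers.ofString()).body();

                // Append the block to the log: key = block number, value = raw block JSON.
                if (!blockJson.contains("\"result\":null")) {
                    producer.send(new ProducerRecord<>("ethereum-blocks", Long.toString(nextBlock), blockJson));
                    nextBlock++;
                }
                Thread.sleep(1_000); // naive pacing; a real ingester would track the chain head
            }
        }
    }
}
```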
5.3. Kaleido -- Kafka-Native Gateway for Blockchain and Smart Contracts
Kaleido provides enterprise-grade blockchain APIs to deploy and manage smart contracts, send Ethereum transactions, and query blockchain data. It abstracts the complexities of blockchain transactions, Web3 client libraries, nonce management, RLP encoding, transaction signing, and smart contract management.
Kaleido offers REST APIs for on-chain logic and data. It is powered by a fully managed high-throughput Apache Kafka infrastructure.
In blockchain-based distributed transaction systems, detecting and reacting to events is an inevitable requirement. No participant or application can control state changes; on-chain smart contract logic cannot directly communicate with off-chain systems—otherwise, the determinism of the logic would be compromised.
Events (passed through transaction "logs") are a core part of the programming model and can be easily emitted using a rich type system.
These events can not only trigger logic but also serve as data streams flowing off the chain, for example to:
Efficient data retrieval layers for web and mobile
Real-time analytics engines and data lakes
Serverless or traditional compute that triggers applications and business processes
Kaleido provides two built-in features to handle these events without having to work directly with the complex raw JSON/RPC interfaces, manage checkpoint restart/recovery, or perform RLP decoding and type mapping.
Event Streaming: subscriptions are created through the integrated REST API Gateway, and events are delivered to any HTTP interface via simple Webhooks, or through the encrypted App2App Messaging layer backed by Apache Kafka.
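This delivery model is easy to picture with a small consumer that reads decoded smart-contract events from a Kafka topic and forwards each one to an HTTP endpoint as a webhook-style POST. The topic name, webhook URL, and payload format below are assumptions for illustration and are not Kaleido's actual interfaces.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ContractEventWebhook {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "contract-event-webhook");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("contract-events")); // decoded smart-contract events land here
            while (true) {
                for (ConsumerRecord<String, String> event : consumer.poll(Duration.ofSeconds(1))) {
                    // Deliver each event to an off-chain application via a simple webhook call.
                    HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/webhook"))
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(event.value()))
                        .build();
                    http.send(request, HttpResponse.BodyHandlers.discarding());
                }
            }
        }
    }
}
```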
5.4. Chainlink -- Oracle Network for Connecting Smart Contracts from the Blockchain to the Real World
Chainlink is the industry-standard oracle network for connecting smart contracts to the real world. "With Chainlink, developers can build hybrid smart contracts that combine on-chain code with a wide range of secure off-chain services powered by decentralized oracle networks. Managed by a global decentralized community of hundreds of thousands, Chainlink introduces a fairer contract model. Its network currently provides billions of dollars in value to smart contracts across decentralized finance (DeFi), insurance, and gaming ecosystems. The complete vision of the Chainlink network can be found in the Chainlink 2.0 white paper."
Chainlink has not discussed its architectural design in public blog posts or conference talks. We can only glimpse its design from the technology stack listed in its job postings: Chainlink is transitioning from traditional time-series-based monitoring to an event-driven architecture.
6: Conclusion
Before these node service providers emerged, it was very difficult for ordinary people to access on-chain data information. Now, anyone with internet access can see relatively real-time block data through blockchain explorers or on-chain data analysis tools.
However, being able to see data and the large-scale use of data are entirely different concepts.
"The deities do not climb the world tree. They traverse the great worlds using the rainbow bridge. Only the gods can use the rainbow bridge. If frost giants or other giants attempt to climb to Asgard via the rainbow bridge, their feet will be burned."
Despite the assistance of node hosting, Web3 APIs, and other services, the cost of efficient real-time communication between on-chain and off-chain remains high. Like the rainbow bridge in Norse mythology, it still serves as an exclusive passage for certain deities. The reason lies in the numerous gaps between on-chain data and off-chain business. Kafka is merely one link in the messaging chain; there are still real-time data capture, cleansing, and storage processes in between. Moreover, middleware and databases that served traditional IT businesses may not fully apply to the existing Web3 IT architecture. As more enterprises and businesses enter the cryptocurrency space, a new open-source middleware and data analysis architecture based on blockchain networks may emerge in the future.
I hope that one day, everyone will have the ability to open the rainbow bridge. We will freely traverse the nine realms created by the world tree and communicate with all living beings.