IOSG: Current Status and Outlook of On-Chain Data Analysis Platforms

IOSG Ventures
2022-07-14 11:20:54
This article will briefly describe the data architecture behind the on-chain data analysis platform, aiming to inform readers where the on-chain data analysis results come from and how they are derived.

Author: Yang, IOSG Ventures
In the realm of "numbers", there are hidden treasures; on-chain data conceals endless Alpha. When we follow smart money's lead, tirelessly search for trending NFTs in NFT Paradise, or check StepN's daily new shoe minting data, do we ever wonder where this data comes from? Amid the many on-chain data analysis platforms and their complex functionality, are you still searching for the one that suits you best?

1. Background Introduction

As the on-chain ecosystem flourishes across DeFi trading, lending, and NFT minting and trading, user behavior is transparently recorded on-chain. The data behind this behavior reflects the flow of on-chain value, which makes analyzing it, and the insights derived from that analysis, extremely valuable. On-chain data analysis platforms such as Nansen, Token Terminal, Dune Analytics, Footprint Analytics, flipsidecrypto, glassnode, and Skew have emerged to meet this growing demand, offering products with slightly different focuses for individual and institutional users.

This article first briefly describes the data architecture behind on-chain data analysis platforms, to show readers where on-chain analysis results come from and how they are derived. It then compares the mainstream data analysis platforms available to individual users along five dimensions: data richness (number of supported blockchains), data granularity, data latency, platform usability, and query freedom. Finally, we share our thoughts on the future of on-chain data indexing, querying, and analysis in Web3.

2. Introduction to the Data Architecture of On-Chain Data Analysis Platforms

Blockchains record every raw transaction, and on-chain data is public and transparent. Yet raw on-chain data cannot directly answer questions such as: What was Uniswap's trading volume over the past 24 hours? What percentage of BAYC holders also hold at least one Moonbirds? To answer them, the data must first pass through ingestion steps such as indexing, processing, and storage, and then be aggregated and computed according to the question posed.
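To make the second question concrete, here is a minimal sketch of the final aggregation step, assuming the indexing work has already produced a set of current holder addresses per collection. The `holdings` dictionary, the addresses in it, and the function name are hypothetical and purely illustrative.

```python
from typing import Dict, Set


def holder_overlap_pct(holdings: Dict[str, Set[str]], base: str, other: str) -> float:
    """Percentage of `base` holders that also hold at least one `other` token."""
    base_holders = holdings.get(base, set())
    if not base_holders:
        return 0.0
    both = base_holders & holdings.get(other, set())
    return 100.0 * len(both) / len(base_holders)


# Hypothetical indexed result: collection name -> set of current holder addresses.
holdings = {
    "BAYC": {"0xa1", "0xb2", "0xc3", "0xd4"},
    "Moonbirds": {"0xb2", "0xd4", "0xe5"},
}

print(holder_overlap_pct(holdings, "BAYC", "Moonbirds"))  # 50.0
```

The calculation itself is trivial; the expensive part is producing an accurate, up-to-date holder set from raw transfer events, which is exactly what the platforms' data warehouses do.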

Directly querying the blockchain for answers is very time-consuming and labor-intensive. To enable quick retrieval of on-chain data, current mainstream on-chain data analysis platforms index the raw on-chain data, process it through a series of steps, and store it in a data warehouse managed and updated by the platform. When users track smart money transactions on Nansen or view visual analyses on Dune Analytics, their queries for so-called "on-chain data" are actually querying a database controlled by the project team rather than the blockchain itself.

The data warehouse architecture of on-chain data analysis platforms is roughly as follows:

image

  • Data Collection Layer: The platform obtains raw on-chain data from blockchain nodes. Some platforms accept data sources provided by third parties, while others (like Footprint Analytics) allow users to upload off-chain data to assist in final data analysis.

  • Data Processing Layer: Each platform extracts, transforms, and loads the raw data through either stream processing or batch processing. In stream processing, raw data is ingested and processed continuously as it arrives, which usually means lower data latency and more timely analysis results. Batch processing has somewhat higher latency and lower timeliness, but is better suited to processing large volumes of data.

  • Data Storage Layer: Processed data is stored in various data tables of the dataset according to formats predefined by the platform for subsequent use.

  • Data Integration Layer: Stored data undergoes aggregation operations. Calculations can be based on pre-set metrics (metrics computation) or triggered periodically or based on set conditions (event-driven aggregation).

  • Data Analysis Layer: The results of the computations are reported and output in real time. Individual users mainly interact with on-chain data analysis platforms at this layer, for example through the Business Intelligence report interface provided by Nansen, the numerous visual charts on Dune Analytics and Footprint Analytics, and the API interfaces provided by some platforms.
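The five layers above can be sketched as a toy pipeline. This is only an illustration under simplifying assumptions, not any platform's actual implementation: the records, field names, and function names are all hypothetical, the "warehouse" is a plain Python list, and the only metric is daily volume per protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw records, standing in for what a node or third party returns.
RAW_EVENTS = [
    {"block_time": "2022-07-13T23:59:10", "protocol": "dex_a", "amount_usd": "1200.50"},
    {"block_time": "2022-07-14T00:03:42", "protocol": "dex_a", "amount_usd": "300"},
    {"block_time": "2022-07-14T08:15:00", "protocol": "dex_b", "amount_usd": "99.5"},
]


def collect() -> Iterator[dict]:
    """Collection layer: yield raw records one at a time."""
    yield from RAW_EVENTS


def transform(raw: dict) -> dict:
    """Processing layer: parse one raw record into a typed row."""
    return {
        "day": raw["block_time"][:10],
        "protocol": raw["protocol"],
        "amount_usd": float(raw["amount_usd"]),
    }


def process_stream(source: Iterable[dict]) -> Iterator[dict]:
    """Stream processing: each record is transformed as soon as it arrives."""
    for raw in source:
        yield transform(raw)


def process_batch(source: Iterable[dict]) -> List[dict]:
    """Batch processing: accumulate everything first, then transform in one pass."""
    pending = list(source)
    return [transform(raw) for raw in pending]


def store(rows: Iterable[dict]) -> List[dict]:
    """Storage layer: persist rows in a table (here, a plain list)."""
    return list(rows)


def aggregate(table: List[dict]) -> Dict[Tuple[str, str], float]:
    """Integration layer: a pre-set metric, daily volume per protocol."""
    volume: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in table:
        volume[(row["day"], row["protocol"])] += row["amount_usd"]
    return dict(volume)


def report(metrics: Dict[Tuple[str, str], float]) -> List[str]:
    """Analysis layer: format the metrics for display."""
    return [f"{day} {protocol} {usd:.2f}" for (day, protocol), usd in sorted(metrics.items())]


metrics = aggregate(store(process_stream(collect())))
for line in report(metrics):
    print(line)
```

Both processing modes produce the same rows here; the difference in practice is when the work happens, which is what drives the latency trade-off described in the processing layer.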

Each platform takes a different approach to building and maintaining its data warehouse. Nansen, for example, relies on Google Cloud Platform, a third-party service, to build and maintain its data warehouse.

(https://www.nansen.ai/post/nansen-and-google-cloud-empower-web3-investors-with-high-quality-real-time-market-intelligence)

image

On the other hand, platforms like Dune Analytics, Footprint Analytics, and Token Terminal independently build and maintain their own data warehouses. Taking Footprint Analytics as an example, its data warehouse architecture is shown in the diagram below.

image

3. Comparison of Mainstream On-Chain Data Analysis Platforms

This section compares several mainstream on-chain data analysis platforms, including Nansen, Token Terminal, Dune Analytics, and Footprint Analytics, from the perspectives of content and user experience, based on dimensions such as data richness (number of supported blockchains), data granularity, data latency, platform usability, and query freedom. Some platforms provide standardized information report interfaces for users, such as Nansen and Token Terminal.

Nansen

Nansen is one of the best-known on-chain data analysis platforms.

Compared to other platforms, its standout feature is wallet profiling (wallet profiler/wallet labeling). By leveraging wallet labeling and combining it with other on-chain data, it extracts highly valuable information for users, such as Smart Money, helping users track the real-time movements of whales and heavy DeFi players. Other popular products include Hot Contract, which discovers emerging popular DeFi and NFT contracts; NFT Paradise, which provides real-time NFT minting data, etc.

【Supported Blockchains】Nansen currently supports on-chain data analysis for a total of 11 blockchains: Ethereum, Arbitrum, Avalanche, BSC, Celo, Fantom, Optimism, Polygon, Ronin, Terra, and Solana.

【Data Granularity】Nansen's standard version only provides users with curated data.

【Data Latency】Stream processing and batch processing. Some data analyses have achieved near real-time reporting.

【Platform Usability】Zero threshold.

【Query Freedom】Nansen's standard version only provides a standardized information template interface. For institutional clients with custom on-chain data query and analysis needs, Nansen has launched the Nansen Institutions product through Google Cloud Platform's Blockchain Datasets, allowing professional/institutional users to write SQL Queries that meet custom requirements.

It is worth mentioning that Nansen has published several on-chain analysis reports in the Nansen Research channel. These research reports provide in-depth on-chain tracking and analysis of key events, and readers may find it beneficial to occasionally read these reports (such as Nansen's report on last month's stETH depeg event https://www.nansen.ai/research/on-chain-forensics-demystifying-steth-depeg) to learn more about on-chain analysis methods.

Token Terminal

Token Terminal is renowned for providing accurate protocol revenue. Based on protocol revenue, Token Terminal calculates various metrics such as price-to-sales (P/S) and price-to-earnings (P/E) ratios for different protocols. These metrics provide a valuation benchmark for various protocols to some extent.
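As a rough illustration of how such ratios are derived, the sketch below assumes that P/S divides market capitalization by annualized total revenue and P/E divides it by annualized protocol revenue, with a 30-day figure annualized by scaling to 365 days. These definitions and all numbers are assumptions for illustration; Token Terminal's exact methodology may differ.

```python
from typing import Optional


def annualize(revenue_30d: float) -> float:
    """Scale a trailing 30-day revenue figure to a 365-day year (assumed convention)."""
    return revenue_30d * 365 / 30


def valuation_ratio(market_cap: float, annualized_revenue: float) -> Optional[float]:
    """Market cap divided by annualized revenue; None when revenue is not positive."""
    if annualized_revenue <= 0:
        return None
    return market_cap / annualized_revenue


# Hypothetical protocol figures in USD.
market_cap = 730_000_000
total_revenue_30d = 6_000_000      # all fees paid by users (assumed P/S denominator)
protocol_revenue_30d = 1_500_000   # share kept by the protocol (assumed P/E denominator)

ps = valuation_ratio(market_cap, annualize(total_revenue_30d))
pe = valuation_ratio(market_cap, annualize(protocol_revenue_30d))
print(ps, pe)  # 10.0 40.0
```

A lower ratio means the market is paying less per dollar of revenue, which is why these metrics can serve as a rough cross-protocol valuation benchmark.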

【Supported Blockchains】Token Terminal tracks data from over 130 protocols.

【Data Granularity】Token Terminal only provides users with curated data.

【Data Latency】Batch processing. According to recent communications between the IOSG team and Token Terminal, the data on the Token Terminal platform currently has about two days of latency.

【Platform Usability】Zero threshold.

【Query Freedom】Only provides a standardized information interface.

image

Other mainstream on-chain data analysis platforms open their data tables to users, allowing them to write code for queries, providing users with a certain degree of freedom in query content, such as Dune Analytics and Footprint Analytics.

Dune Analytics

Dune Analytics was the first on-chain data analysis platform to let users run their own queries, and it has the largest community of analysts and users. It provides highly granular raw on-chain data that analysts can use to write custom queries freely. Dune Analytics also opens up Abstraction to project teams, allowing them to create data tables tailored to their protocol's data so that analysts can work with it more easily. However, independent querying has a certain threshold: analysts need to be able to write PostgreSQL queries to build the analyses they need. Moreover, query latency depends heavily on the analyst's SQL skills and familiarity with the data tables Dune Analytics provides.

【Supported Blockchains】Dune Analytics provides on-chain data for a total of 6 blockchains: Ethereum, BSC, Optimism, Polygon, Gnosis Chain, and Solana.

【Data Granularity】Extremely fine.

【Data Latency】Stream processing. Data latency is about five minutes.

【Platform Usability】Dune Analytics requires analysts to have certain SQL coding skills.

【Query Freedom】High.

image

With highly granular raw data, analysts can freely create on-chain analyses in Dune Analytics, such as daily StepN new shoe minting and historical accumulation data https://dune.com/queries/627689/1170627.
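The logic behind a chart like this is a per-day count plus a running total. The sketch below shows that logic in Python under simplifying assumptions; it is not the linked Dune query, and the mint timestamps are hypothetical.

```python
from collections import Counter
from itertools import accumulate
from typing import List, Tuple


def daily_and_cumulative(timestamps: List[str]) -> List[Tuple[str, int, int]]:
    """Return (day, mints that day, mints to date) rows from ISO-8601 timestamps."""
    per_day = Counter(ts[:10] for ts in timestamps)  # "YYYY-MM-DD" prefix is the day
    days = sorted(per_day)
    daily = [per_day[day] for day in days]
    cumulative = list(accumulate(daily))
    return list(zip(days, daily, cumulative))


# Hypothetical mint events: one timestamp per newly minted item.
mint_timestamps = [
    "2022-07-11T03:12:00",
    "2022-07-11T18:40:09",
    "2022-07-12T07:05:31",
    "2022-07-13T00:00:02",
    "2022-07-13T11:59:59",
    "2022-07-13T23:10:45",
]

for row in daily_and_cumulative(mint_timestamps):
    print(row)
```

On a platform with raw event tables, the same result would typically be expressed as a `GROUP BY` on the day with a window sum for the cumulative column.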

Dune Analytics released Dune Engine v2 on May 30, 2022. Dune Engine v2 significantly revamped Dune Analytics' data architecture to provide users with faster query responses and better query performance while minimizing the impact on user experience.

Footprint Analytics

Nansen has a low usage threshold but offers only standardized information interfaces; Dune Analytics offers free-form querying but requires analysts to write PostgreSQL. Footprint Analytics aims for a balance between the two, providing great query freedom while lowering the usage threshold. How does it achieve this?

"On-chain data is complex, and analysts may need to write hundreds or thousands of lines of code to complete a metric calculation. To address the high analysis threshold, Footprint cleans and integrates on-chain data, giving it business meaning, allowing users to analyze blockchain data without SQL queries and coding. Anyone can build their custom charts in minutes through a rich chart interface, decrypt on-chain data, and discover value trends behind projects."

Footprint Analytics not only provides raw blockchain data but also classifies on-chain data into tiers. The rawest on-chain data is classified as Bronze data; data that has been filtered, cleaned, and enhanced is classified as Silver data; and data that has been further organized to carry business significance is classified as Gold data.

image

The organized Gold and Silver level data with commercial logic and business significance can be directly used for analysis. With Gold and Silver level data, Footprint Analytics provides users with a service that allows them to query on-chain data simply by dragging and dropping data tables. Regardless of whether readers can write SQL-like code, they can quickly create a data analysis information interface that meets their customized needs and visualize the required information through intuitive and interactive charts.
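The Bronze, Silver, and Gold tiers can be illustrated with a small sketch. It is a hypothetical example of the tiering idea only, not Footprint's actual schema: the field names, the contract label map, and the chosen Gold metric (daily active users per protocol) are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Bronze: raw rows as ingested, including a duplicate and a failed transaction.
bronze = [
    {"tx": "0x01", "status": "1", "from": "0xU1", "to": "0xAAA", "day": "2022-07-13"},
    {"tx": "0x01", "status": "1", "from": "0xU1", "to": "0xAAA", "day": "2022-07-13"},
    {"tx": "0x02", "status": "0", "from": "0xU3", "to": "0xAAA", "day": "2022-07-13"},
    {"tx": "0x03", "status": "1", "from": "0xU2", "to": "0xaaa", "day": "2022-07-13"},
    {"tx": "0x04", "status": "1", "from": "0xU1", "to": "0xBBB", "day": "2022-07-13"},
]

# Hypothetical label map used to enrich rows with a protocol name.
CONTRACT_LABELS = {"0xaaa": "protocol_a", "0xbbb": "protocol_b"}


def to_silver(bronze_rows: List[dict]) -> List[dict]:
    """Silver: drop failed and duplicate transactions, normalize and label the rest."""
    seen = set()
    silver = []
    for row in bronze_rows:
        if row["status"] != "1" or row["tx"] in seen:
            continue
        seen.add(row["tx"])
        silver.append({
            "tx": row["tx"],
            "day": row["day"],
            "user": row["from"].lower(),
            "protocol": CONTRACT_LABELS.get(row["to"].lower(), "unknown"),
        })
    return silver


def to_gold(silver_rows: List[dict]) -> Dict[Tuple[str, str], int]:
    """Gold: a business-level metric, daily active users per protocol."""
    users = defaultdict(set)
    for row in silver_rows:
        users[(row["day"], row["protocol"])].add(row["user"])
    return {key: len(addresses) for key, addresses in users.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Because the Gold table already encodes the business meaning, a drag-and-drop interface only needs to select and chart it, which is why no SQL is required at that tier.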

【Supported Blockchains】Footprint Analytics currently provides on-chain data for a total of 17 blockchains, including Ethereum, Arbitrum, Avalanche, Boba, BSC, Celo, Fantom, Harmony, IOTEX, Moonbeam, Moonriver, Polygon, Thundercore, and Solana.

【Data Granularity】Footprint Analytics provides both extremely fine raw data and curated data for users.

【Data Latency】Currently, Footprint Analytics processes the collected raw data once a day, resulting in a data latency of one day.

【Platform Usability】On the Footprint Analytics platform, users can freely analyze on-chain data without SQL queries and coding. For analysts with SQL coding skills, Footprint also provides raw data for analysis.

【Query Freedom】High.

Readers may want to head over to Footprint Analytics now; you can get started creating your own on-chain analysis interface in just a few minutes.
image
image

4. A Vision for Decentralized On-Chain Data Analysis

On-chain data analysis matters a great deal, yet today's users can only rely on centralized platforms such as Nansen and Dune Analytics to inform their investment decisions. On these platforms, users cannot verify whether the underlying data has been tampered with and must trust that the datasets the platform provides are accurate and true. In the context of on-chain data analysis, "Don't Trust. Verify." has become an empty phrase.

As the Web3 wave surges, the on-chain ecosystem becomes increasingly rich. Future smart contracts and decentralized applications may not only require raw on-chain data and data provided by oracles as input information but may also need to input analysis results derived from raw on-chain data calculations. At that time, can we still trust and use these centralized on-chain data analysis platforms for such purposes? The answer is likely no.

The IOSG team has recently seen project teams taking the first steps toward decentralized on-chain data querying and analysis. Due to space limitations, we will discuss this in a follow-up piece on the road to decentralized on-chain data analysis.

References:

https://www.nansen.ai/post/nansen-and-google-cloud-empower-web3-investors-with-high-quality-real-time-market-intelligence
https://cloud.google.com/customers/nansen
https://www.nansen.ai/research/on-chain-forensics-demystifying-steth-depeg
https://docs.dune.com/data-tables/data-tables
https://docs.dune.com/dune-engine-v2-beta/query-engine
https://www.footprint.network/@Footprint/Footprint-Datasets-Data-Dictionary
https://www.youtube.com/watch?v=Pp9_wgYZB3I
