Web3 Versions of ChatGPT Reviewed: Basic Understanding Passes, but Overall Performance Disappoints

2023-08-24 17:07:57

This article evaluates AI-driven Web3 chatbots such as MinMax, QnA3, and Web3 Analytics, compares their understanding, generation, learning, and optimization abilities from multiple perspectives, and gives an overall assessment of their user experience and intelligence.

Author: bayemon.eth, ChainCatcher

Since ChatGPT exploded in popularity at the end of last year, the trendsetters of the Web3 space have been exploring the seemingly endless possibilities of "AI + Web3." Unlike traditional industries with mature knowledge systems, Web3 is a young field that has not yet developed a complete learning mechanism, so it seems to need something like ChatGPT that can offer inspiration and timely answers at critical moments.

Although the current "AI + Web3" hot topics still revolve around identity verification networks like Worldcoin, narrative-driven Telegram bots like Unibot and Lootbot, and zkML and other technologies that may later tie in with scaling solutions, AI-driven chatbots such as MinMax, QnA3, and Web3 Analytics have also emerged in the community. Their appearance shows that teams have noticed the gap in knowledge transmission within Web3 and want to build a ChatGPT for the Web3 professional field. This article evaluates these three Web3 chatbots, compares their understanding, generation, learning, and optimization abilities from multiple perspectives, and assesses their user experience and intelligence level.

Evaluation Criteria

The first step, of course, is to create a folder and draw up a set of evaluation criteria. For an interactive model, the user experience depends on both the interaction process and the model's intelligence. The interaction experience will focus mainly on UI design, while the model's intelligence will be measured along the following dimensions:

Understanding & Generating Ability:

Ability to accurately understand the user's question, relate it to the conversation context, and generate natural, fluent, and logical responses.
Clear and concise responses, with useful solutions and suggestions for problem-oriented questions.

Learning & Interaction Optimization Ability:

Ability to summarize and provide accurate information and answers based on user-provided materials and data sources.
Ability to keep learning and to deepen its understanding and background knowledge of specific industries.
Ability to reason over the conversation with the user and improve its responses through interaction.
Ability to optimize based on user feedback and behavior, providing a better user experience.

Multilingual Processing:

Ability to understand questions and respond in multiple languages, including both natural language and machine language (code).
Ability to provide clear, accurate, and culturally appropriate responses.

Interaction Experience

MinMax

At first glance, the default green-on-black color scheme makes one reasonably suspect that the UI team firmly believes in "Keep the bar green to keep the code clean" (or simply cares about eye strain). Because the human eye is highly sensitive to green, the first things one notices in the MinMax UI are the Popular Queries and Popular Questions modules, which display frequently searched concepts and questions directly to users, leveraging a bit of herd mentality. Clicking an item jumps straight to the corresponding concept or question page.

One downside: the word clouds in these two modules presumably scale font size by search volume to emphasize the most-searched concepts and questions, but because the user base is still small or searches are spread too evenly, the clouds show no meaningful contrast. Their advantage will only become apparent once search volume grows and the differences between terms become more pronounced.
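The flattening effect can be illustrated with a minimal Python sketch. MinMax does not document how its word cloud works, so this assumes font size is interpolated linearly between a minimum and maximum according to search count; the function name font_sizes and the 12/48 px bounds are hypothetical.

```python
def font_sizes(counts: dict[str, int], min_px: int = 12, max_px: int = 48) -> dict[str, int]:
    """Map each term's search count linearly onto [min_px, max_px].

    Assumes a non-empty mapping. When every count is equal there is no
    spread to map, so all terms get the same size: a "flat" cloud.
    """
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {term: min_px for term in counts}
    scale = (max_px - min_px) / (hi - lo)
    return {term: round(min_px + (c - lo) * scale) for term, c in counts.items()}


print(font_sizes({"PoS": 10, "PoW": 10, "zkML": 10}))
print(font_sizes({"PoS": 100, "ERC-6551": 55, "zkML": 10}))
```

With evenly spread counts every term comes back at 12 px, which is the flat cloud described above; with counts of 100, 55, and 10 the sizes spread to 48, 30, and 12 px.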

Since MinMax positions itself as a search engine, the chatbot gets little emphasis: it is just a small white box on the homepage.

The chatbot interface keeps the black-and-green color scheme, automatically generates a greeting, and again includes a few "hot searches." Overall, the MinMax chat interface is quite simple, and, true to its search-engine positioning, it likes to suggest related information during the chat.

Additionally, MinMax allows users to log in directly via email, Google, Twitter, or Facebook, and does not require users to have a wallet, making it relatively more beginner-friendly.

Web3 Analytics

Unlike MinMax, Web3 Analytics is designed as a pure chatbot: the homepage is the chat interface. Its black-and-blue color scheme evokes classic Visual Studio, with conversation history in the left sidebar and a feedback feature, still under development, in the right sidebar. The history feature needs no explanation, but how the feedback section will work is worth watching.

Besides hot search entries, the automatically generated greeting from Web3 Analytics promotes its Telegram and Discord channels and the project token W AI. The emphasis on Telegram and Discord likely stems from Web3 Analytics also taking part in the Telegram/Discord bot narrative. Notably, the team has introduced the concept of "Train AI to Earn," which lets users earn project tokens by asking the bot questions. Because tokens are involved, a wallet login is required: users who are not logged in can ask at most 3 questions, after which the page repeatedly prompts them to log in and obtain W AI, and they cannot continue without logging in.
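The login gate described above behaves like a simple quota. The sketch below is hypothetical: the class name AnonymousQuota, the counter, and the prompt wording are assumptions for illustration, and the real site may track anonymous usage differently (for example per browser session).

```python
class AnonymousQuota:
    """Hypothetical gate: anonymous users get a fixed number of free
    questions; afterwards every request returns a login prompt until a
    wallet is connected."""

    LOGIN_PROMPT = "Please connect a wallet to continue and earn W AI."

    def __init__(self, free_questions: int = 3) -> None:
        self.free_questions = free_questions
        self.asked = 0
        self.wallet_connected = False

    def ask(self, question: str) -> str:
        # Logged-in users are never throttled in this sketch.
        if self.wallet_connected:
            return f"answer to: {question}"
        # Once the free quota is used up, keep returning the prompt.
        if self.asked >= self.free_questions:
            return self.LOGIN_PROMPT
        self.asked += 1
        return f"answer to: {question}"


gate = AnonymousQuota()
replies = [gate.ask(f"question {i}") for i in range(5)]
# The first three replies are answers; the last two are login prompts.
```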

QnA3

Unlike the distinctly programmer-styled first two bots, QnA3's pink-and-purple color scheme is pure dopamine. The homepage displays trending questions and introduces the "Vote to Earn" feature, which requires a wallet login because it involves points that may later be exchanged for tokens. There are currently two ways to earn points:

Vote to Earn: Users who successfully predict the top three questions can earn points.
Ask to Earn: Users can earn project points by completing daily question tasks.

QnA3 is currently deployed on BNB Chain, and claiming points requires paying gas; the points may later be used for project token airdrops. As another project that plans to issue a token, QnA3 also requires wallet login to support its future tokenomics.

QnA3's homepage also has a news section that uses "Whales are asking" as a hook to draw traffic and encourage users to click through to the linked page and keep following it.

However, in Simplified Chinese mode, the "Whales are asking" links alternate between Chinese and English, which clearly needs fixing.

Model Intelligence Evaluation

Note: QnA3 answers each question with responses from two sources: Knowledge Graph, which retrieves information from its database via a knowledge graph, and Web3 News, which aggregates related news and information. The evaluation of QnA3 below therefore covers both responses.

1. Understanding & Generating Ability

- Regarding Understanding Ability:

On their first day in Web3, beginners may learn about consensus mechanisms and algorithms from all kinds of materials, but over time they might remember only PoW and PoS, so this is a good chance to refresh their memory.

Let's see what ChatGPT has to say:

MinMax

Web3 Analytics

QnA3

On consensus algorithms, all three bots appear at first glance to give reasonable explanations laid out as clear lists. On closer inspection, however, QnA3's Knowledge Graph is padding its answer: the content for PoW and PoS is output twice, possibly because of problems indexing or traversing the knowledge graph database.
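The guess about duplicated output can be made concrete. In the hypothetical graph below (QnA3's actual schema and retrieval code are not public), PoW and PoS are reachable both directly and through an intermediate node. A breadth-first traversal that does not remember visited nodes emits them twice, while one that keeps a visited set emits each once. This is only one plausible cause among several.

```python
from collections import deque


def related_terms(graph: dict[str, list[str]], start: str, dedupe: bool = True) -> list[str]:
    """Breadth-first walk from `start`, collecting every reachable term.

    With dedupe=False a node reachable by two paths is emitted twice.
    dedupe=False is only safe on acyclic graphs; a cycle would loop forever.
    """
    out: list[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if dedupe and nxt in seen:
                continue  # already emitted via another path
            seen.add(nxt)
            out.append(nxt)
            queue.append(nxt)
    return out


# Hypothetical knowledge graph fragment
graph = {
    "consensus algorithm": ["PoW", "PoS", "blockchain consensus"],
    "blockchain consensus": ["PoW", "PoS"],
}
print(related_terms(graph, "consensus algorithm", dedupe=False))
print(related_terms(graph, "consensus algorithm"))
```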

In terms of content, all three bots' introductions to common consensus algorithms generally cover PoS, PoW, DPoS, and PBFT (Practical Byzantine Fault Tolerance), but the explanations are thin. For example, MinMax explains PBFT as "a Byzantine fault tolerance algorithm that processes Byzantine faults through consensus," which is like asking "What is tomato scrambled eggs?" and being told "Tomato scrambled eggs is a dish made by scrambling tomatoes and eggs": no information beyond the literal name.
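By contrast, even one concrete fact would have made the PBFT answer informative. In classic PBFT, tolerating f Byzantine replicas requires at least n = 3f + 1 replicas, and each agreement phase waits for 2f + 1 matching messages, so any two quorums overlap in at least f + 1 replicas, at least one of which is honest. The short sketch below only computes these sizes; it does not model the protocol itself, and the function name pbft_sizes is illustrative.

```python
def pbft_sizes(f: int) -> dict[str, int]:
    """Classic PBFT sizing for tolerating f Byzantine (arbitrarily faulty) replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    replicas = 3 * f + 1      # minimum cluster size
    quorum = 2 * f + 1        # matching messages needed per agreement phase
    # Two quorums drawn from `replicas` nodes share at least this many nodes
    # (which works out to f + 1), so at least one shared node is honest.
    overlap = 2 * quorum - replicas
    return {"replicas": replicas, "quorum": quorum, "min_quorum_overlap": overlap}


for f in (1, 2, 3):
    print(f, pbft_sizes(f))
```

For f = 1 this gives 4 replicas and a quorum of 3.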

Accuracy certainly matters in model training, and avoiding off-topic answers is a core goal, but these answers sometimes chase "accuracy" so hard that they produce correct but empty "nonsense," which should also count as a serious case of overfitting. Future algorithm optimization could therefore add metrics that reward personalized, differentiated responses on top of accuracy.

- Regarding Contextual Relevance:

The conversation stayed fairly normal until I asked the bot to explain in detail the first consensus algorithm (PoW) it had listed in its previous answer. For reference, here is ChatGPT's answer:

MinMax

MinMax is the only bot that scored on this question, giving a logically structured response that covers PoW's applications, core idea, consensus process, advantages, disadvantages, and improvements in turn.

Web3 Analytics

Web3 Analytics gave an answer completely unrelated to Web3, which makes one wonder whether its team includes graduates of prestigious universities at home and abroad, law professor Luo Xiang among them…

QnA3

Compared with the almost entirely off-topic answer from Web3 Analytics, QnA3's two models at least produced content somewhat related to Web3, but neither fully understood what "the first one" in my question referred to. The Knowledge Graph even mixed up languages and answered in English.

Web3 News grasped the intent of the question but clearly not what "the first one" referred to, and it produced nonsense such as "the first Bitcoin refers to Bitcoin."

In summary, on contextual understanding, only MinMax among the three chatbots currently passes.

- Regarding Generating Ability:

Here we again look at text generation, using a two-part prompt that asks the AI to briefly explain the difference between PoW and PoS and then present it as a table.

ChatGPT

Note: ChatGPT also missed the request in the first half of the prompt.

MinMax

MinMax's table clearly summarizes the differences from several angles, and at the end of the answer it adds links to related resources so users can dig into whatever interests them.

However, MinMax missed the first half of my prompt and did not give a brief explanation before the table.

Web3 Analytics

Web3 Analytics understood the first half of the prompt, briefly explained the difference between PoW and PoS, and diligently cited its source.

In the table, however, Web3 Analytics seems to have imagined a corporate power struggle: PoS becomes a mechanism for selecting miners by "shareholding and/or age," and the table itself is rather too simplistic.

QnA3

QnA3 is indeed the best among the three in terms of understanding and table output.

The table covers all the key points and includes a summary; in content it is the most complete of the four responses. I just don't understand why the Knowledge Graph is so keen on answering a Chinese exam in English.

QnA3 Web3 News follows the instructions best of all the bots: it first explains the differences between the two (even breaking them into points to look more organized) and then outputs a comparison table covering several aspects.

So for practical generation tasks, QnA3 and MinMax are the first choices: their tables can basically be screenshotted and used as-is.
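For readers who want the underlying difference rather than a screenshot of a table, here is a toy Python sketch. It is a simplification, not any real chain's rules: proof of work is modeled as searching for a nonce whose SHA-256 digest begins with a given number of hex zeros, and proof of stake as picking the next block proposer with probability proportional to stake, with Python's seeded random module standing in for the verifiable randomness real protocols need. The names mine and pick_validator are illustrative.

```python
import hashlib
import random


def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Toy proof of work: try nonces until the SHA-256 hex digest of
    block_data + nonce starts with `difficulty` zeros."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1  # no known shortcut, so this brute-force search is the "work"


def pick_validator(stakes: dict[str, float], seed: int) -> str:
    """Toy proof of stake: choose one proposer with probability proportional
    to stake. A seeded PRNG stands in for a protocol's verifiable randomness."""
    rng = random.Random(seed)
    names = list(stakes)
    return rng.choices(names, weights=[stakes[n] for n in names], k=1)[0]


nonce, digest = mine("block #1", difficulty=4)
print(nonce, digest[:12])
print(pick_validator({"alice": 60.0, "bob": 30.0, "carol": 10.0}, seed=7))
```

Each extra leading zero multiplies the expected PoW work by about 16, whereas the PoS selection costs a single weighted draw regardless; that asymmetry is the usual energy argument for PoS.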

2. Learning Ability

To assess an AI model's learning ability, one must first find a piece of "new knowledge" that is not yet in its database. However, after repeated conversations I could not find a single question that none of the three models could answer. So for MinMax and Web3 Analytics, the learning test uses the new ERC-6551 standard for NFT-bound accounts, while for QnA3, which already knows ERC-6551 and can give some details, it uses the latest governance proposal just released by MakerDAO.

MinMax

Once given the relevant material, MinMax can synthesize it and produce content that covers the core ideas of ERC-6551. It does not go deep into the technical innovations, but for a complete novice who knows nothing about ERC-6551 and wants a quick grasp of the basics, the information is sufficient.

I also casually asked about MakerDAO.

In summary, although MinMax cannot capture data in real time or keep its training set current, in terms of "learning" it can indeed present clear, logical "learning results" once given the target content.

Web3 Analytics

Even when given the full content of the ERC-6551 standard, Web3 Analytics failed to summarize it and instead reproduced the introduction of a related article on ERC-6551, with a duplication rate as high as 80%.
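The article does not say how the 80% figure was measured. One simple way to estimate such a rate, shown here as an assumption rather than the author's method, is Python's difflib.SequenceMatcher, whose ratio() returns 2M/T, where M is the number of matched characters and T the combined length of both texts. The two sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def duplication_rate(answer: str, source: str) -> float:
    """Similarity of two texts in [0, 1]: 2 * matched characters / total characters."""
    return SequenceMatcher(None, answer, source).ratio()


# Hypothetical bot answer vs. hypothetical article introduction
bot_answer = "ERC-6551 gives every NFT its own smart contract account."
article_intro = "ERC-6551 gives every NFT its own smart contract wallet."
print(f"{duplication_rate(bot_answer, article_intro):.0%}")
```

A character-level ratio like this is crude; it rewards copied phrasing rather than copied ideas, which is roughly what a "duplication rate" is meant to flag.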

Similarly, here is Web3 Analytics's response regarding MakerDAO's latest proposal:

This shows that Web3 Analytics can present information already in its dataset as bullet points. For learning ability, then, the WA team still needs to improve how the AI summarizes external information provided by users.

QnA3

Perhaps thanks to a weekend dataset update, QnA3 Web3 News can already describe the latest proposal MakerDAO released last Friday, but the Knowledge Graph's information is still stuck in May of this year.

Even after I provided a link to MakerDAO's latest proposal, the Knowledge Graph still missed its most important item, the DSR (Dai Savings Rate) adjustment. The Knowledge Graph's learning ability therefore needs further work.

In summary, although dataset updates may lag behind the pace of Web3's technical iteration, MinMax is the first choice when it comes to learning from external knowledge. Web3 Analytics and QnA3 update their information relatively quickly but still have room to improve their overall learning ability.

3. Multilingual Processing Ability

- Natural Language

To support seamless cross-cultural communication in a globalized Web3 world, an AI needs solid multilingual content generation and information retrieval capabilities.

In Chinese and English, both MinMax and Web3 Analytics answer without difficulty, and their output reads idiomatically. QnA3's Knowledge Graph produces the best English content of the three, but answering Chinese questions in English is not really appropriate, and even when it does answer in Chinese, the text tends to read like a literal translation rather than natural Chinese. Given the quality of the Knowledge Graph's content, better support for other languages could be a practical way to boost adoption.

- Machine Language

To briefly summarize the application of AI models in the daily work of Web3ers: translator + debugger.

If the natural-language test checks whether an AI model can serve as a translator, this section checks whether the three bots can serve as debuggers. The test uses a very simple piece of code containing a mistake that Solidity beginners commonly make:

Briefly, the error is that a function declared with the pure keyword cannot change on-chain state. Put simply, a pure function may neither modify nor even read the contract's state variables and can only compute from its inputs (a view function, by contrast, may read state but still not modify it), so the operation on the fifth line that adds 1 to the stored number is not allowed. Note: from a Solidity beginner's point of view, a debugger must do three things: point out where the error is, explain it, and fix the code.

Let's see how ChatGPT does it:

MinMax

Me: Can you help me debug?

MinMax: I can, I am equipped.

Much like my own brain crashing the moment I see code, MinMax threw an error and ended the conversation midway through debugging. The way MinMax displays code blocks also clearly needs work. This finally exposes a weakness in MinMax, which otherwise performed well at learning from external material and at natural-language communication.

Web3 Analytics

Web3 Analytics loses a point on understanding, since it could not cope with my omitting the subject of the sentence.

Although it sometimes seems less intelligent in conversation, Web3 Analytics debugs satisfactorily: it explains the basic concepts in the code and the source of the error, provides corrected code, and briefly explains what the faulty code would mean in practice when the contract is deployed. Fine, I can forgive it, for now, for taking away my right to omit subjects.

QnA3

As a debugger, QnA3 has no problems: it points out the error and makes the necessary fix, fully meeting the requirements set out at the start of this section. The only downside is that the code block's font color is too close to its background, which calls for some UI work.

PS: Even after all this testing, QnA3 Web3 News only answers some questions, and I still have not figured out what triggers a Web3 News response. Also, regarding the first issue the Knowledge Graph raised, my understanding is that simple contracts do not necessarily need a constructor (please correct me if I'm wrong).

In summary, apart from MinMax, which struggles with debugging, Web3 Analytics and QnA3 are generally competent debuggers despite minor flaws. But given those flaws, why not just use ChatGPT?

Conclusion

Web3 chatbots generally have reasonable understanding, generation, and learning abilities, can respond in multiple languages, and can serve as good partners for programmers. These "basic qualities" are enough to give novices who know only the basic concepts a logical framework for learning more about the field.

For those already deep in the field, however (who may not even think of turning to a chatbot), the AI seems limited to "minor tasks" such as generating tables and summaries, offering no new content or personalized viewpoints worth consulting. In short, I believe that as a user's understanding of Web3 deepens past a certain critical point, the new content the model can provide approaches zero.

Besides MinMax, Web3 Analytics, and QnA3, a similar AI chatbot called SuperSight is currently in internal testing. The growing number of such tools shows, on the one hand, that the market is paying attention to the "AI + Web3" trend and to user needs; on the other hand, it means project teams should focus future iterations on distinctive features rather than reinventing the wheel. Given the current state of the technology and the market as a whole, however, Web3 AI chatbots still need to become more practical and versatile, and large-scale adoption may have to wait for further advances in AI and machine learning and for a deeper integration of Web3 and AI.

