Besides Solidity, what other EVM languages are worth paying attention to?

jtriley

2023-03-19 09:51:06

Introducing six languages: Solidity, Vyper, Fe, Huff, Yul, and ETK.

Author: jtriley.eth

Compiled by: 0x11, Foresight News

The Ethereum Virtual Machine (EVM) is a 256-bit, stack-based, globally accessible Turing machine. Because its architecture differs markedly from that of other virtual machines and physical machines, the EVM calls for domain-specific languages (DSLs), that is, languages designed for a single application domain.

In this article, we will explore the latest technologies in EVM DSL design, introducing six languages: Solidity, Vyper, Fe, Huff, Yul, and ETK.

Language Versions

Solidity: 0.8.19
Vyper: 0.3.7
Fe: 0.21.0
Huff: 0.3.1
ETK: 0.2.1
Yul: 0.8.19

To read this article, you should have a basic understanding of EVM, stacks, and programming.

Overview of the Ethereum Virtual Machine

The EVM is a 256-bit, stack-based Turing machine. Before delving into the languages that compile to it, a few of its features should be introduced.

Since the EVM is "Turing complete," it is subject to the "halting problem." In short, there is no general way to determine, before executing a program, whether it will ever terminate. The EVM addresses this issue through "Gas," a unit of computation that generally correlates with the physical resources required to execute instructions. The amount of Gas for each transaction is limited, and the initiator of the transaction must pay ETH proportional to the Gas consumed by the transaction. One implication of this strategy is that if two smart contracts are functionally identical, the one that consumes less Gas will be adopted more widely. This leads to intense competition for Gas efficiency among protocols, with engineers striving to minimize Gas consumption for specific tasks.
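The way a Gas limit sidesteps the halting problem can be sketched in a few lines of Python. This is a toy model rather than a real EVM: the `run` function, the `OutOfGas` exception, and the `infinite_loop` generator are hypothetical names introduced for illustration, and only the three per-instruction costs mirror real opcode prices.

```python
class OutOfGas(Exception):
    """Raised when the next step costs more gas than remains."""


def run(program, gas_limit):
    """Execute (name, gas_cost) steps until the program ends or gas runs out.

    `program` may be an infinite iterable; the gas limit still bounds the work.
    Returns the total gas consumed.
    """
    gas_left = gas_limit
    for _name, cost in program:
        if cost > gas_left:
            raise OutOfGas(f"needed {cost}, had {gas_left}")
        gas_left -= cost
    return gas_limit - gas_left


def infinite_loop():
    """A program that never halts on its own: jump back to the start forever."""
    while True:
        yield ("JUMPDEST", 1)
        yield ("PUSH1", 3)
        yield ("JUMP", 8)
```

Calling `run(infinite_loop(), 100)` raises `OutOfGas` after a bounded number of steps, so the machine never has to decide in advance whether the program terminates.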

Calling a contract creates an execution context. In this context, the contract has a stack for operations and processing, a linear memory instance for reading and writing, a local persistent storage that the contract can read and write, and the data attached to the call, known as "calldata," which can be read but not written.

An important note about memory is that while there is no definitive "upper limit" on its size, it is still finite. The Gas cost for expanding memory is dynamic: it is roughly linear for small sizes, but once a threshold is reached it grows quadratically, meaning the Gas cost includes a term proportional to the square of the total memory size.
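As a rough illustration of the quadratic growth described above, the memory cost can be written as a linear term plus a quadratic term over the number of 32-byte words. The formula below follows the commonly cited protocol definition (3 Gas per word plus words squared divided by 512); the function names are hypothetical.

```python
def memory_cost(size_in_bytes):
    """Total gas charged for a memory of the given size, in bytes."""
    words = (size_in_bytes + 31) // 32  # memory is charged per 32-byte word
    return 3 * words + words * words // 512


def expansion_cost(old_size, new_size):
    """Gas charged when memory grows from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))
```

For small sizes the linear term dominates (`memory_cost(1024)` is 98), while at 32 KiB the quadratic term already accounts for 2048 of the 5120 Gas.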

Contracts can also use various instructions to call other contracts. The "call" instruction sends data and optional ETH to the target contract, creating its own execution context until the execution of the target contract stops. The "staticcall" instruction is similar to "call," but adds a check that asserts no part of the global state has been updated before the static call completes. Finally, the "delegatecall" instruction behaves like "call," but retains some environmental information from the previous context. This is commonly used for external libraries and proxy contracts.
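The difference between the three call instructions can be modeled with a small Python sketch. It is a simplification under stated assumptions: `Contract`, `Context`, and the three helper functions are hypothetical names, contract code is a plain Python function, and only storage ownership, the caller, the attached value, and the static flag are tracked.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Contract:
    address: str
    code: Callable  # function(ctx) standing in for bytecode
    storage: dict = field(default_factory=dict)


@dataclass
class Context:
    storage_owner: Contract  # the contract whose storage is read and written
    caller: str
    value: int
    static: bool = False

    def sstore(self, key, val):
        if self.static:
            raise PermissionError("state change inside staticcall")
        self.storage_owner.storage[key] = val


def call(ctx, target, value=0):
    # New context: the target's own storage, current contract as caller.
    return target.code(Context(target, ctx.storage_owner.address, value, ctx.static))


def staticcall(ctx, target):
    # Like call, but any state change in the new context fails.
    return target.code(Context(target, ctx.storage_owner.address, 0, True))


def delegatecall(ctx, target):
    # Runs the target's code but keeps the current storage, caller, and value.
    return target.code(Context(ctx.storage_owner, ctx.caller, ctx.value, ctx.static))


def increment(ctx):
    ctx.sstore("count", ctx.storage_owner.storage.get("count", 0) + 1)
    return ctx.caller


library = Contract("0xlib", increment)
proxy = Contract("0xproxy", lambda ctx: delegatecall(ctx, library))
user = Contract("0xuser", lambda ctx: None)
root = Context(user, "0xuser", 0)
```

Here `call(root, proxy)` runs the library's code against the proxy's storage, which is the behavior that proxy contracts and external libraries rely on, while `staticcall(root, library)` fails because `increment` tries to write.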

Why Language Design Matters

When interacting with atypical architectures, domain-specific languages (DSLs) are necessary. While compiler toolchains like LLVM exist, relying on them to handle smart contracts is less than ideal in scenarios where program correctness and computational efficiency are crucial.

Program correctness is vital because smart contracts are immutable by default, and given the properties of blockchain virtual machines (VMs), smart contracts are a popular choice for financial applications. While there are upgradeable solutions for the EVM, they are at best a patch and at worst a vulnerability to arbitrary code execution.

Computational efficiency is also critical, as minimizing computation has economic advantages, but not at the expense of security.

In short, EVM DSLs must balance program correctness and Gas efficiency, achieving each through different trade-offs without sacrificing too much flexibility.

Language Overview

For each language, we describe its notable features and design choices and include a simple counter smart contract. Language popularity is determined from total value locked (TVL) data from DefiLlama.

Solidity

Solidity is a high-level language with syntax similar to C, Java, and JavaScript. It is the most popular language by TVL, with a TVL ten times that of the second-place language. It uses an object-oriented model for code reuse, treating smart contracts as class objects and leveraging multiple inheritance. The compiler is written in C++, with plans to migrate to Rust in the future.

Mutable contract fields are stored in persistent storage unless their values are known at compile time (constants) or deployment time (immutable). Methods declared within contracts can be specified as pure, view, payable, or, by default, non-payable but state-modifying. Pure methods do not read data from the execution environment and cannot read or write to persistent storage; that is, given the same input, pure methods will always return the same output, and they do not produce side effects.

View methods can read data from persistent storage or the execution environment, but they cannot write to persistent storage or create side effects, such as adding transaction logs. Payable methods can read and write to persistent storage, read data from the execution environment, produce side effects, and can receive ETH attached to the call. Non-payable methods are the same as payable methods but have runtime checks to assert that no ETH is attached in the current execution context.

Note: Attaching ETH to a transaction is separate from paying Gas fees; the attached ETH is received by the contract, which can accept it or reject it by reverting the execution context.

When declared within the scope of a contract, methods can specify one of four visibility modifiers: private, internal, public, or external. Private methods are accessed internally through the "jump" instruction within the current contract; inheriting contracts cannot access them directly. Internal methods are also accessed through the "jump" instruction, but inheriting contracts can use them directly. External methods can only be accessed through the "call" instruction, which creates a new execution context. Public methods can be accessed by external contracts through the "call" instruction, creating a new execution context, or internally via jumps when called directly. Public methods can also be accessed from the same contract in a new execution context by prefixing the method call with "this."

Note: The "jump" instruction manipulates the program counter, while the "call" instruction creates a new execution context for the target contract's execution. Whenever possible, using "jump" instead of "call" is more Gas-efficient.

Solidity also provides three ways to define libraries. The first is an external library, which is a stateless contract deployed separately on-chain, dynamically linked when called, and accessed via the "delegatecall" instruction. This is the least common method because tool support for external libraries is insufficient, "delegatecall" is expensive since it must load additional code from persistent storage, and deployment requires multiple transactions. The second is an internal library, which is defined the same way as an external library except that each method must be declared internal.

At compile time, internal libraries are embedded into the final contract, and during dead code analysis, unused methods in the library are removed. The third method is similar to an internal library, but instead of defining data structures and functions within a library, they are defined at the file level and can be directly imported and used in the final contract. This third method provides better ergonomics, allowing custom data structures, functions applied in the global scope, and, to a limited extent, operators aliased to certain functions.

The compiler provides two optimization pipelines. The first is an instruction-level optimizer that operates on the final bytecode. The second, added more recently, uses the Yul language (detailed later) as an intermediate representation (IR) during compilation and then optimizes the generated Yul code.

To interact with public and external methods in contracts, Solidity specifies an application binary interface (ABI) standard for interacting with its contracts. Currently, the Solidity ABI is regarded as the de facto standard for EVM DSLs. Ethereum ERC standards that specify external interfaces are implemented according to Solidity's ABI specifications and style guidelines. Other languages also follow Solidity's ABI specifications with minimal deviations.
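To make the ABI concrete, here is a minimal Python sketch of how calldata for an ERC-20 `transfer(address,uint256)` call is laid out: a 4-byte function selector followed by each argument padded to 32 bytes. The helper names are hypothetical. The selector is hard-coded because it is the first four bytes of the Keccak-256 hash of the signature, and Python's standard `hashlib.sha3_256` is not the same function as Keccak-256.

```python
# First four bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")


def encode_address(addr_hex):
    """Left-pad a 20-byte address to a 32-byte ABI word."""
    raw = bytes.fromhex(addr_hex.removeprefix("0x"))
    if len(raw) != 20:
        raise ValueError("address must be 20 bytes")
    return raw.rjust(32, b"\x00")


def encode_uint256(value):
    """Encode an unsigned integer as a big-endian 32-byte ABI word."""
    if not 0 <= value < 2**256:
        raise ValueError("value out of range for uint256")
    return value.to_bytes(32, "big")


def encode_transfer(to, amount):
    """Build calldata for transfer(to, amount)."""
    return TRANSFER_SELECTOR + encode_address(to) + encode_uint256(amount)
```

The result is always 68 bytes long: 4 bytes of selector plus two 32-byte words. Dynamic types such as `bytes` or arrays use a more involved head/tail layout that this sketch does not cover.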

Solidity also provides inline Yul blocks, allowing low-level access to the EVM instruction set. Yul blocks contain a subset of Yul functions; for more details, see the Yul section. This is commonly used for Gas optimization, leveraging features not supported by high-level syntax, and customizing storage, memory, and calldata.

Due to Solidity's popularity, developer tools are very mature and well-designed, with Foundry being a prominent representative in this regard.

Here is a simple contract written in Solidity:

Vyper

Vyper is a high-level language with syntax similar to Python. It is almost a subset of Python, with only a few minor differences. It is the second most popular EVM DSL. Vyper is optimized for security, readability, auditability, and Gas efficiency. It has no object-oriented model or inline assembly, and it does not support code reuse. Its compiler is written in Python.

Variables stored in persistent storage are declared at the file level. If their values are known at compile time, they can be declared as "constant"; if their values are known at deployment time, they can be declared as "immutable"; if they are marked as public, the final contract will expose a read-only function for that variable. Constants and immutables are accessed internally by name, while mutable variables in persistent storage are accessed by prefixing their names with "self." This prevents namespace conflicts between storage variables, function parameters, and local variables.

Similar to Solidity, Vyper also uses function attributes to indicate the visibility and mutability of functions. Functions marked as "@external" can be accessed from external contracts via the "call" instruction. Functions marked as "@internal" can only be accessed within the same contract and must be prefixed with "self." Functions marked as "@pure" cannot read data from the execution environment or persistent storage, nor can they write to persistent storage or create any side effects.

Functions marked as "@view" can read data from the execution environment or persistent storage but cannot write to persistent storage or create side effects. Functions marked as "@payable" can read or write to persistent storage, create side effects, and accept ETH. Functions that do not declare this mutability attribute default to non-payable, meaning they are the same as payable functions but cannot receive ETH.

The Vyper compiler also chooses to store local variables in memory rather than on the stack. This makes contracts simpler and more efficient, and it avoids the "stack too deep" issue common in other high-level languages. However, it also introduces some trade-offs.

First, since the memory layout must be known at compile time, the maximum capacity of dynamic types must also be known at compile time, which is a limitation. Second, allocating large amounts of memory can lead to nonlinear Gas consumption, as mentioned in the EVM overview section. For many use cases, however, this Gas cost is negligible.
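The consequence of a compile-time memory layout can be sketched in Python: every local variable receives a fixed offset, so a dynamic type must reserve space for its declared maximum. This is an illustrative model, not Vyper's actual allocator; the `layout` function is hypothetical, and the assumption that a bounded array of N elements occupies N + 1 words (one length word plus the elements) is a simplification.

```python
WORD = 32  # bytes per EVM word


def layout(variables, start=0):
    """Assign a fixed memory offset to each (name, size_in_words) pair.

    Returns the offsets and the first free byte after the last variable.
    """
    offsets = {}
    cursor = start
    for name, size_in_words in variables:
        offsets[name] = cursor
        cursor += WORD * size_in_words
    return offsets, cursor


# A scalar, an array bounded at 10 elements (assumed 1 length word + 10), a scalar.
offsets, end = layout([("x", 1), ("arr", 11), ("y", 1)])
```

Because `arr` reserves all 11 words whether or not they are used, raising the bound raises the memory footprint of every call, which is where the nonlinear Gas cost mentioned above can come into play.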

While Vyper does not support inline assembly, it provides more built-in functions, so that almost every capability available in Solidity and Yul is also available in Vyper. Built-in functions allow access to low-level bitwise operations, external calls, and proxy contract operations, and custom storage layouts can be achieved through override files supplied at compile time.

Vyper does not have a rich development toolkit, but it has more tightly integrated tools and can also be plugged into Solidity development tools. Notable Vyper tools include the Titanaboa interpreter, which has many built-in tools related to EVM and Vyper for experimentation and development, and Dasy, a Lisp based on Vyper with compile-time code execution capabilities.

Here is a simple contract written in Vyper:

Fe

Fe is a high-level language with Rust-like syntax. It is under active development, and most features have yet to be released. Its compiler is written primarily in Rust but uses Yul as its intermediate representation (IR), relying on the Yul optimizer written in C++. This is expected to change with the addition of Sonatina, a Rust-native backend. Fe does not use an object-oriented model; code is reused through a module system in which variables, types, and functions are declared within modules and imported in a Rust-like manner.

Persistent storage variables are declared at the contract level and cannot be publicly accessed unless a manually defined getter function is provided. Constants can be declared at the file or module level and can be accessed within the contract. Currently, immutable deployment-time variables are not supported.

Methods can be declared at the module level or within contracts, defaulting to pure and private. To make a contract method public, its definition must be prefixed with the "pub" keyword, allowing it to be accessed externally. To read from persistent storage, a method's first parameter must be "self"; prefixing a variable name with "self." then gives the method read-only access to that storage variable. To read and write persistent storage, the first parameter must be "mut self." The "mut" keyword indicates that the contract's storage is mutable during the method's execution. Environment variables are accessed by passing a "Context" parameter to the method, typically named "ctx."

Functions and custom types can be declared at the module level. By default, module items are private unless prefixed with the "pub" keyword for access. However, this should not be confused with the "pub" keyword at the contract level. Public members of a module can only be accessed within the final contract or other modules.

Fe currently does not support inline assembly; instead, instructions are wrapped in intrinsics, special functions that resolve to instructions at compile time.

Fe follows Rust's syntax and type system, supporting type aliases, enums with subtypes, traits, and generics. Support in this area is currently limited but is in progress. Traits can be defined and implemented for different types, but generic traits and trait constraints are not yet supported. Enums support subtypes and can have methods implemented on them, but they cannot be encoded in external functions. Although Fe's type system is still evolving, it shows great potential for enabling developers to write safer, compile-time-checked code.

Here is a simple contract written in Fe:

Huff

Huff is an assembly language with manual stack control and minimal abstraction over the EVM instruction set. Code is reused through the "#include" directive, which resolves any included Huff files at compile time. Originally written by the Aztec team for highly optimized elliptic curve algorithms, the compiler was later rewritten in TypeScript and then in Rust.

Constants must be defined at compile time, immutable variables are currently not supported, and the language has no explicit definition of persistent storage variables. Since named storage variables are a high-level abstraction, persistent storage in Huff is written with the "sstore" opcode and read with the "sload" opcode. Custom storage layouts can be defined by the user, or can follow the convention of starting from zero and incrementing for each variable using the compiler intrinsic "FREE_STORAGE_POINTER." Making storage variables externally accessible requires manually defining a code path that can read and return the variable to the caller.

External functions are also an abstraction introduced by high-level languages, so there is no concept of external functions in Huff. However, most projects follow the ABI specifications of other high-level languages to varying degrees, most commonly Solidity. A common pattern is to define a "dispatcher" that loads the raw call data and uses it to check for a matching function selector. If matched, it executes the subsequent code. Since dispatchers are user-defined, they may follow different dispatching patterns.

Solidity sorts its selectors in alphabetical order within its dispatcher, Vyper sorts them numerically and performs a binary search at runtime, and most Huff dispatchers sort by expected function usage frequency; jump tables are rarely used. Currently, jump tables are not natively supported in the EVM, so introspective instructions like "codecopy" are needed to implement them.
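The dispatching strategies mentioned above can be compared with a short Python sketch. It models a dispatcher as a lookup over 4-byte selectors and is not how any compiler emits bytecode; the function names are hypothetical, and the selectors are the well-known ERC-20 ones.

```python
import bisect

# Well-known ERC-20 selectors, ordered by how often a token is expected
# to receive each call (an assumption made for this example).
BY_USAGE = [
    (0xA9059CBB, "transfer"),
    (0x23B872DD, "transferFrom"),
    (0x70A08231, "balanceOf"),
    (0x095EA7B3, "approve"),
    (0x18160DDD, "totalSupply"),
    (0x06FDDE03, "name"),
]
BY_SELECTOR = sorted(BY_USAGE)  # numeric order, as needed for binary search


def linear_dispatch(selector, table):
    """Compare against each entry in order; return (name, comparisons made)."""
    for comparisons, (candidate, name) in enumerate(table, start=1):
        if candidate == selector:
            return name, comparisons
    return None, len(table)


def binary_dispatch(selector, sorted_table):
    """Binary search over a numerically sorted selector table."""
    keys = [candidate for candidate, _ in sorted_table]
    index = bisect.bisect_left(keys, selector)
    if index < len(keys) and keys[index] == selector:
        return sorted_table[index][1]
    return None
```

With the usage-ordered table, `transfer` matches on the first comparison while `name` needs six, which is the trade-off a hand-written dispatcher makes: hot paths get cheaper at the expense of rarely used ones. A binary search instead keeps the worst case logarithmic in the number of functions.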

Internal functions are defined using the "#define fn" directive, which can accept template parameters for flexibility and specify the expected stack depth at the start and end of the function. Since these functions are internal, they cannot be accessed externally, and internal access requires using the "jump" instruction.

Other control flows, such as conditional statements and loops, can be defined using jump targets. Jump targets are defined by an identifier followed by a colon. These targets can be jumped to by pushing the identifier onto the stack and executing a jump instruction. This resolves to bytecode offsets at compile time.
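How jump targets resolve to bytecode offsets at compile time can be sketched as a tiny two-pass assembler in Python. This is an illustration, not Huff's implementation: the input syntax and the `assemble` function are hypothetical, only five real opcodes are supported, and label references are pushed with a one-byte PUSH1 for brevity.

```python
OPCODES = {"STOP": 0x00, "JUMP": 0x56, "JUMPI": 0x57, "JUMPDEST": 0x5B, "PUSH1": 0x60}


def assemble(lines):
    """Assemble lines such as 'loop:', 'PUSH1 loop', 'PUSH1 2a', 'JUMP' into bytecode."""
    # Pass 1: record the byte offset of every label.
    offsets = {}
    pc = 0
    for line in lines:
        if line.endswith(":"):
            offsets[line[:-1]] = pc
            pc += 1  # a label becomes a JUMPDEST
        elif line.startswith("PUSH1 "):
            pc += 2  # opcode plus one immediate byte
        else:
            pc += 1

    # Pass 2: emit bytes, replacing label names with their offsets.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            out.append(OPCODES["JUMPDEST"])
        elif line.startswith("PUSH1 "):
            operand = line.split()[1]
            value = offsets[operand] if operand in offsets else int(operand, 16)
            out += bytes([OPCODES["PUSH1"], value])
        else:
            out.append(OPCODES[line])
    return bytes(out)
```

For example, `assemble(["PUSH1 end", "JUMP", "STOP", "end:", "STOP"])` produces a forward jump over the first STOP, with the label `end` resolved to offset 4.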

Macros are defined using "#define macro" and are otherwise similar to internal functions. The key difference is that macros do not generate "jump" instructions at compile time; instead, the body of the macro is copied directly into each place it is invoked.

This design avoids the runtime Gas cost of extra jumps at the expense of increased code size when a macro is invoked multiple times. The "MAIN" macro is treated as the entry point of the contract, and the first instruction in its body becomes the first instruction in the runtime bytecode.

Other built-in features of the compiler include generating event hashes for logging, generating function selectors for dispatching, generating error selectors for error handling, and computing the code size of internal functions and macros.

Note: Stack comments like "// [count]" are not required; they are merely used to indicate the state of the stack at the end of that line's execution.

Here is a simple contract written in Huff:

ETK

The Ethereum ToolKit (ETK) is an assembly language with manual stack management and minimal abstraction. Code can be reused through the "%include" and "%import" directives, and the compiler is written in Rust.

A notable difference between Huff and ETK is that Huff adds slight abstraction for initcode, also known as constructor code, which can be overridden by defining a special "CONSTRUCTOR" macro. In ETK, these are not abstracted, and initcode and runtime code must be defined together.

Similar to Huff, ETK reads and writes persistent storage through the "sload" and "sstore" instructions. There are no constant or immutable keywords, but constants can be simulated with expression macros, one of the two kinds of macros in ETK. Expression macros do not resolve to instructions but generate numeric values that can be used in other instructions. For example, an expression macro does not generate a complete "push" instruction, but it can generate the number to be included in one.

As mentioned earlier, external functions are a concept from high-level languages, so creating publicly accessible code paths externally requires creating a function selector dispatcher.

Internal functions cannot be explicitly defined as in other languages, but jump targets can be given user-defined aliases so that code can jump to them by name. This also allows for other control flows, such as loops and conditional statements.

ETK supports two types of macros. The first is expression macros, which accept any number of parameters and return numeric values usable in other instructions. Expression macros do not generate instructions; they generate immediate values or constants. The second is instruction macros, which accept any number of parameters and generate any number of instructions at compile time. Instruction macros in ETK are similar to Huff macros.

Here is a simple contract written in ETK:

Yul

Yul is an assembly language with high-level control flow and a significant amount of abstraction. It is part of the Solidity toolchain and can optionally be used in the Solidity compilation pipeline. Yul does not support code reuse, as it is designed to be a compilation target rather than a standalone language. Its compiler is written in C++, with plans to migrate to Rust along with the rest of the Solidity pipeline.

In Yul, code is divided into objects, which can contain code, data, and nested objects. Yul has no constants and no external functions, so a function selector dispatcher must be defined to expose code paths externally.

In addition to stack and control flow instructions, most instructions in Yul are exposed as functions. Instructions can be nested to shorten code length or assigned to temporary variables, which can then be passed to other instructions. Conditional branches can be executed using "if" blocks, which execute if the value is non-zero, but there is no "else" block, so handling multiple code paths requires using "switch" to handle an arbitrary number of cases and a "default" fallback option. Loops can be executed using "for" loops; while its syntax differs from other high-level languages, it provides the same basic functionality. Internal functions can be defined using the "function" keyword, similar to function definitions in high-level languages.

Most features in Yul are exposed in Solidity using inline assembly blocks. This allows developers to break abstraction and write custom functions or use features not available in high-level syntax. However, using this feature requires a deep understanding of how Solidity behaves with calldata, memory, and storage.

There are also some unique functions. The "datasize," "dataoffset," and "datacopy" functions operate on Yul objects through their string aliases. The "setimmutable" and "loadimmutable" functions allow setting and loading immutable parameters in constructors, although their use is limited. The "memoryguard" function indicates that only a given memory range is allocated, allowing the compiler to use memory beyond the protected range for additional optimizations. Finally, "verbatim" allows the use of instructions unknown to the Yul compiler.

Here is a simple contract written in Yul:

Characteristics of an Excellent EVM DSL

An excellent EVM DSL should learn from the strengths and weaknesses of each language listed here and should also include the fundamentals found in almost all modern languages, such as conditional statements, pattern matching, loops, functions, and more. Code should be explicit, adding minimal implicit abstractions for the sake of code aesthetics or readability. In high-risk environments where correctness is crucial, every line of code should be explicitly interpretable. Furthermore, a well-defined module system should be at the core of any great language. It should clearly state which items are defined in which scopes and which can be accessed. By default, every item in a module should be private, with only explicitly public items accessible externally.

In a resource-constrained environment like the EVM, efficiency is important. Efficiency is often achieved by providing low-cost abstractions, such as compile-time code execution through macros, rich type systems to create well-designed reusable libraries, and common on-chain interaction wrappers. Macros generate code at compile time, which is very useful for reducing boilerplate code for common operations, and in cases like Huff, it can be used to balance code size against runtime efficiency.

A rich type system allows for more expressive code, more compile-time checks to catch errors before runtime, and, when combined with type-checked compiler intrinsics, can eliminate much of the need for inline assembly. Generics also allow nullable values (e.g., external code) to be wrapped in "option" types, or error-prone operations (e.g., external calls) to be wrapped in "result" types. These two types are examples of how library authors can force developers to handle each outcome, either by defining a code path for it or by reverting on a failed result. Keep in mind, however, that these are compile-time abstractions that resolve to simple conditional jumps at runtime. Forcing developers to handle each result at compile time increases initial development time, but the benefit is far fewer surprises at runtime.

Flexibility is also important for developers, so while the default for complex operations should be the safe and potentially less efficient route, there are times when more efficient code paths or unsupported features need to be used. For this reason, inline assembly should be open to developers without barriers. Solidity's inline assembly sets some barriers for the sake of simplicity and better optimizer passes, but when developers need full control over the execution environment, they should be granted those rights.
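The "result" pattern described above can be sketched in Python. This is only an approximation: Python checks nothing at compile time, so the sketch shows the shape of the API rather than the enforcement a statically typed DSL would provide. All names here (`Ok`, `Err`, `Revert`, `external_call`, `unwrap_or_revert`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]


class Revert(Exception):
    """Stands in for reverting the current execution context."""


def external_call(success: bool, returndata: bytes) -> Result[bytes, str]:
    """Wrap the outcome of an error-prone external call in a result type."""
    return Ok(returndata) if success else Err("call failed")


def unwrap_or_revert(result):
    """Return the success value, or revert on a failed result."""
    if isinstance(result, Ok):
        return result.value
    raise Revert(result.error)
```

At runtime the whole abstraction reduces to one conditional branch in `unwrap_or_revert`, which matches the point that such types resolve to simple conditional jumps.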

Some potentially useful features include the ability to manipulate the properties of functions and other items at compile time. For example, an "inline" property could copy the body of simple functions into each call site instead of creating more jumps, for efficiency. An "abi" property could allow manual overrides of the ABI generated for a given external function to accommodate languages with different coding styles. Additionally, an optional function dispatcher could be defined, allowing customization within the high-level language to optimize for code paths expected to be used more frequently. For example, the dispatcher could check whether the selector is "transfer" or "transferFrom" before checking for "name."

Conclusion

The design of EVM DSLs has a long way to go. Each language has its unique design decisions, and I look forward to seeing how they evolve in the future. As developers, it is in our best interest to learn as many languages as possible. First, learning multiple languages and understanding their differences and similarities will deepen our understanding of programming and underlying machine architectures.

Second, languages have profound network effects and strong retention characteristics. There is no doubt that large players are building their own programming languages, from C#, Swift, and Kotlin to Solidity, Sway, and Cairo. Learning to switch seamlessly between these languages provides unparalleled flexibility for a software engineering career. Finally, it is important to recognize that a significant amount of work goes into each language. No language is perfect, but countless talented individuals have put in tremendous effort to create safe and enjoyable experiences for developers like us.
