DCIPs/EIPS/eip-5450.md

7.5 KiB

eip title description author discussions-to status type category created requires
5450 EOF - Stack Validation Deploy-time validation of stack usage for EOF functions. Andrei Maiboroda (@gumb0), Paweł Bylica (@chfast), Alex Beregszaszi (@axic), Danno Ferrin (@shemnon) https://ethereum-magicians.org/t/eip-5450-eof-stack-validation/10410 Review Standards Track Core 2022-08-12 3540, 3670, 4200, 4750

Abstract

Introduce extended validation of EOF code sections to guarantee that neither stack underflow nor overflow can happen during execution of validated contracts.

Motivation

The current EVM performs a number of validity checks for each executed instruction, such as checking for instruction being defined, stack overflow and underflow, and enough amount of gas remaining.

This EIP minimizes the number of such checks required at run-time by verifying that no exceptional conditions can happen and preventing the execution and deployment of any invalid code.

The operand stack validation provides several benefits:

  • removes run-time stack underflow check for all instructions,
  • removes run-time stack overflow check for all instruction except CALLF,
  • ensures that an execution terminates with one of the terminating instructions,
  • prevents the deployment of code with unreachable instructions, which discourages the use of code sections for data storage.

It also has some disadvantages:

  • adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
  • adds second validation pass; however, validation's computational and space complexity remains linear.

The guarantees created by these validation rules also improve the feasabiliy of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validaitons. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.

Specification

Code validation

Remark: We rely on the notions of operand stack and type section as defined by EIP-4750.

Each code section is validated independently.

Instructions validation

In the first validation phase defined in EIP-3670 (and extended by EIP-4200 and EIP-4750) instructions are inspected independently to check if their opcodes and immediate values are valid.

Operand stack validation

In the second validation phase control-flow analysis is performed on the code.

Operand stack height here refers to the number of stack values accessible by this function, i.e. it does not take into account values of caller functions' frames (but does include this function's inputs). Note that validation procedure does not require actual operand stack implementation, but only to keep track of its height.

Terminating instructions refers to the instructions either:

  • ending function execution: RETF, or
  • ending whole EVM execution: STOP, RETURN, REVERT, INVALID.

For each reachable instruction in the code the operand stack height is recorded. The first instruction has the recorded stack height equal to the number of inputs to the function type matching the code (type[code_section_index].inputs).

The FIFO queue worklist is used for tracking "to be inspected" instructions. Initially, it contains the first instruction.

While worklist is not empty, dequeue an instruction and:

  1. Determine the effect the instruction has on the operand stack:
    1. Check if the recorded stack height satisfies the instruction requirements. Specifically:
      • for CALLF instruction the recorded stack height must be at least the number of inputs of the called function according to its type defined in the type section.
      • for RETF instruction the recorded stack height must be exactly the number of outputs of the function matching the code.
    2. Compute new stack height after the instruction execution.
  2. Determine the list of successor instructions that can follow the current instructions:
    1. The next instruction for all instructions other than terminating instructions and unconditional jump.
    2. All targets of a conditional or unconditional jump.
  3. For each successor instruction:
    1. Check if the instruction is present in the code (i.e. execution must not "fall off" the code).
    2. If the instruction does not have stack height recorded (visited for the first time):
      1. Record the instruction stack height as the value computed in 1.2.
      2. Add the instruction to the worklist.
    3. Otherwise, check if the recorded stack height equals the value computed in 1.2.

When worklist is empty:

  1. Check if all instruction were reached by the analysis.
  2. Determine the function maximum operand stack height:
    1. Compute the maximum stack height as the maximum of all recorded stack heights.
    2. Check if the maximum stack height does not exceed the limit of 1023 (0x3FF).
    3. Check if the maximum stack height matches the corresponding code section's max_stack_height within the type section.

The computational and space complexity of this pass is O(len(code)). Each instruction is visited at most once.

Execution

Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the CALLF performing operand stack overflow check for the entire called function.

Rationale

Stack overflow check only in CALLF

In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in CALLF instruction using the called function's max_stack_height information. This decreases flexibility of an EVM program because max_stack_height corresponds to the worst-case control-flow path in the function.

Unreachable code

The operand stack validation algorithm rejects any code having any unreachable instructions. This check can be performed very cheaply. It prevents deploying degenerated code. Moreover, it enables combining instruction validation and operand stack validation into single pass.

Clean stack upon termination

It is currently required that the operand stack is empty (in the current function context) after the RETF instruction. Otherwise, the RETF semantic would be more complicated. For n function outputs and s the stack height at RETF the EVM must erase s-n non-top stack items and move the n stack items to the place of erased ones. Cost of such operation may be relatively cheap but is not constant. However, lifting the requirement and modifying the RETF semantic as described above is backward compatible and can be easily introduced in the future.

Backwards Compatibility

This change requires a "network upgrade", since it modifies consensus rules.

It poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).

Security Considerations

Needs discussion.

Copyright and related rights waived via CC0.