DCIPs/EIPS/eip-5450.md

130 lines
7.5 KiB
Markdown

---
eip: 5450
title: EOF - Stack Validation
description: Deploy-time validation of stack usage for EOF functions.
author: Andrei Maiboroda (@gumb0), Paweł Bylica (@chfast), Alex Beregszaszi (@axic), Danno Ferrin (@shemnon)
discussions-to: https://ethereum-magicians.org/t/eip-5450-eof-stack-validation/10410
status: Review
type: Standards Track
category: Core
created: 2022-08-12
requires: 3540, 3670, 4200, 4750
---
## Abstract
Introduce extended validation of EOF code sections to guarantee that neither stack underflow nor overflow can happen during execution of validated contracts.
## Motivation
The current EVM performs a number of validity checks for each executed instruction, such as checking
for instruction being defined, stack overflow and underflow, and enough amount of gas remaining.
This EIP minimizes the number of such checks required at run-time
by verifying that no exceptional conditions can happen
and preventing the execution and deployment of any invalid code.
The operand stack validation provides several benefits:
- removes run-time stack underflow check for all instructions,
- removes run-time stack overflow check for all instruction except `CALLF`,
- ensures that an execution terminates with one of the terminating instructions,
- prevents the deployment of code with unreachable instructions, which discourages the use of code sections for data storage.
It also has some disadvantages:
- adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
- adds second validation pass; however, validation's computational and space complexity remains linear.
The guarantees created by these validation rules also improve the feasabiliy of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validaitons. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.
## Specification
### Code validation
*Remark:* We rely on the notions of *operand stack* and *type section* as defined by [EIP-4750](./eip-4750.md).
Each code section is validated independently.
#### Instructions validation
In the first validation phase defined in [EIP-3670](./eip-3670.md) (and extended by [EIP-4200](./eip-4200.md) and [EIP-4750](./eip-4750.md)) instructions are inspected independently to check if their opcodes and immediate values are valid.
#### Operand stack validation
In the second validation phase control-flow analysis is performed on the code.
*Operand stack height* here refers to the number of stack values accessible by this function, i.e. it does not take into account values of caller functions' frames (but does include this function's inputs). Note that validation procedure does not require actual operand stack implementation, but only to keep track of its height.
*Terminating instructions* refers to the instructions either:
- ending function execution: `RETF`, or
- ending whole EVM execution: `STOP`, `RETURN`, `REVERT`, `INVALID`.
For each reachable instruction in the code the operand stack height is recorded.
The first instruction has the recorded stack height equal to the number of inputs to the function type matching the code (`type[code_section_index].inputs`).
The FIFO queue *worklist* is used for tracking "to be inspected" instructions. Initially, it contains the first instruction.
While *worklist* is not empty, dequeue an instruction and:
1. Determine the effect the instruction has on the operand stack:
1. **Check** if the recorded stack height satisfies the instruction requirements. Specifically:
- for `CALLF` instruction the recorded stack height must be at least the number of inputs of the called function according to its type defined in the type section.
- for `RETF` instruction the recorded stack height must be exactly the number of outputs of the function matching the code.
2. Compute new stack height after the instruction execution.
2. Determine the list of successor instructions that can follow the current instructions:
1. The next instruction for all instructions other than terminating instructions and unconditional jump.
2. All targets of a conditional or unconditional jump.
3. For each successor instruction:
1. **Check** if the instruction is present in the code (i.e. execution must not "fall off" the code).
2. If the instruction does not have stack height recorded (visited for the first time):
1. Record the instruction stack height as the value computed in 1.2.
2. Add the instruction to the *worklist*.
3. Otherwise, **check** if the recorded stack height equals the value computed in 1.2.
When *worklist* is empty:
1. **Check** if all instruction were reached by the analysis.
2. Determine the function maximum operand stack height:
1. Compute the maximum stack height as the maximum of all recorded stack heights.
2. **Check** if the maximum stack height does not exceed the limit of 1023 (`0x3FF`).
3. **Check** if the maximum stack height matches the corresponding code section's `max_stack_height` within the type section.
The computational and space complexity of this pass is *O(len(code))*. Each instruction is visited at most once.
### Execution
Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the `CALLF` performing operand stack overflow check for the entire called function.
## Rationale
### Stack overflow check only in CALLF
In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in `CALLF` instruction using the called function's `max_stack_height` information. This decreases flexibility of an EVM program because `max_stack_height` corresponds to the worst-case control-flow path in the function.
### Unreachable code
The operand stack validation algorithm rejects any code having any unreachable instructions. This check can be performed very cheaply. It prevents deploying degenerated code. Moreover, it enables combining instruction validation and operand stack validation into single pass.
### Clean stack upon termination
It is currently required that the operand stack is empty (in the current function context) after the `RETF` instruction.
Otherwise, the `RETF` semantic would be more complicated. For `n` function outputs and `s` the stack height at `RETF` the EVM must erase `s-n` non-top stack items and move the `n` stack items to the place of erased ones. Cost of such operation may be relatively cheap but is not constant.
However, lifting the requirement and modifying the `RETF` semantic as described above is backward
compatible and can be easily introduced in the future.
## Backwards Compatibility
This change requires a "network upgrade", since it modifies consensus rules.
It poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).
## Security Considerations
Needs discussion.
## Copyright
Copyright and related rights waived via [CC0](../LICENSE.md).