7.5 KiB
eip | title | description | author | discussions-to | status | type | category | created | requires |
---|---|---|---|---|---|---|---|---|---|
5450 | EOF - Stack Validation | Deploy-time validation of stack usage for EOF functions. | Andrei Maiboroda (@gumb0), Paweł Bylica (@chfast), Alex Beregszaszi (@axic), Danno Ferrin (@shemnon) | https://ethereum-magicians.org/t/eip-5450-eof-stack-validation/10410 | Review | Standards Track | Core | 2022-08-12 | 3540, 3670, 4200, 4750 |
Abstract
Introduce extended validation of EOF code sections to guarantee that neither stack underflow nor overflow can happen during execution of validated contracts.
Motivation
The current EVM performs a number of validity checks for each executed instruction, such as checking for instruction being defined, stack overflow and underflow, and enough amount of gas remaining.
This EIP minimizes the number of such checks required at run-time by verifying that no exceptional conditions can happen and preventing the execution and deployment of any invalid code.
The operand stack validation provides several benefits:
- removes run-time stack underflow check for all instructions,
- removes run-time stack overflow check for all instruction except
CALLF
, - ensures that an execution terminates with one of the terminating instructions,
- prevents the deployment of code with unreachable instructions, which discourages the use of code sections for data storage.
It also has some disadvantages:
- adds constraints to the code structure (similar to JVM, CPython bytecode, WebAssembly and others); however, these constraints can be lifted in a backward-compatible manner if they are shown to be user-unfriendly,
- adds second validation pass; however, validation's computational and space complexity remains linear.
The guarantees created by these validation rules also improve the feasabiliy of Ahead-Of-Time and Just-In-Time compilation of EVM code. Single pass transpilation passes can be safely executed with the code validation and advanced stack/register handling can be applied with the stack height validaitons. While not as impactful to a mainnet validator node that is bound mostly by storage state sizes, these can significantly speed up witness validation and other non-mainnet use cases.
Specification
Code validation
Remark: We rely on the notions of operand stack and type section as defined by EIP-4750.
Each code section is validated independently.
Instructions validation
In the first validation phase defined in EIP-3670 (and extended by EIP-4200 and EIP-4750) instructions are inspected independently to check if their opcodes and immediate values are valid.
Operand stack validation
In the second validation phase control-flow analysis is performed on the code.
Operand stack height here refers to the number of stack values accessible by this function, i.e. it does not take into account values of caller functions' frames (but does include this function's inputs). Note that validation procedure does not require actual operand stack implementation, but only to keep track of its height.
Terminating instructions refers to the instructions either:
- ending function execution:
RETF
, or - ending whole EVM execution:
STOP
,RETURN
,REVERT
,INVALID
.
For each reachable instruction in the code the operand stack height is recorded.
The first instruction has the recorded stack height equal to the number of inputs to the function type matching the code (type[code_section_index].inputs
).
The FIFO queue worklist is used for tracking "to be inspected" instructions. Initially, it contains the first instruction.
While worklist is not empty, dequeue an instruction and:
- Determine the effect the instruction has on the operand stack:
- Check if the recorded stack height satisfies the instruction requirements. Specifically:
- for
CALLF
instruction the recorded stack height must be at least the number of inputs of the called function according to its type defined in the type section. - for
RETF
instruction the recorded stack height must be exactly the number of outputs of the function matching the code.
- for
- Compute new stack height after the instruction execution.
- Check if the recorded stack height satisfies the instruction requirements. Specifically:
- Determine the list of successor instructions that can follow the current instructions:
- The next instruction for all instructions other than terminating instructions and unconditional jump.
- All targets of a conditional or unconditional jump.
- For each successor instruction:
- Check if the instruction is present in the code (i.e. execution must not "fall off" the code).
- If the instruction does not have stack height recorded (visited for the first time):
- Record the instruction stack height as the value computed in 1.2.
- Add the instruction to the worklist.
- Otherwise, check if the recorded stack height equals the value computed in 1.2.
When worklist is empty:
- Check if all instruction were reached by the analysis.
- Determine the function maximum operand stack height:
- Compute the maximum stack height as the maximum of all recorded stack heights.
- Check if the maximum stack height does not exceed the limit of 1023 (
0x3FF
). - Check if the maximum stack height matches the corresponding code section's
max_stack_height
within the type section.
The computational and space complexity of this pass is O(len(code)). Each instruction is visited at most once.
Execution
Given the deploy-time validation guarantees, an EVM implementation is not required anymore to have run-time stack underflow nor overflow checks for each executed instruction. The exception is the CALLF
performing operand stack overflow check for the entire called function.
Rationale
Stack overflow check only in CALLF
In this EIP, we provide a more efficient variant of the EVM where stack overflow check is performed only in CALLF
instruction using the called function's max_stack_height
information. This decreases flexibility of an EVM program because max_stack_height
corresponds to the worst-case control-flow path in the function.
Unreachable code
The operand stack validation algorithm rejects any code having any unreachable instructions. This check can be performed very cheaply. It prevents deploying degenerated code. Moreover, it enables combining instruction validation and operand stack validation into single pass.
Clean stack upon termination
It is currently required that the operand stack is empty (in the current function context) after the RETF
instruction.
Otherwise, the RETF
semantic would be more complicated. For n
function outputs and s
the stack height at RETF
the EVM must erase s-n
non-top stack items and move the n
stack items to the place of erased ones. Cost of such operation may be relatively cheap but is not constant.
However, lifting the requirement and modifying the RETF
semantic as described above is backward
compatible and can be easily introduced in the future.
Backwards Compatibility
This change requires a "network upgrade", since it modifies consensus rules.
It poses no risk to backwards compatibility, as it is introduced only for EOF1 contracts, for which deploying undefined instructions is not allowed, therefore there are no existing contracts using these instructions. The new instructions are not introduced for legacy bytecode (code which is not EOF formatted).
Security Considerations
Needs discussion.
Copyright
Copyright and related rights waived via CC0.