---
eip: 4573
title: Procedures for the EVM
description: Introduces support for EVM Procedures.
status: Stagnant
type: Standards Track
category: Core
author: Greg Colvin (@gcolvin), Greg Colvin <greg@colvin.org>
discussions-to: https://ethereum-magicians.org/t/eip-4573-named-procedures-for-evm-code-sections/7776
created: 2021-12-16
requires: 2315, 3540, 3670, 3779, 4200
---
## Abstract
Five EVM instructions are introduced to define, call, and return from named EVM _procedures_ and access their _call frames_ in memory - `ENTERPROC`, `LEAVEPROC`, `CALLPROC`, `RETURNPROC`, and `FRAMEADDRESS`.
## Motivation
Currently, Ethereum bytecode has no syntactic structure, and _subroutines_ have no defined interfaces.
We propose to add _procedures_ -- delimited blocks of code that can be entered only by calling into them via defined interfaces.
Also, the EVM currently has no automatic management of memory for _procedures_. So we also propose to automatically reserve call frames on an in-memory stack.
Constraints on the use of _procedures_ must be validated at contract initialization time to maintain the safety properties of [EIP-3779](./eip-3779.md): valid programs will not halt with an exception unless they run out of gas or recursively overflow the stack.
### Prior Art
The terminology is not well-defined, but we will follow Intel in calling the low-level concept _subroutines_ and the higher level concept _procedures_. The distinction is that _subroutines_ are little more than a jump that knows where it came from, whereas procedures have a defined interface and manage memory as a stack. [EIP-2315](./eip-2315.md) introduces _subroutines_, and this EIP introduces _procedures_.
## Specification
### Instructions
#### ENTERPROC (0x??) dest_section: uint8, dest_offset: uint8, n_inputs: uint16, n_outputs: uint16, n_locals: uint16
```
frame_stack.push(FP)
FP -= n_locals * 32
PC += <length of immediates>
```
Marks the entry point to a procedure
* at offset `dest_offset` from the beginning of the `dest_section`,
* taking `n_inputs` arguments from the data stack,
* returning `n_outputs` values on the `data stack`, and
* reserving `n_locals` words of data in memory on the `frame stack`.
Procedures can only be entered via a `CALLPROC` to their entry point.
#### LEAVEPROC (0x??)
```
FP = frame_stack.pop()
asm RETURNSUB
```
> Pop the `frame stack` and return to the calling procedure using `RETURNSUB`.
Marks the end of a procedure. Each `ENTERPROC` requires a closing `LEAVEPROC`.
*Note: Attempts to jump into a procedure (including its `LEAVEPROC`) from outside of the procedure or to jump or step to `ENTERPROC` at all must be prevented at validation time. `CALLPROC` is the only valid way to enter a procedure.*
#### CALLPROC (0x??) dest_section: uint16, dest_proc: uint16
```
FP -= n_locals
asm JUMPSUB <offset of section> + <offset of procedure>
```
> Allocate a *stack frame* and transfer control with `JUMPSUB` to the Nth (N=*dest_proc*) _procedure_ in the Mth (M=*dest_section*) _section_ of the code. _Section 0_ is the current code section; any other code sections are indexed starting at _1_.
*Note: That the procedure is defined and the required `n_inputs` words are available on the `data stack` must be shown at validation time.*
#### RETURNPROC (0x??)
```
FP += n_locals
asm RETURNSUB
```
> Pop the `frame stack` and return control to the calling procedure using `RETURNSUB`.
*Note: That the promised `n_outputs` words are available on the `data stack` must be shown at validation time.*
#### FRAMEADDRESS (0x??) offset: int16
```
asm PUSH2 FP + offset
```
> Push the address `FP + offset` onto the data stack.
Call frame data is addressed at an immediate `offset` relative to `FP`.
Typical usage includes storing data on a call frame
```
PUSH 0xdada
FRAMEADDRESS 32
MSTORE
```
and loading data from a call frame
```
FRAMEADDRESS 32
MLOAD
```
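As a rough illustration of the address arithmetic, the sketch below computes the value `FRAMEADDRESS` would push. It assumes that the sum wraps modulo 2^256, in line with the EVM's two's-complement arithmetic; the function name `frame_address` and the explicit range check on the `int16` immediate are illustrative and not part of this specification.

```python
# Hypothetical helper: not part of the EIP, only a sketch of FP + offset.
WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def frame_address(fp: int, offset: int) -> int:
    """Return FP + offset as an unsigned 256-bit EVM word.

    `fp` is the signed frame pointer (0 or negative in this proposal);
    `offset` is the signed 16-bit immediate of FRAMEADDRESS.
    """
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset does not fit in int16")
    # Masking performs the two's-complement wrap (an assumption of this sketch).
    return (fp + offset) & MASK


# With FP two words below zero, `FRAMEADDRESS 32` addresses the word at -32.
addr = frame_address(-64, 32)
```

Under these assumptions a negative address such as -32 appears on the data stack as the unsigned word `2**256 - 32`.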
### Memory Costs
Presently, `MSTORE` is defined as
```
memory[stack[0]...stack[0]+31] = stack[1]
memory_size = max(memory_size, floor((stack[0]+32)÷32))
```
* where `memory_size` is the number of active words of memory above _0_.
We propose to treat memory addresses as signed, so the formula needs to be
```
memory[stack[0]...stack[0]+31] = stack[1]
if floor((stack[0]+32)÷32) < 0
    negative_memory_size = max(negative_memory_size, floor((stack[0]+32)÷32))
else
    positive_memory_size = max(positive_memory_size, floor((stack[0]+32)÷32))
memory_size = positive_memory_size + negative_memory_size
```
* where `negative_memory_size` is the number of active words of memory below _0_ and
* where `positive_memory_size` is the number of active words of memory at or above _0_.
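One possible reading of this accounting is sketched below. It assumes that words below _0_ are counted by magnitude, so that an access at address -64 makes two negative words active, and it keeps the `floor((address+32)÷32)` form of the formula above for the non-negative side. The function name `touch_word` is hypothetical.

```python
# Hypothetical sketch of signed memory-size accounting; not normative.
def touch_word(addr: int, positive_words: int, negative_words: int):
    """Update the active word counts after a 32-byte access at signed `addr`.

    Returns (positive_words, negative_words, memory_size).
    """
    if addr < 0:
        # Assumption: count words below 0 by magnitude, i.e. ceil(-addr / 32).
        negative_words = max(negative_words, (-addr + 31) // 32)
    # Same form as the existing formula, clipped at 0 for negative addresses.
    positive_words = max(positive_words, max(0, (addr + 32) // 32))
    return positive_words, negative_words, positive_words + negative_words


# Store one word at address 0, then one word in a frame at address -64.
pos, neg, size = touch_word(0, 0, 0)
pos, neg, size = touch_word(-64, pos, neg)
```

After these two accesses the sketch reports one positive word and two negative words, for a total `memory_size` of three words.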
### Call Frame Stack
These instructions make use of a `frame stack` to allocate and free frames of local data for _procedures_ in memory. Frame memory begins at address 0 in memory and grows downwards, towards more negative addresses. A frame is allocated for each procedure when it is called, and freed when it returns.
Memory can be addressed relative to the frame pointer `FP` or by absolute address. `FP` starts at 0, moves downward towards more negative addresses to point to the frame for each `CALLPROC`, and moves upward towards less negative addresses to point to the previous frame for the corresponding `RETURNPROC`.
Equivalently, in the EVM's two's-complement arithmetic, `FP` moves from the highest address down, as is common in many calling conventions.
For example, after an initial `CALLPROC` to a procedure needing two words of data the `frame stack` might look like this
```
     0-> ........
         ........
    FP->
```
Then, after a further `CALLPROC` to a procedure needing three words of data the `frame stack` would look like this
```
     0-> ........
         ........
   -64-> ........
         ........
         ........
    FP->
```
After a `RETURNPROC` from that procedure the `frame stack` would look like this
```
     0-> ........
         ........
    FP-> ........
         ........
         ........
```
and after a final `RETURNPROC`, like this
```
    FP-> ........
         ........
         ........
         ........
         ........
```
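The sequence in these diagrams can be traced with a small model. The sketch below follows the prose of this section rather than the per-instruction pseudocode: it assumes a call saves the current `FP` and lowers it by 32 bytes per local word, and a return restores the saved value. The class `FrameStack` and its method names are hypothetical.

```python
# Hypothetical model of the frame pointer; follows the diagrams, not normative.
WORD = 32  # bytes per EVM word


class FrameStack:
    def __init__(self):
        self.fp = 0      # FP starts at address 0
        self.saved = []  # previous FP values, one per active call

    def call(self, n_locals: int) -> int:
        """Reserve a frame of n_locals words below the current FP."""
        self.saved.append(self.fp)
        self.fp -= n_locals * WORD
        return self.fp

    def ret(self) -> int:
        """Free the current frame by restoring the caller's FP."""
        self.fp = self.saved.pop()
        return self.fp


frames = FrameStack()
trace = [frames.call(2), frames.call(3), frames.ret(), frames.ret()]
```

Under these assumptions `trace` holds the four `FP` values shown above: -64 after the first call, -160 after the second, then -64 and 0 after the two returns.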
## Rationale
There is actually not much new here. It amounts to [EIP-615](./eip-615.md), refined and refactored into bite-sized pieces, along lines common to other machines.
This proposal uses the [EIP-2315](./eip-2315.md) return stack to manage calls and returns, and steals ideas from [EIP-615](./eip-615.md), [EIP-3336](./eip-3336.md), and [EIP-4200](./eip-4200.md). `ENTERPROC` corresponds to `BEGINSUB` from EIP-615. Like EIP-615 it uses a frame stack to track call-frame addresses with `FP` as _procedures_ are entered and left, but like EIP-3336 and EIP-3337 it moves call frames from the data stack to memory.
Aliasing call frames with ordinary memory supports addressing call-frame data with ordinary stores and loads. This is generally useful, especially for languages like C that provide pointers to variables on the stack.
The design model here is the _subroutines_ and _procedures_ of the Intel x86 architecture.
* `JUMPSUB` and `RETURNSUB` (from [EIP-2315](./eip-2315.md)) -- like `CALL` and `RET` -- jump to and return from _subroutines_.
* `ENTERPROC` -- like `ENTER` -- sets up the stack frame for a _procedure_.
* `CALLPROC` amounts to a `JUMPSUB` to an `ENTERPROC`.
* `RETURNPROC` amounts to an early `LEAVEPROC`.
* `LEAVEPROC` -- like `LEAVE` -- takes down the stack frame for a _procedure_. It then executes a `RETURNSUB`.
## Backwards Compatibility
This proposal adds new EVM opcodes. It doesn't remove or change the semantics of any existing opcodes, so there should be no backwards compatibility issues.
## Security
Safe use of these constructs must be checked completely at validation time -- per EIP-3779 -- so there should be no security issues at runtime.
`ENTERPROC` and `LEAVEPROC` must follow the same safety rules as for `JUMPSUB` and `RETURNSUB` in EIP-2315. In addition, the following constraints must be validated:
* Every `ENTERPROC` must be followed by a `LEAVEPROC` to delimit the bodies of _procedures_.
* There can be no nested _procedures_.
* There can be no jump into the body of a procedure (including its `LEAVEPROC`) from outside of that body.
* There can be no jump or step to `ENTERPROC` at all -- only `CALLPROC`.
* The specified `n_inputs` and `n_outputs` must be on the stack.
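A minimal sketch of how the structural constraints might be checked is given below. It works on a list of mnemonic strings rather than real bytecode, because the opcode values are not yet assigned. It covers only the pairing, nesting, and jump-target rules, not fall-through into `ENTERPROC` or the data stack checks. The function names `procedure_bodies` and `jump_allowed` are hypothetical.

```python
# Hypothetical validation sketch over mnemonics; not a complete validator.
def procedure_bodies(ops):
    """Return (enter_index, leave_index) pairs, or raise ValueError."""
    bodies, start = [], None
    for i, op in enumerate(ops):
        if op == "ENTERPROC":
            if start is not None:
                raise ValueError(f"nested ENTERPROC at {i}")
            start = i
        elif op == "LEAVEPROC":
            if start is None:
                raise ValueError(f"LEAVEPROC without ENTERPROC at {i}")
            bodies.append((start, i))
            start = None
    if start is not None:
        raise ValueError(f"ENTERPROC at {start} has no LEAVEPROC")
    return bodies


def jump_allowed(bodies, src, dst):
    """Reject jumps to ENTERPROC and jumps into a body from outside it."""
    for start, end in bodies:
        if dst == start:
            return False  # ENTERPROC is reachable only via CALLPROC
        dst_inside = start < dst <= end  # the body includes its LEAVEPROC
        src_inside = start < src <= end
        if dst_inside and not src_inside:
            return False
    return True


ops = ["ENTERPROC", "PUSH", "LEAVEPROC", "JUMPDEST", "ENTERPROC", "LEAVEPROC"]
bodies = procedure_bodies(ops)
```

For this sample the sketch finds two procedure bodies, and it rejects a jump from index 3 to index 1 because the source lies outside the first body.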
## Copyright
Copyright and related rights waived via [CC0](../LICENSE.md).