---
eip: 3690
title: EOF - JUMPDEST Table
description: A special EOF section for storing the list of JUMPDESTs, which simplifies execution time analysis.
author: Alex Beregszaszi (@axic), Paweł Bylica (@chfast), Andrei Maiboroda (@gumb0)
discussions-to: https://ethereum-magicians.org/t/eip-3690-eof-jumpdest-table/6806
status: Stagnant
type: Standards Track
category: Core
created: 2021-06-23
requires: 3540, 3670
---

## Abstract

Introduce a section in the EOF format ([EIP-3540](./eip-3540.md)) for storing the list of `JUMPDEST`s, validate the correctness of this list at the time of contract creation, and remove the need for `JUMPDEST`-analysis at execution time. In EOF contracts, the `JUMPDEST` instruction is not needed anymore and becomes invalid. Legacy contracts are entirely unaffected by this change.

## Motivation

Existing contracts require no validation of correctness, but every time they are executed, a list of all valid jump destinations must be built. This overhead can be avoided, although its size depends on the client implementation.

With the structure provided by EIP-3540, it is easy to store and transmit a table of valid jump destinations instead of using designated `JUMPDEST` (0x5b) opcodes in the code.

The goal of this change is to trade more complexity at contract creation time for less complexity (and processing time) at execution time. Benchmarks show that the mandatory execution preparation time is the same as before for extreme cases (i.e. deliberate edge cases), while it is ~10x faster for the average case.

Finally, this change puts an implicit bound on "initcode analysis", which is now limited to loading a `jumpdests` section of at most 0xffff bytes. Legacy code remains vulnerable.

## Specification

This feature is introduced in the very same block in which [EIP-3540](./eip-3540.md) is enabled; therefore every EOF1-compatible bytecode MUST have a JUMPDEST table if it uses jumps.

*Remark:* We rely on the notation of *initcode*, *code* and *creation* as defined by [EIP-3540](./eip-3540.md), and extend validation rules of [EIP-3670](./eip-3670.md).

### EOF container changes

1. A new EOF section called `jumpdests` (`section_kind = 3`) is introduced. It contains a sequence of *n* unsigned integers *jumploc<sub>i</sub>*.
2. The *jumploc<sub>i</sub>* values are encoded with [unsigned LEB128](https://en.wikipedia.org/wiki/LEB128#Unsigned_LEB128).

   | description           | encoding        |
   |-----------------------|-----------------|
   | jumploc<sub>0</sub>   | unsigned LEB128 |
   | jumploc<sub>1</sub>   | unsigned LEB128 |
   | ...                   |                 |
   | jumploc<sub>n-1</sub> | unsigned LEB128 |

3. The jump destinations represent the set of valid code positions as arguments to jump instructions. They are delta-encoded, so a partial sum must be computed to retrieve the absolute offsets.

   ```python
   def jumpdest(n: int, jumpdests_table: list[int]) -> int:
       # Absolute offset of the n-th entry: sum of all deltas up to and including it
       return sum(jumpdests_table[:n+1])
   ```

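
The encoding direction of items 1 to 3 can be sketched as follows. This is a minimal illustration, not part of the specification: the helper names `leb128u_encode` and `encode_jumpdests` are hypothetical, and the input is assumed to be a list of strictly increasing absolute code offsets.

```python
def leb128u_encode(value: int) -> bytes:
    """Encode a non-negative integer as shortest-form unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_jumpdests(absolute_offsets: list[int]) -> bytes:
    """Delta-encode absolute offsets and serialize each delta as LEB128.

    Hypothetical helper: assumes the offsets are strictly increasing.
    """
    table = bytearray()
    previous = 0
    for index, offset in enumerate(absolute_offsets):
        if index > 0 and offset <= previous:
            raise ValueError("offsets must be strictly increasing")
        table += leb128u_encode(offset - previous)
        previous = offset
    return bytes(table)


# Deltas are 0, 3, 128 and 16384, which need 1, 1, 2 and 3 bytes respectively.
section = encode_jumpdests([0, 3, 131, 16515])
print(section.hex())  # 00038001808001
```

Only the first delta may be zero; a repeated offset would produce a zero delta later in the table, which the validation rules reject.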
### Validation rules

> This section extends contract creation validation rules (as defined in EIP-3540).

4. The `jumpdests` section MUST be present if and only if the `code` section contains `JUMP` or `JUMPI` opcodes.
5. If the `jumpdests` section is present it MUST directly precede the `code` section. In this case a valid EOF bytecode will have the form of `format, magic, version, [jumpdests_section_header], code_section_header, [data_section_header], 0, [jumpdests_section_contents], code_section_contents, [data_section_contents]`.
6. The LEB128 encoding of a `jumploc` must be valid: the encoding must be complete and must not read outside of the `jumpdests` section. As an additional constraint, the shortest possible encoding must be used.
7. With the exception of the first entry, the value of `jumploc` MUST NOT be 0.
8. Every `jumploc` MUST point to a valid opcode. They MUST NOT point into PUSH-data or outside of the code section.
9. The `JUMPDEST` (0x5b) instruction becomes undefined. (Note: According to the rules of EIP-3670, deploying the code will fail if it contains `JUMPDEST`.)

### Execution

10. When executing `JUMP` or `JUMPI` instructions, the jump destination MUST be in the `jumpdests` table. Otherwise, the execution aborts with *bad jump destination*. In case of `JUMPI`, the check is done only when the jump is to be taken (no change to the previous behaviour).

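
The execution rule can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names `load_jumpdests`, `jump_target`, `jumpi_target` and `BadJumpDestination` are hypothetical, and a real interpreter would more likely use a bitset than a Python `frozenset`.

```python
from itertools import accumulate


class BadJumpDestination(Exception):
    """Raised when a jump targets a position not listed in the table."""


def load_jumpdests(deltas: list[int]) -> frozenset[int]:
    """Turn decoded delta entries into the set of absolute valid destinations."""
    return frozenset(accumulate(deltas))


def jump_target(dest: int, valid: frozenset[int]) -> int:
    """JUMP: the destination must be in the table, otherwise execution aborts."""
    if dest not in valid:
        raise BadJumpDestination(dest)
    return dest


def jumpi_target(dest: int, condition: int, pc: int, valid: frozenset[int]) -> int:
    """JUMPI: the destination is checked only when the jump is actually taken."""
    if condition == 0:
        return pc + 1  # fall through; dest is not inspected at all
    return jump_target(dest, valid)


valid = load_jumpdests([0, 3, 128])  # absolute offsets 0, 3 and 131
```

With this table, `jumpi_target(999, 0, 10, valid)` falls through to position 11 even though 999 is not a valid destination, while `jump_target(999, valid)` raises.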
## Rationale

### Jumpdests section is bounded

The length of the `jumpdests` section is bounded by the EOF maximum section size of 0xffff. Moreover, for deployed code this is additionally limited by the max bytecode size of 0x6000. Because every table entry occupies at least one byte and must point to a distinct position in the code section, that limit is effectively split between the two sections, so a valid `jumpdests` section cannot be larger than 0x3000.

### Delta encoding

Delta-encoding is very efficient for this job. A quick analysis of a small set of contracts shows that `JUMPDEST` opcodes are relatively close to each other, so the delta-encoded values rarely reach 128. Combined with any form of variable-length quantity (VLQ) where values < 128 occupy one byte, encoding a single jumpdest takes ~1 byte. We also remove `JUMPDEST` opcodes from the code section, so the total bytecode length remains the same if extreme examples are ignored.

By extreme examples we mean contracts in which the distance between two subsequent JUMPDESTs is 128 or more. The LEB128 encoding of such a distance requires more than one byte, and the total bytecode size increases by the number of additional bytes used.

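
The size argument can be checked with a short calculation. The sketch below is illustrative only: `leb128u_len` and `table_size` are hypothetical helpers, the offsets are made up, and the shift in code positions caused by removing `JUMPDEST` opcodes is ignored.

```python
def leb128u_len(value: int) -> int:
    """Number of bytes in the shortest unsigned LEB128 encoding of value."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def table_size(absolute_offsets: list[int]) -> int:
    """Size in bytes of the delta-encoded table for the given offsets."""
    size = 0
    previous = 0
    for offset in absolute_offsets:
        size += leb128u_len(offset - previous)
        previous = offset
    return size


close = [0, 40, 100, 227]  # deltas 0, 40, 60, 127: one byte each
far = [0, 40, 100, 228]    # last delta is 128: needs two bytes

print(table_size(close), table_size(far))  # 4 5
```

In the first case the four table bytes replace four one-byte `JUMPDEST` opcodes, so the total size is unchanged; in the second case the table costs one byte more than the opcodes it replaces.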
### LEB128 for offsets

LEB128 is the most popular VLQ encoding; it is used in DWARF and WebAssembly.

LEB128 allows encoding a fixed value with an arbitrary number of bytes by having zero payloads for the most significant bits of the value. To ensure there exists only a single encoding of a given value, we additionally require the shortest possible LEB128 encoding to be used. This constraint is also required by WebAssembly.

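
To make the ambiguity concrete, the sketch below decodes several byte strings that all represent the value 1 and shows a check that accepts only the shortest one. The function names are hypothetical, and the decoder is deliberately lenient so that the padded forms can be observed; it is not the validating decoder of this EIP.

```python
def leb128u_decode_lenient(data: bytes) -> tuple[int, int]:
    """Decode one unsigned LEB128 value without enforcing minimal length.

    Returns (value, number of bytes consumed).
    """
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if (byte & 0x80) == 0:
            return value, index + 1
    raise ValueError("truncated LEB128 input")


def is_shortest(data: bytes) -> bool:
    """True if data holds exactly one LEB128 value in its shortest form."""
    _, consumed = leb128u_decode_lenient(data)
    # A multi-byte encoding is padded exactly when its final byte is zero.
    return consumed == len(data) and (consumed == 1 or data[-1] != 0)


for encoding in (b"\x01", b"\x81\x00", b"\x81\x80\x00"):
    value, _ = leb128u_decode_lenient(encoding)
    print(encoding.hex(), value, is_shortest(encoding))
```

All three inputs decode to 1, but only `01` passes the shortest-form check.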
### Size-prefix for offsets

This is another encoding option, inspired by UTF-8. The benefit is that the number of following bytes is encoded in the first byte (the top two bits), so the expected length is known upfront.

A simple decoder is the following:

```python
def decode(input: bytes) -> int:
    # The top two bits of the first byte give the number of following bytes
    size_prefix = input[0] >> 6
    if size_prefix == 0:
        return input[0] & 0x3f
    elif size_prefix == 1:
        return (input[0] & 0x3f) << 8 | input[1]
    elif size_prefix == 2:
        return (input[0] & 0x3f) << 16 | (input[1] << 8) | input[2]
    # Do not support case 3
    assert(False)
```

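
For comparison, a matching encoder might look like the sketch below. It is an assumption derived from the decoder above, not something the EIP defines: `encode_size_prefixed` is a hypothetical name, and `decode_size_prefixed` repeats the decoder's logic only so that the round trip can be checked in one block.

```python
def encode_size_prefixed(value: int) -> bytes:
    """Encode value with the 2-bit size prefix scheme (hypothetical helper)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 1 << 6:  # prefix 0: 6 payload bits, 1 byte
        return bytes([value])
    if value < 1 << 14:  # prefix 1: 14 payload bits, 2 bytes
        return bytes([0x40 | (value >> 8), value & 0xFF])
    if value < 1 << 22:  # prefix 2: 22 payload bits, 3 bytes
        return bytes([0x80 | (value >> 16), (value >> 8) & 0xFF, value & 0xFF])
    raise ValueError("prefix 3 is not supported")


def decode_size_prefixed(data: bytes) -> int:
    """Same logic as the decoder above, repeated for a self-contained check."""
    size_prefix = data[0] >> 6
    if size_prefix == 0:
        return data[0] & 0x3F
    if size_prefix == 1:
        return (data[0] & 0x3F) << 8 | data[1]
    if size_prefix == 2:
        return (data[0] & 0x3F) << 16 | (data[1] << 8) | data[2]
    raise ValueError("prefix 3 is not supported")


for sample in (0, 63, 64, 16383, 16384):
    encoded = encode_size_prefixed(sample)
    assert decode_size_prefixed(encoded) == sample
    print(sample, encoded.hex())
```

One consequence visible here is that only values up to 63 fit in a single byte, whereas unsigned LEB128 fits values up to 127 in one byte.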
### Empty table

When code does not use jumps, an empty JUMPDEST table is represented by omitting the `jumpdests` section, as opposed to having a section that is always present but allowed to be empty. This is consistent with the requirement of EIP-3540 for section size to be non-zero. Additionally, omitting the section saves 3 bytes of code storage.

### Why jumpdests before code?

The contents of the `jumpdests` section are always needed to start EVM execution. For chunked and/or merkleized bytecode it is more efficient to have `jumpdests` just after the EOF header so they can share the same first chunk(s).

### Code chunking / merkleization

In code chunking the contract code is split into (fixed size) chunks. Due to the requirement of jumpdest-analysis, it must be known where the first instruction starts in a given chunk, in case the split happened within PUSH-data. This is commonly accomplished by reserving the first byte of the chunk as the "first instruction offset" (FIO) field.

With this EIP, code chunking does not need to have such a field. However, the jumpdest table must be provided instead (for all the chunks up until the last jump location used during execution).

### Benchmarks / performance analysis

We compared the performance of `jumpdests` section loading to JUMPDEST analysis in the evmone/Baseline interpreter. In both cases a bitset of valid jumpdest positions is built.

We used the worst case for the `jumpdests` section as the benchmark baseline. This is the case where every position in the code section is a valid jumpdest, i.e. the bytecode has as many jumpdests as possible, making the `jumpdests` section as large as possible. The encoded representation is `0x00, 0x01, 0x01, 0x01, ...`.

This also happens to be the worst case for the JUMPDEST analysis.

Further, we picked 5 popular contracts from the Ethereum mainnet.

| case              | size  | num JUMPDESTs | JUMPDEST analysis (cycles/byte) | jumpdests load (cycles/byte) | performance change |
| ----------------- | ----- | ------------- | ------------------------------- | ---------------------------- | ------------------ |
| worst             | 65535 | 65535         | 9.11                            | 9.36                         | 2.75%              |
| RoninBridge       | 1760  | 71            | 3.57                            |                              | -89.41%            |
| UniswapV2ERC20    | 2319  | 61            | 2.10                            |                              | -88.28%            |
| DepositContract   | 6358  | 123           | 1.86                            |                              | -90.24%            |
| TetherToken       | 11075 | 236           | 1.91                            |                              | -89.58%            |
| UniswapV2Router02 | 21943 | 468           | 2.26                            |                              | -91.17%            |

For the worst case the performance difference between JUMPDEST analysis and `jumpdests` section loading is very small. Both are very slow compared to a memory copy (0.15 cycles/byte).

However, the maximum length for the worst cases is different. For JUMPDEST analysis this is 24576 (0x6000) for deployed contracts and only limited by EVM memory cost in case of _initcode_ (can be over 1MB). For `jumpdests` sections, the limit is 12288 for deployed contracts (the deployed bytecode length limit must be split equally between `jumpdests` and code sections). For the _initcode_ case, the limit is 65535 because this is the maximum section size allowed by EOF.

For "popular" contracts the gained efficiency is ~10x because the `jumpdests` section is relatively small compared to the code section, and therefore there are far fewer bytes to loop over than in JUMPDEST analysis.

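
The difference in work between the two approaches can be seen in a simplified sketch. This is not the evmone code that was benchmarked: it is a Python illustration with hypothetical function names, a `bytearray` of flags stands in for the bitset, and the table is assumed to be already decoded into deltas.

```python
def analyze_legacy(code: bytes) -> bytearray:
    """Legacy JUMPDEST analysis: visit the whole code, skipping PUSH data."""
    valid = bytearray(len(code))  # one flag per code position
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode == 0x5B:  # JUMPDEST
            valid[pos] = 1
        elif 0x60 <= opcode <= 0x7F:  # PUSH1..PUSH32: skip the immediate bytes
            pos += opcode - 0x60 + 1
        pos += 1
    return valid


def load_table(deltas: list[int], code_size: int) -> bytearray:
    """Table loading: one step per table entry, independent of code length."""
    valid = bytearray(code_size)
    pos = 0
    for delta in deltas:
        pos += delta
        valid[pos] = 1
    return valid


# JUMPDEST, PUSH1 0x5b, JUMPDEST: the 0x5b at position 2 is PUSH data.
legacy = analyze_legacy(bytes([0x5B, 0x60, 0x5B, 0x5B]))
loaded = load_table([0, 3], 4)
print(list(legacy), list(loaded))  # [1, 0, 0, 1] [1, 0, 0, 1]
```

The legacy loop runs over the entire code, while the table loop runs once per entry, which is consistent with the ~10x difference reported for contracts where jumpdests are sparse.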
## Reference Implementation

We extend the `validate_code()` function of [EIP-3670](./eip-3670.md):

```python
# The same table as in EIP-3670
valid_opcodes = ...

# Remove JUMPDEST from the list of valid opcodes
valid_opcodes.remove(0x5b)

# This helper decodes a single unsigned LEB128 encoded value
# This will abort on truncated (short) input
def leb128u_decode(input: bytes) -> tuple[int, int]:
    ret = 0
    shift = 0
    consumed_bytes = 0
    while True:
        # Check for truncated input
        assert(consumed_bytes < len(input))
        # Only allow up to 4-byte long leb128 encodings
        assert(consumed_bytes <= 3)
        input_byte = input[consumed_bytes]
        consumed_bytes += 1
        ret |= (input_byte & 0x7f) << shift
        if (input_byte & 0x80) == 0:
            # Do not allow additional leading zero bits.
            # A zero final byte is only valid as the single-byte encoding of 0.
            assert(input_byte != 0 or consumed_bytes == 1)
            break
        shift += 7
    return (ret, consumed_bytes)

# This helper parses the jumpdest table into a list of relative offsets
# This will abort on truncated (short) input
def parse_table(input: bytes) -> list[int]:
    jumpdests = []
    pos = 0
    while pos < len(input):
        value, consumed_bytes = leb128u_decode(input[pos:])
        jumpdests.append(value)
        pos += consumed_bytes
    return jumpdests

# This helper translates the delta offsets into absolute ones
# This will abort on invalid 0-value entries
def process_jumpdests(delta: list[int]) -> list[int]:
    jumpdests = []
    partial_sum = 0
    first = True
    for d in delta:
        if first:
            first = False
        else:
            assert(d != 0)
        partial_sum += d
        jumpdests.append(partial_sum)
    return jumpdests

# Fails with assertion on invalid code
# Expects list of absolute jumpdest offsets
def validate_code(code: bytes, jumpdests: list[int]):
    pos = 0
    while pos < len(code):
        # Ensure the opcode is valid
        opcode_pos = pos
        opcode = code[pos]
        pos += 1
        assert(opcode in valid_opcodes)

        # Remove touched offset (the position of the opcode itself)
        try:
            jumpdests.remove(opcode_pos)
        except ValueError:
            pass

        # Skip pushdata
        if opcode >= 0x60 and opcode <= 0x7f:
            pos += opcode - 0x60 + 1

    # Ensure last PUSH doesn't go over code end
    assert(pos == len(code))

    # The table is invalid if there are untouched locations
    assert(len(jumpdests) == 0)
```

## Test Cases

#### Valid bytecodes

- No jumpdests
- Every byte is a jumpdest
- Distant jumpdests (0x7f and 0x3f01 bytes apart)
- Max number of jumpdests
  - 1-byte offset encoding: initcode of max size (64K) with jumpdest at each byte - table contains 65536 1-byte offsets, first one is 0x00, all others equal 0x01
  - 2-byte offset encoding: initcode of max size with jumpdests 0x80 (128) bytes apart - table contains 512 offsets, first one is 0x7f (127), all others equal 0x8001
  - 3-byte offset encoding: initcode of max size with jumpdests 0x4000 (16384) bytes apart - table contains 4 offsets: 0xFF7F (16383), 0x808001, 0x808001, 0x808001

#### Invalid bytecodes

- Empty jumpdest section
- Multiple jumpdest sections
- jumpdest section after the code section
- jumpdest section after the data section
- Final jumploc in the table is truncated (not a valid LEB128)
- LEB128 encoding with extra 0s (non-minimal encoding)
- Jumpdest location pointing to PUSH data
- Jumpdest location out of code section bounds
  - pointing into data section
  - pointing into jumpdest section
  - pointing outside container bounds
- Duplicate jumpdest locations (0 deltas in table other than 1st offset)
- Code containing `JUMP` but no jumpdest table
- Code containing `JUMPI` but no jumpdest table
- Code containing jumpdest table but not `JUMP`/`JUMPI`
- Code containing `JUMPDEST`

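
The byte values quoted for the 2-byte and 3-byte offset cases can be reproduced with a short script. This is a sketch for checking the numbers only: `leb128u_encode` and `spaced_table` are hypothetical helpers, and `INITCODE_SIZE` assumes that "max size (64K)" means 0x10000 bytes.

```python
def leb128u_encode(value: int) -> bytes:
    """Encode a non-negative integer as shortest-form unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


INITCODE_SIZE = 0x10000  # assumption: "max size (64K)" from the list above


def spaced_table(step: int) -> list[int]:
    """Deltas for jumpdests at step-1, 2*step-1, ... within INITCODE_SIZE."""
    count = INITCODE_SIZE // step
    return [step - 1] + [step] * (count - 1)


two_byte = spaced_table(0x80)
three_byte = spaced_table(0x4000)

print(len(two_byte), leb128u_encode(two_byte[0]).hex(), leb128u_encode(two_byte[1]).hex())
print(len(three_byte), [leb128u_encode(d).hex() for d in three_byte])
```

Under these assumptions the script yields 512 entries (`7f` followed by `8001`) and 4 entries (`ff7f`, then `808001` three times), matching the list above; in both tables the last jumpdest falls on offset 65535.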
## Backwards Compatibility

This change poses no risk to backwards compatibility, as it is introduced at the same time EIP-3540 is. The requirement of a JUMPDEST table does not cover legacy bytecode.

## Security Considerations

The authors are not aware of any security or DoS risks posed by this change.

## Copyright

Copyright and related rights waived via [CC0](../LICENSE.md).