--- eip: 2464 title: "eth/65: transaction announcements and retrievals" description: Introduces `NewPooledTransactionHashes`, `GetPooledTransactions`, and `PooledTransactions`. author: Péter Szilágyi , Péter Szilágyi (@karalabe), Gary Rong , Tim Beiko (@timbeiko) discussions-to: https://github.com/ethereum/EIPs/issues/2465 status: Final type: Standards Track category: Networking created: 2020-01-13 requires: 2364 --- ## Abstract This EIP introduces three additional message types into the `eth` protocol (releasing a new version, `eth/65`): `NewPooledTransactionHashes (0x08)` to announce a set of transactions without their content; `GetPooledTransactions (0x09)` to request a batch of transactions by their announced hash; and `PooledTransactions (0x0a)` to reply to a transaction request. This permits reducing the bandwidth used for transaction propagation from linear complexity in the number of peers to square root; and also reducing the initial transaction exchange from 10s-100s MB to `len(pool) * 32B ~= 128KB`. ## Motivation The `eth` network protocol has two ways to propagate a newly mined block: it can be broadcast to a peer in its entirety (via `NewBlock (0x07)` in `eth/64` and prior or it can be announced only (via `NewBlockHashes (0x01)`). This duality allows nodes to do the high-bandwidth broadcasting (10s-100s KB) for a square root number of peers; and the low-bandwidth announcing (10s-100s B) for the remaining linear number of peers. The square root broadcast is enough to reach all well connected nodes, but the linear announce is needed to get across degenerate topologies. This works well. The `eth` protocol, however, does not have a similar dual mechanism for propagating transactions, so nodes need to rely on broadcasting (via `Transactions (0x02)`). To cater for degenerate topologies, transactions cannot be broadcast square rooted, rather they need to be transferred linearly to all peers. With N peers, each node will transfer the same transaction N times (counting both directions), whereas 1 would be enough in a perfect world. This is a significant waste. A similar issue arises when a new network connection is made between two nodes, as they need to sync up their transaction pools, but the pool is just a soup of dangling transactions. Without a way to deduplicate transactions remotely, each node is forced to naively transfer their entire list of transactions to the other side. With pools containing thousands of transactions, a naive transfer amounts to 10s-100s MB, most of which is useless. There is no better way, however. This EIP introduces three additional message types into the `eth` protocol (releasing a new version, `eth/65`): `NewPooledTransactionHashes (0x08)` to announce a set of transactions without their content; `GetPooledTransactions (0x09)` to request a batch of transactions by their announced hash; and `PooledTransactions (0x0a)` to reply to a transaction request. This permits reducing the bandwidth used for transaction propagation from linear complexity in the number of peers to square root; and also reducing the initial transaction exchange from 10s-100s MB to `len(pool) * 32B ~= 128KB`. With transaction throughput (and size) picking up in Ethereum, transaction propagation is the current dominant component of the used network resources. Most of these resources are however wasted, as the `eth` protocol does not have a mechanism to deduplicate transactions remotely, so the same data is transferred over and over again across all network connections. This EIP proposes a tiny extension to the `eth` protocol, which permits nodes to agree on the set of transactions that need to be transferred across a network connection, before doing the costly exchange. This should help reduce the global (operational) bandwidth usage of the Ethereum network by at least an order of magnitude. ## Specification Add three new message types to the `eth` protocol: * `NewPooledTransactionHashes (0x08): [hash_0: B_32, hash_1: B_32, ...]` * Specify one or more transactions that have appeared in the network and which have **not yet been included in a block**. To be maximally helpful, nodes should inform peers of all transactions that they may not be aware of. * There is **no protocol violating hard cap** on the number of hashes a node may announce to a remote peer (apart from the 10MB `devp2p` network packet limit), but 4096 seems a sane chunk (128KB) to avoid a single packet hogging a network connection. * Nodes should only announce hashes of transactions that the remote peer could reasonably be considered not to know, but it is better to be over zealous than to have a nonce gap in the pool. * `GetPooledTransactions (0x09): [hash_0: B_32, hash_1: B_32, ...]` * Specify one or more transactions to retrieve from a remote peer's **transaction pool**. * There is **no protocol violating hard cap** on the number of transactions a node may request from a remote peer (apart from the 10MB `devp2p` network packet limit), but the recipient may enforce an arbitrary cap on the reply (size or serving time), which **must not** be considered a protocol violation. To keep wasted bandwidth down (unanswered hashes), 256 seems like a sane upper limit. * `PooledTransactions (0x0a): [[nonce: P, receivingAddress: B_20, value: P, ...], ...]` * Specify transactions **from the local transaction pool** that the remote node requested via a `GetPooledTransactions (0x09)` message. The items in the list are transactions in the format described in the main Ethereum specification. * The transactions **must** be in same order as in the request, but it is **ok** to skip transactions that are not available. This way if the response size limit is reached, requesters will know which hashes to request again (everything from the last returned transaction) and which to assume unavailable (all gaps before the last returned transaction). * A peer may respond with an empty reply **iff** none of the hashes match transactions in its pool. It is allowed to announce a transaction that will not be served later if it gets included in a block in between. ## Rationale **Q: Why limit `GetPooledTransactions (0x09)` to retrieving items from the pool?** Apart from the transaction pool, transactions in Ethereum are always bundled together by the hundreds in block bodies and existing network retrievals honor this data layout. Allowing direct access to individual transactions in the database has no actionable use case, but would expose costly database reads into the network. For transaction propagation purposes there is no reason to allow disk access, as any transaction finalized to disk will be broadcast inside a block anyway, so at worse there is a few hundred millisecond delay when a node gets the transaction. Block propagation may be made a bit more optimal by transferring the contained transactions on demand only, but that is a whole EIP in itself, so better relax the protocol when all the requirements are known and not in advance. It would probably be enough to maintain a set of transactions included in recent blocks in memory. **Q: Should `NewPooledTransactionHashes (0x08)` deduplicate from disk?** Similarly to `GetPooledTransactions (0x09)`, `NewPooledTransactionHashes (0x08)` should also only operate on the transaction pool and should ignore the disk altogether. During healthy network conditions, a transaction will propagate through much faster than it's included in a block, so it will essentially be non-existent that a newly announced transaction is already on disk. By avoiding disk deduplication, we can avoid a DoS griefing by remote transaction announces. If we want to be really correct and avoid even the slightest data race when deduplicating announcements, we can use the same recently-included-transactions trick that we discussed above to discard announcements that have recently become stale. **Q: Why not reuse `Transaction (0x02)` instead of a new `PooledTransactions (0x0a)`?** Originally this EIP reused the existing `Transaction (0x02)` message as the reply to the `GetPooledTransactions (0x09)` request. This makes client code more complicated, because nodes constantly gossip `Transaction (0x02)` messages to each other as broadcasts, so it's hard to match up which of the many messages is the actual reply to the request. By keeping `Transaction (0x02)` and `PooledTransactions (0x0a)` as separate messages, we can also leave the protocol more flexible for future optimizations (e.g. adding request IDs, which are meaningless for gossip broadcasts). ## Backwards Compatibility This EIP extends the `eth` protocol in a backwards incompatible way and requires rolling out a new version, `eth/65`. However, `devp2p` supports running multiple versions of the same wire protocol side-by-side, so rolling out `eth/65` does not require client coordination, since non-updated clients can keep using `eth/64`. This EIP does not change the consensus engine, thus does _not_ require a hard fork. ## Security Considerations None. ## Copyright Copyright and related rights waived via [CC0](../LICENSE.md).