DCIPs/EIPS/eip-3450.md

191 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
eip: 3450
title: Standardized Shamir Secret Sharing Scheme for BIP-39 Mnemonics
author: Daniel Streit (@danielstreit)
discussions-to: https://ethereum-magicians.org/t/erc-3450-standard-for-applying-shamirs-to-bip-39-mnemonics/5844
status: Stagnant
type: Standards Track
category: ERC
created: 2021-03-29
---
## Simple Summary
A standardized algorithm for applying Shamir's Secret Sharing Scheme to BIP-39 mnemonics.
## Abstract
A standardized approach to splitting a BIP-39 mnemonic into _N_ BIP-39 mnemonics, called shares, so that _T_ shares are required to recover the original mnemonic and no information about the original mnemonic, other than its size, is leaked with less than _T_ shares.
## Motivation
We'd like to make it easier for less-technical users to store keys securely.
Currently, many users use BIP-39 mnemonics to store entropy values underlying their keys. These mnemonics are a single point of failure. If lost, the user may never regain access to the assets locked by the keys. If stolen, a malicious actor can steal the assets.
Shamir's Secret Sharing Scheme addresses this concern directly. It creates "shares" of the secret, such that a subset can be used to recover the secret, but only if a minimum threshold of shares is reached. Without the minimum, no information about the original secret is leaked.
One concern with Shamir's Secret Sharing Scheme is there is no canonical, standard implementation. This puts recovery at risk, as tooling may change over time.
Here, we propose a standardized implementation of Shamir's Secret Sharing Scheme applied specifically to BIP-39 mnemonics, so users can easily create shares of their mnemonic, destroy the original, store the shares appropriately, and confidently recover the original mnemonic at a later date.
## Specification
### Shamir's Secret Sharing Scheme
Shamir's Secret Sharing Scheme is a cryptographic method to split a secret into _N_ unique parts, where any _T_ of them are required to reconstruct the secret.
First, a polynomial _f_ of degree _T_ 1 is constructed. Then, each share is a point on the polynomial's curve: an integer _x_, and its corresponding _y_ point _f_(_x_).
With any set of _T_ shares (or points), the initial polynomial can be recovered using polynomial interpolation.
When constructing the initial polynomial, the secret is stored as the coefficient of x<sup>0</sup> and the rest of the coefficients are randomly generated.
### BIP-39 Mnemonics
BIP-39 is a common standard for storing entropy as a list of words. It is easier to work with for human interactions than raw binary or hexadecimal representations of entropy.
BIP-39 mnemonics encode two pieces of data: the original entropy and a checksum of that entropy. The checksum allows the mnemonic to be validated, ensuring that the user entered it correctly.
#### Generating the Mnemonic
The mnemonic must encode entropy in a multiple of 32 bits. With more entropy security is improved but the sentence length increases. We refer to the initial entropy length as ENT. The allowed size of ENT is 128-256 bits.
First, an initial entropy of ENT bits is generated. A checksum is generated by taking the first `ENT / 32` bits of its SHA256 hash. This checksum is appended to the end of the initial entropy. Next, these concatenated bits are split into groups of 11 bits, each encoding a number from 0-2047, serving as an index into a word list. Finally, we convert these numbers into words and use the joined words as a mnemonic sentence.
The following table describes the relation between the initial entropy length (ENT), the checksum length (CS), and the length of the generated mnemonic sentence (MS) in words.
```
CS = ENT / 32
MS = (ENT + CS) / 11
| ENT | CS | ENT+CS | MS |
+-------+----+--------+------+
| 128 | 4 | 132 | 12 |
| 160 | 5 | 165 | 15 |
| 192 | 6 | 198 | 18 |
| 224 | 7 | 231 | 21 |
| 256 | 8 | 264 | 24 |
```
#### Recovering the Entropy
The initial entropy can be recovered by reversing the process above. The mnemonic is converted to bits, where each word is converted to 11 bits representing its index in the word list. The entropy portion is defined in the table above, based on the size of the mnemonic.
#### Word List
This specification only supports the BIP-39 English word list, but this may be expanded in the future.
See [word list](../assets/eip-3450/wordlist.txt).
### Applying Shamir's Scheme to BIP-39 Mnemonics
To ensure that the shares are valid BIP-39 mnemonics, we:
1. Convert the target BIP-39 mnemonic to its underlying entropy
2. Apply Shamir's Scheme to the entropy
3. Convert each resulting share's _y_ value to a BIP-39 mnemonic
By converting to entropy before applying Shamir's Scheme, we omit the checksum from the initial secret, allowing us to calculate a new checksum for each share when converting the share _y_ values to mnemonics, ensuring that they are valid according to BIP-39.
When applying Shamir's Scheme to the entropy, we apply it separately to each byte of the entropy and GF(256) is used as the underlying finite field. Bytes are interpreted as elements of GF(256) using polynomial representation with operations modulo the Rijndael irreducible polynomial _x_<sup>8</sup> + _x_<sup>4</sup> + _x_<sup>3</sup> + _x_ + 1, following AES.
### Share Format
A share represents a point on the curve described by the underlying polynomial used to split the secret. It includes two pieces of data:
- An ID: the _x_ value of the share
- A BIP-39 mnemonic: the _y_ value of the share represented by a mnemonic
### Creating Shares
Inputs: BIP-39 mnemonic, number of shares (_N_), threshold (_T_)
Output: N Shares, each share including an ID, { _x_ | 0 &lt; _x_ &lt; 256 }, and a BIP-39 mnemonic of the same length as the input one
1. Check the following conditions:
- 1 < T <= N < 256
- The mnemonic is valid according to [BIP-39](#generating-the-mnemonic)
2. [Recover the underlying entropy of the mnemonic](#recovering-the-entropy) as a vector of bytes
3. Define values:
- Let _E_ be the byte-vector representation of the mnemonic's entropy
- Let _n_ be the length of _E_
- Let _coeff<sub>1</sub>_, ... , _coeff<sub>T - 1</sub>_ be byte-vectors belonging to GF(256)_<sup>n</sup>_ generated randomly, independently with uniform distribution from a source suitable for generating cryptographic keys
4. Evaluate the polynomial for each share
- For each _x_ from 1 to _N_, evaluate the polynomial _f(x)_ = _E_ + _coeff<sub>1</sub>x<sup>1</sup>_ + ... + _coeff<sub>T - 1</sub>x<sup>T - 1</sup>_, where _x_ is the share ID and _f(x)_ is the share value (as a vector of bytes)
5. Using _f(x)_ as the underlying entropy, [generate a mnemonic](#generating-the-mnemonic) for each share
6. Return the ID and mnemonic for each share
### Recovering the Mnemonic
To recover the original mnemonic, we interpolate a polynomial _f_ from the given set of shares (or points on the polynomial) and evaluate _f(0)_.
#### Polynomial Interpolation
Given a set of _m_ points (_x<sub>i</sub>_, _y<sub>i</sub>_), 1 &le; _i_ &le; _m_, such that no two _x<sub>i</sub>_ values equal, there exists a polynomial that assumes the value _y<sub>i</sub>_ at each point _x<sub>i</sub>_. The polynomial of lowest degree that satisfies these conditions is uniquely determined and can be obtained using the Lagrange interpolation formula given below.
Since Shamir's Secret Sharing Scheme is applied separately to each of the _n_ bytes of the shared mnemonic's entropy, we work with _y<sub>i</sub>_ as a vector of _n_ values, where _y<sub>i</sub>_[_k_] = _f<sub>k</sub>_(_x<sub>i</sub>_), 1 &le; _k_ &le; _n_, and _f<sub>k</sub>_ is the polynomial in the _k_-th instance of the scheme.
#### Interpolate(_x_, {(_x<sub>i</sub>_, _y<sub>i</sub>_), 1 &le; _i_ &le; _m_})
Input: the desired index _x_, a set of index/value-vector pairs {(_x<sub>i</sub>_, _y_<sub>_i_</sub>), 1 &le; _i_ &le; _m_} &subseteq; GF(256) &times; GF(256)<sup>_n_</sup>
Output: the value-vector (_f_<sub>1</sub>(_x_), ... , _f<sub>n</sub>_(_x_))
![f_k(x) = \sum_{i=1}^m y_i[k] \prod_{\underset{j \neq i}{j=1}}^m \frac{x - x_j}{x_i - x_j}](../assets/eip-3450/lagrange.gif)
#### Recover the Mnemonic
Input: A set of _m_ Shares
Output: The original mnemonic
1. [Recover the underlying entropy of each share's mnemonic](#recovering-the-entropy) as a vector of bytes
2. Calculate _E_ = Interpolate(0, [(_x<sub>1</sub>_, _y<sub>1</sub>_),...,(_x<sub>m</sub>_, _y<sub>m</sub>_)]), where _x_ is the share ID and _y_ is the byte-vector of the share's mnemonic's entropy
3. Using _E_ as the underlying entropy, [generate a mnemonic](#generating-the-mnemonic) and return it
## Rationale
### Choice of Field
The field GF(256) was chosen, because the field arithmetic is easy to implement in any programming language and many implementations are already available since it is used in the AES cipher. Although using GF(256) requires that we convert the mnemonic to its underlying entropy as a byte-vector, this is also easy to implement and many implementations of it exist in a variety of programming languages.
GF(2048) was also considered. Using GF(2048), we could have applied Shamir's Scheme directly to the mnemonic, using the word indexes as the values. This would have allowed us to avoid converting the mnemonic to its underlying entropy. But, the resulting shares would not have been valid BIP-39 mnemonics - the checksum portion would not be a valid checksum of the entropy. And, working around this would add considerable complexity.
Another option was GF(2<sup>_n_</sup>) where _n_ is the size of the entropy in bits. We'd still convert the mnemonic to entropy, but then apply Shamir's Scheme over the entire entropy rather than on a vector of values. The downside of this approach is we'd need a different field for each mnemonic strength along with an associated irreducible polynomial. Additionally, this would require working with very large numbers that can be cumbersome to work with in some languages.
### Valid Share Mnemonics and Share IDs
The shares produced by the specification include an ID, in addition to the BIP-39 mnemonic.
Other options could have encoded the share ID into the mnemonic, simplifying storage - only the mnemonic would need to be stored.
One possibility would be to store the ID instead of the checksum in the mnemonic. The downside of this approach is that the shares would not be _valid_ BIP-39 mnemonics because the "checksum" section of the mnemonic would not match the "entropy" section. Shares with valid BIP-39 mnemonics are useful because they are indistinguishable from any other. And users could store the ID in a variety of ways that obscure it.
### Validation on Recovery
We decided _not_ to include a validation mechanism on recovering the original mnemonic. This leaks less information to a potential attacker. There is no indication they've gotten the requisite number of shares until they've obtained _T_ + 1 shares.
We could provide recovery validation by replacing one of the random coefficients with a checksum of the original mnemonic. Then, when recovering the original mnemonic and the polynomial, we could validate that the checksum coefficient is the valid checksum of recovered mnemonic.
## Test Cases
Coming soon.
All implementations must be able to:
- Split and recover each `mnemonic` with the given `numShares` and `threshold`.
- Recover the `mnemonic` from the given `knownShares`.
## Security Considerations
The shares produced by the specification include an ID in addition to the BIP-39 mnemonic. This raises two security concerns:
Users **must** keep this ID in order to recover the original mnemonic. If the ID is lost, or separated from the share mnemonic, it may not be possible to recover the original. (Brute force recovery may or may not be possible depending on how much is known about the number of shares and threshold)
The additional data may hint to an attacker of the existence of other keys and the scheme under which they are stored. Therefore, the ID should be stored in a way that obscures its use.
## Copyright
Copyright and related rights waived via [CC0](../LICENSE.md).