---
eip: 1444
title: Localized Messaging with Signal-to-Text
author: Brooklyn Zelenka (@expede), Jennifer Cooper (@jenncoop)
discussions-to: https://ethereum-magicians.org/t/eip-1444-localized-messaging-with-signal-to-text/
status: Stagnant
type: Standards Track
category: ERC
created: 2018-09-23
---
## Simple Summary
A method of converting machine codes to human-readable text in any language and phrasing.
## Abstract
An on-chain system for providing user feedback by converting machine-efficient codes into human-readable strings in any language or phrasing. The system does not impose a list of languages, but rather lets users create, share, and use the localized text of their choice.
## Motivation
There are many cases where an end user needs feedback or instruction from a smart contract. Directly exposing numeric codes does not make for good UX or DX. If Ethereum is to be a truly global system usable by experts and lay persons alike, systems to provide feedback on what happened during a transaction are needed in as many languages as possible.
Returning a hard-coded string (typically in English) only serves a small segment of the global population. This standard proposes a method to allow users to create, register, share, and use a decentralized collection of translations, enabling richer messaging that is more culturally and linguistically diverse.
There are several machine-efficient ways of representing intent, status, state transition, and other semantic signals, including booleans, enums, and [ERC-1066 codes](./eip-1066.md). By providing human-readable messages for these signals, the developer experience is enhanced by returning easier-to-consume information with more context (e.g. `revert`). End user experience is enhanced by providing text that can be propagated up to the UI.
## Specification
### Contract Architecture
There are two types of contract: `LocalizationPreferences` and `Localization`s.
The `LocalizationPreferences` contract functions as a proxy for `tx.origin`.
```diagram
                                                                   +--------------+
                                                                   |              |
                                                          +------> | Localization |
                                                          |        |              |
                                                          |        +--------------+
                                                          |
                                                          |
+-----------+          +-------------------------+        |        +--------------+
|           |          |                         | <------+        |              |
| Requestor | <------> | LocalizationPreferences | <-------------> | Localization |
|           |          |                         | <------+        |              |
+-----------+          +-------------------------+        |        +--------------+
                                                          |
                                                          |
                                                          |        +--------------+
                                                          |        |              |
                                                          +------> | Localization |
                                                                   |              |
                                                                   +--------------+
```
### `Localization`
A contract that holds a simple mapping of codes to their text representations.
```solidity
interface Localization {
    function textFor(bytes32 _code) external view returns (string _text);
}
```
#### `textFor`
Fetches the localized text representation.
```solidity
function textFor(bytes32 _code) external view returns (string _text);
```
### `LocalizationPreferences`
A proxy contract that allows users to set their preferred `Localization`. Text lookup is delegated to the user's preferred contract.
A fallback `Localization` with all keys filled MUST be available. If the user-specified `Localization` has not explicitly set a localization (i.e. `textFor` returns `""`), the `LocalizationPreferences` MUST redelegate to the fallback `Localization`.
```solidity
interface LocalizationPreferences {
    function set(Localization _localization) external returns (bool);
    function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
}
```
#### `set`
Registers a user's preferred `Localization`. The registering user SHOULD be considered `tx.origin`.
```solidity
function set(Localization _localization) external returns (bool);
```
#### `textFor`
Retrieve text for a code found at the user's preferred `Localization` contract.
The first return value (`bool _wasFound`) indicates whether the text came from the user's preferred `Localization` or from the fallback. If the fallback was used, the first return value MUST be `false`; otherwise it MUST be `true`.
```solidity
function textFor(bytes32 _code) external view returns (bool _wasFound, string _text);
```
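The fallback rule can be illustrated off chain with a minimal in-memory sketch. The classes below are hypothetical stand-ins for deployed contracts, not a web3 API, and the explicit `user` argument stands in for `tx.origin`. A user with no registered preference is routed to the fallback, as in the reference implementation.

```javascript
// In-memory stand-ins for deployed contracts (hypothetical; not a web3 API).
class Localization {
  constructor(entries = {}) {
    this.dictionary = new Map(Object.entries(entries));
  }

  // Mirrors the on-chain behaviour of returning "" for unset keys.
  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

class LocalizationPreferences {
  constructor(fallback) {
    this.fallback = fallback;  // assumed to have every key filled
    this.registry = new Map(); // user address -> preferred Localization
  }

  set(user, localization) {
    this.registry.set(user, localization);
    return true;
  }

  // Returns [wasFound, text]; wasFound is false when the preferred
  // Localization had no text and the fallback was consulted instead.
  textFor(user, code) {
    const preferred = this.registry.get(user) ?? this.fallback;
    const text = preferred.textFor(code);
    if (text !== "") return [true, text];
    return [false, this.fallback.textFor(code)];
  }
}

const english = new Localization({ "0x01": "Success", "0x00": "Failure" });
const french = new Localization({ "0x01": "Succès" });

const prefs = new LocalizationPreferences(english);
prefs.set("0xAlice", french);

const found = prefs.textFor("0xAlice", "0x01");
// => [true, "Succès"]
const fellBack = prefs.textFor("0xAlice", "0x00");
// => [false, "Failure"]
```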
### String Format
All strings MUST be encoded as [UTF-8](https://www.ietf.org/rfc/rfc3629.txt).
```solidity
"Špeĉiäl chârãçtérs are permitted"
"As are non-Latin characters: アルミ缶の上にあるみかん。"
"Emoji are legal: 🙈🙉🙊🎉"
"Feel free to be creative: (ノ◕ヮ◕)ノ*:・゚✧"
```
### Templates
Template strings are allowed, and MUST follow the [ANSI C `printf`](https://pubs.opengroup.org/onlinepubs/009696799/utilities/printf.html) conventions.
```solidity
"Satoshi's true identity is %s"
```
Text with 2 or more arguments SHOULD use the POSIX parameter field extension.
```solidity
"Knock knock. Who's there? %1$s. %1$s who? %2$s!"
```
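To make the two template forms concrete, here is a minimal off-chain sketch of a formatter. It is an illustration only: the `format` function is a hypothetical helper that handles just `%s`, `%d`, `%n$s`, `%n$d`, and `%%`, and ignores flags, widths, and precision. A real consumer would more likely use an existing `printf` library.

```javascript
// Minimal formatter for a small subset of printf (hypothetical helper).
// Supports %s, %d, positional %n$s / %n$d, and the literal %%.
function format(template, ...args) {
  let next = 0; // index of the next sequential (non-positional) argument

  return template.replace(/%(?:(\d+)\$)?([sd%])/g, (match, position, type) => {
    if (type === "%") return "%";

    // Parameter fields are 1-based; sequential arguments are consumed in order.
    const index = position ? Number(position) - 1 : next++;
    const value = args[index];

    return type === "d" ? String(Math.trunc(Number(value))) : String(value);
  });
}

const sequential = format("Satoshi's true identity is %s", "unknown");
// => "Satoshi's true identity is unknown"

const positional = format("Knock knock. Who's there? %1$s. %1$s who? %2$s!", "Boo", "Don't cry");
// => "Knock knock. Who's there? Boo. Boo who? Don't cry!"
```

Because each parameter field names its argument explicitly, a localization is free to reuse or reorder arguments without any change to the calling code.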
## Rationale
### `bytes32` Keys
`bytes32` is very efficient since it is the EVM's base word size. Given the enormous number of elements (card(A) > 1.1579 × 10<sup>77</sup>), it can embed nearly any practical signal, enum, or state. In cases where an application's key is longer than `bytes32`, hashing that long key can map that value into the correct width.
Designs that use datatypes with smaller widths than `bytes32` (such as `bytes1` in [ERC-1066](./eip-1066.md)) can be directly embedded into the larger width. This is a trivial one-to-one mapping of the smaller set into the larger one.
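Both mappings can be sketched off chain. The helper names below are hypothetical, and two choices are assumptions rather than part of this spec: the short code is right-padded (which matches how Solidity widens fixed-size byte types), and SHA-256 is used as a stand-in hash because it ships with Node.js, whereas an on-chain system would more likely use `keccak256`.

```javascript
const { createHash } = require("crypto");

// Embed a short hex code (e.g. a bytes1 status code) into 32 bytes by
// right-padding with zeros. Distinct inputs of the same width stay distinct.
function embedCode(hex) {
  const digits = hex.replace(/^0x/, "");
  if (digits.length > 64) throw new Error("code is wider than 32 bytes");
  return "0x" + digits.padEnd(64, "0");
}

// Map an arbitrarily long key onto 32 bytes by hashing it.
// SHA-256 is a stand-in here; it is NOT the EVM's keccak256.
function hashKey(longKey) {
  return "0x" + createHash("sha256").update(longKey, "utf8").digest("hex");
}

const embedded = embedCode("0x01");
// => "0x01" followed by 62 zero digits

const hashed = hashKey("a.key.that.is.much.longer.than.thirty-two.bytes.when.encoded");
// => "0x" followed by 64 hex digits
```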
### Local vs Globals and Singletons
This spec has opted to not _force_ a single global registry, and rather to allow any contract and use case to deploy its own system. This allows for more flexibility, and does not restrict the community from opting to use singleton `LocalizationPreference` contracts for common use cases, share `Localization`s between different proxies, delegate translations between `Localization`s, and so on.
There are many practical uses of agreed-upon singletons. For instance, translating codes that aim to be fairly universal and integrated directly into the broader ecosystem (wallets, frameworks, debuggers, and the like) will want to have a single `LocalizationPreference`.
Rather than dispersing several `LocalizationPreference`s for different use cases and codes, one could imagine a global "registry of registries". While this approach allows for unified lookup of all translations in all use cases, it is antithetical to the spirit of decentralization and freedom. Such a system also increases the lookup complexity, places an onus on getting the code right the first time (or adding the overhead of an upgradable contract), and needs to account for use case conflicts with a "unified" or centralized numbering system. Further, lookups should be lightweight (especially in cases like looking up revert text).
For these reasons, this spec chooses the more decentralized, lightweight, free approach, at the cost of on-chain discoverability. A registry could still be compiled, but would be difficult to enforce, and is out of scope of this spec.
### Off Chain Storage
A very viable alternative is to store text off chain, with a pointer to the translations on-chain, and emit or return a `bytes32` code for another party to do the lookup. However, it is difficult to guarantee that off-chain resources will be available, and this approach requires coordination from some other system, like a web server, to do the code-to-text matching. It is also not compatible with `revert` messages.
### ASCII vs UTF-8 vs UTF-16
UTF-8 is the most widely used encoding at time of writing. It contains a direct embedding of ASCII, while providing characters for most natural languages, emoji, and special characters.
Please see the [UTF-8 Everywhere Manifesto](https://utf8everywhere.org/) for more information.
### When No Text is Found
Returning a blank string to the requestor fully defeats the purpose of a localization system. The two options for handling missing text are:
1. A generic "text not found" message in the preferred language
2. The actual message, in a different language
#### Generic Option
This design opted to not use generic fallback text. It does not provide any useful information to the user other than to potentially contact the `Localization` maintainer (if one even exists and updating is even possible).
#### Fallback Option
The design outlined in this proposal is to provide text in a commonly used language (e.g. English or Mandarin). First, this is the language that will be routed to if the user has yet to set a preference. Second, there is a good chance that a user may have _some_ proficiency with the language, or at least be able to use an automated translation service.
Knowing that the text fell back via `textFor`'s first return value is _much_ simpler than attempting language detection after the fact. This information is useful in certain UI cases, for example where there is a desire to explain why localization fell back.
### Decentralized Text Crowdsourcing
In order for Ethereum to gain mass adoption, users must be able to interact with it in the language, phrasing, and level of detail that they are most comfortable with. Rather than imposing a fixed set of translations as in a traditional, centralized application, this EIP provides a way for anyone to create, curate, and use translations. This empowers the crowd to supply culturally and linguistically diverse messaging, leading to broader and more distributed access to information.
### `printf`-style Format Strings
C-style `printf` templates have been the de facto standard for some time. They have wide compatibility across most languages (either in standard or third-party libraries). This makes it much easier for the consuming program to interpolate strings with low developer overhead.
#### Parameter Fields
The POSIX parameter field extension is important since languages do not share a common word order. Parameter fields enable the reuse and rearrangement of arguments in different localizations.
```solidity
("%1$s is an element with the atomic number %2$d!", "Mercury", 80);
// => "Mercury is an element with the atomic number 80!"
```
#### Simplified Localizations
Localization text does not require use of all parameters, and may simply ignore values. This can be useful for not exposing more technical information to users that would otherwise find it confusing.
```ruby
#!/usr/bin/env ruby
sprintf("%1$s é um elemento", "Mercurio", 80)
# => "Mercurio é um elemento"
```
```clojure
#!/usr/bin/env clojure
(format "Element #%2$s" "Mercury" 80)
;; => Element #80
```
### Interpolation Strategy
Please note that it is highly advisable to return the template string _as is_, with arguments as multiple return values or fields in an `event`, leaving the actual interpolation to be done off chain.
```solidity
event AtomMessage(
    bytes32 templateCode,
    bytes32 atomCode,
    uint256 atomicNumber
);
```
```javascript
#!/usr/bin/env node
const printf = require('printf');

// `AppText` and `PeriodicTableText` are assumed to be contract handles in scope
async function renderAtomMessage(eventResponse) {
  const { returnValues: { templateCode, atomCode, atomicNumber } } = eventResponse;

  const template = await AppText.textFor(templateCode);
  // => "%1$s ist ein Element mit der Ordnungszahl %2$d!"

  const atomName = await PeriodicTableText.textFor(atomCode);
  // => "Merkur"

  return printf(template, atomName, Number(atomicNumber));
  // => "Merkur ist ein Element mit der Ordnungszahl 80!"
}
```
### Unspecified Behaviour
This spec does not specify:
* Public or private access to the default `Localization`
* Who may set text
    * Deployer
    * `onlyOwner`
    * Anyone
    * Whitelisted users
    * and so on
* When text is set
    * `constructor`
    * Any time
    * Write to empty slots, but not overwrite existing text
    * and so on
These are intentionally left open. There are many cases for each of these, and restricting any is fully beyond the scope of this proposal.
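As an illustration of just one of these options, the "write to empty slots, but not overwrite existing text" policy could be modelled off chain as follows. This is a hypothetical in-memory sketch of a single possible choice, not a recommendation of this spec.

```javascript
// Hypothetical write-once dictionary: any empty slot may be filled,
// but text that has already been set is never overwritten.
class WriteOnceLocalization {
  constructor() {
    this.dictionary = new Map();
  }

  // Returns true if the text was stored, false if the slot was taken.
  set(code, text) {
    if (this.dictionary.has(code)) return false;
    this.dictionary.set(code, text);
    return true;
  }

  textFor(code) {
    return this.dictionary.get(code) ?? "";
  }
}

const localization = new WriteOnceLocalization();
const first = localization.set("0x01", "Success");      // => true
const second = localization.set("0x01", "Overwritten"); // => false
const stored = localization.textFor("0x01");            // => "Success"
```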
## Implementation
```solidity
pragma solidity ^0.4.25;

contract Localization {
    mapping(bytes32 => string) private dictionary_;

    constructor() public {}

    // Currently overwrites anything
    function set(bytes32 _code, string _message) external {
        dictionary_[_code] = _message;
    }

    function textFor(bytes32 _code) external view returns (string _message) {
        return dictionary_[_code];
    }
}

contract LocalizationPreference {
    mapping(address => Localization) private registry_;
    Localization public defaultLocalization;

    // Hash of the empty string, used to detect unset text
    bytes32 private empty_ = keccak256(abi.encodePacked(""));

    constructor(Localization _defaultLocalization) public {
        defaultLocalization = _defaultLocalization;
    }

    function set(Localization _localization) external returns (bool) {
        registry_[tx.origin] = _localization;
        return true;
    }

    function get(bytes32 _code) external view returns (bool, string) {
        return get(_code, tx.origin);
    }

    // Primarily for testing
    function get(bytes32 _code, address _who) public view returns (bool, string) {
        string memory text = getLocalizationFor(_who).textFor(_code);

        if (keccak256(abi.encodePacked(text)) != empty_) {
            return (true, text);
        } else {
            return (false, defaultLocalization.textFor(_code));
        }
    }

    function getLocalizationFor(address _who) internal view returns (Localization) {
        if (Localization(registry_[_who]) == Localization(0)) {
            return Localization(defaultLocalization);
        } else {
            // Look up the requested address rather than tx.origin
            return Localization(registry_[_who]);
        }
    }
}
```
## Copyright
Copyright and related rights waived via [CC0](../LICENSE.md).