Node redundancy
Elrond Validator Nodes can be configured to have one or more hot-standby nodes. This means additional nodes will run on different servers, in sync with the Main Validator node. Their role is to stand in for the Main Validator node in case it fails, to ensure high availability.
This redundancy mechanism allows the Main Validator operator to start any number of additional hot-standby nodes, each of them running with the same 'validatorKey.pem' file. The difference between their configurations is a single option in the 'prefs.toml' file.
Hot-standby nodes are configured using the 'RedundancyLevel' option in the 'prefs.toml' configuration file.
The default value is 0, so a node whose 'prefs.toml' file lacks the option still acts as the Main Validator. For backwards compatibility, already-running Validators are not affected by the addition of this option. Moreover, the 'prefs.toml' file is never overwritten during a node upgrade.
The values of RedundancyLevel are interpreted as follows:
- a value of 0 means that the node is the Main Validator.
- a positive value represents the order of the hot-standby node in the automatic fail-over sequence.
Example: suppose we have 3 nodes running with the same BLS key. One has the redundancy level set to 0, another to 1 and the third to 3. The node with level 0 will propose and sign blocks. The other 2 will sync data with the same shard as the Main Validator (and shuffle in and out of the same shards) but will not sign anything. If the Main Validator fails, a hot-standby node starts producing/signing blocks after level*5 missed rounds. So, after 5 missed rounds by the Main Validator, the hot-standby node with level 1 takes over. If that node is down as well, the hot-standby node with level 3 steps in 3*5 = 15 rounds after the Main Validator failed, that is, 10 rounds after the level 1 node should have produced a block.
- a large value (say 1 million) or a negative value (say -1) means that the hot-standby node will not get the chance to produce/sign blocks, but it will still sync with the network and shuffle between shards just as the Main Validator does.
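The fail-over arithmetic from the example can be sketched in a few lines of Python. This is only an illustrative model of the rule described above, not the node's actual implementation: the names ROUNDS_PER_LEVEL, takeover_threshold and next_signer are hypothetical, and the model assumes that the online node with the lowest non-negative level is the one that ends up signing.

```python
from typing import Iterable, Optional, Tuple

# From the text: a hot-standby node waits level*5 missed rounds.
ROUNDS_PER_LEVEL = 5


def takeover_threshold(level: int) -> Optional[int]:
    """Missed rounds after which a node with this level starts signing.

    Returns 0 for the Main Validator and None for a negative level,
    which never signs.
    """
    if level < 0:
        return None
    return level * ROUNDS_PER_LEVEL


def next_signer(online_levels: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (level, missed_rounds_before_takeover) for the node expected
    to sign, given the redundancy levels of the nodes still online.

    Assumption: the online node with the lowest non-negative level wins.
    """
    eligible = [lvl for lvl in online_levels if lvl >= 0]
    if not eligible:
        return None
    level = min(eligible)
    return level, level * ROUNDS_PER_LEVEL


if __name__ == "__main__":
    print(next_signer([0, 1, 3]))  # Main Validator online
    print(next_signer([1, 3]))     # Main Validator down
    print(next_signer([3]))        # Main Validator and level 1 down
    print(next_signer([-1]))       # only a never-signing node left
```

With the levels from the example, the model gives 5 rounds for the level 1 node and 15 rounds for the level 3 node. A level of 1 million gives a threshold of 5 million rounds, which is why such a node effectively never signs.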
Tip: The hot-standby nodes advertise a different public key on the network (autogenerated at start-up), thus concealing the real public key that will be used when signing block headers.
Tip: If the Main Validator (RedundancyLevel 0) gets back online, the hot-standby node(s) revert to standby mode.
Warning: Do not use the same redundancy level on more than one node. Otherwise, the nodes with the same RedundancyLevel value will start signing blocks in parallel at the same time. Although the protocol is not negatively affected by double signing, in the near future a BLS key that performs double signing will have its stake slashed.
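An operator running several nodes with the same key may want to check for duplicate levels before starting them. The sketch below is a hypothetical helper, not part of the node software: the function name duplicate_levels and the host-to-level mapping are made up for illustration. It also assumes that negative levels can be shared safely because, as described above, such nodes never sign.

```python
from collections import Counter
from typing import Dict, List


def duplicate_levels(levels_by_host: Dict[str, int]) -> List[int]:
    """Return the redundancy levels used by more than one node.

    levels_by_host maps a hypothetical host name to the RedundancyLevel
    configured on that host. Negative levels are skipped, on the assumption
    that nodes which never sign cannot double sign.
    """
    counts = Counter(lvl for lvl in levels_by_host.values() if lvl >= 0)
    return sorted(level for level, n in counts.items() if n > 1)


if __name__ == "__main__":
    fleet = {"main": 0, "backup-a": 1, "backup-b": 1}
    clashes = duplicate_levels(fleet)
    if clashes:
        print("Duplicate RedundancyLevel values:", clashes)
```

An empty result means every signing-capable node has its own place in the fail-over sequence.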
The random BLS key on hot-standby nodes has the following purposes:
- the hot-standby node(s) will not cause BLS signature re-verification when idle.
- it offers some protection against DDoS attacks, as an attacker cannot find all the IPs behind a targeted BLS public key: when an attacker takes down the Main Validator, the hot-standby nodes advertise the public key only when they need to sign blocks, not sooner.