Nimbus update: September 11th

tl;dr

  • We have a new, simpler sync that greatly reduces syncing time: we invite you to test it out by updating to the latest devel, removing your database (make clean-medalla), and resyncing ;)
  • If that doesn't sound like your cup of tea, we have a special database pruner to help you prune your existing database.
  • Our multiclient interop scripts are working as expected with other clients: we're now able to start a fleet of Lighthouse, Nimbus, and Prysm nodes for local testing (and hopefully we'll soon be adding Teku to that list too).
  • We've fixed most of the issues raised in the first phases of our audit.
  • We have a new troubleshooting page.  This should be your first port of call if you're experiencing any problems on Medalla.

Audit: quick summary

Over the last few weeks, the team has been focused on resolving security issues raised during the first phases of our audit. We've already fixed most of the issues raised. Some of these fixes included improving the Nim standard library to better support security-focused applications like ours.

Some basic stats so far:

Medalla

Since Medalla recovered from the Prysm incident over three weeks ago, it's been pretty smooth sailing. There are nearly 50,000 active validators, and another 10,000 waiting in the activation queue. Participation has been healthy, hovering around the 75-80% mark, and we've had no finalisation problems.

It looks like there will be a (short-lived) test of the genesis process at the end of this month (the resulting beacon chain will run for a couple of days), applying lessons learned from Medalla. Rest assured, this won't disrupt the existing Medalla testnet in any way. For those of you who are eager to participate, we'll be announcing more on this soon.

Three outstanding issues

On our side, we still have three main issues to iron out (the second of which we've made very good progress on). Our current focus is being able to run Nimbus without restarting it every few hours. In order to achieve this, we need to address three things:

  1. The size of the log file. As it stands, the log file grows by several GB every few days, which is clearly not sustainable. To fix this, we plan on changing the default logging to something less verbose. By popular request, we're also planning on implementing a new logging level that allows users to keep track of just their own attestations and blocks.
  2. The size of the database (it's fast approaching 100 GB). To get this size down to something reasonable, we need to start pruning the database of unnecessary blocks and states. Up until now, pruning has been deprioritised in favour of other optimisations (since disk space is cheap). Now that we're making good progress with those optimisations, we're starting to focus on the database too.
  3. A RAM resource leak. RAM usage can sometimes spike to close to 10 GB (we recommend restarting your node every 6 hours or so to get around this). This is a long-standing problem that has been significantly improved since our last update. Right now, we believe this leak has its roots in libp2p; specifically, some asynchronous timers are holding on to objects and not freeing them.

Tip: You can use the LOG_LEVEL=INFO or LOG_LEVEL=NOTICE options to reduce the size of the logs. To do so, run
make LOG_LEVEL=INFO medalla
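Or, for even quieter output, the same invocation works with the NOTICE level:
make LOG_LEVEL=NOTICE medalla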

New sync + database pruner

"The requester MUST close the write side of the stream once it finishes writing the request message. At this point, the stream will be half-closed."
-- Ethereum 2.0 networking specification

We've merged a PR that fixes some important sync problems that we've been seeing on Medalla.

It turns out that our sync problems stemmed from not following the networking specification correctly.

To elaborate a little: we weren't closing the write side of the stream once we'd finished sending a request message to another peer. Since the stream close is used as an end-of-request marker, peers were stuck waiting for us to close the stream before processing our request. And since, from our perspective, they didn't seem to be responding, we would end up timing them out and removing them from our sync set.

In addition to a special database pruner to help prune an existing database, we now have a new, simpler sync that greatly reduces syncing time: we invite you to test it out by updating to the latest devel, removing your database (make clean-medalla), and resyncing ;)
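If you want to give it a go, the full resync flow should look something like this (assuming you're building from source with our usual make targets):
git checkout devel
git pull
make update
make clean-medalla
make medalla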

Multiclient interop scripts

With regards to our multinet scripts, the big news is that integration with other clients is working as expected, and we're now able to start a fleet of Lighthouse, Nimbus, and Prysm nodes for local testing.

Why is this important? By running tests on a single local machine using the multinet scripts, we're able to eliminate network connectivity and peer discovery issues. This allows us to track down bugs that have been very hard to pinpoint on Medalla so far.

Master vs devel

A reminder that, as it stands, we are continuously making improvements to both stability and memory usage. So please make sure you keep your client up to date! This means restarting your node and updating your software regularly from the devel branch (we recommend doing this at least once a day). If you can't find a solution to your problem, feel free to hit us up on our Discord!

If, after updating to the latest devel, you're unable to build medalla, or you feel like your node is functioning worse than before, switch to master and rebuild.
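In that case, something along these lines should do the trick (again, assuming a source build):
git checkout master
git pull
make update
make medalla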

Note: While the master branch of the nim-beacon-chain repository is more stable, the latest updates happen in the devel branch, which is (usually) merged into master every Tuesday.

Troubleshooting

We've created a new troubleshooting page. This should be your first port of call if you're experiencing any problems on Medalla.

Medalla data challenge

The Ethereum Foundation is sponsoring a Medalla data analysis and data visualisation blog post challenge; the deadline for submissions is Tuesday, October 20, and there are $15,000 worth of prizes up for grabs.

Here are all the details you need to get involved.


Enjoy the weekend and 'til next time 💛