Nimbus: Medalla update

Medalla, the first open and community-focused eth2 multi-client testnet, launched last week. As of today, there are over 26,000 active validators, and the chain seems to be finalising correctly. Barring a major hiccup, it looks like it may well be our final full-on dress rehearsal before the real thing later this year (although there will probably be a couple more short-lived testnets to make sure we iron out the smaller hiccups experienced in the first few hours after launch).

Medalla stays true to the mainnet spec, which means it is a significant step up in size and complexity from previous testnets.

And while there are certainly some rough edges to smooth out before mainnet launch, the point of testnets is to help us discover these rough edges before real money is at stake.


The first thing to say is that, if you’re currently missing attestations, the problem is on our end (it’s affecting all Nimbus users right now, including ourselves), and we’re working hard on tracking down the root cause.

Apart from that, we identified four major issues on launch, and have fixed two of them so far.

Fixes

The first one is that the increased number of attestations in Medalla (compared to Altona) revealed limits in the glue code between libp2p and our beacon node. In particular, in the first few hours after launch (when the chain wasn’t finalising because too few validators were online), we were doing the same work multiple times (re-verifying attestations for the same blocks). This was a major source of slowness, and it has now been fixed.
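To illustrate the general idea of not repeating verification work, here is a minimal Python sketch. It is not Nimbus code (Nimbus is written in Nim), and it is not necessarily how the fix was implemented: `VerificationCache`, `slow_verify` and the `(block_root, attestation)` key are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, Tuple


class VerificationCache:
    """Remembers the outcome of each check so the expensive part runs once."""

    def __init__(self, verify: Callable[[bytes, bytes], bool]) -> None:
        self._verify = verify  # the expensive check (hypothetical signature)
        self._results: Dict[Tuple[bytes, bytes], bool] = {}
        self.calls = 0  # number of times the expensive check actually ran

    def check(self, block_root: bytes, attestation: bytes) -> bool:
        key = (block_root, attestation)
        if key not in self._results:
            # First time we see this pair: do the real work and remember it.
            self.calls += 1
            self._results[key] = self._verify(block_root, attestation)
        return self._results[key]


def slow_verify(block_root: bytes, attestation: bytes) -> bool:
    # Stand-in for real signature verification: accepts any non-empty input.
    return len(attestation) > 0


cache = VerificationCache(slow_verify)
first = cache.check(b"root-1", b"att-a")   # runs slow_verify
second = cache.check(b"root-1", b"att-a")  # served from the cache
```

A real node would also need to bound the cache (for example, by dropping entries once the relevant blocks are finalised), which this sketch leaves out.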

The second is an attestation processing bug. We wanted to optimise database loading by delaying the deserialisation of public keys and signatures, and we explored two approaches. Unfortunately, the first one we merged had a significant impact on sync (the ability to access the network’s latest state). We reverted it and are now using the second approach, which gives the same benefits but has no impact on the rest of the stack.
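For readers unfamiliar with the technique, delayed (lazy) deserialisation can be sketched in a few lines of Python. This is only an illustration of the concept under simplified assumptions, not either of the two approaches mentioned above: `LazyKey` and `decode_key` are hypothetical, and the "decoding" here is a trivial stand-in for the real, much more expensive work.

```python
from typing import Callable, Optional


class LazyKey:
    """Keeps the raw bytes from the database and decodes them on first use."""

    def __init__(self, raw: bytes, decode: Callable[[bytes], int]) -> None:
        self.raw = raw
        self._decode = decode
        self._value: Optional[int] = None
        self.decoded = False  # flips to True once decoding has happened

    def value(self) -> int:
        if not self.decoded:
            # Pay the decoding cost only when the key is actually needed.
            self._value = self._decode(self.raw)
            self.decoded = True
        return self._value


def decode_key(raw: bytes) -> int:
    # Stand-in for expensive public key deserialisation.
    return int.from_bytes(raw, "big")


key = LazyKey(b"\x01\x02", decode_key)  # loading is cheap: nothing decoded yet
loaded_without_decoding = not key.decoded
decoded_value = key.value()             # decoding happens here, once
```

The trade-off is that any code path touching the key must go through `value()`, which is the kind of change that can ripple into other parts of a stack if it is not contained.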

Under investigation

Medalla blocks are often filled to the maximum of 128 attestations, which means we’re now reaching the limits of our signature library Milagro. To fix this, we’re in the process of switching to blst (this should be done by the end of the week).

We also discovered that we have race conditions on incoming and outgoing requests from the same peer. This led to issues with Lighthouse (amongst others). We’re working on fixing this.

Finally, we’ve had multiple reports of low peer counts. We expect this is due to the long time spent processing attestations, which makes the node less responsive to the network and causes us to be kicked by peers.


That’s it for the time being. We’ve been humbled by the number of people who’ve been interested in running Nimbus, and who are investing their precious time in helping us find and debug issues. This is exactly what we need. Thank you for hanging in there and sticking with us as we learn and improve 🙏 💛

P.S. Make sure you stay on the lookout for any critical updates to Nimbus. The best way to do so is through the #medalla-announcements channel on our Discord. For our simple and self-contained guide on how to become a Medalla validator, see here.