July 14, 2020

By Cayle Sharrock

Stress test results

Stress test 3 results

Starting at block 50868 on July 9 at 14:11 (UTC), Tari core developer @Hansieodendaaal bombarded the testnet with over 38,000 transactions.

A full Tari block can currently accommodate around 650 transactions, so this represented a significant flood of the network.
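
To put that flood in perspective, here is a rough back-of-the-envelope sketch of the minimum number of blocks needed to clear the backlog. It assumes every block is completely full and carries only test transactions, which is an idealisation and not something measured in the test; the variable names are illustrative.

```python
import math

TX_SENT = 38_160      # transactions sent during the test
TX_PER_BLOCK = 650    # approximate capacity of a full Tari block

# Assumption: every block is full and contains only test transactions.
min_blocks = math.ceil(TX_SENT / TX_PER_BLOCK)
print(f"At least {min_blocks} full blocks are needed")
```

In practice the backlog would take longer to clear than this lower bound suggests, since not every block is full.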

Ultimately, here’s what happened:

Metric                                                     TX       %
Transactions sent                                          38,160
Transactions successfully broadcast                        32,936   86%
Transactions mined (out of total)                          32,922   86%
Transactions mined (inc retries, out of those broadcast)   32,922   100%
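
The percentages in the table can be reproduced from the raw counts. The short sketch below does that arithmetic at a finer precision than the table shows; the variable names are illustrative and only the three counts come from the test.

```python
# Raw counts from the results table.
sent = 38_160
broadcast = 32_936
mined = 32_922

broadcast_rate = broadcast / sent        # share of sent transactions that were broadcast
mined_of_total = mined / sent            # share of sent transactions that were mined
mined_of_broadcast = mined / broadcast   # share of broadcast transactions that were mined

not_broadcast = sent - broadcast         # transactions that stayed pending
broadcast_not_mined = broadcast - mined  # transactions broadcast but not counted as mined

print(f"Broadcast:            {broadcast_rate:.1%}")
print(f"Mined (of total):     {mined_of_total:.1%}")
print(f"Mined (of broadcast): {mined_of_broadcast:.2%}")
print(f"Never broadcast:      {not_broadcast}")
print(f"Broadcast, not mined: {broadcast_not_mined}")
```

At this precision the last ratio falls just short of 100%, which is why the table's 100% is a rounded figure.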

Here are the key takeaways:

  • Of the transactions that were broadcast, nearly all of them were ultimately mined. Our accounting here is not perfect – a full audit of the transaction list will take longer, but it looks like it may well have been 100%.
  • 14% of the transactions did not get broadcast. This meant they were stuck in “pending” mode somewhere. The core devs are still investigating what happened, but it looks like one of the receiving wallets may have been unreachable for the entire test. This may explain the majority of the transaction failures, but it’s still speculation at this point.
  • 1,500 transactions were broadcast and mined only at the “second bite of the cherry”, once the sending wallets decided to retry them. This indicates that the peer-to-peer process is not perfect yet, but that it does reach consistency eventually.
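
The retry behaviour described in the last point can be illustrated with a toy model. This is only a sketch and not the Tari wallet's actual logic: `PendingTx`, `retry_pending`, `flaky_network` and the limit of three attempts are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingTx:
    tx_id: int
    attempts: int = 0
    broadcast: bool = False


def retry_pending(pending, try_broadcast, max_attempts=3):
    """Retry each transaction until it is broadcast or attempts run out.

    Returns the transactions that are still stuck in "pending".
    """
    for tx in pending:
        while not tx.broadcast and tx.attempts < max_attempts:
            tx.attempts += 1
            tx.broadcast = try_broadcast(tx)
    return [tx for tx in pending if not tx.broadcast]


def flaky_network(tx):
    """Hypothetical network: one receiver is never reachable, some sends need a retry."""
    if tx.tx_id == 7:          # receiving wallet unreachable for the whole test
        return False
    if tx.tx_id % 3 == 0:      # first attempt is dropped, the retry succeeds
        return tx.attempts >= 2
    return True


txs = [PendingTx(i) for i in range(1, 10)]
stuck = retry_pending(txs, flaky_network)
second_bite = [tx.tx_id for tx in txs if tx.broadcast and tx.attempts == 2]
print("Needed a retry:", second_bite)
print("Still pending: ", [tx.tx_id for tx in stuck])
```

In this model, transactions whose first send is dropped still get through on the retry, while a transaction addressed to an unreachable receiver stays pending no matter how often it is retried.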

Overall this represents a significant improvement over the last stress test. A quick trip down memory lane:

  • The first stress test was a disaster. A bug in the message deduplication code resulted in the test essentially DDoSing the network. Only 5% of transactions were mined in that test.
  • The second test went much better; there was no inadvertent DDoSing, but the test highlighted areas in the code that blocked the main execution thread and prevented efficient message handling. Default message buffers were also much too small, resulting in many dropped messages. In that test, the majority of transactions were broadcast, but only 25% were eventually mined.
  • This test had 86% of transactions broadcast, and nearly all of those were mined.

The community is still poring over the gigabytes of logs produced in the tests, but some early themes for improving the results of the next test are emerging:

  • If a node is running but cannot talk to the network for some reason, it should make more noise. This is particularly true of the seed nodes, which Aurora clients rely on to communicate with the network.
  • Buffer sizes can be further tweaked to reduce bottlenecks.
  • Find out where transactions are getting stuck in the signing protocol to move that 86% towards 100%.

A more detailed analysis was posted on IRC.