Monday, March 02, 2015

200M reads per second in MySQL Cluster 7.4

By courtesy of Intel we had access to a very large cluster of Intel servers for a few
weeks. We took the opportunity to see the improvements that the new Haswell
implementation of the Intel Xeon chips brings to these servers. We also took
the opportunity to see how far we can now scale flexAsynch, the NoSQL benchmark
we've developed for testing MySQL Cluster.

Last time we tested we were using MySQL Cluster 7.2 and the main bottleneck
then was that the API nodes could not push through more than around 300k reads
per second, and there is a limit of 255 nodes in total. This meant that we
were able to reach a bit more than 70M reads per second using MySQL Cluster 7.2.

In MySQL Cluster 7.3 we improved the handling of thread contention in the NDB API
which means that we are now able to process much more traffic per API node.
In MySQL Cluster 7.4 we further improved the NDB API receive processing, and we
also improved the handling of scans and PK lookups in the data nodes. This means
that each API node can now process more than
1M reads per second. This is very good throughput given that each read contains
about 150 bytes. So this means that each socket can handle more than 1Gb/second.

To describe what we achieved we'll first describe the HW involved.
The machines had 2 sockets with Intel E5-2697 v3 processors. These are
Haswell-based Intel Xeons with 14 cores and 28 CPU threads per CPU socket.
Thus a total of 28 cores and 56 CPU threads in each server operating at 2.6GHz base
frequency and a turbo frequency of 3.6GHz. The machines were equipped with
64 GByte of memory each. They had an Infiniband connection and
a gigabit ethernet port for communication.

The communication to the outside was actually limited by the Infiniband interrupt
handling. The Infiniband interrupt handling was set up to be latency-optimised
which results in higher interrupt rates. We did however manage to push flexAsynch
such that this limitation was minor; it limited the performance loss to within 10%
of the maximum performance available.

We started testing using just 2 data nodes with 2 replicas. In this test we were able
to reach 13.94M reads per second. Using 4 data nodes we reached
28.53M reads per second. Using 8 data nodes we were able to scale it almost
linearly up to 55.30M reads per second. We managed to continue the
almost linear scaling even up to 24 data nodes where we achieved
156.5M reads per second. We also achieved 104.7M reads per second on a
16-node cluster and 131.7M reads per second on a 20-node cluster. Finally we took the
benchmark to 32 data nodes where we were able to achieve a new record of
205.6M reads per second.



The configuration we used in most of these tests had:
 12 LDM threads, non-HT
 12 TC threads, HT
 2 send threads, non-HT
 8 receive threads, HT
where HT means that we used both CPU threads in a core and non-HT means that
we only used one CPU thread per core.
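
To make this concrete, a config along these lines would describe the above setup. The CPU numbers are purely illustrative: they assume that CPUs 0-27 are the first hyperthread of each of the 28 cores and CPUs 28-55 the second, and the actual numbering depends on how the OS enumerates the cores:

ThreadConfig="ldm={count=12,cpubind=0-11},tc={count=12,cpuset=12-17,40-45},send={count=2,cpubind=18-19},recv={count=8,cpuset=20-23,48-51},main={count=1,cpuset=24-27,52-55},rep={count=1,cpuset=24-27,52-55},io={cpuset=24-27,52-55},wd={cpuset=24-27,52-55}"

Here the ldm and send threads use only one CPU thread per core (non-HT), while the tc and recv threads use both CPU threads of their cores (HT), and the remaining thread types share the last few cores.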

We also tried with 20 LDM threads HT, which gave similar results to 12 LDM
threads non-HT. Finally we had threads for replication, main, io and other activities
that were not used much in those benchmarks.

We compared the improvement of Haswell versus Ivy Bridge (Intel Xeon v2) servers
by running a similar configuration with 24 data nodes. With Ivy Bridge
(which had 12 cores per socket and thus 24 cores and 48 CPU threads in total) we
reached 117.0M reads per second and with Haswell we reached
156.5M reads per second. So this is a 33.8% improvement. Important to note here
is that Haswell was slightly limited by the interrupt handling of Infiniband
whereas the Ivy Bridge servers were not limited by this. So the real difference is
probably more in the order of 40-45%.

At 24 nodes we tested scaling on number of API nodes. We started at 1 API machine
using 4 API node connections. This gave 4.24M reads per second. We then tried with
3 API machines using a total of 12 API node connections where we achieved
12.84M reads per second. We then added 3 machines at a time with 12 new API
connections, and each step added more than 12M reads per second, giving 62.71M
reads per second at 15 API machines and 122.8M reads per second at 30 API machines.
The near-linear scaling continued up to 37 API machines, where we achieved the best
result of 156.5M reads per second. Performance with 40 API machines was about the
same, at 156.0M reads per second. The performance was saturated here since the
interrupt handling could not handle more packets per second. Even without this the
data nodes were close to saturating the CPUs for the LDM, TC and send threads.

Running with clusters like this is interesting. The bottlenecks can be more tricky
to find than in the normal case. One must remember that a benchmark with
37 API machines and 24 data nodes, where each machine has 28 CPU cores, involves
more than 1000 CPU cores, and understanding it requires understanding a complex
queueing network.

What is interesting here is that the queueing network behaves best if there is some
well behaved bottleneck in the system. This bottleneck ensures that the flow
through the remainder of the system behaves well. However in some cases where
there is no bottleneck in the system one can enter into a wave of increasing and
decreasing performance. We have all experienced this type of
behaviour of queueing networks while being stuck in car queues.

What we discovered is that MySQL Cluster can enter such waves if the config doesn't
have any natural bottlenecks. What happens here is that the data nodes are able to
send results back to the API nodes in an eager fashion. This means that the API nodes
receive many small packets to process. Since small packets take longer to process
per byte compared to large packets, this has the consequence that the API nodes slow
down. This in turn means that the benchmark slows down. After a while the data nodes
start sending larger packets again to speed things up, and again the sending becomes
too eager.

To handle this we introduced a new configuration parameter MaxSendDelay in
MySQL Cluster 7.4. This parameter ensures that we are not so eager in sending
responses back to the API nodes. We will send immediately if there is no other
competing traffic, but if there is other competing traffic, we will delay sending
a bit to ensure that we're sending larger packets. One can say that we're
introducing an artificial bottleneck into the send part. This artificial bottleneck
can in some cases improve throughput by 100% or even more.
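
For those who want to try this out, MaxSendDelay is set on the data nodes in the config.ini file and the value is given in microseconds. The value below is purely illustrative; the best setting depends on the configuration and the load:

[ndbd default]
MaxSendDelay=500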

The conclusion is that MySQL Cluster 7.4 using the new Haswell computers is
capable of stunning performance. It can deliver 205.6M reads per second of
records a bit larger than 100 bytes, thus providing a data flow of more than
20 GBytes per second of key lookups or 12.3 billion reads per minute.

Monday, March 31, 2014

MySQL Cluster 7.4.0 Lab Release Improvements

The year 2014 seems to be the year of In-Memory Databases. We've seen a lot of commotion around the Hekaton product from Microsoft and other new products. At the same time everyone talks about WebScale and how to build distributed databases that can handle millions of queries per second. In addition we've seen the development of quite efficient communication mechanisms over the last few years. Databases that make use of In-Memory technology, WebScale and efficient communication mechanisms are the talk of today in the database world.

The background of NDB

The design of the NDB storage engine started out with exactly those base requirements more than 20 years ago, when the aim was to build the next generation telecom database. The base requirements called for a shared-nothing database (for superior scalability and to meet telecom requirements on fail-over times). Already in those days one could buy a machine equipped with 1 GB of memory. Given that the telecom databases were used for extremely quick lookups of small amounts of data, it was natural to consider an in-memory database design. My research into databases at the time showed that they spent a considerable amount of time in the operating system to handle communication and hard disks. So by moving data in-memory and by using communication mechanisms that avoided the operating system we were able to deliver extremely efficient database operations already in the late 90s.

NDB Today

Today NDB is the storage engine of MySQL Cluster and has been in production usage for more than 10 years. Almost everyone on the globe is touched by its operation in some telecom system, in some computer game or in some other type of web property application. We have already delivered benchmarks with billions of reads per minute, and the scalability of MySQL Cluster is so high that we simply don't have big enough computers or computer sites to show off its limits. A while ago we had access to a computer lab with hundreds of computers that were connected using Infiniband with a total bandwidth between the machines of 1 Tbit/sec. This means we can transport 128 GBytes per second between the machines. However MySQL Cluster could theoretically produce enough parallel read operations to swamp even such a network. So we're getting to the point where it becomes more and more uninteresting to show the scalability limits of MySQL Cluster. This means that we also want to focus on efficiency and not only on scalability.

The Many-Core CPU Challenge

I gave a presentation to a set of students at Uppsala University about MySQL and MySQL Cluster. In this presentation I showed how the development of multi-core CPUs presented a tremendous challenge to software designers. In a short time of only 8 years Intel and other HW developers gave us the challenge to scale our software 60x. At MySQL we were up to the challenge: we've increased the scalability of MySQL using InnoDB 20x in a time span of 5 years. At the same time we've actually increased the scalability of the NDB storage engine more than 30x, which means that MySQL Cluster, where we use MySQL together with the NDB storage engine, has actually scaled 60x in total. This means that my test machine with 8 sockets and 96 CPU threads is now the limiting factor in my benchmarks.

The Many-Core CPU Solution

How have we achieved this? With the MySQL Server it has been a long series of fixes to various bottlenecks, such as splitting the InnoDB buffer pool and handling the LOCK_open mutex, together with many more changes that collectively have made it possible to scale much beyond what our software in 2008 could achieve. This improvement of scalability continues, so stay tuned for more; there are blogs to read now about what is currently going on in MySQL 5.7 development.

With the NDB storage engine the solution has been quite different. We started out building a distributed database from the beginning, consisting of a set of nodes that replicate data synchronously using transactions. In order to avoid usage of the operating system we built the architecture based on a set of independent modules that interact with messages. This was built on the architecture of AXE, a telecom switch operating system of unique efficiency. The first version had each node implemented as a single thread. The development based on independent modules meant that dividing this thread into a number of functional modules was a simple task. Currently we have separated it into the local database part, the transaction part, the network send part, the network receive part, an asynchronous event part and finally the main part containing features for meta-data handling. Given that we developed a shared-nothing architecture it was simple to continue the partitioning to gain even more independent LDM parts by having each LDM thread handle different parts of the data. The transaction part can use a simple round-robin scheme and the network parts can easily be divided per socket we handle. In the future we could perform even more divisions of some functions.


NDB Layer by Layer approach


So what does this mean in effect? It means that we actually built a distributed database inside each node of our distributed database. Given that we can also replicate using MySQL replication we can actually go even further and have multiple clusters connected together. Those who want to think even further about how NDB could be used to build systems with millions of CPUs can google the word iClaustron. iClaustron is a hobby project I've played with since 2006 and I presented the aims of the project in a tech talk at Google which is available on YouTube.

The world is organised down into microcosmos and continues growing into macrocosmos. So why would software be different? We need to build systems of any size by using layers upon layers of distribution.

So building MySQL Cluster is an interesting project in building layer by layer of distribution into the system.


The big challenge ahead of us


So what could be the next challenge that the hardware engineers will deliver to us software engineers? Personally I am preparing for it already now. The challenge that I hope they will bring to us is persistent memory. This means that we will have to build databases where all of a sudden we can make persistent writes at a similar speed as we are currently writing to main memory. This will be an interesting challenge and personally I think that main memory databases have a unique advantage here since they already work at memory speed. So I feel a bit like a horse in the gates before a race, kicking and just eagerly waiting to get out onto the track to see how fast we can run home the next big challenge. But we have to wait until the hardware engineers first solve the issue of which technology will be the winner in this category and can be commercialised.


So after these small philosophical thoughts, let's get into what we're doing in the first Lab Release of MySQL Cluster version 7.4 to get further along the path to these goals.

The improvements in MySQL Cluster 7.4.0 Labs release

As mentioned we are working on improving the efficiency of MySQL Cluster, and we have specifically worked on scans in the NDB storage engine, which have been heavily optimised. In benchmarks using a lot of scans, such as Sysbench, we have managed to scale up performance per data node by 46% comparing 7.4.0 to 7.3.5. Compared to 7.2.16 the difference is even bigger than 100%, but going from 7.2 to 7.3 it was mainly inefficiencies in the MySQL Server that were fixed.





Another important thing we've done in 7.4.0 is add a lot of documentation about both our restarts and our scans in the form of extended comments in the code. We've also gone through the log messages presented to the operator during restarts and made them much more accessible and extensive.

MySQL Cluster 7.4.0 improvements for virtual machine environments

With 7.4 we're working hard on making MySQL Cluster more stable even when the underlying system isn't as stable as we would expect. MySQL Cluster is designed for high availability environments, and now we're working on making sure that the system can continue to operate even when systems are overcommitted. When we're working in a virtual machine environment where we cannot be certain of the exact resources we have available, it is hard to operate a high availability environment, but we still want to operate as reliably as possible.

MySQL Cluster 7.4.0 Stability improvements

We have also been working on improving the availability of the system by improving restart times. There are many areas where we can work on this: we can remove subtle delays that add up to longer restart times, and we can use more parallelism in certain phases of the restarts. We have also made our local checkpoints more parallelised, which gives a more balanced load on the various LDM threads in our system. This has the nice side effect of paying off in 5-10% improved performance for any application. Naturally it also means that we can run the local checkpoints faster, since we no longer risk imbalances in the CPU load when doing so.

Another unique feature of MySQL Cluster is supporting Active-Active environments using MySQL replication. We've been working to extend the support of this feature even further.


Benchmark environment description

We executed a set of benchmarks using Sysbench 0.4.12.6 in our dbt2-0.37.50.6 environment. We used a big machine with 8 sockets of Intel Xeon CPUs running at 2GHz. Each socket has 6 cores and 12 CPU threads. In most cases we run with hyperthreading enabled, but we have found that running LDM threads without hyperthreading is a good idea. This decreases the number of partitions to manage and the number of threads to manage, which has a positive effect on performance.

We used 8 LDM threads, and in this case the NDB data node used 2 sockets while the benchmark program and the MySQL Server had access to 5 sockets. The MySQL Server used about 40 CPU threads out of the 60 it had access to, so in this configuration we had spare resources to use. But in the next step, where we went to 12 LDM threads, we could not use the full potential of the SW. In this case the data node needed 3 sockets and the benchmark program used 1 socket, so the MySQL Server only had access to 4 sockets. This meant that it could increase performance by 25% and not the 50% made possible by going to 12 LDM threads (actually we squeezed a bit and made 52 CPU threads available to the MySQL Server and thus got about a 30% improvement over 8 LDM threads). Using 7.3 the data nodes are less efficient, so there we could scale the LDM threads all the way to the 50% improvement (actually we even got to a 52.7% improvement, so perfect scaling of performance as more LDM threads are added).

So with 12 LDM threads we need a 54-core machine to make full use of the potential of the data node. With 16 LDM threads we need even more: we need 4 sockets for the data node, 2 sockets for the benchmark program and 6 sockets to run the MySQL Server, thus a total of 12 sockets or 72 cores. This is probably as far as MySQL 5.6 can take us before the MySQL Server can no longer scale. But this is an important area of focus for MySQL 5.7, which has already had a set of improvements implemented in the 5.7.4 DMR released now.

Final words

So with this kind of scalability we are now in a position to deliver more performance from one single node than we previously could deliver from an entire cluster. Imagine what performance we can then get when connecting many nodes together, and we're still working on making each MySQL Cluster thread more efficient in its execution.

Monday, November 25, 2013

How to make an efficient Scalable Key Lookup engine of MySQL Cluster

MySQL Cluster has all the ingredients to be designed as a very scalable and extremely efficient key lookup engine for the Cloud. As we have shown in earlier entries of my blog, we've been able to scale MySQL Cluster 7.2 to handle 72 million key lookups per second, or 4.3 billion key lookups per minute. That benchmark was actually limited by the NDB API nodes, which could not handle more than about 300k lookups per second, and so with a maximum of 255 nodes we got to around 72 million per second in total. However in MySQL Cluster 7.3 we have removed this limitation, and in addition we have enabled scaling to even bigger data nodes, so it should now be possible to reach even higher numbers.

The aim of this blog is however not to give any new benchmark results, rather it is to provide details about how the benchmark program works and how this benchmark program architecture can be used to design an efficient scalable key lookup data store.

To obtain the best possible performance we want to ensure that the data node can operate as efficiently as possible. This is done by ensuring that a connection to the data node sends many key lookups bundled together. Operating on individual key lookups is possible of course, but as usual it is more efficient to operate on bigger entities than one key lookup at a time. To provide this we use a concept we call the Executor Thread. The Executor Thread will only execute key lookups aimed for a certain data node. This means that the number of Executor Threads will be a multiple of the number of data nodes (there could be more than one thread per data node if necessary). The Executor Thread will receive key lookups from an internal queue handled by the application program (in our case the flexAsynch benchmark program). The key lookups are prepared by the Definer Threads. A Definer Thread will receive a key lookup aimed for any data node, and it will take this key lookup and calculate the receiving data node for it (there are API calls in the NDB API to handle this). Based on this calculation the Definer Thread will put the key lookup in the queue of the proper Executor Thread.

The architecture in front of the Definer Thread is dependent on the application. In the figure provided here we have shown one possible architecture where we have one receive thread that receives a flow of messages from somewhere. To process those messages we need to interpret the packets and process them, which could entail one or more key lookups. In the figure we have assumed there is one key lookup per message and that the Executor Thread can format the packet back to the sender based on the information in the internal key lookup order.



So the important part of the architecture is the Executor Thread that handles messages to one data node based on an internal data structure that defines one key lookup and defines how to process the response (this thread should do as little work as possible to ensure it can focus on communication with the data node). There should also be a Definer Thread that prepares the key lookup request and puts the request in the queue of the proper Executor Thread. The Definer Thread could also do other things and there could be few or many Definer Threads in the architecture.

So how does flexAsynch work? In this case we don't have any input traffic; we generate the key lookups in the Definer Threads. The Definer Thread has a very simple operation. It starts by preparing a set of key lookups aimed at any data node. For each of those key lookups it puts the request in the queue of the proper Executor Thread. After placing all requests in a queue it starts waiting for all operations to complete. After all requests have received their response from the Executor Threads we simply continue with the next batch.

The operation of the Executor Thread is also very simple: it gets the current set of key lookups waiting in its queue and prepares those for execution through the NDB API. It sends off all the operations to the data node. When the data node has executed all of the operations, the Executor Thread reports the results back to the Definer Threads and updates some benchmark statistics, then it continues with the next batch of key lookups.
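
To make the structure concrete, below is a minimal sketch in C++ of the Definer Thread/Executor Thread pattern described above. It is not the actual flexAsynch or NDB API code: the queue class, the batch size and the use of a simple modulo to pick the data node are all illustrative placeholders, and the places where the real NDB API calls and response handling would go are marked with comments.

#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One key lookup request. In the real benchmark this would also carry
// information about how to process the response.
struct KeyLookup {
  std::uint64_t key;
};

// The internal queue placed in front of each Executor Thread.
class LookupQueue {
 public:
  void push(const KeyLookup &op) {
    std::lock_guard<std::mutex> guard(mutex_);
    queue_.push(op);
    cond_.notify_one();
  }

  // Drain everything currently waiting so the executor works on a batch.
  // Returns false once the queue has been shut down and fully drained.
  bool popBatch(std::vector<KeyLookup> &batch) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
    batch.clear();
    while (!queue_.empty()) {
      batch.push_back(queue_.front());
      queue_.pop();
    }
    return !batch.empty() || !stopping_;
  }

  void shutdown() {
    std::lock_guard<std::mutex> guard(mutex_);
    stopping_ = true;
    cond_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::queue<KeyLookup> queue_;
  bool stopping_ = false;
};

// One Executor Thread per data node: it drains its queue and handles the
// whole batch against its data node in one go.
void executorThread(LookupQueue &queue) {
  std::vector<KeyLookup> batch;
  while (queue.popBatch(batch)) {
    // Here the real code prepares the batch as NDB API operations, sends
    // them to the data node, waits for the responses, reports the results
    // back to the Definer Threads and updates benchmark statistics.
  }
}

// A Definer Thread prepares a batch of key lookups and routes each one to
// the Executor Thread responsible for the owning data node.
void definerThread(std::vector<LookupQueue> &queues, int batchSize) {
  for (int i = 0; i < batchSize; i++) {
    KeyLookup op{static_cast<std::uint64_t>(i)};
    // The real code asks the NDB API which data node owns this key;
    // a simple modulo stands in for that calculation here.
    std::size_t dataNode = static_cast<std::size_t>(op.key % queues.size());
    queues[dataNode].push(op);
  }
  // The real Definer Thread now waits for all responses to arrive before
  // continuing with the next batch.
}

int main() {
  const std::size_t numDataNodes = 4;            // illustrative cluster size
  std::vector<LookupQueue> queues(numDataNodes); // one queue per data node
  std::vector<std::thread> executors;
  for (auto &queue : queues)
    executors.emplace_back(executorThread, std::ref(queue));

  definerThread(queues, 1000);                   // one batch of 1000 lookups

  for (auto &queue : queues)
    queue.shutdown();
  for (auto &executor : executors)
    executor.join();
  return 0;
}

The important property is the same as in flexAsynch: each Executor Thread drains everything waiting in its queue and sends it as one batch to its data node, so the data node always operates on bundles of key lookups rather than one lookup at a time.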

So the operation of an efficient key lookup data store is not difficult at all. To make it scale one can then add up to 48 data nodes per cluster (each capable of handling more than 5 million key lookups per second of around 100 bytes in size). Each cluster can handle a total of 255 nodes. Obviously it is also straightforward to operate more than one cluster to scale even further.

The benchmark code exists in storage/ndb/test/ndbapi/flexAsynch.cpp; the interesting code is in the NEW module (the file also contains a lot of legacy code for old variants of the flexAsynch benchmark).

Tuesday, November 19, 2013

MySQL Cluster run-time environment: Part 3: Configuration recommendations

Binding threads to CPUs in the MySQL Cluster data nodes can have great benefits. So what about hyperthreading, should we use all CPU threads, or only 1 CPU thread per CPU core? The recommendation actually differs. For the most part it is beneficial to use hyperthreading. For most thread types, running 2 threads on the 2 CPU threads of a hyperthreaded core gives about 40% higher performance compared to running 1 thread on the core without hyperthreading. There are a few cases where it might be beneficial to not use hyperthreading though.

The first example is for LDM threads. Using hyperthreading means we will increase the number of partitions of a table by a factor of two. In many cases this isn't beneficial, particularly when the number of LDM threads is high. I tried using 24 LDM threads with hyperthreading on 12 CPU cores and compared it to 12 LDM threads on 12 CPU cores. This case didn't benefit from using hyperthreading for the LDM threads. However, if the number of LDM threads is low it would probably still pay off, so going from 2 to 4 LDM threads is still beneficial, probably also going from 4 to 8. But going from 8 to 16 is less likely to be beneficial.

I have tested send and recv threads with and without hyperthreading; my benchmarks have always improved when using hyperthreading. I've seen the same with tc threads. Obviously if the main thread or the rep thread for some reason becomes the major bottleneck, then it makes sense to remove the use of hyperthreading there.

Avoiding the use of hyperthreading can simply be done by not configuring any threads to use the second CPU thread on each of the CPU cores where we want to avoid hyperthreading. As an example we will configure a machine with one data node; the machine has 4 sockets with 24 cores and 48 CPU threads in total. CPUs 0-5 represent cores 0-5, thread 0 on socket 0, and CPUs 24-29 represent cores 0-5, thread 1 on socket 0. So if we want to configure with 12 LDM threads not using hyperthreading we could use the config:

ThreadConfig="ldm={count=12,cpubind=0-11},tc={count=4,cpuset=12-17,36-41},send={count=4,cpuset=12-17,36-41},recv={count=4,cpuset=12-17,36-41},io={cpuset=18-19,42-43},main={count=1,cpuset=18-19,42-43},rep={count=1,cpuset=18-19,42-43},wd={cpuset=18-19,42-43}"

In this configuration the LDM threads will use CPUs 0-11, which covers all cores on sockets 0 and 1. No other thread is configured to use any CPU thread on those CPU sockets. The OS might still decide to use some of them, but we have left a number of empty CPUs that the OS hopefully discovers as idle and uses to schedule OS activities. We can actually do even better than this: there is a boot option in Linux whereby one can control which CPUs the general OS scheduler uses. Similarly there is a config variable for irqbalance to ensure interrupts are not scheduled on any CPU used by the MySQL Cluster data node. The tc, send and recv threads are scheduled on any of the CPU threads on socket 2, and the main, rep, wd and io threads use 2 cores on socket 3. The OS and other processes aren't blocked from using other CPUs, but will most likely be scheduled on any of those free CPUs with no activity on them.

The first question when specifying the MySQL Cluster run-time environment is how many ldm threads one should use. This depends on how many CPUs are accessible to the data nodes. So assume we have access to 24 CPU threads with hyperthreading. In this case it would be natural to start by setting the number of ldm threads to 6 and not use hyperthreading. A natural initial start is to use half of the available CPU cores for ldm threads. Next one assigns about a quarter of the ldm CPU resources to tc threads; in this case we land at 3 tc threads. Next one assigns a similar number of CPUs to send and recv threads. Then one assigns the rest of the CPUs to the main, rep, io and wd threads. This should give a fair amount of resources available also to the OS. A sketch of such a starting configuration is shown below.
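
As a sketch, such a starting configuration could look as follows. The CPU numbers, and the exact thread counts for send and recv, are purely illustrative; here we assume that CPUs 0-11 are the first hyperthread of each of the 12 cores and CPUs 12-23 the second:

ThreadConfig="ldm={count=6,cpubind=0-5},tc={count=3,cpuset=6-8,18-20},send={count=2,cpuset=9-10,21-22},recv={count=2,cpuset=9-10,21-22},main={count=1,cpuset=11,23},rep={count=1,cpuset=11,23},io={cpuset=11,23},wd={cpuset=11,23}"
NoOfFragmentLogParts=6

Here the ldm threads use only one hyperthread of cores 0-5, leaving the sibling CPU threads 12-17 mostly free for the OS, while the remaining thread types share the other cores using both hyperthreads. NoOfFragmentLogParts is set to match the 6 ldm threads (see the note further down).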

After this initial assignment it is a good idea to run some benchmark which is close to the workload of your application to see whether the config works well. For this test run one should use cpubind for all thread types to ensure that we know how much CPU resources each thread type consumes (this can easily be derived by looking at top in per-CPU load mode when using cpubind). First check whether the ldm threads are a bottleneck; if they are, then check whether it is possible to increase to the next level. In this example this would mean going to 8 ldm threads, thus using 2/3 of the CPU cores. If this isn't possible then just make sure that the rest of the thread types have a fair amount of CPU resources to avoid any unneeded bottlenecks.

If the bottleneck isn't the ldm threads, then assign more resources to the thread type that is the bottleneck, in most cases by taking resources from other non-ldm threads. There could be cases where less than half of the CPU resources are needed by the ldm threads, but I would deem those very unusual. Given that the actual database processing is done in the ldm threads, it would be an exceptional case if other threads consumed more than half of the resources.

Always remember to update the NoOfFragmentLogParts variable if changing the number of ldm threads.

After a few trials we have most likely found a decent configuration. After finding this configuration we can also consider where to use cpuset and if any threads should use the realtime or spintime variables.

So the next question is when to use cpubind, cpuset, realtime and spintime.

Both cpubind and cpuset are about locking threads to an individual CPU or a set of CPUs. We could actually consider even using no cpubind and no cpuset as a form of CPU locking: we are locking the threads to the set of CPUs available to the OS. Given that we might be running on a virtualised OS, this might actually already be a subset of the existing CPUs. To make any configuration of CPUs using cpubind/cpuset one has to have knowledge of how CPU ids map to CPU sockets and CPU cores.

So the default configuration, not using any cpubind/cpuset, is to allow the OS to schedule the thread onto any of the available CPUs. The OS scheduler is optimised towards an interactive environment where processes need to react to human interaction. It also does a good job in server environments where a fairly high number of threads compete for a small number of CPUs, and it does a decent job of handling server environments where only a handful of threads compete for a number of CPUs.

The type of threads that normal OS schedulers have the most problems handling are long-running threads that consume a lot of CPU resources, in particular threads that run more or less constantly. What happens is that the OS scheduler eventually downgrades their priority such that other processes are given a chance to execute; since the thread still wants to execute, the OS then searches for a free CPU to use. This is often successful, but the problem is that it means we migrate the thread onto a new CPU. This happens many hundreds of times per second for those busy threads. The effect is that the thread comes to a new CPU where it has no data or instructions cached in the CPU caches. So a migrated thread will spend quite a lot of time warming up the CPU caches before it can run as efficiently as it did before the migration. In addition this requires more bandwidth on the memory bus, which in some cases can become a bottleneck.

In the above case it is better for the thread to stay on the same CPU even if a new job is scheduled there, at least as long as we can be sure that the new process isn't yet another long-running thread. So this means that in order to optimise the MySQL Cluster run-time environment we need to be in control of all usage of the CPUs on the machine we're using. In many cases we colocate data nodes and MySQL Server processes. We can control placement of MySQL Server processes using numactl or taskset, which can be applied when starting the process or, using the pid, when the process is already started. This ensures that the MySQL Server process is never scheduled outside the set of CPUs we gave it access to through taskset/numactl. This is how I control the environment when running any MySQL Cluster benchmark. Similarly I also control the benchmark processes (sysbench/dbt2 client processes/flexAsynch...).

In this manner I am certain that no application process is using the CPUs provided for execution of the data node threads. So the only threads that will execute on those CPUs are either OS kernel threads or interrupt handlers. Even this can be controlled by using the special boot option isolcpus, listing the CPU numbers that should be isolated from the general OS scheduler. When setting this option, only the CPUs not listed get normal scheduler handling with the possibility to migrate threads between CPUs. The isolated CPUs are only used if the application asks for them through a locking call such as cpuset/cpubind in the cluster config or taskset/numactl. The final thing to control is the execution of interrupts. This is normally handled by the irqbalance process, and this can be configured to avoid a set of CPUs given as a bitmask in the irqbalance configuration variable IRQBALANCE_BANNED_CPUS. So it is possible to create a completely compartmentalised load with interrupts on certain CPUs, the OS and other applications on another set of CPUs, a set of CPUs for the data node, a set of CPUs for the MySQL Server and a set of CPUs for any other consuming application process. Providing this is a combination of boot options, irqbalance configuration, MySQL Cluster configuration and finally taskset/numactl on certain processes.
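
As a purely illustrative example (the CPU numbers depend entirely on the machine): one could boot the kernel with isolcpus=0-17 so that CPUs 0-17 are kept away from the general scheduler and only used by threads explicitly bound there (for example the data node threads through cpubind/cpuset), set IRQBALANCE_BANNED_CPUS=0003ffff (a hexadecimal bitmask with bits 0-17 set) in the irqbalance configuration so that interrupts also avoid those CPUs, and start the MySQL Server with for example taskset -c 18-23 so that it stays on the remaining CPUs.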

So when to use cpubind and when to use cpuset? cpubind should mainly be used on threads that can consume up to 100% of a CPU. This would normally be the ldm threads, but it can definitely be the case also for other threads dependent on the workload and the configuration. So one way of configuring MySQL Cluster environments is to use cpubind for ldm threads and put the tc, send and recv threads in a common cpuset. The nice thing with this configuration is that we can run with more threads than strictly necessary, so e.g. we can run with 4 tc threads, 4 send threads and 4 recv threads and put these into a cpuset consisting of e.g. 8 CPUs. In this manner the OS can easily handle the case where a certain thread type requires more resources for a while.

So configuring a MySQL Cluster run-time environment is about making use of static scheduling for the most critical resources and using the flexibility of the OS scheduling to handle the less critical resources. A good design philosophy for all environments like this is to design the run-time environment with a well-known bottleneck. For MySQL Cluster data nodes we recommend always making the ldm threads the bottleneck. The nice thing about this is that it makes it easier to understand how to handle overload and reason around it. As an example, if the ldm threads are overloaded and the tc threads have available resources, we can ensure that the tc threads handle sending error messages about overload without even contacting the ldm threads. This can be achieved by decreasing the available resources in the ldm threads through configuration, at least for scan operations.

Hopefully these recommendations will help you find the optimal configuration and still a safe configuration. Default configurations will work for most installations, but to get the last 10-100% performance out of the system one might need to dive a bit deeper into the configuration of the MySQL Cluster run-time environment.

The next configuration item to consider is the realtime setting. This can now be set on each thread type. Traffic queries normally arrive in the recv thread, are sent to the tc or the ldm threads, and then the reply is sent through the send thread. The main thread is mainly involved in meta-data operations, which are rarely time-critical. The rep threads can have a high load, but they are not part of any critical path except for asynchronous replication to other clusters. The io thread is only involved in time-critical operations if disk data is used in MySQL Cluster. The wd threads are different; obviously it is important that the watchdog thread gets an opportunity to execute every now and then. So if other threads are using realtime it's a good idea to use realtime also on the wd thread type. The recv, send and tc threads are usually doing small jobs, and thus realtime scheduling might be beneficial for those threads to cut the response time. For ldm threads that execute close to 100% of the time it's debatable whether realtime is such a good idea. The OS cannot handle threads executing on realtime priority for a long time, so we have implemented protection for this in the data nodes. This ensures that we decrease the priority to normal user priority even for realtime threads if they execute for too long. There are rarely any throughput benefits from using the realtime configuration for threads; it is mainly intended to enable less variation in response time.

Another use case for realtime is when there is a mix of data node threads and other application threads where the application threads have lower priority. In this case we ensure that the data node threads get prioritised access to CPUs before the other application threads get access.

The final configuration item to consider is the spintime. This means that the data node thread will execute for a bit longer before entering sleep mode. So if the recv thread or any other thread sends a new message to the thread during this spintime, we decrease the wake-up time. The only case where this can increase throughput is if the spinning thread is the bottleneck of the data node and the spinning doesn't steal resources from any other critical thread. The main usage of spintime is as a tool to improve response time in an environment with plentiful CPU resources. It is important to consider that the use of spintime will increase the use of CPU resources for those threads it is set on. It only applies to the ldm, tc, main, rep, send and recv thread types. One should in most cases avoid mixing spintime and realtime settings on the same thread type.
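
As an illustration of how these settings could be combined in ThreadConfig (the CPU numbers and the spintime value are purely illustrative and follow the 12-core example above), one could give the latency-sensitive recv, send, tc and wd threads realtime scheduling while letting only the ldm threads spin, thereby avoiding mixing the two settings on the same thread type:

ThreadConfig="ldm={count=6,cpubind=0-5,spintime=100},tc={count=3,cpuset=6-8,18-20,realtime=1},send={count=2,cpuset=9-10,21-22,realtime=1},recv={count=2,cpuset=9-10,21-22,realtime=1},wd={cpuset=11,23,realtime=1},main={count=1,cpuset=11,23},rep={count=1,cpuset=11,23},io={cpuset=11,23}"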