Benchmark: DBS vs. MySQL vs. MariaDB vs. AWS Aurora (MySQL compatible)

With the release of DBS 0.99 Beta RC-1, it's time to gather performance metrics and assess how DBS compares to established solutions such as MySQL, MariaDB, and AWS Aurora (MySQL-compatible).
This article outlines our straightforward yet practical benchmarking approach, detailing the datasets used and the results obtained.

Throughout the development of DBS, from its prototype to Beta 0, performance measurements have consistently shown promising results, suggesting the project was headed in the right direction. However, these early results were somewhat skewed because they were obtained on incomplete versions, lacking key features such as security controls and user sessions.

The RC-1 version, now including all planned features, offers more consistent and accurate metrics that better reflect real-world conditions.

It's important to note, however, that while this version gives us a glimpse of the first General Availability (GA) release, it remains a Minimum Viable Product (MVP) for version 1.

Lastly, and perhaps most importantly, the results presented here may seem surprising, but they are easily reproducible and verifiable: this post includes detailed information on the lab setup, data sources, and queries used. All of these elements are fully transparent and available as usable material to anyone wishing to validate the findings.


Lab Environment

Infrastructure Overview

The benchmark environment is set up on AWS for simplicity. It includes 5 EC2-Instances for the client applications and 1 EC2-Instance for the server applications. For the tests involving AWS Aurora, the RDS/Serverless (MySQL compatible) service is used.

Benchmark DBS Lab Environment

AWS Applied Configuration

Client: 5 x EC2-Instance
  • Amazon Linux 2
  • Kernel 5.10 64-bit (x86)
  • Instance Type: t2.micro (1 vCPU, 1 GiB Memory)
  • Storage Volume Type: SSD gp3 (16 GiB)
  • Standard Network Profile (Allow SSH)

Server: 1 x EC2-Instance
  • Amazon Linux 2
  • Kernel 5.10 64-bit (x86)
  • Instance Type: t2.xlarge (4 vCPU, 16 GiB Memory)
  • Storage Volume Type: SSD gp3 (16 GiB)
  • Standard Network Profile (Allow SSH)

Aurora (RDS)
  • Aurora Standard (MySQL compatible)
  • Serverless v2, default.aurora-mysql8.0
  • From 2 ACUs to 8 ACUs, autoscaling
  • SSD Volume, autoscaling

Datasets

The datasets used for the benchmark are not particularly large but are sufficient to run multiple join queries. While these queries are not overly complex, they require the applications to efficiently utilise their resources, ensuring meaningful measurements for identifying key performance indicators.

These datasets are primarily "IP geo-location" oriented and consist of the following sources:

  • A list of dialogues, with location labels/names, totaling 120,668 records.
  • A list of regions, containing continental positioning data, totaling 31 records.
  • A list of countries, with country-level coordinates, totaling 252 records.
  • A list of cities, providing location data for each place within its respective country, totaling 141,786 records.
  • A list of IP addresses, serving as one geo-location source to be linked with the other datasets, totaling 4,719,740 records.
  • A list of URLs, another geo-location source, to be linked with the previous data, totaling 872,238 records.
  • The url_loc lists, used to simulate the "write I/O" load increase, totaling 16,000 INSERT commands into flat datastores from 4 concurrent client applications.
Note: All of the data are test data, mostly generated randomly, and should not be considered reliable for use in any other context.
Benchmark DBS Lab Datasets


Setup

To make the setup process easier and faster, it is suggested to follow these steps:

I. Client instances
  1. Launch & initialise the first client dedicated EC2-Instance.
  2. When it is running, update Linux packages (sudo yum update -y).
  3. Power off the instance and make a snapshot of the volume.
  4. Prepare the next client dedicated EC2-Instances without launching them.
  5. Replace the volume of each client dedicated EC2-Instance (2 through 5) with a volume restored from the first snapshot.
  6. Launch the 5 client dedicated EC2-Instances [with volume replaced by snapshot].
  7. Install one of the software packages according to the procedure below.
II. Server instance
  1. Launch & initialise the server dedicated EC2-Instance.
  2. When it is running, install AWS CloudWatch agent (sudo yum install amazon-cloudwatch-agent -y)
  3. Update Linux packages (sudo yum update -y).
  4. Power off the instance and make a snapshot of the volume.
  5. Launch the server dedicated EC2-Instance [with volume replaced by snapshot].
  6. Install one of the software packages according to the procedure below.

Note: Both instance volume snapshots are also useful to avoid any potential conflicts between MySQL and MariaDB software.

Once the software is configured and running, perform the benchmark sequences and collect the measurements. Then:

  • Power off all EC2-Instances.
  • Replace all instance volumes with their corresponding snapshot.
  • Repeat the operation from step I.6 on the client side and from step II.5 on the server side to proceed with the next technology. A scripted sketch of this snapshot/restore cycle follows.
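
The power-off / snapshot / volume-replacement cycle above can also be scripted with the AWS CLI. Below is a minimal sketch of one iteration for a single instance; the instance ID, volume ID, snapshot ID, availability zone, and device name are placeholders to replace with your own values:

# Placeholders: replace with your own instance/volume/snapshot IDs, AZ and device name
INSTANCE_ID="i-0123456789abcdef0"
OLD_VOLUME="vol-0123456789abcdef0"
SNAPSHOT_ID="snap-0123456789abcdef0"

# 1. Power off the instance
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# 2. Create a fresh volume from the baseline snapshot
NEW_VOLUME=$(aws ec2 create-volume --snapshot-id "$SNAPSHOT_ID" \
  --availability-zone eu-west-1a --volume-type gp3 \
  --query VolumeId --output text)
aws ec2 wait volume-available --volume-ids "$NEW_VOLUME"

# 3. Swap the volumes and restart the instance
aws ec2 detach-volume --volume-id "$OLD_VOLUME"
aws ec2 wait volume-available --volume-ids "$OLD_VOLUME"
aws ec2 attach-volume --volume-id "$NEW_VOLUME" --instance-id "$INSTANCE_ID" --device /dev/xvda
aws ec2 start-instances --instance-ids "$INSTANCE_ID"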

III. Security Groups

It is required to allow ports 3306 (MySQL/MariaDB) & 4404 (DBS), with the IP version set to IPv4 and the Protocol set to TCP, in the Inbound rules.
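
For reproducibility, the same Inbound rules can be added with the AWS CLI; a short sketch, where the security group ID and the source CIDR are placeholders for your own values:

# Allow MySQL/MariaDB (3306) and DBS (4404) over TCP/IPv4 from the lab network
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 3306 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 4404 --cidr 10.0.0.0/16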

MySQL

The MySQL edition used for the benchmark is the Community Edition.

I. MySQL Server

Since MySQL is not part of the AWS Linux 2 repository, it is required to install it directly from the MySQL yum repository.
Note: Although version 8.4.4 is currently published, the procedure below installs version 8.0.41, as many dependencies are "unresolved" with the latest version in the AWS Linux 2 repository. Moreover, this makes the comparison more aligned in terms of version with Aurora, since the latter provides compatibility with MySQL 8.0.39 at the time this benchmark is run.
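
To check which server builds the configured repositories actually provide, and optionally pin the installation to a specific one, something along these lines can be used (the exact version string offered may differ in your repository):

# List all MySQL server builds published by the enabled yum repositories
yum --showduplicates list mysql-community-server

# If needed, pin the install to the 8.0.41 build explicitly
sudo yum install mysql-community-server-8.0.41 -y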

  1. To avoid a potential "Oracle GPG package signature failure" problem:
    # Download MySQL GPG key
    sudo rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2023
  2. Add the MySQL yum repository:
    # 1. Download the MySQL yum repo. RPM
    sudo wget https://dev.mysql.com/get/mysql80-community-release-el7-5.noarch.rpm
    # 2. Install the MySQL yum repo.
    sudo rpm -ivh mysql80-community-release-el7-5.noarch.rpm
  3. Install MySQL Community Server:
    # Install MySQL Community Server and its dependencies
    sudo yum install mysql-server -y
  4. Enable & start MySQL Server service:
    # 1. Enable MySQL Server service
    sudo systemctl enable mysqld
    # 2. Start MySQL Server service
    sudo systemctl start mysqld
  5. Verify that MySQL Server is running:
    # Check MySQL Server service status
    sudo systemctl status mysqld
  6. Initialise the MySQL Server configuration:
    First of all, retrieve the default root password that was randomly generated during the installation process.
    # Retrieve temporary password generated for 'root@localhost' from MySQL Server log
    sudo grep 'temporary password' /var/log/mysqld.log
    Now the secure configuration tool can be launched:
    # Start MySQL Server secure configuration tool
    sudo mysql_secure_installation
    The following options are selected:
    -> Securing the MySQL server deployment.
    -> Enter password for user root: Enter the retrieved password
    -> The existing password for the user account root has expired. Please set a new password.
    -> New password: Enter the new password
    -> Re-enter new password: Confirm the new password
    -> Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
    -> By default, a MySQL installation has an anonymous user,
    -> allowing anyone to log into MySQL without having to have
    -> a user account created for them. This is intended only for
    -> testing, and to make the installation go a bit smoother.
    -> You should remove them before moving into a production
    -> environment.
    -> Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
    -> Normally, root should only be allowed to connect from
    -> 'localhost'. This ensures that someone cannot guess at
    -> the root password from the network.
    -> Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
    -> ... skipping.
    -> By default, MySQL comes with a database named 'test' that
    -> anyone can access. This is also intended only for testing,
    -> and should be removed before moving into a production
    -> environment.
    -> Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
    -> - Dropping test database...
    -> Success.
    -> - Removing privileges on test database...
    -> Success.
    -> Reloading the privilege tables will ensure that all changes
    -> made so far will take effect immediately.
    -> Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
    -> Success.
    -> All done!
    Note: Despite answering n to the "Disallow root login remotely?" question, you may encounter an Access Denied error during the first remote connection with the mysql client. If that happens, connect to the server locally and execute the following commands:
    # Solve 'access denied' issue for remote login
    mysql -u root -p -e "UPDATE mysql.user SET host = '%' WHERE user = 'root' AND host = 'localhost';"
    # Restart MySQL Server service
    sudo systemctl restart mysqld
  7. It is more than likely that the MySQL server is listening on TCP/IPv6 addresses only. To verify the addresses bound by MySQL Server for listening:
    # List addresses & ports MySQL Server is listening on
    sudo netstat -tulpn | grep LISTEN | grep mysqld
    -> tcp6 0 0 :::3306  :::* LISTEN pid/mysqld
    -> tcp6 0 0 :::33060 :::* LISTEN pid/mysqld
    If so, it is suggested to make MySQL Server also listen on TCP/IPv4 addresses to simplify connectivity from the clients.
    To do this, edit the MySQL Server configuration file /etc/my.cnf and add the following under the [mysqld] section:
    [mysqld]
    bind-address=0.0.0.0
    Then, restart the MySQL Server to apply the changes:
    # Restart MySQL Server service
    sudo systemctl restart mysqld
    Check again the addresses bound by MySQL Server for listening:
    # List addresses & ports MySQL Server is listening on
    sudo netstat -tulpn | grep LISTEN | grep mysqld
    -> tcp  0 0 0.0.0.0:3306 0.0.0.0:* LISTEN pid/mysqld
    -> tcp6 0 0 :::33060     :::*      LISTEN pid/mysqld
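
As a quick sanity check from any client instance, mysqladmin can confirm that the server now answers over IPv4:

# Ping the server remotely
mysqladmin -h server_ip_address -u root -p ping
-> mysqld is alive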

II. MySQL Clients

On each client dedicated EC2-Instance, repeat steps 1 and 2 of I. MySQL Server. Then:
  1. Install MySQL Community Client:
    # Install MySQL Community Client and its dependencies
    sudo yum install mysql -y
  2. Verify that the client is able to connect to the server:
    # Connect MySQL Server instance
    mysql -h host_ipv4_address -u root -p

A. From the benchmark material archive:
  1. Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MySQL Client application.
  2. Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MySQL Client application.
  3. Copy the following 6 files to the server dedicated EC2-Instance, in the /var/lib/mysql-files/ directory:
    • datasets/cities.csv
    • datasets/countries.csv
    • datasets/dialogues.csv
    • datasets/ipaddresses.csv
    • datasets/regions.csv
    • datasets/urls.csv
  4. Change the copied files' ownership to ensure the mysqld process is allowed to read them:
    # Change dataset sources ownership
    sudo chown mysql:mysql /var/lib/mysql-files/*.csv
B. Next, create the benchmark data model using mysql and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mysql -h server_ip_address -u root -p < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance
mysql -h server_ip_address -u root -p

# Ensure all datastores in benchdb0 are available (Should display empty sets)
SELECT
  (SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
  (SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
  (SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
  (SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
  (SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
  (SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> |               0 |             0 |               0 |            0 |                 0 |          0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.01 sec)

# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |               0 |
-> +-----------------+
-> 1 row in set (0.00 sec)

MariaDB

The MariaDB edition used for the benchmark is the Community Edition.

Since MariaDB is part of the AWS Linux 2 repository, it should be easy to install it from AWS Linux 2 yum repository. However, the latest version available in the AWS Linux 2 repository is MariaDB 5.5.68, so it is required to tweak the MariaDB yum repo. configuration to get a more recent release.
Note: Although version 11.7.2 is currently published, the procedure below installs version 11.6.1, as it is the latest version available for AWS Linux 2, which is thought to have an underlying operating system based on CentOS 7.

I. MariaDB Server

  1. Edit the MariaDB yum repository configuration by either replacing the entire content of the configuration file or adding the following to it:
    # Create/Edit /etc/yum.repos.d/MariaDB.repo
    [mariadb]
    name = MariaDB
    baseurl = http://yum.mariadb.org/11.rc/centos7-amd64
    gpgkey = https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
    gpgcheck=1
  2. Install MariaDB Community Server:
    # Install MariaDB Community Server and its dependencies
    sudo yum install mariadb-server -y
  3. Enable & start MariaDB Server service:
    # 1. Enable MariaDB Server service
    sudo systemctl enable mariadb
    # 2. Start MariaDB Server service
    sudo systemctl start mariadb
  4. Verify that MariaDB Server is running:
    # Check MariaDB Server service status
    sudo systemctl status mariadb
  5. Initialise the MariaDB Server configuration:
    When MariaDB Server is installed, no root password is defined, so it is required to set one.
    # Connect to local MariaDB Server
    sudo mariadb -u root
    Once locally connected to the server, execute the following 3 commands:
    # Clean up unused accounts
    DELETE FROM mysql.user WHERE host <> 'localhost' OR TRIM(user) = '';
    -> Query OK, 5 rows affected (0.000 sec)
    # Set root password
    SET PASSWORD = PASSWORD('your_password');
    -> Query OK, 0 rows affected (0.000 sec)
    # Allow remote login
    UPDATE mysql.user SET host = '%' WHERE user = 'root' AND host = 'localhost';
    -> Query OK, 1 row affected (0.000 sec)
    Restart the server:
    # Restart MariaDB Server service
    sudo systemctl restart mariadb

II. MariaDB Clients

On each client dedicated EC2-Instance, repeat step 1 of I. MariaDB Server. Then:
  1. Install MariaDB Community Client:
    # Install MariaDB Community Client and its dependencies
    sudo yum install mariadb -y
  2. Verify that the client is able to connect to the server:
    # Connect MariaDB Server instance
    mariadb -h host_ipv4_address -u root -p

A. From the benchmark material archive:
  1. Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MariaDB Client application.
  2. Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MariaDB Client application.
  3. Copy the following 6 files to the server dedicated EC2-Instance, in the /var/lib/mysql/ directory:
    • datasets/cities.csv
    • datasets/countries.csv
    • datasets/dialogues.csv
    • datasets/ipaddresses.csv
    • datasets/regions.csv
    • datasets/urls.csv
  4. Change the copied files' ownership to ensure the mariadbd process is allowed to read them:
    # Change dataset sources ownership
    sudo chown mysql:mysql /var/lib/mysql/*.csv
B. Next, create the benchmark data model using mariadb and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mariadb -h server_ip_address -u root -p < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance
mariadb -h server_ip_address -u root -p

# Ensure all datastores in benchdb0 are available (Should display empty sets)
SELECT
  (SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
  (SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
  (SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
  (SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
  (SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
  (SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> |               0 |             0 |               0 |            0 |                 0 |          0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.002 sec)

# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |               0 |
-> +-----------------+
-> 1 row in set (0.001 sec)

Aurora (MySQL compatible)

To start setting up Aurora (MySQL compatible), connect to your AWS RDS Console.

  1. Select the Databases menu and click on Create database button. Then:
    • Select Standard create as Database creation method option.
    • Select Aurora (MySQL compatible) as Engine type option.
    • Select Aurora MySQL 3.08.2 (compatible with MySQL 8.0.39) as Engine version option, which is currently the latest one.
    • Select Dev/Test as Template option.
    • Optionally, change the default name of the cluster identifier.
    • Carefully set your Master username. This user is the database server's primary user and replaces the default MySQL/MariaDB root user.
    • Select Self managed as Credentials management option and define carefully your password.
    • Select Aurora standard as Configuration option.
    • Select Serverless v2 as DB instance class option.
    • Set 2 ACUs as Minimum capacity and 8 ACUs as Maximum capacity options. The reason is that it is not possible to precisely define the number of CPUs and memory separately in this type of configuration. That said, by specifying the proposed values ranging from 2 to 8 ACUs, Aurora starts with approximately the equivalent of 4 CPUs and 4 GiB of memory, and automatically scales up to a maximum of 16 CPUs and 16 GiB of memory in case of load increase; this allows for a configuration that closely matches the other tested technologies.
    • Select Don't create an Aurora Replica as Multi-AZ deployment option. A fully replicated cluster is unnecessary in the current benchmark context.
    • Select Don't connect to an EC2 compute resource as Compute resource option.
    • Select IPv4 as Network type option.
    • Select the VPC and the DB subnet group according to the ones where your client dedicated EC2-Instances reside.
    • Select No as Public access option.
    • Select Choose existing as VPC security group option and select the Security Group you previously defined.
    • Select the Availability zone according to the ones where your client dedicated EC2-Instances reside.
    • Uncheck the RDS Proxy checkbox.
    • Uncheck the Enable Performance insights checkbox.
    • Check the Enhanced Monitoring checkbox, then select 60 seconds as the OS metrics granularity option and default as the Monitoring role for OS metrics option.
    • Check the Error Log checkbox as Log exports option.
    • Leave all other options at their default settings.
  2. Review the various settings and click on Create database button underneath.
  3. Once the Aurora instance is created, collect both the Reader and the Writer endpoints so the client applications can connect to them. A scripted equivalent of this console procedure is sketched below.
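
For reference, an equivalent cluster can be provisioned from the AWS CLI. The sketch below mirrors the console steps above; the cluster identifier, password, security group, and subnet group are placeholders to adapt to your environment:

# 1. Create the Serverless v2 cluster (2 to 8 ACUs)
aws rds create-db-cluster \
  --db-cluster-identifier bench-aurora \
  --engine aurora-mysql \
  --engine-version 8.0.mysql_aurora.3.08.2 \
  --master-username master_user \
  --master-user-password 'your_password' \
  --serverless-v2-scaling-configuration MinCapacity=2,MaxCapacity=8 \
  --vpc-security-group-ids sg-0123456789abcdef0 \
  --db-subnet-group-name bench-subnet-group

# 2. Add one serverless instance so that the cluster accepts connections
aws rds create-db-instance \
  --db-instance-identifier bench-aurora-instance-1 \
  --db-cluster-identifier bench-aurora \
  --db-instance-class db.serverless \
  --engine aurora-mysql

# 3. Collect the Writer and Reader endpoints
aws rds describe-db-clusters --db-cluster-identifier bench-aurora \
  --query 'DBClusters[0].[Endpoint,ReaderEndpoint]' --output text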

If you followed the order of the technologies presented here, the MariaDB client installations can be retained and thus reused, since Aurora is MySQL-compatible and the mariadb client connects to it directly.

If you didn't follow that order, carry out the procedure described in II. MariaDB Clients.

Since RDS doesn't allow access to the underlying system, the files must be uploaded from the client application, which intrinsically has a slight impact on the measurements.
So, once the MariaDB Client is installed, it is recommended to ensure that the server allows local file upload:

# Connect server instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl

# Verify that Aurora/RDS allows 'local file upload' (Value = ON)
SHOW VARIABLES LIKE 'local_infile';
-> +---------------+-------+
-> | Variable_name | Value |
-> +---------------+-------+
-> | local_infile  | ON    |
-> +---------------+-------+
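
Note that LOAD DATA LOCAL INFILE must also be permitted on the client side. Should the client refuse it, it can be relaunched with the local-infile option explicitly enabled:

# Enable 'local file upload' client-side
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl --local-infile=1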

A. Only if you made a fresh install of MariaDB Client applications, from the benchmark material archive:
  1. Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MariaDB Client application.
  2. Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MariaDB Client application.
  3. Copy the following 6 files to one of the client dedicated EC2-Instances, wherever it is accessible from the MariaDB Client application:
    • datasets/cities.csv
    • datasets/countries.csv
    • datasets/dialogues.csv
    • datasets/ipaddresses.csv
    • datasets/regions.csv
    • datasets/urls.csv
B. Next, create the benchmark data model using mariadb and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance (Don't forget the --skip-ssl switch, as Aurora secures communication by default!)
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl

# Ensure all datastores in benchdb0 are available (Should display empty sets)
SELECT
  (SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
  (SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
  (SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
  (SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
  (SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
  (SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> |               0 |             0 |               0 |            0 |                 0 |          0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.062 sec)

# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |               0 |
-> +-----------------+
-> 1 row in set (0.009 sec)

DBS

Currently, there is no official repository for DBS package distribution, so the corresponding RPM must be downloaded and installed manually.

I. DBS Server

  1. On the server dedicated instance, once the RPM is available:
    # Install server RPM
    sudo rpm -i dbsd-x64-0.99-beta-rc-1.rpm
  2. When DBS Server is installed, it is configured with default restrictions for security reasons. As the purpose of this benchmark is not to evaluate security aspects, waive these restrictions by editing the /etc/dbs/dbs.conf file and changing the following settings:
    # 1. Disable 'password policy' enforcement
    [Instance]
    ...
    passwordpol=false
    # 2. Bind all system TCP/IPv4 addresses
    svctcpipv4addr=0.0.0.0
    # 3. Allow up to 5 simultaneous sessions (Concurrent write test)
    svcmaxsessions=5
  3. Enable & start DBS Server service:
    # 1. Enable DBS Server service
    sudo systemctl enable dbsd
    # 2. Start DBS Server service
    sudo systemctl start dbsd
  4. Verify that DBS Server is running:
    # Check DBS Server service status
    sudo systemctl status dbsd
  5. Verify the addresses bound by DBS Server for listening:
    # List addresses & ports DBS Server is listening on
    sudo netstat -tulpn | grep LISTEN | grep dbsd
    -> tcp  0 0 0.0.0.0:4404 0.0.0.0:* LISTEN pid/dbsd
    -> tcp6 0 0 ::1:4404     :::*      LISTEN pid/dbsd

II. DBS Clients

  1. On each client dedicated instance, once the RPM is available:
    # Install client RPM
    sudo rpm -i dbs-x64-0.99-beta-rc-1.rpm
  2. Verify DBS Server connection:
    # Connect DBS Server using 'dbsa' (Default password = dbsa)
    dbscli -h server_ip_address -u dbsa -p

A. From the benchmark material archive:
  1. Copy dbs/Bench_DBS_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from DBS Client application.
  2. Copy dbs/Bench_DBS_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from DBS Client application.
  3. Copy the following 6 files to the server dedicated EC2-Instance, in the /var/dbs/data/lsf/ directory:
    • datasets/cities.csv
    • datasets/countries.csv
    • datasets/dialogues.csv
    • datasets/ipaddresses.csv
    • datasets/regions.csv
    • datasets/urls.csv
  4. Change the copied files' ownership to ensure the dbsd process is allowed to read them:
    # Change dataset sources ownership
    sudo chown dbs:dbs /var/dbs/data/lsf/*.csv
B. Next, create the benchmark data model using dbscli and the DDL script dbs/Bench_DBS_DDL.sql previously copied:
# Create benchmark data model
dbscli -h server_ip_address -u dbsa -p < path_to_uploaded_file/Bench_DBS_DDL.sql
C. Finally, verify that the data model is successfully created and complete:

Note: By running the following commands, the 'bench' schema is automatically mounted; this allows skipping this stage, which usually takes 0.5 to 2 seconds, during the test sequences below.

# Connect server instance
dbscli -h server_ip_address -u dbsa -p

# Ensure all datastores in bench schema are available (Should display empty sets)
SELECT
  (SELECT COUNT(1) FROM bench.dialogues) AS dialogues_count,
  (SELECT COUNT(1) FROM bench.regions) AS regions_count,
  (SELECT COUNT(1) FROM bench.countries) AS countries_count,
  (SELECT COUNT(1) FROM bench.cities) AS cities_count,
  (SELECT COUNT(1) FROM bench.ipaddresses) AS ipaddresses_count,
  (SELECT COUNT(1) FROM bench.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> |               0 |             0 |               0 |            0 |                 0 |          0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row(s) affected (0.137 sec)

# Ensure all datastores in benchtbls1 -> benchtbls4 are available
SELECT COUNT(1) AS v_url_loc_count FROM bench.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |               0 |
-> +-----------------+
-> 1 row(s) affected (0.158 sec)


Test Sequences

To execute the test sequences below, it is necessary to connect to the various EC2-Instances via SSH. This obviously assumes that the applications for the technology under test are up and running.

Note:

  • Test results are highlighted below and are summarised in the Benchmark Metrics Summary section.
  • During the setup of MySQL, MariaDB, and Aurora, several databases were created to segregate datastores in a way that aligns as closely as possible with the concept of DBS tablespaces. This approach helps avoid comparing apples to oranges in certain aspects.
  • You may notice minor differences in the execution of sequences in your environment. These discrepancies, ranging from a few milliseconds to several seconds, may be due to factors such as network latency, utilisation of the host machines on which the VMs run, regional load/capacity, etc. Repeating each measurement, as sketched below, helps average them out.
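
For instance, a small wrapper along these lines can repeat a measurement several times to average such variance out (lookup_query.sql stands for any of the queries below saved to a file; substitute the client command for the technology under test):

# Run the same query file 5 times and print each wall-clock duration
for i in 1 2 3 4 5; do
  start=$(date +%s.%N)
  mysql -h server_ip_address -u root -p'your_password' < lookup_query.sql > /dev/null
  end=$(date +%s.%N)
  echo "run $i: $(echo "$end - $start" | bc) s"
done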

MySQL

Process Threads

Connect the server dedicated EC2-Instance and measure the number of process threads:

# Get MySQL Server instance number of process threads
ps -o nlwp $(pgrep -f mysqld)
-> NLWP
->   38
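
To observe how the thread count evolves while a sequence runs, rather than at a single point in time, a small sampling loop can be left running on the server (a sketch; adjust the pgrep filter if several processes match):

# Sample the mysqld thread count every 5 seconds (stop with Ctrl-C)
while true; do
  echo "$(date '+%H:%M:%S') NLWP=$(ps -o nlwp= -p "$(pgrep -f mysqld | head -1)")"
  sleep 5
done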

Mass Import

From client dedicated EC2-Instance 5, connect server instance using mysql:

# Connect server instance
mysql -h server_ip_address -u root -p

Then, import the dataset sources in the following order:

  1. dialogues.csv
    # Import 'dialogues' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/dialogues.csv' INTO TABLE benchdb0.dialogues
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,dlg_en);
    -> Query OK, 120668 rows affected (1.12 sec)
    -> Records: 120668 Deleted: 0 Skipped: 0 Warnings: 0
  2. regions.csv
    # Import 'regions' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/regions.csv' INTO TABLE benchdb0.regions
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (region_code,@nparent_code,dlg_uuid)
      SET parent_code = NULLIF(@nparent_code, '');
    -> Query OK, 31 rows affected (0.02 sec)
    -> Records: 31 Deleted: 0 Skipped: 0 Warnings: 0
  3. countries.csv
    # Import 'countries' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/countries.csv' INTO TABLE benchdb0.countries
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (iso_2dg,iso_3dg,dlg_uuid,@npcountry_iso2dg,region_code)
      SET pcountry_iso2dg = NULLIF(@npcountry_iso2dg, '');
    -> Query OK, 252 rows affected (0.02 sec)
    -> Records: 252 Deleted: 0 Skipped: 0 Warnings: 0
  4. cities.csv
    # Import 'cities' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/cities.csv' INTO TABLE benchdb0.cities
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,country_iso2dg,dlg_uuid);
    -> Query OK, 141786 rows affected (3.69 sec)
    -> Records: 141786 Deleted: 0 Skipped: 0 Warnings: 0
  5. ipaddresses.csv
    # Import 'ipaddresses' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/ipaddresses.csv' INTO TABLE benchdb0.ipaddresses
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (network_cidr,network_start,network_end,country_iso2dg,city_uuid);
    -> Query OK, 4719740 rows affected (4 min 4.97 sec)
    -> Records: 4719740 Deleted: 0 Skipped: 0 Warnings: 0
  6. urls.csv
    # Import 'urls' dataset
    LOAD DATA INFILE '/var/lib/mysql-files/urls.csv' INTO TABLE benchdb0.urls
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (url_hash,url_src,country_iso);
    -> Query OK, 872238 rows affected (56.44 sec)
    -> Records: 872238 Deleted: 0 Skipped: 0 Warnings: 0

Concurrent Write

From client dedicated EC2-Instances 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:

# Concurrent INSERTs
mysql -h host_ipv4_address -u root -p < path_to_uploaded_file/Bench_MySQL8_url_locX.sql
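
To start the four scripts as close to simultaneously as possible, they can also be launched from a controller host over SSH; a sketch, assuming key-based SSH access to hosts named client1 through client4:

# Launch the 4 INSERT scripts in parallel, one per client instance
for i in 1 2 3 4; do
  ssh ec2-user@client$i \
    "mysql -h server_ip_address -u root -p'your_password' < Bench_MySQL8_url_loc$i.sql" &
done
wait  # Block until all four clients have finished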

Verify that all INSERTs have been executed:

# Ensure all INSERTs are executed (Total number must be 16,000 records)
mysql -h server_ip_address -u root -p -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |           16000 |
-> +-----------------+

Collect time spent for all INSERTs to be executed as well as average time per record:

# Connect server instance
mysql -h host_ipv4_address -u root -p

# Collect time spent + average time per client source
SELECT
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 32.000      | 0.008            | 32.000      | 0.008            | 32.000      | 0.008            | 32.000      | 0.008            |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (0.37 sec)

Lookup Query 1

# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> |  4719740 |
-> +----------+
-> 1 row in set (1.07 sec)

Lookup Query 2

# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr    | region       | country | city   |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan   | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (1 min 0.32 sec)

Lookup Query 3

# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> |          91 |
-> +-------------+
-> 1 row in set (7.01 sec)

Lookup Query 4

# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country       | match_count |
-> +---------------+-------------+
-> | United States |          91 |
-> +---------------+-------------+
-> 1 row in set (1.78 sec)

Server CPU Utilisation Peak

Using AWS CloudWatch, determine the server dedicated EC2-Instance's CPU utilisation peak. As only the MySQL server is running on the virtual machine, the EC2 CPUUtilization metric can be reliably collected.

Benchmark MySQL CPU utilisation peak

According to the above graph, the CPU usage peak reaches 36.7%.
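
The same peak can also be retrieved without the console, e.g. with the AWS CLI (the instance ID and the time window are placeholders to replace with your own):

# Fetch the per-minute CPU utilisation maxima for the benchmark window
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --start-time 2025-01-01T10:00:00Z --end-time 2025-01-01T11:00:00Z \
  --period 60 --statistics Maximum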

MariaDB

Process Threads

Connect the server dedicated EC2-Instance and measure the number of process threads:

# Get MariaDB Server instance number of process threads
ps -o nlwp $(pgrep -f mariadbd)
-> NLWP
->   12

Mass Import

From client dedicated EC2-Instance 5, connect server instance using mariadb:

# Connect server instance
mariadb -h server_ip_address -u root -p

Then, import the dataset sources in the following order:

  1. dialogues.csv
    # Import 'dialogues' dataset
    LOAD DATA INFILE '/var/lib/mysql/dialogues.csv' INTO TABLE benchdb0.dialogues
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,dlg_en);
    -> Query OK, 120668 rows affected (0.442 sec)
    -> Records: 120668 Deleted: 0 Skipped: 0 Warnings: 0
  2. regions.csv
    # Import 'regions' dataset
    LOAD DATA INFILE '/var/lib/mysql/regions.csv' INTO TABLE benchdb0.regions
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (region_code,@nparent_code,dlg_uuid)
      SET parent_code = NULLIF(@nparent_code, '');
    -> Query OK, 31 rows affected (0.006 sec)
    -> Records: 31 Deleted: 0 Skipped: 0 Warnings: 0
  3. countries.csv
    # Import 'countries' dataset
    LOAD DATA INFILE '/var/lib/mysql/countries.csv' INTO TABLE benchdb0.countries
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (iso_2dg,iso_3dg,dlg_uuid,@npcountry_iso2dg,region_code)
      SET pcountry_iso2dg = NULLIF(@npcountry_iso2dg, '');
    -> Query OK, 252 rows affected (0.010 sec)
    -> Records: 252 Deleted: 0 Skipped: 0 Warnings: 0
  4. cities.csv
    # Import 'cities' dataset
    LOAD DATA INFILE '/var/lib/mysql/cities.csv' INTO TABLE benchdb0.cities
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,country_iso2dg,dlg_uuid);
    -> Query OK, 141786 rows affected (1.918 sec)
    -> Records: 141786 Deleted: 0 Skipped: 0 Warnings: 0
  5. ipaddresses.csv
    # Import 'ipaddresses' dataset
    LOAD DATA INFILE '/var/lib/mysql/ipaddresses.csv' INTO TABLE benchdb0.ipaddresses
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (network_cidr,network_start,network_end,country_iso2dg,city_uuid);
    -> Query OK, 4719740 rows affected (5 min 46.031 sec)
    -> Records: 4719740 Deleted: 0 Skipped: 0 Warnings: 0
  6. urls.csv
    # Import 'urls' dataset
    LOAD DATA INFILE '/var/lib/mysql/urls.csv' INTO TABLE benchdb0.urls
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (url_hash,url_src,country_iso);
    -> Query OK, 872238 rows affected (26.156 sec)
    -> Records: 872238 Deleted: 0 Skipped: 0 Warnings: 0

Concurrent Write

From client dedicated EC2-Instances 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:

# Concurrent INSERTs
mariadb -h host_ipv4_address -u root -p < path_to_uploaded_file/Bench_MySQL8_url_locX.sql

Verify that all INSERTs have been executed:

# Ensure all INSERTs are executed (Total number must be 16,000 records)
mariadb -h server_ip_address -u root -p -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |           16000 |
-> +-----------------+

Collect time spent for all INSERTs to be executed as well as average time per record:

# Connect server instance
mariadb -h host_ipv4_address -u root -p

# Collect time spent + average time per client source
SELECT
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 10.000      | 0.003            | 11.000      | 0.003            | 12.000      | 0.003            | 11.000      | 0.003            |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (1.243 sec)

Lookup Query 1

# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> |  4719740 |
-> +----------+
-> 1 row in set (15.446 sec)

Lookup Query 2

# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr    | region       | country | city   |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan   | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (2 min 31.558 sec)

Lookup Query 3

# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> |          91 |
-> +-------------+
-> 1 row in set (14.225 sec)

Lookup Query 4

# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country       | match_count |
-> +---------------+-------------+
-> | United States |          91 |
-> +---------------+-------------+
-> 1 row in set (55.434 sec)

Server CPU Utilisation Peak

Using AWS CloudWatch, determine the server dedicated EC2-Instance's CPU utilisation peak. As only the MariaDB server is running on the virtual machine, the EC2 CPUUtilization metric can be reliably collected.

Benchmark MariaDB CPU utilisation peak

According to the above graph, the CPU usage peak reaches 20.9%.

Aurora (MySQL compatible)

Process Threads

There is no means to obtain this information, as RDS doesn't allow any access to the underlying system. The only information available comes from querying the active threadpool state, although it is not really comparable:

# Connect reader instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl

# To get Aurora active threadpool state, either execute this query...
SHOW STATUS WHERE variable_name LIKE 'Threads_%';
# ...or this one
SELECT COUNT(1) FROM INFORMATION_SCHEMA.PROCESSLIST;
-> A priori, both results should give 1

Mass Import

From client dedicated EC2-Instance 5, connect writer instance using mariadb:

# Connect writer instance
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl

Then, import the dataset sources in the following order:

  1. dialogues.csv
    # Import 'dialogues' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/dialogues.csv' INTO TABLE benchdb0.dialogues
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,dlg_en);
    -> Query OK, 120668 rows affected (2.785 sec)
    -> Records: 120668 Deleted: 0 Skipped: 0 Warnings: 0
  2. regions.csv
    # Import 'regions' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/regions.csv' INTO TABLE benchdb0.regions
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (region_code,@nparent_code,dlg_uuid)
      SET parent_code = NULLIF(@nparent_code, '');
    -> Query OK, 31 rows affected (0.167 sec)
    -> Records: 31 Deleted: 0 Skipped: 0 Warnings: 0
  3. countries.csv
    # Import 'countries' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/countries.csv' INTO TABLE benchdb0.countries
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (iso_2dg,iso_3dg,dlg_uuid,@npcountry_iso2dg,region_code)
      SET pcountry_iso2dg = NULLIF(@npcountry_iso2dg, '');
    -> Query OK, 252 rows affected (0.306 sec)
    -> Records: 252 Deleted: 0 Skipped: 0 Warnings: 0
  4. cities.csv
    # Import 'cities' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/cities.csv' INTO TABLE benchdb0.cities
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (uuid,country_iso2dg,dlg_uuid);
    -> Query OK, 141786 rows affected (8.214 sec)
    -> Records: 141786 Deleted: 0 Skipped: 0 Warnings: 0
  5. ipaddresses.csv
    # Import 'ipaddresses' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/ipaddresses.csv' INTO TABLE benchdb0.ipaddresses
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (network_cidr,network_start,network_end,country_iso2dg,city_uuid);
    -> Query OK, 4719740 rows affected (5 min 57.873 sec)
    -> Records: 4719740 Deleted: 0 Skipped: 0 Warnings: 0
  6. urls.csv
    # Import 'urls' dataset
    LOAD DATA LOCAL INFILE 'path_to_uploaded_file/urls.csv' INTO TABLE benchdb0.urls
      FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
      IGNORE 1 ROWS (url_hash,url_src,country_iso);
    -> Query OK, 872238 rows affected (45.620 sec)
    -> Records: 872238 Deleted: 0 Skipped: 0 Warnings: 0

Concurrent Write

From client dedicated EC2-Instances 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:

# Concurrent INSERTs
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl < path_to_uploaded_file/Bench_MySQL8_url_locX.sql

Verify that all INSERTs have been executed:

# Ensure all INSERTs are executed (Total number must be 16,000 records)
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |           16000 |
-> +-----------------+

Collect time spent for all INSERTs to be executed as well as average time per record:

# Connect server instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl

# Collect time spent + average time per client source
SELECT
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
  FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 25.000      | 0.006            | 25.000      | 0.006            | 25.000      | 0.006            | 25.000      | 0.006            |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (1.243 sec)

Lookup Query 1

# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> |  4719740 |
-> +----------+
-> 1 row in set (1.940 sec)

Lookup Query 2

# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr    | region       | country | city   |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan   | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (1 min 14.870 sec)

Lookup Query 3

# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> |          91 |
-> +-------------+
-> 1 row in set (0.543 sec)

Lookup Query 4

# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country       | match_count |
-> +---------------+-------------+
-> | United States |          91 |
-> +---------------+-------------+
-> 1 row in set (5.769 sec)

Server ACU & CPU Utilisation Peaks

Using AWS CloudWatch, determine the Aurora instance's ACU and CPU utilisation peaks.

Benchmark Aurora ACU & CPU utilisation peaks

According to the above graph, the instance's CPU peaks reach 100% utilisation, both in terms of ACU and raw CPU usage.

DBS

Process Threads

Connect the server dedicated EC2-Instance and measure the number of process threads:

# Get DBS Server instance number of process threads
ps -o nlwp $(pgrep -f dbsd)
-> NLWP
->   10

Mass Import

From client dedicated EC2-Instance 5, connect server instance using dbscli:

# Connect server instance (-R to avoid timeout during the long import process)
dbscli -h server_ip_address -u dbsa -p -R 1800 --reconnect

Then, import the dataset sources in the following order:

  1. dialogues.csv

     # Import 'dialogues' dataset
     IMPORT CSV 'dialogues.csv' USING INSERT INTO bench.dialogues (uuid, dlg_en) VALUES ('@uuid', '@dlg_en');
     -> 120668 row(s) affected (0.629 sec)

  2. regions.csv

     # Import 'regions' dataset
     IMPORT CSV 'regions.csv' USING INSERT INTO bench.regions (region_code, parent_code, dlg_uuid) VALUES ('@region_code', '@parent_code', '@dlg_uuid');
     -> 31 row(s) affected (0.037 sec)

  3. countries.csv

     # Import 'countries' dataset
     IMPORT CSV 'countries.csv' USING INSERT INTO bench.countries (iso_2dg, iso_3dg, dlg_uuid, pcountry_iso2dg, region_code) VALUES ('@iso_2dg', '@iso_3dg', '@dlg_uuid', '@pcountry_iso2dg', '@region_code');
     -> 252 row(s) affected (0.027 sec)

  4. cities.csv

     # Import 'cities' dataset
     IMPORT CSV 'cities.csv' USING INSERT INTO bench.cities (uuid, country_iso2dg, dlg_uuid) VALUES ('@uuid', '@country_iso2dg', '@dlg_uuid');
     -> 141786 row(s) affected (3.064 sec)

  5. ipaddresses.csv

     # Import 'ipaddresses' dataset
     IMPORT CSV 'ipaddresses.csv' USING INSERT INTO bench.ipaddresses (network_cidr, network_start, network_end, country_iso2dg, city_uuid) VALUES ('@network_cidr', '@network_start', '@network_end', '@country_iso2dg', '@city_uuid');
     -> 4719740 row(s) affected (293.641 sec = 4 min 53.641 sec)

  6. urls.csv

     # Import 'urls' dataset
     IMPORT CSV 'urls.csv' USING INSERT INTO bench.urls (url_hash, url_src, country_iso) VALUES ('@url_hash', '@url_src', '@country_iso');
     -> 872238 row(s) affected (52.845 sec)
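Once all six imports complete, a quick cross-check against the record counts given in the dataset descriptions confirms that nothing was dropped along the way. This is a minimal sketch; the expected values come from the dataset totals listed earlier in this post:

# Cross-check imported row counts against the dataset sources
SELECT (SELECT COUNT(1) FROM bench.dialogues)   AS dialogues_cnt,   -- expected 120668
       (SELECT COUNT(1) FROM bench.regions)     AS regions_cnt,     -- expected 31
       (SELECT COUNT(1) FROM bench.countries)   AS countries_cnt,   -- expected 252
       (SELECT COUNT(1) FROM bench.cities)      AS cities_cnt,      -- expected 141786
       (SELECT COUNT(1) FROM bench.ipaddresses) AS ipaddresses_cnt, -- expected 4719740
       (SELECT COUNT(1) FROM bench.urls)        AS urls_cnt;        -- expected 872238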

Concurrent Write

From client-dedicated EC2-Instances 1 through 4, run the corresponding script dbs/Bench_DBS_url_locX.sql (where X matches the client instance number):

# Concurrent INSERTs
dbscli -h host_ipv4_address -u dbsa -p < path_to_uploaded_file/Bench_DBS_url_locX.sql
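For context, each Bench_DBS_url_locX.sql is essentially a batch of timestamped single-row INSERTs. The sketch below is hypothetical: the only columns we can infer from the verification queries are file_source and ins_timestamp; the table name and the remaining column are our assumptions, not taken from the actual benchmark scripts:

# Hypothetical shape of one INSERT from Bench_DBS_url_loc1.sql (schema inferred)
INSERT INTO bench.url_loc1 (url_src, ins_timestamp, file_source)
VALUES ('https://example.com/some/page', CURRENT_TIMESTAMP, 'url_loc1');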

Verify that all INSERTs have been executed:

# Ensure all INSERTs are executed (total number must be 16,000 records)
dbscli -h server_ip_address -u dbsa -p -e "SELECT COUNT(1) AS v_url_loc_count FROM bench.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> |           16000 |
-> +-----------------+
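A per-source breakdown can also confirm that each of the four clients contributed its share. This is a sketch, assuming an even split of 4,000 records per client:

# Per-client breakdown of the 16,000 INSERTs
SELECT file_source, COUNT(1) AS record_count
FROM bench.v_url_loc
GROUP BY file_source;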

Collect the total time spent executing the INSERTs, as well as the average time per record:

# Connect server instance
dbscli -h server_ip_address -u dbsa -p

# Collect time spent + average time per client source
SELECT PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))) * 86400)) AS total_sec_1,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))) * 86400)
                      / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc1')) AS avg_per_record_1,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))) * 86400)) AS total_sec_2,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))) * 86400)
                      / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc2')) AS avg_per_record_2,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))) * 86400)) AS total_sec_3,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))) * 86400)
                      / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc3')) AS avg_per_record_3,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))) * 86400)) AS total_sec_4,
       PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))
                      - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))) * 86400)
                      / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc4')) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 152.011     | 0.038            | 152.000     | 0.038            | 156.004     | 0.039            | 152.000     | 0.038            |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row(s) affected (0.241 sec)
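The query above repeats the same MIN/MAX subqueries for every client source. Assuming DBS supports aggregates with GROUP BY in this context (as it does in Lookup Query 4), the same figures could likely be collected in a single pass; a sketch:

# Equivalent one-pass collection (sketch; one output row per client source)
SELECT file_source,
       PRINTF('%.3f', (JULIANDAY(MAX(ins_timestamp)) - JULIANDAY(MIN(ins_timestamp))) * 86400) AS total_sec,
       PRINTF('%.3f', ((JULIANDAY(MAX(ins_timestamp)) - JULIANDAY(MIN(ins_timestamp))) * 86400) / COUNT(1)) AS avg_per_record
FROM bench.v_url_loc
GROUP BY file_source;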

Lookup Query 1

# Count number of records in ipaddresses table
SELECT COUNT(1) FROM bench.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> |  4719740 |
-> +----------+
-> 1 row(s) affected (0.248 sec)

Lookup Query 2

# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM bench.ipaddresses ip
INNER JOIN bench.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN bench.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN bench.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN bench.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN bench.regions r ON r.region_code = c.region_code
INNER JOIN bench.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE INET_ATOH('101.53.172.220') BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr    | region       | country | city   |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan   | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row(s) affected (0.121 sec)

Lookup Query 3

# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM bench.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> |          91 |
-> +-------------+
-> 1 row(s) affected (0.279 sec)

Lookup Query 4

# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM bench.urls u
INNER JOIN bench.countries c ON c.iso_2dg = u.country_iso
INNER JOIN bench.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country       | match_count |
-> +---------------+-------------+
-> | United States |          91 |
-> +---------------+-------------+
-> 1 row(s) affected (0.341 sec)

Server CPU Utilisation Peak

Using AWS CloudWatch, determine the CPU utilisation peak of the server-dedicated EC2-Instance. Since only the DBS server runs on this virtual machine, the EC2 CPUUtilization metric can be collected reliably.

Benchmark DBS CPU utilisation peak

According to the above graph, the CPU usage peak reaches 16.5%.


Benchmark Metrics Summary

The following table summarises the results from the benchmark test sequences across the various technologies tested:

Test Sequence                                          | MySQL   | MariaDB | Aurora  | DBS
-------------------------------------------------------+---------+---------+---------+--------
Process Threads                                        | 38      | 12      | -       | 10
Mass Import (Total in Seconds)                         | 306.260 | 374.563 | 414.965 | 350.243
Concurrent Write (Average Time per Record in Seconds)  | 0.008   | 0.003   | 0.006   | 0.038
Lookup Query 1 (Seconds)                               | 1.070   | 15.446  | 1.940   | 0.248
Lookup Query 2 (Seconds)                               | 60.320  | 151.558 | 74.870  | 0.121
Lookup Query 3 (Seconds)                               | 7.010   | 14.225  | 0.543   | 0.279
Lookup Query 4 (Seconds)                               | 1.780   | 55.434  | 5.769   | 0.341
Max. CPU Peak                                          | 36.7%   | 20.9%   | 100.0%  | 16.5%
Benchmark DBS Summary Charts

Conclusion

Although DBS is currently an MVP solution, as mentioned at the beginning of this post, it stands up well against established and well-known RDBMS like MySQL, MariaDB, and Aurora. In fact, the benchmark results go further than that: on average, DBS is markedly faster than the alternatives tested.

It's worth noting that development has so far focused primarily on optimising the engine for lookup queries, which is clearly reflected in the results. Conversely, an area for improvement has been identified in the handling of concurrent write I/O operations, a limitation that will be addressed in future versions.

When considering Green IT, an increasingly important topic, DBS also shines. It delivers better performance than the alternatives while consuming fewer resources (threads and CPU peak). This lower resource consumption translates into reduced system requirements, which in turn leads to lower energy consumption and operational costs.

Looking ahead, DBS development will continue to focus on enhancing its capabilities, including optimising concurrent write I/O and scaling. Given its performance and energy efficiency, we strongly believe (hope? :-)) that DBS has the potential to reshape the relational database landscape, offering a powerful and cost-effective alternative.

We encourage the community to test DBS, share feedback, and contribute to its evolution.