Benchmark: DBS vs. MySQL vs. MariaDB vs. AWS Aurora (MySQL compatible)
With the release of DBS 0.99 Beta RC-1, it's time to gather performance metrics and assess how DBS compares to established solutions such as MySQL, MariaDB, and AWS Aurora (MySQL-compatible).
This article outlines our straightforward yet practical benchmarking approach, detailing the datasets used and the results obtained.
Throughout the development of DBS, from its prototype to Beta 0, performance measurements have consistently shown promising results, suggesting the project was headed in the right direction. However, these early results were somewhat skewed because they were obtained on incomplete versions, lacking key features such as security controls and user sessions.
The RC-1 version, now including all planned features, offers more consistent and accurate metrics that better reflect real-world conditions.
It's important to note, however, that while this version gives us a glimpse of the first General Availability (GA) release, it remains a Minimum Viable Product (MVP) for version 1.
Lastly, and perhaps most importantly, the results presented here may seem surprising. They are easily reproducible and verifiable, as this post includes detailed information on the lab setup, data sources, and queries used. All of these elements are fully transparent and available as usable material to anyone wishing to validate the findings.
The benchmark environment is set up on AWS for simplicity. It includes 5 EC2-Instances for the client applications and 1 EC2-Instance for the server applications. For the tests involving AWS Aurora, the RDS Serverless (MySQL compatible) service is used.
AWS Applied Configuration
Client: 5 x EC2-Instance
  Amazon Linux 2, Kernel 5.10 64-bit (x86)
  Instance Type: t2.micro (1 vCPU, 1 GiB Memory)
  Storage Volume Type: SSD gp3 (16 GiB)
  Standard Network Profile (Allow SSH)
Server: 1 x EC2-Instance
  Amazon Linux 2, Kernel 5.10 64-bit (x86)
  Instance Type: t2.xlarge (4 vCPU, 16 GiB Memory)
  Storage Volume Type: SSD gp3 (16 GiB)
  Standard Network Profile (Allow SSH)
Aurora (RDS)
  Aurora Standard (MySQL compatible)
  Serverless v2, default.aurora-mysql8.0
  From 2 ACUs to 8 ACUs, autoscaling
  SSD Volume, autoscaling
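For those who prefer scripting the environment, the EC2-Instances can also be created with the AWS CLI; a minimal sketch, in which the AMI ID, key pair, and security group are placeholders to adapt to your account and region:
# Launch the 5 client dedicated instances (t2.micro)
aws ec2 run-instances --image-id ami-xxxxxxxx --count 5 \
  --instance-type t2.micro --key-name my-key --security-group-ids sg-xxxxxxxx
# Launch the server dedicated instance (t2.xlarge)
aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 \
  --instance-type t2.xlarge --key-name my-key --security-group-ids sg-xxxxxxxx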
Datasets
The datasets used for the benchmark are not particularly large but are sufficient to run multiple join queries. While these queries are not overly complex, they require the applications to efficiently utilise their resources, ensuring meaningful measurements for identifying key performance indicators.
These datasets are primarily "IP geo-location" oriented and consist of the following sources:
A list of dialogues, with location labels/names, totaling 120,668 records.
A list of regions, containing continental positioning data, totaling 31 records.
A list of countries, with country-level coordinates, totaling 252 records.
A list of cities, providing location data for each place within its respective country, totaling 141,786 records.
A list of IP addresses, serving as one geo-location source to be linked with the other datasets, totaling 4,719,740 records.
A list of URLs, another geo-location source, to be linked with the previous data, totaling 872,238 records.
The url_loc lists are used to simulate the "write I/O" load increase, with 16,000 INSERT commands in total executed against flat datastores from 4 concurrent client applications.
Note: All of the data are test data, mostly generated randomly, and should not be considered reliable for use in any other context.
Setup
It is suggested to follow these general steps to make the setup process easier and faster:
I. Client instances
Launch & initialise the first client dedicated EC2-Instance.
When it is running, update Linux packages (sudo yum update -y).
Power off the instance and make a snapshot of the volume.
Prepare the next client dedicated EC2-Instances without launching them.
Replace the volume of each client dedicated EC2-Instance from 2 to 5 with a volume restored from the first snapshot.
Launch the 5 client dedicated EC2-Instances [with volume replaced by snapshot].
Install one of the software packages according to the corresponding procedure below.
II. Server instance
Launch & initialise the server dedicated EC2-Instance.
When it is running, install AWS CloudWatch agent (sudo yum install amazon-cloudwatch-agent -y)
Update Linux packages (sudo yum update -y).
Power off the instance and make a snapshot of the volume.
Launch the server dedicated EC2-Instance [with volume replaced by snapshot].
Install one of the software packages according to the corresponding procedure below.
Note: Both instance volume snapshots are also useful to avoid any potential conflicts between MySQL and MariaDB software.
Once the software is configured and running, perform benchmark sequences and collect the measures. Then:
Power off all EC2-Instances.
Replace all instance volumes with their corresponding snapshot.
Repeat operation from step I.6 client-side and from step II.5 server-side to proceed with the next one.
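The power-off / snapshot / restore cycle can also be scripted; a sketch with the AWS CLI, where volume and instance IDs are placeholders:
# 1. Snapshot the volume of a freshly updated instance
aws ec2 create-snapshot --volume-id vol-xxxxxxxx --description "clean baseline"
# 2. Later, create a fresh volume from that snapshot...
aws ec2 create-volume --snapshot-id snap-xxxxxxxx --availability-zone eu-west-1a
# 3. ...and swap it onto the powered-off instance
aws ec2 detach-volume --volume-id vol-old
aws ec2 attach-volume --volume-id vol-new --instance-id i-xxxxxxxx --device /dev/xvda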
III. Security Groups
It is required to allow ports 3306 (MySQL/MariaDB) & 4404 (DBS), with the IP version set to IPv4 and the Protocol set to TCP, in the Inbound rules.
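Equivalently, with the AWS CLI (the security group ID and source CIDR are placeholders):
# Allow the MySQL/MariaDB and DBS ports in the Inbound rules
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
  --protocol tcp --port 3306 --cidr 10.0.0.0/16
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx \
  --protocol tcp --port 4404 --cidr 10.0.0.0/16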
I. MySQL Server
Since MySQL is not part of the AWS Linux 2 repository, it is required to install it directly from the MySQL yum repository.
Note: Although version 8.4.4 is currently published, the procedure below installs version 8.0.41, as many dependencies are "unresolved" with the latest version in the AWS Linux 2 repository. Moreover, this makes the comparison more aligned in terms of version with Aurora, since the latter provides compatibility with MySQL 8.0.39 at the time this benchmark is run.
To avoid a potential "Oracle GPG package signature failure" problem:
# Download & import the MySQL GPG key
sudo rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2023
Add the MySQL yum repository:
# 1. Download the MySQL yum repo. RPM
sudo wget https://dev.mysql.com/get/mysql80-community-release-el7-5.noarch.rpm
# 2. Install the MySQL yum repo.
sudo rpm -ivh mysql80-community-release-el7-5.noarch.rpm
Install MySQL Community Server:
# Install MySQL Community Server and its dependencies
sudo yum install mysql-server -y
Enable & start MySQL Server service:
# 1. Enable MySQL Server service
sudo systemctl enable mysqld
# 2. Start MySQL Server service
sudo systemctl start mysqld
Verify that MySQL Server is running:
# Check MySQL Server service status
sudo systemctl status mysqld
Initialise the MySQL Server configuration:
First of all, retrieve the default root password that was randomly generated during the installation process.
# Retrieve temporary password generated for 'root@localhost' from MySQL Server log
sudo grep 'temporary password' /var/log/mysqld.log
Now the secure configuration tool can be launched:
# Start MySQL Server secure configuration tool
sudo mysql_secure_installation
The following options are selected:
# mysql_secure_installation dialogue (selected answers shown inline)
-> Securing the MySQL server deployment.
-> Enter password for user root: Enter the retrieved password
-> The existing password for the user account root has expired. Please set a new password.
-> New password: Enter the new password
-> Re-enter new password: Confirm the new password
-> Change the password for root ? ((Press y|Y for Yes, any other key for No) : y
-> By default, a MySQL installation has an anonymous user,
-> allowing anyone to log into MySQL without having to have
-> a user account created for them. This is intended only for
-> testing, and to make the installation go a bit smoother.
-> You should remove them before moving into a production
-> environment.
-> Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
-> Normally, root should only be allowed to connect from
-> 'localhost'. This ensures that someone cannot guess at
-> the root password from the network.
-> Disallow root login remotely? (Press y|Y for Yes, any other key for No) : n
-> ... skipping.
-> By default, MySQL comes with a database named 'test' that
-> anyone can access. This is also intended only for testing,
-> and should be removed before moving into a production
-> environment.
-> Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
-> - Dropping test database...
-> Success.
-> - Removing privileges on test database...
-> Success.
-> Reloading the privilege tables will ensure that all changes
-> made so far will take effect immediately.
-> Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
-> Success.
-> All done!
Note: Despite answering N to the "Disallow root login remotely?" question, you may encounter an Access Denied error during the first remote connection with mysql client. If that happens, you will need to connect to the server locally and execute the following commands:
# Solve 'access denied' issue for remote login (execute locally on the server)
mysql -u root -p -e "UPDATE mysql.user SET host = '%' WHERE user = 'root' AND host = 'localhost';"
# Restart MySQL Server service
sudo systemctl restart mysqld
It is more than likely that the MySQL server is listening on TCP/IPv6 addresses only. To verify the addresses bound by MySQL Server for listening:
# List addresses & ports MySQL Server is listening
sudo netstat -tulpn | grep LISTEN | grep mysqld
-> tcp6 0 0 :::3306 :::* LISTEN pid/mysqld
-> tcp6 0 0 :::33060 :::* LISTEN pid/mysqld
If so, it is suggested to make MySQL Server also listen on TCP/IPv4 addresses to simplify connectivity from the clients.
To do this, edit the MySQL Server configuration file /etc/my.cnf and add the following under the [mysqld] section:
[mysqld]
bind-address=0.0.0.0
Then, it is required to restart the MySQL Server to apply the changes:
# Restart MySQL Server service
sudo systemctl restart mysqld
Check again the addresses bound by MySQL Server for listening:
# List addresses & ports MySQL Server is listening
sudo netstat -tulpn | grep LISTEN | grep mysqld
-> tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN pid/mysqld
-> tcp6 0 0 :::33060 :::* LISTEN pid/mysqld
II. MySQL Clients
On each client dedicated EC2-Instance, repeat steps 1 and 2 of I. MySQL Server above (GPG key import and yum repository setup). Then:
Install MySQL Community Client:
# Install MySQL Community Client and its dependencies
sudo yum install mysql -y
Verify that client is able to connect to the server:
# Connect MySQL Server instance
mysql -h host_ipv4_address -u root -p
A. From the benchmark material archive:
Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MySQL Client application.
Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MySQL Client application.
Copy the following 6 files to the server dedicated EC2-Instance, in the /var/lib/mysql-files/ directory:
datasets/cities.csv
datasets/countries.csv
datasets/dialogues.csv
datasets/ipaddresses.csv
datasets/regions.csv
datasets/urls.csv
Change copied files ownership to ensure mysqld process is allowed to read them:
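A minimal sketch, assuming the default service account of the RPM installation (mysqld runs as user mysql):
# Give the CSV files to the account mysqld runs as
sudo chown mysql:mysql /var/lib/mysql-files/*.csv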
B. Next, create the benchmark data model using mysql and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mysql -h server_ip_address -u root -p < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance
mysql -h server_ip_address -u root -p
# Ensure all datastores in benchdb0 are available (All counts should be 0)
SELECT
(SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
(SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
(SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
(SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
(SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
(SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | 0 | 0 | 0 | 0 | 0 | 0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.01 sec)
# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 0 |
-> +-----------------+
-> 1 row in set (0.00 sec)
I. MariaDB Server
The MariaDB edition used for the benchmark is the Community Edition.
Since MariaDB is part of the AWS Linux 2 repository, it should be easy to install it from AWS Linux 2 yum repository. However, the latest version available in the AWS Linux 2 repository is MariaDB 5.5.68, so it is required to tweak the MariaDB yum repo. configuration to get a more recent release.
Note: Although version 11.7.2 is currently published, the procedure below installs version 11.6.1, as it is the latest version available for AWS Linux 2, which is thought to have an underlying operating system based on CentOS 7.
Edit the MariaDB yum repository configuration by either replacing the entire content of the configuration file or adding the following to it:
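As an indication, a typical /etc/yum.repos.d/mariadb.repo entry for this setup looks like the sketch below (the baseurl path is an assumption to adjust to the targeted 11.6.1 release), after which the server can be installed, enabled, and started:
# /etc/yum.repos.d/mariadb.repo (indicative content)
[mariadb]
name = MariaDB
baseurl = https://rpm.mariadb.org/11.6/rhel/7/x86_64
gpgkey = https://rpm.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck = 1

# Install, enable & start MariaDB Server
sudo yum install MariaDB-server -y
sudo systemctl enable mariadb
sudo systemctl start mariadb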
A. From the benchmark material archive:
Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MariaDB Client application.
Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MariaDB Client application.
Copy the following 6 files to the server dedicated EC2-Instance, in the /var/lib/mysql/ directory:
datasets/cities.csv
datasets/countries.csv
datasets/dialogues.csv
datasets/ipaddresses.csv
datasets/regions.csv
datasets/urls.csv
Change copied files ownership to ensure mariadbd process is allowed to read them:
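As with MySQL, a minimal sketch assuming the default service account (the MariaDB RPMs also run the daemon as user mysql):
# Give the CSV files to the account mariadbd runs as
sudo chown mysql:mysql /var/lib/mysql/*.csv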
B. Next, create the benchmark data model using mariadb and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mariadb -h server_ip_address -u root -p < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance
mariadb -h server_ip_address -u root -p
# Ensure all datastores in benchdb0 are available (All counts should be 0)
SELECT
(SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
(SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
(SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
(SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
(SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
(SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | 0 | 0 | 0 | 0 | 0 | 0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.002 sec)
# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 0 |
-> +-----------------+
-> 1 row in set (0.001 sec)
To start setting up Aurora (MySQL compatible), you have to connect to your AWS RDS Console.
Select the Databases menu and click on Create database button. Then:
Select Standard create as Database creation method option.
Select Aurora (MySQL compatible) as Engine type option.
Select Aurora MySQL 3.08.2 (compatible with MySQL 8.0.39) as Engine version option, which is currently the latest one.
Select Dev/Test as Template option.
Optionally, change the default name of the cluster identifier.
Carefully set your Master username. This user is the database server's primary user and necessarily replaces the default MySQL/MariaDB root user.
Select Self managed as Credentials management option and define carefully your password.
Select Aurora standard as Configuration option.
Select Serverless v2 as DB instance class option.
Set 2 ACUs as Minimum capacity and 8 ACUs as Maximum capacity options. The reason is that it is not possible to precisely define the number of CPUs and memory separately in this type of configuration. That said, by specifying the proposed values ranging from 2 to 8 ACUs, Aurora starts with approximately the equivalent of 4 CPUs and 4 GiB of memory, and automatically scales up to a maximum of 16 CPUs and 16 GiB of memory in case of load increase; this allows for a configuration that closely matches the other tested technologies.
Select Don't create an Aurora Replica as Multi-AZ deployment option. A fully replicated cluster is unnecessary in the current benchmark context.
Select Don't connect to an EC2 compute resource as Compute resource option.
Select IPv4 as Network type option.
Select the VPC and the DB subnet group according to the ones where your client dedicated EC2-Instances reside.
Select No as Public access option.
Select Choose existing as VPC security group option and select the Security Group you previously defined.
Select the Availability zone according to the ones where your client dedicated EC2-Instances reside.
Uncheck the RDS Proxy checkbox.
Uncheck the Enable Performance insights checkbox.
Check the Enhanced Monitoring checkbox, then select 60 seconds as OS metrics granularity and default as Monitoring role for OS metrics.
Check the Error Log checkbox as Log exports option.
Leave all other options at their default settings.
Review the various settings and click on Create database button underneath.
Once the Aurora instance is created, it is necessary to collect both the Reader and the Writer endpoints in order to be able to connect to them from the client applications.
If you followed the order of the technologies presented here, the MariaDB client installations can be retained and thus reused, since Aurora is MySQL-compatible and works with the MariaDB client.
If you didn't follow the order of the technologies presented here, the procedure II. MariaDB Clients is the one that can be carried out.
Since RDS doesn't allow access to the underlying system, the files must be uploaded from client application, which intrinsically has a slight impact on the measurements.
So, once MariaDB Client is installed, it is recommended to ensure that the server allows local file upload:
# Connect server instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl
# Verify that Aurora/RDS allows 'local file upload' (Value = ON)
SHOW VARIABLES LIKE 'local_infile';
-> +---------------+-------+
-> | Variable_name | Value |
-> +---------------+-------+
-> | local_infile | ON |
-> +---------------+-------+
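If the value is OFF, it can be enabled through the DB cluster parameter group; a sketch with the AWS CLI, assuming a custom (modifiable) parameter group named bench-aurora-mysql8 is attached to the cluster, since default groups cannot be edited:
# Enable 'local_infile' on the cluster parameter group
aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name bench-aurora-mysql8 \
  --parameters "ParameterName=local_infile,ParameterValue=1,ApplyMethod=immediate"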
A. Only if you made a fresh install of MariaDB Client applications, from the benchmark material archive:
Copy mysql_mariadb/Bench_MySQL8_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from MariaDB Client application.
Copy mysql_mariadb/Bench_MySQL8_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from MariaDB Client application.
Copy the following 6 files to one of the client dedicated EC2-Instances, wherever they are accessible from MariaDB Client application:
datasets/cities.csv
datasets/countries.csv
datasets/dialogues.csv
datasets/ipaddresses.csv
datasets/regions.csv
datasets/urls.csv
B. Next, create the benchmark data model using mariadb and the DDL script mysql_mariadb/Bench_MySQL8_DDL.sql previously copied:
# Create benchmark data model
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl < path_to_uploaded_file/Bench_MySQL8_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
# Connect server instance (Don't forget the --skip-ssl switch, as Aurora requires secured communication by default!)
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl
# Ensure all datastores in benchdb0 are available (All counts should be 0)
SELECT
(SELECT COUNT(1) FROM benchdb0.dialogues) AS dialogues_count,
(SELECT COUNT(1) FROM benchdb0.regions) AS regions_count,
(SELECT COUNT(1) FROM benchdb0.countries) AS countries_count,
(SELECT COUNT(1) FROM benchdb0.cities) AS cities_count,
(SELECT COUNT(1) FROM benchdb0.ipaddresses) AS ipaddresses_count,
(SELECT COUNT(1) FROM benchdb0.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | 0 | 0 | 0 | 0 | 0 | 0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row in set (0.062 sec)
# Ensure all datastores in benchdb1 -> benchdb4 are available
SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 0 |
-> +-----------------+
-> 1 row in set (0.009 sec)
I. DBS Server
There is currently no official repository for distributing the DBS packages, so the corresponding RPM must be downloaded and installed manually.
On the server dedicated instance, once the RPM is available:
# Install server RPM
sudo rpm -i dbsd-x64-0.99-beta-rc-1.rpm
When DBS Server is installed, it is configured with default restrictions for security reasons. To waive these restrictions, you must edit the /etc/dbs/dbs.conf file and change the following settings, as the purpose of this benchmark is not to evaluate security aspects:
# 1. Disable 'password policy' enforcement
[Instance]
...
passwordpol=false
# 2. Bind all system TCPIPv4 addresses
svctcpipv4addr=0.0.0.0
# 3. Allow up to 5 simultaneous sessions (Concurrent write test)
svcmaxsessions=5
Enable & start DBS Server service:
# 1. Enable DBS Server service
sudo systemctl enable dbsd
# 2. Start DBS Server service
sudo systemctl start dbsd
Verify that DBS Server is running:
# Check DBS Server service status
sudo systemctl status dbsd
Verify the addresses bound by DBS Server for listening:
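No listing is shown here, but the same check as for MySQL applies; DBS should appear bound on its default port 4404 (the dbsd process name is inferred from the service name):
# List addresses & ports DBS Server is listening on
sudo netstat -tulpn | grep LISTEN | grep dbsd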
A. From the benchmark material archive:
Copy dbs/Bench_DBS_DDL.sql to one of the client dedicated EC2-Instances, wherever it is accessible from DBS Client application.
Copy dbs/Bench_DBS_url_locX.sql, from 1 through 4, to their respective client dedicated EC2-Instance, wherever they are accessible from DBS Client application.
Copy the following 6 files to the server dedicated EC2-Instance, in the /var/dbs/data/lsf/ directory:
datasets/cities.csv
datasets/countries.csv
datasets/dialogues.csv
datasets/ipaddresses.csv
datasets/regions.csv
datasets/urls.csv
Change copied files ownership to ensure dbsd process is allowed to read them:
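A minimal sketch, assuming the RPM creates a dedicated dbs service account (adjust user and group to what the dbsd installation actually uses):
# Give the CSV files to the account dbsd runs as
sudo chown dbs:dbs /var/dbs/data/lsf/*.csv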
B. Next, create the benchmark data model using dbscli and the DDL script dbs/Bench_DBS_DDL.sql previously copied:
# Create benchmark data model
dbscli -h server_ip_address -u dbsa -p < path_to_uploaded_file/Bench_DBS_DDL.sql
C. Finally, verify that the data model is successfully created and complete:
Note: By running the following commands, the 'bench' schema is automatically mounted; this allows skipping this stage, which usually takes 0.5 to 2 seconds, during the test sequences below.
# Connect server instance
dbscli -h server_ip_address -u dbsa -p
# Ensure all datastores in bench schema are available (All counts should be 0)
SELECT
(SELECT COUNT(1) FROM bench.dialogues) AS dialogues_count,
(SELECT COUNT(1) FROM bench.regions) AS regions_count,
(SELECT COUNT(1) FROM bench.countries) AS countries_count,
(SELECT COUNT(1) FROM bench.cities) AS cities_count,
(SELECT COUNT(1) FROM bench.ipaddresses) AS ipaddresses_count,
(SELECT COUNT(1) FROM bench.urls) AS urls_count;
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | dialogues_count | regions_count | countries_count | cities_count | ipaddresses_count | urls_count |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> | 0 | 0 | 0 | 0 | 0 | 0 |
-> +-----------------+---------------+-----------------+--------------+-------------------+------------+
-> 1 row(s) affected (0.137 sec)
# Ensure all datastores in benchtbls1 -> benchtbls4 are available
SELECT COUNT(1) AS v_url_loc_count FROM bench.v_url_loc;
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 0 |
-> +-----------------+
-> 1 row(s) affected (0.158 sec)
Test Sequences
To execute the test sequences below, it is necessary to connect to the various EC2-instances via SSH. This obviously assumes that the applications involving the technology to test are up and running.
Note:
Test results are shown in the query outputs below and are summarised in the Benchmark Metrics Summary section.
During the setup of MySQL, MariaDB, and Aurora, several databases were created to segregate datastores in a way that aligns as closely as possible with the concept of DBS tablespaces. This approach helps avoid comparing apples to oranges in certain aspects.
You may notice minor differences in the execution of sequences in your environment. These discrepancies, ranging from a few milliseconds to several seconds, may be due to factors such as network latency, host machine utilisation on which the VMs are running, regional load/capacity, etc.
MySQL
Process Threads
Connect the server dedicated EC2-Instance and measure the number of process threads:
# Get MySQL Server instance number of process threads
ps -o nlwp $(pgrep -f mysqld)
-> NLWP
-> 38
Mass Import
From client dedicated EC2-Instance 5, connect server instance using mysql:
# Connect server instance
mysql -h server_ip_address -u root -p
Then, import the dataset sources in the following order:
dialogues.csv
# Import 'dialogues' dataset
LOAD DATA INFILE '/var/lib/mysql-files/dialogues.csv'
INTO TABLE benchdb0.dialogues
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(uuid,dlg_en);
-> Query OK, 120668 rows affected (1.12 sec)
-> Records: 120668 Deleted: 0 Skipped: 0 Warnings: 0
regions.csv
# Import 'regions' dataset
LOAD DATA INFILE '/var/lib/mysql-files/regions.csv'
INTO TABLE benchdb0.regions
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(region_code,@nparent_code,dlg_uuid)
SET parent_code = NULLIF(@nparent_code, '');
-> Query OK, 31 rows affected (0.02 sec)
-> Records: 31 Deleted: 0 Skipped: 0 Warnings: 0
countries.csv
# Import 'countries' dataset
LOAD DATA INFILE '/var/lib/mysql-files/countries.csv'
INTO TABLE benchdb0.countries
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(iso_2dg,iso_3dg,dlg_uuid,@npcountry_iso2dg,region_code)
SET pcountry_iso2dg = NULLIF(@npcountry_iso2dg, '');
-> Query OK, 252 rows affected (0.02 sec)
-> Records: 252 Deleted: 0 Skipped: 0 Warnings: 0
cities.csv
# Import 'cities' dataset
LOAD DATA INFILE '/var/lib/mysql-files/cities.csv'
INTO TABLE benchdb0.cities
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(uuid,country_iso2dg,dlg_uuid);
-> Query OK, 141786 rows affected (3.69 sec)
-> Records: 141786 Deleted: 0 Skipped: 0 Warnings: 0
ipaddresses.csv
# Import 'ipaddresses' dataset
LOAD DATA INFILE '/var/lib/mysql-files/ipaddresses.csv'
INTO TABLE benchdb0.ipaddresses
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(network_cidr,network_start,network_end,country_iso2dg,city_uuid);
-> Query OK, 4719740 rows affected (4 min 4.97 sec)
-> Records: 4719740 Deleted: 0 Skipped: 0 Warnings: 0
urls.csv
# Import 'urls' dataset
LOAD DATA INFILE '/var/lib/mysql-files/urls.csv'
INTO TABLE benchdb0.urls
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(url_hash,url_src,country_iso);
-> Query OK, 872238 rows affected (56.44 sec)
-> Records: 872238 Deleted: 0 Skipped: 0 Warnings: 0
Concurrent Write
From client dedicated EC2-Instance 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:
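The scripts are plain SQL batches; one simple way to launch them as simultaneously as possible is to pipe each file into the client on its respective instance (the path is a placeholder), for example on client 1:
# Run the INSERT batch for this client (url_loc2 to url_loc4 on clients 2 to 4)
mysql -h server_ip_address -u root -p < path_to_uploaded_file/Bench_MySQL8_url_loc1.sql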
# Ensure all INSERTs are executed (Total number must be 16,000 records)
mysql -h server_ip_address -u root -p -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 16000 |
-> +-----------------+
Collect time spent for all INSERTs to be executed as well as average time per record:
# Connect server instance
mysql -h host_ipv4_address -u root -p
# Collect time spent + average time per client source
SELECT
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 32.000 | 0.008 | 32.000 | 0.008 | 32.000 | 0.008 | 32.000 | 0.008 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (0.37 sec)
Lookup Query 1
# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> | 4719740 |
-> +----------+
-> 1 row in set (1.07 sec)
Lookup Query 2
# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr | region | country | city |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (1 min 0.32 sec)
Lookup Query 3
# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> | 91 |
-> +-------------+
-> 1 row in set (7.01 sec)
Lookup Query 4
# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country | match_count |
-> +---------------+-------------+
-> | United States | 91 |
-> +---------------+-------------+
-> 1 row in set (1.78 sec)
Server CPU Utilisation Peak
Using AWS CloudWatch, determine the server dedicated EC2-Instance CPU Utilisation Peak. As only MySQL server is running on the virtual machine, the EC2.CPUUtilisation metric can be reliably collected.
According to the above graph, the CPU usage peak reaches 36.7%.
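For reproducibility, the same peak can also be retrieved without the console; a sketch with the AWS CLI, where the instance ID and time window are placeholders (note that the metric's native name is CPUUtilization):
# Get the maximum CPU utilisation over the benchmark window (5-minute buckets)
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-xxxxxxxx \
  --statistics Maximum --period 300 \
  --start-time 2025-03-01T00:00:00Z --end-time 2025-03-01T06:00:00Z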
MariaDB
Process Threads
Connect the server dedicated EC2-Instance and measure the number of process threads:
# Get MariaDB Server instance number of process threads
ps -o nlwp $(pgrep -f mariadbd)
-> NLWP
-> 12
Mass Import
From client dedicated EC2-Instance 5, connect server instance using mariadb:
# Connect server instance
mariadb -h server_ip_address -u root -p
Then, import the dataset sources in the following order:
dialogues.csv
# Import 'dialogues' dataset
LOAD DATA INFILE '/var/lib/mysql/dialogues.csv'
INTO TABLE benchdb0.dialogues
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(uuid,dlg_en);
-> Query OK, 120668 rows affected (0.442 sec)
-> Records: 120668 Deleted: 0 Skipped: 0 Warnings: 0
regions.csv
# Import 'regions' dataset
LOAD DATA INFILE '/var/lib/mysql/regions.csv'
INTO TABLE benchdb0.regions
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(region_code,@nparent_code,dlg_uuid)
SET parent_code = NULLIF(@nparent_code, '');
-> Query OK, 31 rows affected (0.006 sec)
-> Records: 31 Deleted: 0 Skipped: 0 Warnings: 0
countries.csv
# Import 'countries' dataset
LOAD DATA INFILE '/var/lib/mysql/countries.csv'
INTO TABLE benchdb0.countries
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(iso_2dg,iso_3dg,dlg_uuid,@npcountry_iso2dg,region_code)
SET pcountry_iso2dg = NULLIF(@npcountry_iso2dg, '');
-> Query OK, 252 rows affected (0.010 sec)
-> Records: 252 Deleted: 0 Skipped: 0 Warnings: 0
cities.csv
# Import 'cities' dataset
LOAD DATA INFILE '/var/lib/mysql/cities.csv'
INTO TABLE benchdb0.cities
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(uuid,country_iso2dg,dlg_uuid);
-> Query OK, 141786 rows affected (1.918 sec)
-> Records: 141786 Deleted: 0 Skipped: 0 Warnings: 0
ipaddresses.csv
# Import 'ipaddresses' dataset
LOAD DATA INFILE '/var/lib/mysql/ipaddresses.csv'
INTO TABLE benchdb0.ipaddresses
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(network_cidr,network_start,network_end,country_iso2dg,city_uuid);
-> Query OK, 4719740 rows affected (5 min 46.031 sec)
-> Records: 4719740 Deleted: 0 Skipped: 0 Warnings: 0
urls.csv
# Import 'urls' dataset
LOAD DATA INFILE '/var/lib/mysql/urls.csv'
INTO TABLE benchdb0.urls
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(url_hash,url_src,country_iso);
-> Query OK, 872238 rows affected (26.156 sec)
-> Records: 872238 Deleted: 0 Skipped: 0 Warnings: 0
Concurrent Write
From client dedicated EC2-Instance 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:
# Ensure all INSERTs are executed (Total number must be 16,000 records)
mariadb -h server_ip_address -u root -p -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 16000 |
-> +-----------------+
Collect time spent for all INSERTs to be executed as well as average time per record:
# Connect server instance
mariadb -h host_ipv4_address -u root -p
# Collect time spent + average time per client source
SELECT
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 10.000 | 0.003 | 11.000 | 0.003 | 12.000 | 0.003 | 11.000 | 0.003 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (1.243 sec)
Lookup Query 1
# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> | 4719740 |
-> +----------+
-> 1 row in set (15.446 sec)
Lookup Query 2
# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr | region | country | city |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (2 min 31.558 sec)
Lookup Query 3
# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> | 91 |
-> +-------------+
-> 1 row in set (14.225 sec)
Lookup Query 4
# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country | match_count |
-> +---------------+-------------+
-> | United States | 91 |
-> +---------------+-------------+
-> 1 row in set (55.434 sec)
Server CPU Utilisation Peak
Using AWS CloudWatch, determine the server dedicated EC2-Instance CPU Utilisation Peak. As only MariaDB server is running on the virtual machine, the EC2.CPUUtilisation metric can be reliably collected.
According to the above graph, the CPU usage peak reaches 20.9%.
Aurora
Process Threads
There is no means to obtain such information, as RDS doesn't allow any access to the underlying system. The only information available comes from querying the active threadpool state, although it is not really comparable:
# Connect reader instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl
# To get Aurora active threadpool state, either execute this query...
SHOW STATUS WHERE variable_name LIKE 'Threads_%';
# ...or this one
SELECT COUNT(1) FROM INFORMATION_SCHEMA.PROCESSLIST;
-> A priori both results should give 1
Mass Import
From client dedicated EC2-Instance 5, connect writer instance using mariadb:
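Since the CSV files reside on the client, each import uses LOAD DATA LOCAL INFILE with client-side paths, but otherwise follows the same sequence as for MariaDB; a sketch for the first dataset (individual timings are not reproduced here):
# Connect writer instance with local file upload enabled
mariadb -h rds_writer_endpoint -u master_user -p --skip-ssl --local-infile=1
# Import 'dialogues' dataset from the client machine
LOAD DATA LOCAL INFILE 'path_to_uploaded_file/dialogues.csv'
INTO TABLE benchdb0.dialogues
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(uuid,dlg_en);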
Concurrent Write
From client dedicated EC2-Instance 1 through 4, run the corresponding script mysql_mariadb/Bench_MySQL8_url_locX.sql:
# Ensure all INSERTs are executed (Total number must be 16,000 records)
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl -e "SELECT COUNT(1) AS v_url_loc_count FROM benchdb0.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 16000 |
-> +-----------------+
Collect time spent for all INSERTs to be executed as well as average time per record:
# Connect server instance
mariadb -h rds_reader_endpoint -u master_user -p --skip-ssl
# Collect time spent + average time per client source
SELECT
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')), 3) AS total_sec_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc1'), 3) AS avg_per_record_1,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')), 3) AS total_sec_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc2'), 3) AS avg_per_record_2,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')), 3) AS total_sec_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc3'), 3) AS avg_per_record_3,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')), 3) AS total_sec_4,
FORMAT(TIMESTAMPDIFF(SECOND, (SELECT MIN(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), (SELECT MAX(ins_timestamp) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4')) / (SELECT COUNT(1) FROM benchdb0.v_url_loc WHERE file_source = 'url_loc4'), 3) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 25.000 | 0.006 | 25.000 | 0.006 | 25.000 | 0.006 | 25.000 | 0.006 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row in set (1.243 sec)
Lookup Query 1
# Count number of records in ipaddresses table
SELECT COUNT(1) FROM benchdb0.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> | 4719740 |
-> +----------+
-> 1 row in set (1.940 sec)
Lookup Query 2
# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM benchdb0.ipaddresses ip
INNER JOIN benchdb0.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN benchdb0.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN benchdb0.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN benchdb0.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN benchdb0.regions r ON r.region_code = c.region_code
INNER JOIN benchdb0.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE HEX(INET6_ATON('101.53.172.220')) BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr | region | country | city |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row in set (1 min 14.870 sec)
Lookup Query 3
# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM benchdb0.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> | 91 |
-> +-------------+
-> 1 row in set (0.543 sec)
Lookup Query 4
# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM benchdb0.urls u
INNER JOIN benchdb0.countries c ON c.iso_2dg = u.country_iso
INNER JOIN benchdb0.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country | match_count |
-> +---------------+-------------+
-> | United States | 91 |
-> +---------------+-------------+
-> 1 row in set (5.769 sec)
Server ACU & CPU Utilisation Peaks
Using AWS CloudWatch, determine the Aurora-Instance ACU & CPU Utilisation Peaks.
According to the above graph, the instance's CPU peaks reach 100% utilisation, both in terms of ACU and raw CPU usage.
DBS
Process Threads
Connect the server dedicated EC2-Instance and measure the number of process threads:
# Get DBS Server instance number of process threads
ps -o nlwp $(pgrep -f dbsd)
-> NLWP
-> 10
Mass Import
From client dedicated EC2-Instance 5, connect server instance using dbscli:
# Connect server instance (-R to avoid timeouts during the long import process)
dbscli -h server_ip_address -u dbsa -p -R 1800 --reconnect
Then, import the dataset sources in the following order:
Concurrent Write
From client dedicated EC2-Instance 1 through 4, run the corresponding script dbs/Bench_DBS_url_locX.sql:
# Ensure all INSERTs are executed (Total number must be 16,000 records)
dbscli -h server_ip_address -u dbsa -p -e "SELECT COUNT(1) AS v_url_loc_count FROM bench.v_url_loc;"
-> +-----------------+
-> | v_url_loc_count |
-> +-----------------+
-> | 16000 |
-> +-----------------+
Collect time spent for all INSERTs to be executed as well as average time per record:
# Connect server instance
dbscli -h server_ip_address -u dbsa -p
# Collect time spent + average time per client source
SELECT
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))) * 86400)) AS total_sec_1,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc1'))) * 86400) / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc1')) AS avg_per_record_1,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))) * 86400)) AS total_sec_2,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc2'))) * 86400) / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc2')) AS avg_per_record_2,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))) * 86400)) AS total_sec_3,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc3'))) * 86400) / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc3')) AS avg_per_record_3,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))) * 86400)) AS total_sec_4,
PRINTF('%.3f', ((JULIANDAY((SELECT MAX(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4')) - JULIANDAY((SELECT MIN(ins_timestamp) FROM bench.v_url_loc WHERE file_source = 'url_loc4'))) * 86400) / (SELECT COUNT(1) FROM bench.v_url_loc WHERE file_source = 'url_loc4')) AS avg_per_record_4;
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | total_sec_1 | avg_per_record_1 | total_sec_2 | avg_per_record_2 | total_sec_3 | avg_per_record_3 | total_sec_4 | avg_per_record_4 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> | 152.011 | 0.038 | 152.000 | 0.038 | 156.004 | 0.039 | 152.000 | 0.038 |
-> +-------------+------------------+-------------+------------------+-------------+------------------+-------------+------------------+
-> 1 row(s) affected (0.241 sec)
Lookup Query 1
# Count number of records in ipaddresses table
SELECT COUNT(1) FROM bench.ipaddresses;
-> +----------+
-> | COUNT(1) |
-> +----------+
-> | 4719740 |
-> +----------+
-> 1 row(s) affected (0.248 sec)
Lookup Query 2
# Retrieve geo-location data for a given IP
SELECT ip.network_cidr, d3.dlg_en AS region, d1.dlg_en AS country, d2.dlg_en AS city
FROM bench.ipaddresses ip
INNER JOIN bench.countries c ON c.iso_2dg = ip.country_iso2dg
INNER JOIN bench.cities cy ON cy.uuid = ip.city_uuid
INNER JOIN bench.dialogues d1 ON d1.uuid = c.dlg_uuid
INNER JOIN bench.dialogues d2 ON d2.uuid = cy.dlg_uuid
INNER JOIN bench.regions r ON r.region_code = c.region_code
INNER JOIN bench.dialogues d3 ON d3.uuid = r.dlg_uuid
WHERE INET_ATOH('101.53.172.220') BETWEEN ip.network_start AND ip.network_end
ORDER BY ip.network_end
LIMIT 1;
-> +-----------------+--------------+---------+--------+
-> | network_cidr | region | country | city |
-> +-----------------+--------------+---------+--------+
-> | 101.53.168.0/21 | Eastern Asia | Japan | Haneda |
-> +-----------------+--------------+---------+--------+
-> 1 row(s) affected (0.121 sec)
Lookup Query 3
# Execute 'LIKE' query to count all URLs matching a particular pattern
SELECT COUNT(1) AS match_count FROM bench.urls WHERE url_src LIKE '%ibm.com%';
-> +-------------+
-> | match_count |
-> +-------------+
-> | 91 |
-> +-------------+
-> 1 row(s) affected (0.279 sec)
Lookup Query 4
# Execute 'LIKE' JOINed query to count all URLs matching a particular pattern and return the country
SELECT d.dlg_en AS country, COUNT(*) AS match_count
FROM bench.urls u
INNER JOIN bench.countries c ON c.iso_2dg = u.country_iso
INNER JOIN bench.dialogues d ON d.uuid = c.dlg_uuid
WHERE url_src LIKE '%ibm.com%'
GROUP BY d.dlg_en;
-> +---------------+-------------+
-> | country | match_count |
-> +---------------+-------------+
-> | United States | 91 |
-> +---------------+-------------+
-> 1 row(s) affected (0.341 sec)
Server CPU Utilisation Peak
Using AWS CloudWatch, determine the server dedicated EC2-Instance CPU Utilisation Peak. As only DBS server is running on the virtual machine, the EC2.CPUUtilisation metric can be reliably collected.
According to the above graph, the CPU usage peak reaches 16.5%.
Benchmark Metrics Summary
The following table summarises the results from the benchmark test sequences across the various technologies tested:
◉ Most efficient ◎ Least efficient

+--------------------------------------------------------+-----------+-----------+-----------+-----------+
| Test Sequence                                          | MySQL     | MariaDB   | Aurora    | DBS       |
+--------------------------------------------------------+-----------+-----------+-----------+-----------+
| Process Threads                                        | ◎ 38      | 12        | -         | ◉ 10      |
| Mass Import (Total in Seconds)                         | ◉ 306.260 | 374.563   | ◎ 414.965 | 350.243   |
| Concurrent Write (Average Time per Record in Seconds)  | 0.008     | ◉ 0.003   | 0.006     | ◎ 0.038   |
| Lookup Query 1 (Seconds)                               | 1.070     | ◎ 15.446  | 1.940     | ◉ 0.248   |
| Lookup Query 2 (Seconds)                               | 60.320    | ◎ 151.558 | 74.870    | ◉ 0.121   |
| Lookup Query 3 (Seconds)                               | 7.010     | ◎ 14.225  | 0.543     | ◉ 0.279   |
| Lookup Query 4 (Seconds)                               | 1.780     | ◎ 55.434  | 5.769     | ◉ 0.341   |
| Max. CPU Peak                                          | 36.7%     | 20.9%     | ◎ 100.0%  | ◉ 16.5%   |
+--------------------------------------------------------+-----------+-----------+-----------+-----------+
Conclusion
Although DBS is currently an MVP solution, as mentioned at the beginning of this post, it stands up well when compared to established and well-known RDBMS like MySQL, MariaDB, and Aurora. In fact, the benchmark results go even further: on average, DBS is much faster than the alternatives tested.
It's worth noting, however, that development has primarily focused on optimising the engine for lookup queries, which is clearly reflected in the results. That said, an area of improvement has been identified in handling concurrent write I/O operations — a limitation that will be addressed in future versions.
When considering Green IT — a crucial and increasingly important topic — DBS also shines. It is able to deliver better performance than the alternatives while consuming fewer resources (Threads & CPU peak). This lower resource consumption translates into reduced system requirements, which in turn leads to lower energy consumption and operational costs.
Looking ahead, DBS development will continue to focus on enhancing its capabilities, including optimising concurrent write I/O and scaling. Given its performance and energy efficiency, we strongly believe (hope? :-)) that DBS has the potential to reshape the relational database landscape, offering a powerful and cost-effective alternative.
We encourage the community to test DBS, share feedback, and contribute to its evolution.