Library

1 Tb Firebird database: preliminary report

Dmitry Kuzmenko, last update 31-03-2014

Translations of this document: Portuguese Russian Chinese

Read our article about even bigger (1.7 Terabyte) database: More Details About 1.7 Terabyte database.

Subscribe to IBSurgeon Firebird\'s news and receive practical recommendations to improve Firebird performance.

 

Why to create terabyte Firebird database?

Many companies work with big Firebird databases and rely on them to support important business operations. Some of Firebird databases are already hundreds of gigabytes in size and continue to grow (see “Who's Big?” section), and it's easy to predict the moment when they become 2, 3 or 5 times bigger. So database administrators and vendors are interested to investigate Firebird behaviour with big databases and get some recommendations how to manage them.

 

Also the important reason we had in mind while creating 1Tb Firebird database was the final elimination of prevalent perception of Firebird as database engine for “small databases”. This myth seems to be dead now, but some analysts and journalists raise it from the grave regularly, and we hope to finish with this ridiculous perception at last.

Hardware

Firebird is well-known for its incredible scalability and this investigation has confirmed it once again. The initial purpose of this experiment was just creation of Firebird database 1Tb in size, so we used usual desktop computer:

 

Table 1: Hardware

 

Component

Parameters

CPU

AMD Athlon 64 x2 5200

RAM

4GB

Motherboard

MSI K9N Platinum

HDD1 (operation system and temp)

ST3160815AS, 160GB, SATA II

HDD2 (auxilary)

HDT721064SLA360, 640GB, SATA II

HDD2 (auxilary)

HDS728080PLA380, 80GB, SATA I

HDD3 (database)

ST31500341AS, 1.5TB, SATA II (Firmware CC1H)

 


In essence, we've put the 1.5Tb HDD into the one of our office desktops, without any other modifications. This HDD was formatted with cluster size 16Kb (the same as page size of database, as you can see below).

 

Software

As it's a desktop computer, operation system is Windows XP Professional SP3, 32bit. To actually perform the test we used the loader from TPC-based toolkit (download it from http://ibdeveloper.com/tests/tpc-c/, both binaries and sources are available).

We'd like to emphasize that loader inserts data as it would be inserted in real life scenario: records are being inserted and situated inside database (and at physical disk areas) into several master-detail-subdetail tables, not table by table.

 

Table 2: Software

 

Software

Version

Operation system

Windows XP Professional SP3, 32bit

Firebird

2.1.3 SuperServer (snapshot)

Loader

Custom loader from tpc-based test

Who's big?

There are several companies who handle significantly big Firebird databases. Here we’ve listed three of them as examples of real-world big Firebird databases in 3 key industries: retail, finance and medicine.

Bas-X

Bas-X (http://www.basx.com.au/, Australia) is a leading provider of enterprise information technologies to independent retailers, particularly multi-site operators and management groups. Bas-X is a true leader in Firebird-based development: two of their customers have Firebird databases with size more than 450Gb, and several others have databases more than 200Gb.

Interesting thing about Bas-X is that they offer Software-as-a-Service solutions and using Firebird as database management system for thousands of clients. This is definitely one of the brightest examples of real-life cloud computing offerings, and it uses Firebird, which is absolutely good enough for this hard job.

Watermark Technologies

Watermark Technologies (http://www.watermarktech.co.uk/, UK) is the great example of utilizing Firebird technology to serve enterprises in Finance and Government sectors.

Watermark Technologies produces software which uses Firebird for document management which includes indexed OCR for full text search. It is used by financial advisers, insurance companies and so on. There are several 300+Gb Firebird databases deployed at present.

Free Firebird licensing is one of the things which enables Watermark Technologies offer flexible subscription model for end customers, so they can avoid upfront payments and pay as they go.

Profitmed

Profitmed (http://www.profitmed.net/, Russia) joint stock company is one the largest Russian pharmaceutical distributors. They have relatively small database (only ~40Gb), but we decided to mention them as they have extremely high load in terms of simultaneous active connections, serving hundreds of small resellers and drug store shops across Russia. Though this database looks smaller than others, it contains very dense data inside: SKUs of medicines, warehouse movements are represented as numbers, and, thanks to Firebird data compression mechanism, these data consume very modest amount of disk storage.

 

Plan

We had very straightforward plan for this experiment:

 

  1. Create database and load it with 1Tb data, without indices
  2. Create primary keys and appropriate indices (so actually database size is more than 1Tb)
  3. Gather database statistics
  4. Run several SQL queries and estimate database performance

Database and Firebird server configuration

Database has page size 16384 bytes, the same as HDD cluster, to maximize disk throughout performance (to read/write 1 page at one I/O cycle).

In the Firebird configuration we have configured additional directory for temp space and point it to the disk with 640Gb (where ~300Gb was free).

Loading step

Data were loaded into this database by several steps. Computer was used during loading operations as usual desktop (we have MS Office, Firefox, IBAnalyst, etc – about 8-12 programs ran at the same time). If we would dedicate the hardware for this task only, probably it would be faster, so please consider these values only as a low-end example; they are definitely not the top results.

 

 

Table 3: Loading operations

 

Description

Value

Time to load

~70 hours

Total records inserted

6.2 billions

Average insertion speed

24500 records/second

Average record size

146 bytes (min 13 bytes, max – 600 bytes)

Transactions

646489

 

We spent ~4 days at loading, and after that we had Firebird database with exactly 1Tb size (i.e. 1 099 900 125 184 bytes).

Below you can see database growth and transactions dynamics in FBDataGuard Viewer:

 

 

Indices

We created indices one by one and counted their creations' time and appropriate size of temp file used for sorting.

The biggest index was created for table ORDER_LINE. Its primary key contains four fields (Smallint, Smallint, Integer and Smallint). Temp file for this sorting index was 182Gb, and final index size in the database is 29.3Gb.

It's interesting to see that even index for table with 3.8 billion records has depth = 3, because page size was 16384 bytes, so there is no overhead while searching data using primary key for this table.

Statistics

After that we have gathered database statistics. It took 7 hours 32 mins 45 secs.

We've put key statistics information into the one table and included some queries and time measurements:

 

Table 4: Consolidated statistics for 1Tb database

Table name

Record counts

Size, gb

Execution time of select count(*)

Index creation time

Tmp file size, Gb

Index size, Gb

WAREHOUSE

1240

0.002

0s

0

0

0.0

ITEM

100000

0.012

0.7s

-

-

0.0

DISTRICT

124000

0.017

0.7s

6

-

0.0

NEW_ORDER

111600000

32

20m 00s

23m 00s

4.56

0.8

CUSTOMER

372000000

224

-

41m 00s

-

2.6

customer_last

 

 

 

1h 52m 32s

12.4

2.3

fk_cust_ware

 

 

 

2h 10m 51s

-

2.3

HISTORY

372000000

32

-

-

-

-

ORDERS

372000000

25

32m 00s

45m 41s

15.2

2.5

STOCK

1240000000

404

-

3h 34m 44s

41.5

9.2

ORDER_LINE

3720051796

359

-

12h 6m 18s

182.0

29.3

 

Database statistics can be downloaded from here.

 

You can use free FBDataGuard Community Edition Viewer to interpret text data and see not only database performance metrics, but also CPU and memory consumption.

Queries

First of all, we've run select count(*) queries on several tables (see 4-th column in Table 4 above). As you know, due to the multi-version nature of Firebird select count(*) for whole table is an expensive operation for server because it requires visiting of every page, and experienced Firebird developers does not use select count(*), but we used it to demonstrate the overall performance ratio of database and hardware.

After select count queries we have run queries from real-life scenario and, to be honest, we were amazed with so good results. See for yourself:

 

Query

Statistics

Description

select w_id, w_name, c_id, c_last

from WAREHOUSE, customer

where c_w_id = w_id

PLAN JOIN (WAREHOUSE NATURAL, CUSTOMER INDEX (FK_CUST_WARE))

 

------ Performance info ------

Prepare time = 15ms

Execute time = 79ms

Avg fetch time = 6.08 ms

Current memory = 272 264 476

Max memory = 272 514 048

Memory buffers = 16 384

Reads from disk to cache = 82

Writes from cache to disk = 0

Fetches from cache = 3 648

Simple join of tables with 12400 and 372000000 records, no WHERE conditions. “Avg fetch time = 6.08 ms” is for fetching the first row.

select w_id, w_name, c_id, c_last

from WAREHOUSE, customer

where c_w_id = w_id and c_w_id = 10000

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))

 

------ Performance info ------

Prepare time = 16ms

Execute time = 78ms

Avg fetch time = 6.00 ms

Current memory = 272 266 148

Max memory = 272 514 048

Memory buffers = 16 384

Reads from disk to cache = 88

Writes from cache to disk = 0

Fetches from cache = 3 656

Join of the same tables with condition which forces selection of recent records. “Avg fetch time = 6.00 ms” is for fetching the first row.

select count(*)

from WAREHOUSE, customer

where c_w_id = w_id and c_w_id = 10000

 

Result = 30000

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))

 

------ Performance info ------

Prepare time = 0ms

Execute time = 453ms

Avg fetch time = 453.00 ms

Current memory = 272 263 844

Max memory = 272 514 048

Memory buffers = 16 384

Reads from disk to cache = 1 048

Writes from cache to disk = 0

Fetches from cache = 60 024

Count records for previous query

 

 

 

SELECT * FROM ORDER_LINE

WHERE OL_W_ID = 500

 

 

Plan

PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))

 

------ Performance info ------

Prepare time = 0ms

Execute time = 94ms

Avg fetch time = 7.23 ms

Current memory = 136 445 536

Max memory = 136 592 176

Memory buffers = 8 192

Reads from disk to cache = 150

Writes from cache to disk = 0

Fetches from cache = 2 402

 

Query to the biggest table (3.8B records). “Avg fetch time = 7.23 ms” is for fetching the first row.

 

 

 

 

Plan
PLAN (ORDER_LINE INDEX (ORDER_LINE_PK))

 
------ Performance info ------
Prepare time = 0ms
Execute time = 3s 438ms
Avg fetch time = 0.01 ms
Current memory = 136 445 496
Max memory = 136 592 176
Memory buffers = 8 192
Reads from disk to cache = 1 840
Writes from cache to disk = 0
Fetches from cache = 598 636

 

 

SELECT * FROM ORDER_LINE

WHERE OL_W_ID = 500

 

 

The same query to the biggest table (3.8B records), but at this time we have fetched all records (299245 records fetched).

 

 

select w_id, w_name, c_id, c_last

from WAREHOUSE, customer

where c_w_id = w_id and (c_w_id > 8000) and (c_w_id < 10000)

Plan

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))

 

------ Performance info ------

Prepare time = 0ms

Execute time = 125ms

Avg fetch time = 9.62 ms

Current memory = 272 270 824

Max memory = 272 514 048

Memory buffers = 16 384

Reads from disk to cache = 91

Writes from cache to disk = 0

Fetches from cache = 3 659

Join tables with 1240 records and 372M records.

select count(*)

from WAREHOUSE, customer

where c_w_id = w_id and (c_w_id > 8000) and (c_w_id < 10000)

 

Result = 59 970 000

Plan

PLAN JOIN (WAREHOUSE INDEX (WAREHOUSE_PK), CUSTOMER INDEX (FK_CUST_WARE))

 

------ Performance info ------

Prepare time = 0ms

Execute time = 13m 4s 718ms

Avg fetch time = 784 718.00 ms

Current memory = 272 268 532

Max memory = 272 514 048

Memory buffers = 16 384

Reads from disk to cache = 2 332 583

Writes from cache to disk = 0

Fetches from cache = 119 977 902

Count records for previous query

 

Summary

In this experiment Firebird shows the following results

 

  1. Undoubted ability to handle big databases. We are pretty sure that it's possible to create and use 32 Tb database on appropriate hardware, and Firebird will show the same high performance as it shows for smaller databases (i.e., 1Tb and below).

 

  1. Good scalability and amazingly small footprint. 1Tb database was created at usual desktop computer and, more important, it can be used to perform general queries: if you don't fetch millions of records, query’s speed is the same as it is for moderate size databases (10-15Gb).

 

This is not the end of this experiment: we intend to run some queries, gather additional statistics and publish more detailed report shortly. Please stay tuned.

 

Contacts

Send all your questions and enquiries to terabyte@ib-aid.com