Example 2. Save data in partitioned database




Partitioning makes large tables more manageable by enabling users to manage and access subsets of data quickly and efficiently, while maintaining the integrity of a data collection. Maintenance operations on subsets of data are also more efficient as these operations target only the required data instead of the entire table.



Example 2a: save a sequentially partitioned database table


If we need to load a table larger than the system memory, we can use a partitioned table. The following script splits the input text file USstocks.csv into 8 partitions based on the order of rows, and then loads the partitions into the database in parallel. The result is a distributed table USstocks saved under the directory corresponding to the database db. As a guideline for choosing the number of partitions, the size of each partition should not exceed 1/4 of the system memory.



db = database("c:/DolphinDB/Data/seqdb", SEQ, 8)

USstocks = loadTextEx(db, "USstocks",, "C:/DolphinDB/Data/USstocks.csv");




Example 2b: save a range partitioned database table


We can use ranges on a column of a table to partition the table. The following script saves the text file USstocks.csv as a partitioned table in DolphinDB. The partitioning is based on 4 ranges on the column of TICKER.



db=database("C:/DolphinDB/Data/rangedb1", RANGE,  `A`F`N`T`ZZZZ)

USstocks = loadTextEx(db, "USstocks",`TICKER, "C:/DolphinDB/Data/USstocks.csv");



The following example saves table t as a partitioned table pt in the distributed database db with a range partition on the column ID.



n=1000000

ID=rand(10, n)

x=rand(1.0, n)

t=table(ID, x);


db=database("C:/DolphinDB/Data/rangedb2", RANGE,  0 5 10)

pt = db.createPartitionedTable(t, `pt, `ID)

pt.append!(t);


ID x

-- -


pt=loadTable(db,`pt)

select count(*) from pt;


count

-------

1000000



In the example above, the database db has 2 partitions: [0,5) and [5,10). The saved table pt also has 2 partitions.


Example 2c: save a value partitioned database table


We can make a partition of a table for each element of a column. The following example saves table t as a partitioned table pt in the distributed database db with a value partition.


n=1000000

month=take(2000.01M..2016.12M, n)

x=rand(1.0, n)

t=table(month, x);


db=database("C:/DolphinDB/Data/valuedb", VALUE, 2000.01M..2016.12M)

pt = db.createPartitionedTable(t, `pt, `month)

pt.append!(t);


month x

----- -


pt=loadTable(db,`pt)

select count(x) from pt;


count_x

-------

1000000



The example above defines a database db with 204 partitions. Each partition is a month between January 2000 and December 2016.



Example 2d: save a hierarchically partitioned database table


A database can have 2 or 3 levels of partitions.


n=1000000

ID=rand(100, n)

dates=2017.08.07..2017.08.11

date=rand(dates, n)

x=rand(10.0, n)

t=table(ID, date, x);


dbDate = database(, VALUE, 2017.08.07..2017.08.11)

dbID=database(, RANGE, 0 50 100)

db = database("C:/DolphinDB/Data/hierDB", HIER, [dbDate, dbID])

pt = db.createPartitionedTable(t, `pt, `date`ID)

pt.append!(t);


ID date x

-- ---- -


pt=loadTable(db,`pt)

select count(x) from pt;


count_x

-------

1000000



The example above defines a database db with 2 levels of partitions. The first level of partition is of the value domain and has 5 partitions for 5 days. The second level of partition is of the range domain. Each partition in the first level is further partitioned into 2 partitions based on ID.


Download source code here.