Data Type
The following table summarizes the atomic data types that DolphinDB supports:
Data Type Name | Data Type ID | Examples | Data Type Symbol | Size | Category | Range |
---|---|---|---|---|---|---|
VOID | 0 | NULL | | 1 | Void | |
BOOL | 1 | 1b, 0b, true, false | b | 1 | Logical | 0~1 |
CHAR | 2 | 'a', 97c | c | 1 | Integral | -2^7+1~2^7-1 |
SHORT | 3 | 122h | h | 2 | Integral | -2^15+1~2^15-1 |
INT | 4 | 21 | i | 4 | Integral | -2^31+1~2^31-1 |
LONG | 5 | 22l | l | 8 | Integral | -2^63+1~2^63-1 |
DATE | 6 | 2013.06.13 | d | 4 | Temporal | |
MONTH | 7 | 2012.06M | M | 4 | Temporal | |
TIME | 8 | 13:30:10.008 | t | 4 | Temporal | |
MINUTE | 9 | 13:30m | m | 4 | Temporal | |
SECOND | 10 | 13:30:10 | s | 4 | Temporal | |
DATETIME | 11 | 2012.06.13 13:30:10 or 2012.06.13T13:30:10 | D | 4 | Temporal | [1901.12.13T20:45:53, 2038.01.19T03:14:07] |
TIMESTAMP | 12 | 2012.06.13 13:30:10.008 or 2012.06.13T13:30:10.008 | T | 8 | Temporal | |
NANOTIME | 13 | 13:30:10.008007006 | n | 8 | Temporal | |
NANOTIMESTAMP | 14 | 2012.06.13 13:30:10.008007006 or 2012.06.13T13:30:10.008007006 | N | 8 | Temporal | |
FLOAT | 15 | 2.1f | f | 4 | Floating | Sig. Fig. 06-09 |
DOUBLE | 16 | 2.1 | F | 8 | Floating | Sig. Fig. 15-17 |
SYMBOL | 17 | | S | 4 | Literal | |
STRING | 18 | "Hello" or 'Hello' or `Hello | W | | Literal | |
UUID | 19 | 5d212a78-cc48-e3b1-4235-b4d91473ee87 | | 16 | Literal | |
FUNCTIONDEF | 20 | def f1(a,b) {return a+b;} | | | System | |
HANDLE | 21 | file handle, socket handle, and db handle | | | System | |
CODE | 22 | <1+2> | | | System | |
DATASOURCE | 23 | | | | System | |
RESOURCE | 24 | | | | System | |
ANY | 25 | (1,2,3) | | | Mixed | |
COMPRESS | 26 | | | 1 | Integral | -2^7+1~2^7-1 |
ANY DICTIONARY | 27 | {a:1,b:2} | | | Mixed | |
DATEHOUR | 28 | 2012.06.13T13 | | 4 | Temporal | |
IPADDR | 30 | 192.168.1.13 | | 16 | Literal | |
INT128 | 31 | e1671797c52e15f763380b45e841ec32 | | 16 | Integral | -2^127+1~2^127-1 |
BLOB | 32 | | | | Literal | |
COMPLEX | 34 | | | 16 | | |
POINT | 35 | | | 16 | | |
DURATION | 36 | 1s, 3M, 5y, 200ms | | 8 | System | |
Note:
1. SYMBOL is a special STRING type.
2. ANY DICTIONARY is the data type in DolphinDB for JSON.
3. DATEHOUR can only be generated with function datehour.
4. The DURATION type can be generated with function duration or by combining an integer with a unit of time (case sensitive): y, M, w, d, B, H, m, s, ms, us, ns. The range of a DURATION value is -2^31+1~2^31-1. If a data type overflow occurs, the data is treated as a NULL value. The DURATION type indicates a time interval and can be used in the following functions: bar, wj (pwj), interval, temporalAdd, and dailyAlignedBar.
5. DolphinDB uses the IEEE 754 standard for the data types DOUBLE and FLOAT. If a data type overflow occurs, the data is treated as a NULL value.
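The DATETIME range listed in the table matches what a signed 4-byte count of seconds since 1970.01.01T00:00:00 would give if the smallest value is set aside for NULL. That storage model is an inference from the Size and Range columns, not something this page states. The sketch below is Python rather than DolphinDB script, and `datetime_bounds` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Assumption: DATETIME behaves like a signed 32-bit number of seconds
# counted from 1970-01-01, with -2^31 kept aside as the NULL value.
EPOCH = datetime(1970, 1, 1)

def datetime_bounds():
    """Return the earliest and latest instants under that assumption."""
    lowest = EPOCH + timedelta(seconds=-2**31 + 1)
    highest = EPOCH + timedelta(seconds=2**31 - 1)
    return lowest, highest

lowest, highest = datetime_bounds()
print(lowest.isoformat())   # 1901-12-13T20:45:53
print(highest.isoformat())  # 2038-01-19T03:14:07
```

Both printed values agree with the Range column for DATETIME.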
Type check
Use functions typestr and type to check data types. The function typestr returns a string; the function type returns an integer.
$ typestr 3l;
LONG
$ type 3l;
5
$ x=3;
$ if(type(x) == INT){y=10};
$ y;
10
Data range
The ranges of the integral data types are listed in the table above. For each of them, the minimum allowed value minus 1 represents the corresponding NULL value. For example, -128c is a NULL CHAR. For more about NULL values, see Null Value Manipulation.
$ x=-128c;
$ x;
00c
$ typestr x;
CHAR
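The rule above can be checked arithmetically for each integral type. The following is a minimal Python model, not DolphinDB script. It assumes two's-complement storage with the sizes from the table, and `integral_range` is a hypothetical helper name.

```python
# Sizes in bytes, taken from the Size column of the data type table.
SIZES = {"CHAR": 1, "SHORT": 2, "INT": 4, "LONG": 8}

def integral_range(size_bytes):
    """Return (null_value, minimum, maximum) for a signed integer type.

    Assumption: the smallest two's-complement value is reserved for NULL,
    so the usable minimum is one above it.
    """
    bits = 8 * size_bytes
    null_value = -2 ** (bits - 1)
    return null_value, null_value + 1, 2 ** (bits - 1) - 1

for name, size in SIZES.items():
    null_value, minimum, maximum = integral_range(size)
    print(f"{name}: NULL={null_value}, range={minimum}~{maximum}")
```

For CHAR this gives NULL=-128 and a range of -127~127, which matches the -128c example.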
Data type symbols
A data type symbol declares the data type of a constant. In the example below, the number 3 without a data type symbol is stored in memory as an INT by default. To store it as a floating-point number, declare it as 3f (FLOAT) or 3F (DOUBLE).
$ typestr 3;
INT
$ typestr 3f;
FLOAT
$ typestr 3F;
DOUBLE
$ typestr 3l;
LONG
$ typestr 3h;
SHORT
$ typestr 3c;
CHAR
$ typestr 3b;
BOOL
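The choice between the f and F symbols matters for precision, because FLOAT occupies 4 bytes and DOUBLE occupies 8. The Python sketch below (not DolphinDB script) round-trips a value through a 4-byte IEEE 754 encoding to show the effect. It assumes only the IEEE 754 layout mentioned in the notes, and `as_float32` is a hypothetical helper name.

```python
import struct

def as_float32(x):
    """Round x to the nearest 4-byte IEEE 754 value and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

single = as_float32(2.1)   # what a 4-byte float can hold
double = 2.1               # Python floats are 8-byte IEEE 754 values

print(single)                  # close to 2.1, but not equal to it
print(double)
print(abs(single - double))    # rounding error of the 4-byte encoding
```

Values such as 3.0 that are exactly representable survive the round trip unchanged, while 2.1 picks up a small error in the 4-byte form.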
Symbol and String
In some circumstances it might be optimal to save strings as SYMBOL types in DolphinDB. SYMBOL types are stored as integers in DolphinDB to allow more efficient sorting and comparison. Therefore, SYMBOL types could potentially improve operating performance and save storage space. On the other hand, mapping strings to integers (hashing) takes time and the hash table consumes memory.
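The trade-off described above is typical of dictionary encoding in general. The following Python sketch illustrates the idea only; it is not DolphinDB's implementation, and `encode_symbols` is a hypothetical helper name.

```python
def encode_symbols(values):
    """Dictionary-encode strings: return (codes, dictionary).

    Each distinct string is stored once in `dictionary`; `codes` holds
    one small integer per input value that indexes into it.
    """
    code_of = {}      # string -> integer code (this lookup table costs memory)
    dictionary = []   # integer code -> string
    codes = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return codes, dictionary

tickers = ["IBM", "C", "MS", "IBM", "GOOG", "C", "IBM"]
codes, dictionary = encode_symbols(tickers)
print(codes)        # [0, 1, 2, 0, 3, 1, 0]
print(dictionary)   # ['IBM', 'C', 'MS', 'GOOG']

# An equality filter now compares integers instead of strings.
target = dictionary.index("IBM")
mask = [code == target for code in codes]
print(mask)
```

With many repeated values the dictionary stays small and the integer codes dominate storage. With few repeats the dictionary grows almost as large as the data, which is the cost the paragraph above warns about.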
The following rules can help you decide whether to use SYMBOL types:
- Avoid using SYMBOL types if the data will not be sorted, searched, or compared.
- Avoid using SYMBOL types if there are few duplicate values.
Two specific cases:
- Stock tickers in a trades or quotes table should use SYMBOL types because a stock usually has a large number of rows in these tables, and because stock tickers are frequently searched and compared.
- Descriptive fields should not use SYMBOL types because descriptions seldom repeat and are rarely searched, sorted, or compared.
Example 1: Sorting a symbol vector with 3 million records is about 40 times faster than sorting a string vector of the same size.
$ n=3000000
$ strs=array(string,0,n)
$ strs.append!(rand(`IBM`C`MS`GOOG, n))
$ timer sort strs;
Time elapsed: 482.027 ms
$ n=3000000
$ syms=array(symbol,0,n)
$ syms.append!(rand(`IBM`C`MS`GOOG, n))
$ timer sort syms;
Time elapsed: 12.001 ms
Example 2: Comparing a symbol vector with 3 million records is almost 15 times as fast as comparing a string vector of the same size.
$ timer(100){strs>`C};
Time elapsed: 4661.26 ms
$ timer(100){syms>`C};
Time elapsed: 322.655 ms
Symbol vector creation
(1) With function array
$ syms=array(symbol, 0, 100);
// create an empty symbol array;
$ typestr syms;
FAST SYMBOL VECTOR
$ syms.append!(`IBM`C`MS);
$ syms;
["IBM","C","MS"]
(2) With type conversion
$ syms=`IBM`C`MS;
$ typestr syms;
STRING VECTOR
//converting to a symbol vector;
$ sym=syms$SYMBOL;
// symbol conversion can only be applied to a string vector
$ typestr sym;
FAST SYMBOL VECTOR
$ typestr syms;
STRING VECTOR
(3) With function rand
$ syms=`IBM`C`MS;
$ symRand=rand(syms, 10);
// generate a random SYMBOL vector
$ symRand;
["IBM","IBM","IBM","MS","C","C","MS","IBM","C","MS"]
$ typestr symRand;
FAST SYMBOL VECTOR
Note that the rand function takes a string vector and generates a symbol vector. It does not change the data type of any other input. This exception is intentional: when users generate a random vector from a string vector, in most cases they want a symbol vector.