Example 10. Calculate pairwise correlation with tick data



The programming language in DolphinDB is very expressive. In this example, we start with tick-by-tick stock trade data in a day. We would like to calculate pairwise correlation of the 500 most liquid stocks based on minute level stock returns in the following 4 steps:


1. Select the stock tickers (Symbol) of the 500 most liquid stocks.



num=500

syms = (exec sum(Trade_Volume) as totalVolume from trade where time.second() between 09:30:00 : 15:59:59 group by Symbol order by totalVolume desc).Symbol[0:num]



2. Construct a matrix of the minute level volume-weighted stock price. Each column represents a stock and each row is a minute. Please note the use of the keyword "exec" to create a matrix instead of a table and "pivot by" to rearrange the matrix.



priceMatrix = exec wavg(Trade_Price, Trade_Volume) as wavp from trade where Symbol in syms, Time.second() between 09:30:00 : 15:59:59 pivot by Time.minute() as minute, Symbol



3. Use the template function each to apply the function ratios on each column of the stock price matrix. The result is a stock return matrix with the same dimensions as the stock price matrix.



retMatrix = each(def(x):ratios(x)-1, priceMatrix)



4. Use the template function cross on the stock return matrix to calculate a 500*500 stock return correlation matrix.



corrMatrix = cross(corr, retMatrix, retMatrix)



Now we have generated the pairwise correlation matrix of the 500 most liquid stocks. To get the 10 most correlated stocks for each stock:



mostCorrelated = select * from table(corrMatrix).rename!(`sym`corrSym`corr) context by sym having rank(corr,false) between 1:10



"context by" specifies the groups within which certain functions will be applied. The use of "context by" and "having" together selects only the rows that satisfy the specified conditions after "having". The selected rows can be entire groups if "having" is followed by aggregate functions, or certain members in groups if "having" is followed by a non-aggregate function.


To get the 10 most correlated stocks for XOM:



select * from mostCorrelated where sym="XOM" order by corr desc


Download source code here.