Orla: A data flow programming system for very large time series

Abstract

Analyzing tick-by-tick financial time series requires programs that can handle several million data points. For this purpose we have developed a data flow programming framework called “Orla”. The basic processing unit in Orla is a “block”, and blocks are connected to form a “network”. During execution, the “data” flow through the network and are processed as they pass through each block. The main advantages of Orla are that there is no limit to the size of the data sets, and that the same program works both with historical data and in real-time mode. To tame the diversity of financial time series, the Orla data structure is specified through a BNF description called SQDADL, and the Orla data are expressions in this language. For storage, the time series are written to a “tick warehouse”, which is configured entirely by the SQDADL description. Queries to the tick warehouse are SQDADL expressions, and the repository returns the matching time series. In this way, we achieve a seamless integration between storage and processing, including real-time mode. Currently, our tick warehouse contains 20,000 “elementary” time series. In this paper, we provide a brief overview of Orla and present a few examples of actual statistical analyses computed with Orla.
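The block-and-network idea described in the abstract can be illustrated with a minimal sketch. This is not Orla's API: the block names (`source`, `price_filter`, `moving_average`), the `(timestamp, price)` tick layout, and the `run_network` helper are all hypothetical, chosen only to show how data can flow through connected blocks one tick at a time. Python generators stand in for blocks here.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple

# Hypothetical tick layout: (timestamp, price). Not Orla's data format.
Tick = Tuple[float, float]


def source(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Source block: emits ticks one at a time.

    The iterable may be a historical file reader or a live feed;
    downstream blocks cannot tell the difference.
    """
    for tick in ticks:
        yield tick


def price_filter(stream: Iterable[Tick], min_price: float) -> Iterator[Tick]:
    """Filter block: drops ticks whose price is below min_price."""
    for timestamp, price in stream:
        if price >= min_price:
            yield (timestamp, price)


def moving_average(stream: Iterable[Tick], window: int) -> Iterator[Tick]:
    """Averaging block: emits the mean of the last `window` prices.

    Only `window` prices are held in memory, regardless of stream length.
    """
    recent = deque(maxlen=window)
    for timestamp, price in stream:
        recent.append(price)
        yield (timestamp, sum(recent) / len(recent))


def run_network(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Connects the blocks into a small network: source -> filter -> average."""
    return moving_average(price_filter(source(ticks), min_price=0.0), window=2)


if __name__ == "__main__":
    historical = [(1.0, 10.0), (2.0, 12.0), (3.0, -1.0), (4.0, 14.0)]
    for tick in run_network(historical):
        print(tick)
```

Because each block pulls one tick at a time and keeps only a bounded buffer, memory use in this sketch does not grow with the length of the series, and the same `run_network` call accepts either a stored list or an unbounded live iterator.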

Publication
In Proceedings of the Second International Workshop on Distributed Statistical Computing
Adrian Trapletti
PhD, CEO

Quant, software engineer, and consultant, mostly in the investment industry. Long-term contributor to, and package author for, the R Project for Statistical Computing.
