Efficient Estimation of Volatility Using High Frequency Data


The limitations of volatilities computed with daily data, as well as simple statistical considerations, strongly suggest using intraday data to obtain accurate volatility estimates. Under a continuous-time arbitrage-free setup, the quadratic variation of prices would, in principle, allow us to construct an approximately error-free estimate of volatility by using data at the highest frequency available. Yet empirical data at very short time scales differ in many ways from arbitrage-free continuous-time price processes. For foreign exchange rates, the main difference originates in the incoherent structure of the price formation process. This market microstructure effect introduces a noisy component into the price process, leading to a strong overestimation of volatility when naive estimators are used. Therefore, to fully exploit the information contained in high-frequency data, this incoherent effect needs to be discounted. In this contribution, we investigate several unbiased estimators that take the incoherent noise into account. One approach is to pre-whiten the prices with a filter and then apply volatility estimators to the filtered series. Another is to define a volatility estimator directly on tick-by-tick price differences and include a correction term for the price formation effect. The properties of these estimators are investigated by Monte Carlo simulations. A number of important real-world effects are included in the simulated processes: realistic volatility and price dynamics, the incoherent effect, seasonalities, and random arrival times of ticks. Moreover, we investigate the robustness of the estimators with respect to changes in data frequency and to data gaps. Finally, we illustrate the behavior of the best estimators on empirical data.
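To make the overestimation concrete, here is a minimal Python sketch of the noise effect and of one textbook-style correction. It is not the estimator studied in the paper, whose exact form the abstract does not give. The sketch assumes a Gaussian random walk for the true log price, independent Gaussian observation noise standing in for the incoherent effect, and equally spaced ticks. The correction shown adds twice the first-order autocovariance of tick returns. All function names and parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_observed_prices(n_ticks, sigma, noise_sd, seed=0):
    """Simulate observed log prices: a random walk plus i.i.d. noise.

    Assumption for illustration: the 'incoherent' component is modelled
    as independent Gaussian noise added to each observed tick.
    """
    rng = random.Random(seed)
    true_price = 0.0
    observed = []
    for _ in range(n_ticks + 1):
        observed.append(true_price + rng.gauss(0.0, noise_sd))
        true_price += rng.gauss(0.0, sigma)
    return observed


def tick_returns(prices):
    """Tick-by-tick differences of the observed log prices."""
    return [b - a for a, b in zip(prices, prices[1:])]


def naive_realized_variance(prices):
    """Sum of squared tick returns; biased upward by the noise."""
    return sum(r * r for r in tick_returns(prices))


def corrected_realized_variance(prices):
    """Naive estimator plus twice the first-order autocovariance sum.

    Under i.i.d. noise the lag-1 product of returns has expectation
    minus the noise variance, which offsets the bias of the naive sum.
    """
    returns = tick_returns(prices)
    squared = sum(r * r for r in returns)
    lag_one = sum(r0 * r1 for r0, r1 in zip(returns, returns[1:]))
    return squared + 2.0 * lag_one


# Hypothetical parameters: noise standard deviation twice the tick volatility.
n_ticks, sigma, noise_sd = 20000, 1e-4, 2e-4
prices = simulate_observed_prices(n_ticks, sigma, noise_sd, seed=0)

true_iv = n_ticks * sigma ** 2  # variance of the noise-free process
naive = naive_realized_variance(prices)
corrected = corrected_realized_variance(prices)

print(f"true integrated variance: {true_iv:.3e}")
print(f"naive estimate:           {naive:.3e}")
print(f"corrected estimate:       {corrected:.3e}")
```

With these settings each squared return has expectation `sigma**2 + 2 * noise_sd**2`, so the naive sum is expected to come out several times larger than the true variance, while the corrected sum should land much closer to it. Real tick data adds seasonalities, irregular arrival times, and gaps, none of which this sketch attempts to model.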

Adrian Trapletti

Quant, software engineer, and consultant, mostly in the investment industry. Long-term contributor to and package author for the R Project for Statistical Computing.