On measuring autocorrelation, heavy tails and other statistics (Vol 1)

Be aware: math ahead. This is the first part of a discussion of the distribution of financial data, the pitfalls of working with such data, and possible solutions when estimating the associated statistics.

This article is part one of three in a discussion about the distribution of financial data and how to work with it. In this part we discuss in detail, from a mathematical point of view, some pitfalls that arise when working with financial data, as well as the applicability (or not) of classical statistical methods to such data. In the second part, we will talk about possible solutions to the difficulties described in this part. Finally, in the third part, we will present possible Python implementations of the approach described in the second part, together with examples and applications of the methodology.

A brief introduction

Let’s say you’re working with financial data; most often this means the returns of a certain asset. Let’s use the classic definition of the return $r_t$ of an asset at time $t$, which is computed from the price $P_t$ of the asset at time $t$ and the price $P_{t-1}$ one step earlier. Gold, oil, Bitcoin, etc. can act as the asset.
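
As a minimal sketch of this step, the snippet below computes returns from a list of prices. It assumes the log-return convention $r_t = \log(P_t / P_{t-1})$, which is one common choice (the simple return $(P_t - P_{t-1}) / P_{t-1}$ is another). The function name `log_returns` and the price values are made up for illustration.

```python
import math


def log_returns(prices):
    """Return the log returns log(P_t / P_{t-1}) of a price series.

    Assumption: the log-return convention is used; prices must be positive.
    """
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    # Pair each price with its successor and take the log of the ratio.
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Hypothetical prices, for illustration only.
prices = [100.0, 102.0, 99.0, 101.0]
returns = log_returns(prices)
print(returns)
```

A series of n prices yields n - 1 returns, and log returns add up over time: their sum equals the log of the last price divided by the first.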

Figure: (1) Bitcoin price data; (2) returns calculated from the price data; (3) distribution of the returns.

What properties does such a time series of returns have? In the literature, the empirical properties characteristic of returns on financial assets are usually called stylized facts, and the following key ones are highlighted:

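  1. [Efficient market hypothesis] Absence of linear dependencies and autocorrelations: $\operatorname{Corr}(r_t, r_{t+h}) \approx 0$ for lags $h \geq 1$, where $r_t$ is the return at time $t$.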

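  2. [Non-linear dependencies] The presence of non-linear dependencies and volatility clustering, which is usually described by a high correlation of non-linear functions of the returns: $\operatorname{Corr}(f(r_t), f(r_{t+h})) > 0$ for a non-linear $f$ such as $f(x) = |x|$ or $f(x) = x^2$.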

  3. [Heavy tails] Heavy tails of the distribution: $P(|r_t| > x) = x^{-\alpha} L(x)$, where $L$ is a function that varies slowly at infinity and $\alpha > 0$ is the tail index.
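
The tail index in the third fact can be estimated from data. One standard tool is the Hill estimator, sketched below; it is brought in here as an illustration and is not a method taken from this article. The function name `hill_tail_index`, the choice of `k` (the number of largest observations used) and the simulated Pareto sample are all assumptions.

```python
import math
import random


def hill_tail_index(values, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    ordered = sorted((abs(v) for v in values), reverse=True)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = ordered[k]  # the (k+1)-th largest absolute observation
    if threshold <= 0:
        raise ValueError("threshold must be positive; choose a smaller k")
    # Average log-excess of the k largest observations over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


# Simulated check on exact Pareto data with a known tail index of 3.
random.seed(1)
sample = [random.paretovariate(3.0) for _ in range(20000)]
alpha_hat = hill_tail_index(sample, k=2000)
print(alpha_hat)
```

On real returns the estimate depends noticeably on `k`, so in practice it is usually inspected over a range of `k` values rather than read off at a single one.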

Task. Suppose you have a sample of the returns of some asset over a period of time. Based on this data, you want to assess how efficient the market is over that period, as well as “measure” volatility clustering.

If you use the classical approach, you will most likely compute the sample autocorrelations (of the returns $r_t$ for efficiency, and of a non-linear transformation such as $|r_t|$ or $r_t^2$ for volatility clustering) and then, relying on the normality of the limiting distribution, construct a statistical estimate, test a hypothesis, or build a confidence interval.
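
The classical procedure just described can be sketched as follows. This is an illustration under stated assumptions, not code from the article: it uses the mean-centered lag correlation and the textbook band of plus or minus 1.96 divided by the square root of n, which presumes i.i.d. data and asymptotic normality. The names `lag_correlation` and `classical_band` and the Gaussian toy data are hypothetical.

```python
import math
import random


def lag_correlation(x, h):
    """Mean-centered sample correlation between x[t] and x[t + h]."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    if den == 0:
        raise ValueError("series is constant")
    num = sum((x[t] - mean) * (x[t + h] - mean) for t in range(n - h))
    return num / den


def classical_band(n, z=1.96):
    """Half-width of the usual 95% band, valid only under asymptotic normality."""
    return z / math.sqrt(n)


# Light-tailed toy data (Gaussian), for illustration only.
random.seed(0)
returns = [random.gauss(0.0, 0.01) for _ in range(2000)]
rho1 = lag_correlation(returns, 1)                        # linear dependence
rho1_abs = lag_correlation([abs(r) for r in returns], 1)  # volatility clustering
band = classical_band(len(returns))
print(rho1, rho1_abs, band)
```

A value outside the band would classically be read as significant autocorrelation; the rest of this part explains why that reading becomes unreliable when the tails are heavy.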

But is such an approach reliable when the distribution has heavy tails? In this part of the article, we will look at this in detail.

Problems of classical approaches when working with “heavy-tailed” data

In this section we will see that the sample autocovariance and autocorrelation have non-standard statistical properties that make the classical approaches to detecting and measuring the dependencies from items 1 and 2 above unreliable and poorly applicable.

The problem of moments of the return distribution

Consider property 3 of the stylized facts (heavy tails). It is convenient to assume that there is some lower bound $x_m > 0$ above which the power law holds exactly; the distribution is then described by the Pareto law. Recall that the Pareto distribution has the following distribution function and density:

$$F(x) = 1 - \left(\frac{x_m}{x}\right)^{\alpha}, \qquad f(x) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha + 1}}, \qquad x \geq x_m.$$

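In this case, the moments are given by the following equalities:

$$E[X^k] = \frac{\alpha\, x_m^{k}}{\alpha - k} \ \text{ for } k < \alpha, \qquad E[X^k] = \infty \ \text{ for } k \geq \alpha.$$

From this it immediately follows that the mean $E[X]$ is finite only for $\alpha > 1$, and the variance is finite only for $\alpha > 2$. Empirical research typically finds $\alpha \in (2, 4)$ for most developed markets, while smaller values of $\alpha$ are reported for emerging markets.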
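
To make the moment conditions concrete, here is a small sketch that evaluates the moments of a Pareto distribution in closed form, using the standard formula $E[X^k] = \alpha x_m^k / (\alpha - k)$ for $k < \alpha$ and infinity otherwise. The function names `pareto_moment` and `pareto_variance` and the parameter name `x_min` are chosen here for illustration.

```python
import math


def pareto_moment(alpha, k, x_min=1.0):
    """k-th raw moment of a Pareto(alpha, x_min) law; infinite when k >= alpha."""
    if alpha <= 0 or x_min <= 0:
        raise ValueError("alpha and x_min must be positive")
    if k >= alpha:
        return math.inf
    return alpha * x_min ** k / (alpha - k)


def pareto_variance(alpha, x_min=1.0):
    """Variance of a Pareto(alpha, x_min) law; infinite when alpha <= 2."""
    if alpha <= 2:
        return math.inf
    mean = pareto_moment(alpha, 1, x_min)
    return pareto_moment(alpha, 2, x_min) - mean ** 2


# With alpha = 3 the variance exists but the fourth moment does not;
# with alpha = 1.5 the variance is already infinite.
for alpha in (1.5, 3.0, 5.0):
    print(alpha, pareto_moment(alpha, 1), pareto_variance(alpha), pareto_moment(alpha, 4))
```

The fourth moment matters for what follows: the classical limit theory for sample autocovariances needs it, and it is infinite whenever the tail index is 4 or less.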

Conclusion 1: The heavy tails of the distribution of returns make classical statistics unreliable because many moments (and sometimes even the first) are not defined in this case.

The problem of convergence of sample autocorrelations

Davis and Mikosch (1998) obtained results on the convergence of sample autocovariance and autocorrelation functions for regularly varying random processes. In this section, we consider several cases of convergence of the sample autocovariances and autocorrelations for a process $(r_t)$ (which, according to the third of the stylized facts, has regularly varying tails), depending on the tail index $\alpha$.

Before proceeding directly to the description of convergences, let us define the sample autocovariance and autocorrelation functions:

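Definition: For a stationary process $(r_t)$ observed at times $t = 1, \dots, n$, the sample autocovariance function is the function (written here without centering by the sample mean, as in Davis and Mikosch)

$$\gamma_n(h) = \frac{1}{n} \sum_{t=1}^{n-h} r_t\, r_{t+h}, \qquad h \geq 0.$$

Definition: For a stationary process $(r_t)$, the sample autocorrelation function is the function

$$\rho_n(h) = \frac{\gamma_n(h)}{\gamma_n(0)}, \qquad h \geq 1.$$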
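
A direct transcription of the two definitions might look like the sketch below. It assumes the version without mean-centering, in which the sample autocovariance at lag h is the sum of products x[t] * x[t + h] divided by n; if a mean-centered variant is intended, subtract the sample mean from the series first. The function names are hypothetical.

```python
def sample_autocovariance(x, h):
    """Sample autocovariance at lag h: sum of x[t] * x[t + h], divided by n."""
    n = len(x)
    if not 0 <= h < n:
        raise ValueError("lag h must satisfy 0 <= h < len(x)")
    return sum(x[t] * x[t + h] for t in range(n - h)) / n


def sample_autocorrelation(x, h):
    """Sample autocorrelation at lag h: autocovariance at h over that at lag 0."""
    gamma0 = sample_autocovariance(x, 0)
    if gamma0 == 0:
        raise ValueError("series is identically zero")
    return sample_autocovariance(x, h) / gamma0


# A perfectly alternating toy series has strongly negative lag-1 correlation.
series = [1.0, -1.0, 1.0, -1.0]
print(sample_autocovariance(series, 1), sample_autocorrelation(series, 1))
```

Dividing by n rather than by n - h at every lag is the usual convention; it slightly shrinks the estimates at large lags.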


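Consider the convergence of these functions for different values of $\alpha$. Below, $a_n$ is a normalizing sequence with $n\, P(|r_t| > a_n) \to 1$ (so $a_n$ grows roughly like $n^{1/\alpha}$), $\gamma(h)$ and $\rho(h)$ are the population autocovariance and autocorrelation, and the results hold under the technical conditions of Davis and Mikosch:

  1. $\alpha \in (0, 2)$. Then the following convergences take place:

    $$\left(n a_n^{-2}\, \gamma_n(h)\right)_{h=0,\dots,m} \xrightarrow{d} (V_h)_{h=0,\dots,m}, \qquad \left(\rho_n(h)\right)_{h=1,\dots,m} \xrightarrow{d} \left(V_h / V_0\right)_{h=1,\dots,m},$$

    and the random vector $(V_0, \dots, V_m)$ is stable with index $\alpha/2$.

  2. $\alpha \in (2, 4)$. Then the following convergences take place:

    $$\left(n a_n^{-2} \left(\gamma_n(h) - \gamma(h)\right)\right)_{h=0,\dots,m} \xrightarrow{d} (V_h)_{h=0,\dots,m}, \qquad \left(n a_n^{-2} \left(\rho_n(h) - \rho(h)\right)\right)_{h=1,\dots,m} \xrightarrow{d} \gamma^{-1}(0) \left(V_h - \rho(h) V_0\right)_{h=1,\dots,m},$$

    and the random vector $(V_0, \dots, V_m)$ is stable with index $\alpha/2$.

  3. $\alpha > 4$. Then the following convergences take place:

    $$\left(\sqrt{n} \left(\gamma_n(h) - \gamma(h)\right)\right)_{h=0,\dots,m} \xrightarrow{d} (G_h)_{h=0,\dots,m}, \qquad \left(\sqrt{n} \left(\rho_n(h) - \rho(h)\right)\right)_{h=1,\dots,m} \xrightarrow{d} \gamma^{-1}(0) \left(G_h - \rho(h) G_0\right)_{h=1,\dots,m},$$

    and the random vector $(G_0, \dots, G_m)$ has a multivariate normal distribution.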

It can be seen from the relations above that the limiting distribution of the sample autocovariances is normal only when $\alpha > 4$. When $\alpha < 4$, the limiting distribution is stable with parameter $\alpha/2$ (in the first case with $\alpha/2 < 1$). This widens the confidence intervals. It is also important to note that in cases 1 and 2 the rate of convergence is significantly slower than $\sqrt{n}$.
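
To get a feel for how much slower the convergence is, the sketch below compares the order of the normalizing factor for the sample autocovariances across tail indices. It rests on a simplifying assumption that is not spelled out in the text: the heavy-tailed normalization is taken to be $n a_n^{-2}$ with $a_n \approx n^{1/\alpha}$, ignoring slowly varying factors, which gives $n^{1 - 2/\alpha}$ for $\alpha < 4$ against $\sqrt{n}$ for $\alpha > 4$. The function name `autocovariance_scaling` is hypothetical.

```python
import math


def autocovariance_scaling(n, alpha):
    """Order of the normalizing factor for sample autocovariances.

    Assumption: a_n is approximated by n ** (1 / alpha), so that
    n * a_n ** -2 equals n ** (1 - 2 / alpha); slowly varying terms are ignored.
    """
    if alpha <= 0 or alpha in (2, 4):
        raise ValueError("alpha must be positive and different from 2 and 4")
    if alpha > 4:
        return math.sqrt(n)           # classical square-root-of-n rate
    return n ** (1.0 - 2.0 / alpha)   # heavy-tailed rate


n = 10 ** 6
for alpha in (1.5, 2.5, 3.0, 3.5, 5.0):
    print(alpha, autocovariance_scaling(n, alpha))
```

Under this approximation, a million observations with a tail index of 3 give a normalizing factor of about 100 instead of the classical 1000, so the estimation error shrinks roughly ten times more slowly than the classical theory suggests. For a tail index below 2 the factor is smaller than 1, which reflects that the sample autocovariances themselves grow without bound.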

Conclusion 2: Sample autocovariances and autocorrelations do not always converge to a normal distribution, and the rate of convergence is often (depending on the tail index) slower than in the classical case.

In this first part of the discussion, we have seen that the classical approaches to estimating the statistics of the return distribution are often unusable because of the heavy tails of the distribution. This motivates the search for a more robust and efficient replacement for the classical approach. Such an approach exists, and we will talk about it in the next part of the article. In many ways, the further discussion will be based on the results obtained by Ibragimov et al. (2021).

Thanks for reading!
