mlfinlab: a tool for financial ML
Hello, Habr!
Today we will look at a wonderful library: mlfinlab.
If you've tried to apply machine learning to financial data, you've probably run into plenty of pitfalls, from noisy data to autocorrelation. mlfinlab is a library that implements advanced techniques from the book "Advances in Financial Machine Learning" by Marcos López de Prado. Instead of reinventing the wheel, you can use time-tested methods to solve hard problems in financial ML.
Let’s start with the installation. Nothing complicated:
pip install mlfinlab
Now we import the necessary modules:
import pandas as pd
import numpy as np
import mlfinlab
Getting the data
As an example, we will use one year of daily historical data for Apple shares (ticker: AAPL), downloaded with the yfinance library.
import yfinance as yf
ticker="AAPL"
data = yf.download(ticker, start="2020-01-01", end='2021-01-01', interval="1d")
prices = data['Close'].squeeze()  # newer yfinance versions return MultiIndex columns; squeeze() yields a Series
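Building bars
Standard time bars sample the market at fixed intervals regardless of how much trading takes place, so they oversample quiet periods and undersample busy ones. mlfinlab offers alternatives: bars that close after a fixed number of ticks, a fixed traded volume, or a fixed traded dollar value.
Let's create dollar bars with a threshold of 1 million dollars. mlfinlab's bar functions expect tick-style input with three columns (date_time, price, volume). Our daily data only approximates this: AAPL trades far more than $1 million a day, so here every bar will still be a single trading day. With real tick data the same call produces genuinely activity-based bars.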
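from mlfinlab.data_structures import standard_data_structures
# mlfinlab expects three columns: date_time, price, volume
ticks = pd.DataFrame({'date_time': data.index,
                      'price': data['Close'].squeeze().values,
                      'volume': data['Volume'].squeeze().values})
dollar_bars = standard_data_structures.get_dollar_bars(ticks, threshold=1e6)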
Each bar now closes after a fixed amount of traded dollar value rather than after a fixed amount of time.
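To make the idea concrete, here is a minimal pandas sketch of how dollar bars are formed, assuming a chronological DataFrame with date_time, price and volume columns (the layout mlfinlab expects). The function name dollar_bars and the output columns are our own choices; this illustrates the technique and is not mlfinlab's implementation.

```python
import pandas as pd


def dollar_bars(ticks: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Group chronological ticks (date_time, price, volume) into dollar bars.

    A bar closes as soon as the traded dollar value accumulated since the
    previous bar reaches `threshold`. Leftover ticks after the last full bar
    are dropped.
    """
    bars, rows = [], []
    dollars = 0.0
    for row in ticks.itertuples(index=False):
        rows.append(row)
        dollars += row.price * row.volume
        if dollars >= threshold:
            px = [r.price for r in rows]
            bars.append({
                'date_time': row.date_time,  # timestamp of the tick that closed the bar
                'open': px[0],
                'high': max(px),
                'low': min(px),
                'close': px[-1],
                'volume': sum(r.volume for r in rows),
                'dollar_value': dollars,
            })
            rows, dollars = [], 0.0  # start accumulating the next bar
    return pd.DataFrame(bars)
```

For example, four ticks priced 10, 11, 12, 13 with volumes 50, 50, 50, 100 and a threshold of 1000 produce two bars: one closing at the second tick ($1,050 traded) and one at the fourth ($1,900 traded).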
Sampling events
Identifying significant price moves is an important step in financial ML. We use the CUSUM filter, which flags the points where the cumulative price move since the last flagged point exceeds a threshold.
from mlfinlab.filters.filters import cusum_filter
threshold = 0.02  # 2% cumulative move (cusum_filter measures it in log returns)
events = cusum_filter(prices, threshold=threshold)
We now have the timestamps at which the cumulative upward or downward move since the previous event exceeded 2%. These are candidate entry or exit points.
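The symmetric CUSUM filter itself is short. Below is a minimal sketch following the description in López de Prado's book: it accumulates log returns separately on the upside and the downside, emits an event when either sum crosses the threshold, and resets that sum. The name symmetric_cusum is ours, and mlfinlab's cusum_filter may differ in details.

```python
import numpy as np
import pandas as pd


def symmetric_cusum(close: pd.Series, threshold: float) -> pd.DatetimeIndex:
    """Return the timestamps where the cumulative log-return drift exceeds +/- threshold."""
    log_ret = np.log(close).diff().dropna()
    s_pos, s_neg = 0.0, 0.0
    events = []
    for t, r in log_ret.items():
        s_pos = max(0.0, s_pos + r)  # upside run, floored at zero
        s_neg = min(0.0, s_neg + r)  # downside run, capped at zero
        if s_neg < -threshold:
            s_neg = 0.0
            events.append(t)
        elif s_pos > threshold:
            s_pos = 0.0
            events.append(t)
    return pd.DatetimeIndex(events)
```

Because the sums reset after each event, a slow drift of about 1% a day triggers an event only every few days, while a single 3% drop triggers one immediately.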
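Triple-barrier labeling
Now we need to label the events so the model has something to learn from. We use the triple-barrier method: each event gets an upper (profit-taking) barrier, a lower (stop-loss) barrier and a vertical (time) barrier, and the label depends on which barrier the price touches first.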
from mlfinlab.labeling.labeling import add_vertical_barrier, get_events, get_bins
from mlfinlab.util import get_daily_vol
# Daily volatility estimate: sets the width of the horizontal barriers
daily_vol = get_daily_vol(close=prices, lookback=50)
# Vertical barriers 5 days after each event
vertical_barriers = add_vertical_barrier(t_events=events, close=prices, num_days=5)
# Build the triple-barrier events
events = get_events(close=prices,
                    t_events=events,
                    pt_sl=[1, 1],  # profit-taking and stop-loss barriers, as multiples of the target
                    target=daily_vol,
                    min_ret=0.01,  # ignore events whose target is below 1%
                    num_threads=1,
                    vertical_barrier_times=vertical_barriers)
# Get the labels
labels = get_bins(events, prices)
Now we have labels that account not only for the size of the price move but also for how long it took to happen.
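For a single event, the labeling logic can be sketched as follows. This is a simplified illustration, not mlfinlab's code: it assumes symmetric horizontal barriers at plus or minus `width` (a simple return), a vertical barrier `max_bars` bars after the event, and it assigns 0 when the vertical barrier is hit first. That last choice is one common variant; other implementations label by the sign of the return at the vertical barrier instead. The function name is hypothetical.

```python
import pandas as pd


def triple_barrier_label(close: pd.Series, t0, width: float, max_bars: int) -> int:
    """Label one event starting at t0.

    +1 if the upper barrier (+width) is touched first,
    -1 if the lower barrier (-width) is touched first,
     0 if neither is touched within max_bars bars (vertical barrier).
    """
    start = close.index.get_loc(t0)
    path = close.iloc[start:start + max_bars + 1]
    returns = path / path.iloc[0] - 1.0  # return relative to the entry price
    for r in returns.iloc[1:]:
        if r >= width:
            return 1
        if r <= -width:
            return -1
    return 0
```

With prices 100, 101, 103, 99, 95, 96 and width=0.02, an event at the first bar hits the upper barrier (+3%) on the third bar and gets +1, while an event at 103 hits the lower barrier on the next bar (about -3.9%) and gets -1.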
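Applying meta-labeling
Meta-labeling separates two decisions: a primary model (or a simple rule) chooses the side of a trade, and a secondary model learns only whether to act on that signal. This is useful when profitable trades are much rarer than unprofitable ones, because the secondary model can filter out the primary model's false positives.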
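from mlfinlab.labeling.labeling import get_events, get_bins
# Placeholder primary model (not from the original): long above the 20-day mean, short below
side = np.sign(prices - prices.rolling(20, min_periods=1).mean())
meta_events = get_events(close=prices, t_events=events.index, pt_sl=[1, 1], target=events['trgt'], min_ret=0.01,
                         num_threads=1, vertical_barrier_times=vertical_barriers, side_prediction=side)
meta_labels = get_bins(meta_events, prices)  # bin = 1 if trading the primary side paid off, else 0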
Meta-labeling does not change the direction of the primary bets. It teaches the secondary model to recognize the conditions under which the primary signal should be trusted, which mainly improves precision.
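The meta-label itself follows a simple rule described in López de Prado's book: multiply the realized return by the side chosen by the primary model, and label the trade 1 if the result is positive, 0 otherwise. A minimal pandas sketch, assuming `side` holds the primary model's +1/-1 calls and `ret` the raw return from each event to its first barrier touch (both hypothetical inputs here):

```python
import pandas as pd


def make_meta_labels(side: pd.Series, ret: pd.Series) -> pd.Series:
    """1 where acting on the primary model's side would have been profitable, else 0."""
    side, ret = side.align(ret, join='inner')  # keep only events present in both
    return ((side * ret) > 0).astype(int)
```

A short call (-1) followed by a 2% drop is a correct signal and gets 1; a long call (+1) followed by a 3% drop gets 0.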
Accounting for autocorrelation
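Overlapping triple-barrier labels make neighbouring observations dependent (the same price moves feed several labels), and treating them as independent leads to overfitting. mlfinlab addresses this with sample weights based on label uniqueness; here we use time-decay weights, which build on uniqueness and additionally give older observations less weight.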
from mlfinlab.sample_weights import get_weights_by_time_decay
# Uniqueness-based weights with linear time decay (decay=0.5: the oldest observations get about half weight)
weights = get_weights_by_time_decay(events, prices, num_threads=1, decay=0.5)
When these weights are passed to a model or to a cross-validation routine, heavily overlapping (redundant) and older observations count for less.
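The idea behind these weights can be sketched in plain pandas, following the definitions in López de Prado's book: a label's uniqueness at a bar is 1 divided by the number of labels active at that bar, its average uniqueness is the mean over the label's lifespan, and time decay scales the cumulative uniqueness linearly so the newest observation gets weight 1 and the oldest roughly `decay`. The function names are ours and this is not mlfinlab's code; it assumes `t1` maps each label's start time to its end time on the same bar index.

```python
import pandas as pd


def average_uniqueness(t1: pd.Series, bar_index: pd.DatetimeIndex) -> pd.Series:
    """t1: label end time indexed by label start time. Returns mean uniqueness per label."""
    # Concurrency: how many labels are active at each bar
    concurrency = pd.Series(0, index=bar_index)
    for t0, t_end in t1.items():
        concurrency.loc[t0:t_end] += 1
    # Uniqueness at a bar = 1 / concurrency; average it over each label's lifespan
    return pd.Series({t0: (1.0 / concurrency.loc[t0:t_end]).mean() for t0, t_end in t1.items()})


def time_decay_weights(avg_u: pd.Series, decay: float = 0.5) -> pd.Series:
    """Linear decay over cumulative uniqueness, for 0 <= decay <= 1 (1 means no decay)."""
    cum_u = avg_u.sort_index().cumsum()
    slope = (1.0 - decay) / cum_u.iloc[-1]
    const = 1.0 - slope * cum_u.iloc[-1]  # equals `decay`
    return const + slope * cum_u  # newest label gets exactly 1
```

For three labels spanning bars 0-2, 1-3 and 4, the first two share bars 1 and 2, so each has average uniqueness 2/3, while the isolated last label has uniqueness 1.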
Training the model
It's time to train the model. We use XGBoost, because why not?
import xgboost as xgb
from sklearn.model_selection import train_test_split
# Prepare the data: the closing price serves as a single toy feature
X = prices.loc[labels.index].to_frame()
# binary:logistic needs 0/1 targets, while get_bins labels by the sign of the return (-1/1)
y = (labels['bin'] > 0).astype(int)
# Split chronologically: shuffling would leak future information into the training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=False)
# Build DMatrix objects for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
# Model parameters
params = {
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
}
# Train the model
bst = xgb.train(params, dtrain, num_boost_round=100)
Let’s look at the quality of our model.
from sklearn.metrics import classification_report
# Predict probabilities and threshold them at 0.5
y_pred = bst.predict(dtest)
y_pred_binary = (y_pred > 0.5).astype(int)
# Print the report
print(classification_report(y_test, y_pred_binary))
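To check that the model has not overfit, we use purged k-fold cross-validation: training observations whose labels overlap the test fold are removed, and an embargo drops a few observations right after it.
from mlfinlab.cross_validation.cross_validation import PurgedKFold, ml_cross_val_score
# ml_cross_val_score needs an sklearn-style classifier, not a trained Booster
clf = xgb.XGBClassifier(n_estimators=100)
cv_gen = PurgedKFold(n_splits=5, samples_info_sets=events.loc[X.index, 't1'], pct_embargo=0.01)
scores = ml_cross_val_score(clf, X, y, cv_gen, sample_weight=weights.loc[X.index].values)  # keyword name varies across mlfinlab versions
print('Mean CV score:', np.mean(scores))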
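The purging logic can be sketched without mlfinlab. This is a simplified illustration under stated assumptions, not PurgedKFold's implementation: `t1` is sorted by label start time, folds are contiguous blocks, any training label whose interval [start, end] overlaps the test window is purged, and the `embargo` observations right after the test fold are also dropped. The generator name is hypothetical.

```python
import numpy as np
import pandas as pd


def purged_kfold_indices(t1: pd.Series, n_splits: int = 5, pct_embargo: float = 0.0):
    """Yield (train_idx, test_idx) positions.

    t1: label end time indexed by label start time, sorted by start time.
    """
    n = len(t1)
    embargo = int(n * pct_embargo)
    starts = t1.index
    for test_idx in np.array_split(np.arange(n), n_splits):
        test_start = starts[test_idx[0]]
        test_end = t1.iloc[test_idx].max()
        # Purge: drop labels whose [start, end] interval overlaps the test window
        overlaps = (starts <= test_end) & (t1 >= test_start).to_numpy()
        train_mask = ~overlaps
        # Embargo: also drop the observations right after the test fold
        last = test_idx[-1]
        train_mask[last + 1:last + 1 + embargo] = False
        train_mask[test_idx] = False
        yield np.flatnonzero(train_mask), test_idx
```

With six daily labels that each end one day after they start and three folds, the first fold tests on positions 0 and 1 and trains only on 3, 4 and 5: position 2 starts on the day label 1 ends, so it is purged.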
Conclusion
We walked through the main features of mlfinlab and saw how the library makes life easier.
Where to go next?
- Study Fractionally Differentiated Features to create stationary time series.
- Try Bet Sizing for capital management.
- Explore Clustering Algorithms for detecting hidden patterns.
You can read more about the library here.
On October 28, an open class will be held on the topic "Building a trading agent based on reinforcement learning algorithms." Participants will learn how to build a financial market model and how to create and train a trading agent using a specialized framework. If you are interested, sign up for the lesson on the "ML for Financial Analysis" course page.