Inovance – bollinger band feature analaysis using a random forest algorithm exchange rate usd to yen


Bollinger Bands are one of the more popular technical indicators with many traders using them to both trade the range as well as look for breakouts europe futures trading hours. However, what features and values should you really be looking at when you are using Bollinger Bands to trade?

In this article we will use a random forests, a powerful machine learning approach, to find what aspects of the Bollinger Bands are most important to a GBP/USD strategy on 4-hour charts.

Bollinger Bands, a technical trading tool developed by John Bollinger in the early 1980s, provide a relative measure of the range of the market decode binary. During times of high volatility the trading range will naturally be larger, while in times of low volatility the range will be smaller.

Bollinger Bands are a combination of three lines hex to binary file converter. The middle line is a simple moving average (usually 20 periods) with the upper line being 2 standard deviations of the close above the middle line and and the lower line being 2 standard deviations below the middle line.

When using the Bollinger Bands in any sort of systematic strategy, the three different lines present some questions and issues; namely, what single factor should you include? Do you want to look at where the current price is relative to the range? Maybe you are only concerned with the upper band for short trades and the lower band for long trades, or do you want to look at the total height of the range?

There are huge number of different values you could look at and this process, known as “feature” selection in the machine learning world, is incredibly important.

Choosing the correct feature that contains the most information relevant to your strategy can have a huge impact on the performance of your strategy.

Instead of needing to evaluate these features yourself, we can use a random forest, a powerful machine learning technique, to objectively evaluate these features for us.

Random forests are an ensemble approach that are based off the principle that a group of “weak learners” can be combined to form a “strong learner” funny quotes about life and love. Random forests start with building a large number of individual decision trees.

Decision trees enter an input at the top of the “tree” and send it down its branches, where each split represents varying levels of indicator values. (To learn more about decision trees, take a look at our previous post where we used a decision tree to trade Bank of America stock.)

Individual decision trees are seen as fairly weak learners, meaning they can very easily overfit the data and have trouble generalizing well to new data binary operation. Combined into “forests” with thousands of “trees”, these simplistic algorithms can form a very powerful modelling approach.

The success on a random forest is derived largely from the ability to create large amount of variability among the individual trees 1111 number meaning. This is accomplished by introducing a degree of randomness in two different ways:

Bagging, short for “bootstrap aggregating”, simply means to sample with replacement exchange rate pound us dollar. Instead of using the entire available dataset to build each tree, the data set is randomly sampled with each data point being available to be selected again butterfly quotes about life. For example, if we were to perform bagging on a training set consisting of the numbers 1 through 10, we could end up the following data set: 1 3 2 1 5 7 5 5 7 10

Instead of using every available indicator to build each tree, only a randomly chosen subset of indicators is used funny quotes about marriage. For instance, if we had 5 different features we were testing, only 3 would be used in each tree.

From these two sources of randomness we have created an entire forest of diverse trees binary to hexadecimal chart. Now for each data point, each tree is called to make a classification and a majority vote decides the final decision.

One major advantage of random forests is they are able to give a very robust measure of the performance of each indicator. With thousands of trees each built with a different bagged training set and subset of indicators, they tell which indicators were most important in deciding the class of the variable we are trying to predict, in this case the direction of the market.

We take advantage of this valuable property of random forests to figure out which features, derived from the Bollinger Bands, we should be using in our strategy.

First we have to decide what features to derive from the Bollinger Bands. This an area where you can get creative in coming up with fancy calculations and formulas but for now we’ll stick to 8 basic features:

Returns<-Delt(HLCxts$Close,k=1); Class<-ifelse(Returns>0,"Up","Down") #Calculate the percent change and the resultant class we are looking to predict, either an upward or downward move in the market

Features<-data.frame(Upper, Lower, Middle, Bollinger$pctB, PChangepctB, PChangeUpper, PChangeLower, PChangeMiddle) #Combine all of our features

colnames(FinalModelData)<-c("pctB","LowerDiff","UpperDiff","MiddleDiff","PChangepctB","PChangeUpper","PChangeLower","PChangeMiddle","Class") #Name the columns

Before we can build our random forest, we need to find the optimal number of indicators to use for each individual tree. Luckily, the random forest package we are using can help us out:

FeatureNumber<-tuneRF(FinalModelData[,-9],FinalModelData[,9],ntreeTry=100, stepFactor=1.5,improve=0.01, trace=TRUE, plot=TRUE, dobest=FALSE) #We are evaluating the features (columns 1 through 9) using the class (column 9) to find the optimal number of features per tree

RandomForest<-randomForest(Class~.,data=FinalModelData,mtry=2,ntree=2000,keep.forest=TRUE,importance=TRUE) #We are using all of the features to predict the class, with 2 features per tree, a forest of 2,000 trees, keeping the final forest and we want to measure the importance of each feature. Note: this may take a couple minutes to run.

The “mean decrease accuracy” measures how worse each model performs without each feature and the “mean decrease Gini” is a more complex mathematical function that is a measure of how pure the end of each tree branch is for each feature.

We can immediately see that the percent change in the %B value was the most important factor and, in general, looking at the percent change in the values were better than looking at only the distance between the price and the upper, lower, and middle lines.

Random forests are a very powerful approach that commonly outperform more sophisticated algorithms. They can be used for classification (predicting a category), regression (predicting a number), or with feature selection (like we saw here).

Feature selection, or deciding what factors to include in your strategy, is an incredibly important part of building any strategy and there are many techniques in machine learning focused on solving this problem.

With TRAIDE, we take care of the second step: once you have selected the features for your strategy, we use machine-learning algorithms to find the patterns for you.