ISSN : 1598-7248 (Print)
ISSN : 2234-6473 (Online)
Industrial Engineering & Management Systems Vol.17 No.3 pp.479-496
DOI : https://doi.org/10.7232/iems.2018.17.3.479

Prediction of Stock Market Using an Ensemble Learning-based Intelligent Model

Ph.D. Student, University of Qom, Faculty of Engineering, Department of Information Technology, Qom, Iran
Iran University of Science and Technology, School of Computer Engineering, Tehran, Iran
Corresponding Author, E-mail: b_minaei@iust.ac.ir
January 28, 2018 May 26, 2018 July 27, 2018

ABSTRACT

AI-based models have shown that the stock market is predictable despite its uncertain and fluctuating nature. Research in this field has mostly dealt with predicting the next-step price, and less attention has been paid to predicting the next price movement. In practice, however, using prediction results for decision-making requires considering the predicted direction of stock movement along with the predicted stock price. Based on a wide search of the literature, this paper is the first to consider the two criteria of direction and price simultaneously for the prediction of the stock price. The proposed model has two stages and is built on ensemble learning and meta-heuristic optimization algorithms. The first stage predicts the direction of the next price movement. In the second stage, this prediction and the other input variables form a new training dataset from which the stock price is predicted. At each stage, genetic algorithm (GA) optimization and particle swarm optimization (PSO) are applied to optimize the results. Evaluation on real stock price data indicates that the proposed model has higher accuracy than other models used in the literature.

1. INTRODUCTION AND LITERATURE REVIEW

Prediction is a process based on historical data, and accurate results support better policy-making for the future (Campbell and Thompson, 2007). In finance, numerous cases need careful prediction (Tkáč and Verner, 2016). Prediction of financial time series is an important challenge, in which researchers try to extract patterns from data to predict the next event (Schwert, 1989). There are several hypotheses about the predictability of the stock market. The efficient market hypothesis states that market prices fully reflect all available information and that price fluctuations result from new information. Under this hypothesis, it should be impossible to outperform the overall market through expert stock selection or market timing, and the only way an investor can obtain higher returns is by chance or by purchasing riskier investments (Fama, 1970). In an efficient market, if the expectations and information of all market participants are well reflected in prices, price fluctuations remain unpredictable. Another hypothesis compatible with the efficient market hypothesis is the random walk (RW), which says that changes in stock market prices are random and thus not predictable (Fama, 1995).

In recent years, the application of artificial intelligence in finance has strengthened the idea that the market might not always be completely efficient or move randomly, and that future prices can be extracted from historical data by means of various techniques (Cervelló-Royo et al., 2015; Enke and Thawornwong, 2005; Patel et al., 2015). Because financial time series are fundamentally complex, noisy, dynamic, non-linear, non-parametric, and chaotic (Si and Yin, 2013), stock market prediction is a challenging issue for researchers.

Models that predict the stock market from historical data have been classified in different ways: one classification divides them into linear and non-linear models, and another into statistical and machine learning models. A suitable approach is to divide them into intelligent and classic models. Classic prediction assumes that the future value of the price follows the linear trend of past values; autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroscedasticity (GARCH), and regression belong to this class (Wang et al., 2011b). Artificial neural networks (ANNs), fuzzy logic, support vector machines (SVMs), hybrid models, and ensemble learning (EL) models belong to the intelligent class (Cavalcante et al., 2016). Unlike the classic models, these models can capture non-linear relationships between the input variables without information about the statistical distribution of the inputs (Lu et al., 2009). Comparisons show that intelligent models, by overcoming the limitations of linear models, extract patterns from data better and predict with higher accuracy (Adebiyi et al., 2014). For this reason, most recent studies on stock market prediction have focused on intelligent models (Tkáč and Verner, 2016), which are also used in this paper.

According to Atsalakis and Valavanis (2009), who surveyed about 150 papers on stock market prediction with intelligent models, artificial neural networks (ANNs) are the most frequently applied technique. After reviewing more than 400 published papers in this field, Tkáč and Verner (2016) concluded that ANNs performed better than other models. Despite the complexity of stock market prediction, it has been shown that ANNs with only one hidden layer can model a complex system with the desired accuracy (Chauvin and Rumelhart, 1995).

Many studies use ANNs to predict stock direction or stock price. In this paper, the direction is defined as the decrease (negative) or increase (positive) of the stock price relative to its previous value. These studies usually follow one of the two approaches. For example, Kara et al. (2011) used two models, SVM and ANN, for the prediction of direction. To predict stock price, Ticknor (2013) proposed a Bayesian regularized artificial neural network (BRNN), and Wang and Wang (2015) proposed a stochastic time effective function neural network (STNN).

Although ANNs have increased prediction accuracy relative to classic models, they suffer from problems such as getting trapped in local optima and over-fitting, which limit prediction accuracy. Studies have shown that ANNs can be combined with other models to create a hybrid model, and that the hybrid model achieves higher prediction accuracy than simple ANNs (Zhang and Wu, 2009). Therefore, one suitable way to improve prediction accuracy is to use combined models. Hybrid models combine two or more simple models in order to exploit the advantages of each and cover one another's shortcomings. To improve accuracy, various models in the literature combine ANNs with other techniques; one approach is to combine ANNs with classical models.

By combining an ANN as a non-linear model and ARIMA as a linear model, Zhang (2003) used the advantages of both, and Khashei et al. (2009) proposed a combination of ARIMA, ANN, and fuzzy logic. Adhikari and Agrawal (2014) proposed a hybrid model built from RW for exploring linear patterns and means of two NN models for uncovering non-linear patterns. The results show that the hybrid models have higher accuracy than single models.

Another type of hybrid model combines neuro-fuzzy techniques with a meta-heuristic optimization algorithm. Chang and Liu (2008) applied a Takagi-Sugeno (TSK) type fuzzy model and used simulated annealing (SA) to train the fuzzy system parameters. Esfahanipour and Aghamiri (2010) employed a neuro-fuzzy system and fuzzy clustering to extract rules, and obtained better results than Chang and Liu. Qiu et al. (2016) selected effective variables by means of a fuzzy model and applied them to three models: BPNN, GA-BPNN, and SA-BPNN.

A further type of hybrid model combines ANNs with a meta-heuristic optimization algorithm, which is ordinarily used to improve the training of the NN and overcome its problems. Hassan et al. (2007) proposed a combination of hidden Markov model (HMM), NN, and GA. Asadi et al. (2012) combined NN and Levenberg-Marquardt (LM) training, and improved the training of the network with GA.

In these works, the proposed models were compared with the individual simple models they combine, and the hybrid models were found to have better accuracy in the evaluations.

Models built by combining several techniques can remove restrictions of simple models and improve accuracy. However, because of the problems inherent in any individual model, whether simple or hybrid, such models cannot be expected to reach the highest possible accuracy. In recent years, EL has attracted attention as a way to increase the accuracy of regression models, and the results obtained have demonstrated its efficiency in different applications (Adhikari, 2015; Dietterich, 2000).

The related literature widely shows that EL algorithms perform better than individual models across a wide spectrum of applications and scenarios: the results are more accurate, more reliable, and more stable (Adhikari, 2015; Andrawis et al., 2011; Dietterich, 2000; Jose and Winkler, 2008). It has also been shown that the necessary and sufficient condition for an ensemble learner to be more accurate than its base learners is that the base learners are accurate (better than a random learner) and diverse (Hansen and Salamon, 1990). These models have been developed under general titles such as multiple classifier systems, committee of classifiers, ensemble based systems, mixture of experts, multiple class combinations, neural network associations, and bootstrap aggregation (bagging) (Breiman, 1996).

Hashem (1997), Adhikari (2015), and Mabu et al. (2015) used linear combinations in which the weight of each base learner is determined through ANNs. Li et al. (2014) relied on the majority voting rule, and Rather et al. (2015) proposed genetic algorithms for determining the weights of the base learners. Also, Andrawis et al. (2011) and Jose and Winkler (2008) proposed various methods, such as mean, trimmed mean, and winsorized mean, for the aggregation of models. Table 1 shows papers related to the use of EL algorithms.

1.1. Problem Statement

The most important problem in stock market prediction is accuracy. Because of the inherent complexity of the problem, most models are limited in this regard. As mentioned before, studies have mostly addressed next-step price prediction, and less attention has been paid to direction prediction. The first group includes models that predict the stock price at the next time step; criteria such as mean squared error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are used to evaluate them. Several works reviewed above fall in this category (Chang and Liu, 2008; Hassan et al., 2007; Lin et al., 2017; Maknickienė and Maknickas, 2016; Ticknor, 2013; Wang and Wang, 2015). The second group contains models that predict the stock direction at the next time step; criteria such as direction, hit ratio, and accuracy are used to evaluate them (Kara et al., 2011; Mabu et al., 2015). The results of a model that predicts price only and is judged only by the first group of criteria do not suffice for decision-making and trading in the real world, because a model with a suitable MAPE may still lead to trading losses when used in practice. To prevent this problem and make the results usable in practice, the stock price in the next time interval must be predicted while taking the predicted direction of stock movement into account (de Oliveira et al., 2013). This is explained next by an example.

The data shown in Figure 1 are real and predicted values of the Dow Jones index price for 11 consecutive days. The calculated MAPE of the predicted values is 0.18%, which is a suitable figure for this dataset. As shown in Figure 1, 11 real and predicted points are marked in the chart. Suppose a trader uses these predictions to trade daily over 10 days. The real value on day 1 is 21100 and the value predicted for day 2 is 21115, which indicates an upward movement. The trader therefore buys as predicted, but the real price on day 2 is 21020 and he loses \$80. Likewise, if the trader trades on each of the following days based on that day's price and the predicted price, he faces a total loss of \$250 within 10 days; the daily profits and losses are shown in Figure 1.

In Figure 2, another prediction chart is presented, in which the prediction model has considered the future direction of stock movement and predicted prices accordingly. For this second set of predictions, the calculated MAPE is 0.28%, which is about 50% more than the previous error. If daily trading follows these predictions, which are based on both predicted price and direction, the trader gains \$100 of profit after 10 days.

The two charts differ in that the first model predicts the next price based on the previous direction, without considering the next predicted direction. The second model predicts the next price by adding the next predicted direction to the same input variables; despite the greater MAPE, the profit, which is the trader's aim, increases.

This profit is gained because the direction (increase or decrease of price) is predicted properly. Based on the example, it can therefore be argued that prediction judged by a single group of criteria (price or direction) is insufficient, and that profit should be pursued through price prediction that considers the predicted direction.
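The first trade of the example can be reproduced with a small sketch. The function name `daily_trade_pnl` and the trading rule are assumptions made for illustration: the text only describes the buy case, so treating a predicted fall as the mirror-image trade is a guess, not something the paper states.

```python
def daily_trade_pnl(today_price, predicted_next, actual_next):
    """Profit or loss of one daily trade that follows the predicted direction.

    Assumption (not stated in the text): a predicted rise means buying one
    unit today and selling it tomorrow; a predicted fall means the opposite
    trade (selling today and buying back tomorrow).
    """
    if predicted_next > today_price:      # predicted upward move: buy
        return actual_next - today_price
    if predicted_next < today_price:      # predicted downward move: sell
        return today_price - actual_next
    return 0                              # no predicted move: no trade


# Day 1 to day 2 of the example: real 21100, predicted 21115, real 21020.
print(daily_trade_pnl(21100, 21115, 21020))  # -80
```

The predicted price is off by less than half a percent, yet the trade loses money because the predicted direction is wrong, which is the point the example makes about MAPE alone.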

To the best knowledge of the present authors, no solution has been presented for this problem so far. This is the first study to address it by considering the two criteria of direction and price simultaneously for the prediction of stock price, so that the prediction results can be used in a real trading system. The proposed model has two stages built on EL and meta-heuristic optimization algorithms. In the first stage, the next direction (increase or decrease of price) is predicted, and it is used for the prediction of price in the second stage.

2. DEVELOPMENT OF AN INTELLIGENT ENSEMBLE-BASED MODEL FOR STOCK PRICE PREDICTION

Various studies of training methods show that no specific training algorithm is the most accurate for all predictions. To overcome this problem, EL algorithms have been widely developed. The main motivation for these developments is to reduce the error.

The basic assumption of this methodology is that the probability of an EL model mispredicting an unknown sample is much lower than that of an individual model. Whereas common machine learning methods try to learn one hypothesis from training data, EL models train several learners to attain the greatest possible accuracy, constructing a set of hypotheses and combining them (Wang et al., 2011a). The learners used in EL are called base learners. An EL algorithm that combines base learners produces better accuracy than the individual models (Dietterich, 2000). The general rule in EL systems is that the results of the base learners should differ from each other as much as possible. This diversity can be obtained in different ways. Four proposed methods for diversity are:

• 1. Use of different training datasets for the training of base learners, obtained through resampling methods in which subsets of the original training data are selected randomly with replacement (Breiman, 1996).

• 2. In order to ensure that the decision boundaries are appropriately different, in addition to using different training data, unstable models are used as base models because they can produce different decision boundaries even with small changes in their training parameters (Cubiles-De-La-Vega et al., 2013).

• 3. Another way to achieve diversity is to use different parameters for the models. For example, a set of multi-layer perceptron neural networks can be trained with different initial weights, numbers of layers and nodes, error criteria, and so on. Setting such parameters can control the instability of individual models and, ultimately, diversify them (Yao and Islam, 2008). The ability to control their instability makes ANNs ideal candidates for use in EL algorithms.

• 4. By using different features: the input space is divided into different sub-sets of the original features, which might overlap, and each sub-set is given to a model as input. With this method, every base learner explores some part of the knowledge, and the diversity introduced through the features leads to better results of EL algorithms (Dietrich et al., 2003).

Bagging, one of the simplest EL algorithms, is offered to improve the performance of prediction models; its strategy for combining base learners is the majority vote. Diversity in bagging comes from bootstraps, which are sampled randomly with replacement from the original training data. Each bootstrap is used to train a learner of the same type (Wang et al., 2012). Without unstable predictors, bagging produces a collection of almost identical predictors that no longer improves on the efficiency of an individual predictor. For this reason, unstable learning models such as decision trees (DT) and ANNs are very efficient in bagging, because small changes in the data can cause big changes in the prediction result (Cubiles-De-La-Vega et al., 2013). After the base learners are trained, the results from all the learners for an instance are combined, by one of several methods, to obtain the final prediction. In the simple mean method, all the learners have the same weight in producing the final result for an instance. In the weighted mean method, the weight of each learner in the final prediction is determined by its accuracy in the training step relative to the other learners. Determining the effect of each learner on the final prediction can also be treated as an optimization problem, whose goal is to find the weights for the learners that maximize the prediction accuracy on test data. In this paper, two algorithms, PSO and GA, are used to solve this optimization problem.
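The bootstrap resampling that gives bagging its diversity can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `make_bootstraps` and the choice of bootstraps with the same size as the training set are assumptions, not details taken from the paper.

```python
import random


def make_bootstraps(records, n_bootstraps, seed=0):
    """Draw n_bootstraps samples from records, each with replacement.

    Assumption: every bootstrap has the same size as the training set,
    which is the usual convention in bagging.
    """
    rng = random.Random(seed)
    size = len(records)
    return [
        [records[rng.randrange(size)] for _ in range(size)]
        for _ in range(n_bootstraps)
    ]


training_records = list(range(10))          # placeholder training records
for sample in make_bootstraps(training_records, n_bootstraps=3):
    print(sorted(sample))                   # repeated and missing items show the resampling
```

Because each sample repeats some records and omits others, an unstable learner trained on each one ends up with a different decision boundary.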

2.1. Proposed Model

The previous sections presented the challenge that arises when a model does not consider stock price and direction simultaneously. To address this challenge, this section introduces a new stock price prediction model that considers price and direction simultaneously. The proposed model includes two dependent stages. First, the direction of the price change is predicted; it is then added to the other features as a new feature, and this new dataset is used to predict the price at the next time step. A bagging algorithm, which is a kind of EL, is used in the first stage to maximize classification accuracy (prediction of direction) and in the second stage to maximize regression accuracy (prediction of price). To achieve appropriate accuracy, the results of the base models must be as diverse as possible. Diversity is obtained by using a different training dataset for each model; the diverse datasets are obtained by randomly re-sampling subsets of the training data with replacement. In addition, an NN, which can create different decision boundaries even with small deviations in its training parameters, is used as the base model. The results are aggregated in four ways: optimization with GA; optimization with PSO; weighted aggregation, in which the weight of each model is derived from its accuracy on the training data; and aggregation with equal weight for each model. The best way of aggregating the base models is selected based on accuracy.

2.1.1. First Stage (Extraction Direction)

In the first stage, the stock price direction is predicted for the next time step. Most time series data in the stock market are non-stationary and trending, which reduces the accuracy of stock market prediction. The data must be made as de-trended and stationary as possible so that the hidden pattern in the series can be extracted more accurately (Kantelhardt et al., 2002). Differencing and logarithmic transformation can uncover more knowledge in the data. The first difference of a time series is the series of changes from one period to the next:

$\nabla x_t = x_t - x_{t-1}$
(1)

Here $x_t$ denotes the value of the time series $x$ at period $t$, and the first difference of $x$ at period $t$ is $x_t - x_{t-1}$. Differencing the initial series creates a new time series: the elements of the initial series are stock prices, and the elements of the new series are changes in price.

Because the value of the series at period t is auto-correlated with its values at earlier periods, the n-th element of the series together with its k lags is entered into the model as input, and element n+1 is predicted. This value is the predicted change of price in the next period. In the proposed model, the close prices of the previous days are the initial inputs, and differencing them produces a new series. The output of the model is the difference between the close price of today and that of the previous day. A time series with n elements has n-1 elements after the first differencing. The i-th element with its k lags is used to predict element i+1. In each record of the time series dataset, k+1 variables (element i plus its previous k lags) are the input and one variable (element i+1) is the output. The final number of records in this dataset is n-k-2.
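The differencing and record construction described above can be sketched as follows. The function names `first_difference` and `lagged_records` and the sample prices are hypothetical; the sketch only illustrates how n prices and k lags yield n-k-2 records.

```python
def first_difference(prices):
    """Return the series of changes x_t - x_(t-1); n prices give n-1 changes."""
    return [prices[t] - prices[t - 1] for t in range(1, len(prices))]


def lagged_records(series, k):
    """Build (inputs, target) records from a differenced series.

    inputs: element i together with its k previous lags (k+1 values)
    target: element i+1
    """
    records = []
    for i in range(k, len(series) - 1):
        inputs = series[i - k:i + 1]
        target = series[i + 1]
        records.append((inputs, target))
    return records


close_prices = [100, 102, 101, 105, 107, 106, 110]   # n = 7, made-up values
changes = first_difference(close_prices)             # n - 1 = 6 changes
records = lagged_records(changes, k=2)
print(len(records))                                   # n - k - 2 = 3
print(records[0])                                     # ([2, -1, 4], 2)
```

For direction prediction, the sign of each target (positive or negative change) would serve as the class label.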

The process of the first stage is presented in Algorithm 1. First, the data are prepared and the new time series is formed from the differences of successive elements of the initial series, together with k lags (line 1-2). Then, the data are divided into training and testing data (line 3). After these steps, assuming the dataset contains N records, N bootstraps are created by sampling N times with replacement from the training data (line 4-6). One NN is created and trained N times with the N bootstraps, so that N base models are obtained (line 7-10). Then, the training data are entered into each trained base model and its output is compared with the target output to determine the prediction accuracy of the base model (line 11-13). If the accuracy is better than random prediction (greater than 0.5), the output of this model is kept and its results (predicted directions) are added to the results matrix (line 14-15). After the training data have been applied to all the trained models and the results matrix is complete, the matrix is aggregated with four methods and the best weight vector for combining the trained models is extracted (line 18). The optimization of the weight vector is described in Algorithm 2.
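The training and filtering steps of the first stage can be sketched as below. This is an illustration under stated assumptions, not the paper's Algorithm 1: a single perceptron stands in for the neural network, all function names are hypothetical, and the toy data are invented.

```python
import random


def predict_one(weights, x):
    """Predict direction (1 = up, 0 = down) with a linear threshold unit."""
    score = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]  # last weight is the bias
    return 1 if score > 0 else 0


def train_base_model(X, y, rng, epochs=20, lr=0.1):
    """Stand-in base learner: a single perceptron, NOT the paper's NN."""
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(X[0]) + 1)]
    for _ in range(epochs):
        for x, target in zip(X, y):
            error = target - predict_one(weights, x)
            for j, xj in enumerate(x):
                weights[j] += lr * error * xj
            weights[-1] += lr * error
    return weights


def accuracy(weights, X, y):
    """Fraction of samples whose direction is predicted correctly."""
    return sum(predict_one(weights, x) == t for x, t in zip(X, y)) / len(y)


def stage_one_bagging(X, y, n_models, seed=0):
    """Train base models on bootstraps and keep those better than random."""
    rng = random.Random(seed)
    n = len(X)
    kept_models, results = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap indices
        model = train_base_model([X[i] for i in idx], [y[i] for i in idx], rng)
        if accuracy(model, X, y) > 0.5:                      # better than random prediction
            kept_models.append(model)
            results.append([predict_one(model, x) for x in X])
    return kept_models, results


# Toy lagged price changes and directions (invented for illustration).
X = [[1.0, 0.5], [2.0, 1.0], [-1.0, -0.5], [-2.0, -1.5], [0.5, 1.0], [-0.5, -1.0]]
y = [1, 1, 0, 0, 1, 0]
models, results = stage_one_bagging(X, y, n_models=5)
print(len(models), "of 5 base models kept")
```

Each entry of `results` holds one kept model's predictions on the training data, corresponding to one column of the results matrix that the aggregation step then weights.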

The aggregation of the base model results is presented in Algorithm 2. The results are aggregated by four methods: simple average aggregation (SAV); weighted average aggregation (WAV); GA; and PSO (line 2-5). The weights extracted by the method with the highest accuracy are selected (line 6).

Given the importance of the learners' weights for final performance, as explained earlier, the extraction of weights is defined as an optimization problem. Here, the weight of each model is extracted using PSO; the process is explained in Algorithm 3. Every particle in this algorithm is a weight vector for combining the learners to compute the final output. Therefore, the dimension of every particle equals the number of learners obtained in the previous steps. The weights and initial velocity of each particle are determined randomly (line 3-6). Next, the performance (accuracy) of each particle (weights of base learners) is calculated. The performance of a particle is the performance of the learners working together, combined with the particle's weights, in reaching the least possible error over all the training data (line 8-9).

For example, for a particle with weights of 0.5, 0.3, and 0.2 and learners whose outputs for a specific sample are 59, 65, and 62, the final output is 0.5×59 + 0.3×65 + 0.2×62 = 61.4. The whole group of particles is updated based on the best personal and group experience for a certain number of iterations (line 11-12). Finally, the best particle in the last iteration is used as the final weights for combining the base models.
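A minimal PSO sketch for the weight search is given below. Several details are assumptions made for illustration and are not taken from the paper: the inertia and acceleration coefficients, clamping weights to [0, 1], turning a weighted sum of 0/1 direction votes into a class by comparing it with half the total weight, and returning the best particle seen over all iterations. All function names are hypothetical.

```python
import random


def combine(weights, outputs):
    """Weighted combination of base learner outputs for one sample."""
    return sum(w * o for w, o in zip(weights, outputs))


def accuracy_of(weights, results, targets):
    """Direction accuracy of the weighted ensemble.

    results[m][s] is the 0/1 direction predicted by model m for sample s.
    Assumption: the ensemble votes 'up' when the weighted sum reaches half
    of the total weight.
    """
    hits = 0
    for s, target in enumerate(targets):
        score = combine(weights, [row[s] for row in results])
        vote = 1 if score >= 0.5 * sum(weights) else 0
        hits += (vote == target)
    return hits / len(targets)


def pso_weights(results, targets, n_particles=10, n_iter=30, seed=0,
                inertia=0.7, c1=1.5, c2=1.5):
    """Search for ensemble weights with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(results)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy_of(p, results, targets) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # personal experience
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # group experience
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = accuracy_of(pos[i], results, targets)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# The worked example from the text: 0.5*59 + 0.3*65 + 0.2*62.
print(round(combine([0.5, 0.3, 0.2], [59, 65, 62]), 1))  # 61.4
```

For the second stage, the same search would apply with the fitness replaced by an error measure such as MAPE to be minimized.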

In addition to PSO, GA optimization is applied to obtain optimal weights of the base models. In this algorithm, each chromosome is one weight vector. The number of genes in each chromosome equals the number of base models obtained in the previous steps. To begin the GA, the initial weights in each chromosome are generated randomly. Afterwards, the chromosomes are ranked by their performance (exactly as in PSO).

To generate the next generation, selection is made through the roulette wheel, and two-point crossover and uniform mutation are used for crossover and mutation, respectively. In roulette wheel selection, parents are selected according to their fitness: the better the chromosomes, the greater their chances of selection. Imagine a roulette wheel on which all the chromosomes of the population are placed, where the size of each chromosome's section corresponds to the value of its fitness function.

Then, a marble is thrown onto the wheel to select a chromosome. Chromosomes with greater fitness are selected more often. Two-point crossover calls for two points to be selected on the parent chromosomes; everything between the two points is swapped between the parents. Uniform mutation replaces the value of the chosen gene with a uniform random value selected between the user-specified upper and lower bounds for that gene. This mutation operator can only be used for integer and float genes.

For the evolution of the chromosomes, a predetermined number of chromosomes are considered for crossover and mutation. In the last iteration of the GA, after the chromosomes are ordered by their accuracy in determining the direction for all the training data, the best chromosome is used as the final weights for combining the base learners.
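The three GA operators described above can be sketched as follows. The function names, the mutation rate, and the [0, 1] gene bounds are assumptions chosen for illustration; the paper does not give these values in this section.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    running = 0.0
    for chromosome, fitness in zip(population, fitnesses):
        running += fitness
        if running >= pick:
            return chromosome
    return population[-1]  # guard against floating-point round-off


def two_point_crossover(parent_a, parent_b, rng):
    """Swap the genes between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b


def uniform_mutation(chromosome, rng, rate=0.1, low=0.0, high=1.0):
    """Replace each gene, with probability rate, by a uniform value in [low, high]."""
    return [rng.uniform(low, high) if rng.random() < rate else gene
            for gene in chromosome]


rng = random.Random(0)
population = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.4, 0.4, 0.2]]   # weight vectors
fitnesses = [0.55, 0.70, 0.62]                                      # e.g. direction accuracy
parent_a = roulette_select(population, fitnesses, rng)
parent_b = roulette_select(population, fitnesses, rng)
child_a, child_b = two_point_crossover(parent_a, parent_b, rng)
print(uniform_mutation(child_a, rng))
```

In a full GA loop these operators would be applied repeatedly, with each new chromosome scored by the same fitness used to rank the population.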

WAV is another method of aggregating the training output matrix. First, the accuracy of each matrix column (the predictions of one base model) in predicting the training target vector is calculated (line 3). The accuracy of each base model is then divided by the total accuracy to obtain the coefficient of each matrix column in the combination vector (line 4). SAV is the simple average of the base model outputs, which aggregates the results with equal weights.
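The two non-optimized weighting schemes reduce to a few lines. The function names `sav_weights` and `wav_weights` and the sample accuracies are hypothetical.

```python
def sav_weights(n_models):
    """Simple average aggregation: every base model gets the same weight."""
    return [1.0 / n_models] * n_models


def wav_weights(accuracies):
    """Weighted average aggregation: each weight is accuracy / total accuracy."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


training_accuracies = [0.6, 0.7, 0.7]      # made-up training accuracies of three base models
print(sav_weights(3))
print(wav_weights(training_accuracies))
```

Both weight vectors sum to 1, so they can be passed to the same weighted combination used for the GA and PSO weights.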

Figure 3 shows the process of aggregating the results of the base models in different ways. The matrix columns are the results of the base models. For each column, one weight is considered. The method that produces the weight vector with the highest accuracy is selected as the aggregation method.

2.1.2. Second Stage (Price Prediction)

After the first stage of the model terminates, its output, “the price movement direction” in the stock market (upward or downward), is obtained. In the second stage, this feature is added to the existing features to form a new dataset, the best combination of lags is chosen through trial and error, and a model is trained with the new dataset as its input. The techniques applied in this stage are conceptually similar to those of the first stage, but differ in their use; for example, the evaluation criterion for base models and aggregation methods in this stage is the accuracy of the predicted price, measured by MAPE, instead of the accuracy of the predicted direction. In this stage, base models are trained on bootstraps and the next-time price is predicted; to improve accuracy and reliability, the results of the base models are aggregated with different methods and the method with the highest accuracy is selected.

The process of the second stage is presented in Algorithm 5. Initially, a new dataset is created by adding the feature taken from the previous stage. Then, the (near) optimal lag is selected by trial and error (line 1-2). Next, the new dataset is divided into training and testing data (line 3), and N bootstraps are created from the training dataset (line 5-7).

One NN is created and trained with the N bootstraps, so that N trained base models are obtained (line 9-11). The training data are applied to all the trained models and their outputs are added to the training output matrix (line 12-18). Following the EL algorithm described in the previous section, the results are aggregated through four methods, from which the best vector is chosen (line 19).

In the first stage of the model, the differences of the closing price data are used to predict the next direction of price movement. Each data record has the form $[DC_{t-1}; DC_{t-2}; \cdots; DC_{t-k}; D_t]$, where the DC values (Difference between Closes), k-lagged, are the inputs and D, the next Direction of price change, is the output. The data records are shown in Figure 4.

Thus, the output vector of the first stage, $[D_t, D_{t+1}, D_{t+2}, \cdots, D_{t+n}]$, is used along with the other data as the input for the second stage.
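Building the second-stage dataset amounts to appending the predicted direction to each input record. The sketch below assumes records stored as (inputs, target) pairs and directions coded as 1 (up) or 0 (down); the function name `add_direction_feature` and the sample values are hypothetical.

```python
def add_direction_feature(records, directions):
    """Append the stage-one predicted direction to the inputs of each record.

    records:    list of (inputs, target_price) pairs
    directions: predicted direction for each record (1 = up, 0 = down)
    """
    if len(records) != len(directions):
        raise ValueError("one predicted direction is needed per record")
    return [(inputs + [direction], target)
            for (inputs, target), direction in zip(records, directions)]


# Lagged inputs with next-day prices, plus stage-one directions (made-up values).
stage_two_input = add_direction_feature(
    [([2, -1, 4], 21105), ([-1, 4, 2], 21090)],
    [1, 0],
)
print(stage_two_input)
```

The original records are left unmodified, so the same lagged inputs can be reused while trying different lag combinations.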

Aggregation of the results in this stage is carried out as in the previous stage (Algorithm 2), with the difference that the criterion for evaluating and selecting the optimal weight vector for the matrix is MAPE. This also holds for the algorithms used in this stage: just as in the aggregation of results with PSO presented in Algorithm 3, each particle is evaluated by the MAPE criterion. Algorithm 6 takes the place of Algorithm 4 in the aggregation of the results by the weighted average aggregation method.

3. EXPERIMENTAL RESULTS

In this section, the performance of the proposed model is evaluated on several datasets. The section covers the introduction of the datasets, the evaluation criteria, the implementation of the proposed model, and a comparison of its results with those of other papers.

3.1. Datasets

In order to compare the results of the proposed model with those of established papers, the same datasets as in those papers are used (Asadi et al., 2012; Chang and Liu, 2008; Esfahanipour and Aghamiri, 2010). These data are indices of recognized stock exchanges around the world that show changes in the general level of prices in the market. In this paper, the Dow Jones Industrial Average (DJIA), the Taiwan Stock Exchange (TSE) index, and the Tehran Price Index (TEPIX), together with three other Tehran indices, are investigated. The Tehran Industry Index (TII) shows the average changes in the stock price of companies operating in the industry sector; the Tehran Index of Financial Group (TIFG) shows the average changes in the stock price of companies operating in the financial sector; and the Tehran Index of top 50 companies (TIT50C) covers the top 50 companies in terms of liquidity. Two other datasets are the Nasdaq Index and Amazon stock prices. Information on the data is shown in Table 2.

3.2. Evaluation Criteria

Since this paper aims to improve the prediction of direction and price simultaneously, the evaluation criteria should cover both. The first criterion used to compare the models is the mean absolute percentage error (MAPE). In this criterion, the absolute value of the difference between the real value and the predicted value is divided by the real value; the sum over all data is then divided by the total number of data and expressed as a percentage, where $y_i$ is the real value, $p_i$ is the predicted value, and N is the number of data.

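$MAPE = 100 \times \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - p_i}{y_i} \right|$
(2)

$POCID = 100 \times \frac{1}{N} \sum_{i=1}^{N} D_i$
(3)

$D_i = \begin{cases} 1, & \text{if } (y_i - y_{i-1})(p_i - p_{i-1}) > 0 \\ 0, & \text{otherwise} \end{cases}$
(4)

$\text{Theil's } U = \frac{\sum_{i=1}^{N} (y_i - p_i)^2}{\sum_{i=1}^{N} (y_i - y_{i+1})^2}$
(5)

$ARV = \frac{\sum_{i=1}^{N} (y_i - p_i)^2}{\sum_{i=1}^{N} (\bar{y} - p_i)^2}$
(6)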

The prediction of change in direction (POCID) criterion measures the accuracy of the predicted direction changes. Accuracy in direction prediction, together with proper price prediction, plays a leading role in profit gain. POCID lies in the interval [0, 100]; the closer it is to 100, the more accurate the prediction. The third criterion is Theil’s U statistic, which compares the model performance with that of the RW model. If the Theil’s U of the model is equal to 1, the model performs as well as RW; if it is greater than 1, the model performs worse than RW; and if it is less than 1, the model performs better than RW. The fourth criterion is the average relative variance (ARV). An ARV equal to 1 means that replacing all the predicted values with the average of the time series would not change the accuracy. A value of this criterion that is less than 1 and closer to 0 indicates better prediction accuracy (Ferreira et al., 2008).
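The four criteria can be computed as in the following minimal sketch. It makes two assumptions that the text does not state: POCID counts a hit when the real and predicted series move in the same direction between consecutive steps and is averaged over the N - 1 available transitions, and Theil’s U compares each prediction with the previous real value as the RW forecast. The function names and sample values are illustrative only.

```python
from typing import Sequence


def mape(y: Sequence[float], p: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((yi - pi) / yi) for yi, pi in zip(y, p)) / len(y)


def pocid(y: Sequence[float], p: Sequence[float]) -> float:
    """Percentage of steps where real and predicted series move the same way."""
    hits = sum(1 for i in range(1, len(y)) if (y[i] - y[i - 1]) * (p[i] - p[i - 1]) > 0)
    return 100.0 * hits / (len(y) - 1)  # assumed: averaged over the N - 1 transitions


def theil_u(y: Sequence[float], p: Sequence[float]) -> float:
    """Squared model error relative to a random walk that repeats the previous value."""
    num = sum((y[i] - p[i]) ** 2 for i in range(1, len(y)))
    den = sum((y[i] - y[i - 1]) ** 2 for i in range(1, len(y)))
    return num / den


def arv(y: Sequence[float], p: Sequence[float]) -> float:
    """Average relative variance, using the mean of the real series as reference."""
    mean_y = sum(y) / len(y)
    num = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    den = sum((mean_y - pi) ** 2 for pi in p)
    return num / den


# Hypothetical real and predicted values.
y = [100.0, 102.0, 101.0, 105.0, 107.0]
p = [101.0, 101.0, 102.0, 104.0, 108.0]
print(mape(y, p), pocid(y, p), theil_u(y, p), arv(y, p))
```

For this sample, Theil’s U is below 1, so the hypothetical predictions beat the random walk even though only half of the direction changes are predicted correctly, which illustrates why the paper reports both kinds of criteria.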

3.3. Implementation of the Proposed Model

Research has shown that the input variables used in stock price prediction models include price, technical, fundamental, and macroeconomic variables, which can be categorized into different groups (de Oliveira et al., 2013). One common categorization divides the input variables of stock prediction models into two types. The first type is the price variables, such as the open, close, low, and high prices, as well as the volume and number of trades in a period (Hassan et al., 2007). The second type is the technical variables, which are derived from these price data using different formulas (Kara et al., 2011).

In the proposed model, this paper uses price variables. The six time series in Table 2 each contain 620 records, divided into training and test data: 80% (500 records) for training and 20% (120 records) for testing. The other papers whose results are compared with those of this model divide these data into the same training and test sets (Asadi et al., 2012). The two other time series are likewise divided into 80% training data and 20% test data.
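The split described above can be sketched as follows, assuming that the split is chronological (the first 500 records for training and the last 120 for testing), which the text does not state explicitly. The function name and the placeholder series are hypothetical.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float], n_train: int) -> Tuple[List[float], List[float]]:
    """Keep time order: the earliest n_train records train the model, the rest test it."""
    return list(series[:n_train]), list(series[n_train:])


# Placeholder series with the same length as the six index datasets.
series = [float(i) for i in range(620)]
train, test = chronological_split(series, n_train=500)
print(len(train), len(test))
```

Keeping the time order avoids training on records that come after the test period.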

The final output of the proposed model is the price prediction for the next time step, taking into account the predicted direction of the price change. In the first stage, which is responsible for predicting the direction of the price change at the next time step, the input data are differenced, so that the new data are the price changes between two consecutive time steps.

In the proposed model, the base learners are three-layer feed-forward neural networks, and the number of inputs is equal to the number of lags used. Several factors in the model settings can affect the accuracy of the results, and each factor has different levels: settings for the training data, such as the use or non-use of a logarithmic transformation, the lags applied to the data, the percentage of validation data, the number of bootstraps, and the amount of data used from each bootstrap, as well as the number of inputs, the number of neurons, the training method, and the method for aggregating the results of the base learners. The range of levels used in this experimental plan is shown in Table 3.
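The bootstrap-related factors (number of bootstraps, amount of data used from each bootstrap, and percentage of validation data) can be illustrated with a minimal sketch. It only generates the index sets on which base learners would be trained; the way validation records are held out here (distinct records removed from the training part) is an assumption, and the function name and parameter values are hypothetical.

```python
import random
from typing import List, Tuple


def make_bootstraps(n_records: int, n_bootstraps: int, sample_fraction: float,
                    validation_fraction: float, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, validation_indices) for each bootstrap sample."""
    rng = random.Random(seed)
    size = max(2, round(sample_fraction * n_records))
    samples = []
    for _ in range(n_bootstraps):
        # Sampling with replacement, as in bagging.
        drawn = [rng.randrange(n_records) for _ in range(size)]
        distinct = sorted(set(drawn))
        rng.shuffle(distinct)
        n_val = min(max(1, round(validation_fraction * size)), len(distinct) - 1)
        val = set(distinct[:n_val])
        # Assumed: held-out records never appear in the training part.
        train = [i for i in drawn if i not in val]
        samples.append((train, sorted(val)))
    return samples


# Illustrative settings: 100 bootstraps over 500 training records.
bootstraps = make_bootstraps(n_records=500, n_bootstraps=100,
                             sample_fraction=0.8, validation_fraction=0.1)
print(len(bootstraps), len(bootstraps[0][0]), len(bootstraps[0][1]))
```

One base learner would then be trained per bootstrap, so the number of bootstraps equals the number of base models in the ensemble.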

Testing all possible combinations of these settings requires a large amount of time. To solve this problem and reduce the number of tests, 25 different combinations are selected using the Taguchi method and the Minitab software, and the model is implemented to achieve the highest possible accuracy among these combinations. The models with the highest accuracy on the training data, together with their settings, are shown in Table 4 for each dataset.
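To show how a Taguchi-style design reduces the number of runs, the sketch below builds a standard 25-run orthogonal array for up to six factors with five levels each. This is only an illustration under that assumption: the paper's design was produced with Minitab, and its factors and levels (Table 3) may differ. The function names are hypothetical.

```python
from itertools import combinations
from typing import List


def l25_orthogonal_array() -> List[List[int]]:
    """25 runs x 6 columns, levels 0..4, built from arithmetic modulo 5."""
    rows = []
    for a in range(5):
        for b in range(5):
            rows.append([a, b] + [(a + j * b) % 5 for j in range(1, 5)])
    return rows


def is_orthogonal(design: List[List[int]]) -> bool:
    """Every pair of columns must contain each of the 25 level pairs exactly once."""
    n_cols = len(design[0])
    return all(len({(row[c1], row[c2]) for row in design}) == 25
               for c1, c2 in combinations(range(n_cols), 2))


design = l25_orthogonal_array()
print(len(design), is_orthogonal(design))
```

A full factorial design over six five-level factors would need 5^6 = 15,625 runs, whereas the orthogonal array covers every pair of factor levels in 25 runs.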

The prediction output vector obtained from the first stage is added to the other price variables, which creates a new input dataset for the second stage. In this stage, the learning model is run repeatedly with different settings (as in the first stage) to obtain the best results.
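The way the second-stage dataset is formed can be sketched as follows, assuming that each record of price variables is simply extended with the direction predicted for the same time step. The function name, the choice of price variables, and the sample values are hypothetical.

```python
from typing import List, Sequence


def build_stage_two_inputs(price_rows: Sequence[Sequence[float]],
                           predicted_directions: Sequence[int]) -> List[List[float]]:
    """Append the first-stage direction (+1 or -1) to each row of price variables."""
    if len(price_rows) != len(predicted_directions):
        raise ValueError("each price record needs exactly one predicted direction")
    return [list(row) + [float(d)] for row, d in zip(price_rows, predicted_directions)]


# Hypothetical rows of (open, high, low, close) and first-stage outputs.
price_rows = [
    [100.0, 103.0, 99.0, 102.0],
    [102.0, 104.0, 100.5, 101.0],
    [101.0, 106.0, 100.0, 105.0],
]
predicted_directions = [-1, 1, 1]
stage_two_inputs = build_stage_two_inputs(price_rows, predicted_directions)
print(stage_two_inputs[0])
```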

After the individual models are trained and aggregated, the combination with the best evaluation result on the training data is selected; the test data are then entered into the model and the model is evaluated. The evaluation results on the test data, along with the settings that produced them, are shown in Table 5.

As shown in Table 5, for the Dow Jones dataset, the (near) optimal MAPE is obtained when 3 lags are used, with 9 neurons in the base models and 0.5% of the data of each bootstrap used for validation. The number of bootstraps is 100, and the base models are aggregated by the WAV method. With these settings, the MAPE on the test data of the Dow Jones dataset is 1.126. Figure 5 and Figure 6 compare the real values with the values predicted by the proposed model.

In this paper, meta-heuristic optimization algorithms are used to aggregate the results in both stages. Several settings were tried by trial and error, and the best of them were selected for each dataset. These parameters are shown in Table 6.

3.4. Comparison of Proposed Model

The results of the proposed model are compared with the models of reputable papers (Asadi et al., 2012; Chang and Liu, 2008; Esfahanipour and Aghamiri, 2010), which have used the same datasets for stock price prediction with the MAPE criterion. Table 7 shows the MAPE values of the proposed model and of the other models, as well as the percentage of improvement achieved by the proposed model. Table 7 indicates that the proposed model is superior to the other models in most cases.

The advantage of the proposed model is that it considers both the price and the direction of market movement in its predictions. Asadi’s model (Asadi et al., 2012) uses the POCID criterion to evaluate the prediction of direction changes on the same datasets, and a comparison with its results shows that the proposed model is more accurate. The results of this comparison are given in Table 8.

The Theil’s U results show that the proposed model is better than the RW model on all the datasets, and the same holds for the ARV criterion. In terms of MAPE and POCID, the proposed model has, in most cases, better prediction accuracy than the other models; the amount of improvement is shown in the improvement column of Table 8. For example, for the Dow Jones dataset, the MAPE obtained from Asadi’s model is 1.41 and the MAPE obtained from the proposed model is 1.126, which shows an 18% improvement in price prediction. At the same time, the proposed model shows a 5% improvement in the prediction of price direction compared with Asadi’s model.

Figure 7 and Figure 8 compare the proposed model and Asadi’s model on six indices in terms of the MAPE and POCID criteria.

The average computation time for building the proposed model in the first and second stages is shown in Table 9. The time to build the model depends on various factors (Table 3). The runtime varies with the number of base models, which is equal to the number of bootstraps (100 to 400), and with the training algorithm used for the base models. The average runtime for different combinations of factors is shown in Table 9. This computation time covers constructing the base models, training them, and aggregating the results with 4 methods.

All the prediction algorithms were implemented in MATLAB R2016b and run on a personal computer with an Intel(R) Core i7-7700K CPU, 64 GB of DDR4 RAM, and a 300 GB SSD. The number of training samples for every dataset was 500.

As can be seen in the table, the average time for building the proposed model was much larger than that for the individual models. It should be noted, however, that once the training step was completed and the model was selected, there was no significant difference between the proposed model and the individual models in the execution time on the test data.

3.5. Discussion

As mentioned before, prediction models are divided into two groups: price prediction and direction prediction. It was also shown by examples that models predicting price regardless of price trends may cause a loss in real trading, despite having lower error on some important criteria. This is also true for models that deal only with direction prediction. The proposed model predicts price while taking the predicted price trend into account, which may bring more profit than other models in real trading.

Three categories of models, namely price prediction, direction prediction, and price prediction with regard to direction (the proposed model of this paper), are implemented on different datasets and their results are compared. The predictions obtained from all the models are compared through various trading strategies and the resulting profits. For example, the DJIA test data (Table 2) are applied to each of the three models; the real price and the predicted prices (from the two price models) are shown in Figure 9.

The results of the direction prediction model are shown in Figure 10. In this figure, the number 1 indicates a price increase and the number -1 indicates a price decrease.

These predictions are evaluated by various trading strategies. If the output of the prediction model is a price, one trading strategy is as follows: if the next predicted price is higher than the present price, stock is bought in proportion to the size of the predicted change; conversely, if the next predicted price is lower than the present price, stock is sold in proportion to the size of the predicted change. If the output of the model is a direction, buying and selling are done in the same way, but in a constant amount rather than in proportion to the prediction.

This buying and selling strategy is applied to the prediction results of the above three models.
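The strategy can be sketched as a small simulation. Several details are assumptions rather than the paper's settings: trades are executed at the present price, short selling and borrowing are not allowed, the traded amount for a price model is the predicted change multiplied by a hypothetical scale `units_per_point`, and the final capital is cash plus the value of the remaining stock at the last price. The prices and predictions are invented for illustration.

```python
from typing import Optional, Sequence


def simulate(prices: Sequence[float], signals: Sequence[float], initial_cash: float,
             fixed_units: Optional[float] = None, units_per_point: float = 1.0) -> float:
    """Return the final capital after trading on one signal per time step.

    signals[t] is the predicted change for step t + 1 (price model) or +1 / -1
    (direction model, used together with fixed_units).
    """
    cash, shares = initial_cash, 0.0
    for t in range(len(prices) - 1):
        signal = signals[t]
        units = fixed_units if fixed_units is not None else abs(signal) * units_per_point
        if signal > 0:  # predicted rise: buy
            units = min(units, cash / prices[t])  # cannot spend more than the available cash
            cash -= units * prices[t]
            shares += units
        elif signal < 0:  # predicted fall: sell
            units = min(units, shares)  # cannot sell more than is held
            cash += units * prices[t]
            shares -= units
    return cash + shares * prices[-1]


# Hypothetical prices and next-step price predictions.
prices = [100.0, 102.0, 101.0, 105.0]
predicted_next = [101.5, 101.0, 104.0]
price_signals = [p - c for p, c in zip(predicted_next, prices)]
direction_signals = [1, -1, 1]

final_price_model = simulate(prices, price_signals, 10000.0, units_per_point=10.0)
final_direction_model = simulate(prices, direction_signals, 10000.0, fixed_units=10.0)
print(final_price_model, final_direction_model)
```

Running the same function on the outputs of each model with the same initial capital gives directly comparable final capitals.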

When trading starts with an initial capital of 10,000,000 money units for each of the three models, the results obtained are as shown in Figure 11. These results show that the price prediction model that takes direction into account gained 8% profit, while the initial capital of the other two models decreased during this period; that is, trading with the other two models led to a loss.

3.6. Conclusion

High-accuracy stock price prediction is highly important for trading in this market, since it leads to the preservation and growth of capital. Although some of the classic financial theories regard the market as unpredictable, this paper has shown, using artificial intelligence models, that it is possible to predict the behavior of the stock market despite its fluctuating and unstable nature. The proposed model simultaneously takes into account the direction of price movement (increase or decrease) and the stock price itself in order to predict the stock price and market behavior. The models in the literature have mainly focused on price prediction and paid little attention to the direction of the next movement; this has made their results impractical, and their application to trading in the stock market can end in financial loss. To solve this problem, this paper proposed a two-stage model that predicts the stock price with attention to the direction of the next price movement. In the first stage, the direction of the price change was predicted and, in the second stage, this direction was added to the input variables for price prediction. The proposed model employed ensemble learning (EL) to increase the accuracy of prediction and, in addition to considering the two dimensions of price and direction simultaneously, achieved better values of the evaluation criteria than other one-dimensional models. Because it predicts the market direction and the price simultaneously, the proposed model can be applied in a real trading system, in which the direction prediction is used to issue buy and sell orders and the price prediction is used to determine their volume. The proposed model was implemented on several datasets. The results showed that the proposed model performed better than the other models.
According to the results, the proposed model can be utilized as a support system for decision-making in real trading in the stock market.

Figure

Figure 1. Real price and predicted price regardless of the prediction related to the direction of the stock price movement.

Figure 2. Real price and estimated price considering the direction of the stock movement.

Figure 3. Aggregation process of the results of the base models.

Figure 4. Data records and output vector.

Figure 5. Comparison of real and predicted values with the proposed model.

Figure 6. Comparison of real and predicted values for Amazon stock price.

Figure 7. Comparison of two models with MAPE criterion.

Figure 8. Comparison of two models with POCID criterion.

Figure 9. Real values, predicted prices and predicted prices according to the direction for DJIA dataset.

Figure 10. Real direction of stock movements and the predicted direction for DJIA dataset.

Figure 11. Results of trading with three predicted models and strategy expressed over 120 days.

Table

Table 1. Papers related to the use of EL for prediction

Algorithm 1. Prediction of direction

Algorithm 2. Aggregation_method

Algorithm 3. Optimization of weight using PSO

Algorithm 4. Weighted Average Aggregation (Direct)

Algorithm 5. Weighted Average Aggregation (Direct)

Algorithm 6. Weighted Average Aggregation (Direct)

Table 3. The range of levels used in this experimental plan

Table 4. Results of the first stage of implementation of the proposed model for training data

Table 5. Results of the second stage of implementation of the proposed model for test data

Table 6. Meta-heuristic algorithm settings

Table 7. Comparing results of proposed model and other models

Table 8. Comparing the results of the proposed model with Asadi’s paper

Table 9. Runtime per training of various models

REFERENCES

1. Adebiyi, A. A., Adewumi, A. O., and Ayo, C. K. (2014), Comparison of ARIMA and artificial neural networks models for stock price prediction, Journal of Applied Mathematics, 2014, Article ID 614342, 7.
2. Adhikari, R. (2015), A neural network based linear ensemble framework for time series forecasting, Neurocomputing, 157, 231-242.
3. Adhikari, R. and Agrawal, R. (2014), A combination of artificial neural network and random walk models for financial time series forecasting, Neural Computing and Applications, 24(6), 1441-1449.
4. Andrawis, R. R., Atiya, A. F., and El-Shishiny, H. (2011), Forecast combinations of computational intelligence and linear models for the NN5 time series forecasting competition, International Journal of Forecasting, 27(3), 672-688.
5. Asadi, S., Hadavandi, E., Mehmanpazir, F., and Nakhostin, M. M. (2012), Hybridization of evolutionary Levenberg-Marquardt neural networks and data pre-processing for stock market prediction, Knowledge-Based Systems, 35, 245-258.
6. Atsalakis, G. S. and Valavanis, K. P. (2009), Surveying stock market forecasting techniques Part II: Soft computing methods, Expert Systems with Applications, 36(3), 5932-5941.
7. Breiman, L. (1996), Bagging predictors, Machine Learning, 24(2), 123-140.
8. Campbell, J. Y. and Thompson, S. B. (2007), Predicting excess stock returns out of sample: Can anything beat the historical average?, The Review of Financial Studies, 21(4), 1509-1531.
9. Cavalcante, R. C., Brasileiro, R. C., Souza, V. L., Nobrega, J. P., and Oliveira, A. L. (2016), Computational intelligence and financial markets: A survey and future directions, Expert Systems with Applications, 55, 194-211.
10. Cervelló-Royo, R., Guijarro, F., and Michniuk, K. (2015), Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data, Expert Systems with Applications, 42(14), 5963-5975.
11. Chang, P. C. and Liu, C. H. (2008), A TSK type fuzzy rule based system for stock price prediction, Expert Systems with Applications, 34(1), 135-144.
12. Chauvin, Y. and Rumelhart, D. E. (1995), Backpropagation: Theory, architectures, and applications, Psychology Press, New York.
13. Cubiles-De-La-Vega, M. D., Blanco-Oliver, A., Pino-Mejías, R., and Lara-Rubio, J. (2013), Improving the management of microfinance institutions by using credit scoring models based on statistical learning techniques, Expert Systems with Applications, 40(17), 6910-6917.
14. de Oliveira, F. A., Nobre, C. N., and Zárate, L. E. (2013), Applying artificial neural networks to prediction of stock price and improvement of the directional prediction index: Case study of PETR4, Petrobras, Brazil, Expert Systems with Applications, 40(18), 7596-7606.
15. Dietrich, C., Palm, G., and Schwenker, F. (2003), Decision templates for the classification of bioacoustic time series, Information Fusion, 4(2), 101-109.
16. Dietterich, T. G. (2000), Ensemble methods in machine learning, Paper presented at the International Workshop on Multiple Classifier Systems, 1-15.
17. Enke, D. and Thawornwong, S. (2005), The use of data mining and neural networks for forecasting stock market returns, Expert Systems with Applications, 29(4), 927-940.
18. Esfahanipour, A. and Aghamiri, W. (2010), Adapted neuro-fuzzy inference system on indirect approach TSK fuzzy rule base for stock market analysis, Expert Systems with Applications, 37(7), 4742-4748.
19. Fama, E. F. (1970), Efficient capital markets: A review of theory and empirical work, The Journal of Finance, 25(2), 383-417.
20. Fama, E. F. (1995), Random walks in stock market prices, Financial Analysts Journal, 51(1), 75-80.
21. Ferreira, T. A., Vasconcelos, G. C., and Adeodato, P. J. (2008), A new intelligent system methodology for time series forecasting with artificial neural networks, Neural Processing Letters, 28(2), 113-129.
22. Freitas, P. S. and Rodrigues, A. J. (2006), Model combination in neural-based forecasting, European Journal of Operational Research, 173(3), 801-814.
23. Hansen, L. K. and Salamon, P. (1990), Neural network ensembles, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.
24. Hashem, S. (1997), Optimal linear combinations of neural networks, Neural Networks, 10(4), 599-614.
25. Hassan, M. R., Nath, B., and Kirley, M. (2007), A fusion model of HMM, ANN and GA for stock market forecasting, Expert Systems with Applications, 33(1), 171-180.
26. Jose, V. R. R. and Winkler, R. L. (2008), Simple robust averages of forecasts: Some empirical results, International Journal of Forecasting, 24(1), 163-169.
27. Kantelhardt, J. W., Zschiegner, S. A., Koscielny-Bunde, E., Havlin, S., Bunde, A., and Stanley, H. E. (2002), Multifractal detrended fluctuation analysis of nonstationary time series, Physica A: Statistical Mechanics and its Applications, 316(1-4), 87-114.
28. Kara, Y., Acar Boyacioglu, M., and Baykan, Ö. K. (2011), Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange, Expert Systems with Applications, 38(5), 5311-5319.
29. Khashei, M., Bijari, M., and Ardali, G. A. R. (2009), Improvement of auto-regressive integrated moving average models using fuzzy logic and artificial neural networks (ANNs), Neurocomputing, 72(4-5), 956-967.
30. Li, Y., Wu, C., Liu, J., and Luo, P. (2014), A combination prediction model of stock composite index based on artificial intelligent methods and multi-agent simulation, International Journal of Computational Intelligence Systems, 7(5), 853-864.
31. Lin, L., Wang, F., Xie, X., and Zhong, S. (2017), Random forests-based extreme learning machine ensemble for multi-regime time series prediction, Expert Systems with Applications, 83, 164-176.
32. Lu, C. J., Lee, T. S., and Chiu, C. C. (2009), Financial time series forecasting using independent component analysis and support vector regression, Decision Support Systems, 47(2), 115-125.
33. Mabu, S., Obayashi, M., and Kuremoto, T. (2015), Ensemble learning of rule-based evolutionary algorithm using multi-layer perceptron for supporting decisions in stock trading problems, Applied Soft Computing, 36, 357-367.
34. Maknickienė, N. and Maknickas, A. (2016), Prediction Capabilities of Evolino RNN Ensembles, Computational Intelligence, 473-485, Springer.
35. Patel, J., Shah, S., Thakkar, P., and Kotecha, K. (2015), Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques, Expert Systems with Applications, 42(1), 259-268.
36. Qiu, M., Song, Y., and Akagi, F. (2016), Application of artificial neural network for the prediction of stock market returns: The case of the Japanese stock market, Chaos, Solitons & Fractals, 85, 1-7.
37. Rather, A. M., Agarwal, A., and Sastry, V. (2015), Recurrent neural network and a hybrid model for prediction of stock returns, Expert Systems with Applications, 42(6), 3234-3241.
38. Schwert, G. W. (1989), Why does stock market volatility change over time?, The Journal of Finance, 44(5), 1115-1153.
39. Si, Y. W. and Yin, J. (2013), OBST-based segmentation approach to financial time series, Engineering Applications of Artificial Intelligence, 26(10), 2581-2596.
40. Ticknor, J. L. (2013), A Bayesian regularized artificial neural network for stock market forecasting, Expert Systems with Applications, 40(14), 5501-5506.
41. Tkáč, M. and Verner, R. (2016), Artificial neural networks in business: Two decades of research, Applied Soft Computing, 38, 788-804.
42. Wang, G., Hao, J., Ma, J., and Jiang, H. (2011), A comparative assessment of ensemble learning for credit scoring, Expert Systems with Applications, 38(1), 223-230.
43. Wang, G., Ma, J., Huang, L., and Xu, K. (2012), Two credit scoring models based on dual strategy ensemble trees, Knowledge-Based Systems, 26, 61-68.
44. Wang, J. Z., Wang, J. J., Zhang, Z. G., and Guo, S. P. (2011), Forecasting stock indices with back propagation neural network, Expert Systems with Applications, 38(11), 14346-14355.
45. Wang, J. and Wang, J. (2015), Forecasting stock market indexes using principle component analysis and stochastic time effective neural networks, Neurocomputing, 156, 68-78.
46. Yao, X. and Islam, M. M. (2008), Evolving artificial neural network ensembles, IEEE Computational Intelligence Magazine, 3(1), 31-42.
47. Zhang, G. P. (2003), Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing, 50, 159-175.
48. Zhang, Y. and Wu, L. (2009), Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network, Expert Systems with Applications, 36(5), 8849-8854.