1. INTRODUCTION AND LITERATURE REVIEW
Prediction is a process based on historical data, and accurate predictions enable better policymaking for the future (Campbell and Thompson, 2007). In financial applications, there are numerous cases that require careful prediction (Tkáč and Verner, 2016). Financial time series prediction is a particularly important challenge, in which researchers try to extract patterns from data in order to predict the next event (Schwert, 1989). There are several hypotheses about the predictability of the stock market. The efficient market hypothesis states that market prices fully reflect all available information and that price movements are driven by the arrival of new information. Under this hypothesis, it should be impossible to outperform the overall market through expert stock selection or market timing, and the only way an investor can obtain higher returns is by chance or by purchasing riskier investments (Fama, 1970). In an efficient market, if the expectations and information of all market participants are fully reflected by prices, price movements remain unpredictable. A related hypothesis, compatible with the efficient market hypothesis, is the random walk (RW), which holds that movements in stock market prices are random and thus not predictable (Fama, 1995).
In recent years, the application of artificial intelligence to financial problems has strengthened the idea that markets might not always be completely efficient, might not move randomly, and that future prices can be extracted from historical data by means of various techniques (Cervelló-Royo et al., 2015; Enke and Thawornwong, 2005; Patel et al., 2015). Because financial time series are fundamentally complex, noisy, dynamic, nonlinear, nonparametric, and chaotic (Si and Yin, 2013), stock market prediction remains a challenging problem for researchers.
There are different models for predicting the stock market from historical data. One categorization divides the models into linear and nonlinear, and another into statistical and machine learning. A suitable approach is to divide these models into two groups, intelligent and classic. In classic prediction, it is assumed that the future value of the price follows the linear trend of the past values; the autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroscedasticity (GARCH), and regression models belong to this class (Wang et al., 2011b). Artificial neural networks (ANNs), fuzzy logic, support vector machines (SVMs), hybrid models, and ensemble learning (EL) models belong to the intelligent models (Cavalcante et al., 2016). These models, unlike the classic ones, are capable of capturing nonlinear relationships between the input variables without information about the statistical distribution of the inputs (Lu et al., 2009). Comparisons show that intelligent models, by overcoming the limitations of linear models, can better extract patterns from data and predict with higher accuracy (Adebiyi et al., 2014). For the same reason, in recent years, most studies on stock market prediction have focused on intelligent models (Tkáč and Verner, 2016), which are also used in this paper.
According to the survey by Atsalakis and Valavanis (2009) of about 150 papers on stock market prediction with intelligent models, the artificial neural network (ANN) technique has been the most widely applied. By reviewing more than 400 published papers in this field, Tkáč and Verner (2016) concluded that ANNs performed better than other models. Despite the complexity of stock market prediction, it has been shown that an ANN with only one hidden layer can model a complex system with the desired accuracy (Chauvin and Rumelhart, 1995).
Many studies use ANNs to predict the direction and the price of stocks. In this paper, the direction is determined by the decrease (negative) or increase (positive) of the stock price relative to its past value. These papers often follow one of two approaches; for example, Kara et al. (2011) used two models, SVM and ANN, for the prediction of direction. To predict stock price, Ticknor (2013) proposed a Bayesian regularized artificial neural network (BRNN) and Wang and Wang (2015) proposed a stochastic time effective function neural network (STNN).
Although the application of ANNs, compared with classic models, has increased prediction accuracy, these models suffer from problems such as falling into local optima and overfitting, which limit prediction accuracy. Studies have shown that ANNs can be combined with other models to create a hybrid model, and such hybrids outperform simple ANNs in terms of prediction accuracy (Zhang and Wu, 2009). Therefore, one suitable way to improve prediction accuracy is to use combined models. Hybrid models combine two or more simple models in order to take advantage of each and cover one another's shortcomings. To improve accuracy, various models have been proposed in the literature by combining ANNs with other techniques, one of which is to combine an ANN with classical models.
By combining an ANN as a nonlinear model and ARIMA as a linear model, Zhang (2003) exploited the advantages of both, and Khashei et al. (2009) proposed a combination of ARIMA, ANN, and fuzzy logic. Adhikari and Agrawal (2014) proposed a hybrid model that uses RW for exploring linear patterns and the mean of two NN models for uncovering nonlinear patterns. The results show that hybrid models achieve higher accuracy than single models.
Another approach uses neuro-fuzzy techniques with a metaheuristic optimization algorithm. Chang and Liu (2008) applied a Takagi-Sugeno-Kang (TSK) fuzzy model with simulated annealing (SA) for training the fuzzy system parameters. Esfahanipour and Aghamiri (2010) employed a neuro-fuzzy system with fuzzy clustering to extract rules, and the results obtained were better than Chang's. Qiu et al. (2016) selected effective variables by means of a fuzzy model and applied them to three models: BPNN, GA-BPNN, and SA-BPNN.
The next type of hybrid model uses ANNs with a metaheuristic optimization algorithm, which is ordinarily employed to improve the training of the NN and overcome its problems. Hassan et al. (2007) proposed a combination of a hidden Markov model (HMM), NN, and GA. Asadi et al. (2012) used a combination of NN and Levenberg-Marquardt (LM), improving the training of the network with GA.
In these works, the proposed models have been compared with the existing individual and simple models used in the combination, and it has been shown that hybrid models achieve better accuracy in evaluations.
Models created by combining several techniques can eliminate the restrictions of simple models and improve accuracy. However, due to the problems inherent in any individual model, whether simple or hybrid, such models cannot be expected to reach the highest possible accuracy. In recent years, one of the approaches considered for increasing the accuracy of regression models is EL, whose efficiency has been proven in different applications (Adhikari, 2015; Dietterich, 2000).
The related literature shows widely that EL algorithms perform better than individual models across a wide spectrum of applications and scenarios; the results are more accurate, more reliable, and more stable (Adhikari, 2015; Andrawis et al., 2011; Dietterich, 2000; Jose and Winkler, 2008). It has also been shown that the necessary and sufficient conditions for an ensemble learner to be more accurate than its base learners are accuracy (each base learner better than a random learner) and diversity among the base learners (Hansen and Salamon, 1990). These models have been developed under general titles such as multiple classifier systems, committees of classifiers, ensemble-based systems, mixtures of experts, multiple classifier combinations, neural network ensembles, and bootstrap aggregation (bagging) (Breiman, 1996).
Hashem (1997), Adhikari (2015), and Mabu et al. (2015) used linear combinations, determining the weight of each base learner through ANNs. Li et al. (2014) relied on the majority voting rule, and Rather et al. (2015) proposed genetic algorithms for determining the weights of the base learners. Andrawis et al. (2011) and Jose and Winkler (2008) proposed various methods such as the mean, trimmed mean, and winsorized mean for the aggregation of models. Table 1 shows papers related to the use of EL algorithms.
1.1. Problem Statement
The most important problem in stock market prediction is accuracy; because of the inherent complexity of the task, most models have restrictions in this regard. As mentioned before, studies have mostly discussed next-step price prediction, with less attention paid to direction prediction. The first group includes models that predict the stock price at the next time step, evaluated with criteria like mean squared error (MSE), root-mean-square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The cases reviewed above are in this category (Chang and Liu, 2008; Hassan et al., 2007; Lin et al., 2017; Maknickienė and Maknickas, 2016; Ticknor, 2013; Wang and Wang, 2015). The second group contains models that predict the stock direction for the next time step, appraised with criteria like directional accuracy, hit ratio, and accuracy (Kara et al., 2011; Mabu et al., 2015). The results of a model that deals only with price prediction and the first group of criteria do not suffice for decision making and trading in the real world, because a model with a suitable MAPE may nevertheless lead to trading losses. In order to prevent this problem and make the results practically usable, stock price prediction for the next time interval must take the predicted direction of stock movement into account (de Oliveira et al., 2013). This is explained below by an example.
The data shown in Figure 1 are real and predicted prices for 11 consecutive days of the Dow Jones index. The calculated MAPE for the predicted values is 0.18%, a suitable figure for this dataset. As Figure 1 shows, the 11 real and predicted points are marked in the chart. Suppose a trader predicts and trades daily for 10 days based on such predictions. The real value on day 1 is 21100 and the price on day 2 is predicted to be 21115, indicating an upward movement. The trader therefore buys as predicted, whereas the real price on day 2 is 21020, and he loses $80. Likewise, for the following days, if the trader trades based on today's price and the predicted price, he faces a $250 loss within 10 days; the daily profits and losses are shown in Figure 1.
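The loss in this example can be reproduced with a small sketch (our own illustration, not the paper's code; the helper name `daily_pnl` and the two-day series are hypothetical): the trade follows the predicted direction, while the realized gain or loss comes from the actual price change.

```python
def daily_pnl(real, predicted):
    """Sum of realized price changes over daily trades driven by predictions.

    real[i] and predicted[i] are the actual and predicted prices on day i;
    the trade on day i compares predicted[i + 1] with real[i].
    """
    pnl = 0.0
    for i in range(len(real) - 1):
        if predicted[i + 1] > real[i]:      # predicted upward move -> buy
            pnl += real[i + 1] - real[i]    # realized change, may be a loss
        elif predicted[i + 1] < real[i]:    # predicted downward move -> sell
            pnl += real[i] - real[i + 1]
    return pnl

# Day 1 -> day 2 from the text: real 21100 -> 21020, predicted 21115 (up),
# so the trade on that day loses $80.
print(daily_pnl([21100, 21020], [21100, 21115]))  # -80.0
```

The point of the example in the text is exactly this mismatch: a small price error (good MAPE) can still sit on the wrong side of the direction and turn into a trading loss.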
Figure 2 presents another prediction chart, where the prediction model has considered the future direction of stock movement and predicted prices accordingly. For the second set of predictions, the calculated MAPE is 0.28%, about 50% more than the previous error. In this process, where prediction is based on both the predicted price and the predicted direction, daily trading yields the trader a $100 profit after 10 days.
These two charts differ in that the first model predicts the next price based on the previous direction, without considering the next predicted direction, whereas the second model predicts the next price by adding the next predicted direction to the otherwise identical input variables; despite its greater MAPE, the profit sought by the trader increases.
This profit is gained thanks to the proper prediction of direction (increase or decrease of price). Therefore, regarding the results of the example, it can be argued that prediction based merely on one group of criteria (price or direction alone) is insufficient, and profit should be pursued through price prediction that takes the predicted direction into account.
To the best of the present authors' knowledge, no solution has been presented for this problem so far. This is the first study to address it by considering the two criteria of direction and price simultaneously for stock price prediction, so that the prediction results can be used in a real trading system. The proposed model has two stages composed of EL and metaheuristic optimization algorithms. In the first stage, the next direction (increase or decrease of price) is predicted, and it is then used for the prediction of price in the second stage.
2. DEVELOPMENT OF AN INTELLIGENT ENSEMBLE-BASED MODEL FOR STOCK PRICE PREDICTION
Various studies of training methods show that there is no single training algorithm that is the most accurate for all prediction problems. To overcome this, EL algorithms have been developed extensively; the main motivation for these developments is to reduce error.
The basic assumption of this methodology is that, in EL, the probability of a prediction error on an unknown sample is much lower than for an individual model. In contrast with common machine learning methods, which try to learn one hypothesis from the training data, EL models train several learners to attain the greatest possible accuracy and construct a set of hypotheses and their combinations (Wang et al., 2011a). The learners used in EL are called base learners. An EL algorithm, as a combination of base learners, produces better accuracy than the individual models (Dietterich, 2000). The general rule in EL systems is that the results of the base learners should differ from each other as much as possible. This diversity can be obtained in different ways; four proposed methods for diversity are:

1. Use of different training datasets for the training of the base learners, through resampling methods in which subsets of the original training data are selected randomly with replacement (Breiman, 1996).

2. In order to ensure that the decision boundaries are appropriately different, in addition to using different training data, unstable models are used as base models because they can produce different decision boundaries even with small changes in their training parameters (Cubiles-de-la-Vega et al., 2013).

3. Another way to achieve diversity is to use different parameters of the models. For example, a set of multilayer perceptron neural networks can be trained with different initial weights, numbers of layers and nodes, error criteria, and so on. Setting such parameters can control the instability of the individual models and, ultimately, diversify them (Yao and Islam, 2008). This ability to control instability has made ANNs ideal candidates for use in EL algorithms.

4. Use of different features: the input space is divided into different subsets of the original features, which may overlap, and each subset is given to a model as an input. Through this method, every base learner explores some part of the knowledge, and feature-based diversity leads to better results for EL algorithms (Dietrich et al., 2003).
Bagging, one of the simplest EL algorithms, is offered to improve the performance of prediction models; its strategy for combining base learners is the majority vote. Diversity in bagging is created using bootstraps, which are drawn randomly with replacement from the original training data. Each bootstrap is used to train a learner of the same type (Wang et al., 2012). Without unstable predictors, bagging produces a collection of almost identical predictors that no longer improve on an individual predictor. For the same reason, unstable learning models like DTs and ANNs are very effective in bagging, because small changes in the data can cause big changes in the prediction result (Cubiles-de-la-Vega et al., 2013). After the different base learners are trained, their results for an instance are combined with different methods to obtain the final prediction. In the simple mean method, the weights of all learners are the same when producing the final result for an instance. In the weighted mean method, the weight of each learner in the final prediction is determined by its accuracy in the training step relative to the other learners. The effect of each learner on the final prediction can also be treated as an optimization problem whose goal is to determine the best weight for each learner so as to maximize the prediction accuracy on the test data. In this paper, two algorithms, PSO and GA, are used to solve this optimization problem.
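The bagging-plus-weighted-mean scheme described above can be sketched as follows (a minimal illustration in our own code, not the paper's implementation; the toy slope-fitting "learner" is purely a stand-in for the NN base learners):

```python
import random
import statistics

def bootstrap(data, rng):
    """Sample len(data) records from data with replacement."""
    return [rng.choice(data) for _ in data]

def weighted_mean(predictions, weights):
    """Combine base-learner outputs with (not necessarily equal) weights."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

rng = random.Random(0)
train = [(x, 2 * x) for x in range(1, 11)]  # toy (input, target) pairs

def fit(sample):
    """Stand-in learner: fits the mean target/input ratio on its bootstrap."""
    slope = statistics.mean(y / x for x, y in sample)
    return lambda x: slope * x

# Train one learner per bootstrap, then aggregate their predictions.
models = [fit(bootstrap(train, rng)) for _ in range(5)]
preds = [m(3.0) for m in models]
print(round(weighted_mean(preds, [1.0] * 5), 6))  # 6.0 (simple average case)
```

With equal weights this reduces to the simple mean; the weighted case is what the PSO/GA step later optimizes.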
2.1. Proposed Model
In the previous sections, the challenge faced by models that pay no simultaneous attention to stock price and direction was presented. To solve this challenge, this section introduces a new stock price prediction model that considers price and direction simultaneously. The proposed model includes two dependent stages. First, the direction of the price change is predicted; it is added to the other features as a new feature, and this new dataset is used for prediction at the next time step. The bagging algorithm, a kind of EL, is used to maximize classification accuracy (prediction of direction) in the first stage and regression accuracy (prediction of price) in the second stage. To achieve the appropriate accuracy, the results of the base models must be as diverse as possible. This diversity is earned with different training datasets for each model, obtained by randomly resampling subsets of the training data with replacement. In addition, an NN, which can create different decision boundaries even with small deviations in its training parameters, is used as the base model. The aggregation of the results is done in four ways: optimization with GA; optimization with PSO; weighted aggregation, in which the weight of each model is derived from its accuracy on the training data; and aggregation with equal weights for each model. The best way to aggregate the base models is selected based on accuracy.
2.1.1. First Stage (Direction Extraction)
In the first stage, the stock price direction is predicted for the next time step. Most time series data in the stock market are nonstationary and trending, which reduces the accuracy of stock market prediction. The data must be made as detrended and stationary as possible so that the hidden pattern in the series can be extracted more accurately (Kantelhardt et al., 2002). Differencing and logarithmic transformation can uncover more knowledge in the data. The first difference of a time series is the series of changes from one period to the next:

$\Delta x_{t} = x_{t} - x_{t-1}$ (1)

where $x_{t}$ denotes the value of the time series at period t, so the first difference at period t is $x_{t} - x_{t-1}$. Differencing the initial series creates a new time series: the initial series elements are stock prices, and the new series elements are changes in price.
The value of x at period t is autocorrelated with its values at earlier periods, so an element of the series together with its k lags is entered into the model as input and the next element is predicted; this value is the predicted change of price in the next period. In the proposed model, the closing price data of the previous days are the initial inputs, and a new series is made by differencing them. The output of the model is the difference between the closing prices of today and the previous day. The number of members of a time series with n elements reaches n-1 after the first differencing. The ith element with its k lags is used to predict the (i+1)th element. In this dataset, each record has k+1 input variables (element i plus its previous k lags) and one output variable (element i+1). The final number of records in this dataset is therefore n-k-2.
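The record construction just described can be sketched as follows (our own helper; the name `make_lagged_dataset` is illustrative). Each record carries k+1 differenced values as inputs and the next difference as the output, so a series of n prices yields n-k-2 records:

```python
def make_lagged_dataset(prices, k):
    """Difference the price series, then build supervised records:
    (element i plus its previous k lags) -> element i+1."""
    diffs = [prices[i] - prices[i - 1] for i in range(1, len(prices))]
    records = []
    for i in range(k, len(diffs) - 1):
        records.append((diffs[i - k:i + 1], diffs[i + 1]))
    return records

prices = [100, 102, 101, 105, 104, 108]
data = make_lagged_dataset(prices, k=2)
print(data[0])  # ([2, -1, 4], -1): three price changes predict the next one
```

For n = 6 prices and k = 2 this yields 6 - 2 - 2 = 2 records, matching the record count given above.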
The process of the first stage is presented in Algorithm 1. First, data preparation and formation of the new time series are performed, as the result of differencing successive elements of the initial series together with taking k lags of it (lines 1-2). Then, the data are divided into training and testing sets (line 3). After these steps, assuming the dataset contains N records, N bootstraps are created by sampling the training data N times with replacement (lines 4-6). One NN is created and trained N times with the N bootstraps until N base models are obtained (lines 7-10). Next, the training data are entered into each of the trained base models and its output is compared with the target output to determine the base model's prediction accuracy (lines 11-13). If the prediction accuracy is better than random (greater than 0.5), the output of this model is kept and the results (predicted directions) are added to the results matrix (lines 14-15). After the training data are applied to all the trained models and the results matrix is completed, this matrix is aggregated with four methods and the best weighting vector for combining the trained models is extracted (line 18). Algorithm 2 describes the optimization of the weighting vector.
The aggregation of the base models' results is presented in Algorithm 2. The results are aggregated by four methods: simple average aggregation (SAV); weighted average aggregation (WAV); GA; and PSO (lines 2-5). The weights extracted by the method with the highest accuracy are selected (line 6).
Considering the importance of the learners' weights for the final performance, as explained earlier, the extraction of the weights is defined as an optimization problem. First, the weight of each model is extracted by means of PSO; the process is described in Algorithm 3. Every particle in this algorithm is defined as a weight vector for combining the learners to compute the final output, so the dimension of every particle equals the number of learners obtained in the previous steps. Weights and initial velocities are determined randomly for each particle (lines 3-6). Then, the performance (accuracy) of each particle (a set of base learner weights) is calculated: the performance of a particle is the performance of the learners working together, combined with the particle's weights, in reaching the least possible error over all the training data (lines 8-9).
For example, for a particle with weights of 0.5, 0.3, and 0.2, and learner outputs for a specific sample of 59, 65, and 62, the final output is 0.5×59 + 0.3×65 + 0.2×62 = 61.4. The whole group of particles is updated based on the best personal and group experience over a certain number of iterations (lines 11-12). Finally, the best particle of the last iteration is used as the final weights for combining the base models.
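A compact sketch of this weight-optimization step follows (our own code, not the paper's; the inertia and acceleration coefficients, swarm size, and iteration count are illustrative defaults, not the paper's settings):

```python
import random

def combine(outputs, w):
    """Weighted combination of base-learner outputs (normalized by sum of weights)."""
    s = sum(w)
    return sum(o * wi for o, wi in zip(outputs, w)) / s if s else 0.0

def pso_weights(samples, targets, n_models, iters=50, swarm=10, seed=1):
    """PSO over weight vectors; fitness is squared error of the combined output."""
    rng = random.Random(seed)
    def err(w):
        return sum((combine(s, w) - t) ** 2 for s, t in zip(samples, targets))
    pos = [[rng.random() for _ in range(n_models)] for _ in range(swarm)]
    vel = [[0.0] * n_models for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=err)
    for _ in range(iters):
        for i, p in enumerate(pos):
            for d in range(n_models):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - p[d])
                             + 1.5 * r2 * (gbest[d] - p[d]))
                p[d] = min(1.0, max(0.0, p[d] + vel[i][d]))  # keep weights in [0, 1]
            if err(p) < err(pbest[i]):
                pbest[i] = p[:]
        gbest = min(pbest, key=err)
    return gbest

# The worked example from the text: weights 0.5/0.3/0.2 on outputs 59, 65, 62.
print(round(combine([59, 65, 62], [0.5, 0.3, 0.2]), 1))  # 61.4
```

Each particle is one candidate weight vector; the best particle after the last iteration plays the role described in the text.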
GA optimization is applied, in addition to PSO, for obtaining the optimal weights of the base models. In this algorithm, each chromosome is taken as one weight vector, and the number of genes in each chromosome equals the number of base models obtained in the previous steps. To begin the GA, initial weight values are generated randomly in each chromosome. Afterwards, the chromosomes are ranked based on their performance (exactly as in PSO). To generate the next generation, selection is made through a roulette wheel, with two-point crossover used for crossover and uniform mutation for mutation. In roulette wheel selection, parents are selected according to their fitness: the better the chromosome, the greater its chance of selection. Imagine a roulette wheel on which all the chromosomes of the population are placed, with the size of each slot corresponding to the value of its fitness function.
A marble is then thrown to select a chromosome; chromosomes with greater fitness are selected more often. Two-point crossover selects two points on the parent chromosomes, and everything between the two points is swapped between the parents. Uniform mutation replaces the value of the chosen gene with a uniform random value between user-specified upper and lower bounds for that gene; this mutation operator can only be used for integer and float genes.
For the evolution of the chromosomes, a predetermined number of chromosomes are considered for crossover and mutation. In the last iteration of the GA, after ordering the chromosomes by their accuracy in determining the direction over all the training data, the best chromosome is used as the final weights for combining the base learners.
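The three operators named above can be sketched as follows (our own minimal versions; the gene bounds and mutation rate are illustrative, not the paper's settings):

```python
import random

def roulette(pop, fitness, rng):
    """Roulette wheel selection: probability proportional to fitness."""
    r = rng.uniform(0, sum(fitness))
    acc = 0.0
    for chrom, f in zip(pop, fitness):
        acc += f
        if acc >= r:
            return chrom
    return pop[-1]

def two_point_crossover(a, b, rng):
    """Swap everything between two randomly chosen cut points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def uniform_mutation(chrom, rate, lo, hi, rng):
    """Replace each selected gene with a uniform random value in [lo, hi]."""
    return [rng.uniform(lo, hi) if rng.random() < rate else g for g in chrom]

rng = random.Random(0)
c1, c2 = two_point_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1], rng)
print(c1, c2)  # complementary children, e.g. a swapped middle segment
```

Because the two children exchange the same segment, every gene position still holds one 0 and one 1 across the pair, which is a quick sanity check for the operator.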
WAV is another method of aggregating the training output matrix. First, the accuracy of each matrix column (the predictions of one base model) in predicting the training target vector is calculated (line 3). The accuracy of each base model is then divided by the total accuracy, giving the coefficient of each matrix column in the combination vector (line 4). SAV is the simple average of the base models' outputs, aggregating the results with equal weights.
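This accuracy-proportional weighting can be written in a couple of lines (a sketch with made-up accuracy values):

```python
def wav_weights(accuracies):
    """WAV: each base model's weight is its training accuracy
    divided by the total accuracy, so the weights sum to 1."""
    total = sum(accuracies)
    return [a / total for a in accuracies]

w = wav_weights([0.8, 0.6, 0.6])
print(w)  # [0.4, 0.3, 0.3]
```

SAV is the degenerate case in which all accuracies are treated as equal, so every weight becomes 1/N.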
Figure 3 shows the process of aggregating the results of the base models with the different methods. The matrix columns are the results of the base models, and one weight is considered for each column. The method that produces the weight vector with the highest accuracy is selected as the aggregation method.
2.1.2. Second Stage (Price Prediction)
After the first stage of the model terminates, its output, the direction of price movement in the stock market (upward or downward), is obtained. In the second stage, this feature is added to the existing features (forming a new dataset), a model is trained with the new dataset, and the best combination of lags, chosen through trial and error, is used as the input of the second stage. The techniques applied in this stage are conceptually similar to those of the first stage, but they differ in use: the evaluation criterion for the base models and the aggregation methods in this stage is the accuracy of the predicted price, measured by MAPE, instead of the accuracy of the predicted direction. In this stage, base models are trained on bootstraps and the next-time price is predicted; to improve accuracy and provide further assurance, the results of the base models are aggregated with the different methods and the method with the highest accuracy is selected.
The process of the second stage is presented in Algorithm 5. Initially, a new dataset is created by adding the feature taken from the previous stage; then, the (near) optimal lag is selected by trial and error (lines 1-2). Next, the new dataset is divided into training and testing sets (line 3), and N bootstraps are created from the training dataset (lines 5-7).
One NN is created and then trained with the N bootstraps until N trained base models are learned (lines 9-11). The training data are applied to all the trained models and their outputs are added to the training output matrix (lines 12-18). According to the EL algorithm described in the previous section, the results are aggregated through four methods, from among which the best vector is chosen (line 19).
In the first stage of the model, the differenced closing price data are used to predict the next direction of price movement. Each data record has the form $\left[DC_{t-1},\ DC_{t-2},\ \cdots,\ DC_{t-k},\ D_{t}\right]$, where the DC terms (differences between closing prices) are the k lagged inputs and D, the direction of the next price change, is the output. The data records are shown in Figure 4.
Thus, the output vector of the first stage, $\left[D_{t},\ D_{t+1},\ D_{t+2},\ \cdots,\ D_{t+n}\right]$, is used along with the other data as the input for the second stage.
The results in this stage are aggregated as in the previous stage (Algorithm 2), with the difference that the criterion for evaluating and selecting the optimal weight vectors from the matrix is MAPE. The same holds for the other algorithms used in this stage: in the aggregation of results with PSO presented in Algorithm 3, each particle is now evaluated by the MAPE criterion, and Algorithm 6 replaces Algorithm 4 for aggregation based on the weighted average method.
3. EXPERIMENTAL RESULTS
In this section, the performance of the proposed model is evaluated with several datasets, which include the introduction of datasets, evaluation criteria, implementation of the proposed model, and comparing results of the proposed model with other papers.
3.1. Datasets
In order to compare the results of the proposed model with those of accredited papers, the same datasets as in those papers are used (Asadi et al., 2012; Chang and Liu, 2008; Esfahanipour and Aghamiri, 2010). These data are different indices of validated stock exchanges around the world, showing changes in the general level of prices in each market. This paper investigates the Dow Jones Industrial Average (DJIA), Taiwan Stock Exchange (TSE), and Tehran Price Index (TEPIX), together with three other Tehran indices: the Tehran Industry Index (TII), which shows the average change in the stock prices of companies operating in the industry sector; the Tehran Index of Financial Group (TIFG), the average change in the stock prices of companies operating in the financial sector; and the Tehran Index of the Top 50 Companies (TIT50C) in terms of liquidity. Two other datasets are the Nasdaq Index and Amazon stock prices. The information related to the data is shown in Table 2.
3.2. Evaluation Criteria
Considering the paper's goal of simultaneously improving the prediction of direction and price, the evaluation criteria should cover both categories. The first criterion used to compare the models is MAPE, in which the absolute difference between the real value and the predicted value is divided by the real value and averaged over all data points, expressed as a percentage, where $y_{i}$ is the real value, $p_{i}$ is the predicted value, and N is the number of data points:

$\mathrm{MAPE}=\frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_{i}-p_{i}}{y_{i}}\right|$ (2)

The second criterion, the prediction of change in direction (POCID), measures the accuracy of direction-change prediction. Model accuracy in direction prediction, together with proper price prediction, plays a leading role in gaining profit:

$\mathrm{POCID}=\frac{100}{N}\sum_{i=1}^{N}D_{i}$ (3)

$D_{i}=\begin{cases}1 & \text{if } (y_{i}-y_{i-1})(p_{i}-p_{i-1})>0\\ 0 & \text{otherwise}\end{cases}$ (4)

The interval of the POCID criterion is [0, 100]; the closer POCID is to 100, the better the prediction accuracy. The third criterion is Theil's U statistic, which compares the model's performance with the RW model. If Theil's U equals 1, the model's performance is equivalent to RW; if it is greater than 1, the performance is worse than RW; and if it is less than 1, the proposed model performs better than RW:

$U=\frac{\sqrt{\sum_{i=1}^{N}(y_{i}-p_{i})^{2}}}{\sqrt{\sum_{i=1}^{N}(y_{i}-y_{i-1})^{2}}}$ (5)

The fourth criterion is the average relative variance (ARV). If ARV equals 1, using the mean of the time series instead of all the predicted values would not change the accuracy; values less than 1 and closer to 0 indicate better prediction accuracy (Ferreira et al., 2008):

$\mathrm{ARV}=\frac{\sum_{i=1}^{N}(p_{i}-y_{i})^{2}}{\sum_{i=1}^{N}(p_{i}-\bar{y})^{2}}$ (6)
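The four criteria, as commonly defined in the forecasting literature (the paper's exact formulations may differ in detail), can be computed as follows; the toy series y and p are illustrative:

```python
import math

def mape(y, p):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(y) * sum(abs(yi - pi) / abs(yi) for yi, pi in zip(y, p))

def pocid(y, p):
    """Percentage of steps where the predicted change has the right sign."""
    hits = sum((y[i] - y[i - 1]) * (p[i] - p[i - 1]) > 0 for i in range(1, len(y)))
    return 100.0 * hits / (len(y) - 1)

def theils_u(y, p):
    """Theil's U: < 1 means better than the random walk benchmark."""
    num = math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, p)))
    den = math.sqrt(sum((y[i] - y[i - 1]) ** 2 for i in range(1, len(y))))
    return num / den

def arv(y, p):
    """Average relative variance: 1 means no better than predicting the mean."""
    mean_y = sum(y) / len(y)
    return (sum((pi - yi) ** 2 for yi, pi in zip(y, p))
            / sum((yi - mean_y) ** 2 for yi in y))

y = [100, 102, 101, 105]
p = [100, 101, 102, 104]
print(round(pocid(y, p), 1))  # 66.7: two of three direction changes matched
```

Note that a perfect prediction gives MAPE = 0 and Theil's U = 0, while predicting the series mean everywhere gives ARV = 1, matching the interpretations in the text.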
3.3. Implementation of the Proposed Model
Research has shown that the input variables used in stock price prediction models include price, technical, fundamental, and macroeconomic variables, which can be categorized into different groups (de Oliveira et al., 2013). One common categorization of stock prediction models' input variables divides them into two types: the first type comprises the price variables, such as the open, close, low, and high prices, as well as the volume and number of trades in a period (Hassan et al., 2007). The second type comprises the technical variables, which are derived from these price data using different formulas (Kara et al., 2011).
The proposed model uses price variables. The six time series in Table 1 contain 620 records each, divided into training and test data: 80% (500 records) for training and 20% (120 records) for testing. The papers whose results are compared with this model use the same training/test split (Asadi et al., 2012). The two other time series are likewise split into 80% training data and 20% test data.
The final output of the proposed model is the prediction of the price at the next time step, taking the direction of the price trend into account. In the first stage, which is responsible for predicting the direction of the price change at the next time step, the data entering the model are differenced, so that the new data are the price changes between two consecutive times.
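The first-stage preprocessing just described, replacing the raw series with one-step price changes whose signs are the direction labels, can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def to_price_changes(prices):
    """First-stage preprocessing: difference the raw price series into
    changes between two consecutive times; the sign of each change is the
    direction the first stage learns to predict."""
    prices = np.asarray(prices, float)
    changes = np.diff(prices)        # p_t - p_{t-1}
    directions = np.sign(changes)    # +1 for an increase, -1 for a decrease
    return changes, directions
```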
In the proposed model, the base learners are three-layer feedforward neural networks, and the number of inputs equals the number of lags used. Several factors in the model configuration can affect the accuracy of the results, and each factor has several levels: settings for the training data, such as use/non-use of a logarithmic transformation, the lags applied to the data, the percentage of data held out for validation, the number of bootstraps, and the fraction of data drawn in each bootstrap, as well as the number of inputs, the number of neurons, the training method, and the method for aggregating the results of the base learners. The range of levels used in this experimental plan is shown in Table 3.
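A minimal sketch of this bagging scheme, assuming scikit-learn's `MLPRegressor` as a stand-in for the paper's MATLAB networks; the lag count, neuron count, and bootstrap fraction below are illustrative values, not the tuned settings of Table 4:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds n_lags past values, so the number
    of network inputs equals the number of lags; y is the next value."""
    s = np.asarray(series, float)
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    return X, s[n_lags:]

def train_bagged_mlps(X, y, n_boot=10, frac=0.8, hidden=9, seed=0):
    """Train one three-layer feedforward net per bootstrap sample:
    each base learner sees a resampled fraction of the training data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=True)
        m = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=500,
                         random_state=0).fit(X[idx], y[idx])
        models.append(m)
    return models
```

Each model in the returned list is one base learner; their outputs are later combined by the aggregation methods discussed below.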
Combining and testing all possible scenarios would require a large amount of time. To reduce the number of tests, the Taguchi method (implemented in the Minitab software) is used to select 25 different configurations, and the model is run to find the highest possible accuracy among these combinations. The configurations with the highest accuracy on the training data are shown in Table 4 for each dataset, together with the corresponding model settings.
The prediction output vector obtained from the first stage is appended to the other price variables, creating a new input dataset for the second stage. In this stage, the learning model is trained again, with its settings varied as in the first stage, to obtain the best results.
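The construction of the second-stage input set amounts to appending the first stage's direction prediction as an extra column to the price variables; a minimal sketch (names are illustrative):

```python
import numpy as np

def build_second_stage_inputs(price_features, predicted_direction):
    """Second stage: append the first stage's direction prediction as an
    extra column to the price-variable matrix, forming the new inputs."""
    price_features = np.asarray(price_features, float)
    predicted_direction = np.asarray(predicted_direction, float).reshape(-1, 1)
    return np.hstack([price_features, predicted_direction])
```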
After each training run of the individual models and their aggregation, the combination with the best result on the training data is selected; the test data are then fed into the model and the model is evaluated. The test-data evaluation results, along with the settings that produced them, are shown in Table 5.
As shown in Table 5, for the Dow Jones dataset, the (near) optimal MAPE is obtained with 3 lags, 9 neurons in the base models, and 0.5% of the data of each bootstrap held out for validation. The number of bootstraps is 100, and the base models are aggregated with the WAV method. With these settings, the MAPE on the test data for the Dow Jones dataset is 1.126. Figures 5 and 6 compare the real values with the values predicted by the proposed model.
In this paper, metaheuristic optimization algorithms are used to aggregate the results in both stages. Several settings are obtained by trial and error, and the best ones are selected for each dataset. These parameters are shown in Table 6.
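The weighted-average (WAV) aggregation step can be sketched as below. In the paper the weights are tuned by a metaheuristic; here they are simply passed in, so this shows only the combination rule, not the weight search:

```python
import numpy as np

def weighted_average(predictions, weights):
    """WAV aggregation: combine base-learner outputs with normalized
    weights. `predictions` has shape (n_models, n_samples)."""
    P = np.asarray(predictions, float)
    w = np.asarray(weights, float)
    w = w / w.sum()          # normalize so the weights sum to 1
    return w @ P             # weighted combination per sample
```

With equal weights this reduces to the plain average of the base learners.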
3.4. Comparison of the Proposed Model
The results of the proposed model are compared with the models of published papers (Asadi et al., 2012; Chang and Liu, 2008; Esfahanipour and Aghamiri, 2010) that used the same datasets for stock price prediction with the MAPE criterion. Table 7 shows the MAPE of the proposed model, the results of the other models, and the percentage improvement achieved by the proposed model. Table 7 indicates the superiority of the proposed model over the other models in most cases.
Considering both the price and market-direction prediction criteria is the advantage of the proposed model. Compared with the results of Asadi's model (Asadi et al., 2012), which uses the POCID criterion for predicting changes on the same datasets, the proposed model achieves better accuracy. The results of this comparison are given in Table 8.
The Theil's U results show that the proposed model outperforms the RW model on all datasets, and the ARV results confirm this. With respect to MAPE and POCID, the proposed model in most cases predicts more accurately than the other models; the improvements are shown in the improvement column of Table 8. For example, for the Dow Jones dataset, the MAPE of Asadi's model is 1.41 and that of the proposed model is 1.126, which represents an 18% improvement in price prediction. In addition, the proposed model shows a 5% improvement in direction prediction over that model.
Figures 7 and 8 compare the six indices for the proposed model and Asadi's model in terms of the MAPE and POCID criteria.
The average computation time for building the proposed model in the first and second stages is shown in Table 9. The time to build the model depends on various factors (Table 3): the runtime varies with the number of base models, which equals the number of bootstraps (100 to 400), and with the training algorithm used for the base models. The average runtime for different factor combinations is shown in Table 9. This computation time covers constructing the base models, training them, and aggregating the results with the 4 methods.
All the prediction algorithms were implemented in MATLAB R2016b and run on a personal computer with an Intel Core i7-7700K CPU, 64 GB of DDR4 RAM, and a 300 GB SSD. The number of training samples for every dataset was 500.
As can be seen in the table, the average building time of the proposed model was much larger than that of the individual models; however, once the training step was completed and the model was selected, there was no significant difference between the proposed model and the individual models in the time needed to process the test data.
3.5. Discussion
As mentioned before, prediction models are divided into two groups: price prediction and direction prediction. It was also shown by example that models predicting price while ignoring the price trend, despite lower error on some important criteria, may cause a loss in real trading; the same holds for models that only predict direction. The proposed model predicts price while taking the price trend and its prediction into account, which may bring more profit than other models in real trading.
Three categories of models (price prediction, direction prediction, and price prediction considering direction, i.e., the proposed model of this paper) are implemented on different datasets, and their results are compared. The predictions of all models are compared through various trading strategies and the resulting profits. For example, the DJIA test data (Table 2) are fed to each of the three models; the real price and the predicted prices (for the two price-predicting models) are shown in Figure 9.
The results of the direction prediction model are shown in Figure 10, where the number 1 indicates a price increase and the number −1 a price decrease.
These predictions are evaluated with various trading strategies. If the output of the prediction model is a price, one trading strategy is as follows: if the predicted next price is higher than the present price, stock is bought in proportion to the predicted change; conversely, if the predicted next price is lower than the present price, stock is sold in proportion to the predicted change. If the output of the model is a direction, buying and selling follow the same rule, but in a constant amount rather than in proportion to the prediction.
This buy-and-sell strategy is applied to the prediction results of the three models above.
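One possible reading of the proportional rule above can be sketched as follows; this is an assumption-laden simplification (the exact sizing rule is not fully specified in the text), where the traded fraction of cash or shares equals the predicted relative change:

```python
import numpy as np

def simulate_trading(real, predicted, capital=10_000_000.0):
    """Illustrative sketch of the proportional trading rule: if the
    predicted next price exceeds the current real price, buy with a
    fraction of cash proportional to the predicted relative rise;
    if it is lower, sell a proportional fraction of the held shares.
    Returns final wealth (cash plus position valued at the last price)."""
    real = np.asarray(real, float)
    predicted = np.asarray(predicted, float)
    cash, shares = capital, 0.0
    for t in range(len(real) - 1):
        price = real[t]
        change = predicted[t + 1] - price
        if change > 0:                         # predicted rise: buy
            spend = min(cash, abs(change) / price * cash)
            shares += spend / price
            cash -= spend
        elif change < 0 and shares > 0:        # predicted fall: sell
            sell = min(shares, abs(change) / price * shares)
            cash += sell * price
            shares -= sell
    return cash + shares * real[-1]
```

Running the three models' predictions through such a loop yields the profit-and-loss comparison reported in Figure 11.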
If trading starts with an initial capital of 10,000,000 money units for each of the three models, the results are as shown in Figure 11. The price prediction model that considers direction gained 8% profit, while the initial capital of the other two models decreased over this period; trading with them led to a loss.
3.6. Conclusion
High-accuracy stock price prediction is highly important for trading in this market, since it leads to the preservation and growth of capital. Although some classic financial theories hold that the market is unpredictable, this paper has shown that, despite the fluctuating and unstable nature of the stock market, its behavior can be predicted using artificial intelligence models. The proposed model simultaneously takes into account the price movement direction (increase or decrease) and the stock price itself in order to predict stock price and market behavior. Models in the literature have mainly emphasized price prediction and paid no attention to the next movement direction; this makes their results impractical, and using them for trading in the stock market can end in financial loss. To solve this problem, this paper proposed a two-stage model in which the stock price is predicted with attention to the direction of the next price movement. In the first stage, the direction of the price change is predicted; in the second stage, this direction is added to the input variables for price prediction. The proposed model employs ensemble learning (EL) to increase prediction accuracy and, in addition to simultaneously considering the two dimensions of price and direction, achieves better evaluation scores than the one-dimensional models. Because it predicts both the market direction and the price, the proposed model can be used in a real trading system, where the direction prediction triggers buy and sell orders and the price prediction determines their volume. The proposed model was implemented on several datasets, and the results showed that it performed better than the other models.
According to the results, the proposed model can be used as a decision-support system for real trading in the stock market.