Pairs trading is an important and challenging research area in computational finance, in which stocks are bought and sold in pairs to exploit arbitrage opportunities.
Traditional methods for this class of problems mostly rely on statistical techniques such as regression. In contrast to the statistical approaches, recent advances in computational intelligence (CI) offer promising opportunities for solving problems in financial applications more effectively. In this paper, we present a novel methodology for pairs trading using genetic algorithms (GA). Our results show that the GA-based models significantly outperform the benchmark and that the proposed method can generate robust models that cope with the dynamic characteristics of the financial application studied.
Based upon the promising results obtained, we expect this GA-based method to advance research in computational intelligence for finance and to provide an effective solution to pairs trading for investment in practice.

In the past decades, because traditional statistical approaches such as regression-based and factor analysis methods have proved ineffective for difficult financial problems, methodologies stemming from computational intelligence, including fuzzy theory, artificial neural networks (ANN), support vector machines (SVM), and evolutionary algorithms (EA), have been developed as more effective alternatives for problems in the financial domain [1, 2].
Among the CI-based techniques studied for finance, the models may be classified into two major areas of application: For the first category, earlier research works include fuzzy multiple attribute decision analysis for portfolio construction [9]. Zargham and Sayeh [10] employed a fuzzy rule-based system to evaluate a set of stocks for the same task. Chapados and Bengio [11] trained neural networks to estimate and predict asset behavior in order to facilitate decision-making in asset allocation.
In EA applications along this line of research, Becker et al. In Lai et al. Recently, Huang [ 5 ] devised a hybrid machine learning-based model to identify promising sets of features and optimal model parameters; Huang's model was demonstrated to be more effective than the benchmark and some traditional statistical methods for stock selection.
To improve the performance of the single-objective GA-based models, more recently, Chen et al. In that approach, the authors used nondominated sorting to search for nondominated solutions and showed that the multiobjective method outperformed the single-objective version proposed by Huang [5].
Another popular line of CI research concerns the prediction of financial time series. A substantial body of work employs neural network learning techniques, including feed-forward, radial basis function, or recurrent neural networks [7], and SVM [8]. Other intelligent methods, such as genetically evolved regression models [15] and inductive fuzzy inference systems [16], have also been reported in the literature.

Pairs trading [17] is an important research area of computational finance that typically relies on time series data of stock prices for investment, in which stocks are bought and sold in pairs to exploit arbitrage opportunities.
Although there has been a significant number of CI-based studies in financial applications, reported CI-based research on pairs trading is sparse and lacks rigorous analysis. To date, many existing works along this line of research rely on traditional statistical methods such as the cointegration approach [19], Kalman filters [20, 21], and principal component analysis [18]. In the CI area, Thomaidis et al. Saks and Maringer [22] used genetic programming for various pairs of stocks in Eurostoxx 50 equities and also found good pairs-trading strategies.
Although these previous CI-based studies of pairs trading exist, they lack rigorous analysis, such as the temporal validation used in [5, 23], to further evaluate the robustness of the trading systems. In addition, the trading models in these studies were constructed using only two stocks as a trading pair; here, we propose a generalized approach that uses more than two stocks as a trading group for arbitrage in order to further improve the performance of the models.
In this study, we also employ the GA for the optimization problems in our proposed arbitrage models. In a past study [23], Huang et al. Motivated by this research work, we employ the GA to optimize our intelligent system for pairs trading, and the experimental results show that our proposed GA-based methodology is promising in outperforming the benchmark.
Furthermore, in contrast to traditional pairs-trading methods that aim at matching pairs of stocks with similar characteristics, we also show that our method is able to construct working trading models for stocks with different characteristics. In this study, we also investigate the robustness of our proposed method and the results show that our method is indeed effective in generating robust models for the dynamic environment of the pairs-trading problem.
This paper is organized into four sections. Section 2 outlines the method proposed in our study. In Section 3, we describe the research data used in this study and present the experimental results and discussions. Section 4 concludes this paper.

In this section, we provide the relevant background and descriptions for the design of our pairs-trading systems using the GA for model optimization.
Traditional decision-making for investment typically relies on the fundamentals of companies to assess their value and price their stocks accordingly.
As the true values of the stocks are rarely known, pairs-trading techniques were developed to address this by investing in stock pairs with similar characteristics e. This mutual mispricing between two stocks is theoretically formulated by the notion of spread, which is used to identify the relative positions when an inefficient market results in the mispricing of stocks [18, 21].
As a result, the trading model is usually market-neutral in the sense that it is uncorrelated with the market and may produce a low-volatility investment strategy. A typical form of pairs trading operates by selling the stock with a relatively high price and buying the other with a relatively low price at the inception of the trading period, expecting that the higher one will decline while the lower one will rise in the future. The price gap between the two stocks, also known as the spread, thus acts as a signal to open and close positions in the pair of stocks.
During the trading period, a position is opened when the spread widens by a certain threshold, and the position is closed thereafter when the spread of the stocks reverts. The objective of this long-short strategy is to profit from the movement of the spread, which is expected to revert to its long-term mean.
Consider initial capital $X_0$, with an interest rate of $r$ per annum and a compounding frequency of $n$ times a year; the capital $X$ after a year may be expressed as

$$X = X_0 \left(1 + \frac{r}{n}\right)^{n}.$$

As $n \to \infty$, this converges to $X = X_0 e^{r}$. Therefore, the continuously compounded rate $r$ is calculated by taking the natural logarithm as follows:

$$r = \ln \frac{X}{X_0}.$$

Now consider the two price time series, $P_1(t)$ and $P_2(t)$, of two stocks $S_1$ and $S_2$ with similar characteristics; the process of a pairs-trading model can be described as follows [18]:
The rationale behind the mean-reverting process is that there exists a long-term equilibrium mean for the spread.
The investor may bet on the reversion of the current spread to its historical mean by selling and buying an appropriate amount of each stock in the pair. As a result, the return of the long-short portfolio may oscillate around a statistical equilibrium. In real-world practice, the return of the long-short portfolio above for a period of time may be calculated as follows:

The pairs-trading method can be generalized to a group of stocks in which mispricing may be identified through a proper combination of assets whose time series is mean-reverting.
Mean reversion in the equation above refers to the assumption that both the high and low prices of the synthetic asset P are temporary and that its price tends to move toward its average price over time.
In this work, the long-term mean of an asset's price in the mean-reverting process may be modeled by the celebrated moving average [ 24 ], which is the average price of an asset in a specified period.
Let $P_t$ be the price of a stock at time $t$. The moving average at time $t$, the mean of the prices over the most recent $n$ time periods, is defined as

$$MA_t(n) = \frac{1}{n} \sum_{i=t-n+1}^{t} P_i.$$
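As an illustration of the moving average just described, the following is a minimal Python sketch. The function name `moving_average` and the choice to return `None` until `n` prices are available are assumptions made for this example, not details taken from the paper.

```python
from typing import List, Optional


def moving_average(prices: List[float], n: int) -> List[Optional[float]]:
    """Simple moving average over the most recent n prices.

    Entries are None until n observations are available (an assumption
    made here for illustration; other conventions are possible).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[Optional[float]] = []
    window_sum = 0.0
    for t, price in enumerate(prices):
        window_sum += price
        if t >= n:
            # Drop the price that has just left the n-period window.
            window_sum -= prices[t - n]
        out.append(window_sum / n if t >= n - 1 else None)
    return out
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` yields no value for the first two periods and then the means of each consecutive three-price window.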
In this study, we employ the Bollinger Bands [24] to determine whether the spread of a pair of stocks departs from its dynamic average value. Typically, the Bollinger Bands prescribe two volatility bands placed above and below a moving average, in which volatility may be defined as a multiple of the standard deviation of past prices. Formally, the Bollinger Bands can be defined as follows:

An important component of a successful trading system is the construction of market-timing models that prescribe meaningful entry and exit points in the market.
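The Bollinger Bands described above can be sketched in Python as follows. This is a minimal illustration using the common textbook form (moving average plus or minus `k` standard deviations); the function name, the use of the population standard deviation, and the inclusion of the current observation in the window are assumptions, and the paper's exact definition may differ.

```python
import statistics
from typing import List, Optional, Tuple

Band = Tuple[float, float, float]  # (lower, middle, upper)


def bollinger_bands(series: List[float], n: int, k: float) -> List[Optional[Band]]:
    """Bands at k standard deviations around an n-period moving average.

    Assumption: the window includes the current observation and the
    population standard deviation is used.
    """
    bands: List[Optional[Band]] = []
    for t in range(len(series)):
        if t < n - 1:
            bands.append(None)  # not enough history yet
            continue
        window = series[t - n + 1 : t + 1]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        bands.append((mean - k * sd, mean, mean + k * sd))
    return bands
```

A wider `k` makes entries rarer but demands a larger departure from the average before a signal is raised.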
In this study, we use moving averages and Bollinger Bands to develop a trading system, which is described in the next subsection.

We calculate the spread for the synthetic asset generated by $m$ stocks as

Here we evaluate the performance of a trading system in terms of its compounded return, which is determined by the relevant parameters of the trading models employed. The performance metric we use is the total cumulative compounded return, $R_c$, where $R_c$ is defined by the product of the returns over $z$ consecutive trades as
Therefore, in the process of capital growth, the capital $X_z$ at the end of $z$ trades is

Given the market timing and pairs-trading models, the performance of a trading system can be enhanced by suitable values of the corresponding model parameters. For the market timing models, the parameters include the period $n$ for the moving average and the parameters $x$ and $y$ for the Bollinger Bands, which control the multiples of the standard deviation around the moving average for entry and exit points.
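The compounded-return bookkeeping described above can be illustrated with a short Python sketch. It assumes that each trade's result is given as a simple return (for example, 0.10 for a 10% gain), so that the cumulative compounded return is the product of the gross returns `1 + r`; the function names and this convention are assumptions for illustration only.

```python
from math import prod
from typing import Iterable


def cumulative_compounded_return(trade_returns: Iterable[float]) -> float:
    """Product of gross returns (1 + r) over consecutive trades.

    Assumption: each r is the simple return of one trade.
    """
    return prod(1.0 + r for r in trade_returns)


def final_capital(initial_capital: float, trade_returns: Iterable[float]) -> float:
    """Capital after all trades, with profits fully reinvested."""
    return initial_capital * cumulative_compounded_return(trade_returns)
```

With trades returning +10% and then -5%, an initial capital of 1000 grows to 1000 × 1.10 × 0.95 = 1045.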
In this study, we propose using genetic algorithms (GA) to search for optimal parameters of the trading system. We describe the basics of GA as well as our proposed optimization scheme in the following.
Genetic algorithms [ 25 ] have been used as computational simulation models of natural evolutionary systems and as adaptive algorithms for solving complex optimization problems in the real world.
The core of this class of algorithms lies in the production of new genetic structures, along the course of evolution, that provide innovations to solutions for the problem. Typically, a GA operates on an evolving population of artificial agents, each of which can be as simple as a binary string (the genotype) that encodes a solution to the problem at hand and a phenotype that represents the solution itself. In each iteration, a new generation is created by applying crossover and mutation to candidates selected as the parents.
Evolution occurs by iterated stochastic variation of genotypes and selection of the fit phenotypes in an environment, based on how well the individual solutions solve a problem.

Here we use the binary coding scheme to represent a chromosome in the GA. In Figure 1, loci $b_n^{1}$ through $b_n^{n_n}$ represent the encoding for the period $n$ of the moving average. Loci $b_x^{1}$ through $b_x^{n_x}$ and $b_y^{1}$ through $b_y^{n_y}$ represent the encoding of $x$ and $y$ for the Bollinger Bands, respectively.
In our encoding scheme, the chromosome representing the genotypes of the parameters is transformed into the phenotype by (13) below for further fitness computation. The precision of each parameter depends on the number of bits used to encode it in the chromosome, which is determined as follows:

With this scheme, we define the fitness function of a chromosome as the annualized return of the trading system over $h$ years of investment:
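To make the genotype-to-phenotype step concrete, here is a minimal Python sketch of one common way to decode a binary segment into a real-valued parameter and to annualize a cumulative return. The linear mapping onto a range `[low, high]`, the geometric annualization, and all function names are assumptions chosen for illustration; they are not taken from the paper's own equations.

```python
from typing import Sequence


def decode(bits: Sequence[int], low: float, high: float) -> float:
    """Map a binary segment linearly onto [low, high] (assumed mapping)."""
    value = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * value / (2 ** len(bits) - 1)


def precision(n_bits: int, low: float, high: float) -> float:
    """Smallest step between two adjacent decoded values."""
    return (high - low) / (2 ** n_bits - 1)


def annualized_return(cumulative_return: float, years: float) -> float:
    """Geometric annualization of a gross cumulative return (assumed form)."""
    return cumulative_return ** (1.0 / years) - 1.0
```

Under this mapping, adding one bit to a segment roughly halves the step size, which is why the precision of each parameter is tied to its number of bits.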
Our overall GA-based arbitrage system is a multistage process that includes the simultaneous optimization of the weighting coefficients for the stocks, the period for the moving average, and the width of the Bollinger Bands. The input to the system is the time series datasets of stock prices.
For any given combination of model parameters for the moving average, the Bollinger Bands, and the weighting coefficients of the stocks, we employ the pairs-trading arbitrage system for investment. In this work, the timing for trading is designated as buying (selling) the spread right after it moves a certain distance, measured in standard deviations, below (above) its average; the position is then closed right after the spread moves back closer to the mean.
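The entry and exit rule described above can be sketched as a small state machine. In this minimal Python illustration, `x` is the entry distance and `y` the exit distance, both in standard deviations from an `n`-period moving average of the spread; the function name, the rolling-window convention, and the treatment of a zero standard deviation are assumptions, not the paper's specification.

```python
import statistics
from typing import List


def spread_positions(spread: List[float], n: int, x: float, y: float) -> List[int]:
    """Position per period: +1 long the spread, -1 short, 0 flat.

    Assumed rule: enter when the spread is more than x standard deviations
    from its n-period mean; exit when it is within y standard deviations.
    """
    positions: List[int] = []
    pos = 0
    for t in range(len(spread)):
        if t >= n - 1:
            window = spread[t - n + 1 : t + 1]
            mean = statistics.fmean(window)
            sd = statistics.pstdev(window)
            dist = spread[t] - mean
            if pos == 0:
                if dist < -x * sd:
                    pos = 1   # spread is cheap: buy it
                elif dist > x * sd:
                    pos = -1  # spread is rich: sell it
            elif abs(dist) <= y * sd:
                pos = 0       # spread has reverted: close the position
        positions.append(pos)
    return positions
```

Because the window here includes the current observation, the largest attainable distance is bounded by the window length, so `x` should be chosen with `n` in mind.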
We then compute the corresponding returns for the performance evaluation of the system. In this study, the GA is used as the optimization tool for simultaneous optimization of these model parameters.
The final output is a set of model parameters optimized by the GA that prescribe the pairs-trading and timing models.
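For readers unfamiliar with the optimization loop, the following is a minimal Python sketch of a generic binary-coded GA with binary tournament selection, one-point crossover, and bit-flip mutation. It is not the authors' implementation: the function name `run_ga`, the population size, the number of generations, the crossover and mutation rates, and the best-so-far bookkeeping are all hypothetical values chosen for illustration, and `fitness` stands in for the annualized return of a decoded trading model.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def run_ga(fitness: Callable[[Chromosome], float], n_bits: int,
           pop_size: int = 30, generations: int = 40,
           p_cross: float = 0.7, p_mut: float = 0.02,
           seed: int = 0) -> Tuple[Chromosome, List[float]]:
    """Return the best chromosome found and the best-so-far fitness curve."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = [fitness(best)]

    def tournament(population: List[Chromosome]) -> Chromosome:
        # Binary tournament: pick two at random, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children: List[Chromosome] = []
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            if n_bits > 1 and rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                # Bit-flip mutation applied independently to each locus.
                children.append([1 - b if rng.random() < p_mut else b
                                 for b in child])
        pop = children[:pop_size]
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
        history.append(fitness(best))
    return best, history
```

Calling `run_ga(sum, 16)` maximizes the number of ones in a 16-bit string; in the trading setting, the fitness function would instead decode the chromosome into model parameters and evaluate the resulting return.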
The flowchart of this GA-based trading system is summarized in Figure 2.

In this section, we examine the performance of our proposed method for pairs-trading systems. We use two sets of stocks listed on the Taiwan Stock Exchange for illustration: The daily returns of the 10 semiconductor stocks in Taiwan from years to were used to examine the performance of the GA-optimized trading system.
Table 1 shows the 10 stocks used for this subsection. Figure 3 displays an illustration of the best-so-far curve for the accumulated return i. In addition, in this study, the GA experiments employ a binary tournament selection [ 26 ], one-point crossover, and mutation rates of 0. This figure shows how the GA searches for the solutions over the course of evolution to gradually improve the performance of the trading system. Figure 4 displays an illustration of the accumulated return of the benchmark and that of our GA-based model.
In this study, the benchmark is defined as the traditional buy-and-hold method, in which capital is allocated in equal proportion to each stock and the accumulated return is calculated as the product of the average daily returns of all the 10 stocks over the 10 years; i.
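The benchmark calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes the daily returns are simple returns and that "the product of the average daily returns" means compounding the gross equal-weighted average `1 + mean`; the function name and data layout are hypothetical.

```python
from typing import List, Sequence


def equal_weight_benchmark(daily_returns: Sequence[Sequence[float]]) -> List[float]:
    """Accumulated gross return of an equally weighted basket, day by day.

    daily_returns[t][i] is the simple return of stock i on day t
    (an assumed layout for this example).
    """
    accumulated = 1.0
    curve: List[float] = []
    for day in daily_returns:
        average = sum(day) / len(day)   # equal weight on every stock
        accumulated *= 1.0 + average    # compound the basket's daily return
        curve.append(accumulated)
    return curve
```

Note that compounding the daily average in this way corresponds to restoring equal weights each day.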
This figure shows that the GA-based model gradually outperforms the benchmark and the performance discrepancy becomes quite significant at the end of year As opposed to the buy-and-hold method that allocates one's capital in equal proportions to each stock, the GA proactively searches for the optimal proportions for long or short positions for each asset in order to construct the spread by