Testing and Optimizing Trading Systems
Ron Schoenberg, Ph.D.
Alan B. Corwin
Trading Desk Strategies, LLC
August 21, 2009
Any trading system that can be back tested by computer simulation contains settings that must be adjusted for best results. This analysis must resolve differing market conditions as well as multiple objectives, for example, profit versus risk. The most common practice for determining optimal settings is brute force back testing. This method is tedious and not guaranteed to produce an optimal result. In this article we introduce the use of the well-established Design of Experiments methodology to accomplish this task (http://en.wikipedia.org/wiki/Optimal_design). This method is efficient, and it remains feasible even for trading systems that are computer intensive or time consuming to simulate. More importantly, it produces significant information about the trading system that can be used to improve it.
Design of experiments technology can do more than any other method to gain knowledge and perspective about your trading system and to make it better and more profitable.
Alternative Methods for Testing and Improving Trading Systems
Brute Force Back Testing. With this method the trader tries reasonable settings on historical data and looks to see what happens. If they’re not satisfied, they try different settings. If the results get worse, they adjust the settings and try again. With any luck a version emerges that produces a profit. With this method, however, there can be no expectation of ever finding the optimal settings. Nor is it easy to resolve issues regarding the multiple objectives and market conditions.
Bottom line: A lot of time and effort is required, and the likelihood of finding the best settings is nil.
Brute Force Grid Search. This is a more sophisticated approach to finding optimal settings for a trading system, and one that conceivably could find them. With this method, every combination of settings on a grid is tried. If a setting is quantitative, a number of values are selected over a range. Suppose there are 5 quantitative settings for a trading system, and 10 values are selected over a chosen range for each of them. The resulting search is over a hyper-cube with 5 dimensions. The search of this hyper-cube would require 10^5 or 100,000 runs of the trading system on the historical data. This may, or more likely may not, be practical. If each run takes one minute, and you have access to only one computer, then the entire study would take 100,000 minutes, or about 69 days. If each run took more than a minute, you might wait many months for the final results.
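The arithmetic behind the grid search can be checked with a few lines of Python. This is only an illustrative sketch: the function name grid_cost and the one-minute run time are assumptions taken from the example in the text, not part of any trading platform.

```python
from itertools import product


def grid_cost(n_settings, n_levels, minutes_per_run=1.0):
    """Return (number of runs, total days on one machine) for a full grid."""
    runs = n_levels ** n_settings
    days = runs * minutes_per_run / (60 * 24)
    return runs, days


# 5 settings, 10 values each, one minute per back test (the example in the text)
runs, days = grid_cost(n_settings=5, n_levels=10)
print(f"{runs:,} runs, about {days:.1f} days")

# A tiny grid enumerated explicitly: 2 settings with 3 levels each gives 9 runs
small_grid = list(product(range(3), repeat=2))
print(len(small_grid), "runs in the small grid")
```

Adding a sixth setting at 10 levels multiplies the run count by 10 again, which is why the full grid quickly becomes impractical.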
One solution to this problem would be to use a parallel cluster of computers. Setting up the problem could be tedious, and getting access to all that computer hardware might be costly. A cloud would be most efficient, but the computer programming gets even more tedious.
The resource-consuming nature of this method precludes resolving other issues such as market conditions and multiple objectives. The solutions to these problems require additional runs, which multiplies the size of an already very large problem.
Bottom line: While finding the optimal settings is far more likely than with brute force back testing, the time and computer resources required are vastly greater.
Genetic Algorithms. This is a more sophisticated version of the Brute Force Back Testing method. With this method a computer program systematically tries different values of the settings, determining new values to try based on a statistical model. The process requires many evaluations of the model and can therefore be time-consuming. It does, however, increase the likelihood of finding optimal settings. The process has to be repeated for each market condition and objective outcome.
Bottom line: While the time and computer resources required are less than for the Brute Force Grid Search, they remain significant.
Design of Experiments and Response Surface Analysis
Design of Experiments (DOE) is a well-established method for fine-tuning industrial processes that has applicability to optimizing financial trading systems. This method is especially useful where trial runs are expensive, time-consuming, and computer intensive. A statistical criterion is applied to select a very few trial settings from the hyper-cube defining the setting space. Thus instead of the 100,000 trial runs of a full grid over 5 settings, there might be 24 trial runs. This dramatically reduces time and required computer resources. The results of these few trial runs are fitted to a response surface, over which a grid search is conducted to find the optimal settings for the trading system. The grid search is now cheap, because each point on the grid is an evaluation of the fitted surface rather than a run of the trading system.
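The fit-then-search step can be sketched in plain Python. Everything specific here is an assumption made for illustration: the two coded factors, the nine-run 3x3 factorial standing in for a statistically selected design, the full quadratic model, and the function simulated_backtest, which is a hypothetical stand-in for an expensive run of a real trading system.

```python
from itertools import product


def features(x1, x2):
    """Terms of a full quadratic model in two factors."""
    return [1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_surface(points, responses):
    """Least-squares fit through the normal equations (X'X) beta = X'y."""
    rows = [features(*p) for p in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, responses)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, x1, x2):
    """Evaluate the fitted response surface at one point."""
    return sum(b * f for b, f in zip(beta, features(x1, x2)))


def simulated_backtest(x1, x2):
    """Hypothetical stand-in for an expensive back test returning profit."""
    return 100.0 - 40.0 * (x1 - 0.3) ** 2 - 25.0 * (x2 + 0.2) ** 2


# Nine trial runs at coded levels -1, 0, +1 (an assumed design, not I-optimal)
design = list(product([-1.0, 0.0, 1.0], repeat=2))
profits = [simulated_backtest(*p) for p in design]
beta = fit_surface(design, profits)

# Dense grid search over the fitted surface: 101 x 101 cheap evaluations
grid = [i / 50.0 - 1.0 for i in range(101)]
best = max(product(grid, grid), key=lambda p: predict(beta, *p))
print("sweet spot in coded units:", best)
```

Only nine calls to the back test are made; the 10,201 grid evaluations touch the fitted surface alone. A real study would use more factors and a design chosen by a statistical criterion, and the quality of the answer depends on how well a quadratic surface describes the system.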
For multiple objectives, such as finding settings that maximize profits and minimize risk, separate trial runs are not required. Each of the outcomes under scrutiny is measured in the same set of trial runs. The resolution of the outcomes, i.e., finding the settings that optimize for both outcomes, comes during the analysis of the response surfaces, which is the easy part of the task.
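One simple way to resolve two outcomes during the analysis is to search a combined score over both fitted surfaces. The sketch below is only an illustration: the two surfaces, their coefficients, and the penalty weight on risk are all hypothetical, and other ways of trading profit against risk are equally possible.

```python
from itertools import product


def profit_surface(x1, x2):
    """Hypothetical fitted profit surface in two coded factors."""
    return 95.4 + 24.0 * x1 - 10.0 * x2 - 40.0 * x1 ** 2 - 25.0 * x2 ** 2


def risk_surface(x1, x2):
    """Hypothetical fitted risk surface from the same trial runs."""
    return 20.0 + 30.0 * x1 + 5.0 * x2 + 10.0 * x1 ** 2


def sweet_spot(score, steps=101):
    """Grid search over the coded square [-1, 1] x [-1, 1]."""
    grid = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    return max(product(grid, grid), key=lambda p: score(*p))


risk_penalty = 1.0  # assumed weight; larger values favor lower risk

profit_only = sweet_spot(profit_surface)
risk_adjusted = sweet_spot(
    lambda x1, x2: profit_surface(x1, x2) - risk_penalty * risk_surface(x1, x2)
)
print("profit-only sweet spot:  ", profit_only)
print("risk-adjusted sweet spot:", risk_adjusted)
```

No additional back tests are needed to move between the two answers; only the score function changes.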
Resolving different market conditions does require multiple experiments, that is, separate sets of trial runs, increasing the very part of the task that is the most difficult and time consuming. However, reducing each experiment from 100,000 trials to 24 trials makes this kind of analysis feasible.
Bottom line: Using far less time and computer resources, all optimal solutions are found. The response surface analysis also has the additional benefit of providing information on the correlation and interaction of the settings in determining the outcome of the trading system.
How it is done. First, the client establishes the “factors”, the different settings in the trading system that require investigation. Minimum and maximum values are chosen for each of these factors. The number of factors and their minimums and maximums are sent to Trading Desk Strategies. A design matrix, a group of parameter settings for the trial runs, is generated that satisfies the I-Optimal statistical criterion for greatest efficiency. The more factors in the system, the more trials are necessary, but in any case they are far fewer than what is required by any of the other methods. The client collects the results from the trial runs (profit/loss, risk measures, or any other relevant outcome) and sends them to us. Nothing about the trading system needs to be revealed to us. All we use are the values of the settings for each trial run along with the results. A response surface is computed, and a grid search over it is conducted for the “sweet spots”. The client then gets a full report indicating the optimal values of the settings and an analysis of the multiple objectives. For example, the optimal settings for profit might be reported along with an optimal set adjusted for risk.
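A design matrix is commonly expressed in coded units, with -1 standing for a factor's minimum and +1 for its maximum, and then converted to actual settings for the trial runs. The sketch below assumes that convention. The factor names, ranges, and the five coded rows are hypothetical placeholders; they are not an I-Optimal design, which would be generated by statistical software.

```python
def decode(coded, low, high):
    """Map a coded level in [-1, 1] to the actual setting value."""
    return low + (coded + 1.0) * (high - low) / 2.0


# Hypothetical factors with the minimum and maximum chosen by the client
ranges = {"factor_a": (0.0, 1.0), "factor_b": (5.0, 15.0)}

# Hypothetical coded design rows, one row per trial run
coded_design = [(-1.0, -1.0), (1.0, -1.0), (0.0, 0.0), (-1.0, 1.0), (1.0, 1.0)]

names = list(ranges)
trial_settings = [
    {name: decode(level, *ranges[name]) for name, level in zip(names, row)}
    for row in coded_design
]

# Each dictionary is the set of actual settings to use for one back test
for run, settings in enumerate(trial_settings, start=1):
    print(run, settings)
```

The results of each run are then recorded against its row, which is all that the response surface analysis needs.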
Running the trials is the most difficult part of the analysis. It requires access to historical data, and computer programs have to be written to simulate the trading strategy. Trading Desk Strategies would be happy to work out the conduct of the trials, securing historical data and writing the computer programs, in addition to the other aspects of the investigation: determining factors, generating the design matrix, and analyzing the results.
For more information contact us at abcorwin@optionbots.com.
Example
A client developed a complex trading strategy involving groups of options on the S&P 500. The basic components were what were called “bookends” and “fillers”. There are three kinds of bookends, a First Bookend, a Second Bookend and a Third Bookend. These groups of options, or spreads, are opened in a sequence starting about 65 days from expiry. A computer program reading option chains from a data feed generates trading signals in real time.
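The critical settings are the six reported in the sweet-spot results below: liquidation constant, plateau size, distanceToUnderWeight, creditToMaxExposureWeight, first minCreditToMaxExposure, and second minCreditToMaxExposure.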
Three “experiments” or groups of trial runs were conducted at the May 2009, June 2009, and July 2009 expiries. “Sweet spots” or optimal settings were found by grid search for each of them:
Setting                          May       June      July      Overall Sweet Spot
liquidation constant             0.215     0.215     0.935     0.935
plateau size                     13.7      8.07      13.7      13.7
distanceToUnderWeight            0.322     0.322     0.322     0.322
creditToMaxExposureWeight        3.8       0.37      0.37      0.371
first minCreditToMaxExposure     11.815    4.021     4.021     11.815
second minCreditToMaxExposure    11.603    11.603    11.603    11.604
The overall sweet spot was produced by a grid search that maximized the profit from all three expiries simultaneously. There are other ways of doing this. If the market condition associated with one of the expiries is unusual it could be weighted less. Ideally a weighting scheme would be used that is proportional to the frequency of a market condition. In the present case equal weighting was employed.
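A weighted search across expiries can be sketched as follows. This is an illustration under stated assumptions: the three one-factor surfaces and their peak locations are hypothetical, and the weights are placeholders for whatever frequencies of market conditions an analyst chooses.

```python
def overall_sweet_spot(surfaces, weights, candidates):
    """Return the candidate that maximizes the weighted average of surfaces."""
    total = sum(weights)

    def score(x):
        return sum(w * s(x) for w, s in zip(weights, surfaces)) / total

    return max(candidates, key=score)


def peaked_at(center):
    """Hypothetical one-factor profit surface with its maximum at center."""
    return lambda x: -((x - center) ** 2)


# Hypothetical fitted surfaces, one per expiry
surfaces = [peaked_at(0.2), peaked_at(0.2), peaked_at(0.8)]
candidates = [i / 100 for i in range(101)]

equal = overall_sweet_spot(surfaces, [1.0, 1.0, 1.0], candidates)
downweighted = overall_sweet_spot(surfaces, [1.0, 1.0, 0.5], candidates)
print("equal weighting:         ", equal)
print("third expiry weighted 0.5:", downweighted)
```

With equal weights the compromise sits between the individual peaks; reducing the weight of the unusual expiry pulls the overall sweet spot toward the other two.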
We can see from the variation in the settings by expiry that they are sensitive to market conditions. The individual sweet spots produced the following results:
May $780
June -$1771
July $9144
Total $8154
The overall sweet spot produced the following results:
May -$22230
June -$11646
July $3894
Total -$29982
Clearly the issue of the differing market conditions hasn’t been resolved. We could try some sort of weighting scheme. Alternatively, a relationship of the variation of these parameters with the VIX looks plausible. The VIX tracked these three months closely and it might be useful to have some of the settings track the VIX. On that basis we let the “liquidation constant” decrease linearly with the VIX, and let the first minCreditToMaxExposure and creditToMaxExposureWeight increase linearly with the VIX. Such a scheme produced the following results:
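Making a setting a linear function of the VIX can be sketched as a straight-line fit followed by a lookup. The VIX levels and setting values below are hypothetical placeholders, not the data from this study, and clamping the result to the factor's allowed range is an added assumption.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) pairs: (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def setting_for(vix, intercept, slope, low, high):
    """Setting implied by the current VIX, clamped to the factor's range."""
    return min(high, max(low, intercept + slope * vix))


# Hypothetical VIX levels and the sweet-spot value of one setting at each
vix_levels = [20.0, 30.0, 40.0]
sweet_spots = [0.9, 0.6, 0.3]

intercept, slope = fit_line(vix_levels, sweet_spots)
print("slope per VIX point:", round(slope, 4))
print("setting at VIX 25:  ", round(setting_for(25.0, intercept, slope, 0.0, 1.0), 4))
```

A negative slope corresponds to a setting that falls as the VIX rises; a setting that rises with the VIX would simply produce a positive slope from the same fit.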
May $760
June -$2004
July $6504
Total $5260
This result isn’t as good as the individual sweet spots, but it has the advantage of not having to predict the market conditions.
Conclusion
Design of Experiments is a feasible means for finding optimal settings for trading systems that can be back-tested through computer simulation on historical data. It is the only feasible means for trading systems that are heavy consumers of time and resources. Using a statistical criterion, a design matrix of parameter settings is selected that efficiently samples the parameter space, dramatically reducing the work of searching for the optimal settings.
The efficient use of time and resources allows for further investigation of the trading system. Multiple objectives can be analyzed, and multiple experiments become feasible permitting improvements in the model for handling changing market conditions.
Because we save large amounts of time using DOE, we can run more experiments, gain more knowledge and perspective from each experiment, and discover more and more ways to incrementally improve the model.