Testing and Optimizing Trading Systems

 

Ron Schoenberg, Ph.D.

ronschoenberg@optionbots.com

Alan B. Corwin

abcorwin@optionbots.com

Trading Desk Strategies, LLC

August 21, 2009

 

 

Any trading system that can be back-tested by computer simulation contains settings that must be adjusted for best results. This analysis must account for differing market conditions as well as multiple objectives, for example, profit versus risk. The most common practice for determining optimal settings is brute force back testing. This method is tedious and not guaranteed to produce an optimal result. In this article we introduce the use of the well-established Design of Experiments method to accomplish this task (http://en.wikipedia.org/wiki/Optimal_design). This method is efficient, and it is feasible even for trading systems that are computationally intensive or time-consuming to simulate. More importantly, it produces significant information about the trading system that can be used to improve it.

Design of experiments technology can do more than any other method to gain knowledge and perspective about your trading system and to make it better and more profitable.

 

Alternative Methods for Testing and Improving Trading Systems

Brute Force Back Testing   With this method the trader tries reasonable settings on historical data and looks to see what happens. If the results are not satisfactory, the trader tries different settings. If the results get worse, the settings are adjusted and tried again. With any luck a version emerges that produces a profit. With this method, however, there can be no expectation of ever finding the optimal settings. Nor is it easy to resolve issues regarding multiple objectives and market conditions.

Bottom line:  A lot of time and effort is required, and the likelihood of finding the best settings is nil.

Brute Force Grid Search   This is a more systematic approach to finding optimal settings for a trading system, and one that conceivably could find them. With this method, every combination of settings on a grid is tried. If a setting is quantitative, values are selected at intervals over a range. Suppose there are 5 quantitative settings for a trading system, and 10 values are selected over a chosen range for each of them. The resulting search is over a hyper-cube with 5 dimensions and requires 10^5, or 100,000, runs of the trading system on the historical data. This may, or more likely may not, be practical. If each run takes one minute, and you have access to only one computer, then the entire study would take 100,000 minutes, or about 69 days. If each run took more than a minute, you might wait many months for the final results.
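The arithmetic behind the grid search can be checked with a few lines of Python. This is only an illustrative sketch: the variable names are ours, and the one-minute run time is the assumption used in the text.

```python
from itertools import product

n_settings = 5           # quantitative settings in the trading system
values_per_setting = 10  # grid values chosen for each setting
minutes_per_run = 1.0    # assumed cost of one back-test run

# Each setting takes 10 grid levels, coded here simply as 0..9.
levels = [range(values_per_setting)] * n_settings

# Enumerate every combination on the 5-dimensional grid and count them.
n_runs = sum(1 for _ in product(*levels))
total_days = n_runs * minutes_per_run / (60 * 24)

print(n_runs)                # 100000
print(round(total_days, 1))  # 69.4
```

Adding a sixth setting with 10 values would multiply both numbers by ten, which is why the grid quickly becomes impractical.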

One solution to this problem would be to use a parallel cluster of computers.  Setting up the problem could be tedious, and getting access to all that computer hardware might be costly.  A cloud would be most efficient, but the computer programming gets even more tedious. 

The resource-consuming nature of this method precludes resolving other issues such as market conditions and multiple objectives. Addressing these issues requires additional sets of runs, multiplying the size of an already very large problem.

Bottom line:  While this method is far more likely to find the optimal answer, the time and computer resources required are vastly greater.

Genetic Algorithms   This is a more sophisticated version of the Brute Force Back Testing method. With this method a computer program systematically tries different values of the settings, determining new values to try based on the outcomes of earlier trials. This process can be slow because it requires many evaluations of the trading system. It does, however, increase the likelihood of finding optimal settings. This process has to be repeated for each market condition and objective.
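A minimal sketch of such a search is shown below, assuming the usual selection, crossover, and mutation steps. The function toy_backtest is a hypothetical stand-in for one run of a real trading simulation, and the population size, mutation scale, and number of generations are arbitrary choices for illustration.

```python
import random

def toy_backtest(settings):
    """Hypothetical stand-in for one back-test run; best result when every setting is 0.3."""
    return -sum((s - 0.3) ** 2 for s in settings)

def evolve(n_settings=5, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_settings)] for _ in range(pop_size)]
    evaluations = 0
    history = []  # best score seen in each generation
    for _ in range(generations):
        scored = sorted(((toy_backtest(p), p) for p in pop),
                        key=lambda t: t[0], reverse=True)
        evaluations += pop_size
        history.append(scored[0][0])
        # Selection: keep the better half unchanged (elitism).
        parents = [p for _, p in scored[: pop_size // 2]]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each setting comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Mutation: small random change, kept inside the range [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, 0.05))) for g in child]
            children.append(child)
        pop = parents + children
    final = sorted(((toy_backtest(p), p) for p in pop),
                   key=lambda t: t[0], reverse=True)
    evaluations += pop_size
    best_score, best = final[0]
    return best, best_score, history, evaluations

best, best_score, history, evaluations = evolve()
print(evaluations)  # number of back-test runs the search consumed
```

Even this small configuration consumes several hundred back-test runs, which illustrates why the cost remains significant when each run is expensive.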

Bottom line:  While the time and computer resources required are less than for the Brute Force Grid Search, they remain significant.

 

Design of Experiments and Response Surface Analysis

Design of Experiments, or DOE, is a well-established method for fine-tuning industrial processes that is also applicable to optimizing financial trading systems. This method is especially useful where trial runs are expensive, time-consuming, and computer intensive. A statistical criterion is applied to select a very few trial settings from the hyper-cube defining the setting space. Thus instead of 100,000 possible trial runs (5 settings at 10 values each), there might be 24 trial runs. This dramatically reduces time and required computer resources. The results of these few trial runs are fitted to a response surface, over which a grid search is conducted to find the optimal settings for the trading system.
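The following sketch illustrates the idea under loud assumptions: toy_backtest is a hypothetical profit function standing in for the real simulation, settings are coded to the range -1 to 1, the response surface is a full quadratic model, and the 24 trial points are drawn at random rather than chosen by an optimal-design criterion. The grid search at the end runs on the fitted surface, not on the trading system, so it costs no additional back-test runs.

```python
import numpy as np
from itertools import combinations, product

def quadratic_terms(X):
    """Expand settings into intercept, linear, squared, and two-way interaction columns."""
    X = np.atleast_2d(X)
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)

def toy_backtest(X):
    """Hypothetical profit function standing in for the real trading simulation."""
    target = np.array([0.2, -0.4, 0.0, 0.6, -0.2])
    return 100.0 - 50.0 * np.sum((X - target) ** 2, axis=1)

rng = np.random.default_rng(0)
k, n_trials = 5, 24

# The only expensive step: 24 trial runs (random design here, NOT I-optimal).
design = rng.uniform(-1, 1, size=(n_trials, k))
profit = toy_backtest(design)

# Fit the response surface by least squares (21 coefficients for 5 settings).
coef, *_ = np.linalg.lstsq(quadratic_terms(design), profit, rcond=None)

# Cheap grid search over the fitted surface: 11 levels per setting.
levels = np.linspace(-1, 1, 11)
grid = np.array(list(product(levels, repeat=k)))
predicted = quadratic_terms(grid) @ coef
sweet_spot = grid[np.argmax(predicted)]

print(sweet_spot)
```

Because the toy profit function is itself quadratic, the fitted surface reproduces it and the search recovers the true optimum. A real trading system will only be approximated by the surface, so the reported sweet spot should be confirmed with a further back-test run.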

For multiple objectives, such as finding settings that maximize profits and minimize risk, separate trial runs are not required. Each of the outcomes under scrutiny is measured in the same set of trial runs. The resolution of the outcomes, i.e., finding the settings that optimize for both outcomes, takes place in the analysis of the response surfaces, which is the easy part of the task.
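As a sketch of how two outcomes can share one set of trial runs, the example below fits separate quadratic surfaces for profit and risk and then searches a combined score. The two toy outcome functions, the two-setting system, and the risk_aversion weight are all hypothetical; subtracting weighted risk from profit is only one of several ways the objectives could be reconciled.

```python
import numpy as np
from itertools import product

def quadratic_terms(X):
    """Full quadratic model in two settings: 1, x1, x2, x1^2, x2^2, x1*x2."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

def toy_outcomes(X):
    """Hypothetical profit and risk measured in the SAME back-test run."""
    x1, x2 = X[:, 0], X[:, 1]
    profit = 100.0 - 50.0 * ((x1 - 0.5) ** 2 + (x2 + 0.5) ** 2)
    risk = 20.0 + 30.0 * (x1 + 1.0) ** 2
    return profit, risk

rng = np.random.default_rng(0)
design = rng.uniform(-1, 1, size=(12, 2))  # one set of 12 trial runs
profit, risk = toy_outcomes(design)

# One response surface per outcome, both fitted from the same trials.
F = quadratic_terms(design)
profit_coef, *_ = np.linalg.lstsq(F, profit, rcond=None)
risk_coef, *_ = np.linalg.lstsq(F, risk, rcond=None)

levels = np.linspace(-1, 1, 41)
grid = np.array(list(product(levels, repeat=2)))
G = quadratic_terms(grid)

risk_aversion = 1.0  # hypothetical trade-off weight
profit_spot = grid[np.argmax(G @ profit_coef)]
adjusted_spot = grid[np.argmax(G @ profit_coef - risk_aversion * (G @ risk_coef))]

print(profit_spot, adjusted_spot)
```

Changing risk_aversion and repeating the last step requires no new trial runs, which is what makes the multiple-objective analysis inexpensive.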

Resolving different market conditions does require multiple experiments, that is, separate sets of trial runs, increasing the very part of the task that is the most difficult and time-consuming. However, reducing each set to 24 trials, down from 100,000 trials, makes this kind of analysis feasible.

Bottom line:  Using far less time and computer resources, optimal settings can be found for each objective and market condition. The response surface analysis also has the additional benefit of providing information on the correlation and interaction of the settings in determining the outcome of the trading system.

 

How it is done    First a client establishes the “factors”, the different settings in the trading system that require investigation. Minimum and maximum values are chosen for each of these factors. The number of factors and their minimums and maximums are sent to Trading Desk Strategies. A design matrix, a group of parameter settings for the trial runs, is generated that satisfies the I-Optimal statistical criterion for greatest efficiency. The more factors in the system, the more trials are necessary, but in any case they are far fewer than what is required by any of the other methods. The client collects the results from the trial runs -- profit/loss, risk measures, or any other relevant outcome -- and sends them to us. Nothing about the trading system needs to be revealed to us. All we use are the values of the settings for each trial run along with the results. A response surface is computed, and a grid search over it is conducted for the “sweet spots”. The client then gets a full report indicating the optimal values of the settings and an analysis of the multiple objectives. For example, the optimal settings for profit might be reported along with an optimal set adjusted for risk.
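To give a feel for the design step, the sketch below scores candidate designs by their average prediction variance over the setting space, which is the quantity an I-Optimal design minimizes. It is not the procedure used by Trading Desk Strategies: production software typically uses an exchange algorithm, whereas this example simply keeps the best of 200 random candidates. The three factors and their minimum and maximum values are hypothetical.

```python
import numpy as np
from itertools import combinations, product

def model_matrix(X):
    """Full quadratic model: intercept, linear, squared, and interaction terms."""
    n, k = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(k)] + [X[:, i] ** 2 for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    return np.column_stack(cols)

def average_prediction_variance(design, region):
    """I-criterion up to a constant: mean of f(x)' (F'F)^-1 f(x) over the region."""
    F = model_matrix(design)
    if np.linalg.matrix_rank(F) < F.shape[1]:
        return float("inf")  # design cannot estimate the model at all
    info_inv = np.linalg.inv(F.T @ F)
    R = model_matrix(region)
    return float(np.mean(np.einsum("ij,jk,ik->i", R, info_inv, R)))

def to_real_units(coded, lows, highs):
    """Map coded settings in [-1, 1] onto each factor's minimum..maximum range."""
    lows, highs = np.asarray(lows, float), np.asarray(highs, float)
    return lows + (coded + 1.0) / 2.0 * (highs - lows)

rng = np.random.default_rng(1)
k, n_trials = 3, 12
region = np.array(list(product(np.linspace(-1, 1, 5), repeat=k)))  # 125 grid points

# Crude search: draw 200 random 12-run designs and keep the best-scoring one.
candidates = [region[rng.choice(len(region), size=n_trials, replace=False)]
              for _ in range(200)]
scores = [average_prediction_variance(d, region) for d in candidates]
best_score = min(scores)
best_design = candidates[int(np.argmin(scores))]

lows, highs = [0.1, 5.0, 0.0], [1.0, 15.0, 5.0]  # hypothetical factor ranges
real_design = to_real_units(best_design, lows, highs)
print(real_design)  # the settings to use in each of the 12 trial runs
```

Each row of real_design is one trial run. Only these rows and the measured outcomes are needed to fit the response surface, which is why nothing else about the trading system has to be disclosed.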

Running the trials is the most difficult part of the analysis. It requires access to historical data, and computer programs have to be written to simulate the trading strategy. Trading Desk Strategies would be happy to handle the conduct of the trials, including securing historical data and writing the computer programs, in addition to the other aspects of the investigation: determining factors, generating the design matrix, and analyzing the results.

For more information contact us at abcorwin@optionbots.com.

 

Example

A client developed a complex trading strategy involving groups of options on the S&P 500.  The basic components were what were called “bookends” and “fillers”.   There are three kinds of bookends, a First Bookend, a Second Bookend and a Third Bookend.  These groups of options, or spreads, are opened in a sequence starting about 65 days from expiry.  A computer program reading option chains from a data feed generates trading signals in real time.   

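The critical settings are the liquidation constant, plateau size, distanceToUnderWeight, creditToMaxExposureWeight, first minCreditToMaxExposure, and second minCreditToMaxExposure.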

 

Three “experiments” or groups of trial runs were conducted at the May 2009, June 2009, and July 2009 expiries.  “Sweet spots” or optimal settings were found by grid search for each of them:

Setting                         May       June      July      Overall Sweet Spot
liquidation constant            0.215     0.215     0.935     0.935
plateau size                    13.7      8.07      13.7      13.7
distanceToUnderWeight           0.322     0.322     0.322     0.322
creditToMaxExposureWeight       3.8       0.37      0.37      0.371
first minCreditToMaxExposure    11.815    4.021     4.021     11.815
second minCreditToMaxExposure   11.603    11.603    11.603    11.604

 

The overall sweet spot was produced by a grid search that maximized the combined profit across all three expiries. There are other ways of doing this. If the market condition associated with one of the expiries is unusual, that expiry could be weighted less. Ideally a weighting scheme would be used that is proportional to the frequency of each market condition. In the present case equal weighting was employed.
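A weighted search of this kind can be sketched as follows. The three profit surfaces are hypothetical curves over a single made-up setting, not the client's fitted surfaces, and the reduced weight for July is chosen only to show the effect of down-weighting one expiry.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)  # candidate values of one hypothetical setting

# Hypothetical fitted profit surfaces, one per expiry, each peaking at a different value.
surfaces = {
    "May": 1000 - 4000 * (grid - 0.2) ** 2,
    "June": 1000 - 4000 * (grid - 0.3) ** 2,
    "July": 1000 - 4000 * (grid - 0.9) ** 2,
}

def overall_sweet_spot(grid, surfaces, weights):
    """Grid value maximizing the weighted average of the per-expiry profit surfaces."""
    total_weight = sum(weights.values())
    combined = sum(weights[name] / total_weight * s for name, s in surfaces.items())
    return float(grid[np.argmax(combined)])

equal = overall_sweet_spot(grid, surfaces, {"May": 1.0, "June": 1.0, "July": 1.0})
july_down = overall_sweet_spot(grid, surfaces, {"May": 1.0, "June": 1.0, "July": 0.2})

print(equal, july_down)
```

With equal weights the sweet spot is pulled toward the July peak. Giving July a smaller weight moves it back toward the settings favored by May and June.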

We can see from the variation in the settings by expiry that they are sensitive to market conditions. The individual sweet spots produced the following results:

Expiry    Profit
May       $780
June      -$1771
July      $9144
Total     $8154

 

The overall sweet spot produced the following results:

Expiry    Profit
May       -$22230
June      -$11646
July      $3894
Total     -$29982

 

Clearly the issue of the differing market conditions hasn’t been resolved. We could try some sort of weighting scheme. Alternatively, a relationship between these parameters and the VIX looks plausible. The variation in the settings tracked the VIX closely over these three months, so it might be useful to have some of the settings track the VIX. On that basis we let the liquidation constant be a decreasing linear function of the VIX, and let the first minCreditToMaxExposure and creditToMaxExposureWeight be increasing linear functions of the VIX. Such a scheme produced the following results:
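The shape of such a scheme can be sketched in a few lines. The intercepts, slopes, and bounds below are hypothetical and were chosen only to show the direction of each relationship; they are not the coefficients used in the study.

```python
def vix_linked_setting(vix, intercept, slope, low, high):
    """Linear function of the VIX, clipped to the allowed range of the setting."""
    return min(high, max(low, intercept + slope * vix))

def settings_for(vix):
    """Hypothetical coefficients: one decreasing and two increasing relationships."""
    return {
        "liquidation constant": vix_linked_setting(vix, 2.0, -0.05, 0.2, 1.0),
        "first minCreditToMaxExposure": vix_linked_setting(vix, -10.0, 0.6, 4.0, 12.0),
        "creditToMaxExposureWeight": vix_linked_setting(vix, -4.0, 0.2, 0.3, 4.0),
    }

calm = settings_for(25.0)      # lower VIX
stressed = settings_for(35.0)  # higher VIX

for name in calm:
    print(name, round(calm[name], 3), round(stressed[name], 3))
```

With a rule of this form the settings are recomputed from the current VIX reading, so no separate prediction of the market condition is needed.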

Expiry    Profit
May       $760
June      -$2004
July      $6504
Total     $5260

 

This result isn’t as good as the individual sweet spots, but it has the advantage of not having to predict the market conditions. 

 

Conclusion

 

Design of Experiments is a feasible means for finding optimal settings for trading systems that can be back-tested through computer simulation on historical data. It is the only feasible means for trading systems that are heavy consumers of time and resources. Using a statistical criterion, a design matrix of parameter settings is selected that efficiently samples the parameter space, dramatically reducing the effort of searching for the optimal settings.

 

The efficient use of time and resources allows for further investigation of the trading system. Multiple objectives can be analyzed, and multiple experiments become feasible, permitting improvements in how the model handles changing market conditions.

 

Because we save large amounts of time using DOE, we can run more experiments, gain more knowledge and perspective from each experiment, and discover more and more ways to incrementally improve the model.