CoStat Statistics  

Multiple Regression in CoStat
(Including Backward Elimination and All Subsets)

Multiple regression is the simultaneous linear regression of one y column of data (the dependent variable) on several x columns of data (the independent variables). The general form of the resulting equation is:

y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + ... + b_n x_n

where the b values are the coefficients that the regression finds optimal (least squares) values for.
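As a rough illustration of what "least squares" means here, the sketch below fits the coefficients by solving the normal equations in plain Python. It is not CoStat's implementation: the function names `solve` and `fit` and the data are made up for this example, and a numerical library would normally be a better choice than hand-written elimination.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(x_rows, y):
    """Return [b0, b1, ..., bn] minimizing the sum of squared residuals."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 carries b0
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no noise),
# so the fit should recover those coefficients up to rounding error.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]
coeffs = fit(x_rows, y)
print([round(b, 6) for b in coeffs])
```

With real, noisy data the fitted coefficients would only approximate the underlying relationship.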

CoStat can do a multiple regression where all of the x columns are in the model, or a multiple regression where you specify a subset of the x columns.

Note that some of the x columns may have been created from other x columns with CoStat's 'Transformations' procedure. For example, you could make a column with x_1^2, or a column with x_1*x_2. In this way, you can make a model of a "response surface" or other more complex models.
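The idea of derived columns can be sketched outside CoStat as follows. The column values and variable names are hypothetical; the point is only that the squared and product columns are ordinary x columns as far as the regression is concerned, so the model stays linear in its coefficients.

```python
# Hypothetical raw x columns.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.5, 1.5, 2.5, 3.5]

# Derived columns, analogous to what a transformation step would create.
x1_squared = [a * a for a in x1]
x1_times_x2 = [a * b for a, b in zip(x1, x2)]

# Each row now holds the x values for the model
#   y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2
rows = list(zip(x1, x2, x1_squared, x1_times_x2))
for row in rows:
    print(row)
```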

Often, an experimenter has a large number of x columns and wishes to know if there is a smaller, simpler model with a subset of these x columns which adequately explains the dependent variable. For n x columns there are 2^n - 1 possible subset models, so for a very large number of x columns the time needed to test and compare all models can become prohibitive. A further complication is that the importance of each x column changes depending on the other columns in the model and their order in the model. Statisticians have recommended different approaches to the problem. CoStat has procedures for two of the approaches.
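To give a feel for how quickly the number of candidate models grows, here is a small counting sketch. The function name `subset_model_counts` is made up for this example; it simply counts the ways to choose k of the n x columns for each k.

```python
from math import comb


def subset_model_counts(n_columns):
    """Map each model size k to the number of models with exactly k x columns."""
    return {k: comb(n_columns, k) for k in range(1, n_columns + 1)}


for n in (6, 10, 20, 30):
    total = sum(subset_model_counts(n).values())  # equals 2**n - 1
    print(f"{n} x columns: {total} possible subset models")

# With 6 x columns there are 20 different three-column models.
print(subset_model_counts(6)[3])
```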

One approach to this problem is implemented in CoStat's "Backward Elimination Multiple Regression" procedure. This procedure starts by testing a model which contains all of the x columns. The x column which contributes the least to this model (that is, the one with the lowest F value) is removed from the model and this new model is analyzed. The column which contributes the least to this new model is then removed and the resulting model is analyzed. This step is repeated until only one x column remains. CoStat can generate and test all of these models very quickly. By comparing these models, you can quickly get an idea of approximately how well models with various numbers of x columns explain the y column. The model chosen by the procedure for a given number of x columns may not be the best model for that number of columns, but it will probably be close.
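The elimination loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not CoStat's code: it assumes the "F value" of a column is its partial F (the increase in residual sum of squares when that column alone is dropped, divided by the residual mean square of the current model), and the names `solve`, `residual_ss`, and `backward_elimination` as well as the simulated data are made up for this example.

```python
import random


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(columns, y):
    """Residual sum of squares of the least-squares fit of y on the
    given x columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(b, r))) ** 2
               for r, yi in zip(design, y))


def backward_elimination(data, y):
    """Return [(column names, R^2), ...] from the full model down to one column."""
    names = list(data)
    n = len(y)
    mean_y = sum(y) / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    steps = []
    while True:
        current = residual_ss([data[c] for c in names], y)
        steps.append((list(names), 1 - current / total_ss))
        if len(names) == 1:
            return steps
        mse = current / (n - len(names) - 1)  # residual mean square
        # Partial F of each column (assumed definition, see lead-in).
        partial_f = {
            c: (residual_ss([data[o] for o in names if o != c], y) - current) / mse
            for c in names
        }
        names.remove(min(partial_f, key=partial_f.get))


# Simulated data: y depends strongly on x1, weakly on x2, not at all on x3.
rng = random.Random(0)
data = {name: [rng.uniform(0, 10) for _ in range(40)] for name in ("x1", "x2", "x3")}
y = [3 + 2 * a + 0.5 * b + rng.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]

steps = backward_elimination(data, y)
for cols, r2 in steps:
    print(cols, round(r2, 4))
```

Each pass refits the model once per remaining column, so a full run over n columns needs on the order of n^2 fits rather than the 2^n - 1 fits of an exhaustive search.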

Another solution to this problem is implemented in CoStat's "All Subsets Multiple Regression" procedure. Because adding x terms to a model never lowers its R^2, models with more x terms tend to look like better fits, so it is best to directly compare only the models with a specific number of x columns. This procedure therefore tests all models with a specific number of x columns. For example, the procedure can find the best model with three x columns selected from a data file with six x columns. The procedure prints out the 100 best models (the models with the highest R^2 values). CoStat can test a large number of models very quickly.
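An exhaustive search over all models of one size can be sketched as below, using the three-from-six example. As before, this is only an illustration and not CoStat's implementation; the helper names `solve`, `residual_ss`, and `best_subsets`, the `keep` parameter, and the simulated data are assumptions made for this example.

```python
import random
from itertools import combinations


def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residual_ss(columns, y):
    """Residual sum of squares of the least-squares fit of y on the
    given x columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    b = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(b, r))) ** 2
               for r, yi in zip(design, y))


def best_subsets(data, y, size, keep=100):
    """Fit every model with exactly `size` x columns and return up to
    `keep` (R^2, column names) pairs, highest R^2 first."""
    mean_y = sum(y) / len(y)
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    results = []
    for subset in combinations(data, size):
        r2 = 1 - residual_ss([data[c] for c in subset], y) / total_ss
        results.append((r2, subset))
    results.sort(reverse=True)
    return results[:keep]


# Simulated data file with six x columns; y depends only on x1, x3, and x5.
rng = random.Random(1)
data = {f"x{i}": [rng.uniform(0, 10) for _ in range(40)] for i in range(1, 7)}
y = [5 + 2 * a - 1.5 * b + c + rng.gauss(0, 1)
     for a, b, c in zip(data["x1"], data["x3"], data["x5"])]

ranked = best_subsets(data, y, size=3)
for r2, subset in ranked[:5]:
    print(round(r2, 4), subset)
```

Because every model in the ranking has the same number of x columns, comparing them by R^2 alone is fair in the sense described above.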


All material Copyright © 1996-2001 CoHort Software. All rights reserved.