(Including Backward Elimination and All Subsets)

Multiple regression is the simultaneous linear regression of one y column of data (the dependent variable) on several x columns of data (the independent variables). The general form of the resulting equation is:

y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3} + ... + b_{n}x_{n}

where the b values are the coefficients that the regression finds optimal (least squares) values for.
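
To make the equation concrete, here is a minimal sketch of a least squares fit in Python with NumPy. It is not CoStat's implementation; the data are made up and the variable names (`design`, `b`) are chosen only for illustration. Because y is constructed exactly from known coefficients, the fit should recover them.

```python
import numpy as np

# Made-up data: 6 rows, 2 x columns. y is built exactly as
# 1 + 2*x1 + 3*x2, so the fit should recover b0=1, b1=2, b2=3.
x = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = 1.0 + 2.0 * x[:, 0] + 3.0 * x[:, 1]

# Prepend a column of ones so that b[0] plays the role of the intercept b_0.
design = np.column_stack([np.ones(len(y)), x])

# Least squares solution: the b that minimizes the sum of squared residuals.
b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(b, 6))
```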

CoStat can do a multiple regression where all of the x columns are in the model, or a multiple regression where you specify a subset of the x columns.

Note that some of the x columns may have been created from other x columns
with CoStat's 'Transformations' procedure. For example, you could make a column
with x_{1}^{2}, or a column with x_{1}*x_{2}. In
this way, you can model a **"response surface"** or fit other, more
complex models.
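
As an illustration of the idea (outside CoStat, and with made-up data), the sketch below builds squared and cross-product columns from two x columns and fits a full quadratic response surface by least squares. The column order in `design` is an arbitrary choice made for this example.

```python
import numpy as np

# Made-up 3x3 grid of settings for two factors.
x1 = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
x2 = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0])

# Made-up response that is exactly quadratic in x1 and x2.
y = 5.0 + 1.0 * x1 - 2.0 * x2 + 0.5 * x1**2 + 0.25 * x2**2 + 1.5 * x1 * x2

# The "transformed" columns x1^2, x2^2 and x1*x2 are simply extra x columns;
# the model is still linear in the coefficients.
design = np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(b, 6))  # order: intercept, x1, x2, x1^2, x2^2, x1*x2
```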

Often, an experimenter has a large number of x columns and wishes to know if there is a smaller, simpler model with a subset of these x columns which adequately explains the dependent variable. For a very large number of x columns, the number of possible subset models is quite large and the time needed to test and compare all models can become prohibitive. A further complication is that the importance of each x column changes depending on the other columns in the model and their order in the model. Statisticians have recommended different approaches to the problem. CoStat has procedures for two of the approaches.
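
To give a feel for how quickly the number of subset models grows, the short Python sketch below counts them. With n x columns there are 2^n - 1 non-empty subsets, and "n choose k" subsets of exactly k columns. The function name `count_subset_models` is invented for this example.

```python
from math import comb

def count_subset_models(n_columns: int) -> int:
    """Number of non-empty subsets of the x columns: 2^n - 1."""
    return 2 ** n_columns - 1

for n in (6, 10, 20, 30):
    print(n, "x columns:", count_subset_models(n), "possible models")

# Models with exactly 3 of 6 x columns: 6 choose 3.
print(comb(6, 3))
```

For 6 x columns this gives 63 possible models, 20 of which have exactly three columns; for 30 x columns the total exceeds one billion.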

One approach to this problem is in CoStat's **"Backward Elimination Multiple
Regression"** procedure. This procedure starts by testing a model which
contains all of the x columns. The x column which contributes the least to this
model (that is, the one with the lowest F value) is removed from the model and
this new model is analyzed. The column which contributes the least to this new
model is then removed and the resulting model is analyzed, and so on. The procedure
continues until only one x column remains. CoStat can generate and test all of
these models very quickly. By comparing these models, you can quickly get an
idea of approximately how well models with various numbers of x columns explain
the y column. The model chosen by the procedure for a given number of x columns
may not be the best model for that number of columns, but it will probably be
close.
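
The steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not CoStat's code: the data are simulated, the helper names (`sse`, `backward_elimination`) are invented here, and each column's contribution is measured with a partial F value, that is, the increase in the error sum of squares when the column is dropped, divided by the error mean square of the current model.

```python
import numpy as np

def sse(x, y, cols):
    """Error sum of squares for a model with an intercept and the given x columns."""
    design = np.column_stack([np.ones(len(y))] + [x[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return float(resid @ resid)

def backward_elimination(x, y):
    """Return a list of (columns, R^2), one entry per model size, largest first."""
    cols = list(range(x.shape[1]))
    sst = float(((y - y.mean()) ** 2).sum())
    steps = []
    while True:
        sse_full = sse(x, y, cols)
        steps.append((tuple(cols), 1.0 - sse_full / sst))
        if len(cols) == 1:
            break
        # Error mean square of the current model (assumes the fit is not exact).
        mse = sse_full / (len(y) - len(cols) - 1)
        # Partial F value of each column given the others in the model.
        f_values = {
            c: (sse(x, y, [k for k in cols if k != c]) - sse_full) / mse
            for c in cols
        }
        # Drop the column that contributes the least (lowest F value).
        cols.remove(min(f_values, key=f_values.get))
    return steps

# Simulated data: y depends strongly on column 0, weakly on column 1,
# and not at all on column 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 3))
y = 4.0 * x[:, 0] + 1.0 * x[:, 1] + rng.normal(scale=0.1, size=40)

steps = backward_elimination(x, y)
for cols, r2 in steps:
    print(cols, round(r2, 4))
```

Because every model in the sequence is nested in the one before it, R^{2} can only stay the same or fall as columns are removed; how sharply it falls at each step is what indicates how many x columns are needed.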

Another solution to this problem is in CoStat's **"All Subsets Multiple
Regression"** procedure. Because adding x terms to a model can only raise its
R^{2} (or leave it unchanged), it is best to compare directly only models with the same
number of x columns. So, this procedure tests all models with a specific number
of x columns. For example, the procedure can find the best model with three x
columns selected from a data file with six x columns. The procedure prints out
the 100 best models (the models with the highest R^{2} values). CoStat
can test a large number of models very quickly.
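
A minimal Python sketch of the same idea, using simulated data, is shown below. It is not CoStat's implementation; the function names (`r_squared`, `best_subsets`) and the `keep` parameter are invented for this example. It fits every model with a given number of x columns and ranks the models by R^{2}.

```python
from itertools import combinations

import numpy as np

def r_squared(x, y, cols):
    """R^2 of a least squares model with an intercept and the given x columns."""
    design = np.column_stack([np.ones(len(y))] + [x[:, c] for c in cols])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sst = float(((y - y.mean()) ** 2).sum())
    return 1.0 - float(resid @ resid) / sst

def best_subsets(x, y, size, keep=100):
    """Fit all models with `size` x columns; return up to `keep` of them, best first."""
    results = [
        (r_squared(x, y, cols), cols)
        for cols in combinations(range(x.shape[1]), size)
    ]
    results.sort(reverse=True)  # highest R^2 first
    return results[:keep]

# Simulated data: six x columns, of which only columns 0, 2 and 5 affect y.
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 6))
y = (2.0 * x[:, 0] - 3.0 * x[:, 2] + 1.5 * x[:, 5]
     + rng.normal(scale=0.1, size=50))

ranked = best_subsets(x, y, size=3)
for r2, cols in ranked[:5]:
    print(cols, round(r2, 4))
```

With six x columns and three per model there are only 20 candidate models, so all of them are returned; the limit of 100 only matters for larger problems.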
