Simple linear regression and multiple regression

Multiple regression, by David M. Lane, HyperStat Online.

More about regression analysis

Automatic selection procedures

Often the researcher does not know which independent variables should be included in the model. The selection of the predictors should always start from substantive knowledge of the subject and from the results of the correlations and the scatter diagrams. Statistical software offers some automatic procedures to help the researcher select the predictors, but their results should be interpreted carefully and in light of that subject-matter knowledge.

1. Forward

- The variable that has the strongest (and statistically significant) correlation with the dependent variable is entered into the model first.

- Calculate the partial correlation between each variable not yet in the equation and the dependent variable, controlling for the variables already in the model. The variable with the strongest (and significant) partial correlation is entered into the model.

- The procedure continues until none of the remaining partial correlations is significant.

2. Backward

- Estimate the model with all the predictors. If all the regression coefficients are significant, the procedure stops.

- Otherwise, the predictor whose regression coefficient is the least significant (the largest p-value in the coefficient test) is eliminated from the model, and the model is estimated again.

- The procedure stops when all the remaining regression coefficients are significant.

3. Stepwise

- Starts like the forward procedure, but after every step all the regression coefficients in the model are tested again, and a variable whose coefficient is no longer significant is eliminated.
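The three procedures above can be sketched in Python with NumPy. This is a minimal illustration and not how any particular statistics package implements them. All function names are hypothetical, and "significant" is approximated by a fixed cut-off of |t| >= 2 (roughly a two-sided 5% test for moderate sample sizes) instead of exact p-values. The forward step uses the partial correlation described above. Its t statistic equals the t statistic the candidate's coefficient gets once it is added, so the same kind of cut-off works for both entry and removal. In the stepwise version the removal cut-off is assumed to be slightly lower than the entry cut-off. This is a common convention that keeps a variable from entering and leaving in an endless loop.

```python
import numpy as np

T_CRIT = 2.0  # assumed |t| cut-off, roughly a two-sided 5% test for moderate n


def residualize(v, X_in):
    """Residuals of v after regressing it on an intercept and the columns of X_in."""
    A = np.column_stack([np.ones(len(v)), X_in])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef


def partial_t(X, y, j, selected):
    """t statistic of the partial correlation of X[:, j] with y, given the selected columns."""
    ry = residualize(y, X[:, selected])
    rx = residualize(X[:, j], X[:, selected])
    r = (ry @ rx) / np.sqrt((ry @ ry) * (rx @ rx))
    df = len(y) - len(selected) - 2  # residual df of the model with column j added
    return r * np.sqrt(df / (1 - r ** 2))


def coef_t(X, y, cols):
    """t statistics of the slope coefficients in the OLS model y ~ 1 + X[:, cols]."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    return (beta / se)[1:]  # drop the intercept


def best_candidate(X, y, selected):
    """Column outside the model with the largest |partial t|, together with that |t|."""
    remaining = [j for j in range(X.shape[1]) if j not in selected]
    if not remaining:
        return None, 0.0
    scores = {j: abs(partial_t(X, y, j, selected)) for j in remaining}
    best = max(scores, key=scores.get)
    return best, scores[best]


def drop_weakest(X, y, cols, t_out):
    """Remove the least significant column from cols (in place); True if one was removed."""
    t = np.abs(coef_t(X, y, cols))
    worst = int(np.argmin(t))
    if t[worst] < t_out:
        cols.pop(worst)
        return True
    return False


def forward(X, y, t_in=T_CRIT):
    selected = []
    while True:
        j, t = best_candidate(X, y, selected)
        if j is None or t < t_in:
            return selected
        selected.append(j)


def backward(X, y, t_out=T_CRIT):
    cols = list(range(X.shape[1]))
    while cols and drop_weakest(X, y, cols, t_out):
        pass
    return cols


def stepwise(X, y, t_in=T_CRIT, t_out=T_CRIT - 0.1):
    selected = []
    for _ in range(10 * X.shape[1]):  # guard against entry/removal cycling
        j, t = best_candidate(X, y, selected)
        if j is None or t < t_in:
            break
        selected.append(j)
        # Re-test every coefficient in the model and drop non-significant ones one at a time.
        while selected and drop_weakest(X, y, selected, t_out):
            pass
    return selected


# Hypothetical simulated data: only columns 0 and 1 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)
print(forward(X, y), backward(X, y), stepwise(X, y))
```

With these simulated data, all three procedures should pick columns 0 and 1. An irrelevant column can still slip in now and then, because about 5% of tests on pure noise exceed the cut-off.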

Multicollinearity

The predictors should not be strongly correlated with each other. In stepwise selection, one variable may carry the predictive information of another variable into the model, so that the second variable is never entered. The regression coefficients may also come out inconsistent, for example negative although the effect of the variable is positive.
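The sign problem can be reproduced with a tiny made-up example in Python with NumPy. The data are hypothetical and noise-free, and the model has no intercept so the arithmetic stays easy to follow. The predictor x2 is x1 plus a zig-zag d of size 0.01, so the two predictors are almost identical. Because 0.05*d = 5*(x2 - x1), the shifted response y + 0.05*d equals -4*x1 + 6*x2 exactly. Least squares therefore returns coefficients of about (1, 1) for y but about (-4, 6) for the shifted response. A change of only 0.05 per observation flips the sign of the x1 coefficient.

```python
import numpy as np

# Hypothetical, noise-free data: x2 is x1 plus a tiny zig-zag d, so the predictors are almost identical.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, -1.0, 1.0, -1.0])
x2 = x1 + 0.01 * d
X = np.column_stack([x1, x2])  # no intercept, to keep the arithmetic easy to follow

y = x1 + x2               # both predictors have a positive effect (+1 each)
y_shifted = y + 0.05 * d  # each observation moves by only 0.05

b, *_ = np.linalg.lstsq(X, y, rcond=None)
b_shifted, *_ = np.linalg.lstsq(X, y_shifted, rcond=None)

print("correlation of x1 and x2:  ", np.corrcoef(x1, x2)[0, 1])
print("coefficients for y:        ", b)
print("coefficients for y_shifted:", b_shifted)
```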

The problem should first be handled from a substantive point of view: why include almost the same predictor in the model twice? The researcher can try to combine the variables (for example, the body mass index computed from height and weight). If the correlated variables must be entered into the model, the researcher can use biased estimation methods for the regression coefficients (for example, ridge regression).

The usual (ordinary least squares) regression coefficient estimates are calculated from the formula

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$$

where X is the matrix of the predictor variables (including a column of ones if the model has an intercept), y is the vector of the dependent variable, the superscript T denotes the transpose of a matrix and the superscript -1 denotes the matrix inverse.
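The formula can be checked numerically. The sketch below uses NumPy and a hypothetical function name `ols`. It solves the normal equations $X^{T}X\beta = X^{T}y$ with `np.linalg.solve` instead of forming the inverse explicitly. This gives the same estimates and is numerically more stable. The example data are made up and noise-free, generated from y = 1 + 2*x1 + 3*x2, so the estimates should reproduce 1, 2 and 3.

```python
import numpy as np


def ols(X, y):
    """Least squares estimate beta = (X^T X)^{-1} X^T y.

    X must already contain a column of ones if an intercept is wanted.
    """
    # Solve (X^T X) beta = X^T y rather than computing the inverse matrix.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data generated exactly by y = 1 + 2*x1 + 3*x2 (no noise).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
X = np.column_stack([np.ones_like(x1), x1, x2])  # first column: intercept
y = 1 + 2 * x1 + 3 * x2

print(ols(X, y))  # approximately [1. 2. 3.]
```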


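In the multicollinear situation $X^{T}X$ is almost singular, so its inverse is numerically unstable and small changes in the data can change the estimates a great deal. A constant diagonal matrix can be added to $X^{T}X$ before it is inverted, which gives the ridge estimator

$$\hat{\beta}_{\text{ridge}} = (X^{T}X + kI)^{-1}X^{T}y$$

where k is the ridge parameter (k > 0) and

I is the identity (unit) matrix, with ones on the diagonal and zeros elsewhere.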
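A small NumPy sketch shows the effect of k. The function name `ridge` and the data are hypothetical, and the formula is applied exactly as written above to two predictors without an intercept. In practice the predictors are usually standardized first and the intercept is not penalized, and this sketch does neither. The response is deliberately built as -4*x1 + 6*x2 from two almost identical predictors. As a result, ordinary least squares (k = 0) gives x1 a negative coefficient. As k grows, the condition number of $X^{T}X + kI$ falls and the two estimates are pulled toward each other. How to choose k is a separate question that this sketch does not address.

```python
import numpy as np


def ridge(X, y, k):
    """Ridge estimate (X^T X + k I)^{-1} X^T y; k = 0 gives the ordinary least squares estimate."""
    p = X.shape[1]
    # Solve the linear system instead of forming the inverse matrix explicitly.
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


# Hypothetical, noise-free data with two almost identical predictors (no intercept).
x1 = np.array([1.0, 2.0, 3.0, 4.0])
d = np.array([1.0, -1.0, 1.0, -1.0])
x2 = x1 + 0.01 * d
X = np.column_stack([x1, x2])
y = -4.0 * x1 + 6.0 * x2  # OLS (k = 0) reproduces these coefficients exactly

for k in [0.0, 0.01, 0.1, 1.0]:
    cond = np.linalg.cond(X.T @ X + k * np.eye(2))  # how close X^T X + kI is to singular
    print(f"k={k:<5} cond={cond:12.1f} beta={ridge(X, y, k)}")
```

With these data, k = 1 already gives two positive estimates just under 1. This illustrates the trade-off: ridge buys stability at the price of biased coefficients.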