A Simple Bootstrap Variable Selection Method for Building
Database Marketing Models
Bruce Ratner, Ph.D.
Variable selection, that is, determining which independent variables to include in a model, is a vital part of the model-building process. Most data analysts rely on well-known variable selection approaches such as forward selection, which adds variables one at a time, each contributing to the prediction of the target variable (a binary response for logistic regression; continuous profit for ordinary least squares regression), until no additional variable significantly improves the model's predictions. Less well known is that variable selection methods can produce suboptimal models: they either omit an important (necessary) predictor variable, which produces biased predictions, or include an unnecessary variable, which produces large (unstable) prediction errors. The purpose of this article is to use the bootstrap in tandem with variable selection methods to obtain a less biased and more stable variable selection methodology. Two case studies are presented, using response and profit database marketing models.
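To make the idea of pairing the bootstrap with a selection method concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure used in the case studies: it covers only the ordinary least squares (profit) case, it uses a partial F-to-enter threshold of 4.0 as the stopping rule, and it summarizes the bootstrap runs by how often each variable is selected. The function names (`rss`, `forward_select`, `bootstrap_selection_frequency`) and the parameters `f_enter`, `n_boot`, and `seed` are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the given columns plus an intercept."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def forward_select(X, y, f_enter=4.0):
    """Forward selection: repeatedly add the variable with the largest
    partial F statistic, stopping when none exceeds f_enter (an assumed threshold)."""
    n, p = X.shape
    selected = []
    current = rss(X, y, selected)
    while len(selected) < p:
        best_j, best_f, best_rss = None, f_enter, None
        df = n - len(selected) - 2  # residual degrees of freedom after adding one variable
        if df <= 0:
            break
        for j in range(p):
            if j in selected:
                continue
            new = rss(X, y, selected + [j])
            if new <= 0:
                continue  # perfect fit; the F statistic is undefined
            f = (current - new) / (new / df)
            if f > best_f:
                best_j, best_f, best_rss = j, f, new
        if best_j is None:
            break  # no remaining variable gives a large enough improvement
        selected.append(best_j)
        current = best_rss
    return selected


def bootstrap_selection_frequency(X, y, n_boot=100, seed=0):
    """Run forward selection on n_boot bootstrap samples and return the
    fraction of samples in which each variable was selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        for j in forward_select(X[idx], y[idx]):
            counts[j] += 1
    return counts / n_boot
```

Variables selected in nearly every bootstrap sample are candidates to keep, while variables that enter only occasionally are candidates to drop; where to place the cutoff is a judgment call that this sketch does not settle.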
1. When Data Are Too Large to Handle in the Memory of Your Computer
2. Creating A Bootstrap Sample