
A Database Marketing Model for Zero-inflated Data
Bruce Ratner, Ph.D.

The problem of modeling
data with missing values is well known to data analysts. Data analysts
know that almost all standard statistical modeling techniques require
complete data, and accordingly discard individuals with missing data.
To avoid losing those individuals, analysts make every effort to impute
the missing data values. A common
approach is to "zero-inflate" the data by replacing missing values with
zeros. For binary variables and dummified categorical variables, say,
representing participation in lifestyle activities, which assume 1 or 0
if an individual does or does not participate in a given lifestyle
activity, respectively, missing-value individuals would have zeros. The
working assumption is that the missing-value individuals are
nonparticipants in the corresponding lifestyle activities. Similarly,
for continuous variables, say, representing a count activity (e.g.,
number of visits) or dollar amount, missing-value individuals would have
zeros, implying they have no activity or a zero dollar value.
Zero-inflated data clearly do not meet the bell-shaped distributional
assumption of
the standard statistical modeling techniques. The zero-inflated data
approach has been justified empirically: it produces good model results
in the majority of cases.
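
The zero-inflation step described above can be sketched in a few lines of Python. This is only an illustration, not the author's procedure: the field names (plays_golf, num_visits, dollars), the sample records, and the helper functions zero_inflate and zero_share are all hypothetical, and missing values are assumed to be marked with None.

```python
# Hypothetical customer records; None marks a missing value (an assumption).
records = [
    {"plays_golf": 1, "num_visits": 3, "dollars": 42.50},
    {"plays_golf": None, "num_visits": None, "dollars": 10.00},
    {"plays_golf": 0, "num_visits": 1, "dollars": None},
]


def zero_inflate(rows):
    """Return copies of the rows with every missing (None) value replaced by 0."""
    return [{key: (0 if value is None else value) for key, value in row.items()}
            for row in rows]


def zero_share(rows, field):
    """Return the fraction of rows whose value for `field` equals zero."""
    return sum(1 for row in rows if row[field] == 0) / len(rows)


filled = zero_inflate(records)

# Compare the share of zeros before and after imputation for one variable.
print(zero_share(records, "plays_golf"), zero_share(filled, "plays_golf"))
```

Comparing the share of zeros before and after imputation gives a quick sense of how far the imputed variable departs from a bell-shaped distribution.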
The purpose of this article is to present a distribution-free
alternative to regression modeling with zero-inflated data, whether the
zeros arise from imputation, as discussed above, or are actually
observed. The GenIQ Model, which is based on the machine learning method
of genetic programming, theoretically accepts zero-inflated data, and
thus offers optimal model results. Two case studies are presented using
response and profit database marketing models.

