DM Stat-1 Articles

Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data (4th printing) -
Bruce Ratner, Ph.D.


ARTICLES for the Second Edition: See Volumes 8 - 11

Table of Contents

Chapter 1 - Introduction
1.1 The Personal Computer and Statistics
1.2 Statistics and Data Analysis
1.3 EDA
1.4 The EDA Paradigm
1.5 EDA Weaknesses
1.6 Small and Big Data 
     1.6.1 Data Size Characteristics 
     1.6.2 Data Size: Personal Observation of One
1.7 Data Mining Paradigm
1.8 Statistics and Machine Learning
1.9 Statistical Learning
1.10 References

Chapter 2 - Two Simple Data Mining Methods for Variable Assessment
2.1 Correlation Coefficient
2.2 Scatterplots
2.3 Data Mining
     2.3.1 Example #1
     2.3.2 Example #2
2.4 Smoothed Scatterplot
2.5 General Association Test
2.6 Summary
2.7 References

Chapter 3 - Logistic Regression: The Workhorse of Database Response Modeling
3.1 Logistic Regression Model 
     3.1.1 Illustration 
     3.1.2 Scoring an LRM
3.2 Case Study 
     3.2.1 Candidate Predictor and Dependent Variables
3.3 Logits and Logit Plots 
     3.3.1 Logits for Case Study
3.4 The Importance of Straight Data
3.5 Re-expressing for Straight Data 
     3.5.1 Ladder of Powers 
     3.5.2 Bulging Rule 
     3.5.3 Measuring Straight Data
3.6 Straight Data for Case Study
     3.6.1 Re-expressing FD2_OPEN 
     3.6.2 Re-expressing INVESTMENT
3.7 Techniques When Bulging Rule Does Not Apply
     3.7.1 Fitted Logit Plot 
     3.7.2 Smooth Predicted vs. Actual Plot
3.8 Re-expressing MOS_OPEN 
     3.8.1 Smooth Predicted vs. Actual Plot for MOS_OPEN
3.9 Assessing the Importance of Variables 
     3.9.1 Computing the G statistic 
     3.9.2 Importance of a Single Variable 
     3.9.3 Importance of a Subset of Variables 
     3.9.4 Comparing the Importance of Different Subsets of Variables
3.10 Important Variables for Case Study 
     3.10.1 Importance of the Predictor Variables
3.11 Relative Importance of the Variables 
     3.11.1 Selecting the Best Subset
3.12 Best Subset of Variables for Case Study
3.13 Visual Indicators of Goodness of Model Predictions
     3.13.1 Smooth Residual by Score Groups Plot 
          3.13.1.1 Smooth Residual by Score Groups Plot for Case Study 
     3.13.2 Smooth Actual vs. Predicted by Decile Groups Plot
          3.13.2.1 Smooth Actual vs. Predicted by Decile Groups Plot for Case Study 
     3.13.3 Smooth Actual vs. Predicted by Score Groups Plot 
          3.13.3.1 Smooth Actual vs. Predicted by Score Groups Plot for Case Study
3.14 Evaluating the Data Mining Work 
     3.14.1 Comparison of Smooth Residual by Score Groups Plots: EDA vs. NonEDA Models
     3.14.2 Comparison of Smooth Actual vs. Predicted by Decile Groups Plots: EDA vs. NonEDA Models
     3.14.3 Comparison of Smooth Actual vs. Predicted by Score Groups Plots: EDA vs. NonEDA Models 
     3.14.4 Summary of the Data Mining Work
3.15 Smoothing A Categorical Variable 
     3.15.1 Smoothing FD_TYPE with CHAID 
     3.15.2 Importance of CH_FTY_1 and CH_FTY_2
3.16 Additional Data Mining Work For Case Study 
     3.16.1 Comparison of Smooth Residual by Score Group Plots: 4var- vs. 3var-EDA Models 
     3.16.2 Comparison of Smooth Actual vs. Predicted by Decile Groups Plots: 4var- vs. 3var-EDA Models 
     3.16.3 Comparison of Smooth Actual vs. Predicted by Score Groups Plots: 4var- vs. 3var-EDA Models 
     3.16.4 Final Summary of the Additional Data Mining Work
3.17 Summary

Chapter 4 - Ordinary Regression: The Workhorse of Database Profit Modeling
4.1 Ordinary Regression Model 
     4.1.1 Illustration 
     4.1.2 Scoring an OLS Profit Model
4.2 Mini Case Study 
     4.2.1 Straight Data for Mini Case Study 
          4.2.1.1 Re-expressing INCOME 
          4.2.1.2 Re-expressing AGE 
     4.2.2 Smooth Predicted vs. Actual Plot 
     4.2.3 Assessing the Importance of Variables 
          4.2.3.1 Defining the F Statistic and R-squared 
          4.2.3.2 Importance of a Single Variable 
          4.2.3.3 Importance of a Subset of Variables 
          4.2.3.4 Comparing the Importance of Different Subsets of Variables
4.3 Important Variables for Mini Case Study 
     4.3.1 Relative Importance of the Variables 
     4.3.2 Selecting the Best Subset
4.4 Best Subset of Variables for Case Study
     4.4.1 PROFIT Model with gINCOME and AGE 
     4.4.2 Best PROFIT Model
4.5 Suppressor Variable AGE
4.6 Summary

Chapter 5 - CHAID for Interpreting a Logistic Regression Model
5.1 Logistic Regression Model
5.2 Database Marketing Response Model Case Study 
     5.2.1 Odds Ratio
5.3 CHAID 
     5.3.1 Proposed CHAID-based Method
5.4 Multivariable CHAID Trees
5.5 CHAID Market Segmentation
5.6 CHAID Tree Graphs
5.7 Summary

Chapter 6 - The Importance of the Regression Coefficient
6.1 The Ordinary Regression Model
6.2 Four Questions
6.3 Important Predictor Variables
6.4 P-values and BIG Data
6.5 Returning to Question #1
6.6 Predictor Variable's Effect On Prediction
6.7 The Caveat
6.8 Returning to Question #2
6.9 Ranking Predictor Variables By Effect On Prediction
6.10 Returning to Question #3
6.11 Returning to Question #4
6.12 Summary
6.13 Reference

Chapter 7 - The Predictive Contribution Coefficient: A Measure of Predictive Importance
7.1 Background
7.2 Illustration of Decision Rule
7.3 Predictive Contribution Coefficient
7.4 Calculation of Predictive Contribution Coefficient
7.5 Extra-illustration of Predictive Contribution Coefficient
7.6 Summary
7.7 Reference

Chapter 8 - CHAID For Specifying A Model With Interaction Variables
8.1 Interaction Variables
8.2 Strategy for Modeling with Interaction Variables
8.3 Strategy Based on the Notion of a Special Point
8.4 Example of a Response Model with an Interaction Variable
8.5 CHAID for Uncovering Relationships
8.6 Illustration of CHAID for Specifying a Model
8.7 An Exploratory Look
8.8 Database Implication
8.9 Summary
8.10 Reference

Chapter 9 - Market Segment Classification Modeling With Logistic Regression
9.1 Binary Logistic Regression 
     9.1.1 Necessary Notation
9.2 Polychotomous Logistic Regression Model
9.3 Model Building With PLR
9.4 Market Segmentation Classification Model 
     9.4.1 Survey of Cellular Phone Users 
     9.4.2 CHAID Analysis 
     9.4.3 CHAID-tree Graphs 
     9.4.4 Market Segment Classification Model
9.5 Summary

Chapter 10 - CHAID As A Method For Filling In Missing Values
10.1 Introduction to the Problem of Missing Data
10.2 Missing-data Assumption
10.3 CHAID Imputation
10.4 Illustration 
     10.4.1 CHAID Mean-value Imputation for a Continuous Variable 
     10.4.2 Many Mean-value CHAID Imputations for a Continuous Variable 
     10.4.3 Regression-tree Imputation for LIF_DOL
10.5 CHAID Most-likely Category Imputation for a Categorical Variable 
     10.5.1 CHAID Most-likely Category Imputation for GENDER 
     10.5.2 Classification-tree Imputation for GENDER
10.6 Summary
10.7 Reference

Chapter 11 - Identifying Your Best Customers: Descriptive, Predictive and Look-Alike Profiling
11.1 Some Definitions
11.2 Illustration of a Flawed Targeting Effort
11.3 Well-Defined Targeting Effort
11.4 Predictive Profiles
11.5 Continuous Trees
11.6 Look-Alike Profiling
11.7 Look-Alike Tree Characteristics
11.8 Summary

Chapter 12 - Assessment of Database Marketing Models
12.1 Accuracy for Response Model
12.2 Accuracy for Profit Model
12.3 Decile Analysis and Cum Lift for Response Model
12.4 Decile Analysis and Cum Lift for Profit Model
12.5 Precision for Response Model
12.6 Construction of SWMAD
12.7 Separability for Response and Profit Models
12.8 Guidelines for Using Cum Lift, HL/SWMAD and CV
12.9 Summary

Chapter 13 - Bootstrapping in Database Marketing: A New Approach For Validating Models
13.1 Traditional Model Validation
13.2 Illustration
13.3 Three Questions
13.4 The Bootstrap 
     13.4.1 Traditional Construction of Confidence Intervals
13.5 How To Bootstrap 
     13.5.1 Simple Illustration
13.6 Bootstrap Decile Analysis Validation
13.7 Another Question
13.8 Bootstrap Assessment of Model Implementation Performance 
     13.8.1 Illustration
13.9 Bootstrap Assessment of Model Efficiency
13.10 Summary
13.11 Reference

Chapter 14 - Visualization of Database Models
14.1 Brief History of the Graph
14.2 Star Graph Basics 
     14.2.1 Illustration
14.3 Star Graphs for Single Variables
14.4 Star Graphs for Many Variables Considered Jointly
14.5 Profile Curves Method
     14.5.1 Profile Curves Basics 
     14.5.2 Profile Analysis
14.6 Illustration 
     14.6.1 Profile Curves for RESPONSE Model 
     14.6.2 Decile-Group Profile Curves
14.7 Summary
14.8 SAS Code for Star Graphs for Each Demographic Variable about the Deciles
14.9 SAS Code for Star Graphs for Each Decile About the Demographic Variables
14.10 SAS Code for Profile Curves: All Deciles
14.11 Reference

Chapter 15 - Genetic Modeling in Database Marketing: The GenIQ Model
15.1 What Is Optimization?
15.2 What Is Genetic Modeling?
15.3 Genetic Modeling: An Illustration 
     15.3.1 Reproduction 
     15.3.2 Crossover 
     15.3.3 Mutation
15.4 Parameters for Controlling A Genetic Model Run
15.5 Genetic Modeling: Strengths and Limitations
15.6 Goals of Modeling in Database Marketing
15.7 The GenIQ Response Model
15.8 The GenIQ Profit Model
15.9 Case Study - Response Model
15.10 Case Study - Profit Model
15.11 Summary
15.12 Reference

Chapter 16 - Finding The Best Variables For Database Marketing Models
16.1 Background
16.2 Weakness in the Variable Selection Methods
16.3 Goals of Modeling In Database Marketing
16.4 Variable Selection With GenIQ 
     16.4.1 GenIQ Modeling 
     16.4.2 GenIQ-Structure Identification 
     16.4.3 GenIQ Variable Selection
16.5 Nonlinear Alternative To Logistic Regression Model
16.6 Summary
16.7 Reference

Chapter 17 - Interpretation of Coefficient-free Models
17.1 The Linear Regression Coefficient 
     17.1.2 Illustration for the Simple Ordinary Regression Model
17.2 The Quasi-Regression Coefficient for Simple Regression Models 
     17.2.1 Illustration of Quasi-RC for the Simple Ordinary Regression Model 
     17.2.2 Illustration of Quasi-RC for the Simple Logistic Regression Model 
     17.2.3 Illustration of Quasi-RC for Nonlinear Predictions
17.3 Partial Quasi-RC for The Everymodel 
     17.3.1 Calculating the Partial Quasi-RC for The Everymodel 
     17.3.2 Illustration for the Multiple Logistic Regression Model
17.4 Quasi-RC for A Coefficient-free Model 
     17.4.1 Illustration of Quasi-RC for a Coefficient-free Model
17.5 Summary





For more information about this article, call Bruce at 516.791.3544, or
1 800 DM STAT-1, or e-mail at br@dmstat1.com.