4.3 Feature Selection
After preprocessing and feature engineering, the data contained 653 features. Boruta identified 58 of them as important, reducing the number of features by 91%. For details, refer to the notebook 5_Feature_Extraction.ipynb (footnote 18).
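Boruta compares each real feature against "shadow" copies whose values have been randomly shuffled, so the shadows carry no information about the target. A real feature scores a "hit" in an iteration when its importance exceeds that of the best shadow, and features that score hits significantly more often than chance are kept. The sketch below is a simplified, standard-library-only illustration of this idea, not the implementation used in the notebook. It swaps in absolute Pearson correlation for Boruta's random-forest importance, runs a fixed number of iterations, and accepts features with a one-sided binomial test instead of the full algorithm's confirmed/tentative/rejected bookkeeping. All function names are hypothetical.

```python
import math
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation; a stand-in for random-forest importance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / math.sqrt(vx * vy)


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(columns, target, n_iter=30, alpha=0.01, seed=0):
    """Return the names of features that beat the best shuffled shadow
    significantly more often than a fair coin would.

    columns: dict mapping feature name -> list of numeric values
    target:  list of numeric target values of the same length
    """
    rng = random.Random(seed)
    # Importance of the real features does not change between iterations here.
    real_scores = {name: abs_corr(values, target) for name, values in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: permuted copies of every real column.
        shadow_scores = []
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        best_shadow = max(shadow_scores)
        # A hit is recorded when a real feature beats the best shadow.
        for name, score in real_scores.items():
            if score > best_shadow:
                hits[name] += 1
    # Keep features whose hit count is unlikely under p = 0.5.
    return sorted(
        name for name, h in hits.items() if binom_upper_tail(h, n_iter) < alpha
    )
```

With a target driven by one column, that column wins every iteration and is kept, while a constant column never scores a hit. In practice, the full algorithm with tree-based importances is usually run through a dedicated implementation such as the BorutaPy package rather than hand-written code.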
The selected features fall into three groups. Cross-checking with the dataset dictionary (Appendix B.2) revealed their meaning:
Features from the giving history broadly correspond to those used in the classical RFM (recency, frequency, monetary value) models mentioned in the literature (Section 1). It is reassuring to find them among the all-relevant features:
- Donation amount for promotion 14
- Summary features: all-time donation amount, all-time number of donations, smallest, average and largest donation, amount of the most recent donation
- 24 features on the frequency and amount of donations, broken down by the dates of past donations
- Time since first donation, time since largest donation, time since last donation
- Number of donations in response to card promotions
- Number of months between first and second donation
- An indicator for star donors
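To make the RFM-style summary features concrete, the sketch below derives several of them from a raw giving history. It assumes each gift is stored with its date, amount and whether it answered a card promotion, and it measures time in whole calendar months up to a reference date such as the current promotion. The `Donation` record and the function names are hypothetical, and the code only illustrates what the features measure, not how the dataset computes them. The per-date features and the star-donor indicator are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Donation:
    day: date
    amount: float
    card_promotion: bool  # True if the gift answered a card promotion (assumed field)


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later`."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def giving_history_features(donations, reference):
    """Summarise one donor's giving history as of `reference`."""
    history = sorted(donations, key=lambda d: d.day)
    amounts = [d.amount for d in history]
    # On ties, max() returns the first (i.e. earliest) largest gift.
    largest = max(history, key=lambda d: d.amount)
    return {
        # Monetary value
        "total_amount": sum(amounts),
        "min_amount": min(amounts),
        "avg_amount": sum(amounts) / len(amounts),
        "max_amount": max(amounts),
        "last_amount": amounts[-1],
        # Frequency
        "n_donations": len(amounts),
        "n_card_responses": sum(d.card_promotion for d in history),
        # Recency
        "months_since_first": months_between(history[0].day, reference),
        "months_since_largest": months_between(largest.day, reference),
        "months_since_last": months_between(history[-1].day, reference),
        "months_first_to_second": (
            months_between(history[0].day, history[1].day) if len(history) > 1 else None
        ),
    }
```

For a donor who gave 10, 25 and 15 in March 1995, January 1996 and September 1996, a reference date of June 1997 yields 27 months since the first gift, 17 since the largest and 9 since the last.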
The promotion history features can be interpreted as a measure of how important a donor is to the organization, since donors who receive many promotions are presumably deemed valuable:
- Number of promotions received
- Number of promotions in the last 12 months before the current promotion
- Number of card promotions received
- Number of card promotions in last 12 months before the current promotion
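The four promotion counts can be sketched from a list of promotions a donor received. This minimal example assumes each promotion is stored with its date and a card flag, and it treats "the last 12 months" as the 365 days before the current promotion; the `Promotion` record, the window definition and the function name are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Promotion:
    day: date
    is_card: bool  # True for card promotions (assumed field)


def promotion_history_features(promotions, current, window=timedelta(days=365)):
    """Count the promotions a donor received before the current promotion."""
    past = [p for p in promotions if p.day < current]
    # Assumed definition of "last 12 months": within `window` of the current date.
    recent = [p for p in past if current - p.day <= window]
    return {
        "n_promotions": len(past),
        "n_promotions_12m": len(recent),
        "n_card_promotions": sum(p.is_card for p in past),
        "n_card_promotions_12m": sum(p.is_card for p in recent),
    }
```

Promotions dated on or after the current promotion are ignored, so the counts only use information available when the current promotion is sent.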
Features from the US census data describe the social status and wealth of the donors' neighborhoods, so it is intuitively plausible that they are relevant:
- Median and average home value
- Percentage of home values above some threshold (5 features)
- Percentage of renters paying more than $500
- DMA (designated market area, a geographical grouping)
- Median and average family and household income
- Per capita income
- Percentage of households with interest, rental or dividend income
- Percentage of adults with a Bachelor’s degree
- Percentage of people born in state of residence