Marketing Analytics Projects

Customer Segmentation and Behavioral Analysis

Analyzing Customer Cohorts

A sample analysis of customer cohorts to improve marketing targeting. Identifying the most valuable customer segments sharpens audience targeting and supports the creation of lookalike audiences.

Data Wrangling

First, remove negative values, null values, and incomplete months. The dimensions printed below track the data through these steps.

## [1] 541909      8

## [1] 531285      8

## [1] 397924      8

## [1] "Date Range: 2010-12-01 08:26:00 ~ 2011-12-09 12:50:00"

## [1] 397924      8

## [1] 380620      8
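The filtering code itself is not shown in this report, so the following is only a minimal sketch of the three cleaning steps on a toy data frame. The column names `Quantity`, `CustomerID`, and `InvoiceDate` are assumptions, and December 2011 is treated as the incomplete month because the printed date range ends on 2011-12-09.

```r
library(dplyr)

# Toy stand-in for the raw transactions; column names and values are assumptions
df <- data.frame(
  CustomerID  = c(1, 1, NA, 2, 2, 3),
  Quantity    = c(5, -2, 3, 4, 1, 6),
  InvoiceDate = as.POSIXct(c("2011-10-03 09:00:00", "2011-10-04 10:00:00",
                             "2011-11-01 11:00:00", "2011-11-15 12:00:00",
                             "2011-12-05 13:00:00", "2011-11-20 14:00:00"),
                           tz = "UTC")
)

# Assumed cutoff: first day of the partial final month
cutoff <- as.POSIXct("2011-12-01", tz = "UTC")

cleaned <- df %>%
  filter(Quantity > 0) %>%        # keep only positive quantities
  filter(!is.na(CustomerID)) %>%  # drop rows with no customer ID
  filter(InvoiceDate < cutoff)    # drop the incomplete month

dim(cleaned)
```

Printing `dim()` after each `filter()` call would produce a sequence of row counts like the one shown above.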

Data Normalization

##    CustomerID       TotalSales.V1        OrderCount.V1       AvgOrderValue.V1  
##  Min.   :12346   Min.   :-1.7314464   Min.   :-1.7314464   Min.   :-1.7314464  
##  1st Qu.:13815   1st Qu.:-0.8660254   1st Qu.:-0.8657232   1st Qu.:-0.8657232  
##  Median :15300   Median : 0.0000000   Median : 0.0000000   Median : 0.0000000  
##  Mean   :15302   Mean   : 0.0000000   Mean   : 0.0000000   Mean   : 0.0000000  
##  3rd Qu.:16781   3rd Qu.: 0.8657232   3rd Qu.: 0.8657232   3rd Qu.: 0.8657232  
##  Max.   :18287   Max.   : 1.7314464   Max.   : 1.7314464   Max.   : 1.7314464

##    CustomerID    TotalSales    OrderCount AvgOrderValue 
##      1720.983         1.000         1.000         1.000
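The normalization code is not shown, but the summary above (mean 0, standard deviation 1, and extremes near ±1.73, roughly √3) is what one would expect if each metric were replaced by its rank and then standardized. A minimal sketch under that assumption, using a made-up `customerDF`:

```r
# Toy per-customer metrics; the values are invented for illustration
customerDF <- data.frame(
  CustomerID    = c(101, 102, 103, 104, 105),
  TotalSales    = c(250.0, 1200.5, 80.0, 4300.0, 610.0),
  OrderCount    = c(2, 7, 1, 12, 4),
  AvgOrderValue = c(125.0, 171.5, 80.0, 358.3, 152.5)
)

metrics <- c("TotalSales", "OrderCount", "AvgOrderValue")

# Rank each metric (which dampens outliers), then center to mean 0
# and scale to standard deviation 1
normalizedDF <- customerDF
normalizedDF[metrics] <- lapply(
  customerDF[metrics],
  function(x) as.numeric(scale(rank(x)))
)

sapply(normalizedDF[metrics], mean)  # approximately 0 for every metric
sapply(normalizedDF[metrics], sd)    # 1 for every metric
```

Ranking before scaling puts all three metrics on the same bounded scale, so no single high-spending customer dominates the distance calculations used by k-means.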

Customer Cluster Visualizations

##   TotalSales OrderCount AvgOrderValue
## 1  1.2058544  1.0068055     0.8683349
## 2 -0.1314289 -0.8520880     0.7984689
## 3  0.2234497  0.7188893    -0.6402441
## 4 -1.2412204 -0.7892257    -1.0603240

## # A tibble: 4 × 2
##   Cluster Count
##     <int> <int>
## 1       1  1134
## 2       2  1062
## 3       3   943
## 4       4  1159

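# Cluster is stored as an integer; factor() gives each segment a discrete color
ggplot(normalizedDF, aes(x=TotalSales, y=OrderCount, color=factor(Cluster))) +
  geom_point() + labs(color="Cluster")

ggplot(normalizedDF, aes(x=TotalSales, y=AvgOrderValue, color=factor(Cluster))) +
  geom_point() + labs(color="Cluster")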

Cluster Silhouette

## [1] "Silhouette Score for 4 Clusters: 0.4120"
## [1] "Silhouette Score for 5 Clusters: 0.3761"
## [1] "Silhouette Score for 6 Clusters: 0.3773"
## [1] "Silhouette Score for 7 Clusters: 0.3915"
## [1] "Silhouette Score for 8 Clusters: 0.3808"
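A higher average silhouette width indicates tighter, better-separated clusters, which is why 4 clusters is preferred here. The code behind these scores is not shown; one plausible way to compute them, assuming the `cluster` package's `silhouette()` function and a toy matrix in place of the normalized metrics, is sketched below.

```r
library(cluster)  # provides silhouette()

set.seed(1)  # fixed seed so the toy example is repeatable

# Toy data: three loose groups in two dimensions (stand-in for the normalized metrics)
toy <- rbind(
  matrix(rnorm(60, mean = -2, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean =  0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean =  2, sd = 0.5), ncol = 2)
)
d <- dist(toy)  # pairwise Euclidean distances, computed once and reused

ks <- 2:5
scores <- sapply(ks, function(k) {
  fit <- kmeans(toy, centers = k, nstart = 10)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])  # average silhouette width for this k
})
names(scores) <- ks

for (k in names(scores)) {
  print(sprintf("Silhouette Score for %s Clusters: %0.4f", k, scores[[k]]))
}
```

On the real data, `toy` would be replaced by the three normalized metric columns and `ks` by the range 4 to 8.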

# Interpreting customer segments
# k-means starts from random centers, so cluster labels and sizes can differ between runs
cluster <- kmeans(normalizedDF[c("TotalSales", "OrderCount", "AvgOrderValue")], centers = 4)
normalizedDF$Cluster <- cluster$cluster
# count per cluster
normalizedDF %>% group_by(Cluster) %>% summarise(Count=n())

## # A tibble: 4 × 2
##   Cluster Count
##     <int> <int>
## 1       1   950
## 2       2  1136
## 3       3  1150
## 4       4  1062

# cluster centers
cluster$centers

##   TotalSales OrderCount AvgOrderValue
## 1  0.2132451  0.7112607    -0.6432176
## 2  1.2059015  1.0076634     0.8661868
## 3 -1.2460083 -0.7960747    -1.0616569
## 4 -0.1314289 -0.8520880     0.7984689