Customer Segmentation and Behavioral Analysis
Analyzing Customer Cohorts
A sample analysis of customer cohorts for improved marketing targeting. Identifying the most valuable customer segments makes it possible to improve audience targeting and to create lookalike audiences.
Data Wrangling
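First, remove rows with negative values, rows with null values, and incomplete months. The printed dimensions show the row count after each step (from 541,909 rows down to 380,620, with 8 columns throughout). Because the data ends on 2011-12-09, its final month is incomplete.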
## [1] 541909 8
## [1] 531285 8
## [1] 397924 8
## [1] "Date Range: 2010-12-01 08:26:00 ~ 2011-12-09 12:50:00"
## [1] 397924 8
## [1] 380620 8
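The three cleaning steps can be sketched in base R on a tiny made-up transaction table. This is a minimal sketch, not the original code (which is not shown): the column names `Quantity`, `CustomerID` and `InvoiceDate` are assumptions, and the month rule only drops the trailing partial month, which matches this data set since it starts on 2010-12-01 and ends on 2011-12-09.

```r
# Hypothetical transaction table; column names are assumptions
df <- data.frame(
  InvoiceNo   = c("A1", "A2", "A3", "A4", "A5"),
  Quantity    = c(5, -2, 3, 1, 4),
  UnitPrice   = c(2.5, 2.5, 1.0, 10.0, 3.0),
  CustomerID  = c(12346, 12347, NA, 12348, 12349),
  InvoiceDate = as.POSIXct(c("2011-10-03 10:00:00", "2011-10-04 11:00:00",
                             "2011-11-15 09:30:00", "2011-11-20 14:00:00",
                             "2011-12-05 12:00:00"), tz = "UTC")
)

# 1. Drop rows with negative quantities (e.g. returns)
df <- df[df$Quantity >= 0, ]

# 2. Drop rows with no customer ID; they cannot be attributed to a customer
df <- df[!is.na(df$CustomerID), ]

# 3. Drop the trailing partial month: keep only invoices dated before the
#    first day of the month that contains the latest invoice
last_month_start <- as.POSIXct(format(max(df$InvoiceDate), "%Y-%m-01"), tz = "UTC")
df <- df[df$InvoiceDate < last_month_start, ]

dim(df)
```

Printing `dim()` after each step, as the original output does, is a cheap way to confirm that every filter removed roughly the expected number of rows.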
Data Normalization
## CustomerID TotalSales.V1 OrderCount.V1 AvgOrderValue.V1
## Min. :12346 Min. :-1.7314464 Min. :-1.7314464 Min. :-1.7314464
## 1st Qu.:13815 1st Qu.:-0.8660254 1st Qu.:-0.8657232 1st Qu.:-0.8657232
## Median :15300 Median : 0.0000000 Median : 0.0000000 Median : 0.0000000
## Mean :15302 Mean : 0.0000000 Mean : 0.0000000 Mean : 0.0000000
## 3rd Qu.:16781 3rd Qu.: 0.8657232 3rd Qu.: 0.8657232 3rd Qu.: 0.8657232
## Max. :18287 Max. : 1.7314464 Max. : 1.7314464 Max. : 1.7314464
## CustomerID TotalSales OrderCount AvgOrderValue
## 1720.983 1.000 1.000 1.000
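All three metrics in the summary share exactly the same quantiles, from about -1.73 to 1.73. Standardized ranks behave this way: ranks are uniformly spread, and a standardized uniform variable spans roughly ±√3 ≈ ±1.732. So the normalization appears to rank each metric (with ties broken by position) and then scale the ranks to mean 0 and standard deviation 1, leaving `CustomerID` unscaled, which explains its standard deviation of 1720.983. The `.V1` suffix is how a one-column matrix returned by `scale()` shows up when stored in a data frame. The sketch below reproduces the idea on a toy line-item table; the table, its columns, and the per-customer metric definitions (sum of sales, number of distinct invoices, and their ratio) are assumptions.

```r
# Hypothetical cleaned line items (columns are assumptions)
txn <- data.frame(
  CustomerID = c(1, 1, 1, 2, 3, 3, 4, 5),
  InvoiceNo  = c("a", "a", "b", "c", "d", "e", "f", "g"),
  Sales      = c(10, 5, 20, 40, 8, 8, 100, 3)
)

# One row per customer: total sales, distinct orders, average order value
total  <- tapply(txn$Sales, txn$CustomerID, sum)
orders <- tapply(txn$InvoiceNo, txn$CustomerID, function(x) length(unique(x)))
customerDF <- data.frame(
  CustomerID    = as.numeric(names(total)),
  TotalSales    = as.numeric(total),
  OrderCount    = as.numeric(orders),
  AvgOrderValue = as.numeric(total) / as.numeric(orders)
)

# Rank each metric (ties broken by position), then standardize the ranks.
# as.numeric() drops the matrix returned by scale(), so no ".V1" suffix appears.
rank_scale <- function(x) as.numeric(scale(rank(x, ties.method = "first")))

normalizedDF <- customerDF
for (col in c("TotalSales", "OrderCount", "AvgOrderValue")) {
  normalizedDF[[col]] <- rank_scale(customerDF[[col]])
}

# CustomerID keeps its original spread; the three metrics now have sd 1
sapply(normalizedDF, sd)
```

Rank-based scaling limits the pull of a few very large customers on k-means, at the cost of discarding how far apart the raw values are.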
Customer Cluster Visualizations
## TotalSales OrderCount AvgOrderValue
## 1 1.2058544 1.0068055 0.8683349
## 2 -0.1314289 -0.8520880 0.7984689
## 3 0.2234497 0.7188893 -0.6402441
## 4 -1.2412204 -0.7892257 -1.0603240
## # A tibble: 4 × 2
## Cluster Count
## <int> <int>
## 1 1 1134
## 2 2 1062
## 3 3 943
## 4 4 1159
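library(ggplot2)
# Cluster is stored as an integer; factor() gives each cluster its own discrete color
ggplot(normalizedDF, aes(x=TotalSales, y=OrderCount, color=factor(Cluster))) +
  geom_point()
ggplot(normalizedDF, aes(x=TotalSales, y=AvgOrderValue, color=factor(Cluster))) +
  geom_point()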
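Cluster numbers from `kmeans()` are arbitrary because the algorithm starts from random centers. The rerun under "Interpreting customer segments" finds essentially the same four groups, but under different labels and with slightly different counts. A minimal sketch on synthetic data (not the customer data) shows how `set.seed()` plus several random starts (`nstart`) makes runs repeatable, and how to compare two partitions without relying on their label numbers.

```r
# Synthetic stand-in for the three normalized metrics:
# four well-separated groups of 50 points each in 3 dimensions
set.seed(42)
true_centers <- rbind(c(1, 1, 1), c(-1, -1, 1), c(1, -1, -1), c(-1, 1, -1))
X <- do.call(rbind, lapply(1:4, function(i)
  matrix(rnorm(150, mean = rep(true_centers[i, ], each = 50), sd = 0.2), ncol = 3)))
colnames(X) <- c("TotalSales", "OrderCount", "AvgOrderValue")

# Fixing the seed and using several random starts makes the result repeatable
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(X, centers = 4, nstart = 25)
}
fit_a <- run_kmeans(123)
fit_b <- run_kmeans(123)
identical(fit_a$cluster, fit_b$cluster)

# A different seed may number the same groups differently; a contingency table
# with one nonzero cell per row means the partitions agree up to relabeling
fit_c <- run_kmeans(7)
table(fit_a$cluster, fit_c$cluster)
```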
Cluster Silhouette
## [1] "Silhouette Score for 4 Clusters: 0.4120"
## [1] "Silhouette Score for 5 Clusters: 0.3761"
## [1] "Silhouette Score for 6 Clusters: 0.3773"
## [1] "Silhouette Score for 7 Clusters: 0.3915"
## [1] "Silhouette Score for 8 Clusters: 0.3808"
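For each customer i, let a(i) be the mean distance to the other members of its own cluster and b(i) the mean distance to the members of the nearest other cluster. The silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the score reported above is its average over all customers: values near 1 mean tight, well-separated clusters. Among k = 4 to 8, four clusters score highest (0.4120), which supports the choice of k = 4. The sketch below computes the same kind of table with `silhouette()` from the `cluster` package on synthetic data; whether the original analysis used this function is an assumption.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Synthetic stand-in for the normalized metrics: four separated groups in 3-D
set.seed(1)
true_centers <- rbind(c(2, 2, 2), c(-2, -2, 2), c(2, -2, -2), c(-2, 2, -2))
X <- do.call(rbind, lapply(1:4, function(i)
  matrix(rnorm(120, mean = rep(true_centers[i, ], each = 40), sd = 0.5), ncol = 3)))

d <- dist(X)  # pairwise Euclidean distances, computed once and reused for every k

# Average silhouette width for each candidate number of clusters
sil_scores <- sapply(4:8, function(k) {
  fit <- kmeans(X, centers = k, nstart = 25)
  sil <- silhouette(fit$cluster, d)
  mean(sil[, "sil_width"])
})
names(sil_scores) <- 4:8

for (i in seq_along(sil_scores)) {
  print(sprintf("Silhouette Score for %s Clusters: %.4f",
                names(sil_scores)[i], sil_scores[i]))
}

best_k <- as.integer(names(which.max(sil_scores)))
```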
# Interpreting customer segments
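# kmeans() starts from random centers, so cluster labels (and, slightly, counts)
# can change between runs; call set.seed() beforehand for reproducible results
cluster <- kmeans(normalizedDF[c("TotalSales", "OrderCount", "AvgOrderValue")], 4)
normalizedDF$Cluster <- cluster$cluster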
# count per cluster
normalizedDF %>% group_by(Cluster) %>% summarise(Count=n())
## # A tibble: 4 × 2
## Cluster Count
## <int> <int>
## 1 1 950
## 2 2 1136
## 3 3 1150
## 4 4 1062
# cluster centers
cluster$centers
## TotalSales OrderCount AvgOrderValue
## 1 0.2132451 0.7112607 -0.6432176
## 2 1.2059015 1.0076634 0.8661868
## 3 -1.2460083 -0.7960747 -1.0616569
## 4 -0.1314289 -0.8520880 0.7984689
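The centers are in standardized units, so 0 corresponds to the average customer. Read that way, cluster 2 is above average on all three metrics (the most valuable customers), cluster 3 is below average on all three, cluster 1 orders often but with below-average order value, and cluster 4 orders rarely but spends more per order. These labels differ from the first run printed under "Customer Cluster Visualizations", although the groups themselves match closely. The sketch below turns that reading into a table using the centers and counts printed above; the labeling rule and segment names are hypothetical.

```r
# Cluster centers and counts copied from the output above (standardized units)
centers <- matrix(
  c( 0.2132451,  0.7112607, -0.6432176,
     1.2059015,  1.0076634,  0.8661868,
    -1.2460083, -0.7960747, -1.0616569,
    -0.1314289, -0.8520880,  0.7984689),
  ncol = 3, byrow = TRUE,
  dimnames = list(1:4, c("TotalSales", "OrderCount", "AvgOrderValue"))
)
counts <- c(950, 1136, 1150, 1062)

# Hypothetical rule: describe each cluster by which metrics sit above the mean (0)
label_cluster <- function(ctr) {
  above <- ctr > 0
  if (all(above)) return("High value: frequent, high-spend")
  if (!any(above)) return("Low value: infrequent, low-spend")
  if (above["OrderCount"] && !above["AvgOrderValue"]) return("Frequent, low-value orders")
  if (!above["OrderCount"] && above["AvgOrderValue"]) return("Infrequent, high-value orders")
  "Mixed"
}

segments <- data.frame(
  Cluster = 1:4,
  Count   = counts,
  Segment = apply(centers, 1, label_cluster),
  row.names = NULL
)

# Order segments by their TotalSales center to prioritize targeting
segments[order(-centers[, "TotalSales"]), ]
```

The high-value segment (cluster 2 in this run) is the natural seed for a lookalike audience, while the two mixed segments suggest different levers: order value for frequent buyers, purchase frequency for infrequent big spenders.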