FLO Customer Segmentation with RFM

Yasemin Derya Dilli
4 min readNov 19, 2023

--

FLO is a large shoe company with both online and offline stores in Turkey. The company is looking to introduce a new shoe brand into its portfolio. Using the dataset, we will systematically reach out to female customers and loyal customers, creating special campaigns for them.

Data Understanding ve Preparation


import datetime as dt
pd.set_option("display.max_columns", 10)
pd.set_option( "display.float_format", lambda x : '%.3f' % x )
import pandas as pd
df_ = pd.read_csv('flo_data_20k.csv')
df= df_.copy()

# a. First 10 observations,
df.head(10)
# b. Variable Names,
df.columns
# c. Descriptive Statistics,
df.describe().T
# d. Missing Value,
df.isnull().sum()
# e. Variable Types.
df.info()
master_id: Unique customer number 
order_channel: Platform where the purchase was made
last_order_channel: The channel used for the latest purchase
first_order_date: The date of the customer's first purchase
last_order_date: The date of the customer's last purchase
last_order_date_online: The date of the customer's last online purchase
last_order_date_offline: The date of the customer's last offline purchase
order_num_total_ever_online: The total number of purchases made by the customer online
order_num_total_ever_offline: The total number of purchases made by the customer offline
customer_value_total_ever_offline: The total amount paid by the customer in offline purchases
customer_value_total_ever_online: The total amount paid by the customer in online purchases
interested_in_categories_12: List of categories the customer has shopped in the last 12 months

“Omnichannel” refers to customers who shop from both online and offline platforms. I am creating new variables to track the shopping count and spending for each customer.



df['order_num_total'] = df['order_num_total_ever_online'] + df['order_num_total_ever_offline']
df['customer_value_total'] = df['customer_value_total_ever_offline'] + df['customer_value_total_ever_online']

I examined variable names and changed the data type of variables representing dates to ‘date’



date_columns= df.columns[df.columns.str.contains('date')]
df[date_columns]= df[date_columns].apply(pd.to_datetime)
df.info()

According to order channel, I looked distribution of customer count, total quantity of products purchased, total spending.


df.groupby('order_channel').agg({'master_id': 'count'
,'order_num_total':'sum','customer_value_total':"sum"})
 #   Column                             Non-Null Count  Dtype         
--- ------ -------------- -----
0 master_id 19945 non-null object
1 order_channel 19945 non-null object
2 last_order_channel 19945 non-null object
3 first_order_date 19945 non-null datetime64[ns]
4 last_order_date 19945 non-null datetime64[ns]
5 last_order_date_online 19945 non-null datetime64[ns]
6 last_order_date_offline 19945 non-null datetime64[ns]
7 order_num_total_ever_online 19945 non-null float64
8 order_num_total_ever_offline 19945 non-null float64
9 customer_value_total_ever_offline 19945 non-null float64
10 customer_value_total_ever_online 19945 non-null float64
11 interested_in_categories_12 19945 non-null object
12 order_num_total 19945 non-null float64
13 customer_value_total 19945 non-null float64
dtypes: datetime64[ns](4), float64(6), object(4)

I sorted the first 10 customers who had the highest total spending and placed the highest number of orders.

df.sort_values('customer_value_total', ascending= False)[:10]


df.sort_values('order_num_total', ascending= False)[:10]
      customer_value_total  
11150 45905.100
4315 36818.290
7613 33918.100
13880 31227.410
9055 20706.340
7330 18443.570
8068 16918.570
163 12726.100
7223 12282.240
18767 12103.150

I modularized the data preprocessing process so that I can reuse it when needed.


def data_prep(dataframe):
dataframe["order_num_total"] = dataframe["order_num_total_ever_online"] + dataframe["order_num_total_ever_offline"]
dataframe["customer_value_total"] = dataframe["customer_value_total_ever_offline"] + dataframe["customer_value_total_ever_online"]
date_columns = dataframe.columns[dataframe.columns.str.contains("date")]
dataframe[date_columns] = dataframe[date_columns].apply(pd.to_datetime)
return df

The Calculation of RFM Metrics

I designated the date of analysis as two days after the date of the most recent purchase in the dataset


df['last_order_date'].max()
analysis_date = dt.datetime(2021,6,1)

After creating the RFM dataframe, I assigned the recency, frequency, and monetary values.

rfm= pd.DataFrame()
rfm['customer_id']= df['master_id']
rfm['recency']= (analysis_date - df['last_order_date']).astype('timedelta64[D]')
rfm['frequency']= df['order_num_total']
rfm['monetary']= df['customer_value_total']

rfm.head()

                            customer_id  recency  frequency  monetary
0 cc294636-19f0-11eb-8d74-000d3a38a36f 95.000 5.000 939.370
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f 105.000 21.000 2013.550
2 69b69676-1a40-11ea-941b-000d3a38a36f 186.000 5.000 585.320
3 1854e56c-491f-11eb-806e-000d3a38a36f 135.000 2.000 121.970
4 d6ea1074-f1f5-11e9-9346-000d3a38a36f 86.000 2.000 209.980

I have converted the recency, frequency, and monetary metrics into scores ranging from 1 to 5 using the qcut method, and I have saved these scores as recency_score, frequency_score, and monetary_score.

rfm['recency_score'] = pd.qcut(rfm['recency'],5,labels=[5, 4, 3, 2, 1])
rfm['frequency_score'] =pd.qcut(rfm['frequency'].rank(method='first'), 5, labels =[1, 2, 3, 4, 5])
rfm['monetary_score']=pd.qcut(rfm['monetary'],5,labels= [1, 2, 3, 4, 5])
rfm.head()
  recency_score frequency_score monetary_score  
0 3 4 4
1 3 5 5
2 2 4 3
3 3 1 1
4 3 1 1

I have assigned recency_score and frequency_score as a single variable and saved it as RF_SCORE.


rfm["RF_SCORE"] = (rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str))
0        34
1 35
2 24
3 31
4 31
..
19940 13
19941 22
19942 53
19943 34
19944 12

I described RF_SCORE as a segment.



seg_map = {
r'[1-2][1-2]': 'hibernating',
r'[1-2][3-4]': 'at_Risk',
r'[1-2]5': 'cant_loose',
r'3[1-2]': 'about_to_sleep',
r'33': 'need_attention',
r'[3-4][4-5]': 'loyal_customers',
r'41': 'promising',
r'51': 'new_customers',
r'[4-5][2-3]': 'potential_loyalists',
r'5[4-5]': 'champions'
}

rfm['segment'] = rfm['RF_SCORE'].replace(seg_map, regex=True)
0            loyal_customers
1 loyal_customers
2 at_Risk
3 about_to_sleep
4 about_to_sleep
...
19940 at_Risk
19941 hibernating
19942 potential_loyalists
19943 loyal_customers
19944 hibernating

I examined the segments.



rfm.groupby('segment').agg({"recency": ['mean', 'count'],"frequency": ['mean', 'count'],"monetary":['mean', 'count']})
                   recency       frequency       monetary      
mean count mean count mean count
segment
about_to_sleep 113.785 1629 2.401 1629 359.009 1629
at_Risk 241.607 3131 4.472 3131 646.610 3131
cant_loose 235.444 1200 10.698 1200 1474.468 1200
champions 17.107 1932 8.934 1932 1406.625 1932
hibernating 247.950 3604 2.394 3604 366.267 3604
loyal_customers 82.595 3361 8.375 3361 1216.819 3361
need_attention 113.829 823 3.728 823 562.143 823
new_customers 17.918 680 2.000 680 339.956 680
potential_loyalists 37.156 2938 3.304 2938 533.184 2938
promising 58.921 647 2.000 647 335.673 647

Now, we are ready to task.

Task

To advertise the new shoes, we need to reach out to loyal and female customers.

target_segments_customer_ids = rfm[rfm["segment"].isin(["champions","loyal_customers"])]["customer_id"]
cust_ids = df[(df['master_id'].isin(target_segments_customer_ids)) & (df["interested_in_categories_12"].str.contains('KADIN'))]['master_id']
cust_ids.to_csv('yeni_marka_hedef_müşteri_id.csv', index=False)
0        cc294636-19f0-11eb-8d74-000d3a38a36f
1 f431bd5a-ab7b-11e9-a2fc-000d3a38a36f
8 cfbda69e-5b4f-11ea-aca7-000d3a38a36f
11 c2e15af2-9eed-11e9-9897-000d3a38a36f
33 fb840306-1219-11ea-a001-000d3a38a36f
...
19912 f63aba0e-41d8-11ea-96d9-000d3a38a36f
19913 81d0da04-a53e-11e9-a2fc-000d3a38a36f
19914 825e6b00-ab40-11e9-a2fc-000d3a38a36f
19917 f8c471c8-2596-11eb-81e9-000d3a38a36f
19926 7e1c15be-6298-11ea-9861-000d3a38a36f

Our customers are ready for our new shoe brand, and we can create special campaigns for these customers.

Big thanks to Vahit Keskin and Miuul

Contact me on Linkedin :) yaseminderyadilli

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

--

--

Yasemin Derya Dilli
Yasemin Derya Dilli

Written by Yasemin Derya Dilli

Data Analyst | Engineer | Content Writer

No responses yet

Write a response