FLO Customer Segmentation with RFM

4 min readNov 19, 2023

FLO is a large shoe company with both online and offline stores in Turkey. The company is looking to introduce a new shoe brand into its portfolio. Using the dataset, we will systematically reach out to female customers and loyal customers, creating special campaigns for them.

Data Understanding ve Preparation


import datetime as dt
pd.set_option("display.max_columns", 10)
pd.set_option( "display.float_format", lambda x :  '%.3f' % x )
import pandas as pd
df_ = pd.read_csv('flo_data_20k.csv')
df= df_.copy()

# a. First 10 observations,
df.head(10)
# b. Variable Names,
df.columns
# c. Descriptive Statistics,
 df.describe().T
 # d. Missing Value,
 df.isnull().sum()
# e. Variable Types.
  df.info()

master_id: Unique customer number 
order_channel: Platform where the purchase was made 
last_order_channel: The channel used for the latest purchase 
first_order_date: The date of the customer's first purchase 
last_order_date: The date of the customer's last purchase 
last_order_date_online: The date of the customer's last online purchase 
last_order_date_offline: The date of the customer's last offline purchase 
order_num_total_ever_online: The total number of purchases made by the customer online 
order_num_total_ever_offline: The total number of purchases made by the customer offline 
customer_value_total_ever_offline: The total amount paid by the customer in offline purchases 
customer_value_total_ever_online: The total amount paid by the customer in online purchases 
interested_in_categories_12: List of categories the customer has shopped in the last 12 months

“Omnichannel” refers to customers who shop from both online and offline platforms. I am creating new variables to track the shopping count and spending for each customer.



df['order_num_total'] = df['order_num_total_ever_online'] + df['order_num_total_ever_offline']
df['customer_value_total'] = df['customer_value_total_ever_offline'] + df['customer_value_total_ever_online']

I examined variable names and changed the data type of variables representing dates to ‘date’



date_columns= df.columns[df.columns.str.contains('date')]
df[date_columns]= df[date_columns].apply(pd.to_datetime)
df.info()

According to order channel, I looked distribution of customer count, total quantity of products purchased, total spending.


df.groupby('order_channel').agg({'master_id': 'count'
                                ,'order_num_total':'sum','customer_value_total':"sum"})

 #   Column                             Non-Null Count  Dtype         
---  ------                             --------------  -----         
 0   master_id                          19945 non-null  object        
 1   order_channel                      19945 non-null  object        
 2   last_order_channel                 19945 non-null  object        
 3   first_order_date                   19945 non-null  datetime64[ns]
 4   last_order_date                    19945 non-null  datetime64[ns]
 5   last_order_date_online             19945 non-null  datetime64[ns]
 6   last_order_date_offline            19945 non-null  datetime64[ns]
 7   order_num_total_ever_online        19945 non-null  float64       
 8   order_num_total_ever_offline       19945 non-null  float64       
 9   customer_value_total_ever_offline  19945 non-null  float64       
 10  customer_value_total_ever_online   19945 non-null  float64       
 11  interested_in_categories_12        19945 non-null  object        
 12  order_num_total                    19945 non-null  float64       
 13  customer_value_total               19945 non-null  float64       
dtypes: datetime64[ns](4), float64(6), object(4)

I sorted the first 10 customers who had the highest total spending and placed the highest number of orders.

df.sort_values('customer_value_total', ascending= False)[:10]


df.sort_values('order_num_total', ascending= False)[:10]

      customer_value_total  
11150             45905.100  
4315              36818.290  
7613              33918.100  
13880             31227.410  
9055              20706.340  
7330              18443.570  
8068              16918.570  
163               12726.100  
7223              12282.240  
18767             12103.150

I modularized the data preprocessing process so that I can reuse it when needed.


def data_prep(dataframe):
    dataframe["order_num_total"] = dataframe["order_num_total_ever_online"] + dataframe["order_num_total_ever_offline"]
    dataframe["customer_value_total"] = dataframe["customer_value_total_ever_offline"] + dataframe["customer_value_total_ever_online"]
    date_columns = dataframe.columns[dataframe.columns.str.contains("date")]
    dataframe[date_columns] = dataframe[date_columns].apply(pd.to_datetime)
    return df

The Calculation of RFM Metrics

I designated the date of analysis as two days after the date of the most recent purchase in the dataset


df['last_order_date'].max()
analysis_date = dt.datetime(2021,6,1)

After creating the RFM dataframe, I assigned the recency, frequency, and monetary values.

rfm= pd.DataFrame()
rfm['customer_id']= df['master_id']
rfm['recency']= (analysis_date - df['last_order_date']).astype('timedelta64[D]')
rfm['frequency']= df['order_num_total']
rfm['monetary']= df['customer_value_total']

rfm.head()

                            customer_id  recency  frequency  monetary
0  cc294636-19f0-11eb-8d74-000d3a38a36f   95.000      5.000   939.370
1  f431bd5a-ab7b-11e9-a2fc-000d3a38a36f  105.000     21.000  2013.550
2  69b69676-1a40-11ea-941b-000d3a38a36f  186.000      5.000   585.320
3  1854e56c-491f-11eb-806e-000d3a38a36f  135.000      2.000   121.970
4  d6ea1074-f1f5-11e9-9346-000d3a38a36f   86.000      2.000   209.980

I have converted the recency, frequency, and monetary metrics into scores ranging from 1 to 5 using the qcut method, and I have saved these scores as recency_score, frequency_score, and monetary_score.

rfm['recency_score'] = pd.qcut(rfm['recency'],5,labels=[5, 4, 3, 2, 1])
rfm['frequency_score'] =pd.qcut(rfm['frequency'].rank(method='first'), 5, labels =[1, 2, 3, 4, 5])
rfm['monetary_score']=pd.qcut(rfm['monetary'],5,labels= [1, 2, 3, 4, 5])
rfm.head()

  recency_score frequency_score monetary_score  
0             3               4              4  
1             3               5              5  
2             2               4              3  
3             3               1              1  
4             3               1              1

I have assigned recency_score and frequency_score as a single variable and saved it as RF_SCORE.


rfm["RF_SCORE"] = (rfm['recency_score'].astype(str) + rfm['frequency_score'].astype(str))

0        34
1        35
2        24
3        31
4        31
         ..
19940    13
19941    22
19942    53
19943    34
19944    12

I described RF_SCORE as a segment.



seg_map = {
    r'[1-2][1-2]': 'hibernating',
    r'[1-2][3-4]': 'at_Risk',
    r'[1-2]5': 'cant_loose',
    r'3[1-2]': 'about_to_sleep',
    r'33': 'need_attention',
    r'[3-4][4-5]': 'loyal_customers',
    r'41': 'promising',
    r'51': 'new_customers',
    r'[4-5][2-3]': 'potential_loyalists',
    r'5[4-5]': 'champions'
}

rfm['segment'] = rfm['RF_SCORE'].replace(seg_map, regex=True)

0            loyal_customers
1            loyal_customers
2                    at_Risk
3             about_to_sleep
4             about_to_sleep
                ...         
19940                at_Risk
19941            hibernating
19942    potential_loyalists
19943        loyal_customers
19944            hibernating

I examined the segments.



rfm.groupby('segment').agg({"recency": ['mean', 'count'],"frequency": ['mean', 'count'],"monetary":['mean', 'count']})

                   recency       frequency       monetary      
                       mean count      mean count     mean count
segment                                                         
about_to_sleep      113.785  1629     2.401  1629  359.009  1629
at_Risk             241.607  3131     4.472  3131  646.610  3131
cant_loose          235.444  1200    10.698  1200 1474.468  1200
champions            17.107  1932     8.934  1932 1406.625  1932
hibernating         247.950  3604     2.394  3604  366.267  3604
loyal_customers      82.595  3361     8.375  3361 1216.819  3361
need_attention      113.829   823     3.728   823  562.143   823
new_customers        17.918   680     2.000   680  339.956   680
potential_loyalists  37.156  2938     3.304  2938  533.184  2938
promising            58.921   647     2.000   647  335.673   647

Now, we are ready to task.

Task

To advertise the new shoes, we need to reach out to loyal and female customers.

target_segments_customer_ids = rfm[rfm["segment"].isin(["champions","loyal_customers"])]["customer_id"]
cust_ids = df[(df['master_id'].isin(target_segments_customer_ids)) & (df["interested_in_categories_12"].str.contains('KADIN'))]['master_id']
cust_ids.to_csv('yeni_marka_hedef_müşteri_id.csv', index=False)

0        cc294636-19f0-11eb-8d74-000d3a38a36f
1        f431bd5a-ab7b-11e9-a2fc-000d3a38a36f
8        cfbda69e-5b4f-11ea-aca7-000d3a38a36f
11       c2e15af2-9eed-11e9-9897-000d3a38a36f
33       fb840306-1219-11ea-a001-000d3a38a36f
                         ...                 
19912    f63aba0e-41d8-11ea-96d9-000d3a38a36f
19913    81d0da04-a53e-11e9-a2fc-000d3a38a36f
19914    825e6b00-ab40-11e9-a2fc-000d3a38a36f
19917    f8c471c8-2596-11eb-81e9-000d3a38a36f
19926    7e1c15be-6298-11ea-9861-000d3a38a36f

Our customers are ready for our new shoe brand, and we can create special campaigns for these customers.

Big thanks to Vahit Keskin and Miuul

Contact me on Linkedin :) yaseminderyadilli

FLO Customer Segmentation with RFM

Data Understanding ve Preparation

The Calculation of RFM Metrics

Task

Sign up to discover human stories that deepen your understanding of the world.

Free

Membership

Written by Yasemin Derya Dilli

No responses yet