Advanced Functional Exploratory Data Analysis

Yasemin Derya Dilli
6 min read · Nov 5, 2023


Hello! We are continuing with data analysis in Python. If you haven't read the earlier posts yet, I recommend going through Python for Data Science (Veri Bilimi İçin Python) and Data Analysis with Python (Python ile Veri Analizi), in that order. Now let's get back to our main topic.

Advanced Functional Exploratory Data Analysis

Advanced functional exploratory data analysis refers to more detailed, more sophisticated, insight-oriented methods within the data analysis process. It is used to understand a dataset in depth and to extract valuable information from it, and it relies on more advanced techniques and approaches than traditional exploratory data analysis.

We start by getting a general picture of the data.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()
df.tail()
df.shape
df.info()
df.columns
df.index
df.describe().T  # descriptive statistics of the numerical variables
df.isnull().values.any()  # are there any missing values at all?
df.isnull().sum()  # number of missing values in each column
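
The "functional" part of the title can be read as wrapping repeated steps in reusable functions. As a sketch of that idea, the overview calls above can be bundled into a single helper. `check_df` is a hypothetical name (not a pandas API), and the percentile list passed to `describe` is an arbitrary choice. The example runs on a small made-up DataFrame so it does not need to download the Titanic data.

```python
import numpy as np
import pandas as pd


def check_df(dataframe: pd.DataFrame, head: int = 5) -> None:
    """Print a quick overview of a DataFrame (hypothetical helper)."""
    print("##### Shape #####")
    print(dataframe.shape)
    print("##### Types #####")
    print(dataframe.dtypes)
    print("##### Head #####")
    print(dataframe.head(head))
    print("##### Tail #####")
    print(dataframe.tail(head))
    print("##### Missing values per column #####")
    print(dataframe.isnull().sum())
    print("##### Descriptive statistics (numeric columns only) #####")
    print(dataframe.describe(percentiles=[0, 0.05, 0.50, 0.95, 0.99, 1]).T)


# Small made-up example so the sketch runs offline.
toy = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})
check_df(toy)
```

On the article's data the call would be `check_df(sns.load_dataset("titanic"))`.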

Analysis of Categorical Variables

Categorical variable analysis is the process of examining and understanding the categorical variables in a dataset. Categorical variables represent distinct categories or groups rather than measured quantities; they can be nominal (gender, city names) or ordinal (education level, where the categories have a natural order). Analyzing them helps you understand the characteristics of your dataset and the relationships within it.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()

df["survived"].value_counts()

Name: survived, dtype: int64

df["sex"].unique()

array(['male', 'female'], dtype=object)

df["class"].nunique()

3
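
Raw counts from `value_counts()` are easier to interpret next to each category's share of the data. A minimal sketch of a hypothetical `cat_summary` helper (the function name and the `Ratio` column are my own choices) that puts both in one table. The ratio is computed over all rows, so missing values count in the denominator and the ratios may not add up to 100 when a column has NaNs.

```python
import pandas as pd


def cat_summary(dataframe: pd.DataFrame, col_name: str) -> pd.DataFrame:
    """Return the count and percentage share of each category in col_name."""
    counts = dataframe[col_name].value_counts()  # sorted by frequency, NaN excluded
    summary = counts.to_frame(name=col_name)
    summary["Ratio"] = 100 * summary[col_name] / len(dataframe)
    return summary


# Made-up example data.
toy = pd.DataFrame({"sex": ["male", "female", "male", "male"]})
summary = cat_summary(toy, "sex")
print(summary)
```

On the Titanic data, `for col in cat_cols: print(cat_summary(df, col))` would print one such table per categorical column.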

cat_cols = [col for col in df.columns if str(df[col].dtypes) in ["category", "object", "bool"]]  # iterate over the columns, convert each dtype to a string, and keep the column if it is "category", "object" or "bool"

['sex',
'embarked',
'class',
'who',
'adult_male',
'deck',
'embark_town',
'alive',
'alone']

num_but_cat = [col for col in df.columns if df[col].nunique() < 10 and df[col].dtypes in ["int", "float"]]  # iterate over df.columns; if a column has fewer than 10 unique values and its dtype is int or float, it is numerical but behaves like a categorical variable, so add it to num_but_cat
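
The two lists are usually combined: numeric columns with few unique values join the categorical columns, and the remaining numeric columns are treated as numerical. A sketch of that step as one hypothetical helper, `grab_col_names`, with a `cat_th` threshold (default 10, mirroring the cutoff above). Unlike the string comparisons above, this sketch checks `dtype.kind` ("i" signed int, "u" unsigned int, "f" float, "b" bool, "O" object/category/string), which is my own substitution so that int32 or unsigned columns are not missed.

```python
import pandas as pd


def grab_col_names(dataframe: pd.DataFrame, cat_th: int = 10):
    """Split columns into categorical and numerical lists (hypothetical helper).

    Numeric columns with fewer than cat_th unique values are treated as categorical.
    """
    # bool ("b") and object / category / string ("O") columns
    cat_cols = [col for col in dataframe.columns if dataframe[col].dtype.kind in "bO"]
    # signed int, unsigned int and float columns
    numeric = [col for col in dataframe.columns if dataframe[col].dtype.kind in "iuf"]
    num_but_cat = [col for col in numeric if dataframe[col].nunique() < cat_th]
    cat_cols = cat_cols + num_but_cat
    num_cols = [col for col in numeric if col not in num_but_cat]
    return cat_cols, num_cols


# Made-up example; cat_th=3 because the toy frame has only five rows.
toy = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "sex": ["male", "female", "female", "male", "male"],
    "alone": [True, False, True, True, False],
})
cat_cols, num_cols = grab_col_names(toy, cat_th=3)
print(cat_cols, num_cols)
```

With the default `cat_th=10` on the Titanic data, `num_cols` should come out as `['age', 'fare']`, because survived, pclass, sibsp and parch all have fewer than 10 unique values.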

Analysis of Numerical Variables

Numerical variable analysis is the statistical examination of the numerical (quantitative) variables in a dataset. It provides important information about the data by looking at properties such as central tendency, spread, the symmetry (skewness) of the distribution, outliers and correlations.


import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()


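num_cols = [col for col in df.columns if df[col].dtypes in ["int", "float"]]  # similar to the categorical case: collect the int and float (numerical) variables
# Numeric columns with fewer than 10 unique values (survived, pclass, sibsp, parch) behave like
# categorical variables, so exclude them; this leaves the list shown below.
num_but_cat = [col for col in df.columns if df[col].nunique() < 10 and df[col].dtypes in ["int", "float"]]
num_cols = [col for col in num_cols if col not in num_but_cat]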

['age', 'fare']
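
The properties listed above (centre, spread, symmetry, outliers) can be collected per column. A sketch of a hypothetical `num_summary` helper: it combines `describe` with extra percentiles, `Series.skew()` for asymmetry, and a count of values outside the 1.5 × IQR fences. The IQR rule is a common rule of thumb that I chose for this sketch, not something the original prescribes.

```python
import pandas as pd


def num_summary(dataframe: pd.DataFrame, col_name: str) -> pd.Series:
    """Summarize one numeric column: centre, spread, skewness and IQR outliers."""
    col = dataframe[col_name]
    q1, q3 = col.quantile(0.25), col.quantile(0.75)
    iqr = q3 - q1
    low, up = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey-style outlier fences
    stats = col.describe(percentiles=[0.05, 0.25, 0.50, 0.75, 0.95])
    stats["skew"] = col.skew()  # > 0 means a longer right tail
    stats["n_outliers_iqr"] = ((col < low) | (col > up)).sum()
    return stats


# Made-up example with one obvious outlier (100.0).
toy = pd.DataFrame({"fare": [7.0, 8.0, 9.0, 10.0, 100.0]})
s = num_summary(toy, "fare")
print(s)
```

On the Titanic data, `for col in num_cols: print(num_summary(df, col))` would summarize age and fare.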

Correlation Analysis

Correlation analysis is a statistical method for measuring the relationship between variables, taken two at a time (a correlation matrix simply repeats this for every pair). The correlation coefficient ranges from -1 to +1: its sign gives the direction of the relationship and its absolute value how strong it is, with values near 0 meaning no linear relationship. Correlation analysis is widely used in data mining, statistical analysis and data exploration.
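
`DataFrame.corr()` uses the Pearson coefficient unless another `method` ("kendall" or "spearman") is passed. Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand with NumPy on made-up numbers and checks it against pandas; the data are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up example data: y grows roughly linearly with x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = pd.Series(x).corr(pd.Series(y))  # Pearson by default
print(round(r_manual, 4), round(r_pandas, 4))
```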

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = pd.read_csv("datasets/breast_cancer.csv")
df = df.iloc[:, 1:-1]  # drop the first and the last column, which are not used in the analysis
df.head()

num_cols = [col for col in df.columns if df[col].dtype in [int, float]]

corr = df[num_cols].corr()

                        radius_mean  texture_mean  perimeter_mean  area_mean  smoothness_mean  compactness_mean  concavity_mean  concave points_mean  symmetry_mean  fractal_dimension_mean  radius_se  texture_se  perimeter_se  area_se  smoothness_se  compactness_se  concavity_se  concave points_se  symmetry_se  fractal_dimension_se  radius_worst  texture_worst  perimeter_worst  area_worst  smoothness_worst  compactness_worst  concavity_worst  concave points_worst  symmetry_worst  fractal_dimension_worst
radius_mean 1.00000 0.32378 0.99786 0.98736 0.17058 0.50612 0.67676 0.82253 0.14774 -0.31163 0.67909 -0.09732 0.67417 0.73586 -0.22260 0.20600 0.19420 0.37617 -0.10432 -0.04264 0.96954 0.29701 0.96514 0.94108 0.11962 0.41346 0.52691 0.74421 0.16395 0.00707
texture_mean 0.32378 1.00000 0.32953 0.32109 -0.02339 0.23670 0.30242 0.29346 0.07140 -0.07644 0.27587 0.38636 0.28167 0.25984 0.00661 0.19197 0.14329 0.16385 0.00913 0.05446 0.35257 0.91204 0.35804 0.34355 0.07750 0.27783 0.30103 0.29532 0.10501 0.11921
perimeter_mean 0.99786 0.32953 1.00000 0.98651 0.20728 0.55694 0.71614 0.85098 0.18303 -0.26148 0.69177 -0.08676 0.69313 0.74498 -0.20269 0.25074 0.22808 0.40722 -0.08163 -0.00552 0.96948 0.30304 0.97039 0.94155 0.15055 0.45577 0.56388 0.77124 0.18912 0.05102
area_mean 0.98736 0.32109 0.98651 1.00000 0.17703 0.49850 0.68598 0.82327 0.15129 -0.28311 0.73256 -0.06628 0.72663 0.80009 -0.16678 0.21258 0.20766 0.37232 -0.07250 -0.01989 0.96275 0.28749 0.95912 0.95921 0.12352 0.39041 0.51261 0.72202 0.14357 0.00374
smoothness_mean 0.17058 -0.02339 0.20728 0.17703 1.00000 0.65912 0.52198 0.55370 0.55777 0.58479 0.30147 0.06841 0.29609 0.24655 0.33238 0.31894 0.24840 0.38068 0.20077 0.28361 0.21312 0.03607 0.23885 0.20672 0.80532 0.47247 0.43493 0.50305 0.39431 0.49932
compactness_mean 0.50612 0.23670 0.55694 0.49850 0.65912 1.00000 0.88312 0.83114 0.60264 0.56537 0.49747 0.04620 0.54891 0.45565 0.13530 0.73872 0.57052 0.64226 0.22998 0.50732 0.53532 0.24813 0.59021 0.50960 0.56554 0.86581 0.81628 0.81557 0.51022 0.68738
concavity_mean 0.67676 0.30242 0.71614 0.68598 0.52198 0.88312 1.00000 0.92139 0.50067 0.33678 0.63192 0.07622 0.66039 0.61743 0.09856 0.67028 0.69127 0.68326 0.17801 0.44930 0.68824 0.29988 0.72956 0.67599 0.44882 0.75497 0.88410 0.86132 0.40946 0.51493
concave points_mean 0.82253 0.29346 0.85098 0.82327 0.55370 0.83114 0.92139 1.00000 0.46250 0.16692 0.69805 0.02148 0.71065 0.69030 0.02765 0.49042 0.43917 0.61563 0.09535 0.25758 0.83032 0.29275 0.85592 0.80963 0.45275 0.66745 0.75240 0.91016 0.37574 0.36866
symmetry_mean 0.14774 0.07140 0.18303 0.15129 0.55777 0.60264 0.50067 0.46250 1.00000 0.47992 0.30338 0.12805 0.31389 0.22397 0.18732 0.42166 0.34263 0.39330 0.44914 0.33179 0.18573 0.09065 0.21917 0.17719 0.42668 0.47320 0.43372 0.43030 0.69983 0.43841
fractal_dimension_mean -0.31163 -0.07644 -0.26148 -0.28311 0.58479 0.56537 0.33678 0.16692 0.47992 1.00000 0.00011 0.16417 0.03983 -0.09017 0.40196 0.55984 0.44663 0.34120 0.34501 0.68813 -0.25369 -0.05127 -0.20515 -0.23185 0.50494 0.45880 0.34623 0.17533 0.33402 0.76730
radius_se 0.67909 0.27587 0.69177 0.73256 0.30147 0.49747 0.63192 0.69805 0.30338 0.00011 1.00000 0.21325 0.97279 0.95183 0.16451 0.35606 0.33236 0.51335 0.24057 0.22775 0.71507 0.19480 0.71968 0.75155 0.14192 0.28710 0.38058 0.53106 0.09454 0.04956
texture_se -0.09732 0.38636 -0.08676 -0.06628 0.06841 0.04620 0.07622 0.02148 0.12805 0.16417 0.21325 1.00000 0.22317 0.11157 0.39724 0.23170 0.19500 0.23028 0.41162 0.27972 -0.11169 0.40900 -0.10224 -0.08319 -0.07366 -0.09244 -0.06896 -0.11964 -0.12821 -0.04565
perimeter_se 0.67417 0.28167 0.69313 0.72663 0.29609 0.54891 0.66039 0.71065 0.31389 0.03983 0.97279 0.22317 1.00000 0.93766 0.15108 0.41632 0.36248 0.55626 0.26649 0.24414 0.69720 0.20037 0.72103 0.73071 0.13005 0.34192 0.41890 0.55490 0.10993 0.08543
area_se 0.73586 0.25984 0.74498 0.80009 0.24655 0.45565 0.61743 0.69030 0.22397 -0.09017 0.95183 0.11157 0.93766 1.00000 0.07515 0.28484 0.27089 0.41573 0.13411 0.12707 0.75737 0.19650 0.76121 0.81141 0.12539 0.28326 0.38510 0.53817 0.07413 0.01754
smoothness_se -0.22260 0.00661 -0.20269 -0.16678 0.33238 0.13530 0.09856 0.02765 0.18732 0.40196 0.16451 0.39724 0.15108 0.07515 1.00000 0.33670 0.26868 0.32843 0.41351 0.42737 -0.23069 -0.07474 -0.21730 -0.18220 0.31446 -0.05556 -0.05830 -0.10201 -0.10734 0.10148
compactness_se 0.20600 0.19197 0.25074 0.21258 0.31894 0.73872 0.67028 0.49042 0.42166 0.55984 0.35606 0.23170 0.41632 0.28484 0.33670 1.00000 0.80127 0.74408 0.39471 0.80327 0.20461 0.14300 0.26052 0.19937 0.22739 0.67878 0.63915 0.48321 0.27788 0.59097
concavity_se 0.19420 0.14329 0.22808 0.20766 0.24840 0.57052 0.69127 0.43917 0.34263 0.44663 0.33236 0.19500 0.36248 0.27089 0.26868 0.80127 1.00000 0.77180 0.30943 0.72737 0.18690 0.10024 0.22668 0.18835 0.16848 0.48486 0.66256 0.44047 0.19779 0.43933
concave points_se 0.37617 0.16385 0.40722 0.37232 0.38068 0.64226 0.68326 0.61563 0.39330 0.34120 0.51335 0.23028 0.55626 0.41573 0.32843 0.74408 0.77180 1.00000 0.31278 0.61104 0.35813 0.08674 0.39500 0.34227 0.21535 0.45289 0.54959 0.60245 0.14312 0.31065
symmetry_se -0.10432 0.00913 -0.08163 -0.07250 0.20077 0.22998 0.17801 0.09535 0.44914 0.34501 0.24057 0.41162 0.26649 0.13411 0.41351 0.39471 0.30943 0.31278 1.00000 0.36908 -0.12812 -0.07747 -0.10375 -0.11034 -0.01266 0.06025 0.03712 -0.03041 0.38940 0.07808
fractal_dimension_se -0.04264 0.05446 -0.00552 -0.01989 0.28361 0.50732 0.44930 0.25758 0.33179 0.68813 0.22775 0.27972 0.24414 0.12707 0.42737 0.80327 0.72737 0.61104 0.36908 1.00000 -0.03749 -0.00320 -0.00100 -0.02274 0.17057 0.39016 0.37997 0.21520 0.11109 0.59133
radius_worst 0.96954 0.35257 0.96948 0.96275 0.21312 0.53532 0.68824 0.83032 0.18573 -0.25369 0.71507 -0.11169 0.69720 0.75737 -0.23069 0.20461 0.18690 0.35813 -0.12812 -0.03749 1.00000 0.35992 0.99371 0.98401 0.21657 0.47582 0.57397 0.78742 0.24353 0.09349
texture_worst 0.29701 0.91204 0.30304 0.28749 0.03607 0.24813 0.29988 0.29275 0.09065 -0.05127 0.19480 0.40900 0.20037 0.19650 -0.07474 0.14300 0.10024 0.08674 -0.07747 -0.00320 0.35992 1.00000 0.36510 0.34584 0.22543 0.36083 0.36837 0.35975 0.23303 0.21912
perimeter_worst 0.96514 0.35804 0.97039 0.95912 0.23885 0.59021 0.72956 0.85592 0.21917 -0.20515 0.71968 -0.10224 0.72103 0.76121 -0.21730 0.26052 0.22668 0.39500 -0.10375 -0.00100 0.99371 0.36510 1.00000 0.97758 0.23677 0.52941 0.61834 0.81632 0.26949 0.13896
area_worst 0.94108 0.34355 0.94155 0.95921 0.20672 0.50960 0.67599 0.80963 0.17719 -0.23185 0.75155 -0.08319 0.73071 0.81141 -0.18220 0.19937 0.18835 0.34227 -0.11034 -0.02274 0.98401 0.34584 0.97758 1.00000 0.20915 0.43830 0.54333 0.74742 0.20915 0.07965
smoothness_worst 0.11962 0.07750 0.15055 0.12352 0.80532 0.56554 0.44882 0.45275 0.42668 0.50494 0.14192 -0.07366 0.13005 0.12539 0.31446 0.22739 0.16848 0.21535 -0.01266 0.17057 0.21657 0.22543 0.23677 0.20915 1.00000 0.56819 0.51852 0.54769 0.49384 0.61762
compactness_worst 0.41346 0.27783 0.45577 0.39041 0.47247 0.86581 0.75497 0.66745 0.47320 0.45880 0.28710 -0.09244 0.34192 0.28326 -0.05556 0.67878 0.48486 0.45289 0.06025 0.39016 0.47582 0.36083 0.52941 0.43830 0.56819 1.00000 0.89226 0.80108 0.61444 0.81045
concavity_worst 0.52691 0.30103 0.56388 0.51261 0.43493 0.81628 0.88410 0.75240 0.43372 0.34623 0.38058 -0.06896 0.41890 0.38510 -0.05830 0.63915 0.66256 0.54959 0.03712 0.37997 0.57397 0.36837 0.61834 0.54333 0.51852 0.89226 1.00000 0.85543 0.53252 0.68651
concave points_worst 0.74421 0.29532 0.77124 0.72202 0.50305 0.81557 0.86132 0.91016 0.43030 0.17533 0.53106 -0.11964 0.55490 0.53817 -0.10201 0.48321 0.44047 0.60245 -0.03041 0.21520 0.78742 0.35975 0.81632 0.74742 0.54769 0.80108 0.85543 1.00000 0.50253 0.51111
symmetry_worst 0.16395 0.10501 0.18912 0.14357 0.39431 0.51022 0.40946 0.37574 0.69983 0.33402 0.09454 -0.12821 0.10993 0.07413 -0.10734 0.27788 0.19779 0.14312 0.38940 0.11109 0.24353 0.23303 0.26949 0.20915 0.49384 0.61444 0.53252 0.50253 1.00000 0.53785
fractal_dimension_worst 0.00707 0.11921 0.05102 0.00374 0.49932 0.68738 0.51493 0.36866 0.43841 0.76730 0.04956 -0.04565 0.08543 0.01754 0.10148 0.59097 0.43933 0.31065 0.07808 0.59133 0.09349 0.21912 0.13896 0.07965 0.61762 0.81045 0.68651 0.51111 0.53785 1.00000
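
With 30 columns this matrix is hard to scan by eye. One common follow-up is to list only the pairs whose absolute correlation exceeds a threshold, reading the upper triangle so each pair appears once and the diagonal of 1s is skipped. A sketch under these assumptions: the helper name `high_corr_pairs` and the 0.90 threshold are my own choices. On the matrix above it would flag pairs such as radius_mean and perimeter_mean (0.99786).

```python
import numpy as np
import pandas as pd


def high_corr_pairs(dataframe: pd.DataFrame, threshold: float = 0.90) -> pd.Series:
    """Return |correlation| values above threshold, one entry per column pair."""
    corr = dataframe.corr(numeric_only=True).abs()  # non-numeric columns are ignored
    # Keep only the upper triangle (k=1 also drops the diagonal) so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()  # masked cells become NaN and are dropped
    return pairs[pairs > threshold].sort_values(ascending=False)


# Made-up example: b is almost exactly 2 * a, c is unrelated, label is text.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [2.1, 4.0, 6.1, 7.9, 10.2, 12.0],
    "c": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],
    "label": ["x", "y", "x", "y", "x", "y"],
})
pairs = high_corr_pairs(toy)
print(pairs)
```

Because `numeric_only=True` skips text columns, the same call should work on the breast cancer frame even if it still contains a non-numeric column.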

In the next post we will take a closer look at data visualization. Stay tuned :)

This post is based on the course content of the Data Scientist Bootcamp I attended. Many thanks to Vahit Keskin.

Let's connect on LinkedIn :) yaseminderyadilli

Written by Yasemin Derya Dilli

Data Analyst | Engineer | Content Writer
