Advanced Functional Exploratory Data Analysis
Hello, we are continuing the series on data analysis with Python. If you have not gone through the earlier code yet, I recommend reading Veri Bilimi İçin Python (Python for Data Science) and Python ile Veri Analizi (Data Analysis with Python) first, in that order. Now let's get to today's topic.
Advanced Functional Exploratory Data Analysis
Advanced functional exploratory data analysis refers to the more detailed, more complex, insight-oriented methods used in the data analysis process. The aim is to understand a data set in depth and extract valuable information from it. Compared with traditional exploratory data analysis, it involves more advanced techniques and approaches.
We begin by trying to understand the data with a general overview.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()
df.tail()
df.shape
df.info()
df.columns
df.index
df.describe().T  # summary statistics for the numeric variables
df.isnull().values.any()  # is there any missing value at all?
df.isnull().sum()  # number of missing values in each column
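Since the theme here is a functional approach, the overview steps above could be gathered into a single reusable function. The following is only a minimal sketch: the name check_df, its head parameter, and the small demo frame are assumptions made for illustration and are not taken from any library.

```python
import pandas as pd


def check_df(dataframe, head=5):
    """Collect the quick overview steps into one dictionary of results."""
    return {
        "shape": dataframe.shape,
        "dtypes": dataframe.dtypes,
        "head": dataframe.head(head),
        "tail": dataframe.tail(head),
        "missing": dataframe.isnull().sum(),   # missing values per column
        "describe": dataframe.describe().T,    # numeric columns only
    }


# Tiny made-up frame so the sketch runs without downloading anything.
demo = pd.DataFrame({"age": [22.0, None, 35.0],
                     "sex": ["male", "female", "female"]})
overview = check_df(demo, head=2)
print(overview["shape"])
print(overview["missing"])
```

With the Titanic data loaded as above, the same call would simply be check_df(df).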
Analysis of Categorical Variables
Categorical variable analysis is the process of examining and understanding the categorical (nominal) variables in a data set. Categorical variables represent distinct categories or groups; gender, education level, and city names are typical examples. Analyzing them helps you understand the characteristics of your data set and the relationships within it.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()
df["survived"].value_counts()
Name: survived, dtype: int64
df["sex"].unique()
Out[103]: array(['male', 'female'], dtype=object)
df["class"].nunique()
Out[104]: 3
# Walk over the columns, convert each dtype to a string, and collect the columns whose dtype is "category", "object", or "bool" into cat_cols.
cat_cols = [col for col in df.columns if str(df[col].dtypes) in ["category", "object", "bool"]]
['sex',
'embarked',
'class',
'who',
'adult_male',
'deck',
'embark_town',
'alive',
'alone']
# Walk over the columns and collect into num_but_cat those with fewer than 10 unique values whose dtype is "int" or "float": numeric in type, but categorical in practice.
num_but_cat = [col for col in df.columns if df[col].nunique() < 10 and df[col].dtypes in ["int", "float"]]
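Once the categorical columns are identified, a natural next step is to look at how the classes of each one are distributed. The sketch below shows one possible way to do that. The helper name cat_summary and the demo data are assumptions made for illustration; only value_counts and the DataFrame constructor come from pandas.

```python
import pandas as pd


def cat_summary(dataframe, col_name):
    """Return class counts and percentage shares for one categorical column."""
    counts = dataframe[col_name].value_counts()
    return pd.DataFrame({
        "count": counts,
        "ratio": 100 * counts / len(dataframe),  # share of all rows, in percent
    })


# Made-up data standing in for a real data set.
demo = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "female"],
    "pclass": [1, 3, 3, 2, 3],
})

summary = cat_summary(demo, "sex")
print(summary)
```

On the Titanic frame, the same helper could be called in a loop over cat_cols + num_but_cat to summarize every categorical variable in one pass.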
Analysis of Numerical Variables
Numerical variable analysis is the statistical examination of the numerical (quantitative) variables in a data set. By looking at properties such as central tendency, spread, the symmetry of the distribution, outliers, and correlations, it provides important information about the data.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = sns.load_dataset("titanic")
df.head()
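# As with the categorical variables, we collect the columns whose dtype is int or float.
# Note: where "int" resolves to int64, this also picks up survived, pclass, sibsp and parch;
# the output below presumably corresponds to the list with those low-cardinality columns (num_but_cat) left out.
num_cols = [col for col in df.columns if df[col].dtypes in ["int", "float"]]
['age', 'fare']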
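Comparing dtypes against the strings "int" and "float" can behave differently across platforms. A more explicit alternative, sketched below, is to use the dtype checks in pandas.api.types and to leave out numeric columns with few distinct values. The function name numeric_columns, the cat_threshold parameter, and the demo frame are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def numeric_columns(dataframe, cat_threshold=10):
    """Numeric, non-boolean columns with at least cat_threshold distinct values."""
    return [
        col for col in dataframe.columns
        if is_numeric_dtype(dataframe[col])
        and not is_bool_dtype(dataframe[col])
        and dataframe[col].nunique() >= cat_threshold
    ]


# Made-up data: one low-cardinality numeric column, two continuous ones, one text column.
demo = pd.DataFrame({
    "survived": [0, 1] * 6,
    "age": [float(a) for a in range(20, 32)],
    "fare": [7.25 + i for i in range(12)],
    "sex": ["male", "female"] * 6,
})

num_cols = numeric_columns(demo)
print(num_cols)
# Summary statistics with a few extra quantiles for the selected columns.
print(demo[num_cols].describe(percentiles=[0.05, 0.50, 0.95]).T)
```

The threshold of 10 mirrors the one used for num_but_cat above; it is a judgment call rather than a fixed rule.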
Correlation Analysis
Correlation analysis is a statistical method for measuring the relationship between two or more variables. It helps determine how strong or weak that relationship is. It is widely used in many fields, including data mining, statistical analysis, and data exploration.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 500)
df = pd.read_csv("datasets/breast_cancer.csv")
df = df.iloc[:, 1:-1]  # drop the first and last columns
df.head()
num_cols = [col for col in df.columns if df[col].dtype in [int, float]]
corr = df[num_cols].corr()
radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave points_mean symmetry_mean fractal_dimension_mean radius_se texture_se perimeter_se area_se smoothness_se compactness_se concavity_se concave points_se symmetry_se fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst smoothness_worst compactness_worst concavity_worst concave points_worst symmetry_worst fractal_dimension_worst
radius_mean 1.00000 0.32378 0.99786 0.98736 0.17058 0.50612 0.67676 0.82253 0.14774 -0.31163 0.67909 -0.09732 0.67417 0.73586 -0.22260 0.20600 0.19420 0.37617 -0.10432 -0.04264 0.96954 0.29701 0.96514 0.94108 0.11962 0.41346 0.52691 0.74421 0.16395 0.00707
texture_mean 0.32378 1.00000 0.32953 0.32109 -0.02339 0.23670 0.30242 0.29346 0.07140 -0.07644 0.27587 0.38636 0.28167 0.25984 0.00661 0.19197 0.14329 0.16385 0.00913 0.05446 0.35257 0.91204 0.35804 0.34355 0.07750 0.27783 0.30103 0.29532 0.10501 0.11921
perimeter_mean 0.99786 0.32953 1.00000 0.98651 0.20728 0.55694 0.71614 0.85098 0.18303 -0.26148 0.69177 -0.08676 0.69313 0.74498 -0.20269 0.25074 0.22808 0.40722 -0.08163 -0.00552 0.96948 0.30304 0.97039 0.94155 0.15055 0.45577 0.56388 0.77124 0.18912 0.05102
area_mean 0.98736 0.32109 0.98651 1.00000 0.17703 0.49850 0.68598 0.82327 0.15129 -0.28311 0.73256 -0.06628 0.72663 0.80009 -0.16678 0.21258 0.20766 0.37232 -0.07250 -0.01989 0.96275 0.28749 0.95912 0.95921 0.12352 0.39041 0.51261 0.72202 0.14357 0.00374
smoothness_mean 0.17058 -0.02339 0.20728 0.17703 1.00000 0.65912 0.52198 0.55370 0.55777 0.58479 0.30147 0.06841 0.29609 0.24655 0.33238 0.31894 0.24840 0.38068 0.20077 0.28361 0.21312 0.03607 0.23885 0.20672 0.80532 0.47247 0.43493 0.50305 0.39431 0.49932
compactness_mean 0.50612 0.23670 0.55694 0.49850 0.65912 1.00000 0.88312 0.83114 0.60264 0.56537 0.49747 0.04620 0.54891 0.45565 0.13530 0.73872 0.57052 0.64226 0.22998 0.50732 0.53532 0.24813 0.59021 0.50960 0.56554 0.86581 0.81628 0.81557 0.51022 0.68738
concavity_mean 0.67676 0.30242 0.71614 0.68598 0.52198 0.88312 1.00000 0.92139 0.50067 0.33678 0.63192 0.07622 0.66039 0.61743 0.09856 0.67028 0.69127 0.68326 0.17801 0.44930 0.68824 0.29988 0.72956 0.67599 0.44882 0.75497 0.88410 0.86132 0.40946 0.51493
concave points_mean 0.82253 0.29346 0.85098 0.82327 0.55370 0.83114 0.92139 1.00000 0.46250 0.16692 0.69805 0.02148 0.71065 0.69030 0.02765 0.49042 0.43917 0.61563 0.09535 0.25758 0.83032 0.29275 0.85592 0.80963 0.45275 0.66745 0.75240 0.91016 0.37574 0.36866
symmetry_mean 0.14774 0.07140 0.18303 0.15129 0.55777 0.60264 0.50067 0.46250 1.00000 0.47992 0.30338 0.12805 0.31389 0.22397 0.18732 0.42166 0.34263 0.39330 0.44914 0.33179 0.18573 0.09065 0.21917 0.17719 0.42668 0.47320 0.43372 0.43030 0.69983 0.43841
fractal_dimension_mean -0.31163 -0.07644 -0.26148 -0.28311 0.58479 0.56537 0.33678 0.16692 0.47992 1.00000 0.00011 0.16417 0.03983 -0.09017 0.40196 0.55984 0.44663 0.34120 0.34501 0.68813 -0.25369 -0.05127 -0.20515 -0.23185 0.50494 0.45880 0.34623 0.17533 0.33402 0.76730
radius_se 0.67909 0.27587 0.69177 0.73256 0.30147 0.49747 0.63192 0.69805 0.30338 0.00011 1.00000 0.21325 0.97279 0.95183 0.16451 0.35606 0.33236 0.51335 0.24057 0.22775 0.71507 0.19480 0.71968 0.75155 0.14192 0.28710 0.38058 0.53106 0.09454 0.04956
texture_se -0.09732 0.38636 -0.08676 -0.06628 0.06841 0.04620 0.07622 0.02148 0.12805 0.16417 0.21325 1.00000 0.22317 0.11157 0.39724 0.23170 0.19500 0.23028 0.41162 0.27972 -0.11169 0.40900 -0.10224 -0.08319 -0.07366 -0.09244 -0.06896 -0.11964 -0.12821 -0.04565
perimeter_se 0.67417 0.28167 0.69313 0.72663 0.29609 0.54891 0.66039 0.71065 0.31389 0.03983 0.97279 0.22317 1.00000 0.93766 0.15108 0.41632 0.36248 0.55626 0.26649 0.24414 0.69720 0.20037 0.72103 0.73071 0.13005 0.34192 0.41890 0.55490 0.10993 0.08543
area_se 0.73586 0.25984 0.74498 0.80009 0.24655 0.45565 0.61743 0.69030 0.22397 -0.09017 0.95183 0.11157 0.93766 1.00000 0.07515 0.28484 0.27089 0.41573 0.13411 0.12707 0.75737 0.19650 0.76121 0.81141 0.12539 0.28326 0.38510 0.53817 0.07413 0.01754
smoothness_se -0.22260 0.00661 -0.20269 -0.16678 0.33238 0.13530 0.09856 0.02765 0.18732 0.40196 0.16451 0.39724 0.15108 0.07515 1.00000 0.33670 0.26868 0.32843 0.41351 0.42737 -0.23069 -0.07474 -0.21730 -0.18220 0.31446 -0.05556 -0.05830 -0.10201 -0.10734 0.10148
compactness_se 0.20600 0.19197 0.25074 0.21258 0.31894 0.73872 0.67028 0.49042 0.42166 0.55984 0.35606 0.23170 0.41632 0.28484 0.33670 1.00000 0.80127 0.74408 0.39471 0.80327 0.20461 0.14300 0.26052 0.19937 0.22739 0.67878 0.63915 0.48321 0.27788 0.59097
concavity_se 0.19420 0.14329 0.22808 0.20766 0.24840 0.57052 0.69127 0.43917 0.34263 0.44663 0.33236 0.19500 0.36248 0.27089 0.26868 0.80127 1.00000 0.77180 0.30943 0.72737 0.18690 0.10024 0.22668 0.18835 0.16848 0.48486 0.66256 0.44047 0.19779 0.43933
concave points_se 0.37617 0.16385 0.40722 0.37232 0.38068 0.64226 0.68326 0.61563 0.39330 0.34120 0.51335 0.23028 0.55626 0.41573 0.32843 0.74408 0.77180 1.00000 0.31278 0.61104 0.35813 0.08674 0.39500 0.34227 0.21535 0.45289 0.54959 0.60245 0.14312 0.31065
symmetry_se -0.10432 0.00913 -0.08163 -0.07250 0.20077 0.22998 0.17801 0.09535 0.44914 0.34501 0.24057 0.41162 0.26649 0.13411 0.41351 0.39471 0.30943 0.31278 1.00000 0.36908 -0.12812 -0.07747 -0.10375 -0.11034 -0.01266 0.06025 0.03712 -0.03041 0.38940 0.07808
fractal_dimension_se -0.04264 0.05446 -0.00552 -0.01989 0.28361 0.50732 0.44930 0.25758 0.33179 0.68813 0.22775 0.27972 0.24414 0.12707 0.42737 0.80327 0.72737 0.61104 0.36908 1.00000 -0.03749 -0.00320 -0.00100 -0.02274 0.17057 0.39016 0.37997 0.21520 0.11109 0.59133
radius_worst 0.96954 0.35257 0.96948 0.96275 0.21312 0.53532 0.68824 0.83032 0.18573 -0.25369 0.71507 -0.11169 0.69720 0.75737 -0.23069 0.20461 0.18690 0.35813 -0.12812 -0.03749 1.00000 0.35992 0.99371 0.98401 0.21657 0.47582 0.57397 0.78742 0.24353 0.09349
texture_worst 0.29701 0.91204 0.30304 0.28749 0.03607 0.24813 0.29988 0.29275 0.09065 -0.05127 0.19480 0.40900 0.20037 0.19650 -0.07474 0.14300 0.10024 0.08674 -0.07747 -0.00320 0.35992 1.00000 0.36510 0.34584 0.22543 0.36083 0.36837 0.35975 0.23303 0.21912
perimeter_worst 0.96514 0.35804 0.97039 0.95912 0.23885 0.59021 0.72956 0.85592 0.21917 -0.20515 0.71968 -0.10224 0.72103 0.76121 -0.21730 0.26052 0.22668 0.39500 -0.10375 -0.00100 0.99371 0.36510 1.00000 0.97758 0.23677 0.52941 0.61834 0.81632 0.26949 0.13896
area_worst 0.94108 0.34355 0.94155 0.95921 0.20672 0.50960 0.67599 0.80963 0.17719 -0.23185 0.75155 -0.08319 0.73071 0.81141 -0.18220 0.19937 0.18835 0.34227 -0.11034 -0.02274 0.98401 0.34584 0.97758 1.00000 0.20915 0.43830 0.54333 0.74742 0.20915 0.07965
smoothness_worst 0.11962 0.07750 0.15055 0.12352 0.80532 0.56554 0.44882 0.45275 0.42668 0.50494 0.14192 -0.07366 0.13005 0.12539 0.31446 0.22739 0.16848 0.21535 -0.01266 0.17057 0.21657 0.22543 0.23677 0.20915 1.00000 0.56819 0.51852 0.54769 0.49384 0.61762
compactness_worst 0.41346 0.27783 0.45577 0.39041 0.47247 0.86581 0.75497 0.66745 0.47320 0.45880 0.28710 -0.09244 0.34192 0.28326 -0.05556 0.67878 0.48486 0.45289 0.06025 0.39016 0.47582 0.36083 0.52941 0.43830 0.56819 1.00000 0.89226 0.80108 0.61444 0.81045
concavity_worst 0.52691 0.30103 0.56388 0.51261 0.43493 0.81628 0.88410 0.75240 0.43372 0.34623 0.38058 -0.06896 0.41890 0.38510 -0.05830 0.63915 0.66256 0.54959 0.03712 0.37997 0.57397 0.36837 0.61834 0.54333 0.51852 0.89226 1.00000 0.85543 0.53252 0.68651
concave points_worst 0.74421 0.29532 0.77124 0.72202 0.50305 0.81557 0.86132 0.91016 0.43030 0.17533 0.53106 -0.11964 0.55490 0.53817 -0.10201 0.48321 0.44047 0.60245 -0.03041 0.21520 0.78742 0.35975 0.81632 0.74742 0.54769 0.80108 0.85543 1.00000 0.50253 0.51111
symmetry_worst 0.16395 0.10501 0.18912 0.14357 0.39431 0.51022 0.40946 0.37574 0.69983 0.33402 0.09454 -0.12821 0.10993 0.07413 -0.10734 0.27788 0.19779 0.14312 0.38940 0.11109 0.24353 0.23303 0.26949 0.20915 0.49384 0.61444 0.53252 0.50253 1.00000 0.53785
fractal_dimension_worst 0.00707 0.11921 0.05102 0.00374 0.49932 0.68738 0.51493 0.36866 0.43841 0.76730 0.04956 -0.04565 0.08543 0.01754 0.10148 0.59097 0.43933 0.31065 0.07808 0.59133 0.09349 0.21912 0.13896 0.07965 0.61762 0.81045 0.68651 0.51111 0.53785 1.00000
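A 30 by 30 matrix like the one above is hard to read by eye, so it can help to extract only the strongly correlated pairs. The following is a minimal sketch of one way to do that, using the upper triangle of the absolute correlation matrix. The function name high_correlated_pairs, the 0.90 threshold, and the demo data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def high_correlated_pairs(dataframe, threshold=0.90):
    """List (col_a, col_b, |r|) for numeric column pairs with |r| above threshold."""
    corr = dataframe.select_dtypes(include="number").corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    stacked = corr.where(mask).stack()
    # Masked cells are NaN and never pass this comparison.
    stacked = stacked[stacked > threshold]
    return [(a, b, float(r)) for (a, b), r in stacked.items()]


# Made-up data: perimeter is almost a linear function of radius, texture is not.
demo = pd.DataFrame({
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [6.3, 12.5, 18.9, 25.1, 31.4],
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
})

pairs = high_correlated_pairs(demo)
print(pairs)
```

Applied to the matrix above, a pair such as radius_mean and perimeter_mean (0.99786) would be flagged, which is a common signal that one of the two columns carries little extra information.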
In the next post we will take a close look at data visualization. Stay tuned :)
This post draws on the course content of the Data Scientist Bootcamp, which I attended. My thanks to Vahit Keskin.
We can connect on LinkedIn :) yaseminderyadilli