Let's Write and Understand a Decision Tree in Python from Scratch! Part 3: The Pandas Data Analysis Library

Halo, Habr! Saya persembahkan untuk perhatian Anda terjemahan dari artikel " Python ใง 0 ใ‹ ใ‚‰ ใƒ‡ ใ‚ฃ ใ‚ท ใ‚ธ ใƒง ใƒณ ใƒ„ ใƒช ใƒผ ใ‚’ ไฝœ ใฃ ใฆ ็†่งฃ ใ™ ใ‚‹ ๏ผˆ3. ใƒ‡ ใƒผ ใ‚ฟ ๅˆ†ๆž โ€‹โ€‹ใƒฉ ใ‚ค โ€‹โ€‹ใƒ– ใƒฉ ใƒช Pandas ็ทจ๏ผ‰ ".



This is the third article in a series. Links to the previous articles: first, second



In this article, I will explain how to work with the Pandas library in order to build a Decision Tree.



3.1 Importing the library



# Import pandas and refer to it as pd
import pandas as pd


3.2 DataFrame and Series



Pandas uses two structures, DataFrame and Series.

Let's look at the following Excel-like table.



A single row of data is called a Series, the columns are called the attributes of that data, and the whole table is called a DataFrame.
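To make the terminology concrete, here is a minimal sketch on a hypothetical three-row table (the column names and values are made up for illustration and are not the article's data). It checks that the table is a DataFrame and that one row taken from it is a Series. Note that pandas also returns a Series when a single column is selected.

```python
import pandas as pd

# Hypothetical toy table, for illustration only
table = pd.DataFrame({
    "Weather": ["sunny", "cloudy", "rain"],
    "Golf": ["no", "yes", "yes"],
})

row = table.iloc[0]        # the first row
column = table["Weather"]  # a single column

print(type(table).__name__)   # DataFrame
print(type(row).__name__)     # Series
print(type(column).__name__)  # Series
print(list(row.index))        # ['Weather', 'Golf']
```

The index of a row Series is the list of column names, which is why a value in a row can be looked up by attribute name.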





3.3 Creating a DataFrame



An Excel spreadsheet is read with read_excel and written with ExcelWriter.

# Read the Excel file; it must be in the same folder as the ipynb notebook
df0 = pd.read_excel("data_golf.xlsx")

# Display the DataFrame as an HTML table
from IPython.display import HTML
html = "<div style='font-family:\"メイリオ\";'>"+df0.to_html()+"</div>"
HTML(html)

# Save to an Excel file (inside a with block, f.close is not needed)
with pd.ExcelWriter("data_golf2.xlsx") as f:
    df0.to_excel(f)


Creating a DataFrame from a dictionary (associative array): each dictionary entry holds the data of one DataFrame column



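# Creating from a dictionary: the data is grouped by column

d = {
    "Weather":["sunny","sunny","cloudy","rain","rain","rain","cloudy","sunny","sunny","rain","sunny","cloudy","cloudy","rain"],
    "Temperature":["hot","hot","hot","warm","cool","cool","cool","warm","cool","warm","warm","warm","hot","warm"],
    "Humidity":["high","high","high","high","normal","normal","normal","high","normal","normal","normal","high","normal","high"],
    "Wind":["no","yes","no","no","no","yes","yes","no","no","no","yes","yes","no","yes"],
    "Golf":["×","×","○","○","○","×","○","×","○","○","○","○","○","×"],
}
df0 = pd.DataFrame(d)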


Creating a DataFrame from an array: the data is collected row by row



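# Creating from an array: the data is grouped by row
d = [["sunny","hot","high","no","×"],
     ["sunny","hot","high","yes","×"],
     ["cloudy","hot","high","no","○"],
     ["rain","warm","high","no","○"],
     ["rain","cool","normal","no","○"],
     ["rain","cool","normal","yes","×"],
     ["cloudy","cool","normal","yes","○"],
     ["sunny","warm","high","no","×"],
     ["sunny","cool","normal","no","○"],
     ["rain","warm","normal","no","○"],
     ["sunny","warm","normal","yes","○"],
     ["cloudy","warm","high","yes","○"],
     ["cloudy","hot","normal","no","○"],
     ["rain","warm","high","yes","×"],
    ]
# Column names and row labels are passed via columns and index. If they are omitted, they are numbered automatically, starting from 0.

df0 = pd.DataFrame(d,columns=["Weather","Temperature","Humidity","Wind","Golf"],index=range(len(d)))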
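The two construction routes above should produce the same table. As a minimal sketch, assuming a hypothetical two-column sample (not the article's full data), the following builds one DataFrame column by column and one row by row, then compares them with DataFrame.equals.

```python
import pandas as pd

# Hypothetical two-column sample, for illustration only

# Route 1: a dictionary, one entry per column
by_column = pd.DataFrame({
    "Weather": ["sunny", "cloudy", "rain"],
    "Golf": ["no", "yes", "yes"],
})

# Route 2: a list of rows, with the column names given separately
rows = [["sunny", "no"], ["cloudy", "yes"], ["rain", "yes"]]
by_row = pd.DataFrame(rows, columns=["Weather", "Golf"])

same = by_column.equals(by_row)
print(same)  # True
```

Which route is more convenient depends on how the source data is laid out: column-wise lists suit the dictionary form, while records read one at a time suit the list-of-rows form.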


3.4 Getting information about the table



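# Getting information about the table

# Number of rows and columns
print(df0.shape) # output: (14, 5)

# Number of rows
print(df0.shape[0]) # output: 14

# Column names
print(df0.columns) # output: Index(['Weather', 'Temperature', 'Humidity', 'Wind', 'Golf'], dtype='object')

# Row labels (the index of df0 was generated automatically)
print(df0.index) # output: RangeIndex(start=0, stop=14, step=1)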


3.5 Getting values with loc and iloc



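# Getting values

# Specify the row and the column of the value you want
# Get the weather for person no. 1 (the second row)
print(df0.loc[1,"Weather"]) # output: sunny

# Specify the desired rows and columns as lists
# Get the weather and golf values for persons 1, 2 and 4 as a DataFrame
df = df0.loc[[1,2,4],["Weather","Golf"]]
print(df)
# output:
#    Weather  Golf
# 1    sunny     ×
# 2   cloudy     ○
# 4     rain     ○

# iloc selects by integer position. Positions are counted from 0.
# Get rows 1 to 3 and every column except the last one. Note that with iloc the slice 1:4 does not include row 4.
df = df0.iloc[1:4,:-1]
print(df)
# output:
#    Weather  Temperature  Humidity  Wind
# 1    sunny          hot      high   yes
# 2   cloudy          hot      high    no
# 3     rain         warm      high    no


# Getting a single row of data (a Series)
# Take the first row. s is a Series
s = df0.iloc[0,:]
# As with a dictionary, a value is accessed by its name: s["column name"]
print(s["Weather"]) # output: sunny

# Get all the values as an array (numpy.ndarray).
print(df0.values)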
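The difference between loc and iloc is easy to miss when the row labels happen to be 0, 1, 2, and so on. The following sketch uses a hypothetical table with labels 10, 20, 30 (an assumption made purely for illustration) to show that loc works with labels and includes the end of a slice, while iloc works with positions and excludes it.

```python
import pandas as pd

# Hypothetical table whose row labels do not start at 0
df = pd.DataFrame(
    {"Weather": ["sunny", "cloudy", "rain"], "Golf": ["no", "yes", "yes"]},
    index=[10, 20, 30],
)

by_label = df.loc[20, "Weather"]  # the row labelled 20
by_position = df.iloc[1, 0]       # second row, first column

label_slice = df.loc[10:20]       # label slices include the end label
position_slice = df.iloc[0:1]     # positional slices exclude the end position

print(by_label, by_position)                  # cloudy cloudy
print(len(label_slice), len(position_slice))  # 2 1
```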


3.6 Iterating over the data with iterrows and items



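# Iterating over the data
# Loop over the rows. Each row is processed in turn.
for i,row in df0.iterrows():
    # i is the row label (index), row is a Series
    print(i,row)

# Loop over the columns. Each column is processed in turn.
# (iteritems was removed in pandas 2.0; items is its replacement)
for i,col in df0.items():
    # i is the column name, col is a Series
    print(i,col)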
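As a small usage sketch of both loops, the following counts rows by iterating over them and collects a per-column summary by iterating over columns. The table is a hypothetical four-row sample, and the variable names played and unique_per_column are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical sample, for illustration only
df = pd.DataFrame({
    "Weather": ["sunny", "cloudy", "rain", "sunny"],
    "Golf": ["no", "yes", "yes", "yes"],
})

# Row loop: count the rows where golf was played
played = 0
for label, row in df.iterrows():
    if row["Golf"] == "yes":
        played += 1

# Column loop: number of distinct values in each column
unique_per_column = {}
for name, col in df.items():
    unique_per_column[name] = col.nunique()

print(played)             # 3
print(unique_per_column)  # {'Weather': 3, 'Golf': 2}
```

Explicit loops like these are fine for small tables. For larger ones, vectorized operations such as value_counts or boolean filtering are usually faster.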


3.7 Frequencies with value_counts



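# Counting frequencies
# Take the weather column. s is a Series
s = df0.loc[:,"Weather"]

# Count how many times each value occurs
print(s.value_counts())
# output:
# sunny     5
# rain      5
# cloudy    4
# Name: Weather, dtype: int64

# For example, get the number of rows where the weather is "sunny"
print(s.value_counts()["sunny"]) # output: 5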
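Besides absolute counts, value_counts can also return relative frequencies through its normalize parameter. The sketch below runs on a hypothetical four-element Series (not the article's data) and shows both forms side by side.

```python
import pandas as pd

# Hypothetical sample, for illustration only
s = pd.Series(["sunny", "rain", "sunny", "cloudy"], name="Weather")

counts = s.value_counts()                # absolute counts
shares = s.value_counts(normalize=True)  # fractions that sum to 1

print(counts["sunny"])  # 2
print(shares["sunny"])  # 0.5
```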


3.8 Extracting specific data with query



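# Extracting specific data
# Get the rows where the weather is sunny.
print(df0.query("Weather=='sunny'"))
# output:
#     Weather  Temperature  Humidity  Wind  Golf
# 0     sunny          hot      high    no     ×
# 1     sunny          hot      high   yes     ×
# 7     sunny         warm      high    no     ×
# 8     sunny         cool    normal    no     ○
# 10    sunny         warm    normal   yes     ○

# Get the rows where the weather is sunny and golf was played
print(df0.query("Weather=='sunny' and Golf=='○'"))
# output:
#     Weather  Temperature  Humidity  Wind  Golf
# 8     sunny         cool    normal    no     ○
# 10    sunny         warm    normal   yes     ○

# Get the rows where the weather is sunny or golf was played
print(df0.query("Weather=='sunny' or Golf=='○'"))
# output:
#     Weather  Temperature  Humidity  Wind  Golf
# 0     sunny          hot      high    no     ×
# 1     sunny          hot      high   yes     ×
# 2    cloudy          hot      high    no     ○
# 3      rain         warm      high    no     ○
# 4      rain         cool    normal    no     ○
# 6    cloudy         cool    normal   yes     ○
# 7     sunny         warm      high    no     ×
# 8     sunny         cool    normal    no     ○
# 9      rain         warm    normal    no     ○
# 10    sunny         warm    normal   yes     ○
# 11   cloudy         warm      high   yes     ○
# 12   cloudy         hot     normal    no     ○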
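The same filters can also be written with boolean masks, and query can refer to a Python variable by prefixing its name with @. The following sketch, on a hypothetical four-row sample with a made-up variable wanted, shows that both approaches select the same rows.

```python
import pandas as pd

# Hypothetical sample, for illustration only
df = pd.DataFrame({
    "Weather": ["sunny", "cloudy", "rain", "sunny"],
    "Golf": ["no", "yes", "yes", "yes"],
})

wanted = "sunny"

# query: @wanted refers to the Python variable defined above
by_query = df.query("Weather == @wanted and Golf == 'yes'")

# Equivalent boolean mask; note & instead of and, plus the parentheses
by_mask = df[(df["Weather"] == wanted) & (df["Golf"] == "yes")]

print(by_query.equals(by_mask))  # True
print(list(by_query.index))      # [3]
```

The query form tends to read better for simple conditions, while masks are handy when the condition is assembled programmatically.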


Thank you for reading!



We would be very glad to hear whether you liked this article, whether the translation was clear, and whether it was useful to you.


