Pandas Crash Course
PIERIAN DATA
Pandas is built on top of NumPy and lets Python read in data sets from various formats, such as CSV files.
We will mainly use pandas to read in data sets, select rows or columns of data, and quickly grab statistics, such as the mean value of a column.
Read a CSV file
import pandas as pd
df = pd.read_csv('salaries.csv') #DataFrame is the pandas data structure
print(df)
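Since salaries.csv itself is not included in these notes, here is a minimal sketch that can be run without the file. The CSV text below is a made-up stand-in (the column names Name, Salary and Age are taken from the later examples, the rows are invented), and it relies on read_csv accepting a file-like object as well as a path.

```python
import io
import pandas as pd

# Hypothetical stand-in for salaries.csv; the real file's contents are not shown.
csv_text = """Name,Salary,Age
John,50000,34
Sally,120000,45
Alyssa,80000,27
"""

# read_csv accepts a file path or a file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df)        # the whole table
print(df.shape)  # (number of rows, number of columns)
```

With a real file on disk, only the argument changes: pd.read_csv('salaries.csv').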
Select columns
print(df['Salary']) #select a single column
print(df[['Name','Salary']]) #select multiple columns
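One detail worth seeing in practice: a single column name gives back a Series, while a list of names gives back a DataFrame. The sketch below uses a small invented table (the values are assumptions, not the contents of salaries.csv).

```python
import pandas as pd

# Made-up sample data standing in for salaries.csv.
df = pd.DataFrame({
    "Name": ["John", "Sally", "Alyssa"],
    "Salary": [50000, 120000, 80000],
    "Age": [34, 45, 27],
})

one = df["Salary"]              # one name -> Series
many = df[["Name", "Salary"]]   # list of names -> DataFrame
also_frame = df[["Salary"]]     # one-element list -> still a DataFrame

print(type(one).__name__)         # Series
print(type(many).__name__)        # DataFrame
print(type(also_frame).__name__)  # DataFrame
```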
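Operations
print(df['Salary'].min()) #smallest value in the column
print(df['Salary'].mean()) #mean value of the column
ser_of_bool = df['Age'] > 30 #boolean Series, True where Age is over 30
print(df[ser_of_bool]) #keep only the rows where the mask is True
print(df[df['Age']>30]) #same filter written in one line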
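To make the filtering step concrete, this sketch prints the boolean mask itself before using it. The data is invented for illustration; the last line, which combines two conditions with & and parentheses, goes slightly beyond the notes above and is included only as a common follow-up.

```python
import pandas as pd

# Made-up sample data standing in for salaries.csv.
df = pd.DataFrame({
    "Name": ["John", "Sally", "Alyssa"],
    "Salary": [50000, 120000, 80000],
    "Age": [34, 45, 27],
})

mask = df["Age"] > 30   # Series of True/False, one entry per row
print(mask)

older = df[mask]        # rows where the mask is True
print(older)

# Two conditions: wrap each in parentheses and join with & (and) or | (or).
older_high = df[(df["Age"] > 30) & (df["Salary"] > 100000)]
print(older_high)
```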
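print(df['Age'].unique()) #which distinct (unique) values appear
print(df['Age'].nunique()) #how many distinct (unique) values there are
print(df.columns) #column names
df.info() #DataFrame summary: number of columns, column names, dtypes (info() prints by itself)
print(df.describe()) #summary statistics: count, mean, std, min, 25%, 50%, 75%, max
print(df.index) #access the index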
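A short sketch of the inspection calls on invented data, mainly to show the difference between unique() and nunique() and which rows describe() returns. Note that info is a method: df.info() prints the summary directly and returns None, so wrapping it in print() is not needed.

```python
import pandas as pd

# Made-up ages with one repeated value.
df = pd.DataFrame({"Age": [34, 45, 27, 34]})

values = df["Age"].unique()    # array of the distinct values
count = df["Age"].nunique()    # how many distinct values
print(values)
print(count)

result = df.info()             # prints the summary itself, returns None

stats = df.describe()          # one row per statistic
print(stats.index.tolist())
```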
Create data with NumPy for pandas
import numpy as np
import pandas as pd
mat = np.arange(0,50).reshape(5,10)
df = pd.DataFrame(data=mat)
print(df)
import numpy as np
import pandas as pd
mat = np.arange(0,10).reshape(5,2)
df = pd.DataFrame(data=mat,columns=['A','B']) #column names
print(df)
import numpy as np
import pandas as pd
mat = np.arange(0,10).reshape(5,2)
df = pd.DataFrame(data=mat,columns=['A','B'],index=['a','b','c','d','e']) #index needs one label per row (example labels); setting it is usually unnecessary
print(df)
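A small sketch of what a custom index is good for, and what happens when the number of labels does not match the number of rows. The labels r0 to r4 are made up for this example.

```python
import numpy as np
import pandas as pd

mat = np.arange(0, 10).reshape(5, 2)   # 5 rows, 2 columns

# Hypothetical row labels; there must be exactly one per row.
labels = ["r0", "r1", "r2", "r3", "r4"]
df = pd.DataFrame(data=mat, columns=["A", "B"], index=labels)
print(df)

# With labels in place, a row can be looked up by name via .loc.
row = df.loc["r2"]
print(row)

# Passing the wrong number of labels raises a ValueError.
try:
    pd.DataFrame(data=mat, columns=["A", "B"], index=["r0"])
    mismatch_failed = False
except ValueError as err:
    mismatch_failed = True
    print("ValueError:", err)
```

If no index is given, pandas numbers the rows 0, 1, 2, ... on its own, which is why specifying one is rarely needed.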