Pandas Crash Course

Steven Wang
Mar 31, 2019


PIERIAN DATA

Pandas is built on top of NumPy and lets Python read data sets from various formats, such as CSV files.

We will mainly use pandas to read in data sets, select rows or columns of data, and quickly grab statistics, such as the mean value of a column.
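
As a rough end-to-end sketch of that workflow, the snippet below reads a tiny CSV from an in-memory string, so it runs without any file on disk. The rows are made-up sample values, and `io.StringIO` only stands in for a real file path; the column names follow the ones used in the rest of this post.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = """Name,Salary,Age
John,50000,34
Sally,120000,45
Alyssa,80000,27
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

names_and_salaries = df[["Name", "Salary"]]  # select two columns
mean_salary = df["Salary"].mean()            # quick statistic on one column

print(names_and_salaries)
print(mean_salary)
```

With a real file, the `io.StringIO(csv_text)` argument would simply be replaced by the file's path.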

Read a CSV file

import pandas as pd
df = pd.read_csv('salaries.csv')  # DataFrame is the pandas data structure
print(df)

Select columns

print(df['Salary'])  # select a single column
print(df[['Name', 'Salary']])  # select multiple columns

Operations

print(df['Salary'].min())
print(df['Salary'].mean())

ser_of_bool = df['Age'] > 30  # Series of True/False values, one per row
print(df[ser_of_bool])

print(df[df['Age'] > 30])
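
To make the boolean filtering step easier to follow, here is a small self-contained sketch. The rows are made-up sample values, not the contents of the salaries file. It also shows one way to combine two conditions, which is an addition to the steps above.

```python
import pandas as pd

# Hypothetical rows; the column names follow the examples in this post.
df = pd.DataFrame({
    "Name": ["John", "Sally", "Alyssa"],
    "Salary": [50000, 120000, 80000],
    "Age": [34, 45, 27],
})

mask = df["Age"] > 30   # Series of True/False, one value per row
older = df[mask]        # keeps only the rows where mask is True

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
older_high_paid = df[(df["Age"] > 30) & (df["Salary"] > 100000)]

print(mask.tolist())
print(older)
print(older_high_paid)
```

The parentheses matter because `&` binds more tightly than `>` in Python.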

print(df['Age'].unique())  # which distinct (unique) values there are
print(df['Age'].nunique())  # how many distinct (unique) values there are

print(df.columns)  # column names

df.info()  # DataFrame info: number of columns, column names, dtypes (info() prints its report itself)

print(df.describe())  # summary statistics: count, mean, std, min, 25%, 50%, 75%, max
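
The result of `describe()` is itself a DataFrame, so individual statistics can be pulled out of it. A minimal sketch, using made-up numbers rather than the salaries file:

```python
import pandas as pd

# Hypothetical numeric data for illustration only.
df = pd.DataFrame({"Salary": [50000, 120000, 80000], "Age": [34, 45, 27]})

summary = df.describe()

# Row labels of the summary table for numeric columns.
print(summary.index.tolist())

# Look up one statistic: the 50% row is the median.
median_age = summary.loc["50%", "Age"]
print(median_age)
```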

print(df.index)  # access the index

Create data with NumPy for pandas

import numpy as np
import pandas as pd
mat = np.arange(0,50).reshape(5,10)
df = pd.DataFrame(data=mat)
print(df)

import numpy as np
import pandas as pd
mat = np.arange(0,10).reshape(5,2)
df = pd.DataFrame(data=mat, columns=['A', 'B'])  # column names
print(df)

import numpy as np
import pandas as pd
mat = np.arange(0,10).reshape(5,2)
df = pd.DataFrame(data=mat, columns=['A', 'B'], index=['a', 'b', 'c', 'd', 'e'])  # placeholder labels, one per row; specifying an index is usually not needed
print(df)
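
When an index is specified, it needs one label per row, and those labels can then be used to look rows up. The sketch below assumes hypothetical labels `'a'` to `'e'`; the `.loc` lookup is an addition that the examples above do not cover.

```python
import numpy as np
import pandas as pd

mat = np.arange(0, 10).reshape(5, 2)

# Hypothetical row labels; there must be exactly one per row (5 here).
labels = ["a", "b", "c", "d", "e"]
df = pd.DataFrame(data=mat, columns=["A", "B"], index=labels)

# Select a single row by its label.
row_c = df.loc["c"]

print(df.index.tolist())
print(row_c)
```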
