This article shows how to combine data in pandas using join(), concat(), combine_first(), and stack()/unstack().
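1 Preface
Combining data in pandas comes up fairly often: for example, when two datasets contain overlapping records and you need their intersection or union. 知识追寻者 (the author), on the principle that no skill is ever wasted, spent some time learning how it works.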
2 Combining data
Before moving on to data transformation, it helps to cover the basics of combining data.
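2.1 join()
merge() is not covered here. It works much like table joins in SQL and cannot be explained properly in a few lines, so it may get a dedicated article later.
join() combines two DataFrames side by side, aligning rows on the index. It requires that the two DataFrames have no column names in common; if they do, pandas raises an error unless lsuffix and/or rsuffix is passed.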
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data1 = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index1 = ['user1', 'user2', 'user3']
frame1 = pd.DataFrame(data1, index1)

data2 = {
    'person': ['zszxz', 'craler', 'rose'],
    'number': [100, 2000, 3000],
    'activity': ['swing', 'riding', 'climbing']
}
index2 = ['user1', 'user2', 'user3']
frame2 = pd.DataFrame(data2, index2)

# Rows are matched on the shared index labels
join = frame1.join(frame2)
print(join)
Output:
         user  price    hobby  person  number  activity
user1   zszxz    100  reading   zszxz     100     swing
user2  craler    200  running  craler    2000    riding
user3    rose    300   hiking    rose    3000  climbing
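The no-overlap requirement can be seen with a minimal sketch. The frames `left` and `right` and the suffix strings are made up for this illustration and do not come from the example above; the sketch only assumes the documented `lsuffix`/`rsuffix` parameters of `DataFrame.join`.

```python
import pandas as pd

# Hypothetical frames that share the column name 'user'
left = pd.DataFrame({'user': ['zszxz', 'craler'], 'price': [100, 200]},
                    index=['user1', 'user2'])
right = pd.DataFrame({'user': ['zszxz', 'craler'], 'number': [100, 2000]},
                     index=['user1', 'user2'])

# A plain join fails because 'user' exists on both sides
try:
    left.join(right)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc
print(overlap_error is not None)

# Suffixes rename the overlapping columns so the join can proceed
joined = left.join(right, lsuffix='_left', rsuffix='_right')
print(list(joined.columns))
```

Only the clashing column gets a suffix; columns that are unique to one side keep their names.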
2.2 concat()
The concat() function joins two Series into one. By default it concatenates along the rows (axis=0), appending one Series after the other.
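import numpy as np
import pandas as pd

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along the rows (the default); np.nan replaces np.NaN, which NumPy 2.0 removed
print(pd.concat([ser1, ser2]))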
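One detail of row-wise concatenation is easy to miss: each Series keeps its own index labels, so the result can contain duplicate labels. A small sketch of this, using the documented `ignore_index` parameter of `pd.concat` (the variable names `stacked` and `renumbered` are chosen here for illustration):

```python
import numpy as np
import pandas as pd

ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])

# Default behaviour: the original labels 0, 1, 2 appear twice
stacked = pd.concat([ser1, ser2])
print(list(stacked.index))

# ignore_index=True discards the old labels and numbers the rows 0..n-1
renumbered = pd.concat([ser1, ser2], ignore_index=True)
print(list(renumbered.index))
```

Whether to keep or reset the labels depends on whether they carry meaning; for plain positional data, resetting them avoids ambiguous lookups later.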
To concatenate along the columns instead, pass axis=1:
ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along the columns
print(pd.concat([ser1, ser2], axis=1))
Output:
     0    1
0  111  333
1  222  444
2  NaN  NaN
Going one step further, the keys parameter supplies the column names, so the result reads like an ordinary labeled DataFrame:
ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along the columns and name each column via keys
data = pd.concat([ser1, ser2], axis=1, keys=['zszxz', 'rzxx'])
print(data)
Output:
  zszxz rzxx
0   111  333
1   222  444
2   NaN  NaN
Note: concat() works on DataFrames in much the same way as it does on Series.
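As a rough illustration of that note, the sketch below applies concat() to DataFrames in both directions. The frames `frame_a`, `frame_b`, and `frame_c` are invented for this example and reuse only the column names seen earlier.

```python
import pandas as pd

# Hypothetical frames for illustration
frame_a = pd.DataFrame({'user': ['zszxz', 'craler'], 'price': [100, 200]})
frame_b = pd.DataFrame({'user': ['rose'], 'price': [300]})
frame_c = pd.DataFrame({'hobby': ['reading', 'running']})

# Row-wise: frame_b is appended below frame_a, columns are matched by name
rows = pd.concat([frame_a, frame_b], ignore_index=True)
print(rows.shape)

# Column-wise: frame_c is placed beside frame_a, rows are matched by index
cols = pd.concat([frame_a, frame_c], axis=1)
print(list(cols.columns))
```

Row-wise concatenation aligns on column names and column-wise concatenation aligns on index labels, so mismatched labels on either axis show up as NaN rather than raising an error.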
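2.3 combine_first()
When two objects have overlapping index labels, combine_first() can merge them: values from the calling object take precedence, and the other object only fills in labels or missing values that the caller lacks.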
ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser1 is the base; ser2 contributes only what ser1 is missing
data = ser1.combine_first(ser2)
print(data)
Output:
1    111
2    222
3    NaN
4    555
dtype: object
Swapping the two Series makes ser2 the base, so its values take precedence:
ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser2 is now the base; ser1 contributes only what ser2 is missing
data = ser2.combine_first(ser1)
print(data)
Output:
1    333
2    444
3    NaN
4    555
dtype: object
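In both runs above, label 3 stays NaN because it is missing on both sides, so the gap-filling behaviour is not actually visible. The following sketch uses made-up numeric Series (`primary` and `backup` are names chosen for this illustration) where the second Series does fill a gap in the first:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'b' is missing in primary, 'd' exists only in backup
primary = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])
backup = pd.Series([10.0, 20.0, 40.0], index=['a', 'b', 'd'])

# 'a' keeps the primary value, 'b' is filled from backup,
# 'c' exists only in primary, 'd' is taken from backup
filled = primary.combine_first(backup)
print(filled.to_dict())
```

The result covers the union of both indexes, which is why 'd' appears even though the calling Series never had that label.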
2.4 Pivoting between axes
Sample data:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
print(frame)
Output:
         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking
stack() pivots the columns into rows, producing a Series with a two-level index (row label, column name):
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
# Move the column labels into an inner index level
print(frame.stack())
Output:
user1  user       zszxz
       price        100
       hobby    reading
user2  user      craler
       price        200
       hobby    running
user3  user        rose
       price        300
       hobby     hiking
dtype: object
unstack() reverses the operation and restores the original structure:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
sta = frame.stack()
# Move the inner index level back out to the columns
print(sta.unstack())
Output:
         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking
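To make the two-level index more concrete, here is a small sketch on a reduced, made-up frame. It looks up a single stacked value by its (row label, column name) pair and then unstacks the outer level instead of the inner one, using the documented `level` parameter of `Series.unstack`. The name `flipped` is chosen only for this illustration.

```python
import pandas as pd

# Reduced hypothetical frame with two rows and two columns
frame = pd.DataFrame({'user': ['zszxz', 'craler'], 'price': [100, 200]},
                     index=['user1', 'user2'])

stacked = frame.stack()
# The stacked Series is indexed by (row label, column name)
print(stacked.index.nlevels)
print(stacked.loc[('user1', 'price')])

# unstack() moves the inner level to the columns by default;
# level=0 moves the outer level (the row labels) instead
flipped = stacked.unstack(level=0)
print(flipped)
```

With level=0 the original row labels become the columns and the original column names become the index, so the same values are laid out the other way around.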