
How to find and remove duplicate data with Python pandas

Published: 2022-07-12 09:57:15 | Source: Yisu Cloud (億速云) | Author: iii | Category: Development

This article shows how to use Python pandas to find and remove duplicate data, using short examples that are easy to follow.

Introduction

When processing data with pandas, we often run into duplicated records. Finding those duplicates so that we can analyse why they occur, or simply deleting them, is a key step. pandas provides two convenient methods for this: duplicated() and drop_duplicates().

1. duplicated()

duplicated() is available in three forms: pandas.DataFrame.duplicated, pandas.Series.duplicated and pandas.Index.duplicated. They are used in much the same way. The first two return a boolean Series, while the last one returns a boolean numpy.ndarray.
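To make the difference in return types concrete, here is a small sketch on a made-up three-row DataFrame (the data is illustrative, not from the original). It calls each of the three variants and prints the type and dtype of the result.

```python
import numpy as np
import pandas as pd

# Made-up toy data for illustration
df = pd.DataFrame({"brand": ["Yum Yum", "Yum Yum", "Indomie"],
                   "style": ["cup", "cup", "cup"]})

frame_mask = df.duplicated()                            # DataFrame -> boolean Series
series_mask = df["brand"].duplicated()                  # Series    -> boolean Series
index_mask = df.set_index("brand").index.duplicated()   # Index     -> boolean ndarray

print(type(frame_mask).__name__, frame_mask.dtype)    # Series bool
print(type(series_mask).__name__, series_mask.dtype)  # Series bool
print(isinstance(index_mask, np.ndarray), index_mask.dtype)  # True bool
```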

DataFrame.duplicated(subset=None, keep='first')

subset: defaults to None. A column label, or a sequence of column labels, to consider when identifying duplicates; None means all columns are used.

keep: defaults to 'first'. Determines which duplicates are marked:

first: mark every duplicate as True except its first occurrence
last: mark every duplicate as True except its last occurrence
False: mark all duplicates as True, including the first occurrence
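One way to see how the three keep options relate is to apply them to a small made-up Series. Under keep=False a value is marked exactly when either the 'first' or the 'last' option marks it, which this sketch checks.

```python
import pandas as pd

# Made-up data: "a" occurs three times, "b" and "c" once each
s = pd.Series(["a", "b", "a", "c", "a"])

first = s.duplicated(keep="first")  # every "a" except the first one
last = s.duplicated(keep="last")    # every "a" except the last one
every = s.duplicated(keep=False)    # every "a", first and last included

print(first.tolist())  # [False, False, True, False, True]
print(last.tolist())   # [True, False, True, False, False]
print(every.tolist())  # [True, False, True, False, True]

# keep=False marks a row exactly when 'first' or 'last' marks it
print(every.equals(first | last))  # True
```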

Series.duplicated(keep='first')

keep: same as keep in DataFrame.duplicated

Index.duplicated(keep='first')

keep: same as keep in DataFrame.duplicated
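On a single column, Series.duplicated() is handy for listing which values repeat. The sketch below assumes a hypothetical column of order IDs; keep=False collects every row involved in a repeat, and summing the default mask counts the surplus copies that keep='first' would flag.

```python
import pandas as pd

# Hypothetical column of order IDs containing repeats
order_ids = pd.Series([101, 102, 101, 103, 102, 101], name="order_id")

# keep=False marks every occurrence, so this selects all rows involved in a repeat
repeated_rows = order_ids[order_ids.duplicated(keep=False)]

# The distinct values that occur more than once, in order of first appearance
repeated_values = repeated_rows.unique()

# Number of surplus copies flagged under the default keep='first'
surplus = int(order_ids.duplicated().sum())

print(repeated_rows.tolist())    # [101, 102, 101, 102, 101]
print(repeated_values.tolist())  # [101, 102]
print(surplus)                   # 3
```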

Example:

import pandas as pd
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})
df

brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0

df.duplicated()

0 False
1 True
2 False
3 False
4 False
dtype: bool

df.duplicated(keep='last')

0 True
1 False
2 False
3 False
4 False
dtype: bool

df.duplicated(keep=False)

0 True
1 True
2 False
3 False
4 False
dtype: bool

df.duplicated(subset=['brand'])

0 False
1 True
2 False
3 True
4 True
dtype: bool
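subset also accepts several columns, and combining it with keep=False is a convenient way to pull out every row involved in a duplicate so that its cause can be inspected. In this sketch, using the same data as above, rows 3 and 4 share brand and style but differ in rating, which is exactly the kind of conflict worth looking at before deleting anything.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# Duplicates judged on the brand/style pair only; rating is ignored
mask = df.duplicated(subset=['brand', 'style'], keep=False)
print(mask.tolist())  # [True, True, False, True, True]

# Every row that belongs to a duplicate group, grouped together for inspection
dupes = df[mask].sort_values(['brand', 'style'])
print(dupes)
```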

Marking duplicates in an Index:

df = df.set_index('brand')
df

style rating
brand
Yum Yum cup 4.0
Yum Yum cup 4.0
Indomie cup 3.5
Indomie pack 15.0
Indomie pack 5.0

df.index.duplicated()

array([False,  True, False,  True,  True])
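Because DataFrame.drop_duplicates() compares column values and ignores the index, the boolean array from index.duplicated() is a common way to keep just one row per index label. A minimal sketch, reusing the brand-indexed frame from above and keeping the first row for each label:

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
}).set_index('brand')

# Invert the mask so the first row for each index label is kept
first_per_label = df[~df.index.duplicated(keep='first')]
print(first_per_label)
```

The result contains one 'Yum Yum' row (cup, 4.0) and one 'Indomie' row (cup, 3.5); passing keep='last' instead would keep the last row for each label.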

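2. drop_duplicates()

drop_duplicates() works like duplicated(), but instead of marking duplicates it removes them. Only the parameters whose meaning differs are described below.

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

subset: same as in duplicated()
keep: same as in duplicated()
inplace: as in other pandas methods, chooses whether to modify the existing data in place (True) or return a new object (False, the default)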
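The link between the two methods can be checked directly: filtering a DataFrame with the inverted duplicated() mask should select the same rows as drop_duplicates() with the same subset and keep. This sketch runs that comparison for all three keep values, using subset=['brand'] on the article's data.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

results = {}
for keep in ('first', 'last', False):
    dropped = df.drop_duplicates(subset=['brand'], keep=keep)
    # Keeping only the rows NOT marked by duplicated() gives the same frame
    filtered = df[~df.duplicated(subset=['brand'], keep=keep)]
    results[keep] = (dropped.index.tolist(), dropped.equals(filtered))
    print(keep, *results[keep])

# first [0, 2] True
# last [1, 4] True
# False [] True
```

With keep=False nothing survives, because every brand in this data occurs more than once.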

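Compared with Series.duplicated(), Series.drop_duplicates() likewise adds the inplace parameter described above, and Index.drop_duplicates() takes the same parameter as Index.duplicated(), so neither is covered further. Here are some examples: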

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})
df

brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0

df.drop_duplicates()

brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
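Combined with sorting, keep decides which row of each duplicate group survives. One common pattern (an illustration, not from the original) keeps the highest-rated row per brand: sort by rating in descending order, drop duplicates on brand with keep='first', then restore the original row order.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

best = (df.sort_values('rating', ascending=False)     # highest rating first
          .drop_duplicates(subset=['brand'], keep='first')  # one row per brand
          .sort_index())                                # back to original order
print(best)
```

The two 'Yum Yum' rows tie at 4.0, so which of them is kept depends on the sort; if that matters, add a tie-breaking column to sort_values.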

df.drop_duplicates(inplace=True)

df

brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
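Two practical details are sketched below, under the assumption of pandas 1.0 or later (where ignore_index was added to DataFrame.drop_duplicates). First, with inplace=True the method returns None, so assigning its result loses the data. Second, ignore_index=True relabels the surviving rows 0..n-1. The last lines show the Series and Index variants mentioned above.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5]
})

# inplace=True mutates the object and returns None (done on a copy here)
result = df.copy().drop_duplicates(inplace=True)
print(result)  # None

# Non-inplace form; ignore_index=True gives the result a fresh 0..n-1 index
deduped = df.drop_duplicates(ignore_index=True)
print(deduped.index.tolist())  # [0, 1, 2, 3]

# Series and Index also provide drop_duplicates with the keep parameter
unique_brands = df['brand'].drop_duplicates().tolist()
last_styles = pd.Index(df['style']).drop_duplicates(keep='last').tolist()
print(unique_brands)  # ['Yum Yum', 'Indomie']
print(last_styles)    # ['cup', 'pack']
```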

That concludes this article on how to find and remove duplicate data with Python pandas. Thanks for reading!
