您好,登錄后才能下訂單哦!
不管是誰,只要處理過由用戶提交的調(diào)查數(shù)據(jù),就能明白這種亂七八糟的數(shù)據(jù)是怎么一回事。為了得到一組能用于分析工作的格式統(tǒng)一的字符串,需要做很多事情:去除空白符、刪除各種標(biāo)點符號、正確的大寫格式等。做法之一是使用內(nèi)建的字符串方法和正則表達(dá)式re模塊:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
'south carolina##', 'West virginia?']
import re
def clean_strings(strings): # 一般對數(shù)據(jù)的處理步驟
result = []
for value in strings:
value = value.strip()
value = re.sub('[!#?]', '', value)
value = value.title()
result.append(value)
return result
In [173]: clean_strings(states)
Out[173]:
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
def remove_punctuation(value):
return re.sub('[!#?]', '', value)
clean_ops = [str.strip, remove_punctuation, str.title] # 函數(shù)也是對象
def clean_strings(strings, ops):
result = []
for value in strings:
for function in ops:
value = function(value)
result.append(value)
return result
In [175]: clean_strings(states, clean_ops)
Out[175]:
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
# 或者
In [176]: for x in map(remove_punctuation, states): #
.....: print(x)
Alabama
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia
免責(zé)聲明:本站發(fā)布的內(nèi)容(圖片、視頻和文字)以原創(chuàng)、轉(zhuǎn)載和分享為主,文章觀點不代表本網(wǎng)站立場,如果涉及侵權(quán)請聯(lián)系站長郵箱:is@yisu.com進(jìn)行舉報,并提供相關(guān)證據(jù),一經(jīng)查實,將立刻刪除涉嫌侵權(quán)內(nèi)容。