import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Build the feature matrix X and the target vector Y from the train1
# DataFrame (missing values filled with -1; the first row is skipped)
X = np.array(train1[feature_use].fillna(-1))[1:]
Y = np.array(train1['target'])[1:]
names = feature_use

# use linear regression as the base model
lr = LinearRegression()
# rank all features, i.e. continue the elimination until only one is left
rfe = RFE(lr, n_features_to_select=1)
rfe.fit(X, Y)

print("Features sorted by their rank:")
# rfe.ranking_ assigns 1 to the most important feature, so sorting in
# descending order puts the least important features first
sortedlist = sorted(zip(rfe.ranking_, names), reverse=True)
print(sortedlist)

# keep the 70 best-ranked features, i.e. the tail of the descending list
feature_use = [name for rank, name in sortedlist[-70:]]
print(feature_use)
In the code above, X is the feature matrix of the dataset and Y is the label vector. sortedlist holds the features ordered by their RFE ranking.
A recent takeaway from my machine learning work: features matter far more than model parameters. Features are the reflection of the real world inside the algorithm.
Feature engineering demands a deep understanding of the business domain. Keep things simple: delete ineffective features and cut down the features that only introduce noise.
Add features one at a time, and keep thinking about how the features relate to each other, in particular whether any of them are strongly linearly correlated (see the sketch below).
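One quick way to check for strong linear correlation is to inspect the pairwise Pearson correlation matrix of the candidate features. A minimal sketch, assuming train1 and feature_use are defined as above; the 0.9 cutoff is an arbitrary threshold for illustration, not part of the original code:

import pandas as pd

# Pairwise Pearson correlations between the candidate features;
# fillna(-1) matches the preprocessing used when building X above
corr = train1[feature_use].fillna(-1).corr()

# report pairs whose absolute correlation exceeds an illustrative threshold
threshold = 0.9
for i, f1 in enumerate(feature_use):
    for f2 in feature_use[i + 1:]:
        if abs(corr.loc[f1, f2]) > threshold:
            print(f"{f1} and {f2} are strongly correlated: {corr.loc[f1, f2]:.3f}")

When two features are near-duplicates like this, dropping one of them usually loses little information and makes a linear model such as the LinearRegression used above more stable.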
# Alternative: random forest feature selection (kept commented out)
'''
from sklearn.ensemble import RandomForestRegressor
import numpy as np

# Build X and Y from train1, same as in the RFE example above
X = np.array(train1[feature_use].fillna(-1))[1:]
Y = np.array(train1['target'])[1:]
names = feature_use

rf = RandomForestRegressor()
rf.fit(X, Y)
print("Features sorted by their importance score:")
print(sorted(zip(map(lambda x: round(x, 4), rf.feature_importances_), names),
             reverse=True))
'''
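If you enable this variant, the importance scores can drive the selection the same way the RFE rankings did. A minimal sketch, assuming rf and names are defined as in the commented block; the name rf_selected and the cut at 70 features mirror the RFE step and are illustrative assumptions:

# Minimal sketch: keep the 70 features with the highest importance scores.
# rf_selected is a hypothetical name; the cut at 70 mirrors the RFE step.
scored = sorted(zip(rf.feature_importances_, names), reverse=True)
rf_selected = [name for score, name in scored[:70]]
print(rf_selected)

Unlike RFE rankings, feature_importances_ are real-valued scores that sum to 1, so sorting in descending order puts the most important features first.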