How to Implement H2O's Random Forest Algorithm in Python

Published: 2021-04-06 10:49:26 | Source: 億速云 | Author: 小新 | Column: Development Technology

This article shows how to implement H2O's random forest algorithm in Python. It is shared here as a practical reference.

An introduction to the random forest algorithm in H2O, with a hands-on project (Python implementation)

Importing the package: from h2o.estimators.random_forest import H2ORandomForestEstimator

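Common methods and parameters of H2ORandomForestEstimator:

(I) Building a model:

model = H2ORandomForestEstimator(ntrees=n, max_depth=m)

model.train(x=random_pv.names, y='Catrgory', training_frame=trainData)

This builds a random forest model from trainData. In model.train, training_frame (here trainData) is the training set, x is the list of predictor column names, and y is the name of the response column to predict.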
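The two calls above can be wrapped in a small helper. This is only a sketch: the function names `predictor_names` and `train_random_forest` are hypothetical, and it assumes an H2O cluster has already been started with `h2o.init()` and that the training data is an H2OFrame. The helper passes every column except the response as a predictor, so the response never appears in `x`.

```python
def predictor_names(columns, response):
    """Return every column name except the response column."""
    return [name for name in columns if name != response]


def train_random_forest(training_frame, response, ntrees, max_depth):
    """Train an H2O random forest on an H2OFrame and return the fitted model."""
    # Imported lazily so that predictor_names() works without h2o installed.
    from h2o.estimators.random_forest import H2ORandomForestEstimator

    model = H2ORandomForestEstimator(ntrees=ntrees, max_depth=max_depth)
    model.train(
        x=predictor_names(training_frame.names, response),
        y=response,
        training_frame=training_frame,
    )
    return model
```

With the frame used in this article, a call might look like `train_random_forest(trainData, 'Catrgory', ntrees=6, max_depth=16)`.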

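(II) Making predictions:

pre_tag = H2ORandomForestEstimator.predict(model, test_data) uses the trained model to predict on the test set, where model is the trained model and test_data is the test set. The more usual instance-method form, model.predict(test_data), is equivalent.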

(III) Algorithm parameters:

(1) ntrees: the number of trees to build for the model.

(2) max_depth: the maximum depth of each tree.
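Since ntrees and max_depth are the two values that get tuned, a small grid sweep can replace adjusting them by hand. The sketch below is generic and does not call H2O itself: `best_parameters` and the `score` callable are hypothetical names, and `score` is assumed to train a model with the given parameters and return a number where larger is better (for example, accuracy on a test set).

```python
from itertools import product


def best_parameters(score, ntrees_values, max_depth_values):
    """Try every (ntrees, max_depth) pair; return (best_score, ntrees, max_depth).

    `score` is a hypothetical callable taking (ntrees, max_depth) and
    returning a number where larger is better.
    """
    best = None
    for ntrees, max_depth in product(ntrees_values, max_depth_values):
        result = (score(ntrees, max_depth), ntrees, max_depth)
        # Strictly greater on the score, so the first pair tried wins ties.
        if best is None or result[0] > best[0]:
            best = result
    return best
```

In practice `score` would wrap the train, predict, and accuracy steps. Each call trains a full model, so the grid is best kept small.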

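Project requirements:

Task 1: Using the data in train.csv, build a classification model with the random forest algorithm in the H2O framework, then use the model to predict on the data in test.csv and compute the classification accuracy to evaluate the model. Adjust the parameters and observe how the accuracy changes. Note: accuracy = number of correct predictions / number of samples.

Task 2: Using the random forest algorithm in H2O Flow, build a model with the same training data and parameters as in Task 1. Inspect the feature importances in the model, select the top 8 features, retrain the model on them, predict again, and compute the classification accuracy.

Deliverables: the code for both tasks, the best accuracy value obtained, and the dataset formed by merging the test data with the predictions, saved as predict.csv.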
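Two pieces of the tasks above are plain arithmetic and ranking, and can be sketched without H2O: the accuracy defined in Task 1 and the top-8 feature selection in Task 2. The function names are hypothetical, and the layout of `importances` (rows whose first field is the feature name and whose second is its importance) is an assumption. Rows such as those returned by H2O's `model.varimp()` should fit that layout, though this is not verified here.

```python
def accuracy(actual, predicted):
    """Fraction of positions where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty sample")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def top_features(importances, k=8):
    """Return the names of the k most important features.

    Each row of `importances` is assumed to start with (name, importance);
    any further fields in a row are ignored.
    """
    ranked = sorted(importances, key=lambda row: row[1], reverse=True)
    return [row[0] for row in ranked[:k]]
```

The list returned by `top_features` can then be passed as `x` when retraining the model for Task 2.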

The Python implementation is as follows:

(1) Task 1:

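# Tune the parameters by hand to find the best accuracy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()  # start or connect to a local H2O cluster

# Load the training data; slicing an H2OFrame with [2:] keeps the columns from index 2 onward
df = h2o.import_file('train.csv')
trainData = df[2:]

# Use every column except the response column 'Catrgory' as a predictor
predictors = [name for name in trainData.names if name != 'Catrgory']
model = H2ORandomForestEstimator(ntrees=6, max_depth=16)
model.train(x=predictors, y='Catrgory', training_frame=trainData)

# Load the test data and predict with the trained model
df2 = h2o.import_file('test.csv')
test_data = df2[2:]
pre_tag = model.predict(test_data)

# Append the prediction columns to the test data, then keep the rows predicted correctly
predict = df2.concat(pre_tag)
dfnew = predict[predict['Catrgory'] == predict['predict']]
Precision = dfnew.nrow / predict.nrow  # accuracy: correct predictions / number of samples

print(Precision)
h2o.download_csv(predict, 'predict.csv')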

The best result was an accuracy of 87.0833%, obtained with ntrees=6 and max_depth=16, as follows: