How to Work Through a K-Means Case Study

Published: 2021-12-10 11:25:24 | Source: 億速云 | Views: 211 | Author: 柒染 | Category: Big Data

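Today we will walk through a case study of the K-Means algorithm. Many people are not very familiar with it, so the key points are summarized below in the hope that you will take something away from this article.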

Background

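K-means is an unsupervised algorithm for clustering problems. It follows a simple procedure: partition a given data set into a fixed number of clusters (say, k). Data points within a cluster are homogeneous, and heterogeneous with respect to the other clusters.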

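Remember picking out shapes in ink blots? K-means is somewhat similar: you look at the shape and spread of the data to work out how many distinct clusters or populations are present.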


How K-means forms clusters:

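1. K-means picks k points, one per cluster, known as centroids.
2. Each data point joins the cluster of its closest centroid, giving k clusters.
3. The centroid of each cluster is recomputed from its current members, giving new centroids.
4. With the new centroids, repeat steps 2 and 3: find the closest new centroid for each data point and reassign the point to that cluster. Continue until convergence, i.e. until the centroids no longer change.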
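
The four steps above can be sketched directly in NumPy. This is a minimal illustration, not the scikit-learn implementation: the function name `kmeans`, the choice of random data points as initial centroids, the handling of empty clusters, and the toy data are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means on an (n_samples, n_features) array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids (an assumption;
    # other initialisation schemes exist).
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members.
        # An empty cluster simply keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated toy groups (made-up data for illustration).
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(X, k=2)
print(labels)
print(centroids)
```

The first three points end up in one cluster and the last three in the other; which of the two gets label 0 depends on the random initialisation.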

How to determine the value of K:

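In K-means, every cluster has its own centroid. The sum of squared differences between a cluster's centroid and its data points is that cluster's within-cluster sum of squares. Adding these values over all clusters gives the total within-cluster sum of squares for the clustering solution.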
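
As a small worked example of this quantity, the sketch below computes the per-cluster and total within-cluster sum of squares by hand. The helper name `within_cluster_ss` and the one-dimensional data are made up for illustration; in scikit-learn, a fitted `KMeans` model exposes the total as its `inertia_` attribute.

```python
import numpy as np

def within_cluster_ss(X, labels, centroids):
    """Return the per-cluster sums of squared distances to the centroid."""
    return np.array([
        np.sum((X[labels == j] - centroids[j]) ** 2)
        for j in range(len(centroids))
    ])

# Made-up 1-D example: two clusters with centroids 1.0 and 10.0.
X = np.array([[0.0], [2.0], [9.0], [11.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0], [10.0]])

per_cluster = within_cluster_ss(X, labels, centroids)
total = per_cluster.sum()
print(per_cluster)  # each cluster contributes (-1)^2 + 1^2 = 2
print(total)        # 2 + 2 = 4
```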

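As the number of clusters increases, this total keeps decreasing. If you plot it against k, however, you will typically see it fall sharply up to some value of k and only gradually after that. That point (the "elbow") is a reasonable choice for the number of clusters.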
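
One way to look for that point is to fit a model for several values of k and compare the totals. The sketch below assumes synthetic data from `make_blobs` with three true groups, since the article's CSV files are not available; the range of k, `n_init=10`, and the `random_state` values are likewise choices made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data with 3 true groups (an assumption for illustration).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    # inertia_ is the total within-cluster sum of squares of the fitted model.
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.1f}")
```

On data like this, the printed values should drop steeply up to k=3 and flatten afterwards; plotting them against k makes the bend easier to see.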


Here is an example implemented in Python:

'''
The following code is for the K-Means
Created by - ANALYTICS VIDHYA
'''

# importing required libraries
import pandas as pd
from sklearn.cluster import KMeans

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)

# Now, we need to divide the training data into different clusters
# and predict in which cluster a particular data point belongs.

'''
Create the object of the K-Means model
You can also add other parameters and test your code here
Some parameters are : n_clusters and max_iter
Documentation of sklearn KMeans:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
'''
model = KMeans()

# fit the model with the training data
model.fit(train_data)

# Number of Clusters
print('\nDefault number of Clusters : ',model.n_clusters)

# predict the clusters on the train dataset
predict_train = model.predict(train_data)
print('\nCLusters on train data',predict_train)

# predict the clusters on the test dataset
predict_test = model.predict(test_data)
print('Clusters on test data',predict_test)

# Now, we will train a model with n_clusters = 3
model_n3 = KMeans(n_clusters=3)

# fit the model with the training data
model_n3.fit(train_data)

# Number of Clusters
print('\nNumber of Clusters : ',model_n3.n_clusters)

# predict the clusters on the train dataset
predict_train_3 = model_n3.predict(train_data)
print('\nCLusters on train data',predict_train_3)

# predict the clusters on the test dataset
predict_test_3 = model_n3.predict(test_data)
print('Clusters on test data',predict_test_3)

Output:

Shape of training data : (100, 5)
Shape of testing data : (100, 5)

Default number of Clusters :  8

CLusters on train data [6 7 0 7 6 5 5 7 7 3 1 1 3 0 7 1 0 4 5 6 4 3 3 0 4 0 1 1 0 3 4 3 3 0 0 1 2 1 4 3 0 2 1 1 0 3 3 0 7 1 3 0 5 1 0 1 5 4 6 4 3 6 5 0 3 0 4 3
 3 1 5 1 6 5 7 7 6 3 5 3 5 3 1 5 2 5 0 3 2 3 4 7 1 0 1 5 3 6 1 6]
Clusters on test data [3 6 2 0 5 6 0 3 5 2 3 4 5 5 5 3 3 5 5 7
 0 0 5 5 3 5 0 6 5 0 1 6 3 5 6 0 1 7 3 0 0 6 2 0 5 3 5 7 3 3 4 6 3 1 6 3 1 3 3 2 3 3 5 1 7 5 1 5
 3 3 5 2 0 1 5 0 3 0 3 6 3 5 4 0 2 6 3 5 6 0 6 4 3 5 0 6 6 6 1 0]

Number of Clusters :  3

CLusters on train data [2 0 1 0 2 1 2 0 0 2 0 0 2 1 0 0 1 2 2 2 2 2 2 1 2 1 0 0 1 2 2 2 2 1 1 0 2 0 2 2 1 2 0 0 1 2 2 1 0 0 2 1 2 0 1 0 2 2 2 2 2 2 2 1 2 1 2 2
 2 0 1 0 2 2 0 0 0 2 0 2 2 2 0 2 2 2 1 2 2 2 2 0 0 1 0 2 2 2 0 2]
Clusters on test data [2 2 2 1 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 0
 1 1 2 2 2 2 1 2 2 1 0 2 2 2 2 1 0 0 2 1 1 2 2 1 2 2 2 0 2 2 2 2 2 0 2 2 0 2 2 2 2 2 2 0 0 2 0 2
 2 2 0 2 1 0 2 1 2 1 2 0 2 2 2 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 0 1]
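
The listing reads `train-data.csv` and `test-data.csv`, which are not included here, and it does not set `random_state`, so the cluster numbers assigned to individual points can differ from run to run. The sketch below is one way to try the same calls without those files: it assumes synthetic data from `make_blobs` with the same shape, made-up column names, and fixed seeds, and then counts how many test points fall into each cluster.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for train-data.csv / test-data.csv: 200 rows, 5 columns.
features, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)
columns = [f"feature_{i}" for i in range(5)]  # made-up column names
train_data = pd.DataFrame(features[:100], columns=columns)
test_data = pd.DataFrame(features[100:], columns=columns)

# Same calls as in the listing, with random_state fixed for repeatable labels.
model_n3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit(train_data)
predict_test_3 = model_n3.predict(test_data)

# Count how many test points landed in each of the 3 clusters.
sizes = np.bincount(predict_test_3, minlength=3)
print(sizes)
```

The cluster numbers themselves carry no meaning; only which points share a number matters, so a size summary like this is often easier to read than the raw label array.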

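After reading the above, do you have a better understanding of how to work through a K-Means case study? For more on this and related topics, follow the 億速云 industry news channel. Thank you for your support.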
