Responsibilities of a Hadoop Administrator (Translation)

Published: 2020-04-07 10:13:58 | Source: the web | Views: 1305 | Author: 碳豪搶 | Category: Big Data

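I recently read an English document about Hadoop, which is actually an excerpt from a book. I found it very good: it lays out the responsibilities of a Hadoop administrator quite thoroughly. If you work with Hadoop day to day, take a look and check whether you already have the knowledge and skills it describes.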
Translation:
Responsibilities of a Hadoop administrator

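As interest in deriving insight from big data keeps growing, organizations are actively planning or building their big data teams. To start working on their data, they need a good, solid infrastructure. Once that infrastructure is in place, they need controls and policies for maintaining, managing, and troubleshooting the cluster.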

Demand for Hadoop administrators keeps rising, because their work (setting up and maintaining clusters) is what makes data analysis possible in the first place.

A Hadoop administrator needs strong system operations skills across networking, operating systems, and storage, along with solid knowledge of computer hardware and how it operates in a complex network environment.

Apache Hadoop mainly runs on Linux, so Linux skills such as monitoring, troubleshooting, configuration, and security management are a must.

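Setting up nodes for a cluster involves a lot of repetitive work, so a Hadoop administrator should bring these servers up quickly and efficiently with configuration management tools such as Puppet, Chef, and CFEngine. Beyond these tools, the administrator should also have good capacity planning skills for designing and planning clusters.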
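Configuration management tools such as Puppet, Chef, and CFEngine work by declaring a desired state and converging each node to it. Because of that design, re-running the same configuration is harmless. The Python sketch below does not use any of those tools' APIs. It only illustrates that idempotent "ensure" idea. The `ensure_file` helper and the `workers` host-list file are hypothetical examples.

```python
import os
import tempfile


def ensure_file(path: str, content: str) -> bool:
    """Make sure `path` contains exactly `content` (hypothetical helper).

    Returns True if the file was created or changed, and False if it was
    already in the desired state, so repeated runs change nothing.
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # already converged: nothing to do
    # Write to a temporary file in the same directory, then atomically
    # replace the target so readers never see a half-written file.
    # Note: mkstemp creates the file with mode 0600; chmod it if needed.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content)
    os.replace(tmp_path, path)
    return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    # Hypothetical example: the same host list pushed to every node.
    target = os.path.join(workdir, "workers")
    desired = "node01\nnode02\nnode03\n"
    print(ensure_file(target, desired))  # True: file created
    print(ensure_file(target, desired))  # False: already up to date
```

Real tools add resource types for packages, services, users and templates, plus ordering between resources. The compare-then-converge step shown here is the part that makes bringing up many identical nodes safe to repeat.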

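Several nodes in a cluster need their data duplicated. For example, the namenode daemon's fsimage file can be configured to be written to two different disks on the same node, or to a disk on a different node. A Hadoop administrator therefore needs to understand NFS mount points and how to set them up within a cluster. The administrator may also be asked to configure RAID for the disks on specific nodes.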
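In Hadoop 2 and later, the directories where the NameNode writes its fsimage are listed, comma-separated, in the `dfs.namenode.name.dir` property of `hdfs-site.xml`. Older releases used `dfs.name.dir` instead. Each listed directory receives its own copy, and that is how the "two local disks plus an NFS mount" layout described above is expressed. The sketch below only renders such a property with Python's standard library. The `file:///data/...` and `file:///mnt/nfs/...` paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def name_dir_property(dirs: list[str]) -> str:
    """Render an hdfs-site.xml <property> element for dfs.namenode.name.dir.

    Every directory in `dirs` receives its own copy of the fsimage,
    which gives redundancy across disks or hosts.
    """
    if not dirs:
        raise ValueError("at least one metadata directory is required")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    # The value is a single comma-separated list of directories.
    ET.SubElement(prop, "value").text = ",".join(dirs)
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical layout: two local disks plus an NFS mount from another host.
    print(name_dir_property([
        "file:///data/1/dfs/nn",
        "file:///data/2/dfs/nn",
        "file:///mnt/nfs/dfs/nn",
    ]))
```

The generated element goes inside the `<configuration>` root of `hdfs-site.xml`. The NFS path is only useful if the mount is reliable, because a hung NFS mount can stall the NameNode's metadata writes.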

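Because all Hadoop services and daemons are built on Java, a basic knowledge of the JVM (Java Virtual Machine) and the ability to understand Java exceptions are very useful. This knowledge helps administrators pinpoint problems quickly.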
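Daemon logs often contain Java stack traces, so a quick triage step is to count which exception classes appear most often. The sketch below relies on a heuristic that is an assumption, not a Hadoop API. It looks for a fully qualified class name ending in `Exception` or `Error` followed by `:`, `(` or end of line, which covers lines such as `java.io.IOException: ...` and `Caused by: java.net.ConnectException: ...`.

```python
import re
import sys
from collections import Counter

# Heuristic: package-qualified name ending in Exception/Error, followed by
# ":" (message), "(" (e.g. RemoteException(...)) or the end of the line.
EXCEPTION_RE = re.compile(
    r"\b((?:[A-Za-z_$][\w$]*\.)+[\w$]*(?:Exception|Error))(?=[:(]|$)"
)


def count_exceptions(lines):
    """Count the first Java exception/error class name found on each line."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_exceptions.py LOGFILE [LOGFILE ...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for name, n in count_exceptions(log).most_common(10):
                print(f"{n:6d}  {name}  ({path})")
```

A sorted count like this points to the dominant failure, for example a burst of `java.net.ConnectException` from one DataNode. The full stack trace in the log then shows where it came from.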

A Hadoop administrator should be able to benchmark the cluster, testing its performance under high-traffic scenarios.
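A widely used benchmark is the TeraGen / TeraSort / TeraValidate sequence from Hadoop's MapReduce examples jar. TeraGen writes fixed 100-byte rows, so the row count determines the data volume. The sketch below only computes the row count and builds the three command lines without running them. The jar path and the `/benchmarks/terasort` HDFS directories are hypothetical and depend on the Hadoop version and distribution.

```python
import shlex

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(target_bytes: int) -> int:
    """Rows TeraGen must write to produce at least `target_bytes` of data."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    return -(-target_bytes // ROW_BYTES)  # ceiling division


def benchmark_commands(target_bytes, jar, base="/benchmarks/terasort"):
    """Build TeraGen, TeraSort and TeraValidate command lines (not executed)."""
    rows = teragen_rows(target_bytes)
    return [
        ["hadoop", "jar", jar, "teragen", str(rows), f"{base}/input"],
        ["hadoop", "jar", jar, "terasort", f"{base}/input", f"{base}/output"],
        ["hadoop", "jar", jar, "teravalidate", f"{base}/output", f"{base}/report"],
    ]


if __name__ == "__main__":
    # Hypothetical jar location; the real file name usually includes a version.
    jar = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar"
    for cmd in benchmark_commands(10**9, jar):  # about 1 GB of input
        print(shlex.join(cmd))
```

Run and time each step separately. Repeat at increasing sizes or with concurrent jobs to approximate high-traffic conditions, and keep the results as a baseline to compare against after configuration changes.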

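Clusters run around the clock and process large amounts of data, so they are prone to failures. To monitor the health of the cluster, the administrator needs to deploy monitoring tools such as Nagios and Ganglia, and should configure alerts and monitors for the cluster's critical nodes so that issues are foreseen before they occur.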
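Nagios runs check plugins and reads their exit status: 0 means OK, 1 WARNING, 2 CRITICAL, and 3 UNKNOWN. Each plugin prints a one-line status, with optional performance data after a `|`. The sketch below applies that convention to HDFS capacity usage. The 80% and 90% thresholds are illustrative assumptions. Collecting the used and total byte counts is left out; one option is parsing `hdfs dfsadmin -report`.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_capacity(used_bytes, total_bytes, warn_pct=80.0, crit_pct=90.0):
    """Classify HDFS capacity usage; returns (exit_code, status_line)."""
    if total_bytes <= 0:
        return UNKNOWN, "UNKNOWN - total capacity not available"
    pct = 100.0 * used_bytes / total_bytes
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Text after "|" is performance data that graphing add-ons can read.
    return code, (f"{LABELS[code]} - HDFS {pct:.1f}% used"
                  f" | used_pct={pct:.1f}%;{warn_pct:g};{crit_pct:g}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: check_hdfs_capacity.py USED_BYTES TOTAL_BYTES
    exit_code, status = check_capacity(int(sys.argv[1]), int(sys.argv[2]))
    print(status)
    sys.exit(exit_code)
```

Keep the threshold logic separate from data collection. That way the same function can be tested offline and reused for other metrics, such as the number of dead DataNodes.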

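A good knowledge of a scripting language such as Python, Ruby, or Shell greatly helps a Hadoop administrator. Administrators are often asked to set up scheduled staging of files from an external source into HDFS, and scripting skills let them fulfil these requests by writing scripts and automating them.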
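Here is a sketch of such a staging job. It picks up finished files from a local landing directory and copies them into a date-partitioned HDFS directory using `hdfs dfs -mkdir -p` and `hdfs dfs -put`. The landing path, the `.done` marker convention, and the `YYYY/MM/DD` layout are hypothetical. Scheduling would normally come from cron or a workflow scheduler, and the default is a dry run that only prints the commands.

```python
import subprocess
from datetime import date
from pathlib import Path


def staging_commands(landing_dir, hdfs_base, day=None):
    """Build HDFS commands that upload every file in `landing_dir` that has a
    matching '<name>.done' marker file (a hypothetical "upload finished" flag).

    Returns an empty list when there is nothing to stage.
    """
    day = day or date.today()
    target = f"{hdfs_base}/{day:%Y/%m/%d}"  # date-partitioned HDFS directory
    puts = []
    for marker in sorted(Path(landing_dir).glob("*.done")):
        data_file = marker.with_suffix("")  # "a.csv.done" -> "a.csv"
        if data_file.is_file():
            puts.append(["hdfs", "dfs", "-put", str(data_file), target])
    if not puts:
        return []
    return [["hdfs", "dfs", "-mkdir", "-p", target]] + puts


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing command
```

For example, `run(staging_commands("/srv/landing", "/data/incoming"))` prints the plan, and a scheduled job would pass `dry_run=False`; both paths are hypothetical. The sketch does not move or delete local files after a successful upload, so a real job needs that step to avoid staging the same file twice.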

Above all, a Hadoop administrator should have a very good understanding of the Apache Hadoop architecture and its inner workings.

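The following are some of the key Hadoop operations that a Hadoop administrator must master:
Planning the cluster: estimating the amount of data the cluster will handle and deciding the number of nodes accordingly.
Installing and upgrading Apache Hadoop on a cluster.
Configuring and tuning Hadoop through its various configuration files.
Understanding all the Hadoop daemons, along with their roles and responsibilities in the cluster.
Reading and interpreting Hadoop logs.
Adding and removing nodes in the cluster.
Rebalancing nodes in the cluster.
Enabling security with an authentication and authorization system such as Kerberos.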
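The first item in the list above, sizing the cluster from the expected data volume, is mostly arithmetic. The sketch below uses one common rule of thumb. It multiplies the data by the HDFS replication factor (3 by default, via `dfs.replication`), adds headroom for intermediate and temporary data, and divides by the disk space per DataNode that HDFS may actually fill. The 25% temporary-space allowance, the 80% usable-disk fraction, the growth parameters, and the 12 × 4 TB node in the example are assumptions to replace with your own figures.

```python
import math


def estimate_datanodes(
    data_tb: float,
    disk_per_node_tb: float,
    replication: int = 3,          # HDFS default dfs.replication
    temp_overhead: float = 0.25,   # assumed headroom for intermediate/temp data
    usable_fraction: float = 0.8,  # assumed share of raw disk left for HDFS
    monthly_growth: float = 0.0,   # assumed data growth rate per month
    months: int = 0,               # planning horizon in months
) -> int:
    """Rough number of DataNodes needed to store `data_tb` terabytes."""
    future_data_tb = data_tb * (1 + monthly_growth) ** months
    raw_needed_tb = future_data_tb * replication * (1 + temp_overhead)
    usable_per_node_tb = disk_per_node_tb * usable_fraction
    nodes = math.ceil(raw_needed_tb / usable_per_node_tb)
    # Replicas of a block are placed on different DataNodes, so a cluster
    # smaller than the replication factor cannot hold every replica.
    return max(nodes, replication)


if __name__ == "__main__":
    # Example: 100 TB of data today, nodes with 12 x 4 TB disks.
    print(estimate_datanodes(100, disk_per_node_tb=12 * 4))
```

In the example, 100 TB × 3 × 1.25 = 375 TB of raw space is needed and each node offers 48 × 0.8 = 38.4 TB, so the estimate rounds up to 10 DataNodes. Compute and memory needs for the expected workload can push the number higher.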

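Almost all organizations follow a policy for backing up their data, and carrying out these backups is the Hadoop administrator's responsibility. A Hadoop administrator should therefore be well versed in server backup and recovery operations.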
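One piece of such a policy in Hadoop is keeping periodic copies of the NameNode metadata. `hdfs dfsadmin -fetchImage <local dir>` downloads the most recent fsimage from the NameNode. The sketch below builds that command for a new timestamped directory and prunes old copies under a "keep the newest N" rule. The backup root, the timestamp naming scheme, and the retention count are hypothetical. Backing up the data itself, for example with `hadoop distcp` to another cluster, is a separate job.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

STAMP = "%Y%m%d-%H%M%S"  # hypothetical naming scheme for backup directories


def fetch_image_command(backup_root, now):
    """Command that saves the latest fsimage into a new timestamped directory."""
    target = Path(backup_root) / now.strftime(STAMP)
    return ["hdfs", "dfsadmin", "-fetchImage", str(target)]


def dirs_to_prune(names, keep):
    """Return backup directory names to delete, keeping the newest `keep`.

    Names that do not follow STAMP are ignored, never deleted.
    """
    dated = []
    for name in names:
        try:
            dated.append((datetime.strptime(name, STAMP), name))
        except ValueError:
            continue  # not one of our backup directories
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]


def backup_now(backup_root, keep=7, dry_run=True):
    """Fetch a new fsimage copy, then report (or delete) copies beyond `keep`."""
    cmd = fetch_image_command(backup_root, datetime.now())
    print(" ".join(cmd))
    if not dry_run:
        Path(cmd[-1]).mkdir(parents=True)  # create the target directory first
        subprocess.run(cmd, check=True)
    root = Path(backup_root)
    existing = [p.name for p in root.iterdir() if p.is_dir()] if root.is_dir() else []
    for name in dirs_to_prune(existing, keep):
        print("would prune" if dry_run else "pruning", root / name)
        if not dry_run:
            shutil.rmtree(root / name)
```

A backup is only proven by a restore. Periodically load a saved fsimage on a test NameNode to confirm that the recovery procedure actually works.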


Original:
Responsibilities of a Hadoop administrator

With the increase in the interest to derive insight on their big data, organizations are now planning and building their big data teams aggressively. To start working on their data, they need to have a good solid infrastructure. Once they have this setup, they need several controls and system policies in place to maintain, manage, and troubleshoot their cluster.

There is an ever-increasing demand for Hadoop Administrators in the market as their function (setting up and maintaining Hadoop clusters) is what makes analysis really possible.

The Hadoop administrator needs to be very good at system operations, networking, operating systems, and storage. They need to have a strong knowledge of computer hardware and their operations, in a complex network.

Apache Hadoop, mainly, runs on Linux. So having good Linux skills such as monitoring, troubleshooting, configuration, and security is a must.

Setting up nodes for clusters involves a lot of repetitive tasks and the Hadoop administrator should use quicker and efficient ways to bring up these servers using configuration management tools such as Puppet, Chef, and CFEngine. Apart from these tools, the administrator should also have good capacity planning skills to design and plan clusters.

There are several nodes in a cluster that would need duplication of data, for example, the fsimage file of the namenode daemon can be configured to write to two different disks on the same node or on a disk on a different node. An understanding of NFS mount points and how to set it up within a cluster is required. The administrator may also be asked to set up RAID for disks on specific nodes.

As all Hadoop services/daemons are built on Java, a basic knowledge of the JVM along with the ability to understand Java exceptions would be very useful. This helps administrators identify issues quickly.

The Hadoop administrator should possess the skills to benchmark the cluster to test performance under high traffic scenarios.

Clusters are prone to failures as they are up all the time and are processing large amounts of data regularly. To monitor the health of the cluster, the administrator should deploy monitoring tools such as Nagios and Ganglia and should configure alerts and monitors for critical nodes of the cluster to foresee issues before they occur.

Knowledge of a good scripting language such as Python, Ruby, or Shell would greatly help the function of an administrator. Often, administrators are asked to set up some kind of a scheduled file staging from an external source to HDFS. The scripting skills help them execute these requests by building scripts and automating them.

Above all, the Hadoop administrator should have a very good understanding of the Apache Hadoop architecture and its inner workings.

The following are some of the key Hadoop-related operations that the Hadoop administrator should know:

Planning the cluster, deciding on the number of nodes based on the estimated amount of data the cluster is going to serve.

Installing and upgrading Apache Hadoop on a cluster.

Configuring and tuning Hadoop using the various configuration files available within Hadoop.

An understanding of all the Hadoop daemons along with their roles and responsibilities in the cluster.

The administrator should know how to read and interpret Hadoop logs.

Adding and removing nodes in the cluster.

Rebalancing nodes in the cluster.

Employ security using an authentication and authorization system such as Kerberos.

Almost all organizations follow the policy of backing up their data and it is the responsibility of the administrator to perform this activity. So, an administrator should be well versed with backups and recovery operations of servers.
