CDH简介
CDH是Cloudera公司提供的Hadoop发行版,它在原生开源的Apache Hadoop基础之上,针对特定版本的Hadoop以及Hadoop相关的软件,如Zookeeper、HBase、Flume、Sqoop等做了兼容性开发,我们在安装CDH发行版的Hadoop时就无需进行额外繁琐的兼容性测试。
以往安装配置使用Apache Hadoop时,完全需要手动在服务器上,通过命令和脚本进行安装配置,比较复杂而繁琐。使用CDH,我们可以通过Cloudera提供的CM(Cloudera Manager)来进行安装,CM是一个面向Hadoop相关软件的强大SCM工具,它提供了通过Web界面向导的方式进行软件的安装配置,此外还提供了比较基础、友好的监控、预警功能,通过Web UI展示各种已安装软件的资源使用情况、系统运行状态等等。
如果使用CM来管理CDH平台,因为CM使用了监控管理、运行状态数据采集、预警等等很多服务,所以在集群服务器资源使用方面也会比通常的Apache Hadoop版本多很多,如果所需要的Hadoop集群规模超大,比如成百上千个节点,使用CM来安装管理CDH集群能够节省大量时间,而且节省了对整个集群基本的监控的配置管理;如果集群规模比较小,如5~10个节点左右,建议对这类集群最好单个节点的资源(内存、CPU、磁盘、网络)比较充足一些为好。
环境准备
操作系统:Ubuntu 14.04 LTS
操作系统 | 对应版本 |
---|---|
Ubuntu | 16.04 LTS (Xenial) 14.04 LTS (Trusty) 12.04 LTS (Precise) |
节点规划
IP地址 | 主机名 | 节点角色 |
---|---|---|
192.168.241.140 | Hadoop-Master | NameNode、SecondaryNameNode、ResourceMaster、cloudera-scm-server、cloudera-scm-agent、MySQL |
192.168.241.141 | Hadoop-Slave | DataNode、NodeManager、cloudera-scm-agent |
192.168.241.142 | Hadoop-Slave2 | DataNode、NodeManager、cloudera-scm-agent |
软件准备
-
Cloudera Manager:版本:
5.15.1
,下载地址:http://archive.cloudera.com/cm5/cm/5/cloudera-manager-trusty-cm5.15.1_amd64.tar.gz
-
CDH Percel包:版本:
5.15.1
,下载地址:http://archive.cloudera.com/cdh5/parcels/5.15.1/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel
http://archive.cloudera.com/cdh5/parcels/5.15.1/CDH-5.15.1-1.cdh5.15.1.p0.4-sles11.parcel.sha1
http://archive.cloudera.com/cdh5/parcels/5.15.1/manifest.json
系统环境准备
- 配置网络IP和主机名
- SSH免密钥登录
- 关闭防火墙
- 安装JDK并配置环境变量
- 设置时钟同步
- 安装配置mysql
安装Cloudera Manager
1、安装Cloudera Manager Server、Agent
sudo mkdir /opt/cloudera-manager
sudo tar zxvf cloudera-manager-trusty-cm5.15.1_amd64.tar.gz -C /opt/cloudera-manager
2、创建用户cloudera-scm 所有的服务器都要执行:
sudo useradd --system --home=/opt/cloudera-manager/cm-5.15.1/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
3、配置CM Agent
- 修改文件
/opt/cloudera-manager/cm-5.15.1/etc/cloudera-scm-agent/config.ini
中的server_host以及server_portserver_host=192.168.241.140
4、配置CM Server数据库
- 所有服务器:拷贝mysql-connector-java.jar文件到目录 /usr/share/java/
- grant all on . to ‘root’@’%’ identified by ‘123456’ with grant option;
- cd /opt/cloudera-manager/cm-5.15.1/share/cmf/schema/
sudo ./scm_prepare_database.sh mysql cm -h192.168.241.131 -uroot -p123456 root 123456
格式:数据库类型、数据库、数据库服务器、用户名、密码、 cm server服务器
5、同步CM到其它节点
Hadoop-Master: scp -r /opt/cloudera-manager hadoop-slave:~/
Hadoop-Slave: sudo mv cloudera-manager /opt
Hadoop-Master: scp -r /opt/cloudera-manager hadoop-slave2:~/
Hadoop-Slave2: sudo mv cloudera-manager /opt
6、创建Parcel目录
- Server节点
sudo mkdir -p /opt/cloudera/parcel-repo sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
- Agent节点
sudo mkdir -p /opt/cloudera/parcels sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
7、制作CDH本地源
CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel
以及manifest.json
,将这两个文件放到server节点的/opt/cloudera/parcel-repo下sudo cp CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel.sha1 /opt/cloudera/parcel-repo/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel.sha
8、启动CM Server、Agent
cd /opt/cloudera-manager/cm-5.15.1/etc/init.d/
sudo ./cloudera-scm-server start
sudo ./cloudera-scm-agent start
访问
访问http://192.168.241.140:7182
遇到的问题
[22/Sep/2018 10:25:09 +0000] 1943 MainThread agent ERROR Heartbeating to 192.168.241.140:7182 failed.
Traceback (most recent call last):
File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/agent.py", line 1435, in _send_heartbeat
self.master_port)
File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
self.conn.connect()
File "/usr/lib/python2.7/httplib.py", line 778, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 571, in create_connection
raise err
error: [Errno 111] Connection refused
[22/Sep/2018 10:25:09 +0000] 1943 MainThread heartbeat_tracker INFO HB stats (seconds): num:1 LIFE_MIN:0.19 min:0.19 mean:0.19 max:0.19 LIFE_MAX:0.19
[22/Sep/2018 10:25:09 +0000] 1943 Monitor-HostMonitor throttling_logger ERROR Error getting directory attributes for /opt/cloudera-manager/cm-5.15.1/log/cloudera-scm-agent
Traceback (most recent call last):
File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/monitor/dir_monitor.py", line 90, in _get_directory_attributes
name = pwd.getpwuid(uid)[0]
KeyError: 'getpwuid(): uid not found: 1106'
Src file /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel does not exist - cdh-master
解决:
rm -rf /opt/cloudera/parcels/.flood
service cloudera-scm-agent restart
参考
在Ubuntu 14.04 LTS下通过Cloudera CDH 5.4.8搭建Hadoop集群