【架构】Cloudera大数据平台环境离线搭建(九)

Posted by Kaka Blog on September 20, 2018

CDH简介

CDH是Cloudera公司提供的Hadoop发行版,它在原生开源的Apache Hadoop基础之上,针对特定版本的Hadoop以及Hadoop相关的软件,如Zookeeper、HBase、Flume、Sqoop等做了兼容性开发,我们在安装CDH发行版的Hadoop时就无需进行额外繁琐的兼容性测试。

以往安装配置使用Apache Hadoop时,完全需要手动在服务器上,通过命令和脚本进行安装配置,比较复杂而繁琐。使用CDH,我们可以通过Cloudera提供的CM(Cloudera Manager)来进行安装,CM是一个面向Hadoop相关软件的强大SCM工具,它提供了通过Web界面向导的方式进行软件的安装配置,此外还提供了比较基础、友好的监控、预警功能,通过Web UI展示各种已安装软件的资源使用情况、系统运行状态等等。

如果使用CM来管理CDH平台,因为CM使用了监控管理、运行状态数据采集、预警等等很多服务,所以在集群服务器资源使用方面也会比通常的Apache Hadoop版本多很多,如果所需要的Hadoop集群规模超大,比如成百上千个节点,使用CM来安装管理CDH集群能够节省大量时间,而且节省了对整个集群基本的监控的配置管理;如果集群规模比较小,如5~10个节点左右,建议对这类集群最好单个节点的资源(内存、CPU、磁盘、网络)比较充足一些为好。

img

环境准备

操作系统:Ubuntu 14.04 LTS

操作系统 对应版本
Ubuntu 16.04 LTS (Xenial)
14.04 LTS (Trusty)
12.04 LTS (Precise)

节点规划

IP地址 主机名 节点角色
192.168.241.140 Hadoop-Master NameNode、SecondaryNameNode、ResourceMaster、cloudera-scm-server、cloudera-scm-agent、MySQL
192.168.241.141 Hadoop-Slave DataNode、NodeManager、cloudera-scm-agent
192.168.241.142 Hadoop-Slave2 DataNode、NodeManager、cloudera-scm-agent

软件准备

  • Cloudera Manager:版本:5.15.1,下载地址: http://archive.cloudera.com/cm5/cm/5/cloudera-manager-trusty-cm5.15.1_amd64.tar.gz

  • CDH Percel包:版本:5.15.1,下载地址: http://archive.cloudera.com/cdh5/parcels/5.15.1/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel http://archive.cloudera.com/cdh5/parcels/5.15.1/CDH-5.15.1-1.cdh5.15.1.p0.4-sles11.parcel.sha1 http://archive.cloudera.com/cdh5/parcels/5.15.1/manifest.json

系统环境准备

  • 配置网络IP和主机名
  • SSH免密钥登录
  • 关闭防火墙
  • 安装JDK并配置环境变量
  • 设置时钟同步
  • 安装配置mysql

安装Cloudera Manager

1、安装Cloudera Manager Server、Agent

sudo mkdir /opt/cloudera-manager
sudo tar zxvf cloudera-manager-trusty-cm5.15.1_amd64.tar.gz -C /opt/cloudera-manager

2、创建用户cloudera-scm 所有的服务器都要执行:

sudo useradd --system --home=/opt/cloudera-manager/cm-5.15.1/run/cloudera-scm-server --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm

3、配置CM Agent

  • 修改文件/opt/cloudera-manager/cm-5.15.1/etc/cloudera-scm-agent/config.ini中的server_host以及server_port
    server_host=192.168.241.140
    

4、配置CM Server数据库

  • 所有服务器:拷贝mysql-connector-java.jar文件到目录 /usr/share/java/
  • grant all on . to ‘root’@’%’ identified by ‘123456’ with grant option;
  • cd /opt/cloudera-manager/cm-5.15.1/share/cmf/schema/ sudo ./scm_prepare_database.sh mysql cm -h192.168.241.131 -uroot -p123456 root 123456 格式:数据库类型、数据库、数据库服务器、用户名、密码、 cm server服务器

5、同步CM到其它节点

Hadoop-Master: scp -r /opt/cloudera-manager hadoop-slave:~/
Hadoop-Slave: sudo mv cloudera-manager /opt
Hadoop-Master: scp -r /opt/cloudera-manager hadoop-slave2:~/
Hadoop-Slave2: sudo mv cloudera-manager /opt

6、创建Parcel目录

  • Server节点
    sudo mkdir -p /opt/cloudera/parcel-repo
    sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
    
  • Agent节点
    sudo mkdir -p /opt/cloudera/parcels
    sudo chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
    

7、制作CDH本地源

  • CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel以及manifest.json,将这两个文件放到server节点的/opt/cloudera/parcel-repo下
  • sudo cp CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel.sha1 /opt/cloudera/parcel-repo/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel.sha

8、启动CM Server、Agent

cd /opt/cloudera-manager/cm-5.15.1/etc/init.d/
sudo ./cloudera-scm-server start
sudo ./cloudera-scm-agent start

访问

访问http://192.168.241.140:7182

遇到的问题

[22/Sep/2018 10:25:09 +0000] 1943 MainThread agent        ERROR    Heartbeating to 192.168.241.140:7182 failed.
Traceback (most recent call last):
  File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/agent.py", line 1435, in _send_heartbeat
    self.master_port)
  File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/avro-1.6.3-py2.7.egg/avro/ipc.py", line 469, in __init__
    self.conn.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
error: [Errno 111] Connection refused
[22/Sep/2018 10:25:09 +0000] 1943 MainThread heartbeat_tracker INFO     HB stats (seconds): num:1 LIFE_MIN:0.19 min:0.19 mean:0.19 max:0.19 LIFE_MAX:0.19
[22/Sep/2018 10:25:09 +0000] 1943 Monitor-HostMonitor throttling_logger ERROR    Error getting directory attributes for /opt/cloudera-manager/cm-5.15.1/log/cloudera-scm-agent
Traceback (most recent call last):
  File "/opt/cloudera-manager/cm-5.15.1/lib/cmf/agent/build/env/lib/python2.7/site-packages/cmf-5.15.1-py2.7.egg/cmf/monitor/dir_monitor.py", line 90, in _get_directory_attributes
    name = pwd.getpwuid(uid)[0]
KeyError: 'getpwuid(): uid not found: 1106'

Src file /opt/cloudera/parcels/.flood/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel/CDH-5.15.1-1.cdh5.15.1.p0.4-trusty.parcel does not exist - cdh-master 解决: rm -rf /opt/cloudera/parcels/.flood service cloudera-scm-agent restart

参考

Cloudera官方文档

CDH-5.7.0:基于Parcels方式离线安装配置

在Ubuntu 14.04 LTS下通过Cloudera CDH 5.4.8搭建Hadoop集群

[CDH版本要求]–CDH 5和Cloudera Manager 5要求和支持的版本

CDH安装手册

Cloudera大数据平台环境搭建(CDH5.13.1)傻瓜式说明书