Network environment
First, make sure the Gateway node is in the security group of the corresponding E-MapReduce cluster, so that the Gateway node can reach the E-MapReduce cluster. For setting a node's security group, see Create a Security Group.
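As a quick pre-flight check, you can probe from the Gateway whether SSH on the master node is reachable before running any deployment step. This is only a sketch; the IP address below is a placeholder for your own master node, and it assumes bash's `/dev/tcp` redirection and the `timeout` utility are available:

```shell
# Probe TCP port 22 (SSH) on the master node using bash's /dev/tcp feature.
probe_ssh() {
    local host=$1
    # Give up after 5 seconds if the connection cannot be opened.
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/22" 2>/dev/null
}

# 192.168.0.1 is a placeholder -- substitute your master node's IP.
probe_ssh 192.168.0.1 && echo "master reachable" || echo "master unreachable"
```

If this prints "master unreachable", revisit the security group rules before continuing.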
Software environment
Operating system: CentOS 7.2 or later is recommended.
Java environment: install JDK 1.7 or later; OpenJDK 1.8.0 is recommended.
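To confirm the Gateway node meets the Java requirement above, a minimal check (assuming the standard `java` launcher is on the PATH when installed):

```shell
# Print the installed Java version, or a hint if no JDK is present.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
else
    echo "java not found; install with: yum install -y java-1.8.0-openjdk"
fi
```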
Setup procedure
E-MapReduce versions 2.7 and later, and 3.2 and later
For these versions, we recommend creating the Gateway directly from the E-MapReduce console. If you choose to set it up manually, first create a script with the content shown below, then run it on the Gateway node with:
sh deploy.sh <master_ip> master_password_file
deploy.sh: the script name; its content is shown in the code below.
master_ip: the IP address of the cluster's master node; make sure it is reachable from the Gateway.
master_password_file: a file holding the master node's password; write the password directly into this file.
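For example, the password file can be prepared as follows (the password below is a placeholder). Since the file stores the master node's root password in plain text, tighten its permissions before use:

```shell
# Write the master node's password into the file deploy.sh will read.
# 'YourMasterPassword' is a placeholder -- replace it with the real password.
echo 'YourMasterPassword' > master_password_file
chmod 600 master_password_file   # readable by the current user only
```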
#!/usr/bin/bash
if [ $# != 2 ]
then
   echo "Usage: $0 master_ip master_password_file"
   exit 1
fi
masterip=$1
masterpwdfile=$2
if ! type sshpass >/dev/null 2>&1; then
   yum install -y sshpass
fi
if ! type java >/dev/null 2>&1; then
   yum install -y java-1.8.0-openjdk
fi
mkdir -p /opt/apps
mkdir -p /etc/ecm
echo "Start to copy package from $masterip to local gateway(/opt/apps)"
echo " -copying hadoop-2.7.2"
sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/usr/lib/hadoop-current /opt/apps/
echo " -copying hive-2.0.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/hive-current /opt/apps/
echo " -copying spark-2.1.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/spark-current /opt/apps/
echo " -copying tez"
sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/tez-current /opt/apps/
echo " -extra-jars"
sshpass -f $masterpwdfile scp -r root@$masterip:/opt/apps/extra-jars /opt/apps/
echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
if [ -L /usr/lib/hadoop-current ]
then
   unlink /usr/lib/hadoop-current
fi
ln -s /opt/apps/hadoop-current /usr/lib/hadoop-current
if [ -L /usr/lib/hive-current ]
then
   unlink /usr/lib/hive-current
fi
ln -s /opt/apps/hive-current /usr/lib/hive-current
if [ -L /usr/lib/spark-current ]
then
   unlink /usr/lib/spark-current
fi
ln -s /opt/apps/spark-current /usr/lib/spark-current
if [ -L /usr/lib/tez-current ]
then
   unlink /usr/lib/tez-current
fi
ln -s /opt/apps/tez-current /usr/lib/tez-current
echo "Start to copy conf from $masterip to local gateway(/etc/ecm)"
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hadoop-conf /etc/ecm/hadoop-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hive-conf /etc/ecm/hive-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/spark-conf /etc/ecm/spark-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/tez-conf /etc/ecm/tez-conf
echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hdfs.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/yarn.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hive.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/spark.sh /etc/profile.d/
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/tez.sh /etc/profile.d/
if [ -L /usr/lib/jvm/java ]
then
   unlink /usr/lib/jvm/java
fi
echo "" >>/etc/profile.d/hdfs.sh
echo export JAVA_HOME=/usr/lib/jvm/jre-1.8.0 >>/etc/profile.d/hdfs.sh
echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
cat /etc/hosts_bak | grep emr | grep cluster >>/etc/hosts
if ! id hadoop >& /dev/null
then
   useradd hadoop
fi
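The script above repeats the same unlink-then-symlink pattern for every component. Factored out as a standalone helper (a sketch; the function name is illustrative), it looks like this:

```shell
# Point a name at a new target: drop the old symlink if one is present,
# then create a fresh symlink to the target.
relink() {
    local target=$1 link=$2
    if [ -L "$link" ]; then
        unlink "$link"   # remove a stale symlink so ln does not fail
    fi
    ln -s "$target" "$link"
}
```

For example, `relink /opt/apps/hadoop-current /usr/lib/hadoop-current` reproduces the hadoop link step above.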
E-MapReduce versions earlier than 2.7, and earlier than 3.2
Create a script with the content shown below, then run it on the Gateway node with:
sh deploy.sh <master_ip> master_password_file
deploy.sh: the script name; its content is shown in the code below.
master_ip: the IP address of the cluster's master node; make sure it is reachable from the Gateway.
master_password_file: a file holding the master node's password; write the password directly into this file.
#!/usr/bin/bash
if [ $# != 2 ]
then
   echo "Usage: $0 master_ip master_password_file"
   exit 1
fi
masterip=$1
masterpwdfile=$2
if ! type sshpass >/dev/null 2>&1; then
   yum install -y sshpass
fi
if ! type java >/dev/null 2>&1; then
   yum install -y java-1.8.0-openjdk
fi
mkdir -p /opt/apps
mkdir -p /etc/emr
echo "Start to copy package from $masterip to local gateway(/opt/apps)"
echo " -copying hadoop-2.7.2"
sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/usr/lib/hadoop-current /opt/apps/
echo " -copying hive-2.0.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/hive-current /opt/apps/
echo " -copying spark-2.1.1"
sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/spark-current /opt/apps/
echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
if [ -L /usr/lib/hadoop-current ]
then
   unlink /usr/lib/hadoop-current
fi
ln -s /opt/apps/hadoop-current /usr/lib/hadoop-current
if [ -L /usr/lib/hive-current ]
then
   unlink /usr/lib/hive-current
fi
ln -s /opt/apps/hive-current /usr/lib/hive-current
if [ -L /usr/lib/spark-current ]
then
   unlink /usr/lib/spark-current
fi
ln -s /opt/apps/spark-current /usr/lib/spark-current
echo "Start to copy conf from $masterip to local gateway(/etc/emr)"
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hadoop-conf /etc/emr/hadoop-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hive-conf /etc/emr/hive-conf
sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/spark-conf /etc/emr/spark-conf
echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hadoop.sh /etc/profile.d/
if [ -L /usr/lib/jvm/java ]
then
   unlink /usr/lib/jvm/java
fi
ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre /usr/lib/jvm/java
echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
cat /etc/hosts_bak | grep emr | grep cluster >>/etc/hosts
if ! id hadoop >& /dev/null
then
   useradd hadoop
fi
Testing
Hive
[hadoop@iZ23bc05hrvZ ~]$ hive
hive> show databases;
OK
default
Time taken: 1.124 seconds, Fetched: 1 row(s)
hive> create database school;
OK
Time taken: 0.362 seconds
hive>
Run a Hadoop job
[hadoop@iZ23bc05hrvZ ~]$ hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
        File Input Format Counters
                Bytes Read=1180
        File Output Format Counters
                Bytes Written=97
Job Finished in 29.798 seconds
Estimated value of Pi is 3.20000000000000000000