1. Environment variables (/etc/profile)

export EDITOR=vim

export JAVA_HOME=/usr/local/jdk

export JRE_HOME=/usr/local/jdk/jre

export SCALA_HOME=/usr/local/scala

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export HBASE_HOME=/usr/local/hbase

export HIVE_HOME=/usr/local/hive

export ZOOKEEPER_HOME=/usr/local/zookeeper

export SPARK_HOME=/usr/local/spark

export PYSPARK_PYTHON=/home/x/anaconda3/bin/python

export CLASSPATH="./:$JAVA_HOME/lib:$JRE_HOME/lib"

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:/usr/local/sbt/bin:/usr/local/cassandra/bin:/usr/local/kafka/bin:/usr/local/mongodb/bin:/usr/local/es/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin
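
After editing /etc/profile, reload it and do a quick sanity check. A minimal sketch (it assumes the JDK and Hadoop have already been unpacked and symlinked as in step 2 below):

source /etc/profile

echo $HADOOP_HOME    # should print /usr/local/hadoop

java -version        # confirms JAVA_HOME and PATH

hadoop version       # confirms the Hadoop binaries are on PATH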

2. Download the Hadoop, Spark, and other packages, extract them to /usr/local, and create symlinks matching the environment variables above

For example:

sudo tar -zxf ~/hadoop-2.7.3.tar.gz -C /usr/local   # extract into /usr/local

cd /usr/local/

sudo ln -s ./hadoop-2.7.3/ ./hadoop         # create a symlink named hadoop

sudo chown -R x:x ./hadoop-2.7.3/           # change ownership of the whole tree; x is the user and group that will run Hadoop

Repeat the same extraction steps for Spark and the other packages.
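
For instance, for Spark (the archive name below is only a placeholder; use the version you actually downloaded):

sudo tar -zxf ~/spark-2.1.0-bin-hadoop2.7.tgz -C /usr/local

cd /usr/local/

sudo ln -s ./spark-2.1.0-bin-hadoop2.7/ ./spark

sudo chown -R x:x ./spark-2.1.0-bin-hadoop2.7/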

3. Configure each component. The configuration files are as follows:

1) Hadoop (/usr/local/hadoop/etc/hadoop)

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://v1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/x/hadoop/tmp</value>
    </property>
</configuration>
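
hadoop.tmp.dir points at a directory under the user's home, so create it up front on every node (a small extra step, not shown in the original listing):

mkdir -p /home/x/hadoop/tmp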

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.admin.user.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
</configuration>
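
Note that the Hadoop 2.x distribution ships only mapred-site.xml.template, so create the file from it before editing:

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml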

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
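
The effective values can be double-checked with hdfs getconf, e.g.:

hdfs getconf -confKey dfs.replication    # expect 2

hdfs getconf -confKey dfs.blocksize      # expect 134217728 (128 MB)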

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- which node runs the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>v1</value>
    </property>

    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>24576</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>24576</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>45</value>
    </property>
    <!-- vcores usable on each node; best set to the number of cores on your machine -->
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>40</value>
    </property>

</configuration>
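
Once the cluster is running (step 4), confirm that each NodeManager registered with the memory and vcores configured above:

yarn node -list                # lists the registered NodeManagers

yarn node -status <Node-Id>    # shows a node's memory/vcores capacity; replace <Node-Id> with a value from the list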

workers (named slaves in Hadoop 2.x)

v1
v2
v3
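
The host names v1, v2 and v3 used throughout must resolve on every node, e.g. via /etc/hosts (the IP addresses below are placeholders for your own):

192.168.1.101 v1
192.168.1.102 v2
192.168.1.103 v3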

2) Spark (/usr/local/spark/conf)

slaves

v2
v3
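
This setup relies on the exports in /etc/profile; if you prefer to keep Spark's settings in its own files, a minimal conf/spark-env.sh sketch (assuming Spark 2.x standalone with v1 as the master) would be:

export JAVA_HOME=/usr/local/jdk
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_HOST=v1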

4. Distribute and start:

1) Distribute the configured packages to v2 and v3:

scp -r /usr/local/hadoop-2.7.3 v3:/home/x/    # likewise for v2

On v2 and v3, copy the package into /usr/local and set up the symlink and ownership the same way as on v1.
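
On each of v2 and v3 that amounts to something like (assuming the package was copied into the home directory as above):

sudo mv ~/hadoop-2.7.3 /usr/local/

cd /usr/local/

sudo ln -s ./hadoop-2.7.3/ ./hadoop

sudo chown -R x:x ./hadoop-2.7.3/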

2) Aliases for starting the cluster

Add the following to ~/.bashrc:

alias es="/usr/local/es/bin/elasticsearch & /usr/local/logstash/bin/logstash -f /usr/local/logstash/config/logstash.conf & /usr/local/kibana/bin/kibana &"

alias starth="start-dfs.sh&&start-yarn.sh"

alias stoph="stop-yarn.sh&&stop-dfs.sh"

alias starts="/usr/local/spark/sbin/start-all.sh"

alias stops="/usr/local/spark/sbin/stop-all.sh"

alias starthbase="/usr/local/hbase/bin/start-hbase.sh&&/usr/local/hbase/bin/hbase shell"

alias stophbase="/usr/local/hbase/bin/stop-hbase.sh"

alias starthiveserver="$HIVE_HOME/bin/hiveserver2 &"

alias starthive="$HIVE_HOME/bin/beeline -u jdbc:hive2://v1:10000"

alias sparksh="spark-shell --master yarn --num-executors 8 --driver-memory 6g --executor-memory 4g --executor-cores 4"

export PYSPARK_DRIVER_PYTHON=jupyter

export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

alias pysparksh="pyspark --master yarn --num-executors 8 --driver-memory 6g --executor-memory 4g --executor-cores 4"

# added by Anaconda3 installer

export PATH="/home/x/anaconda3/bin:$PATH"

The Python environment used by pyspark is the Anaconda installation under ~/anaconda3.

3) Initialize and start the cluster

hdfs namenode -format    # format the NameNode only once

starth    # start the Hadoop cluster

starts    # start the Spark cluster
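
A quick way to confirm everything came up is jps (v1 is both the master and a Hadoop worker in this layout):

jps    # on v1 expect NameNode, SecondaryNameNode, ResourceManager, Master, plus DataNode and NodeManager

Run the same on v2 and v3, where you should see DataNode, NodeManager and the Spark Worker.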

Web UIs (default ports):

YARN ResourceManager: http://v1:8088

Spark master: http://v1:8080

HDFS NameNode: http://v1:50070

HDFS file operations:

hdfs dfs -mkdir /user

hdfs dfs -mkdir /user/x    # home directory for user x, so the relative paths below resolve

hdfs dfs -mkdir input

hdfs dfs -put etc/hadoop/*.xml input    # run from /usr/local/hadoop

hdfs dfs -get output output    # fetch job output; see the example job below

cat output/*
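
The output directory fetched above is produced by whatever MapReduce job you run on input; as a smoke test, the grep example from the stock examples jar (run it before the -get) works fine:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'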
