Setting Up a Simple Hadoop Cluster on Tencent Cloud Machines

Environment

This environment has three machines: master (10.0.0.2), slave1 (10.0.0.3), and slave2 (10.0.0.4).

I. Initialize the Hadoop environment

1. Create the hadoop account
useradd -d /data/hadoop -u 600 -g root hadoop

# Set the password for the hadoop user

passwd hadoop

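2. Change the hostname
  • Set the hostname of the master machine to master, and the hostnames of the slave machines to slave1 and slave2 respectively.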

vi /etc/hostname

master

Note: on slave1 enter slave1 here, and on slave2 enter slave2.

  • Edit /etc/hosts; copy the same configuration to the other machines

vi /etc/hosts

10.0.0.2 master
10.0.0.3 slave1
10.0.0.4 slave2
127.0.0.1 localhost localhost.localdomain localhost
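Since the same three mappings have to exist on every machine, it can help to add them with a small script rather than by hand. The following is a minimal sketch, not part of the original procedure: ensure_host_entry is a hypothetical helper that appends a mapping to a hosts-style file only if the name is not already mapped, so it is safe to run more than once.

```shell
#!/bin/sh
# ensure_host_entry FILE IP NAME
# Hypothetical helper: append "IP NAME" to FILE unless NAME is already mapped.
ensure_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Scan non-comment lines; every field after the first (the address) is a host name.
    if awk -v n="$name" '
        !/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$file"; then
        return 0    # already mapped, nothing to do
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root, a call such as ensure_host_entry /etc/hosts 10.0.0.3 slave1 would add the slave1 line only when it is missing.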

  • Edit the network configuration

vi /etc/sysconfig/network

# Created by anaconda
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=master

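  • Reboot master, slave1, and slave2 so that the configuration takes effect
3. Set up passwordless login
  • Generate a key pair

Run ssh-keygen -t rsa and press Enter at every prompt.

  • Copy the public key to slave1 and slave2

Run ssh-copy-id -i ~/.ssh/id_rsa.pub slave1, then run the same command for slave2.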
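After the keys are copied, it is worth confirming that login really works without a password, because the Hadoop start scripts rely on it. A minimal sketch under stated assumptions: check_key_login and the SSH_CMD override are hypothetical names used only here; BatchMode=yes and ConnectTimeout are standard OpenSSH options.

```shell
#!/bin/sh
# check_key_login NODE...
# Hypothetical helper: report every node that cannot be reached with key-based login.
# SSH_CMD is an assumed override that allows testing without a real ssh client.
check_key_login() {
    status=0
    for node in "$@"; do
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        if ! ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
            echo "key login failed: $node"
            status=1
        fi
    done
    return $status
}
```

On master, check_key_login slave1 slave2 should print nothing and return 0 once both keys are in place.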

II. Install Java

1. Download Java from the Oracle website.
2. Extract the Java archive into the installation directory.
tar -xzvf jdk-8u91-linux-x64.tar.gz
3. Set the Java environment variables
vi ~/.bash_profile

The contents of .bash_profile are as follows:

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin:/data/hadoop/hadoop-2.6.4/share/
export PATH

JAVA_HOME=/data/hadoop/jdk1.8.0_91
CLASSPATH=.:$JAVA_HOME/lib
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME CLASSPATH PATH

Note:

  1. CLASSPATH must start with .: (the current directory); otherwise Java reports "Could not find or load main class".
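A wrong JAVA_HOME usually shows up only later, when Hadoop fails to start. The sketch below checks it right away; check_java_home is a hypothetical helper, and the only thing it assumes about a JDK is that it contains an executable bin/java.

```shell
#!/bin/sh
# check_java_home DIR
# Hypothetical helper: succeed only if DIR contains an executable bin/java.
check_java_home() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is empty" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable found at $dir/bin/java" >&2
        return 1
    fi
    echo "JAVA_HOME looks valid: $dir"
}
```

After logging in again as hadoop, check_java_home "$JAVA_HOME" should confirm the path /data/hadoop/jdk1.8.0_91 used in this article.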

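III. Install Hadoop

1. Download Hadoop. We use hadoop-2.6.4.tar.gz, which is a prebuilt binary release and only needs to be extracted.
2. Extract the archive into the installation directory.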
tar -xzvf hadoop-2.6.4.tar.gz

3. Set the environment variables

vi ~/.bashrc

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
export HADOOP_PREFIX=$HOME/hadoop-2.6.4
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin

Run source ~/.bashrc to apply the configuration.
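A misspelled variable name in .bashrc fails silently: the shell simply exports a variable that nothing reads. The sketch below lists any of the expected Hadoop variables that are still unset after sourcing the file. check_hadoop_env is a hypothetical helper, and it checks only that each variable is non-empty, not that the path exists.

```shell
#!/bin/sh
# check_hadoop_env
# Hypothetical helper: print every expected Hadoop variable that is unset or empty.
check_hadoop_env() {
    status=0
    for name in HADOOP_PREFIX HADOOP_COMMON_HOME HADOOP_HDFS_HOME \
                HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR; do
        # Indirect lookup: read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        fi
    done
    return $status
}
```

Running check_hadoop_env after source ~/.bashrc should print nothing when all six variables are defined.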

4. Edit the Hadoop configuration files
  • Change to the configuration directory

cd /data/hadoop/hadoop-2.6.4/etc/hadoop

  • Edit hadoop-env.sh and set JAVA_HOME to the Java installation path
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/data/hadoop/jdk1.8.0_91
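The same edit can be scripted, which is convenient if it has to be repeated. This is a minimal sketch, assuming the file contains at most one line starting with export JAVA_HOME= and that the path contains no | or & characters; set_java_home is a hypothetical helper name.

```shell
#!/bin/sh
# set_java_home FILE JAVA_HOME
# Hypothetical helper: point the "export JAVA_HOME=" line of FILE at the given path.
set_java_home() {
    file=$1
    java_home=$2
    tmp=$(mktemp) || return 1
    # Rewrite the existing export line; "|" is the sed delimiter because paths contain "/".
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$file" > "$tmp" || return 1
    # Append the line if the file had no export to rewrite.
    grep -q '^export JAVA_HOME=' "$tmp" || printf 'export JAVA_HOME=%s\n' "$java_home" >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For this article the call would be set_java_home hadoop-env.sh /data/hadoop/jdk1.8.0_91, run from the configuration directory.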

  • Register the slaves in the slaves file in the same directory.
#localhost
slave1
slave2
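To see which hosts are actually registered, the file can be read with comments and blank lines removed. A minimal sketch, assuming (as the #localhost line above suggests) that # starts a comment; list_slaves is a hypothetical helper.

```shell
#!/bin/sh
# list_slaves FILE
# Hypothetical helper: print one registered hostname per line,
# dropping comments, trailing whitespace, and blank lines.
list_slaves() {
    sed -e 's/#.*$//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}
```

For the file shown above, list_slaves slaves prints slave1 and slave2, which is also a convenient list to loop over when copying files to the slaves.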

  • Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp/hadoop-master</value>
    <description>Abase for other temporary directories.</description>
  </property>
</configuration>
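When checking several *-site.xml files it is handy to read back a single property without opening an editor. The sketch below does this with awk. It is not a real XML parser: it assumes each name and value element sits on one line, as in the files in this article, and get_property is a hypothetical helper.

```shell
#!/bin/sh
# get_property FILE NAME
# Hypothetical helper: print the <value> that follows <name>NAME</name> in FILE.
get_property() {
    awk -v key="$2" '
        # Remember that the requested property name has been seen.
        index($0, "<name>" key "</name>") { found = 1 }
        # The first <value> line at or after it holds the setting.
        found && /<value>/ {
            sub(/^.*<value>/, "")
            sub(/<\/value>.*$/, "")
            print
            exit
        }
    ' "$1"
}
```

For example, get_property core-site.xml fs.defaultFS should print hdfs://master:9000 for the file above.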

  • Edit hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data/hadoop/tmp/hdfs/namesecondary</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

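Note: dfs.replication is the number of copies HDFS keeps of each block, not the number of nodes. It should not exceed the number of DataNodes; this example has two slaves, so the value is set to 2.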
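The choice of value can be written as a simple rule: use the desired replication factor, but never more than the number of DataNodes. A minimal sketch; pick_replication is a hypothetical helper, and 3 is the HDFS default used when no desired value is passed.

```shell
#!/bin/sh
# pick_replication DATANODES [DESIRED]
# Hypothetical helper: print min(DESIRED, DATANODES); DESIRED defaults to 3,
# the HDFS default replication factor.
pick_replication() {
    datanodes=$1
    desired=${2:-3}
    if [ "$datanodes" -lt "$desired" ]; then
        echo "$datanodes"
    else
        echo "$desired"
    fi
}
```

With the two slaves in this article, pick_replication 2 prints 2, which matches the value in hdfs-site.xml.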

  • Edit mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

  • Edit yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>

5. Archive everything under the hadoop user's home directory and copy it to slave1 and slave2.
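One way to carry out this step is sketched below. Everything here is an assumption rather than the author's exact procedure: distribute_hadoop, run, HADOOP_HOME_DIR, ARCHIVE, and DRY_RUN are hypothetical names, the archive location /tmp/hadoop-home.tar.gz is arbitrary, and the hadoop user is assumed to exist on the slaves with write access to the parent directory. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# run CMD ARGS...
# Hypothetical helper: with DRY_RUN=1, print the command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# distribute_hadoop NODE...
# Hypothetical helper: archive the hadoop home directory and unpack it on each node.
distribute_hadoop() {
    src=${HADOOP_HOME_DIR:-/data/hadoop}
    archive=${ARCHIVE:-/tmp/hadoop-home.tar.gz}
    parent=$(dirname "$src")
    # Archive the directory by name, relative to its parent, so that it
    # unpacks to the same absolute path on the other machines.
    run tar -czf "$archive" -C "$parent" "$(basename "$src")" || return 1
    for node in "$@"; do
        run scp "$archive" "$node:$archive" || return 1
        run ssh "$node" tar -xzf "$archive" -C "$parent" || return 1
    done
}
```

A dry run with DRY_RUN=1 distribute_hadoop slave1 slave2 shows the tar, scp, and ssh commands first. Note that archiving the whole home directory also copies the .ssh directory and anything already stored under /data/hadoop/tmp.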

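IV. Start and verify Hadoop

1. Start Hadoop

Before the very first start, format the NameNode once on master with hdfs namenode -format. Then run start-all.sh on master.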
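A quick way to confirm that the daemons came up is to look at the output of jps on each machine. The sketch below compares that output with the daemons expected for this layout: NameNode, SecondaryNameNode, and ResourceManager on master, DataNode and NodeManager on the slaves. missing_daemons is a hypothetical helper, and it assumes the usual jps format of a process id followed by a class name.

```shell
#!/bin/sh
# missing_daemons ROLE   (jps output is read from stdin)
# Hypothetical helper: print every daemon expected for ROLE that jps did not list.
missing_daemons() {
    case $1 in
        master) expected="NameNode SecondaryNameNode ResourceManager" ;;
        slave)  expected="DataNode NodeManager" ;;
        *) echo "unknown role: $1" >&2; return 2 ;;
    esac
    # Keep only the second column of the jps output (the class name).
    running=$(awk '{ print $2 }')
    status=0
    for daemon in $expected; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "missing: $daemon"
            status=1
        fi
    done
    return $status
}
```

On master the call is jps | missing_daemons master; on a slave, jps | missing_daemons slave. No output means all expected daemons are running.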

2. Check that the cluster is healthy

hdfs dfsadmin -report

[hadoop@master hadoop]$ vi yarn-site.xml
[hadoop@master hadoop]$ hdfs dfsadmin -report
Configured Capacity: 20867301376 (19.43 GB)
Present Capacity: 16041099264 (14.94 GB)
DFS Remaining: 15645147136 (14.57 GB)
DFS Used: 395952128 (377.61 MB)
DFS Used%: 2.47%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

Live datanodes (2):

Name: 10.0.0.3:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 10433650688 (9.72 GB)
DFS Used: 197976064 (188.80 MB)
Non DFS Used: 2413105152 (2.25 GB)
DFS Remaining: 7822569472 (7.29 GB)
DFS Used%: 1.90%
DFS Remaining%: 74.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 11:03:48 CST 2016

Name: 10.0.0.4:50010 (slave2)
Hostname: slave2
Decommission Status : Normal
Configured Capacity: 10433650688 (9.72 GB)
DFS Used: 197976064 (188.80 MB)
Non DFS Used: 2413096960 (2.25 GB)
DFS Remaining: 7822577664 (7.29 GB)
DFS Used%: 1.90%
DFS Remaining%: 74.97%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Jun 23 11:03:49 CST 2016

Note: this cluster has two DataNodes, so seeing both of them listed as live in the report means the cluster is working.
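The same check can be automated by pulling the live DataNode count out of the report. A minimal sketch, assuming the report contains a line of the form "Live datanodes (N):" as in the output above; live_datanodes is a hypothetical helper.

```shell
#!/bin/sh
# live_datanodes   (report text is read from stdin)
# Hypothetical helper: print N from the first "Live datanodes (N):" line.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}
```

For this cluster, hdfs dfsadmin -report | live_datanodes should print 2; any other value means a slave has not registered with the NameNode.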