Posts tagged "hdfs"

Blog Archives

Hadoop Programming: Calling HDFS

This series of articles introduces the Hadoop family of products. Frequently used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered a turbulent era of big data, and the Hadoop family of software has claimed a broad share of big-data processing. In the open-source world and among vendors alike, virtually every data product has aligned itself with Hadoop, which has gone from a niche, elite technology to the standard for big-data development. On top of Hadoop's original technology, the Hadoop family of products has emerged, continuously innovating around the "big data" concept and driving technical progress.

As developers in the IT industry, we too need to keep pace, seize the opportunity, and rise together with Hadoop!

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when republishing:
http://blog.fens.me/hadoop-hdfs-api/


Preface

HDFS, the Hadoop Distributed File System, is one of the core components of Hadoop. To run MapReduce's distributed algorithms, the data must be placed on HDFS in advance, so operating on HDFS is very important. The Hadoop command line provides a complete set of commands, as convenient to use as ordinary Linux commands.

Sometimes, however, we also need to access HDFS directly from our own programs, which we can do through the HDFS API.
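
As a minimal sketch of what "through the API" means (the class name HdfsQuickStart is made up for this example; the namenode address is the one used later in this article, so adjust it for your cluster), a standalone program can obtain a FileSystem handle and list a directory like this:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        // Connect to the namenode; adjust the address for your cluster
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.1.210:9000/"), new Configuration());
        // List the root directory as a quick smoke test
        for (FileStatus f : fs.listStatus(new Path("/"))) {
            System.out.println(f.getPath());
        }
        fs.close();
    }
}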

Table of contents

  1. System environment
  2. The ls operation
  3. The mkdir operation
  4. The rmr operation
  5. The copyFromLocal operation
  6. The cat operation
  7. The copyToLocal operation
  8. Create a new file and write content to it

1. System environment

Hadoop cluster environment

  • Linux Ubuntu 64bit Server 12.04.2 LTS
  • Java 1.6.0_29
  • Hadoop 1.1.2

How to set up the Hadoop cluster environment? See the article: Installing a Historical Version of Hadoop.

Development environment

  • Win7 64bit
  • Java 1.6.0_45
  • Maven 3
  • Hadoop 1.1.2
  • Eclipse Juno Service Release 2

How to set up a Hadoop development environment on Win7 with Maven? See the article: Building a Hadoop Project with Maven.

Note: hadoop-core-1.1.2.jar has been recompiled to solve the problem of calling Hadoop remotely from Windows; see the article: Installing a Historical Version of Hadoop.
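
In that Maven setup the project references hadoop-core by its usual coordinates. Assuming the recompiled jar is installed into the local repository under the same coordinates (an assumption, not something stated here), the dependency entry might look like this:

<!-- Hypothetical pom.xml entry; assumes the recompiled hadoop-core-1.1.2.jar
     is installed locally under the standard org.apache.hadoop coordinates -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.1.2</version>
</dependency>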

The Hadoop command line: java FsShell

~ hadoop fs

Usage: java FsShell
           [-ls <path>]
           [-lsr <path>]
           [-du <path>]
           [-dus <path>]
           [-count[-q] <path>]
           [-mv <src> <dst>]
           [-cp <src> <dst>]
           [-rm [-skipTrash] <path>]
           [-rmr [-skipTrash] <path>]
           [-expunge]
           [-put <localsrc> ... <dst>]
           [-copyFromLocal <localsrc> ... <dst>]
           [-moveFromLocal <localsrc> ... <dst>]
           [-get [-ignoreCrc] [-crc] <src> <localdst>]
           [-getmerge <src> <localdst> [addnl]]
           [-cat <src>]
           [-text <src>]
           [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
           [-moveToLocal [-crc] <src> <localdst>]
           [-mkdir <path>]
           [-setrep [-R] [-w] <rep> <path/file>]
           [-touchz <path>]
           [-test -[ezd] <path>]
           [-stat [format] <path>]
           [-tail [-f] <file>]
           [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
           [-chown [-R] [OWNER][:[GROUP]] PATH...]
           [-chgrp [-R] GROUP PATH...]
           [-help [cmd]]

The listing above contains about 30 commands; I have implemented only a subset of the HDFS commands here!

Create a new file, HdfsDAO.java, which wraps calls to the HDFS API.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDAO {

    // HDFS access address (namenode URI)
    private static final String HDFS = "hdfs://192.168.1.210:9000/";

    public HdfsDAO(Configuration conf) {
        this(HDFS, conf);
    }

    public HdfsDAO(String hdfs, Configuration conf) {
        this.hdfsPath = hdfs;
        this.conf = conf;
    }

    // HDFS path
    private String hdfsPath;
    // Hadoop configuration
    private Configuration conf;

    // entry point
    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.mkdirs("/tmp/new/two");
        hdfs.ls("/tmp/new");
    }

    // load the Hadoop configuration files
    public static JobConf config(){
        JobConf conf = new JobConf(HdfsDAO.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }

    // API implementations
    public void cat(String remoteFile) throws IOException {...}
    public void mkdirs(String folder) throws IOException {...}

    ...
}
          

2. The ls operation

Description: list the files in a directory.

Corresponding Hadoop command:

~ hadoop fs -ls /
Found 3 items
drwxr-xr-x   - conan         supergroup          0 2013-10-03 05:03 /home
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 13:49 /tmp
drwxr-xr-x   - conan         supergroup          0 2013-10-03 09:11 /user

Java code:

          
    public void ls(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FileStatus[] list = fs.listStatus(path);
        System.out.println("ls: " + folder);
        System.out.println("==========================================================");
        for (FileStatus f : list) {
            System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
        }
        System.out.println("==========================================================");
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.ls("/");
    }
          

Console output:

ls: /
==========================================================
name: hdfs://192.168.1.210:9000/home, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp, folder: true, size: 0
name: hdfs://192.168.1.210:9000/user, folder: true, size: 0
==========================================================
          

3. The mkdir operation

Description: create a directory; nested (multi-level) directories can be created in one call.

Corresponding Hadoop command:

~ hadoop fs -mkdir /tmp/new/one
~ hadoop fs -ls /tmp/new
Found 1 items
drwxr-xr-x   - conan supergroup          0 2013-10-03 15:35 /tmp/new/one

Java code:

          
    public void mkdirs(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
            System.out.println("Create: " + folder);
        }
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.mkdirs("/tmp/new/two");
        hdfs.ls("/tmp/new");
    }
          

Console output:

Create: /tmp/new/two
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/one, folder: true, size: 0
name: hdfs://192.168.1.210:9000/tmp/new/two, folder: true, size: 0
==========================================================
          

4. The rmr operation

Description: delete directories and files recursively.

Corresponding Hadoop command:

~ hadoop fs -rmr /tmp/new/one
Deleted hdfs://master:9000/tmp/new/one

~  hadoop fs -ls /tmp/new
Found 1 items
drwxr-xr-x   - Administrator supergroup          0 2013-10-03 15:38 /tmp/new/two

Java code:

          
    public void rmr(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        // deleteOnExit marks the path; it is actually removed when fs.close() runs below
        fs.deleteOnExit(path);
        System.out.println("Delete: " + folder);
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.rmr("/tmp/new/two");
        hdfs.ls("/tmp/new");
    }
          

Console output:

Delete: /tmp/new/two
ls: /tmp/new
==========================================================
==========================================================
          

5. The copyFromLocal operation

Description: copy a file from the local filesystem to HDFS.

Corresponding Hadoop command:

~ hadoop fs -copyFromLocal /home/conan/datafiles/item.csv /tmp/new/

~ hadoop fs -ls /tmp/new/
Found 1 items
-rw-r--r--   1 conan supergroup        210 2013-10-03 16:07 /tmp/new/item.csv

Java code:

          
    public void copyFile(String local, String remote) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyFromLocalFile(new Path(local), new Path(remote));
        System.out.println("copy from: " + local + " to " + remote);
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.copyFile("datafile/randomData.csv", "/tmp/new");
        hdfs.ls("/tmp/new");
    }
          

Console output:

copy from: datafile/randomData.csv to /tmp/new
ls: /tmp/new
==========================================================
name: hdfs://192.168.1.210:9000/tmp/new/item.csv, folder: false, size: 210
name: hdfs://192.168.1.210:9000/tmp/new/randomData.csv, folder: false, size: 36655
==========================================================
          

6. The cat operation

Description: view the contents of a file.

Corresponding Hadoop command:

~ hadoop fs -cat /tmp/new/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

Java code:

          
    public void cat(String remoteFile) throws IOException {
        Path path = new Path(remoteFile);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FSDataInputStream fsdis = null;
        System.out.println("cat: " + remoteFile);
        try {
            fsdis = fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(fsdis);
            fs.close();
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.cat("/tmp/new/item.csv");
    }
          

Console output:

cat: /tmp/new/item.csv
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
          

7. The copyToLocal operation

Description: copy a file from HDFS to the local filesystem.

Corresponding Hadoop command:

~ hadoop fs -copyToLocal /tmp/new/item.csv /home/conan/datafiles/tmp/

~ ls -l /home/conan/datafiles/tmp/
-rw-rw-r-- 1 conan conan 210 Oct  3 16:16 item.csv

Java code:

          
    public void download(String remote, String local) throws IOException {
        Path path = new Path(remote);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyToLocalFile(path, new Path(local));
        System.out.println("download: from" + remote + " to " + local);
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.download("/tmp/new/item.csv", "datafile/download");

        File f = new File("datafile/download/item.csv");
        System.out.println(f.getAbsolutePath());
    }
          

Console output:

2013-10-12 17:17:32 org.apache.hadoop.util.NativeCodeLoader
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
download: from/tmp/new/item.csv to datafile/download
D:\workspace\java\myMahout\datafile\download\item.csv
          

8. Create a new file and write content to it

Description: create a new file and write content into it.

• touchz: creates an empty file, or updates the timestamp of an existing file.
• There is no shell command for writing content into a file.

Corresponding Hadoop command:

~ hadoop fs -touchz /tmp/new/empty

~ hadoop fs -ls /tmp/new
Found 3 items
-rw-r--r--   1 conan         supergroup          0 2013-10-03 16:24 /tmp/new/empty
-rw-r--r--   1 conan         supergroup        210 2013-10-03 16:07 /tmp/new/item.csv
-rw-r--r--   3 Administrator supergroup      36655 2013-10-03 16:09 /tmp/new/randomData.csv

~ hadoop fs -cat /tmp/new/empty

Java code:

          
    public void createFile(String file, String content) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        byte[] buff = content.getBytes();
        FSDataOutputStream os = null;
        try {
            os = fs.create(new Path(file));
            os.write(buff, 0, buff.length);
            System.out.println("Create: " + file);
        } finally {
            if (os != null)
                os.close();
        }
        fs.close();
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.createFile("/tmp/new/text", "Hello world!!");
        hdfs.cat("/tmp/new/text");
    }
          

Console output:

Create: /tmp/new/text
cat: /tmp/new/text
Hello world!!
          

The complete file: HdfsDAO.java
https://github.com/bsspirit/maven_mahout_template/blob/mahout-0.8/src/main/java/org/conan/mymahout/hdfs/HdfsDAO.java

Please credit the source when republishing:
http://blog.fens.me/hadoop-hdfs-api/


Installing a Historical Version of Hadoop

This series of articles introduces the Hadoop family of products. Frequently used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, Hue, and others.

Since 2011, China has entered a turbulent era of big data, and the Hadoop family of software has claimed a broad share of big-data processing. In the open-source world and among vendors alike, virtually every data product has aligned itself with Hadoop, which has gone from a niche, elite technology to the standard for big-data development. On top of Hadoop's original technology, the Hadoop family of products has emerged, continuously innovating around the "big data" concept and driving technical progress.

As developers in the IT industry, we too need to keep pace, seize the opportunity, and rise together with Hadoop!

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when republishing:
http://blog.fens.me/hadoop-history-source-install/


Preface

Two articles about installing Hadoop have already been written, so this is a well-worn topic brought up once again. This time I need to reinstall the historical version Hadoop-1.1.2 to satisfy the dependency requirements of Mahout-0.8. I had intended to say only a few words about it, but I ran into a few small problems along the way, so I am writing this article to sum them up.

Other articles about installing Hadoop:

Table of contents

  1. Find the historical Hadoop release
  2. Build the Hadoop environment from source
  3. A quick Hadoop environment configuration script
  4. Compile hadoop-core.jar for the Windows environment

1. Find the historical Hadoop release

The Hadoop version I need here is 1.1.2. Open the Hadoop download page:

http://www.apache.org/dyn/closer.cgi/hadoop/common/

No matter which download mirror you open, version 1.1.2 is nowhere to be found.

[Figure: hadoop-download (screenshot of a download mirror)]

Let's see what is available on GitHub.

[Figure: hadoop-github (screenshot of the Apache Hadoop repository on GitHub)]

Find the tag release-1.1.2:

https://github.com/apache/hadoop-common/releases/tag/release-1.1.2

Download the source code:

~ wget https://github.com/apache/hadoop-common/archive/release-1.1.2.tar.gz

You can also browse branch-1.1:

https://github.com/apache/hadoop-common/tree/branch-1.1

2. Build the Hadoop environment from source

Note: the latest hadoop-3.0.0-SNAPSHOT on GitHub has a completely different package layout, and the Ant+Ivy build has been replaced by Maven.

Linux system environment:

  • Linux Ubuntu 64bit Server 12.04.2 LTS
  • Java 1.6.0_29
  • Ant 1.8.4

Install Hadoop


~ tar xvf release-1.1.2.tar.gz
~ mkdir /home/conan/hadoop/
~ mv hadoop-common-release-1.1.2 /home/conan/hadoop/
~ cd /home/conan/hadoop/
~ mv hadoop-common-release-1.1.2/ hadoop-1.1.2

Inspect the Hadoop directory:


~ ls -l
total 648
drwxrwxr-x  2 conan conan   4096 Mar  6  2013 bin
-rw-rw-r--  1 conan conan 120025 Mar  6  2013 build.xml
-rw-rw-r--  1 conan conan 467130 Mar  6  2013 CHANGES.txt
drwxrwxr-x  2 conan conan   4096 Oct  3 02:31 conf
drwxrwxr-x  2 conan conan   4096 Oct  3 02:28 ivy
-rw-rw-r--  1 conan conan  10525 Mar  6  2013 ivy.xml
drwxrwxr-x  4 conan conan   4096 Mar  6  2013 lib
-rw-rw-r--  1 conan conan  13366 Mar  6  2013 LICENSE.txt
drwxrwxr-x  2 conan conan   4096 Oct  3 03:35 logs
-rw-rw-r--  1 conan conan    101 Mar  6  2013 NOTICE.txt
-rw-rw-r--  1 conan conan   1366 Mar  6  2013 README.txt
-rw-rw-r--  1 conan conan   7815 Mar  6  2013 sample-conf.tgz
drwxrwxr-x 16 conan conan   4096 Mar  6  2013 src

In the root directory there are none of the hadoop-*.jar libraries, and no dependency libraries either.

Build with Ant

Download Ant from the official Ant website: http://ant.apache.org/bindownload.cgi


~ wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.4-bin.tar.gz
~ tar xvf apache-ant-1.8.4-bin.tar.gz
~ mkdir /home/conan/toolkit/
~ mv apache-ant-1.8.4 /home/conan/toolkit/
~ cd /home/conan/toolkit/
~ mv apache-ant-1.8.4 ant184

Add Ant to the environment variables


# Edit the environment file
~  sudo vi /etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/conan/toolkit/jdk16/bin:/home/conan/toolkit/ant184/bin"

JAVA_HOME=/home/conan/toolkit/jdk16
ANT_HOME=/home/conan/toolkit/ant184

# Apply the environment variables
~ . /etc/environment

# Run ant to check that the environment variables are configured correctly
~ ant
Buildfile: build.xml does not exist!
Build failed

Build Hadoop with Ant


# Install libraries needed for the build
~ sudo apt-get install autoconf
~ sudo apt-get install libtool

# Build Hadoop with Ant
~ cd /home/conan/hadoop/hadoop-1.1.2/
~ ant

Wait a few minutes, and then the build succeeds.

Inspect the generated build directory:


~ ls -l build
drwxrwxr-x  3 conan conan    4096 Oct  3 04:06 ant
drwxrwxr-x  2 conan conan    4096 Oct  3 04:02 c++
drwxrwxr-x  3 conan conan    4096 Oct  3 04:05 classes
drwxrwxr-x 13 conan conan    4096 Oct  3 04:06 contrib
drwxrwxr-x  2 conan conan    4096 Oct  3 04:05 empty
drwxrwxr-x  2 conan conan    4096 Oct  3 04:02 examples
-rw-rw-r--  1 conan conan     423 Oct  3 04:05 hadoop-client-1.1.3-SNAPSHOT.jar
-rw-rw-r--  1 conan conan 4035744 Oct  3 04:05 hadoop-core-1.1.3-SNAPSHOT.jar
-rw-rw-r--  1 conan conan     426 Oct  3 04:05 hadoop-minicluster-1.1.3-SNAPSHOT.jar
-rw-rw-r--  1 conan conan  306827 Oct  3 04:05 hadoop-tools-1.1.3-SNAPSHOT.jar
drwxrwxr-x  4 conan conan    4096 Oct  3 04:02 ivy
drwxrwxr-x  3 conan conan    4096 Oct  3 04:02 src
drwxrwxr-x  6 conan conan    4096 Oct  3 04:02 test
drwxrwxr-x  3 conan conan    4096 Oct  3 04:05 tools
drwxrwxr-x  9 conan conan    4096 Oct  3 04:02 webapps

Notice that all of the hadoop-*.jar files end in hadoop-*-1.1.3-SNAPSHOT.jar. The reasonable explanation is that one version's release becomes the next version's SNAPSHOT.

Let's edit line 31 of build.xml to change the artifact name to hadoop-*-1.1.2.jar, then run ant again; a sketch of the change is shown below.
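
This is a one-line edit to the version property. I have not reproduced the surrounding build file here, so verify the exact contents against your own checkout of build.xml; roughly, the change looks like this:

<!-- build.xml, around line 31 (sketch): change the version property -->
<!-- before: <property name="version" value="1.1.3-SNAPSHOT"/> -->
<property name="version" value="1.1.2"/>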


~ vi build.xml

~ rm -rf build
~ ant

Copy the generated hadoop-*-1.1.2.jar files to the root directory:


~ cp build/*.jar .
~ ls -l
drwxrwxr-x  2 conan conan    4096 Oct  3 04:24 bin
drwxrwxr-x 13 conan conan    4096 Oct  3 04:21 build
-rw-rw-r--  1 conan conan  120016 Oct  3 04:20 build.xml
-rw-rw-r--  1 conan conan  467130 Mar  6  2013 CHANGES.txt
drwxrwxr-x  2 conan conan    4096 Oct  3 02:31 conf
-rw-rw-r--  1 conan conan     414 Oct  3 04:24 hadoop-client-1.1.2.jar
-rw-rw-r--  1 conan conan 4035726 Oct  3 04:24 hadoop-core-1.1.2.jar
-rw-rw-r--  1 conan conan     417 Oct  3 04:24 hadoop-minicluster-1.1.2.jar
-rw-rw-r--  1 conan conan  306827 Oct  3 04:24 hadoop-tools-1.1.2.jar
drwxrwxr-x  2 conan conan    4096 Oct  3 02:28 ivy
-rw-rw-r--  1 conan conan   10525 Mar  6  2013 ivy.xml
drwxrwxr-x  4 conan conan    4096 Mar  6  2013 lib
-rw-rw-r--  1 conan conan   13366 Mar  6  2013 LICENSE.txt
drwxrwxr-x  3 conan conan    4096 Oct  3 04:28 logs
-rw-rw-r--  1 conan conan     101 Mar  6  2013 NOTICE.txt
-rw-rw-r--  1 conan conan    1366 Mar  6  2013 README.txt
-rw-rw-r--  1 conan conan    7815 Mar  6  2013 sample-conf.tgz
drwxrwxr-x 16 conan conan    4096 Mar  6  2013 src

With that, the Hadoop environment is ready, and it is the same as the binary distribution we used to download.

3. A quick Hadoop environment configuration script

  • 1. Configure the environment variables
  • 2. Hadoop's three configuration files
  • 3. Create the Hadoop directories
  • 4. Configure hostname and hosts
  • 5. Set up passwordless SSH login
  • 6. Format HDFS before the first start
  • 7. Start the Hadoop services

The commands below put all of these steps together as one script.


~ sudo vi /etc/environment
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/conan/toolkit/jdk16/bin:/home/conan/toolkit/ant184/bin:/home/conan/toolkit/maven3/bin:/home/conan/toolkit/tomcat7/bin:/home/conan/hadoop/hadoop-1.1.2/bin"
JAVA_HOME=/home/conan/toolkit/jdk16
ANT_HOME=/home/conan/toolkit/ant184
MAVEN_HOME=/home/conan/toolkit/maven3
NUTCH_HOME=/home/conan/toolkit/nutch16
TOMCAT_HOME=/home/conan/toolkit/tomcat7
HADOOP_HOME=/home/conan/hadoop/hadoop-1.1.2
HADOOP_CMD=/home/conan/hadoop/hadoop-1.1.2/bin/hadoop
HADOOP_STREAMING=/home/conan/hadoop/hadoop-1.1.2/contrib/streaming/hadoop-streaming-1.1.2.jar

~ . /etc/environment

~ vi conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/conan/hadoop/tmp</value>
</property>
<property>
<name>io.sort.mb</name>
<value>256</value>
</property>
</configuration>

~ vi conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.data.dir</name>
<value>/home/conan/hadoop/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>

~ vi conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://master:9001</value>
</property>
</configuration>

~ mkdir /home/conan/hadoop/data
~ mkdir /home/conan/hadoop/tmp
~ sudo chmod 755 /home/conan/hadoop/data/
~ sudo chmod 755 /home/conan/hadoop/tmp/

~ sudo hostname master
~ sudo vi /etc/hosts
192.168.1.210   master
127.0.0.1       localhost

~ ssh-keygen -t rsa
~ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

~ bin/hadoop namenode -format
~ bin/start-all.sh

Check that Hadoop is running:


~ jps
15574 DataNode
16324 Jps
15858 SecondaryNameNode
16241 TaskTracker
15283 NameNode
15942 JobTracker

~ bin/hadoop dfsadmin -report
Configured Capacity: 18751434752 (17.46 GB)
Present Capacity: 14577520655 (13.58 GB)
DFS Remaining: 14577491968 (13.58 GB)
DFS Used: 28687 (28.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.1.210:50010
Decommission Status : Normal
Configured Capacity: 18751434752 (17.46 GB)
DFS Used: 28687 (28.01 KB)
Non DFS Used: 4173914097 (3.89 GB)
DFS Remaining: 14577491968(13.58 GB)
DFS Used%: 0%
DFS Remaining%: 77.74%
Last contact: Thu Oct 03 05:03:50 CST 2013

4. Compile hadoop-core.jar for the Windows environment

When developing in a Windows environment you will often run into an exception thrown by the file-permission check. We need to modify FileUtil.java and comment out lines 688–693.

[Figure: hadoop-win-code (screenshot of the FileUtil.java change)]


~ vi src/core/org/apache/hadoop/fs/FileUtil.java

685 private static void checkReturnValue(boolean rv, File p,
686                                        FsPermission permission
687                                        ) throws IOException {
688   /*  if (!rv) {
689       throw new IOException("Failed to set permissions of path: " + p +
690                             " to " +
691                             String.format("%04o", permission.toShort()));
692     }
693   */
694   }

After repackaging with ant, we have a hadoop-core-1.1.2.jar that can run on Windows! For how to use it, see the article: Building a Hadoop Project with Maven; we will keep using it in the later posts.
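
One way to make the recompiled jar visible to that Maven build (a sketch, not a step from the original article) is to install it into the local Maven repository by hand:

# Hypothetical step: register the rebuilt jar under the usual hadoop-core coordinates
~ mvn install:install-file -Dfile=hadoop-core-1.1.2.jar \
    -DgroupId=org.apache.hadoop -DartifactId=hadoop-core \
    -Dversion=1.1.2 -Dpackaging=jar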

Please credit the source when republishing:
http://blog.fens.me/hadoop-history-source-install/
