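Analyzing Massive Web Logs: Extracting KPI Metrics with Hadoop

The Hadoop Family series introduces the products of the Hadoop family. Commonly used projects include Hadoop, Hive, Pig, HBase, Sqoop, Mahout, Zookeeper, Avro, Ambari, and Chukwa; newer additions include YARN, Hcatalog, Oozie, Cassandra, Hama, Whirr, Flume, Bigtop, Crunch, and Hue.

Since 2011, China has entered a turbulent era of big data, and the software family represented by Hadoop has claimed a vast share of the big data processing landscape. Across the open source world and among vendors, virtually all data software is moving toward Hadoop. Hadoop itself has grown from a niche, elite field into the standard for big data development. On top of the original Hadoop technology, a family of Hadoop products has emerged, innovating continuously around the "big data" concept and driving technological progress.

As developers in the IT industry, we too need to keep pace, seize the opportunity, and rise together with Hadoop!

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when reposting: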
http://blog.fens.me/hadoop-mapreduce-log-kpi/

hadoop-kpi-logo

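Preface

Web logs contain some of a website's most important information. Through log analysis we can learn the site's traffic, which pages attract the most visitors, which pages are most valuable, and so on. A typical medium-sized site (more than 100,000 PVs per day) produces over 1 GB of web log files each day. A large or very large site may produce 10 GB of data every hour.

For data at this scale, Hadoop is an excellent fit for log analysis.

Contents

  1. Web Log Analysis Overview
  2. Requirements Analysis: KPI Metric Design
  3. Algorithm Model: Hadoop Parallel Algorithms
  4. Architecture Design: Log KPI System Architecture
  5. Development 1: Building a Hadoop Project with Maven
  6. Development 2: MapReduce Implementation

1. Web Log Analysis Overview

Web logs are produced by the web server, which may be Nginx, Apache, Tomcat, and so on. From them we can obtain the PV (PageView) count and the number of unique IPs for each type of page; with a little more work, we can compute rankings of the keywords users search for, or the pages where users stay longest; going further, we can build ad click models, analyze user behavior patterns, and more.

In a web log, each entry usually represents one access by a user. For example, here is one nginx log entry: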


222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "GET /images/my.jpg HTTP/1.1" 200 19939
 "http://www.angularjs.cn/A00n" "Mozilla/5.0 (Windows NT 6.1)
 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"

It breaks down into the following 8 variables:

  • remote_addr: the client's IP address, 222.68.172.190
  • remote_user: the client's user name, -
  • time_local: the access time and time zone, [18/Sep/2013:06:49:57 +0000]
  • request: the requested URL and HTTP protocol, "GET /images/my.jpg HTTP/1.1"
  • status: the response status (200 means success), 200
  • body_bytes_sent: the size of the response body sent to the client, 19939
  • http_referer: the page from which the request was linked, "http://www.angularjs.cn/A00n"
  • http_user_agent: information about the client's browser, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
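As a rough illustration of the breakdown above, the 8 variables can be pulled out of a log line with a single regular expression. This is only a sketch and not the parser used later in the article: the class name LogLineSketch and the pattern are assumptions, and the pattern only covers lines laid out exactly like the sample (Java 8 or later assumed).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's source code.
public class LogLineSketch {
    // One capture group per variable, in the order they appear in the sample line.
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) - (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"$");

    private static final String[] NAMES = {
        "remote_addr", "remote_user", "time_local", "request",
        "status", "body_bytes_sent", "http_referer", "http_user_agent"};

    /** Returns the 8 fields keyed by variable name, or null when the line does not match. */
    public static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], m.group(i + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "
            + "\"GET /images/my.jpg HTTP/1.1\" 200 19939 "
            + "\"http://www.angularjs.cn/A00n\" "
            + "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/29.0.1547.66 Safari/537.36\"";
        // Print each variable name with the value captured for it.
        for (Map.Entry<String, String> e : parse(line).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Because the quoted fields are matched as a whole, the request and the user agent keep their embedded spaces, which a plain split on spaces would break apart.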

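Note: to collect more information than this, other means are needed, such as sending separate requests from JavaScript code and using cookies to record users' visits.

With this log information, we can dig into the site's secrets.

Small data volumes

When the data volume is small (10 MB, 100 MB, 10 GB) and single-machine processing is still tolerable, we can use the various Unix/Linux tools directly. awk, grep, sort, and join are all powerful tools for log analysis; combined with Perl, Python, and regular expressions, they can solve essentially every problem.

For example, getting the 10 most active IPs from the nginx log above is simple. Note that sort -k2 -r compares the counts as text rather than as numbers, which is why every count in the output below begins with 9; use sort -k2 -nr to rank numerically.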


~ cat access.log.10 | awk '{a[$1]++} END {for(b in a) print b"\t"a[b]}' | sort -k2 -r | head -n 10
163.177.71.12   972
101.226.68.137  972
183.195.232.138 971
50.116.27.194   97
14.17.29.86     96
61.135.216.104  94
61.135.216.105  91
61.186.190.41   9
59.39.192.108   9
220.181.51.212  9
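For comparison, the same per-IP counting can be sketched in plain Java, where the counts are integers and therefore sort numerically. The class name TopIpSketch is made up for illustration, and the sketch assumes the IP is the first space-separated field of each line, as in the nginx format shown earlier (Java 8 or later assumed).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the article's source code.
public class TopIpSketch {
    /** Counts requests per client IP (the first field) and returns the n busiest IPs. */
    public static List<Map.Entry<String, Integer>> topIps(List<String> lines, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String ip = trimmed.split(" ", 2)[0];
            counts.merge(ip, 1, Integer::sum);
        }
        // Sort by count as a number, largest first, and keep the first n entries.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "1.1.1.1 - - [18/Sep/2013:06:49:57 +0000] ...",
            "2.2.2.2 - - [18/Sep/2013:06:49:58 +0000] ...",
            "1.1.1.1 - - [18/Sep/2013:06:49:59 +0000] ...");
        for (Map.Entry<String, Integer> e : topIps(lines, 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

This only works while the whole log fits comfortably on one machine, which is exactly the limit the next section addresses.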

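Massive data volumes

When data grows by 10 GB or 100 GB a day, a single machine can no longer keep up. We then have to accept more system complexity and turn to computer clusters and storage arrays. Before Hadoop, both storing massive data and analyzing massive logs were very difficult; only a handful of companies had mastered the core technologies of efficient parallel computing, distributed computing, and distributed storage.

Hadoop dramatically lowered the barrier to processing massive data, putting it within reach of small companies and even individuals. Hadoop is also very well suited to log analysis systems.

2. Requirements Analysis: KPI Metric Design

Below, we start from a company case study to explain in full how to analyze massive web logs and extract KPI data.

Case study
An e-commerce site runs an online group-buying business. It gets 1,000,000 PVs and 50,000 unique IPs per day. Traffic usually peaks on workdays from 10:00 to 12:00 and from 15:00 to 18:00. During the day users mostly visit through PC browsers; on days off and at night, mobile devices account for more visits. Search traffic makes up 80% of the site's total. Fewer than 1% of PC users make a purchase, while 5% of mobile users do.

From this short description we can get a rough picture of how the site's business is doing, and recognize where paying users come from, which potential users could be tapped, whether the site is at risk of going under, and so on.

KPI metric design

  • PV (PageView): page view counts
  • IP: unique IP counts per page
  • Time: PV counts per hour
  • Source: counts by referring domain
  • Browser: counts by visitor device

Note: because of commercial confidentiality, logs from the e-commerce site cannot be provided.
The rest of this article uses data extracted from my personal website for the analysis.

Baidu Tongji statistics for my personal website, http://www.fens.me

Basic statistics:
hadoop-kpi-baidu

Visitor device statistics:
hadoop-kpi-baidu2

From a business perspective, a personal site differs from an e-commerce site: there is no conversion rate, and the bounce rate is relatively high. From a technical perspective, both care equally about KPI metric design.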

3. Algorithm Model: Hadoop Parallel Algorithms

hadoop-kpi-log

Parallel algorithm design:
Note: the variables below are the 8 defined in section 1.

PV (PageView): page view counts

  • Map: {key: $request, value: 1}
  • Reduce: {key: $request, value: sum}

IP: unique IP counts per page

  • Map: {key: $request, value: $remote_addr}
  • Reduce: {key: $request, value: count after deduplication (sum(unique))}

Time: PV counts per hour

  • Map: {key: $time_local (truncated to the hour), value: 1}
  • Reduce: {key: $time_local, value: sum}

Source: counts by referring domain

  • Map: {key: $http_referer (domain part), value: 1}
  • Reduce: {key: $http_referer, value: sum}

Browser: counts by visitor device

  • Map: {key: $http_user_agent, value: 1}
  • Reduce: {key: $http_user_agent, value: sum}
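The map and reduce pairs above can be tried out in memory before involving Hadoop. The following is only a sketch of the PV and IP models under simplifying assumptions: KpiModelSketch is a made-up class, each record is a two-element array of {remote_addr, request}, and grouping by key stands in for Hadoop's shuffle phase (Java 8 or later assumed).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory model, not part of the article's source code.
public class KpiModelSketch {
    /** PV: the "map" step emits (request, 1); the "reduce" step sums the ones per request. */
    public static Map<String, Integer> pv(List<String[]> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String[] r : records) {
            sums.merge(r[1], 1, Integer::sum);
        }
        return sums;
    }

    /** IP: the "map" step emits (request, remote_addr); the "reduce" step dedupes, then counts. */
    public static Map<String, Integer> uniqueIp(List<String[]> records) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String[] r : records) {
            grouped.computeIfAbsent(r[1], k -> new HashSet<>()).add(r[0]);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : grouped.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each record is {remote_addr, request}.
        List<String[]> records = Arrays.asList(
            new String[] {"1.1.1.1", "/about"},
            new String[] {"1.1.1.1", "/about"},
            new String[] {"2.2.2.2", "/about"},
            new String[] {"1.1.1.1", "/hadoop-hive-intro/"});
        System.out.println("PV: " + pv(records));
        System.out.println("IP: " + uniqueIp(records));
    }
}
```

The Time, Source, and Browser models have the same shape as pv; only the field used as the key changes.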

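4. Architecture Design: Log KPI System Architecture

hadoop-kpi-architect

In the figure above, the left side is the application (business) system and the right side is Hadoop's HDFS and MapReduce.

  1. Logs are generated by the business system. We can configure the web server to create a new directory each day, containing multiple log files of 64 MB each.
  2. Set up a cron job that, after midnight, imports the previous day's log files into HDFS.
  3. Once the import completes, a scheduled job starts the MapReduce program to extract and compute the metrics.
  4. Once the computation completes, a scheduled job exports the metric data from HDFS to a database, for convenient ad hoc querying later.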
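Step 2 of the list above can be sketched as a small helper that a nightly scheduled job might call to work out which directory to import. Everything here is an assumption made for illustration: the class name DailyImportSketch, the yyyyMMdd directory naming, and the example paths are not taken from the article; only the hadoop fs -copyFromLocal command itself appears later in the text (Java 8 or later assumed).

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the article's source code.
public class DailyImportSketch {
    /**
     * Builds the command that copies yesterday's log directory into HDFS.
     * Assumes one directory per day, named yyyyMMdd, under localLogRoot.
     */
    public static List<String> importCommand(LocalDate today, String localLogRoot, String hdfsRoot) {
        String day = today.minusDays(1).format(DateTimeFormatter.BASIC_ISO_DATE); // yyyyMMdd
        return Arrays.asList(
            "hadoop", "fs", "-copyFromLocal",
            localLogRoot + "/" + day,
            hdfsRoot + "/" + day);
    }

    public static void main(String[] args) {
        // The paths below are placeholders.
        List<String> cmd = importCommand(LocalDate.now(), "/var/log/nginx", "/user/hdfs/log_kpi");
        System.out.println(String.join(" ", cmd));
        // A real job could hand cmd to ProcessBuilder and check the exit code
        // before starting the MapReduce step.
    }
}
```

Passing the current date in as a parameter keeps the helper easy to test and makes it possible to re-run the import for an earlier day.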

hadoop-kpi-process

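The figure above shows the data flow more clearly. The parts with a blue background live in Hadoop; our next task is to implement the MapReduce program.

5. Development 1: Building a Hadoop Project with Maven

See the article: Building a Hadoop Project with Maven

The Win7 development environment and the Hadoop runtime environment were both covered in that article.

We need to upload the log file to the /user/hdfs/log_kpi/ directory in HDFS, using the commands below: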


~ hadoop fs -mkdir /user/hdfs/log_kpi
~ hadoop fs -copyFromLocal /home/conan/datafiles/access.log.10 /user/hdfs/log_kpi/

I have put the entire MapReduce implementation on GitHub:

https://github.com/bsspirit/maven_hadoop_template/releases/tag/kpi_v1

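6. Development 2: MapReduce Implementation

Development steps:

  1. Parsing log lines
  2. Implementing the Map function
  3. Implementing the Reduce function
  4. Implementing the job launcher

1). Parsing log lines
Create the file: org.conan.myhadoop.mr.kpi.KPI.java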


package org.conan.myhadoop.mr.kpi;

import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

/*
 * KPI Object
 */
public class KPI {
    private String remote_addr;// client IP address
    private String remote_user;// client user name ("-" is ignored)
    private String time_local;// access time and time zone
    private String request;// requested URL and HTTP protocol
    private String status;// response status; 200 means success
    private String body_bytes_sent;// size of the response body sent to the client
    private String http_referer;// page from which the request was linked
    private String http_user_agent;// information about the client's browser

    private boolean valid = true;// whether the record is valid
    
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        sb.append("valid:" + this.valid);
        sb.append("\nremote_addr:" + this.remote_addr);
        sb.append("\nremote_user:" + this.remote_user);
        sb.append("\ntime_local:" + this.time_local);
        sb.append("\nrequest:" + this.request);
        sb.append("\nstatus:" + this.status);
        sb.append("\nbody_bytes_sent:" + this.body_bytes_sent);
        sb.append("\nhttp_referer:" + this.http_referer);
        sb.append("\nhttp_user_agent:" + this.http_user_agent);
        return sb.toString();
    }

    public String getRemote_addr() {
        return remote_addr;
    }

    public void setRemote_addr(String remote_addr) {
        this.remote_addr = remote_addr;
    }

    public String getRemote_user() {
        return remote_user;
    }

    public void setRemote_user(String remote_user) {
        this.remote_user = remote_user;
    }

    public String getTime_local() {
        return time_local;
    }

    public Date getTime_local_Date() throws ParseException {
        SimpleDateFormat df = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.US);
        return df.parse(this.time_local);
    }
    
    public String getTime_local_Date_hour() throws ParseException{
        SimpleDateFormat df = new SimpleDateFormat("yyyyMMddHH");
        return df.format(this.getTime_local_Date());
    }

    public void setTime_local(String time_local) {
        this.time_local = time_local;
    }

    public String getRequest() {
        return request;
    }

    public void setRequest(String request) {
        this.request = request;
    }

    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    public String getBody_bytes_sent() {
        return body_bytes_sent;
    }

    public void setBody_bytes_sent(String body_bytes_sent) {
        this.body_bytes_sent = body_bytes_sent;
    }

    public String getHttp_referer() {
        return http_referer;
    }
    
    public String getHttp_referer_domain(){
        if(http_referer.length()<8){ 
            return http_referer;
        }
        
        String str=this.http_referer.replace("\"", "").replace("http://", "").replace("https://", "");
        return str.indexOf("/")>0?str.substring(0, str.indexOf("/")):str;
    }

    public void setHttp_referer(String http_referer) {
        this.http_referer = http_referer;
    }

    public String getHttp_user_agent() {
        return http_user_agent;
    }

    public void setHttp_user_agent(String http_user_agent) {
        this.http_user_agent = http_user_agent;
    }

    public boolean isValid() {
        return valid;
    }

    public void setValid(boolean valid) {
        this.valid = valid;
    }

    public static void main(String args[]) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] \"GET /images/my.jpg HTTP/1.1\" 200 19939 \"http://www.angularjs.cn/A00n\" \"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36\"";
        System.out.println(line);
        KPI kpi = new KPI();
        String[] arr = line.split(" ");

        kpi.setRemote_addr(arr[0]);
        kpi.setRemote_user(arr[1]);
        kpi.setTime_local(arr[3].substring(1));
        kpi.setRequest(arr[6]);
        kpi.setStatus(arr[8]);
        kpi.setBody_bytes_sent(arr[9]);
        kpi.setHttp_referer(arr[10]);
        kpi.setHttp_user_agent(arr[11] + " " + arr[12]);
        System.out.println(kpi);

        try {
            SimpleDateFormat df = new SimpleDateFormat("yyyy.MM.dd:HH:mm:ss", Locale.US);
            System.out.println(df.format(kpi.getTime_local_Date()));
            System.out.println(kpi.getTime_local_Date_hour());
            System.out.println(kpi.getHttp_referer_domain());
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }

}

Take one line from the log file and write a simple parsing test in the main function.

Console output:


222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "GET /images/my.jpg HTTP/1.1" 200 19939 "http://www.angularjs.cn/A00n" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.66 Safari/537.36"
valid:true
remote_addr:222.68.172.190
remote_user:-
time_local:18/Sep/2013:06:49:57
request:/images/my.jpg
status:200
body_bytes_sent:19939
http_referer:"http://www.angularjs.cn/A00n"
http_user_agent:"Mozilla/5.0 (Windows
2013.09.18:06:49:57
2013091806
www.angularjs.cn

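We can see that the log line is parsed into the attributes of the kpi object. Because the line is split on spaces, the request holds only the URL path, and the user agent is cut off after its first two tokens. Next, we wrap the parsing logic in a method of its own.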


    private static KPI parser(String line) {
        System.out.println(line);
        KPI kpi = new KPI();
        String[] arr = line.split(" ");
        if (arr.length > 11) {
            kpi.setRemote_addr(arr[0]);
            kpi.setRemote_user(arr[1]);
            kpi.setTime_local(arr[3].substring(1));
            kpi.setRequest(arr[6]);
            kpi.setStatus(arr[8]);
            kpi.setBody_bytes_sent(arr[9]);
            kpi.setHttp_referer(arr[10]);
            
            if (arr.length > 12) {
                kpi.setHttp_user_agent(arr[11] + " " + arr[12]);
            } else {
                kpi.setHttp_user_agent(arr[11]);
            }

            if (Integer.parseInt(kpi.getStatus()) >= 400) {// status 400 or above: HTTP error
                kpi.setValid(false);
            }
        } else {
            kpi.setValid(false);
        }
        return kpi;
    }
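The parser above keeps at most two tokens of the user agent. If the full string is needed, for example to tell devices apart for the Browser metric, one option is to join every token from index 11 onward. This is a sketch rather than the article's code: UserAgentSketch is a made-up class, and it assumes the referer contains no spaces, so that the user agent always starts at index 11 (Java 8 or later assumed).

```java
import java.util.Arrays;

// Hypothetical helper, not part of the article's source code.
public class UserAgentSketch {
    /** Returns the full user agent without its surrounding quotes, or null for short lines. */
    public static String userAgent(String line) {
        String[] arr = line.split(" ");
        if (arr.length <= 11) {
            return null; // same length check as the parser above
        }
        // Re-join every token that belongs to the quoted user agent field.
        String joined = String.join(" ", Arrays.copyOfRange(arr, 11, arr.length));
        return joined.replace("\"", "");
    }

    public static void main(String[] args) {
        String line = "222.68.172.190 - - [18/Sep/2013:06:49:57 +0000] "
            + "\"GET /images/my.jpg HTTP/1.1\" 200 19939 "
            + "\"http://www.angularjs.cn/A00n\" "
            + "\"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) "
            + "Chrome/29.0.1547.66 Safari/537.36\"";
        System.out.println(userAgent(line));
    }
}
```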

The map method, reduce method, and job launcher for each metric go in a class of their own.

The MapReduce implementation classes are:

  • PV: org.conan.myhadoop.mr.kpi.KPIPV.java
  • IP: org.conan.myhadoop.mr.kpi.KPIIP.java
  • Time: org.conan.myhadoop.mr.kpi.KPITime.java
  • Browser: org.conan.myhadoop.mr.kpi.KPIBrowser.java

1). PV: org.conan.myhadoop.mr.kpi.KPIPV.java


package org.conan.myhadoop.mr.kpi;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class KPIPV { 

    public static class KPIPVMapper extends MapReduceBase implements Mapper<Object, Text, Text, IntWritable> {
        private IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            KPI kpi = KPI.filterPVs(value.toString());
            if (kpi.isValid()) {
                word.set(kpi.getRequest());
                output.collect(word, one);
            }
        }
    }

    public static class KPIPVReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.210:9000/user/hdfs/log_kpi/";
        String output = "hdfs://192.168.1.210:9000/user/hdfs/log_kpi/pv";

        JobConf conf = new JobConf(KPIPV.class);
        conf.setJobName("KPIPV");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(IntWritable.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(KPIPVMapper.class);
        conf.setCombinerClass(KPIPVReducer.class);
        conf.setReducerClass(KPIPVReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}

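The program calls a method of the KPI class:

KPI kpi = KPI.filterPVs(value.toString());

The filterPVs method gives us finer control over which PVs are counted.

Add the filterPVs method to KPI.java (it also requires importing java.util.HashSet and java.util.Set):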


    /**
     * Filter PVs by page
     */
    public static KPI filterPVs(String line) {
        KPI kpi = parser(line);
        Set<String> pages = new HashSet<String>();
        pages.add("/about");
        pages.add("/black-ip-list/");
        pages.add("/cassandra-clustor/");
        pages.add("/finance-rhive-repurchase/");
        pages.add("/hadoop-family-roadmap/");
        pages.add("/hadoop-hive-intro/");
        pages.add("/hadoop-zookeeper-intro/");
        pages.add("/hadoop-mahout-roadmap/");

        if (!pages.contains(kpi.getRequest())) {
            kpi.setValid(false);
        }
        return kpi;
    }

In filterPVs we define a filter named pages, so that PVs are counted only for these pages.
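Since filterPVs runs once per log line, the pages set is rebuilt for every record. One possible variation, shown here as a sketch only, builds the set a single time as a constant; PageFilterSketch and isTracked are made-up names, and the page list is copied from the method above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical variation, not part of the article's source code.
public class PageFilterSketch {
    // Built once when the class loads instead of once per log line.
    private static final Set<String> PAGES = Collections.unmodifiableSet(
        new HashSet<String>(Arrays.asList(
            "/about",
            "/black-ip-list/",
            "/cassandra-clustor/",
            "/finance-rhive-repurchase/",
            "/hadoop-family-roadmap/",
            "/hadoop-hive-intro/",
            "/hadoop-zookeeper-intro/",
            "/hadoop-mahout-roadmap/")));

    /** True when the request path is one of the pages we want PV counts for. */
    public static boolean isTracked(String request) {
        // A null request (from a line that failed to parse) is simply not tracked.
        return request != null && PAGES.contains(request);
    }

    public static void main(String[] args) {
        System.out.println(isTracked("/about"));
        System.out.println(isTracked("/images/my.jpg"));
    }
}
```

With such a helper, filterPVs would only need to mark the record invalid when isTracked(kpi.getRequest()) returns false.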

Let's run KPIPV.java. The console used a Chinese locale, so the log level INFO is printed as 信息:


2013-10-9 11:53:28 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
2013-10-9 11:53:28 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
2013-10-9 11:53:28 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
2013-10-9 11:53:30 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/log_kpi/access.log.10:0+3025757
2013-10-9 11:53:30 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.1.210:9000/user/hdfs/log_kpi/access.log.10:0+3025757
2013-10-9 11:53:30 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000000_0' done.
2013-10-9 11:53:30 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
2013-10-9 11:53:30 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: 
2013-10-9 11:53:30 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Merging 1 sorted segments
2013-10-9 11:53:30 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Down to the last merge-pass, with 1 segments left of total size: 213 bytes
2013-10-9 11:53:30 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: 
2013-10-9 11:53:30 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
2013-10-9 11:53:30 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: 
2013-10-9 11:53:30 org.apache.hadoop.mapred.Task commit
信息: Task attempt_local_0001_r_000000_0 is allowed to commit now
2013-10-9 11:53:30 org.apache.hadoop.mapred.FileOutputCommitter commitTask
信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.1.210:9000/user/hdfs/log_kpi/pv
2013-10-9 11:53:31 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 0%
2013-10-9 11:53:33 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: reduce > reduce
2013-10-9 11:53:33 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_r_000000_0' done.
2013-10-9 11:53:34 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 100%
2013-10-9 11:53:34 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Job complete: job_local_0001
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息: Counters: 20
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:   File Input Format Counters 
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Bytes Read=3025757
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:   File Output Format Counters 
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Bytes Written=183
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:   FileSystemCounters
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_READ=545
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_READ=6051514
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_WRITTEN=83472
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_WRITTEN=183
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:   Map-Reduce Framework
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Map output materialized bytes=217
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Map input records=14619
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Reduce shuffle bytes=0
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Spilled Records=16
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Map output bytes=2004
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Total committed heap usage (bytes)=376569856
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Map input bytes=3025757
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     SPLIT_RAW_BYTES=110
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Combine input records=76
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Reduce input records=8
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Reduce input groups=8
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Combine output records=8
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Reduce output records=8
2013-10-9 11:53:34 org.apache.hadoop.mapred.Counters log
信息:     Map output records=76

View the output file in HDFS with the hadoop command:


~ hadoop fs -cat /user/hdfs/log_kpi/pv/part-00000

/about  5
/black-ip-list/ 2
/cassandra-clustor/     3
/finance-rhive-repurchase/      13
/hadoop-family-roadmap/ 13
/hadoop-hive-intro/     14
/hadoop-mahout-roadmap/ 20
/hadoop-zookeeper-intro/        6

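This gives us the PV counts of the specified pages from the log file.

The list of specified pages works like a sitemap for the website. Without it, every requested URL would be counted; by specifying the pages in a "sitemap", we can more easily find the information we need.

The remaining metrics follow the same approach as the PV implementation; you can download the source code and run it to see the results.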

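Developing Kettle Plugins: Environment Setup

The Almighty Java series covers Java's ideas, application development, design patterns, program architecture, and more, interpreting the power of Java through my own experience.

It is hard to know where to begin with Java. Java is a language that has spread into every field. At its base there are four big pieces: the Java syntax, the JDK, the JVM, and third-party libraries. Officially, the platform is further divided by target application into three editions: JavaME, JavaSE, and JavaEE. Java can build client interfaces, middleware, mobile systems, applications, tools, games, algorithms, and more; there is almost nothing Java cannot do.

In the world of Java, Java is everything.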

Please credit the source when reposting:
http://blog.fens.me/java-kettle-plugin-eclipse

kettle-plugin

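Preface

Kettle is an open source ETL tool that provides a GUI-driven solution in place of custom program development. Sometimes, though, we still need to develop our own plugins to meet business requirements. Kettle's client is implemented in Java on an Eclipse-based architecture. Its powerful ETL features and graphical interface make Kettle a first choice among free ETL tools.

Contents

  1. Introduction to Kettle Plugin Development
  2. Setting Up the Kettle Source Environment
  3. Building the Kettle Project in Eclipse
  4. Building the Plugin Project in Eclipse
  5. Configuring the Plugin in Kettle
  6. Starting the Kettle Project
  7. Integrating the Plugin Source into the Kettle Project

1. Introduction to Kettle Plugin Development

In ETL work, some projects involve special process tasks that Kettle's built-in processing steps cannot handle, and we then need to build custom steps. Custom steps mainly deal with data management, data validation, and extracting data from special file formats. By reading the Kettle source code, you can learn how to create your own Kettle plugin.

Kettle plugin development depends on a Kettle source code environment.

2. Setting Up the Kettle Source Environment

1). My system environment

  • Win7: 64bit desktop
  • Java: 64bit 1.6.0_45

The Kettle source is hosted in SVN, so we need to install an SVN tool before we can download the source code.

2). Download the SVN tool: Subversion 1.8.3 (Windows 64-bit); registration is required before downloading

http://www.collab.net/downloads/subversion

3). Install Subversion

4). Download the Kettle source code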

~ D:\workspace\java>svn co http://source.pentaho.org/svnkettleroot/Kettle/tags/4.4.0-stable/ kettle

A    kettle\.directory
A    kettle\.project
A    kettle\cobertura
A    kettle\cobertura\cobertura.jar
A    kettle\cobertura\lib
A    kettle\cobertura\lib\log4j-1.2.9.jar
A    kettle\cobertura\lib\LICENSE
A    kettle\cobertura\lib\javancss.jar
A    kettle\cobertura\lib\junit.jar
A    kettle\cobertura\lib\cpl-v10.html
A    kettle\cobertura\lib\jakarta-oro-2.0.8.jar
A    kettle\cobertura\lib\asm-2.1.jar
A    kettle\cobertura\lib\ccl.jar
A    kettle\src
A    kettle\src\kettle-steps.xml
A    kettle\src\kettle-job-entries.xml
A    kettle\src\kettle-import-rules.xml
A    kettle\src\org
A    kettle\src\org\pentaho
A    kettle\src\org\pentaho\xul
A    kettle\src\org\pentaho\xul\swt
A    kettle\src\org\pentaho\reporting
A    kettle\src\org\pentaho\reporting\plugin
A    kettle\src\org\pentaho\hadoop
A    kettle\src\org\pentaho\hadoop\HadoopCompression.java
A    kettle\src\org\pentaho\di
A    kettle\src\org\pentaho\di\repository
A    kettle\src\org\pentaho\di\repository\kdr
A    kettle\src\org\pentaho\di\repository\kdr\KettleDatabaseRepositorySecurityProvider.java
A    kettle\src\org\pentaho\di\repository\kdr\KettleDatabaseRepositoryCreationHelper.java
A    kettle\src\org\pentaho\di\repository\kdr\KettleDatabaseRepositoryMeta.java
A    kettle\src\org\pentaho\di\repository\kdr\KettleDatabaseRepositoryBase.java
A    kettle\src\org\pentaho\di\repository\kdr\KettleDatabaseRepository.java
A    kettle\src\org\pentaho\di\repository\kdr\delegates
A    kettle\src\org\pentaho\di\repository\kdr\delegates\KettleDatabaseRepositoryBaseDelegate.java

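The download is unbearably slow.

Check where the SVN server is located (the ping output below comes from a Chinese-locale Windows console):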

~ ping source.pentaho.org
正在 Ping source.pentaho.org [74.205.95.173] 具有 32 字节的数据:
来自 74.205.95.173 的回复: 字节=32 时间=210ms TTL=50
来自 74.205.95.173 的回复: 字节=32 时间=209ms TTL=50
来自 74.205.95.173 的回复: 字节=32 时间=211ms TTL=50
来自 74.205.95.173 的回复: 字节=32 时间=210ms TTL=50

kettle-svn

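It turns out the SVN server is in the United States! Let's take a different approach to downloading the source.

5). Create a clone on GitHub

    • a. Download the code via svn on a VPS in the United States (finished in 30 s).
    • b. Create a new git project on GitHub.
    • c. Add a .gitignore to exclude the .svn directories.
    • d. Push to my own GitHub repository.
    • e. Download the code from GitHub in the local development environment: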
git clone https://github.com/bsspirit/kettle-4.4.0-stable.git

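6). Once the download completes, run ant. The javac warnings are printed in Chinese because of the system locale: 警告 means "warning", 编码 GBK 的不可映射字符 means "unmappable character for encoding GBK", and 已过时 means "deprecated".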

~ D:\workspace\java\kettle>ant
Buildfile: D:\workspace\java\kettle\build.xml

init:
     [echo] Init...
    [mkdir] Created dir: D:\workspace\java\kettle\build
    [mkdir] Created dir: D:\workspace\java\kettle\classes
    [mkdir] Created dir: D:\workspace\java\kettle\classes\META-INF
    [mkdir] Created dir: D:\workspace\java\kettle\classes-ui
    [mkdir] Created dir: D:\workspace\java\kettle\classes-ui\ui
    [mkdir] Created dir: D:\workspace\java\kettle\classes-core
    [mkdir] Created dir: D:\workspace\java\kettle\classes-db
    [mkdir] Created dir: D:\workspace\java\kettle\classes-dbdialog
    [mkdir] Created dir: D:\workspace\java\kettle\testClasses
    [mkdir] Created dir: D:\workspace\java\kettle\lib
    [mkdir] Created dir: D:\workspace\java\kettle\distrib
    [mkdir] Created dir: D:\workspace\java\kettle\osx-distrib
    [mkdir] Created dir: D:\workspace\java\kettle\docs\api
    [mkdir] Created dir: D:\workspace\java\kettle\webstart
    [mkdir] Created dir: D:\workspace\java\kettle\junit
    [mkdir] Created dir: D:\workspace\java\kettle\pdi-ce-distrib
     [echo] Revision set to r1

compile-core:
     [echo] Compiling Kettle CORE...
    [javac] Compiling 196 source files to D:\workspace\java\kettle\classes-core

copy-core:
     [echo] Copying core images etc to classes directory...
     [copy] Copying 73 files to D:\workspace\java\kettle\classes-core

kettle-core:
     [echo] Generating the Kettle core library kettle-core.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-core.jar

compile-db:
     [echo] Compiling Kettle DB...
    [javac] Compiling 66 source files to D:\workspace\java\kettle\classes-db

copy-db:
     [echo] Copying db images etc to classes-db directory...
     [copy] Copying 9 files to D:\workspace\java\kettle\classes-db

kettle-db:
     [echo] Generating the Kettle DB library kettle-db.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-db.jar

compile:
     [echo] Compiling Kettle...
    [javac] Compiling 1138 source files to D:\workspace\java\kettle\classes
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\job\entry\JobEntryDialogInterface.java:37: 警告:编码 GBK 的不可映射字符
    [javac]  * If the user changed any settings, the JobEntryInterface object抯 揷hanged?flag must be set to true
    [javac] ^
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\job\entry\JobEntryDialogInterface.java:43: 警告:编码 GBK 的不可映射字符
    [javac]  * The JobEntryInterface object抯 揷hanged?flag must be set to the value it had at the time the dialog opened
    [javac] ^
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\job\entry\JobEntryInterface.java:75: 警告:编码 GBK 的不可映射字符
    [javac]  * public void loadXML(?
    [javac] ^
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\job\entry\JobEntryInterface.java:81: 警告:编码 GBK 的不可映射字符
    [javac]  * public void saveRep(?
    [javac] ^
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\job\entry\JobEntryInterface.java:89: 警告:编码 GBK 的不可映射字符
    [javac]  * public void loadRep(?
    [javac] ^
    [javac] D:\workspace\java\kettle\src\org\pentaho\di\trans\steps\mondrianinput\MondrianHelper.java:121: 警告:[deprecation] mondrian.olap.Connection 中的 execute(mondrian.olap.Query) 已过时
    [javac] result = connection.execute(query);
    [javac] ^
    [javac] 6 警告

copy:
     [echo] Copying images etc to classes directory...
     [copy] Copying 1884 files to D:\workspace\java\kettle\classes
     [copy] Copying 1 file to D:\workspace\java\kettle\classes\META-INF

kettle:
     [echo] Generating the Kettle library kettle-engine.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-engine.jar

compile-dbdialog:
     [echo] Compiling Kettle DB...
    [javac] Compiling 5 source files to D:\workspace\java\kettle\classes-dbdialog

copy-dbdialog:
     [echo] Copying db images etc to classes-dbdialog directory...
     [copy] Copying 23 files to D:\workspace\java\kettle\classes-dbdialog

kettle-dbdialog:
     [echo] Generating the Kettle DB library kettle-dbdialog.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-dbdialog.jar

compile-ui:
     [echo] Compiling Kettle UI...
    [javac] Compiling 585 source files to D:\workspace\java\kettle\classes-ui
    [javac] D:\workspace\java\kettle\src-ui\org\pentaho\di\ui\job\entries\getpop\JobEntryGetPOPDialog.java:2102: 警告:编码 GBK 的不可映射字符
    [javac] mb.setMessage("Veuillez svp donner un nom 锟?cette entr锟絜 t锟絚he!");
    [javac] ^
    [javac] 1 警告
     [copy] Copying 200 files to D:\workspace\java\kettle\classes-ui
     [copy] Copying 379 files to D:\workspace\java\kettle\classes-ui\ui

kettle-ui:
     [echo] Generating the Kettle library kettle-ui-swt.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-ui-swt.jar

antcontrib.download-check:

antcontrib.download:
    [mkdir] Created dir: C:\Users\Administrator\.subfloor\tmp
      [get] Getting: http://downloads.sourceforge.net/ant-contrib/ant-contrib-1.0b3-bin.zip
      [get] To: C:\Users\Administrator\.subfloor\tmp\antcontrib.zip
      [get] http://downloads.sourceforge.net/ant-contrib/ant-contrib-1.0b3-bin.zip permanently moved to http://downloads.sourceforge.net/project/ant-contrib/ant-contrib/1.0b3/ant-contrib-1.0b3-bin.zip
      [get] http://downloads.sourceforge.net/project/ant-contrib/ant-contrib/1.0b3/ant-contrib-1.0b3-bin.zip moved to http://jaist.dl.sourceforge.net/project/ant-contrib/ant-contrib/1.0b3/ant-contrib-1.0b3-bin.zip
    [unzip] Expanding: C:\Users\Administrator\.subfloor\tmp\antcontrib.zip into C:\Users\Administrator\.subfloor\tmp
     [copy] Copying 5 files to C:\Users\Administrator\.subfloor\ant-contrib

install-antcontrib:

compile-plugins-standalone:
     [echo] Compiling Kettle Plugin kettle-gpload-plugin...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\classes
    [javac] Compiling 5 source files to D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\classes
     [copy] Copying 7 files to D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\classes
     [echo] Compiling Kettle Plugin kettle-palo-plugin...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\classes
    [javac] Compiling 17 source files to D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\classes
     [copy] Copying 28 files to D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\classes
     [echo] Compiling Kettle Plugin kettle-hl7-plugin...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\classes
    [javac] Compiling 13 source files to D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\classes
     [copy] Copying 14 files to D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\classes
     [echo] Compiling Kettle Plugin market...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\market\bin\classes
    [javac] Compiling 9 source files to D:\workspace\java\kettle\src-plugins\market\bin\classes
    [javac] D:\workspace\java\kettle\src-plugins\market\src\org\pentaho\di\core\market\Market.java:533: 警告:[deprecation] org.pentaho.di.ui.core.gui.GUIResource 中的 reload() 已过时
    [javac] GUIResource.getInstance().reload();
    [javac] ^
    [javac] 1 警告
     [copy] Copying 2 files to D:\workspace\java\kettle\src-plugins\market\bin\classes

compile-plugins:

kettle-plugins-jar-standalone:
     [echo] Generating the Kettle Plugin Jar ${plugin} ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\dist
      [jar] Building jar: D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\dist\kettle-gpload-plugin.jar
     [echo] Generating the Kettle Plugin Jar ${plugin} ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\dist
      [jar] Building jar: D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\dist\kettle-palo-plugin.jar
     [echo] Generating the Kettle Plugin Jar ${plugin} ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\dist
      [jar] Building jar: D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\dist\kettle-hl7-plugin.jar
     [echo] Generating the Kettle Plugin Jar ${plugin} ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\market\dist
      [jar] Building jar: D:\workspace\java\kettle\src-plugins\market\dist\market.jar

kettle-plugins-jar:

kettle-plugins-standalone:
     [echo] Staging the Kettle plugin kettle-gpload-plugin ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\stage\kettle-gpload-plugin
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\stage\kettle-gpload-plugin
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-gpload-plugin\bin\stage\kettle-gpload-plugin\lib
     [echo] Staging the Kettle plugin kettle-palo-plugin ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\stage\kettle-palo-plugin
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\stage\kettle-palo-plugin
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\stage\kettle-palo-plugin\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\kettle-palo-plugin\bin\stage\kettle-palo-plugin\lib
     [echo] Staging the Kettle plugin kettle-hl7-plugin ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\stage\kettle-hl7-plugin
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\stage\kettle-hl7-plugin
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\stage\kettle-hl7-plugin\lib
     [copy] Copying 10 files to D:\workspace\java\kettle\src-plugins\kettle-hl7-plugin\bin\stage\kettle-hl7-plugin\lib
     [echo] Staging the Kettle plugin market ...
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\market\bin\stage\market
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\market\bin\stage\market
    [mkdir] Created dir: D:\workspace\java\kettle\src-plugins\market\bin\stage\market\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\src-plugins\market\bin\stage\market

kettle-plugins:

compileTests:
     [echo] Compiling Kettle tests...
    [javac] Compiling 122 source files to D:\workspace\java\kettle\testClasses

kettle-test:
     [echo] Generating the Kettle library kettle-test.jar ...
      [jar] Building jar: D:\workspace\java\kettle\lib\kettle-test.jar

distrib-nodeps:
     [echo] Construct the distribution package...
     [copy] Copying 34 files to D:\workspace\java\kettle\distrib
     [copy] Copied 10 empty directories to 2 empty directories under D:\workspace\java\kettle\distrib
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\lib
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\libext
     [copy] Copying 214 files to D:\workspace\java\kettle\distrib\libext
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\libswt
     [copy] Copying 21 files to D:\workspace\java\kettle\distrib\libswt
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\plugins
     [copy] Copying 15 files to D:\workspace\java\kettle\distrib\plugins
     [copy] Copied 11 empty directories to 3 empty directories under D:\workspace\java\kettle\distrib\plugins
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\plugins
     [copy] Copied 2 empty directories to 1 empty directory under D:\workspace\java\kettle\distrib\plugins
     [copy] Copying 2 files to D:\workspace\java\kettle\distrib\plugins
     [copy] Copying 11 files to D:\workspace\java\kettle\distrib\plugins
     [copy] Copying 2 files to D:\workspace\java\kettle\distrib\plugins
     [copy] Copied 2 empty directories to 1 empty directory under D:\workspace\java\kettle\distrib\plugins
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\ui
     [copy] Copying 387 files to D:\workspace\java\kettle\distrib\ui
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\docs
     [copy] Copying 354 files to D:\workspace\java\kettle\distrib\docs
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\pwd
     [copy] Copying 6 files to D:\workspace\java\kettle\distrib\pwd
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\launcher
     [copy] Copying 3 files to D:\workspace\java\kettle\distrib\launcher
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\simple-jndi
     [copy] Copying 1 file to D:\workspace\java\kettle\distrib\simple-jndi
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\samples
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\samples\transformations
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\samples\jobs
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\samples\transformations\output
    [mkdir] Created dir: D:\workspace\java\kettle\distrib\samples\jobs\output
     [copy] Copying 248 files to D:\workspace\java\kettle\distrib\samples

distrib:

default:

BUILD SUCCESSFUL
Total time: 1 minute 29 seconds
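  • There are a few warnings, but the build succeeded.

    3. Building the kettle project in Eclipse

    7). Import the kettle project into Eclipse.

    kettle-eclipse

    4. Building the plugin project in Eclipse

    8). Build the plugin project: we can build the plugin from a template.

    Download the kettle-TemplatePlugin project

    wget http://www.ahuoo.com/download/TemplateStepPlugin.rar

    9). After extracting it, import it into Eclipse as the project kettle-TemplatePlugin

    Copy the libraries

    • From the kettle project, copy lib/*.jar into the libext directory of kettle-TemplatePlugin
    • From the kettle project, copy libswt/win64/swt.jar into the libswt/win64 directory of kettle-TemplatePlugin

    10). Add the libraries just copied to the project dependencies

    kettle-eclipse-template

    11). Run ant in the kettle-TemplatePlugin project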

    
    ~ D:\workspace\java\kettle-TemplatePlugin>ant
    Buildfile: D:\workspace\java\kettle-TemplatePlugin\build.xml
    
    init:
         [echo] Init...
    
    compile:
         [echo] Compiling Jasper Reporting Plugin...
        [javac] D:\workspace\java\kettle-TemplatePlugin\build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    
    copy:
         [echo] Copying images etc to classes directory...
    
    lib:
         [echo] Generating the Jasper Reporting library TemplateStepPlugin.jar ...
          [jar] Building jar: D:\workspace\java\kettle-TemplatePlugin\lib\TemplateStepPlugin.jar
    
    distrib:
         [echo] Copying libraries to distrib directory...
         [copy] Copying 1 file to D:\workspace\java\kettle-TemplatePlugin\distrib
    
    deploy:
         [echo] deploying plugin...
    
    default:
    
    BUILD SUCCESSFUL
    Total time: 0 seconds
    

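    12). Edit the files in the distrib directory

    • icon.png: the icon file
    • plugin.xml: the plugin configuration file (can be omitted in versions after 4.4)
    • TemplateStepPlugin.jar: the file generated by ant

    5. Deploying the plugin to Kettle

    13). Publish kettle-TemplatePlugin to kettle

    a. Add two directories to the kettle project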

    
    ~ mkdir D:\workspace\java\kettle\distrib\plugins\steps\myPlugin
    ~ mkdir D:\workspace\java\kettle\plugins\steps\myPlugin
    

    b. Edit the build.xml file of kettle-TemplatePlugin

    
    <property name="deploydir" location="D:\workspace\java\kettle\distrib\plugins\steps\myPlugin"/>
    <property name="projectdir" location="D:\workspace\java\kettle\plugins\steps\myPlugin"/>
    
    <fileset dir="${libswt}/win64/" includes="*.jar"/>
    
    <target name="deploy" depends="distrib" description="Deploy distribution..." >
        <echo>deploying plugin...</echo>
        <copy todir="${deploydir}">
            <fileset dir="${distrib}" includes="**/*.*"/>
        </copy>
    
        <copy todir="${projectdir}">
            <fileset dir="${distrib}" includes="**/*.*"/>
        </copy>
    </target>
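
For readers less familiar with Ant, the effect of the deploy target can be pictured with the following Python sketch. It is only an illustration under assumptions: the function name `deploy` and the throwaway directories are hypothetical, and in the real project the copying is done by Ant's `<copy>` task, not by this script.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def deploy(distrib: Path, targets: Iterable[Path]) -> int:
    """Copy every file matching **/*.* under distrib into each target directory.

    Mirrors the two <copy> elements of the deploy target and returns the
    total number of files copied.
    """
    copied = 0
    for target in targets:
        for src in distrib.rglob("*.*"):
            if not src.is_file():
                continue  # the pattern can also match directory names
            dest = target / src.relative_to(distrib)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied += 1
    return copied


# Demo with temporary directories standing in for distrib, deploydir and projectdir.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    distrib = root / "distrib"
    distrib.mkdir()
    for name in ("icon.png", "plugin.xml", "TemplateStepPlugin.jar"):
        (distrib / name).write_text("placeholder")
    deploydir = root / "kettle" / "distrib" / "plugins" / "steps" / "myPlugin"
    projectdir = root / "kettle" / "plugins" / "steps" / "myPlugin"
    print(deploy(distrib, [deploydir, projectdir]))
```

Copying to both directories means the plugin is available whether kettle is started from the distrib directory or from the project root.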
    

    c. Run ant in the kettle-TemplatePlugin project

    
    D:\workspace\java\kettle-TemplatePlugin>ant
    Buildfile: D:\workspace\java\kettle-TemplatePlugin\build.xml
    
    init:
         [echo] Init...
    
    compile:
         [echo] Compiling Jasper Reporting Plugin...
        [javac] D:\workspace\java\kettle-TemplatePlugin\build.xml:43: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    
    copy:
         [echo] Copying images etc to classes directory...
    
    lib:
         [echo] Generating the Jasper Reporting library TemplateStepPlugin.jar ...
    
    distrib:
         [echo] Copying libraries to distrib directory...
    
    deploy:
         [echo] deploying plugin...
         [copy] Copying 3 files to D:\workspace\java\kettle\distrib\plugins\steps\myPlugin
         [copy] Copying 6 files to D:\workspace\java\kettle\distrib\plugins\steps\myPlugin
         [copy] Copying 3 files to D:\workspace\java\kettle\plugins\steps\myPlugin
    
    default:
    
    BUILD SUCCESSFUL
    Total time: 0 seconds
    

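    14). In kettle, check the directory: D:\workspace\java\kettle\distrib\plugins\steps\myPlugin

    kettle-dist

    The 3 files from the kettle-TemplatePlugin project have been placed in the correct location

    6. Starting the project from the command line

    15). Start kettle from the command line
    a. Modify the Spoon startup command so that it does not open a new window and runs directly with java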

    
    ~ vi D:\workspace\java\kettle\distrib\Spoon.bat
    
    @echo on
    REM start "Spoon" "%_PENTAHO_JAVA%" %OPT% -jar launcher\launcher.jar -lib ..\%LIBSPATH% %_cmdline%
    java %OPT% -jar launcher\launcher.jar -lib ..\%LIBSPATH% %_cmdline%
    

    b. Run the Spoon.bat command

    
    ~ D:\workspace\java\kettle\distrib>Spoon.bat
    
    DEBUG: Using JAVA_HOME
    DEBUG: _PENTAHO_JAVA_HOME=D:\toolkit\java\jdk6
    DEBUG: _PENTAHO_JAVA=D:\toolkit\java\jdk6\bin\javaw
    
    D:\workspace\java\kettle\distrib>REM start "Spoon" "D:\toolkit\java\jdk6\bin\javaw" "-Xmx512m" "-XX:MaxPermSize=256m" "-Djava.library.path=libswt\win64" "-DKETTLE_HOME=" "-DKETTLE_REPOSITORY=" "-DKETTLE_USER=" "-DKETTLE_PASSWORD=" "-DKETTLE_PLUGIN_PACKAGES=" "-DKETTLE_LOG_SIZE_LIMIT=" -jar launcher\launcher.jar -lib ..\libswt\win64
    
    D:\workspace\java\kettle\distrib>java "-Xmx512m" "-XX:MaxPermSize=256m" "-Djava.library.path=libswt\win64" "-DKETTLE_HOME=" "-DKETTLE_REPOSITORY=" "-DKETTLE_USER=" "-DKETTLE_PASSWORD=" "-DKETTLE_PLUGIN_PACKAGES=" "-DKETTLE_LOG_SIZE_LIMIT=" -jar launcher\launcher.jar -lib ..\libswt\win64
    INFO  21-09 12:26:35,717 - Spoon - Logging goes to file:///C:/Users/ADMINI~1/AppData/Local/Temp/spoon_0042f442-2276-11e3-bf49-6be1282e1ee0.log
    INFO  21-09 12:26:36,655 - Spoon - Asking for repository
    INFO  21-09 12:26:36,795 - RepositoriesMeta - Reading repositories XML file: C:\Users\Administrator\.kettle\repositories.xml
    INFO  21-09 12:26:37,783 - Version checker - OK
    

    16). View the kettle-TemplatePlugin plugin
    kettle-debug1

    7. Integrating the plugin source code into the kettle project

    17). Use Eclipse's link source feature to link the kettle-TemplatePlugin project
    a. In the kettle project, choose link source
    kettle-link-source

    b. Write code in the kettle project
    kettle-source

    18). Start kettle from Eclipse
    a. In Eclipse, configure the launch Main Class: org.pentaho.di.ui.spoon.Spoon
    kettle-run1

    b. Add the 64-bit swt.jar library
    kettle-run2

    c. Start kettle in Eclipse
    kettle-run3

    19). Invoke kettle-TemplatePlugin from Eclipse
    a. Edit TemplateStepDialog.java, find the open method, and add one line of output

    
    public String open() {
        System.out.println("Open a dialog!!!");
    
        ...
    }
    

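    b. In Eclipse, launch org.pentaho.di.ui.spoon.Spoon in debug mode
    Double-click the Template Plugin icon, and the log shows "Open a dialog!!!"

    kettle-debug1

    With that, the kettle plugin development environment is set up. Next, we can start developing plugins.

    Please credit the source when reposting:
    http://blog.fens.me/java-kettle-plugin-eclipse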

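    Nova Installation Guide

    The "Build Your Own VPS" article series describes how to use your own computing resources to build a VPS with virtualization technology.

    In the Web 2.0 era, everyone has a blog of their own, along with many Internet applications of their own. Most of these applications are provided by Internet companies. Some capable developers (geeks) want to build applications of their own, using the latest and most exciting technology, with their own domain name and their own server. That means renting a VPS host on the Internet. A VPS host is essentially a remote computer: you deploy your application on it, register a domain name, and the application is live on the Internet. This site, "@晒粉丝", runs on a Linode VPS in a data center in Dallas, USA.

    In fact, you can also build a VPS yourself. All you need is one high-performance server, one IP address, and one router. A single high-performance server can quickly be turned into 5, 10, or 20 virtual VPS hosts. We can then publish all kinds of applications on our own VPS hosts, and rent the spare server resources to other Internet users. This series is divided into the following parts: "Choosing a virtualization technology", "Dynamic IP resolution", "Installing KVM on Ubuntu and building a virtual environment", "Adding a disk to a KVM virtual machine", "Nova installation guide", "Network architecture design for the VPS intranet", and "Renting out VPS as a cloud service".

    About the author:

    • 张丹 (Conan), programmer: Java, R, PHP, Javascript
    • weibo:@Conan_Z
    • blog: http://blog.fens.me
    • email: bsspirit@gmail.com

    Please credit the source when reposting:
    http://blog.fens.me/vps-nova-setup/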

    openstack

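    Preface

    Nova is an important component of OpenStack. Its core function is managing virtual machines (kvm, qemu, xen, vmware, virtual box).

    This installation experiment has a strict requirement on the Linux Ubuntu version: it must be 12.04 LTS. My attempts with other versions all failed, so please follow this strictly.

    Reference book for this experiment:
    OpenStack Cloud Computing Cookbook
    Chapter 1: Starting OpenStack Compute

    Contents

    1. nova installation plan
    2. VirtualBox virtual machine environment
    3. Operating system environment
    4. Package dependencies
    5. nova configuration
    6. Creating a nova instance
    7. Logging in to the cloud instance
    8. Summary of errors

    1. nova installation plan

    Use a VirtualBox virtual machine with qemu virtual machines nested inside it. nova is installed in the VirtualBox environment, and the cloud instances run in the qemu environment. nova manages the qemu cloud instances.

    2. VirtualBox virtual machine environment

    VirtualBox virtual machine: 6 GB of memory, 4 CPU cores, Linux Ubuntu 12.04 LTS
    CPU supports VT-x/AMD-V, nested paging, and PAE/NX

    nova1

    3 virtual network adapters:

    • Adapter 1: bridged adapter
    • Adapter 2: Host-Only
    • Adapter 3: Host-Only

    vbox2

    Global settings for the Host-Only adapter in VirtualBox:
    vbox1

    3. Operating system environment

    To stress it again: the Ubuntu version for this experiment must be 12.04 LTS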

    
    ~ uname -a
    Linux nova 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
    
    ~ cat /etc/issue
    Ubuntu 12.04 LTS \n \l
    
    ~ ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:90:e8:19
              inet addr:192.168.1.200  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe90:e819/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:162 errors:0 dropped:0 overruns:0 frame:0
              TX packets:132 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:16399 (16.3 KB)  TX bytes:22792 (22.7 KB)
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    

    Create the openstack user
    Create a new user openstack with the password openstack, and add it to the sudo group

    
    ~ sudo useradd openstack
    ~ sudo passwd openstack  
    ~ sudo adduser openstack sudo
    
    ~ sudo mkdir -p /home/openstack
    ~ sudo chown openstack:openstack /home/openstack
    
    ~ ls -l /home
    drwxr-xr-x 8 conan     conan     4096 Jul 13 17:07 conan
    drwxr-xr-x 2 openstack openstack 4096 Jul 13 17:21 openstack
    

    Log in again with the openstack account and test the sudo command
    Perform all of the following operations as the openstack user

    
    ssh openstack@192.168.1.200
    
    openstack@u1:~$ whoami
    openstack
    
    openstack@u1:~$ sudo -i  
    [sudo] password for openstack:
    
    root@u1:~# whoami
    root
    
    root@u1:~# exit
    logout
    

    Virtual machine network adapter configuration

    
    ~ sudo vi /etc/network/interfaces
    
    auto lo
    iface lo inet loopback
    
    auto eth0
    iface eth0 inet static
    address 192.168.1.200
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
    
    #public interface
    auto eth1
    iface eth1 inet static
    address 172.16.0.1
    netmask 255.255.0.0
    network 172.16.0.0
    broadcast 172.16.255.255
    
    #private interface
    auto eth2
    iface eth2 inet manual
    up ifconfig eth2 up
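
The network and broadcast values in the two static stanzas follow directly from each address and netmask, so they can be cross-checked before restarting networking. The sketch below does that with Python's standard `ipaddress` module. The `STANZAS` table and the `derive` helper are names introduced here for illustration; the values are copied from the file above.

```python
import ipaddress

# Static stanzas from the interfaces file: (address, netmask, network, broadcast).
STANZAS = {
    "eth0": ("192.168.1.200", "255.255.255.0", "192.168.1.0", "192.168.1.255"),
    "eth1": ("172.16.0.1", "255.255.0.0", "172.16.0.0", "172.16.255.255"),
}


def derive(address: str, netmask: str) -> tuple:
    """Return the (network, broadcast) pair implied by an address and its netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    return str(iface.network.network_address), str(iface.network.broadcast_address)


for name, (address, netmask, network, broadcast) in STANZAS.items():
    ok = derive(address, netmask) == (network, broadcast)
    print(name, "consistent" if ok else "MISMATCH")
```

eth2 is left out on purpose: it is configured as manual and has no address of its own.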
    

    Restart networking

    
    ~ sudo /etc/init.d/networking restart
    
     * Running /etc/init.d/networking restart is deprecated because it may not enable again some interfaces
     * Reconfiguring network interfaces...                                                 ssh stop/waiting
    ssh start/running, process 2040
    ssh stop/waiting
    ssh start/running, process 2082
    ssh stop/waiting
    ssh start/running, process 2121
                                                                                [ OK ]
    ~ ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:90:e8:19
              inet addr:192.168.1.200  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe90:e819/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:3408 errors:0 dropped:0 overruns:0 frame:0
              TX packets:2244 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:3321759 (3.3 MB)  TX bytes:250703 (250.7 KB)
    
    eth1      Link encap:Ethernet  HWaddr 08:00:27:4e:06:74
              inet addr:172.16.0.1  Bcast:172.16.255.255  Mask:255.255.0.0
              inet6 addr: fe80::a00:27ff:fe4e:674/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:18 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:1656 (1.6 KB)  TX bytes:468 (468.0 B)
    
    eth2      Link encap:Ethernet  HWaddr 08:00:27:5a:b1:1f
              inet6 addr: fe80::a00:27ff:fe5a:b11f/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:33 errors:0 dropped:0 overruns:0 frame:0
              TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:3156 (3.1 KB)  TX bytes:378 (378.0 B)
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    

    DNS configuration

    
    ~ vi /etc/resolv.conf 
    nameserver 8.8.8.8
    
    ~ ping www.163.com
    PING 163.xdwscache.glb0.lxdns.com (101.23.128.17) 56(84) bytes of data.
    64 bytes from 101.23.128.17: icmp_req=1 ttl=53 time=20.3 ms
    64 bytes from 101.23.128.17: icmp_req=2 ttl=53 time=18.5 ms
    

    4. Package dependencies

    Update the package sources

    
    ~ sudo vi /etc/apt/sources.list
    
    deb http://mirrors.163.com/ubuntu/ precise main universe restricted multiverse 
    deb-src http://mirrors.163.com/ubuntu/ precise main universe restricted multiverse 
    deb http://mirrors.163.com/ubuntu/ precise-security universe main multiverse restricted 
    deb-src http://mirrors.163.com/ubuntu/ precise-security universe main multiverse restricted 
    deb http://mirrors.163.com/ubuntu/ precise-updates universe main multiverse restricted 
    deb http://mirrors.163.com/ubuntu/ precise-proposed universe main multiverse restricted 
    deb-src http://mirrors.163.com/ubuntu/ precise-proposed universe main multiverse restricted 
    deb http://mirrors.163.com/ubuntu/ precise-backports universe main multiverse restricted 
    deb-src http://mirrors.163.com/ubuntu/ precise-backports universe main multiverse restricted 
    deb-src http://mirrors.163.com/ubuntu/ precise-updates universe main multiverse restricted
    

    Install the nova-related packages

    
    ~ sudo apt-get update
    ~ sudo apt-get -y install rabbitmq-server nova-api nova-objectstore nova-scheduler nova-network nova-compute nova-cert glance qemu unzip
    ~ sudo apt-get install pm-utils
    

    Note: if pm-utils is not installed, the libvirtd log will contain the following error

    
    Cannot find 'pm-is-supported' in path: No such file or directory
    

    Check the system processes

    
    ~ pstree
    init─┬─acpid
         ├─atd
         ├─beam.smp─┬─cpu_sup
         │          ├─inet_gethost───inet_gethost
         │          └─38*[{beam.smp}]
         ├─cron
         ├─dbus-daemon
         ├─dhclient3
         ├─dnsmasq
         ├─epmd
         ├─5*[getty]
         ├─irqbalance
         ├─2*[iscsid]
         ├─libvirtd───10*[{libvirtd}]
         ├─login───bash
         ├─rsyslogd───3*[{rsyslogd}]
         ├─sshd───sshd───bash
         ├─sshd─┬─sshd───sshd───sh───bash───pstree
         │      └─sshd───sshd───sh───bash
         ├─su───glance-api
         ├─su───glance-registry
         ├─su───nova-api
         ├─su───nova-cert
         ├─su───nova-network
         ├─su───nova-objectstor
         ├─su───nova-scheduler
         ├─su───nova-compute
         ├─udevd───2*[udevd]
         ├─upstart-socket-
         ├─upstart-udev-br
         └─whoopsie───{whoopsie}
    

    Install the ntp time synchronization service

    
    ~ sudo apt-get -y install ntp
    
    #Edit the configuration file
    ~ sudo vi /etc/ntp.conf
    
    #Replace ntp.ubuntu.com with an NTP server on your network
    server ntp.ubuntu.com
    server 127.127.1.0
    fudge 127.127.1.0 stratum 10
    
    #Restart ntp
    ~ sudo service ntp restart
    
    ~ ps -aux|grep ntp
    ntp       6990  0.0  0.0  37696  2180 ?        Ss   19:50   0:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 113:120
    
    #Check the current system time
    ~ date
    Sat Jul 13 19:50:12 CST 2013
    

    Install MySQL

    
    ~ sudo apt-get install mysql-server
    
    #Allow access from other machines
    ~ sudo sed -i 's/127.0.0.1/0.0.0.0/g' /etc/mysql/my.cnf
    ~ sudo service mysql restart
    
    #Create the nova database and configure the nova user
    ~ MYSQL_PASS=mysql
    ~ mysql -uroot -p$MYSQL_PASS -e 'CREATE DATABASE nova;'
    ~ mysql -uroot -p$MYSQL_PASS -e "GRANT ALL PRIVILEGES ON nova.* TO 'nova'@'%'"
    ~ mysql -uroot -p$MYSQL_PASS -e "SET PASSWORD FOR 'nova'@'%' = PASSWORD('openstack');"
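
The sed command at the top of this block rewrites every match of `127.0.0.1` in my.cnf. A rough Python equivalent is sketched below. It assumes my.cnf contains a line such as `bind-address = 127.0.0.1` (the sample text is an assumption, not copied from the machine), and the function name `rewrite_bind_address` is hypothetical.

```python
import re


def rewrite_bind_address(my_cnf_text: str) -> str:
    """Apply the same substitution as: sed 's/127.0.0.1/0.0.0.0/g'."""
    # As in the sed pattern, the dots are unescaped, so each one matches any character.
    return re.sub(r"127.0.0.1", "0.0.0.0", my_cnf_text)


sample = "bind-address = 127.0.0.1\n"  # assumed my.cnf line
print(rewrite_bind_address(sample), end="")
```

Because the dots are not escaped, the pattern is slightly broader than the literal address. For a typical my.cnf this makes no difference, but escaping them (`127\.0\.0\.1`) would be the stricter form.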
    

    5. nova configuration

    Edit the nova.conf configuration file
    --verbose can be removed; it is only there to print more log information.

    
    ~ sudo vi /etc/nova/nova.conf
    
    --dhcpbridge_flagfile=/etc/nova/nova.conf
    --dhcpbridge=/usr/bin/nova-dhcpbridge
    --logdir=/var/log/nova
    --state_path=/var/lib/nova
    --lock_path=/var/lock/nova
    --force_dhcp_release
    --iscsi_helper=tgtadm
    --libvirt_use_virtio_for_bridges
    --connection_type=libvirt
    --root_helper=sudo nova-rootwrap
    --verbose
    --ec2_private_dns_show_ip
    --sql_connection=mysql://nova:openstack@172.16.0.1/nova
    --use_deprecated_auth
    --s3_host=172.16.0.1
    --rabbit_host=172.16.0.1
    --ec2_host=172.16.0.1
    --ec2_dmz_host=172.16.0.1
    --public_interface=eth1
    --image_service=nova.image.glance.GlanceImageService
    --glance_api_servers=172.16.0.1:9292
    --auto_assign_floating_ip=true
    --scheduler_default_filters=AllHostsFilter
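
The file above uses one `--key=value` flag per line, with bare flags such as `--verbose` acting as switches. The following sketch reads a few of those lines into a dictionary and splits the `sql_connection` URL into the user, password, host, and database created in the MySQL step. It is not how nova itself loads the file; `parse_flagfile` and `SAMPLE` are names made up for this illustration.

```python
from urllib.parse import urlsplit

# A few lines in the same --key=value format as the nova.conf above.
SAMPLE = """\
--logdir=/var/log/nova
--verbose
--sql_connection=mysql://nova:openstack@172.16.0.1/nova
--public_interface=eth1
"""


def parse_flagfile(text: str) -> dict:
    """Parse --key=value and bare --key lines into a dict; bare flags map to True."""
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("--"):
            continue  # skip blank lines and anything that is not a flag
        key, sep, value = line[2:].partition("=")
        flags[key] = value if sep else True
    return flags


flags = parse_flagfile(SAMPLE)
db = urlsplit(flags["sql_connection"])
print(db.username, db.password, db.hostname, db.path.lstrip("/"))
```

A quick check like this makes it easy to confirm that the credentials in `sql_connection` match the nova user and password configured in MySQL.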
    

    Edit the VMM setting in nova-compute.conf
    Here we use the qemu hypervisor. If you are not running in nested virtualization mode, kvm is recommended instead.

    
    ~ sudo vi /etc/nova/nova-compute.conf
    
    --libvirt_type=qemu
    

    Write the nova metadata into MySQL

    
    ~ sudo nova-manage db sync
    
    2013-07-14 21:18:56 DEBUG nova.utils [-] backend  from (pid=8750) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
    2013-07-14 21:19:06 WARNING nova.utils [-] /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py:639: SADeprecationWarning: The 'listeners' argument to Pool (and create_engine()) is deprecated.  Use event.listen().
      Pool.__init__(self, creator, **kw)
    
    2013-07-14 21:19:06 WARNING nova.utils [-] /usr/lib/python2.7/dist-packages/sqlalchemy/pool.py:145: SADeprecationWarning: Pool.add_listener is deprecated.  Use event.listen()
      self.add_listener(l)
    
    2013-07-14 21:19:06 AUDIT nova.db.sqlalchemy.fix_dns_domains [-] Applying database fix for Essex dns_domains table.
    

    Create the openstack private network

    
    ~ sudo nova-manage network create vmnet --fixed_range_v4=10.0.0.0/8 --network_size=64 --bridge_interface=eth2
    
    2013-07-14 21:19:34 DEBUG nova.utils [req-152fee41-ddc9-4ac5-902d-fb93f7af67a8 None None] backend  from (pid=8807) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
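
A `--network_size` of 64 addresses corresponds to a /26 prefix. The sketch below works that out with Python's `ipaddress` module. It assumes that a single network is carved from the start of the 10.0.0.0/8 range, which is not shown in the log above, and `first_subnet` is a helper name introduced only for this example.

```python
import ipaddress


def first_subnet(fixed_range: str, network_size: int) -> ipaddress.IPv4Network:
    """Return the first subnet of fixed_range that holds network_size addresses."""
    parent = ipaddress.ip_network(fixed_range)
    # network_size is expected to be a power of two; 64 -> 6 host bits -> /26.
    host_bits = network_size.bit_length() - 1
    return next(parent.subnets(new_prefix=parent.max_prefixlen - host_bits))


net = first_subnet("10.0.0.0/8", 64)
print(net, net.num_addresses)
```

The 64 addresses include the network and broadcast addresses, so fewer than 64 are left for instances.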
    

    Set up the openstack floating IPs

    
    ~ sudo nova-manage floating create --ip_range=172.16.1.0/24
    
    2013-07-14 21:19:48 DEBUG nova.utils [req-7171e8bc-6542-40d2-b24c-b4593505fd87 None None] backend  from (pid=8814) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
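
The floating range should sit inside the network of the public interface (eth1, 172.16.0.1 with netmask 255.255.0.0). The following sketch checks that and lists the usable host addresses of the /24. Whether nova registers exactly these host addresses is an assumption here, and `usable_hosts` is a hypothetical helper name.

```python
import ipaddress

# eth1 from the interfaces file, and the --ip_range passed to nova-manage above.
public_net = ipaddress.ip_interface("172.16.0.1/255.255.0.0").network
floating = ipaddress.ip_network("172.16.1.0/24")


def usable_hosts(net: ipaddress.IPv4Network) -> list:
    """List host addresses, excluding the network and broadcast addresses."""
    return [str(ip) for ip in net.hosts()]


hosts = usable_hosts(floating)
print(floating.subnet_of(public_net), len(hosts), hosts[0], hosts[-1])
```

`subnet_of` requires Python 3.7 or later.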
    

    View the nova database in MySQL

    
    ~ mysql -uroot -p
    
    mysql> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | ape_biz            |
    | mysql              |
    | nova               |
    | performance_schema |
    | test               |
    +--------------------+
    6 rows in set (0.00 sec)
    
    mysql> use nova
    
    mysql> show tables;
    +-------------------------------------+
    | Tables_in_nova                      |
    +-------------------------------------+
    | agent_builds                        |
    | aggregate_hosts                     |
    | aggregate_metadata                  |
    | aggregates                          |
    | auth_tokens                         |
    | block_device_mapping                |
    | bw_usage_cache                      |
    | cells                               |
    | certificates                        |
    | compute_nodes                       |
    | console_pools                       |
    | consoles                            |
    | dns_domains                         |
    | fixed_ips                           |
    | floating_ips                        |
    | instance_actions                    |
    | instance_faults                     |
    | instance_info_caches                |
    | instance_metadata                   |
    | instance_type_extra_specs           |
    | instance_types                      |
    | instances                           |
    | iscsi_targets                       |
    | key_pairs                           |
    | migrate_version                     |
    | migrations                          |
    | networks                            |
    | projects                            |
    | provider_fw_rules                   |
    | quotas                              |
    | s3_images                           |
    | security_group_instance_association |
    | security_group_rules                |
    | security_groups                     |
    | services                            |
    | sm_backend_config                   |
    | sm_flavors                          |
    | sm_volume                           |
    | snapshots                           |
    | user_project_association            |
    | user_project_role_association       |
    | user_role_association               |
    | users                               |
    | virtual_interfaces                  |
    | virtual_storage_arrays              |
    | volume_metadata                     |
    | volume_type_extra_specs             |
    | volume_types                        |
    | volumes                             |
    +-------------------------------------+
    49 rows in set (0.00 sec)
    

    Restart the nova, libvirt, and glance services

    
    #Stop
    ~ sudo stop nova-compute
    ~ sudo stop nova-network
    ~ sudo stop nova-api
    ~ sudo stop nova-scheduler
    ~ sudo stop nova-objectstore
    ~ sudo stop nova-cert
    
    ~ sudo stop libvirt-bin
    ~ sudo stop glance-registry
    ~ sudo stop glance-api
    
    #Start
    ~ sudo start nova-compute
    ~ sudo start nova-network
    ~ sudo start nova-api
    ~ sudo start nova-scheduler
    ~ sudo start nova-objectstore
    ~ sudo start nova-cert
    
    ~ sudo start libvirt-bin
    ~ sudo start glance-registry
    ~ sudo start glance-api
    
    #View the system process tree
    ~ pstree
    init─┬─acpid
         ├─atd
         ├─beam.smp─┬─cpu_sup
         │          ├─inet_gethost───inet_gethost
         │          └─38*[{beam.smp}]
         ├─cron
         ├─dbus-daemon
         ├─dhclient3
         ├─dnsmasq
         ├─epmd
         ├─5*[getty]
         ├─irqbalance
         ├─2*[iscsid]
         ├─libvirtd───10*[{libvirtd}]
         ├─login───bash
         ├─mysqld───19*[{mysqld}]
         ├─ntpd
         ├─rsyslogd───3*[{rsyslogd}]
         ├─sshd───sshd───bash
         ├─sshd─┬─sshd───sshd───sh───bash───pstree
         │      └─sshd───sshd───sh───bash
         ├─su───glance-registry
         ├─su───glance-api
         ├─su───nova-network
         ├─su───nova-api
         ├─su───nova-scheduler
         ├─su───nova-objectstor
         ├─su───nova-cert
         ├─su───nova-compute
         ├─udevd───2*[udevd]
         ├─upstart-socket-
         ├─upstart-udev-br
         └─whoopsie───{whoopsie}
    

    Create the nova user, role, and project

    
    #Create the user
    ~ sudo nova-manage user admin openstack
    2013-07-14 21:22:00 DEBUG nova.utils [req-6a95dd03-04db-4f60-9198-d77a4d4936e8 None None] backend  from (pid=9254) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
    2013-07-14 21:22:00 AUDIT nova.auth.manager [-] Created user openstack (admin: True)
    export EC2_ACCESS_KEY=62ff82fa-74a9-4ffb-a420-ea190e893863
    export EC2_SECRET_KEY=f1f32aed-85fe-406d-8f28-bbf02d7a7134
    
    #Add the role
    ~ sudo nova-manage role add openstack cloudadmin
    2013-07-14 21:22:15 AUDIT nova.auth.manager [-] Adding sitewide role cloudadmin to user openstack
    2013-07-14 21:22:15 DEBUG nova.utils [req-a9d8cdfa-263c-4d6a-8c69-d6571aabee00 None None] backend  from (pid=9262) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
    
    #Create the project
    ~ sudo nova-manage project create cookbook openstack
    2013-07-14 21:22:34 DEBUG nova.utils [req-3a340500-6674-439e-ac95-e28954637cf5 None None] backend  from (pid=9395) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
    2013-07-14 21:22:34 AUDIT nova.auth.manager [-] Created project cookbook with manager openstack
    
    ~ sudo nova-manage project zipfile cookbook openstack
    2013-07-14 21:22:49 DEBUG nova.utils [req-429e7839-6009-4862-98c5-af01ceac9cee None None] backend  from (pid=9402) __get_backend /usr/lib/python2.7/dist-packages/nova/utils.py:663
    2013-07-14 21:22:49 DEBUG nova.utils [-] Running cmd (subprocess): openssl genrsa -out /tmp/tmprZNpYT/temp.key 1024 from (pid=9402) execute /usr/lib/python2.7/dist-packages/nova/utils.py:224
    2013-07-14 21:22:49 DEBUG nova.utils [-] Running cmd (subprocess): openssl req -new -key /tmp/tmprZNpYT/temp.key -out /tmp/tmprZNpYT/temp.csr -batch -subj /C=US/ST=California/O=OpenStack/OU=NovaDev/CN=cookbook-openstack-2013-07-14T13:22:49Z from (pid=9402) execute /usr/lib/python2.7/dist-packages/nova/utils.py:224
    2013-07-14 21:22:49 DEBUG nova.crypto [-] Flags path: /var/lib/nova/CA from (pid=9402) _sign_csr /usr/lib/python2.7/dist-packages/nova/crypto.py:290
    2013-07-14 21:22:49 DEBUG nova.utils [-] Running cmd (subprocess): openssl ca -batch -out /tmp/tmpJZvmrM/outbound.csr -config ./openssl.cnf -infiles /tmp/tmpJZvmrM/inbound.csr from (pid=9402) execute /usr/lib/python2.7/dist-packages/nova/utils.py:224
    2013-07-14 21:22:49 DEBUG nova.utils [-] Running cmd (subprocess): openssl x509 -in /tmp/tmpJZvmrM/outbound.csr -serial -noout from (pid=9402) execute /usr/lib/python2.7/dist-packages/nova/utils.py:224
    2013-07-14 21:22:49 WARNING nova.auth.manager [-] No vpn data for project cookbook
    

    Install and configure the command-line tools

    
    ~ sudo apt-get install euca2ools python-novaclient unzip
    
    ~ pwd
    /home/openstack
    
    ~ ls -l
    -rw-r--r-- 1 root root 5930 Jul 13 20:38 nova.zip
    
    #Extract the credentials package
    ~ unzip nova.zip
    Archive:  nova.zip
    extracting: novarc
    extracting: pk.pem
    extracting: cert.pem
    extracting: cacert.pem
    
    #Load the environment variables
    ~ . novarc
    
    #View the environment variables
    ~ env
    LC_PAPER=en_US
    LC_ADDRESS=en_US
    LC_MONETARY=en_US
    SHELL=/bin/sh
    TERM=xterm
    SSH_CLIENT=192.168.1.11 60377 22
    LC_NUMERIC=en_US
    EUCALYPTUS_CERT=/home/openstack/cacert.pem
    OLDPWD=/var/log/libvirt
    SSH_TTY=/dev/pts/0
    LC_ALL=en_US.UTF-8
    USER=openstack
    LC_TELEPHONE=en_US
    NOVA_CERT=/home/openstack/cacert.pem
    EC2_SECRET_KEY=6f964a16-6036-44ef-bdf3-23dff94f5b94
    NOVA_PROJECT_ID=cookbook
    EC2_USER_ID=42
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/conan/toolkit/jdk16/bin:/home/conan/toolkit/cassandra/bin
    MAIL=/var/mail/openstack
    NOVA_VERSION=1.1
    LC_IDENTIFICATION=en_US
    NOVA_USERNAME=openstack
    PWD=/home/openstack
    JAVA_HOME=/home/conan/toolkit/jdk16
    LANG=en_US.UTF-8
    CASSANDRA_HOME=/home/conan/toolkit/cassandra125
    LC_MEASUREMENT=en_US
    NOVA_API_KEY=openstack
    NOVA_URL=http://172.16.0.1:8774/v1.1/
    SHLVL=1
    HOME=/home/openstack
    LANGUAGE=en_US:en
    EC2_URL=http://172.16.0.1:8773/services/Cloud
    LOGNAME=openstack
    SSH_CONNECTION=192.168.1.11 60377 192.168.1.200 22
    EC2_ACCESS_KEY=openstack:cookbook
    EC2_PRIVATE_KEY=/home/openstack/pk.pem
    DISPLAY=localhost:10.0
    S3_URL=http://172.16.0.1:3333
    LC_TIME=en_US
    EC2_CERT=/home/openstack/cert.pem
    LC_NAME=en_US
    _=/usr/bin/env
    
    #Create the key pair
    ~ euca-add-keypair openstack > openstack.pem
    ~ chmod 0600 *.pem
    
    ~ ls -l
    -rw------- 1 openstack openstack 1029 Jul 13 20:38 cacert.pem
    -rw------- 1 openstack openstack 2515 Jul 13 20:38 cert.pem
    -rw------- 1 openstack openstack 1113 Jul 13 20:38 novarc
    -rw-r--r-- 1 root      root      5930 Jul 13 20:38 nova.zip
    -rw------- 1 openstack openstack  954 Jul 13 20:50 openstack.pem
    -rw------- 1 openstack openstack  887 Jul 13 20:38 pk.pem
    

    Check the nova services

    
    ~ euca-describe-availability-zones verbose
    AVAILABILITYZONE        nova    available
    AVAILABILITYZONE        |- nova
    AVAILABILITYZONE        | |- nova-scheduler     enabled :-) 2013-07-14 13:23:53
    AVAILABILITYZONE        | |- nova-compute       enabled :-) 2013-07-14 13:23:53
    AVAILABILITYZONE        | |- nova-cert  enabled :-) 2013-07-14 13:23:53
    AVAILABILITYZONE        | |- nova-network       enabled :-) 2013-07-14 13:23:53
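
When this check is repeated often, the output can be reduced to a simple service-to-status mapping. The sketch below parses the text shown above. The function name `service_states` is hypothetical, and the sketch assumes that a service which is not healthy shows something other than `enabled :-)` on its line.

```python
SAMPLE = """\
AVAILABILITYZONE        nova    available
AVAILABILITYZONE        |- nova
AVAILABILITYZONE        | |- nova-scheduler     enabled :-) 2013-07-14 13:23:53
AVAILABILITYZONE        | |- nova-compute       enabled :-) 2013-07-14 13:23:53
AVAILABILITYZONE        | |- nova-cert  enabled :-) 2013-07-14 13:23:53
AVAILABILITYZONE        | |- nova-network       enabled :-) 2013-07-14 13:23:53
"""


def service_states(output: str) -> dict:
    """Map each service name to True when its line shows 'enabled :-)'."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Service lines: AVAILABILITYZONE | |- <name> <state> <smiley> <date> <time>
        if len(fields) >= 6 and fields[1:3] == ["|", "|-"]:
            states[fields[3]] = fields[4] == "enabled" and fields[5] == ":-)"
    return states


for name, healthy in service_states(SAMPLE).items():
    print(name, "ok" if healthy else "CHECK")
```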
    

    6. Creating a nova instance

    Upload the cloud image to the virtual host
    Download the ubuntu-12.04-server-cloudimg-i386.tar.gz file yourself: http://uec-images.ubuntu.com/releases/precise/release/ubuntu-12.04-server-cloudimg-i386.tar.gz

    
    ~ scp ubuntu-12.04-server-cloudimg-i386.tar.gz openstack@192.168.1.200:/home/openstack
    ubuntu-12.04-server-cloudimg-i386.tar.gz                                              100%  206MB -11880.-5KB/s   00:07
    
    ~ ls -l
    -rw------- 1 openstack openstack      1029 Jul 14 21:22 cacert.pem
    -rw------- 1 openstack openstack      2515 Jul 14 21:22 cert.pem
    -rw------- 1 openstack openstack      1113 Jul 14 21:22 novarc
    -rw-r--r-- 1 root      root           5930 Jul 14 21:22 nova.zip
    -rw------- 1 openstack openstack       954 Jul 14 21:23 openstack.pem
    -rw------- 1 openstack openstack       887 Jul 14 21:22 pk.pem
    -rw-r--r-- 1 openstack openstack 215487341 Jul 14 21:25 ubuntu-12.04-server-cloudimg-i386.tar.gz
    

    Install the cloud image

    
    ~ cloud-publish-tarball ubuntu-12.04-server-cloudimg-i386.tar.gz images i386
    
    Sun Jul 14 21:25:34 CST 2013: ====== extracting image ======
    Warning: no ramdisk found, assuming '--ramdisk none'
    kernel : precise-server-cloudimg-i386-vmlinuz-virtual
    ramdisk: none
    image  : precise-server-cloudimg-i386.img
    Sun Jul 14 21:25:40 CST 2013: ====== bundle/upload kernel ======
    Sun Jul 14 21:26:06 CST 2013: ====== bundle/upload image ======
    Sun Jul 14 21:27:04 CST 2013: ====== done ======
    emi="ami-00000002"; eri="none"; eki="aki-00000001";
    

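    View the cloud images
    There are two ways to view them: the euca client and the nova client.

    Note: registering an image passes through the states decrypting, untarring, uploading, and available, so expect to wait a few minutes.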

    
    ~ euca-describe-images
    
    IMAGE   aki-00000001    images/precise-server-cloudimg-i386-vmlinuz-virtual.manifest.xml        available        private         i386    kernel                          instance-store
    IMAGE   ami-00000002    images/precise-server-cloudimg-i386.img.manifest.xml            available        private         i386    machine aki-00000001                    instance-store
    
    ~ nova image-list
    +--------------------------------------+-----------------------------------------------------+--------+--------+
    |                  ID                  |                         Name                        | Status | Server |
    +--------------------------------------+-----------------------------------------------------+--------+--------+
    | 306eb471-bbc5-495e-b7a1-484e11f71502 | images/precise-server-cloudimg-i386-vmlinuz-virtual | ACTIVE |        |
    | 9dbf632e-b0d8-4230-a0e5-ee3836040492 | images/precise-server-cloudimg-i386.img             | ACTIVE |        |
    +--------------------------------------+-----------------------------------------------------+--------+--------+
    
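Since registration takes a few minutes, it can help to check the `nova image-list` output programmatically instead of by eye. The following is a rough sketch that parses the bordered table shown above and reports any image that is not yet `ACTIVE`. The helper names `parse_ascii_table` and `not_ready` are assumptions made for this example, and the sample text is copied from the output above.

```python
def parse_ascii_table(text):
    """Parse a '+---+' bordered table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Border lines start with '+'; only '|' lines carry data.
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def not_ready(rows):
    """Return the names of images whose Status is not ACTIVE."""
    return [row["Name"] for row in rows if row["Status"] != "ACTIVE"]


SAMPLE = """
+--------------------------------------+-----------------------------------------------------+--------+--------+
|                  ID                  |                         Name                        | Status | Server |
+--------------------------------------+-----------------------------------------------------+--------+--------+
| 306eb471-bbc5-495e-b7a1-484e11f71502 | images/precise-server-cloudimg-i386-vmlinuz-virtual | ACTIVE |        |
| 9dbf632e-b0d8-4230-a0e5-ee3836040492 | images/precise-server-cloudimg-i386.img             | ACTIVE |        |
+--------------------------------------+-----------------------------------------------------+--------+--------+
"""

if __name__ == "__main__":
    pending = not_ready(parse_ascii_table(SAMPLE))
    print("still registering:", pending if pending else "none")
```

In practice the text would come from running `nova image-list` and capturing its output; the sample string only keeps the sketch self-contained.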

    Configure the cloud instance network

    
    ~ euca-authorize default -P tcp -p 22 -s 0.0.0.0/0
    GROUP   default
    PERMISSION      default ALLOWS  tcp     22      22      FROM    CIDR    0.0.0.0/0
    
    ~ euca-authorize default -P icmp -t -1:-1
    GROUP   default
    PERMISSION      default ALLOWS  icmp    -1      -1      FROM    CIDR    0.0.0.0/0
    

    Check the disk space

    
    ~ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sda1        34G  3.7G   29G  12% /
    udev            3.0G  4.0K  3.0G   1% /dev
    tmpfs           1.2G  316K  1.2G   1% /run
    none            5.0M     0  5.0M   0% /run/lock
    none            3.0G     0  3.0G   0% /run/shm
    cgroup          3.0G     0  3.0G   0% /sys/fs/cgroup
    
    ~ nova flavor-list
    +----+-----------+-----------+------+-----------+------+-------+-------------+
    | ID |    Name   | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor |
    +----+-----------+-----------+------+-----------+------+-------+-------------+
    | 1  | m1.tiny   | 512       | 0    | 0         |      | 1     | 1.0         |
    | 2  | m1.small  | 2048      | 10   | 20        |      | 1     | 1.0         |
    | 3  | m1.medium | 4096      | 10   | 40        |      | 2     | 1.0         |
    | 4  | m1.large  | 8192      | 10   | 80        |      | 4     | 1.0         |
    | 5  | m1.xlarge | 16384     | 10   | 160       |      | 8     | 1.0         |
    +----+-----------+-----------+------+-----------+------+-------+-------------+
    

    There is still 29G of disk space left. Going by the flavor list above, I can choose either tiny or small.

    Here I choose the tiny flavor.

    
    ~ euca-run-instances ami-00000002 -t m1.tiny -k openstack
    RESERVATION     r-f6y5tydu      cookbook        default
    INSTANCE        i-00000001      ami-00000002                    pending openstack (cookbook, None)       0               m1.tiny 2013-07-14T13:30:02.000Z        unknown zone    aki-00000001                     monitoring-disabled     
    

    List the cloud instances
    This takes a few minutes:

    
    ~ euca-describe-instances
    RESERVATION     r-f6y5tydu      cookbook        default
    INSTANCE        i-00000001      ami-00000002    172.16.1.1      10.0.0.3        running openstack (cookbook, nova)       0               m1.tiny 2013-07-14T13:30:02.000Z        nova     aki-00000001                    monitoring-disabled     172.16.1.1      10.0.0.3        instance-store
    
    ~ nova list
    +--------------------------------------+----------+--------+----------------------------+
    |                  ID                  |   Name   | Status |          Networks          |
    +--------------------------------------+----------+--------+----------------------------+
    | d6e5fe88-1950-48f4-853a-2fd57e6c72f4 | Server 1 | ACTIVE | vmnet=10.0.0.3, 172.16.1.1 |
    +--------------------------------------+----------+--------+----------------------------+
    
    ~ top
    11424 libvirt-  20   0 1451m 322m 7284 S  105  5.4   1:20.58 qemu-system-x86
     8962 nova      20   0  265m 104m 5936 S    3  1.8   0:17.57 nova-api
       35 root      25   5     0    0    0 S    1  0.0   0:00.66 ksmd
     5933 rabbitmq  20   0 2086m  28m 2468 S    1  0.5   0:03.33 beam.smp
     8969 nova      20   0  195m  48m 4748 S    1  0.8   0:02.77 nova-scheduler
     9190 nova      20   0 1642m  63m 6660 S    1  1.1   0:08.14 nova-compute
     8522 mysql     20   0 1118m  54m 8276 S    1  0.9   0:04.45 mysqld
     8951 nova      20   0  197m  50m 4756 S    1  0.9   0:03.09 nova-network
    

    7. Log in to the cloud instance

    Ping the cloud instance

    
    ~ ping 172.16.1.1
    PING 172.16.1.1 (172.16.1.1) 56(84) bytes of data.
    64 bytes from 172.16.1.1: icmp_req=1 ttl=64 time=5.98 ms
    64 bytes from 172.16.1.1: icmp_req=2 ttl=64 time=2.00 ms
    64 bytes from 172.16.1.1: icmp_req=3 ttl=64 time=3.27 ms
    

    Log in to the cloud instance with the key file

    
    ~ ssh -i openstack.pem ubuntu@172.16.1.1
    The authenticity of host '172.16.1.1 (172.16.1.1)' can't be established.
    ECDSA key fingerprint is b8:0b:a6:18:0d:30:06:ea:79:c7:17:e5:29:34:55:39.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added '172.16.1.1' (ECDSA) to the list of known hosts.
    

    Run a few simple commands on the cloud instance

    
    ~ ubuntu@server-1:~$ who
    ubuntu   pts/0        2013-07-14 13:35 (172.16.1.1)
    
    ~ ubuntu@server-1:~$ ifconfig
    eth0      Link encap:Ethernet  HWaddr fa:16:3e:22:75:8f
              inet addr:10.0.0.3  Bcast:10.0.0.63  Mask:255.255.255.192
              inet6 addr: fe80::f816:3eff:fe22:758f/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:263 errors:0 dropped:0 overruns:0 frame:0
              TX packets:250 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:29823 (29.8 KB)  TX bytes:28061 (28.0 KB)
    
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:16436  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    

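    The nova test installation is complete!

    8. Summary of errors

    The procedure above looks like nothing more than a series of commands, but in practice you will run into many problems.

    1. Version problem: with Ubuntu 12.04.2, exactly the same steps fail when creating the cloud instance. The instance does not come up: it goes straight from RUNNING to SHUTDOWN, cannot be pinged, and cannot be reached over ssh.

    2. Version problem: with Ubuntu 13.04, the commands, the Nova database, the Nova configuration files, and the Nova services all differ from the examples in the book, so the book's steps cannot be followed.

    3. Dependency problem: pm-utils is not declared as a dependency of libvirtd, so it has to be installed manually.
    Without it, the following errors appear: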

    
    ~ sudo cat /var/log/libvirt/libvirtd.log
    
    2013-07-13 12:20:25.511+0000: 9292: info : libvirt version: 0.9.8
    2013-07-13 12:20:25.511+0000: 9292: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path: No such file or directory
    2013-07-13 12:20:25.511+0000: 9292: warning : qemuCapsInit:856 : Failed to get host power management capabilities
    2013-07-13 12:20:25.653+0000: 9292: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path: No such file or directory
    2013-07-13 12:20:25.653+0000: 9292: warning : lxcCapsInit:77 : Failed to get host power management capabilities
    2013-07-13 12:20:25.654+0000: 9292: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path: No such file or directory
    2013-07-13 12:20:25.655+0000: 9292: warning : umlCapsInit:87 : Failed to get host power management capabilities
    
    # Fix the pm-is-supported error
    ~ sudo apt-get -y install pm-utils
    ~ sudo stop libvirt-bin
    ~ sudo start libvirt-bin
    
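Missing helper programs like this are easy to overlook in a long log. As an illustrative sketch, the Python below scans libvirtd log text for the `Cannot find '...' in path` pattern seen above and returns each missing executable once. The regular expression is written against the log lines quoted in this post and is an assumption about the format, not a documented libvirt interface.

```python
import re

# Matches e.g.: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path
MISSING_RE = re.compile(r"error : \S+ : Cannot find '([^']+)' in path")


def missing_executables(log_text):
    """Return the executables the log reports as missing, in first-seen order."""
    found = []
    for line in log_text.splitlines():
        match = MISSING_RE.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


SAMPLE_LOG = """\
2013-07-13 12:20:25.511+0000: 9292: info : libvirt version: 0.9.8
2013-07-13 12:20:25.511+0000: 9292: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path: No such file or directory
2013-07-13 12:20:25.511+0000: 9292: warning : qemuCapsInit:856 : Failed to get host power management capabilities
2013-07-13 12:20:25.653+0000: 9292: error : virExecWithHook:327 : Cannot find 'pm-is-supported' in path: No such file or directory
"""

if __name__ == "__main__":
    for name in missing_executables(SAMPLE_LOG):
        print("missing executable:", name)
```

Mapping an executable name to the package that provides it (here `pm-utils`) still has to be done by hand or with the distribution's package search.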

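    4. nova state problem: while registering an image, it can get stuck at untarring indefinitely, and nova image-list shows the status as saving.
    This happens when the image failed to register for some reason. The first time, my disk was full and the image stayed stuck in this state with no error message shown, which was very frustrating.

    Solution: delete the stuck images and register them again.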

    
    euca-deregister aki-00000001
    euca-deregister ami-00000002
    
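The cleanup can be derived from the `euca-describe-images` output rather than typed by hand. The sketch below assumes the column layout shown earlier in this post (`IMAGE`, image id, manifest, state, ...) and emits one `euca-deregister` command for every image whose state is not `available`. The sample text with an `untarring` state is hypothetical, and both function names are invented for this example.

```python
def stuck_image_ids(describe_output, ok_state="available"):
    """Return ids of images whose state column differs from `ok_state`."""
    ids = []
    for line in describe_output.splitlines():
        fields = line.split()
        # Expected layout: IMAGE <id> <manifest> <state> ...
        if len(fields) >= 4 and fields[0] == "IMAGE" and fields[3] != ok_state:
            ids.append(fields[1])
    return ids


def deregister_commands(image_ids):
    """Build the euca-deregister command line for each image id."""
    return ["euca-deregister " + image_id for image_id in image_ids]


# Hypothetical output: the machine image is stuck in 'untarring'.
SAMPLE = (
    "IMAGE\taki-00000001\timages/kernel.manifest.xml\tavailable\tprivate\ti386\tkernel\n"
    "IMAGE\tami-00000002\timages/machine.manifest.xml\tuntarring\tprivate\ti386\tmachine\n"
)

if __name__ == "__main__":
    for command in deregister_commands(stuck_image_ids(SAMPLE)):
        print(command)
```

The commands are only printed, not executed, so they can be reviewed first. In the case described above both the kernel and the machine image were deregistered before registering again.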

    Patience matters, and persistence wins.

    Please credit the source when reposting:
    http://blog.fens.me/vps-nova-setup/


    Cassandra single-cluster experiment with 2 nodes


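    Preface

    Apache Cassandra is an open-source distributed key-value storage system. It was originally developed at Facebook to store very large amounts of data. Its main features are that it is distributed, stores structured data by column, and scales very well. As a representative of NoSQL it has by now been overtaken by HBase, yet many of Cassandra's design ideas are well worth studying and borrowing. Thanks to teacher tigerfish for the detailed explanations, from which I learned a great deal!

    Several very useful concepts in Cassandra: consistent hashing, the Gossip protocol, Snitch, replication strategies, DHT, and Bloom filters.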


    Please credit the source when reposting:
    http://blog.fens.me/cassandra-clustor/

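    Contents

    1. Cassandra cluster experiment with 2 nodes
    2. Errors encountered during the experiment and their fixes

    1. Cassandra cluster experiment with 2 nodes

    1. Download Cassandra and set up the Java environment (skipped)
    2. Install the first Cassandra node by unpacking it into /home/conan/toolkit/cassandra125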

    ~ pwd
    /home/conan/toolkit/cassandra125
    

    IP address: 192.168.1.200

    
    ~ ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:90:e8:19
          inet addr:192.168.1.200  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe90:e819/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:16943 errors:0 dropped:0 overruns:0 frame:0
          TX packets:19527 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1433046 (1.4 MB)  TX bytes:2902059 (2.9 MB)
    

    3. Set the environment variables

    
    ~ sudo vi /etc/environment
    CASSANDRA_HOME=/home/conan/toolkit/cassandra125
    
    ~ . /etc/environment
    
    ~ export |grep /home/conan/toolkit/cassandra125
    declare -x CASSANDRA_HOME="/home/conan/toolkit/cassandra125"
    declare -x OLDPWD="/home/conan/toolkit/cassandra125"
    declare -x PWD="/home/conan/toolkit/cassandra125/bin"
    

    4. Create the data and log directories

    
    ~ sudo rm -rf /var/lib/cassandra
    
    ~ sudo mkdir -p /var/lib/cassandra/data
    ~ sudo mkdir -p /var/lib/cassandra/saved_caches
    ~ sudo mkdir -p /var/lib/cassandra/commitlog
    ~ sudo mkdir -p /var/log/cassandra/
    
    ~ sudo chown -R conan:conan /var/lib/cassandra
    ~ sudo chown -R conan:conan /var/log/cassandra
    
    ~ ls -l /var/lib/cassandra
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 commitlog/
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 data/
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 saved_caches/
    

    5. Edit the configuration file cassandra.yaml; the changes are listed in the order they appear in the file

    
    ~ vi /home/conan/toolkit/cassandra125/conf/cassandra.yaml
    
    cluster_name: 'case1'
    num_tokens: 256
    
    data_file_directories:
    - /var/lib/cassandra/data
    
    commitlog_directory: /var/lib/cassandra/commitlog
    
    saved_caches_directory: /var/lib/cassandra/saved_caches
    
    seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.200"
    
    #listen_address: localhost
    listen_address: 192.168.1.200
    
    #rpc_address: localhost
    rpc_address: 192.168.1.200
    
    endpoint_snitch: SimpleSnitch
    

    6. Start the node

    
    ~ cd /home/conan/toolkit/cassandra125/
    
    ~ bin/cassandra -f
    
    # Partial log output
    INFO 00:23:22,785 Enqueuing flush of Memtable-schema_columnfamilies@1792194126(1097/1097 serialized/live bytes, 20 ops)
    INFO 00:23:22,786 Writing Memtable-schema_columnfamilies@1792194126(1097/1097 serialized/live bytes, 20 ops)
    INFO 00:23:22,796 Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-ic-2-Data.db (698 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=64705)
    INFO 00:23:22,797 Enqueuing flush of Memtable-schema_columns@552364977(251/251 serialized/live bytes, 5 ops)
    INFO 00:23:22,798 Writing Memtable-schema_columns@552364977(251/251 serialized/live bytes, 5 ops)
    INFO 00:23:22,808 Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-ic-2-Data.db (209 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=64705)
    INFO 00:23:22,894 Starting listening for CQL clients on /192.168.1.200:9042...
    INFO 00:23:22,906 Binding thrift service to /192.168.1.200:9160
    INFO 00:23:22,931 Using TFramedTransport with a max frame size of 15728640 bytes.
    INFO 00:23:22,952 Using synchronous/threadpool thrift server on 192.168.1.200 : 9160
    INFO 00:23:22,953 Listening for thrift clients...
    INFO 00:23:33,101 Created default superuser 'cassandra'
    

    7. Check the cluster status

    
    bin/nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  192.168.1.200  51.01 KB   256     100.0%            e7106e0a-1a9e-43a2-9bcc-fc1201076fee  rack1
    
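The two-letter code at the start of each node row combines the `Status` and `State` legends printed above it, so `UN` means Up and Normal. As a small sketch, the Python below turns `nodetool status` text into records and checks that every node is `UN`. The column positions are inferred from the output shown in this post, and `parse_nodetool_status` and `all_up_normal` are names invented for the example.

```python
def parse_nodetool_status(text):
    """Extract one dict per node row from `nodetool status` output."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows start with a code such as UN: U/D (up/down) followed by
        # N/L/J/M (normal/leaving/joining/moving). "51.01 KB" splits in two.
        if (len(fields) >= 8 and len(fields[0]) == 2
                and fields[0][0] in "UD" and fields[0][1] in "NLJM"):
            nodes.append({
                "status": fields[0],
                "address": fields[1],
                "load": fields[2] + " " + fields[3],
                "tokens": int(fields[4]),
                "owns": fields[5],
                "host_id": fields[6],
                "rack": fields[7],
            })
    return nodes


def all_up_normal(nodes):
    """True when there is at least one node and every node reports UN."""
    return bool(nodes) and all(node["status"] == "UN" for node in nodes)


SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.200  51.01 KB   256     100.0%            e7106e0a-1a9e-43a2-9bcc-fc1201076fee  rack1
"""

if __name__ == "__main__":
    nodes = parse_nodetool_status(SAMPLE)
    print(len(nodes), "node(s); all up and normal:", all_up_normal(nodes))
```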

    8. Add a second node to the cluster (2 nodes in total)
    Machine IP: 192.168.1.201

    
    ~ ifconfig
    eth0      Link encap:Ethernet  HWaddr 08:00:27:0d:0b:0b
              inet addr:192.168.1.201  Bcast:192.168.1.255  Mask:255.255.255.0
              inet6 addr: fe80::a00:27ff:fe0d:b0b/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:45455 errors:0 dropped:0 overruns:0 frame:0
              TX packets:14717 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:33590582 (33.5 MB)  TX bytes:2549931 (2.5 MB)
    

    9. Copy the cassandra125 and jdk directories over from the first node, 192.168.1.200

    
    ~ pwd
    /home/conan/toolkit
    
    ~ scp -r conan@192.168.1.200:/home/conan/toolkit/cassandra125 .
    ~ scp -r conan@192.168.1.200:/home/conan/toolkit/jdk16 .
    
    ~ ls -l
    drwxrwxr-x  9 conan conan 4096 Apr 25 03:04 cassandra125
    drwxr-xr-x 10 conan conan 4096 Apr 25 03:33 jdk16
    

    10. Set the environment variables

    
    ~ sudo vi /etc/environment
    
    PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/conan/toolkit/jdk16/bin:/home/conan/toolkit/cassandra125/bin"
    JAVA_HOME=/home/conan/toolkit/jdk16
    CASSANDRA_HOME=/home/conan/toolkit/cassandra125
    
    ~ . /etc/environment
    
    ~ java -version
    java version "1.6.0_29"
    Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)
    

    11. Create the data and log directories

    
    ~ sudo rm -rf /var/lib/cassandra
    
    ~ sudo mkdir -p /var/lib/cassandra/data
    ~ sudo mkdir -p /var/lib/cassandra/saved_caches
    ~ sudo mkdir -p /var/lib/cassandra/commitlog
    ~ sudo mkdir -p /var/log/cassandra/
    
    ~ sudo chown -R conan:conan /var/lib/cassandra
    ~ sudo chown -R conan:conan /var/log/cassandra
    
    ~ ls -l /var/lib/cassandra
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 commitlog/
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 data/
    drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 saved_caches/
    

    12. Edit the configuration file cassandra.yaml; the changes are listed in the order they appear in the file

    
    ~ vi /home/conan/toolkit/cassandra125/conf/cassandra.yaml
    
    cluster_name: 'case1'
    
    seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "192.168.1.200"
    
    listen_address: 192.168.1.201
    
    rpc_address: 192.168.1.201
    
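The two configuration files have to agree on some values and differ on others: both nodes use the same `cluster_name` and the same seed, while `listen_address` and `rpc_address` are each node's own IP. A minimal sketch of that check follows. The rules mirror what this walkthrough does rather than general Cassandra requirements, and `check_cluster_settings` is a hypothetical helper name.

```python
def check_cluster_settings(nodes):
    """Return a list of problems found in per-node cassandra.yaml settings.

    `nodes` maps each node's IP address to a dict of its settings.
    """
    problems = []
    # Settings that should be identical on every node in this setup.
    for key in ("cluster_name", "seeds"):
        values = {settings[key] for settings in nodes.values()}
        if len(values) > 1:
            problems.append(f"{key} differs between nodes: {sorted(values)}")
    # Settings that should carry the node's own address in this setup.
    for ip, settings in nodes.items():
        for key in ("listen_address", "rpc_address"):
            if settings[key] != ip:
                problems.append(f"{ip}: {key} is {settings[key]}, expected {ip}")
    return problems


# Values taken from the two cassandra.yaml files shown in this post.
NODES = {
    "192.168.1.200": {
        "cluster_name": "case1",
        "seeds": "192.168.1.200",
        "listen_address": "192.168.1.200",
        "rpc_address": "192.168.1.200",
    },
    "192.168.1.201": {
        "cluster_name": "case1",
        "seeds": "192.168.1.200",
        "listen_address": "192.168.1.201",
        "rpc_address": "192.168.1.201",
    },
}

if __name__ == "__main__":
    for problem in check_cluster_settings(NODES) or ["no problems found"]:
        print(problem)
```

Copying the whole cassandra125 directory from the first node, as done in step 9, makes it easy to forget the two address settings, which is the case the second loop is meant to catch.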

    13. Start node 192.168.1.201

    
    ~ bin/cassandra -f
    
    # Partial log output
    INFO 03:36:47,476 Completed flushing /var/lib/cassandra/data/system/local/system-local-ic-4-Data.db (75 bytes) for commitlog position ReplayPosition(segmentId=1366832174115, position=77582)
    INFO 03:36:47,504 Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-1-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-3-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-4-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-2-Data.db')]
    INFO 03:36:47,527 Enqueuing flush of Memtable-local@692438881(10094/10094 serialized/live bytes, 257 ops)
    INFO 03:36:47,533 Writing Memtable-local@692438881(10094/10094 serialized/live bytes, 257 ops)
    INFO 03:36:47,547 Completed flushing /var/lib/cassandra/data/system/local/system-local-ic-5-Data.db (5365 bytes) for commitlog position ReplayPosition(segmentId=1366832174115, position=89585)
    INFO 03:36:47,660 Node /192.168.1.201 state jump to normal
    INFO 03:36:47,663 Startup completed! Now serving reads.
    INFO 03:36:47,686 Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-ic-6,].  5,956 bytes to 5,687 (~95% of original) in 158ms = 0.034326MB/s.  4 total rows, 1 unique.  Row merge counts were {1:0, 2:0, 3:0, 4:1, }
    INFO 03:36:47,771 Starting listening for CQL clients on /192.168.1.201:9042...
    INFO 03:36:47,785 Binding thrift service to /192.168.1.201:9160
    INFO 03:36:47,810 Using TFramedTransport with a max frame size of 15728640 bytes.
    INFO 03:36:47,834 Using synchronous/threadpool thrift server on 192.168.1.201 : 9160
    INFO 03:36:47,834 Listening for thrift clients...
    

    14. Check the log on node 1, 192.168.1.200

    
    INFO 01:01:27,382 InetAddress /192.168.1.201 is now UP
    INFO 01:01:58,660 Beginning transfer to /192.168.1.201
    INFO 01:01:58,661 Flushing memtables for [CFS(Keyspace='system_auth', ColumnFamily='users')]...
    INFO 01:01:58,663 Enqueuing flush of Memtable-users@1338035062(28/28 serialized/live bytes, 2 ops)
    INFO 01:01:58,668 Writing Memtable-users@1338035062(28/28 serialized/live bytes, 2 ops)
    INFO 01:01:59,010 Completed flushing /var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db (64 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=65900)
    INFO 01:01:59,047 Stream context metadata [/var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db sections=1 progress=0/64 - 0%], 1 sstables.
    INFO 01:01:59,048 Streaming to /192.168.1.201
    INFO 01:01:59,122 Successfully sent /var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db to /192.168.1.201
    INFO 01:01:59,123 Finished streaming session to /192.168.1.201
    INFO 01:01:59,424 Enqueuing flush of Memtable-peers@1855686378(10279/10279 serialized/live bytes, 271 ops)
    INFO 01:01:59,425 Writing Memtable-peers@1855686378(10279/10279 serialized/live bytes, 271 ops)
    INFO 01:01:59,497 Completed flushing /var/lib/cassandra/data/system/peers/system-peers-ic-1-Data.db (5538 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=77902)
    

    15. Node 192.168.1.201 has now joined the cluster successfully.

    
    bin/nodetool status
    
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  192.168.1.200  65.46 KB   256     48.7%             e7106e0a-1a9e-43a2-9bcc-fc1201076fee  rack1
    UN  192.168.1.201  58.04 KB   256     51.3%             8eef1965-9822-44bf-a9f6-fff5b87bc474  rack1
    
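The uneven 48.7% / 51.3% split in the `Owns` column is what consistent hashing with virtual nodes tends to produce: with `num_tokens: 256` each node holds 256 tokens scattered around the ring, and a node owns the ranges that end at its tokens. The sketch below illustrates the idea only. It derives tokens from MD5 as a deterministic stand-in for Cassandra's random token assignment, and it uses a 2^64 ring chosen for the illustration, so its percentages will not match a real cluster.

```python
import hashlib

RING_SIZE = 2 ** 64  # size of the illustrative token space


def tokens_for(node, num_tokens):
    """Derive `num_tokens` pseudo-random ring positions for a node.

    MD5 of "<node>-<i>" is only a deterministic stand-in for random tokens.
    """
    tokens = []
    for i in range(num_tokens):
        digest = hashlib.md5(f"{node}-{i}".encode()).digest()
        tokens.append(int.from_bytes(digest[:8], "big"))
    return tokens


def ownership(nodes, num_tokens=256):
    """Return the fraction of the ring owned by each node."""
    ring = sorted((t, n) for n in nodes for t in tokens_for(n, num_tokens))
    owned = dict.fromkeys(nodes, 0)
    # Each token owns the range (previous token, token]; the first token's
    # range wraps around from the last token on the ring.
    previous = ring[-1][0] - RING_SIZE
    for token, node in ring:
        owned[node] += token - previous
        previous = token
    return {n: owned[n] / RING_SIZE for n in nodes}


if __name__ == "__main__":
    for node, share in ownership(["192.168.1.200", "192.168.1.201"]).items():
        print(f"{node}  {share:.1%}")
```

Calling `ownership` with a smaller `num_tokens` usually gives a more lopsided split, which is one motivation for giving each node many tokens.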

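    The experiment is complete!

    2. Errors encountered during the experiment and their fixes

    1. Do not change the cluster name on a node whose keyspaces have already been created
    Doing so produces the following error: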

    
    ERROR 23:29:58,004 Fatal exception during initialization
    org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name case1
    

    Solution: http://wiki.apache.org/cassandra/FAQ

    
    Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start
    
    To prevent operator errors, Cassandra stores the name of the cluster in its system table. If you need to rename a cluster for some reason, you can:
    
    Perform these steps on each node:
    
    1. Start the cassandra-cli connected locally to this node.
    2. Run the following:
       a) use system;
       b) set LocationInfo[utf8('L')][utf8('ClusterName')]=utf8('<new cluster name>');
       c) exit;
    3. Run nodetool flush on this node.
    4. Update the cassandra.yaml file for the cluster_name as the same as 2b).
    5. Restart the node.
    
    Once all nodes have had this operation performed and been restarted, nodetool ring should show all nodes as UP.
    

    2. When editing the configuration file, every colon must be followed by a space
    Otherwise the following error appears:

    
    while scanning a simple key; could not found expected ':'
    

    Example: changing the setting below

    
    # Wrong syntax
    listen_address:localhost
    
    # Correct syntax
    listen_address: localhost
    

    Explanation:
    The error above is a parse error caused by the missing space between the : and the value that follows it.
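
This mistake can be caught before starting Cassandra. A rough sketch: the Python below flags top-level style `key:value` lines that have no space after the colon. It is a plain text check written for this example, not a YAML parser, and it deliberately skips comment lines and list items.

```python
import re

# A key at the start of the line, a colon, then a non-space character.
MISSING_SPACE = re.compile(r"^\s*[A-Za-z_][\w-]*:\S")


def lines_missing_space(yaml_text):
    """Return 1-based line numbers where a key's colon has no space after it."""
    bad = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # commented-out settings are ignored
        if MISSING_SPACE.match(line):
            bad.append(number)
    return bad


SAMPLE = """\
cluster_name: 'case1'
listen_address:localhost
#rpc_address:localhost
rpc_address: 192.168.1.200
seed_provider:
"""

if __name__ == "__main__":
    for number in lines_missing_space(SAMPLE):
        print("missing space after ':' on line", number)
```

A key with nothing after the colon, such as `seed_provider:`, is valid and is not reported.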


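    The Dataguru Beijing offline meetup was a great success

    This post belongs to the cross-disciplinary knowledge meetup series. "Knowledge is meant to be shared and passed on", and conferences, forums, and salons of all kinds are excellent places to share it. I have had the privilege of speaking at several large conferences in China and presenting some of my work. Going from listener to speaker is a different experience: only by sharing your knowledge do you gain more.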


    Please credit the source when reposting:
    http://blog.fens.me/dataguru-beijing-meeting-20130616/


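    Speaker slides (PPT) for download:

    Thanks to the event team:

    • Initiator and organizer: 张丹
    • Registration tracking: 于双海, 何青
    • Venue sponsor: 何青
    • Announcements: 张丹
    • Host: 李阳
    • Keeping order: 于双海, 何青
    • Photographers: 张丹, 于双海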


     

    Attendees:

    Name Company dataguruID Courses attended
    张丹 bsspirit R, R visualization, SAS, Hadoop, NoSQL, Oracle, Openstack, Kettle
    李阳 中科辅龙 casliyang hadoop1
    于双海 博彦科技 sunev_yu hadoop1
    何青 微课网 heqingcool hadoop1
    张宣彬 Oracle linou hadoop1
    刘盛 financial institution leonarding Oracle series
    王非 当当网
    阮宏博 浙江和仁科技有限公司 976073363@qq.com
    李雪峰 浙江和仁科技有限公司
    丁波 zejia 独角兽老头 R1
    董红磊 董红磊
    梁胜和 Oracle large-scale architecture, performance tuning, SAS, R
    邓奕 板凳总 Hadoop, NoSQL
    马光东 亚信联创 27112 Oracle in-depth course

     


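    Videos from the event:

    Self-introductions

    Nginx + FastDFS image system - 王非

    Big data solutions: applications based on user location analysis - 张宣彬

    Oracle index optimization approach, a case study - 刘盛

    Data analysis platform planning, tying the courses together - 张丹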
