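Installing Cassandra on Ubuntu

The "R and NoSQL" (R利剑NoSQL) series introduces how to connect to and use NoSQL databases from the R language. The NoSQL products covered include Redis, MongoDB, HBase, Hive, Cassandra, and Neo4j. I hope this series gives R enthusiasts more development options and helps them build more exciting applications.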

About the author:

  • Zhang Dan (Conan), programmer: Java, R, PHP, JavaScript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

Please credit the source when reposting:
http://blog.fens.me/linux-cassandra-install/

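Preface

Cassandra is a distributed data storage system developed by Facebook and later open-sourced, designed for highly scalable storage of massive data sets. Its highlights include a decentralized (masterless) design, consistent hashing, and Bloom filters.

Contents

  1. Preparing the Ubuntu environment
  2. Downloading the Cassandra package
  3. Configuring Cassandra
  4. Setting environment variables
  5. Starting the Cassandra server
  6. Accessing Cassandra with the client

1. Preparing the Ubuntu environment

Cassandra is a NoSQL database written in Java. Cassandra does not provide a Windows installer package, so this article only covers installing Cassandra on Ubuntu Linux.

Because Cassandra is written in Java, a Java environment must be installed first. For installing Java, see the article: Installing Java (JDK) on Ubuntu.

Rather than installing from an apt repository, this article downloads the Cassandra package from the official website and installs it manually. Cassandra download page: http://cassandra.apache.org/download/

When downloading Cassandra, you will find that two branches are being released in parallel, so there are two choices: version 2.0.6 on the latest 2.0 branch (released 2014-03-10), or version 1.2.15 on the 1.2 branch (released 2014-02-07).

Since I am not sure exactly what the 2.0 branch improves, this article uses version 1.2.15 from the 1.2 branch for a single-node installation and configuration.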

System environment:

  • Linux Ubuntu 12.04.2 LTS 64bit server
  • Java JDK 1.6.0_45

2. Downloading the Cassandra package

Download Cassandra


# Download version 1.2.15
~ wget http://apache.dataguru.cn/cassandra/1.2.15/apache-cassandra-1.2.15-bin.tar.gz

# Extract the archive
~ tar xvf apache-cassandra-1.2.15-bin.tar.gz

# List files and directories
~ tree -L 2
.
├── apache-cassandra-1.2.15
│   ├── bin
│   ├── CHANGES.txt
│   ├── conf
│   ├── interface
│   ├── javadoc
│   ├── lib
│   ├── LICENSE.txt
│   ├── NEWS.txt
│   ├── NOTICE.txt
│   ├── pylib
│   ├── README.txt
│   └── tools
├── apache-cassandra-1.2.15-bin.tar.gz

# Rename the extracted Cassandra directory
~ mv apache-cassandra-1.2.15/ cassandra1215

# Enter the directory
~ cd cassandra1215/

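3. Configuring Cassandra

Set the Cassandra data directories

  • data_file_directories: directory for data files
  • commitlog_directory: directory for the commit log
  • saved_caches_directory: directory for saved caches

Open Cassandra's configuration file cassandra.yaml with vi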


~ vi conf/cassandra.yaml

data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches

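Make sure these directories have been created on the operating system.
Also make sure the /var/log/cassandra/ directory is writable by the user who runs Cassandra.


# Create the directories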
~ sudo mkdir -p /var/lib/cassandra/data
~ sudo mkdir -p /var/lib/cassandra/saved_caches
~ sudo mkdir -p /var/lib/cassandra/commitlog
~ sudo mkdir -p /var/log/cassandra/

# Change ownership of the directories to the operating user
~ sudo chown -R conan:conan /var/lib/cassandra
~ sudo chown -R conan:conan /var/log/cassandra/

# Check the directory permissions
~ ll /var/lib/cassandra
drwxr-xr-x  2 conan conan 4096  3月 22 06:23 commitlog/
drwxr-xr-x  2 conan conan 4096  3月 22 06:23 data/
drwxr-xr-x  2 conan conan 4096  3月 22 06:23 saved_caches/
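
The directory check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name `check_writable` is hypothetical and not part of Cassandra. It reports, for each directory passed in, whether it exists and is writable by the current user.

```shell
#!/bin/sh
# check_writable DIR...: report every directory that is missing or not
# writable by the current user. Returns 0 only when all of them are usable.
# (Hypothetical helper for illustration, not a Cassandra tool.)
check_writable() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            rc=1
        elif [ ! -w "$dir" ]; then
            echo "not writable: $dir"
            rc=1
        else
            echo "ok: $dir"
        fi
    done
    return "$rc"
}

# Example call for the directories used in this article:
# check_writable /var/lib/cassandra/data /var/lib/cassandra/commitlog \
#                /var/lib/cassandra/saved_caches /var/log/cassandra
```

Running the check before the first start may save a round of reading startup errors, since a missing or read-only directory is an easy mistake to make.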

4. Setting environment variables


~ sudo vi /etc/environment
CASSANDRA_HOME=/home/conan/tookit/cassandra1215

# Apply the environment variable
~ . /etc/environment

# Check the environment variable
~ echo $CASSANDRA_HOME
/home/conan/tookit/cassandra1215
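
One caveat: sourcing a file with `.` sets plain shell variables, and these reach child processes only if they are exported. The sketch below shows one way to source and export in a single step. It assumes the file contains only simple KEY=VALUE lines, and `load_env` is a hypothetical helper name.

```shell
#!/bin/sh
# load_env FILE: source KEY=VALUE lines from FILE and export them, so that
# child processes started from this shell can see them too.
# (Hypothetical helper; assumes FILE holds only simple KEY=VALUE lines.)
load_env() {
    set -a          # auto-export every variable assigned from here on
    . "$1"
    set +a          # switch auto-export off again
}

# Example:
# load_env /etc/environment
# echo "$CASSANDRA_HOME"
```

After a fresh login the variables from /etc/environment are normally exported already, so this mainly matters in the shell session where the file was just edited.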

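5. Starting the Cassandra server

Start the Cassandra server with the following command


# Note: the -f flag keeps Cassandra in the foreground, attached to the console; without -f it starts in the background.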
~ bin/cassandra

 INFO 06:28:08,777 Logging initialized
 INFO 06:28:08,795 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_45
 INFO 06:28:08,799 Heap size: 2051014656/2051014656
 INFO 06:28:08,799 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.2.15.jar:bin/../lib/apache-cassandra-clientutil-1.2.15.jar:bin/../lib/apache-cassandra-thrift-1.2.15.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.6.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/../lib/guava-13.0.1.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jbcrypt-0.3m.jar:bin/../lib/jline-1.0.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/lz4-1.1.0.jar:bin/../lib/metrics-core-2.2.0.jar:bin/../lib/netty-3.6.6.Final.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.7.2.jar:bin/../lib/slf4j-log4j12-1.7.2.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.5.jar:bin/../lib/snaptree-0.1.jar:bin/../lib/jamm-0.2.5.jar
 INFO 06:28:08,801 JNA not found. Native methods will be disabled.
 INFO 06:28:08,813 Loading settings from file:/home/conan/tookit/cassandra1215/conf/cassandra.yaml
 
// remaining log output omitted

Check the Cassandra system process


# Check the Cassandra process
~ ps -axu|grep cassandra

conan     5983 18.1  2.1 4499456 172832 pts/1  Sl   06:31   0:05 java -ea -javaagent:bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms1996M -Xmx1996M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -cp bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.2.15.jar:bin/../lib/apache-cassandra-clientutil-1.2.15.jar:bin/../lib/apache-cassandra-thrift-1.2.15.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.6.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:bin/../lib/guava-13.0.1.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.9.2.jar:bin/../lib/jackson-mapper-asl-1.9.2.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jbcrypt-0.3m.jar:bin/../lib/jline-1.0.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.7.0.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/lz4-1.1.0.jar:bin/../lib/metrics-core-2.2.0.jar:bin/../lib/netty-3.6.6.Final.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.7.2.jar:bin/../lib/slf4j-log4j12-1.7.2.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.5.jar:bin/../lib/snaptree-0.1.jar org.apache.cassandra.service.CassandraDaemon

# Check the listening port
~ netstat -nlt|grep 9160
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN
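
For use in scripts, the port check can be wrapped in a function. This is a minimal sketch, assuming `netstat -nlt` output in the column layout shown above; the function name `port_is_listening` is hypothetical.

```shell
#!/bin/sh
# port_is_listening PORT: read `netstat -nlt` output on stdin and succeed
# when some socket in LISTEN state is bound to PORT.
# (Hypothetical helper; assumes the local address is the 4th column.)
port_is_listening() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, parts, ":")      # the port is the last ":" field
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}

# Example:
# netstat -nlt | port_is_listening 9160 && echo "Thrift port is up"
```

Taking the last colon-separated field of the local address lets the same function handle both IPv4 entries such as 127.0.0.1:9160 and IPv6 entries such as :::9160.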

6. Accessing Cassandra with the client

Access the Cassandra server with the client program


~ bin/cassandra-cli
Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.15

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

# Show the command-line help
[default@unknown] ?

Getting around:
?                       Display this help.
help;                   Display this help.
help <command>;         Display command-specific help.
exit;                   Exit this utility.
quit;                   Exit this utility.

Commands:
assume                  Apply client side validation.
connect                 Connect to a Cassandra node.
consistencylevel        Sets consisteny level for the client to use.
count                   Count columns or super columns.
create column family    Add a column family to an existing keyspace.
create keyspace         Add a keyspace to the cluster.
del                     Delete a column, super column or row.
decr                    Decrements a counter column.
describe cluster        Describe the cluster configuration.
describe                Describe a keyspace and its column families or column family in current keyspace.
drop column family      Remove a column family and its data.
drop keyspace           Remove a keyspace and its data.
drop index              Remove an existing index from specific column.
get                     Get rows and columns.
incr                    Increments a counter column.
list                    List rows in a column family.
set                     Set columns.
show api version        Show the server API version.
show cluster name       Show the cluster name.
show keyspaces          Show all keyspaces and their column families.
show schema             Show a cli script to create keyspaces and column families.
truncate                Drop the data in a column family.
update column family    Update the settings for a column family.
update keyspace         Update the settings for a keyspace.
use                     Switch to a keyspace.

We have now successfully installed a single-node Cassandra on Ubuntu Linux.

R and NoSQL series (R利剑NoSQL): Cassandra

Please credit the source when reposting:
http://blog.fens.me/nosql-r-cassandra/

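Part 3 of the series, R and Cassandra, is divided into 7 sections.

  1. Introduction to Cassandra
  2. Installing Cassandra
  3. Installing RCassandra
  4. The RCassandra function library
  5. Basic RCassandra operations
  6. An RCassandra use case
  7. The decline of Cassandra

Each section is split into a "description" part and a "code" part, so that the explanation and the code stay connected.

1. Introduction to Cassandra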

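Apache Cassandra is an open-source distributed NoSQL database system. It was originally developed at Facebook to store simply structured data such as inbox messages, and it combines the data model of Google BigTable with the fully distributed architecture of Amazon Dynamo. Facebook open-sourced Cassandra in 2008. Since then, thanks to its good scalability, it has been adopted by well-known Web 2.0 sites such as Digg and Twitter and has become a popular distributed structured data store.

The name Cassandra comes from Greek mythology: Cassandra was a tragic prophetess of Troy, which is why the project's logo is a shining eye.

Cassandra writes data to multiple nodes to ensure reliability. On the trade-off among consistency, availability, and partition tolerance (CAP), Cassandra is flexible: when reading, a client can require that all replicas agree (high consistency), accept the answer of a single replica (high availability), or require that a majority of replicas agree, that is, a quorum (a compromise). This makes Cassandra suitable for scenarios with node and network failures as well as multi-datacenter deployments.

This introduction is excerpted from Wikipedia (http://zh.wikipedia.org/wiki/Cassandra)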
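
The three read levels described above can be made concrete with a little arithmetic. The sketch below is only an illustration of the commonly cited rule that a read of R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N, where N is the replication factor. The function names `quorum` and `is_strong` are hypothetical and not part of Cassandra.

```shell
#!/bin/sh
# quorum N: smallest number of replicas that forms a majority of N.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# is_strong R W N: succeeds when a read touching R replicas must overlap
# a write acknowledged by W replicas out of N (that is, R + W > N).
is_strong() {
    [ $(( $1 + $2 )) -gt "$3" ]
}

# Example with a replication factor of 3:
# q=$(quorum 3)                 # majority of 3 replicas
# is_strong "$q" "$q" 3 && echo "quorum reads + quorum writes overlap"
# is_strong 1 1 3 || echo "ONE + ONE may return stale data"
```

With three replicas, reading and writing at quorum each touch two nodes, so at least one node is common to both, while reading and writing a single replica gives no such guarantee.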

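2. Installing Cassandra

Description:
First, prepare the environment. I chose the 64-bit server edition of Ubuntu Linux 12.04; feel free to use whichever Linux distribution you are comfortable with.

The JDK is the official Sun JDK 1.6.0_29; please do not use the OpenJDK that ships with Linux.

Download and install Cassandra manually.

Cassandra's configuration requires a few directories to be created in advance.

  • data_file_directories: directory for data files
  • commitlog_directory: directory for the commit log
  • saved_caches_directory: directory for saved caches

The following covers a single-node installation; for a cluster installation, see: A two-node Cassandra cluster experiment

Code:
Single-node installation: system environment Linux Ubuntu 12.04 LTS 64bit server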


~ uname -a
Linux u1 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

~ cat /etc/issue
Ubuntu 12.04.2 LTS \n \l

JDK environment: official Sun JDK 1.6.0_29


~ java -version

java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

Download and extract Cassandra


~ wget http://mirrors.tuna.tsinghua.edu.cn/apache/cassandra/1.2.5/apache-cassandra-1.2.5-bin.tar.gz

~ tar xvf apache-cassandra-1.2.5-bin.tar.gz
~ mv apache-cassandra-1.2.5 cassandra125
~ mv cassandra125 /home/conan/toolkit/

~ pwd
/home/conan/toolkit

~ ls -l
drwxrwxr-x  9 conan conan 4096 Jun  1 06:10 cassandra125/
drwxr-xr-x 10 conan conan  4096 Apr 23 14:36 jdk16

Initialize Cassandra


~ cd /home/conan/toolkit/cassandra125

# Configure the Cassandra data file directories
~ vi conf/cassandra.yaml

data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches

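About the directories:
data_file_directories: directory for data files
commitlog_directory: directory for the commit log
saved_caches_directory: directory for saved caches

Make sure these directories have been created on the operating system.
Also make sure the /var/log/cassandra/ directory is writable by Cassandra.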


~ sudo mkdir -p /var/lib/cassandra/data
~ sudo mkdir -p /var/lib/cassandra/saved_caches
~ sudo mkdir -p /var/lib/cassandra/commitlog
~ sudo mkdir -p /var/log/cassandra/

~ sudo chown -R conan:conan /var/lib/cassandra
~ sudo chown -R conan:conan /var/log/cassandra/

~ ll /var/lib/cassandra
drwxr-xr-x  2 conan conan 4096 Jun  1 06:21 commitlog/
drwxr-xr-x  2 conan conan 4096 Jun  1 06:21 data/
drwxr-xr-x  2 conan conan 4096 Jun  1 06:21 saved_caches/

Set the environment variable


~ sudo vi /etc/environment
CASSANDRA_HOME=/home/conan/toolkit/cassandra125

# Apply the variable
~ . /etc/environment

# Check the environment variable
~ export |grep /home/conan/toolkit/cassandra125
declare -x CASSANDRA_HOME="/home/conan/toolkit/cassandra125"
declare -x OLDPWD="/home/conan/toolkit/cassandra125"
declare -x PWD="/home/conan/toolkit/cassandra125/bin"

Start Cassandra


~ bin/cassandra -f
# Note: the -f flag keeps Cassandra in the foreground, attached to the console; without -f it starts in the background.

~ jps
19971 CassandraDaemon
20440 Jps

Open the client


~ bin/cassandra-cli

Connected to: "Test Cluster" on 127.0.0.1/9160
Welcome to Cassandra CLI version 1.2.5

Type 'help;' or '?' for help.
Type 'quit;' or 'exit;' to quit.

[default@unknown]

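We have now successfully installed a single-node Cassandra.

For a Cassandra cluster installation, see: A two-node Cassandra cluster experiment

3. Installing RCassandra

Description:
Use R version 2.15.3; the steps below show how to install R.

First, add the package source deb http://mirror.bjtu.edu.cn/cran/bin/linux/ubuntu precise/.
Then update the package index and install the pinned version 2.15.3-1precise0precise1.

Start R and install the RCassandra package.

Code
The R version in the test environment is 2.15.3

Install R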


~  sudo vi /etc/apt/sources.list
deb http://mirrors.163.com/ubuntu/ precise main universe restricted multiverse
deb-src http://mirrors.163.com/ubuntu/ precise main universe restricted multiverse
deb http://mirrors.163.com/ubuntu/ precise-security universe main multiverse restricted
deb-src http://mirrors.163.com/ubuntu/ precise-security universe main multiverse restricted
deb http://mirrors.163.com/ubuntu/ precise-updates universe main multiverse restricted
deb http://mirrors.163.com/ubuntu/ precise-proposed universe main multiverse restricted
deb-src http://mirrors.163.com/ubuntu/ precise-proposed universe main multiverse restricted
deb http://mirrors.163.com/ubuntu/ precise-backports universe main multiverse restricted
deb-src http://mirrors.163.com/ubuntu/ precise-backports universe main multiverse restricted
deb-src http://mirrors.163.com/ubuntu/ precise-updates universe main multiverse restricted
deb http://mirror.bjtu.edu.cn/cran/bin/linux/ubuntu precise/

Update the apt-get sources


~ sudo apt-get update

# Pin the installation to version 2.15.3
~ sudo apt-get install r-base-core=2.15.3-1precise0precise1

# Start R
~ R
R version 2.15.3 (2013-03-01) -- "Security Blanket"
Copyright (C) 2013 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

Install RCassandra


> install.packages('RCassandra')
> library(RCassandra)

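4. The RCassandra function library

Description
RCassandra supports only 17 functions in total, listed below. By comparison, rredis has about 100 functions and rmongodb has 153, so RCassandra is very lightweight.

However, these 17 functions do not cover all Cassandra operations. Even some basic operations have no corresponding function and have to be done from the Cassandra command line. I do not know why. I hope RCassandra continues to develop and fills in the missing functions.

Common operations that are not supported:

  • Creating and dropping keyspaces
  • Creating and dropping column families
  • Deleting a row
  • Deleting a single column from a row

The 17 functions are listed below, together with a comparison against the equivalent Cassandra commands.

Code
17 functions in total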


RC.close               RC.insert
RC.cluster.name        RC.login
RC.connect             RC.mget.range
RC.consistency         RC.mutate
RC.describe.keyspace   RC.read.table
RC.describe.keyspaces  RC.use
RC.get                 RC.version
RC.get.range           RC.write.table
RC.get.range.slices

Comparison of basic operations in Cassandra (cassandra-cli) and RCassandra:


# Connect to the cluster
Cassandra: connect 192.168.1.200/9160;
RCassandra: conn<-RC.connect(host="192.168.1.200",port=9160)

# Show the name of the current cluster
Cassandra: show cluster name;
RCassandra: RC.cluster.name(conn)

# List all keyspaces in the current cluster
Cassandra: show keyspaces;
RCassandra: RC.describe.keyspaces(conn)

# Show the DEMO keyspace
Cassandra: show schema DEMO;
RCassandra: RC.describe.keyspace(conn,'DEMO')

# Select the DEMO keyspace
Cassandra: use DEMO;
RCassandra: RC.use(conn,'DEMO')

# Set the consistency level
Cassandra: consistencylevel as ONE;
RCassandra: RC.consistency(conn,level="one")

# Insert data
Cassandra: set Users[1][name] = scott;
RCassandra: RC.insert(conn,'Users','1', 'name', 'scott')

# Insert a data frame
Cassandra: NA (no equivalent)
RCassandra: RC.write.table(conn, "Users", df)

# Read all data in a column family
Cassandra: list Users;
RCassandra: RC.read.table(conn,"Users")

# Read data
Cassandra: get Users[1]['name'];
RCassandra: RC.get(conn,'Users','1', c('name'))

# Close the connection
Cassandra: exit; quit;
RCassandra: RC.close(conn)

 

5. Basic RCassandra operations

Description
This section walks through the basic RCassandra functions, using the iris data set as an example of how to operate on a Cassandra database from R.

Code


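# Install RCassandra
install.packages('RCassandra')

# Load the RCassandra library
library(RCassandra)

# Connect to the server
conn<-RC.connect(host="192.168.1.200")

# Name of the current cluster (the two-node cluster)
RC.cluster.name(conn)
[1] "case1"

# Current protocol version
RC.version(conn)
[1] "19.36.0"

# List the configuration of all keyspaces
RC.describe.keyspaces(conn)

# Show the configuration of the keyspace named DEMO
RC.describe.keyspace(conn, "DEMO")

# Select the DEMO keyspace before reading or writing
RC.use(conn, "DEMO")

# RCassandra cannot create column families, so create one beforehand from the Cassandra command line
#[default@DEMO] create column family iris;

# Insert the iris data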
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

# iris is a data.frame
RC.write.table(conn, "iris", iris)

attr(,"class")
[1] "CassandraConnection"

# Read the Sepal.Length and Species columns of row 1
RC.get(conn, "iris", "1", c("Sepal.Length", "Species"))
           key  value           ts
1 Sepal.Length    5.1 1.372881e+15
2      Species setosa 1.372881e+15
# Note: ts is the timestamp

# Read all columns of row 1
RC.get.range(conn, "iris", "1")
           key  value           ts
1 Petal.Length    1.4 1.372881e+15
2  Petal.Width    0.2 1.372881e+15
3 Sepal.Length    5.1 1.372881e+15
4  Sepal.Width    3.5 1.372881e+15
5      Species setosa 1.372881e+15

# Read a range of rows as a list
r <- RC.get.range.slices(conn, "iris")
class(r)
[1] "list"

r[[1]]
           key  value           ts
1 Petal.Length    1.7 1.372881e+15
2  Petal.Width    0.4 1.372881e+15
3 Sepal.Length    5.4 1.372881e+15
4  Sepal.Width    3.9 1.372881e+15
5      Species setosa 1.372881e+15

rk <- RC.get.range.slices(conn, "iris", limit=0)
y <- RC.read.table(conn, "iris")
y <- y[order(as.integer(row.names(y))),]

head(y)
  Petal.Length Petal.Width Sepal.Length Sepal.Width Species
1          1.4         0.2          5.1         3.5  setosa
2          1.4         0.2          4.9         3.0  setosa
3          1.3         0.2          4.7         3.2  setosa
4          1.5         0.2          4.6         3.1  setosa
5          1.4         0.2          5.0         3.6  setosa
6         

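Common operations that are not supported

  • Creating and dropping keyspaces
  • Creating and dropping column families
  • Deleting a row
  • Deleting a single column from a row

6. An RCassandra use case

Description
A small business example helps to deepen our understanding of RCassandra. The scenario below is deliberately simple.

Business requirements:
1. Create a Users column family with two columns, name and password
2. With data already in place, dynamically add a new column, age

Code
In the Cassandra command line, create the column family Users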


[default@DEMO] create column family Users
...     with key_validation_class = 'UTF8Type'
...     and comparator = 'UTF8Type'
...     and default_validation_class = 'UTF8Type';

89a2fb75-f7d0-399e-b017-30a974b19f4a

Insert data with RCassandra, using the two columns name and password


> df<-data.frame(name=c('a1','a2'),password=c('a1','a2'))
> print(df)
  name password
1   a1       a1
2   a2       a2

# Insert the data
> RC.write.table(conn, "Users", df)
attr(,"class")
[1] "CassandraConnection"

# Read the data
> RC.read.table(conn,"Users")
     name password
2    a2       a2
1    a1       a1

# Insert a new row with KEY=1234 and add the age column
> RC.insert(conn,'Users','1234', 'name', 'scott')
> RC.insert(conn,'Users','1234', 'password', 'tiger')
> RC.insert(conn,'Users','1234', 'age', '20')

# Read the data
> RC.read.table(conn,"Users")
     age  name password
1234  20 scott    tiger
2     NA    a2       a2
1     NA    a1       a1

# Update the row with KEY=1: name=a11, age=12
> RC.insert(conn,'Users','1', 'name', 'a11')
> RC.insert(conn,'Users','1', 'age', '12')

# Read the data
> RC.read.table(conn,"Users")
     age  name password
1234  20 scott    tiger
2     NA    a2       a2
1     12   a11       a1

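7. The decline of Cassandra

More and more applications built on Cassandra are starting to migrate to HBase.

Some possible technical reasons for Cassandra's decline:

1. Reads are too slow

Because of the decentralized design, reads involve anti-entropy computation, which is costly and can even seriously affect server operation.

2. Data synchronization is too slow (eventual-consistency latency can be very high)

Because of the decentralized design, nodes have to pass information to one another and notify each other of their state. If there are several replicas and some nodes go down, reaching consistency can take a very long time and is inefficient.

3. Inserts and updates take the place of queries, which limits flexibility: every query has to be defined in advance.

Unlike most databases, which are optimized for reads, Cassandra's write performance is in theory higher than its read performance, so it is well suited to streaming data storage, especially when the write load exceeds the read load. Compared with HBase, its random-access performance is much higher, but it is not good at range scans. It can therefore serve as an ad hoc query cache for HBase: HBase handles large batch processing, while Cassandra provides the random-query interface.

4. No direct Hadoop integration, so MapReduce is not available.

Today Hadoop is synonymous with big data. A framework for massive data that does not support Hadoop and MapReduce will be replaced, unless Cassandra can find a different positioning or offer its own solution for massive data. DataStax is rebuilding the HDFS file system on top of Cassandra; whether that will succeed remains to be seen.

Let us look forward to Cassandra's future development!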

A two-node Cassandra cluster experiment

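Preface

Apache Cassandra is an open-source distributed key-value storage system. It was originally developed at Facebook to store very large amounts of data. Its main features are that it is distributed, column-based in structure, and highly scalable. As a representative of NoSQL it has now been overtaken by HBase, but many of Cassandra's design ideas are well worth studying and borrowing. Thanks to tigerfish for the detailed lectures, from which I learned a great deal!

Several very useful concepts in Cassandra: consistent hashing, the Gossip protocol, snitches, replication strategies, DHT, and Bloom filters.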

Please credit the source when reposting:
http://blog.fens.me/cassandra-clustor/

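Contents

  1. A two-node Cassandra cluster experiment
  2. Errors encountered during the experiment and their fixes

1. A two-node Cassandra cluster experiment

1. Download Cassandra and configure the Java environment (skipped here)
2. Install the first Cassandra node, extracted to the directory /home/conan/toolkit/cassandra125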

~ pwd
/home/conan/toolkit/cassandra125

IP address: 192.168.1.200


~ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:90:e8:19
      inet addr:192.168.1.200  Bcast:192.168.1.255  Mask:255.255.255.0
      inet6 addr: fe80::a00:27ff:fe90:e819/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
      RX packets:16943 errors:0 dropped:0 overruns:0 frame:0
      TX packets:19527 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:1000
      RX bytes:1433046 (1.4 MB)  TX bytes:2902059 (2.9 MB)

3. Set the environment variable


~ sudo vi /etc/environment
CASSANDRA_HOME=/home/conan/toolkit/cassandra125

~ . /etc/environment

~ export |grep /home/conan/toolkit/cassandra125
declare -x CASSANDRA_HOME="/home/conan/toolkit/cassandra125"
declare -x OLDPWD="/home/conan/toolkit/cassandra125"
declare -x PWD="/home/conan/toolkit/cassandra125/bin"

4. Create the storage and log directories


~ sudo rm -rf /var/lib/cassandra

~ sudo mkdir -p /var/lib/cassandra/data
~ sudo mkdir -p /var/lib/cassandra/saved_caches
~ sudo mkdir -p /var/lib/cassandra/commitlog
~ sudo mkdir -p /var/log/cassandra/

~ sudo chown -R conan:conan /var/lib/cassandra
~ sudo chown -R conan:conan /var/log/cassandra

~ ls -l /var/lib/cassandra
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 commitlog/
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 data/
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 saved_caches/

5. Edit the configuration file cassandra.yaml; the changes are listed in the order they appear in the file


~ vi /home/conan/toolkit/cassandra125/conf/cassandra.yaml

cluster_name: 'case1'
num_tokens: 256

data_file_directories:
- /var/lib/cassandra/data

commitlog_directory: /var/lib/cassandra/commitlog

saved_caches_directory: /var/lib/cassandra/saved_caches

seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
      - seeds: "192.168.1.200"

#listen_address: localhost
listen_address: 192.168.1.200

#rpc_address: localhost
rpc_address: 192.168.1.200

endpoint_snitch: SimpleSnitch

6. Start the node


~ cd /home/conan/toolkit/cassandra125/

~ bin/cassandra -f

# Partial log
INFO 00:23:22,785 Enqueuing flush of Memtable-schema_columnfamilies@1792194126(1097/1097 serialized/live bytes, 20 ops)
INFO 00:23:22,786 Writing Memtable-schema_columnfamilies@1792194126(1097/1097 serialized/live bytes, 20 ops)
INFO 00:23:22,796 Completed flushing /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-ic-2-Data.db (698 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=64705)
INFO 00:23:22,797 Enqueuing flush of Memtable-schema_columns@552364977(251/251 serialized/live bytes, 5 ops)
INFO 00:23:22,798 Writing Memtable-schema_columns@552364977(251/251 serialized/live bytes, 5 ops)
INFO 00:23:22,808 Completed flushing /var/lib/cassandra/data/system/schema_columns/system-schema_columns-ic-2-Data.db (209 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=64705)
INFO 00:23:22,894 Starting listening for CQL clients on /192.168.1.200:9042...
INFO 00:23:22,906 Binding thrift service to /192.168.1.200:9160
INFO 00:23:22,931 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 00:23:22,952 Using synchronous/threadpool thrift server on 192.168.1.200 : 9160
INFO 00:23:22,953 Listening for thrift clients...
INFO 00:23:33,101 Created default superuser 'cassandra'

7. Check the cluster status


~ bin/nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.200  51.01 KB   256     100.0%            e7106e0a-1a9e-43a2-9bcc-fc1201076fee  rack1

8. Add a second node to the cluster, for 2 nodes in total
Machine IP: 192.168.1.201


~ ifconfig
eth0      Link encap:Ethernet  HWaddr 08:00:27:0d:0b:0b
          inet addr:192.168.1.201  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe0d:b0b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14717 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:33590582 (33.5 MB)  TX bytes:2549931 (2.5 MB)

9. Copy the cassandra125 and jdk16 directories over from the first node, 192.168.1.200


~ pwd
/home/conan/toolkit

~ scp -r conan@192.168.1.200:/home/conan/toolkit/cassandra125 .
~ scp -r conan@192.168.1.200:/home/conan/toolkit/jdk16 .

~ ls -l
drwxrwxr-x  9 conan conan 4096 Apr 25 03:04 cassandra125
drwxr-xr-x 10 conan conan 4096 Apr 25 03:33 jdk16

10. Set the environment variables


~ sudo vi /etc/environment

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/conan/toolkit/jdk16/bin:/home/conan/toolkit/cassandra125/bin"
JAVA_HOME=/home/conan/toolkit/jdk16
CASSANDRA_HOME=/home/conan/toolkit/cassandra125

~ . /etc/environment

~ java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-b11)
Java HotSpot(TM) 64-Bit Server VM (build 20.4-b02, mixed mode)

11. Create the storage and log directories


~ sudo rm -rf /var/lib/cassandra

~ sudo mkdir -p /var/lib/cassandra/data
~ sudo mkdir -p /var/lib/cassandra/saved_caches
~ sudo mkdir -p /var/lib/cassandra/commitlog
~ sudo mkdir -p /var/log/cassandra/

~ sudo chown -R conan:conan /var/lib/cassandra
~ sudo chown -R conan:conan /var/log/cassandra

~ ls -l /var/lib/cassandra
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 commitlog/
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 data/
drwxr-xr-x  2 conan conan 4096 Jul  4 00:15 saved_caches/

12. Edit the configuration file cassandra.yaml; the changes are listed in the order they appear in the file


~ vi /home/conan/toolkit/cassandra125/conf/cassandra.yaml

cluster_name: 'case1'

seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
      - seeds: "192.168.1.200"

listen_address: 192.168.1.201

rpc_address: 192.168.1.201
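
Because the second node's configuration was copied from the first, only the two address settings differ. The edit can be scripted roughly as below. This is a sketch under the assumption that both keys appear uncommented at the start of a line, as in the listing above; `set_node_address` is a hypothetical helper name.

```shell
#!/bin/sh
# set_node_address FILE IP: point listen_address and rpc_address at IP.
# Only uncommented lines that start with the key are rewritten; commented
# lines such as "#listen_address: localhost" are left alone.
# (Hypothetical helper for illustration.)
set_node_address() {
    file=$1
    ip=$2
    sed -e "s/^listen_address:.*/listen_address: $ip/" \
        -e "s/^rpc_address:.*/rpc_address: $ip/" \
        "$file" > "$file.new" && mv "$file.new" "$file"
}

# Example for the second node:
# set_node_address /home/conan/toolkit/cassandra125/conf/cassandra.yaml 192.168.1.201
```

The seeds entry is deliberately left untouched, since the second node should keep pointing at 192.168.1.200 as its seed.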

13. Start node 192.168.1.201


~ bin/cassandra -f

// Partial log
INFO 03:36:47,476 Completed flushing /var/lib/cassandra/data/system/local/system-local-ic-4-Data.db (75 bytes) for commitlog position ReplayPosition(segmentId=1366832174115, position=77582)
INFO 03:36:47,504 Compacting [SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-1-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-3-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-4-Data.db'), SSTableReader(path='/var/lib/cassandra/data/system/local/system-local-ic-2-Data.db')]
INFO 03:36:47,527 Enqueuing flush of Memtable-local@692438881(10094/10094 serialized/live bytes, 257 ops)
INFO 03:36:47,533 Writing Memtable-local@692438881(10094/10094 serialized/live bytes, 257 ops)
INFO 03:36:47,547 Completed flushing /var/lib/cassandra/data/system/local/system-local-ic-5-Data.db (5365 bytes) for commitlog position ReplayPosition(segmentId=1366832174115, position=89585)
INFO 03:36:47,660 Node /192.168.1.201 state jump to normal
INFO 03:36:47,663 Startup completed! Now serving reads.
INFO 03:36:47,686 Compacted 4 sstables to [/var/lib/cassandra/data/system/local/system-local-ic-6,].  5,956 bytes to 5,687 (~95% of original) in 158ms = 0.034326MB/s.  4 total rows, 1 unique.  Row merge counts were {1:0, 2:0, 3:0, 4:1, }
INFO 03:36:47,771 Starting listening for CQL clients on /192.168.1.201:9042...
INFO 03:36:47,785 Binding thrift service to /192.168.1.201:9160
INFO 03:36:47,810 Using TFramedTransport with a max frame size of 15728640 bytes.
INFO 03:36:47,834 Using synchronous/threadpool thrift server on 192.168.1.201 : 9160
INFO 03:36:47,834 Listening for thrift clients...

14. Check the log on node 1, 192.168.1.200


INFO 01:01:27,382 InetAddress /192.168.1.201 is now UP
INFO 01:01:58,660 Beginning transfer to /192.168.1.201
INFO 01:01:58,661 Flushing memtables for [CFS(Keyspace='system_auth', ColumnFamily='users')]...
INFO 01:01:58,663 Enqueuing flush of Memtable-users@1338035062(28/28 serialized/live bytes, 2 ops)
INFO 01:01:58,668 Writing Memtable-users@1338035062(28/28 serialized/live bytes, 2 ops)
INFO 01:01:59,010 Completed flushing /var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db (64 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=65900)
INFO 01:01:59,047 Stream context metadata [/var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db sections=1 progress=0/64 - 0%], 1 sstables.
INFO 01:01:59,048 Streaming to /192.168.1.201
INFO 01:01:59,122 Successfully sent /var/lib/cassandra/data/system_auth/users/system_auth-users-ic-1-Data.db to /192.168.1.201
INFO 01:01:59,123 Finished streaming session to /192.168.1.201
INFO 01:01:59,424 Enqueuing flush of Memtable-peers@1855686378(10279/10279 serialized/live bytes, 271 ops)
INFO 01:01:59,425 Writing Memtable-peers@1855686378(10279/10279 serialized/live bytes, 271 ops)
INFO 01:01:59,497 Completed flushing /var/lib/cassandra/data/system/peers/system-peers-ic-1-Data.db (5538 bytes) for commitlog position ReplayPosition(segmentId=1372868601408, position=77902)

15. Node 192.168.1.201 has now successfully joined the cluster.


~ bin/nodetool status

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.1.200  65.46 KB   256     48.7%             e7106e0a-1a9e-43a2-9bcc-fc1201076fee  rack1
UN  192.168.1.201  58.04 KB   256     51.3%             8eef1965-9822-44bf-a9f6-fff5b87bc474  rack1
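
To confirm from a script that both nodes are up, the status listing can be filtered for the UN (Up/Normal) marker in the first column. A minimal sketch, assuming the output layout shown above; `count_up_nodes` is a hypothetical helper name.

```shell
#!/bin/sh
# count_up_nodes: read `nodetool status` output on stdin and print how many
# nodes are reported in state UN (Up/Normal).
# (Hypothetical helper; relies on the state code being the first column.)
count_up_nodes() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Example:
# bin/nodetool status | count_up_nodes
```

For the two-node cluster built here the expected count is 2; a lower number would suggest that a node is down or still joining.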

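Experiment complete!

2. Errors encountered during the experiment and their fixes

1. Do not change the cluster name on a node that has already created its data store
Doing so produces the following error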


ERROR 23:29:58,004 Fatal exception during initialization
org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name case1

Solution: http://wiki.apache.org/cassandra/FAQ


Cassandra says "ClusterName mismatch: oldClusterName != newClusterName" and refuses to start

To prevent operator errors, Cassandra stores the name of the cluster in its system table. If you need to rename a cluster for some reason, you can:

Perform these steps on each node:

1. Start the cassandra-cli connected locally to this node.
2. Run the following:
   a) use system;
   b) set LocationInfo[utf8('L')][utf8('ClusterName')]=utf8('<new cluster name>');
   c) exit;
3. Run nodetool flush on this node.
4. Update the cassandra.yaml file for the cluster_name as the same as 2b).
5. Restart the node.

Once all nodes have been had this operation performed and restarted, nodetool ring should show all nodes as UP.

2. When editing the configuration file, a colon must be followed by a space
Otherwise the following error appears:


while scanning a simple key; could not found expected ':'

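Example: changing the following setting


# Incorrect syntax
listen_address:localhost

# Correct syntax
listen_address: localhost

Explanation:
The error above is a parse error caused by the missing space between the colon and the value that follows it.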
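
This kind of mistake can be caught before starting the node. The sketch below is a rough heuristic rather than a YAML parser: it flags uncommented lines where a simple key is followed by a colon and then directly by a value. The function name `find_missing_space` is hypothetical.

```shell
#!/bin/sh
# find_missing_space FILE: print "line-number:line" for every uncommented
# "key:value" line in which the colon is followed directly by a value.
# Returns non-zero when nothing suspicious is found (grep's convention).
# (Hypothetical helper; a heuristic check, not a full YAML validation.)
find_missing_space() {
    grep -nE '^[[:space:]]*[A-Za-z_]+:[^[:space:]]' "$1"
}

# Example:
# find_missing_space conf/cassandra.yaml && echo "fix the lines listed above"
```

Keys that end the line, such as `seed_provider:`, are not flagged, because nothing follows the colon.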
