• Posts tagged "MySQL"

Blog Archives

在Ubuntu中安装MySQL

Ubuntu实用工具系列文章,将介绍基于Linux ubuntu的各种工具软件的配置和使用。有些工具大家早已耳熟能详,有些工具经常用到但确依然陌生。我将记录我在使用操作系统时,安装及配置工具上面的一些方法,把使用心得记录下来也便于自己的以后查找和回忆。

关于作者:

  • 张丹(Conan), 程序员Java,R,PHP,Javascript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

转载请注明出处:
http://blog.fens.me/linux-mysql-install/

ubuntu-mysql-install

前言

安装MySQL是个老话题,我安装MySQL服务器已不下百次了,为了博客文章结构的连贯性,还是再写一篇做为环境基础,同时也给自己一个备忘。

目录

  1. MySQL在Windows中安装
  2. MySQL在Linux Ubuntu中安装
  3. 通过命令行客户端访问MySQL
  4. 修改MySQL服务器的配置
  5. 新建数据库并设置访问账号
  6. 改变数据存储位置

1. MySQL在Windows中安装

在Windows系统上安装MySQl数据库是件非常简单的事情,下载压缩包,解压即可。下载地址:http://dev.mysql.com/downloads/mysql/

  • MySQL服务器运行命令:MySQL安装目录/bin/mysqld.exe
  • MySQL客户端运行命令:MySQL安装目录/bin/mysql.exe

2. MySQL在Linux Ubuntu中安装

本文使用的Linux是Ubuntu 12.04.2 LTS 64bit的系统,安装MySQL数据库软件包可以通过apt-get实现。

在Linux Ubuntu中安装MySQL数据库


#安装MySQL服务器端
~ sudo apt-get install mysql-server

安装过程会弹出提示框,输入root用户的密码,我在这里设置密码为mysql。

安装完成后,MySQL服务器会自动启动,我们检查MySQL服务器程序


# 检查MySQL服务器系统进程
~ ps -aux|grep mysql
mysql     3205  2.0  0.5 549896 44092 ?        Ssl  20:10   0:00 /usr/sbin/mysqld
conan     3360  0.0  0.0  11064   928 pts/0    S+   20:10   0:00 grep --color=auto mysql

# 检查MySQL服务器占用端口
~ netstat -nlt|grep 3306
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN

# 通过启动命令检查MySQL服务器状态
~ sudo /etc/init.d/mysql status
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service mysql status

Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the status(8) utility, e.g. status mysql
mysql start/running, process 3205

# 通过系统服务命令检查MySQL服务器状态
~ service mysql status
mysql start/running, process 3205

3. 通过命令行客户端访问MySQL

安装MySQL服务器,会自动地一起安装MySQL命令行客户端程序。

在本机输入mysql命令就可以启动,客户端程序访问MySQL服务器。


~ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 42
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

使用户名和密码,登陆服务器


~ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

MySQL的一些简单的命令操作。


# 查看所有的数据库
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
+--------------------+
2 rows in set (0.00 sec)

# 切换到information_schema库
mysql> use information_schema
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

# 查看information_schema库中所有的表
mysql> show tables;
+---------------------------------------+
| Tables_in_information_schema          |
+---------------------------------------+
| CHARACTER_SETS                        |
| COLLATIONS                            |
| COLLATION_CHARACTER_SET_APPLICABILITY |
| COLUMNS                               |
| COLUMN_PRIVILEGES                     |
| ENGINES                               |
| EVENTS                                |
| FILES                                 |
| GLOBAL_STATUS                         |
| GLOBAL_VARIABLES                      |
| KEY_COLUMN_USAGE                      |
| PARAMETERS                            |
| PARTITIONS                            |
| PLUGINS                               |
| PROCESSLIST                           |
| PROFILING                             |
| REFERENTIAL_CONSTRAINTS               |
| ROUTINES                              |
| SCHEMATA                              |
| SCHEMA_PRIVILEGES                     |
| SESSION_STATUS                        |
| SESSION_VARIABLES                     |
| STATISTICS                            |
| TABLES                                |
| TABLESPACES                           |
| TABLE_CONSTRAINTS                     |
| TABLE_PRIVILEGES                      |
| TRIGGERS                              |
| USER_PRIVILEGES                       |
| VIEWS                                 |
| INNODB_BUFFER_PAGE                    |
| INNODB_TRX                            |
| INNODB_BUFFER_POOL_STATS              |
| INNODB_LOCK_WAITS                     |
| INNODB_CMPMEM                         |
| INNODB_CMP                            |
| INNODB_LOCKS                          |
| INNODB_CMPMEM_RESET                   |
| INNODB_CMP_RESET                      |
| INNODB_BUFFER_PAGE_LRU                |
+---------------------------------------+
40 rows in set (0.01 sec)

# 查看数据库的字符集编码
mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

4. 修改MySQL服务器的配置

接下来,我需要做一些配置,让MySQL符合基本的开发要求。

4.1 将字符编码设置为UTF-8

默认情况下,MySQL的字符集是latin1,因此在存储中文的时候,会出现乱码的情况,所以我们需要把字符集统一改成UTF-8。

用vi打开MySQL服务器的配置文件my.cnf


~ sudo vi /etc/mysql/my.cnf

#在[client]标签下,增加客户端的字符编码
[client]
default-character-set=utf8

#在[mysqld]标签下,增加服务器端的字符编码
[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

4.2 让MySQL服务器被远程访问

默认情况下,MySQL服务器不允许远程访问,只允许本机访问,所以我们需要设置打开远程访问的功能。

用vi打开MySQL服务器的配置文件my.cnf


~ sudo vi /etc/mysql/my.cnf

#注释bind-address
#bind-address            = 127.0.0.1

修改后,重启MySQL服务器。


~ sudo /etc/init.d/mysql restart
Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service mysql restart

Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the stop(8) and then start(8) utilities,
e.g. stop mysql ; start mysql. The restart(8) utility is also available.
mysql start/running, process 3577

重新登陆服务器


~ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 37
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# 再次查看字符串编码
mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

我们检查MySQL的网络监听端口


# 检查MySQL服务器占用端口
~ netstat -nlt|grep 3306
  tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN

我们看到从之间的网络监听从 127.0.0.1:3306 变成 0 0.0.0.0:3306,表示MySQL已经允许远程登陆访问。通过root账号远程访问,是非常不安全的操作,因此我们下一步,将新建一个数据库,再新建一个用户进行远程访问。

5. 新建数据库并设置访问账号

通过root账号登陆MySQl服务器


~ mysql -uroot -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 39
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

# 新建数据库abc
mysql> CREATE DATABASE abc;

# 使用数据库abc
mysql> use abc;
Database changed

# 在数据库abc中,新建一张表a1
mysql> create table a1(id int primary key,name varchar(32) not null);
Query OK, 0 rows affected (0.05 sec)

# 新建book用户,密码为book,允许book可以远程访问abc数据库,授权book对abc进行所有数据库
mysql> GRANT ALL ON abc.* to book@'%' IDENTIFIED BY 'book';
Query OK, 0 rows affected (0.00 sec)

#允许book可以本地访问abc数据库,授权book对abc进行所有数据库
mysql> GRANT ALL ON abc.* to book@localhost IDENTIFIED BY 'book';
Query OK, 0 rows affected (0.00 sec)

我们在本地使用book用户登陆


# 使用book用户登陆
~ mysql -ubook -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 40
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

#进行abc数据库
mysql> use abc;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

#查看abc数据库的表
mysql> show tables;
+---------------+
| Tables_in_abc |
+---------------+
| a1            |
+---------------+
1 row in set (0.00 sec)

我们在远程的另一台Linux使用book用户登陆


~ mysql -ubook -p -h 192.168.1.199
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 41
Server version: 5.5.35-0ubuntu0.12.04.2 (Ubuntu)

Copyright (c) 2000, 2012, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use abc
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------+
| Tables_in_abc |
+---------------+
| a1            |
+---------------+
1 row in set (0.00 sec)

5. 改变数据存储位置

有时候我们可能还需要改变MySQL数据存储的位置,一种方法是直接修改配置文件 /etc/mysql/my.cnf,找到datadir属性修改目录。


~ vi /etc/mysql/my.cnf

[mysqld]
datadir         = /var/lib/mysql

如果通过这种方法修改,那么其他的调用存储路径的地方,我们也都需要进行修改,比如 用到了/usr/bin/mysql_install_db 命令,文件中ldata的属性也需要修改,关于mysql_install_db 命令的使用可以参考文章,[MySQL优化]为MySQL数据文件ibdata1瘦身

还有另一种修改存储位置的方法,就是通过Linux系统的软连(ln -s)接来做的。当我们新挂载一块硬盘,停止MySQL服务,然后把/var/lib/mysql目录移动到新的硬盘存储,在/var/lib/mysql处建立指定新位置的软连接就行了。


# 停止MySQL服务器
~ /etc/init.d/mysql stop

# 挂载硬盘
~ mount -t ext4 /dev/vdb1 /vdb1

# 建立新存储目录
~ mkdir /vdb1/data

# 移动MySQL数据目录到新目录
~ mv /var/lib/mysql /vdb1/data/

# 软连接
~ ln -s /vdb1/data/mysql /var/lib/mysql

修改apparmor的别名定义文件


~ vi /etc/apparmor.d/tunables/alias

alias /var/lib/mysql/ -> /vdb1/data/mysql/,

注:如果没有修改apparmor的配置,MySQL会启动不了,并一直提示是权限的问题。


# 重启apparmor服务
~ /etc/init.d/apparmor restart

# 重启MySQL服务器
~ /etc/init.d/mysql start

这样就完成了,MySQL数据存储位置修改。

通过上面的操作,我们就把MySQL数据库服务器,在Linux Ubuntu中的系统安装完成。

转载请注明出处:
http://blog.fens.me/linux-mysql-install/

打赏作者

RMySQL数据库编程指南

R的极客理想系列文章,涵盖了R的思想,使用,工具,创新等的一系列要点,以我个人的学习和体验去诠释R的强大。

R语言作为统计学一门语言,一直在小众领域闪耀着光芒。直到大数据的爆发,R语言变成了一门炙手可热的数据分析的利器。随着越来越多的工程背景的人的加入,R语言的社区在迅速扩大成长。现在已不仅仅是统计领域,教育,银行,电商,互联网….都在使用R语言。

要成为有理想的极客,我们不能停留在语法上,要掌握牢固的数学,概率,统计知识,同时还要有创新精神,把R语言发挥到各个领域。让我们一起动起来吧,开始R的极客理想。

关于作者:

  • 张丹(Conan), 程序员Java,R,PHP,Javascript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

转载请注明出处:
http://blog.fens.me/r-mysql-rmysql/

r-rmysql

前言

MySQL是一款最常用到开源数据库软件,安装简单,运行稳定,非常适用于中小型的数据存储。R作为数据分析的工具,当然要支持数据库驱动接口。让R和MySQL配合在一起,所能爆发出的能量是巨大的。

由于操作系统的原因,让Win和Linux有不一样的字符集,不一样的运行时环境。所以,今天我们讲一下如何在Linux和Win上面安装和使用RMySQL。

目录

  1. RMySQL介绍
  2. RMySQL在Linux下安装
  3. RMySQL在Win7下安装
  4. RMySQL函数使用
  5. RMySQL案例实践

1. RMySQL介绍

RMySQL一个R语言程序包,提供了访问MySQL数据库的R语言接口程序,RMySQL需求依赖于DBI项目。RMySQL不仅提供了基本的数据库访问,SQL查询,还封装了一些方法。比较读整表,分页,data.frame快速插入等等的功能。掌握好RMySQL,数据库编辑将得心应手!!

2. RMySQL在Linux下安装

Linux系统环境:

  • Linux: Ubuntu 12.04.2 LTS 64bit server
  • Linux字符集: en_US.UTF-8
  • R: 3.0.1, x86_64-pc-linux-gnu (64-bit)
  • MySQL: Ver 14.14 Distrib 5.5.29 64bit server
  • MySQL字符集: utf8

~ uname -a
Linux conan 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

~ cat /etc/issue
Ubuntu 12.04.2 LTS \n \l

~ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

~ R --version
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

~ mysql --version
mysql  Ver 14.14 Distrib 5.5.29, for debian-linux-gnu (x86_64) using readline 6.2

mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

在R环境中安装RMySQL


~ R
> install.packages('RMySQL')

also installing the dependency ‘DBI’

trying URL 'http://cran.dataguru.cn/src/contrib/DBI_0.2-7.tar.gz'
Content type 'application/x-gzip' length 194699 bytes (190 Kb)
opened URL
==================================================
downloaded 190 Kb

trying URL 'http://cran.dataguru.cn/src/contrib/RMySQL_0.9-3.tar.gz'
Content type 'application/x-gzip' length 165363 bytes (161 Kb)
opened URL
==================================================
downloaded 161 Kb

...

Configuration error:
  could not find the MySQL installation include and/or library
  directories.  Manually specify the location of the MySQL
  libraries and the header files and re-run R CMD INSTALL.

INSTRUCTIONS:

1. Define and export the 2 shell variables PKG_CPPFLAGS and
   PKG_LIBS to include the directory for header files (*.h)
   and libraries, for example (using Bourne shell syntax):

      export PKG_CPPFLAGS="-I"
      export PKG_LIBS="-L -lmysqlclient"

   Re-run the R INSTALL command:

      R CMD INSTALL RMySQL_.tar.gz

2. Alternatively, you may pass the configure arguments
      --with-mysql-dir= (distribution directory)
   or
      --with-mysql-inc= (where MySQL header files reside)
      --with-mysql-lib= (where MySQL libraries reside)
   in the call to R INSTALL --configure-args='...'

   R CMD INSTALL --configure-args='--with-mysql-dir=DIR' RMySQL_.tar.gz

ERROR: configuration failed for package ‘RMySQL’
* removing ‘/home/conan/R/x86_64-pc-linux-gnu-library/3.0/RMySQL’

The downloaded source packages are in
        ‘/tmp/Rtmpu0Gn88/downloaded_packages’
Warning message:
In install.packages("RMySQL") :
  installation of package ‘RMySQL’ had non-zero exit status

安装出错了,提示我们需要增加MySQL安装目录的配置参数


# 安装mysql类库 
~ sudo apt-get install libdbd-mysql libmysqlclient-dev

# 找到mysql的安装目录
~ whereis mysql
mysql: /usr/bin/mysql /etc/mysql /usr/lib/mysql /usr/bin/X11/mysql /usr/share/mysql /usr/share/man/man1/mysql.1.gz

# 找到刚刚下载的RMySQL_.tar.gz
~ ls /tmp/Rtmpu0Gn88/downloaded_packages
DBI_0.2-7.tar.gz  RMySQL_0.9-3.tar.gz

# 通过命令安装RMySQL
~ R CMD INSTALL --configure-args='--with-mysql-dir=/usr/lib/mysql' /tmp/Rtmpu0Gn88/downloaded_packages/RMySQL_0.9-3.tar.gz

* installing to library ‘/home/conan/R/x86_64-pc-linux-gnu-library/3.0’
* installing *source* package ‘RMySQL’ ...
** package ‘RMySQL’ successfully unpacked and MD5 sums checked
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ANSI C... none needed
checking how to run the C preprocessor... gcc -E
checking for compress in -lz... yes
checking for getopt_long in -lc... yes
checking for mysql_init in -lmysqlclient... yes
checking for egrep... grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking mysql.h usability... no
checking mysql.h presence... no
checking for mysql.h... no
checking /usr/local/include/mysql/mysql.h usability... no
checking /usr/local/include/mysql/mysql.h presence... no
checking for /usr/local/include/mysql/mysql.h... no
checking /usr/include/mysql/mysql.h usability... yes
checking /usr/include/mysql/mysql.h presence... yes
checking for /usr/include/mysql/mysql.h... yes
configure: creating ./config.status
config.status: creating src/Makevars
** libs
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/include/mysql     -fpic  -O3 -pipe  -g  -c RS-DBI.c -o RS-DBI.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -I/usr/include/mysql     -fpic  -O3 -pipe  -g  -c RS-MySQL.c -o RS-MySQL.o
gcc -std=gnu99 -shared -o RMySQL.so RS-DBI.o RS-MySQL.o -lmysqlclient -lz -L/usr/lib/R/lib -lR
installing to /home/conan/R/x86_64-pc-linux-gnu-library/3.0/RMySQL/libs
** R
** inst
** preparing package for lazy loading
Creating a generic function for ‘format’ from package ‘base’ in package ‘RMySQL’
Creating a generic function for ‘print’ from package ‘base’ in package ‘RMySQL’
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (RMySQL)

RMySQL安装成功.

在MySQL中建库建表


~ mysql -uroot -p

mysql> create database rmysql;
Query OK, 1 row affected (0.00 sec)

mysql> grant all on rmysql.* to rmysql@'%' identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on rmysql.* to rmysql@localhost identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)

mysql> use rmysql
Database changed

mysql> CREATE TABLE t_user(
    -> id INT PRIMARY KEY AUTO_INCREMENT,
    -> user varchar(12) NOT NULL UNIQUE
    -> )ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.07 sec)

mysql> INSERT INTO t_user(user) values('A1'),('AB'),('fens.me');
Query OK, 3 rows affected (0.04 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t_user;
+----+---------+
| id | user    |
+----+---------+
|  1 | A1      |
|  2 | AB      |
|  3 | fens.me |
+----+---------+
3 rows in set (0.00 sec)

通过R程序,读MySQL数据库数据


~ R

> library(RMySQL)
Loading required package: DBI

> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql")
> users = dbGetQuery(conn, "SELECT * FROM t_user")
> dbDisconnect(conn)
[1] TRUE
> users
  id    user
1  1      A1
2  2      AB
3  3 fens.me

好了,我们实现了在Linux下R和MySQL的连接。

3. RMySQL在Win7下安装

Win系统环境:

  • Win7: 64位 旗舰版
  • Win字符集: gbk,utf8
  • R: 3.0.1, x86_64-w64-mingw32/x64 (64-bit)
  • MySQL: mysql Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)

~ R --version
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

~ mysql --version
mysql  Ver 14.14 Distrib 5.6.11, for Win64 (x86_64)

mysql> show variables like '%char%';
+--------------------------+------------------------------------+
| Variable_name            | Value                              |
+--------------------------+------------------------------------+
| character_set_client     | gbk                                |
| character_set_connection | gbk                                |
| character_set_database   | utf8                               |
| character_set_filesystem | binary                             |
| character_set_results    | gbk                                |
| character_set_server     | utf8                               |
| character_set_system     | utf8                               |
| character_sets_dir       | D:\toolkit\mysql56\share\charsets\ |
+--------------------------+------------------------------------+
8 rows in set (0.07 sec)

在R环境中安装RMySQL


~ D:\workspace\R\mysql>R

> install.packages('RMySQl')
package 'RMySQl' is not available (for R version 3.0.1)

我们看到提示,没有对应的RMySQL安装版本。


# 下载RMySQL源代码包
> install.packages("RMySQL", type="source")
URLhttp://cran.dataguru.cn/src/contrib/RMySQL_0.9-3.tar.gz'
Content type 'application/x-gzip' length 165363 bytes (161 Kb)
URL
downloaded 161 Kb

* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... not found... searching registry...

cygwin warning:
  MS-DOS style path detected: C:/PROGRA~1/R/R-30~1.1/bin/x64/Rscript
  Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-30~1.1/bin/x64/Rscript
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
readRegistry("SOFTWARE\\MySQL AB", hive = "HLM", maxdepth = 2) :
  Registry key 'SOFTWARE\MySQL AB' not found

ERROR: configuration failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'
        'C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages'

In install.packages("RMySQL", type = "source") :
  'RMySQL'B0

找到源代码包:RMySQL_0.9-3.tar.gz


~ dir C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages
2013-09-24  13:16           165,363 RMySQL_0.9-3.tar.gz

通过源代码包安装


~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3
.tar.gz
* installing to library 'C:/Program Files/R/R-3.0.1/library'
* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... not found... searching registry...

cygwin warning:
  MS-DOS style path detected: C:/PROGRA~1/R/R-30~1.1/bin/x64/Rscript
  Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-30~1.1/bin/x64/Rscript
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
readRegistry("SOFTWARE\\MySQL AB", hive = "HLM", maxdepth = 2) :
  Registry key 'SOFTWARE\MySQL AB' not found

ERROR: configuration failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'

设置MYSQL_HOME的环境变量


set MYSQL_HOME=D:\toolkit\mysql56

注: MYSQL_HOME建议设置在系统环境变量中。

再一次安装RMySQL


D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3
.tar.gz
* installing to library 'C:/Program Files/R/R-3.0.1/library'
* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... D:\toolkit\mysql56
cygwin warning:
  MS-DOS style path detected: D:\toolkit\mysql56
  Preferred POSIX equivalent is: /cygdrive/d/toolkit/mysql56
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
** libs
: this package has a non-empty 'configure.win' file,
so building only the main architecture

cygwin warning:
  MS-DOS style path detected: C:/PROGRA~1/R/R-30~1.1/etc/x64/Makeconf
  Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-30~1.1/etc/x64/Makeconf
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
gcc -m64 -I"C:/PROGRA~1/R/R-30~1.1/include" -DNDEBUG -I"D:\toolkit\mysql56"/include    -I"d:/RCompile/CRANpkg/extralibs6
4/local/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c RS-DBI.c -o RS-DBI.o
RS-DBI.c: In function 'RS_na_set':
RS-DBI.c:1219:11: warning: variable 'c' set but not used [-Wunused-but-set-variable]
gcc -m64 -I"C:/PROGRA~1/R/R-30~1.1/include" -DNDEBUG -I"D:\toolkit\mysql56"/include    -I"d:/RCompile/CRANpkg/extralibs6
4/local/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c RS-MySQL.c -o RS-MySQL.o
RS-MySQL.c: In function 'RS_MySQL_fetch':
RS-MySQL.c:657:13: warning: variable 'fld_nullOk' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_DBI_invokeBeginGroup':
RS-MySQL.c:1137:30: warning: variable 'val' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_DBI_invokeNewRecord':
RS-MySQL.c:1158:20: warning: variable 'val' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_MySQL_dbApply':
RS-MySQL.c:1219:38: warning: variable 'fld_nullOk' set but not used [-Wunused-but-set-variable]
gcc -m64 -shared -s -static-libgcc -o RMySQL.dll tmp.def RS-DBI.o RS-MySQL.o D:\toolkit\mysql56/bin/libmySQL.dll -Ld:/RC
ompile/CRANpkg/extralibs64/local/lib/x64 -Ld:/RCompile/CRANpkg/extralibs64/local/lib -LC:/PROGRA~1/R/R-30~1.1/bin/x64 -l
R
gcc.exe: error: D:\toolkit\mysql56/bin/libmySQL.dll: No such file or directory
ERROR: compilation failed for package 'RMySQL'
* removing 'C:/Program Files/R/R-3.0.1/library/RMySQL'

错误为没有找到动态链接库:D:\toolkit\mysql56/bin/libmySQL.dll


# 复制动态链接库libmySQL.dll
cp D:\toolkit\mysql56\lib\libmysql.dll D:\toolkit\mysql56\bin\
mv D:\toolkit\mysql56\bin\libmysql.dll D:\toolkit\mysql56\bin\libmySQL.dll

再一次安装RMySQL


~ D:\workspace\R\mysql>R CMD INSTALL C:\Users\Administrator\AppData\Local\Temp\RtmpsfqQjK\downloaded_packages\RMySQL_0.9-3
.tar.gz
* installing to library 'C:/Program Files/R/R-3.0.1/library'
* installing *source* package 'RMySQL' ...
** 'RMySQL'MD5
checking for $MYSQL_HOME... D:\toolkit\mysql56
cygwin warning:
  MS-DOS style path detected: D:\toolkit\mysql56
  Preferred POSIX equivalent is: /cygdrive/d/toolkit/mysql56
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
** libs
: this package has a non-empty 'configure.win' file,
so building only the main architecture

cygwin warning:
  MS-DOS style path detected: C:/PROGRA~1/R/R-30~1.1/etc/x64/Makeconf
  Preferred POSIX equivalent is: /cygdrive/c/PROGRA~1/R/R-30~1.1/etc/x64/Makeconf
  CYGWIN environment variable option "nodosfilewarning" turns off this warning.
  Consult the user's guide for more details about POSIX paths:
    http://cygwin.com/cygwin-ug-net/using.html#using-pathnames
gcc -m64 -I"C:/PROGRA~1/R/R-30~1.1/include" -DNDEBUG -I"D:\toolkit\mysql56"/include    -I"d:/RCompile/CRANpkg/extralibs6
4/local/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c RS-DBI.c -o RS-DBI.o
RS-DBI.c: In function 'RS_na_set':
RS-DBI.c:1219:11: warning: variable 'c' set but not used [-Wunused-but-set-variable]
gcc -m64 -I"C:/PROGRA~1/R/R-30~1.1/include" -DNDEBUG -I"D:\toolkit\mysql56"/include    -I"d:/RCompile/CRANpkg/extralibs6
4/local/include"     -O2 -Wall  -std=gnu99 -mtune=core2 -c RS-MySQL.c -o RS-MySQL.o
RS-MySQL.c: In function 'RS_MySQL_fetch':
RS-MySQL.c:657:13: warning: variable 'fld_nullOk' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_DBI_invokeBeginGroup':
RS-MySQL.c:1137:30: warning: variable 'val' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_DBI_invokeNewRecord':
RS-MySQL.c:1158:20: warning: variable 'val' set but not used [-Wunused-but-set-variable]
RS-MySQL.c: In function 'RS_MySQL_dbApply':
RS-MySQL.c:1219:38: warning: variable 'fld_nullOk' set but not used [-Wunused-but-set-variable]
gcc -m64 -shared -s -static-libgcc -o RMySQL.dll tmp.def RS-DBI.o RS-MySQL.o D:\toolkit\mysql56/bin/libmySQL.dll -Ld:/RC
ompile/CRANpkg/extralibs64/local/lib/x64 -Ld:/RCompile/CRANpkg/extralibs64/local/lib -LC:/PROGRA~1/R/R-30~1.1/bin/x64 -l
R
installing to C:/Program Files/R/R-3.0.1/library/RMySQL/libs/x64
** R
** inst
** preparing package for lazy loading
Creating a generic function for 'format' from package 'base' in package 'RMySQL'
Creating a generic function for 'print' from package 'base' in package 'RMySQL'
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
MYSQL_HOME defined as D:\toolkit\mysql56
* DONE (RMySQL)

安装成功!

在MySQL中建库建表


~ mysql -uroot -p

mysql> create database rmysql;
Query OK, 1 row affected (0.04 sec)

mysql> grant all on rmysql.* to rmysql@'%' identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)

mysql> grant all on rmysql.* to rmysql@localhost identified by 'rmysql';
Query OK, 0 rows affected (0.00 sec)

mysql> use rmysql
Database changed
mysql> CREATE TABLE t_user(
    -> id INT PRIMARY KEY AUTO_INCREMENT,
    -> user varchar(12) NOT NULL UNIQUE
    -> )ENGINE=INNODB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (1.01 sec)

mysql>
mysql> INSERT INTO t_user(user) values('A1'),('AB'),('fens.me');
Query OK, 3 rows affected (0.05 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM t_user;
+----+---------+
| id | user    |
+----+---------+
|  1 | A1      |
|  2 | AB      |
|  3 | fens.me |
+----+---------+
3 rows in set (0.03 sec)

通过R程序,读MySQL数据库数据。

注:如果刚才没有把MYSQL_HOME的变量写到环境变更中,每次在启动R之前要,先设置变量。


~ set MYSQL_HOME=D:\toolkit\mysql56
~ R

> library(RMySQL)
DBI
MYSQL_HOME defined as D:\toolkit\mysql56

> conn <- dbConnect(MySQL(), dbname = "rmysql", username="root", password="",client.flag=CLIENT_MULTI_STATEMENTS)
> users = dbGetQuery(conn, "SELECT * FROM t_user")
> dbDisconnect(conn)
[1] TRUE
> users
  id    user
1  1      A1
2  2      AB
3  3 fens.me

好了,我们实现了在Win7下R和MySQL的连接。

4. RMySQL函数使用

环境都安装好了,接下来我们具体使用一下RMySQL的包。

  • RMySQL辅助操作
  • RMySQL数据库操作
  • 针对win的字符集设置

1). RMySQL辅助操作

加载类库

> library(RMySQL)

建立本地连接


> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql",client.flag=CLIENT_MULTI_STATEMENTS)

建立远程连接


> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql",host="192.168.1.201",port=3306)

关闭连接

dbDisconnect(conn)

查看数据库的表


> dbListTables(conn)
[1] "t_user"

查看表的字段


> dbListFields(conn, "t_user")
[1] "id"   "user"

查询MySQL信息


> summary(MySQL(), verbose = TRUE)
<MySQLDriver:(23864)> 
  Driver name:  MySQL 
  Max  connections: 16 
  Conn. processed: 3 
  Default records per fetch: 500 
  DBI API version:  

# MySQL连接实例信息
> summary(conn, verbose = TRUE)
<MySQLConnection:(23864,2)> 
  User: root 
  Host: localhost 
  Dbname: rmysql 
  Connection type: localhost via TCP/IP 
  MySQL server version:  5.6.11 
  MySQL client version:  5.6.11 
  MySQL protocol version:  10 
  MySQL server thread id:  35 
  No resultSet available

# MySQL连接信息
> dbListConnections(MySQL())
[[1]]
<MySQLConnection:(23864,2)> 

2). RMySQL数据库操作
RMySQL数据库操作


# 建表并插入数据
> t_demo<-data.frame(
  a=seq(1:10),
  b=letters[1:10],
  c=rnorm(10)
)
> dbWriteTable(conn, "t_demo", t_demo)

# 获得整个表数据
> dbReadTable(conn, "t_demo")
    a b           c
1   1 a  0.98868164
2   2 b -0.66935770
3   3 c  0.27703638
4   4 d  1.36137156
5   5 e -0.70291017
6   6 f  1.61235088
7   7 g  0.17616068
8   8 h  0.29700017
9   9 i  0.19032719
10 10 j -0.06222173

# 插入新数据
> dbWriteTable(conn, "t_demo", t_demo, append=TRUE)
> dbReadTable(conn, "t_demo")
   row_names  a b           c
1          1  1 a  0.98868164
2          2  2 b -0.66935770
3          3  3 c  0.27703638
4          4  4 d  1.36137156
5          5  5 e -0.70291017
6          6  6 f  1.61235088
7          7  7 g  0.17616068
8          8  8 h  0.29700017
9          9  9 i  0.19032719
10        10 10 j -0.06222173
11         1  1 a  0.98868164
12         2  2 b -0.66935770
13         3  3 c  0.27703638
14         4  4 d  1.36137156
15         5  5 e -0.70291017
16         6  6 f  1.61235088
17         7  7 g  0.17616068
18         8  8 h  0.29700017
19         9  9 i  0.19032719
20        10 10 j -0.06222173

# 覆盖原表数据
> dbWriteTable(conn, "t_demo", t_demo, overwrite=TRUE)

# 1). 查询数据
> d0 <- dbGetQuery(conn, "SELECT * FROM t_demo where c>0")
> class(d0)
[1] "data.frame"

> d0
  row_names a b         c
1         1 1 a 0.9886816
2         3 3 c 0.2770364
3         4 4 d 1.3613716
4         6 6 f 1.6123509
5         7 7 g 0.1761607
6         8 8 h 0.2970002
7         9 9 i 0.1903272

# 2). 执行SQL脚本查询,并分页
> rs <- dbSendQuery(conn, "SELECT * FROM t_demo where c>0")
> class(rs)
[1] "MySQLResult"
attr(,"package")
[1] "RMySQL"
> mysqlCloseResult(rs)
[1] TRUE

> d1 <- fetch(rs, n = 3)
> d1
  row_names a b         c
1         1 1 a 0.9886816
2         3 3 c 0.2770364
3         4 4 d 1.3613716

# 3). 查看集统计信息
> summary(rs, verbose = TRUE)
  row_names               a              b                   c         
 Length:7           Min.   :1.000   Length:7           Min.   :0.1762  
 Class :character   1st Qu.:3.500   Class :character   1st Qu.:0.2337  
 Mode  :character   Median :6.000   Mode  :character   Median :0.2970  
                    Mean   :5.429                      Mean   :0.7004  
                    3rd Qu.:7.500                      3rd Qu.:1.1750  
                    Max.   :9.000                      Max.   :1.6124

# 不插入row.names字段
> dbWriteTable(conn, "t_demo", t_demo,row.names=FALSE,overwrite=TRUE)
> dbGetQuery(conn, "SELECT * FROM t_demo where c>0")
  a b         c
1 1 a 0.9886816
2 3 c 0.2770364
3 4 d 1.3613716
4 6 f 1.6123509
5 7 g 0.1761607
6 8 h 0.2970002
7 9 i 0.1903272

# 删除表
> if(dbExistsTable(conn,'t_demo')){
+     dbRemoveTable(conn, "t_demo")
+ }
[1] TRUE

执行SQL语句,dbSendQuery


> query<-dbSendQuery(conn, "show tables")
> data <- fetch(query, n = -1)
> data
  Tables_in_rmysql
1           t_demo
2           t_user
> mysqlCloseResult(query)
[1] TRUE

4). win的字符集设置
在win7中,向MySQL插入中文


mysql> INSERT INTO t_user(user) values('小朋友'),('你好'),('正确了');
Query OK, 3 rows affected (0.07 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> select * from t_user;
+----+---------+
| id | user    |
+----+---------+
|  1 | A1      |
|  2 | AB      |
|  3 | fens.me |
|  5 | 你好    |
|  4 | 小朋友  |
|  6 | 正确了  |
+----+---------+
6 rows in set (0.07 sec)

通过RMySQL查询


> dbGetQuery(conn, "SELECT * FROM t_user")
  id    user
1  1      A1
2  2      AB
3  3 fens.me
4  5      ??
5  4     ???
6  6     ???

设置GKB字符集


> dbDisconnect(conn)
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="root", password="",client.flag=CLIENT_MULTI_STATEMENTS)
> dbSendQuery(conn,'SET NAMES gbk')
 
> query<-dbSendQuery(conn, "SELECT * FROM t_user")
> data <- fetch(query, n = -1)
> mysqlCloseResult(query)
[1] TRUE
> data
  id    user
1  1      A1
2  2      AB
3  3 fens.me
4  5    你好
5  4  小朋友
6  6  正确了

OK,我们在win下面修正字符编号的问题。

5. RMySQL案例实践

系统需求描述:Linux MySQL,Win7的R环境,远程连接

  • 1. 通过SQL新建表t_blog,主键索引,唯一键索引
  • 2. 用RMySQL插入数据,包括中文字段
  • 3. 再用RMySQL取出数据

1). 通过SQL新建表t_blog,主键索引,唯一键索引
建表语句


CREATE TABLE t_blog(
id INT PRIMARY KEY AUTO_INCREMENT,
title varchar(12) NOT NULL UNIQUE,
author varchar(12) NOT NULL, 
length int NOT NULL,
create_date timestamp NOT NULL DEFAULT now()
)ENGINE=INNODB DEFAULT CHARSET=UTF8;

mysql> desc t_blog;
+-------------+-------------+------+-----+-------------------+----------------+
| Field       | Type        | Null | Key | Default           | Extra          |
+-------------+-------------+------+-----+-------------------+----------------+
| id          | int(11)     | NO   | PRI | NULL              | auto_increment |
| title       | varchar(12) | NO   | UNI | NULL              |                |
| author      | varchar(12) | NO   |     | NULL              |                |
| length      | int(11)     | NO   |     | NULL              |                |
| create_date | timestamp   | NO   |     | CURRENT_TIMESTAMP |                |
+-------------+-------------+------+-----+-------------------+----------------+
5 rows in set (0.00 sec)

mysql> show indexes from t_blog;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_blog |          0 | PRIMARY  |            1 | id          | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
| t_blog |          0 | title    |            1 | title       | A         |           3 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)


INSERT INTO t_blog(title,author,length) values('你好,第一篇','Conan',20),('RMySQL数据库编程','Conan',99),('R的极客理想系列文章','Conan',15);

mysql> select * from t_blog;
+----+------------------------------+--------+--------+---------------------+
| id | title                        | author | length | create_date         |
+----+------------------------------+--------+--------+---------------------+
|  1 | 你好,第一篇                 | Conan  |     20 | 2013-08-15 00:13:13 |
|  2 | RMySQL数据库编程             | Conan  |     99 | 2013-08-15 00:13:13 |
|  3 | R的极客理想系列文章          | Conan  |     15 | 2013-08-15 00:13:13 |
+----+------------------------------+--------+--------+---------------------+
3 rows in set (0.00 sec)

2). 用RMySQL插入数据,包括中文字段,再取出数据


> library(RMySQL)
> conn <- dbConnect(MySQL(), dbname = "rmysql", username="rmysql", password="rmysql",host="192.168.1.201",port=3306)
> 
> dbSendQuery(conn,'SET NAMES gbk')
 
> dbSendQuery(conn,"INSERT INTO t_blog(title,author,length) values('R插入的新文章','Conan',50)");
 
> 
> query<-dbSendQuery(conn, "SELECT * FROM t_blog")
Warning message:
In mysqlExecStatement(conn, statement, ...) :
  RS-DBI driver warning: (unrecognized MySQL field type 7 in column 4 imported as character)
> data <- fetch(query, n = -1)
> mysqlCloseResult(query)
[1] TRUE
> print(data)
  id               title author length         create_date
1  1        你好,第一篇  Conan     20 2013-08-15 00:13:13
2  2    RMySQL数据库编程  Conan     99 2013-08-15 00:13:13
3  3 R的极客理想系列文章  Conan     15 2013-08-15 00:13:13
4  4       R插入的新文章  Conan     50 2013-08-15 00:29:45
> 
> dbDisconnect(conn)
[1] TRUE

特别提示,不能用dbWriteTable函数!!

我们已经完成,掌握了RMySQL的各种使用技巧,希望大家理解原理后,能少犯错误,提高工作效率!

转载请注明出处:
http://blog.fens.me/r-mysql-rmysql/

打赏作者

用Nodejs连接MySQL

从零开始nodejs系列文章,将介绍如何利Javascript做为服务端脚本,通过Nodejs框架web开发。Nodejs框架是基于V8的引擎,是目前速度最快的Javascript引擎。chrome浏览器就基于V8,同时打开20-30个网页都很流畅。Nodejs标准的web开发框架Express,可以帮助我们迅速建立web站点,比起PHP的开发效率更高,而且学习曲线更低。非常适合小型网站,个性化网站,我们自己的Geek网站!!

关于作者

  • 张丹(Conan), 程序员Java,R,PHP,Javascript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

转载请注明出处:
http://blog.fens.me/nodejs-mysql-intro/

nodejs-mysql

前言

MySQL是一款常用的开源数据库产品,通常也是免费数据库的首选。查了一下NPM列表,发现Nodejs有13库可以访问MySQL,felixge/node-mysql似乎是最受关注项目,我也决定尝试用一下。

要注意名字,”felixge/node-mysql”非”node-mysql”,安装部分会介绍这个小插曲!

目录

  1. node-mysql介绍
  2. 建立MySQL测试库
  3. node-mysql安装
  4. node-mysql使用

1. node-mysql介绍

felixge/node-mysql是一个纯nodejs的用javascript实现的一个MySQL客户端程序。felixge/node-mysql封装了Nodejs对MySQL的基本操作,100% MIT公共许可证。

项目地址:https://github.com/felixge/node-mysql

2. 建立MySQL测试库

本地创建MySQL测试库:nodejs


~ mysql -uroot -p
mysql> CREATE DATABASE nodejs;
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| nodejs             |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

mysql> GRANT ALL ON nodejs.* to nodejs@'%' IDENTIFIED BY 'nodejs';
mysql> GRANT ALL ON nodejs.* to nodejs@localhost IDENTIFIED BY 'nodejs';

重新登陆MySQL


C:\Users\Administrator>mysql -unodejs -p
Enter password: ******

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| nodejs             |
| test               |
+--------------------+
3 rows in set (0.00 sec)

mysql> USE nodejs
Database changed

新建一个user表


CREATE TABLE t_user(
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(16) NOT NULL ,
create_date TIMESTAMP NULL DEFAULT now()
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE UNIQUE INDEX t_quiz_IDX_0 on t_user(name);

mysql> SHOW TABLES;
+------------------+
| Tables_in_nodejs |
+------------------+
| t_user           |
+------------------+
1 row in set (0.04 sec)

3. node-mysql安装

我的系统环境

  • win7 64bit
  • Nodejs:v0.10.5
  • Npm:1.2.19
  • MySQL:Server version: 5.6.11 MySQL Community Server (GPL)

创建工程:nodejs-node-mysql

~ D:\workspace\javascript>mkdir nodejs-node-mysql
~ D:\workspace\javascript>cd nodejs-node-mysql
~ D:\workspace\javascript\nodejs-node-mysql>npm install node-mysql
node-mysql@0.2.0 node_modules\node-mysql
├── better-js-class@0.1.2
├── cps@0.1.7
├── underscore@1.5.2
└── mysql@2.0.0-alpha9 (require-all@0.0.3, bignumber.js@1.0.1)

这里有一个小插曲

安装“node-mysql”后,打开package.json文件发现,这个项目地址是

https://github.com/redblaze/node-mysql.git

从依赖关系可以看到,它依赖于mysql库,是对felixge/node-mysql的封装。

node-mysql1

由于这个项目star是0,fork也是0. 所以,我也不准备花时间测试了,重新安装felixge/node-mysql的包。

重新安装node-mysql

~ D:\workspace\javascript\nodejs-node-mysql>rm -rf node_modules
~ D:\workspace\javascript\nodejs-node-mysql>npm install mysql@2.0.0-alpha9
npm http GET https://registry.npmjs.org/mysql/2.0.0-alpha9
npm http 200 https://registry.npmjs.org/mysql/2.0.0-alpha9
npm http GET https://registry.npmjs.org/mysql/-/mysql-2.0.0-alpha9.tgz
npm http 200 https://registry.npmjs.org/mysql/-/mysql-2.0.0-alpha9.tgz
npm http GET https://registry.npmjs.org/require-all/0.0.3
npm http GET https://registry.npmjs.org/bignumber.js/1.0.1
npm http 304 https://registry.npmjs.org/require-all/0.0.3
npm http 304 https://registry.npmjs.org/bignumber.js/1.0.1
mysql@2.0.0-alpha9 node_modules\mysql
├── require-all@0.0.3
└── bignumber.js@1.0.1

这回就对了,继续下面的开发!

创建node程序启动文件:app.js

第一个测试

~ vi app.js

var mysql = require('mysql');
var conn = mysql.createConnection({
    host: 'localhost',
    user: 'nodejs',
    password: 'nodejs',
    database:'nodejs',
    port: 3306
});
conn.connect();
conn.query('SELECT 1 + 1 AS solution', function(err, rows, fields) {
    if (err) throw err;
    console.log('The solution is: ', rows[0].solution);
});
conn.end();

运行node

~ D:\workspace\javascript\nodejs-node-mysql>node app.js
The solution is:  2

这样我们就让Nodejs连接上了MySQL。

4. node-mysql使用

下面我们要对node-mysql的API进行常用的测试。

  • 表新删改查
  • 连接池配置
  • MySQL断线重连
  • 连接池超时测试

1). 表新删改查
修改app.js


~ vi app.js

var mysql = require('mysql');
var conn = mysql.createConnection({
    host: 'localhost',
    user: 'nodejs',
    password: 'nodejs',
    database: 'nodejs',
    port: 3306
});
conn.connect();

var insertSQL = 'insert into t_user(name) values("conan"),("fens.me")';
var selectSQL = 'select * from t_user limit 10';
var deleteSQL = 'delete from t_user';
var updateSQL = 'update t_user set name="conan update"  where name="conan"';

//delete
conn.query(deleteSQL, function (err0, res0) {
    if (err0) console.log(err0);
    console.log("DELETE Return ==> ");
    console.log(res0);

    //insert
    conn.query(insertSQL, function (err1, res1) {
        if (err1) console.log(err1);
        console.log("INSERT Return ==> ");
        console.log(res1);

        //query
        conn.query(selectSQL, function (err2, rows) {
            if (err2) console.log(err2);

            console.log("SELECT ==> ");
            for (var i in rows) {
                console.log(rows[i]);
            }

            //update
            conn.query(updateSQL, function (err3, res3) {
                if (err3) console.log(err3);
                console.log("UPDATE Return ==> ");
                console.log(res3);

                //query
                conn.query(selectSQL, function (err4, rows2) {
                    if (err4) console.log(err4);

                    console.log("SELECT ==> ");
                    for (var i in rows2) {
                        console.log(rows2[i]);
                    }
                });
            });
        });
    });
});

//conn.end();

控制台输出:

D:\workspace\javascript\nodejs-node-mysql>node app.js
DELETE Return ==>
{ fieldCount: 0,
  affectedRows: 2,
  insertId: 0,
  serverStatus: 34,
  warningCount: 0,
  message: '',
  protocol41: true,
  changedRows: 0 }
INSERT Return ==>
{ fieldCount: 0,
  affectedRows: 2,
  insertId: 33,
  serverStatus: 2,
  warningCount: 0,
  message: '&Records: 2  Duplicates: 0  Warnings: 0',
  protocol41: true,
  changedRows: 0 }
SELECT ==>
{ id: 33,
  name: 'conan',
  create_date: Wed Sep 11 2013 12:09:15 GMT+0800 (中国标准时间) }
{ id: 34,
  name: 'fens.me',
  create_date: Wed Sep 11 2013 12:09:15 GMT+0800 (中国标准时间) }
UPDATE Return ==>
{ fieldCount: 0,
  affectedRows: 1,
  insertId: 0,
  serverStatus: 2,
  warningCount: 0,
  message: '(Rows matched: 1  Changed: 1  Warnings: 0',
  protocol41: true,
  changedRows: 1 }
SELECT ==>
{ id: 33,
  name: 'conan update',
  create_date: Wed Sep 11 2013 12:09:15 GMT+0800 (中国标准时间) }
{ id: 34,
  name: 'fens.me',
  create_date: Wed Sep 11 2013 12:09:15 GMT+0800 (中国标准时间) }

由于node的异步的,上面是一个连续的操作,代码会被写的支离破碎。我们可以通过async库对上面代码进行封装,请参考文章:Nodejs异步流程控制Async

2). 连接池配置

增加文件:app-pooling.js

~ vi app-pooling.js

var mysql = require('mysql');
var pool = mysql.createPool({
    host: 'localhost',
    user: 'nodejs',
    password: 'nodejs',
    database: 'nodejs',
    port: 3306
});

var selectSQL = 'select * from t_user limit 10';

pool.getConnection(function (err, conn) {
    if (err) console.log("POOL ==> " + err);

    conn.query(selectSQL,function(err,rows){
        if (err) console.log(err);
        console.log("SELECT ==> ");
        for (var i in rows) {
            console.log(rows[i]);
        }
        conn.release();
    });
});

控制台输出:

D:\workspace\javascript\nodejs-node-mysql>node app-pooling.js
SELECT ==>
{ id: 39,
  name: 'conan update',
  create_date: Wed Sep 11 2013 13:41:18 GMT+0800 (中国标准时间) }
{ id: 40,
  name: 'fens.me',
  create_date: Wed Sep 11 2013 13:41:18 GMT+0800 (中国标准时间) }

3). MySQL断线重连

分别模拟3种错误

  • 登陆密码错误
  • 数据库宕机
  • 数据库连接超时

新增文件:app-reconnect.js

~ vi app-reconnect.js

var mysql = require('mysql');
var conn;
function handleError () {
    conn = mysql.createConnection({
        host: 'localhost',
        user: 'nodejs',
        password: 'nodejs',
        database: 'nodejs',
        port: 3306
    });

    //连接错误,2秒重试
    conn.connect(function (err) {
        if (err) {
            console.log('error when connecting to db:', err);
            setTimeout(handleError , 2000);
        }
    });

    conn.on('error', function (err) {
        console.log('db error', err);
        // 如果是连接断开,自动重新连接
        if (err.code === 'PROTOCOL_CONNECTION_LOST') {
            handleError();
        } else {
            throw err;
        }
    });
}
handleError();

a. 模拟密码错误

修改password: ‘nodejs11’

控制台输出。

D:\workspace\javascript\nodejs-node-mysql>node app-reconnect.js
error when connecting to db: { [Error: ER_ACCESS_DENIED_ERROR: Access denied for user 'nodejs'@'localhost' (using pass
rd: YES)]
  code: 'ER_ACCESS_DENIED_ERROR',
  errno: 1045,
  sqlState: '28000',
  fatal: true }
error when connecting to db: { [Error: ER_ACCESS_DENIED_ERROR: Access denied for user 'nodejs'@'localhost' (using pass
rd: YES)]
  code: 'ER_ACCESS_DENIED_ERROR',
  errno: 1045,
  sqlState: '28000',
  fatal: true }

b. 模拟数据库宕机
正常启动node,然后杀掉mysqld的进程。

控制台输出。


D:\workspace\javascript\nodejs-node-mysql>node app-reconnect.js
db error { [Error: read ECONNRESET]
  code: 'ECONNRESET',
  errno: 'ECONNRESET',
  syscall: 'read',
  fatal: true }

Error: read ECONNRESET
    at errnoException (net.js:884:11)
    at TCP.onread (net.js:539:19)

这个异常,直接导致node程序被杀死!

c. 模拟连接超时,PROTOCOL_CONNECTION_LOST
切换到root账户, 修改MySQL的wait_timeout参数,设置为10毫秒超时。

~ mysql -uroot -p
mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 28800 |
+---------------+-------+
1 row in set (0.00 sec)

mysql> set global wait_timeout=10;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'wait_timeout';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wait_timeout  | 10    |
+---------------+-------+
1 row in set (0.00 sec)

修改文件:app-reconnection.js,在最后增加代码

~ vi app-reconnection.js

function query(){
    console.log(new Date());
    var sql = "show variables like 'wait_timeout'";
    conn.query(sql, function (err, res) {
        console.log(res);
    });
}

query();
setInterval(query, 15*1000);

程序会每融15秒,做一次查询。

控制台输出

D:\workspace\javascript\nodejs-node-mysql>node app-reconnect.js
Wed Sep 11 2013 15:21:14 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]
db error { [Error: Connection lost: The server closed the connection.] fatal: true, code: 'PROTOCOL_CONNECTION_LOST' }
Wed Sep 11 2013 15:21:28 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]
db error { [Error: Connection lost: The server closed the connection.] fatal: true, code: 'PROTOCOL_CONNECTION_LOST' }
Wed Sep 11 2013 15:21:43 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]

我们自己的程序捕获了“PROTOCOL_CONNECTION_LOST”异常,并自动的实现了数据库重连。

4). MySQL连接池的超时测试

针对wait_timeout问题,我们再对连接做一下测试。

修改app-pooling.js文件


var mysql = require('mysql');
var pool = mysql.createPool({
    host: 'localhost',
    user: 'nodejs',
    password: 'nodejs',
    database: 'nodejs',
    port: 3306
});

var selectSQL ="show variables like 'wait_timeout'";

pool.getConnection(function (err, conn) {
    if (err) console.log("POOL ==> " + err);

    function query(){
        conn.query(selectSQL, function (err, res) {
            console.log(new Date());
            console.log(res);
            conn.release();
        });
    }
    query();
    setInterval(query, 5000);
});

控制台输出:

D:\workspace\javascript\nodejs-node-mysql>node app-pooling.js
Wed Sep 11 2013 15:32:25 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]
Wed Sep 11 2013 15:32:30 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]
Wed Sep 11 2013 15:32:35 GMT+0800 (中国标准时间)
[ { Variable_name: 'wait_timeout', Value: '10' } ]

连接池,已经解决了自动重连的问题了,后面我们的开发,可以尽量使用pooling的方式。

转载请注明出处:
http://blog.fens.me/nodejs-mysql-intro/

打赏作者

[MySQL优化]为MySQL数据文件ibdata1瘦身

前言

MySQL在运行一段时间后,ibdata1的文件会增长大小,就算删除了表的数据,ibdata1的体积也不会减小。由于硬盘空间有限,这样一直膨胀下去磁盘空间接近崩溃。今天在导出数据的时候就发现了,磁盘竟然满了,明明预留了1个月的用量,1周就占满了,下面就要给ibdata1做个瘦身。

关于作者:

  • 张丹(Conan), 程序员Java,R,PHP,Javascript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

转载请注明出处:
http://blog.fens.me/mysql-ibdata1/

mysql-ibdata1

目录

  1. 系统环境
  2. 发现问题
  3. 解决问题

1. 系统环境

Linux Ubuntu 13.04 64bit server

~ uname -a
Linux d2 3.8.0-21-generic #32-Ubuntu SMP Tue May 14 22:16:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
~ cat /etc/issue
Ubuntu 13.04 \n \l

MySQL: 5.5.31-0ubuntu0.13.04.1

~ mysql --version
mysql  Ver 14.14 Distrib 5.5.31, for debian-linux-gnu (x86_64) using readline 6.2

硬盘:36G+4G+4G+36G


~ df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/server3--vg-root   36G   31G  3.2G  91% /
none                          4.0K     0  4.0K   0% /sys/fs/cgroup
udev                          4.1G  1.1G  3.0G  26% /dev
tmpfs                         824M  280K  823M   1% /run
none                          5.0M     0  5.0M   0% /run/lock
none                          4.1G  3.4G  729M  83% /run/shm
none                          100M     0  100M   0% /run/user
/dev/vda1                     228M   30M  187M  14% /boot
192.168.1.10:/home/amg/data    36G   13G   21G  39% /home/amg/data

MySQL的ibdata1占用空间:20G


~ cd /var/lib/mysql
~ ls -l
drwxr-xr-x 2 mysql mysql        4096 Aug  2 19:38 CB
drwxr-xr-x 2 mysql mysql        4096 Jun 24 23:08 conan
drwxr-xr-x 2 mysql mysql        4096 Jun  2 00:52 dbwordpress
-rwxr-xr-x 1 root  root            0 May 23 00:48 debian-5.5.flag
-rwxr-xr-x 1 mysql mysql 20101201920 Aug  2 20:08 ibdata1
-rwxr-xr-x 1 mysql mysql     5242880 Aug  2 20:08 ib_logfile0
-rwxr-xr-x 1 mysql mysql     5242880 Aug  2 19:38 ib_logfile1
drwxr-xr-x 2 mysql mysql        4096 Jun 26 09:03 Macro
drwxr-xr-x 2 mysql root         4096 May 23 00:48 mysql
-rwxr-xr-x 1 root  root            6 May 23 00:48 mysql_upgrade_info
drwxr-xr-x 2 mysql mysql        4096 May 23 00:48 performance_schema
drwxr-xr-x 2 mysql mysql        4096 May 23 00:53 phpmyadmin
drwxr-xr-x 2 mysql root         4096 May 23 00:48 test
drwxr-xr-x 2 mysql mysql        4096 Jul 22 14:09 TF
drwxr-xr-x 2 mysql mysql        4096 Jun  2 01:04 wordpress

业务数据表


mysql> show tables;
+-----------------+
| Tables_in_CB    |
+-----------------+
| NSpremium       |
| cb_hft          |
| cb_hft_20130801 |
| cb_hft_20130802 |
+-----------------+
4 rows in set (0.00 sec)

2. 发现问题

ibdata1单个文件占用20G大小。
1. MySQL默认设置,没有按表空间分离数据,所有的表的数据都被放到ibdata1文件中。
2. 业务操作,每天会产生一张表cb_hft,晚上的时候对表进行重命名。


RENAME TABLE cb_hft TO cb_hft_20130801;
create table cb_hft like cb_hft_20130801;

3. 每周会把数据导出,同时drop表。但drop后,ibdata1不会减少,随着数据的积累ibdata1越来越大,根空间已经不够用了。

3. 解决问题

1). 导出数据
现在数据库中,有两个数据表,cb_hft_20130801,cb_hft_20130802,分别导出到/run/shm, /dev


~ cd /dev
~ mysqldump -uroot -p CB cb_hft_20130802 > export_cb_hft_20130802.sql

~ cd /run/shm
~ mysqldump -uroot -p CB cb_hft_20130801 > export_cb_hft_20130801.sql

~ df -h
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/server3--vg-root   36G   31G  3.2G  91% /
none                          4.0K     0  4.0K   0% /sys/fs/cgroup
udev                          4.1G  3.7G  368M  92% /dev
tmpfs                         824M  280K  823M   1% /run
none                          5.0M     0  5.0M   0% /run/lock
none                          4.1G  3.4G  729M  83% /run/shm
none                          100M     0  100M   0% /run/user
/dev/vda1                     228M   30M  187M  14% /boot
192.168.1.10:/home/amg/data    36G   13G   21G  39% /home/amg/data

两张表分别占了,3.4G,3.7G。

登陆mysql删除CB数据库


~ mysql -uroot -p
~ drop database CB

导出其他数据库数据


~ cd /run/shm
~ mysqldump -uroot -p -R -q --all-databases > others.sql

2). 修改配置文件/etc/mysql/my.cnf
对每张表使用单独的数据文件存储innodb_file_per_table

停止mysql服务器

~ sudo /etc/init.d/mysql stop

#清空所有数据文件
~ sudo rm -rf /var/lib/mysql/*

修改配置文件


~ sudo vi /etc/mysql/my.cnf

[mysqld]
innodb_file_per_table

重新构建数据库实例


~ /usr/bin/mysql_install_db

~ ls /var/lib/mysql
mysql  performance_schema  test

#启动MySQL
~ sudo /etc/init.d/mysql start

3). 恢复其他数据库


~ mysql < /run/shm/others.sql 
~ mysql -umysql -p 

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| Macro              |
| TF                 |
| conan              |
| dbwordpress        |
| mysql              |
| performance_schema |
| phpmyadmin         |
| test               |
| wordpress          |
+--------------------+
10 rows in set (0.01 sec)

#查看ibdata1大小
~ ls -l /var/lib/mysql
drwx------ 2 mysql mysql     4096 Aug  2 21:33 CB
drwx------ 2 mysql mysql     4096 Aug  2 21:23 conan
drwx------ 2 mysql mysql     4096 Aug  2 21:23 dbwordpress
-rw-rw---- 1 mysql mysql 18874368 Aug  2 21:34 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Aug  2 21:34 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Aug  2 21:34 ib_logfile1
drwx------ 2 mysql mysql     4096 Aug  2 21:23 Macro
drwx------ 2 mysql root      4096 Aug  2 21:23 mysql
drwx------ 2 mysql mysql     4096 Aug  2 21:19 performance_schema
drwx------ 2 mysql mysql     4096 Aug  2 21:23 phpmyadmin
drwx------ 2 mysql root      4096 Aug  2 21:19 test
drwx------ 2 mysql mysql     4096 Aug  2 21:23 TF
drwx------ 2 mysql mysql     4096 Aug  2 21:23 wordpress

4). 重置root密码


mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('123456');
mysql> FLUSH PRIVILEGES;

5). 恢复CB数据库


mysql> create database CB;
Query OK, 1 row affected (0.00 sec)

~  mysql --database CB < /run/shm/export_cb_hft_20130801.sql
~  mysql --database CB < /dev/export_cb_hft_20130802.sql
~  mysql --database CB < /dev/export_NSpremium.sql

#查看ibdata1大小:还是出初始值没有增长
~ ls -l /var/lib/mysql
drwx------ 2 mysql mysql     4096 Aug  2 21:33 CB
drwx------ 2 mysql mysql     4096 Aug  2 21:23 conan
drwx------ 2 mysql mysql     4096 Aug  2 21:23 dbwordpress
-rw-rw---- 1 mysql mysql 18874368 Aug  2 22:01 ibdata1
-rw-rw---- 1 mysql mysql  5242880 Aug  2 22:01 ib_logfile0
-rw-rw---- 1 mysql mysql  5242880 Aug  2 22:01 ib_logfile1
drwx------ 2 mysql mysql     4096 Aug  2 21:23 Macro
drwx------ 2 mysql root      4096 Aug  2 21:23 mysql
drwx------ 2 mysql mysql     4096 Aug  2 21:19 performance_schema
drwx------ 2 mysql mysql     4096 Aug  2 21:23 phpmyadmin
drwx------ 2 mysql root      4096 Aug  2 21:19 test
drwx------ 2 mysql mysql     4096 Aug  2 21:23 TF
drwx------ 2 mysql mysql     4096 Aug  2 21:23 wordpress

#查看CB库目录:所有的数据都保存在自己单独的数据文件
~ ls -l /var/lib/mysql/CB
-rw-rw---- 1 mysql mysql       9928 Aug  2 21:33 cb_hft_20130801.frm
-rw-rw---- 1 mysql mysql 7159676928 Aug  2 22:08 cb_hft_20130801.ibd
-rw-rw---- 1 mysql mysql       9928 Aug  2 22:09 cb_hft_20130802.frm
-rw-rw---- 1 mysql mysql 7805599744 Aug  2 22:38 cb_hft_20130802.ibd
-rw-rw---- 1 mysql mysql         61 Aug  2 21:30 db.opt

刚才设置的innodb_file_per_table参数已经起作用了,当我们再导出表drop后,对应的数据文件idb就会被删除,系统硬盘空间使用就会在正常值范围内。

查看表数据


mysql> show tables;
+-----------------+
| Tables_in_CB    |
+-----------------+
| cb_hft_20130801 |
| cb_hft_20130802 |
+-----------------+
2 rows in set (0.00 sec)

mysql> select count(1) from cb_hft_20130801;
+----------+
| count(1) |
+----------+
| 21063172 |
+----------+
1 row in set (1 min 1.46 sec)

#删除表
~ drop table cb_hft_20130801;

#查看数据文件
~ ls -l /var/lib/mysql/CB
-rw-rw---- 1 mysql mysql       9928 Aug  2 22:09 cb_hft_20130802.frm
-rw-rw---- 1 mysql mysql 7805599744 Aug  2 22:38 cb_hft_20130802.ibd
-rw-rw---- 1 mysql mysql         61 Aug  2 21:30 db.opt
-rw-rw---- 1 mysql mysql       9274 Aug  2 22:52 NSpremium.frm
-rw-rw---- 1 mysql mysql      98304 Aug  2 22:53 NSpremium.ibd

drop后,数据就一起被删除了。

经过对MySQL的调优,ibdata1已经被瘦身!数据库又可以继续正常的稳定的工作了。

转载请注明出处:
http://blog.fens.me/mysql-ibdata1/

打赏作者

Hive导入10G数据的测试

让Hadoop跑在云端系列文章,介绍了如何整合虚拟化和Hadoop,让Hadoop集群跑在VPS虚拟主机上,通过云向用户提供存储和计算的服务。

现在硬件越来越便宜,一台非品牌服务器,2颗24核CPU,配48G内存,2T的硬盘,已经降到2万块人民币以下了。这种配置如果简单地放几个web应用,显然是奢侈的浪费。就算是用来实现单节点的hadoop,对计算资源浪费也是非常高的。对于这么高性能的计算机,如何有效利用计算资源,就成为成本控制的一项重要议题了。

通过虚拟化技术,我们可以将一台服务器,拆分成12台VPS,每台2核CPU,4G内存,40G硬盘,并且支持资源重新分配。多么伟大的技术啊!现在我们有了12个节点的hadoop集群, 让Hadoop跑在云端,让世界加速。

关于作者:

  • 张丹(Conan), 程序员Java,R,PHP,Javascript
  • weibo:@Conan_Z
  • blog: http://blog.fens.me
  • email: bsspirit@gmail.com

转载请注明出处:
 http://blog.fens.me/hadoop-hive-10g/

hadoop-hive-10g

前言

Hadoop和Hive的环境已经搭建起来了,开始导入数据进行测试。我的数据1G大概对应500W行,MySQL的查询500W行大概3.29秒,用hive同样的查询大概30秒。如果我们把数据增加到10G,100G,让我们来看看Hive的表现吧。

目录

  1. 导出MySQL数据
  2. 导入到Hive
  3. 优化导入过程Hive Bucket
  4. 执行查询

1. 导出MySQL数据

下面是我的表,每天会产生一新表,用日期的方式命名。今天是2013年7月19日,对应的表是cb_hft,记录数646W条记录。


mysql> show tables;
+-----------------+
| Tables_in_CB    |
+-----------------+
| NSpremium       |
| cb_hft          |
| cb_hft_20130710 |
| cb_hft_20130712 |
| cb_hft_20130715 |
| cb_hft_20130716 |
+-----------------+
6 rows in set (0.00 sec)

mysql> select count(1) from cb_hft;
+----------+
| count(1) |
+----------+
|  6461338 |
+----------+
1 row in set (3.29 sec)

快速复制表:
由于这个表是离线系统的,没有线上应用,我重命名表cb_hft为cb_hft_20130719,再复制表结构。


mysql> RENAME TABLE cb_hft TO cb_hft_20130719;
Query OK, 0 rows affected (0.00 sec)

mysql> CREATE TABLE cb_hft like cb_hft_20130719;
Query OK, 0 rows affected (0.02 sec)

mysql> show tables;
+-----------------+
| Tables_in_CB    |
+-----------------+
| NSpremium       |
| cb_hft          |
| cb_hft_20130710 |
| cb_hft_20130712 |
| cb_hft_20130715 |
| cb_hft_20130716 |
| cb_hft_20130719 |
+-----------------+
7 rows in set (0.00 sec)

导出表到csv
以hft_20130712表为例


mysql> SELECT
  SecurityID,TradeTime,PreClosePx,OpenPx,HighPx,LowPx,LastPx, 
  BidSize1,BidPx1,BidSize2,BidPx2,BidSize3,BidPx3,BidSize4,BidPx4,BidSize5,BidPx5,
  OfferSize1,OfferPx1,OfferSize2,OfferPx2,OfferSize3,OfferPx3,OfferSize4,OfferPx4,OfferSize5,OfferPx5,
  NumTrades,TotalVolumeTrade,TotalValueTrade,PE,PE1,PriceChange1,PriceChange2,Positions
FROM cb_hft_20130712
INTO OUTFILE '/tmp/export_cb_hft_20130712.csv'  FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; 

Query OK, 6127080 rows affected (2 min 55.04 sec)

查看数据文件


~ ls -l /tmp
-rw-rw-rw- 1 mysql mysql 1068707117 Jul 19 15:59 export_cb_hft_20130712.csv

2. 导入到Hive

登陆c1.wtmart.com机器,下载数据文件


~ ssh cos@c1.wtmart.com
~ cd /home/cos/hadoop/sqldb
~ scp -P 10003 cos@d2.wtmart.com:/tmp/export_cb_hft_20130712.csv .
export_cb_hft_20130712.csv                                                                 100% 1019MB  39.2MB/s   00:26

在hive上建表


~ bin/hive shell

#删除已存在的表
hive> DROP TABLE IF EXISTS t_hft_tmp;
Time taken: 4.898 seconds

#创建t_hft_tmp表
hive> CREATE TABLE t_hft_tmp(
  SecurityID STRING,TradeTime STRING,
  PreClosePx DOUBLE,OpenPx DOUBLE,HighPx DOUBLE,LowPx DOUBLE,LastPx DOUBLE,
  BidSize1 DOUBLE,BidPx1 DOUBLE,BidSize2 DOUBLE,BidPx2 DOUBLE,BidSize3 DOUBLE,BidPx3 DOUBLE,BidSize4 DOUBLE,BidPx4 DOUBLE,BidSize5 DOUBLE,BidPx5 DOUBLE,
  OfferSize1 DOUBLE,OfferPx1 DOUBLE,OfferSize2 DOUBLE,OfferPx2 DOUBLE,OfferSize3 DOUBLE,OfferPx3 DOUBLE,OfferSize4 DOUBLE,OfferPx4 DOUBLE,OfferSize5 DOUBLE,OfferPx5 DOUBLE,
  NumTrades INT,TotalVolumeTrade DOUBLE,TotalValueTrade DOUBLE,PE DOUBLE,PE1 DOUBLE,PriceChange1 DOUBLE,PriceChange2 DOUBLE,Positions DOUBLE
) PARTITIONED BY (tradeDate INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Time taken: 0.189 seconds

#导入数据
hive> LOAD DATA LOCAL INPATH '/home/cos/hadoop/sqldb/export_cb_hft_20130712.csv' OVERWRITE INTO TABLE t_hft_tmp PARTITION (tradedate=20130712);
Copying data from file:/home/cos/hadoop/sqldb/export_cb_hft_20130712.csv
Copying file: file:/home/cos/hadoop/sqldb/export_cb_hft_20130712.csv
Loading data to table default.t_hft_tmp partition (tradedate=20130712)
Time taken: 16.535 seconds

当数据被加载至表中时,不会对数据进行任何转换。Load操作只是将数据复制至Hive表对应的位置,这个表只有一个文件,文件没有切分成多份。


hive> dfs -ls /user/hive/warehouse/t_hft_tmp/tradedate=20130712;
Found 1 items
-rw-r--r--   1 cos supergroup 1068707117 2013-07-19 16:07 /user/hive/warehouse/t_hft_tmp/tradedate=20130712/export_cb_hft_20130712.csv

3. 优化导入过程Hive Bucket

第二步导入,我们要把刚才的一个大文件切分成多少小文件,大概按照64M一个block的要求。我们设置做16个Bucket。

新建数据表t_hft_day,并定义CLUSTERED BY,SORTED BY,16 BUCKETS


hive> CREATE TABLE t_hft_day(
  SecurityID STRING,TradeTime STRING,
  PreClosePx DOUBLE,OpenPx DOUBLE,HighPx DOUBLE,LowPx DOUBLE,LastPx DOUBLE,
  BidSize1 DOUBLE,BidPx1 DOUBLE,BidSize2 DOUBLE,BidPx2 DOUBLE,BidSize3 DOUBLE,BidPx3 DOUBLE,BidSize4 DOUBLE,BidPx4 DOUBLE,BidSize5 DOUBLE,BidPx5 DOUBLE,
  OfferSize1 DOUBLE,OfferPx1 DOUBLE,OfferSize2 DOUBLE,OfferPx2 DOUBLE,OfferSize3 DOUBLE,OfferPx3 DOUBLE,OfferSize4 DOUBLE,OfferPx4 DOUBLE,OfferSize5 DOUBLE,OfferPx5 DOUBLE,
  NumTrades INT,TotalVolumeTrade DOUBLE,TotalValueTrade DOUBLE,PE DOUBLE,PE1 DOUBLE,PriceChange1 DOUBLE,PriceChange2 DOUBLE,Positions DOUBLE
) PARTITIONED BY (tradeDate INT)
CLUSTERED BY(SecurityID) SORTED BY(TradeTime) INTO 16 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

从t_hft_tmp临时数据表导入到t_hft_day数据表


#强制执行装桶的操作
hive> set hive.enforce.bucketing = true;

#数据导入
hive> FROM t_hft_tmp
INSERT OVERWRITE TABLE t_hft_day
PARTITION (tradedate=20130712)
SELECT SecurityID , TradeTime ,
  PreClosePx ,OpenPx ,HighPx ,LowPx ,LastPx ,
  BidSize1 ,BidPx1 ,BidSize2 ,BidPx2 ,BidSize3 ,BidPx3 ,BidSize4 ,BidPx4 ,BidSize5 ,BidPx5 ,
  OfferSize1 ,OfferPx1 ,OfferSize2 ,OfferPx2 ,OfferSize3 ,OfferPx3 ,OfferSize4 ,OfferPx4 ,OfferSize5 ,OfferPx5 ,
  NumTrades,TotalVolumeTrade ,TotalValueTrade ,PE ,PE1 ,PriceChange1 ,PriceChange2 ,Positions 
WHERE tradedate=20130712;

MapReduce Total cumulative CPU time: 8 minutes 5 seconds 810 msec
Ended Job = job_201307191356_0016
Loading data to table default.t_hft_day partition (tradedate=20130712)
Partition default.t_hft_day{tradedate=20130712} stats: [num_files: 16, num_rows: 0, total_size: 1291728298, raw_data_size: 0]
Table default.t_hft_day stats: [num_partitions: 11, num_files: 176, num_rows: 0, total_size: 10425980914, raw_data_size: 0]
6127080 Rows loaded to t_hft_day
MapReduce Jobs Launched:
Job 0: Map: 4  Reduce: 16   Cumulative CPU: 485.81 sec   HDFS Read: 1068771008 HDFS Write: 1291728298 SUCCESS
Total MapReduce CPU Time Spent: 8 minutes 5 seconds 810 msec
OK
Time taken: 172.617 seconds

导入操作累计CPU时间是8分05秒,8*60+5=485秒。由于有4个Map并行,16个Reduce并行,所以实际消耗时间是172秒。

我们再看一下新表的文件是否被分片:


hive> dfs -ls /user/hive/warehouse/t_hft_day/tradedate=20130712;
Found 16 items
-rw-r--r--   1 cos supergroup   95292536 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000000_0
-rw-r--r--   1 cos supergroup   97136495 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000001_0
-rw-r--r--   1 cos supergroup   90695623 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000002_0
-rw-r--r--   1 cos supergroup   84132171 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000003_0
-rw-r--r--   1 cos supergroup   81552397 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000004_0
-rw-r--r--   1 cos supergroup   80580028 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000005_0
-rw-r--r--   1 cos supergroup   73195335 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000006_0
-rw-r--r--   1 cos supergroup   68648786 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000007_0
-rw-r--r--   1 cos supergroup   72210159 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000008_0
-rw-r--r--   1 cos supergroup   66851502 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000009_0
-rw-r--r--   1 cos supergroup   69292538 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000010_0
-rw-r--r--   1 cos supergroup   75282272 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000011_0
-rw-r--r--   1 cos supergroup   79572724 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000012_0
-rw-r--r--   1 cos supergroup   78151866 2013-07-19 16:19 /user/hive/warehouse/t_hft_day/tradedate=20130712/000013_0
-rw-r--r--   1 cos supergroup   86850954 2013-07-19 16:18 /user/hive/warehouse/t_hft_day/tradedate=20130712/000014_0
-rw-r--r--   1 cos supergroup   92282912 2013-07-19 16:19 /user/hive/warehouse/t_hft_day/tradedate=20130712/000015_0

一共16个分片。

4. 执行查询

当前1G的文件,使用Hive执行一个简单的查询:34.974秒


hive> select count(1) from t_hft_day where tradedate=20130712;

MapReduce Total cumulative CPU time: 34 seconds 670 msec
Ended Job = job_201307191356_0017
MapReduce Jobs Launched:
Job 0: Map: 7  Reduce: 1   Cumulative CPU: 34.67 sec   HDFS Read: 1291793812 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 34 seconds 670 msec
6127080
Time taken: 34.974 seconds

MySQL执行同样的查询,在开始时我已经测试过3.29秒。
相差了10倍的时间,不过只有1G的数据量,是发挥不出hadoop的优势的。

接下来,按照上面的方法,我们把十几天的数据都导入到hive里面,然后再进行比较。

查看已导入hive的数据集


hive> SHOW PARTITIONS t_hft_day;
tradedate=20130627
tradedate=20130628
tradedate=20130701
tradedate=20130702
tradedate=20130703
tradedate=20130704
tradedate=20130705
tradedate=20130708
tradedate=20130709
tradedate=20130710
tradedate=20130712
tradedate=20130715
tradedate=20130716
tradedate=20130719
Time taken: 0.099 seconds

在MySQL中,对5张表进行查询。(5G数据量)


#单表:由于PreClosePx不是索引列,第一次查询
mysql> select SecurityID,20130719 as tradedate,count(1) as count from cb_hft_20130716 where PreClosePx>8.17 group by SecurityID limit 10;
+------------+-----------+-------+
| SecurityID | tradedate | count |
+------------+-----------+-------+
| 000001     |  20130719 |  5200 |
| 000002     |  20130719 |  5193 |
| 000003     |  20130719 |  1978 |
| 000004     |  20130719 |  3201 |
| 000005     |  20130719 |  1975 |
| 000006     |  20130719 |  1910 |
| 000007     |  20130719 |  3519 |
| 000008     |  20130719 |  4229 |
| 000009     |  20130719 |  5147 |
| 000010     |  20130719 |  2176 |
+------------+-----------+-------+
10 rows in set (24.60 sec)

#多表查询
select t.SecurityID,t.tradedate,t.count 
from (
select SecurityID,20130710 as tradedate,count(1) as count from cb_hft_20130710 where PreClosePx>8.17 group by SecurityID
union
select SecurityID,20130712 as tradedate,count(1) as count from cb_hft_20130712 group by SecurityID
union
select SecurityID,20130715 as tradedate,count(1) as count from cb_hft_20130715 where PreClosePx>8.17 group by SecurityID
union
select SecurityID,20130716 as tradedate,count(1) as count from cb_hft_20130716 where PreClosePx>8.17 group by SecurityID
union
select SecurityID,20130719 as tradedate,count(1) as count from cb_hft_20130719 where PreClosePx>8.17 group by SecurityID ) as t 
limit 10

#超过3分钟,无返回结果。
....

在Hive中,对同样的5张表进行查询。(5G数据量)


select SecurityID,tradedate,count(1) from t_hft_day where tradedate in (20130710,20130712,20130715,20130716,20130719) and PreClosePx>8.17 group by SecurityID,tradedate limit 10;

MapReduce Total cumulative CPU time: 3 minutes 56 seconds 540 msec
Ended Job = job_201307191356_0023
MapReduce Jobs Launched:
Job 0: Map: 25  Reduce: 7   Cumulative CPU: 236.54 sec   HDFS Read: 6577084486 HDFS Write: 1470 SUCCESS
Total MapReduce CPU Time Spent: 3 minutes 56 seconds 540 msec
OK
000001  20130710        5813
000004  20130715        3546
000005  20130712        1820
000005  20130719        2364
000006  20130716        1910
000008  20130710        2426
000011  20130715        2113
000012  20130712        3554
000012  20130719        3756
000013  20130716        1646
Time taken: 66.32 seconds

#对以上14张表的查询
MapReduce Total cumulative CPU time: 8 minutes 40 seconds 380 msec
Ended Job = job_201307191356_0022
MapReduce Jobs Launched:
Job 0: Map: 53  Reduce: 15   Cumulative CPU: 520.38 sec   HDFS Read: 14413501282 HDFS Write: 3146 SUCCESS
Total MapReduce CPU Time Spent: 8 minutes 40 seconds 380 msec
OK
000001  20130716        5200
000002  20130715        5535
000003  20130705        1634
000004  20130704        2173
000005  20130703        996
000005  20130712        1820
000006  20130702        1176
000007  20130701        2973
000007  20130710        4084
000010  20130716        2176
Time taken: 119.161 seconds

我们看到hadoop对以G为单位量级的数据增长是不敏感的,多了3倍的数据(15G),执行查询的时间是原来(5G)的两倍。而MySQL数据增长到5G,查询时间几乎是不可忍受的。

1G以下的数据是单机可以处理的,MySQL会非常好的完成查询任务。Hadoop只有在数据量大的情况下才能发挥出优势,当数据量到达10G时,MySQL的单表查询就显得就会性能不足。如果数据量到达了100G,MySQL就已经解决不了了,要通过各种优化的程序才能完成查询。

测试过程已经描述的很清楚了,我们接下来的工作就是把过程自动化。

转载请注明出处:
 http://blog.fens.me/hadoop-hive-10g/

打赏作者