###系统环境 linux版本:redhat6
jdk:jdk1.7

###1.本地安装与测试

####1.1安装 1.1.1下载Drill M1 binary release http://people.apache.org/~jacques/apache-drill-1.0.0-m1.rc3/apache-drill-1.0.0-m1-binary-release.tar.gz
1.1.2 解压apache-drill-1.0.0-m1-binary-release.tar.gz
tar -zxf apache-drill-1.0.0-m1-binary-release.tar.gz
1.1.3 做软链
ln -s apache-drill-1.0.0-m1 drill
1.1.4 配置环境变量
export DRILL_HOME=/home/{username}/drill
export PATH=$PATH:$DRILL_HOME/bin

####1.2测试 1.2.1连接
[sudo] sqlline -u jdbc:drill:schema=parquet-local -n admin -p admin
解析:schema原生定义了5种类型:
parquet-local(本地parquet),parquet-cp(classpath-parquet), jsonl(本地json),parquet(classpath-parquet),parquet
具体的定义,参照conf/storage-engines.json
1.2.2退出
jdbc:drill:schema=parquet-local> !q
1.2.3运行一个QUERY
select * from “sample-data/region.parquet”;
语句指南 https://developers.google.com/bigquery/query-reference https://cwiki.apache.org/confluence/display/DRILL/Running+Queries

###2.分布式安装与测试 ####2.1安装

2.1.1.安装Hadoop
当前drill的原生支持的版本为hadoop1.2 http://litongbupt.iteye.com/blog/1473179 http://litongbupt.iteye.com/blog/1473179
启动hadoop

2.1.2 安装Zookeepe
官网推荐安装Zookeeper3.4.3,经笔者测试,3.4.5也是可以使用的。
部署并启动zookeeper
http://litongbupt.iteye.com/admin/blogs/1987737

2.1.3 部署drill的分布式模式

  • 修改conf/drill-override.conf文件 zk:connect:“{zookeeper地址}:2181”
  • 修改conf/storage-engines文件

    “parquet” : { “type”:”parquet”, “dfsName” : “hdfs://{hadoop的namenode地址}:9000” }, “json” : { “type”:”json”, “dfsName” : “hdfs://{hadoop的namenode地址}:9000” }

  • 将drill目录拷贝到其他节点
  • 将.bashrc拷贝到其他节点
  • 在每一个节点启动drill: sudo drillbit.sh start

####2.2测试

2.2.1测试drill集群是否启动成功

zkCli.sh -server {zookeeper地址}:2181
get /drill/drillbits1
cZxid = 0x100000003
ctime = Tue Dec 10 10:18:42 CST 2013
mZxid = 0x100000003
mtime = Tue Dec 10 10:18:42 CST 2013
pZxid = 0x10000001c
cversion = 12
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 4

这次测试用了numChildren = 4个节点

2.2.2测试QUERY
把数据放到HDFS上 hadoop fs -put sample-data /
链接集群 sqlline -u jdbc:drill:schema=parquet

  • SELECT _MAP[‘R_REGIONKEY’] as region_key, _MAP[‘R_NAME’] AS name, _MAP[‘R_COMMENT’] AS comment FROM “/sample-data/region.parquet”;
  • SELECT count(distinct _MAP[‘N_REGIONKEY’]) FROM “/sample-data/nation.parquet”;
  • SELECT _MAP[‘N_REGIONKEY’] as regionKey, _MAP[‘N_NAME’] as name FROM “/sample-data/nation.parquet” WHERE cast(_MAP[‘N_NAME’] as varchar) < ‘M’;

2.3 关闭集群
2.3.1关闭drill集群 在每个节点上执行 sudo drillbit.sh stop
2.3.2关闭zookeeper 在每个节点上执行 sudo zkServer.sh stop
2.3.3在namenode上执行 sudo stop-all.sh


    本博文从老博客http://litongbupt.iteye.com转移过来。

转载请注明出处 http://www.xiangguo.li/big-data-system/2014/07/28/drill2

Categories: Tags:

相国 walter

一个热爱coding的青年

blog comments powered by Disqus