Create testA:
CREATE TABLE testA (
id INT,
name string,
area string
) PARTITIONED BY (create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Create testB:
CREATE TABLE testB (
id INT,
name string,
area string,
code string
) PARTITIONED BY (create_time string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
(1) Importing a local file into a Hive table
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION(create_time='2015-07-08');
Copying data from file:/home/hadoop/sourceA.txt
Copying file: file:/home/hadoop/sourceA.txt
Loading data to table default.testa partition (create_time=2015-07-08)
Partition default.testa{create_time=2015-07-08} stats: [numFiles=1, numRows=0, totalSize=58, rawDataSize=0]
OK
Time taken: 0.237 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION(create_time='2015-07-09');
Copying data from file:/home/hadoop/sourceB.txt
Copying file: file:/home/hadoop/sourceB.txt
Loading data to table default.testb partition (create_time=2015-07-09)
Partition default.testb{create_time=2015-07-09} stats: [numFiles=1, numRows=0, totalSize=73, rawDataSize=0]
OK
Time taken: 0.212 seconds
hive> select * from testA;
OK
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> select * from testB;
OK
1 zy1 SZ 1001 2015-07-09
2 zy2 SH 1002 2015-07-09
3 zy3 HZ 1003 2015-07-09
4 zy4 QD 1004 2015-07-09
5 zy5 SR 1005 2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
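The LOAD statements and query results above can be read as follows: the fields of each comma-delimited line fill the declared columns in order, while the partition column create_time is not read from the file at all but taken from the PARTITION clause. The Python sketch below mimics that mapping. The sample lines are an assumption inferred from the query output (the contents of sourceA.txt are not shown here), and load_rows is a hypothetical helper, not a Hive API.

```python
def load_rows(lines, partition_value, delimiter=","):
    """Toy model of LOAD DATA into a partitioned, comma-delimited text table."""
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        # id is declared INT; the remaining columns are strings.
        # The partition value comes from the PARTITION clause, not the file.
        rows.append((int(fields[0]), *fields[1:], partition_value))
    return rows


# Assumed file contents, inferred from the rows returned by "select * from testA".
sample_lines = ["1,fish1,SZ\n", "2,fish2,SH\n"]

for row in load_rows(sample_lines, "2015-07-08"):
    print(*row)
```

Every row loaded by one statement gets the same create_time, which is why all five rows of testA show 2015-07-08.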
(2) Importing from one Hive table into another
Import data from testB into testA
hive> INSERT INTO TABLE testA PARTITION(create_time='2015-07-11') select id, name, area from testB where id = 1;
... (output omitted)
OK
Time taken: 14.744 seconds
hive> INSERT INTO TABLE testA PARTITION(create_time) select id, name, area, code from testB where id = 2;
... (output omitted)
OK
Time taken: 19.852 seconds
hive> select * from testA;
OK
2 zy2 SH 1002
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
1 zy1 SZ 2015-07-11
Time taken: 0.032 seconds, Fetched: 7 row(s)
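The row 2 zy2 SH 1002 in the result comes from the second INSERT. PARTITION(create_time) names the partition column without a value, so Hive treats it as a dynamic partition and takes the value from the last SELECT column, here code. The row therefore lands in partition create_time=1002. Depending on the Hive version and configuration, a fully dynamic insert like this may also require hive.exec.dynamic.partition.mode=nonstrict. The following Python sketch models only the routing rule; the function name is hypothetical.

```python
from collections import defaultdict


def dynamic_partition_insert(selected_rows, n_partition_cols=1):
    """Toy model: the trailing SELECT columns become the partition values."""
    partitions = defaultdict(list)
    for row in selected_rows:
        data = tuple(row[:-n_partition_cols])
        partition_key = tuple(row[-n_partition_cols:])
        partitions[partition_key].append(data)
    return dict(partitions)


# Result of: select id, name, area, code from testB where id = 2
selected = [(2, "zy2", "SH", "1002")]

for key, rows in dynamic_partition_insert(selected).items():
    print("create_time=" + key[0], rows)
```

To keep the original date as the partition value, the first INSERT form with an explicit PARTITION(create_time='...') is the one to use.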
(3) Importing an HDFS file into a Hive table
hive> LOAD DATA INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION(create_time='2015-07-08');
... (output omitted)
OK
Time taken: 0.237 seconds
hive> LOAD DATA INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION(create_time='2015-07-09');
... (output omitted)
OK
Time taken: 0.212 seconds
hive> select * from testA;
OK
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> select * from testB;
OK
1 zy1 SZ 1001 2015-07-09
2 zy2 SH 1002 2015-07-09
3 zy3 HZ 1003 2015-07-09
4 zy4 QD 1004 2015-07-09
5 zy5 SR 1005 2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)
The two LOAD DATA INPATH statements above load the HDFS file '/home/hadoop/sourceA.txt' into testA and the HDFS file '/home/hadoop/sourceB.txt' into testB.
(4) Importing from another table while creating a table
hive> create table testC as select name, code from testB;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1449746265797_0106, Tracking URL = http://hadoopcluster79:8088/proxy/application_1449746265797_0106/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1449746265797_0106
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-24 16:40:17,981 Stage-1 map = 0%, reduce = 0%
2015-12-24 16:40:23,115 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.11 sec
MapReduce Total cumulative CPU time: 1 seconds 110 msec
Ended Job = job_1449746265797_0106
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/tmp/hive-root/hive_2015-12-24_16-40-09_983_6048680148773453194-1/-ext-10001
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/warehouse/testc
Table default.testc stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.11 sec HDFS Read: 297 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 110 msec
OK
Time taken: 14.292 seconds
hive> desc testC;
OK
name string
code string
Time taken: 0.032 seconds, Fetched: 2 row(s)
II. Ways to export Hive data
(1) Exporting to the local file system
hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED by ',' select * from testA;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0001, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0001/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:04:30,447 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:04:35,616 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_1451024007879_0001
Copying data to local directory /home/hadoop/output
Copying data to local directory /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.16 sec HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
Time taken: 16.701 seconds
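INSERT OVERWRITE LOCAL DIRECTORY exports the data of Hive table testA to the /home/hadoop/output directory. HQL statements like this one are executed as MapReduce jobs, so /home/hadoop/output is in effect the job's output path, and the result is written to a file named 000000_0.
(2) Exporting to HDFS
Exporting to HDFS works like exporting to the local file system: just drop LOCAL from the HQL statement.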
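Once the export has finished, the data file can be read back like any comma-separated text file, because the statement specified FIELDS TERMINATED by ','. The sketch below is a minimal Python reader under that assumption. It skips hidden files such as checksum files that may sit alongside 000000_0, and it uses a temporary directory as a stand-in for /home/hadoop/output. The helper name read_hive_export is hypothetical.

```python
import csv
import os
import tempfile


def read_hive_export(directory, delimiter=","):
    """Read every data file in an export directory into a list of rows."""
    rows = []
    for name in sorted(os.listdir(directory)):
        if name.startswith((".", "_")):
            continue  # skip hidden or marker files, e.g. checksum files
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, newline="", encoding="utf-8") as fh:
            rows.extend(csv.reader(fh, delimiter=delimiter))
    return rows


# Demo on a temporary directory standing in for /home/hadoop/output.
with tempfile.TemporaryDirectory() as export_dir:
    with open(os.path.join(export_dir, "000000_0"), "w", encoding="utf-8") as fh:
        fh.write("1,fish1,SZ,2015-07-08\n2,fish2,SH,2015-07-08\n")
    for row in read_hive_export(export_dir):
        print(row)
```

All values come back as strings; converting id to an integer is left to the caller.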
hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' select * from testA;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0002, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0002/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:08:51,034 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:08:59,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1451024007879_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/hive-hadoop/hive_2015-12-25_17-08-43_733_1768532778392261937-1/-ext-10000
Moving data to: /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.4 sec HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
Time taken: 16.667 seconds
Using the -e option: the SQL statement follows the option, and the output file path follows >>.
[hadoop@hadoopcluster78 bin]$ ./hive -e "select * from testA" >> /home/hadoop/output/testA.txt
15/12/25 17:15:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.128 seconds, Fetched: 5 row(s)
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testA.txt
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
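Using the -f option: the option is followed by a file containing the SQL statements, and >> is followed by the output file path.
SQL statement file:
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/sql.sql
select * from testA
Run it with the -f option: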
[hadoop@hadoopcluster78 bin]$ ./hive -f /home/hadoop/output/sql.sql >> /home/hadoop/output/testB.txt
15/12/25 17:20:52 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.1 seconds, Fetched: 5 row(s)
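The two command-line forms shown above (-e with an inline statement, -f with a script file) can also be driven from a script. The Python sketch below builds the argument list for either form and appends the query output to a file, mirroring the >> redirection. It assumes a hive executable is reachable at the given path; hive_command and export_to_file are hypothetical helper names, and only the -e and -f options that appear in the transcripts are used.

```python
import subprocess


def hive_command(query=None, script=None, hive_bin="hive"):
    """Build the argv for `hive -e <query>` or `hive -f <script>`."""
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    if query is not None:
        return [hive_bin, "-e", query]
    return [hive_bin, "-f", script]


def export_to_file(argv, out_path):
    """Run the command and append its stdout to out_path (like `>>`)."""
    with open(out_path, "a", encoding="utf-8") as out:
        subprocess.run(argv, stdout=out, check=True)


# Only print the commands here; calling export_to_file needs a Hive installation.
print(hive_command(query="select * from testA"))
print(hive_command(script="/home/hadoop/output/sql.sql"))
```

Because the file is opened in append mode, repeated runs accumulate results in the same file, exactly as >> does in the shell.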
1. Importing data from the local file system into a Hive table
(1) Prepare the data (/home/sopdm/test.dat):
1,wyp,25,13188888888
2,test,30,13899999999
3,zs,34,89931412
(2) First create the table
use sopdm;
drop table if exists sopdm.wyp;
create table if not exists sopdm.wyp(id int,name string,age int,tel string)
row format delimited
fields terminated by ','
stored as textfile;
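(3) Load the data from the local file system into the Hive table
load data local inpath '/home/sopdm/test.dat' into table sopdm.wyp;
(4) The loaded file can be checked in the data directory of the wyp table with the following command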
dfs -ls /user/sopdm/hive/warehouse/sopdm.db/wyp;
2. Importing data from HDFS into a Hive table
(1) First create an input directory in HDFS to hold the file
hadoop fs -mkdir input;
or
hadoop fs -mkdir /user/sopdm/input;
(2) Upload the local file to HDFS, renaming it to test_hdfs.dat
hadoop fs -put /home/sopdm/test.dat /user/sopdm/input/test_hdfs.dat;
(3) View the file
dfs -cat /user/sopdm/input/test_hdfs.dat;
(4) Load the contents into the Hive table
-- To copy local data into Hive, use: load data local ...
-- To move HDFS data into Hive (must be on the same cluster), use: load data ...
load data inpath '/user/sopdm/input/test_hdfs.dat' into table sopdm.wyp;
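The two comments above describe a real behavioral difference: load data local inpath copies the local file into the table's directory, while load data inpath moves the HDFS file, so the source path no longer exists afterwards. The Python sketch below imitates this on the local file system, with temporary directories standing in for the source location and the table directory. It does not talk to Hive or HDFS, and all function names are hypothetical.

```python
import os
import shutil
import tempfile


def load_local(src, table_dir):
    """Mimic LOAD DATA LOCAL INPATH: the source file is copied."""
    return shutil.copy2(src, table_dir)


def load_hdfs(src, table_dir):
    """Mimic LOAD DATA INPATH: the source file is moved."""
    return shutil.move(src, table_dir)


with tempfile.TemporaryDirectory() as work:
    table_dir = os.path.join(work, "wyp")
    os.mkdir(table_dir)
    local_src = os.path.join(work, "test.dat")
    hdfs_src = os.path.join(work, "test_hdfs.dat")
    for path in (local_src, hdfs_src):
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("1,wyp,25,13188888888\n")

    load_local(local_src, table_dir)
    load_hdfs(hdfs_src, table_dir)

    print("copied source still exists:", os.path.exists(local_src))
    print("moved source still exists:", os.path.exists(hdfs_src))
    print("files in table dir:", sorted(os.listdir(table_dir)))
```

In practice this means a file loaded from HDFS should be treated as consumed; keep a separate copy if other jobs still need it at the original path.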
3. Importing data from another Hive table into a Hive table
create table if not exists sopdm.wyp2(id int,name string,tel string)
row format delimited
fields terminated by ','
stored as textfile;
-- overwrite replaces the existing data; into appends
insert into table sopdm.wyp2
select id,name,tel from sopdm.wyp;
-- Multi-table insert
-- Efficient approach: a single query inserts into multiple partitions
from sopdm.wyp w
insert overwrite table sopdm.wyp2
select w.id,w.name,w.tel where w.age=25
insert overwrite table sopdm.wyp2
select w.id,w.name,w.tel where w.age=27;
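The reason the FROM-first form is described as efficient is that the source table is scanned once and each row is offered to every INSERT clause, instead of running one full query per clause. A toy Python model of that idea follows, using the three sample rows of test.dat. The function and target names are hypothetical, and the sketch keeps the two outputs separate rather than modelling what happens when both clauses overwrite the same table.

```python
def multi_insert(rows, targets):
    """Scan rows once and route each row to every target whose predicate matches."""
    outputs = {name: [] for name in targets}
    for row in rows:  # the source is read a single time
        for name, predicate in targets.items():
            if predicate(row):
                outputs[name].append((row["id"], row["name"], row["tel"]))
    return outputs


wyp = [
    {"id": 1, "name": "wyp", "age": 25, "tel": "13188888888"},
    {"id": 2, "name": "test", "age": 30, "tel": "13899999999"},
    {"id": 3, "name": "zs", "age": 34, "tel": "89931412"},
]

targets = {
    "age_25": lambda r: r["age"] == 25,
    "age_27": lambda r: r["age"] == 27,
}

for name, rows in multi_insert(wyp, targets).items():
    print(name, rows)
```

With the sample data only the age=25 clause receives a row, since no record has age 27.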
4. Creating a Hive table and loading query results into it at the same time
create table sopdm.wyp3
as select id,name,tel,age from sopdm.wyp where age=25;
5. Using sqoop to import data from a relational database into a Hive table
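I. Ways to import data into Hive
First, here are the data and Hive tables used to illustrate the import methods below.
Import:
Local file into a Hive table;
Hive table into a Hive table;
HDFS file into a Hive table;
From another table while creating a table;
A MySQL database into a Hive table via sqoop; for examples see 《通过sqoop进行mysql与hive的导入导出》 (importing and exporting between MySQL and Hive with sqoop) and 《定时从大数据平台同步HIVE数据到oracle》 (periodically syncing Hive data from the big data platform to Oracle)
Export:
Hive table to the local file system;
Hive table to HDFS;
Hive table to a MySQL database via sqoop;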
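Viewing the data after a sqoop import into Hive: the number of parallel import processes (set with -m / --num-mappers) defaults to 4. If the data set is small, consider setting it to 1, which makes the import faster. Sqoop normally uses the primary key to split the data evenly. For parallel imports you can also specify the split column (--split-by) and related options; see the official documentation for details.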
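As a rough illustration of the splitting described above: with a numeric split column, the range between its minimum and maximum values is divided into roughly equal slices, one per parallel task. The Python sketch below is a simplified model of that idea, not Sqoop's actual splitter; split_ranges is a hypothetical helper and the id bounds are made-up example values.

```python
def split_ranges(lo, hi, num_mappers=4):
    """Divide the inclusive range lo..hi into near-equal inclusive slices."""
    if hi < lo:
        raise ValueError("hi must be >= lo")
    total = hi - lo + 1
    num = max(1, min(num_mappers, total))  # never more slices than values
    base, extra = divmod(total, num)
    ranges = []
    start = lo
    for i in range(num):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Example: primary keys 1..10 with the default of 4 parallel tasks.
for start, end in split_ranges(1, 10):
    print(f"id >= {start} AND id <= {end}")
```

With a single task the whole range is read in one slice, which avoids the per-task startup overhead that dominates on small tables.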