Hive SQL Operations

Big Data

Published: 2019-02-27

Word count: 1.6k

Reading time: 7 min


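1. Partitioned Tables

1) Creating a partitioned table

hive> create table dept_partitions(<column> <type>, ...)
    > partitioned by(<partition_column> <type>)
    > row format
    > delimited fields
    > terminated by '<delimiter>';

Example: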

hive> create table dept_partitions(deptno int, dept string, loc string)
    > partitioned by(day string)
    > row format
    > delimited fields
    > terminated by '\t';
hive> load data local inpath '/root/dept.txt' into table dept_partitions
    > partition(day='0228');
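Each partition value ends up as a `key=value` subdirectory under the table directory. The sketch below models that mapping; it assumes Hive's default warehouse directory `/user/hive/warehouse` and a table in the default database, and `partition_dir` is a hypothetical helper that ignores Hive's escaping of special characters in partition values.

```python
from pathlib import PurePosixPath

# Assumption: default warehouse directory, table in the default database.
# Hive's escaping of special characters in partition values is ignored here.
WAREHOUSE = PurePosixPath("/user/hive/warehouse")


def partition_dir(table: str, spec: dict) -> str:
    """Return the directory a partition spec maps to, e.g. .../day=0228."""
    path = WAREHOUSE / table
    # With several partition columns, each becomes a nested key=value level,
    # in the order the columns were declared.
    for key, value in spec.items():
        path = path / f"{key}={value}"
    return str(path)


print(partition_dir("dept_partitions", {"day": "0228"}))
# /user/hive/warehouse/dept_partitions/day=0228
```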

2) Queries

Querying the whole table
hive> select * from dept_partitions;
Note: this returns the data from every partition of the table.

Querying a single partition
hive> select * from dept_partitions where day = '0228';
Note: this returns only the data in the specified partition.
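Because each partition is a separate directory, a filter on the partition column lets Hive read only the matching directory (partition pruning). The following is a toy in-memory model of that behaviour, not Hive's planner; the table contents and the `scan` helper are hypothetical.

```python
# Hypothetical in-memory model of dept_partitions: each partition value maps
# to the rows stored in that partition's directory (values are made up).
table = {
    "0228": [(10, "ACCOUNTING", 1700), (20, "RESEARCH", 1800)],
    "0302": [(30, "SALES", 1900)],
}


def scan(table, day=None):
    """Return (rows, partitions_read). With a filter on the partition column,
    only the matching partition is read (partition pruning)."""
    parts = sorted(table) if day is None else [day]
    rows = [row for p in parts for row in table.get(p, [])]
    return rows, parts


all_rows, read_all = scan(table)           # select * from dept_partitions;
day_rows, read_day = scan(table, "0228")   # ... where day = '0228';
print(len(all_rows), read_all)  # 3 ['0228', '0302']
print(len(day_rows), read_day)  # 2 ['0228']
```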

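Union query
hive> select * from dept_partitions where day = '0228' union select * from dept_partitions where day = '0302';
Note: UNION removes duplicate rows and requires Hive 1.2.0 or later; older versions only support UNION ALL.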

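Adding a single partition
hive> alter table dept_partitions add partition(day = '0303');

Note: to add several partitions at once, separate the partition clauses with spaces:
hive> alter table dept_partitions add partition(day = '0304') partition(day = '0305');

Viewing partitions
hive> show partitions dept_partitions;

Dropping a partition
hive> alter table dept_partitions drop partition(day='0305');
(To drop several partitions at once, separate the partition clauses with commas.)

Each partition of a partitioned table is stored as its own subdirectory under the table's directory in HDFS. For example, create a partition directory by hand and upload a file into it: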

hive> dfs -mkdir -p /user/hive/warehouse/dept_partitions/day=0305;

hive> dfs -put /root/dept.txt /user/hive/warehouse/dept_partitions/day=0305;

hive> show partitions dept_partitions;
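At this point day=0305 is not listed: the files exist in HDFS, but the partition has not been registered in the metastore. Run the following: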

Registering the partition
This effectively repairs the table's partition metadata:
hive> msck repair table dept_partitions;
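Conceptually, `msck repair table` compares the `key=value` directories under the table location with the partitions the metastore knows about and registers the missing ones. The sketch below models only that "add missing partitions" case; `repair` and the directory listing are hypothetical, and the real command does more (it talks to HDFS and the metastore, and validates partition names).

```python
def repair(dir_names, known):
    """Simplified model of msck repair table: register every key=value
    directory under the table location that the metastore does not know yet.
    Returns the newly added partitions; `known` is updated in place."""
    found = {name for name in dir_names if "=" in name}
    added = found - known
    known |= added
    return added


# Hypothetical state after the dfs -mkdir / dfs -put commands above
metastore = {"day=0228", "day=0303", "day=0304"}
hdfs_dirs = ["day=0228", "day=0303", "day=0304", "day=0305"]
print(repair(hdfs_dirs, metastore))  # {'day=0305'}
print(sorted(metastore))             # ['day=0228', 'day=0303', 'day=0304', 'day=0305']
```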

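2. DML Data Operations

1) Loading data

hive> load data [local] inpath '<path>' [overwrite] into table <table_name> [partition(<col>=<value>, ...)];
With local, the path is on the local filesystem of the machine running the client and the file is copied; without local, the path is in HDFS and the file is moved into the table's directory.

2) Inserting data into a table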

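hive> insert into table student_partitions partition(age = 20) values(1,'re');
Inserting the results of a SQL query into a table
hive> insert overwrite table student_partitions partition(age = 20) select * from hsiehchou where id<3;
insert into appends rows; insert overwrite replaces the existing data in the target partition.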
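The difference between the two insert forms can be modelled with a dictionary of partitions. This is a hypothetical sketch of append versus replace semantics only; `insert_into`, `insert_overwrite` and the rows are illustrative names and data, not Hive APIs.

```python
# Hypothetical model: a partitioned table as a dict {partition value: rows}.
def insert_into(table, part, rows):
    table.setdefault(part, []).extend(rows)  # INSERT INTO appends to the partition


def insert_overwrite(table, part, rows):
    table[part] = list(rows)  # INSERT OVERWRITE replaces the partition's contents


student_partitions = {}
insert_into(student_partitions, 20, [(1, "re")])
insert_into(student_partitions, 20, [(1, "re")])
insert_into(student_partitions, 21, [(9, "zz")])
print(student_partitions[20])  # [(1, 're'), (1, 're')]
insert_overwrite(student_partitions, 20, [(1, "a"), (2, "b")])
print(student_partitions[20])  # [(1, 'a'), (2, 'b')]
print(student_partitions[21])  # [(9, 'zz')]  other partitions are untouched
```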

Using CREATE TABLE ... AS SELECT (CTAS):
hive> create table if not exists student_partitions1 as select * from student_partitions where id = 2;
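CTAS creates the table and fills it from the query in one statement, and `if not exists` makes a repeated run a no-op. The sketch uses SQLite (Python's `sqlite3`) only as a local stand-in for the same SQL semantics; it is not Hive, and the rows are hypothetical.

```python
import sqlite3

# SQLite as a local stand-in for CTAS semantics; rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table student_partitions(id int, name text, age int)")
conn.executemany("insert into student_partitions values (?, ?, ?)",
                 [(1, "re", 20), (2, "mi", 20), (3, "fa", 21)])

ctas = ("create table if not exists student_partitions1 as "
        "select * from student_partitions where id = 2")
conn.execute(ctas)
conn.execute(ctas)  # table already exists: nothing is created or inserted

rows = conn.execute("select * from student_partitions1").fetchall()
print(rows)  # [(2, 'mi', 20)]
```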

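3) Creating a table that loads data directly from an existing location

hive> create table student_partitions3(id int,name string)
    > row format
    > delimited fields
    > terminated by '\t'
    > location '<hdfs_path>';

Note: the location is an HDFS path.
The directory it points to must not contain nested subdirectories!
Example: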

hive> create table student_partitions4(id int,name string)
    > row format
    > delimited fields
    > terminated by '\t'
    > location '/wc';

4) Exporting query results to the local Linux filesystem

hive> insert overwrite local directory '/root/data' select * from hsiehchou;
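Without a `row format` clause, the exported text files use Hive's default field delimiter `\x01` (Ctrl-A) and one row per line. A minimal reader sketch under that assumption; the sample string is hypothetical and stands in for the contents of a file under `/root/data`, and `parse_hive_text` is an illustrative helper.

```python
# Assumption: default text output, fields separated by \x01 (Ctrl-A),
# rows separated by newlines. The sample content is hypothetical.
sample = "1\x01re\n2\x01mi\n"


def parse_hive_text(text, sep="\x01"):
    """Split default-delimited Hive text output into lists of field strings."""
    return [line.split(sep) for line in text.splitlines() if line]


print(parse_hive_text(sample))  # [['1', 're'], ['2', 'mi']]
```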

5) Exporting a Hive table to HDFS

hive> export table hsiehchou to '/hsiehchou';

Importing the exported HDFS data back into Hive
hive> import table hsiehchou3 from '/hsiehchou/';

6) Truncating (emptying) a table

hive> truncate table hsiehchou3;

3. Queries

Basic queries
Full-table query:
hive> select * from table;
Selecting specific columns:
hive> select hsiehchou.id,hsiehchou.name from table …;

1) Selecting specific columns

hive> select hsiehchou.name from hsiehchou;

2) Selecting specific columns with an alias

hive> select hsiehchou.name as myname from hsiehchou;

3) Creating the employee table

hive> create table hive_db.emptable(empno int, ename string, job string, mgr int, birthday string, sal double, comm double, deptno int)
    > row format
    > delimited fields
    > terminated by '\t';
hive> load data local inpath '/root/emp.txt' into table hive_db.emptable;

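4) Querying employee names and salaries (with a 1000 raise for every employee)

hive> select emptable.ename,emptable.sal+1000 salmoney from emptable;

5) Counting the employees in the company

hive> select count(1) empnumber from emptable;

6) Finding the highest salary

hive> select max(sal) numberone from emptable;

7) Finding the lowest salary

hive> select min(sal) from emptable;

8) Calculating the total of all salaries

hive> select sum(sal) sal_sum from emptable;

9) Calculating the average employee salary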

hive> select avg(sal) sal_avg from emptable;
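The five aggregates above can be tried together on a tiny table. SQLite stands in for Hive here and the employee rows are hypothetical. The extra `count(comm)` line shows a behaviour shared by Hive and SQLite: `count(1)` counts rows, while `count(column)` skips NULLs.

```python
import sqlite3

# SQLite stand-in with a few hypothetical employee rows (not the real emp.txt)
conn = sqlite3.connect(":memory:")
conn.execute("create table emptable(empno int, ename text, sal real, comm real, deptno int)")
conn.executemany("insert into emptable values (?, ?, ?, ?, ?)", [
    (7369, "SMITH", 800.0, None, 20),
    (7499, "ALLEN", 1600.0, 300.0, 30),
    (7566, "JONES", 2975.0, None, 20),
    (7698, "BLAKE", 2850.0, None, 30),
])

stats = conn.execute(
    "select count(1), max(sal), min(sal), sum(sal), avg(sal) from emptable").fetchone()
print(stats)  # (4, 2975.0, 800.0, 8225.0, 2056.25)

# count(column) ignores NULLs, count(1) does not
print(conn.execute("select count(comm) from emptable").fetchone()[0])  # 1
```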

10) Showing only the first N rows of the result

hive> select * from emptable limit 4;

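11) The WHERE clause

Purpose: filtering rows
Usage: the WHERE clause comes immediately after FROM

Employees whose salary is greater than 2600
hive> select * from emptable where sal>2600;

Employees whose salary is between 1000 and 2500 (both endpoints excluded)
hive> select * from emptable where sal>1000 and sal<2500;

Or, including both endpoints (BETWEEN is inclusive, i.e. sal>=1000 and sal<=2500)
hive> select * from emptable where sal between 1000 and 2500;

Names of employees whose salary is exactly 2000 or 3000
hive> select ename from emptable where sal in(2000,3000);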
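The boundary behaviour is easiest to see with salaries sitting exactly on 1000 and 2500. SQLite is used as a local stand-in (BETWEEN is inclusive there as in Hive); the rows and the `names` helper are hypothetical, and the helper interpolates SQL only because the filters are fixed demo strings.

```python
import sqlite3

# SQLite stand-in; hypothetical salaries chosen to sit on the boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("create table emptable(ename text, sal real)")
conn.executemany("insert into emptable values (?, ?)",
                 [("A", 1000.0), ("B", 1500.0), ("C", 2500.0), ("D", 3000.0)])


def names(where):
    # Only used with the fixed demo filters below, never with user input.
    sql = f"select ename from emptable where {where} order by ename"
    return [row[0] for row in conn.execute(sql)]


print(names("sal > 1000 and sal < 2500"))  # ['B']
print(names("sal between 1000 and 2500"))  # ['A', 'B', 'C']  (inclusive)
print(names("sal in (2000, 3000)"))        # ['D']
```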

12) IS NULL and IS NOT NULL

Filtering on null and non-null values
Null:
hive> select * from emptable where comm is null;

Not null:
hive> select * from emptable where comm is not null;
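NULL has to be tested with `is null`, because any comparison with NULL (including `comm = null`) is not true and therefore matches nothing. A SQLite stand-in with hypothetical rows shows this; Hive follows the same SQL rule.

```python
import sqlite3

# SQLite stand-in; hypothetical rows, one with a NULL commission.
conn = sqlite3.connect(":memory:")
conn.execute("create table emptable(ename text, comm real)")
conn.executemany("insert into emptable values (?, ?)", [("A", 300.0), ("B", None)])


def names(where):
    # Only used with the fixed demo filters below.
    return [r[0] for r in conn.execute(f"select ename from emptable where {where} order by ename")]


print(names("comm is null"))      # ['B']
print(names("comm is not null"))  # ['A']
print(names("comm = null"))       # []  comparing with NULL is never true
```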

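13) LIKE

Fuzzy matching
Usage:
The wildcard % matches zero or more characters
_ matches exactly one character

Employees whose salary starts with 1
hive> select * from emptable where sal like '1%';

Employees whose salary has 1 as its second digit
hive> select * from emptable where sal like '_1%';

Employees whose salary contains a 5
hive> select * from emptable where sal like '%5%';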
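The two wildcards translate directly into a regular expression: `%` becomes `.*`, `_` becomes `.`, and everything else is matched literally. This sketch is an illustration, not Hive's implementation, and it ignores escape characters. Since `sal` is a double, it is matched as text; the sketch assumes values render as e.g. `1250.0`, and the salaries are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern: % -> .*, _ -> ., everything else literal."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value, pattern):
    # Numbers are compared as text; assumes a double renders like "1250.0".
    return re.fullmatch(like_to_regex(pattern), str(value), re.DOTALL) is not None


salaries = [1250.0, 2450.0, 800.0, 1100.0]
print([s for s in salaries if like(s, "1%")])   # [1250.0, 1100.0]
print([s for s in salaries if like(s, "_1%")])  # [1100.0]
print([s for s in salaries if like(s, "%5%")])  # [1250.0, 2450.0]
```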

14) AND/NOT/OR

Employees in department 30 whose salary is greater than 1000
hive> select * from emptable where sal>1000 and deptno=30;

Employees who are in department 30 or whose salary is greater than 1000
hive> select * from emptable where sal>1000 or deptno=30;

Employees whose salary is exactly 2000 or 3000
hive> select * from emptable where sal in(2000,3000);

Employees whose salary is neither 2000 nor 3000
hive> select * from emptable where sal not in(2000,3000);

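15) Grouping

GROUP BY statement
Usually used together with aggregate functions.

Average salary of each department
hive> select avg(sal) avg_sal,deptno from emptable group by deptno;

HAVING
Aggregate functions cannot be used in WHERE, but they can be used in HAVING, which filters the groups produced by GROUP BY.

Departments whose average salary is greater than 2000
hive> select deptno,avg(sal) avg_sal from emptable group by deptno having avg_sal>2000;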
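The WHERE/HAVING split can be checked on a small table: HAVING filters the per-department averages, while an aggregate inside WHERE is rejected. SQLite stands in for Hive here (both reject aggregates in WHERE, though the error messages differ), and the rows are hypothetical.

```python
import sqlite3

# SQLite stand-in; hypothetical (sal, deptno) rows.
conn = sqlite3.connect(":memory:")
conn.execute("create table emptable(sal real, deptno int)")
conn.executemany("insert into emptable values (?, ?)", [
    (800.0, 20), (2975.0, 20), (1600.0, 30), (1250.0, 30), (2450.0, 10), (5000.0, 10),
])

per_dept = conn.execute(
    "select deptno, avg(sal) from emptable group by deptno order by deptno").fetchall()
print(per_dept)  # [(10, 3725.0), (20, 1887.5), (30, 1425.0)]

high = conn.execute(
    "select deptno, avg(sal) avg_sal from emptable "
    "group by deptno having avg(sal) > 2000").fetchall()
print(high)  # [(10, 3725.0)]


def where_with_aggregate_fails():
    try:
        conn.execute("select deptno from emptable where avg(sal) > 2000")
    except sqlite3.OperationalError:
        return True
    return False


print(where_with_aggregate_fails())  # True
```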

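4. Join Operations

hive> create table dept(deptno int, dname string, loc int)
    > row format
    > delimited fields
    > terminated by '\t';

The employee table contains only the department number, not the department name.
The department table contains both the department number and the department name.

Equi-join

1) Querying the employee number, employee name, and the name of the employee's department

hive> select emptable.empno,emptable.ename,dept.dname from emptable join dept on emptable.deptno=dept.deptno;

2) Querying the employee number, employee name, department name, and department location

Inner join: a row is kept only when both joined tables contain data matching the join condition.
hive> select e.empno,e.ename,d.dname,d.loc from emptable e join dept d on e.deptno=d.deptno;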

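3) Left outer join (left join)

Querying the employee number, employee name, and department name
hive> select e.empno,e.ename,d.dname from emptable e left join dept d on e.deptno=d.deptno;
Characteristics: LEFT JOIN is short for LEFT OUTER JOIN (the keyword OUTER can be omitted); note that a bare JOIN is an inner join.
All rows from the left table are kept; where the right table has no match, its columns are shown as NULL.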

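4) Right outer join (right join)

hive> select e.empno,e.ename,d.dname from emptable e right join dept d on e.deptno=d.deptno;
Characteristics:
All rows from the right table are kept; where the left table has no match, its columns are shown as NULL.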

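5) Full outer join (full join)

hive> select e.empno,e.ename,d.dname from emptable e full join dept d on e.deptno=d.deptno;
Characteristics: returns all rows from both tables; where a row has no match on the other side, the missing columns are filled with NULL.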
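The four join types differ only in which unmatched rows are kept and padded with NULL. The pure-Python sketch below builds a hash index on the right table and pads unmatched rows with `None`; it illustrates the semantics only and is not how Hive executes joins. The `join` helper and the rows are hypothetical.

```python
# Hypothetical rows: emp = (empno, ename, deptno), dept = (deptno, dname)
emp = [(7369, "SMITH", 20), (7499, "ALLEN", 30), (9999, "NEWBIE", None)]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")]


def join(left, right, lkey, rkey, how="inner"):
    """Equi-join two non-empty row lists; unmatched sides are padded with None."""
    index = {}
    for i, r in enumerate(right):
        index.setdefault(r[rkey], []).append(i)
    lwidth, rwidth = len(left[0]), len(right[0])
    out, matched = [], set()
    for l in left:
        # A NULL key never matches anything, as in SQL
        hits = index.get(l[lkey], []) if l[lkey] is not None else []
        for i in hits:
            out.append(l + right[i])
            matched.add(i)
        if not hits and how in ("left", "full"):
            out.append(l + (None,) * rwidth)
    if how in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched:
                out.append((None,) * lwidth + r)
    return out


print([len(join(emp, dept, 2, 0, how)) for how in ("inner", "left", "right", "full")])
# [2, 3, 3, 4]
```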

6) Multi-table joins

hive> create table location(loc int, loc_name string)
    > row format
    > delimited fields
    > terminated by '\t';

Loading the data
hive> load data local inpath '/root/location.txt' into table location;

Querying the employee name, department name, and location name
hive> select e.ename,d.dname,l.loc_name from emptable e join dept d on e.deptno=d.deptno join location l on d.loc=l.loc;
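A multi-table join chains pairwise joins: employees are matched to departments on `deptno`, then departments to locations on `loc`. The same query runs unchanged in SQLite, used here as a local stand-in; all table contents are hypothetical.

```python
import sqlite3

# SQLite stand-in; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table emptable(ename text, deptno int);
create table dept(deptno int, dname text, loc int);
create table location(loc int, loc_name text);
insert into emptable values ('SMITH', 20), ('ALLEN', 30);
insert into dept values (20, 'RESEARCH', 1700), (30, 'SALES', 1800);
insert into location values (1700, 'Beijing'), (1800, 'Shanghai');
""")

rows = conn.execute("""
select e.ename, d.dname, l.loc_name
from emptable e
join dept d on e.deptno = d.deptno
join location l on d.loc = l.loc
order by e.ename
""").fetchall()
print(rows)  # [('ALLEN', 'SALES', 'Shanghai'), ('SMITH', 'RESEARCH', 'Beijing')]
```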

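Author: 谢舟

https://blog.hsiehchou.com/2019/02/27/hive-de-sql-cao-zuo/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit the source, 谢舟, when reposting.

Tags: Big Data, Hive, SQL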
