博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
shell脚本中向hive动态分区插入数据
阅读量:5953 次
发布时间:2019-06-19

本文共 4300 字,大约阅读时间需要 14 分钟。

在hive上建表与普通分区表创建方法一样;

1 CREATE  TABLE `dwa_m_user_association_circle`( 2   `device_number` string,  3   `oppo_number` string,  4   `prov_id_oppo` string,  5   `area_id_oppo` string,  6   `dealer_oppo` string,  7   `short_call_nums` bigint,  8   `long3_call_nums` bigint,  9   `long5_call_nums` bigint, 10   `long10_call_nums` bigint, 11   `short_total_nums` bigint, 12   `long3_total_nums` bigint, 13   `long5_total_nums` bigint, 14   `long10_total_nums` bigint, 15   `area_id` string)16 PARTITIONED BY ( 17   `month_id` string, 18   `prov_id` string, 19   `dealer` string)20 ROW FORMAT DELIMITED 21   FIELDS TERMINATED BY '|' 22   NULL DEFINED AS '' 23 STORED AS INPUTFORMAT 24   'org.apache.hadoop.mapred.TextInputFormat' 25 OUTPUTFORMAT 26   'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'27 LOCATION28   'hdfs://beh/user/hive/warehouse/all_ana_pro.db/dwa_m_user_association_circle'29 TBLPROPERTIES (30   'numPartitions'='248', 31   'numFiles'='648', 32   'transient_lastDdlTime'='1542952067', 33   'totalSize'='247584222644', 34   'numRows'='0', 35   'rawDataSize'='0')

 

在shell脚本中,需设置的参数: 

set hive.exec.dynamic.partition=true;                     #开启动态分区,默认是false

set hive.exec.dynamic.partition.mode=nostrict;      #开启允许所有分区都是动态的,否则必须要有静态分区才能使用。

set hive.exec.max.created.files=1000000;             #允许创建的最大文件数,当分区是2个或三个分区时,文件会被分成很多小文件,该设置就是将文件的最大数目设成100w;

1 insert overwrite table ALL_ANA_PRO.dwa_m_user_association_circle  PARTITION (month_id=${v_month},prov_id,dealer)  2 select device_number, 3         oppo_number  , 4         prov_id_oppo , 5         area_id_oppo , 6         dealer_oppo , 7         short_call_nums, 8         long3_call_nums, 9         long5_call_nums,10         long10_call_nums,11         sum(short_call_nums) over(distribute by device_number)  short_total_nums,12         sum(long3_call_nums ) over(distribute by device_number) long3_total_nums,13         sum(long5_call_nums) over(distribute by  device_number) long5_total_nums,14         sum(long10_call_nums) over(distribute by device_number) long10_total_nums,15         area_id      ,16         prov_id      ,17         dealer   18 from 19         (SELECT device_number,20                 prov_id      ,21                 area_id      ,22                 dealer       ,23                 oppo_number  ,24                 prov_id_oppo ,25                 area_id_oppo ,26                 dealer_oppo  ,27                 sum(case when t.bill_times < 60 then 1 else 0 end ) short_call_nums,28                 sum(case when t.bill_times >180 then 1 else 0 end ) long3_call_nums,29                 sum(case when t.bill_times >300 then 1 else 0 end ) long5_call_nums,30                 sum(case when t.bill_times >600 then 1 else 0 end ) long10_call_nums31         FROM ( SELECT prov_id      ,32                       area_id      ,33                       device_number,34                       0 dealer     ,35                       prov_id_oppo ,36                       area_id_oppo ,37                       oppo_number  ,38                       dealer_oppo  ,39                       bill_times40         FROM   ALL_ANA_PRO.DWA_M_CALL_RING_BASE t41         WHERE  month_id = '${v_month}'42         AND    ticket_type = 143         AND    dealer_oppo > -144         UNION ALL45         SELECT prov_id_oppo prov_id      ,46               area_id_oppo area_id      ,47               oppo_number device_number ,48               dealer_oppo dealer        ,49               prov_id prov_id_oppo      ,50               area_id area_id_oppo      ,51               device_number oppo_number ,52               0 dealer_oppo             ,53               bill_times54         FROM   ALL_ANA_PRO.DWA_M_CALL_RING_BASE t55         WHERE  month_id = '${v_month}'56         AND    ticket_type = 157         AND    dealer_oppo > -158            ) t59         GROUP BY60               device_number,61               prov_id      ,62               area_id      ,63               dealer       ,64               oppo_number  ,65               prov_id_oppo ,66               area_id_oppo ,67               dealer_oppo68         ) t;

 这里,需要说明的是,向目标表插数:

insert overwrite table ALL_ANA_PRO.dwa_m_user_association_circle  PARTITION (month_id=${v_month},prov_id,dealer)

其中select语句中动态分区prov_id,dealer两个字段一般在最后面;

 

转载于:https://www.cnblogs.com/xiao02fang/p/10007008.html

你可能感兴趣的文章
Windows下同时安装两个版本Jdk
查看>>
uoj#228. 基础数据结构练习题(线段树)
查看>>
JS键盘事件监听
查看>>
ios开发周期之--(向上,向下,四舍五入)取整
查看>>
加油!
查看>>
拦截导弹问题(动态规划)
查看>>
iOS 单元测试(Unit Test 和 UI Test)
查看>>
[linux小技巧]
查看>>
文件下载_中文乱码:"Content-disposition","attachment; filename=中文名
查看>>
HBase 笔记3
查看>>
2017.11.23 display fun --STM8
查看>>
深入学习jQuery选择器系列第八篇——过滤选择器之伪子元素选择器
查看>>
一个关于log4j的悲伤的故事
查看>>
PCA
查看>>
ajax上传文件
查看>>
java中通过绝对路径将图片存入数据库
查看>>
简要记录浮点型数据的二进制存储格式
查看>>
ConcurrentHashMap(Java8)源码分析
查看>>
Python文件处理之文件指针(四)
查看>>
Numpy用法详解
查看>>