数据仓库理论一和二,主要讲流量域;

数据仓库理论三和四,主要讲业务域,即业务库里的数据。

一、sqoop导入数据处理

字典表,小杂表:全量导入

实体表(量级很大),事实表(每天都变化的业务表):增量导入

增量导入后的数据,存储在数仓的 ODS 层中,对于统计分析,不便利;需要滚动合并生成全量快照。

1、将数据全量导入

建表并全量导入

2、将每天的增量数据使用sqoop导入,放在ODS层

导入增量脚本:

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order增量抽取任务 启动脚本 #
# #
########################################################

export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/

# 加载上次import的最大id
if [ -f preid ]
then
START_ID=`cat preid`
else
START_ID=0
fi

res=`mysql -h 192.168.77.2 -uroot -p123456 <<EOF
use realtimedw;
select max(id) from oms_order_item;
EOF`

CUR_MAX_ID=`echo ${res} | awk '{print $2}'`
echo ${CUR_MAX_ID} > preid

DT_INCR=`date -d'-1 day' +%Y-%m-%d`

if [ $1 && $2 ]
then
ID=$1
DT_EXTRACT=$2
fi

echo "上次导入的最大ID为:${START_ID}"
echo "本次导入所能达到的最大ID为: ${CUR_MAX_ID}"

${SQOOP_HOME}/bin/sqoop import \
--connect jdbc:mysql://192.168.77.2:3306/realtimedw \
--username root \
--password 123456 \
--table oms_order_item \
--target-dir "/incr_test/oms_order_item/${DT_EXTRACT}" \
--incremental append \
--check-column id \
--null-string '\\N' \
--null-non-string '\\N' \
--last-value "${START_ID}" \
--fields-terminated-by ',' \
--split-by id \
-m 2 \

if [ $? -eq 0 ]
then
echo "congratulations! sqoop数据导入成功! 邮件已发送至admin@51doit.com"
echo "准备加载到hive表分区:dt=${DT_INCR}"
${HIVE_HOME}/bin/hive -e "alter table ods17.oms_order_incr add partition(dt='${DT_INCR}')"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

else
echo "节哀顺变! sqoop数据迁移任务失败! 邮件已发送至admin@51doit.com"
fi

注意:需要更新hive元数据分区日期,然后将hdfs数据load到hive或者直接写到hive表,然后使用sqoop的partition by属性

3、对每天的增量数据+前一天的全量数据,进行数据合并,放在DWD层一个新表里; 也可以重新写入旧表里,对之前的数据覆盖更新

建立一个新表,执行合并数据脚本:

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order 数据合并任务 启动脚本 #
# #
########################################################

export SQOOP_HOME=/opt/apps/sqoop-1.4.7/
export HIVE_HOME=/opt/apps/hive-3.1.2/

DT_WHOLE=`date -d'-2 day' +%Y-%m-%d`
DT_INCR=`date -d'-1 day' +%Y-%m-%d`

if [[ $1 && $2 ]]
then
DT_WHOLE=$1
DT_INCR=$2
fi

${HIVE_HOME}/bin/hive -e "

with tmp as (
select
id ,
member_id ,
coupon_id ,
order_sn ,
create_time ,
member_username ,
total_amount ,
pay_amount ,
freight_amount ,
promotion_amount ,
integration_amount ,
coupon_amount ,
discount_amount ,
pay_type ,
source_type ,
status ,
order_type ,
delivery_company ,
delivery_sn ,
auto_confirm_day ,
integration ,
growth ,
promotion_info ,
bill_type ,
bill_header ,
bill_content ,
bill_receiver_phone ,
bill_receiver_email ,
receiver_name ,
receiver_phone ,
receiver_post_code ,
receiver_province ,
receiver_city ,
receiver_region ,
receiver_detail_address ,
note ,
confirm_status ,
delete_status ,
use_integration ,
payment_time ,
delivery_time ,
receive_time ,
comment_time ,
modify_time
from ods17.oms_order_incr where dt='${DT_INCR}'

union all

select
id ,
member_id ,
coupon_id ,
order_sn ,
create_time ,
member_username ,
total_amount ,
pay_amount ,
freight_amount ,
promotion_amount ,
integration_amount ,
coupon_amount ,
discount_amount ,
pay_type ,
source_type ,
status ,
order_type ,
delivery_company ,
delivery_sn ,
auto_confirm_day ,
integration ,
growth ,
promotion_info ,
bill_type ,
bill_header ,
bill_content ,
bill_receiver_phone ,
bill_receiver_email ,
receiver_name ,
receiver_phone ,
receiver_post_code ,
receiver_province ,
receiver_city ,
receiver_region ,
receiver_detail_address ,
note ,
confirm_status ,
delete_status ,
use_integration ,
payment_time ,
delivery_time ,
receive_time ,
comment_time ,
modify_time

from dwd17.oms_order where dt='${DT_WHOLE}'
)

insert into table dwd17.oms_order partition(dt='${DT_INCR}')
select
id ,
member_id ,
coupon_id ,
order_sn ,
create_time ,
member_username ,
total_amount ,
pay_amount ,
freight_amount ,
promotion_amount ,
integration_amount ,
coupon_amount ,
discount_amount ,
pay_type ,
source_type ,
status ,
order_type ,
delivery_company ,
delivery_sn ,
auto_confirm_day ,
integration ,
growth ,
promotion_info ,
bill_type ,
bill_header ,
bill_content ,
bill_receiver_phone ,
bill_receiver_email ,
receiver_name ,
receiver_phone ,
receiver_post_code ,
receiver_province ,
receiver_city ,
receiver_region ,
receiver_detail_address ,
note ,
confirm_status ,
delete_status ,
use_integration ,
payment_time ,
delivery_time ,
receive_time ,
comment_time ,
modify_time
from
(
select
id ,
member_id ,
coupon_id ,
order_sn ,
create_time ,
member_username ,
total_amount ,
pay_amount ,
freight_amount ,
promotion_amount ,
integration_amount ,
coupon_amount ,
discount_amount ,
pay_type ,
source_type ,
status ,
order_type ,
delivery_company ,
delivery_sn ,
auto_confirm_day ,
integration ,
growth ,
promotion_info ,
bill_type ,
bill_header ,
bill_content ,
bill_receiver_phone ,
bill_receiver_email ,
receiver_name ,
receiver_phone ,
receiver_post_code ,
receiver_province ,
receiver_city ,
receiver_region ,
receiver_detail_address ,
note ,
confirm_status ,
delete_status ,
use_integration ,
payment_time ,
delivery_time ,
receive_time ,
comment_time ,
modify_time ,
row_number() over(partition by id order by modify_time desc) as rn
from tmp
) o
where rn=1
"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

二、数据域大宽表加工

事实表关联各种纬度表(字典表、实体表、维表),行成宽表,方便后续各种多维度分析。

成交金额分析:

大数据之数据仓库建设(三)_hive

1、建立宽表并写入数据

#!/bin/bash

# -- 成交金额分析
# -- DWS宽表(oms_order + oms_order_item)
# CREATE TABLE dws17.oms_order_detail (
# id bigint COMMENT '订单id',
# member_id bigint ,
# coupon_id bigint ,
# order_sn string COMMENT '订单编号',
# create_time BIGINT COMMENT '提交时间',
# member_username string COMMENT '用户帐号',
# total_amount decimal(10,2) COMMENT '订单总金额',
# pay_amount decimal(10,2) COMMENT '应付金额(实际支付金额)',
# freight_amount decimal(10,2) COMMENT '运费金额',
# promotion_amount decimal(10,2) COMMENT '促销优化金额(促销价、满减、阶梯价)',
# integration_amount decimal(10,2) COMMENT '积分抵扣金额',
# coupon_amount decimal(10,2) COMMENT '优惠券抵扣金额',
# discount_amount decimal(10,2) COMMENT '管理员后台调整订单使用的折扣金额',
# pay_type int COMMENT '支付方式:0->未支付;1->支付宝;2->微信',
# source_type int COMMENT '订单来源:0->PC订单;1->app订单',
# status int COMMENT '订单状态:0->待付款;1->待发货;2->已发货;3->已完成;4->已关闭;5->无效订单',
# order_type int COMMENT '订单类型:0->正常订单;1->秒杀订单',
# delivery_company string COMMENT '物流公司',
# delivery_sn string COMMENT '物流单号',
# auto_confirm_day int COMMENT '自动确认时间(天)',
# integration int COMMENT '可以获得的积分',
# growth int COMMENT '可以活动的成长值',
# promotion_info string COMMENT '活动信息',
# bill_type int COMMENT '发票类型:0->不开发票;1->电子发票;2->纸质发票',
# bill_header string COMMENT '发票抬头',
# bill_content string COMMENT '发票内容',
# bill_receiver_phone string COMMENT '收票人电话',
# bill_receiver_email string COMMENT '收票人邮箱',
# receiver_name string COMMENT '收货人姓名',
# receiver_phone string COMMENT '收货人电话',
# receiver_post_code string COMMENT '收货人邮编',
# receiver_province string COMMENT '省份/直辖市',
# receiver_city string COMMENT '城市',
# receiver_region string COMMENT '区',
# receiver_detail_address string COMMENT '详细地址',
# note string COMMENT '订单备注',
# confirm_status int COMMENT '确认收货状态:0->未确认;1->已确认',
# delete_status int COMMENT '删除状态:0->未删除;1->已删除',
# use_integration int COMMENT '下单时使用的积分',
# payment_time BIGINT COMMENT '支付时间',
# delivery_time BIGINT COMMENT '发货时间',
# receive_time BIGINT COMMENT '确认收货时间',
# comment_time BIGINT COMMENT '评价时间',
# modify_time BIGINT COMMENT '修改时间',
# item_product_id bigint ,
# item_product_pic string ,
# item_product_name string ,
# item_product_brand string ,
# item_product_sn string ,
# item_product_price decimal(10,2) COMMENT '销售价格',
# item_product_quantity int COMMENT '购买数量',
# item_product_sku_id bigint COMMENT '商品sku编号',
# item_product_sku_code string COMMENT '商品sku条码',
# item_product_category_id bigint COMMENT '商品分类id',
# item_sp1 string COMMENT '商品的销售属性',
# item_sp2 string ,
# item_sp3 string ,
# item_promotion_name string COMMENT '商品促销名称',
# item_promotion_amount decimal(10,2) COMMENT '商品促销分解金额',
# item_coupon_amount decimal(10,2) COMMENT '优惠券优惠分解金额',
# item_integration_amount decimal(10,2) COMMENT '积分优惠分解金额',
# item_real_amount decimal(10,2) COMMENT '该商品经过优惠后的分解金额',
# item_gift_integration int ,
# item_gift_growth int ,
# item_product_attr string COMMENT '商品销售属性:[{"key":"颜色","value":"颜色"},{"key":"容量","value":"4G"}]'
# )
# COMMENT '订单表'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc oms_order增量抽取任务 启动脚本 #
# #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
DT=$1
fi

-- 计算插入
${HIVE_HOME}/bin/hive -e "
INSERT INTO TABLE dws17.oms_order_detail partition(dt='${DT}')
SELECT
o.id ,
o.member_id ,
o.coupon_id ,
o.order_sn ,
o.create_time ,
o.member_username ,
o.total_amount ,
o.pay_amount ,
o.freight_amount ,
o.promotion_amount ,
o.integration_amount ,
o.coupon_amount ,
o.discount_amount ,
o.pay_type ,
o.source_type ,
o.status ,
o.order_type ,
o.delivery_company ,
o.delivery_sn ,
o.auto_confirm_day ,
o.integration ,
o.growth ,
o.promotion_info ,
o.bill_type ,
o.bill_header ,
o.bill_content ,
o.bill_receiver_phone ,
o.bill_receiver_email ,
o.receiver_name ,
o.receiver_phone ,
o.receiver_post_code ,
o.receiver_province ,
o.receiver_city ,
o.receiver_region ,
o.receiver_detail_address ,
o.note ,
o.confirm_status ,
o.delete_status ,
o.use_integration ,
o.payment_time ,
o.delivery_time ,
o.receive_time ,
o.comment_time ,
o.modify_time ,
i.product_id ,
i.product_pic ,
i.product_name ,
i.product_brand ,
i.product_sn ,
i.product_price ,
i.product_quantity ,
i.product_sku_id ,
i.product_sku_code ,
i.product_category_id ,
i.sp1 ,
i.sp2 ,
i.sp3 ,
i.promotion_name ,
i.promotion_amount ,
i.coupon_amount ,
i.integration_amount ,
i.real_amount ,
i.gift_integration ,
i.gift_growth ,
i.product_attr
FROM
dwd17.oms_order o
join
dwd17.oms_order_item i
on o.dt='${DT}' and o.id=i.order_id
"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

2、建立分析纬度表,并写入数据

#!/bin/bash

# -- 成交金额分析统计报表
# -- 源表1: DWS宽表(dws17.oms_order_detail)
# -- 源表2: 会员信息表
# CREATE TABLE dwd17.ums_member_detail (
# id bigint ,
# member_level_id bigint ,
# username string COMMENT '用户名',
# password string COMMENT '密码',
# nickname string COMMENT '昵称',
# phone string COMMENT '手机号码',
# status int COMMENT '帐号启用状态:0->禁用;1->启用',
# create_time bigint COMMENT '注册时间',
# icon string COMMENT '头像',
# gender int COMMENT '性别:0->未知;1->男;2->女',
# birthday date COMMENT '生日',
# city string COMMENT '所做城市',
# job string COMMENT '职业',
# personalized_signature string COMMENT '个性签名',
# source_type int COMMENT '用户来源',
# integration int COMMENT '积分',
# growth int COMMENT '成长值',
# luckey_count int COMMENT '剩余抽奖次数',
# history_integration int COMMENT '历史积分数量'
# )
# PARTITIONED BY (dt string)
# STORED as parquet
# ;

# -- 目标表:ads17.oms_order_gmv_cube
# DROP TABLE ads17.oms_order_gmv_cube;
# CREATE TABLE ads17.oms_order_gmv_cube (
# order_total_amount decimal(10,2),
# order_pay_amount decimal(10,2),
# coupon_amount decimal(10,2) COMMENT '优惠券抵扣金额',
# promotion_amount decimal(10,2) COMMENT '促销优化金额(促销价、满减、阶梯价)',
# integration_amount decimal(10,2) COMMENT '积分抵扣金额',
# dim_day string,
# dim_category_id string,
# dim_brand_id string,
# dim_member_level_id string,
# dim_order_type string,
# dim_source_type string,
# dim_promotion_name string
# )
# COMMENT '订单表'
# PARTITIONED BY (dt string)
# stored as PARQUET
# ;

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc gmv统计分析报表计算任务启动脚本 #
# #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
DT=$1
fi

# -- 计算插入
${HIVE_HOME}/bin/hive -e "
INSERT INTO TABLE ads17.oms_order_gmv_cube partition(dt='${DT}')
SELECT
sum(item_product_price*item_product_quantity) as order_total_amount ,
sum(item_real_amount) as order_pay_amount ,
sum(item_coupon_amount) as coupon_amount ,
sum(item_promotion_amount) as promotion_amount ,
sum(item_integration_amount) as integration_amount ,
'2020-10-07' as dim_day ,
o.item_product_category_id as dim_category_id ,
o.item_product_brand as dim_brand_id ,
u.member_level_id as dim_member_level_id ,
o.order_type as dim_order_type ,
o.source_type as dim_source_type ,
o.item_promotion_name as dim_promotion_name
FROM dws17.oms_order_detail o
JOIN dwd17.ums_member_detail u
on o.dt='${DT}' and from_unixtime(o.create_time,'yyyy-MM-dd')='${DT}' and o.member_id=u.id
GROUP BY
o.item_product_category_id,
o.item_product_brand,
u.member_level_id,
o.order_type,
o.source_type,
o.item_promotion_name
with cube
"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

下单退单分析:

大数据之数据仓库建设(三)_数据_02

1、建立退单宽表并写入数据

#!/bin/bash

# -- 目标表建表:
# CREATE TABLE dws17.oms_order_and_return(
# od_id bigint COMMENT '订单id',
# od_member_id bigint ,
# od_coupon_id bigint ,
# od_order_sn string COMMENT '订单编号',
# od_create_time BIGINT COMMENT '提交时间',
# od_member_username string COMMENT '用户帐号',
# od_total_amount string COMMENT '订单总金额',
# od_pay_amount string COMMENT '应付金额(实际支付金额)',
# od_freight_amount string COMMENT '运费金额',
# od_promotion_amount string COMMENT '促销优化金额(促销价、满减、阶梯价)',
# od_integration_amount string COMMENT '积分抵扣金额',
# od_coupon_amount string COMMENT '优惠券抵扣金额',
# od_discount_amount string COMMENT '管理员后台调整订单使用的折扣金额',
# od_pay_type int COMMENT '支付方式:0->未支付;1->支付宝;2->微信',
# od_source_type int COMMENT '订单来源:0->PC订单;1->app订单',
# od_status int COMMENT '订单状态:0->待付款;1->待发货;2->已发货;3->已完成;4->已关闭;5->无效订单',
# od_order_type int COMMENT '订单类型:0->正常订单;1->秒杀订单',
# od_delivery_company string COMMENT '物流公司',
# od_delivery_sn string COMMENT '物流单号',
# od_auto_confirm_day int COMMENT '自动确认时间(天)',
# od_integration int COMMENT '可以获得的积分',
# od_growth int COMMENT '可以活动的成长值',
# od_promotion_info string COMMENT '活动信息',
# od_bill_type int COMMENT '发票类型:0->不开发票;1->电子发票;2->纸质发票',
# od_bill_header string COMMENT '发票抬头',
# od_bill_content string COMMENT '发票内容',
# od_bill_receiver_phone string COMMENT '收票人电话',
# od_bill_receiver_email string COMMENT '收票人邮箱',
# od_receiver_name string COMMENT '收货人姓名',
# od_receiver_phone string COMMENT '收货人电话',
# od_receiver_post_code string COMMENT '收货人邮编',
# od_receiver_province string COMMENT '省份/直辖市',
# od_receiver_city string COMMENT '城市',
# od_receiver_region string COMMENT '区',
# od_receiver_detail_address string COMMENT '详细地址',
# od_note string COMMENT '订单备注',
# od_confirm_status int COMMENT '确认收货状态:0->未确认;1->已确认',
# od_delete_status int COMMENT '删除状态:0->未删除;1->已删除',
# od_use_integration int COMMENT '下单时使用的积分',
# od_payment_time BIGINT COMMENT '支付时间',
# od_delivery_time BIGINT COMMENT '发货时间',
# od_receive_time BIGINT COMMENT '确认收货时间',
# od_comment_time BIGINT COMMENT '评价时间',
# od_modify_time BIGINT COMMENT '修改时间' ,
# -- 退货表字段
# rt_id bigint ,
# rt_order_id bigint COMMENT '订单id',
# rt_company_address_id bigint COMMENT '收货地址表id',
# rt_product_id bigint COMMENT '退货商品id',
# rt_order_sn string COMMENT '订单编号',
# rt_create_time bigint COMMENT '申请时间',
# rt_member_username string COMMENT '会员用户名',
# rt_return_amount string COMMENT '退款金额',
# rt_return_name string COMMENT '退货人姓名',
# rt_return_phone string COMMENT '退货人电话',
# rt_status int COMMENT '申请状态:0->待处理;1->退货中;2->已完成;3->已拒绝',
# rt_handle_time bigint COMMENT '处理时间',
# rt_product_pic string COMMENT '商品图片',
# rt_product_name string COMMENT '商品名称',
# rt_product_brand string COMMENT '商品品牌',
# rt_product_attr string COMMENT '商品销售属性:颜色:红色;尺码:xl;',
# rt_product_count int COMMENT '退货数量',
# rt_product_price string COMMENT '商品单价',
# rt_product_real_price string COMMENT '商品实际支付单价',
# rt_reason string COMMENT '原因',
# rt_description string COMMENT '描述',
# rt_proof_pics string COMMENT '凭证图片,以逗号隔开',
# rt_handle_note string COMMENT '处理备注',
# rt_handle_man string COMMENT '处理人员',
# rt_receive_man string COMMENT '收货人',
# rt_receive_time bigint COMMENT '收货时间',
# rt_receive_note string COMMENT '收货备注'
# )
# PARTITIONED BY (dt STRING)
# STORED AS PARQUET
# ;

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc 下单、退单统计分析服务表启动脚本 #
# #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
DT=$1
fi

# -- 计算插入
${HIVE_HOME}/bin/hive -e "
WITH od as (
SELECT * FROM dwd17.oms_order WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
),
rt as (
SELECT * FROM dwd17.oms_order_return_apply WHERE dt='${DT}' AND from_unixtime(create_time,'yyyy-MM-dd')='${DT}'
)
INSERT INTO TABLE dws17.oms_order_and_return PARTITION (dt='${DT}')
SELECT
od.id ,
od.member_id ,
od.coupon_id ,
od.order_sn ,
od.create_time ,
od.member_username ,
od.total_amount ,
od.pay_amount ,
od.freight_amount ,
od.promotion_amount ,
od.integration_amount ,
od.coupon_amount ,
od.discount_amount ,
od.pay_type ,
od.source_type ,
od.status ,
od.order_type ,
od.delivery_company ,
od.delivery_sn ,
od.auto_confirm_day ,
od.integration ,
od.growth ,
od.promotion_info ,
od.bill_type ,
od.bill_header ,
od.bill_content ,
od.bill_receiver_phone ,
od.bill_receiver_email ,
od.receiver_name ,
od.receiver_phone ,
od.receiver_post_code ,
od.receiver_province ,
od.receiver_city ,
od.receiver_region ,
od.receiver_detail_address ,
od.note ,
od.confirm_status ,
od.delete_status ,
od.use_integration ,
od.payment_time ,
od.delivery_time ,
od.receive_time ,
od.comment_time ,
od.modify_time ,
rt.id ,
rt.order_id ,
rt.company_address_id ,
rt.product_id ,
rt.order_sn ,
rt.create_time ,
rt.member_username ,
rt.return_amount ,
rt.return_name ,
rt.return_phone ,
rt.status ,
rt.handle_time ,
rt.product_pic ,
rt.product_name ,
rt.product_brand ,
rt.product_attr ,
rt.product_count ,
rt.product_price ,
rt.product_real_price ,
rt.reason ,
rt.description ,
rt.proof_pics ,
rt.handle_note ,
rt.handle_man ,
rt.receive_man ,
rt.receive_time ,
rt.receive_note
FROM
od
FULL JOIN
rt
ON
od.id=rt.order_id
"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

2、建立退货分析纬度表,并写入数据

#!/bin/bash

# -- 下单退单人数单数统计日报表
#
# -- 源表:dws.oms_order_and_return
# -- 目标:ads.oms_order_odcnt_oduser_cube
#
# -- 建表:
# CREATE TABLE ads17.oms_order_odcnt_oduser_cube(
# order_cnt int,
# order_users int,
# order_cancel_cnt int,
# order_cancel_users int,
# order_return_cnt int,
# order_return_users int,
# order_return_p_cnt int,
# dim_dt string,
# dim_member_level_id string,
# dim_order_type string,
# dim_source_type string
# )
# PARTITIONED BY (dt string)
# STORED AS PARQUET
# ;

# o11,u1,${DT},1 , null
# o12,u1,${DT},1 , o12,p01,20.8,2,u1_name
# o12,u1,${DT},1 , o12,p02,10.0,1,u1_name
# o13,u2,${DT},5 , null
# o14,u2,${DT},5 , null
# null , o02,p06,10.50,1,u3_name
# null , o02,p04,10.50,1,u3_name

########################################################
# #
# @author hunter@doitedu #
# @date ${DT_INCR} #
# @desc 下单、退单统计分析服务表启动脚本 #
# #
########################################################

export HIVE_HOME=/opt/apps/hive-3.1.2/

DT=`date -d'-1 day' +%Y-%m-%d`

if [ $1 ]
then
DT=$1
fi

# -- 计算插入
${HIVE_HOME}/bin/hive -e "
WITH od_rt AS (
SELECT * FROM dws17.oms_order_and_return WHERE dt='${DT}'
)

SELECT
COUNT(DISTINCT od_id) AS order_cn,
COUNT(DISTINCT od_member_id) AS order_users,
COUNT(DISTINCT if(od_status=5,od_id,null)) AS order_cancel_cnt,
COUNT(DISTINCT if(od_status=5,od_member_id,null)) AS order_cancel_users,
COUNT(DISTINCT rt_order_id) AS order_return_cnt,
COUNT(DISTINCT rt_member_username) order_return_users ,
SUM(rt_product_count) AS order_return_p_cnt,
'${DT}' AS dim_dt ,
member_level_id AS dim_member_level_id,
od_order_type AS dim_order_type ,
od_source_type AS dim_source_type
FROM
(
SELECT
od_rt.*,
u.member_level_id
FROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.od_member_id=u.id

UNION ALL

SELECT
od_rt.*,
u.member_level_id
FROM od_rt JOIN dwd17.ums_member_detail u ON od_rt.rt_member_username=u.username
) o
GROUP BY member_level_id,od_order_type,od_source_type
WITH cube

"

if [ $? -eq 0 ]
then
echo "congratulations! hive表分区加载成功! 邮件已发送至admin@51doit.com"
else
echo "节哀顺变! 表分区加载失败! 邮件已发送至admin@51doit.com"
fi

三、消费画像标签(DWS层)

给用户打上一些消费相关(下单、退货、金额、客单价)的统计数据标签。

1、建表

drop table if exists ads17.userprofile_consume_tag;
create table ads17.userprofile_consume_tag(
user_id bigint ,--用户
first_order_time string ,--首单日期
last_order_time string ,--末单日期
first_order_ago bigint ,--首单距今时间
last_order_ago bigint ,--末单距今时间
month1_order_cnt bigint ,--近30天下单次数
month1_order_amt double ,--近30天购买金额(总金额)
month2_order_cnt bigint ,--近60天购买次数
month2_order_amt double ,--近60天购买金额
month3_order_cnt bigint ,--近90天购买次数
month3_order_amt double ,--近90天购买金额
max_order_amt double ,--最大订单金额
min_order_amt double ,--最小订单金额
total_order_cnt bigint ,--累计消费次数(不含退拒)
total_order_amt double ,--累计消费金额(不含退拒)
total_coupon_amt double ,--累计使用代金券金额
user_avg_order_amt double ,--平均订单金额(含退拒)
month3_user_avg_amt double ,--近90天平均订单金额(含退拒)
common_address string ,--常用收货地址
common_paytype string ,--常用支付方式
month1_cart_cnt_30 bigint ,--最近30天加购次数
month1_cart_goods_cnt_30 bigint ,--最近30天加购商品件数
month1_cart_cancel_cnt bigint ,--最近30天取消商品件数
dw_date string -- 计算日期
) partitioned by (dt string)
stored as parquet
;

2、计算指标,写入目标表

-- 计算
-- 订单表
-- o11,u1,${DT},1
-- o12,u1,${DT},1 o12,退款金额
-- o13,u2,${DT},5
-- o14,u2,${DT},5

-- 退货表
-- o12,p01,20.8,2,u1_name
-- o12,p02,20.8,2,u1_name
-- o02,p06,10,1,u3_name
-- o02,p04,10,1,u3_name

with tmp1 as (
SELECT
a.member_id AS member_id ,--用户
min(from_unixtime(a.create_time,'yyyy-MM-dd')) AS first_order_time ,--首单日期
max(from_unixtime(a.create_time,'yyyy-MM-dd')) AS last_order_time ,--末单日期
-- AS first_order_ago ,--首单距今时间
-- AS last_order_ago ,--末单距今时间
count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,1,null)) AS month1_order_cnt ,--近30天下单次数
sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=30,total_amount,0)) AS month1_order_amt ,--近30天购买金额(总金额)
count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,1,null)) AS month2_order_cnt ,--近60天购买次数
sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=60,total_amount,0)) AS month2_order_amt ,--近60天购买金额
count(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,1,null)) AS month3_order_cnt ,--近90天购买次数
sum(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,0)) AS month3_order_amt ,--近90天购买金额
max(total_amount) AS max_order_amt ,--最大订单金额
min(total_amount) AS min_order_amt ,--最小订单金额
count(if(b.order_id is null,1,null)) AS total_order_cnt ,--累计消费次数(不含退拒)
sum(if(b.order_id is null,total_amount,0)) AS total_order_amt ,--累计消费金额(不含退拒)
sum(coupon_amount) AS total_coupon_amt ,--累计使用代金券金额
round(avg(total_amount),2) AS user_avg_order_amt ,--平均订单金额(含退拒)
round(avg(if(datediff('2020-10-07',from_unixtime(a.create_time,'yyyy-MM-dd'))<=90,total_amount,null)),2) AS month3_user_avg_amt --近90天平均订单金额(含退拒)
FROM
( SELECT * FROM DWD17.oms_order od WHERE dt='2020-10-07' ) a
LEFT JOIN
( SELECT order_id FROM dwd17.oms_order_return_apply rt WHERE dt='2020-10-07' GROUP BY order_id) b
ON a.id = b.order_id

GROUP BY a.member_id
) ,

-- common_address string ,--常用收货地址
tmp2 as (
SELECT
member_id,
addr as common_address
FROM (
SELECT
member_id,
addr,
row_number() over(partition by member_id order by order_cnt desc) as rn
FROM
(
SELECT
member_id,
concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address) as addr,
count(1) as order_cnt
FROM DWD17.oms_order od WHERE dt='2020-10-07'
GROUP BY member_id,concat_ws(',',receiver_province,receiver_city,receiver_region,receiver_detail_address)
) o1
) o2
WHERE rn=1
) ,

-- common_paytype string ,--常用支付方式
tmp3 as (
SELECT
member_id,
pay_type as common_paytype
FROM (
SELECT
member_id,
pay_type,
row_number() over(partition by member_id order by order_cnt desc) as rn
FROM
(
SELECT
member_id,
pay_type,
count(1) as order_cnt
FROM DWD17.oms_order od WHERE dt='2020-10-07'
GROUP BY member_id,pay_type
) o1
) o2
WHERE rn=1
) ,

-- 购物车相关指标标签
-- month1_cart_cnt_30 bigint ,--最近30天加购次数
-- month1_cart_goods_cnt_30 bigint ,--最近30天加购商品件数
-- month1_cart_cancel_cnt bigint ,--最近30天取消商品件数
tmp4 as (
SELECT
member_id,
COUNT(DISTINCT if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,create_date,null)) as month1_cart_cnt_30,
SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30,quantity,0)) as month1_cart_goods_cnt_30,
SUM(if(datediff('2020-10-07',from_unixtime(create_date,'yyyy-MM-dd'))<=30 and delete_status=1,quantity,0)) as month1_cart_cancel_cnt
FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
GROUP BY member_id
),

tmp5 as (
SELECT member_id FROM dwd17.oms_order od WHERE dt='2020-10-07'
UNION
SELECT member_id FROM dwd17.oms_cart_item WHERE dt='2020-10-07'
)

INSERT INTO TABLE ads17.userprofile_consume_tag PARTITION (dt='2020-10-07')
SELECT
tmp5.member_id AS user_id ,--用户
tmp1.first_order_time ,--首单日期
tmp1.last_order_time ,--末单日期
datediff('2020-10-07',tmp1.first_order_time) AS first_order_ago ,--首单距今时间
datediff('2020-10-07',tmp1.last_order_time) AS last_order_ago ,--末单距今时间
tmp1.month1_order_cnt ,--近30天下单次数
tmp1.month1_order_amt ,--近30天购买金额(总金额)
tmp1.month2_order_cnt ,--近60天购买次数
tmp1.month2_order_amt ,--近60天购买金额
tmp1.month3_order_cnt ,--近90天购买次数
tmp1.month3_order_amt ,--近90天购买金额
tmp1.max_order_amt ,--最大订单金额
tmp1.min_order_amt ,--最小订单金额
tmp1.total_order_cnt ,--累计消费次数(不含退拒)
tmp1.total_order_amt ,--累计消费金额(不含退拒)
tmp1.total_coupon_amt ,--累计使用代金券金额
tmp1.user_avg_order_amt ,--平均订单金额(含退拒)
tmp1.month3_user_avg_amt ,--近90天平均订单金额(含退拒)
tmp2.common_address ,--常用收货地址
tmp3.common_paytype ,--常用支付方式
tmp4.month1_cart_cnt_30 ,--最近30天加购次数
tmp4.month1_cart_goods_cnt_30 ,--最近30天加购商品件数
tmp4.month1_cart_cancel_cnt ,--最近30天取消商品件数
'2020-10-07' AS dw_date -- 计算日期
FROM tmp5
LEFT JOIN tmp1 ON tmp5.member_id=tmp1.member_id
LEFT JOIN tmp2 ON tmp5.member_id=tmp2.member_id
LEFT JOIN tmp3 ON tmp5.member_id=tmp3.member_id
LEFT JOIN tmp4 ON tmp5.member_id=tmp4.member_id
;

四、退货拒货画像标签

退拒商品统计标签表

1、建表

drop table if exists ads17.userprofile_reject_tag;
create table ads17.userprofile_reject_tag(
user_id bigint ,-- 用户
p_sales_cnt bigint ,-- 不含退拒商品购买数量
p_sales_amt double ,-- 不含退拒商品购买的商品总价
p_sales_cut_amt double ,-- 不含退拒实付金额(扣促销减免)
h_sales_cnt bigint ,-- 含退拒购买数量
h_sales_amt double ,-- 含退拒购买金额
h_sales_cut_amt double ,-- 含退拒实付金额(扣促销减免)
return_cnt bigint ,-- 退货商品数量
return_amt double ,-- 退货商品金额
dw_date bigint
) partitioned by (dt string)
stored as parquet
;

2、计算指标,并写入数据

-- 计算
-- 将 订单商品项详情表 关联 退货申请记录表,得到如下形式schema
-- 用户,订单,商品,商品价格,购买数量,实付金额, 退货订单Id,退货商品,退货件数,退货的金额
-- zhas,od01,p1, 20, 100 , 1800 , od01 p1 , 80 , 1400
-- zhas,od01,p2, 10, 20 , 200 , null null null null

SELECT
a.member_id as user_id ,
sum(if(b.order_id is null,a.product_quantity,a.product_quantity-b.product_count)) as p_sales_cnt ,
sum(if(b.order_id is null,a.product_quantity*a.product_price,(a.product_quantity-b.product_count)*a.product_price)) as p_sales_amt ,
sum(if(b.order_id is null,a.real_amount,a.real_amount-b.return_amount)) as p_sales_cut_amt ,
sum(a.product_quantity) as h_sales_cnt ,
sum(a.product_quantity*a.product_price) as h_sales_amt ,
sum(a.real_amount) as h_sales_cut_amt ,
sum(b.product_count) as return_cnt ,
sum(b.return_amount) as return_amt ,
'2020-10-07' as dw_date
FROM
(
SELECT oi.*,od.member_id
FROM dwd17.oms_order_item oi
JOIN dwd17.oms_order od
ON od.dt='2020-10-07' and oi.dt='2020-10-07' and od.id=oi.order_id
) a
LEFT JOIN
(SELECT * FROM dwd17.oms_order_return_apply where dt='2020-10-07') b
ON a.order_id=b.order_id and a.product_id=b.product_id
GROUP BY a.member_id

五、拉链表

数据合并后,已经看不到前一天的数据,所以需要拉链表,记住每天的数据状态

大数据之数据仓库建设(三)_unix_03

1、订单表

create table test.order_incr_test(
oid string,
uid string,
amt string,
sts string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;

2、10月1日数据

o1,u1,20,status1 
o2,u1,30,status1
o3,u2,40,status4
o4,u3,28,status3

10月2日增量数据

o1,u1,20,status2
o2,u1,30,status3
o5,u4,26,status1

3、创建拉链表

create table test.order_zip_test(
oid string,
uid string,
amt string,
sts string,
start_dt string,
end_dt string
)
partitioned by (dt string)
row format delimited fields terminated by ','
;

4、计算逻辑

-- 计算逻辑
-- 1.将T-1日的拉链表 LEFT JOIN T日的增量,对T-1拉链中的数据有效期做更新
-- 2.UNION ALL 上T日增量(老数据的新状态,新数据)

with zip as (
select * from test.order_zip_test where dt='2020-10-01'
),
newdata as (
select *from test.order_incr_test where dt='2020-10-02'
)

INSERT INTO TABLE test.order_zip_test PARTITION(dt='2020-10-02')
SELECT
zip.oid,
zip.uid,
zip.amt,
zip.sts,
zip.start_dt,
if(newdata.oid is not null and end_dt='9999-12-31',zip.dt,zip.end_dt) as end_dt
FROM zip left join newdata on zip.oid=newdata.oid

UNION ALL

SELECT
oid,
uid,
amt,
sts,
dt as start_dt,
'9999-12-31' as end_dt
FROM newdata