欢迎来到尧图网

客户服务 关于我们

您的位置:首页 > 汽车 > 时评 > SQL面试题练习 —— 查询每个产品每年总销售额

SQL面试题练习 —— 查询每个产品每年总销售额

2024/10/25 14:25:53 来源:https://blog.csdn.net/hu_wei123/article/details/140034394  浏览:    关键词:SQL面试题练习 —— 查询每个产品每年总销售额

目录

  • 1 题目
  • 2 建表语句
  • 3 题解

1 题目


已知有表 t_product_sales 如下,记录了每个产品id、产品名称、产品销售开始日期、产品销售结束日期以及产品日均销售金额,请计算出每个产品每年的销售金额。

样例数据

+-------------+---------------+----------------------+----------------------+----------------------+
| product_id  | product_name  |     period_start     |      period_end      | average_daily_sales  |
+-------------+---------------+----------------------+----------------------+----------------------+
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    |
+-------------+---------------+----------------------+----------------------+----------------------+

期望结果

+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1          | LC Phone     |    2019     | 3500         |
| 2          | LC T-Shirt   |    2018     | 310          |
| 2          | LC T-Shirt   |    2019     | 3650         |
| 2          | LC T-Shirt   |    2020     | 10           |
| 3          | LC Keychain  |    2019     | 31           |
| 3          | LC Keychain  |    2020     | 31           |
+------------+--------------+-------------+--------------+

2 建表语句


--建表语句
create table if not exists t_product_sales
(product_id          bigint,product_name        string,period_start        string,period_end          string,average_daily_sales bigint
)ROW FORMAT DELIMITEDFIELDS TERMINATED BY ','STORED AS orc;--插入数据insert into t_product_sales(product_id, product_name, period_start, period_end, average_daily_sales)
values (1, 'LC Phone', '2019-01-25 00:00:00', '2019-02-28 00:00:00', 100),(2, 'LC T-Shirt', '2018-12-01 00:00:00', '2020-01-01 00:00:00', 10),(3, 'LC Keychain', '2019-12-01 00:00:00', '2020-01-31 00:00:00', 1);

3 题解


(1)构建年份维表

with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_dayunion allselect '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_dayunion allselect '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day)
select year,year_first_day,year_end_day from dim_year

执行结果

+-------+-----------------+---------------+
| year  | year_first_day  | year_end_day  |
+-------+-----------------+---------------+
| 2018  | 2018-01-01      | 2018-12-31    |
| 2019  | 2019-01-01      | 2019-12-31    |
| 2020  | 2020-01-01      | 2020-12-31    |
+-------+-----------------+---------------+

(2)年份维表与原始数据进行关联

把原始数据与年份维表进行笛卡尔积,得到每年与原始数据的一个交叉值。

# 这一设置的含义是关闭Hive中的笛卡尔积严格检查。
# 具体来说,Hive默认情况下不允许生成笛卡尔积(即两个没有连接条件的表的笛卡尔积),
# 因为笛卡尔积通常会生成非常大的结果集,可能会导致性能问题或资源耗尽。
# 通过将这个设置设为false,你可以禁用这种严格检查,从而允许在查询中生成笛卡尔积。set hive.strict.checks.cartesian.product = false;
with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_dayunion allselect '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_dayunion allselect '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day)
selectproduct_id,product_name,period_start,period_end,average_daily_sales,year,year_first_day,year_end_day
from t_product_sales
left join dim_year

执行结果

+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| product_id  | product_name  |     period_start     |      period_end      | average_daily_sales  | year  | year_first_day  | year_end_day  |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2019  | 2019-01-01      | 2019-12-31    |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2020  | 2020-01-01      | 2020-12-31    |
| 1           | LC Phone      | 2019-01-25 00:00:00  | 2019-02-28 00:00:00  | 100                  | 2018  | 2018-01-01      | 2018-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2019  | 2019-01-01      | 2019-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2020  | 2020-01-01      | 2020-12-31    |
| 2           | LC T-Shirt    | 2018-12-01 00:00:00  | 2020-01-01 00:00:00  | 10                   | 2018  | 2018-01-01      | 2018-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2019  | 2019-01-01      | 2019-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2020  | 2020-01-01      | 2020-12-31    |
| 3           | LC Keychain   | 2019-12-01 00:00:00  | 2020-01-31 00:00:00  | 1                    | 2018  | 2018-01-01      | 2018-12-31    |
+-------------+---------------+----------------------+----------------------+----------------------+-------+-----------------+---------------+

(3)计算每年每个产品在售天数,计算年销售额

我们先观察2中的结果,可以看到原始记录中每行数据都与所有年都有一行记录。我们从销售日期和每年开始日期中取较大日期 得到一个开始时间,然后从销售截止日期和每年的结束日期取较小日期 得到一个结束日期,然后用结束日期减去开始日期。

with dim_year as (select '2018' as year, '2018-01-01' as year_first_day, '2018-12-31' as year_end_dayunion allselect '2019' as year, '2019-01-01' as year_first_day, '2019-12-31' as year_end_dayunion allselect '2020' as year, '2020-01-01' as year_first_day, '2020-12-31' as year_end_day),tmp as (select product_id, product_name, period_start, period_end, average_daily_sales, year, year_first_day, year_end_day, datediff(if(to_date(period_end)> to_date(year_end_day), to_date(year_end_day), to_date(period_end)), if(to_date(period_start)> to_date(year_first_day), to_date(period_start), to_date(year_first_day))) as date_difffrom t_product_salesleft join dim_year)
select product_id,product_name,year,(date_diff + 1) * average_daily_sales as total_amount
from tmp
where date_diff >= 0

执行结果

+-------------+---------------+-------+---------------+
| product_id  | product_name  | year  | total_amount  |
+-------------+---------------+-------+---------------+
| 1           | LC Phone      | 2019  | 3500          |
| 2           | LC T-Shirt    | 2018  | 310           |
| 2           | LC T-Shirt    | 2019  | 3650          |
| 2           | LC T-Shirt    | 2020  | 10            |
| 3           | LC Keychain   | 2019  | 31            |
| 3           | LC Keychain   | 2020  | 31            |
+-------------+---------------+-------+---------------+

只有在当年有销售时间的数据date_diff >=0,如果在当年没有销售时间,则date_diff为负。但是这个date_diff 的值比预期小1,这是因为我们算了日期差,所以我们在结果上+1即可。

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com