一、题目
现有用户登录记录表,已经按照用户日期进行去重处理。以用户登录的最早日期作为新增日期,请计算次日留存率是多少。
代码语言:javascript
复制
+----------+-------------+
| user_id | login_date |
+----------+-------------+
| aaa | 2023-12-01 |
| bbb | 2023-12-01 |
| bbb | 2023-12-02 |
| ccc | 2023-12-02 |
| bbb | 2023-12-03 |
| ccc | 2023-12-03 |
| ddd | 2023-12-03 |
| ccc | 2023-12-04 |
| ddd | 2023-12-04 |
+----------+-------------+
二、分析
维度 | 评分 |
---|---|
题目难度 | ⭐️⭐️⭐️⭐️ |
题目清晰度 | ⭐️⭐️⭐️⭐️⭐️ |
业务常见度 | ⭐️⭐️⭐️⭐️⭐️ |
三、SQL
指标定义:
次日留存用户:新增用户第二天登录(活跃)的用户;
次日留存率:t+1日留存用户数/t日新增用户;
1.根据登录日志,使用开窗函数计算出用户的最小登录时间作为新增日期first_day,然后计算当天日期和新增日期的时间差。
代码语言:javascript
复制
select
user_id,
login_date,
min(login_date)over(partition by user_id order by login_date asc) as first_day,
datediff(login_date,min(login_date)over(partition by user_id order by login_date asc)) as date_diff
from t_login_040
查询结果
2.我们根据first_day进行分组,date_diff=0的为当天新增用户,date_diff=1的为次日登录的用户
代码语言:javascript
复制
select
first_day,
count(case when date_diff = 0 then user_id end) as new_cnt,
count(case when date_diff =1 then user_id end) as next_act_cnt
from(
select
user_id,
login_date,
min(login_date)over(partition by user_id order by login_date asc) as first_day,
datediff(login_date,min(login_date)over(partition by user_id order by login_date asc)) as date_diff
from t_login_040
)t
group by first_day
order by first_day asc
查询结果
3.用次日留存数/新增用户数据即为留存率,因为新增可能为0,所以需要先判断。
代码语言:javascript
复制
select
first_day,
concat(if(count(case when date_diff = 0 then user_id end) =0,0,count(case when date_diff =1 then user_id end) /count(case when date_diff = 0 then user_id end))*100,'%') as next_act_per
from(
select
user_id,
login_date,
min(login_date)over(partition by user_id order by login_date asc) as first_day,
datediff(login_date,min(login_date)over(partition by user_id order by login_date asc)) as date_diff
from t_login_040
)t
group by first_day
order by first_day asc
查询结果
四、建表语句和数据插入
代码语言:javascript
复制
create table t_login_040 ( user_id string COMMENT '用户ID', login_date string COMMENT '登录日期' ) COMMENT '用户登录记录表' stored as orc ;
insert into t_login_040(user_id,login_date)
values
('aaa','2023-12-01'),
('bbb','2023-12-01'),
('bbb','2023-12-02'),
('ccc','2023-12-02'),
('bbb','2023-12-03'),
('ccc','2023-12-03'),
('ddd','2023-12-03'),
('ccc','2023-12-04'),
('ddd','2023-12-04');