MySQL: how to fill in missing dates in a range?

I have a table with two columns, date and score. It has at most 30 entries, one for each of the last 30 days.

date      score
-----------------
1.8.2010  19
2.8.2010  21
4.8.2010  14
7.8.2010  10
10.8.2010 14

My problem is that some dates are missing. I want to see:

date      score
-----------------
1.8.2010  19
2.8.2010  21
3.8.2010  0
4.8.2010  14
5.8.2010  0
6.8.2010  0
7.8.2010  10
...

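What I need from a single query is: 19, 21, 0, 14, 0, 0, 10, 0, 0, 14... meaning that the missing dates are filled with 0.

I know how to get all the values and fill in the gaps by iterating over the dates in a server-side language. But is it possible to do this in MySQL, so that I can sort the result by date and get the missing pieces?

EDIT: This table has another column named UserID, so I have 30,000 users and some of them have scores in this table. Every day I delete entries whose date is more than 30 days old, because I need the last 30 days of scores for each user. The reason is that I am making a graph of user activity over the last 30 days, and to plot the chart I need 30 comma-separated values. So I want to say in the query "give me the activity for USERID = 10203", and the query would return 30 scores, one for each of the last 30 days. I hope that is clearer now.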


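MySQL doesn't have recursive functionality, so you're left with using the NUMBERS table trick:

  1. Create a table that only holds incrementing numbers, which is easy to do using an auto_increment: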

    DROP TABLE IF EXISTS `example`.`numbers`;
    CREATE TABLE  `example`.`numbers` (
    `id` int(10) unsigned NOT NULL auto_increment,
    PRIMARY KEY  (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
  2. Populate the table using:

    INSERT INTO `example`.`numbers`
    ( `id` )
    VALUES
    ( NULL )
    

    Repeat for as many values as you need.

  3. Use DATE_ADD to construct a list of dates, increasing the days based on the NUMBERS.id value. Replace "2010-06-06" and "2010-06-14" with your respective start and end dates (but use the same format, YYYY-MM-DD):

    SELECT `x`.*
    FROM (SELECT DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY)
    FROM `numbers` `n`
    WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` -1 DAY) <= '2010-06-14' ) x
    
  4. LEFT JOIN onto your table of data based on the time portion:

    SELECT `x`.`ts` AS `timestamp`,
    COALESCE(`y`.`score`, 0) AS `cnt`
    -- keep `ts` in YYYY-MM-DD form so it compares cleanly with the parsed date below
    FROM (SELECT DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) AS `ts`
    FROM `numbers` `n`
    WHERE DATE_ADD('2010-06-06', INTERVAL `n`.`id` - 1 DAY) <= '2010-06-14') x
    -- `your_table` is a placeholder for your data table
    LEFT JOIN `your_table` `y` ON STR_TO_DATE(`y`.`date`, '%d.%m.%Y') = `x`.`ts`
    

If you want to maintain the date format, use the DATE_FORMAT function:

DATE_FORMAT(`x`.`ts`, '%d.%m.%Y') AS `timestamp`
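
To try the numbers-table steps without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an approximation, not the answer's exact SQL: SQLite's `date()` with a `'+N days'` modifier stands in for MySQL's `DATE_ADD`, the `scores` table is a hypothetical name, and dates are stored as ISO strings rather than in the question's d.m.yyyy format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Steps 1 and 2: a table holding nothing but incrementing numbers.
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY AUTOINCREMENT)")
for _ in range(30):  # 30 rows are enough for a 30-day window
    conn.execute("INSERT INTO numbers (id) VALUES (NULL)")

# Hypothetical data table holding the sample rows from the question.
conn.execute("CREATE TABLE scores (date TEXT PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("2010-08-01", 19), ("2010-08-02", 21), ("2010-08-04", 14),
     ("2010-08-07", 10), ("2010-08-10", 14)],
)

# Steps 3 and 4: build the date list from the numbers, then LEFT JOIN
# the data onto it and turn the missing scores (NULL) into 0.
rows = conn.execute(
    """
    SELECT x.ts, COALESCE(y.score, 0) AS score
    FROM (SELECT date('2010-08-01', '+' || (n.id - 1) || ' days') AS ts
          FROM numbers n) x
    LEFT JOIN scores y ON y.date = x.ts
    WHERE x.ts <= '2010-08-10'
    ORDER BY x.ts
    """
).fetchall()

filled = [score for _, score in rows]
print(filled)
```

The key design point is that the generated date list sits on the left side of the join, so every date yields a row whether or not the data table has one.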

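You can accomplish this by using a calendar table. This is a table that you create once and fill with a date range (for example, one row for each day from 2000 to 2050; that depends on your data). You can then make an outer join of your table against the calendar table. If a date is missing in your table, you return 0 for the score.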

I'm not a fan of the other answers, requiring tables to be created and such. This query does it efficiently without helper tables.

SELECT
IF(score IS NULL, 0, score) AS score,
b.Days AS date
FROM
(SELECT a.Days
FROM (
SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY AS Days
FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
) a
WHERE a.Days >= curdate() - INTERVAL 30 DAY) b
LEFT JOIN your_table
ON date = b.Days
ORDER BY b.Days;

So let's dissect this.

SELECT
IF(score IS NULL, 0, score) AS score,
b.Days AS date

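The IF detects days that had no score and sets them to 0. b.Days is the configured number of days you chose to fetch, counting back from the current date, up to 1000.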

(SELECT a.Days
FROM (
SELECT curdate() - INTERVAL (a.a + (10 * b.a) + (100 * c.a)) DAY AS Days
FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
) a
WHERE a.Days >= curdate() - INTERVAL 30 DAY) b

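This subquery is something I saw on stackoverflow. It efficiently generates a list of the past 1000 days counting back from the current date. The interval (currently 30) in the WHERE clause at the end determines which days are returned; the maximum is 1000. This query could easily be modified to return hundreds of years' worth of dates, but 1000 days should be good for most things.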
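
To see why three crossed digit tables yield exactly 1000 consecutive day offsets, the arithmetic can be checked outside the database. This is a small Python sketch, not part of the query; the fixed `today` value is an assumption that stands in for `curdate()` so the result is repeatable.

```python
from datetime import date, timedelta
from itertools import product

digits = range(10)  # the ten rows 0..9 produced by each UNION ALL subquery

# Each (a, b, c) combination from the cross join maps to one unique offset,
# the same way decimal digits map to the numbers 0..999.
offsets = sorted(a + 10 * b + 100 * c for a, b, c in product(digits, repeat=3))

today = date(2010, 8, 10)  # assumed stand-in for curdate()
cutoff = today - timedelta(days=30)

# Mirror of: WHERE a.Days >= curdate() - INTERVAL 30 DAY
days = sorted(
    today - timedelta(days=n)
    for n in offsets
    if today - timedelta(days=n) >= cutoff
)
print(len(offsets), len(days), days[0], days[-1])
```

Under these assumptions the filter keeps 31 dates, because both today and the day exactly 30 days ago satisfy the condition; if exactly 30 rows are needed, an interval of 29 days would do it.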

LEFT JOIN your_table
ON date = b.Days
ORDER BY b.Days;

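This is the part that brings your table containing the scores into it. You compare against the date range selected by the date generator query so that 0s can be filled in where needed (the score is initially NULL because this is a LEFT JOIN; that is fixed in the select statement). I also order it by date, just because. This is a preference; you could also order by score.

Before the ORDER BY you could easily join with the table of user info that you mentioned in your edit, to add that last requirement.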

I hope this version of the query helps someone. Thanks for reading.

Michael Conard's answer is great, but I needed intervals of 15 minutes, where the time must always start at the top of a 15-minute boundary:

SELECT a.Days
FROM (
SELECT FROM_UNIXTIME( FLOOR( UNIX_TIMESTAMP() / (15 * 60) ) * (15 * 60)) - INTERVAL 15 * (a.a + (10 * b.a) + (100 * c.a)) MINUTE AS Days
FROM       (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
CROSS JOIN (SELECT 0 AS a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c
) a
WHERE a.Days >= curdate() - INTERVAL 30 DAY

This rounds the current time down to the previous 15-minute boundary:

FROM_UNIXTIME( FLOOR( UNIX_TIMESTAMP() / (15 * 60) ) * (15 * 60))

And this subtracts time in 15-minute steps:

- INTERVAL 15 * (a.a + (10 * b.a) + (100 * c.a)) MINUTE

If there is a simpler way to do it, please let me know.
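
The rounding and stepping can be checked outside the database. This is a minimal Python sketch of the same arithmetic, assuming a fixed example instant in UTC instead of the current time; `floor_to_step` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

STEP = 15 * 60  # 15 minutes expressed in seconds


def floor_to_step(ts: int, step: int = STEP) -> int:
    """Mirror FLOOR(ts / step) * step for a Unix timestamp in seconds."""
    return (ts // step) * step


# Fixed example instant (an assumption; the query uses the current time).
now = datetime(2010, 8, 1, 4, 37, 12, tzinfo=timezone.utc)
start = datetime.fromtimestamp(floor_to_step(int(now.timestamp())), tz=timezone.utc)

# Step backwards in 15-minute increments, as the INTERVAL ... MINUTE term does.
slots = [start - timedelta(minutes=15 * n) for n in range(4)]
for slot in slots:
    print(slot.isoformat())
```

Integer division does the flooring here; everything after the last full 15-minute block is discarded before multiplying back up.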

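Time has passed since this question was asked. MySQL 8.0 was released in 2018 and added support for recursive common table expressions, which provide an elegant, state-of-the-art solution to this problem.

The following query can be used to generate a list of dates, say for the first 15 days of August 2010: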

with recursive all_dates(dt) as (
-- anchor
select '2010-08-01' dt
union all
-- recursion with stop condition
select dt + interval 1 day from all_dates where dt < '2010-08-15'
)
select * from all_dates order by dt

You can then left join this result set with the table to generate the expected output:

with recursive all_dates(dt) as (
select '2010-08-01' dt
union all
select dt + interval 1 day from all_dates where dt < '2010-08-15'
)
select d.dt date, coalesce(t.score, 0) score
from all_dates d
left join mytable t on t.date = d.dt
order by d.dt

Demo on DB Fiddle:

date       | score
:--------- | ----:
2010-08-01 |    19
2010-08-02 |    21
2010-08-03 |     0
2010-08-04 |    14
2010-08-05 |     0
2010-08-06 |     0
2010-08-07 |    10
2010-08-08 |     0
2010-08-09 |     0
2010-08-10 |    14
2010-08-11 |     0
2010-08-12 |     0
2010-08-13 |     0
2010-08-14 |     0
2010-08-15 |     0

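Note that it is very easy to adapt the recursive CTE to other intervals or periods. For example, say that we want one row every 15 minutes between 4 AM and 8 AM on August 1st, 2010; we can do: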

with recursive all_dates(dt) as (
select '2010-08-01 04:00:00' dt
union all
select dt + interval 15 minute from all_dates where dt < '2010-08-01 08:00:00'
)
...
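
For readers without a MySQL 8.0 server at hand, the recursive CTE and left join can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite's `date(dt, '+1 day')` replaces MySQL's `dt + interval 1 day`, and `mytable` is filled with the sample rows from the question, stored as ISO date strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT PRIMARY KEY, score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("2010-08-01", 19), ("2010-08-02", 21), ("2010-08-04", 14),
     ("2010-08-07", 10), ("2010-08-10", 14)],
)

rows = conn.execute(
    """
    WITH RECURSIVE all_dates(dt) AS (
        SELECT '2010-08-01'              -- anchor: first date of the range
        UNION ALL
        SELECT date(dt, '+1 day')        -- recursion: advance by one day
        FROM all_dates
        WHERE dt < '2010-08-15'          -- stop condition
    )
    SELECT d.dt, COALESCE(t.score, 0)
    FROM all_dates d
    LEFT JOIN mytable t ON t.date = d.dt
    ORDER BY d.dt
    """
).fetchall()

scores = [score for _, score in rows]
print(scores)
```

The stop condition is evaluated on the previous row, so the last generated date is the end date itself rather than the day before it.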

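You can use an insert to fill the table directly, from a start date up to today:

insert ignore into mytable (date, score)
with recursive all_dates(dt) as (
-- anchor
select '2021-01-01' dt
union all
-- recursion with stop condition
select dt + interval 1 day from all_dates where dt + interval 1 day <= curdate()
)
-- IGNORE skips dates that already exist, assuming a unique key on date;
-- ranges longer than 1000 days need a higher cte_max_recursion_depth (default 1000)
select dt, 0 from all_dates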