I am using phpMyAdmin that uses MySQL version 5.7.
I have a database table called condensed
which contains millions of unique emails with multiple "send dates", among other columns. Some of the emails will have a NULL send date.
Focusing on the just the email
and send_date
columns here.
Here is an example of the table:
| email | a_last_sent | b_last_sent | c_last_sent | d_last_sent | ..up to 14 dates
----------------------------------------------------------------------------------
| email1 | 2024-06-12 | 2024-05-25 | NULL | 2024-06-06 |
----------------------------------------------------------------------------------
| email2 | 2024-06-01 | 2024-06-16 | 2024-06-05 | 2024-06-19 |
----------------------------------------------------------------------------------
| email3 | NULL | NULL | 2024-05-12 | 2024-06-10 |
----------------------------------------------------------------------------------
| email4 | NULL | 2024-06-13 | NULL | 2024-05-11 |
----------------------------------------------------------------------------------
| email5 | 2024-06-09 | 2024-05-01 | NULL | NULL |
----------------------------------------------------------------------------------
I need to exclude anything that has more than 3 sent dates within the month of June.
So if an email has a sent date in 3 or columns within the month of June, do not return those records.
Using my example above, the results should looks something like this:
| email | a_last_sent | b_last_sent | c_last_sent | d_last_sent | ..up to 14 dates
----------------------------------------------------------------------------------
| email1 | 2024-06-12 | 2024-05-25 | NULL | 2024-06-06 |
----------------------------------------------------------------------------------
| email3 | NULL | NULL | 2024-05-12 | 2024-06-10 |
----------------------------------------------------------------------------------
| email4 | NULL | 2024-06-13 | NULL | 2024-05-11 |
----------------------------------------------------------------------------------
| email5 | 2024-06-09 | 2024-05-01 | NULL | NULL |
----------------------------------------------------------------------------------
The results above excluded email2
because it had 4 dates within the month of June.
email1
meets the criteria because it only has 2 dates within the month of June and 1 in May.
email3
, email4
, and email5
meet the criteria because they only have 1 date within the month of June and 1 in May.
I can write the below query:
SELECT
* FROM
`condensed`
WHERE
(
`a_last_sent` NOT BETWEEN '2024-06-01' AND '2024-06-30'
OR
`b_last_sent` NOT BETWEEN '2024-06-01' AND '2024-06-30'
OR
`c_last_sent` NOT BETWEEN '2024-06-01' AND '2024-06-30'
// remaining date columns
)
The query above excludes anything within the date range.
So, how can I incorporate the count to exclude anything that has over 3 dates within the specified date range (which in this case is June)?
The values contains NULLs - so we cannot concat or sum the expressions with the most part of the functions/operators.
I'd recommend you to concat the dates using CONCAT_WS() - this function skips NULLs. Then count the amount of according substrings.
For definite month use something like
SELECT LENGTH(dateslist) - LENGTH(REPLACE(dateslist, '2024-06-', '2024-06')) AS amount, ...
FROM ( SELECT CONCAT_WS(',', a_last_sent, b_last_sent, ..., x_last_sent) AS dateslist, ...
...
)
...
For custom range use UNION, nested REPLACEs (v.5.7 does not support regular expression replace) or register and use according user-defined function.
Of course you may also use multiple COALESCE().
Also you may unpivot your data with multiple UNIONs then count by the common way.
UPDATE
Would you mind providing an example of how the COALESCE or the UNION would work in this case? – John Beasley
Using COALESCE. Edited code from Roemer's answer
...
WHERE
(COALESCE(`a_last_sent`, '2000-01-01') BETWEEN '2024-06-01' AND '2024-06-30')
+ (COALESCE(`b_last_sent`, '2000-01-01') BETWEEN '2024-06-01' AND '2024-06-30')
+ (COALESCE(`c_last_sent`, '2000-01-01') BETWEEN '2024-06-01' AND '2024-06-30')
...
Using UNION:
...
FROM ( SELECT email, a_last_sent FROM table WHERE a_last_sent IS NOT NULL
UNION ALL
SELECT email, b_last_sent FROM table WHERE b_last_sent IS NOT NULL
UNION ALL
SELECT email, c_last_sent FROM table WHERE c_last_sent IS NOT NULL
UNION ALL
...
) AS unpivotted_data
...