I am trying to run a query that will get the last time a customer has placed an order. The catch is that the site allows people the ability to checkout both logged in and logged out. Additionally, I want to combine records for customers that have changed their email at some point.
I have a query that works (I think), but it involves three selects - one for the last order, one for logged in/out switches, and one for email switches. This is kind of slow on my database! What am I overlooking? How can I make this query more efficient and/or faster?
SELECT last, email, user_id
FROM (
SELECT last, email, user_id
FROM (
SELECT time as last, email,
CASE WHEN user_id > 0 THEN CAST(user_id as CHAR(20))
ELSE email
END as user_id
FROM orders o1
WHERE time=(SELECT MAX(o2.time)
FROM orders o2
WHERE o1.email = o2.email)
ORDER BY last DESC
) as o3
GROUP BY email
) as o4
GROUP BY user_id
ORDER BY email
Edit: don't know how this will help, but here's the orders table, as requested:
create table orders(order_id int(10) NOT NULL AUTO_INCREMENT,
user_id int(10) NOT NULL,
time int(10) UNSIGNED NOT NULL,
email varchar(50) NOT NULL);
PRIMARY KEY (`order_id`), KEY `userid` (`user_id`))
ENGINE=MyISAM AUTO_INCREMENT=6207584 DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
Update1.
In this example considered cases
1.User has orders with logged in
and logged out
with the same email.
2. User has orders with logged in
with several email.
Index on ((email,user_id);) will be usefull for query.
time
column data type does not matter.
select user_id,email,max_time
from(
select max(user_id) user_id,email ,max(time) max_time
,row_number()over(partition by max(user_id) order by max(time) desc) rn_email
from orders
group by email
)t
where rn_email=1 or user_id=0
order by email
Old part of answer:
You can range orders by time desc with partition by composite value (user_id or email).
To compare query performance, use 'EXPLAIN SELECT ...'
See example:
Test data
order_id | user_id | time | |
---|---|---|---|
1 | 100 | 2024-09-01 14:00:00 | user100@mail.com |
2 | 100 | 2024-09-10 09:00:00 | user100@mail.com |
3 | 200 | 2024-09-12 09:00:00 | user200@mail.com |
4 | 300 | 2024-09-10 09:00:00 | user300@mail.com |
5 | 300 | 2024-09-12 09:00:00 | user300-other@mail.com |
6 | 400 | 2024-09-10 09:00:00 | user400@mail.com |
7 | 400 | 2024-09-12 09:00:00 | user400-other@mail.com |
8 | 400 | 2024-09-14 09:00:00 | user400@mail.com |
9 | 0 | 2024-09-11 09:00:00 | user_x1@mail.com |
10 | 0 | 2024-09-15 15:00:00 | user_x1@mail.com |
11 | 0 | 2024-09-10 09:00:00 | user_x2@mail.com |
Query example
select *
from(
select *
,row_number()over(partition by case when user_id<>0 then CAST(user_id as varCHAR(50)) else email end
order by time desc) rn
from orders
)ranged_orders
where rn=1
order by email
Output
order_id | user_id | time | rn | |
---|---|---|---|---|
2 | 100 | 2024-09-10 09:00:00 | user100@mail.com | 1 |
3 | 200 | 2024-09-12 09:00:00 | user200@mail.com | 1 |
5 | 300 | 2024-09-12 09:00:00 | user300-other@mail.com | 1 |
8 | 400 | 2024-09-14 09:00:00 | user400@mail.com | 1 |
10 | 0 | 2024-09-15 15:00:00 | user_x1@mail.com | 1 |
11 | 0 | 2024-09-10 09:00:00 | user_x2@mail.com | 1 |
I think it is necessary to bring user_id and email to the same data type - the email data type (varchar(50) in example).
Another query
select *
from(
select * ,row_number()over(partition by user_id order by time desc) rn
from orders
where user_id<>0
union all
select * ,row_number()over(partition by email order by time desc) rn
from orders
where user_id=0
)ranged_orders
where rn=1
order by email
possible (partially) covered by indexes. Example