Why does SELECT ... FOR UPDATE with a subquery create a snapshot in MySQL but INSERT ... SELECT does not?

I'm trying to understand when a consistent snapshot (read view) is created in MySQL, especially under the REPEATABLE READ isolation level, which is the default for InnoDB.

Scenario
I have two tables:

CREATE TABLE table1 (  
  name VARCHAR(100) NOT NULL,  
  INDEX idx_table1_name (name)  
);  

CREATE TABLE table2 (  
  name VARCHAR(100) NOT NULL,  
  INDEX idx_table2_name (name)  
);

Case 1: SELECT ... FOR UPDATE with a subquery

Transaction 1:

START TRANSACTION;  
SELECT * FROM table1 
WHERE name = (
  SELECT name FROM table2 WHERE name = 'a'
) FOR UPDATE;

Transaction 2:

START TRANSACTION;  
INSERT INTO table2 VALUES ('b');  
COMMIT;

Back in Transaction 1:

SELECT * FROM table2 WHERE name = 'b';   
-- returns EMPTY SET

This shows that a snapshot was created at the time of the SELECT ... FOR UPDATE with subquery, since it cannot see the committed value 'b' from Transaction 2.

Case 2: INSERT INTO ... SELECT with subquery
Transaction 1:

START TRANSACTION;   
INSERT INTO table1    
SELECT 'c' FROM table2 WHERE name = 'a';

Transaction 2:

START TRANSACTION;  
INSERT INTO table2 VALUES ('b');  
COMMIT;

Back in Transaction 1:

SELECT * FROM table2 WHERE name = 'b';  
-- returns 'b'

This means no snapshot was created by the INSERT ... SELECT, and it can see the committed data from Transaction 2. According to this page, a regular SELECT ... FOR UPDATE is treated as a DML operation rather than a regular SELECT.

https://dev.mysql.com/doc/refman/8.4/en/innodb-consistent-read.html

Questions

Why does SELECT ... FOR UPDATE with a subquery create a snapshot?
Why does INSERT INTO ... SELECT not create a snapshot in the same REPEATABLE READ isolation level?
Is this behavior documented anywhere in the MySQL official documentation?
Does the snapshot timing depend on the presence of subqueries in DML?

Solution

Yes, it depends on the use of subqueries.

https://dev.mysql.com/doc/refman/8.4/en/innodb-locking-reads.html says:

A locking read clause in an outer statement does not lock the rows of a table in a nested subquery unless a locking read clause is also specified in the subquery. For example, the following statement does not lock rows in table t2.
SELECT * FROM t1 WHERE c1 = (SELECT c1 FROM t2) FOR UPDATE;
To lock rows in table t2, add a locking read clause to the subquery:
SELECT * FROM t1 WHERE c1 = (SELECT c1 FROM t2 FOR UPDATE) FOR UPDATE;

So in your case 1, the outer query was a locking read on table1, and locking reads will see committed changes as if you had used READ-COMMITTED isolation level. But the subquery without a locking modifier still behaves according to REPEATABLE-READ isolation level.

If you use a locking modifier in the subquery like this:

SELECT * FROM table1 
WHERE name = (
  SELECT name FROM table2 WHERE name = 'a'
  FOR UPDATE
) FOR UPDATE;

Then it should see rows committed by the other session.

But a subsequent non-locking statement will not see rows that were committed after session 1's snapshot started, even though the locking statement could see them.