mysqljunction-table

MySQL handling many to many relations from data in csv file


I have a single CSV file with following columns:

ActorName, Address, City, Movies

ActorName column has duplicate entries because there are multiple movies that actor acted in. Eg. Actor Clint Eastwood is listed 3 times because movies column has 3 movies he acted in.

Issue I am having is how to create a junction table or relation. If i create ActorID first then ActorName will still have duplicates. And if i create Movies table first and move Movies column, then delete the duplicates from Actor table then how will I associate movie with the actor?


Solution

  • An example. Loading the data into temporary table then copying into working tables.

    CREATE PROCEDURE process_data ()
    BEGIN
    
    -- load data from CSV file into temptable
    LOAD DATA INFILE 'x:/folder/new_data.csv' INTO TABLE raw_data;
    
    -- insert actor names into actor table if absent
    INSERT IGNORE INTO actor (actor_name) SELECT actor_name FROM raw_data;
    
    -- insert movie names into movie table if absent
    INSERT IGNORE INTO movie (movie_name) SELECT movie_name FROM raw_data;
    
    -- insert junction data into actor_in_movie if absent
    INSERT IGNORE INTO actor_in_movie (actor_id, movie_id)
    SELECT actor_id, movie_id
    FROM raw_data
    JOIN actor USING (actor_name)
    JOIN movie USING (movie_name);
    
    -- clear imported data
    TRUNCATE raw_data;
    END
    

    DEMO fiddle