How do I import data as an array from a .csv file into a MongoDB collection in Atlas?


I want to import data into a MongoDB collection in an Atlas instance, with some of the data stored in an array. The data is initially stored in a table in a .csv file like this:

name   age  interests.0  interests.1  interests.2
David  24   Jogging      Swimming
Sarah  43   Movies       Football     Netball

If I import it via Compass, it imports correctly and the data looks like:

{
 name : 'David',
 age:24, 
 interests : [
    0: "Jogging",
    1: "Swimming",
 ]
},
{
 name : 'Sarah',
 age:43, 
 interests : [
    0: "Movies",
    1: "Football",
    2: "Netball"
 ]
}

But Compass requires me to manually select the type of each column, which is time-consuming. If I use mongoimport with the following command:

mongoimport --uri 'mongodb+srv://cluster0.xxx.mongodb.net/my_db' \
   --username='user' \
   --collection='my_collection' \
   --ignoreBlanks \
   --type=csv \
   --headerline \
   --file=/url-to-my-data/data.csv

It does not require the data type to be manually selected, but the interests column becomes an object, e.g.:

interests : {
    0: "Movies",
    1: "Football",
    2: "Netball"
}

How can I import the data from its current format in the .csv file, avoiding the manual data type selection that Compass requires, while keeping the interests column as an array?


Solution

  • The easiest solution I have found is to import the data with mongoimport and then convert the objects to arrays after the import, as follows:

    1. Import the .csv using mongoimport as in the question above.
    2. Connect to the Atlas instance using mongosh.
    3. Run the following command to convert the interests object to an array:
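    // Convert each imported interests object ({0: ..., 1: ...}) into an array.
    db.my_collection.find({}).forEach(doc => {
        // Skip documents without an interests field; Object.values(undefined) throws.
        if (doc.interests) {
            const interestsArray = Object.values(doc.interests);
            // A plain update document stores the values literally; in a pipeline
            // update, strings starting with "$" would be read as field paths.
            db.my_collection.updateOne({ _id: doc._id }, { $set: { interests: interestsArray } });
        }
    });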
    

    The if statement checks that the document has an interests field. Without it, Object.values() throws an error on the first document that lacks the field, and the command stops there. This matters when updating many documents, some of which might not have an interests field.
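
If the same conversion is needed for several fields, the per-document step could be pulled out into a small helper. The following is a minimal sketch in plain JavaScript and is not part of the original answer: the name toArrayByIndex is hypothetical, and it assumes the keys produced by the import are non-negative integers ("0", "1", ...). It sorts the keys numerically instead of relying on property order, and it returns null for values that are missing or do not look like an index-keyed object, so the caller can skip them.

```javascript
// Hypothetical helper: converts an index-keyed object such as
// { "0": "Movies", "1": "Football" } into ["Movies", "Football"].
function toArrayByIndex(value) {
  // Already an array (e.g. the script was run twice): return a copy.
  if (Array.isArray(value)) return [...value];
  // Missing field or scalar value: signal "nothing to convert".
  if (value === null || typeof value !== "object") return null;

  const keys = Object.keys(value);
  // Only convert when every key looks like an array index.
  if (!keys.every((k) => /^\d+$/.test(k))) return null;

  return keys
    .sort((a, b) => Number(a) - Number(b)) // numeric order, so "10" follows "9"
    .map((k) => value[k]);
}

console.log(toArrayByIndex({ 0: "Movies", 1: "Football", 2: "Netball" }));
```

In mongosh, the result could stand in for Object.values(doc.interests) inside the forEach loop, with the update skipped whenever the helper returns null.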