python django

Django Save Multiple Objects At Once


I have been practicing Django for a while now. I am currently using it in a project where I fetch Facebook data via GET requests and save it to an SQLite database using Django models. I would like to know how I can improve the following code so that it saves a list of Facebook posts and their metrics efficiently. At the moment, I use a for loop to iterate over a list of Facebook posts; for each one, the metrics are fetched, assigned to the corresponding Django model, and saved.


def save_post(post_id, page_id):

    facebook_post = Post(post_id=post_id,
                    access_token=fb_access_token)

    post_db = PostsModel(page_id=page_id, post_id=post_id)
    post_db.message = facebook_post.message
    post_db.story = facebook_post.story
    post_db.full_picture = facebook_post.full_picture
    post_db.reactions_count = facebook_post.reactions_count
    post_db.comments_count = facebook_post.comments_count
    post_db.shares_count = facebook_post.shares_count
    post_db.interactions_count = facebook_post.interactions_count
    post_db.created_time = facebook_post.created_time
    post_db.published = facebook_post.published
    post_db.attachment_title = facebook_post.attachment_title
    post_db.attachment_description = facebook_post.attachment_description
    post_db.attachment_target_url = facebook_post.attachment_target_url
    post_db.save()

post_db is a Django model object instantiated from PostsModel, while Post is a plain Python class that I wrote. The latter wraps a set of GET requests that fetch data from Facebook's Graph API; it parses the returned JSON and exposes the relevant values as attributes (message, shares_count, and so on).
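As a side note on the repeated assignments in save_post: the twelve attribute copies could be driven by a list of field names instead. The following is only a sketch. `POST_FIELDS` and `post_field_values` are hypothetical names, and `SimpleNamespace` stands in for the Post wrapper so that the example runs without network access.

```python
from types import SimpleNamespace

# Attribute names taken from the save_post function above.
POST_FIELDS = (
    "message", "story", "full_picture", "reactions_count", "comments_count",
    "shares_count", "interactions_count", "created_time", "published",
    "attachment_title", "attachment_description", "attachment_target_url",
)


def post_field_values(facebook_post, fields=POST_FIELDS):
    """Return {field: value} for every listed attribute of facebook_post.

    Attributes that are missing on the object are reported as None.
    """
    return {name: getattr(facebook_post, name, None) for name in fields}


# Stand-in object (assumption): the real code would pass a Post instance.
fake_post = SimpleNamespace(message="hello", shares_count=3)
values = post_field_values(fake_post, fields=("message", "shares_count", "story"))
print(values)
```

Assuming the model field names match the attribute names, as they do in save_post, the resulting dictionary could be unpacked into the constructor, e.g. `PostsModel(page_id=page_id, post_id=post_id, **values)`.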

I read about bulk_create in Django's documentation, but I don't know how to apply it to the code above. I also tried multiprocessing with Pool, but then the function above does not execute. Right now I am iterating sequentially over the list, so the longer the list gets, the longer saving takes.
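If most of the time is spent waiting on the Graph API rather than on the database (an assumption, not something measured here), a thread pool is usually a simpler fit than multiprocessing, because the work is I/O-bound. The sketch below uses only the standard library. `fetch_post` is a hypothetical stand-in for constructing `Post(post_id=..., access_token=...)`, and the sleep simulates network latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_post(post_id):
    """Hypothetical stand-in for the Post wrapper: pretend to call the API."""
    time.sleep(0.05)  # simulated network latency
    return {"post_id": post_id, "shares_count": len(post_id)}


def fetch_all(post_ids, max_workers=8):
    """Fetch every post concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_post, post_ids))


post_ids = [f"page_{i}" for i in range(16)]
start = time.perf_counter()
results = fetch_all(post_ids)
elapsed = time.perf_counter() - start
print(f"fetched {len(results)} posts in {elapsed:.2f}s")
```

The fetched results can then be turned into model objects and written from the main thread in a single batch, which keeps the ORM calls out of the worker threads.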


def create(self, request):
        page_id = request.data['page_id']

        page = get_object_or_404(PagesModel, pk=page_id)
        post_list = get_list_or_404(PostsModel, page_id=page_id)

        for post in post_list:
            save_post(post_id=post.post_id, page_id=page)

The function above loads the posts already saved in the database for a given page, based on page_id. The for loop then iterates over that list, and each post's post_id and the page instance are passed to save_post, which fetches the post's data and saves it.

Huge thanks if anyone can suggest a more effective way to tackle this. Thank you.


Solution

  • You are going in the right direction with bulk_create. Build a list of PostsModel objects and then call bulk_create to insert them into the database in one go. Note that this will not work as intended if the posts already exist in the database; to update existing posts, use bulk_update instead.

    def save_post(post_id, page_id):
    
        facebook_post = Post(post_id=post_id,
                    access_token=fb_access_token)
    
        post_db = PostsModel(page_id=page_id, post_id=post_id)
        post_db.message = facebook_post.message
        post_db.story = facebook_post.story
        post_db.full_picture = facebook_post.full_picture
        post_db.reactions_count = facebook_post.reactions_count
        post_db.comments_count = facebook_post.comments_count
        post_db.shares_count = facebook_post.shares_count
        post_db.interactions_count = facebook_post.interactions_count
        post_db.created_time = facebook_post.created_time
        post_db.published = facebook_post.published
        post_db.attachment_title = facebook_post.attachment_title
        post_db.attachment_description = facebook_post.attachment_description
        post_db.attachment_target_url = facebook_post.attachment_target_url
        return post_db
    
    def create(self, request):
        page_id = request.data['page_id']
    
        page = get_object_or_404(PagesModel, pk=page_id)
        post_list = get_list_or_404(PostsModel, page_id=page_id)
        
        post_model_list = [save_post(post_id=post.post_id, page_id=page)
                           for post in post_list]
        
        PostsModel.objects.bulk_create(post_model_list)
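Because create() in the question loads its list from PostsModel, those rows already exist, so the bulk_update path mentioned above is probably the relevant one. One way to handle a mix of new and existing posts is to split the fetched data first. This is a sketch in plain Python with dictionaries standing in for the fetched posts; `split_new_and_existing` is a hypothetical helper, and the Django calls appear only as comments.

```python
def split_new_and_existing(fetched, existing_ids):
    """Split fetched rows into (to_create, to_update) by post_id."""
    existing_ids = set(existing_ids)  # set gives fast membership checks
    to_create = [row for row in fetched if row["post_id"] not in existing_ids]
    to_update = [row for row in fetched if row["post_id"] in existing_ids]
    return to_create, to_update


# Stand-in data (assumption): in the real code these values come from Post.
fetched = [
    {"post_id": "a", "shares_count": 5},
    {"post_id": "b", "shares_count": 2},
    {"post_id": "c", "shares_count": 9},
]

# In Django, the existing ids could come from something like
# PostsModel.objects.filter(page_id=page).values_list("post_id", flat=True).
to_create, to_update = split_new_and_existing(fetched, existing_ids=["a", "c"])

# The two lists would then feed the two bulk calls:
#   PostsModel.objects.bulk_create([...objects built from to_create...])
#   PostsModel.objects.bulk_update([...existing objects with new values...],
#                                  fields=["shares_count", ...])
print(len(to_create), "to create,", len(to_update), "to update")
```

Note that bulk_update needs model instances that already carry a primary key, so for the update list the existing PostsModel objects have to be loaded, modified in memory, and then passed in together with the names of the changed fields.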