python-3.x, flask, mongoengine, flask-mongoengine

How to get a large amount of data from the database and transfer it over HTTP in one request?


In my Flask application I have a table with events. There are a large number of them, and I now need a function to export the data to CSV:

import csv
from io import StringIO

from flask import Response, stream_with_context


@events.route('/events/get_events', methods=['POST', 'GET'])
def get_transactions():
    query = {}

    events = EventModel.objects(__raw__=query).all()

    @stream_with_context
    def generate_io_csv(header, items):
        data = StringIO()
        csw_writer = csv.writer(
            data, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

        csw_writer.writerow(header)
        # yield the header line, then reset the in-memory buffer for the next row
        yield data.getvalue()
        data.seek(0)
        data.truncate(0)

        for item in items:
            csw_writer.writerow((
                item.date,
                item.user,
                item.data,
            ))

            yield data.getvalue()
            data.seek(0)
            data.truncate(0)

    header = ('date', 'user', 'data')

    response = Response(generate_io_csv(header, events), mimetype='text/csv')
    response.headers.set('Content-Disposition', 'attachment', filename='data.csv')

    return response

Pagination is implemented on my site, so browsing the data is not a problem, but exporting a large amount of data takes too long.

I understand that I could set up a background task to generate the file and then request it, but I would like to avoid that if possible.

As a database I use MongoDB, with MongoEngine to connect to it.


Solution

  • When manipulating a large number of documents, the overhead introduced by MongoEngine can be significant. One way to speed this up is to bypass MongoEngine and stream the raw documents (as returned by pymongo).

    Try with:

    events = EventModel.objects(__raw__=query).as_pymongo().no_cache()
    

    The downside is that this gives you dicts instead of EventModel instances, but perhaps that is acceptable for your use case.

    By default, MongoEngine caches the documents (so that re-iterating over the queryset a second time would not hit the database again), so you should turn that off with no_cache(), otherwise you may run out of memory.
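
    Putting the two together, here is a minimal sketch of how the streaming generator from the question could consume those raw dicts. The route and function names are placeholders, and the field names 'date', 'user' and 'data' are assumed to match the names stored in MongoDB; the only real change is dict key access instead of attribute access:

    import csv
    from io import StringIO

    from flask import Response, stream_with_context


    @events.route('/events/export_events', methods=['GET'])
    def export_events():
        query = {}

        # Raw pymongo dicts: no EventModel instances are built and no queryset cache is kept.
        items = EventModel.objects(__raw__=query).as_pymongo().no_cache()

        @stream_with_context
        def generate_io_csv(header, rows):
            data = StringIO()
            csv_writer = csv.writer(
                data, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

            csv_writer.writerow(header)
            yield data.getvalue()
            data.seek(0)
            data.truncate(0)

            for row in rows:
                # as_pymongo() yields plain dicts, so use key access instead of attributes.
                csv_writer.writerow((row.get('date'), row.get('user'), row.get('data')))
                yield data.getvalue()
                data.seek(0)
                data.truncate(0)

        response = Response(generate_io_csv(('date', 'user', 'data'), items),
                            mimetype='text/csv')
        response.headers.set('Content-Disposition', 'attachment', filename='data.csv')
        return response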