Tags: python, json, exasol

Sending API JSON data to Exasol DB table


I am working with fake JSON data from a dummy JSON site that looks like this:

[
  {
    "postId": 1,
    "id": 1,
    "name": "id labore ex et quam laborum",
    "email": "Eliseo@gardner.biz",
    "body": "laudantium enim quasi est quidem magnam voluptate ipsam eos\ntempora quo necessitatibus\ndolor quam autem quasi\nreiciendis et nam sapiente accusantium"
  },
  {
    "postId": 1,
    "id": 2,
    "name": "quo vero reiciendis velit similique earum",
    "email": "Jayne_Kuhic@sydney.com",
    "body": "est natus enim nihil est dolore omnis voluptatem numquam\net omnis occaecati quod ullam at\nvoluptatem error expedita pariatur\nnihil sint nostrum voluptatem reiciendis et"
  }
]

I read in the API data with the requests library and then send it into an Exasol DB table. See the code below:

import requests
import pyexasol

def get_api_data():
    r = requests.get("http://jsonplaceholder.typicode.com/comments")
    data = r.json()
    return data
    
def connection():
    session = pyexasol.connect_local_config('my_exasol')
    return session

def send_api_data():
    s = connection()
    data = get_api_data()
    for row in data:
        # One INSERT statement (one round trip) per record
        s.execute(
            """INSERT INTO TESTBED.TEST_API (postId, id, name, email, body)
               VALUES ({postId}, {id}, {name}, {email}, {body})""",
            {'postId': row['postId'], 'id': row['id'], 'name': row['name'],
             'email': row['email'], 'body': row['body']}
        )

send_api_data()

This works fine; the problem is that it is incredibly slow: it takes almost two minutes to insert 500 records. I know there must be a more efficient way to do this. In reality I will be pulling data from an API with thousands of records that I will want to send on into a DB table.

Any ideas on a better approach to take?


Solution

  • Executing single insert statements is slow in Exasol because it is a column-based database. You should use IMPORT instead. Make sure to read the best practices for pyexasol as well, and consider enabling compression (a sketch follows the example below).

    For your example, try the following. It takes 0.7 seconds to import the data in my case.

    import requests
    import pyexasol
    import time

    def get_api_data():
        r = requests.get("http://jsonplaceholder.typicode.com/comments")
        data = r.json()
        return data

    def connection():
        session = pyexasol.connect_local_config('my_exasol')
        return session

    def send_api_data():
        s = connection()
        data = get_api_data()

        # Build a list of tuples and hand it to IMPORT as a single bulk operation
        data_for_import = [(row['postId'], row['id'], row['name'], row['email'], row['body']) for row in data]
        start = time.time()
        s.import_from_iterable(data_for_import, ("TESTBED", "TEST_API"))
        print("Finished import after", time.time() - start, "seconds")

    send_api_data()
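
    For the other two suggestions, here is a minimal sketch (the DSN, credentials, and schema/table names are placeholders for your environment): enabling compression on the connection, and feeding import_from_iterable with a generator instead of a pre-built list so the full result set never has to be held in memory once the API returns thousands of records.

    import requests
    import pyexasol

    def send_api_data_streamed():
        # compression=True enables transport compression on the connection,
        # which helps when the network is the bottleneck
        session = pyexasol.connect(
            dsn='exasol-host:8563',   # placeholder DSN
            user='my_user',           # placeholder credentials
            password='my_password',
            compression=True,
        )

        def rows():
            # Yield one tuple per record so the payload is streamed
            # rather than materialized as a list
            for row in requests.get("http://jsonplaceholder.typicode.com/comments").json():
                yield (row['postId'], row['id'], row['name'], row['email'], row['body'])

        # import_from_iterable accepts any iterable, including generators
        session.import_from_iterable(rows(), ("TESTBED", "TEST_API"))

    send_api_data_streamed()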