pythonpandasdruid

Column names do not appear using druiddb


I am running Apache Druid datastore locally. I am loading data from a Kafka stream.

On Druid, I can see the column names:

enter image description here

And then using druiddb (https://github.com/betodealmeida/druid-dbapi), I am writing an SQL query and reading data into Python environment and putting it in a pandas dataframe. However, some column names do not appear:

from druiddb import connect
# https://github.com/betodealmeida/druid-dbapi
import pandas as pd

druid_host = "localhost"
druid_port = 8888
druid_path = "/druid/v2/sql"
druid_scheme = "http"
druid_query = """SELECT * FROM malaria_cases_full"""    
druid_connection = connect(host=druid_host, port=druid_port, path=druid_path, scheme=druid_scheme)
druid_cursor= druid_connection.cursor()
df = pd.DataFrame(druid_cursor.execute(druid_query))
df.head(n =10)

enter image description here


Solution

  • I suggest you to use the (official?) Python connector for Druid, which is pydruid.

    Or simply read_sql with an engine :

    # pip install sqlalchemy==1.4.4
    from sqlalchemy import MetaData, Table
    from sqlalchemy.engine import create_engine
    
    engine = create_engine("druid://localhost:8888/druid/v2/sql") # add ?header=True
    ta = Table("wikipedia", MetaData(bind=engine), autoload=True) # if needed
    
    df = pd.read_sql(ta.select(), engine)
    

    Output :

    print(df.columns)
    
    Index(['__time', 'isRobot', 'channel', 'flags', 'isUnpatrolled', 'page',
           'diffUrl', 'added', 'comment', 'commentLength', 'isNew', 'isMinor',
           'delta', 'isAnonymous', 'user', 'deltaBucket', 'deleted', 'namespace',
           'cityName', 'countryName', 'regionIsoCode', 'metroCode',
           'countryIsoCode', 'regionName'],
          dtype='object')
    

    print(df)
    
                             __time isRobot  ... countryIsoCode       regionName
    21243  2016-06-27T18:50:07.084Z    true  ...                                
    10272  2016-06-27T10:20:13.238Z   false  ...             AU  New South Wales
    ...                         ...     ...  ...            ...              ...
    21271  2016-06-27T18:51:31.698Z   false  ...                                
    5773   2016-06-27T06:16:43.741Z   false  ...                                
    
    [24433 rows x 24 columns]
    

    Used datasource (wikipedia) :

    enter image description here