azure cluster-computing azure-data-explorer azure-cloud-services

How to ingest data into an Azure Data Explorer cluster using Python


I have a set of data that I would like to query with KQL in Azure Data Explorer. New data arrives continuously, every few seconds, and I would like to load it into the Azure Data Explorer cluster so that I can query it.

I have explored a few options in the Python library, but it only seems to support ingesting from a file or a blob:

https://learn.microsoft.com/en-us/azure/data-explorer/python-ingest-data

How do I ingest a single record using Python so that I can query it in Azure Data Explorer?


Solution

  • The sample code for the azure-kusto-ingest library shows how to ingest from a data frame (with one or more records):

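    from azure.kusto.data import KustoConnectionStringBuilder
    from azure.kusto.data.data_format import DataFormat
    from azure.kusto.ingest import IngestionProperties, QueuedIngestClient
    
    # queued ingestion goes through the cluster's ingestion endpoint (note the "ingest-" prefix)
    cluster = "https://ingest-{cluster_name}.kusto.windows.net/"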
    
    # In case you want to authenticate with AAD application.
    client_id = "<insert here your AAD application id>"
    client_secret = "<insert here your AAD application key>"
    
    # read more at https://docs.microsoft.com/en-us/onedrive/find-your-office-365-tenant-id
    authority_id = "<insert here your tenant id>"
    
    kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(cluster, client_id, client_secret, authority_id)
    
    client = QueuedIngestClient(kcsb)
    
    # there are a lot of useful properties, make sure to go over docs and check them out
    ingestion_props = IngestionProperties(
        database="{database_name}",
        table="{table_name}",
        data_format=DataFormat.CSV,
        # in case status update for success are also required (remember to import ReportLevel from azure.kusto.ingest)
        # report_level=ReportLevel.FailuresAndSuccesses,
        # in case a mapping is required (remember to import IngestionMappingKind from azure.kusto.data.data_format)
        # ingestion_mapping_reference="{json_mapping_that_already_exists_on_table}",
        # ingestion_mapping_kind= IngestionMappingKind.JSON,
    )
    
    ###########################
    ## ingest from dataframe ##
    ###########################
    
    import pandas
    
    fields = ["id", "name", "value"]
    rows = [[1, "abc", 15.3], [2, "cde", 99.9]]
    
    df = pandas.DataFrame(data=rows, columns=fields)
    
    client.ingest_from_dataframe(df, ingestion_properties=ingestion_props)
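
Since the records arrive every few seconds, it may be cheaper to buffer a handful of them and ingest each batch in one call instead of sending every record separately. Below is a minimal sketch of that idea using only the standard library. `RecordBatcher`, `rows_to_csv_stream`, and `max_rows` are hypothetical names made up for this example and are not part of azure-kusto-ingest. The sketch assumes the target table's column order matches the row order and that `DataFormat.CSV` is used, as in the code above.

```python
import csv
import io
from typing import Callable, List, Sequence


def rows_to_csv_stream(rows: Sequence[Sequence[object]]) -> io.BytesIO:
    """Serialise rows into an in-memory UTF-8 CSV stream (no header row)."""
    text = io.StringIO(newline="")
    csv.writer(text, lineterminator="\n").writerows(rows)
    return io.BytesIO(text.getvalue().encode("utf-8"))


class RecordBatcher:
    """Hypothetical helper: buffers records and passes them to `sink` in batches."""

    def __init__(self, sink: Callable[[io.BytesIO], None], max_rows: int = 100):
        self._sink = sink          # callable that receives one CSV stream per batch
        self._max_rows = max_rows  # flush automatically once this many rows are buffered
        self._rows: List[Sequence[object]] = []

    def add(self, row: Sequence[object]) -> None:
        """Buffer one record; flush when the buffer is full."""
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Send all buffered records to the sink; do nothing if the buffer is empty."""
        if not self._rows:
            return
        self._sink(rows_to_csv_stream(self._rows))
        # Clear only after the sink returned, so rows are kept if it raised.
        self._rows.clear()
```

With the `client` and `ingestion_props` from the answer above, the sink could be something like `lambda s: client.ingest_from_stream(s, ingestion_properties=ingestion_props)`. This assumes your installed version of the library accepts a file-like object there, so check it against the docs for your version. Call `batcher.add([1, "abc", 15.3])` for each incoming record and `batcher.flush()` on shutdown (or on a timer) so that a partially filled buffer is not lost.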