apache-spark airflow livy

How to pull the client logs of Spark jobs submitted through the Apache Livy batches POST method using Airflow


I am submitting a Spark job through the Apache Livy batches POST method.

The HTTP request is sent from Airflow. After submitting the job, I track its status using the batch ID.

I want to show the driver (client) logs in the Airflow logs, so that I don't have to check multiple places (Airflow and Apache Livy/Resource Manager).

Is this possible using the Apache Livy REST API?


Solution

  • Livy has endpoints for fetching logs: /sessions/{sessionId}/log for interactive sessions and /batches/{batchId}/log for batches.

    Documentation:

    You can create Python functions like the ones shown below to get the logs:

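    import json

    # Airflow 2 provider path; on Airflow 1.10 the hook lives in airflow.hooks.http_hook
    from airflow.providers.http.hooks.http import HttpHook


    # Illustrative wrapper (the class name is hypothetical); in practice these
    # methods would live on your own operator or hook.
    class LivyBatchLogs:
        def __init__(self, http_conn_id):
            # http_conn_id is the Airflow connection that points at the Livy server
            self.http = HttpHook(method="GET", http_conn_id=http_conn_id)

        def _http_rest_call(self, method, endpoint, data=None, headers=None, extra_options=None):
            self.http.method = method
            # Serialize a body only when there is one; json.dumps(None) would send "null"
            payload = json.dumps(data) if data is not None else None
            return self.http.run(endpoint, payload, headers, extra_options=extra_options or {})

        def _get_batch_session_logs(self, batch_id):
            endpoint = "batches/" + str(batch_id) + "/log"
            response = self._http_rest_call(method="GET", endpoint=endpoint)
            # return response.json()
            return response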
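
To get those lines into the Airflow task log, the JSON body of the response still has to be unpacked and written out line by line. The snippet below is a minimal sketch of that step, not a definitive implementation. It assumes the response body carries the log lines as a list under the `log` key and that `from` and `size` are the query parameters of the log endpoint, as described in the Livy REST API docs. The helper names `log_params` and `emit_batch_logs` are hypothetical and not part of Livy or Airflow.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("livy_batch_logs")


def log_params(offset: Optional[int] = None, size: int = 100) -> Dict[str, int]:
    """Build query parameters for GET /batches/{batchId}/log (hypothetical helper).

    Assumption: "from" is the line offset and "size" the maximum number of
    lines to return, per the Livy REST API documentation.
    """
    params = {"size": size}
    if offset is not None:
        params["from"] = offset
    return params


def emit_batch_logs(
    payload: Dict[str, Any],
    log: Callable[[str], None] = logger.info,
) -> List[str]:
    """Write every line of a Livy log response through the given logging function.

    Assumption: `payload` is the parsed JSON body and holds the lines under "log".
    Returns the lines that were written so the caller can inspect them.
    """
    lines = payload.get("log") or []
    for line in lines:
        log(line)
    return lines
```

Inside an Airflow operator this could be called as `emit_batch_logs(response.json(), self.log.info)` each time the batch status is polled, so the lines show up in the task log. Depending on how the job is submitted, the endpoint may return only the output captured by Livy from spark-submit rather than the full driver log, so it is worth checking what your deployment actually returns.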