[SOLVED] How to retrieve output data as JSON from IBM Watson cloud

I am solving a decision optimization problem with the CPLEX solver (DOcplex) on IBM Watson Cloud. However, I am stuck trying to change the output format.

The available documentation provides the following function, to be included in the main file, to retrieve the outputs as .csv files:

import sys

from docplex.util.environment import get_environment
from six import iteritems

def write_all_outputs(outputs):
    '''Write all dataframes in ``outputs`` as .csv.

    Args:
        outputs: The map of outputs 'outputname' -> 'output df'
    '''
    for (name, df) in iteritems(outputs):
        csv_file = '%s.csv' % name
        print(csv_file)
        with get_environment().get_output_stream(csv_file) as fp:
            if sys.version_info[0] < 3:
                fp.write(df.to_csv(index=False, encoding='utf8'))
            else:
                fp.write(df.to_csv(index=False).encode(encoding='utf8'))
    if len(outputs) == 0:
        print("Warning: no outputs written")

Then, in the deployment, the output file type is specified:

solve_payload = {
    client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [
        {
            # Regular expression matching the output file names (raw string avoids an invalid-escape warning)
            "id": r".*\.csv"
        }
    ]
}

Everything works fine with .csv files.

However, I would like to get a .json file instead. I have therefore created the corresponding dictionary and modified the function as follows:

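import json

from docplex.util.environment import get_environment
from six import iteritems

def write_all_outputs(outputs):
    # Write each output as <name>.json, serialized to UTF-8 bytes
    for (name, content) in iteritems(outputs):
        json_file = '%s.json' % name
        with get_environment().get_output_stream(json_file) as fp:
            fp.write(json.dumps({name: content}).encode('utf-8'))
    if len(outputs) == 0:
        print("Warning: no outputs written")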

And in the deployment:

solve_payload = {
    client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [
        {
            # Regular expression matching the output file names (raw string avoids an invalid-escape warning)
            "id": r".*\.json"
        }
    ]
}

The problem is solved normally (to optimality). The issue is that the resulting .json file contains what looks like random letters. I suspect something is wrong with the write_all_outputs function. Note that running locally produces a proper .json file with no issues.

I would really appreciate any help regarding my problem.

Solution

For those encountering the same issue: I got an answer on the IBM forum.

With inlined output data, all files except CSV files are base64-encoded. So you need to get the content of the JSON file from the job details and decode it.
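To see why the downloaded file looks like random letters, here is a minimal round trip using only the standard library. The `solution` name and its values are made-up placeholders, not real solver output; the only assumption taken from the answer is that non-CSV inlined outputs are base64-encoded.

```python
import base64
import json

# Hypothetical output bytes, produced the same way as in the question's write_all_outputs
payload = json.dumps({"solution": {"x": 1, "y": 0}}).encode("utf-8")

# Base64-encoding these bytes gives the "random letters" seen in the job output
encoded = base64.b64encode(payload).decode("ascii")
print(encoded)

# Decoding the base64 text and parsing the JSON recovers the original dictionary
decoded = json.loads(base64.b64decode(encoded))
print(decoded)  # {'solution': {'x': 1, 'y': 0}}
```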

For example, for log.txt:

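    import base64
    import io

    # job_details: the job details dict returned by the WML client,
    # e.g. client.deployments.get_job_details(job_uid)
    output_data = job_details['entity']['decision_optimization']['output_data']

    # Non-CSV outputs are inlined as base64 strings: decode them, then split into text lines
    logs = [line.decode('utf-8').rstrip('\n')
            for o in output_data if o['id'] == 'log.txt'
            for line in io.BytesIO(base64.b64decode(o['content']))]
    for line in logs:
        print(line)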
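The same decoding applies to the JSON outputs from the question. Below is a minimal sketch of a helper that decodes and parses every `.json` entry in `output_data`. `load_json_outputs` is a hypothetical name, the `id`/`content` fields are taken from the log.txt example, and the `sample` list is fabricated to stand in for `job_details['entity']['decision_optimization']['output_data']`.

```python
import base64
import json


def load_json_outputs(output_data):
    """Decode and parse every inlined .json output (hypothetical helper).

    Assumes each entry is a dict with an 'id' (file name) and a
    base64-encoded 'content', as in the log.txt example.
    """
    results = {}
    for o in output_data:
        if o['id'].endswith('.json'):
            raw = base64.b64decode(o['content'])
            results[o['id']] = json.loads(raw.decode('utf-8'))
    return results


# Fabricated sample mimicking the structure of output_data in the job details
sample = [
    {'id': 'solution.json',
     'content': base64.b64encode(json.dumps({'solution': {'x': 1}}).encode('utf-8')).decode('ascii')},
    {'id': 'log.txt',
     'content': base64.b64encode(b'solve done\n').decode('ascii')},
]

print(load_json_outputs(sample))  # {'solution.json': {'solution': {'x': 1}}}
```

Entries that are not `.json` files (such as log.txt) are skipped, so the same `output_data` list can be passed in unchanged.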