google-cloud-data-fusion, cdap

How to pass a schema file as a macro to the BigQuery sink in Data Fusion


I am creating a Data Fusion pipeline to load CSV data from GCS to BigQuery. For my use case, I need to create a property macro and provide its value at runtime. I need to understand how to pass the schema file as a macro to the BigQuery sink. If I simply set the JSON schema file path as the macro value, I get the following error:

java.lang.IllegalArgumentException: Invalid schema: Use JsonReader.setLenient(true) to accept malformed JSON at line 1 column 1


Solution

  • There is currently no way to use the contents of a file as a macro value, though there is a JIRA open for something like this (https://issues.cask.co/browse/CDAP-15424). The schema contents themselves are expected to be set as the macro value, not a path to a file. The UI currently doesn't handle these types of macro values very well (https://issues.cask.co/browse/CDAP-15423), so I would suggest setting the value through the Preferences REST endpoint (https://docs.cdap.io/cdap/6.0.0/en/reference-manual/http-restful-api/preferences.html#H2290), where the app name is the pipeline name; a sketch of such a call follows at the end of this answer.

    Alternatively, you can make your pipeline a little more generic by writing an Action plugin that looks something like:

    @Override
    public void run(ActionContext context) throws Exception {
      // Read the schema JSON from the file; readFileContents() is left to
      // implement (e.g. read from GCS or a local path).
      String schema = readFileContents();
      // Set a runtime argument that later stages can reference as ${key}.
      context.getArguments().set(key, schema);
    }
    

    The plugin would be the first stage in your pipeline, and would allow subsequent stages to use ${key} as a macro that is replaced with the actual schema; a fuller sketch of such a plugin class also follows below.
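
    For the preference-based approach, here is a minimal sketch of the REST call using Java 11's built-in HTTP client. The host and port (localhost:11015), namespace (default), pipeline name (MyPipeline), and macro name (schema) are all illustrative assumptions; the important point is that the value being set is the schema JSON itself, not a file path.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SetSchemaPreference {
      public static void main(String[] args) throws Exception {
        // Avro-style schema JSON -- the macro value is the schema contents
        // themselves, not a path to a file.
        String schema = "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}";

        // Escape the schema so it can be embedded as a JSON string value in
        // the preferences body: {"schema": "<escaped schema>"}.
        String body = "{\"schema\": \"" + schema.replace("\"", "\\\"") + "\"}";

        // PUT the preference on the pipeline's app (the app name is the
        // pipeline name). Host, port, namespace, and app name here are
        // assumptions -- adjust them for your instance.
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:11015/v3/namespaces/default/apps/MyPipeline/preferences"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
      }
    }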
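
    And for completeness, a sketch of what the full Action plugin class might look like, assuming CDAP 6.x packages and a schema file on a filesystem path the plugin can read. The plugin name SchemaToArgument and the schemaPath/argumentKey config properties are made up for illustration, not an existing plugin:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import io.cdap.cdap.api.annotation.Description;
    import io.cdap.cdap.api.annotation.Macro;
    import io.cdap.cdap.api.annotation.Name;
    import io.cdap.cdap.api.annotation.Plugin;
    import io.cdap.cdap.api.plugin.PluginConfig;
    import io.cdap.cdap.etl.api.action.Action;
    import io.cdap.cdap.etl.api.action.ActionContext;

    @Plugin(type = Action.PLUGIN_TYPE)
    @Name("SchemaToArgument")
    @Description("Reads a schema file and sets its contents as a runtime argument.")
    public class SchemaToArgumentAction extends Action {
      private final Conf conf;

      public SchemaToArgumentAction(Conf conf) {
        this.conf = conf;
      }

      @Override
      public void run(ActionContext context) throws Exception {
        // Read the schema file contents and expose them to later stages,
        // which can reference the value as ${<argumentKey>}.
        String schema = new String(Files.readAllBytes(Paths.get(conf.schemaPath)),
                                   StandardCharsets.UTF_8);
        context.getArguments().set(conf.argumentKey, schema);
      }

      public static class Conf extends PluginConfig {
        @Macro
        @Description("Path to the schema file to read.")
        private String schemaPath;

        @Macro
        @Description("Name of the runtime argument to set.")
        private String argumentKey;
      }
    }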