
Are there any examples of loading data into BigQuery using a POST request and the Java client library?


Does anyone have any examples of creating a new insert job for BigQuery using both:


Solution

  • You need to call the bigquery.jobs().insert(...) method.

    I don't know what you have done so far, but you need at least an authenticated client for the API, for example:

    bigquery = new Bigquery.Builder(HTTP_TRANSPORT, JSON_FACTORY, credentials)
                    .setApplicationName("...").build();
    

    Below is a simplified version of an insertRows method I wrote using the google-http-client library for Java and the BigQuery API client (you should also check that the dataset exists, validate the IDs, etc.):

    public Long insertRows(String projectId, 
                           String datasetId, 
                           String tableId, 
                           InputStream schema,
                           AbstractInputStreamContent data) {
        try {
    
            // Defining table fields
            ObjectMapper mapper = new ObjectMapper();
            List<TableFieldSchema> schemaFields = mapper.readValue(schema, new TypeReference<List<TableFieldSchema>>(){});
            TableSchema tableSchema = new TableSchema().setFields(schemaFields);
    
            // Table reference
            TableReference tableReference = new TableReference()
                    .setProjectId(projectId)
                    .setDatasetId(datasetId)
                    .setTableId(tableId);
    
            // Load job configuration
            JobConfigurationLoad loadConfig = new JobConfigurationLoad()
                    .setDestinationTable(tableReference)
                    .setSchema(tableSchema)
                    // Data in newline-delimited JSON format (could also be CSV)
                    .setSourceFormat("NEWLINE_DELIMITED_JSON")
                    // Create the table if it does not exist
                    .setCreateDisposition("CREATE_IF_NEEDED")
                    // Append to existing data (do not overwrite it)
                    .setWriteDisposition("WRITE_APPEND");
            // If your data is coming from Google Cloud Storage:
            //.setSourceUris(...);
    
            // Load job
            Job loadJob = new Job()
                    .setJobReference(
                            new JobReference()
                                    .setJobId(Joiner.on("-").join("INSERT", projectId, datasetId,
                                            tableId, DateTime.now().toString("dd-MM-yyyy_HH-mm-ss-SSS")))
                                    .setProjectId(projectId))
                    .setConfiguration(new JobConfiguration().setLoad(loadConfig));
            // Job execution
            Job createTableJob = bigquery.jobs().insert(projectId, loadJob, data).execute();
            // If loading data from Google Cloud Storage
            //createTableJob = bigquery.jobs().insert(projectId, loadJob).execute();
    
            String jobId = createTableJob.getJobReference().getJobId();
            // Wait for job completion
            createTableJob = waitForJob(projectId, createTableJob);
            Long rowCount = createTableJob != null ? createTableJob.getStatistics().getLoad().getOutputRows() : 0L;
            log.info("{} rows inserted in table '{}' (dataset: '{}', project: '{}')", rowCount, tableId, datasetId, projectId);
            return rowCount;
        }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }
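
The waitForJob helper called in insertRows is not shown in this answer. Here is a minimal sketch of the polling idea behind it, not the original implementation. JobPoller and awaitDone are hypothetical names, and the job state is read through a plain Supplier so the sketch has no library dependencies. In real code the supplier would presumably wrap something like bigquery.jobs().get(projectId, jobId).execute().getStatus().getState(), which reports PENDING, RUNNING or DONE.

```java
import java.util.function.Supplier;

// Hypothetical helper: polls a job-state supplier until it reports "DONE".
final class JobPoller {

    /**
     * Returns true once the supplier reports "DONE", or false if maxAttempts
     * polls pass (or the thread is interrupted) before that happens.
     */
    static boolean awaitDone(Supplier<String> stateSupplier, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if ("DONE".equals(stateSupplier.get())) {
                return true;
            }
            try {
                // Pause between polls so the API is not hammered with requests
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

Unlike the waitForJob used above, this sketch returns a boolean rather than the refreshed Job. Note also that DONE only means the job has finished: a finished load job can still have failed, so it is worth checking getStatus().getErrorResult() on the final Job before trusting the row count.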
    

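The job ID in insertRows is built with Guava's Joiner and Joda-Time. As a sketch of the same idea using only the JDK (JobIds and loadJobId are hypothetical names), the version below also replaces any character that is not a letter, digit, underscore or dash, since BigQuery job IDs may contain only those characters and some project IDs include ':' or '.'. The timestamp is passed in as a parameter to keep the function easy to test.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a load-job ID in the same shape as the one above.
final class JobIds {

    // Same timestamp pattern as the DateTime.now().toString(...) call in insertRows
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("dd-MM-yyyy_HH-mm-ss-SSS");

    static String loadJobId(String projectId, String datasetId, String tableId, LocalDateTime now) {
        String raw = String.join("-", "INSERT", projectId, datasetId, tableId, now.format(STAMP));
        // Job IDs may only contain letters, digits, underscores and dashes
        return raw.replaceAll("[^A-Za-z0-9_-]", "_");
    }
}
```

A millisecond timestamp can still collide if two loads into the same table start at the same instant, so appending a random suffix may be worth considering.
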
    I don't know what format your data is in, but if you are using files, you can add an overload like:

    public Long insertRows(String projectId, String datasetId, String tableId, File schema, File data) {
        try {
            return insertRows(projectId, datasetId, tableId, new FileInputStream(schema),
                    new FileContent(MediaType.OCTET_STREAM.toString(), data));
        }
        catch (FileNotFoundException e) { throw new UncheckedIOException(e); }
    }
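
To make the file-based variant concrete, here is a rough sketch of what the two input files might contain. It assumes the schema file is a JSON array of field definitions (which is what reading it into a List<TableFieldSchema> implies) and that the data file holds one JSON object per line, matching NEWLINE_DELIMITED_JSON. SampleLoadFiles, its field names and the sample columns are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper that writes a sample schema file and a sample data file.
final class SampleLoadFiles {

    // Assumed schema layout: a JSON array with one object per column
    static final String SCHEMA_JSON =
            "[{\"name\":\"id\",\"type\":\"INTEGER\"},{\"name\":\"name\",\"type\":\"STRING\"}]";

    /** Joins already-serialized JSON objects into newline-delimited JSON. */
    static String toNdjson(List<String> jsonRows) {
        StringBuilder sb = new StringBuilder();
        for (String row : jsonRows) {
            if (row.indexOf('\n') >= 0) {
                // Each record must fit on a single line in this format
                throw new IllegalArgumentException("row contains a line break: " + row);
            }
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    /** Writes the sample schema and the given rows to the two files. */
    static void write(Path schemaFile, Path dataFile, List<String> jsonRows) throws IOException {
        Files.write(schemaFile, SCHEMA_JSON.getBytes(StandardCharsets.UTF_8));
        Files.write(dataFile, toNdjson(jsonRows).getBytes(StandardCharsets.UTF_8));
    }
}
```

The resulting files could then be passed to the File overload above as schemaFile.toFile() and dataFile.toFile().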