google-bigquery

Error loading file stored in Google Cloud Storage to Big Query


I have been trying to create a job to load a compressed json file from Google Cloud Storage to a Google BigQuery table. I have read/write access in both Google Cloud Storage and Google BigQuery. Also, the uploaded file belongs in the same project as the BigQuery one.

The problem happens when I access to the resource behind this url https://www.googleapis.com/upload/bigquery/v2/projects/NUMERIC_ID/jobs by means of a POST request. The content of the request to the abovementioned resource can be found as follows:

{
"kind" : "bigquery#job",
"projectId" : NUMERIC_ID,
"configuration": {
    "load": {
        "sourceUris": ["gs://bucket_name/document.json.gz"],
        "schema": {
            "fields": [
                {
                    "name": "id",
                    "type": "INTEGER"
                },
                {
                    "name": "date",
                    "type": "TIMESTAMP"
                },
                {
                    "name": "user_agent",
                    "type": "STRING"
                },
                {
                    "name": "queried_key",
                    "type": "STRING"
                },
                {
                    "name": "user_country",
                    "type": "STRING"
                },
                {
                    "name": "duration",
                    "type": "INTEGER"
                },
                {
                    "name": "target",
                    "type": "STRING"
                }
            ]
        },
        "destinationTable": {
            "datasetId": "DATASET_NAME",
            "projectId": NUMERIC_ID,
            "tableId": "TABLE_ID"
        }
    }
}

}

However, the error doesn't make any sense and can also be found below:

{
    "error": {
        "errors": [
            {
                "domain": "global",
                "reason": "invalid",
                "message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
            }
        ],
        "code": 400,
        "message": "Job configuration must contain exactly one job-specific configuration object (e.g., query, load, extract, spreadsheetExtract), but there were 0: "
    }
}

I know the problem doesn't lie either in the project id or in the access token placed in the authentication header, because I have successfully created an empty table before. Also I specify the content-type header to be application/json which I don't think is the issue here, because the body content should be json encoded.

Thanks in advance


Solution

  • Your HTTP request is malformed -- BigQuery doesn't recognize this as a load job at all. You need to look into the POST request, and check the body you send.

    You need to ensure that all the above (which seams correct) is the body of the POST call. The above Json should be on a single line, and if you manually creating the multipart message, make sure there is an extra newline between the headers and body of each MIME type.

    If you are using some sort of library, make sure the body is not expected in some other form, like resource, content, or body. I've seen libraries that use these differently.

    Try out the BigQuery API explorer: https://developers.google.com/bigquery/docs/reference/v2/jobs/insert and ensure your request body matches the one made by the API.