javascript google-docs-api

Avoid calculating startIndex and endIndex when creating a document


I have proven to myself that I can insert text into a Google Docs document using this code:

function appendToDocument() {
    let offset = 12;
    let updateObject = {
        documentId: 'xxxxxxx',
        resource: {
            requests: [{
                "insertText": {
                    "text": "John Doe",
                    "location": {
                        "index": offset,
                    },
                },
            }],
        },
    };
    gapi.client.docs.documents.batchUpdate(updateObject).then(function(response) {
        appendPre('response =  ' + JSON.stringify(response));
    }, function(response) {
        appendPre('Error: ' + response.result.error.message);
    });
}

My next step is to create an entire, complex document using the API. I am stunned that I apparently have to keep track of locations within the document myself, like this:

new Location().setIndex(25)

I formed that impression from reading https://developers.google.com/docs/api/how-tos/move-text

The document I am trying to create is very dynamic and very complex, and leaving the job of keeping track of index values to the API user, rather than the API designer, seems odd.

Is there an approach, or a higher level API, that allows me to construct a document without this kind of housekeeping?


Solution

  • Unfortunately, the short answer is no: there's no API that lets you bypass the index tracking required by the base Google Docs API, at least when it comes to building tables.

    I recently had to tackle this issue myself - a combination of template updating and document construction - and I basically ended up writing an intermediate API with helper functions to search for and insert by character indices.

    For example, one trick I've been using for table creation is to first create a table of a specified size at a given index, and put some text in the first cell. Then I can search the document object for the tableCells element that contains that text, and work back from there to get the table start index.
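    A minimal sketch of the request payload for this trick. The helper name build_marked_table_requests is hypothetical, and the index + 4 offset for the first cell is an assumption about how an inserted table is laid out (a newline, then the table, row, and cell each taking one index), so check it against the JSON returned by documents().get() before relying on it.

```python
def build_marked_table_requests(index, rows, cols, marker):
    """Return batchUpdate requests that insert a table and tag its first cell.

    Hypothetical helper for illustration; not part of the Docs API client.
    """
    return [
        # Create an empty rows x cols table at the given document index.
        {
            "insertTable": {
                "rows": rows,
                "columns": cols,
                "location": {"index": index},
            }
        },
        # ASSUMPTION: the first cell's paragraph starts 4 positions after the
        # insertion index. Verify this offset on a real document.
        {
            "insertText": {
                "text": marker,
                "location": {"index": index + 4},
            }
        },
    ]


requests = build_marked_table_requests(index=1, rows=2, cols=3, marker="my target")
```

    The resulting list can then be sent as the "requests" field of a documents().batchUpdate() call, after which the marker text is what you search for in the document JSON.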

    Another trick is that if you know how many specific kinds of objects (like tables) you have in your document, you can parse through the full document object and keep track of table counts, and stop when you get to the one you want to update/delete (you can use this approach for creating too but the target text approach is easier, I find).
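    The counting approach might look roughly like this. It is only a sketch: get_nth_table is a hypothetical helper name, and it assumes doc is the dictionary returned by documents().get(), with tables appearing as top-level elements of the body.

```python
def get_nth_table(doc, n):
    """Return the n-th (0-based) top-level table element in the body, or None."""
    seen = 0
    for element in doc["body"]["content"]:
        if "table" in element:
            if seen == n:
                return element
            seen += 1
    return None


# Tiny stand-in for a real document object, just to show the shape.
sample_doc = {
    "body": {
        "content": [
            {"startIndex": 1, "endIndex": 10, "paragraph": {}},
            {"startIndex": 10, "endIndex": 40, "table": {"tableRows": []}},
            {"startIndex": 40, "endIndex": 45, "paragraph": {}},
            {"startIndex": 45, "endIndex": 90, "table": {"tableRows": []}},
        ]
    }
}
second_table = get_nth_table(sample_doc, 1)
```

    The returned element carries its own startIndex, which is what update and delete requests need.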

    From there, with some JSON parsing and trial and error, you can figure out the start index of each cell in a table, and write functions to programmatically find and create/replace/delete. If there's an easier way to do all this, I haven't found it. (There is one GitHub repo with a Google Docs API wrapper specifically for tables, and it does appear to be active, although I found it after I wrote everything on my own and I haven't used it.)

    Here's a bit of code to get you started:

    def get_target_table(doc, target_txt):
        """ Given a target string to be matched in the upper left column of a table
            of a Google Docs JSON object, return JSON representing that table. """
        body = doc["body"]["content"]
        for element in body:
            # Check for the "table" key directly instead of relying on key order.
            if "table" in element:
                header_txt = get_header_cell_text(element['table']).lower().strip()
                if target_txt.lower() in header_txt:
                    return element
        return None
    
    def get_header_cell_text(table):
        """ Given a table element in Google Docs API JSON, find the text of
            the first cell in the first row, which should be a column header. """
        return table['tableRows'][0]\
            ['tableCells'][0]\
            ['content'][0]\
            ['paragraph']['elements'][0]\
            ['textRun']['content']
    

    Assuming you've already created a table with the target text in it, start by pulling the document JSON object from the API, and then use get_target_table() to find the chunk of JSON related to the table.

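    from googleapiclient.discovery import build  # google-api-python-client
    # creds (credentials) and doc_id (the document ID) must already be defined.
    doc = build("docs", "v1", credentials=creds).documents().get(documentId=doc_id).execute()
    table = get_target_table(doc, "my target")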
    

    From there you'll see the nested tableRows and tableCells objects, and the content inside each cell has a startIndex. Construct a matrix of table cell start indices, and then, for populating them, work backwards from the bottom right cell to the upper left, to avoid displacing your stored indices (as suggested in the docs and in one of the comments).
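    One way this could be sketched is below. The helper names cell_start_indices and build_fill_requests are hypothetical, and the code assumes table_element is the element returned by get_target_table() and that values is a list of rows matching the table's shape.

```python
def cell_start_indices(table_element):
    """Build a matrix holding the startIndex of the first content item in each cell."""
    return [
        [cell["content"][0]["startIndex"] for cell in row["tableCells"]]
        for row in table_element["table"]["tableRows"]
    ]


def build_fill_requests(table_element, values):
    """Create insertText requests ordered from the bottom-right cell to the top-left.

    Inserting in reverse order means each insertion only shifts text that comes
    after it, so the stored indices of the remaining cells stay valid.
    """
    indices = cell_start_indices(table_element)
    requests = []
    for r in reversed(range(len(indices))):
        for c in reversed(range(len(indices[r]))):
            text = values[r][c]
            if text:  # skip cells that should stay empty
                requests.append(
                    {"insertText": {"text": text, "location": {"index": indices[r][c]}}}
                )
    return requests


# Tiny stand-in for a 2x2 table element, just to show the shape.
sample_table = {
    "startIndex": 2,
    "table": {
        "tableRows": [
            {"tableCells": [{"content": [{"startIndex": 5}]}, {"content": [{"startIndex": 7}]}]},
            {"tableCells": [{"content": [{"startIndex": 10}]}, {"content": [{"startIndex": 12}]}]},
        ]
    },
}
fill_requests = build_fill_requests(sample_table, [["Name", "Age"], ["John Doe", ""]])
```

    Sending fill_requests in a single batchUpdate call keeps the ordering intact, since requests in a batch are applied in the order given.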

    It's definitely a bit of a slog. And styling table cells is a whole 'nother beast, with a dizzying maze of JSON options. The interactive JSON constructor tool on the Docs API site is useful for getting the syntax right.

    Hope this helps, good luck!