
CouchDB view on large documents with strings as keys times out


I'm trying to create a view in CouchDB 2.1 for very large documents (roughly 300k-900k lines each, 15-20 documents in total).

The documents look like this:

{
"222123456": {
    "_id": "222123456",
    "type": "Order",
    "0300": {
        "51234567": {
            "_id": "51234567",
            "type": "Material",
            "DS": "M532F1234567",
            "HTZ": "M532-F1234-000-00",
            "A name for some material": {
                "_id": "A name for some material",
                "type": "Description",
                "0054": {
                    "600": {
                        "1": {
                            "_id": "1",
                            "type": "Amount",
                            "X": {
                                "11220": {
                                    "_id": "11220",
                                    "type": "row"
                                },
                                "_id": "X",
                                "type": "Bulk"
                            }
                        },
                        "_id": "600",
                        "type": "Site"
                    },
                    "_id": "0054",
                    "type": "Pos"
                }
            }
        },
        "51255111": {
            // And another material
            // ...
        },
        "_id": "0300",
        "type": "Process"
    }
    // + more orders with more items
},
"222555666": {
    // Another order with more processes which contain even more materials
    // ...
},
"_id": "FileImport_001",
"_rev": "1-2f77e699332bb7c76a137b86f83bbe91",
"type": "Machine"
}

Every document has 1-n orders, every order has 1-n processes, and every process contains 1-n materials, which are what I'm trying to query. My current view iterates through all orders, processes and materials with nested for loops.

This is the view I'm using:

function (doc) {
    var splitMsn = doc._id.split("_"); // Split _id into [FileImport, 001] array
    for (var key_order in doc) { // For every order in the document...
        if (typeof doc[key_order] == 'object' && doc[key_order] != '') { // where the value is an object and not empty...
            var order = doc[key_order]; // Save the order as a value
            for (var key_process in order) { // ...and search all processes in that order nr
                if (typeof order[key_process] == 'object' && order[key_process] != '') { // If process contains an object as value and it's not empty
                    var process = order[key_process]; // Save the process as a value
                    for (var key_matnr in process) { // For every material in the process
                        if (typeof process[key_matnr] == 'object' && process[key_matnr] != '') { // If material nr contains an object as value and not empty
                            var matnr = process[key_matnr]; // Save material nr as value
                            for (var key_matname in matnr) { // For every material name in the material number
                                if (typeof matnr[key_matname] == 'object' && matnr[key_matname] != '') { // Contains object and not empty
                                    var matname = matnr[key_matname]; // Save material name
                                    emit([splitMsn[1], key_order, key_process, key_matnr], matname); // emit [001, 222123456, 0300, 51234567], Material name
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

With this view I can query for a specific document number, order, process and material number. In return I get the material name together with the amount (e.g. 1), which is what I'm after.
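
To make the lookup concrete, a request for one material could be built roughly as below. This is only a sketch: the host, the database name `imports`, and the design document and view names `materials` and `by_material` are placeholders, since the question does not name them.

```javascript
// Build the URL for a CouchDB view lookup by exact key.
// All names passed in below are placeholders, not taken from the question.
function viewUrl(base, db, designDoc, view, key) {
  // View keys are JSON values, so the key array is JSON-encoded
  // and then URL-escaped before it goes into the query string.
  return base + '/' + encodeURIComponent(db) +
    '/_design/' + encodeURIComponent(designDoc) +
    '/_view/' + encodeURIComponent(view) +
    '?key=' + encodeURIComponent(JSON.stringify(key));
}

// Key layout matches the emit in the view: [file number, order, process, material]
var url = viewUrl('http://localhost:5984', 'imports', 'materials', 'by_material',
  ['001', '222123456', '0300', '51234567']);
```

A GET on the resulting URL returns the rows emitted under exactly that key.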

With a single document the index is created just fine, but as soon as I add a second document (let alone 15 or 20), CouchDB reports "OS process timed out" while building the view.

My question: is there a faster and/or more elegant way to iterate through all these steps to finally get to the deeply buried "amount" value that I need?

Many thanks in advance!


Solution

  • The system is protecting itself from you.

    In general, very large documents are outside CouchDB's sweet spot. Add deeply nested structures and a very complex map function, and the situation gets even worse.

    I'd recommend reconsidering your data model. Use (much) smaller documents (one per material, say). Your map function will be much simpler, too.
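
As a rough sketch of the smaller-documents suggestion, the script below flattens one import file into one document per material before upload and pairs it with a much simpler map function. The helper names (`splitImport`, `mapMaterial`), the `_id` scheme and the fields `msn`, `order`, `process`, `material` and `descriptions` are assumptions made for illustration; they are not part of the original schema.

```javascript
// Client-side preprocessing (runs in Node, not inside CouchDB).

// Nested entities are the object-valued properties of a node;
// "_id", "_rev", "type", "DS", "HTZ" are strings and are skipped.
function childEntries(node) {
  return Object.keys(node)
    .filter(function (k) { return node[k] !== null && typeof node[k] === 'object'; })
    .map(function (k) { return [k, node[k]]; });
}

// Turn one large import document into an array of small material documents.
function splitImport(doc) {
  var msn = doc._id.split('_')[1]; // "FileImport_001" -> "001"
  var out = [];
  childEntries(doc).forEach(function (o) {
    childEntries(o[1]).forEach(function (p) {
      childEntries(p[1]).forEach(function (m) {
        var material = m[1];
        var descriptions = {};
        childEntries(material).forEach(function (d) { descriptions[d[0]] = d[1]; });
        out.push({
          _id: [msn, o[0], p[0], m[0]].join(':'), // hypothetical id scheme
          type: 'Material',
          msn: msn, order: o[0], process: p[0], material: m[0],
          DS: material.DS, HTZ: material.HTZ,
          descriptions: descriptions // description subtree kept as-is
        });
      });
    });
  });
  return out;
}

// Map function for the small documents: no loops, one emit per document.
// Written in plain ES5 because it would run in CouchDB's view server.
function mapMaterial(doc) {
  if (doc.type !== 'Material') { return; }
  emit([doc.msn, doc.order, doc.process, doc.material], doc.descriptions);
}
```

The array returned by `splitImport` could then be uploaded in batches through the `_bulk_docs` endpoint. The view key stays the same as in the question, so existing queries would not need to change, while each map call only touches a single small document.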