php regex mongodb datatables

Find a MongoDB document using a partial _id string


I need to find one or more documents in a collection that have a specific string in their _id field.

This is proving to be a problem since the _id field is an object and not a string, so I can't just run a regex against it.

For example, let's say I have documents with these _id values:

54060b811e8e813c55000058 
54060e9e1e8e813c55000059
540738e082fa085e5f000015

and I want to search for "00005". Then the result should be:

54060b811e8e813c55000058
54060e9e1e8e813c55000059

Is there any way to accomplish this?

I need this for a jQuery DataTables implementation which uses server-side processing with PHP.

This means I need to add something to this portion of the code:

if ( !empty($input['sSearch']) ) {
    $sSearch = $input['sSearch'];

    for ( $i=0 ; $i < $iColumns ; $i++ ) {
        if ($input['bSearchable_'.$i] == 'true') {
            if ($input['bRegex'] == 'true') {
                $sRegex = str_replace('/', '\/', $sSearch);
            } else {
                $sRegex = preg_quote($sSearch, '/');
            }
            $searchTermsAny[] = array(
                $dataProps[$i] => new MongoRegex( '/'.$sRegex.'/i' )
            );
        }
    }
}

Any advice would be appreciated.

UPDATE:

Thanks to saj, it seems it is possible to find items by a partial _id using a $where clause, something like this:

$where: "this._id.toString().match(/pattern/i)"

I tried adding this to the PHP code like this:

if ( !empty($input['sSearch']) ) {
    $sSearch = $input['sSearch'];

    for ( $i=0 ; $i < $iColumns ; $i++ ) {
        if ($input['bSearchable_'.$i] == 'true') {
            if ($input['bRegex'] == 'true') {
                $sRegex = str_replace('/', '\/', $sSearch);
            } else {
                $sRegex = preg_quote($sSearch, '/');
            }
            $searchTermsAny[] = array(
                $dataProps[$i] => new MongoRegex( '/'.$sRegex.'/i',
                '$where: "this._id.toString().match(/'.$sRegex.'/i)"' )
            );
        }
    }
}

However, now every query returns all records instead of only the ones that are supposed to match.

Any ideas?

SOLUTION:

Thanks to your help I have figured this out. To add an open search on the _id field, I need to add a $where clause to the $or section of the query array.

Specifically, in my situation I used the following code:

if ( !empty($input['sSearch']) ) {
    $sSearch = $input['sSearch'];

    for ( $i=0 ; $i < $iColumns ; $i++ ) {
        if ($input['bSearchable_'.$i] == 'true') {
            if ($input['bRegex'] == 'true') {
                $sRegex = str_replace('/', '\/', $sSearch);
            } else {
                $sRegex = preg_quote($sSearch, '/');
            }
            $searchTermsAny[] = array(
                $dataProps[$i] => new MongoRegex( '/'.$sRegex.'/i')
            );
        }
    }

    // add this line for string search inside the _id field
    // (the term is escaped so it cannot break out of the regex literal)
    $searchTermsAny[]['$where'] = 'this._id.str.match(/' . preg_quote($sSearch, '/') . '/)';
}

Thank you for your help :)

As far as performance goes, I agree this is the WRONG way to go, and I will make sure to add a string field containing the _id to make performance much better, but for now at least I have a working solution to the problem.


Solution

  • The $regex and MongoRegex (i.e. a BSON regex type used in an equality match) only support matching against strings, so you cannot use them directly with an ObjectId.

    Regarding your last code example, you attempted to use $where in a MongoRegex constructor:

    $searchTermsAny[] = array(
        $dataProps[$i] => new MongoRegex( '/'.$sRegex.'/i',
        '$where: "this._id.toString().match(/'.$sRegex.'/i)"' )
    );
    

    MongoRegex's constructor takes a single string (e.g. /foo/i), from which it derives the pattern and flags. $where is intended to be used as a top-level query operator (not associated with any field name). I don't follow what you're doing with $dataProps[$i], but let's suppose you were constructing a single $where query to match an ObjectId's string representation. The query document would look like the following:

    { $where: 'this._id.str.match(/00005/)' }
    

    Note that I'm accessing the str property here instead of invoking toString(). That's because toString() actually returns the shell representation of the ObjectId. You can see this by checking its source in the shell:

    > x = new ObjectId()
    ObjectId("5409ddcfd95d6f6a2eb33e7f")
    > x.toString
    function (){
        return "ObjectId(" + tojson(this.str) + ")";
    }
    

    Also, if you're simply checking if a substring exists in the _id's hex representation, you may want to use indexOf() (with a != -1 comparison) instead of match() with a regex.
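
As a rough sketch of the indexOf() suggestion, the helper below builds a $where query document for a plain substring check. The names buildIdContainsWhere and matchesLocally are hypothetical, not part of any driver API, and the sketch assumes (as in the query above) that the server-side ObjectId exposes its hex string via str. The term is embedded with JSON.stringify so that quotes in user input cannot break out of the JavaScript string literal.

```javascript
// Hypothetical helper: builds a query document whose $where expression
// checks for a plain substring in the ObjectId's hex string.
function buildIdContainsWhere(term) {
  // JSON.stringify yields a quoted, escaped JavaScript string literal.
  const literal = JSON.stringify(String(term));
  return { $where: "this._id.str.indexOf(" + literal + ") != -1" };
}

// Evaluates the same expression locally against a stand-in document.
// A real query would have the server run it for each candidate document.
function matchesLocally(doc, query) {
  return new Function("return (" + query.$where + ");").call(doc);
}

const query = buildIdContainsWhere("00005");
console.log(query.$where);
```

The local evaluation is only there to make the sketch checkable without a server; the performance caveats about $where still apply.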


    That said, using $where is generally a bad idea if you're not combining it with additional query criteria that can use an index. This is because $where invokes the JavaScript interpreter for each document considered in the result set. If you combine it with other, more selective criteria, MongoDB can use an index and narrow down the documents that it needs to evaluate with $where; however, you're in for a bad time if you're using $where while scanning many documents or, in the worst case, the entire collection.

    You're probably better off creating a second field in each document that contains the hex string representation of the _id. Then, you can index that field and query for it using a regex. The non-anchored regex queries will still be a bit inefficient (see: regex index use in the docs), but this should still be much faster than using $where.

    This solution (duplicating the _id string) will incur some added storage per document, but you may decide the additional 24-30 bytes (string payload and a short field name) is negligible.
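
The duplicated-string approach could be sketched roughly as follows. The field name idStr and the helper names are assumptions made for illustration (the answer does not name the field), and hex stands in for whatever your driver returns when converting an ObjectId to its string form. The query uses the $regex operator against the string field, with the search term escaped so that it is matched literally.

```javascript
// Assumed name for the duplicated field; the answer above does not name it.
const ID_STRING_FIELD = "idStr";

// Escape regex metacharacters so the search term is matched literally.
function escapeRegex(term) {
  return String(term).replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

// Copy the hex form of _id into the extra field before saving a document.
// `hex` is a placeholder for the driver's ObjectId-to-string result.
function withIdString(doc, hex) {
  return Object.assign({}, doc, { [ID_STRING_FIELD]: hex });
}

// Query document for a substring search on the duplicated field.
function buildIdStringQuery(term) {
  return { [ID_STRING_FIELD]: { $regex: escapeRegex(term) } };
}

const idQuery = buildIdStringQuery("00005");
console.log(JSON.stringify(idQuery));
```

The field would still need an index, and existing documents would need to be backfilled, before such a query pays off.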