
Casbah's problem with a large number of returned objects


Casbah (or the Java driver for MongoDB) seems to have a problem dealing with a large number of returned objects. For example, the following code segment throws an IllegalArgumentException and doesn't return a single result (full stack trace below). However, if I reduce the "limit(...)" to 1994, everything seems to work fine.

for (link <- links; query = link $exists true) {
  val group = new HashMap[String, Set[(String, String)]] with MultiMap[String, (String, String)]
  log.find(query, fieldsToGet.result).limit(1996) foreach { x =>
    group.addBinding(x.get(link).toString, (x.get("_id").toString, x.get("eventType").toString))
  }
  allGroups += link -> group
}

Apr 26, 2011 8:23:40 PM com.mongodb.DBTCPConnector$MyPort error
SEVERE: MyPort.error called
java.lang.IllegalArgumentException: response too long: 1278031173
    at com.mongodb.Response.<init>(Response.java:40)
    at com.mongodb.DBPort.go(DBPort.java:101)
    at com.mongodb.DBPort.go(DBPort.java:66)
    at com.mongodb.DBPort.call(DBPort.java:56)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
    at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:266)
    at com.mongodb.DBCursor._check(DBCursor.java:309)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:431)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:456)
    at com.mongodb.casbah.MongoCursorBase$class.hasNext(MongoCursor.scala:72)
    at com.mongodb.casbah.MongoCursor.hasNext(MongoCursor.scala:517)
    at scala.collection.Iterator$class.foreach(Iterator.scala:631)
    at com.mongodb.casbah.MongoCursor.foreach(MongoCursor.scala:517)
    at Sequencer$$anonfun$3.apply(Sequencer.scala:23)
    at Sequencer$$anonfun$3.apply(Sequencer.scala:20)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at Sequencer$.<init>(Sequencer.scala:20)
    at Sequencer$.<clinit>(Sequencer.scala)
    at Sequencer.main(Sequencer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Exception in thread "main" java.lang.ExceptionInInitializerError
    at Sequencer.main(Sequencer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:115)
Caused by: java.lang.IllegalArgumentException: response too long: 1278031173
    at com.mongodb.Response.<init>(Response.java:40)
    at com.mongodb.DBPort.go(DBPort.java:101)
    at com.mongodb.DBPort.go(DBPort.java:66)
    at com.mongodb.DBPort.call(DBPort.java:56)
    at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
    at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:266)
    at com.mongodb.DBCursor._check(DBCursor.java:309)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:431)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:456)
    at com.mongodb.casbah.MongoCursorBase$class.hasNext(MongoCursor.scala:72)
    at com.mongodb.casbah.MongoCursor.hasNext(MongoCursor.scala:517)
    at scala.collection.Iterator$class.foreach(Iterator.scala:631)
    at com.mongodb.casbah.MongoCursor.foreach(MongoCursor.scala:517)
    at Sequencer$$anonfun$3.apply(Sequencer.scala:23)
    at Sequencer$$anonfun$3.apply(Sequencer.scala:20)
    at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:61)
    at scala.collection.immutable.List.foreach(List.scala:45)
    at Sequencer$.<init>(Sequencer.scala:20)
    at Sequencer$.<clinit>(Sequencer.scala)
    ... 6 more

The exception seems to be produced by the following check in "Response.java" in the Java driver:

ByteArrayInputStream bin = new ByteArrayInputStream( b );
_len = Bits.readInt( bin );
if ( _len > ( 32 * 1024 * 1024 ) )
 throw new IllegalArgumentException( "response too long: " + _len );
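
As a sanity check on the numbers, here is a minimal Scala sketch that repeats the same comparison outside the driver. The object and method names are made up for illustration; only the 32 * 1024 * 1024 limit and the length from the stack trace come from the material above.

```scala
object ResponseLengthCheck {
  // Limit hard-coded in the quoted driver check: 32 * 1024 * 1024 bytes.
  val MaxResponseBytes: Int = 32 * 1024 * 1024

  // Length reported in the stack trace above.
  val ReportedLength: Int = 1278031173

  // Same comparison as the driver: strictly greater than the limit.
  def exceedsLimit(len: Int): Boolean = len > MaxResponseBytes

  def main(args: Array[String]): Unit = {
    val gib = ReportedLength.toDouble / (1024L * 1024L * 1024L)
    println(s"limit    = $MaxResponseBytes bytes")
    println(f"reported = $ReportedLength bytes (about $gib%.2f GiB)")
    println(s"exceeds limit: ${exceedsLimit(ReportedLength)}")
  }
}
```

The reported length is roughly 38 times the limit, which seems far more than 1996 documents restricted to a few fields should need.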

Could it be caused by one particular returned object, or is this a Casbah issue?

Thanks, Derek


Solution

  • It looks like the Java driver checks whether the current response block is larger than 32 MB and throws the exception if it is.

    If you set batchSize(FEWER_NUMBER_OF_DOCS) on the cursor, each batch the server sends back should be smaller, which should keep every response under 32 MB and may also reduce lock time in the database.

    I would experiment with the batchSize to see what is optimal for your application.

    http://api.mongodb.org/scala/casbah/2.1.2/scaladoc/

    The maximum should probably be increased in the Java driver.

    The strange part is that the reported response length, 1278031173 bytes, is about 1.19 GB.

    If your result set doesn't actually contain that much data, it may indicate that the collection is corrupt.
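
A minimal sketch of the batchSize suggestion, assuming Casbah 2.x and a mongod running locally. The host, database, collection, and field names are placeholders rather than values from the question, and the batch size of 200 is an arbitrary starting point to tune.

```scala
import com.mongodb.casbah.Imports._

object BatchedFind {
  def main(args: Array[String]): Unit = {
    // Placeholder connection details; adjust to your own deployment.
    val log = MongoConnection("localhost", 27017)("test")("log")

    val link = "someLink" // hypothetical field name
    val query = link $exists true
    val fields = MongoDBObject(link -> 1, "eventType" -> 1)

    val cursor = log.find(query, fields).limit(1996)
    // Ask the server for at most 200 documents per response batch;
    // the cursor fetches further batches as the iteration advances.
    cursor.batchSize(200)

    cursor foreach { doc =>
      println(doc.get("_id"))
    }
  }
}
```

If the exception still reports a length around 1.19 GB with a small batch size, the batch size is probably not the cause, and checking the collection for corruption is the more likely next step.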