
Adding a MongoDB query when importing data with Pig and MongoDB


How would you attach a query when importing data using MongoLoader in Apache Pig? The mongo-hadoop wiki mentions "mongo.input.query", but that seems to apply to the standard MapReduce functionality rather than to Apache Pig.

raw = LOAD 'mongodb://localhost:27017/demo.yield_historical' USING com.mongodb.hadoop.pig.MongoLoader;

Would it be similar to this?

raw = LOAD 'mongodb://localhost:27017/demo.yield_historical' USING com.mongodb.hadoop.pig.MongoLoader WITH mongo.input.query={"_id":{"$gt":{"$date":1182470400000}}};

Solution

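  • Set the mongo.input.query property with Pig's set command before the LOAD statement; the property is passed into the job configuration that the mongo-hadoop input format reads. Note that $date is interpreted as milliseconds since the Unix epoch, so the bounds here (16 and 18 January 2015, UTC) are written in milliseconds:

    set mongo.input.query '{"value.task.creation":{ "$gte": { "$date": 1421366400000}, "$lt" : { "$date": 1421539200000} } }'

    data = LOAD 'mongodb://54.93.131.188:27017/foo.units'
              USING com.mongodb.hadoop.pig.MongoLoader();
    DUMP data;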
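
Applied to the script from the question, the same approach might look like the sketch below. It reuses the question's URI and filter unchanged and replaces the hypothetical WITH clause (Pig's LOAD statement has no such clause) with a set command. It assumes the mongo-hadoop and MongoDB driver jars have already been registered, which neither the question nor the answer shows.

```pig
-- Sketch only: assumes the mongo-hadoop and MongoDB Java driver jars
-- are already registered (REGISTER statements omitted here).

-- The filter from the question, supplied as a job property instead of a WITH clause.
-- The $date value is in milliseconds since the Unix epoch.
set mongo.input.query '{"_id":{"$gt":{"$date":1182470400000}}}'

-- Same collection as in the question; the loader reads the query from the job configuration.
raw = LOAD 'mongodb://localhost:27017/demo.yield_historical'
      USING com.mongodb.hadoop.pig.MongoLoader();

DUMP raw;
```

Because the property is set for the whole script, any MongoLoader LOAD in the same script would presumably see the same query, so a script that loads several collections may need a different arrangement.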