I have a server running an ipcontroller and 12 ipengines. I connect to the controller from my laptop using SSH. I submitted some jobs to the controller using the load-balanced view interface (in non-blocking mode) and stored the message IDs in the AsyncResult object returned by the apply_async() method.
I accidentally lost the message IDs for the jobs and wanted to know if there's a way to retrieve the job IDs (or the results) from the Hub database. I use a SQLite database for the Hub, and I can get the rc.db_query() method to work, but I don't know what to look for.
Does anyone know how to query the Hub database for only the message IDs of the jobs I submitted? What's the easiest way of retrieving the job results from the Hub if I don't have access to the AsyncHubResult objects (or their message IDs)?
Thanks!
Without the message IDs, you might have a pretty hard time finding the right tasks, unless not many tasks have been submitted.
The query syntax is based on MongoDB's: queries are passed straight through when the Hub uses MongoDB, and a subset of the simple operators is implemented for SQLite.
Quick summary: a query is a dict. If you use literal values, they are equality tests, but you can use dict values for comparison operators.
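To make the two cases concrete, here is a small pure-Python sketch of how such a query dict can be read. It only illustrates the MongoDB-style semantics described above and is not the Hub's actual implementation; the matches helper, the operator table, and the sample record are made up for the example, and only a few comparison operators are covered.

```python
import operator
from datetime import datetime

# Illustrative subset of MongoDB-style comparison operators (an assumption
# for this sketch, not the Hub's own operator table).
OPERATORS = {
    '$lt': operator.lt,
    '$gt': operator.gt,
    '$lte': operator.le,
    '$gte': operator.ge,
}

def matches(record, query):
    """Return True if record (a dict) satisfies every condition in query."""
    for key, condition in query.items():
        if key not in record:
            # a missing key never matches in this sketch
            return False
        value = record[key]
        if isinstance(condition, dict):
            # dict value: every operator test must pass
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # literal value: plain equality test
            return False
    return True

# made-up task record
record = {'msg_id': 'abc', 'submitted': datetime(2013, 1, 2, 12, 0)}

# literal value: equality test on msg_id
print(matches(record, {'msg_id': 'abc'}))

# dict value: submitted must fall inside the given window
window = {'$gt': datetime(2013, 1, 2), '$lt': datetime(2013, 1, 3)}
print(matches(record, {'submitted': window}))
```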
You can search by date on any of the timestamps. For instance, to find tasks submitted yesterday:
from datetime import date, time, timedelta, datetime

# round to midnight
today = datetime.combine(date.today(), time())
yesterday = today - timedelta(days=1)

rc.db_query({'submitted': {
    '$lt': today,      # less than midnight last night
    '$gt': yesterday,  # greater than midnight the night before
}})
or all tasks submitted 1-4 hours ago:
found = rc.db_query({'submitted': {
    '$lt': datetime.now() - timedelta(hours=1),
    '$gt': datetime.now() - timedelta(hours=4),
}})
With the results of that, you can look at keys like client_uuid to retrieve all messages submitted by a given client instance (e.g. a single notebook or script):
client_uuid = found[0]['client_uuid']
all_from_client = rc.db_query({'client_uuid': client_uuid})
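If several clients submitted tasks in the time window, the first record in found may not belong to your session. One way to check, sketched below, is to count the records per client_uuid and pick the one whose task count looks right. The count_by_client helper is hypothetical, and the sample records are made up to stand in for what rc.db_query returns.

```python
from collections import Counter

def count_by_client(records):
    """Count task records per client_uuid, most frequent first."""
    counts = Counter(rec['client_uuid'] for rec in records)
    return counts.most_common()

# made-up records standing in for the result of rc.db_query(...)
found = [
    {'msg_id': 'a', 'client_uuid': 'client-1'},
    {'msg_id': 'b', 'client_uuid': 'client-2'},
    {'msg_id': 'c', 'client_uuid': 'client-1'},
]

for client_uuid, n in count_by_client(found):
    print(client_uuid, n)
```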
Since you are only interested in results at this point, you can specify keys=['msg_id'] to retrieve only the message IDs. We can then use these msg_ids to get all the results produced by a single client session:
# construct list of msg_ids
msg_ids = [ r['msg_id'] for r in rc.db_query({'client_uuid': client_uuid}, keys=['msg_id']) ]
# use client.get_result to retrieve the actual results:
results = rc.get_result(msg_ids)
At this point, you have all of the results, but you have lost the association of which results came from which execution. There isn't a lot of info to help you out there, but you might be able to tell by type, timestamps, or perhaps select the 9 final items from a given session.
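As a rough sketch of the timestamp approach, you could also ask for the submitted key alongside msg_id, sort the records by submission time, and keep only the last few. The last_submitted helper is hypothetical, and the records below are made up; in practice they would come from something like rc.db_query({'client_uuid': client_uuid}, keys=['msg_id', 'submitted']).

```python
from datetime import datetime

def last_submitted(records, n):
    """Return the msg_ids of the n most recently submitted records, oldest first."""
    if n <= 0:
        return []
    ordered = sorted(records, key=lambda rec: rec['submitted'])
    return [rec['msg_id'] for rec in ordered[-n:]]

# made-up records standing in for the result of rc.db_query(...)
records = [
    {'msg_id': 'a', 'submitted': datetime(2013, 1, 2, 10, 0)},
    {'msg_id': 'c', 'submitted': datetime(2013, 1, 2, 12, 0)},
    {'msg_id': 'b', 'submitted': datetime(2013, 1, 2, 11, 0)},
]

recent_ids = last_submitted(records, 2)
print(recent_ids)
# recent_ids could then be passed to rc.get_result(recent_ids)
```

Because the returned list is in submission order, the results fetched with it line up with the order in which the tasks were sent.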