How can I access all the records in each set using Sickle?
I can access sets like this, but I don't know how to go from here and download each record from every set:
from sickle import Sickle
sickle = Sickle('http://www.duo.uio.no/oai/request')
sets = sickle.ListSets()
for s in sets:
    print(s)
The print prints out every set like this:
<set xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><setSpec>com_10852_1</setSpec><setName>Det matematisk-naturvitenskapelige fakultet</setName></set>
I can also iterate through the sets to go deeper:
for s in sets:
    for rec in sets:
        print(rec)
This prints all the sub-sets, so this is probably where I can get at the individual records, but the API is hard to understand and I have not been able to access the records.
Be sure to read the short and sweet Sickle tutorial.
For harvesting an entire OAI-PMH repository, you do not need to iterate over sets. Here is the complete code:
from sickle import Sickle
sickle = Sickle('http://www.duo.uio.no/oai/request')
recs = sickle.ListRecords(metadataPrefix="oai_dc")
for r in recs:
    print(r)
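If what you actually want is to know which records belong to which set, a single full harvest may be enough, because each OAI-PMH record header lists the sets the record is in. Here is a minimal sketch, assuming Sickle exposes these as `record.header.setSpecs` and the record ID as `record.header.identifier`; `group_by_set` is a hypothetical helper name, not part of Sickle:

```python
from collections import defaultdict


def group_by_set(records):
    """Map each setSpec to the identifiers of the records that list it.

    `records` is any iterable of objects with `header.identifier` and
    `header.setSpecs`, e.g. the iterator returned by sickle.ListRecords().
    """
    by_set = defaultdict(list)
    for record in records:
        # A record can list several sets, so it may land in several buckets.
        for set_spec in record.header.setSpecs:
            by_set[set_spec].append(record.header.identifier)
    return dict(by_set)
```

With the `sickle` client from the snippet above, it would be called as `group_by_set(sickle.ListRecords(metadataPrefix="oai_dc"))`. The sketch stores identifiers only, since keeping every full record in memory could get expensive for a large repository.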
If for some reason you really wish to harvest records set by set, you can certainly do so. Here is the complete code again:
from sickle import Sickle
sickle = Sickle('http://www.duo.uio.no/oai/request')
sets = sickle.ListSets()
for s in sets:
    # Ask the repository for the records of this one set only
    recs = sickle.ListRecords(metadataPrefix="oai_dc", set=s.setSpec)
    for r in recs:
        print(r)
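One thing to watch when going set by set: a record can belong to several sets (a sub-set and its parent, for instance), so the nested loop may hand you the same record more than once. Here is a minimal sketch that skips the repeats, assuming each record exposes `header.identifier` as Sickle records do; `harvest_by_set` is a hypothetical helper, and `client` can be any object with `ListSets` and `ListRecords` methods, such as a `Sickle` instance:

```python
def harvest_by_set(client, metadata_prefix="oai_dc"):
    """Yield (set_spec, record) pairs, emitting each record only once."""
    seen = set()  # identifiers that have already been yielded
    for s in client.ListSets():
        records = client.ListRecords(metadataPrefix=metadata_prefix, set=s.setSpec)
        for record in records:
            identifier = record.header.identifier
            if identifier in seen:
                continue  # already harvested through an earlier set
            seen.add(identifier)
            yield s.setSpec, record
```

Usage would look like `for set_spec, record in harvest_by_set(sickle): print(set_spec, record.header.identifier)`. Also note that a set with no records typically makes the server answer with the OAI-PMH `noRecordsMatch` error, which as far as I know Sickle raises as an exception, so real code may need a try/except around the `ListRecords` call.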