We have a PySpark pair RDD that stores the path of each .owl
file as the key and the file contents as the value.
I wish to carry out reasoning using Owlready2. To load an ontology from an OWL file, the get_ontology()
function is used. However, this function expects an IRI (a sort of URL) pointing to the file, whereas I have the file contents as a str
in Python.
Is there a way to make this work?
I have tried the following:
get_ontology(file_contents).load()
--> this obviously does not work, as the function expects a file path.
get_ontology(file_contents)
--> no error, but the ontology does not get loaded, so reasoning does not happen.

Answering my own question.
The load()
function in Owlready2 has a couple more arguments that are not mentioned anywhere in the documentation. The function definitions can be seen in the package source.
Quoting from there, def load(self, only_local = False, fileobj = None, reload = False, reload_if_newer = False, **args)
is the function signature.
We can see that a fileobj
can also be passed, which is None
by default. Further, the line fileobj = open(f, "rb")
in the function body tells us that the file is read in binary mode, so any file object we pass in must yield bytes rather than str.
Taking all this into consideration, the following code worked for our situation:
from io import BytesIO # to create a file-like object
from owlready2 import get_ontology
my_str = RDDList[1][1] # the pair RDD cell with the string data
my_str_as_bytes = my_str.encode("utf-8") # convert the str to bytes
fobj = BytesIO(my_str_as_bytes) # binary file-like object wrapping those bytes
abox = get_ontology("some-random-path").load(fileobj=fobj) # the path is insignificant; notice the 'fileobj=fobj'.
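
If this has to be done for more than one record, the same steps could be wrapped in small helpers. The following is only a sketch: the names to_binary_fileobj and load_ontology_from_string are made up for illustration, UTF-8 is assumed as the encoding, and the iri argument is a placeholder in the same spirit as "some-random-path" above. Owlready2 is imported inside the function so that the string-to-bytes part can be tried without it installed.

```python
from io import BytesIO


def to_binary_fileobj(text, encoding="utf-8"):
    """Wrap a str in a binary file-like object (hypothetical helper)."""
    # load() opens files with "rb", so we hand it bytes, not str.
    return BytesIO(text.encode(encoding))


def load_ontology_from_string(iri, text):
    """Load an ontology from in-memory OWL contents (hypothetical helper).

    `iri` is a placeholder name for the ontology; the contents are read
    from `text` through the fileobj argument of load().
    """
    # Imported lazily so the helper above works without Owlready2.
    from owlready2 import get_ontology

    return get_ontology(iri).load(fileobj=to_binary_fileobj(text))
```

With the collected list from above, a call might look like load_ontology_from_string("some-random-path", RDDList[1][1]).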