python apache-spark pyspark ontology owlready

Loading an ontology from string in Python

We have a Pyspark pair RDD which stores the path of .owl files as key and the file contents as value.

I wish to carry out reasoning using Owlready2. To load an ontology from OWL files, the get_ontology() function is used. However, the given function expects an IRI (a sort of URL) to the file, whereas I have the file contents as a str in Python.

Is there a way to load an ontology directly from a string?

I have tried the following:

Used get_ontology(file_contents).load() --> this does not work, as the function expects an IRI or file path rather than the file contents.
Used get_ontology(file_contents) --> no error, but the ontology is never loaded, so no reasoning happens.

Solution

Answering my own question.

The load() function in Owlready2 takes a couple more arguments, which I could not find mentioned in the documentation. The function definitions of the package can be seen here.

Quoting from there, the function signature is:

def load(self, only_local = False, fileobj = None, reload = False, reload_if_newer = False, **args)

We can see that a fileobj can also be passed, which is None by default. Further, the line fileobj = open(f, "rb") tells us that the file needs to be read in binary mode.

Taking all this into consideration, the following code worked for our situation:

from io import BytesIO  # to create a file-like object
from owlready2 import get_ontology

my_str = RDDList[1][1]  # the pair RDD cell with the string data
my_str_as_bytes = my_str.encode("utf-8")  # convert the str to bytes
fobj = BytesIO(my_str_as_bytes)  # binary file-like object, as load() expects

# The path is insignificant here; note the fileobj=fobj argument.
abox = get_ontology("some-random-path").load(fileobj=fobj)
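
For reuse, the same steps can be wrapped in small helpers. This is only a minimal sketch: the names owl_text_to_fileobj and load_ontology_from_text are made up for illustration and are not part of Owlready2. The only Owlready2 calls used are get_ontology() and load(fileobj=...) from above, and the import sits inside the function so the conversion helper can be used without Owlready2 installed.

```python
from io import BytesIO


def owl_text_to_fileobj(owl_text: str, encoding: str = "utf-8") -> BytesIO:
    """Wrap OWL file contents (a str) in a binary file-like object."""
    return BytesIO(owl_text.encode(encoding))


def load_ontology_from_text(iri: str, owl_text: str):
    """Hypothetical helper: load an ontology from in-memory text.

    Assumes Owlready2 is installed; relies on load(fileobj=...) as
    described above.
    """
    from owlready2 import get_ontology  # imported lazily on purpose

    return get_ontology(iri).load(fileobj=owl_text_to_fileobj(owl_text))


# The conversion step on its own needs only the standard library:
fobj = owl_text_to_fileobj("<rdf:RDF></rdf:RDF>")
first_bytes = fobj.read(8)  # bytes, exactly what a file opened with "rb" yields
```

With a pair RDD, one option would be to call load_ontology_from_text(path, contents) for each (path, contents) pair and pass the key as the IRI. As far as I can tell, get_ontology() returns the same ontology object when it is given the same IRI again, so a distinct IRI per file seems safer than one shared placeholder string.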