Consider the following program, which I wrote in two Jupyter Notebook cells.
Cell 1:
import rumbledb as rmbl
%load_ext rumbledb
%env RUMBLEDB_SERVER=http://public.rumbledb.org:9090/jsoniq
Cell 2:
%%jsoniq
parse-json("{\"x\":3}}").x
After executing
spark-submit rumbledb-1.21.0-for-spark-3.5.jar serve -p 9090
in a Git Bash console, when I run these two cells in order, the output of the second cell is
Took: 0.2607388496398926 ms
3
I'd like to rewrite cell 2 so that it invokes the cell magic programmatically rather than literally (with the %% syntax). The reason is that I want to encapsulate it in a function, as I described in this post.
I tried the advice at the end of this answer and rewrote cell 2 as follows:
Cell 3:
from IPython import get_ipython
ipython = get_ipython()
ipython.run_cell_magic('jsoniq', '', 'parse-json("{\"x\":3}}").x')
However, when I ran this cell, I got the following error message:
Took: 2.2189698219299316 ms
There was an error on line 2 in file:/home/ubuntu/:
Code: [XPST0003]
Message: Parser failed.
Metadata: file:/home/ubuntu/:LINE:2:COLUMN:0:
This code can also be looked up in the documentation and specifications for more information.
Why is there no 3 in the output, and how can I get it?

Following up now that there is a Python edition of RumbleDB, the query would look like so (using a raw string as suggested by @Anerdw):
!pip install jsoniq
from jsoniq import RumbleSession
rumble = RumbleSession.builder.getOrCreate()
print(rumble.jsoniq(r"""
parse-json("{\"x\":3}").x
""").json())
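The raw string matters because, in a regular Python string literal, each \" is collapsed to a plain " before the query text ever leaves Python. The %%jsoniq cell body is not parsed as a Python literal, which is why the literal cell magic works while the run_cell_magic call in the question does not. Here is a minimal sketch in plain Python (no RumbleDB needed; the variable names are just for illustration) showing what each spelling actually sends:

```python
# Regular literal: Python consumes the backslashes, so the JSONiq string
# literal is terminated early by the now-unescaped inner quotes.
plain = 'parse-json("{\"x\":3}").x'

# Raw literal: the backslashes survive and reach the JSONiq parser intact.
raw = r'parse-json("{\"x\":3}").x'

# Equivalent without a raw string: double each backslash.
doubled = 'parse-json("{\\"x\\":3}").x'

print(plain)           # parse-json("{"x":3}").x
print(raw)             # parse-json("{\"x\":3}").x
print(raw == doubled)  # True
```

Passing the raw (or doubled) form as the last argument of ipython.run_cell_magic('jsoniq', '', ...) should then hand the server the same text that the %%jsoniq cell sends.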
Notes:
A JSONiq query always returns a sequence of items. json() maps it to a Python list (with one JSON value per item). When the sequence has only one item (as is the case here), this returns a list with just one JSON value. (In fact, RumbleDB also supports sequences with billions of items when it runs on a cluster, in which case other output methods such as df(), rdd(), or write() are preferred.)
With the jsoniq package, there is no need to run a server any more. The RumbleDB jar file is shipped with this package, and RumbleSession is wrapped around a SparkSession.
The jsoniq package is still considered to be alpha.
Java 17 or 21 is required. For example, on Google Colab, Java 17 can be installed and selected like so:
import os
!apt-get install -y openjdk-17-jdk-headless -qq
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-17-openjdk-amd64"
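To confirm which Java is actually visible before creating the session, something like the following sketch may help. The helper names parse_java_major and installed_java_major are hypothetical (they are not part of the jsoniq package), and the check looks at the java executable on PATH, which may differ from the one JAVA_HOME points at:

```python
import re
import shutil
import subprocess

def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: Java 8 and earlier report themselves as 1.8, 1.7, ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major

def installed_java_major():
    """Return the major version of the `java` on PATH, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)

major = installed_java_major()
if major is None:
    print("No usable java found on PATH.")
elif major not in (17, 21):
    print(f"Found Java {major}; the note above asks for 17 or 21.")
else:
    print(f"Found Java {major}.")
```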