Tags: python, jupyter-notebook, ipython-magic, jsoniq

Error running a magic programmatically via IPython's run_cell_magic


Consider the following program, which I wrote in two Jupyter Notebook cells.

Cell 1:

import rumbledb as rmbl
%load_ext rumbledb
%env RUMBLEDB_SERVER=http://public.rumbledb.org:9090/jsoniq

Cell 2:

%%jsoniq
parse-json("{\"x\":3}}").x

After executing

spark-submit rumbledb-1.21.0-for-spark-3.5.jar serve -p 9090

in a Git Bash console, when I run these two cells in order, the output of the second cell is

Took: 0.2607388496398926 ms
3

I'd like to rewrite cell 2 so that it invokes the cell magic programmatically, via a function call, rather than literally (with the %% syntax). The reason is that I want to encapsulate the query in a function, as I described in this post.

I tried the advice at the end of this answer and rewrote cell 2 as follows:

Cell 3:

from IPython import get_ipython
ipython = get_ipython()
ipython.run_cell_magic('jsoniq', '', 'parse-json("{\"x\":3}}").x')

However, when I ran this cell, I got the following error message:

Took: 2.2189698219299316 ms
There was an error on line 2 in file:/home/ubuntu/:


Code: [XPST0003]
Message: Parser failed. 

Metadata: file:/home/ubuntu/:LINE:2:COLUMN:0:
This code can also be looked up in the documentation and specifications for more information.

  1. Why did I get the error message, and how can I get rid of it?
  2. Why did I not get 3 in the output, and how can I get it?

Solution

  • Following up: now that there is a Python edition of RumbleDB, the query can be written like so (using a raw string, as suggested by @Anerdw):

    !pip install jsoniq
    from jsoniq import RumbleSession
    rumble = RumbleSession.builder.getOrCreate()
    print(rumble.jsoniq(r"""
       parse-json("{\"x\":3}").x
    """).json())
    

    Notes:

    1. A JSONiq query always returns a sequence of items. json() maps it to a Python list (with one JSON value per item). When the sequence has only one item (as is the case here), this returns a list with just one JSON value. (In fact, RumbleDB also supports sequences with billions of items if it runs on a cluster, in which case other output methods are preferred, such as df(), rdd() or write().)

    2. With the jsoniq package, there is no need to run a server any more. The RumbleDB jar file is shipped with this package, and RumbleSession is wrapped around a SparkSession.

    3. The jsoniq package is still considered to be alpha.

    4. Java 17 or 21 is required. For example, on Google Colab, Java 17 can be installed and selected like so:

    import os
    !apt-get install -y openjdk-17-jdk-headless -qq
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-17-openjdk-amd64"
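
As for the original error with run_cell_magic: the likely cause is the same one the raw string addresses. In an ordinary Python string literal, \" is an escape sequence for a plain double quote, so the backslashes never reach the JSONiq parser and the query's string literal ends early. The sketch below shows the difference and wraps the magic call in a function. Note that run_jsoniq is a hypothetical helper name, and the sketch assumes the rumbledb extension is loaded and the server is running as in cell 1 of the question.

```python
# Ordinary literal: Python turns \" into ", so the backslashes are lost.
plain = 'parse-json("{\"x\":3}").x'
# Raw literal: the backslashes are kept and reach the JSONiq parser.
raw = r'parse-json("{\"x\":3}").x'

print(plain)  # parse-json("{"x":3}").x
print(raw)    # parse-json("{\"x\":3}").x


def run_jsoniq(query: str):
    """Run a JSONiq query through the %%jsoniq cell magic (hypothetical helper).

    Assumes an IPython/Jupyter session in which `%load_ext rumbledb`
    has already been executed.
    """
    from IPython import get_ipython  # imported lazily; only needed when called

    ipython = get_ipython()
    if ipython is None:
        raise RuntimeError("run_jsoniq must be called from an IPython session")
    # Arguments: magic name, the (empty) magic line, the cell body.
    return ipython.run_cell_magic("jsoniq", "", query)
```

In a notebook, run_jsoniq(raw) should then send the same text to the server that the literal %%jsoniq cell sends.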