database, data-processing

How does DataSQRL handle structured and unstructured data?


I am curious about how DataSQRL handles structured and unstructured data. What are the differences between structured and unstructured data and how does DataSQRL process each type? How do I configure what type of data I’m ingesting?

I've read through the "What is SQRL?" section of the docs, in particular the bit about Nested Tables, but it's not quite clear whether it's really as straightforward as it seems. For example, is there a limit to how deeply tables can be nested, either practically or by design?


Solution

  • DataSQRL can ingest both structured data (like SQL tables) and semi-structured data (like JSON documents). The data format documentation page lists the supported input formats.

    Semi-structured data like JSON is represented as nested tables, as you said. The mapping to nested tables is controlled by the schema configuration file that is provided by the data source. To customize the mapping, run the data discovery command, then go to the directory where the data source was created and find the schema configuration file, which ends in .schema.yml. Edit this file to change the mapping.

    There is no logical limit to how deeply data can be nested. We have tested it up to 4 levels of nesting, so I would treat that as the practical limit for now and pre-process any data that is nested more deeply.
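
If you do need to pre-process deeply nested JSON before ingesting it, here is a minimal sketch of one way to do that in plain Python. It is not part of DataSQRL: the function name `limit_nesting`, the `MAX_NESTING` constant, and the choice to turn anything deeper than the limit into a JSON string are all assumptions made for illustration. It also assumes that the top-level record is level 0 and that each nested object (including objects inside an array) adds one level.

```python
import json
from typing import Any

# Assumption: mirrors the 4 levels of nesting mentioned in the answer.
MAX_NESTING = 4


def limit_nesting(value: Any, max_depth: int = MAX_NESTING, depth: int = 0) -> Any:
    """Return a copy of `value` with objects nested at most `max_depth` levels.

    The top-level record is depth 0. An object that would sit deeper than
    `max_depth` is replaced by its JSON string, so the data is preserved
    but no longer adds further levels of nesting.
    """
    if isinstance(value, dict):
        if depth > max_depth:
            # Too deep: keep the content, but as an opaque string column.
            return json.dumps(value, sort_keys=True)
        return {
            key: limit_nesting(child, max_depth, depth + 1)
            for key, child in value.items()
        }
    if isinstance(value, list):
        # An array does not add a level by itself; its elements do if they
        # are objects, which the dict branch above accounts for.
        return [limit_nesting(item, max_depth, depth) for item in value]
    return value


if __name__ == "__main__":
    record = {"a": {"b": {"c": {"d": {"e": {"f": 1}}}}}}
    print(json.dumps(limit_nesting(record)))
```

You would run something like this over each record before it reaches the data source. Serializing the overflow to a string is only one option; flattening the deep fields into dotted column names would work as well if you need to query them.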