
Traversing through and modifying a C json string


What's the correct way to traverse and modify a JSON string in C?

Specifically, I have a string, body_buf. When printed with

printf("length: %d\n%.*s\n", body_len, body_len, body_buf);

it looks like this:

length: 113
{"field1":"something","whatever":10,"description":"body","id":"random","__oh__":{"session":"12345678jhgfdrtyui"}}

Another more complicated body_buf may look like this:

{"status":1,"query":{},"proc":{"memory":{"total":17177939968,"cmax":18363625472,"amax":20000000000},"cpu":{"cores":[0.788,0.132,0.319,2.951,10.111,3.309,1.43,0.8,2.705,4.203,2.32,2,0.019,0.172,0.247,3.888,0.282,0.423,5.254,0.258,0.009,0.369,3.277,0.048,0.283,7.574,3.086,1.592,0.191,0.166,4.348,0.391,0.085,0.25,7.12,4.927,3.671,1.147,3.216,4.628,0.131,0.995,0.744,4.252,4.022,3.505,3.758,3.491],"total":108.886,"limit":800},"disk":{"used":20170,"limit":50000,"io_limit":500}}}

I want to simplify body_buf (which also doubles as removing sensitive information) according to the following rules, only modifying the values, not any of the keys:

  1. A string becomes its length.
  2. An array of strings becomes [array_len, max_len, min_len].
  3. An array of numbers becomes [array_len, max, min].

I'm not familiar with working with JSON strings in C. What's the best way to do this?

I can treat body_buf as a string and traverse it, modifying whatever comes after a ":", since those are the values I might modify depending on their type. For arrays, I need to keep track of anything sandwiched between "[" and "]". This could work, but it doesn't seem very straightforward.
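As a rough illustration of this string-scanning idea, here is a minimal sketch that covers rule 1 only. The helper name replace_string_values is hypothetical, it uses only the C standard library, and it assumes well-formed JSON. It reports the raw byte count between the quotes, so an escape such as \" counts as two bytes, and it does not handle arrays at all.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing every quoted string
 * that directly follows a ':' with its raw byte length. Keys and everything
 * else are copied unchanged. Returns the number of bytes written (without
 * the terminating NUL), or -1 if `out` is too small or a string is
 * unterminated. */
static int replace_string_values(const char *in, size_t in_len,
                                 char *out, size_t out_cap)
{
    size_t o = 0;
    size_t i = 0;
    int after_colon = 0; /* 1 if the last non-space character was ':' */

    if (out_cap == 0) return -1;
    while (i < in_len) {
        char c = in[i];
        if (c == '"') {
            /* Find the closing quote, skipping backslash escapes. */
            size_t j = i + 1;
            while (j < in_len && in[j] != '"') {
                j += (in[j] == '\\' && j + 1 < in_len) ? 2 : 1;
            }
            if (j >= in_len) return -1; /* unterminated string */
            if (after_colon) {
                /* A value: emit its length instead of its text. */
                int n = snprintf(out + o, out_cap - o, "%zu", j - i - 1);
                if (n < 0 || (size_t)n >= out_cap - o) return -1;
                o += (size_t)n;
            } else {
                /* A key (or an array element): copy it verbatim. */
                size_t len = j - i + 1;
                if (len >= out_cap - o) return -1;
                memcpy(out + o, in + i, len);
                o += len;
            }
            after_colon = 0;
            i = j + 1;
            continue;
        }
        if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
            after_colon = (c == ':');
        }
        if (o + 1 >= out_cap) return -1;
        out[o++] = c;
        i++;
    }
    out[o] = '\0';
    return (int)o;
}
```

Calling replace_string_values(body_buf, body_len, out, sizeof out) on the first body above should yield {"field1":9,"whatever":10,"description":4,"id":6,"__oh__":{"session":18}}. Even this small case needs escape handling and key/value tracking, and rules 2 and 3 would add a stack for nested arrays, which is why a real parser tends to be the safer route.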

Alternatively, I could convert body_buf to a JSON object and then traverse the nested structure. But then I also have to modify it. I have yet to find a C example (which would be helpful), using json-c or otherwise, that traverses and modifies a JSON object (or creates a new one via some kind of deep copy).

Details (rules above, 1-3) aside, this should be a relatively common operation -- to traverse and modify. So for those more attuned to the intricacies and good/standard practices of json-c or JSON manipulation in general in C, I'm looking for some pointers.

Again, I have both cJSON and json-c available:

#include "cJSON.h"
#include "cJSON_Utils.h"
#include <libjson/json.h>
#include <libjson/json_tokener.h>

Relevant information I've looked at so far includes the following:

https://gist.github.com/alan-mushi/19546a0e2c6bd4e059fd

How to get json values after json_tokener_parse()?

Parsing deeply nested JSON key using json-c


Solution

  • I'm not sure how "simplifying" the JSON will be useful. Working with JSON in C can be scary the first time.

    I like the cJSON library: it is light, portable, and stable. It has good test coverage, and the license is MIT.

    I think this code, which uses cJSON, will do what you asked:

    #include <cjson/cJSON.h>
    #include <stdbool.h>
    #include <string.h>
    #include <stdio.h>
    #include <limits.h>
    #include <float.h>
    
    const char json1[] = "{\"field1\":\"something\",\"whatever\":10,\"description\":\"body\",\"id\":\"random\",\"__oh__\":{\"session\":\"12345678jhgfdrtyui\"}}";
    const char json2[] = "{\"status\":1,\"query\":{},\"proc\":{\"memory\":{\"total\":17177939968,\"cmax\":18363625472,\"amax\":20000000000},\"cpu\":{\"cores\":[0.788,0.132,0.319,2.951,10.111,3.309,1.43,0.8,2.705,4.203,2.32,2,0.019,0.172,0.247,3.888,0.282,0.423,5.254,0.258,0.009,0.369,3.277,0.048,0.283,7.574,3.086,1.592,0.191,0.166,4.348,0.391,0.085,0.25,7.12,4.927,3.671,1.147,3.216,4.628,0.131,0.995,0.744,4.252,4.022,3.505,3.758,3.491],\"total\":108.886,\"limit\":800},\"disk\":{\"used\":20170,\"limit\":50000,\"io_limit\":500}}}";
    const char json3[] = "{\"Name\":\"Tom\",\"Age\":18,\"Address\":\"California\",\"arr\":[1,2,3,4,5]}";
    
    static void simplifyArray(cJSON *input, cJSON *output)
    {  
        cJSON *item;
        size_t noElems = 0;
        
        if (cJSON_IsString(cJSON_GetArrayItem(input, 0))) {
            size_t max = 0;
            size_t min = (size_t)-1; /* largest size_t value; UINT_MAX can be narrower than size_t */
            cJSON_ArrayForEach(item, input) {
                noElems++;
                size_t len = strlen(cJSON_GetStringValue(item));
                if (len > max) max = len;
                if (len < min) min = len;
            }
            cJSON *newArray = cJSON_AddArrayToObject(output, input->string);
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(noElems));
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(max));
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(min));
    
        } else if (cJSON_IsNumber(cJSON_GetArrayItem(input, 0))) {
            double max, min;
            max = -DBL_MAX;
            min = DBL_MAX;
            cJSON_ArrayForEach(item, input) {
                noElems++;
                double value = item->valuedouble;
                if (value > max) max = value;
                if (value < min) min = value;
            }
            cJSON *newArray = cJSON_AddArrayToObject(output, input->string);
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(noElems));
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(max));
            cJSON_AddItemToArray(newArray, cJSON_CreateNumber(min));
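        } else {
            /* Empty arrays and arrays of other types have no rule: copy them unchanged
               so the key is not silently dropped from the output. */
            cJSON_AddItemToObject(output, input->string, cJSON_Duplicate(input, true));
        }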
    }
    
    static void simplify(cJSON *input, cJSON *output)
    {
        cJSON *elem;
        for (elem = input; elem != NULL; elem = elem->next) {
            if (cJSON_IsString(elem)) {
                cJSON_AddNumberToObject(output, elem->string, strlen(cJSON_GetStringValue(elem)));
            } else if (cJSON_IsArray(elem)) {
                simplifyArray(elem, output);
            } else if (cJSON_IsObject(elem)) {
                cJSON *newOutput = cJSON_AddObjectToObject(output, elem->string);
                simplify(elem->child, newOutput);
            } else {
                cJSON *dup = cJSON_Duplicate(elem, true);
                cJSON_AddItemToObject(output, elem->string, dup);
            }
        }
    }
    
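    static void simplifyAndPrint(const char *json)
    {
        cJSON *input = cJSON_Parse(json);
        if (input == NULL) {
            fprintf(stderr, "invalid JSON\n");
            return;
        }
        cJSON *output = cJSON_CreateObject();
        simplify(input->child, output);
        /* cJSON_PrintUnformatted returns a heap-allocated string that the caller must free. */
        char *text = cJSON_PrintUnformatted(output);
        if (text != NULL) {
            printf("%s\n", text);
            cJSON_free(text);
        }
        cJSON_Delete(input);
        cJSON_Delete(output);
    }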
    
    int main(void)
    {
        simplifyAndPrint(json1);
        simplifyAndPrint(json2);
        simplifyAndPrint(json3);
        return 0;
    }
    

    The output:

    {"field1":9,"whatever":10,"description":4,"id":6,"__oh__":{"session":18}}
    {"status":1,"query":{},"proc":{"memory":{"total":17177939968,"cmax":18363625472,"amax":20000000000},"cpu":{"cores":[48,10.111,0.009],"total":108.886,"limit":800},"disk":{"used":20170,"limit":50000,"io_limit":500}}}
    {"Name":3,"Age":18,"Address":10,"arr":[5,5,1]}
    

    In the example above I preferred not to alter the input JSON. If you don't care about that, you can use the function cJSON_ReplaceItemInObject to substitute the node in place.

    P.S.: I am assuming each array contains only strings or only numbers, never a mix, because there is no rule for handling other array configurations.

    P.S.2: This code uses the version of the library shipped with Ubuntu 20.04. If you download the library from GitHub, that version will contain more features.