
How to remove text inside a .jbeam text file without using json?


I cannot use a standard JSON parser because I'm working with .jbeam files, which are based on JSON but follow a custom format used internally by BeamNG. The format allows things that make standard JSON parsers fail, such as comments and omitted commas between elements.
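
To make the failure concrete, here is a minimal sketch. The fragment below is a made-up jbeam-style snippet (not taken from a real BeamNG file), and `parses_as_json` is a hypothetical helper name used only for illustration.

```python
import json

# Hypothetical jbeam-style fragment: it contains a // comment and omits the
# comma between the two rows, both of which strict JSON forbids.
jbeam_like = '''
{
    // node table
    "nodes": [
        ["id", "posX", "posY", "posZ"]
        ["ref", 0, 0, 0]
    ]
}
'''

def parses_as_json(text):
    """Return True if the standard json module accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(parses_as_json(jbeam_like))       # rejected by the strict parser
print(parses_as_json('{"nodes": []}'))  # plain JSON is accepted
```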

The task is relatively simple: replace the contents of the nodes property with new text in a specific format while preserving the formatting of the rest of the file (which is quite large; I'm only showing a simplified example here). However, round-tripping the file through a JSON parser alters the formatting of the other properties as well. I attempted the solution outlined in this thread, but it quickly became cumbersome, because I had to wrap everything to preserve the formatting, which doesn't make sense for such a straightforward task. The original formatting matters: without it, a .jbeam file expands to many more lines and becomes much harder to read.

After hours of struggling with various methods, I ultimately decided to take the simplest approach by using basic text replacement. However, even for simple operations like removing text, I'm encountering issues. I know this task could be easily handled with a bash script, but I’m curious why it's proving to be so difficult with Python. How can I efficiently remove the contents of the nodes property in this example using Python?

sample_jbeam = '''
{
    "partname": {
        "refNodes": [
            ["ref:", "back:", "left:", "up:", "leftCorner:", "rightCorner:"],
            ["ref", "", "", "", "", ""]
        ],
        "nodes": [
            ["id", "posX", "posY", "posZ"],
            ["ref", 0, 0, 0],
            ["b1", 1.0, 1.0, 1.0],
        ],
        "beams": [
        ],
    }
}
'''

I want to remove the contents inside "nodes" so that I'm left with:

{
    "partname": {
        "refNodes": [
            ["ref:", "back:", "left:", "up:", "leftCorner:", "rightCorner:"],
            ["ref", "", "", "", "", ""]
        ],
        "nodes": [],
        "beams": [
        ],
    }
}

I cannot use Python's json loader because jbeam is not really JSON, and besides, json would change the formatting of the original file.

Failed Attempt 1: RegEx

I tried RegEx but it doesn't work:


import re

pattern = r'("nodes":\s*)\[.*?\]'
cleaned_text = re.sub(pattern, r'\1[]', sample_jbeam, flags=re.DOTALL)
print(cleaned_text)  # the data rows of "nodes" are still in the output
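
One likely reason this falls short: the non-greedy `.*?` stops at the first `]` it finds, and that bracket closes the header row rather than the whole nodes array. The sketch below, which uses a shortened copy of the sample, prints what the pattern actually matches.

```python
import re

# Shortened copy of the "nodes" part of the sample above.
sample = '''"nodes": [
    ["id", "posX", "posY", "posZ"],
    ["ref", 0, 0, 0],
],'''

match = re.search(r'"nodes":\s*\[.*?\]', sample, flags=re.DOTALL)

# The match ends at the first ']', i.e. right after the header row,
# so the remaining rows and the real closing bracket are left behind.
print(match.group(0))
```

Regular expressions cannot count nested brackets in general, which is why a small bracket-matching loop tends to be more reliable for this kind of edit.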

Failed Attempt 2: io.StringIO

I tried io.StringIO(), as in my answer post, but it stopped working once I changed the jbeam formatting.

Failed Attempt 3: json Preprocessor

I tried this amazing JSON preprocessor, which works really well at turning the jbeam into valid JSON, but unfortunately it also changes the original formatting.


Solution

  • Here's a solution, written with the help of an expert friend of mine, mgerhardy, that removes the contents and inserts new content using io.StringIO():

    import io
    
    class JBeamProcessor:
        def __init__(self, json_data):
            self.json_data = json_data
            self.input_stream = None
            self.output_stream = io.StringIO()
            self.modified_data = json_data
    
        def remove_node_contents(self, key):
            self.output_stream = io.StringIO()
            self.input_stream = io.StringIO(self.modified_data)
    
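            # Bracket depth inside the array being removed; 0 means outside it.
            # Starting at 0 lets the key be found even before the first ']'.
            depth = 0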
            skipping = False
            key_buffer = []
            inside_string = False
    
            while True:
                ch = self.input_stream.read(1)
                if not ch:
                    break
    
                if ch == '"':
                    inside_string = not inside_string
    
                if inside_string and depth == 0:
                    key_buffer.append(ch)
                    if len(key_buffer) > 255:
                        key_buffer = key_buffer[:255]
    
                if not inside_string and key_buffer:
                    key_str = ''.join(key_buffer)
                    key_buffer = []
                    if key_str[1:] == key:
                        skipping = True
                        self.output_stream.write('"' + ':' + ' ')
    
                if ch == '[' and not inside_string:
                    if skipping:
                        depth += 1
                        if depth == 1:
                            self.output_stream.write('[')
                        continue
    
                if ch == ']' and not inside_string:
                    if depth > 0:
                        depth -= 1
                        if depth == 0:
                            skipping = False
    
                if not skipping:
                    self.output_stream.write(ch)
    
            return self.output_stream.getvalue()
    
        def get_key_indent(self, key):
            current_pos = self.input_stream.tell()
            self.input_stream.seek(0)
    
            for line in self.input_stream:
                stripped_line = line.lstrip()
                if stripped_line.startswith(f'"{key}"'):
                    indent = len(line) - len(stripped_line)
                    self.input_stream.seek(current_pos)
                    return indent
    
            self.input_stream.seek(current_pos) 
            return -1
    
        def insert_node_contents(self, key, new_contents):
            self.remove_node_contents(key)
            spaces = self.get_key_indent(key)
            indent = " " * spaces
            result = self.get_result()
            # Indent every new line one level (4 spaces, assumed) deeper than the key.
            indented_contents = "\n".join(indent + "    " + line for line in new_contents.splitlines())
            result = result.replace(f'"{key}": []', f'"{key}": [\n{indented_contents}\n{indent}]')
            self.modified_data = result
            return result
    
        def get_result(self):
            return self.output_stream.getvalue()
    

    Usage:

    with open(jbeam_filepath, "r", encoding="utf-8") as f:
        existing_data_str = f.read()
        processor = JBeamProcessor(existing_data_str)
        existing_data_str = processor.insert_node_contents("nodes", "your replacement text or nodes")
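
For the removal step alone, a shorter alternative can be sketched with plain string indexing and a bracket counter. This is not the class above, just an illustration under stated assumptions: `clear_array` is a hypothetical name, the first occurrence of `"key"` is assumed to be the property itself, and the text is assumed to have no escaped quotes and no quotes or brackets inside comments.

```python
def clear_array(text, key):
    """Replace the array that follows "key" with [] and leave the rest untouched."""
    key_pos = text.find(f'"{key}"')
    if key_pos == -1:
        return text
    start = text.find('[', key_pos)
    if start == -1:
        return text

    depth = 0
    in_string = False
    for i in range(start, len(text)):
        ch = text[i]
        if ch == '"':
            in_string = not in_string  # assumption: no escaped quotes
        elif not in_string:
            if ch == '[':
                depth += 1
            elif ch == ']':
                depth -= 1
                if depth == 0:
                    # i is the bracket that closes the array opened at start
                    return text[:start] + '[]' + text[i + 1:]
    return text  # unbalanced brackets: leave the input unchanged


sample = '''{
    "partname": {
        "refNodes": [
            ["ref", "", "", "", "", ""]
        ],
        "nodes": [
            ["id", "posX", "posY", "posZ"],
            ["b1", 1.0, 1.0, 1.0],
        ],
        "beams": [
        ],
    }
}'''

result = clear_array(sample, "nodes")
print(result)
```

Because only the slice between the matching brackets is replaced, every other character of the file, including comments and trailing commas, stays exactly as it was.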