regex python-3.x for-loop explode lateral

How to loop through a line of a file using regex matches as the loop variable


I am trying to write something like an explode function for a JSON file. The loop should read the file line by line. Each line holds several values that I want to extract and combine with the rest of the line (like LATERAL VIEW or the explode function in SQL).

The data looks like this:

{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}

What I want is to explode this, as in SQL, into:

{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key3":589101}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key4":23095}
{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key5":200527}


    import io
    import sys
    import re

    i = 0
    with io.open('lateral_result.json', 'w', encoding="utf-8") as f, io.open('lat.json', encoding="utf-8") as g:
        for line in g:
            x = re.search('(.*wl_timestamp":"[^"]+",)', line)
            y = re.search('("wl_key[^,]+),', line)
            for y in line:
                i = i + 1
                print (x.group(0), y.group(i),'}', file=f)

I keep getting an error saying that a str has no group. When I move the regex into the inner for loop, it only returns the first result and does nothing else. With another variant, it writes the same result once for every character in the line.


Solution

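  • The error occurs because "for y in line" rebinds y to each character of the line, and a str has no .group() method; re.search also returns only the first match. Rather than patching the regex, don't use regex on JSON: parse it with the json module and work on the resulting data structure: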

    import json
    
    data_str = """{"wl_id":0,"wl_customer_id":0,"wl_webpage_name":"webpage#00","wl_timestamp":"2013-01-27 16:07:02","wl_key2":103717,"wl_key3":589101,"wl_key4":23095,"wl_key5":200527,"wl_key6":60319}"""
    
    data = json.loads(data_str)  # for a file with one object per line, call json.loads(line) on each line
    
    print(data)
    
    for k in (x for x in data.keys() if x.startswith("wl_key")):
        print(data["wl_timestamp"],k,data[k])
    

    Output of the loop (the line printed by print(data) is omitted):

    2013-01-27 16:07:02 wl_key2 103717
    2013-01-27 16:07:02 wl_key3 589101
    2013-01-27 16:07:02 wl_key4 23095
    2013-01-27 16:07:02 wl_key5 200527
    2013-01-27 16:07:02 wl_key6 60319
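
The same idea can be extended to produce the exploded JSON lines the question asks for. The following is only a minimal sketch, assuming the input file holds one JSON object per line and that every key starting with "wl_key" should be exploded while all other keys are repeated on each output line. The function names explode and explode_file and the prefix parameter are made up for this illustration.

```python
import json


def explode(record, prefix="wl_key"):
    """Yield one dict per key starting with `prefix` (hypothetical helper).

    Each yielded dict carries all non-prefixed fields plus exactly one
    of the prefixed key/value pairs.
    """
    # Fields shared by every exploded row (everything that is not a wl_key*).
    common = {k: v for k, v in record.items() if not k.startswith(prefix)}
    for key, value in record.items():
        if key.startswith(prefix):
            yield {**common, key: value}


def explode_file(src_path, dst_path, prefix="wl_key"):
    """Read JSON lines from src_path, write exploded JSON lines to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for row in explode(json.loads(line), prefix):
                # Compact separators keep the output close to the input style.
                dst.write(json.dumps(row, separators=(",", ":")) + "\n")
```

With the file names from the question, this would be called as explode_file("lat.json", "lateral_result.json"). Each input line then yields one output line per wl_key* entry, including wl_key6. Because the lines are parsed rather than matched with a pattern, values containing commas or quotes do not break the split.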