
Need to parse a file and create a data structure out of it


We want to parse a file and build a data structure from it for later use (in Python). The content of the file looks like this:

plan HELLO
   feature A 
       measure X :
          src = "Type ,N ame"
       endmeasure //X

       measure Y :
        src = "Type ,N ame"
       endmeasure //Y

       feature Aa
           measure AaX :
              src = "Type ,N ame"
           endmeasure //AaX

           measure AaY :
              src = "Type ,N ame"
           endmeasure //AaY
           
           feature Aab
              .....
           endfeature // Aab
         
       endfeature //Aa
 
   endfeature // A
   
   feature B
     ......
   endfeature //B
endplan

plan HOLA
endplan //HOLA

So the file contains one or more plans, and each plan contains one or more features. Each feature contains measures that hold info (src, type, name), and a feature can also contain further features.

We need to parse the file and create a data structure with the following shape:

                     plan (HELLO) 
            ------------------------------
             ↓                          ↓ 
          Feature A                  Feature B
  ----------------------------          ↓
   ↓           ↓             ↓           ........
Measure X    Measure Y    Feature Aa
                         ------------------------------
                            ↓           ↓             ↓ 
                       Measure AaX   Measure AaY   Feature Aab
                                                        ↓
                                                        .......

I am trying to parse the file line by line and create a list of lists that would contain plan -> feature -> (measure, feature).


Solution

  • Here is a function that turns the file's text (passed in as a string) into a nested dictionary:

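    def getplans(s):
        stack = [{}]  # stack[0] is the root; stack[-1] is the block being filled
        for line in s.splitlines():
            if "=" in line:  # leaf: key = "value"
                key, value = line.split("=", 1)
                stack[-1][key.strip()] = value.strip(' "')
            elif line.strip()[:3] == "end":  # endplan / endfeature / endmeasure
                stack.pop()
            elif line.strip():  # opening line such as "feature A" or "measure X :"
                collection, name, *_ = line.split()
                stack.append({})
                # register the new block in its parent, e.g. under "features"
                stack[-2].setdefault(collection + "s", {})[name] = stack[-1]
        return stack[0]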
    

    Here is an example call:

    s = """plan HELLO
       feature A 
           measure X :
              src = "Type, Name"
           endmeasure //X
    
           measure Y :
            src = "Type, Name"
           endmeasure //Y
    
           feature Aa
               measure AaX :
                  src = "Type, Name"
               endmeasure //AaX
    
               measure AaY :
                  src = "Type, Name"
               endmeasure //AaY
               
               feature Aab
                    measure Car :
                      src = "Model, Make"
                   endmeasure //car
               endfeature // Aab
             
           endfeature //Aa
     
       endfeature // A
       
       feature B
           measure Hotel :
              src = "Stars, Reviews"
           endmeasure //Hotel
        endfeature //B
    endplan
    
    plan HOLA
    endplan //HOLA
    """
    
    import json
    print(json.dumps(getplans(s), indent=4))
    

    The output:

    {
        "plans": {
            "HELLO": {
                "features": {
                    "A": {
                        "measures": {
                            "X": {
                                "src": "Type, Name"
                            },
                            "Y": {
                                "src": "Type, Name"
                            }
                        },
                        "features": {
                            "Aa": {
                                "measures": {
                                    "AaX": {
                                        "src": "Type, Name"
                                    },
                                    "AaY": {
                                        "src": "Type, Name"
                                    }
                                    }
                                },
                                "features": {
                                    "Aab": {
                                        "measures": {
                                            "Car": {
                                                "src": "Model, Make"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    },
                    "B": {
                        "measures": {
                            "Hotel": {
                                "src": "Stars, Reviews"
                            }
                        }
                    }
                }
            },
            "HOLA": {}
        }
    }
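
Once the data is in this nested shape, it can be walked recursively. The sketch below is only an illustration: `iter_measures` is a hypothetical helper (not part of the function above), and `plans` is a small hand-written dictionary in the same layout as the output, so the snippet runs on its own.

```python
def iter_measures(node, path=()):
    """Yield (path, measure_dict) for every measure found below node."""
    # Measures stored directly in this block.
    for name, measure in node.get("measures", {}).items():
        yield path + (name,), measure
    # Recurse into nested plans and features.
    for kind in ("plans", "features"):
        for name, child in node.get(kind, {}).items():
            yield from iter_measures(child, path + (name,))


# Hand-written sample in the same layout as the parser's output.
plans = {
    "plans": {
        "HELLO": {
            "features": {
                "A": {
                    "measures": {"X": {"src": "Type, Name"}},
                    "features": {
                        "Aa": {"measures": {"AaX": {"src": "Type, Name"}}}
                    },
                }
            }
        },
        "HOLA": {},
    }
}

for path, measure in iter_measures(plans):
    print("/".join(path), "->", measure["src"])
```

Each yielded path is the chain of plan and feature names leading to the measure, which may be handy if a flat list is easier to work with than the nested dictionary.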
    

    If your input has some other syntax -- not included in your question -- you'll probably need to tune the script further to deal with that.
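
As one possible direction for such tuning, here is a hedged variation on the same stack idea that reads from any iterable of lines (for example an open file object) instead of a string. The name `parse_plans` is made up for this sketch, and two behaviours are assumptions on my part rather than anything stated in the question: one-word placeholder lines such as `.....` are skipped, and unbalanced `end...` lines raise a `ValueError`.

```python
import io


def parse_plans(lines):
    """Stack-based parser fed from any iterable of lines (e.g. a file object)."""
    root = {}
    stack = [root]  # stack[-1] is the block currently being filled
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if "=" in line:  # leaf: key = "value"
            key, value = line.split("=", 1)
            stack[-1][key.strip()] = value.strip(' "')
        elif line.startswith("end"):  # endplan / endfeature / endmeasure
            if len(stack) == 1:
                raise ValueError(f"line {lineno}: unmatched {line.split()[0]!r}")
            stack.pop()
        else:
            parts = line.split()
            if len(parts) < 2:
                continue  # assumption: ignore placeholders like "....."
            collection, name = parts[0], parts[1]
            child = {}
            stack[-1].setdefault(collection + "s", {})[name] = child
            stack.append(child)
    if len(stack) != 1:
        raise ValueError("unexpected end of input: a block was never closed")
    return root


# io.StringIO stands in for a real file here; with a file on disk you could
# write `with open(path) as f: result = parse_plans(f)` instead.
sample = io.StringIO(
    "plan HELLO\n"
    "  feature A\n"
    "    measure X :\n"
    '      src = "Type, Name"\n'
    "    endmeasure //X\n"
    "    .....\n"
    "  endfeature // A\n"
    "endplan\n"
)
result = parse_plans(sample)
print(result)
```

Iterating over the file object keeps only one line in memory at a time, which matches the line-by-line approach described in the question.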