python parsing ldif

Append current line to previous line


I'm trying to parse an .ldif file but can't get the desired output. Any help is much appreciated.

Here is what I'm doing in Python:

lines = open("use.ldif", "r").read().split("\n")
for i, line in enumerate(lines):
   if not line.find(":"):
      lines[i-1] = lines[-1].strip() + line
      lines.pop(i)

open("user_modified.ldif", "w").write("\n".join(lines)+"\n")

use.ldif (input file)

dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: cdsUser
objectclass: organizationalPerson
objectclass: Person
objectclass: n
objectclass: Top
objectclass: cd
objectclass: D
objectclass: nshd shdghsf shgdhfjh jhghhghhgh
 hjgfhgfghfhg
street: shgdhgf

dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: hjgfhgfghfhg
street: shgdhgf kjsgdhgsjhg shdghsgjfhsfsf
 jgsdhsh
company: xyz

user_modified.ldif (Output from my code)

The output is identical to the input; nothing is modified. I suspect it's because I'm calling split("\n"), but I can't think of what else to do.

desired output

dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: cdsUser
objectclass: organizationalPerson
objectclass: Person
objectclass: n
objectclass: Top
objectclass: cd
objectclass: D
objectclass: nshd shdghsf shgdhfjh jhghhghhghhjgfhgfghfhg
street: shgdhgf

dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: hjgfhgfghfhg
street: shgdhgf kjsgdhgsjhg shdghsgjfhsfsfjgsdhsh
company: xyz

As you can see, in my output file user_modified.ldif the objectclass value in the first entry and the street value in the second entry still wrap onto the next line. How can I join them onto a single line, as in the desired output?

Thanks in advance


Solution

  • Okay, here is my approach:

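    import re

    # An attribute line starts with a name followed by a colon, e.g. "street: ...".
    pattern = re.compile(r"(\w+):(.*)")

    with open("use.ldif", "r") as f:
        new_lines = []

        for line in f:
            # Drop the trailing newline, if any.
            if line.endswith('\n'):
                line = line[:-1]

            # Keep blank lines: they separate the entries.
            if line == "":
                new_lines.append(line)
                continue

            # match() anchors at the start of the line, so a folded line that
            # merely contains a colon is not mistaken for an attribute line.
            if pattern.match(line) or not new_lines:
                new_lines.append(line)
            else:
                # Folded line: drop its leading whitespace and join it to the previous line.
                new_lines[-1] += line.lstrip()

    with open("user_modified.ldif", "w") as f:
        f.write("\n".join(new_lines) + "\n")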
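As for why the loop in the question changes nothing: `str.find` returns an index, or -1 when there is no match, so `not line.find(":")` is only true when the colon is the very first character. The small check below illustrates this with two lines taken from the sample input; the variable names are made up for the illustration. Note also that `lines[-1]` is the last line of the file, not the previous one (`lines[i-1]`), and that calling `pop` while enumerating the same list makes the loop skip elements.

```python
# str.find returns the index of the first match, or -1 when there is none.
attribute_line = "street: shgdhgf"
folded_line = " jgsdhsh"

print(attribute_line.find(":"))   # 6
print(folded_line.find(":"))      # -1

# -1 is truthy, so the condition from the question is False for a folded line.
print(not folded_line.find(":"))  # False

# It only becomes True when the colon sits at index 0.
print(not ":x".find(":"))         # True

# Folded lines in the sample begin with a space, which is simpler to test for.
print(folded_line.startswith(" "))  # True
```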
    

    Looking at your code, I suggest reading up on how to iterate over files. You may still be new to Python, but your code processes the whole file three times: at read(), at split('\n'), and finally in the for statement. When you open a file, what you get back is a file object, and as my code shows, you can iterate over it directly, getting one line at each step. For larger files this becomes an important performance gain.
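To take the advice about iterating over the file one step further, here is a minimal sketch of a generator that joins folded lines lazily, so only one logical line is held in memory at a time. The name `unfold` is hypothetical, and the sketch assumes that every folded line starts with a single space, as in the sample input. It uses an in-memory `io.StringIO` in place of the real file so it can be run as is.

```python
import io


def unfold(lines):
    """Yield logical lines, joining each folded line to the one before it."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(" ") and current is not None:
            # Folded line: drop the single leading space and append.
            current += line[1:]
        else:
            # A new logical line starts, so emit the finished one first.
            if current is not None:
                yield current
            current = line
    # Emit whatever is left once the input is exhausted.
    if current is not None:
        yield current


# Stand-in for the open file; any iterable of lines works the same way.
sample = io.StringIO(
    "dn: cnh\n"
    "street: shgdhgf kjsgdhgsjhg shdghsgjfhsfsf\n"
    " jgsdhsh\n"
    "company: xyz\n"
)
result = list(unfold(sample))
print(result)
```

With real files, the same function can be fed the object returned by `open("use.ldif")`, writing each yielded line plus `"\n"` to `user_modified.ldif` inside a `for` loop.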