I'm trying to parse an .ldif file but can't get the desired output. Any help is much appreciated.
Here is what I'm doing in Python:
lines = open("use.ldif", "r").read().split("\n")
for i, line in enumerate(lines):
    if not line.find(":"):
        lines[i-1] = lines[-1].strip() + line
        lines.pop(i)
open("user_modified.ldif", "w").write("\n".join(lines)+"\n")
use.ldif (input file)
dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: cdsUser
objectclass: organizationalPerson
objectclass: Person
objectclass: n
objectclass: Top
objectclass: cd
objectclass: D
objectclass: nshd shdghsf shgdhfjh jhghhghhgh
hjgfhgfghfhg
street: shgdhgf
dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: hjgfhgfghfhg
street: shgdhgf kjsgdhgsjhg shdghsgjfhsfsf
jgsdhsh
company: xyz
user_modified.ldif (Output from my code)
I get the same output as the input; nothing is modified. I suspect it's because I'm using split("\n"), but I can't think of what else to do.
desired output
dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: cdsUser
objectclass: organizationalPerson
objectclass: Person
objectclass: n
objectclass: Top
objectclass: cd
objectclass: D
objectclass: nshd shdghsf shgdhfjh jhghhghhghhjgfhgfghfhg
street: shgdhgf
dn: cnh
changetype: add
objectclass: inetOrgPerson
objectclass: hjgfhgfghfhg
street: shgdhgf kjsgdhgsjhg shdghsgjfhsfsfjgsdhsh
company: xyz
As you can see, in my output file user_modified.ldif the objectclass value in the first entry and the street value in the second entry wrap onto the next line.
How can I get each of them on a single line, as in the desired output?
Thanks in advance.
Okay, here is my approach:
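import re

# A line that contains "name:" is treated as the start of an attribute.
pattern = re.compile(r"(\w+):(.*)")

with open("use.ldif", "r") as f:
    new_lines = []
    for line in f:
        # Drop the trailing newline, if any.
        if line.endswith('\n'):
            line = line[:-1]
        # Keep blank lines as they are.
        if line == "":
            new_lines.append(line)
            continue
        l = pattern.search(line)
        if l:
            new_lines.append(line)
        else:
            # No "name:" found: glue this line onto the previous one.
            new_lines[-1] += line

with open("user_modified.ldif", "wt") as f:
    f.write("\n".join(new_lines))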
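As for why the original loop changed nothing, a likely cause is the test `not line.find(":")`. `str.find` returns -1 when the substring is absent, and -1 is truthy, so the condition is only true for a line that starts with a colon. A small check, using two lines from the input file as sample strings (the variable names are only illustrative):

```python
# str.find returns the index of the first match, or -1 when there is none.
attribute_line = "street: shgdhgf"
continuation_line = "jgsdhsh"

find_attribute = attribute_line.find(":")        # index of the colon
find_continuation = continuation_line.find(":")  # -1, because there is no colon

# -1 is truthy, so the original test is False for a continuation line
# and the merge branch never runs.
old_test_fires = not continuation_line.find(":")

# A membership test expresses the intended check directly.
is_continuation = ":" not in continuation_line
```

With `":" not in line` the continuation lines would at least be detected, although removing items from a list while enumerating it can still skip lines.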
Looking at your code, I suggest reading up on how to iterate over files. Maybe you are still a beginner with Python, but your code processes the whole file three times: at read(), at split('\n'), and finally in the for statement. When you open a file, what you get is a file object, and as you can see in my code, you can iterate over it directly to get one line per step. For larger files this makes a significant difference in memory use and speed.
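To take the streaming idea one step further, the merged lines can also be written out as they are produced instead of being collected in a list first. The following is only a sketch: `unfold_lines`, `unfold_file` and the `ATTRIBUTE` pattern are hypothetical names, and the rule "a line that does not start with `name:` is a continuation" is an assumption based on the sample file. In standard LDIF (RFC 2849) a continuation line starts with a single space, which would be a more reliable test if the real file has it.

```python
import re

# Assumption: an attribute line starts with "name:" (letters, digits, "_",
# ";" or "-"); any other non-blank line continues the previous one.
ATTRIBUTE = re.compile(r"[\w;-]+:")


def unfold_lines(lines):
    """Yield logical lines, joining each continuation onto the line before it."""
    pending = None  # the logical line currently being assembled
    for raw in lines:
        line = raw.rstrip("\n")
        # A blank line, an attribute line, or the first line after a blank
        # (or at the very start) begins a new logical line.
        starts_new = line == "" or ATTRIBUTE.match(line) or not pending
        if starts_new:
            if pending is not None:
                yield pending
            pending = line
        else:
            pending += line
    if pending is not None:
        yield pending


def unfold_file(src, dst):
    """Stream src to dst one logical line at a time."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        for logical in unfold_lines(fin):
            fout.write(logical + "\n")
```

Calling `unfold_file("use.ldif", "user_modified.ldif")` would then hold only one logical line in memory at any time, and `unfold_lines` can be tried on a plain list of strings without touching the disk.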