I am trying to compare two text files (Masterfile and usedfile) and write the values that are unique to Masterfile (those that do not appear in usedfile) to a third file (Newdata). Both files have one word per line. Example:
Masterfile content
Johnny
transfer
hello
kitty
usedfile content
transfer
hello
expected output in Newdata
Johnny
kitty
I have two solutions, but both have problems.
Solution 1: This adds markers such as - and + as prefixes to the data in the final output.
import difflib
with open(r'C:\Master_Data.txt','r') as masterfile:
    with open(r'C:\Used_Data.txt','r') as usedfile:
        with open(r'c:\Ready_to_use.txt','w+') as Newdata:
            tempmaster = masterfile.readlines()
            tempusedfile = usedfile.readlines()
            d = difflib.Differ()
            diff = d.compare(tempmaster,tempusedfile)
            for line in diff:
                Newdata.write(line)
Solution 2: I tried using set. It shows the right result when I use a print statement, but I don't know how to write it to a file.
with open(r'C:\Master_Data.txt','r') as masterfile:
    with open(r'C:\Used_Data.txt','r') as usedfile:
        with open(r'c:\Ready_to_use.txt','w+') as Newdata:
            difference = set(masterfile).difference(set(usedfile))
            print difference
Can anyone suggest a fix?
Ok,
1) You can use solution 2 to write to a file by adding this:
difference = set(masterfile).difference(set(usedfile))
[Newdata.write(x) for x in difference]
This is a shorthand way of doing this:
for x in difference:
    Newdata.write(x)
However, this will just write each element of the difference set to the Newdata file. If you use this method, make sure the difference set contains the correct values to start with.
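Two things can make the set contents surprising: sets are unordered, and lines read straight from a file keep their trailing newline, so a last line without a newline will not match the same word followed by one. Here is a minimal sketch, assuming Python 3, with hypothetical helper names (unique_lines, write_unique) that are not part of the original code. It strips newlines before comparing and sorts the result so the output is deterministic:

```python
def unique_lines(master_lines, used_lines):
    """Return, sorted, the words found in master_lines but not in used_lines."""
    # Strip the trailing newline so 'kitty' and 'kitty\n' count as the same word.
    master = {line.rstrip('\n') for line in master_lines}
    used = {line.rstrip('\n') for line in used_lines}
    return sorted(master - used)


def write_unique(master_path, used_path, out_path):
    """Write the words unique to the master file to out_path, one per line."""
    with open(master_path) as masterfile, open(used_path) as usedfile:
        difference = unique_lines(masterfile, usedfile)
    with open(out_path, 'w') as newdata:
        newdata.writelines(word + '\n' for word in difference)
```

Sorting is only one way to get a stable order; if the original order of the master file matters, loop over the master lines instead of using a set for them.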
2) I wouldn't bother using difflib. It is designed to produce human-readable diffs (which is where the - and + prefixes come from), and it's an extra import that isn't required for something small like this.
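For completeness, if you did want to stay with difflib, the prefixes can be filtered rather than written out. This is only a sketch, assuming Python 3 and a hypothetical helper name only_in_master. It relies on the fact that difflib.Differ marks lines found only in the first sequence with a two-character '- ' prefix:

```python
import difflib


def only_in_master(master_lines, used_lines):
    """Return lines that Differ reports as present only in master_lines."""
    diff = difflib.Differ().compare(master_lines, used_lines)
    # Keep only the '- ' entries and drop the two-character prefix.
    return [line[2:] for line in diff if line.startswith('- ')]
```

Differ compares the sequences in order, so a word that appears in both files but at a different relative position may still be reported with a - prefix. The set-based and loop-based approaches do not have that limitation.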
3) This is how I would do it, without any libraries, using only simple comparison statements:
with open(r'Master_Data.txt','r') as masterdata:
    with open(r'Used_Data.txt','r') as useddata:
        with open(r'Ready_to_use.txt','w+') as Newdata:
            usedfile = [ x.strip('\n') for x in list(useddata) ] #1
            masterfile = [ x.strip('\n') for x in list(masterdata) ] #2
            for line in masterfile: #3
                if line not in usedfile: #4
                    Newdata.write(line + '\n') #5
Here's the explanation:
First I just opened all the files like you did and only changed the names of the variables. Now, here are the pieces that I've changed:
#1 - This is a shorthand way of looping through each line in the Used_Data.txt file and removing the \n at the end of each line, so we can compare the words properly.
#2 - This does the same thing as #1, except with the Master_Data.txt file.
#3 - I loop through each line in the Master_Data.txt file.
#4 - I check whether the current line from the masterfile list is missing from the usedfile list.
#5 - If the if statement is true, then the line from Master_Data.txt we are checking does not appear in Used_Data.txt, so we write it to the Ready_to_use.txt file using the call Newdata.write(line + '\n'). We need the '\n' at the end so that the next write starts on a new line.
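If the files get large, checking "line not in usedfile" against a list rescans the whole list for every master line. A possible variation, sketched here for Python 3 with a hypothetical function name write_unused, keeps the same loop but stores the used words in a set for faster lookups while preserving the order of the master file:

```python
def write_unused(master_path, used_path, out_path):
    """Copy lines from the master file that do not appear in the used file."""
    with open(used_path) as useddata:
        # A set gives fast membership tests; newlines are stripped as in #1.
        used = {line.rstrip('\n') for line in useddata}
    with open(master_path) as masterdata, open(out_path, 'w') as newdata:
        for line in masterdata:
            word = line.rstrip('\n')
            if word not in used:
                newdata.write(word + '\n')
```

With the example files from the question, write_unused('Master_Data.txt', 'Used_Data.txt', 'Ready_to_use.txt') would write Johnny and kitty, in that order.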