Using sed -i within a loop

I'm reformatting a big file with sample metadata. I have a file (let's call it File2) listing the group each sample belongs to, with one id and pop per line. My idea was to loop over that file with while read and use sed -i to update each sample's info. The issue is that sed is not updating the file.

The input file is a .fam file from plink, in this fashion:

pop id 0 0 0 -9
pop id 0 0 0 -9
pop id 0 0 0 -9
pop id 0 0 0 -9

Right now pop and id are the same, so I want to update the file with File2, but the sed code I normally use for this doesn't seem to work:

while read -r id pop; do sed -i 's/^$id/$pop/' File1.fam; done < File2.txt

I have tried only the sed command without iteration and it works fine. But I have 700 samples and I would dread having to do this one by one.

Why is it not working?

Solution
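
As for why the loop leaves the file untouched: the sed script is in single quotes, so the shell never expands $id and $pop, and sed searches for that literal text instead of the values. Double quotes fix the expansion. A minimal sketch, assuming GNU sed (for -i without a suffix) and IDs that contain no regex metacharacters or slashes; the scratch directory and sample rows are made up for illustration:

```shell
# Hypothetical sample data in a scratch directory (pop and id start out identical)
workdir=$(mktemp -d)
printf '%s\n' 'id1 id1 0 0 0 -9' 'id2 id2 0 0 0 -9' > "$workdir/File1.fam"
printf '%s\n' 'id1 POP001' 'id2 POP002' > "$workdir/File2.txt"

# Double quotes let the shell expand $id and $pop before sed sees the script.
# The trailing space keeps id1 from also matching a longer ID such as id10.
# Assumes GNU sed; other sed implementations may need a suffix argument after -i.
while read -r id pop; do
    sed -i "s/^$id /$pop /" "$workdir/File1.fam"
done < "$workdir/File2.txt"

cat "$workdir/File1.fam"
```

This still rewrites the whole file once per sample, so a single pass with awk, as described in the rest of this answer, is likely the better fit for 700 samples.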

Assuming that your files are formatted as follows:

$ cat file1.fam
pop id1 0 0 0 -9
pop id2 0 0 0 -9
pop id3 0 0 0 -9

$ cat file2.txt
id3   POP003
id2   POP002
id1   POP001

If your goal is to replace the 1st column in file1.fam with the values from the 2nd column of file2.txt, using the id* values for matching, you can:

Read file2.txt into a map: map[id] = pop.
Iterate over file1.fam and replace the 1st field with map[id], where id is taken from the 2nd field.

E.g.,

awk 'NR==FNR { map[$1]=$2; next } { if ($2 in map) $1 = map[$2]; print }' \
    file2.txt OFS=' ' file1.fam

In the command above, awk reads the two files sequentially: file2.txt, then file1.fam. While it reads file2.txt, the overall record number NR equals the per-file record number FNR; once it moves on to file1.fam, FNR restarts at 1 while NR keeps counting. The following example (with the files in the opposite order) shows the difference:

$ awk '{print FNR, NR, $0}' file1.fam file2.txt
1 1 pop id1 0 0 0 -9
2 2 pop id2 0 0 0 -9
3 3 pop id3 0 0 0 -9
1 4 id3   POP003
2 5 id2   POP002
3 6 id1   POP001

The NR==FNR block fills the map with keys from the first column (IDs) and values from the second (pop values). For the remaining lines, those of file1.fam, the first column (pop) is replaced with the matching value from the map, if any.
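
To see the whole thing end to end, here is a self-contained run on the sample data from above, written to a scratch directory. The id4 row is an extra, made-up sample with no entry in file2.txt, added only to show that unmatched rows pass through unchanged:

```shell
# Recreate the sample files in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' 'pop id1 0 0 0 -9' 'pop id2 0 0 0 -9' \
              'pop id3 0 0 0 -9' 'pop id4 0 0 0 -9' > "$workdir/file1.fam"
printf '%s\n' 'id3   POP003' 'id2   POP002' 'id1   POP001' > "$workdir/file2.txt"

# First file: build map[id] = pop. Second file: swap field 1 when the id is known.
result=$(awk 'NR==FNR { map[$1]=$2; next }
              { if ($2 in map) $1 = map[$2]; print }' \
    "$workdir/file2.txt" OFS=' ' "$workdir/file1.fam")

printf '%s\n' "$result"
# POP001 id1 0 0 0 -9
# POP002 id2 0 0 0 -9
# POP003 id3 0 0 0 -9
# pop id4 0 0 0 -9
```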

The result is printed to the standard output. You can redirect it to a file if you wish:

awk ... > output.txt
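
Do not redirect straight back onto file1.fam, though: the shell truncates the output file before awk gets to read it. If you want to update the original, one common pattern is to write to a temporary file and rename it only when awk succeeds. A sketch with made-up sample rows; the .tmp file name is just an illustration:

```shell
# Hypothetical sample data in a scratch directory
workdir=$(mktemp -d)
printf '%s\n' 'pop id1 0 0 0 -9' 'pop id2 0 0 0 -9' > "$workdir/file1.fam"
printf '%s\n' 'id2 POP002' 'id1 POP001' > "$workdir/file2.txt"

# Write to a temporary file first; mv runs only if awk exits successfully,
# so a failed run leaves the original file1.fam intact.
awk 'NR==FNR { map[$1]=$2; next } { if ($2 in map) $1 = map[$2]; print }' \
    "$workdir/file2.txt" "$workdir/file1.fam" > "$workdir/file1.fam.tmp" &&
    mv "$workdir/file1.fam.tmp" "$workdir/file1.fam"

cat "$workdir/file1.fam"
```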

Note that awk splits fields on whitespace (spaces and tabs) by default. If the values in your files may contain spaces, you might need to adjust the field separator (FS) or consider other tools (e.g., Perl). The idea remains the same.
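
For instance, if both files happened to be tab-separated and a pop value contained a space, the same program could be run with a tab as both input and output separator. This is only a sketch: the .tsv file names and the sample rows are invented for illustration.

```shell
# Hypothetical tab-separated inputs; the pop value contains a space
workdir=$(mktemp -d)
printf 'pop\tid1\t0\t0\t0\t-9\n' > "$workdir/file1.tsv"
printf 'id1\tNorth Pop\n' > "$workdir/file2.tsv"

# -F sets the input field separator (FS); OFS is the separator used when
# awk rebuilds a line after a field has been modified.
result=$(awk -F'\t' -v OFS='\t' 'NR==FNR { map[$1]=$2; next }
              { if ($2 in map) $1 = map[$2]; print }' \
    "$workdir/file2.tsv" "$workdir/file1.tsv")

printf '%s\n' "$result"
```

With a tab as the separator, "North Pop" stays a single field instead of being split at the space.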