regex awk sed grep

Merge multi-line cell in double quotes


I have a TSV (tab-separated) file with two columns. The first column is a word (or group of words) and the second column is its meaning.

test file

test    try
test    "a short exam to measure somebody's knowledge 
or skill in something."
testing examine

I am trying to merge the second and third lines, because together they form a single cell enclosed in double quotes.

Expected Output

test    try
test    "a short exam to measure somebody's knowledge or skill in something."
testing examine

I tried this:

awk -v FS='\t' -v OFS='\t' '{print $1, $2}' test.tsv
test    try
test    "a short exam to measure somebody's knowledge
or skill in something."
testing examine

But it does not merge lines 2 and 3. I also tried "patsplit", but that merged all the lines together.

awk 'BEGIN { FS=OFS="\t"}
{
    if (patsplit($0,a,/"[^"]+"/,s)) {
        gsub(/\n/,"",a[1])
        printf "%s%s%s", s[0],a[1],s[1]
    }
    else
        printf "%s", $0
    printf ";"
}' test.tsv

I need to keep the tab-separated format of the original file. The only change required is to join text that sits between a pair of double quotes onto a single line.


Solution

  • This might work for you (GNU sed):

    sed ':a;N;/\n[^\t]*$/s/\n//;ta;P;D' file
    

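    N appends the next input line, so the pattern space holds a two-line window. If the second line of that window contains no tab character, remove the newline between the two lines and loop back to append another line.

    Otherwise, print the first line, delete it, and repeat with what remains.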
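The one-liner above can also be written one command per line, which may make the loop easier to follow. This is only a sketch under a few assumptions: GNU sed (for `\t` inside the bracket expression), the sample data from the question recreated with printf, and a hypothetical output file name merged.tsv.

```shell
# Recreate the sample file from the question; printf expands \t and \n.
printf "test\ttry\n" > test.tsv
printf "test\t\"a short exam to measure somebody's knowledge \nor skill in something.\"\n" >> test.tsv
printf "testing\texamine\n" >> test.tsv

# Same commands as the one-liner, one per line (GNU sed assumed).
sed '
# Loop target.
:a
# Append the next input line to the pattern space, after a newline.
N
# If the appended line holds no tab, treat it as a continuation
# and remove the newline that separates it from the previous line.
/\n[^\t]*$/ s/\n//
# If that substitution happened, jump back to :a and append again.
ta
# Otherwise print the first line of the window, then delete it
# and restart the script with the remaining line.
P
D
' test.tsv > merged.tsv

cat merged.tsv
```

Note that this relies on continuation lines never containing a tab, not on the quotes themselves, and that the two pieces are joined without inserting a space (the sample line already ends in one).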