javascript regex match capturing-group

Regular Expressions in JavaScript - Find Multiple Lines of Text After a Tag


I have the text below stored in the variable description:

This is a code update

Official Name: None

Pub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021

Agency:  

Reference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm

Citation: WAC 51-52 / WSR 23-02-055

Draft Doc Title: WSR 23-02-055 (#1)

Draft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)

Draft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)

Final Doc Title: 

IECC Com Update(#1)

IECC Res Update (#2)

Final Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)

Final Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)

https://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)

Effective Date:  January 4, 2023

I want to extract the information after the 'Final Doc Title:' tag. It should give me two values: IECC Com Update(#1) and IECC Res Update (#2). The code below extracts the text after the tag up to the next newline character.

//8. Extract Final Doc Title
var final_doc_title = description.search("Final Doc Title:");
if(final_doc_title != -1){
    final_doc_title = description.match(/(?<=^Final Doc Title:)[^\n\r]+/m);
    final_doc_title = final_doc_title?.[0].trim();
}else{
    final_doc_title = '';
}
console.log('Final Doc Title: ' + final_doc_title);

The problem with this code is that it returns an empty string, because there is a newline character right after 'Final Doc Title:', so the values I want are on the following lines:

Final Doc Title:\n
IECC Com Update(#1)\n
IECC Res Update (#2)\n

How can I modify my code to return both lines? Thanks!


Solution

  • You can match those newline characters with \s*, assuming you are not interested in the white space that precedes the text you are looking for.

    If the text you want ends just before the next line that contains a colon (like Final Source Doc: https:....), then you could do the following:

    const description = "This is a code update\n\nOfficial Name: None\n\nPub: https://content.upcodes.co/viewer/washington/wa-mechanical-code-2021\n\nAgency:  \n\nReference: https://web.archive.org/web/20230226234118/https://lawfilesext.leg.wa.gov/law/wsr/agency/BuildingCodeCouncil.htm\n\nCitation: WAC 51-52 / WSR 23-02-055\n\nDraft Doc Title: WSR 23-02-055 (#1)\n\nDraft Source Doc: https://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#1)\n\nDraft Drive: https://drive.google.com/file/d/1pYmwQS3t-ZX-Vyg9yBabtIpXZ7By2G6f/view?usp=share_link ( #1)\n\nFinal Doc Title: \n\nIECC Com Update(#1)\n\nIECC Res Update (#2)\n\nFinal Source Doc: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/20230303022030/https://lawfilesext.leg.wa.gov/law/wsr/2023/02/23-02-055.htm (#2)\n\nFinal Drive: https://web.archive.org/web/20230303022130/https://apps.leg.wa.gov/wac/default.aspx?cite=51-52&full=true&pdf=true (#1)\n\nhttps://web.archive.org/web/2023030302fdfdfg2130/https://apps.legfdg.gov/wac/default.aspx?cite=51-52&fdsfullfdsf=true&pfdsfdf=true  (#2)\n\nEffective Date:  January 4, 2023";
    
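    // Capture every following line that has no colon; the first line with a colon (the next tag) ends the capture.
    const result = description.match(/^Final Doc Title:\s*((?:\s*^(?:[^:\r\n]*)$)*)/m)?.[1];
    // Split the captured block into its non-empty lines.
    const parts = result?.match(/.+/g);
    console.log(parts); // ["IECC Com Update(#1)", "IECC Res Update (#2)"]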