javascript adobe-javascript

JavaScript regex "invalid quantifier" error when finding an 8-digit number in a PDF


I have the following JavaScript code:

/* Extract pages to folder */

// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;

// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");

try {for (var i = 0; i < this.numPages; i++)

    var id = /\ (?<!\d)\d{8}(?!\d)/;
    console.println(id);

    this.extractPages({

    nStart: i,

    cPath: "/J/my file path/" + "SBIC_" + id + ".pdf"

    });        

} catch (e) { console.println("Aborted: " + e) }

I get an "invalid quantifier" error on this line of code: var id = /\ (?<!\d)\d{8}(?!\d)/;

However, the same regex matches the ID 22001188 when I test it on https://regex101.com/ against the string "I.D. Control 22001188".

Do I need to apply the regex differently in the code so that it searches the text of the document?


UPDATED 1/30/2023: I am now using the regex below to find the 8-digit ID I need. First I put each page's text into a string, then I run a search query on the match. Now I just need to figure out how to store the result in a variable so I can extract each page of the PDF by ID.

/* Extract pages to folder */

// function padLeft(s,len,c){c=c || '0'; while(s.length< len) s= c+s; return s; }

// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;

// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");

for (var i = 0; i < this.numPages; i++) {  // Loop through the entire document
    numWords = this.getPageNumWords(i); // Find out how many words are on the page
    var WordString = ""; // Prepare a string
    for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
    {
        WordString = WordString + " " + this.getPageNthWord(i, j);
    }
    if (WordString.match(/\b\d{8}\b/)) { // Search for an 8-digit number in the string
        search.matchWholeWord = true; // If we got here, search the document for that number
        search.query(WordString.match(/\b\d{8}\b/), "ActiveDoc");
    }
}

UPDATED 2/2/2023

Below is the working code, which extracts every page of the PDF and names each file after the 8-digit ID found in that page's text.

// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;

// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");

for (var i = 0; i < this.numPages; i++) {  // Loop through the entire document
    numWords = this.getPageNumWords(i); // Find out how many words are on the page
    var WordString = ""; // Prepare a string
    for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
    {WordString = WordString + " " + this.getPageNthWord(i, j);}
    
    ID = WordString.match(/\b\d{8}\b/); // Search for the ID control # in the string
    
    this.extractPages({

    nStart: i,

    cPath: "/J/Middle Office Read/Operational Support/SBA Spreadsheets & Forms/Funded SBAs/" + "SBIC_" + ID + ".pdf"

    });        

}
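
One thing to watch in the code above: match returns an array on success and null when nothing matches, so a page without an 8-digit number would end up named SBIC_null.pdf. Below is a minimal sketch of a guard, kept independent of the Acrobat API so it can be tried in any JavaScript engine. The function name buildOutputPath and the folder "/J/out/" are hypothetical placeholders, not part of the original script.

```javascript
// Hypothetical helper: builds the output path for one page, or returns null
// when the page text holds no standalone 8-digit number.
function buildOutputPath(pageText, folder) {
    var m = pageText.match(/\b\d{8}\b/); // array on success, null otherwise
    if (m === null) {
        return null;
    }
    return folder + "SBIC_" + m[0] + ".pdf"; // m[0] is the matched text
}

// "/J/out/" is a placeholder folder, not the path from the post.
var withId = buildOutputPath(" I.D. Control 22001188", "/J/out/");
var withoutId = buildOutputPath(" no id on this page", "/J/out/");
```

In the Acrobat loop, the idea would be to call this.extractPages({ nStart: i, cPath: path }) only when the returned path is not null, and to log or skip the page otherwise.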


Solution

  • The sequence (?<!...) is a negative lookbehind, which was only added to JavaScript in ES2018 and is not supported by every browser or engine. It seems that it is not supported in your case, which would explain the "invalid quantifier" error.

    You can use word boundaries instead, as shown below, to extract 8-digit numbers from your string:

    \b\d{8}\b
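
As a quick check outside Acrobat, the sketch below applies the word-boundary pattern to the sample string from the question. The helper names are hypothetical. The second pattern is an assumed alternative that also avoids lookbehind, for cases where the number sits directly next to letters or underscores and \b therefore does not match.

```javascript
// Hypothetical helper: returns the first standalone 8-digit number, or null.
function findEightDigitId(text) {
    var m = text.match(/\b\d{8}\b/);
    return m ? m[0] : null;
}

// Assumed alternative without lookbehind: consume one non-digit (or the start
// of the string) before the digits, and capture only the digits themselves.
function findEightDigitIdNoBoundary(text) {
    var m = text.match(/(?:^|\D)(\d{8})(?!\d)/);
    return m ? m[1] : null;
}

var idFromSample = findEightDigitId("I.D. Control 22001188");
var idFromLongRun = findEightDigitId("123456789"); // nine digits in a row
var idNextToLetters = findEightDigitIdNoBoundary("ID22001188x");
```

With the sample string, idFromSample holds "22001188". The nine-digit run gives null because no word boundary falls inside a run of digits. The second helper still finds "22001188" in "ID22001188x", where the \b version finds nothing.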