I have the following JavaScript code:
/* Extract pages to folder */
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
try {for (var i = 0; i < this.numPages; i++)
var id = /\ (?<!\d)\d{8}(?!\d)/;
console.println(id);
this.extractPages({
nStart: i,
cPath: "/J/my file path/" + "SBIC_" + id + ".pdf"
});
} catch (e) { console.println("Aborted: " + e) }
I get an error saying the quantifier is invalid on this line of code: var id = /\ (?<!\d)\d{8}(?!\d)/
However, this regex matches the ID 22001188 when I use it at https://regex101.com/ to find the 8-digit number in "I.D. Control 22001188".
Do I have to integrate the regex a different way in the code for it to search through the text in the document?
UPDATED 1/30/2023 I am now using the regex below to find the 8-digit ID I need. First, I put all of the page's text into a string, and then I use a search query to find the ID. Now I just need to figure out how to store the result in a variable so I can extract each page of the PDF by ID.
/* Extract pages to folder */
// function padLeft(s,len,c){c=c || '0'; while(s.length< len) s= c+s; return s; }
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
for (var i = 0; i < this.numPages; i++) { // Loop through the entire document
var numWords = this.getPageNumWords(i); // Find out how many words are on the page
var WordString = ""; // Prepare a string
for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
{
WordString = WordString + " " + this.getPageNthWord(i, j);
}
var idMatch = WordString.match(/\b\d{8}\b/); // Search for an 8-digit number in the string
if (idMatch) {
search.matchWholeWord = true; // If we got here, search for that number in the document
search.query(idMatch[0], "ActiveDoc"); // idMatch[0] is the matched text
}
}
UPDATED 2/2/2023
Below is the working code that extracts every page from the PDF and names each file after the 8-digit ID found in that page's text.
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
for (var i = 0; i < this.numPages; i++) { // Loop through the entire document
var numWords = this.getPageNumWords(i); // Find out how many words are on the page
var WordString = ""; // Prepare a string
for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
{WordString = WordString + " " + this.getPageNthWord(i, j);}
var ID = WordString.match(/\b\d{8}\b/); // Search for the ID control # in the string (null if the page has none)
this.extractPages({
nStart: i,
cPath: "/J/Middle Office Read/Operational Support/SBA Spreadsheets & Forms/Funded SBAs/" + "SBIC_" + ID + ".pdf"
});
}
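One gap in the loop above: if a page has no 8-digit number, match() returns null and the file would be named SBIC_null.pdf. A minimal sketch of a guard follows. The helper name buildOutputPath and its parameters are hypothetical, not part of the Acrobat API.

```javascript
// Hypothetical helper: returns the output path for a page, or null when
// the page text contains no standalone 8-digit number.
function buildOutputPath(folder, pageText) {
  var m = pageText.match(/\b\d{8}\b/);
  if (m === null) {
    return null; // lets the caller skip the page instead of writing SBIC_null.pdf
  }
  // m[0] is the matched text itself
  return folder + "SBIC_" + m[0] + ".pdf";
}
```

Inside the page loop, this could be called as buildOutputPath("/J/my file path/", WordString), with this.extractPages() run only when the result is not null.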
The sequence (?<!
opens a negative lookbehind, which not all JavaScript engines support.
The engine running your script appears to be one of them, which would explain why the pattern works on regex101 but fails in your code.
You may use word boundaries in regex as given below to extract 8-digit numbers from your string:
\b\d{8}\b
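As a rough sketch, here is the word-boundary pattern applied to the sample text from the question, with the result stored in a variable. The second function, findEightDigits, is a hypothetical helper showing one way to approximate the original (?<!\d)\d{8}(?!\d) pattern without lookbehind, by capturing the digits in a group. It assumes the engine supports lookahead, which the original error message did not complain about.

```javascript
// Word-boundary pattern from the answer; it needs no lookbehind support.
var idPattern = /\b\d{8}\b/;

// Sample text taken from the question.
var sample = "I.D. Control 22001188";

// match() returns an array (or null); element 0 is the matched text.
var found = sample.match(idPattern);
var id = found ? found[0] : null;

// Hypothetical helper: a lookbehind-free approximation of (?<!\d)\d{8}(?!\d).
// The digits are captured in group 1, so the preceding non-digit character
// is not part of the returned value.
function findEightDigits(text) {
  var m = /(?:^|\D)(\d{8})(?!\d)/.exec(text);
  return m ? m[1] : null;
}
```

The two patterns differ when the number touches a letter or underscore: \b\d{8}\b does not match inside "ID22001188" because there is no word boundary between "D" and "2", while findEightDigits still returns the digits.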