
Extract JPA Named Parameters in JavaScript


I am trying to extract JPA named parameters in JavaScript. This is the algorithm I have come up with so far:

const notStrRegex = /(?<![\S"'])([^"'\s]+)(?![\S"'])/gm
const namedParamCharsRegex = /[a-zA-Z0-9_]/;

/**
 * @returns array of named parameters which,
 * 1. always begins with :
 * 2. the remaining characters are guaranteed to match {@link namedParamCharsRegex}
 *
 * @example
 * 1. "select * from a where id = :myId3;" -> [':myId3']
 * 2. "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')" -> [':FROM_DATE']
 * 3. "TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')" -> []
 */
export function extractNamedParam(query: string): string[] {
  return (query.match(notStrRegex) ?? [])
    .filter((word) => word.includes(':'))
    .map((splittedWord) => splittedWord.substring(splittedWord.indexOf(':')))
    .filter((splittedWord) => splittedWord.length > 1) // ignore ":"
    .map((word) => {
      // i starts from 1 because word[0] is :
      for (let i = 1; i < word.length; i++) {
        const isAlphaNum = namedParamCharsRegex.test(word[i]);
        if (!isAlphaNum) return word.substring(0, i);
      }
      return word;
    });
}

I got inspired by the solution in https://stackoverflow.com/a/11324894/12924700 to filter out all characters that are enclosed in single/double quotes.

The code above handles the 3 use cases listed in the doc comment, but it fails when a user inputs a string like this:

const testStr  = '"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'
extractNamedParam(testStr) // should return [] but it returns [":shouldIgnoreThisNamedParam"] instead

I looked through the Hibernate source code to see how named parameters are extracted there, but I couldn't find the algorithm that does the work. Please help.


Solution

  • You can use

    /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g
    

    Use only the Group 1 values. See the regex demo. The regex matches, and thereby skips, strings between single or double quotes (including backslash-escaped characters inside them), and captures : followed by one or more word characters in all other contexts.

    See the JavaScript demo:

    const re = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;
    const text = "to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')";
    let matches=[], m;
    while (m=re.exec(text)) {
      if (m[1]) {
        matches.push(m[1]);
      }
    }
    console.log(matches);
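
As a rough sketch, the same regex can be wrapped in a function and run against the examples from the question. The function name extractNamedParam is borrowed from the question, and NAMED_PARAM_RE is a hypothetical constant name; neither is part of any library. This is one possible arrangement, not a definitive implementation:

```javascript
// Regex from the answer: the first two alternatives consume quoted strings,
// and only the third alternative captures a named parameter in Group 1.
// NAMED_PARAM_RE is a hypothetical name chosen for this sketch.
const NAMED_PARAM_RE = /"[^\\"]*(?:\\[\w\W][^\\"]*)*"|'[^\\']*(?:\\[\w\W][^\\']*)*'|(:\w+)/g;

// Hypothetical helper (name reused from the question).
// Returns every ":name" found outside single- or double-quoted strings.
function extractNamedParam(query) {
  const params = [];
  for (const m of query.matchAll(NAMED_PARAM_RE)) {
    // m[1] is undefined when the match was a quoted string.
    if (m[1] !== undefined) params.push(m[1]);
  }
  return params;
}

console.log(extractNamedParam("select * from a where id = :myId3;"));
console.log(extractNamedParam("to_timestamp_tz(:FROM_DATE, 'YYYY-MM-DD\"T\"HH24:MI:SS')"));
console.log(extractNamedParam("TO_CHAR(ep.CHANGEDT,'yyyy=mm-dd hh24:mi:ss')"));
console.log(extractNamedParam('"user input invalid string \' :shouldIgnoreThisNamedParam \' in a string"'));
```

Using matchAll instead of a while loop with exec means the shared global regex is cloned for each call, so its lastIndex is not left in a stale state between calls.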

    Details: