I'm trying to extract a part of my string:
output | Fullstring | Expected Result |
---|---|---|
xi | xi-AN224432_-_this_is_some_text-x10.pdf | AN224432 or AN224432_ |
F1800X7T2P5P | xi-F1800X7T2P5P-blahblah-v01_00-EN.pdf | F1800X7T2P5P |
1E303S | 1E303S-blahblah-v01_20-EN.pdf | 1E303S |
The regex I tried is:
select
regexp(Fullstring ,'^(xi-)?(\[A-Za-z0-9\]+)-.\*','$2') output
This returns the second row correctly, but not the first one.
The output I need is the alphanumeric code, which can sit either after the "xi-" prefix or at the start of the string. "xi" needs to be omitted from the result, as it is not the alphanumeric code I am after: if the string starts with "xi", I want the second part of the string. I don't understand why the second row came out correct while the first one returned 'xi'. The third row is the simplest case, as there is no "xi" at the start.
I don't know which flavour of regex you are running, as we would usually not escape the brackets. But this may be due to the syntax of your regex engine, so it is up to you to see whether you need the escapes or not (the same goes for \* instead of *). The only documentation I found is this page about the Denodo VQL regexp() function, and it mentions a Java or Perl kind of syntax.
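But what is missing is the optional underscore character "_" in your pattern. In the first row the code is followed by "_-" rather than "-", so the attempt that starts with "xi-" fails; the engine then backtracks, skips the optional (xi-) group, and [A-Za-z0-9]+ matches "xi" instead, which is why you get 'xi' back.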
In Perl or PHP, it would be: ^(xi-)?(([A-Za-z0-9]+)_?)-.*
Demo here: https://regex101.com/r/DCxixu/1
I added a capturing group so that you can decide whether you want the full group with the underscore (group 2) or the one without it (group 3). You could even replace _? by _* if there could be several underscores.
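To see the difference between the two patterns, here is a minimal sketch in Python. It uses Python's `re` module only as a stand-in for the Denodo engine, on the assumption that Denodo treats these constructs the same way; the names `ORIGINAL`, `FIXED` and `extract` are made up for the illustration.

```python
import re

# Pattern from the question, without the backslash escapes.
ORIGINAL = re.compile(r'^(xi-)?([A-Za-z0-9]+)-.*')
# Pattern from this answer: optional underscore plus an extra inner group.
FIXED = re.compile(r'^(xi-)?(([A-Za-z0-9]+)_?)-.*')

samples = [
    "xi-AN224432_-_this_is_some_text-x10.pdf",
    "xi-F1800X7T2P5P-blahblah-v01_00-EN.pdf",
    "1E303S-blahblah-v01_20-EN.pdf",
]


def extract(pattern, text, group):
    """Return the requested capturing group, or None if there is no match."""
    match = pattern.match(text)
    return match.group(group) if match else None


# Group 2 of the original pattern: the first row falls back to "xi".
original_results = [extract(ORIGINAL, s, 2) for s in samples]
# Group 2 of the fixed pattern keeps a trailing underscore if present.
with_underscore = [extract(FIXED, s, 2) for s in samples]
# Group 3 of the fixed pattern drops the underscore.
without_underscore = [extract(FIXED, s, 3) for s in samples]

for row in zip(samples, original_results, with_underscore, without_underscore):
    print(row)
```

If the Denodo engine behaves the same way, which is an assumption here, using '$3' instead of '$2' as the replacement in the regexp() call would return the code without the underscore.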