The purpose of this is to extract list of field names from the XML (of Adobe LiveCycle Designer). So, I created the fields in designer, then, I copy the XML of the related fields, paste in Notepad++, and then executer find/replace (ctrl-h) to get only the field names, one field in each line.
This will make it then easier to write the SQL statements to add such fields to the DB to register them.
The XML looks like the following:
<field xmlns="http://www.xfa.org/schema/xfa-template/2.8/" y="0in" x="0.343mm" w="8.881pt" h="9.108pt" name="detcon_recreation_only">
<ui>
<checkButton size="8.881pt">
<border>
<edge stroke="lowered"/>
<fill/>
</border>
</checkButton>
</ui>
<font size="0pt" typeface="Adobe Pi Std"/>
<para vAlign="middle"/>
<value>
<text>0</text>
</value>
<items>
<text>1</text>
<text>0</text>
<text/>
</items>
</field>
<field xmlns="http://www.xfa.org/schema/xfa-template/2.8/" name="detcon_special_housing" y="5.393mm" w="27.94mm" h="4.134mm" x="0.343mm">
<ui>
<choiceList>
<border>
<edge stroke="lowered"/>
</border>
<margin/>
</choiceList>
</ui>
<font typeface="Arial Narrow" size="6pt"/>
<margin topInset="0mm" bottomInset="0mm" leftInset="0mm" rightInset="0mm"/>
<para vAlign="middle"/>
<value>
<text>NA</text>
</value>
<items>
<text>Not Applicable</text>
<text>Hotel Component</text>
</items>
<items save="1" presence="hidden">
<text>NA</text>
<text>HC</text>
</items>
</field>
<exclGroup xmlns="http://www.xfa.org/schema/xfa-template/2.8/" name="detcon_photo_taken" x="0in" y="0in">
<?templateDesigner itemValuesSpecified 1?>
<field w="12.446mm" h="3.825mm" name="lb_yes">
<ui>
<checkButton size="1.7639mm" shape="round">
<border>
<?templateDesigner StyleID apcb1?>
<edge/>
<fill/>
</border>
</checkButton>
</ui>
<font typeface="Myriad Pro"/>
<margin leftInset="1mm" rightInset="1mm"/>
<para vAlign="middle"/>
<caption placement="right" reserve="7.698mm">
<para vAlign="middle" spaceAbove="0pt" spaceBelow="0pt" textIndent="0pt" marginLeft="0pt" marginRight="0pt"/>
<font size="8pt" typeface="Arial Narrow" baselineShift="0pt"/>
<value>
<text>YES</text>
</value>
</caption>
<value>
<text xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
</value>
<items>
<text>1</text>
</items>
</field>
<field w="28.702mm" h="3.825mm" name="lb_no" x="13.233mm">
<ui>
<checkButton size="1.7639mm" shape="round">
<border>
<?templateDesigner StyleID apcb1?>
<edge/>
<fill/>
</border>
</checkButton>
</ui>
<font typeface="Myriad Pro"/>
<margin leftInset="1mm" rightInset="1mm"/>
<para vAlign="middle"/>
<caption placement="right" reserve="23.954mm">
<para vAlign="middle" spaceAbove="0pt" spaceBelow="0pt" textIndent="0pt" marginLeft="0pt" marginRight="0pt"/>
<font size="8pt" typeface="Arial Narrow" baselineShift="0pt"/>
<value>
<text>NO (see comments)</text>
</value>
</caption>
<value>
<text xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
</value>
<items>
<text>0</text>
</items>
</field>
<border>
<edge presence="hidden"/>
</border>
<?templateDesigner expand 1?></exclGroup>
So I figured out the following RegEx to perform find/replace to get only the field names, one field on each line.
Find to get Field Name: (?i)<(field|exclGroup).*name="([a-z_]\w*)".*$
Replace: $2
Another find/replace...
Remove all other lines: ^.*<(?!.*name=).*.*[\r\n]*
Replace with blank
If you execute the above two find/replace sessions, you will end up with list of field names one field per line.
What I wanted to do is to perform the above in one find/replace session, and then convert the above into SQL Statements using also find/replace, using this template:
INSERT INTO table_name (element_id, element_name, element_type, default_value, required, clone)
VALUES (12345,"field_name_goes_here","/Tx", "", "N", "Y"),
VALUES (12346,"field_name_goes_here","/Tx", "", "N", "Y"),
VALUES (12347,"field_name_goes_here","/Tx", "", "N", "Y"),
VALUES (12348,"field_name_goes_here","/Tx", "", "N", "Y"),
VALUES (12349,"field_name_goes_here","/Tx", "", "N", "Y"),
The element_id
field is sequential, but don't worry about that, I can take care of this in Excel.
Appreciate your help, Tarek
Scraper Series
One small help. The element is slightly different than the which is making things a little more complicated. Your RegEx is so sophisticated, I couldn't modify it to include the element. I think I need more time to digest it. So could you modify it to include exclGroup and only extract name only without extracting the inner field elements of the exclGroup?
Ok, here you go.
It does make it a little more complicated.
I have 2 versions to do this. One that uses recursion, one that doesn't.
I'm posting the version that uses recursion.
If you need the non-recursion, let me know and I'll post that.
Find (?:(?!<(?:field|exclGroup)(?!\w)(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+>)[\S\s])*(?><(field|exclGroup)(?=(?:[^>"']|"[^"]*"|'[^']*')*?\sname\s*=\s*(?:(['"])([\S\s]*?)\2))\s+(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+>)(?:(?&core)|)</\1\s*>(?:(?!<(?:field|exclGroup)(?!\w)(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+>)[\S\s])*(?(DEFINE)(?<core>(?>(?><([\w:]+)(?>"[\S\s]*?"|'[\S\s]*?'|(?:(?!/>)[^>])?)+>)(?:(?&core)|)</\5\s*>|(?!</[\w:]+\s*>)(?>[\S\s]))+))
Replace VALUES (12345,"$3","/Tx", "", "N", "Y"),\r\n
https://regex101.com/r/icnF3i/1
Formatted (incase you need to look at it)
(?: # Prefix - Optional any chars that don't start a field or exclGroup tag
(?!
<
(?: field | exclGroup )
(?! \w )
(?>
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
[\S\s]
)*
(?> # open 'field' or 'exclGroup' tag ------------------
<
( field | exclGroup ) # (1)
(?= # Asserttion (a pseudo atomic group)
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s name \s* = \s*
(?:
( ['"] ) # (2), Quote
( [\S\s]*? ) # (3), Name value - only thing we want
\2
)
)
\s+
(?>
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
(?:
(?&core) # Call the core recursion function (balanced tags)
|
)
</ \1 \s* > # Close 'field' or 'exclGroup' tag ------------------
(?: # Postfix - Optional any chars that don't start a field or exclGroup tag
(?!
<
(?: field | exclGroup )
(?! \w )
(?>
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
[\S\s]
)*
# ---------------------------------------------------------
(?(DEFINE)
(?<core> # (4 start), Inner balanced tags
(?>
(?>
<
( [\w:]+ ) # (5), Any open tag
(?>
" [\S\s]*? "
| ' [\S\s]*? '
| (?:
(?! /> )
[^>]
)?
)+
>
)
(?: # Recurse core
(?&core)
|
)
</ \5 \s* > # Balanced close tag (I can see you 5)
|
(?! </ [\w:]+ \s* > ) # Any char not starting a close tag (passive)
(?> [\S\s] )
)+
) # (4 end)
)
You can view the non-recursive version here https://regex101.com/r/ztOrP5/1