regex epub adobe-indesign calibre

How can I extract a certain number from a string using Regular expressions?


I think this is probably easy but I don't have the time to learn how to do it.

In an HTML file, I have a certain class of paragraph, let's say:

<p class="footnote"></p>

Each of these paragraphs starts with a number, and the numbers increase by one from one paragraph to the next. Let's say the first number is "43". I want the series to start from 1, so I need to subtract 42 from the number in every paragraph.

For example, I would want to go from:

<p class="footnote">43. Lorem</p>
<p class="footnote">44. Ipsum</p>
<p class="footnote">45. Dolor</p>

to

<p class="footnote">1. Lorem</p>
<p class="footnote">2. Ipsum</p>
<p class="footnote">3. Dolor</p>

How can I do it?


Solution

  • If you're looking for a regex that will handle <p class="footnote">43. Lorem</p> directly, the answer is: don't parse HTML with regex.

    Assuming you've extracted the string 43. Lorem from a tag and you want to get the number out of it, the pattern depends on your requirements:

    To find any number: \d+

    To find any number at the beginning: ^\d+

    To find any number followed by a period: \d+\.

    A more complete solution will require more details about the problem, including the programming language you want to use.
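
As a rough illustration of how the patterns above could be put to work, here is a minimal sketch. It assumes Python (the question does not name a language) and assumes the footnote text has already been extracted from the tags. The helper names leading_number and renumber are made up for this example.

```python
import re


def leading_number(text):
    """Return the integer at the start of text, or None if there is none."""
    match = re.match(r"\d+", text)
    return int(match.group()) if match else None


def renumber(text, offset):
    """Subtract offset from a leading 'N.' label; other text is left as-is."""
    return re.sub(
        r"^(\d+)\.",                                # number followed by a period, at the start
        lambda m: f"{int(m.group(1)) - offset}.",   # replace it with the shifted number
        text,
        count=1,
    )


# Hypothetical input: the already-extracted paragraph contents.
footnotes = ["43. Lorem", "44. Ipsum", "45. Dolor"]

# Shift so that the first footnote becomes number 1.
offset = leading_number(footnotes[0]) - 1
renumbered = [renumber(f, offset) for f in footnotes]
print(renumbered)  # ['1. Lorem', '2. Ipsum', '3. Dolor']
```

The replacement is computed by a function because a plain replacement string cannot do arithmetic on the matched number. Strings that do not start with a number followed by a period are returned unchanged.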