regex parsing

Regex to match copyright statements


I don't know much about regex, and I'm trying to find a pattern that matches copyright statements such as:

'Copyright © 2019 Company All Rights Reserved'
'© 2019 Company All Rights Reserved'
'© 2019 Company'

And as many other combinations as possible.

I found this regex pattern at https://github.com/regexhq/copyright-regex/blob/master/index.js:

/(?!.*(?:\{|\}|\);))(?:(copyright)[ \t]*(?:(&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+)?)(?:((?:((?:(?:19|20)[0-9]{2}))[^\w\n]*)*)([ \t,\w]*))/i

I tried it on https://regex101.com/. It works with 'Copyright © 2019 Company All Rights Reserved', but not with '© 2019 Company All Rights Reserved'. How can I change it so that it also matches when the word Copyright is not there?


Solution

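  • The pattern fails on your second example because the literal word copyright is required while the symbol that follows it is optional. For your example data the pattern can also be simplified: it contains superfluous grouping structures, and you can omit the negative lookahead at the start, which asserts that the string does not contain {, } or );. Making the word optional and the symbol mandatory gives: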

    (?:copyright[ \t]*)?(?:&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+(?:19|20)[0-9]{2} Company(?: All Rights Reserved)?
    


    You can extend the pattern to your requirements.

    With the case-insensitive flag set, that will match all three example strings from the question.
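As a rough sketch of how the simplified pattern could be checked and then extended, the JavaScript below tests it against the three sample strings and adds one possible generalization. The symbol alternatives used here (`©`, `(c)`, `&copy;` and the numeric entities), the `generalized` pattern, and the `parseCopyright` helper are assumptions made for illustration; they are not part of the copyright-regex package.

```javascript
// Simplified pattern: the word "copyright" is optional, the symbol is required.
// The "i" flag lets it match "Copyright" as well as "copyright".
const simplified =
  /(?:copyright[ \t]*)?(?:©|\(c\)|&copy;|&#(?:169|xa9);)[ \t]+(?:19|20)[0-9]{2} Company(?: All Rights Reserved)?/i;

const samples = [
  'Copyright © 2019 Company All Rights Reserved',
  '© 2019 Company All Rights Reserved',
  '© 2019 Company',
];

// One boolean per sample string.
const results = samples.map((s) => simplified.test(s));

// Hypothetical extension: capture the year and an arbitrary holder name
// instead of the literal "Company". Assumes one notice per string, ending
// at the end of the string.
const generalized =
  /(?:copyright[ \t]*)?(?:©|\(c\)|&copy;|&#(?:169|xa9);)[ \t]+((?:19|20)[0-9]{2})[ \t]+(.+?)(?:[ \t]+All Rights Reserved)?$/i;

// Returns { year, holder } for a matching line, or null when nothing matches.
function parseCopyright(line) {
  const m = generalized.exec(line.trim());
  return m ? { year: Number(m[1]), holder: m[2] } : null;
}

console.log(results);
console.log(parseCopyright('Copyright (c) 2021 Acme Corp All Rights Reserved'));
```

The lazy `(.+?)` group stops before an optional trailing "All Rights Reserved", so the holder name comes out clean whether or not that phrase is present. Notices with year ranges or several years would need further changes.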