regex, python-2.7, scrapy, sgml, sgml-mode

Get the SgmlLinkExtractor allow regex for "example.com/page/200/"


I'm trying to get the regular expression for "example.com/page/200/".

Here's what I've done so far:

rules = (Rule (SgmlLinkExtractor(
  allow=("//page/\d+",),
  restrict_xpaths=('xxxxx',)),
  callback="details", follow= True),
)

Could anyone give me a solution? Thanks.


Solution

  • You have an extra slash at the start of the pattern, and you should use a raw string so the backslash in \d is passed to the regex engine unchanged. Also, since there is only a single expression, you don't need to pass a tuple to allow:

    rules = (Rule(SgmlLinkExtractor(allow=r"/page/\d+", restrict_xpaths=('xxxxx',)),
                  callback="details", follow=True),)
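
To see why the extra slash matters, here is a minimal sketch that checks both patterns with the standard `re` module, outside of Scrapy. It assumes the link extractor searches for the pattern anywhere in the URL, which is what `re.search` does; the sample URL and the `matches` helper are made up for illustration.

```python
import re

# Hypothetical sample URL, for illustration only.
url = "http://example.com/page/200/"

# Corrected pattern: one slash, written as a raw string.
fixed = re.compile(r"/page/\d+")
# Pattern from the question: two leading slashes.
original = re.compile(r"//page/\d+")


def matches(pattern, candidate):
    """Return True if the pattern is found anywhere in the candidate URL."""
    return pattern.search(candidate) is not None


fixed_ok = matches(fixed, url)        # "/page/200" is present in the URL
original_ok = matches(original, url)  # "//page/" never appears in the URL

print(fixed_ok, original_ok)
```

The URL does contain "//", but only right after the scheme, so the two-slash pattern never lines up with "/page/".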