I'm trying to write a regular expression that matches URLs like "example.com/page/200/".
Here's what I've done so far:
rules = (Rule (SgmlLinkExtractor(
allow=("//page/\d+",),
restrict_xpaths=('xxxxx',)),
callback="details", follow= True),
)
Could anyone suggest a solution? Thanks.
You have an extra slash at the start of the pattern, and you should use a raw string so that the backslash in \d reaches the regex engine unchanged. Also, since there is only a single expression, you don't need to pass a tuple to allow:
rules = (Rule(SgmlLinkExtractor(allow=r"/page/\d+", restrict_xpaths=('xxxxx',)),
              callback="details", follow=True),)
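To see why the extra slash matters, here is a minimal sketch that checks both patterns with Python's re module, outside of Scrapy. It assumes the link extractor tests each allow pattern with a search against the absolute URL; the sample URL and the helper name matches are made up for illustration.

```python
import re

# Hypothetical absolute URL of the kind the spider would encounter.
url = "http://example.com/page/200/"

# Corrected pattern: a single leading slash, written as a raw string.
fixed = re.compile(r"/page/\d+")

# Pattern from the question: the doubled slash never occurs before "page".
original = re.compile(r"//page/\d+")


def matches(pattern, candidate):
    """Return True if the pattern is found anywhere in the candidate URL."""
    return pattern.search(candidate) is not None


print(matches(fixed, url))     # True
print(matches(original, url))  # False
```

The only "//" in the URL is the one after "http:", and it is followed by the host name rather than "page", so the original pattern finds nothing.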