I uploaded my spider to Scrapinghub. I understand how to upload it together with my *.txt file, but how do I use that file?
My setup.py file looks like this:
setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    package_data={
        'youtube_crawl': ['resources/Names.txt']
    },
    entry_points={'scrapy': ['settings = youtube_crawl.settings']},
)
Then I want to use this Names.txt. Before uploading, my spider looked like this:
def parse(self, response):
    with open('resources/Names.txt', 'rt') as f:
        for link in f:
            url = "https://www.youtube.com/results?search_query={}".format(link)
            name = link.replace('+', ' ')
            yield Request(url, meta={'name': name}, callback=self.parse_page, dont_filter=True)
So my question is: how can I use my file on Scrapinghub?
I tried the code below, but I don't understand how it works or how to integrate it with my spider =)
data = pkgutil.get_data("youtube_crawl", "resources/Names.txt")
pkgutil.get_data() returns a bytes object containing the contents of the specified resource.
This line of code:
data = pkgutil.get_data("youtube_crawl", "resources/Names.txt")
is roughly equivalent to this block:
with open('resources/Names.txt', 'rb') as f:
    data = f.read()
The difference is that pkgutil.get_data() locates the file relative to the installed youtube_crawl package, so it keeps working after the project is deployed to Scrapinghub, where a plain relative open() would not find the file.
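On Python 3 that data comes back as bytes, so decode it before using string methods. A quick check (assuming Names.txt is UTF-8 encoded):

import pkgutil

data = pkgutil.get_data("youtube_crawl", "resources/Names.txt")
print(type(data))             # <class 'bytes'> on Python 3
text = data.decode("utf-8")   # assuming the file is UTF-8
print(text.splitlines()[:3])  # first few entries from Names.txt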
So now you can read the file line by line:
import pkgutil            # at the top of your spider module
from scrapy import Request

def parse(self, response):
    # decode the bytes returned by get_data() before splitting into lines
    data = pkgutil.get_data("youtube_crawl", "resources/Names.txt").decode("utf-8")
    for link in data.splitlines():
        url = "https://www.youtube.com/results?search_query={}".format(link)
        name = link.replace('+', ' ')
        yield Request(url,
                      meta={'name': name},
                      callback=self.parse_page,
                      dont_filter=True)
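One thing to double-check: package_data paths are resolved relative to the package, so with your setup.py the file has to live inside the youtube_crawl package. A layout like this (sketched from your setup.py and the relative path in your spider) should work:

project/
    scrapy.cfg
    setup.py
    youtube_crawl/
        __init__.py
        settings.py
        spiders/
        resources/
            Names.txt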
Take a look at the Python 3 pkgutil and input/output documentation pages for more details.