Tags: scrapy, pythonanywhere, aws-cloud9

Install Scrapy on PythonAnywhere? (or Cloud9)


Can I run Scrapy on the free level of PythonAnywhere? I've looked, but haven't found instructions for installing it there.

If it can't be run on the free level of PythonAnywhere, is there another online environment where I can run Scrapy without needing to install Python and Scrapy on my computer?

EDIT: My question was just about PythonAnywhere, but in finding the answer to the question, I came across Cloud9 and found it to be a preferable alternative, which is explained in the answer.


Solution

  • Short summary:

    ====================================

    There were three parts to my question:

    1. Can I run Scrapy in the free level of PythonAnywhere? This part has been answered: Yes, but with debilitating restrictions.

    The other two parts have not been answered, but I've found some answers and will share them here.

    2. What other online environments allow me to run Scrapy without needing to install Python and Scrapy on my computer? I haven't found a direct answer to this, but the free tutorial website, Python for Everybody ("Py4E"), has a page, Setting up your Python Development Environment, which lists four online Python environments. It provides a brief tutorial on PythonAnywhere and then just provides links to the other three: Trinket, Cloud9, and CodeAnywhere.

    None of those four environments say anything about running Scrapy on them. With some more research, I did find out how to use Scrapy in PythonAnywhere, which I explain in part 3 below. Of the other three, Cloud9 is part of Amazon's AWS suite, a sophisticated set of software tools, other parts of which I've used before. Based on that, I assumed it could also accommodate Scrapy, and I checked it out as well. I've added the results below as a new part 4.

    3. Now, the main part of my question: how do I install Scrapy on PythonAnywhere?

    It's amazing that PythonAnywhere's otherwise excellent documentation says nothing about this. I found it out by following the generic instructions for installing Python packages, hoping they would lead me to Scrapy:

    python3 -m pip install "SomeProject"
    

    (* Footnote below on syntax of that command)

    The instructions said that "SomeProject" should be a project included in the Python Package Index, so I went there and searched for Scrapy. That search gave me a list of 681 projects with "scrapy" in the name, and some of them looked like they might be various versions of Scrapy itself. None of them were called just "Scrapy", but Scrapy's own install instructions say to use exactly that name. So I held my breath and entered:

    python3 -m pip install Scrapy
    

    And guess what I got? PythonAnywhere told me:

    Requirement already satisfied: Scrapy in /usr/local/lib/python3.9/site-packages (2.5.0)
    

    That was followed by a couple of dozen more lines that all started with "Requirement already satisfied", which I took to be the dependencies required by Scrapy, all of them already present and ready to roll.

    The next test was the first instruction in the official Scrapy tutorial:

    scrapy startproject tutorial
    

    I entered that, and PythonAnywhere told me that it had successfully created a new project. Since this was a Scrapy command, I conclude that, yes, indeed, I already have Scrapy installed and running on PythonAnywhere. No installation necessary!
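
    (For anyone who wants to double-check that result, here is a minimal, generic follow-up; the tutorial directory is just the one created by startproject above:)

    scrapy version      # prints the installed version, e.g. "Scrapy 2.5.0"
    ls tutorial         # the generated skeleton: scrapy.cfg plus a tutorial/ package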

    4. What about Cloud9? As I said above in my answer to part 2, when I found out about Cloud9, I was interested because it's part of Amazon Web Services ("AWS"). I've used other parts of AWS before and found them to be sophisticated, complicated, powerful, and well-documented. They are also very economical.

    AWS is a commercial system run by Amazon. It charges fees based on usage, with no minimums, and with low-volume usage being free. The pricing page for Cloud9 shows it to be no exception. Cloud9 itself is free to use, but using it calls on other AWS resources that have charges.

    The pricing page gives the following example: "If you use the default settings running an IDE for 4 hours per day for 20 days in a month with a 30-minute auto-hibernation setting your monthly charges for 90 hours of usage would be ... $2.05". That's less than half the lowest monthly cost of PythonAnywhere. (As stated in the answer by Giles Thomas, the free level of PythonAnywhere is not very useful for Scrapy.) I'm not sure how the usage in the Cloud9 example compares with what PythonAnywhere's $5/mo service allows, but my usage is going to be a lot less than either one, so I expect my cost of using Cloud9 to be very low, possibly nothing. Furthermore, I only use Scrapy for a project a couple of times a year. With PythonAnywhere, I'd have to close my account between projects to stop being charged; AWS doesn't charge me when I'm not using it, so I can keep the account open at no cost between projects.

    So based on both the quality of the AWS modules I've used and the low usage cost, I was very interested in Cloud9 as an alternative.

    And I was not surprised to find that I could use Scrapy in it.

    To figure that out, I quickly abandoned the webpage instructions in favor of downloading a PDF of the comprehensive User Guide from the documentation page. Comprehensive = 595 pages! But it's very well organized and cross-referenced, so I was able to learn what I needed by reading about 20 pages, which included a tutorial on using the GUI environment (pages 29–38) and another on using Python in Cloud9 (pages 423–427).

    After working through that second tutorial, I was ready to find out whether Scrapy was there. I had learned by then about pip show, so I ran:

    python -m pip show Scrapy

    The answer was no; Scrapy was not installed there.

    So I repeated the pip install Scrapy command that I'd run earlier in PythonAnywhere.

    This time, there were very few "Requirement already satisfied"s and instead there were a lot of "Collecting ... Downloading"s, followed by "Installing collected packages" and then "Successfully installed" with a long list that included Scrapy-2.6.1.

    I repeated python -m pip show Scrapy and got several lines of output that told me Scrapy 2.6.1 is installed. Finally, I ran the same test I'd run before in PythonAnywhere, the first instruction in the official Scrapy tutorial:

    scrapy startproject tutorial
    

    and got the same output as before, telling me that the project had been created.

    Bingo! I have Scrapy running in Cloud9.
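
    (A quick extra sanity check, not from the tutorial: Scrapy's interactive shell can confirm that pages can actually be fetched from inside Cloud9. The URL is just the practice site used by the official Scrapy tutorial.)

    scrapy shell "https://quotes.toscrape.com"
    # inside the shell, response.status should be 200 if the fetch worked,
    # response.css("title::text").get() returns the page title, and exit() leaves the shell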

    On the negative side, there was a problem here. AWS has two levels of sign-in authority: the root user and IAM users. For proper security, I should be running Cloud9 as an IAM user, but I had a problem signing in that way. I posted a question on SO about that, but while waiting for an answer, I went ahead and started using Cloud9 as the root user. In the course of that, I got the message:

    WARNING: Running pip install with root privileges is generally not a good idea.

    That warning came with a suggestion of an alternative command that didn't make sense and didn't work when I tried it. So I'm not sure how much I've messed up the security of my AWS account by what I've been doing here. My work is not secretive, so the security may be a non-issue, but I'd still like to figure out how to proceed as an IAM user and clean up any damage I might have caused by what I've been doing as the root user. If anyone knows about that, please respond to the SO question about it linked in the previous paragraph.
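
    (Side note, not from the Cloud9 docs: the standard way to avoid that pip warning is to install into a virtual environment rather than the system Python. A generic sketch, with an arbitrary environment name:)

    python3 -m venv ~/scrapy-env            # create an isolated environment (the name is arbitrary)
    source ~/scrapy-env/bin/activate        # activate it for the current shell session
    python -m pip install Scrapy            # installs into the venv, no root privileges needed
    scrapy version                          # now uses the venv's copy of Scrapy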

    So now I've got Scrapy running in Cloud9, and I'm going to go find out if it can get the data I need. I'll make another edit here if there are any surprises in terms of Cloud9 either (a) not being able to do something or (b) resulting in unexpected charges.

    ====================================

    (*) Footnote on the syntax of python3 -m pip install "SomeProject": Since I was working in something called PythonAnywhere, I was tempted to think this was a Python command. But then I had to remember that, within PythonAnywhere, I was working in Bash, a Unix shell. So python3 is the shell command that starts the Python 3 interpreter. I haven't found documentation of that exact command, but I did find documentation of the command it's based on, python. That documentation says, "-m module-name Searches ... for the named module and runs the corresponding .py file as a script." So pip is a Python module for installing Python packages, run here as a script by the interpreter, and install followed by the project name are the arguments passed to pip. (Somebody please correct me if I've said any of that wrong.)
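
    (A small illustration of that footnote; both commands should work in any Bash shell that has Python 3:)

    python3 -m pip --version                        # runs the pip module as a script and reports which Python it belongs to
    python3 -c "import pip; print(pip.__file__)"    # shows that pip is an ordinary Python module on disk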