
Scrapyd Server

Clone the repo

Install python packages:

sudo apt install python3 python3-dev python3-venv

Install some extra packages for the pip modules later:

sudo apt install default-libmysqlclient-dev build-essential pkg-config

Create the python venv: python3 -m venv /path/to/new/virtual/environment

Activate the venv (adjust env to the path used above): source env/bin/activate

Then install some extras: pip install scrapeops_scrapy python-dotenv

Install scrapy, scrapyd and the mysql connector: pip install scrapy scrapyd mysql mysql-connector-python

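Create ~/.scrapyd.conf (Scrapyd reads the dot-prefixed file in the home directory, or a scrapyd.conf in the working directory). Binding to 0.0.0.0 allows external access.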

[scrapyd]
bind_address = 0.0.0.0
#username = seb
#password = superpassword
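
A quick way to see what that config means in practice: the sketch below parses the file text and reports whether scrapyd would listen on every interface with no credentials set. It is only an illustration; the function name is made up here, and the 127.0.0.1 fallback is my understanding of Scrapyd's default when bind_address is missing.

```python
import configparser

# The config from above, with the credentials still commented out.
SAMPLE = """\
[scrapyd]
bind_address = 0.0.0.0
#username = seb
#password = superpassword
"""


def is_open_without_auth(conf_text: str) -> bool:
    """Return True if the config binds to 0.0.0.0 and sets no credentials.

    Hypothetical helper for illustration; it is not part of Scrapyd.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    section = parser["scrapyd"]

    # Assumed default: localhost only when bind_address is not set.
    binds_everywhere = section.get("bind_address", "127.0.0.1") == "0.0.0.0"

    # Lines starting with '#' are comments, so these come back as None.
    has_credentials = bool(section.get("username")) and bool(section.get("password"))

    return binds_everywhere and not has_credentials
```

With the credentials commented out, is_open_without_auth(SAMPLE) is True, which is why access is locked down further along in these notes.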

Set up scrapyd as a service in systemd by creating the file /etc/systemd/system/scrapyd.service

[Unit]
Description=Scrapyd service
After=network.target
[Service]
User=seb
Group=seb
WorkingDirectory=/home/seb/env/scrapy/products
ExecStart=/home/seb/env/scrapy/env/bin/scrapyd
[Install]
WantedBy=multi-user.target

Enable and start the systemd service

sudo systemctl enable scrapyd

sudo systemctl start scrapyd
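
To confirm the service is answering, scrapyd's daemonstatus.json endpoint can be polled from the server itself. A minimal sketch, assuming the default port 6800 (the same one the nginx config proxies to); the function names are made up for this example.

```python
import json
import urllib.request


def fetch_daemon_status(base_url: str = "http://localhost:6800", timeout: float = 5.0) -> dict:
    """GET /daemonstatus.json from a running scrapyd and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/daemonstatus.json", timeout=timeout) as resp:
        return json.load(resp)


def is_healthy(status: dict) -> bool:
    """Scrapyd reports "status": "ok" when the daemon is up."""
    return status.get("status") == "ok"
```

Run on the server, is_healthy(fetch_daemon_status()) should be True once systemd has started the service; a connection error usually means the unit failed, and journalctl -u scrapyd shows why.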

As the server is currently on a private network with a dynamic IP, use a Cloudflare tunnel and an Nginx reverse proxy.

Create the tunnel and install cloudflared on the server (it’s in the repo)

Set up nginx

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    ssl_certificate /etc/ssl/private/pricemonitor.pro.pem;
    ssl_certificate_key /etc/ssl/private/pricemonitor.pro.key;
    server_name scrapy.pricemonitor.pro;
    location / {
        proxy_pass http://localhost:6800/;
        proxy_set_header X-Forwarded-Proto http;
    }
}

This forwards the domain to the localhost scrapyd service with no authentication or restrictions.

In Cloudflare Access, create a new application for scrapyd and set its access policy action to Service Auth, so that only requests carrying a service token get through.


https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/

Authenticate with:

curl -H "CF-Access-Client-Id: <CLIENT_ID>" -H "CF-Access-Client-Secret: <CLIENT_SECRET>" https://app.example.com

Subsequent requests:

curl -H "cookie: CF_Authorization=<CF_AUTHORIZATION_COOKIE>" https://app.example.com
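
The same two calls can be made from Python when scheduling spiders from a script. This is a rough sketch, not a tested client: it assumes the scrapy.pricemonitor.pro hostname from the nginx config, scrapyd's schedule.json and daemonstatus.json endpoints, and placeholder project and spider names. The helper names are invented for this example. The requests are only built here; pass them to urllib.request.urlopen to send them.

```python
import urllib.parse
import urllib.request

# server_name from the nginx config above; change if the hostname differs.
BASE_URL = "https://scrapy.pricemonitor.pro"


def build_schedule_request(client_id, client_secret, project, spider, base_url=BASE_URL):
    """Build a POST to scrapyd's schedule.json carrying the service token headers."""
    body = urllib.parse.urlencode({"project": project, "spider": spider}).encode()
    return urllib.request.Request(
        f"{base_url}/schedule.json",
        data=body,  # supplying data makes urllib send a POST
        headers={
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
        },
    )


def build_cookie_request(cf_authorization, path="/daemonstatus.json", base_url=BASE_URL):
    """Build a follow-up GET that reuses the CF_Authorization cookie."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Cookie": f"CF_Authorization={cf_authorization}"},
    )
```

Usage would look like urllib.request.urlopen(build_schedule_request(CLIENT_ID, CLIENT_SECRET, "products", "example_spider")), where "products" is guessed from the WorkingDirectory in the systemd unit and "example_spider" is a placeholder. Keep the client ID and secret out of the repo, for example in the .env file read by python-dotenv.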