Scrapyd Server
Install the spider on the server
Clone the repo.
Install the Python packages:
sudo apt install python3 python3-dev python3-venv
Install the extra system packages needed to build the pip modules later (MySQL client headers and a compiler toolchain):
sudo apt install default-libmysqlclient-dev build-essential pkg-config
Create the Python venv: python3 -m venv /path/to/new/virtual/environment
Activate it with source /path/to/new/virtual/environment/bin/activate
Then install the extras: pip install scrapeops_scrapy python-dotenv
Install scrapyd on the server
Install Scrapy, scrapyd and the MySQL connectors: pip install scrapy scrapyd mysql mysql-connector-python
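Create ~/.scrapyd.conf (note the leading dot: scrapyd reads ~/.scrapyd.conf from the home directory, or scrapyd.conf from its working directory). Binding to 0.0.0.0 makes scrapyd listen on all network interfaces, which allows external access: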
[scrapyd]
bind_address = 0.0.0.0
#username = seb
#password = superpassword

Install scrapyd service
Set up scrapyd as a systemd service by creating the file /etc/systemd/system/scrapyd.service:
[Unit]
Description=Scrapyd service
After=network.target

[Service]
User=seb
Group=seb
WorkingDirectory=/home/seb/env/scrapy/products
ExecStart=/home/seb/env/scrapy/env/bin/scrapyd

[Install]
WantedBy=multi-user.target

Enable and start the systemd service:
sudo systemctl enable scrapyd
sudo systemctl start scrapyd
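Once the service is running, scrapyd's JSON API should answer on port 6800. The sketch below uses only the standard library to query the daemonstatus.json endpoint and report whether the daemon is healthy. It assumes scrapyd is on the default port 6800 (the port used in the Nginx config) and checks only the "status" field, since the other counters in the reply may differ between scrapyd versions.

```python
import json
import urllib.request

# Assumption: scrapyd listens on its default port 6800 on this machine.
SCRAPYD_URL = "http://localhost:6800"


def parse_daemon_status(body: str) -> bool:
    """Return True if a daemonstatus.json response body reports status "ok"."""
    data = json.loads(body)
    return data.get("status") == "ok"


def scrapyd_is_up(base_url: str = SCRAPYD_URL, timeout: float = 5.0) -> bool:
    """Query daemonstatus.json; treat network errors as 'not up'."""
    url = base_url.rstrip("/") + "/daemonstatus.json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_daemon_status(resp.read().decode("utf-8"))
    except OSError:
        # URLError, refused connections and timeouts are all OSError subclasses.
        return False


# Usage: print("scrapyd up:", scrapyd_is_up())
```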
Cloudflare Tunnel
Because the server sits on a private network with a dynamic IP, expose scrapyd through a Cloudflare Tunnel and an Nginx reverse proxy.
Create the tunnel and install cloudflared on the server (it’s in the repo).
Set up Nginx:
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    ssl_certificate /etc/ssl/private/pricemonitor.pro.pem;
    ssl_certificate_key /etc/ssl/private/pricemonitor.pro.key;
    server_name scrapy.pricemonitor.pro;

    location / {
        proxy_pass http://localhost:6800/;
        proxy_set_header X-Forwarded-Proto http;
    }
}

This forwards the domain to the local scrapyd service with no authentication or restrictions, so anyone who can reach the hostname can call the full scrapyd API.
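With the proxy in place, the scrapyd JSON API is reachable at the public hostname. That includes job scheduling through schedule.json, which takes a POST with project and spider form fields. The sketch below shows such a call using only the standard library. The project and spider names in the usage comment are hypothetical placeholders, not names from this setup. At this stage the call would succeed for anyone who can reach the hostname.

```python
import json
import urllib.parse
import urllib.request

# Public hostname taken from the Nginx server_name.
BASE_URL = "https://scrapy.pricemonitor.pro"


def build_schedule_request(base_url: str, project: str, spider: str) -> urllib.request.Request:
    """Build a form-encoded POST to scrapyd's schedule.json endpoint."""
    url = base_url.rstrip("/") + "/schedule.json"
    data = urllib.parse.urlencode({"project": project, "spider": spider}).encode("ascii")
    return urllib.request.Request(url, data=data, method="POST")


def schedule(base_url: str, project: str, spider: str, timeout: float = 10.0) -> dict:
    """Send the request and return scrapyd's JSON reply (a job id on success)."""
    req = build_schedule_request(base_url, project, spider)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Usage (hypothetical names): schedule(BASE_URL, "myproject", "myspider")
```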
Cloudflare Zero Trust
Create a new Access application for scrapyd and give it a policy with the Service Auth action, so only requests carrying a valid service token get through. Service tokens are described at:
https://developers.cloudflare.com/cloudflare-one/identity/service-tokens/
Authenticate with:
curl -H "CF-Access-Client-Id: <CLIENT_ID>" -H "CF-Access-Client-Secret: <CLIENT_SECRET>" https://app.example.com

Subsequent requests can send the CF_Authorization cookie returned by the first request instead:
curl -H "cookie: CF_Authorization=<CF_AUTHORIZATION_COOKIE>" https://app.example.com
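The same two-step flow can be scripted. The sketch below assumes the Access application protects the scrapy.pricemonitor.pro hostname. The first request sends the service token headers. An http.cookiejar.CookieJar stores whatever cookies Cloudflare sets on the response, including CF_Authorization as the curl steps imply, so later requests through the same opener reuse them. The helper names and placeholder token values are hypothetical.

```python
from __future__ import annotations

import http.cookiejar
import json
import urllib.request

# Assumption: the Access-protected application is served on this hostname.
BASE_URL = "https://scrapy.pricemonitor.pro"


def service_token_headers(client_id: str, client_secret: str) -> dict:
    """Headers Cloudflare Access checks for Service Auth requests."""
    return {
        "CF-Access-Client-Id": client_id,
        "CF-Access-Client-Secret": client_secret,
    }


def make_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Build an opener that stores response cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def get_json(opener, url: str, headers: dict | None = None, timeout: float = 10.0) -> dict:
    """GET a scrapyd JSON endpoint through the cookie-aware opener."""
    req = urllib.request.Request(url, headers=headers or {})
    with opener.open(req, timeout=timeout) as resp:
        return json.load(resp)


def cf_authorization(jar: http.cookiejar.CookieJar) -> str | None:
    """Return the CF_Authorization cookie value if Access has set one."""
    for cookie in jar:
        if cookie.name == "CF_Authorization":
            return cookie.value
    return None


# Usage sketch (placeholders, not real credentials):
# opener, jar = make_opener()
# get_json(opener, BASE_URL + "/listprojects.json",
#          service_token_headers("<CLIENT_ID>", "<CLIENT_SECRET>"))
# get_json(opener, BASE_URL + "/daemonstatus.json")  # jar resends CF_Authorization
```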
