Schedule Scraper Worker

Automated web scraper that collects real-time transport schedule data

About

A headless worker service that continuously scrapes transport schedule data from third-party websites using Puppeteer. It runs in configurable cycles with proxy rotation, posts the collected data to a central API, and tracks seat availability so that capacity changes can be detected. It is deployed as a Docker container with automatic restarts.
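The cycle-and-rotation idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual code: the names CycleConfig, ScrapeJob, createProxyRotator, and planCycle are hypothetical, and round-robin is only one possible rotation strategy.

```typescript
// Hypothetical names throughout: none of these identifiers come from the project.
interface CycleConfig {
  proxies: string[]; // proxy URLs to rotate through
  targets: string[]; // pages to scrape in each cycle
}

interface ScrapeJob {
  target: string;
  proxy: string;
}

// Returns a function that yields proxies in round-robin order.
function createProxyRotator(proxies: string[]): () => string {
  if (proxies.length === 0) {
    throw new Error("at least one proxy is required");
  }
  let index = 0;
  return () => {
    const proxy = proxies[index] as string;
    index = (index + 1) % proxies.length;
    return proxy;
  };
}

// Builds the job list for one cycle, assigning the next proxy to each target.
// The rotator keeps its position, so the following cycle continues where this one stopped.
function planCycle(config: CycleConfig, nextProxy: () => string): ScrapeJob[] {
  return config.targets.map((target) => ({ target, proxy: nextProxy() }));
}
```

In a worker like the one described, each job's proxy could then be handed to the browser at launch, for example through Chromium's --proxy-server argument in the args option of puppeteer.launch. How the real service wires this up is not stated here.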

Highlights

Headless browser automation with Puppeteer for dynamic page scraping
Configurable scraping cycles with proxy rotation support
Real-time seat availability and capacity change detection
Remote configuration via API-driven variable definitions
Dockerized deployment with automatic restart and memory management
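Seat availability and capacity change detection, as listed above, comes down to comparing the latest scrape with the previous one. The sketch below shows one way that comparison might look. The SeatSnapshot and SeatChange shapes, their field names, and detectChanges are assumptions made for illustration and are not taken from the repository.

```typescript
// Hypothetical shape of one scraped departure; field names are assumptions.
interface SeatSnapshot {
  tripId: string;
  capacity: number;
  seatsAvailable: number;
}

interface SeatChange {
  tripId: string;
  kind: "capacity" | "availability";
  before: number;
  after: number;
}

// Compares two scrape results and reports every trip whose numbers differ.
function detectChanges(previous: SeatSnapshot[], current: SeatSnapshot[]): SeatChange[] {
  const byTrip = new Map<string, SeatSnapshot>();
  for (const snap of previous) {
    byTrip.set(snap.tripId, snap);
  }

  const changes: SeatChange[] = [];
  for (const snap of current) {
    const old = byTrip.get(snap.tripId);
    if (old === undefined) {
      continue; // newly seen trip: nothing to compare against yet
    }
    if (old.capacity !== snap.capacity) {
      changes.push({ tripId: snap.tripId, kind: "capacity", before: old.capacity, after: snap.capacity });
    }
    if (old.seatsAvailable !== snap.seatsAvailable) {
      changes.push({ tripId: snap.tripId, kind: "availability", before: old.seatsAvailable, after: snap.seatsAvailable });
    }
  }
  return changes;
}
```

Trips that appear only in the newer snapshot are skipped in this sketch. A real worker might instead report them as new departures.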

Tech Stack

Node.js, TypeScript, Puppeteer, Cheerio, Axios, Docker

Code Stats

HTML 85.3%

TypeScript 14.7%

8,612 total lines of code

Duration

March 2025 — Present

Private repository
