Schedule Scraper Worker

Automated web scraper that collects real-time transport schedule data

About

A headless worker service that continuously scrapes transport schedule data from third-party websites using Puppeteer. Runs in configurable cycles with proxy rotation, posts collected data to a central API, and supports seat availability tracking and capacity change detection. Deployed as a Docker container with automatic restarts.
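The repository is private, so the scrape-and-post cycle can only be shown as a minimal sketch under stated assumptions. `ScheduleRecord`, `ProxyRotator`, `runCycle`, and `runCycles` are hypothetical names, and round-robin rotation is an assumed strategy. The scraper and the central-API poster are injected as functions so the loop runs without a browser. With Puppeteer, the selected proxy would typically be passed to Chromium through the `--proxy-server` launch argument.

```typescript
// Hypothetical record shape: the project's actual data model is not public.
interface ScheduleRecord {
  routeId: string;
  departure: string; // ISO 8601 departure time
  seatsAvailable: number;
}

// Assumed strategy: round-robin rotation, one proxy per scrape target.
class ProxyRotator {
  private readonly proxies: string[];
  private index = 0;

  constructor(proxies: string[]) {
    if (proxies.length === 0) throw new Error("at least one proxy is required");
    this.proxies = proxies.slice();
  }

  // Returns the next proxy URL, wrapping around at the end of the list.
  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// One scrape-and-post pass over all targets. scrape() and post() are injected
// (hypothetical signatures); with Puppeteer, scrape() would typically call
// puppeteer.launch({ args: [`--proxy-server=${proxy}`] }) and read the page.
async function runCycle(
  targets: string[],
  rotator: ProxyRotator,
  scrape: (url: string, proxy: string) => Promise<ScheduleRecord[]>,
  post: (records: ScheduleRecord[]) => Promise<void>,
): Promise<number> {
  let posted = 0;
  for (const url of targets) {
    const proxy = rotator.next();
    try {
      const records = await scrape(url, proxy);
      await post(records);
      posted += records.length;
    } catch (err) {
      // Log and skip a failing target instead of aborting the whole cycle.
      console.error(`cycle step failed for ${url} via ${proxy}:`, err);
    }
  }
  return posted;
}

// Repeats cycles with a configurable pause; maxCycles defaults to running forever.
async function runCycles(
  cycle: () => Promise<number>,
  intervalMs: number,
  maxCycles = Infinity,
): Promise<number> {
  let total = 0;
  for (let i = 0; i < maxCycles; i++) {
    total += await cycle();
    if (i + 1 < maxCycles) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return total;
}
```

Injecting `scrape` and `post` keeps the cycle logic testable in isolation. Logging and skipping a failed target means one broken page or dead proxy does not stall the rest of the cycle.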

Highlights

  • Headless browser automation with Puppeteer for dynamic page scraping
  • Configurable scraping cycles with proxy rotation support
  • Real-time seat availability and capacity change detection
  • Remote configuration via API-driven variable definitions
  • Dockerized deployment with automatic restart and memory management
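Capacity change detection can be read as a diff between consecutive scrape snapshots. The sketch below is minimal and rests on two assumptions: each departure is identified by route ID plus departure time, and the worker keeps the previous cycle's results in memory. `SeatSnapshot`, `detectChanges`, and the change kinds are hypothetical names, not taken from the project.

```typescript
// Hypothetical snapshot shape; the project's real schema is not documented.
interface SeatSnapshot {
  routeId: string;
  departure: string; // ISO 8601 departure time
  seatsAvailable: number;
  capacity: number;
}

type ChangeKind = "added" | "removed" | "seats" | "capacity";

interface SeatChange {
  key: string;
  kind: ChangeKind;
  before?: number;
  after?: number;
}

// Assumption: a departure is uniquely identified by route ID plus departure time.
function keyOf(s: SeatSnapshot): string {
  return `${s.routeId}|${s.departure}`;
}

// Diffs the previous cycle's snapshots against the current ones.
function detectChanges(previous: SeatSnapshot[], current: SeatSnapshot[]): SeatChange[] {
  const prevByKey = new Map<string, SeatSnapshot>();
  for (const s of previous) prevByKey.set(keyOf(s), s);

  const changes: SeatChange[] = [];
  const currentKeys = new Set<string>();

  for (const cur of current) {
    const key = keyOf(cur);
    currentKeys.add(key);
    const prev = prevByKey.get(key);
    if (prev === undefined) {
      changes.push({ key, kind: "added", after: cur.seatsAvailable });
      continue;
    }
    // Total-capacity changes are reported separately from ordinary
    // seat-availability changes (for example, seats being booked).
    if (prev.capacity !== cur.capacity) {
      changes.push({ key, kind: "capacity", before: prev.capacity, after: cur.capacity });
    }
    if (prev.seatsAvailable !== cur.seatsAvailable) {
      changes.push({ key, kind: "seats", before: prev.seatsAvailable, after: cur.seatsAvailable });
    }
  }

  // Departures seen in the previous cycle but missing from the current one.
  prevByKey.forEach((prev, key) => {
    if (!currentKeys.has(key)) {
      changes.push({ key, kind: "removed", before: prev.seatsAvailable });
    }
  });

  return changes;
}
```

Keeping `capacity` and `seats` as distinct change kinds lets downstream consumers tell a change in the total number of seats apart from seats simply being taken.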

Tech Stack

Node.js, TypeScript, Puppeteer, Cheerio, Axios, Docker

Code Stats

HTML: 85.3%
TypeScript: 14.7%

8,612 total lines of code

Duration

March 2025 – Present

Private repository