Terminals Updater

Multi-threaded web scraper for terminal schedule data collection

About

A headless scraping service that collects departure schedules from terminal websites using Puppeteer and Cheerio. It runs as a Dockerized service on a schedule, with configurable proxy rotation and multi-threaded workers for parallel scraping, and it feeds the collected data into the Terminals Data analytics dashboard via API.
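As a rough illustration of what configurable proxy rotation can look like, here is a minimal round-robin sketch. The `ProxyPool` class and the `proxyLaunchArg` helper are hypothetical names invented for this example, and plain round-robin is an assumed strategy; the repository is private, so this is not the project's actual implementation. The only external fact relied on is that Chromium accepts a `--proxy-server` flag, which Puppeteer can pass through the `args` option of `launch`.

```typescript
// Hypothetical round-robin proxy pool; names are illustrative, not from the project.
class ProxyPool {
  private index = 0;

  constructor(private readonly proxies: readonly string[]) {
    if (proxies.length === 0) {
      throw new Error("ProxyPool needs at least one proxy URL");
    }
  }

  // Return the next proxy URL, wrapping around at the end of the list.
  next(): string {
    const proxy = this.proxies[this.index];
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

// Build the Chromium command-line flag for a given proxy URL.
// With Puppeteer this string would go into launch({ args: [...] }).
function proxyLaunchArg(proxyUrl: string): string {
  return `--proxy-server=${proxyUrl}`;
}
```

Each scraping job would call `pool.next()` before launching its browser, so consecutive jobs leave through different proxies. A production pool would likely also sideline proxies that fail repeatedly, which this sketch omits.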

Highlights

  • Multi-threaded scraping with Node.js Worker Threads for parallel data collection
  • Headless browser automation with Puppeteer and HTML parsing with Cheerio
  • Configurable proxy rotation for reliable scraping at scale
  • Dockerized deployment with scheduled execution and block-time management
  • API integration to feed scraped data into analytics dashboard
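The parallel collection named in the first highlight could be sketched roughly as below, using only Node's built-in `worker_threads` module. Everything here is an assumption made for illustration: `partition`, `runBatch`, `scrapeAll`, and the `BatchResult` shape are hypothetical, and the inline worker merely echoes its URLs back where a real worker would drive Puppeteer and parse the page with Cheerio.

```typescript
import { Worker } from "node:worker_threads";

// Split items into at most `parts` contiguous batches of near-equal size.
function partition<T>(items: readonly T[], parts: number): T[][] {
  const count = Math.max(1, Math.min(parts, items.length));
  const size = Math.ceil(items.length / count);
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical result shape for one scraped URL.
interface BatchResult {
  url: string;
  ok: boolean;
}

// Inline worker body (plain JavaScript) so the sketch stays in one file.
// It only echoes the URLs back; real scraping logic would go here.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  parentPort.postMessage(workerData.urls.map((url) => ({ url, ok: true })));
`;

// Run one batch of URLs in its own worker thread.
function runBatch(urls: string[]): Promise<BatchResult[]> {
  return new Promise<BatchResult[]>((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { urls } });
    worker.once("message", (results: BatchResult[]) => resolve(results));
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

// Fan the URLs out across up to `threads` workers and merge the results.
async function scrapeAll(urls: string[], threads: number): Promise<BatchResult[]> {
  const batches = partition(urls, threads);
  const results = await Promise.all(batches.map((batch) => runBatch(batch)));
  return ([] as BatchResult[]).concat(...results);
}
```

Splitting the URL list up front keeps each worker independent, at the cost of uneven finish times when some terminals respond slowly. A shared work queue would balance load better but needs more coordination between threads.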

Tech Stack

TypeScript, Puppeteer, Cheerio, Node.js Worker Threads, Docker, Axios

Code Stats

TypeScript: 69.5%
YAML: 17.9%
CSV: 12.6%

6,064 total lines of code

Duration

December 2023 to present

Private repository