Multi-threaded web scraper for terminal schedule data collection
A headless scraping service that collects departure schedules from terminal websites using Puppeteer and Cheerio. Runs as a Dockerized service with configurable proxy rotation, multi-threaded workers for parallel scraping, and scheduled execution. Feeds data into the Terminals Data analytics dashboard via API.
6,064 total lines of code
December 2023 — Present
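
As an illustration of the configurable proxy rotation mentioned in the description, here is a minimal round-robin sketch. It is not taken from the project: `ProxyRotator`, `assignProxies`, and the example proxy addresses are hypothetical names chosen for this example, and the real service may rotate proxies differently (for instance randomly or based on failures).

```typescript
// Hypothetical round-robin proxy rotator (illustrative only, not the project's code).
class ProxyRotator {
  private readonly proxies: readonly string[];
  private index = 0;

  constructor(proxies: readonly string[]) {
    if (proxies.length === 0) {
      throw new Error("at least one proxy is required");
    }
    this.proxies = proxies;
  }

  // Returns the next proxy and advances the cursor, wrapping at the end of the list.
  next(): string {
    const proxy = this.proxies[this.index] as string;
    this.index = (this.index + 1) % this.proxies.length;
    return proxy;
  }
}

interface ScrapeJob {
  url: string;
  proxy: string;
}

// Pairs each terminal URL with a proxy in rotation order, so that
// parallel workers spread their requests across the configured proxies.
function assignProxies(urls: readonly string[], rotator: ProxyRotator): ScrapeJob[] {
  return urls.map((url) => ({ url, proxy: rotator.next() }));
}
```

With Puppeteer, each job's proxy would typically be handed to the browser through Chromium's `--proxy-server=<address>` launch argument, so a worker could launch one browser per assigned proxy.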