The daily pipeline
Once a day, a scheduled job walks the stories through five stages. Each stage is an isolated module and the sources are pluggable, so a failure in one source does not take down the whole run.
- Gather. Pluggable sources fetch the latest stories — the Hacker News API plus six RSS feeds (Ars Technica, TechCrunch, The Verge, Wired, PCMag, MIT Technology Review). Adding a feed is a one-line config change.
- Dedupe. The same article often appears on multiple sites. Stories are deduplicated by URL so nothing gets summarized — or paid for — twice.
- Extract. For each story, the main article text is pulled from the page (stripping nav, ads, and boilerplate) so the model summarizes the actual content, not the chrome around it.
- Summarize. All extracted stories are sent to an OpenAI model in one pass. The model writes a short overview paragraph and then groups the stories into topic sections (AI, Security, Hardware, and so on), formatted as markdown.
- Store & deliver. The digest and its source stories are saved to PostgreSQL, then posted to a Discord channel (split across messages to respect Discord's length limit).
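The gather, dedupe, and deliver steps above can be sketched in a few lines. This is a minimal illustration and not the project's actual code: the `Story` type, the idea of a source as a zero-argument callable, and the function names are all hypothetical. Dedupe here is assumed to mean an exact URL match, and the 2,000-character limit assumes the digest is posted as plain Discord message content.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set

logger = logging.getLogger("digest")

DISCORD_LIMIT = 2000  # character limit for plain Discord message content


@dataclass(frozen=True)
class Story:
    """Hypothetical minimal story record."""
    title: str
    url: str


# Assumption: a source is any zero-argument callable that returns stories.
Source = Callable[[], Iterable[Story]]


def gather(sources: Dict[str, Source]) -> List[Story]:
    """Fetch from every source; log and skip any source that raises."""
    stories: List[Story] = []
    for name, fetch in sources.items():
        try:
            stories.extend(fetch())
        except Exception:
            logger.exception("source %s failed; continuing", name)
    return stories


def dedupe_by_url(stories: Iterable[Story]) -> List[Story]:
    """Keep the first story seen for each exact URL, preserving order."""
    seen: Set[str] = set()
    unique: List[Story] = []
    for story in stories:
        if story.url not in seen:
            seen.add(story.url)
            unique.append(story)
    return unique


def split_for_discord(text: str, limit: int = DISCORD_LIMIT) -> List[str]:
    """Split text into chunks of at most `limit` characters, on line breaks where possible."""
    chunks: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be hard-split.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Catching exceptions per source is what keeps one broken feed from aborting the run, and splitting on line breaks keeps markdown list items intact across messages where the limit allows.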
The whole run is idempotent: if today's digest already exists it does nothing, so re-running is always safe.
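One way to get this behavior is to check for today's row before doing any work. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs on its own; the `digests` table, its columns, and the `run_daily` function are assumptions made for illustration.

```python
import sqlite3
from datetime import date
from typing import Callable, Optional


def ensure_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per calendar day.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS digests ("
        "day TEXT PRIMARY KEY, markdown TEXT NOT NULL)"
    )


def run_daily(
    conn: sqlite3.Connection,
    build_digest: Callable[[], str],
    today: Optional[date] = None,
) -> bool:
    """Build and store today's digest unless one already exists.

    Returns True if a digest was created, False if the run was a no-op.
    """
    day = (today or date.today()).isoformat()
    exists = conn.execute(
        "SELECT 1 FROM digests WHERE day = ?", (day,)
    ).fetchone()
    if exists:
        return False  # nothing is fetched, summarized, or posted again
    markdown = build_digest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO digests (day, markdown) VALUES (?, ?)", (day, markdown)
        )
    return True
```

Because the check happens before `build_digest` is called, a repeated run skips the expensive steps as well as the write. The primary key on `day` acts as a second guard if two runs ever overlap.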
Serving the data
A small REST API (FastAPI) reads the digests back out of Postgres and exposes them as JSON — a list of all digests, the latest one, and any specific day. This website is a server-rendered Next.js app that calls that API and renders each digest's markdown with its source links.
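The three read views can be sketched independently of the web framework. The version below is an illustration only: it uses an in-memory dict in place of the database, and the function names, the response shape, and the route paths mentioned in the comments are hypothetical rather than taken from the real API.

```python
import json
from typing import Dict, List, Optional

# Hypothetical stand-in for the digests table: ISO date -> digest markdown.
DIGESTS: Dict[str, str] = {
    "2024-05-01": "## AI\n- Example story",
    "2024-05-02": "## Security\n- Another example story",
}


def list_digests(store: Dict[str, str]) -> List[dict]:
    """All digests, newest first (ISO dates sort chronologically as strings).

    Would back a route such as GET /digests (path is an assumption).
    """
    return [{"date": day, "markdown": store[day]} for day in sorted(store, reverse=True)]


def latest_digest(store: Dict[str, str]) -> Optional[dict]:
    """The most recent digest, or None if the store is empty."""
    items = list_digests(store)
    return items[0] if items else None


def digest_for_day(store: Dict[str, str], day: str) -> Optional[dict]:
    """The digest for one ISO date, or None (which a route would map to a 404)."""
    markdown = store.get(day)
    return None if markdown is None else {"date": day, "markdown": markdown}


if __name__ == "__main__":
    # Each view returns plain dicts and lists, so it serializes directly to JSON.
    print(json.dumps(latest_digest(DIGESTS), indent=2))
```

Keeping the lookups as plain functions means a framework route handler only has to call one of them and translate `None` into a not-found response.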
Where it runs
The backend (pipeline, database, and API) lives on a Linux VPS. The API runs as a managed systemd service that restarts on failure or reboot, and it sits behind nginx as a reverse proxy with an automatically renewing Let's Encrypt HTTPS certificate. The daily run is driven by cron. The frontend is deployed separately on Vercel's CDN and talks to the backend over HTTPS.
The stack
- Backend: Python, SQLAlchemy, FastAPI, PostgreSQL
- Data: Hacker News API, RSS (feedparser), article extraction (trafilatura)
- Summarization: OpenAI API
- Frontend: Next.js (App Router), server components, react-markdown
- Ops: VPS, nginx, Let's Encrypt, systemd, cron, Vercel