
How it works

Daily Tech Digest is an automated pipeline that reads the day's top tech stories, summarizes them with an LLM, and publishes the result here, to a database, and to Discord — every morning, with no manual input.

The daily pipeline

Once a day, a scheduled job walks the stories through five stages. Each stage is an isolated module, and each source is handled on its own, so one failing source never takes down the whole run.

  1. Gather. Pluggable sources fetch the latest stories — the Hacker News API plus six RSS feeds (Ars Technica, TechCrunch, The Verge, Wired, PCMag, MIT Technology Review). Adding a feed is a one-line config change.
  2. Dedupe. The same article often turns up in more than one source, for example in a site's RSS feed and again on Hacker News. Stories are deduplicated by URL so nothing gets summarized, or paid for, twice.
  3. Extract. For each story, the main article text is pulled from the page (stripping nav, ads, and boilerplate) so the model summarizes the actual content, not the chrome around it.
  4. Summarize. All the extracted stories go to an OpenAI model in a single pass. The model writes a short overview paragraph, then groups the stories into topic sections (AI, Security, Hardware, and so on), formatted as markdown.
  5. Store & deliver. The digest and its source stories are saved to PostgreSQL, then posted to a Discord channel, split across several messages because Discord caps each message at 2,000 characters.
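The per-source isolation, the URL dedupe, and the Discord splitting from the stages above can each be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `Story` type, the function names, and the URL normalization rules (lowercase scheme and host, drop the `#fragment` and any trailing slash) are all hypothetical. The 2,000-character default matches Discord's cap on a single message's content.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable
from urllib.parse import urlsplit, urlunsplit

log = logging.getLogger("digest")


@dataclass(frozen=True)
class Story:  # hypothetical minimal story record
    title: str
    url: str
    source: str


def gather(sources: dict[str, Callable[[], list[Story]]]) -> list[Story]:
    """Run every source on its own; a source that raises is logged and skipped."""
    stories: list[Story] = []
    for name, fetch in sources.items():
        try:
            stories.extend(fetch())
        except Exception:
            log.exception("source %s failed; continuing without it", name)
    return stories


def normalize_url(url: str) -> str:
    """Assumed rules: lowercase scheme/host, drop the fragment and a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_by_url(stories: Iterable[Story]) -> list[Story]:
    """Keep the first story seen for each normalized URL, preserving order."""
    seen: set[str] = set()
    unique: list[Story] = []
    for story in stories:
        key = normalize_url(story.url)
        if key not in seen:
            seen.add(key)
            unique.append(story)
    return unique


def split_for_discord(text: str, limit: int = 2000) -> list[str]:
    """Split markdown into chunks of at most `limit` chars, breaking on line ends."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit is hard-wrapped at the limit.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk if this line would push the current one over the limit.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Breaking on line boundaries keeps markdown headings and list items intact within a message; only a single line longer than the limit gets cut mid-text.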

The whole run is idempotent: if today's digest already exists, the run exits without doing anything, so re-running is always safe.
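One way to get that idempotency is an early existence check plus a uniqueness constraint on the digest date. The sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so it runs anywhere; the `digests` table, its columns, and `run_daily` are hypothetical names, not the project's schema. In Postgres the same idea would typically rest on a `UNIQUE` constraint with `INSERT ... ON CONFLICT DO NOTHING`, so even two overlapping runs cannot store a second digest for the same day.

```python
import datetime as dt
import sqlite3
from typing import Callable

# Hypothetical schema; sqlite3 stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS digests (
    digest_date TEXT PRIMARY KEY,   -- one digest per calendar day
    body_md     TEXT NOT NULL
)
"""


def run_daily(conn: sqlite3.Connection, today: dt.date,
              build_digest: Callable[[], str]) -> bool:
    """Build and store today's digest once; return False if it already existed."""
    conn.execute(SCHEMA)
    key = today.isoformat()
    exists = conn.execute(
        "SELECT 1 FROM digests WHERE digest_date = ?", (key,)
    ).fetchone()
    if exists:
        return False  # already done today: nothing to rebuild or re-post
    body = build_digest()  # stands in for gather, dedupe, extract, summarize
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            # The primary key makes a concurrent duplicate insert a no-op.
            "INSERT OR IGNORE INTO digests (digest_date, body_md) VALUES (?, ?)",
            (key, body),
        )
    return cur.rowcount == 1
```

Checking before building matters for cost: the expensive work (fetching pages and calling the model) is skipped entirely on a re-run, while the constraint is the backstop if two runs race.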

Serving the data

A small REST API (FastAPI) reads the digests back out of Postgres and exposes them as JSON: a list of all digests, the latest digest, and the digest for any given day. This website is a server-rendered Next.js app that calls that API and renders each digest's markdown alongside its source links.
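The three read operations map onto three small queries. Here they are as plain functions over a hypothetical `digests` table, with the standard-library `sqlite3` module standing in for Postgres so the sketch runs without a database server. In the real service each would presumably sit behind a FastAPI route; the paths in the docstrings (`/digests`, `/digests/latest`, `/digests/{day}`) are illustrative guesses, not the site's actual URLs. The functions return plain dicts and lists, which a FastAPI handler can return directly to be serialized as JSON.

```python
import sqlite3
from typing import Optional


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store; the table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows addressable by column name
    conn.execute(
        "CREATE TABLE IF NOT EXISTS digests "
        "(digest_date TEXT PRIMARY KEY, body_md TEXT NOT NULL)"
    )
    return conn


def _row_to_dict(row: sqlite3.Row) -> dict:
    return {"date": row["digest_date"], "body_md": row["body_md"]}


def list_digests(conn: sqlite3.Connection) -> list[dict]:
    """GET /digests (illustrative): all digest dates, newest first."""
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    rows = conn.execute(
        "SELECT digest_date FROM digests ORDER BY digest_date DESC"
    ).fetchall()
    return [{"date": r["digest_date"]} for r in rows]


def latest_digest(conn: sqlite3.Connection) -> Optional[dict]:
    """GET /digests/latest (illustrative): the most recent digest, if any."""
    row = conn.execute(
        "SELECT digest_date, body_md FROM digests ORDER BY digest_date DESC LIMIT 1"
    ).fetchone()
    return _row_to_dict(row) if row is not None else None


def digest_for_day(conn: sqlite3.Connection, day: str) -> Optional[dict]:
    """GET /digests/{day} (illustrative): one digest by ISO date."""
    row = conn.execute(
        "SELECT digest_date, body_md FROM digests WHERE digest_date = ?", (day,)
    ).fetchone()
    return _row_to_dict(row) if row is not None else None
```

Returning only dates from the list query is an assumed design choice to keep that payload small; a real handler would also likely turn a `None` result into a 404.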

Where it runs

The backend (pipeline, database, and API) lives on a Linux VPS. The API runs as a managed service that restarts on failure or reboot and sits behind nginx, which acts as a reverse proxy and terminates HTTPS with an automatically renewing certificate. Cron triggers the daily pipeline run. The frontend is deployed separately on Vercel's CDN and talks to the backend over HTTPS.

The stack