AI Engineer Daily Driver — Stack Files

Cursor + Claude API + Python + Modal + Postgres + Fly.io. The exact setup used to ship an LLM-powered production feature over six weeks.

Why This Stack

The defining constraint was fast iteration on prompts and model behavior, not fast infrastructure setup. Modal handles compute spikes without requiring a Kubernetes cluster: you write Python and it runs wherever it needs to run. Fly.io hosts the long-running service layer, and Postgres stores everything that needs to persist. The Claude API was chosen over OpenAI at the start of this project after a two-day eval; structured output quality was the deciding factor for our extraction use case.

The Stack

Tool	Role	Cost / Tier
Cursor	AI-assisted code editor (VS Code fork)	$20/mo Pro
Claude API (claude-sonnet-4-20250514)	LLM inference, primary model	Usage-based, ~$3-15/mo at our scale
Python 3.12	Primary language for AI workloads	Free
Modal	Serverless GPU/CPU compute for ML workloads	Usage-based, ~$0.10/GPU-hour
Postgres (Fly.io managed)	Primary database	$0/mo free tier
Fly.io	Long-running service deployment	$5-20/mo

The Prompt Iteration Loop

The day-to-day loop was: edit the prompt in a Jupyter notebook, run it against 20 test cases from a fixture file, evaluate the outputs, and iterate. The fixture-based evaluation setup was built on day two and paid off every day after that. Build this first.
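The fixture loop can be sketched in a few dozen lines. This is a minimal, hypothetical version: the fixture format (a JSON list of `text` / `expected` records), the exact-match scoring rule, and `fake_extract` (a stand-in for the real prompt plus API call) are all assumptions, not the setup used on the project.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical fixture file contents: each case pairs an input document
# with the fields we expect the model to extract from it.
FIXTURES_JSON = """
[
  {"text": "Invoice 1042 from Acme Corp, total 310.00",
   "expected": {"vendor": "Acme Corp", "total": "310.00"}},
  {"text": "Receipt from Globex, amount due 99.50",
   "expected": {"vendor": "Globex", "total": "99.50"}}
]
"""


@dataclass
class CaseResult:
    text: str
    expected: dict
    actual: dict
    passed: bool


def run_eval(fixtures: list[dict], extract: Callable[[str], dict]) -> list[CaseResult]:
    """Run `extract` on every fixture and require an exact match on each expected field."""
    results = []
    for case in fixtures:
        actual = extract(case["text"])
        passed = all(actual.get(key) == value for key, value in case["expected"].items())
        results.append(CaseResult(case["text"], case["expected"], actual, passed))
    return results


def fake_extract(text: str) -> dict:
    """Hypothetical stand-in for the model call under test (prompt + API request)."""
    if "Acme" in text:
        return {"vendor": "Acme Corp", "total": "310.00"}
    return {"vendor": "Globex", "total": "99.5"}  # formatting drift the eval should catch


if __name__ == "__main__":
    results = run_eval(json.loads(FIXTURES_JSON), fake_extract)
    print(f"{sum(r.passed for r in results)}/{len(results)} cases passed")
    for r in results:
        if not r.passed:
            print(f"FAIL: {r.text!r}\n  expected {r.expected}\n  got      {r.actual}")
```

Swapping `fake_extract` for a function that sends the current prompt to the model turns each notebook edit into a one-line rerun, and the printed failures show exactly which fields drifted.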

Modal made the "run this on 500 documents in parallel" problem trivial. We were processing batches that would have timed out on a single machine; Modal fanned them out transparently. The billing for occasional heavy compute was acceptable at our scale.
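The shape of that fan-out (split the documents into batches, process the batches concurrently, collect per-document results in input order) can be sketched locally with the standard library. This is a stand-in, not Modal code: it uses a thread pool on one machine, which suits I/O-bound work such as API calls, whereas Modal runs inputs in separate containers (there you would decorate the function with `@app.function()` and call `.map` on it). The batch size, worker count, and `process_batch` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: list[T], size: int) -> list[list[T]]:
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def fan_out(
    process_batch: Callable[[list[T]], list[R]],
    items: list[T],
    batch_size: int = 50,
    max_workers: int = 8,
) -> list[R]:
    """Run process_batch on every batch concurrently; return per-item results in input order."""
    batches = batched(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so outputs line up with inputs.
        per_batch = list(pool.map(process_batch, batches))
    return [result for batch in per_batch for result in batch]


def process_batch(docs: list[str]) -> list[str]:
    """Hypothetical per-batch work; a real pipeline would call the model here."""
    return [doc.upper() for doc in docs]


if __name__ == "__main__":
    documents = [f"doc-{i}" for i in range(500)]
    outputs = fan_out(process_batch, documents)
    print(len(outputs), outputs[:3])
```

A local pool like this stops helping once a single machine runs out of CPU, memory, or time budget; that is the point at which moving the same per-batch function onto Modal pays off.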

What I'd Change

I'd add structured output validation with Pydantic earlier. We spent time debugging malformed model outputs that a schema would have caught immediately. Also, Modal's cold-start time on GPU instances is real: for latency-sensitive inference you need the keep-warm option, which costs money even when idle.
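A minimal sketch of that schema check, assuming Pydantic v2 (`model_validate_json`) and a hypothetical `Extraction` schema; the project's real extraction fields aren't described here. A malformed or truncated model response fails at the parsing boundary instead of surfacing later in a debugging session.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class Extraction(BaseModel):
    """Hypothetical schema for one extraction result; field names are illustrative."""
    vendor: str
    total: float
    line_items: list[str] = Field(default_factory=list)


def parse_model_output(raw: str) -> Optional[Extraction]:
    """Validate raw model JSON against the schema; return None if it is malformed."""
    try:
        return Extraction.model_validate_json(raw)  # Pydantic v2 API
    except ValidationError as err:
        # Invalid JSON, missing fields, and wrong types all surface here.
        for error in err.errors():
            print(f"schema error at {error['loc']}: {error['msg']}")
        return None


if __name__ == "__main__":
    print(parse_model_output('{"vendor": "Acme Corp", "total": 310.0}'))
    print(parse_model_output('{"vendor": "Acme Corp"}'))  # missing "total"
    print(parse_model_output('{"vendor": "Acme'))  # truncated output
```

Returning `None` keeps the caller simple; a stricter variant could re-raise, or retry the request with the validation errors included in the follow-up prompt.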