One FastAPI App, Two Runtimes

Part 4 of a 5-part series on the engineering behind Arc Radius, a platform that tracks US state legislation affecting LGBTQ+ youth. This post covers the serving layer — how a single codebase runs both as a local dev server and as a production AWS Lambda, and reconfigures its entire backend at startup without a code change.

The problem

A serving API has competing audiences. A developer running it on a laptop wants it to start instantly, talk to whatever's convenient, and not require a cloud account to see a response. Production wants it horizontally scalable, cheap at idle, and wired to the real datastores. The usual outcome is two diverging codebases — a "local version" and a "prod version" — that drift apart until the thing you tested isn't the thing you shipped.

We wanted the opposite: one artifact, byte-for-byte, that runs both ways. The code a developer runs under uvicorn on a laptop is the same code that runs on AWS Lambda behind API Gateway in production. No fork, no "prod build," no divergence.

There's a second wrinkle specific to this project. The serving layer can read from several different backends — a Neo4j knowledge graph, a Supabase Postgres database, the LegiScan API directly, or an in-memory mock — and which backend is appropriate depends on the environment and what's been provisioned. So the app doesn't just need to run in two places; it needs to reconfigure what it talks to, at runtime, based on the environment it wakes up in.

The design: an ASGI app that doesn't know where it's running

The trick that makes one artifact run two ways is an ASGI adapter called Mangum. FastAPI is an ASGI application: it expects to be driven by an ASGI server. Locally, that server is uvicorn, started with uvicorn src.main:app.

In production, there's no uvicorn; there's an API Gateway event arriving at a Lambda. Mangum bridges the gap: it's a shim that translates a Lambda/API Gateway event into the ASGI calls the FastAPI app already understands, and the Lambda entry point is simply handler = Mangum(app).

That's the whole dual-runtime mechanism. The FastAPI app object is identical in both worlds. Locally, uvicorn drives it. In production, Mangum drives it. The app itself has no idea which one is calling — it just handles requests. One codebase, two front doors.
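To make the "two front doors" idea concrete, here is a minimal standard-library sketch of an ASGI app driven by a Lambda-style event. It is not Mangum's implementation and not this project's code: toy_lambda_handler is a hypothetical, heavily simplified stand-in, and the app is a bare ASGI callable rather than a FastAPI instance. The event fields httpMethod and path follow the API Gateway REST proxy format.

```python
import asyncio
import json


# A bare ASGI application: an async callable taking (scope, receive, send).
# FastAPI apps expose this same interface, so any ASGI driver can run them.
async def app(scope, receive, send):
    body = json.dumps({"path": scope["path"], "method": scope["method"]}).encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": body})


def toy_lambda_handler(event, context=None):
    """Hypothetical stand-in for an adapter like Mangum: translate an
    API Gateway style event into ASGI calls and collect the response."""
    scope = {
        "type": "http",
        "method": event["httpMethod"],
        "path": event["path"],
        "headers": [],
        "query_string": b"",
    }
    sent = []

    async def receive():
        # This sketch only handles requests without a body.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app(scope, receive, send))
    status = next(m["status"] for m in sent if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in sent if m["type"] == "http.response.body")
    return {"statusCode": status, "body": body.decode()}


response = toy_lambda_handler({"httpMethod": "GET", "path": "/bills"})
```

The app function never inspects who built the scope. A real ASGI server would construct it from a socket, the adapter constructs it from an event, and the application code is the same either way.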

   LOCAL DEV                          AWS LAMBDA / PROD
   uvicorn src.main:app               API Gateway event
        │                                    │
        ▼                                    ▼
   ASGI server ───────►  app (FastAPI)  ◄─────── handler = Mangum(app)
                            │
        ┌──────────┬────────┴─────────┬──────────────┐
        ▼          ▼                 ▼              ▼
   analytics    bills            generation       states
   router       router           router           router
                              [mounted only if flag on]

The second half of the design — reconfiguring the backend — runs on a small config object and two feature flags. There's a single env-driven settings singleton that the app and every router consult. Two booleans do the heavy lifting:

NEO4J_UI_ENABLED decides whether the UI read endpoints (list bills, bill detail, list states) pull from the Neo4j graph or from a mock data layer. Default: off.
RAG_AND_GENERATION_ENABLED decides whether the natural-language "Ask" feature and its /generate router exist at all. Default: on.

The flags don't just toggle behavior inside an endpoint — RAG_AND_GENERATION_ENABLED controls whether the generation router is even mounted. Flip it off and those routes don't return errors; they don't exist. The shape of the API itself changes based on configuration.
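The difference between "disabled inside the endpoint" and "never mounted" can be sketched with a plain route table. This is an illustration only: build_routes and dispatch are hypothetical helpers, and every path except /generate is an assumed prefix. In a FastAPI app the equivalent would be placing the app.include_router(...) call for the generation router inside the same if.

```python
def build_routes(rag_and_generation_enabled: bool) -> dict:
    """Return the route table for a given flag value (hypothetical helper)."""
    routes = {
        "/analytics": "analytics router",  # assumed prefix
        "/bills": "bills router",          # assumed prefix
        "/states": "states router",        # assumed prefix
    }
    # The flag decides whether the route exists, not how it behaves.
    if rag_and_generation_enabled:
        routes["/generate"] = "generation router"
    return routes


def dispatch(routes: dict, path: str) -> int:
    """Toy dispatcher: an unmounted path is indistinguishable from an unknown one."""
    return 200 if path in routes else 404


with_rag = build_routes(True)
without_rag = build_routes(False)
```

With the flag off, a request to /generate gets the same not-found response as any path the app has never heard of, because no handler was ever registered for it.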

Crucially, the two backend paths behind the UI flag are interchangeable by construction: whether a request is served from Neo4j or from the mock, both return the identical Pydantic response types and both use the same pagination cursor. A caller can't tell which backend answered. That's what makes the mock genuinely useful — a developer can run the whole UI API with no Neo4j instance at all and get responses shaped exactly like production's.
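A rough sketch of what "interchangeable by construction" can look like: one interface, one shared cursor scheme, and response types that every backend must return. All names here are hypothetical, dataclasses stand in for the Pydantic models, and the base64 offset cursor is an assumption made for the example, not the project's actual cursor format. Only the mock backend is shown; a Neo4j-backed class would implement the same method.

```python
import base64
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class BillSummary:  # stand-in for a Pydantic response model
    bill_id: str
    title: str


@dataclass(frozen=True)
class BillPage:
    items: list
    next_cursor: Optional[str]


def encode_cursor(offset: int) -> str:
    # Assumed cursor scheme: an opaque, URL-safe encoding of an offset.
    return base64.urlsafe_b64encode(str(offset).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode()) if cursor else 0


class BillBackend(Protocol):
    """The contract every backend satisfies, mock or real."""
    def list_bills(self, cursor: Optional[str], limit: int) -> BillPage: ...


class MockBackend:
    _BILLS = [BillSummary(f"B{i}", f"Mock bill {i}") for i in range(5)]

    def list_bills(self, cursor: Optional[str], limit: int) -> BillPage:
        start = decode_cursor(cursor)
        chunk = self._BILLS[start:start + limit]
        more = start + limit < len(self._BILLS)
        return BillPage(chunk, encode_cursor(start + limit) if more else None)


backend: BillBackend = MockBackend()
first = backend.list_bills(None, 2)
second = backend.list_bills(first.next_cursor, 2)
third = backend.list_bills(second.next_cursor, 2)
```

Because callers only ever see BillPage and an opaque cursor string, swapping the backend behind the interface does not change anything they can observe.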

The decision that defines this post

Here's the detail worth dwelling on: the feature flags are not symmetric, and the asymmetry is intentional.

It would be tidy to make both flags behave the same way — same default, same parsing. They don't, and each difference encodes something real about how the app is meant to be operated.

NEO4J_UI_ENABLED is opt-in and compound-guarded. It's true only if the operator explicitly sets it on and the Neo4j credentials are actually present. That second condition is the interesting one: the flag self-disables if the credentials are missing. An operator can ask for the Neo4j backend, but if they haven't supplied a URI and user, the flag quietly resolves to false instead of letting the app start up and then crash the first time it tries to open a driver. The flag is defensive — it refuses to promise a backend it can't actually reach.

RAG_AND_GENERATION_ENABLED is the mirror image: opt-out and default-on. It's true unless explicitly disabled. The reasoning is about who each default serves. RAG-on-by-default is a local-dev convenience — a developer cloning the repo gets the marquee feature working immediately without hunting for a flag to set. But the docstring is explicit that production should turn it off, because in the Lambda deployment that capability is wired differently. So one flag defaults off to protect against a missing-dependency crash, and the other defaults on to smooth the local-dev experience. Same mechanism, opposite defaults, each chosen for its environment.
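The two flag behaviors described above can be sketched in a few lines. This is an assumption-laden illustration rather than the project's settings module: the credential variable names NEO4J_URI and NEO4J_USER, the accepted truthy and falsy strings, and the Settings class itself are all hypothetical. Only the two flag names come from the post.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}   # assumed accepted spellings
_FALSY = {"0", "false", "no", "off"}   # assumed accepted spellings


def _env(name):
    """Read an env var on each call; blank or whitespace-only counts as unset."""
    value = os.environ.get(name, "").strip()
    return value or None


class Settings:
    @property
    def neo4j_ui_enabled(self) -> bool:
        # Opt-in AND compound-guarded: requested explicitly, credentials present.
        requested = (_env("NEO4J_UI_ENABLED") or "").lower() in _TRUTHY
        has_credentials = bool(_env("NEO4J_URI") and _env("NEO4J_USER"))
        return requested and has_credentials

    @property
    def rag_and_generation_enabled(self) -> bool:
        # Opt-out: true unless explicitly disabled.
        return (_env("RAG_AND_GENERATION_ENABLED") or "").lower() not in _FALSY


settings = Settings()
```

With an empty environment this resolves to Neo4j off and RAG on. Setting NEO4J_UI_ENABLED=true without credentials still resolves to off, which is the self-disabling behavior the post describes.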

This is the kind of decision that looks like a small inconsistency until you see the intent: defaults are a UX decision, and the right default depends on who's most likely to be affected by getting it wrong. A missing-credential crash hurts everyone, so that path is guarded shut. A missing-RAG-flag annoyance only hurts a local developer, so that path is opened by default and documented to be closed in prod.

There's one more decision worth naming because it surprises people: the app deliberately runs two separate Neo4j clients. The UI and analytics endpoints use an async driver, because they're FastAPI coroutines and the whole point of async is to keep the event loop free. But the RAG retrieval path uses a synchronous module-scope driver, because the underlying GraphRAG retriever library runs synchronously. Rather than force one model onto code that doesn't fit it, the serving layer keeps both — and bridges them by offloading the synchronous RAG pipeline onto a worker thread (asyncio.to_thread) so it doesn't block the async event loop. Two drivers isn't an accident or a cleanup someone forgot; it's two execution models living side by side because the code on each side genuinely needs a different one.
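The bridging pattern itself is small. In this sketch, run_rag_pipeline is a hypothetical synchronous function standing in for the blocking retriever call, and time.sleep simulates the blocking I/O; the only real API on display is asyncio.to_thread, which requires Python 3.9 or later.

```python
import asyncio
import threading
import time


def run_rag_pipeline(question: str) -> dict:
    """Hypothetical synchronous pipeline standing in for the blocking retriever."""
    time.sleep(0.05)  # simulates blocking I/O against a sync driver
    return {"question": question, "thread": threading.current_thread().name}


async def ask(question: str) -> dict:
    # Offload the blocking call so the event loop stays free for other requests.
    return await asyncio.to_thread(run_rag_pipeline, question)


async def main():
    loop_thread = threading.current_thread().name
    # The short sleep runs on the event loop while the pipeline blocks elsewhere.
    answer, _ = await asyncio.gather(ask("Which bills changed?"), asyncio.sleep(0.01))
    return loop_thread, answer


loop_thread, answer = asyncio.run(main())
```

The pipeline reports a different thread name from the one running the event loop, which is the whole point: the coroutine awaits the result without the loop ever executing the blocking code.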

Where it's fragile

The honest weak points are the cost of all this flexibility — more configuration surface, more ways for environment and code to disagree.

Configuration is a real surface area, and config bugs are silent. The settings object reads environment variables on every access and normalizes blank strings to "unset." That's robust — a whitespace-only env var won't masquerade as a real value — but it also means a misconfigured deployment fails quietly: the app starts fine and just serves from the wrong backend, or omits a router, with no loud error to catch it. The flexibility that lets the app run anywhere also lets it run subtly wrong anywhere.
Two runtimes means two behaviors to keep honest. One artifact running under both uvicorn and Mangum is the goal, but the two paths aren't truly identical — Lambda has cold starts, a connection-reuse model across warm invocations, and a request lifecycle that local uvicorn doesn't. The code is shared; the runtime characteristics are not. Something can pass locally and behave differently under Lambda precisely because the environment differs even though the code doesn't.
The flag asymmetry is correct but unobvious. The very thing this post praises, two flags with opposite defaults, is a trap for the next person. RAG defaulting on and needing to be turned off in prod is exactly the kind of thing that gets missed, because "default on" feels like "leave it alone." The intent lives in a docstring, which stays invisible unless someone goes looking for it.
Two Neo4j drivers is two lifecycles to manage. Keeping a sync and an async driver is the right call, but it doubles the connection-management story — two things to open, reuse, and close correctly, with different rules for each.

Roadmap

The through-line is one artifact, many runtimes, reconfigured by environment — and the roadmap is about making that flexibility safer to operate, not reducing it.

The most useful item is fail-loud configuration validation at startup. The flags already guard against the worst crash, but the app could go further — validate the full config the moment it boots and refuse to start (or at least log loudly) when the environment is internally inconsistent, so a misconfigured deployment announces itself instead of silently serving from the wrong place.
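One possible shape for that check, sketched under assumptions: validate_config, check_or_raise, and ConfigError are hypothetical names, the credential variables NEO4J_URI and NEO4J_USER are assumed, and the single rule shown is only the inconsistency this post has already described. Nothing like this exists in the codebase yet.

```python
class ConfigError(RuntimeError):
    """Raised at startup when the environment is internally inconsistent."""


def validate_config(env: dict) -> list:
    """Return human-readable problems; an empty list means the config is consistent."""
    def get(name):
        return (env.get(name) or "").strip()

    problems = []
    wants_neo4j = get("NEO4J_UI_ENABLED").lower() in {"1", "true", "yes", "on"}
    if wants_neo4j:
        # Instead of silently resolving the flag to false, report what is missing.
        for name in ("NEO4J_URI", "NEO4J_USER"):
            if not get(name):
                problems.append(f"NEO4J_UI_ENABLED is set but {name} is missing")
    return problems


def check_or_raise(env: dict) -> None:
    problems = validate_config(env)
    if problems:
        raise ConfigError("; ".join(problems))
```

Calling check_or_raise(dict(os.environ)) once at boot would turn "requested Neo4j, quietly got the mock" into a startup failure with a message naming the missing variable.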

Second, a startup banner that states the resolved configuration — which runtime, which flags, which backend each path will use. Half the risk in a multi-backend, multi-flag system is not knowing which configuration actually took effect; surfacing the resolved state turns a silent misconfiguration into something visible in the first line of the logs.

Third, closing the local/Lambda behavior gap where it matters most — particularly around connection lifecycle and cold starts, the places where shared code still behaves differently across the two runtimes. Even a thin local emulation of the Lambda request lifecycle would catch the class of bug that only appears in production.

And finally, promoting the prod-vs-local flag intent out of the docstring into something harder to miss — an explicit environment profile, or a check that warns when RAG is left on in a Lambda context — so the asymmetry that's correct-by-design doesn't become a footgun for whoever operates this next.
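The warning variant could be as small as the sketch below. It relies on one real fact, that the Lambda runtime sets the AWS_LAMBDA_FUNCTION_NAME environment variable, and is otherwise hypothetical: the function names, the logger name, and the accepted falsy strings are assumptions for illustration.

```python
import logging

logger = logging.getLogger("config")  # hypothetical logger name


def running_on_lambda(env: dict) -> bool:
    # The Lambda runtime sets this variable for every function.
    return bool(env.get("AWS_LAMBDA_FUNCTION_NAME"))


def rag_enabled(env: dict) -> bool:
    # Default-on: only an explicit falsy value disables it (assumed spellings).
    value = (env.get("RAG_AND_GENERATION_ENABLED") or "").strip().lower()
    return value not in {"0", "false", "no", "off"}


def warn_if_rag_on_in_lambda(env: dict) -> bool:
    """Log a warning and return True when RAG is left on in a Lambda context."""
    if running_on_lambda(env) and rag_enabled(env):
        logger.warning(
            "RAG_AND_GENERATION_ENABLED is on in a Lambda context; "
            "production is expected to disable it."
        )
        return True
    return False
```

Returning a boolean alongside the log line keeps the check easy to assert on in tests and leaves room to escalate it from a warning to a hard failure later.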

Next in the series — and a change of pace: a standalone guide to reading Cypher if you already think in SQL, drawn from the query layer that powers everything in this system.
