Deployment & data plane¶

The single most consequential decision is where the model runs and under what contract. Get this wrong and nothing else in the architecture matters; every subsequent design decision is downstream of whether PHI can legally flow through the system at all.

HIPAA-eligible inference¶

Use one of:

Direct enterprise contract with the model provider, with a Business Associate Agreement (BAA) that explicitly covers the API endpoints you are using.
AWS Bedrock under the AWS BAA, with the specific model confirmed in scope.
Google Cloud Vertex AI under the Google Cloud BAA, with the specific model confirmed in scope.
Azure OpenAI under the Microsoft BAA, with the specific deployment confirmed in scope.

Default consumer or developer-tier API access is not BAA-covered, even if the underlying model is the same. The contract is the difference, not the model weights.

What to verify before sending PHI¶

BAA executed with the model provider.
BAA executed with the cloud host (if different).
BAAs in place with every subprocessor in the chain (logging, observability, fine-tuning vendors, vector DB host).
Zero data retention confirmed in writing: prompts and completions are not retained for vendor model improvement or human review.
Vendor-side telemetry disabled or replaced with your own HIPAA-controlled logging.
Specific features in scope. Not all betas, tools, or model variants are BAA-eligible. Verify the exact configuration.
BAA scope and version re-checked against the current agreement. A BAA is scoped to specific products, plan tiers, and features, and that scope changes over time. A BAA signed for one product (for example, a provider's API) may not extend to a newer offering (an enterprise plan, a new tool) without signing a new agreement. Providers maintain an explicit list of eligible services; treat coverage as something to re-verify at each procurement and at each vendor product change, not a one-time check. (As of 2026, Anthropic's BAA is one concrete example: API and HIPAA-ready Enterprise are covered under separate, dated agreements, while Console, consumer tiers, and beta features are excluded.)
Data residency pinned to US (or your required region). HIPAA does not strictly mandate residency, but payer contracts and state law often do.
Model improvement opt-ins disabled.

Tenancy and isolation¶

Single-tenant or strongly isolated multi-tenant. If your platform serves multiple covered entities, every layer (inference, retrieval index, logs, caches) must enforce tenant boundaries. Cross-tenant retrieval is the failure mode that ends companies.

Network architecture¶

Private connectivity to inference endpoints (VPC endpoints, Private Service Connect) where supported.
Egress controls so the model context cannot exfiltrate PHI to unintended destinations.
Secrets in a managed secrets store, not in code or environment files.

What "model + version" means in clinical context¶

Pin to a specific model version. Auto-upgrades to a "latest" alias change the system's behavior without notice: fine for a chatbot, not fine for clinical generation. Treat version changes as code changes: re-run evaluation, document the diff, get clinical sign-off.

Common mistakes¶

Treating the BAA as a procurement formality instead of a system-design constraint. Read it. Understand what's in scope.
Forgetting that embeddings are PHI. The vector store hosting embeddings of clinical text needs the same controls as the EHR.
Assuming "we'll add the BAA before launch" is acceptable. PHI sent before the BAA is in place is a breach, retroactively un-fixable.
Leaving default vendor request-logging on. Many providers log requests by default for abuse monitoring; this requires an explicit opt-out and a contractual carve-out.