Building AI agents for healthcare is a different discipline from building AI agents for most other industries. The technical fundamentals are the same - retrieval, reasoning, tool use, orchestration - but the compliance requirements change how you architect every layer of the system. Get it wrong, and your organization faces fines of up to $1.5 million per violation category per year, along with reputational damage that is hard to recover from.
This guide covers the specific technical decisions you need to make when building AI agents that handle Protected Health Information (PHI) under HIPAA. It is not a general HIPAA primer or an overview of healthcare regulations. It assumes you already know the basics and need to understand how HIPAA requirements map to the architectural choices involved in building and deploying modern AI systems.
Key Takeaways
- AI agents introduce HIPAA risks that traditional software does not: training data exposure, prompt injection leading to PHI leaks, inference logging, and third-party API calls
- Every service in your AI pipeline that touches PHI must be covered by a Business Associate Agreement (BAA)
- De-identification using Safe Harbor or Expert Determination methods can significantly reduce your compliance surface area
- Architecture patterns like PHI isolation boundaries, encryption in transit and at rest, and tamper-evident audit logging are non-negotiable
- Common mistakes include sending PHI to LLM APIs without BAAs, logging raw prompts, and using patient data for fine-tuning without proper de-identification
Why HIPAA Matters Differently for AI Systems
HIPAA has been around since 1996. The Security Rule was finalized in 2003. Most healthcare IT teams have years of experience building HIPAA-compliant web applications, databases, and APIs. But AI agents introduce a fundamentally new class of risks that traditional application architectures simply do not have to deal with.
Training data exposure. When you fine-tune a model on clinical data, that data can be memorized and later reproduced during inference. Research has demonstrated that large language models can reproduce training examples verbatim under the right prompting conditions. If your training set includes PHI - patient names, dates of birth, medical record numbers, diagnoses - that information can surface in model outputs to users who should never see it. A 2023 study from Google DeepMind showed that extractable memorization in LLMs scales with model size and data duplication.

Inference-time PHI handling. AI agents do not just retrieve and display data. They reason over it, combine it with other context, and generate new text. A clinical AI agent might pull a patient's medication list from an EHR, combine it with lab results from a FHIR API, and generate a summary for a physician. At every step of that pipeline, PHI is being processed, transformed, and potentially logged. The question is not whether PHI flows through your system. It is whether every component that touches PHI is properly safeguarded.

Prompt and completion logging. Most AI engineering teams log prompts and completions for debugging, evaluation, and monitoring. In a healthcare context, those prompts often contain PHI. If your logging pipeline sends data to an unencrypted datastore, a third-party observability tool without a BAA, or a monitoring dashboard accessible to unauthorized staff, you have a HIPAA violation.

Third-party API calls. Modern AI agents rely on external services: LLM inference APIs, embedding services, vector databases, tool-calling endpoints. Each of these services is a potential point where PHI leaves your controlled environment. If any of them processes PHI without a BAA in place, you are out of compliance, even if the data is only in transit for milliseconds.

The HIPAA Security Rule Mapped to AI Architecture
The HIPAA Security Rule defines three categories of safeguards: Technical, Administrative, and Physical. Here is how each maps to the architectural decisions you will make when building an AI agent.
Technical Safeguards
Access Control (§164.312(a)) - Every component in your AI pipeline needs role-based access control. This includes not just your application layer but your model serving infrastructure, vector databases, prompt logs, and evaluation datasets. A machine learning engineer debugging a model should not have access to the same PHI that a clinical user sees. In practice, this means implementing RBAC at multiple levels:
```yaml
# Example: Access control policy for AI agent components
roles:
  clinical_user:
    permissions:
      - inference:read
      - patient_context:read
    phi_access: true
    audit_level: full
  ml_engineer:
    permissions:
      - model:deploy
      - metrics:read
      - evaluation:read
    phi_access: false
    data_view: de-identified-only
    audit_level: full
  system_service:
    permissions:
      - inference:execute
      - vector_db:query
      - ehr_api:read
    phi_access: true
    requires_baa: true
    audit_level: full
```
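A policy file like this only matters if it is enforced at request time. Here is a minimal sketch of a deny-by-default enforcement check; the role names and permission strings mirror the example policy above, and the `check_access` helper is illustrative, not part of any specific framework:

```python
# Illustrative enforcement of the example role policy above.
# In production, load ROLES from your policy store rather than hardcoding.
ROLES = {
    "clinical_user": {
        "permissions": {"inference:read", "patient_context:read"},
        "phi_access": True,
    },
    "ml_engineer": {
        "permissions": {"model:deploy", "metrics:read", "evaluation:read"},
        "phi_access": False,
    },
}

def check_access(role: str, permission: str, touches_phi: bool) -> bool:
    """Deny by default: unknown roles, missing permissions, and
    PHI access without the phi_access flag all fail."""
    policy = ROLES.get(role)
    if policy is None:
        return False
    if permission not in policy["permissions"]:
        return False
    if touches_phi and not policy["phi_access"]:
        return False
    return True
```

The important property is the deny-by-default shape: an ML engineer with a valid `evaluation:read` permission is still refused the moment the request would touch PHI.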
Audit Controls (§164.312(b)) - You must record every access to PHI, every inference call that processes PHI, and every data transformation. This is more granular than typical application logging. For an AI agent, an audit log entry should capture who initiated the request, what PHI was accessed, which model processed it, what the output contained, and where that output was sent.
Integrity Controls (§164.312(c)) - Data at rest and in transit must be protected against unauthorized modification. For AI systems, this extends to model weights (to prevent tampering), training datasets (to prevent poisoning), and configuration files that control PHI access policies.
Transmission Security (§164.312(e)) - All PHI in transit must be encrypted. This includes API calls to LLM providers, queries to vector databases, messages between microservices, and any data flowing between your AI agent's components. TLS 1.2 at minimum; TLS 1.3 preferred.
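Enforcing the TLS floor programmatically is straightforward in Python's standard library. A sketch of a context factory you might route all outbound PHI-bearing connections through (the function name is illustrative):

```python
import ssl

def phi_tls_context() -> ssl.SSLContext:
    """TLS context for PHI-bearing connections: 1.2 floor, strict verification.
    Clients will still negotiate TLS 1.3 when the server supports it."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.check_hostname = True            # reject hostname mismatches
    ctx.verify_mode = ssl.CERT_REQUIRED  # never disable certificate validation
    return ctx
```

Centralizing context creation like this also gives you one place to audit when cipher or protocol requirements change.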
Administrative Safeguards
Risk Analysis (§164.308(a)(1)) - Before deploying an AI agent that handles PHI, you must conduct a formal risk analysis. For AI systems, this should cover:
- Model behavior risks (hallucination, memorization, prompt injection)
- Data flow risks (PHI exposure through logging, caching, or third-party services)
- Access control risks (over-permissioned service accounts, shared credentials)
- Incident response risks (how you detect and respond to a PHI breach involving the AI system)
Business Associate Agreements (§164.308(b)(1)) - A Business Associate Agreement is a legal contract between a Covered Entity (or another Business Associate) and a Business Associate. It establishes what the Business Associate is permitted to do with PHI and what safeguards they must maintain. For AI workloads, you need BAAs with the following provider types:
- Cloud Infrastructure (AWS, GCP, Azure) - BAAs available, but not all services are in scope. Check the provider's HIPAA-eligible services list.
- LLM API Providers (OpenAI API, Anthropic API, Google Vertex AI) - Enterprise/API tiers typically support BAAs. Consumer tiers do not.
- Vector Databases (Pinecone, Weaviate managed) - Check whether managed cloud versions offer BAAs. Self-hosted avoids this requirement.
- Logging/Monitoring (Datadog, Splunk) - Both offer BAA-eligible tiers. Confirm before sending any PHI to their platforms.
- CI/CD and MLOps (Weights & Biases, MLflow managed) - If experiment tracking includes PHI samples, the platform needs a BAA.
Physical Safeguards
Facility Access Controls (§164.310(a)) - If you run on-premises GPU infrastructure for model training or inference, physical access controls apply. For cloud deployments, your cloud provider's BAA should cover their physical security obligations, but you must verify that the specific regions and services you use are BAA-eligible.

Device and Media Controls (§164.310(d)) - Training data stored on local machines, GPU servers, or removable media must be encrypted and tracked. This is especially relevant for teams that download clinical datasets for local model development.

Architecture Patterns for HIPAA-Compliant AI Agents
There is no single correct architecture for a HIPAA-compliant AI agent, but there are patterns that make compliance significantly easier to achieve and maintain.
The PHI Isolation Boundary
The most effective architectural pattern is to establish a clear PHI boundary in your system. Inside the boundary, every component is HIPAA-hardened. Outside the boundary, no PHI ever flows.
The architecture works as follows: EHR/FHIR data flows into the AI Agent Orchestrator, which connects to the clinical user interface, a PHI-aware encrypted vector database, and a tamper-evident audit log. A de-identification service sits at the boundary edge.
The key principle: PHI stays inside the boundary. If data must cross the boundary - for example, to call an external LLM API - it passes through a de-identification service first, or the external service must be covered by a BAA and meet all HIPAA technical safeguard requirements. This boundary-based approach simplifies compliance by concentrating your security controls in a well-defined perimeter rather than trying to harden every individual component independently.
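One way to make the boundary concrete is a single outbound gateway that every external call must pass through. The sketch below assumes a registry of BAA-covered service identifiers and a `deidentify` placeholder standing in for your de-identification service; both names are illustrative:

```python
# Sketch of a PHI boundary gateway: one choke point decides whether a
# payload may cross as-is (BAA in place) or must be de-identified first.
BAA_COVERED_SERVICES = {"bedrock.us-east-1", "llm-provider-enterprise"}  # example registry

def deidentify(payload: str) -> str:
    # Placeholder: call your real de-identification service here.
    return "[DE-IDENTIFIED] " + payload.replace("John Smith", "[NAME]")

def outbound(service: str, payload: str) -> str:
    """Return the payload that is actually allowed to leave the boundary."""
    if service in BAA_COVERED_SERVICES:
        return payload  # BAA in place: PHI may cross, but still minimize it
    return deidentify(payload)  # no BAA: PHI must never cross as-is
```

The value of the pattern is architectural: instead of auditing every call site, you audit one function and ban direct egress everywhere else (for example, with VPC egress rules).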
Data Residency and Encryption
PHI must be encrypted at rest using AES-256 and in transit using TLS 1.2+. But encryption alone is not sufficient. You also need to control where data physically resides and ensure that encryption keys are managed properly with appropriate access controls.
For AWS deployments (which is what we typically recommend for healthcare AI workloads), this means:
- S3 buckets with SSE-KMS encryption and bucket policies restricting access to specific VPCs
- RDS or DynamoDB with encryption at rest enabled and VPC-only access
- SageMaker endpoints running in a private VPC with no internet-facing access
- CloudWatch Logs encrypted with KMS keys you control (not AWS-managed keys)
```python
# Example: S3 bucket policy enforcing encryption and VPC-only access
import json

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::phi-training-data-bucket",
                "arn:aws:s3:::phi-training-data-bucket/*"
            ],
            "Condition": {
                "Bool": {"aws:SecureTransport": "false"}
            }
        },
        {
            "Sid": "RestrictToVPC",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::phi-training-data-bucket",
                "arn:aws:s3:::phi-training-data-bucket/*"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": "vpce-1a2b3c4d"
                }
            }
        }
    ]
}
```
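The bucket policy controls the network path; the KMS key policy controls who can actually use the encryption keys. A sketch of a customer-managed key policy that separates key administration from key use (the account ID and role names are placeholders, not recommendations):

```python
# Illustrative KMS key policy: administrators manage the key but cannot use it
# for PHI decryption; only the agent's service role can decrypt.
# The account ID and role ARNs below are placeholders.
kms_key_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowKeyAdministration",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/phi-key-admin"},
            "Action": [
                "kms:Create*", "kms:Describe*", "kms:Enable*", "kms:Put*",
                "kms:Update*", "kms:Revoke*", "kms:Disable*",
                "kms:ScheduleKeyDeletion"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowAgentDecryptOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/ai-agent-service"},
            "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
            "Resource": "*"
        }
    ]
}
```

Splitting administration from use means a compromised admin credential cannot silently read PHI, and a compromised service credential cannot weaken or delete the key.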
Audit Logging Architecture
HIPAA's documentation retention requirement means audit records must be kept for at least six years. For AI agents, this means logging at a granularity that most engineering teams are not used to.
Here is what a comprehensive audit log entry should look like for an AI agent inference call:
```json
{
  "event_id": "evt_8f3a2b1c",
  "timestamp": "2026-03-05T14:32:18.442Z",
  "event_type": "ai_agent.inference",
  "user": {
    "id": "usr_clinician_4821",
    "role": "physician",
    "ip_address": "10.0.4.52"
  },
  "patient": {
    "mrn_hash": "sha256:a1b2c3d4..."
  },
  "agent": {
    "agent_id": "clinical_summary_agent_v2",
    "model": "gpt-4-turbo-hipaa",
    "model_version": "2026-02-15"
  },
  "data_accessed": [
    "medications_list",
    "lab_results_recent",
    "problem_list"
  ],
  "phi_categories_in_prompt": [
    "patient_name",
    "date_of_birth",
    "diagnosis_codes"
  ],
  "output_destination": "clinical_ui",
  "encryption": {
    "in_transit": "TLS_1_3",
    "at_rest": "AES_256_KMS"
  },
  "baa_verified": true
}
```
Store these logs in a tamper-evident system. AWS CloudTrail with log file validation enabled is one option. A dedicated SIEM (Security Information and Event Management) system with write-once storage is another. No one - not administrators, not engineers, not the AI agent itself - should be able to modify or delete audit records.
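The tamper-evidence property that CloudTrail log file validation provides can be illustrated with a simple hash chain: each record embeds the hash of the previous one, so any retroactive edit breaks verification from that point forward. A minimal sketch of the idea (an in-memory toy, not a storage backend):

```python
import hashlib
import json

def append_record(chain: list[dict], record: dict) -> None:
    """Append an audit record, linking it to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps(record, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    chain.append({"record": record, "prev_hash": prev_hash, "hash": entry_hash})

def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = json.dumps(entry["record"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + body).encode()).hexdigest()
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

In production you would anchor the chain head in write-once storage (S3 Object Lock or a SIEM with WORM retention) so the chain itself cannot be rewritten wholesale.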
Putting It Together: A Reference Architecture
Here is how a complete HIPAA-compliant AI agent architecture looks in practice when deployed on AWS.
All compute runs inside a private VPC in a HIPAA-eligible region. The AI agent orchestrator, vector database, and all data services sit in private subnets with no public-facing access. The only public-facing component is an Application Load Balancer that terminates TLS and sits behind AWS WAF for additional protection.
Calls to AWS Bedrock or SageMaker use VPC Endpoints (PrivateLink) so they never traverse the public internet. They stay on the AWS backbone network, which is both faster and more secure. Before any data leaves the VPC for inference (if using an external API rather than Bedrock), it passes through a de-identification service. If you use AWS Bedrock within the same account and region covered by your AWS BAA, you can pass PHI directly, but you should still minimize the PHI in each prompt to what is strictly necessary for the task.
Audit logging uses CloudTrail with log file validation, writing to an S3 bucket with object lock enabled (Write Once Read Many). No one can modify or delete these records. Encryption is enforced everywhere: KMS customer-managed keys for data at rest, TLS 1.3 for data in transit, with key rotation on a defined schedule.
Key architectural decisions in this design: the private VPC with no public subnets for compute ensures that the AI agent orchestrator and all data services are fully isolated from the public internet. VPC Endpoints via PrivateLink for AWS AI services ensure calls to Bedrock or SageMaker stay on the AWS backbone network, which is both faster and more secure. The PHI scrubbing service at the boundary ensures that any data leaving the VPC for external services is properly de-identified. And the use of AWS-native services covered by the BAA minimizes the number of additional third-party agreements needed.
Common Mistakes That Break HIPAA Compliance
After working on multiple healthcare AI projects, including our AI Clinical Empowerment Platform and Curengo Rehabilitation Platform, we have seen the same compliance mistakes come up repeatedly. Here are the ones that cause the most trouble and how to avoid them.
Sending PHI to Third-Party LLM APIs Without a BAA
This is the single most common violation we see. A development team builds an AI agent that calls the OpenAI or Anthropic API, passes patient data in the prompt, and assumes that because the API uses HTTPS, they are compliant. They are not.
HTTPS encrypts data in transit, but the API provider is still processing PHI. That makes them a Business Associate, and you need a BAA before any PHI touches their systems. Consumer API tiers (like the standard OpenAI API without an enterprise agreement) do not come with BAAs.
The fix: either sign an enterprise agreement that includes a BAA, self-host the model, or de-identify data before it leaves your PHI boundary.
Logging Prompts That Contain PHI
Debugging AI agents means logging prompts and completions. But if those prompts contain "Summarize the medications for John Smith, DOB 04/15/1962, MRN 482910," you have just written PHI to your logging infrastructure. If that logging infrastructure is Elasticsearch in a shared cluster, or a SaaS platform without a BAA, or a developer's local machine, you have a problem.
The fix: implement a PHI-scrubbing layer between your agent and your logging pipeline. Log structured metadata (what type of data was accessed, which agent handled it, response latency) without logging the raw content.
```python
import re
from typing import Any

class PHIScrubber:
    """Scrub PHI from log entries before they reach the logging pipeline."""

    PATTERNS = {
        "ssn": re.compile(r'\b\d{3}-\d{2}-\d{4}\b'),
        "mrn": re.compile(r'\bMRN\s*:?\s*\d{4,10}\b', re.IGNORECASE),
        "phone": re.compile(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'),
        "dob": re.compile(
            r'\b(DOB|Date of Birth)\s*:?\s*\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b',
            re.IGNORECASE
        ),
        "email": re.compile(r'\b[\w.+-]+@[\w-]+\.[\w.-]+\b'),
    }

    @classmethod
    def scrub(cls, text: str) -> str:
        for category, pattern in cls.PATTERNS.items():
            text = pattern.sub(f'[REDACTED_{category.upper()}]', text)
        return text

    @classmethod
    def safe_log_entry(cls, prompt: str, completion: str,
                       metadata: dict[str, Any]) -> dict[str, Any]:
        return {
            "prompt_length": len(prompt),
            "completion_length": len(completion),
            "prompt_scrubbed": cls.scrub(prompt),
            "phi_detected": any(
                p.search(prompt) for p in cls.PATTERNS.values()
            ),
            **metadata
        }
```
Note: regex-based scrubbing is a starting point, not a complete solution. For production systems, combine it with Named Entity Recognition models trained on clinical text (see the de-identification section below).
Using Patient Data in Fine-Tuning Without De-identification
Fine-tuning a model on clinical notes to improve its understanding of medical terminology sounds reasonable. But if those clinical notes contain PHI and you do not de-identify them first, you have created a model that may memorize and reproduce patient information.
The risk compounds if you later deploy that model in a context where users who are not authorized to see the original patient data can interact with it. Even if direct extraction is difficult, membership inference attacks can reveal whether a specific patient's data was in the training set.
Ignoring Model Output as PHI
If your AI agent takes PHI as input and generates a clinical summary, that summary is also PHI. This seems obvious, but teams frequently treat model outputs differently from inputs in their compliance architecture. The output goes to a frontend without proper access controls, gets cached in a CDN, or is stored in a database that is not encrypted. Every piece of data derived from PHI is itself PHI under HIPAA. Treat it accordingly.
Overlooking Prompt Injection as a PHI Breach Vector
Prompt injection is not just a security concern - it is a HIPAA concern. If an attacker can manipulate your AI agent into revealing PHI from its context window, that is a data breach under HIPAA. This means prompt injection defenses are not optional for healthcare AI agents; they are a compliance requirement.
Mitigations include input validation, output filtering, system prompt hardening, and running sensitive operations in isolated contexts where the model only has access to the minimum PHI required for the current task.
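Output filtering in particular can be made mechanical: before a completion leaves the agent, check it against the PHI categories the current request is actually authorized to receive and fail closed on anything else. A sketch with illustrative patterns (reuse your production PHI detectors here rather than these two regexes):

```python
import re

# Illustrative patterns only; a real deployment should share detectors
# with its de-identification pipeline.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN\s*:?\s*\d{4,10}\b", re.IGNORECASE),
}

def filter_output(completion: str, allowed_categories: set[str]) -> str:
    """Fail closed: raise if the completion contains a PHI category the
    current request was not authorized to receive."""
    for category, pattern in PHI_PATTERNS.items():
        if category not in allowed_categories and pattern.search(completion):
            raise PermissionError(
                f"unauthorized PHI category in output: {category}"
            )
    return completion
```

Because the check runs on the model's output rather than its input, it catches injection attacks that slipped past input validation: even if an attacker convinces the model to reveal context-window PHI, the filter blocks the response before it reaches them.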
De-identification Strategies for Training Data
HIPAA provides two approved methods for de-identification: Safe Harbor and Expert Determination. Both can be applied to training data for AI models, but they have different tradeoffs.
Safe Harbor Method
The Safe Harbor method requires removing 18 specific categories of identifiers, including names, geographic subdivisions smaller than a state, all elements of dates (except year) directly related to an individual, telephone numbers, email addresses, Social Security numbers, medical record numbers, health plan and account numbers, device identifiers, biometric identifiers, full-face photographs, and any other unique identifying number, characteristic, or code.
For AI training data, the Safe Harbor method is the most straightforward to implement programmatically. You can build an automated pipeline that systematically detects and removes or replaces each category of identifier.
Expert Determination Method
The Expert Determination method requires a qualified statistical expert to certify that the risk of re-identification is "very small." This method allows you to retain more data granularity - for example, keeping partial dates or geographic regions larger than what Safe Harbor allows - which can be important for model performance.
The tradeoff is cost and complexity. You need to engage an expert, document their methodology, and maintain their certification. For organizations building multiple AI models on clinical data, this investment often pays for itself through better model quality.
Practical NER-Based De-identification
For production de-identification pipelines, Named Entity Recognition (NER) models trained on clinical text are the most effective approach. General-purpose NER models miss healthcare-specific identifiers, so you need models specifically trained on clinical corpora.
```python
# Example: Clinical text de-identification pipeline
from dataclasses import dataclass
from enum import Enum

class PHIType(Enum):
    PATIENT_NAME = "PATIENT_NAME"
    DATE = "DATE"
    LOCATION = "LOCATION"
    AGE = "AGE"
    CONTACT = "CONTACT"
    ID = "ID"

@dataclass
class PHIEntity:
    text: str
    phi_type: PHIType
    start: int
    end: int
    confidence: float

class ClinicalDeidentifier:
    """
    De-identification pipeline for clinical text.

    Uses a combination of:
    1. Fine-tuned NER model for entity detection
    2. Rule-based patterns for structured identifiers (MRN, SSN)
    3. Context-aware replacement to maintain clinical meaning
    """

    def __init__(self, ner_model_path: str):
        self.ner_model = self._load_model(ner_model_path)

    def deidentify(self, text: str,
                   preserve_dates_as_offsets: bool = False
                   ) -> dict:
        """
        De-identify clinical text while preserving clinical meaning.

        Returns:
            dict with 'text' (de-identified), 'entities' (found PHI),
            and 'method' (Safe Harbor or Expert Determination)
        """
        entities = self._detect_phi(text)
        deidentified_text = self._replace_entities(
            text, entities, preserve_dates_as_offsets
        )
        return {
            "text": deidentified_text,
            "entities_found": len(entities),
            "entity_types": [e.phi_type.value for e in entities],
            "method": "safe_harbor",
            "confidence_min": min(
                (e.confidence for e in entities), default=1.0
            )
        }

    def _detect_phi(self, text: str) -> list[PHIEntity]:
        """Run NER model + rule-based detection."""
        ner_entities = self.ner_model.predict(text)
        rule_entities = self._apply_rules(text)
        return self._merge_entities(ner_entities, rule_entities)
```
A practical consideration: de-identification is never 100% perfect. NER models have false negatives. Unusual name formats, nicknames, and contextual identifiers ("the patient who was in that car accident on Highway 101 last Tuesday") can slip through automated pipelines. This is why automated pipelines should be backed by human review: sample de-identified output regularly for residual PHI, measure recall against a manually annotated held-out set, and layer rule-based detection on top of the NER model rather than relying on either alone.
For teams working with clinical text at scale, consider using established tools like the i2b2/n2c2 de-identification models as a starting point, then fine-tuning on your specific clinical corpus.
HIPAA Compliance Deployment Checklist
Use this checklist before deploying any AI agent that processes PHI. It is not exhaustive - your specific deployment may have additional requirements - but it covers the areas where we see teams most frequently fall short.
Infrastructure and Encryption
- [ ] All data at rest encrypted with AES-256 (customer-managed KMS keys)
- [ ] All data in transit encrypted with TLS 1.2+
- [ ] Compute resources deployed in HIPAA-eligible cloud regions
- [ ] No public subnets for data processing or storage services
- [ ] VPC endpoints configured for all AWS/cloud service calls
- [ ] S3 bucket policies enforce encryption and VPC-only access
- [ ] Database encryption enabled (RDS, DynamoDB, or equivalent)
- [ ] KMS key rotation schedule configured and documented
Access Control
- [ ] Role-based access control implemented at application and infrastructure levels
- [ ] Principle of least privilege applied to all service accounts
- [ ] Multi-factor authentication required for all human access to PHI systems
- [ ] Service-to-service authentication uses short-lived credentials (IAM roles, not static keys)
- [ ] Access reviews conducted on a defined schedule (at least quarterly)
Business Associate Agreements
- [ ] BAA signed with cloud infrastructure provider
- [ ] BAA signed with LLM API provider (if using external inference)
- [ ] BAA signed with all SaaS tools that process PHI (logging, monitoring, CI/CD)
- [ ] BAA scope verified to include all specific services in use
- [ ] BAA inventory maintained and reviewed annually
Audit Logging
- [ ] All PHI access events logged with user, timestamp, data accessed, and action
- [ ] All AI inference calls logged with model version, input metadata, and output destination
- [ ] Logs stored in tamper-evident storage (S3 Object Lock, WORM, or equivalent)
- [ ] Log retention period set to minimum six years
- [ ] Automated alerting configured for anomalous access patterns
Data Handling and AI-Specific Controls
- [ ] PHI boundary clearly defined in architecture documentation
- [ ] De-identification pipeline implemented and validated for training data
- [ ] Prompt logging either disabled or routed through PHI scrubbing
- [ ] Model outputs treated as PHI when derived from PHI inputs
- [ ] Prompt injection defenses implemented (input validation, output filtering)
- [ ] Model memorization risk assessed for any fine-tuned models
- [ ] Evaluation datasets de-identified or stored within PHI boundary
- [ ] Human review process defined for high-stakes clinical outputs
Organizational
- [ ] HIPAA Security Officer designated
- [ ] Risk analysis completed and documented
- [ ] Incident response plan includes AI-specific scenarios
- [ ] Workforce training completed (including AI-specific HIPAA training)
- [ ] Policies reviewed and updated within the last 12 months
Working with Existing Healthcare Systems
AI agents in healthcare rarely operate in isolation. They need to integrate with Electronic Health Record systems, clinical decision support tools, lab information systems, and other established infrastructure that has been in place for years. These integrations introduce their own set of HIPAA considerations that are worth addressing separately.
FHIR API access control. If your AI agent reads patient data through FHIR APIs, ensure it requests only the minimum necessary data elements (the HIPAA "minimum necessary" standard). An agent summarizing recent lab results does not need access to the patient's full history, demographic details, or billing information.

EHR audit integration. Most EHR systems have their own audit logging. Your AI agent's audit logs should correlate with the EHR's logs so that a complete access trail can be reconstructed during an investigation. Include the EHR session ID or correlation ID in your agent's log entries.

Clinical workflow integration. As we discuss in our article on AI scribes for healthcare documentation, the way an AI agent fits into clinical workflows affects its compliance posture. An agent that operates asynchronously (processing data after the encounter) has a different risk profile than one that operates in real time during a patient visit. Both can be compliant, but they require different architectural approaches.

For teams building AI governance programs that extend beyond HIPAA into broader regulatory compliance, our AI governance services cover the full spectrum of requirements; see also our enterprise AI governance and compliance framework guide. And for organizations that need hands-on help building compliant AI systems, our AI development services include compliance architecture as a core component of every healthcare engagement.
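The minimum necessary principle described above can be enforced mechanically when building FHIR search requests, using the standard `_elements` result parameter to restrict which fields the server returns. A sketch (the base URL, task name, and element allow-list are illustrative):

```python
from urllib.parse import urlencode

# Illustrative per-task allow-lists: each agent task declares up front
# which FHIR elements it is permitted to receive.
TASK_ELEMENTS = {
    "lab_summary": ["code", "value", "effectiveDateTime", "status"],
}

def fhir_search_url(base: str, resource: str, patient_id: str, task: str) -> str:
    """Build a FHIR search URL restricted to the task's allowed elements.
    Unknown tasks raise KeyError: fail closed rather than fetch everything."""
    elements = TASK_ELEMENTS[task]
    query = urlencode({
        "patient": patient_id,
        "_elements": ",".join(elements),
    })
    return f"{base}/{resource}?{query}"
```

Declaring element allow-lists per task also gives auditors a single artifact documenting exactly which data each agent capability touches.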
We've built HIPAA-compliant AI systems for healthcare clients including clinical documentation and rehabilitation platforms. See our healthcare case studies or talk to us about your project.
BeyondScale Team
AI/ML Team
AI/ML Team at BeyondScale Technologies, an ISO 27001 certified AI consulting firm and AWS Partner. Specializing in enterprise AI agents, multi-agent systems, and cloud architecture.
