Vector database security is the infrastructure layer most RAG deployments skip. Teams invest in prompt filtering, output monitoring, and LLM access controls while the database storing their embedded intellectual property runs with default authentication, no audit logging, and minimal network segmentation. This guide covers specific configuration steps across the four major platforms: Pinecone, pgvector, Weaviate, and Qdrant.
For attack classes and CVEs (CVE-2025-64513 in Milvus, CVE-2024-41892 in Pinecone, PoisonedRAG), see our vector database security risk guide. This guide picks up where that one ends: you understand the threats, here is how to configure your way out of them.
Key Takeaways
- Embeddings are invertible. Cornell's Vec2Text research (EMNLP 2023) showed that 92% of 32-token inputs can be reconstructed exactly from their embeddings alone. Every unauthorized exposure of your vector store is a potential data breach, not just an access control failure.
- pgvector supports row-level security at the PostgreSQL layer. This lets you enforce tenant isolation at the database itself rather than in application code, where it is easier to bypass.
- Pinecone's six-role RBAC model separates control plane operations (index management) from data plane operations (reads and writes). Most deployments use a single API key for both, which means a compromised retrieval service has write access to the knowledge base.
- Weaviate's production-ready RBAC arrived in v1.29.0. Deployments on earlier versions have only two roles: Admin and Read-Only. Many enterprise deployments are still on older versions.
- Qdrant ships insecure by default. Its own documentation states: "By default, all self-deployed Qdrant instances are not secure." JWT-based access control, available since v1.9.0, scopes tokens to specific collections.
- Customer-managed encryption keys (CMEK) give enterprises cryptographic control: revoking the key in your KMS immediately renders all stored embeddings inaccessible. CMEK is also commonly required in HIPAA and FedRAMP compliance programs.
- EU AI Act Article 12 requires high-risk AI systems to maintain automatic event logs. Vector database query logs are part of this obligation for systems making consequential decisions.
- Embedding API keys (OpenAI, Cohere, Voyage AI) frequently end up in version control, CI/CD logs, and application configs. A compromised embedding key gives an attacker the ability to generate and inject vectors into your knowledge base.
Why Vector Databases Require Infrastructure-Level Security
Most enterprise teams apply security controls at the application layer: the LLM API call is authenticated, retrieved context is filtered, and output is monitored. The vector database sits behind this boundary, treated as a trusted internal service. In practice, "trusted internal service" frequently means no authentication, no audit logging, and network isolation limited to VPC placement.
When the application tier is compromised, everything in the vector store is immediately accessible. The stakes are higher than they appear. Your vector database holds embedded representations of every document your RAG pipeline has indexed: contracts, source code, clinical notes, customer support tickets, financial records, internal wikis. These documents look like numeric arrays in storage, but they are not anonymized.
The Vec2Text method from Morris, Kuleshov, Shmatikov, and Rush at Cornell reconstructs 92% of 32-token input texts exactly from their embeddings using only the embedding API. No model weights required. The 2025 follow-on (arXiv:2504.16609) reported cosine similarity of 89.4 and an LLM Judge leakage score of 92.0 against black-box encoders. A zero-shot variant published in April 2025 (arXiv:2504.00147) achieved meaningful inversion without any training on the target model at all.
A second structural risk compounds this. When your ingestion pipeline pulls documents from Confluence, SharePoint, or an internal wiki, it strips the source document's permission metadata. The original access controls exist in the source system. Your vector database almost certainly has no equivalent controls applied at the document level. Anyone who can query the vector store can retrieve context from documents they should not have accessed.
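One way to keep source permissions attached to each chunk is to copy the source system's ACL into chunk metadata at ingestion and filter on it at retrieval. The sketch below is illustrative only: Chunk, allowed_groups, ingest, and filter_by_acl are hypothetical names, and a real deployment would more likely push the filter into the vector database query than post-filter in Python.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    """An indexed chunk that carries the ACL of its source document (hypothetical schema)."""
    chunk_id: str
    text: str
    source: str
    allowed_groups: frozenset = field(default_factory=frozenset)


def ingest(doc_id: str, text: str, source: str, source_acl: list) -> Chunk:
    # Copy the source system's group ACL onto the chunk instead of discarding it.
    return Chunk(chunk_id=doc_id, text=text, source=source,
                 allowed_groups=frozenset(source_acl))


def filter_by_acl(hits: list, user_groups: set) -> list:
    # Fail closed: a chunk with no recorded ACL is never returned.
    return [c for c in hits if c.allowed_groups & user_groups]


hits = [
    ingest("doc-1", "Q3 salary bands", "confluence", ["hr"]),
    ingest("doc-2", "VPN setup guide", "wiki", ["hr", "engineering"]),
    ingest("doc-3", "Orphaned page", "sharepoint", []),
]
visible = filter_by_acl(hits, {"engineering"})
print([c.chunk_id for c in visible])
```

The fail-closed default matters here: a chunk whose ACL was lost during ingestion is hidden rather than exposed to everyone.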
Infrastructure-level security addresses both risks through access control at the database layer rather than only the application, encryption with customer-managed keys, and audit logging that creates a persistent record for compliance and incident response.
Internal resource: RAG Security and Data Poisoning Guide covers ingestion-layer attack vectors in detail.
Access Control: Per-Platform Configuration
Pinecone: Separating Control Plane from Data Plane
Pinecone's RBAC model separates access into two planes with three roles each:
Control plane (index and infrastructure management):
- Owner: full organizational access including billing and user management
- Admin: create and delete indexes, manage API keys, configure project settings
- Billing Admin: manage billing settings only
Data plane (reads and writes on index records):
- Data Owner: read, write, and delete vectors within assigned indexes
- Data Editor: read and write, no delete permissions
- Data Viewer: read-only access to index records
Recommended service account pattern:
Ingestion service → Data Editor key, scoped to ingestion index only
Retrieval service → Data Viewer key, read-only on retrieval index
Admin operations → Owner key in secrets manager, not in application config
Key rotation → Quarterly via Pinecone API, automated through secrets manager
On namespace isolation: Pinecone namespaces are logical partitions within a single index, not security boundaries. A compromised or overly scoped API key with index-level access reaches all namespaces in that index. For security-sensitive multi-tenant deployments, create one index per tenant with a dedicated scoped key. This costs more but provides actual isolation.
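The one-index-per-tenant pattern needs a deterministic mapping from tenant to index and to that tenant's scoped key. The following is a minimal sketch under assumed conventions: the rag- index prefix, the PINECONE_KEY_<TENANT> variable naming, and both function names are hypothetical, and it assumes a secrets manager injects the keys into the process environment.

```python
import os
import re

# Hypothetical naming scheme: one index and one scoped key per tenant.
_TENANT_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,30}[a-z0-9])?$")


def tenant_resources(tenant_id: str) -> dict:
    """Return the index name and the env var holding that tenant's scoped key."""
    if not _TENANT_RE.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return {
        "index_name": f"rag-{tenant_id}",
        "api_key_env": f"PINECONE_KEY_{tenant_id.upper().replace('-', '_')}",
    }


def load_tenant_key(tenant_id: str, environ=os.environ) -> str:
    # Fail closed: never fall back to a shared, index-wide key.
    env_name = tenant_resources(tenant_id)["api_key_env"]
    key = environ.get(env_name)
    if not key:
        raise KeyError(f"no scoped key configured in {env_name}")
    return key


print(tenant_resources("acme-eu"))
```

Validating the tenant identifier before building resource names keeps a malformed or attacker-supplied ID from resolving to another tenant's index.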
pgvector: Row-Level Security for Multi-Tenant RAG
pgvector runs inside PostgreSQL, which means you can apply PostgreSQL's native row-level security directly to your embeddings table. RLS policies enforce tenant isolation at the database engine, not in application code, and they apply to all query types including vector similarity searches.
Basic multi-tenant RLS pattern:
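-- Step 1: Add tenant identifier to your embeddings table
-- (on a populated table, add the column as nullable, backfill it, then set NOT NULL)
ALTER TABLE document_embeddings
ADD COLUMN tenant_id UUID NOT NULL;
-- Step 2: Enable RLS on the table; FORCE applies the policy to the table owner as well
ALTER TABLE document_embeddings ENABLE ROW LEVEL SECURITY;
ALTER TABLE document_embeddings FORCE ROW LEVEL SECURITY;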
-- Step 3: Create a policy that restricts access to the current tenant
CREATE POLICY tenant_isolation ON document_embeddings
USING (tenant_id = current_setting('app.current_tenant')::uuid);
-- Step 4: Create separate roles for ingestion and retrieval
CREATE ROLE ingestion_service LOGIN PASSWORD 'strong-password';
CREATE ROLE retrieval_service LOGIN PASSWORD 'strong-password';
GRANT INSERT, UPDATE ON document_embeddings TO ingestion_service;
GRANT SELECT ON document_embeddings TO retrieval_service;
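-- Step 5: Application sets tenant context before any query
-- (with pooled connections, use SET LOCAL inside a transaction so the value does not leak between requests)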
SET app.current_tenant = '550e8400-e29b-41d4-a716-446655440000';
-- Step 6: Vector similarity searches now automatically filter to tenant rows
SELECT id, content, embedding <=> $1 AS distance
FROM document_embeddings
ORDER BY distance
LIMIT 10;
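The RLS policy filters rows before any are returned, so a similarity search only ever ranks the active tenant's rows. A retrieval_service role without the BYPASSRLS attribute cannot access rows outside the currently active tenant, regardless of how the query is constructed. Superusers and the table owner are exempt from RLS by default, so application services should connect as neither.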
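A session-level SET persists for the life of a connection, which matters when connections are pooled. The sketch below binds the tenant to a single transaction using PostgreSQL's set_config with is_local set to true. It assumes a DB-API connection (for example psycopg) opened as retrieval_service with autocommit disabled; the function name tenant_scoped_search is hypothetical.

```python
import uuid


def tenant_scoped_search(conn, tenant_id: str, query_vec: list, limit: int = 10):
    """Run a similarity search with the tenant bound to the current transaction."""
    tenant = str(uuid.UUID(tenant_id))  # reject anything that is not a UUID
    # pgvector accepts a bracketed text literal cast to the vector type.
    vec_literal = "[" + ",".join(repr(float(x)) for x in query_vec) + "]"
    cur = conn.cursor()
    try:
        # is_local=true scopes the setting to this transaction, so a pooled
        # connection cannot carry one tenant's context into another request.
        cur.execute("SELECT set_config('app.current_tenant', %s, true)", (tenant,))
        cur.execute(
            "SELECT id, content, embedding <=> %s::vector AS distance "
            "FROM document_embeddings ORDER BY distance LIMIT %s",
            (vec_literal, limit),
        )
        rows = cur.fetchall()
        conn.commit()
        return rows
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because the policy calls current_setting without a fallback, a query issued before the tenant is set raises an error instead of returning rows, which is the fail-closed behavior you want.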
Additional configuration for production pgvector deployments:
- Enable pgaudit for structured audit logging of all SELECT and DML operations on embedding tables
- Enforce TLS by setting ssl = on in postgresql.conf and replacing host entries in pg_hba.conf with hostssl
- Set log_min_duration_statement = 1000 to log slow queries and monitor for bulk extraction patterns
- Do not grant retrieval roles DELETE permissions or schema-level CREATE rights
Weaviate: RBAC and Shard-Level Multi-Tenant Isolation
Weaviate v1.29.0 introduced fine-grained RBAC at the collection level. Earlier versions provided only two roles: Admin and Read-Only. Check your version before assuming you have granular access control.
Enable RBAC and define collection-scoped roles in your Weaviate configuration:
authentication:
  apikey:
    enabled: true
    allowed_keys:
      - roles: ["ingestion-service"]
        key: "${WEAVIATE_INGESTION_KEY}"
      - roles: ["retrieval-service"]
        key: "${WEAVIATE_RETRIEVAL_KEY}"
authorization:
  rbac:
    enabled: true
    roles:
      - name: ingestion-service
        permissions:
          - data: create
            collection: "CustomerDocuments"
      - name: retrieval-service
        permissions:
          - data: read
            collection: "CustomerDocuments"
For stronger isolation in multi-tenant environments, Weaviate's multi-tenancy feature isolates each tenant's data into a separate physical shard. This goes beyond RBAC: data in one shard is not accessible to queries scoped to a different tenant at the storage layer.
Enable multi-tenancy at collection creation (it cannot be added to an existing collection):
import weaviate
import weaviate.classes as wvc
# The connection helper is an assumption; use the one that matches your deployment
client = weaviate.connect_to_local()
client.collections.create(
    "CustomerDocuments",
    multi_tenancy_config=wvc.config.Configure.multi_tenancy(enabled=True)
)
# Add a tenant
collection = client.collections.get("CustomerDocuments")
collection.tenants.create(wvc.tenants.Tenant(name="tenant_acme"))
# Scope queries to a specific tenant
tenant_collection = collection.with_tenant("tenant_acme")
results = tenant_collection.query.near_text(query="contract renewal terms", limit=5)
client.close()
Each tenant's data occupies a separate shard. Weaviate supports 50,000+ active shards per node and 1M concurrently active tenants across a cluster, according to published architecture documentation. Inactive tenants are automatically offloaded from memory while their data remains persisted. From a security standpoint, a compromised retrieval service can only access the shards assigned to its tenant.
Qdrant: JWT-Based Collection-Scoped Tokens
Qdrant's documentation is explicit: the default configuration binds to all network interfaces with no authentication. Address both before deploying to any non-local environment.
Minimal secure configuration in config.yaml:
service:
  api_key: "${QDRANT_ADMIN_KEY}"
  read_only_api_key: "${QDRANT_READ_KEY}"
  enable_tls: true
tls:
  cert: /etc/qdrant/certs/server.crt
  key: /etc/qdrant/certs/server.key
  ca_cert: /etc/qdrant/certs/ca.crt
For multi-tenant deployments, JWT-based access control (v1.9.0+) scopes tokens to specific collections with specific permissions. This is more granular than static API keys:
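import datetime
import os
import jwt  # PyJWT
# Qdrant verifies tokens against the configured service.api_key and needs
# jwt_rbac: true under service in config.yaml. The variable name is an assumption.
qdrant_jwt_secret = os.environ["QDRANT_ADMIN_KEY"]
# Create a read-only token scoped to a single tenant's collection
payload = {
    "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=24),
    "access": [
        {
            "collection": "tenant_acme_documents",
            "access": "r"
        }
    ]
}
token = jwt.encode(payload, qdrant_jwt_secret, algorithm="HS256")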
A retrieval service for tenant A receives a JWT scoped to tenant_acme_documents with read access. It cannot query tenant B's collection regardless of how the request is constructed. The token scope is enforced server-side at Qdrant.
Encryption: At Rest, In Transit, and Application Layer
Customer-Managed Encryption Keys
Standard at-rest encryption protects data from physical media theft. CMEK extends that model: your cloud provider's KMS holds the encryption keys, not the vector database vendor.
The architecture works as follows. Your KMS key wraps an Encryption Zone Key (EZK) unique to your database cluster. Every read and write operation requires a call to your KMS to unwrap the EZK before proceeding. If you disable or delete the KMS key, your cluster becomes cryptographically inaccessible within seconds. Every key access event appears in your cloud provider's audit trail: AWS CloudTrail, GCP Cloud Audit Logs, or Azure Monitor.
For compliance: HIPAA's Security Rule treats encryption of PHI as an addressable safeguard, and many covered entities' risk analyses call for exclusive control over the encryption keys. PCI-DSS v4.0 requires key management procedures that ensure only authorized parties hold decryption keys. FedRAMP requires FIPS 140-2 validated cryptographic modules and full key lifecycle management. CMEK helps satisfy all three when implemented with a compliant KMS.
Pinecone offers CMEK on Enterprise plans. Zilliz Cloud (managed Milvus) supports CMEK on dedicated Business-Critical clusters. Weaviate Cloud supports BYOK on enterprise tiers. For self-hosted deployments, apply full-disk encryption at the infrastructure layer using cloud provider-managed keys or customer-managed KMS integration.
TLS Enforcement
All vector database traffic should use TLS 1.2 at minimum, with TLS 1.3 preferred for new deployments. Without TLS, API keys and query content traverse the network in plaintext.
For pgvector, enforce TLS in pg_hba.conf:
# Allow only TLS connections from application subnets
hostssl all all 10.0.0.0/8 scram-sha-256
# Reject everything else: non-TLS connections and any host outside the application subnets
host all all 0.0.0.0/0 reject
For Qdrant, TLS is not enabled by default. The tls block in config.yaml must be explicitly configured and the verify_https_client_certificate option set according to your mTLS requirements.
For managed services (Pinecone Cloud, Weaviate Cloud, Qdrant Cloud), TLS is provided by default. Verify by examining the client connection configuration to confirm no fallback to non-TLS connections is possible.
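On the client side, one way to rule out a silent downgrade is to build the TLS context explicitly and pass it to whatever HTTP client talks to the database. This is a minimal sketch using only Python's standard ssl module; the function name strict_tls_context is hypothetical, and whether your vector database SDK accepts a custom context depends on the SDK.

```python
import ssl
from typing import Optional


def strict_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that refuses anything below TLS 1.2 and verifies the server."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # create_default_context already enables both checks; asserting them guards
    # against a later edit silently weakening verification.
    assert ctx.verify_mode == ssl.CERT_REQUIRED
    assert ctx.check_hostname is True
    return ctx


ctx = strict_tls_context()
print(ctx.minimum_version.name)
```

With the standard library, the context can be passed as urllib.request.urlopen(url, context=ctx); a connection to a plaintext or TLS 1.1 endpoint then fails instead of falling back.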
Embedding API Key Security
Embedding API keys generate the vectors that populate your knowledge base. A compromised embedding key does not just expose data: it gives an attacker the ability to generate and inject arbitrary vectors into your knowledge base, which is the entry point for knowledge base poisoning attacks.
Common exposure patterns: keys hardcoded in application configs that are checked into version control, keys in CI/CD pipeline environment variables that appear in build logs, and single keys shared across development, staging, and production environments.
Controls to implement:
- Store embedding API keys in a secrets manager (AWS Secrets Manager, HashiCorp Vault, GCP Secret Manager). Retrieve at runtime, not at build time.
- Scope embedding keys to the minimum required permissions. For most providers, a key used only for embedding generation does not need access to fine-tuning, model management, or billing APIs.
- Create separate keys for ingestion pipelines and any other services that might call the embedding API for other purposes.
- Rotate embedding API keys on the same quarterly schedule as your vector database credentials.
- Monitor embedding API usage for volume anomalies. A sudden spike in embedding generation requests from a service account that normally handles retrieval only is a signal worth investigating.
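The first and third controls can be sketched as a small runtime loader. It assumes a secrets manager injects one key per environment into the process as EMBEDDING_API_KEY_<ENV>; that variable naming, MissingSecretError, and both function names are hypothetical.

```python
import hashlib
import os


class MissingSecretError(RuntimeError):
    """Raised when the expected key was not injected at runtime."""


def load_embedding_key(env: str, environ=os.environ) -> str:
    """Fetch the embedding key for one environment at runtime, not at build time."""
    name = f"EMBEDDING_API_KEY_{env.upper()}"
    key = environ.get(name, "")
    if not key:
        # Fail closed instead of falling back to a key shared across environments.
        raise MissingSecretError(f"{name} is not set")
    return key


def key_fingerprint(key: str) -> str:
    # Safe to log: a short SHA-256 prefix identifies the key without revealing it.
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


demo_env = {"EMBEDDING_API_KEY_STAGING": "example-not-a-real-key"}
key = load_embedding_key("staging", demo_env)
print(key_fingerprint(key))
```

Logging the fingerprint rather than the key lets you correlate usage anomalies with a specific credential and confirm that a rotation actually took effect.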
Audit Logging and Compliance
What to Log
Vector database audit logs should capture four event categories:
Query events: timestamp, service account or user identifier, collection or index queried, query vector hash (not the raw vector), number of results returned, latency.
Ingestion events: document source URL or identifier, tenant assignment, schema version, ingestion service identity, timestamp.
Authentication events: successful and failed authentication attempts, API key identifier (never the key value itself), source IP, timestamp.
Administrative events: collection creation or deletion, RBAC changes, API key creation or rotation, TLS configuration changes.
For privacy-sensitive deployments, log a hash of the query text rather than the raw text itself. This preserves forensic value for pattern analysis while reducing risk from log exposure.
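A query event along these lines could be built as follows. This is a sketch, not a prescribed schema: the field names are hypothetical, and it swaps the plain hash of the query text for a keyed HMAC, on the assumption that short queries are otherwise easy to guess and confirm from a leaked log.

```python
import hashlib
import hmac
import json
import struct
from datetime import datetime, timezone


def vector_hash(vec: list) -> str:
    # Pack as little-endian float32 so the same vector always hashes identically.
    packed = struct.pack(f"<{len(vec)}f", *vec)
    return hashlib.sha256(packed).hexdigest()


def query_text_digest(text: str, log_key: bytes) -> str:
    # Keyed hash: without log_key, a digest cannot be matched to a guessed query.
    return hmac.new(log_key, text.encode("utf-8"), hashlib.sha256).hexdigest()


def query_event(principal, collection, vec, text, n_results, latency_ms, log_key):
    return json.dumps({
        "event": "query",
        "ts": datetime.now(timezone.utc).isoformat(),
        "principal": principal,
        "collection": collection,
        "vector_sha256": vector_hash(vec),
        "query_hmac": query_text_digest(text, log_key),
        "results": n_results,
        "latency_ms": latency_ms,
    }, sort_keys=True)


line = query_event("retrieval-service", "CustomerDocuments",
                   [0.12, -0.5, 0.33], "contract renewal terms", 5, 42, b"demo-log-key")
print(line)
```

Identical queries still produce identical digests, so pattern analysis across events keeps working while the raw text stays out of the log.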
EU AI Act Article 12
The EU AI Act, which entered into force in August 2024, applies to high-risk AI systems including those making consequential decisions in employment, credit, healthcare, and safety-critical infrastructure. Article 12 requires that high-risk systems are designed and developed with capabilities for automatic logging of events throughout their operational lifetime.
For RAG systems contributing to high-risk decisions, this means vector database query logs must be:
- Retained for the period specified in the system's technical documentation, often 5 to 10 years for regulated sectors
- Covering each use of the system: what was retrieved, when, by which principal, and in response to which query
- Available to competent national authorities on request
SIEM Integration for Exfiltration Detection
Vector database query logs provide the signal for detecting active exfiltration. Detection rules to configure in your SIEM:
- High result-set volume: legitimate RAG retrieval returns 3 to 10 documents per query. A service account returning 100+ results per query consistently is anomalous.
- High query frequency: bulk extraction will appear as a spike in query rate from a single service account or source IP.
- Off-hours activity: queries arriving outside normal application operating hours from a production service account warrant investigation.
- Cross-collection access: a retrieval service that normally queries one collection querying multiple collections in rapid succession is a signal for lateral movement within the vector store.
- RBAC modification events: any change to roles, API key permissions, or network access rules outside an approved change window should generate an alert.
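Four of these rules can be prototyped over a batch of query events before being translated into your SIEM's rule language. The sketch below is deliberately simplified: the thresholds are illustrative, the event fields (principal, collection, results, ts) are assumed, and it flags a single oversized result set, where a production rule would look for a sustained pattern.

```python
from collections import defaultdict
from datetime import datetime

# Illustrative thresholds; tune them against your own baseline.
MAX_RESULTS = 100
MAX_QUERIES_PER_MINUTE = 120
MAX_COLLECTIONS_PER_MINUTE = 3
BUSINESS_HOURS = range(7, 20)  # 07:00 to 19:59 in the log's timezone


def detect(events: list) -> list:
    """Return sorted (principal, rule) alerts for a batch of query events."""
    alerts = set()
    per_minute = defaultdict(list)
    for e in events:
        ts = datetime.fromisoformat(e["ts"])
        if e["results"] >= MAX_RESULTS:
            alerts.add((e["principal"], "high_result_volume"))
        if ts.hour not in BUSINESS_HOURS:
            alerts.add((e["principal"], "off_hours"))
        # Group each principal's queries into one-minute buckets.
        bucket = (e["principal"], ts.replace(second=0, microsecond=0))
        per_minute[bucket].append(e["collection"])
    for (principal, _), collections in per_minute.items():
        if len(collections) > MAX_QUERIES_PER_MINUTE:
            alerts.add((principal, "high_query_rate"))
        if len(set(collections)) >= MAX_COLLECTIONS_PER_MINUTE:
            alerts.add((principal, "cross_collection"))
    return sorted(alerts)
```

RBAC modification alerts are left out because they come from administrative events rather than query events, and they are usually a simple match on event type outside the change window.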
Network Isolation
VPC Placement and Private Endpoints
Vector databases should not be reachable from the public internet. The baseline network posture:
- Deploy self-hosted vector databases within a VPC with no public IP assignment
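- For managed services, use cloud-provider private connectivity options such as AWS PrivateLink, GCP Private Service Connect, or Azure Private Link where the vendor supports them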
- Remove or block any load balancer rules that expose vector database ports (Qdrant default: 6333, Weaviate default: 8080, Chroma default: 8000) to internet-facing subnets
Segmenting Application Tiers
Network segmentation within the application stack reduces blast radius:
- The ingestion service reaches the vector database on write-capable endpoints. It has no access to the LLM inference layer.
- The retrieval service reaches the vector database on read-only endpoints. It cannot issue write requests regardless of what the application layer instructs.
- The LLM inference layer receives retrieved context from the retrieval service. It has no direct connection to the vector database.
See also: AWS Bedrock Security: Enterprise Configuration Guide for cloud-native AI stack network segmentation patterns.
10-Point Vector Database Hardening Checklist
Apply these controls to any RAG deployment before promoting to production:
Authentication
- Qdrant: set api_key in config.yaml. Weaviate: enable the API key authentication module. pgvector: replace all trust entries in pg_hba.conf with scram-sha-256.
Authorization
Encryption
- Qdrant: configure the tls block in config.yaml. pgvector: use hostssl rules in pg_hba.conf. Verify no plaintext fallback is possible in client libraries.
Monitoring
Network
Conclusion
Your vector database deserves the same security controls you apply to your primary database. The threat model is at least as severe: the data is sensitive, the default configurations are permissive, and the attack surface includes both the database itself and the embedding API that populates it.
The specific steps differ by platform, but the principles are consistent across all four covered here. Authenticate every connection. Scope every permission to the minimum required. Encrypt at rest and in transit, with customer-managed keys where compliance demands it. Log every query. Isolate the database at the network layer before any production traffic reaches it.
For teams building or auditing RAG infrastructure, a BeyondScale AI security assessment validates your vector database configuration against these controls and identifies gaps across the full stack: ingestion pipelines, retrieval architecture, LLM guardrails, and output monitoring. You can also run a Securetom scan to identify exposed vector database endpoints and misconfigured access controls in your current deployment.
BeyondScale Team
AI Security Team, BeyondScale Technologies
Security researcher and engineer at BeyondScale Technologies, an ISO 27001 certified AI cybersecurity firm.

