OCI Dedicated Region Cloud@Customer: Architecture, Deployment Patterns, and Terraform Automation

Most cloud conversations start with a simple assumption: your workloads go to the cloud provider’s data center. For a large number of organizations, particularly in government, financial services, healthcare, and defense, that assumption is the exact problem. Data sovereignty laws, regulatory requirements, and security classification levels mean that certain workloads cannot leave a specific physical location, full stop.

OCI Dedicated Region Cloud@Customer, commonly referred to as DRCC, solves this without forcing a compromise. Oracle deploys a full OCI region — not a subset of services, not a gateway appliance, but a complete cloud region with the same hardware, software stack, APIs, and SLAs — inside your own data center. You get every OCI service you would use in a public region, with the control plane managed by Oracle and the physical infrastructure sitting on your floor.

In this post I will cover how DRCC is architected, how it differs from OCI Exadata Cloud@Customer and Roving Edge, the networking requirements, IAM federation considerations, and how to automate workload deployment using Terraform once the region is live.

What DRCC Actually Delivers

The distinction between DRCC and other on-premises cloud appliances matters technically. Most cloud-at-customer offerings give you a subset of services through a dedicated appliance: a handful of compute shapes, object storage, and maybe a managed database. DRCC is architecturally different.

Oracle physically ships and installs the same rack infrastructure used in public OCI regions into your facility. The region runs the same OCI control plane software, exposes the same REST APIs, and integrates with OCI IAM and Oracle Cloud Console using the same tooling. When you run a Terraform plan against a DRCC region, the provider configuration is identical to a public region. You change the region identifier in your config and the code works without modification.

The full service catalog available in DRCC includes Compute (including bare metal and GPU shapes), OKE (Oracle Kubernetes Engine), Autonomous Database, Exadata Database Service, Object Storage, Block Volumes, File Storage, VCN, Load Balancer, API Gateway, Functions, Streaming, OCI Vault, Identity and Access Management, Monitoring, Logging, Events, and Notifications. This is not a stripped-down subset — it is the complete stack.

The minimum hardware footprint starts at a base rack configuration sized to support production workloads. Oracle handles all hardware maintenance, software patching, and control plane operations. Your team manages what runs on top: compartments, IAM policies, networking, and workloads.

How DRCC Differs from Related Oracle Offerings

Before going further it is worth clarifying where DRCC sits relative to two commonly confused offerings.

OCI Exadata Cloud@Customer deploys Exadata Database Service hardware into your data center. It is a database-specific offering. You get Autonomous Database and Exadata Database Service on-premises, but not the broader OCI service catalog. If you need compute, containers, serverless, and object storage alongside the database layer, Exadata Cloud@Customer alone does not cover it.

OCI Roving Edge Infrastructure is a ruggedized portable device designed for disconnected or intermittently connected environments: ships, remote field operations, military forward deployments. It runs a subset of OCI services and is designed to operate without a persistent connection to the OCI control plane. DRCC requires a reliable network connection back to Oracle for control plane operations and is designed for fixed, well-connected facilities.

DRCC is the right choice when you need the full OCI service catalog, the workloads must stay on-premises for regulatory or sovereignty reasons, and you have a proper data center with the power, cooling, and network capacity to host the infrastructure.

Network Architecture Requirements

DRCC has specific network requirements that you need to understand before the hardware arrives. Getting these wrong means the region cannot operate.

The DRCC racks need connectivity on three planes: the management network, the customer data network, and the Oracle back-channel.

The management network connects Oracle’s control plane software running inside your facility to Oracle’s global control plane over the internet or a dedicated circuit. Oracle uses this path for software updates, monitoring, and operational management of the region. This connection is outbound-initiated from the DRCC hardware, encrypted with TLS, and authenticated with certificates. Oracle publishes the specific IP ranges that need to be permitted through your firewall. You do not control what flows over this channel, but Oracle’s contractual commitments define exactly what does.

The customer data network connects your existing on-premises infrastructure to the DRCC region. This is a standard 25G or 100G ethernet connection depending on the rack configuration. You configure VCN peering or FastConnect-equivalent local connections to bridge your existing network into the DRCC VCN.

Here is how you configure a VCN in DRCC using Terraform, which is identical to a public region:

terraform {
required_providers {
oci = {
source = "oracle/oci"
version = ">= 5.0.0"
}
}
}
provider "oci" {
tenancy_ocid = var.tenancy_ocid
user_ocid = var.user_ocid
fingerprint = var.fingerprint
private_key_path = var.private_key_path
# This is your DRCC region identifier
# Oracle assigns this during provisioning, format: us-yourdatacenter-1
region = var.drcc_region
}
resource "oci_core_vcn" "drcc_primary_vcn" {
compartment_id = var.compartment_id
cidr_blocks = ["10.100.0.0/16"]
display_name = "drcc-primary-vcn"
dns_label = "drccprimary"
}
# Application tier subnet - private
resource "oci_core_subnet" "app_subnet" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.drcc_primary_vcn.id
cidr_block = "10.100.1.0/24"
display_name = "app-private-subnet"
dns_label = "apppriv"
prohibit_public_ip_on_vnic = true
route_table_id = oci_core_route_table.private_rt.id
security_list_ids = [oci_core_security_list.app_sl.id]
}
# Database tier subnet - private
resource "oci_core_subnet" "db_subnet" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.drcc_primary_vcn.id
cidr_block = "10.100.2.0/24"
display_name = "db-private-subnet"
dns_label = "dbpriv"
prohibit_public_ip_on_vnic = true
route_table_id = oci_core_route_table.private_rt.id
security_list_ids = [oci_core_security_list.db_sl.id]
}
# Dynamic Routing Gateway (DRG) to connect the DRCC VCN to your on-premises network
resource "oci_core_drg" "onprem_drg" {
compartment_id = var.compartment_id
display_name = "onprem-drg"
}
resource "oci_core_drg_attachment" "onprem_drg_attachment" {
drg_id = oci_core_drg.onprem_drg.id
vcn_id = oci_core_vcn.drcc_primary_vcn.id
display_name = "drcc-vcn-attachment"
}

The Dynamic Routing Gateway attachment connects the DRCC VCN to your on-premises routed network via the physical data network. This gives your existing on-premises workloads direct, low-latency access to everything running in the DRCC region without traffic ever leaving your facility.
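The private route table referenced by the subnets above is not shown in the original snippet; a minimal sketch that sends traffic for an on-premises range through the DRG could look like this (10.50.0.0/16 is a placeholder for your corporate CIDR):

resource "oci_core_route_table" "private_rt" {
  compartment_id = var.compartment_id
  vcn_id         = oci_core_vcn.drcc_primary_vcn.id
  display_name   = "drcc-private-rt"

  route_rules {
    # Placeholder for your on-premises CIDR block
    destination       = "10.50.0.0/16"
    destination_type  = "CIDR_BLOCK"
    network_entity_id = oci_core_drg.onprem_drg.id
  }
}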

IAM Federation in a DRCC Deployment

DRCC shares the OCI IAM control plane with the public region associated with your tenancy. This has important implications for how you manage identities.

Your DRCC region is part of your existing OCI tenancy. Users, groups, and dynamic groups created in OCI IAM apply to DRCC resources the same way they apply to public region resources. If you already federate OCI IAM with your corporate identity provider (Active Directory, Okta, Azure AD), those federated identities work in DRCC without additional configuration.

Here is the IAM federation configuration for Active Directory using SAML:

# Identity Provider configuration for AD FS
resource "oci_identity_identity_provider" "ad_federation" {
compartment_id = var.tenancy_ocid
name = "corporate-adfs"
description = "Corporate Active Directory Federation Services"
product_type = "ADFS"
protocol = "SAML2"
metadata = file("${path.module}/adfs-metadata.xml")
freeform_tags = {
Environment = "production"
ManagedBy = "terraform"
}
}
# Map AD group to OCI group for DRCC operations team
resource "oci_identity_idp_group_mapping" "drcc_admins_mapping" {
idp_id = oci_identity_identity_provider.ad_federation.id
idp_group_name = "CN=DRCC-Admins,OU=CloudTeams,DC=corp,DC=example,DC=com"
group_id = oci_identity_group.drcc_admins.id
}
resource "oci_identity_group" "drcc_admins" {
compartment_id = var.tenancy_ocid
name = "drcc-platform-admins"
description = "DRCC platform administration team"
}
# Compartment structure for DRCC workload isolation
resource "oci_identity_compartment" "drcc_root" {
compartment_id = var.tenancy_ocid
name = "drcc-production"
description = "Root compartment for all DRCC production workloads"
}
resource "oci_identity_compartment" "drcc_networking" {
compartment_id = oci_identity_compartment.drcc_root.id
name = "drcc-networking"
description = "Networking resources for DRCC region"
}
resource "oci_identity_compartment" "drcc_workloads" {
compartment_id = oci_identity_compartment.drcc_root.id
name = "drcc-workloads"
description = "Application workloads running in DRCC"
}
# Least-privilege policy for DRCC admins
resource "oci_identity_policy" "drcc_admin_policy" {
compartment_id = oci_identity_compartment.drcc_root.id
name = "drcc-admin-policy"
description = "Platform admin permissions scoped to DRCC compartment"
statements = [
"Allow group drcc-platform-admins to manage all-resources in compartment drcc-production",
"Allow group drcc-platform-admins to read all-resources in tenancy where request.region = '${var.drcc_region}'",
"Allow group drcc-platform-admins to manage virtual-network-family in compartment drcc-production:drcc-networking",
"Allow group drcc-platform-admins to manage instance-family in compartment drcc-production:drcc-workloads",
"Allow group drcc-platform-admins to manage autonomous-database-family in compartment drcc-production:drcc-workloads"
]
}

One critical IAM behavior specific to DRCC: you can write IAM policies that restrict actions to your DRCC region using the request.region condition. This means a group can have full admin rights in DRCC but zero access to your public OCI regions, or vice versa. For organizations with strict separation between on-premises and cloud teams, this is an important control.
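For illustration, the inverse policy for a team that should manage only public-region workloads could look like the sketch below; the group and compartment names here are hypothetical:

resource "oci_identity_policy" "public_cloud_admin_policy" {
  compartment_id = var.tenancy_ocid
  name           = "public-cloud-admin-policy"
  description    = "Public cloud admins, explicitly excluded from the DRCC region"
  statements = [
    "Allow group cloud-platform-admins to manage all-resources in compartment public-workloads where request.region != '${var.drcc_region}'"
  ]
}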

Deploying OKE on DRCC

OKE on DRCC runs the same as OKE in a public region. The control plane components run inside the DRCC rack. The API server endpoint is reachable from within your data center network without any traffic leaving the facility.

resource "oci_containerengine_cluster" "drcc_cluster" {
compartment_id = oci_identity_compartment.drcc_workloads.id
kubernetes_version = "v1.29.1"
name = "drcc-production-cluster"
vcn_id = oci_core_vcn.drcc_primary_vcn.id
endpoint_config {
is_public_ip_enabled = false
subnet_id = oci_core_subnet.app_subnet.id
}
options {
service_lb_subnet_ids = [oci_core_subnet.app_subnet.id]
kubernetes_network_config {
pods_cidr = "10.244.0.0/16"
services_cidr = "10.96.0.0/16"
}
add_ons {
is_kubernetes_dashboard_enabled = false
is_tiller_enabled = false
}
}
}
resource "oci_containerengine_node_pool" "drcc_workers" {
cluster_id = oci_containerengine_cluster.drcc_cluster.id
compartment_id = oci_identity_compartment.drcc_workloads.id
kubernetes_version = "v1.29.1"
name = "drcc-worker-pool"
node_config_details {
size = 3
placement_configs {
availability_domain = data.oci_identity_availability_domains.drcc_ads.availability_domains[0].name
subnet_id = oci_core_subnet.app_subnet.id
}
}
node_shape = "VM.Standard3.Flex"
node_shape_config {
memory_in_gbs = 64
ocpus = 8
}
node_source_details {
image_id = data.oci_core_images.ol8_image.images[0].id
source_type = "IMAGE"
boot_volume_size_in_gbs = 100
}
initial_node_labels {
key = "workload-tier"
value = "application"
}
}

Setting is_public_ip_enabled = false on the endpoint config is non-negotiable in a DRCC context: the API server should only be reachable from within your data center network. Any tooling that manages the cluster (Argo CD, Flux, CI pipelines) connects to the internal endpoint directly.
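For example, generating a kubeconfig that targets the private endpoint with the OCI CLI looks like this (the cluster OCID and region are placeholders):

oci ce cluster create-kubeconfig \
  --cluster-id <drcc-cluster-ocid> \
  --file $HOME/.kube/config \
  --region <your-drcc-region> \
  --kube-endpoint PRIVATE_ENDPOINT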

Deploying Autonomous Database on DRCC

Autonomous Database on DRCC is identical in API and behavior to the public region version. The database runs entirely within your facility.

resource "oci_database_autonomous_database" "drcc_adb" {
compartment_id = oci_identity_compartment.drcc_workloads.id
db_name = "DRCCPROD"
display_name = "drcc-production-adb"
db_workload = "OLTP"
cpu_core_count = 4
data_storage_size_in_tbs = 2
admin_password = var.adb_admin_password
is_auto_scaling_enabled = true
is_dedicated = false
# Private endpoint configuration - no public access
subnet_id = oci_core_subnet.db_subnet.id
private_endpoint_label = "drccprodadb"
is_access_control_enabled = true
whitelisted_ips = [
# CIDR of the application subnet; whitelisted_ips accepts IP addresses and CIDR blocks, not subnet OCIDs
oci_core_subnet.app_subnet.cidr_block
]
defined_tags = {
"Operations.Environment" = "production"
"Operations.Region" = "drcc"
"Operations.ManagedBy" = "terraform"
}
}

The subnet_id and private_endpoint_label fields configure the database with a private endpoint inside the db subnet. Only clients in the whitelisted CIDR ranges (here, the application subnet) can connect. No public endpoint is created.
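To connect from the application tier you typically need the instance wallet. A minimal sketch using the wallet data source, where the wallet password variable and output path are placeholders:

data "oci_database_autonomous_database_wallet" "drcc_adb_wallet" {
  autonomous_database_id = oci_database_autonomous_database.drcc_adb.id
  password               = var.adb_wallet_password
  base64_encode_content  = true
  generate_type          = "SINGLE"
}

resource "local_file" "adb_wallet" {
  # Writes the wallet zip locally for distribution to the app tier
  content_base64 = data.oci_database_autonomous_database_wallet.drcc_adb_wallet.content
  filename       = "${path.module}/wallet/drcc_adb_wallet.zip"
}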

Security Baseline for DRCC Deployments

DRCC gives you physical control over the hardware, but that does not mean you can skip the standard OCI security baseline. The software layer still requires proper configuration.

Enable Cloud Guard at the tenancy level scoped to your DRCC compartments:

resource "oci_cloud_guard_cloud_guard_configuration" "drcc_cloud_guard" {
compartment_id = var.tenancy_ocid
reporting_region = var.drcc_region
status = "ENABLED"
}
resource "oci_cloud_guard_target" "drcc_target" {
compartment_id = oci_identity_compartment.drcc_root.id
display_name = "drcc-production-target"
target_resource_id = oci_identity_compartment.drcc_root.id
target_resource_type = "COMPARTMENT"
target_detector_recipes {
detector_recipe_id = data.oci_cloud_guard_detector_recipes.config_recipe.detector_recipe_collection[0].items[0].id
}
target_responder_recipes {
responder_recipe_id = data.oci_cloud_guard_responder_recipes.oci_responder.responder_recipe_collection[0].items[0].id
}
}

Enable Vault for all secrets, keys, and credentials used by workloads running in DRCC. Because the Vault service runs inside the rack, key material never leaves your facility:

resource "oci_kms_vault" "drcc_vault" {
compartment_id = oci_identity_compartment.drcc_workloads.id
display_name = "drcc-workloads-vault"
vault_type = "VIRTUAL_PRIVATE"
}
resource "oci_kms_key" "drcc_master_key" {
compartment_id = oci_identity_compartment.drcc_workloads.id
display_name = "drcc-master-encryption-key"
management_endpoint = oci_kms_vault.drcc_vault.management_endpoint
key_shape {
algorithm = "AES"
length = 32
}
protection_mode = "HSM"
}

The VIRTUAL_PRIVATE vault type and HSM protection mode ensure the key material is stored in the hardware security module inside the DRCC rack. Combined with the fact that the rack is physically in your data center, you have full chain-of-custody over the cryptographic material protecting your data.

Operational Considerations

A few operational considerations are specific to DRCC and do not come up when working with public regions.

Oracle is responsible for hardware maintenance and software patching of the control plane. You receive advance notification of maintenance windows. During a control plane maintenance window, the management APIs may be briefly unavailable, but running workloads continue without interruption. Plan your deployment pipelines to account for these windows.

Capacity planning is different from the public cloud. In a public region, you scale up by requesting more resources and the cloud absorbs the demand. In DRCC, you have a fixed hardware footprint. If you need to scale beyond the initial rack configuration, you work with Oracle to add capacity. Build capacity planning reviews into your quarterly operations cycle and monitor resource utilization with OCI Monitoring the same way you would in a public region.
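One way to keep utilization visible is an OCI Monitoring alarm on compute CPU. A minimal sketch, assuming a notification topic OCID you define elsewhere:

resource "oci_monitoring_alarm" "drcc_cpu_pressure" {
  compartment_id        = oci_identity_compartment.drcc_workloads.id
  metric_compartment_id = oci_identity_compartment.drcc_workloads.id
  display_name          = "drcc-compute-cpu-pressure"
  namespace             = "oci_computeagent"
  query                 = "CpuUtilization[5m].mean() > 85"
  severity              = "WARNING"
  destinations          = [var.ops_notification_topic_ocid]
  is_enabled            = true
}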

The Oracle back-channel for management operations needs to be permanently open. If your network team applies a firewall rule that blocks this traffic, the control plane loses contact with Oracle and becomes degraded. Work with Oracle to get the exact IP ranges and port requirements before go-live and document them clearly in your firewall change management process.

When DRCC Is the Right Choice

DRCC makes sense when at least one of these conditions is true: your regulatory framework requires data residency within a specific physical location you control, your security classification means workloads cannot traverse public internet infrastructure at any point, your latency requirements for database and application tiers demand co-location in your own facility, or you have existing on-premises infrastructure that needs tight integration with cloud services without egress cost or latency overhead.

It is not the right choice for organizations that want cloud economics without data center investment, for workloads with highly variable capacity requirements that would benefit from elastic public cloud scaling, or for teams that want to avoid the operational overhead of maintaining physical infrastructure.

For those who do meet the criteria, DRCC is one of the more complete sovereign cloud offerings on the market. The fact that the APIs and tooling are identical to the public cloud means your engineers do not need to learn a second system, your Terraform code travels unchanged, and your OKE workloads run without modification.

Regards,

Osama

Cross-Cloud Secret Synchronization: AWS Secrets Manager and OCI Vault in a Production Multi-Cloud Setup

One of the most overlooked problems in multi-cloud environments is secrets management across providers. Teams usually solve it badly: they store the same secret in both clouds manually, forget to rotate one of them, and find out during an outage that the credentials have been out of sync for three months.

In this post I will walk through building an automated secrets synchronization pipeline between AWS Secrets Manager and OCI Vault. When a secret rotates in AWS, the pipeline detects the rotation event, retrieves the new value, and pushes it into OCI Vault automatically. Everything is built with Terraform, an AWS Lambda function, and OCI IAM. No manual steps after the initial deployment.

This is a pattern I have used in environments where the database layer runs on OCI (leveraging Oracle Database pricing and performance) while the application layer runs on AWS. Both sides need the same database credentials, and both sides need to stay in sync without human intervention.

Architecture

The flow works like this:

  1. A Secrets Manager rotation event fires and is matched by an EventBridge rule, which triggers a Lambda function.
  2. The Lambda retrieves the new secret value from AWS Secrets Manager.
  3. The Lambda authenticates to OCI using an API key stored in its own secrets store (not hardcoded) and calls the OCI Vault API to create a new version of the corresponding secret.
  4. OCI Vault stores the new value and makes it available to workloads running in OCI.

Prerequisites

Before starting you need:

  • AWS account with permissions to manage Secrets Manager, Lambda, EventBridge, and IAM
  • OCI tenancy with permissions to manage Vault, Keys, and IAM policies
  • Terraform 1.5 or later
  • Python 3.11 for the Lambda function
  • An existing OCI Vault and master encryption key (we create them in Step 1 below)

Step 1: OCI Vault and IAM Setup

Start with OCI. We need a Vault, a master key, and an IAM user whose API key the Lambda will use to authenticate.

hcl

# OCI Vault
resource "oci_kms_vault" "app_vault" {
compartment_id = var.compartment_id
display_name = "multi-cloud-secrets-vault"
vault_type = "DEFAULT"
}
# Master Encryption Key inside the Vault
resource "oci_kms_key" "secrets_key" {
compartment_id = var.compartment_id
display_name = "secrets-master-key"
management_endpoint = oci_kms_vault.app_vault.management_endpoint
key_shape {
algorithm = "AES"
length = 32
}
}
# IAM user for cross-cloud access
resource "oci_identity_user" "sync_user" {
compartment_id = var.tenancy_ocid
name = "aws-secrets-sync-user"
description = "Service user for AWS Lambda to push secrets into OCI Vault"
email = "sync-user@internal.example.com"
}
# API key for the sync user (you will generate the actual key pair separately)
resource "oci_identity_api_key" "sync_user_key" {
user_id = oci_identity_user.sync_user.id
key_value = var.oci_sync_user_public_key_pem
}
# IAM group for the sync user
resource "oci_identity_group" "sync_group" {
compartment_id = var.tenancy_ocid
name = "secrets-sync-group"
description = "Group for cross-cloud secrets sync service users"
}
resource "oci_identity_user_group_membership" "sync_membership" {
group_id = oci_identity_group.sync_group.id
user_id = oci_identity_user.sync_user.id
}
# Minimal IAM policy - only what is needed, nothing more
resource "oci_identity_policy" "sync_policy" {
compartment_id = var.compartment_id
name = "secrets-sync-policy"
description = "Allows sync user to manage secrets in the app vault only"
statements = [
"Allow group secrets-sync-group to manage secret-family in compartment id ${var.compartment_id} where target.vault.id = '${oci_kms_vault.app_vault.id}'",
"Allow group secrets-sync-group to use keys in compartment id ${var.compartment_id} where target.key.id = '${oci_kms_key.secrets_key.id}'"
]
}

The policy scope is intentionally narrow. The sync user can only manage secrets inside this specific vault and can only use this specific key. If the AWS Lambda credentials are ever compromised, the blast radius is limited to this vault.

Step 2: Create the Initial Secret in OCI Vault

We need a secret placeholder in OCI Vault that the Lambda will update. The initial value does not matter since it will be overwritten on the first sync.

hcl

resource "oci_vault_secret" "db_password" {
compartment_id = var.compartment_id
vault_id = oci_kms_vault.app_vault.id
key_id = oci_kms_key.secrets_key.id
secret_name = "prod-db-password"
secret_content {
content_type = "BASE64"
content = base64encode("initial-placeholder-value")
name = "v1"
stage = "CURRENT"
}
metadata = {
source = "aws-secrets-manager"
aws_secret = "prod/database/password"
environment = "production"
}
}

Step 3: AWS Secrets Manager and the Source Secret

On the AWS side, create the authoritative secret and enable automatic rotation.

hcl

resource "aws_secretsmanager_secret" "db_password" {
name = "prod/database/password"
description = "Production database password - synced to OCI Vault"
recovery_window_in_days = 7
tags = {
Environment = "production"
SyncTarget = "oci-vault"
OciSecretName = "prod-db-password"
}
}
resource "aws_secretsmanager_secret_version" "db_password_v1" {
secret_id = aws_secretsmanager_secret.db_password.id
secret_string = jsonencode({
username = "db_admin",
password = var.initial_db_password,
host = var.db_host,
port = 1521,
database = "PRODDB"
})
}
# Rotation configuration - rotate every 30 days
resource "aws_secretsmanager_secret_rotation" "db_password_rotation" {
secret_id = aws_secretsmanager_secret.db_password.id
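# db_rotation_lambda is your existing Secrets Manager rotation function for this secret (not shown in this post)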
rotation_lambda_arn = aws_lambda_function.db_rotation_lambda.arn
rotation_rules {
automatically_after_days = 30
}
}

Step 4: Store OCI Credentials in AWS Secrets Manager

The Lambda needs OCI API credentials to authenticate. Store them as a secret in AWS Secrets Manager so they never appear in Lambda environment variables in plaintext.

hcl

resource "aws_secretsmanager_secret" "oci_credentials" {
name = "internal/oci-sync-credentials"
description = "OCI API key credentials for secrets sync Lambda"
tags = {
Environment = "production"
Purpose = "cross-cloud-sync"
}
}
resource "aws_secretsmanager_secret_version" "oci_credentials_v1" {
secret_id = aws_secretsmanager_secret.oci_credentials.id
secret_string = jsonencode({
tenancy_ocid = var.oci_tenancy_ocid,
user_ocid = var.oci_sync_user_ocid,
fingerprint = var.oci_api_key_fingerprint,
private_key = var.oci_private_key_pem,
region = var.oci_region
})
}

Step 5: The Lambda Function

This is the core of the pipeline. The Lambda retrieves the rotated secret from AWS Secrets Manager, loads OCI credentials from its own secrets store, and calls the OCI Vault API to create a new secret version.

python

import boto3
import json
import base64
import oci
import logging
import os
from datetime import datetime, timezone
from botocore.exceptions import ClientError

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def get_oci_config():
    """Retrieve OCI credentials from AWS Secrets Manager."""
    client = boto3.client("secretsmanager", region_name=os.environ["AWS_REGION"])
    try:
        response = client.get_secret_value(
            SecretId=os.environ["OCI_CREDENTIALS_SECRET_ARN"]
        )
        creds = json.loads(response["SecretString"])
        return {
            "tenancy": creds["tenancy_ocid"],
            "user": creds["user_ocid"],
            "fingerprint": creds["fingerprint"],
            "key_content": creds["private_key"],
            "region": creds["region"]
        }
    except ClientError as e:
        logger.error(f"Failed to retrieve OCI credentials: {e}")
        raise


def get_aws_secret(secret_arn: str) -> str:
    """Retrieve the current value of an AWS secret."""
    client = boto3.client("secretsmanager", region_name=os.environ["AWS_REGION"])
    try:
        response = client.get_secret_value(SecretId=secret_arn)
        return response.get("SecretString") or base64.b64decode(
            response["SecretBinary"]
        ).decode("utf-8")
    except ClientError as e:
        logger.error(f"Failed to retrieve AWS secret {secret_arn}: {e}")
        raise


def push_to_oci_vault(
    oci_config: dict,
    vault_id: str,
    key_id: str,
    secret_ocid: str,
    secret_value: str
):
    """Create a new version of an OCI Vault secret."""
    vaults_client = oci.vault.VaultsClient(oci_config)
    encoded_value = base64.b64encode(secret_value.encode("utf-8")).decode("utf-8")
    update_details = oci.vault.models.UpdateSecretDetails(
        secret_content=oci.vault.models.Base64SecretContentDetails(
            content_type=oci.vault.models.SecretContentDetails.CONTENT_TYPE_BASE64,
            content=encoded_value,
            name=f"sync-{datetime.now(timezone.utc).strftime('%Y%m%d%H%M%S')}",
            stage="CURRENT"
        ),
        metadata={
            "synced_from": "aws-secrets-manager",
            "synced_at": datetime.now(timezone.utc).isoformat()
        }
    )
    response = vaults_client.update_secret(
        secret_id=secret_ocid,
        update_secret_details=update_details
    )
    logger.info(
        f"OCI secret updated. OCID: {secret_ocid}, "
        f"New version: {response.data.current_version_number}"
    )
    return response.data


def handler(event, context):
    """
    EventBridge trigger handler.
    Expects event detail to contain:
      - aws_secret_arn: ARN of the rotated AWS secret
      - oci_secret_ocid: OCID of the target OCI Vault secret
      - oci_vault_id: OCID of the target OCI Vault
      - oci_key_id: OCID of the OCI KMS key
    """
    logger.info(f"Received event: {json.dumps(event)}")
    detail = event.get("detail", {})
    aws_secret_arn = detail.get("aws_secret_arn")
    oci_secret_ocid = detail.get("oci_secret_ocid")
    oci_vault_id = detail.get("oci_vault_id")
    oci_key_id = detail.get("oci_key_id")
    if not all([aws_secret_arn, oci_secret_ocid, oci_vault_id, oci_key_id]):
        logger.error("Missing required fields in event detail")
        raise ValueError(
            "Event detail must include aws_secret_arn, oci_secret_ocid, "
            "oci_vault_id, oci_key_id"
        )
    logger.info(f"Syncing secret: {aws_secret_arn} to OCI: {oci_secret_ocid}")
    # Step 1: Get OCI credentials
    oci_config = get_oci_config()
    # Step 2: Retrieve the rotated AWS secret
    secret_value = get_aws_secret(aws_secret_arn)
    # Step 3: Push to OCI Vault
    result = push_to_oci_vault(
        oci_config=oci_config,
        vault_id=oci_vault_id,
        key_id=oci_key_id,
        secret_ocid=oci_secret_ocid,
        secret_value=secret_value
    )
    return {
        "statusCode": 200,
        "body": {
            "message": "Secret synced successfully",
            "oci_secret_ocid": oci_secret_ocid,
            "oci_version": result.current_version_number
        }
    }

Step 6: Lambda IAM Role and Deployment

hcl

data "aws_iam_policy_document" "lambda_assume_role" {
statement {
effect = "Allow"
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["lambda.amazonaws.com"]
}
}
}
data "aws_iam_policy_document" "lambda_permissions" {
statement {
effect = "Allow"
actions = [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
]
resources = [
aws_secretsmanager_secret.db_password.arn,
aws_secretsmanager_secret.oci_credentials.arn
]
}
statement {
effect = "Allow"
actions = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
resources = ["arn:aws:logs:*:*:*"]
}
}
resource "aws_iam_role" "sync_lambda_role" {
name = "secrets-sync-lambda-role"
assume_role_policy = data.aws_iam_policy_document.lambda_assume_role.json
}
resource "aws_iam_role_policy" "sync_lambda_policy" {
name = "secrets-sync-lambda-policy"
role = aws_iam_role.sync_lambda_role.id
policy = data.aws_iam_policy_document.lambda_permissions.json
}
resource "aws_lambda_function" "secrets_sync" {
filename = "${path.module}/lambda/secrets_sync.zip"
function_name = "oci-secrets-sync"
role = aws_iam_role.sync_lambda_role.arn
handler = "main.handler"
runtime = "python3.11"
timeout = 60
memory_size = 256
source_code_hash = filebase64sha256("${path.module}/lambda/secrets_sync.zip")
environment {
variables = {
OCI_CREDENTIALS_SECRET_ARN = aws_secretsmanager_secret.oci_credentials.arn
# AWS_REGION is reserved and set automatically by the Lambda runtime, so it is not declared here
}
}
layers = [aws_lambda_layer_version.oci_sdk_layer.arn]
}

Bundle the OCI Python SDK as a Lambda Layer so the function does not need to package it inline:

bash

mkdir -p lambda_layer/python
pip install oci --target lambda_layer/python
cd lambda_layer && zip -r ../oci_sdk_layer.zip python/

hcl

resource "aws_lambda_layer_version" "oci_sdk_layer" {
filename = "${path.module}/oci_sdk_layer.zip"
layer_name = "oci-python-sdk"
compatible_runtimes = ["python3.11"]
source_code_hash = filebase64sha256("${path.module}/oci_sdk_layer.zip")
}

Step 7: EventBridge Rule to Trigger on Rotation

hcl

resource "aws_cloudwatch_event_rule" "secret_rotation_rule" {
name = "detect-secret-rotation"
description = "Fires when a Secrets Manager secret rotation completes"
event_pattern = jsonencode({
source = ["aws.secretsmanager"],
detail-type = ["AWS API Call via CloudTrail"],
detail = {
eventSource = ["secretsmanager.amazonaws.com"],
eventName = ["RotateSecret", "PutSecretValue"]
}
})
}
resource "aws_cloudwatch_event_target" "sync_lambda_target" {
rule = aws_cloudwatch_event_rule.secret_rotation_rule.name
target_id = "SyncToOCI"
arn = aws_lambda_function.secrets_sync.arn
input_transformer {
input_paths = {
secret_arn = "$.detail.requestParameters.secretId"
}
input_template = <<EOF
{
"detail": {
"aws_secret_arn": "<secret_arn>",
"oci_secret_ocid": "${var.oci_db_password_secret_ocid}",
"oci_vault_id": "${oci_kms_vault.app_vault.id}",
"oci_key_id": "${oci_kms_key.secrets_key.id}"
}
}
EOF
}
}
resource "aws_lambda_permission" "allow_eventbridge" {
statement_id = "AllowEventBridgeInvoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.secrets_sync.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.secret_rotation_rule.arn
}

Step 8: Verifying the Pipeline

Manually trigger a rotation to test the full pipeline without waiting 30 days:

bash

# Force a rotation in AWS
aws secretsmanager rotate-secret \
--secret-id prod/database/password \
--region us-east-1
# Check Lambda execution logs
aws logs tail /aws/lambda/oci-secrets-sync --follow
# Verify the new version appeared in OCI Vault
oci vault secret get \
--secret-id <your-oci-secret-ocid> \
--query 'data.{name:"secret-name", version:"current-version-number", created:"time-created"}' \
--output table

A successful sync produces output similar to this in the Lambda logs:

INFO: Syncing secret: arn:aws:secretsmanager:us-east-1:123456789:secret:prod/database/password to OCI: ocid1.vaultsecret.oc1...
INFO: OCI secret updated. OCID: ocid1.vaultsecret.oc1..., New version: 3

Handling Failures and Drift

The pipeline as built is synchronous and event-driven, which means if the Lambda fails, the OCI secret does not get updated. Add a dead-letter queue and a reconciliation function that runs on a schedule to catch any drift.

hcl

resource "aws_sqs_queue" "sync_dlq" {
name = "secrets-sync-dlq"
message_retention_seconds = 86400
}
resource "aws_lambda_function_event_invoke_config" "sync_retry" {
function_name = aws_lambda_function.secrets_sync.function_name
maximum_retry_attempts = 2
maximum_event_age_in_seconds = 300
destination_config {
on_failure {
destination = aws_sqs_queue.sync_dlq.arn
}
}
}

For reconciliation, a scheduled Lambda that runs every hour compares the LastRotatedDate on the AWS secret against the synced_at metadata tag on the OCI secret. If they differ by more than five minutes, it triggers a forced sync.
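A minimal sketch of that check, reusing the OCI config dict the sync Lambda already builds (needs_resync is a hypothetical helper name):

python

import boto3
import oci
from datetime import datetime, timedelta


def needs_resync(aws_secret_arn: str, oci_secret_ocid: str, oci_config: dict) -> bool:
    """Return True if the OCI copy looks older than the last AWS rotation."""
    sm = boto3.client("secretsmanager")
    last_rotated = sm.describe_secret(SecretId=aws_secret_arn).get("LastRotatedDate")

    vaults = oci.vault.VaultsClient(oci_config)
    metadata = vaults.get_secret(oci_secret_ocid).data.metadata or {}
    synced_at = metadata.get("synced_at")

    if last_rotated is None or synced_at is None:
        # Never rotated or never synced; force a sync to be safe
        return True

    drift = last_rotated - datetime.fromisoformat(synced_at)
    return drift > timedelta(minutes=5)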

Security Considerations

A few things to keep in mind when running this in production.

The OCI private key stored in AWS Secrets Manager should be rotated periodically, just like any other credential. Add it to your rotation schedule.

Enable CloudTrail in AWS and OCI Audit logging so every access to both secrets stores is recorded. If something is off with the sync, the audit logs tell you exactly which principal made the change and when.

Use VPC endpoints for Secrets Manager in AWS so the Lambda traffic never crosses the public internet when retrieving credentials.
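If the Lambda is attached to a VPC, a minimal interface endpoint sketch looks like this; lambda_vpc_id, lambda_subnet_ids, and lambda_security_group_id are placeholders for your own network values:

hcl

resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = var.lambda_vpc_id
  service_name        = "com.amazonaws.${var.aws_region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.lambda_subnet_ids
  security_group_ids  = [var.lambda_security_group_id]
  private_dns_enabled = true
}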

On the OCI side, enable Vault audit logging to the OCI Logging service so every secret version write is captured.

Wrapping Up

This pipeline solves a real operational problem without requiring a third-party secrets broker. AWS Secrets Manager stays the authoritative source. OCI Vault stays current automatically. The only manual step is the initial deployment.

The pattern extends to other cross-cloud credential types. Database connection strings, API tokens, TLS certificates — any secret that needs to exist on both clouds can follow the same EventBridge to Lambda to OCI Vault flow. Extend the Lambda to support a mapping table of AWS secret ARNs to OCI secret OCIDs and one function handles your entire secrets estate across both providers.

Regards,
Osama

Building a Production Serverless API with OCI API Gateway and OCI Functions

I have seen many teams deploy OCI Functions and call it done. The function works, the test passes, and then they realize there is no authentication, no rate limiting, no proper routing, and no way to version the API. The function URL is just floating there, exposed.

OCI API Gateway is what turns a collection of serverless functions into an actual production API. In this post I will walk through building a complete serverless API stack from scratch using OCI API Gateway, OCI Functions, and Terraform. Everything here is production-oriented: proper IAM, CORS, authentication via JWT, rate limiting, and a deployment pipeline.

Architecture Overview

The stack we are building looks like this:

Client Request → OCI API Gateway → Route Policy (auth + rate limit) → OCI Function → Response

The API Gateway sits in a public subnet and handles all the cross-cutting concerns: TLS termination, JWT validation, CORS headers, and usage plans. The Functions sit in a private subnet with no public exposure. The Gateway invokes them over OCI’s internal network.

Prerequisites

You need the following before starting:

  • OCI CLI configured with a valid profile
  • Terraform 1.5 or later
  • Docker installed locally (for building function images)
  • An OCI tenancy with permissions to manage API Gateway, Functions, IAM, and Networking

Setting Up the Network

Functions should never run in a public subnet. We need a VCN with a public subnet for the Gateway and a private subnet for Functions.

resource "oci_core_vcn" "api_vcn" {
compartment_id = var.compartment_id
cidr_block = "10.0.0.0/16"
display_name = "api-gateway-vcn"
dns_label = "apivcn"
}
resource "oci_core_subnet" "public_subnet" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.api_vcn.id
cidr_block = "10.0.1.0/24"
display_name = "gateway-public-subnet"
dns_label = "gatewaypub"
route_table_id = oci_core_route_table.public_rt.id
security_list_ids = [oci_core_security_list.gateway_sl.id]
}
resource "oci_core_subnet" "private_subnet" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.api_vcn.id
cidr_block = "10.0.2.0/24"
display_name = "functions-private-subnet"
dns_label = "funcpriv"
prohibit_public_ip_on_vnic = true
route_table_id = oci_core_route_table.private_rt.id
security_list_ids = [oci_core_security_list.functions_sl.id]
}

The private subnet has prohibit_public_ip_on_vnic = true. This is not optional — it ensures no Function instance can accidentally get a public IP assigned.

For the private subnet to reach OCI services (like Container Registry to pull images), add a Service Gateway:

resource "oci_core_service_gateway" "sgw" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.api_vcn.id
display_name = "functions-service-gateway"
services {
service_id = data.oci_core_services.all_services.services[0].id
}
}
resource "oci_core_route_table" "private_rt" {
compartment_id = var.compartment_id
vcn_id = oci_core_vcn.api_vcn.id
display_name = "private-route-table"
route_rules {
network_entity_id = oci_core_service_gateway.sgw.id
destination = "all-iad-services-in-oracle-services-network"
destination_type = "SERVICE_CIDR_BLOCK"
}
}

IAM: Dynamic Groups and Policies

OCI Functions need explicit permission to be invoked by API Gateway. This is done through Dynamic Groups and IAM policies, not hardcoded credentials.

resource "oci_identity_dynamic_group" "api_gateway_dg" {
compartment_id = var.tenancy_ocid
name = "api-gateway-dynamic-group"
description = "Allows API Gateway to invoke Functions"
matching_rule = "ALL {resource.type = 'ApiGateway', resource.compartment.id = '${var.compartment_id}'}"
}
resource "oci_identity_policy" "gateway_invoke_policy" {
compartment_id = var.compartment_id
name = "gateway-invoke-functions-policy"
description = "Grants API Gateway permission to invoke Functions"
statements = [
"Allow dynamic-group api-gateway-dynamic-group to use functions-family in compartment id ${var.compartment_id}"
]
}

Without this policy, the Gateway will return a 500 when it tries to invoke your function, and the error message is not always obvious about the cause.

Building and Pushing the Function

We will write a simple order validation function in Python. Create this directory structure:

order-validator/
├── func.py
├── func.yaml
└── requirements.txt

func.yaml:

schema_version: 20180708
name: order-validator
version: 0.0.1
runtime: python3.11
build_image: fnproject/python:3.11-dev
run_image: fnproject/python:3.11
entrypoint: /python/bin/fdk /function/func.py handler
memory: 256

requirements.txt:

fdk>=0.1.57

func.py:

import io
import json
import logging

from fdk import response


def handler(ctx, data: io.BytesIO = None):
    logger = logging.getLogger()
    try:
        body = json.loads(data.getvalue())
    except Exception as ex:
        logger.error("Failed to parse request body: " + str(ex))
        return response.Response(
            ctx,
            status_code=400,
            response_data=json.dumps({"error": "Invalid JSON in request body"}),
            headers={"Content-Type": "application/json"}
        )
    required_fields = ["order_id", "customer_id", "items", "total_amount"]
    missing = [f for f in required_fields if f not in body]
    if missing:
        return response.Response(
            ctx,
            status_code=422,
            response_data=json.dumps({
                "error": "Missing required fields",
                "fields": missing
            }),
            headers={"Content-Type": "application/json"}
        )
    if not isinstance(body.get("items"), list) or len(body["items"]) == 0:
        return response.Response(
            ctx,
            status_code=422,
            response_data=json.dumps({"error": "Order must contain at least one item"}),
            headers={"Content-Type": "application/json"}
        )
    if body["total_amount"] <= 0:
        return response.Response(
            ctx,
            status_code=422,
            response_data=json.dumps({"error": "total_amount must be greater than zero"}),
            headers={"Content-Type": "application/json"}
        )
    return response.Response(
        ctx,
        status_code=200,
        response_data=json.dumps({
            "status": "valid",
            "order_id": body["order_id"],
            "item_count": len(body["items"]),
            "validated_at": ctx.RequestID()
        }),
        headers={"Content-Type": "application/json"}
    )

Build and push the function image to OCI Container Registry:

# Log in to OCI Container Registry
docker login <region-key>.ocir.io -u '<tenancy-namespace>/<username>'
# Build the function image
fn build --verbose
# Tag and push
docker tag order-validator:0.0.1 <region-key>.ocir.io/<tenancy-namespace>/functions/order-validator:0.0.1
docker push <region-key>.ocir.io/<tenancy-namespace>/functions/order-validator:0.0.1

Deploying the Function with Terraform

resource "oci_functions_application" "orders_app" {
compartment_id = var.compartment_id
display_name = "orders-api"
subnet_ids = [oci_core_subnet.private_subnet.id]
config = {
LOG_LEVEL = "INFO"
ENV = "production"
}
trace_config {
is_enabled = true
domain_id = oci_apm_apm_domain.tracing.id
}
}
resource "oci_functions_function" "order_validator" {
application_id = oci_functions_application.orders_app.id
display_name = "order-validator"
image = "<region-key>.ocir.io/<tenancy-namespace>/functions/order-validator:0.0.1"
memory_in_mbs = 256
timeout_in_seconds = 30
provisioned_concurrency_config {
strategy = "CONSTANT"
count = 2
}
}

The provisioned_concurrency_config block with count = 2 keeps two warm instances running at all times, which eliminates cold starts for up to two concurrent invocations. That consistency matters for an API with latency targets; size the count to your expected steady-state concurrency.

Creating the API Gateway and Deployment

This is where everything comes together. The Gateway deployment defines your routes, authentication, and rate limiting in a single resource:

resource "oci_apigateway_gateway" "orders_gateway" {
compartment_id = var.compartment_id
display_name = "orders-api-gateway"
endpoint_type = "PUBLIC"
subnet_id = oci_core_subnet.public_subnet.id
certificate_id = var.tls_certificate_ocid
}
resource "oci_apigateway_deployment" "orders_deployment" {
compartment_id = var.compartment_id
display_name = "orders-api-v1"
gateway_id = oci_apigateway_gateway.orders_gateway.id
path_prefix = "/v1"
specification {
request_policies {
authentication {
type = "JWT_AUTHENTICATION"
token_header = "Authorization"
token_auth_scheme = "Bearer"
is_anonymous_access_allowed = false
public_keys {
type = "REMOTE_JWKS"
uri = "https://your-identity-provider.com/.well-known/jwks.json"
max_cache_duration_in_hours = 1
}
verify_claims {
key = "iss"
values = ["https://your-identity-provider.com"]
is_required = true
}
verify_claims {
key = "aud"
values = ["orders-api"]
is_required = true
}
}
rate_limiting {
rate_in_requests_per_second = 100
rate_key = "CLIENT_IP"
}
cors {
allowed_origins = ["https://yourdomain.com"]
allowed_methods = ["GET", "POST", "OPTIONS"]
allowed_headers = ["Authorization", "Content-Type"]
max_age_in_seconds = 3600
is_allow_credentials_enabled = true
}
}
routes {
path = "/orders/validate"
methods = ["POST"]
backend {
type = "ORACLE_FUNCTIONS_BACKEND"
function_id = oci_functions_function.order_validator.id
}
request_policies {
body_validation {
required = true
content {
media_type = "application/json"
validation_type = "NONE"
}
}
}
response_policies {
header_transformations {
set_headers {
items {
name = "X-Request-ID"
values = ["${request.headers[x-request-id]}"]
}
items {
name = "Strict-Transport-Security"
values = ["max-age=31536000; includeSubDomains"]
}
}
}
}
}
routes {
path = "/orders/validate"
methods = ["OPTIONS"]
backend {
type = "STOCK_RESPONSE_BACKEND"
status = 204
headers {
name = "Access-Control-Allow-Origin"
value = "https://yourdomain.com"
}
}
}
}
}

A few things worth highlighting in this configuration.

The JWT authentication block uses REMOTE_JWKS, which means the Gateway fetches your identity provider’s public keys and caches them for one hour. It validates the signature, the issuer, and the audience on every request before your Function ever sees the traffic. Your function code does not need to do any token validation at all.

The rate_limiting block uses CLIENT_IP as the rate key, which applies the 100 req/sec limit per caller rather than globally across all callers. Switch this to TOTAL if you want a single shared limit for the entire API.

The OPTIONS route returns a 204 with no backend function invoked. This handles preflight CORS requests without consuming Function compute time.

Testing the Deployment

Once Terraform applies successfully, get your Gateway endpoint:

terraform output gateway_endpoint
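That output is not defined in the snippets above; a minimal definition, assuming the deployment resource's exported endpoint attribute, looks like this:

output "gateway_endpoint" {
  value = oci_apigateway_deployment.orders_deployment.endpoint
}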

Test without a token (should get 401):

curl -X POST https://<gateway-id>.apigateway.<region>.oci.customer-oci.com/v1/orders/validate \
-H "Content-Type: application/json" \
-d '{"order_id": "ORD-001"}'

Test with a valid JWT:

TOKEN=$(curl -s -X POST https://your-idp.com/oauth/token \
-d "grant_type=client_credentials&client_id=...&client_secret=..." | jq -r .access_token)
curl -X POST https://<gateway-id>.apigateway.<region>.oci.customer-oci.com/v1/orders/validate \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"order_id": "ORD-001",
"customer_id": "CUST-999",
"items": [{"sku": "ITEM-A", "qty": 2}],
"total_amount": 49.99
}'

Expected response:

{
"status": "valid",
"order_id": "ORD-001",
"item_count": 1,
"validated_at": "ocid1.apigateway.request..."
}

Enabling Execution Logs

Without logs, debugging a Gateway issue is nearly impossible. Enable execution logs on both the Gateway and the Function application:

resource "oci_logging_log" "gateway_execution_log" {
display_name = "gateway-execution-log"
log_group_id = oci_logging_log_group.api_logs.id
log_type = "SERVICE"
configuration {
source {
category = "execution"
resource = oci_apigateway_deployment.orders_deployment.id
service = "apigateway"
source_type = "OCISERVICE"
}
compartment_id = var.compartment_id
}
retention_duration = 30
is_enabled = true
}

Gateway execution logs include the full request path, JWT claim values, matched route, backend invocation time, and response status for every request. These logs are the first place to look when a request is failing.

Final Thoughts

The combination of OCI API Gateway and OCI Functions gives you a production API stack with almost no infrastructure to manage. The Gateway handles authentication, rate limiting, CORS, and TLS. The Functions handle business logic. Terraform manages the entire configuration as code, so every change is reviewed, versioned, and repeatable.

The pieces that trip most people up are the Dynamic Group IAM policy (the Gateway cannot invoke Functions without it), provisioned concurrency (without it your p99 latency will be terrible on cold paths), and execution logging (without it you are debugging blind).

Get those three right and the rest follows naturally.

Regards, Osama

Building Generative AI Applications with Vector Databases on AWS

A few months ago, I was helping a team that had just integrated an LLM into their product. The use case was straightforward: users ask questions, the LLM answers. They had it running. The demos looked great. Then they went to production.

The model kept confidently making things up. It had no idea about the company’s internal documentation, the latest product specs, or anything that happened after its training cutoff. The team was frustrated. They had the right model, the right infrastructure, but the wrong architecture.

The fix was not fine-tuning. Fine-tuning is expensive, slow, and you have to redo it every time your data changes. The fix was Retrieval Augmented Generation, or RAG. And at the heart of RAG is something called a vector database.

In this article, I will walk you through building a production-grade RAG architecture on AWS. We will cover what vector databases actually are, when to use Aurora pgvector versus OpenSearch versus Amazon Bedrock Knowledge Bases, and how to wire everything together with real code.

What Is a Vector Database and Why Does It Matter

Before writing any infrastructure code, let me explain what problem we are actually solving.

When you work with text, images, or audio in AI systems, the raw data is not what gets compared. Instead, you pass the data through an embedding model, which converts it into a list of numbers called a vector. That vector captures the semantic meaning of the content.

Two sentences that mean the same thing will have vectors that are close to each other in vector space, even if they use completely different words. “The server is down” and “the system is not responding” will be closer to each other than “the server is down” and “I had pasta for lunch.”

A vector database is optimized for one specific operation: given a query vector, find me the N closest vectors in the collection. This is called approximate nearest neighbor search, and it is fundamentally different from SQL WHERE clauses or text search.
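As a toy illustration, cosine similarity is just a normalized dot product. The three-dimensional vectors below are made-up stand-ins; real embeddings have hundreds or thousands of dimensions:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up vectors standing in for real embeddings
server_down    = np.array([0.9, 0.1, 0.0])
not_responding = np.array([0.8, 0.2, 0.1])
pasta_lunch    = np.array([0.0, 0.2, 0.9])

print(cosine_similarity(server_down, not_responding))  # high, roughly 0.98
print(cosine_similarity(server_down, pasta_lunch))     # low, roughly 0.02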

In a RAG architecture, the flow looks like this:

  1. You chunk your documents and generate embeddings for each chunk
  2. You store those embeddings in a vector database
  3. When a user asks a question, you generate an embedding for the question
  4. You query the vector database to retrieve the most semantically similar chunks
  5. You pass the question plus those chunks to your LLM as context
  6. The LLM answers based on actual, grounded information

The result is a model that knows your data, stays current as your data changes, and does not hallucinate facts from your knowledge base because the facts are right there in the prompt.

Options on AWS

AWS gives you three serious paths for vector storage, and choosing the wrong one will cost you performance and money.

Amazon Aurora PostgreSQL with pgvector

pgvector is an open source PostgreSQL extension that adds native vector storage and similarity search. If you already run Aurora PostgreSQL, this is often the right starting point.

The extension supports three distance metrics: L2 (Euclidean), inner product, and cosine similarity. For most text embedding use cases, cosine similarity is what you want.

Here is a minimal setup to get you started:

-- Enable the extension on your Aurora instance
CREATE EXTENSION vector;
-- Create a table for your document chunks
CREATE TABLE document_chunks (
id BIGSERIAL PRIMARY KEY,
doc_id TEXT NOT NULL,
chunk_text TEXT NOT NULL,
source_url TEXT,
embedding vector(1536), -- 1536 dims for text-embedding-3-small
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- IVFFlat index for approximate nearest neighbor search
-- lists = sqrt(number of rows) is a good starting point
CREATE INDEX ON document_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
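-- Similarity query: return the five chunks closest to the query embedding ($1)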
SELECT
chunk_text,
source_url,
1 - (embedding <=> $1::vector) AS similarity_score
FROM document_chunks
ORDER BY embedding <=> $1::vector
LIMIT 5;

The <=> operator computes cosine distance. One minus that gives you similarity.

For production, tune the ivfflat.probes parameter at query time. A higher value means more accuracy but slower queries. For most use cases, setting it between 10 and 20 is a reasonable balance:
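-- Session-level setting: check more inverted lists per query
SET ivfflat.probes = 10;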

Aurora pgvector is the right choice when your team already knows PostgreSQL, you want to join vector search results with relational data in the same query, or you have an existing Aurora cluster and want to avoid managing another service.

The limitation is scale. Once you push past 10 to 20 million vectors, or you need sub-10ms latency at high concurrency, you will start to feel the ceiling.

Amazon OpenSearch Service with Vector Engine

OpenSearch’s vector engine is built for scale. It uses the HNSW (Hierarchical Navigable Small World) algorithm, which delivers excellent recall and latency even at hundreds of millions of vectors.

Setting up an index for vector search:

PUT /documents
{
"settings": {
"index": {
"knn": true,
"knn.algo_param.ef_search": 512
}
},
"mappings": {
"properties": {
"doc_id": { "type": "keyword" },
"chunk_text": { "type": "text" },
"source_url": { "type": "keyword" },
"embedding": {
"type": "knn_vector",
"dimension": 1536,
"method": {
"name": "hnsw",
"space_type": "cosinesimil",
"engine": "nmslib",
"parameters": {
"ef_construction": 512,
"m": 16
}
}
}
}
}
}

The ef_construction and m parameters control the index build quality. Higher values give better recall but increase memory usage and indexing time. For most production workloads, m=16 and ef_construction=512 is a solid baseline.

Indexing a document:

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth
region = "us-east-1"
service = "es"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key,
region, service, session_token=credentials.token)
client = OpenSearch(
hosts=[{"host": your_opensearch_endpoint, "port": 443}],
http_auth=awsauth,
use_ssl=True,
verify_certs=True,
connection_class=RequestsHttpConnection
)
document = {
"doc_id": "product-manual-v3-page-42",
"chunk_text": "The power button is located on the right side of the device...",
"source_url": "s3://your-bucket/manuals/product-v3.pdf",
"embedding": generate_embedding("The power button is located...")
}
client.index(index="documents", body=document)
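The snippets here assume a generate_embedding helper, which is not defined above. A minimal sketch using Bedrock's Titan Text Embeddings V2 model could look like this; note that Titan V2 returns 1024-dimensional vectors, so the index dimension must match whichever embedding model you actually use (the examples above declare 1536 for a different model):

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_embedding(text: str) -> list:
    """Generate an embedding with Amazon Titan Text Embeddings V2 via Bedrock."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # Titan V2 vectors are 1024-dimensional by default
    return json.loads(response["body"].read())["embedding"]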

Querying for semantic similarity:

query = {
"size": 5,
"query": {
"knn": {
"embedding": {
"vector": generate_embedding(user_question),
"k": 5
}
}
},
"_source": ["chunk_text", "source_url"]
}
response = client.search(index="documents", body=query)

OpenSearch also lets you combine vector search with traditional filters, which is something pgvector struggles with at scale:

hybrid_query = {
"size": 5,
"query": {
"bool": {
"must": [
{
"knn": {
"embedding": {
"vector": generate_embedding(user_question),
"k": 20
}
}
}
],
"filter": [
{ "term": { "product_line": "enterprise" } },
{ "range": { "doc_date": { "gte": "2024-01-01" } } }
]
}
}
}

Retrieving a larger candidate set via vector search (20 here) and then narrowing it with metadata filters is a form of post-filtering, and it is critical when your knowledge base spans multiple products, teams, or access tiers.

Amazon Bedrock Knowledge Bases

If you want the fastest path to production and do not want to manage chunking, embedding, or indexing yourself, Bedrock Knowledge Bases handles all of it.

You point it at an S3 bucket. It crawls your documents, chunks them, generates embeddings using your chosen model, and stores them in an OpenSearch Serverless collection. When you query it, it handles the retrieval and optionally the generation too.

resource "aws_bedrockagent_knowledge_base" "product_docs" {
name = "product-documentation-kb"
role_arn = aws_iam_role.bedrock_kb_role.arn
knowledge_base_configuration {
type = "VECTOR"
vector_knowledge_base_configuration {
embedding_model_arn = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v2:0"
}
}
storage_configuration {
type = "OPENSEARCH_SERVERLESS"
opensearch_serverless_configuration {
collection_arn = aws_opensearchserverless_collection.kb_vectors.arn
vector_index_name = "bedrock-knowledge-base-default-index"
field_mapping {
vector_field = "bedrock-knowledge-base-default-vector"
text_field = "AMAZON_BEDROCK_TEXT_CHUNK"
metadata_field = "AMAZON_BEDROCK_METADATA"
}
}
}
}
resource "aws_bedrockagent_data_source" "s3_docs" {
knowledge_base_id = aws_bedrockagent_knowledge_base.product_docs.id
name = "s3-product-documentation"
data_source_configuration {
type = "S3"
s3_configuration {
bucket_arn = aws_s3_bucket.documentation.arn
}
}
vector_ingestion_configuration {
chunking_configuration {
chunking_strategy = "SEMANTIC"
semantic_chunking_configuration {
max_token = 300
buffer_size = 0
breakpoint_percentile_threshold = 95
}
}
}
}

Querying it from your application:

import boto3
bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
response = bedrock_agent.retrieve_and_generate(
input={
"text": user_question
},
retrieveAndGenerateConfiguration={
"type": "KNOWLEDGE_BASE",
"knowledgeBaseConfiguration": {
"knowledgeBaseId": "YOUR_KB_ID",
"modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0",
"retrievalConfiguration": {
"vectorSearchConfiguration": {
"numberOfResults": 5,
"overrideSearchType": "HYBRID"
}
}
}
}
)
answer = response["output"]["text"]
citations = response["citations"]

The HYBRID search type combines vector similarity with keyword search under the hood, which improves recall for queries that contain specific product names, version numbers, or technical terms that embeddings alone sometimes miss.

Chunking Strategy: The Part Everyone Gets Wrong

The quality of your RAG system depends more on how you chunk your documents than on which vector database you choose. I have seen teams spend weeks optimizing their similarity search while their chunking strategy was destroying recall.

A few rules that hold up in practice:

Chunk size matters. Too small and you lose context. Too large and you dilute the semantic signal. For most document types, 300 to 500 tokens with a 50-token overlap between chunks is a reasonable starting point. The overlap ensures that sentences that fall on chunk boundaries are still retrievable.
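
A minimal sketch of that starting point, using whitespace tokens as a rough stand-in for real model tokens (chunk_text and the 400/50 numbers here are illustrative defaults, not tuned values):

def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size tokens."""
    tokens = text.split()  # crude whitespace tokenization; swap in your embedding model's tokenizer
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks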

Chunk by structure when you can. If your documents have headers, sections, or natural breaks, use those as chunk boundaries rather than fixed token counts. A section about “Troubleshooting Network Errors” should stay together rather than getting split at 400 tokens.

Store metadata with every chunk. The chunk text alone is not enough. You need the source document, the section title, the creation date, the product version. This metadata enables the filtering patterns we covered in OpenSearch and prevents your model from citing a three-year-old document when a current one exists.

Test with real queries. The only way to validate your chunking strategy is to run the queries your users will actually ask and check whether the right chunks are being retrieved. Build a small evaluation set early, before you optimize anything else.
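
That evaluation set does not need to be elaborate. A sketch of a recall-at-k check, where retrieve is a placeholder for whichever retrieval path you use (pgvector, OpenSearch, or a Knowledge Base) and the test cases are hypothetical:

eval_cases = [
    {"question": "How do I rotate an API key?", "expected_source": "security/api-keys.html"},
    {"question": "What does the enterprise SLA cover?", "expected_source": "contracts/enterprise-sla.html"},
]

def recall_at_k(retrieve, cases, k=5):
    """Fraction of questions whose expected source appears in the top-k retrieved chunks."""
    hits = 0
    for case in cases:
        results = retrieve(case["question"], k=k)  # each result carries chunk_text and source_url
        if any(case["expected_source"] in r["source_url"] for r in results):
            hits += 1
    return hits / len(cases)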

Embedding Model Selection

For AWS workloads, you have two main options through Bedrock:

Amazon Titan Text Embeddings V2 produces 1024-dimensional vectors. It is fast, cheap, and fine for general English text. If you are building an internal knowledge base over English documents, this is the right default.

Cohere Embed v3 supports multilingual embeddings and produces 1024-dimensional vectors with better performance on technical and domain-specific text. If your documents cover specialized subject matter (legal, medical, engineering), Cohere will typically outperform Titan on retrieval quality.

A critical point that is easy to overlook: you must use the same embedding model at indexing time and query time. If you indexed your documents with Titan and query with Cohere, the vectors live in different spaces and your similarity scores will be meaningless. Build this constraint into your infrastructure from day one.
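
One way to make that constraint structural is to route every embedding call, at indexing time and at query time, through a single helper that pins the model ID in one place. A sketch using the Bedrock runtime API with Titan V2; this plays the role of the generate_embedding helper assumed in the OpenSearch examples above:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v2:0"  # pinned once, used by both ingestion and query paths

def generate_embedding(text: str) -> list[float]:
    """Single entry point so index-time and query-time vectors always share one model."""
    response = bedrock_runtime.invoke_model(
        modelId=EMBEDDING_MODEL_ID,
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]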

Architecture Summary

For a production RAG system on AWS, here is the architecture that has worked well for teams I have worked with.

Document ingestion: an S3 bucket triggers a Lambda function, or Step Functions for large files. The function chunks the document, generates embeddings via Bedrock, and writes to your vector store with metadata.

Vector storage: Aurora pgvector for under 5 million vectors with heavy relational joins. OpenSearch for everything larger, or when you need metadata filtering at scale. Bedrock Knowledge Bases when you want fully managed infrastructure and your team does not want to own the pipeline.

Query path: API Gateway triggers a Lambda function that embeds the user query, retrieves top-k chunks from the vector store, builds a context-enriched prompt, and calls Claude or another Bedrock model for the final response.
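
A compressed sketch of that query path, assuming the generate_embedding helper described above and a retrieve_top_k function wrapping whichever vector store you chose (function names and prompt wording are illustrative):

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def handler(event, context):
    question = json.loads(event["body"])["question"]
    chunks = retrieve_top_k(question, k=5)  # embeds the query and searches the vector store
    context_block = "\n\n".join(c["chunk_text"] for c in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\nQuestion: {question}"
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]
    return {
        "statusCode": 200,
        "body": json.dumps({"answer": answer, "sources": [c["source_url"] for c in chunks]}),
    }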

Observability: CloudWatch captures embedding latency, retrieval similarity scores, and end-to-end response time. Set alerts if retrieval quality drops since that is usually a signal that something changed in your document pipeline.

Regards
Osama

Enforcing SLA Compliance with SQL Assertions in Oracle 23ai: A Real-World Use Case

One of the most frustrating things I’ve dealt with as a DBA is cleaning up data that should never have existed in the first place. Orphaned records, overlapping date ranges, business rules violated because some batch job skipped a validation step. We’ve all been there.

The traditional solution was triggers. And if you’ve written cross-table validation triggers in Oracle, you know the pain: mutating table errors (ORA-04091), complex exception handling, scattered logic across multiple trigger bodies, and debugging sessions that make you question your career choices.

Starting with Oracle Database 23ai (release 23.26.1), Oracle introduced SQL Assertions, and they change everything about how we enforce cross-table business rules.

What Are SQL Assertions?

An assertion is a schema-level integrity constraint defined by a boolean expression. If that expression evaluates to false during a transaction, the transaction fails. That’s it. The concept has been part of the SQL standard since SQL-92, but no major database vendor actually implemented it until Oracle did it in 23.26.1.

There are two types of assertion expressions:

Existential expressions use [NOT] EXISTS with a subquery. If the condition is true, the transaction proceeds.

Universal expressions use the new ALL ... SATISFY syntax. This lets you say “for every row matching this query, this condition must hold.” It’s Oracle’s elegant alternative to the awkward double-negation pattern (NOT EXISTS ... WHERE NOT EXISTS ...) that SQL traditionally requires for universal quantification.

The Scenario: SLA Compliance for a Ticketing System

Let me show you a real-world use case that goes beyond toy examples. Imagine you run a support ticketing system for an enterprise. You have service level agreements (SLAs) with your customers, and the database needs to enforce these rules:

  1. Every customer must have an active SLA before they can submit a ticket. No SLA, no support.
  2. Tickets can only be created while the customer’s SLA is active (between start and end dates).
  3. High-priority tickets must be assigned to a senior engineer. You can’t assign a critical production issue to a junior team member.
  4. Every SLA must cover at least one service category. An SLA with no covered services is meaningless.

In a traditional Oracle setup, enforcing these rules would require at least four separate triggers across three tables, careful handling of mutating table errors, and a lot of testing to make sure they don’t interfere with each other.

With assertions, each rule is a single declarative statement.

Building the Schema

sql

DROP TABLE IF EXISTS tickets CASCADE CONSTRAINTS PURGE;
DROP TABLE IF EXISTS sla_services CASCADE CONSTRAINTS PURGE;
DROP TABLE IF EXISTS slas CASCADE CONSTRAINTS PURGE;
DROP TABLE IF EXISTS engineers CASCADE CONSTRAINTS PURGE;
DROP TABLE IF EXISTS customers CASCADE CONSTRAINTS PURGE;
CREATE TABLE customers (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR2(200) NOT NULL,
company VARCHAR2(200),
created_at TIMESTAMP DEFAULT SYSTIMESTAMP
);
CREATE TABLE engineers (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name VARCHAR2(200) NOT NULL,
seniority VARCHAR2(20) CHECK (
seniority IN ('junior','mid','senior','lead')
),
specialization VARCHAR2(100)
);
CREATE TABLE slas (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id NUMBER NOT NULL REFERENCES customers(id),
sla_tier VARCHAR2(20) CHECK (
sla_tier IN ('bronze','silver','gold','platinum')
),
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CONSTRAINT sla_dates_valid CHECK (end_date > start_date)
);
CREATE TABLE sla_services (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
sla_id NUMBER NOT NULL REFERENCES slas(id),
service_name VARCHAR2(100) NOT NULL
);
CREATE TABLE tickets (
id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
customer_id NUMBER NOT NULL REFERENCES customers(id),
engineer_id NUMBER REFERENCES engineers(id),
priority VARCHAR2(20) CHECK (
priority IN ('low','medium','high','critical')
),
subject VARCHAR2(500) NOT NULL,
created_at TIMESTAMP DEFAULT SYSTIMESTAMP,
status VARCHAR2(20) DEFAULT 'open' CHECK (
status IN ('open','in_progress','resolved','closed')
)
);

Assertion 1: Customers Need an Active SLA to Submit Tickets

This is the core business rule. No active SLA, no ticket creation.

sql

CREATE ASSERTION ticket_requires_active_sla
CHECK (
ALL (SELECT customer_id, created_at FROM tickets) SATISFY
EXISTS (
SELECT 1 FROM slas
WHERE slas.customer_id = tickets.customer_id
AND tickets.created_at
BETWEEN slas.start_date AND slas.end_date
)
);

Read that in plain English: “For all tickets, there must exist an SLA for that customer where the ticket creation date falls within the SLA period.”

If someone tries to insert a ticket for a customer whose SLA has expired, the database will reject the transaction. No application code needed. No trigger needed. The rule is declarative and self-documenting.

Assertion 2: High-Priority Tickets Need Senior Engineers

This is a cross-table constraint that would be especially painful with triggers because it spans tickets and engineers.

sql

CREATE ASSERTION critical_tickets_need_senior_engineer
CHECK (
NOT EXISTS (
SELECT 1
FROM tickets t
JOIN engineers e ON t.engineer_id = e.id
WHERE t.priority IN ('high', 'critical')
AND e.seniority IN ('junior', 'mid')
)
);

This uses the existential pattern. It looks for any high-priority ticket assigned to a junior or mid-level engineer. If it finds one, the transaction fails. Simple, clear, and impossible to bypass from any application that touches this database.

Assertion 3: Every SLA Must Cover at Least One Service

An SLA without any covered services is a data integrity problem waiting to happen.

sql

CREATE ASSERTION sla_must_have_services
CHECK (
ALL (SELECT id FROM slas) SATISFY
EXISTS (
SELECT 1 FROM sla_services
WHERE sla_services.sla_id = slas.id
)
)
DEFERRABLE INITIALLY DEFERRED;

This one uses DEFERRABLE INITIALLY DEFERRED because of the chicken-and-egg problem: the foreign key on sla_services requires the SLA to exist first, but this assertion requires services to exist when an SLA exists. By deferring validation to commit time, you can insert both the SLA and its services in a single transaction.

Testing It Out

Let’s load some data and see the assertions in action:

sql

-- Insert customers
INSERT INTO customers (name, company)
VALUES ('Ahmad Hassan', 'TechCorp Jordan');
INSERT INTO customers (name, company)
VALUES ('Sara Ali', 'DataFlow ME');
-- Insert engineers
INSERT INTO engineers (name, seniority, specialization)
VALUES ('Omar Khalid', 'senior', 'Database');
INSERT INTO engineers (name, seniority, specialization)
VALUES ('Lina Nasser', 'junior', 'Networking');
-- Insert SLA with services (in one transaction
-- because of deferred assertion)
INSERT INTO slas (customer_id, sla_tier, start_date, end_date)
VALUES (1, 'gold', DATE '2025-01-01', DATE '2026-12-31');
INSERT INTO sla_services (sla_id, service_name)
VALUES (1, 'Database Support');
INSERT INTO sla_services (sla_id, service_name)
VALUES (1, '24/7 Monitoring');
COMMIT; -- Assertion validates here: SLA has services, OK
-- This should succeed: customer has active SLA,
-- senior engineer assigned
INSERT INTO tickets
(customer_id, engineer_id, priority, subject)
VALUES
(1, 1, 'critical', 'Production database performance issue');
COMMIT;

Now let’s try violating the rules:

sql

-- This should FAIL: assigning critical ticket
-- to junior engineer
INSERT INTO tickets
(customer_id, engineer_id, priority, subject)
VALUES
(1, 2, 'critical', 'Server outage');
COMMIT;
-- ERROR: assertion CRITICAL_TICKETS_NEED_SENIOR_ENGINEER violated
-- This should FAIL: customer 2 has no SLA
INSERT INTO tickets
(customer_id, engineer_id, priority, subject)
VALUES
(2, 1, 'low', 'General question');
COMMIT;
-- ERROR: assertion TICKET_REQUIRES_ACTIVE_SLA violated

The database enforces the rules. Every time. Regardless of which application, API, or batch job is inserting the data.

Why This Matters

The traditional approach to these rules would involve:

  • Four or more BEFORE INSERT triggers across multiple tables
  • Careful handling of ORA-04091 mutating table errors (probably using compound triggers or package variables)
  • Testing every combination of insert/update/delete across all tables
  • Documentation that explains what each trigger does and how they interact
  • A maintenance burden that grows with every new business rule

With assertions, each rule is one statement. They live in the data dictionary alongside your other constraints. You can query USER_CONSTRAINTS to see them. They are self-documenting. And Oracle’s internal incremental checking mechanism ensures they perform well because the database only validates the data that actually changed, not the entire table.

Practical Notes

Grant the privilege. CREATE ASSERTION is not included in RESOURCE. Use GRANT DB_DEVELOPER_ROLE TO your_user; or grant it explicitly.

Assertions share the constraint namespace. You cannot have an assertion and a constraint with the same name in the same schema.

Cross-schema assertions need ASSERTION REFERENCES. If your assertion references tables in another schema, you need this object privilege on those tables, and you must use fully qualified table names (synonyms are not supported).

Start with ENABLE NOVALIDATE on existing systems. This lets you add an assertion without checking existing data, which is essential when adding rules to a database that might already contain violations.

Subqueries can nest up to three levels. For most business rules, this is more than enough.


Thank you

Osama

Building a Customer Management System with Oracle 23ai: Domains, Duality Views, and Annotations

I’ve been exploring the new features in Oracle Database 23ai, and I have to say, the combination of SQL Domains, JSON Relational Duality Views, and Annotations completely changes how I think about schema design. In this post, I’ll walk through building a small customer and order management system that uses all three features together. And the best part? You can run every single example right here on FreeSQL without installing anything.

The Problem

Let’s say we’re building a simple e-commerce backend. We need customer records with validated email addresses and credit card numbers, and we need order records tied to those customers. On the application side, our frontend team wants to consume the data as JSON documents. On the database side, we want clean, normalized relational tables with proper constraints.

In older Oracle versions, you would have to:

  • Repeat CHECK constraints for email validation on every table that stores emails
  • Build complex application-layer ORM logic to convert between relational rows and JSON objects
  • Keep documentation about your schema in external wikis or README files that nobody updates

Oracle 23ai solves all three problems with native features. Let me show you how.

Setting Up the Foundation: SQL Domains

SQL Domains are reusable column-type definitions. Think of them as named templates that bundle a data type, constraints, display formatting, ordering behavior, and documentation into a single schema object. Once you create a domain, any column can reference it and automatically inherit everything.

Here’s what that looks like for email addresses and credit card numbers:

sql

PURGE RECYCLEBIN;
DROP DOMAIN IF EXISTS emails;
DROP DOMAIN IF EXISTS cc;
CREATE DOMAIN emails AS VARCHAR2(100)
CONSTRAINT email_chk CHECK (
REGEXP_LIKE(emails, '^(\S+)\@(\S+)\.(\S+)$')
)
DISPLAY LOWER(emails)
ORDER LOWER(emails)
ANNOTATIONS (
Description 'An email address with a check constraint
for name @ domain dot (.) something'
);
CREATE DOMAIN cc AS VARCHAR2(19)
CONSTRAINT cc_chk CHECK (
REGEXP_LIKE(cc, '^\d+(\d+)*$')
)
ANNOTATIONS (
Description 'Credit card number with a check constraint
no dashes, no spaces!'
);

Notice a few things here. The DISPLAY clause means that whenever someone queries an email column, it will automatically be shown in lowercase. The ORDER clause ensures sorting is also case-insensitive. And the ANNOTATIONS clause embeds documentation directly in the data dictionary. No external docs needed.

Try inserting an invalid email like not-an-email into any column using the emails domain, and the database will reject it automatically. The validation lives in the schema, not in your application code.

Creating the Tables

Now let’s create our customers and orders tables. Notice how the email column simply references the emails domain, and the credit_card column references the cc domain. No need to repeat the CHECK constraints.

sql

DROP TABLE IF EXISTS orders CASCADE CONSTRAINTS PURGE;
DROP TABLE IF EXISTS customers CASCADE CONSTRAINTS PURGE;
CREATE TABLE IF NOT EXISTS orders (
id NUMBER,
product_id NUMBER,
order_date TIMESTAMP,
customer_id NUMBER,
total_value NUMBER(6,2),
order_shipped BOOLEAN,
warranty INTERVAL YEAR TO MONTH
);
CREATE TABLE IF NOT EXISTS customers (
id NUMBER,
first_name VARCHAR2(100),
last_name VARCHAR2(100),
dob DATE,
email emails,
address VARCHAR2(200),
zip VARCHAR2(10),
phone_number VARCHAR2(20),
credit_card cc,
joined_date TIMESTAMP DEFAULT SYSTIMESTAMP,
gold_customer BOOLEAN DEFAULT FALSE,
CONSTRAINT new_customers_pk PRIMARY KEY (id)
);
ALTER TABLE orders ADD (CONSTRAINT orders_pk PRIMARY KEY (id));
ALTER TABLE orders ADD (
CONSTRAINT orders_fk FOREIGN KEY (customer_id)
REFERENCES customers (id)
);

Also worth noting: BOOLEAN is now a native SQL data type in 23ai. No more NUMBER(1) or CHAR(1) workarounds. And INTERVAL YEAR TO MONTH gives us clean warranty period tracking without date math.

Loading Sample Data

Let’s insert a handful of customers and a couple of orders:

sql

INSERT INTO customers
(id, first_name, last_name, dob, email, address,
zip, phone_number, credit_card)
VALUES
(1, 'Alice', 'Brown', DATE '1990-01-01',
'alice.brown@example.com', '123 Maple Street',
'12345', '555-1234', '4111111111110000'),
(3, 'Bob', 'Brown', DATE '1990-01-01',
'email1@example.com', '333 Maple Street',
'12345', '555-5678', '4111111111111111'),
(4, 'Clarice', 'Jones', DATE '1990-01-01',
'email8888@example.com', '222 Bourbon Street',
'12345', '555-7856', '4111111111111110'),
(5, 'David', 'Smith', DATE '1990-01-01',
'email375@example.com', '111 Walnut Street',
'12345', '555-3221', '4111111111111112');
INSERT INTO orders
(id, customer_id, product_id, order_date,
total_value, order_shipped, warranty)
VALUES
(100, 1, 101, SYSTIMESTAMP, 300.00, NULL, NULL),
(101, 4, 101, SYSTIMESTAMP - 30, 129.99, TRUE,
INTERVAL '5' YEAR);
COMMIT;

The Magic Part: JSON Relational Duality Views

Here’s where it gets really interesting. JSON Relational Duality Views let you expose your normalized relational tables as JSON documents. The data stays in the relational tables (normalized, efficient, properly constrained), but applications can read and write it as JSON. Both representations stay perfectly in sync, automatically.

First, a simple duality view for just the customers table:

sql

CREATE OR REPLACE FORCE JSON RELATIONAL DUALITY VIEW
customers_dv AS
customers @insert @update @delete
{
_id : id,
FirstName : first_name,
LastName : last_name,
DateOfBirth : dob,
Email : email,
Address : address,
Zip : zip,
phoneNumber : phone_number,
creditCard : credit_card,
joinedDate : joined_date,
goldStatus : gold_customer
};

Now you can insert data as JSON:

sql

INSERT INTO customers_dv VALUES (
'{"_id": 2, "FirstName": "Jim", "LastName": "Brown",
"Email": "jim.brown@example.com",
"Address": "456 Maple Street", "Zip": 12345}'
);
COMMIT;

That JSON insert automatically populates the underlying relational customers table. The domain validation still applies, so if you try to insert a bad email through the JSON interface, Oracle will reject it.

Nested Duality Views: Customers with Their Orders

Now for the real power. Let’s create a duality view that nests orders inside customer documents:

sql

CREATE OR REPLACE JSON RELATIONAL DUALITY VIEW
customer_orders_dv
ANNOTATIONS (
Description 'JSON Relational Duality View
sourced from CUSTOMERS and ORDERS'
)
AS SELECT JSON {
'_id' : c.ID,
'FirstName' : c.FIRST_NAME,
'LastName' : c.LAST_NAME,
'Address' : c.ADDRESS,
'Zip' : c.ZIP,
'orders' :
[ SELECT JSON {
'OrderID' : o.ID WITH NOUPDATE,
'ProductID' : o.PRODUCT_ID,
'OrderDate' : o.ORDER_DATE,
'TotalValue' : o.TOTAL_VALUE,
'OrderShipped' : o.ORDER_SHIPPED
}
FROM ORDERS o WITH INSERT UPDATE DELETE
WHERE o.CUSTOMER_ID = c.ID
]
}
FROM CUSTOMERS c;

Query it, and you get clean JSON with nested orders:

sql

SELECT * FROM customer_orders_dv o
WHERE o.data."_id" = 1;

You can even add a new order by updating the JSON document directly using JSON_TRANSFORM:

sql

UPDATE customer_orders_dv c
SET c.data = json_transform(
data,
APPEND '$.orders' = JSON {
'OrderID': 123,
'ProductID': 202,
'OrderDate': SYSTIMESTAMP,
'TotalValue': 150.00
}
)
WHERE c.data."_id" = 1;
COMMIT;
SELECT * FROM customer_orders_dv o
WHERE o.data."_id" = 1;

That single JSON update automatically inserted a new row into the relational ORDERS table with the correct foreign key. No ORM. No application-layer mapping. The database handles the translation.

Try It Yourself on FreeSQL

The complete script is available to run on FreeSQL. Click the button below, and you’ll have everything set up: domains, tables, sample data, and both duality views. You can modify the queries, try inserting invalid emails to see domain validation in action, and experiment with the JSON interface.

https://freesql.com/embedded/?layout=vertical&compressed_code=H4sIAAAAAAAAE61YbW%252FiuBb%252Bnl9xLlopsBO4JLy0pRqpKXGnzKbAJmlHnTt7kZu4xdMQIye00x31v1%252FZDpAAYbTSzQdIfI7tx%252Bc8x36S6a33CYGHhvdDF12OxueaFnG2hIgtME2APgL5QdMsBbLANE7PK6xheK5pISc4I2uj6gA4hRfMwznmVt1stxtayJI045gmmXKZhfNnCOckfIY6J0%252Fkx3IW02cCdTWAAfp%252F69%252F8D41vF%252BqvJf9%252B0xsNLaLpMsZvELNXwnP%252FhsZ4RDjATitOEpbhjLIkhbpD0pDTpXgC3c6xAo4iTtIUXmk2B5xjKsB9ZBwSvCBwsV5jxDKotxqQsgXJ5jR50ht7cQjDcgzOSiEIw8r1h6FY%252B7foQ%252F1b9KHxu1xx9SqGnEQ0gxDzCGrJavFAeK1yJQmDCKdzkhriNl3ikKT%252FktgdbzKFwL50USG7MqIpDG1%252FaDsIhpOxH3j2aBz4sFzxJ3J%252BuFu4SjO2%252BEVPrdmEoQoYhgw%252FxAQyBmnGOFHzQoQzrA09ZAdoO0XCsh10dQ0AgEYwvr25RJ4hH5ecRaswm%252B00yy6zSEwajG6QH9g3U2VZY97tkbEMx7MXHK9I3l7vG1ajOFw6p8slieByMnGRPVamV8w5TrI3GI0D5N3ZLtwj24NgAjeTcXCtNY5HYA3nl0HYxvpgHB4pT7OZJO%252Bd7Q2vbU9Vo7LG%252BIgxYg%252Fg2AFST6pS8tKULeuq2XS1Nl3%252FpsviiHnrcs4SMlMULfbKzaEk8kwSOQxV23dGExLtJAwcdGXfugH49%252F5OFp9YHM02scsTsvG%252Fsl0%252FX86WkJCQ102XdLZ8hqk3urG9e%252FgD3UOdRo11ruwoElsBoU8JPJO3Yl1lTA6TM1LmUrPdAHl5znKD7ThQL0ytmg%252FN2Tj%252FB%252F0fn%252BFq4qHRp7HqX%252BByAzx0hTw0HiK%252FyBU1haaNxj7yAkHSSdlsFKhjbIliCFoYigfGmgKGSLhRyq9RTGdDu7PdW%252BQD1E0DdDumIdEN0C85e010Q7IMdPPsrN1sm822KWxYOLUehMcF%252BYEXy5i0QrYQJtPqwA1exgT8jBOS5W3dnrjp9XpN8SDuu%252Bb2arfbbT1nmrjqHQGAPfwKh1ypuQuh0%252FkFhF7%252F5HQXgrhKELoG6MMY8zwcn1lC0mMwTk9PT3eRWJYFl2zFH1hSheXktNc%252FgKUcjp4BuoNfaCQ8%252FQXN5seQdE56e1kxTfiC42SVVeHoWJZ5AIdVwtE3QEerJA%252FJFxrHFC%252BORqV70t3D0oEJfq7C0e0cxNEp4TgxQL%252FiOHkWnkP24xiCs9OTvbR04JqGz4y%252FVaE4Oz05lJVuCcWpAfonwvgTxRJHTBYkORqNk%252F5%252BZixA8aKyWAQr93H0SjjODNCv82B4b%252FhosXS6vf6BhExpUlktlmVZBzD0SxjMtgH6iNNUglAEqa7Yzl7JWiZ4cxaxiCQRr66WTqfTOYDlpIxF7GOfcSgD8onGx%252BvW7Ha63T1AJlxSHs4rWdrt7m1ipmmelnFYBujuKnwTnu5bEh6tWrPf6%252BzVSgc8Er0yFlVuZb1e7wCQszIQsZ3eYC4Dcs9WydMvAmIdqJjhnPDqgun3%252B4dwtPWGdl4%252ByNbCUJxihbPQKKhCoyAFjaLIM8qiztgIufUhpkkqtg0wDTDbplFWIdBpt1vCmqziWP3KMNWlq0jXXp9mRwxmnbXOzgzIuIBAk4zwFxyD3tPhjWAul5irwIkHHpq69hCJU3%252BI4LM%252FGYOHXDsYTca2C86t7Y6Ce7gboS8wvPWDyQ3y%252FJlzB2D7oEHhoL%252BgSUp4BherpdRYFxGJSUa0nzKzQgvLawA0Usm%252BEqJgLBTj2lKQCdLDxSUHGBTEg3RwcEYmj5eUZ%252FPcQUgKaUJSZG6vQS40pNHO9ebWuJYf0vyVLgFKff%252Bmy63uHCvZqSxFpVLQnkMhPXOXgnwpKFGBfe1R1KYb8elnOFula5eSHNXeKwWXyI1kXwp1%252FWdtRqPaACwDaptw1wZQ%252B0wXNQNq6wDXBjUpXESbDJzw%252BU4X%252B6JJeOTREz7dXr8kXoT5K13WBiDr7V29y7LFgmZK%252BOI4IzzBGX0h8Inj5fxPF9K3JMM%252FQGs292l5lJCbgszFa%252FQCco7KF1w1GomlEcfgrHBMsze4o%252BQVUrbiIYngkbMFNJsbtgNOIph4DvJ8vSHFuy9%252BCy96af78M%252F8%252FQPit4Tjtt35Hyb91O0LlrdNBQhcd8m1usL6pKuZtj%252F9A4WF38etrIkYbOZtZy5FYX1O1leZ%252Bg%252BLWWjXmpngGxc33gHcgtuM7%252BcotvYvbc9Xgfv4Svhk838B3%252FN8Lz3%252Fl9%252B%252FnsnzlT7MJGccRXXPNceHf4P%252FpKus%252FpPqajTNFxJlzpw7M%252FwfXK5gudngfuWgYKGg%252FN0e0PqORLja21sjZHtz6htnKdjXy%252FGA2tm9QwWfNauXi2vseOaGVg%252B04HvL9gvkrXSrT19G00Kx4KyybNpAsPbyCTb%252BcoaIja40c%252BDIKrmE8uZ1uP1eU%252FDdcVT2m3sS5HQazYhzKYwumKl8Z2FnFuFueKudgEtjuTMqEqpFzmhYH969H0ylySh3eS09X3uQmTzGwvYHV8iE%252FWFQQwEEuClDZ98s18hAAa214OXLgoyREIQF%252FaVsAcuIt08S33nz8A7t4qPkogLAlvljBR%252FiesmSWcZykj4wv1McpYcoP8%252BkUjR3Qf2utSfAxT%252FcmuwPTEqKylDyrLQRvKUdl%252BbWTE7Mn1Ni71tDU4hW6ljxh4aN5vj3mUhKTMIPfVXUdWB6D1zkR3yZLQ4B5rjWbaDL8H0yub9RQFwAA&code_language=PL_SQL&code_format=false

What I Love About This Approach

Domains eliminate copy-paste constraints. In a real production schema, you might have emails in five different tables (customers, employees, vendors, contacts, users). With domains, the validation regex lives in one place. Change it once, and every column using that domain picks up the update.

Annotations are self-documenting schemas. You can query USER_ANNOTATIONS_USAGE to discover what every domain, table, and column does. No more hunting through Confluence pages or README files to understand what a column means.

Duality Views solve the ORM problem at the database level. Your frontend developers can work with clean JSON documents. Your DBAs can work with normalized relational tables. Both see the same data, and the database keeps them in sync. No impedance mismatch, no complex mapping layer, no stale caches.

The fact that you can now experience all of this directly in your browser through FreeSQL makes it incredibly easy to learn and prototype. Select the 23ai engine, and all these features are available immediately.

Regards
Osama

Kubernetes in the Multi-cloud: Orchestrating Workloads Across AWS and OCI

Why Multicloud Kubernetes Is No Longer Optional

The conversation has shifted. Running Kubernetes on a single cloud provider was once considered best practice: simpler networking, unified IAM, one support contract. But modern enterprise reality tells a different story.

Vendor lock-in risk, regional compliance mandates, cost arbitrage opportunities, and resilience requirements are pushing engineering teams to operate Kubernetes clusters across multiple clouds simultaneously. Among the most compelling combinations today is AWS (EKS) paired with Oracle Cloud Infrastructure (OCI/OKE): two providers with fundamentally different strengths that, when combined, can form a genuinely powerful platform.

This post walks through the architectural decisions, tooling choices, and operational patterns for running a production-grade multicloud Kubernetes setup spanning AWS EKS and OCI OKE.

Understanding What Each Cloud Brings

Before designing a multicloud strategy, you need to be honest about why you’re using each provider, not just “for redundancy.”

AWS EKS is mature, battle-tested, and has the richest ecosystem of Kubernetes-native tooling. Its managed node groups, Karpenter autoscaler, and deep integration with IAM Roles for Service Accounts (IRSA) make it a natural fit for compute-heavy, stateless microservices. The tradeoff: cost can escalate fast at scale.

OCI OKE (Oracle Container Engine for Kubernetes) is increasingly competitive on price, particularly for compute and egress, and has genuine strengths in Oracle Database integrations, bare metal instances, and deterministic network performance via its RDMA fabric. For workloads that touch Oracle DB, Exadata, or need high-throughput interconnects, OKE is not just a fallback; it is the right tool.

The insight that unlocks a real multicloud strategy: stop treating one cloud as primary and the other as DR. Design for active-active.

The Core Architecture

A production multicloud Kubernetes setup across EKS and OKE requires solving four problems:

  1. Cluster federation or virtual cluster abstraction
  2. Cross-cloud networking
  3. Unified identity and secrets management
  4. Consistent GitOps delivery

Let’s break each down.


1. Cluster Federation: Choosing Your Control Plane Philosophy

There are two schools of thought:

Option A: Independent clusters, unified GitOps (recommended). Each cluster (EKS, OKE) is fully autonomous. A GitOps tool, typically Flux or Argo CD, manages both from a single source of truth. No shared control plane exists between clusters. Workloads are deployed to each cluster independently based on targeting labels or Kustomize overlays.

Option B: Virtual cluster mesh (Liqo, Admiralty, or Karmada). Tools like Karmada introduce a meta-control plane that federates multiple clusters. You submit workloads to the Karmada API server, and it distributes them across member clusters based on propagation policies.

For most teams, Option A is the right starting point. Karmada adds power but also operational complexity. The GitOps approach keeps the blast radius contained: a misconfiguration in one cluster doesn’t cascade.


2. Cross-Cloud Networking: The Hard Problem

Kubernetes pods in EKS can’t natively reach pods in OKE, and vice versa. You need a data plane that spans both clouds.

Recommended approach: WireGuard-based mesh with Cilium Cluster Mesh

Cilium’s Cluster Mesh feature allows pods across clusters to communicate using their native pod IPs, with WireGuard encryption in transit. The setup requires:

  • Each cluster runs Cilium as its CNI (replacing the default VPC CNI on EKS and the flannel-based CNI on OKE)
  • A ClusterMesh resource is created linking the two API servers
  • Cross-cluster ServiceExport and ServiceImport resources (via the Kubernetes MCS API) expose services across the mesh

On the infrastructure layer, you need an encrypted tunnel between your AWS VPC and OCI VCN. Options:

  • Site-to-site VPN (quickest to set up, ~1.25 Gbps cap)
  • AWS Direct Connect + OCI FastConnect (for production private, dedicated bandwidth)
  • Overlay via Tailscale or Netbird (great for dev/staging multicloud setups, not production-grade for high-throughput)

yaml

# Example: Cilium clusterwide network policy allowing cross-cluster ingress within the mesh
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cross-cluster-services
spec:
  endpointSelector: {}
  ingress:
    - fromEndpoints:
        - matchLabels:
            io.cilium.k8s.policy.cluster: oci-oke-prod

3. Unified Identity: IRSA on AWS, Workload Identity on OCI

This is where multicloud gets philosophically interesting. Each cloud has its own identity system, and they don’t speak the same language.

On AWS (EKS): Use IRSA (IAM Roles for Service Accounts). Your pod’s service account is annotated with an IAM role ARN. The Pod Identity Webhook injects environment variables that allow the AWS SDK to exchange a projected service account token for temporary AWS credentials.

On OCI (OKE): Use OCI Workload Identity, introduced in recent OKE versions. It works analogously to IRSA: a Kubernetes service account is bound to an OCI Dynamic Group and IAM policy, and the pod receives a workload identity token that can be exchanged for OCI API credentials.

The challenge: your application code should not need to know which cloud it’s running on. Use a secrets abstraction layer.

External Secrets Operator (ESO) elegantly solves this. Deploy ESO on both clusters. Point the EKS instance at AWS Secrets Manager; point the OKE instance at OCI Vault. Your application consumes a SecretStore resource with a consistent name. ESO handles the transparent fetching of backend-specific credentials.

# SecretStore on EKS (AWS Secrets Manager backend)
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: app-secrets
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
---
# SecretStore on OKE (OCI Vault backend): same name, different spec
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: app-secrets
spec:
  provider:
    oracle:
      vault: ocid1.vault.oc1...
      region: us-ashburn-1
      auth:
        workloadIdentity: {}

Your application's ExternalSecret resources reference app-secrets in both environments; the YAML is identical.


4. GitOps: One Repository, Multiple Targets

Use Argo CD ApplicationSets or Flux's Kustomization with cluster selectors to manage both clusters from a monorepo.

A typical repo layout:

/clusters
  /eks-us-east-1
    kustomization.yaml   # EKS-specific patches
  /oke-us-ashburn-1
    kustomization.yaml   # OKE-specific patches
/base
  /apps
    deployment.yaml
    service.yaml
  /infra
    external-secrets.yaml
    cilium-config.yaml

Flux’s Kustomization resource lets you target specific clusters using the cluster’s kubeconfig context or label selectors. Argo CD’s ApplicationSet with a list generator can enumerate your clusters and deploy the same app with environment-specific values.

The key rule: the base layer must be cloud-agnostic. Patches in cluster-specific overlays handle anything that diverges: storage classes, ingress annotations, node selectors.


Observability Across Clouds

A multicloud cluster setup with no unified observability is an incident waiting to happen.

Recommended stack:

  • Prometheus + Thanos for metrics: each cluster runs Prometheus; Thanos Sidecar ships blocks to object storage (S3 on AWS, OCI Object Storage on OCI); Thanos Querier federates across both
  • Grafana with both Thanos endpoints as datasources: a single pane of glass
  • OpenTelemetry Collector deployed as a DaemonSet on each cluster, shipping traces to a common backend (Grafana Tempo, Jaeger, or Honeycomb)
  • Loki for logs, with agents on each cluster shipping to a common Loki instance

Label discipline is critical: ensure every metric, trace, and log carries cluster, cloud_provider, and region labels from the source. Without this, correlation during incidents across clouds becomes extremely difficult.


Cost Management: The Overlooked Dimension

Multicloud adds a new cost vector: egress. Data leaving AWS costs money. Data entering OCI is free. Cross-cloud service calls that seemed free in a single-cloud setup now carry per-GB charges.

Practical rules:

  • Colocate tightly coupled services in the same cluster/cloud; don’t split microservices that call each other thousands of times per second across clouds
  • Use Cilium’s network policy to audit cross-cluster traffic volume before enabling services in the mesh
  • Consider OCI’s free egress to the internet for user-facing workloads where latency to OCI regions is acceptable
  • Tag every namespace with cost center labels and use Kubecost or OpenCost deployed on each cluster with a shared object storage backend for unified cost attribution

Operational Runbook Considerations

A few things that will bite you if not planned for:

Clock skew: mTLS certificates and OIDC token validation are sensitive to time drift. Ensure NTP is configured identically on all nodes across both clouds. A 5-minute clock skew will silently break IRSA on EKS and workload identity on OKE.

DNS: Use ExternalDNS on both clusters pointing to a shared DNS provider (Route 53, Cloudflare). Services that need cross-cloud discoverability get DNS entries automatically on deploy.

Cluster upgrades: EKS and OKE release Kubernetes versions on different schedules. Maintain a maximum one-minor-version skew between clusters. Use a canary upgrade pattern: upgrade your OKE cluster first (typically lower blast radius), validate for 48 hours, then upgrade EKS.

Node image parity: Your application containers are cloud-agnostic, but your node OS images are not. Use Bottlerocket on EKS and Oracle Linux 8 on OKE; both are minimal, hardened, and have predictable patching cycles.


When NOT to Do This

Multicloud Kubernetes is a force multiplier, but only if your team has the operational maturity to support it.

Don’t pursue this architecture if:

  • Your team is still stabilizing single-cluster Kubernetes operations
  • Your workloads have no actual cross-cloud requirement (cost, compliance, or resilience)
  • You lack dedicated platform engineering capacity to maintain the toolchain
  • Your application isn’t designed for network partitioning tolerance

A well-run single-cloud EKS or OKE setup will outperform a poorly-run multicloud one every time. Add complexity only when you’ve exhausted simpler options.


Closing Thoughts

The multicloud Kubernetes story has matured considerably. Tools like Cilium Cluster Mesh, External Secrets Operator, Karmada, and OpenTelemetry have closed most of the operational gaps that made this approach impractical two years ago.

The AWS + OCI combination in particular is underrated. AWS brings ecosystem breadth; OCI brings pricing, Oracle database integration, and a network fabric that punches above its weight. For the right workloads, and with the right tooling discipline, the combination is genuinely compelling.

The architecture isn’t magic. It’s plumbing. But when it’s done right, it disappears and your developers ship to two clouds the same way they ship to one.


Have questions about multicloud Kubernetes design or EKS/OKE specifics? Reach out or leave a comment below.

Building Event-Driven Microservices on AWS with Amazon EventBridge

We had built this beautiful system. Fifteen microservices, each with its own database, deployed on EKS. Textbook architecture. The problem? Every service was calling every other service directly. When the order service needed to notify inventory, shipping, notifications, and analytics, it made four synchronous HTTP calls. If any of those services were slow or down, the order service suffered.

We had built a distributed monolith. All the complexity of microservices with none of the benefits.

The solution was event-driven architecture. Instead of services calling each other, they publish events. Other services subscribe to the events they care about. The order service publishes “OrderCreated” and moves on. It doesn’t know or care who’s listening.

Amazon EventBridge is AWS’s answer to this pattern. It’s not just another message queue. It’s a serverless event bus that connects your applications, AWS services, and SaaS applications using events. And honestly, it’s changed how I think about building systems.

In this article, I’ll walk you through building a production-grade event-driven architecture on AWS. We’ll cover EventBridge fundamentals, event design, error handling, observability, and patterns I’ve learned from running this in production.

Why Event-Driven? Why Now?

Before we dive into implementation, let’s talk about why you’d want this architecture in the first place.

Loose Coupling: Services don’t need to know about each other. The order service doesn’t import the inventory service SDK. It just publishes events.

Resilience: If the notification service is down, orders still get processed. Notifications catch up when the service recovers.

Scalability: Each service scales independently. Black Friday traffic might hammer your order service, but your reporting service can process events at its own pace.

Extensibility: Need to add fraud detection? Just subscribe to OrderCreated events. No changes to the order service required.

Auditability: Events create a natural audit trail. You can replay them, analyze them, debug issues by looking at what happened.

The trade-off? Eventual consistency. If you need strong consistency across services, synchronous calls might still be necessary. But in my experience, most business processes are naturally asynchronous. Customers don’t expect their loyalty points to update in the same millisecond as their order confirmation.

Architecture Overview

Step 1: Design Your Events First

This is where most teams go wrong. They start building services and figure out events later. But events are your contract. They’re the API between your services. Design them carefully.

Event Structure

EventBridge events follow a standard structure:

{
"version": "0",
"id": "12345678-1234-1234-1234-123456789012",
"detail-type": "Order Created",
"source": "com.mycompany.orders",
"account": "123456789012",
"time": "2025-03-05T10:30:00Z",
"region": "us-east-1",
"resources": [],
"detail": {
"orderId": "ORD-12345",
"customerId": "CUST-67890",
"items": [
{
"productId": "PROD-111",
"quantity": 2,
"price": 29.99
}
],
"totalAmount": 59.98,
"currency": "USD",
"shippingAddress": {
"country": "US",
"state": "CA",
"city": "San Francisco",
"zipCode": "94102"
},
"metadata": {
"correlationId": "req-abc123",
"version": "1.0"
}
}
}

Event Design Principles

Be Specific with detail-type: Don’t use generic types like “OrderEvent”. Use “Order Created”, “Order Shipped”, “Order Cancelled”. This makes routing rules cleaner.

Include What Consumers Need: Think about who will consume this event. The notification service needs customer email. The analytics service needs order value. Include enough data that consumers don’t need to call back to the producer.

But Don’t Include Everything: Don’t embed entire database records. Include identifiers and key attributes. If a consumer needs the full customer profile, they can fetch it.

Version Your Events: Include a version in metadata. When you need to change the schema, you can route different versions to different handlers.

Add Correlation IDs: For distributed tracing, include a correlation ID that follows the request through all services.

Create Event Schemas

EventBridge has a schema registry. Use it. It provides documentation, code generation, and validation.

# Create schema registry
aws schemas create-registry \
--registry-name my-company-events \
--description "Event schemas for our microservices"

Define schemas using JSON Schema or OpenAPI:

{
"openapi": "3.0.0",
"info": {
"title": "OrderCreated",
"version": "1.0.0"
},
"paths": {},
"components": {
"schemas": {
"OrderCreated": {
"type": "object",
"required": ["orderId", "customerId", "totalAmount"],
"properties": {
"orderId": {
"type": "string",
"pattern": "^ORD-[0-9]+$"
},
"customerId": {
"type": "string"
},
"totalAmount": {
"type": "number",
"minimum": 0
},
"currency": {
"type": "string",
"enum": ["USD", "EUR", "GBP"]
}
}
}
}
}
}

Step 2: Set Up EventBridge Infrastructure

Let’s create the EventBridge infrastructure using Terraform. I prefer Terraform over CloudFormation for this because the syntax is cleaner and it’s easier to manage across multiple AWS accounts.

Create the Event Bus

# eventbridge.tf
# Create custom event bus (don't use default for production)
resource "aws_cloudwatch_event_bus" "main" {
name = "mycompany-events"
tags = {
Environment = "production"
Team = "platform"
}
}
# Event bus policy - allow other accounts to put events
resource "aws_cloudwatch_event_bus_policy" "main" {
event_bus_name = aws_cloudwatch_event_bus.main.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AllowAccountsToPutEvents"
Effect = "Allow"
Principal = {
AWS = [
"arn:aws:iam::111111111111:root", # Dev account
"arn:aws:iam::222222222222:root" # Staging account
]
}
Action = "events:PutEvents"
Resource = aws_cloudwatch_event_bus.main.arn
}
]
})
}
# Archive for event replay (critical for debugging)
resource "aws_cloudwatch_event_archive" "main" {
name = "mycompany-events-archive"
event_source_arn = aws_cloudwatch_event_bus.main.arn
retention_days = 30
# Archive all events
event_pattern = jsonencode({
source = [{ prefix = "com.mycompany" }]
})
}

Create Event Rules

Rules determine which events go where. This is where EventBridge really shines. The pattern matching is incredibly powerful.

# Order events to inventory service
resource "aws_cloudwatch_event_rule" "order_to_inventory" {
name = "order-created-to-inventory"
event_bus_name = aws_cloudwatch_event_bus.main.name
event_pattern = jsonencode({
source = ["com.mycompany.orders"]
detail-type = ["Order Created"]
})
tags = {
Service = "inventory"
}
}
resource "aws_cloudwatch_event_target" "inventory_lambda" {
rule = aws_cloudwatch_event_rule.order_to_inventory.name
event_bus_name = aws_cloudwatch_event_bus.main.name
target_id = "inventory-processor"
arn = aws_lambda_function.inventory_processor.arn
# Retry configuration
retry_policy {
maximum_event_age_in_seconds = 3600 # 1 hour
maximum_retry_attempts = 3
}
# Dead letter queue for failed events
dead_letter_config {
arn = aws_sqs_queue.inventory_dlq.arn
}
}
# High-value orders get special handling
resource "aws_cloudwatch_event_rule" "high_value_orders" {
name = "high-value-orders"
event_bus_name = aws_cloudwatch_event_bus.main.name
# Content-based filtering - only orders over $1000
event_pattern = jsonencode({
source = ["com.mycompany.orders"]
detail-type = ["Order Created"]
detail = {
totalAmount = [{ numeric = [">=", 1000] }]
}
})
}
resource "aws_cloudwatch_event_target" "fraud_check" {
rule = aws_cloudwatch_event_rule.high_value_orders.name
event_bus_name = aws_cloudwatch_event_bus.main.name
target_id = "fraud-check"
arn = aws_sfn_state_machine.fraud_check.arn
role_arn = aws_iam_role.eventbridge_sfn.arn
}

Advanced Pattern Matching

EventBridge supports sophisticated pattern matching. Here are patterns I use frequently:

# Match events from multiple sources
event_pattern = jsonencode({
source = ["com.mycompany.orders", "com.mycompany.returns"]
})
# Match specific values in nested objects
event_pattern = jsonencode({
detail = {
shippingAddress = {
country = ["US", "CA", "MX"] # North America only
}
}
})
# Prefix matching
event_pattern = jsonencode({
detail = {
orderId = [{ prefix = "ORD-PRIORITY-" }]
}
})
# Exists check
event_pattern = jsonencode({
detail = {
promoCode = [{ exists = true }] # Only orders with promo codes
}
})
# Combine multiple conditions
event_pattern = jsonencode({
source = ["com.mycompany.orders"]
detail-type = ["Order Created"]
detail = {
totalAmount = [{ numeric = [">=", 100] }]
currency = ["USD"]
items = {
productId = [{ prefix = "DIGITAL-" }]
}
}
})
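
Before wiring a pattern into a Terraform rule, you can sanity-check it against a sample event with the TestEventPattern API. A small sketch; the pattern and sample event here are illustrative:

import json
import boto3

events = boto3.client('events')

pattern = {
    'source': ['com.mycompany.orders'],
    'detail-type': ['Order Created'],
    'detail': {'totalAmount': [{'numeric': ['>=', 1000]}]},
}
sample_event = {
    'id': '12345678-1234-1234-1234-123456789012',
    'detail-type': 'Order Created',
    'source': 'com.mycompany.orders',
    'account': '123456789012',
    'time': '2025-03-05T10:30:00Z',
    'region': 'us-east-1',
    'resources': [],
    'detail': {'orderId': 'ORD-12345', 'totalAmount': 1250.00},
}
match = events.test_event_pattern(
    EventPattern=json.dumps(pattern),
    Event=json.dumps(sample_event),
)
print(match['Result'])  # True if the sample event matches the pattern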

Step 3: Build Event Producers

Now let’s build services that publish events. I’ll show you a Python example since it’s common in AWS Lambda, but the patterns apply to any language.

Order Service (Producer)

# order_service/handler.py
import json
import os
import boto3
import uuid
from datetime import datetime
from dataclasses import dataclass, asdict
from typing import List
eventbridge = boto3.client('events')
@dataclass
class OrderItem:
productId: str
quantity: int
price: float
@dataclass
class OrderCreatedEvent:
orderId: str
customerId: str
items: List[dict]
totalAmount: float
currency: str
shippingAddress: dict
metadata: dict
def create_order(event, context):
"""Handle order creation request."""
body = json.loads(event['body'])
# Generate order ID
order_id = f"ORD-{uuid.uuid4().hex[:8].upper()}"
# Calculate total
items = body['items']
total = sum(item['quantity'] * item['price'] for item in items)
# Save to database (simplified)
save_order_to_dynamodb(order_id, body)
# Create the event
order_event = OrderCreatedEvent(
orderId=order_id,
customerId=body['customerId'],
items=items,
totalAmount=total,
currency=body.get('currency', 'USD'),
shippingAddress=body['shippingAddress'],
metadata={
'correlationId': event['requestContext']['requestId'],
'version': '1.0',
'timestamp': datetime.utcnow().isoformat()
}
)
# Publish to EventBridge
publish_event(
source='com.mycompany.orders',
detail_type='Order Created',
detail=asdict(order_event)
)
return {
'statusCode': 201,
'body': json.dumps({
'orderId': order_id,
'status': 'created'
})
}
def publish_event(source: str, detail_type: str, detail: dict):
"""Publish event to EventBridge with error handling."""
try:
response = eventbridge.put_events(
Entries=[
{
'Source': source,
'DetailType': detail_type,
'Detail': json.dumps(detail),
'EventBusName': 'mycompany-events'
}
]
)
# Check for partial failures
if response['FailedEntryCount'] > 0:
failed = response['Entries'][0]
raise Exception(f"Failed to publish event: {failed['ErrorCode']} - {failed['ErrorMessage']}")
except Exception as e:
# Log the error but don't fail the order
# Consider sending to a fallback queue
print(f"Error publishing event: {e}")
send_to_fallback_queue(source, detail_type, detail)
def send_to_fallback_queue(source, detail_type, detail):
"""Send to SQS as fallback if EventBridge fails."""
sqs = boto3.client('sqs')
sqs.send_message(
QueueUrl=os.environ['FALLBACK_QUEUE_URL'],
MessageBody=json.dumps({
'source': source,
'detailType': detail_type,
'detail': detail
})
)

Batch Publishing for High Throughput

When you need to publish many events, batch them:

def publish_events_batch(events: List[dict]):
"""Publish multiple events efficiently."""
# EventBridge accepts up to 10 events per call
BATCH_SIZE = 10
entries = []
for event in events:
entries.append({
'Source': event['source'],
'DetailType': event['detail_type'],
'Detail': json.dumps(event['detail']),
'EventBusName': 'mycompany-events'
})
# Process in batches
failed_events = []
for i in range(0, len(entries), BATCH_SIZE):
batch = entries[i:i + BATCH_SIZE]
response = eventbridge.put_events(Entries=batch)
if response['FailedEntryCount'] > 0:
for idx, entry in enumerate(response['Entries']):
if 'ErrorCode' in entry:
failed_events.append({
'event': batch[idx],
'error': entry['ErrorCode']
})
return failed_events
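
Partial failures from put_events (throttling, oversized entries) should not be dropped silently. A minimal retry sketch with exponential backoff, reusing the publish_events_batch function from above:

import json
import time

def publish_with_retry(events, max_attempts=4):
    """Retry only the entries that failed, backing off between attempts."""
    pending = events
    for attempt in range(max_attempts):
        failed = publish_events_batch(pending)
        if not failed:
            return []
        time.sleep(2 ** attempt)  # simple exponential backoff
        # Map the failed EventBridge entries back to the input format
        pending = [
            {
                'source': f['event']['Source'],
                'detail_type': f['event']['DetailType'],
                'detail': json.loads(f['event']['Detail']),
            }
            for f in failed
        ]
    return pending  # still failing after retries; route these to a fallback queue or DLQ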

Step 4: Build Event Consumers

Consumers are typically Lambda functions, but can also be Step Functions, SQS queues, API destinations, or other AWS services.

Inventory Service (Consumer)

# inventory_service/handler.py
import json
import boto3
from datetime import datetime
from decimal import Decimal
# InsufficientInventoryError is a custom exception assumed to be defined elsewhere in this module.
dynamodb = boto3.resource('dynamodb')
inventory_table = dynamodb.Table('inventory')
def process_order_created(event, context):
"""
Process OrderCreated events to update inventory.
EventBridge invokes this Lambda with the full event envelope.
"""
# Extract the event detail
detail = event['detail']
order_id = detail['orderId']
items = detail['items']
correlation_id = detail['metadata']['correlationId']
print(f"Processing order {order_id} (correlation: {correlation_id})")
try:
# Reserve inventory for each item
for item in items:
reserve_inventory(
product_id=item['productId'],
quantity=item['quantity'],
order_id=order_id
)
# Publish success event
publish_event(
source='com.mycompany.inventory',
detail_type='Inventory Reserved',
detail={
'orderId': order_id,
'status': 'reserved',
'items': items,
'metadata': {
'correlationId': correlation_id
}
}
)
except InsufficientInventoryError as e:
# Publish failure event
publish_event(
source='com.mycompany.inventory',
detail_type='Inventory Reservation Failed',
detail={
'orderId': order_id,
'reason': str(e),
'failedItems': e.failed_items,
'metadata': {
'correlationId': correlation_id
}
}
)
# Don't raise - we've handled it by publishing an event
return {'status': 'failed', 'reason': str(e)}
return {'status': 'success'}
def reserve_inventory(product_id: str, quantity: int, order_id: str):
"""
Atomically reserve inventory using DynamoDB conditional writes.
"""
try:
inventory_table.update_item(
Key={'productId': product_id},
UpdateExpression='''
SET availableQuantity = availableQuantity - :qty,
reservedQuantity = reservedQuantity + :qty,
lastUpdated = :now
ADD reservations :reservation
''',
ConditionExpression='availableQuantity >= :qty',
ExpressionAttributeValues={
':qty': quantity,
':now': datetime.utcnow().isoformat(),
':reservation': {order_id}
}
)
except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
raise InsufficientInventoryError(
f"Insufficient inventory for {product_id}",
failed_items=[product_id]
)

Notification Service with Step Functions

For complex workflows, use Step Functions as the EventBridge target:

{
"Comment": "Process order notifications with multiple channels",
"StartAt": "DetermineNotificationChannels",
"States": {
"DetermineNotificationChannels": {
"Type": "Choice",
"Choices": [
{
"Variable": "$.detail.totalAmount",
"NumericGreaterThanEquals": 500,
"Next": "HighValueOrderNotifications"
}
],
"Default": "StandardNotifications"
},
"HighValueOrderNotifications": {
"Type": "Parallel",
"Branches": [
{
"StartAt": "SendEmail",
"States": {
"SendEmail": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:send-email",
"End": true
}
}
},
{
"StartAt": "SendSMS",
"States": {
"SendSMS": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:send-sms",
"End": true
}
}
},
{
"StartAt": "NotifyAccountManager",
"States": {
"NotifyAccountManager": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:slack-notify",
"End": true
}
}
}
],
"Next": "RecordNotificationsSent"
},
"StandardNotifications": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:send-email",
"Next": "RecordNotificationsSent"
},
"RecordNotificationsSent": {
"Type": "Task",
"Resource": "arn:aws:states:::dynamodb:putItem",
"Parameters": {
"TableName": "notification-log",
"Item": {
"orderId": {"S.$": "$.detail.orderId"},
"notifiedAt": {"S.$": "$$.State.EnteredTime"},
"channels": {"S": "email,sms"}
}
},
"End": true
}
}
}

Step 5: Handle Failures Gracefully

Things will fail. Networks are unreliable. Services go down. Your event-driven architecture needs to handle this gracefully.

Dead Letter Queues

Always configure DLQs for your event rules:

# DLQ for inventory service
resource "aws_sqs_queue" "inventory_dlq" {
name = "inventory-events-dlq"
message_retention_seconds = 1209600 # 14 days
tags = {
Service = "inventory"
Purpose = "dead-letter-queue"
}
}
# Alarm when messages hit DLQ
resource "aws_cloudwatch_metric_alarm" "inventory_dlq_alarm" {
alarm_name = "inventory-dlq-messages"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 1
metric_name = "ApproximateNumberOfMessagesVisible"
namespace = "AWS/SQS"
period = 300
statistic = "Sum"
threshold = 0
alarm_description = "Messages in inventory DLQ"
dimensions = {
QueueName = aws_sqs_queue.inventory_dlq.name
}
alarm_actions = [aws_sns_topic.alerts.arn]
}

DLQ Processor

Create a Lambda to process DLQ messages:

# dlq_processor/handler.py
import json
import os
import boto3
# reprocess_event, delete_from_dlq, and move_to_permanent_failure_queue are helpers
# assumed to be defined elsewhere in this module.
eventbridge = boto3.client('events')
sqs = boto3.client('sqs')
def process_dlq(event, context):
"""
Process messages from DLQ.
Attempt to republish or escalate.
"""
for record in event['Records']:
message = json.loads(record['body'])
# Parse the original event
original_event = json.loads(message.get('detail', '{}'))
failure_reason = message.get('errorMessage', 'Unknown')
receipt_handle = record['receiptHandle']
# Get retry count from message attributes
retry_count = int(
record.get('messageAttributes', {})
.get('RetryCount', {})
.get('stringValue', '0')
)
if retry_count < 3:
# Try to republish with delay
try:
reprocess_event(original_event, retry_count + 1)
delete_from_dlq(record['eventSourceARN'], receipt_handle)
except Exception as e:
print(f"Retry failed: {e}")
else:
# Max retries exceeded - escalate
escalate_to_operations(original_event, failure_reason)
move_to_permanent_failure_queue(record)
def escalate_to_operations(event, reason):
"""Alert operations team about permanent failure."""
sns = boto3.client('sns')
sns.publish(
TopicArn=os.environ['OPS_ALERT_TOPIC'],
Subject='Event Processing Failure - Manual Intervention Required',
Message=json.dumps({
'event': event,
'reason': reason,
'action_required': 'Manual review and potential data reconciliation'
}, indent=2)
)

Idempotency

Events can be delivered more than once. Your consumers must handle this:

import boto3
from datetime import datetime, timedelta
def process_order_created(event, context):
"""Idempotent event processor."""
detail = event['detail']
# Create idempotency key from event ID
event_id = event['id']
# Check if we've already processed this event
if is_already_processed(event_id):
print(f"Event {event_id} already processed, skipping")
return {'status': 'duplicate'}
try:
# Process the event
result = do_actual_processing(detail)
# Mark as processed
mark_as_processed(event_id, result)
return result
except Exception as e:
# Don't mark as processed on failure - allow retry
raise
def is_already_processed(event_id: str) -> bool:
"""Check DynamoDB for processed event."""
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('processed-events')
response = table.get_item(Key={'eventId': event_id})
return 'Item' in response
def mark_as_processed(event_id: str, result: dict):
"""Record that we processed this event."""
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('processed-events')
table.put_item(
Item={
'eventId': event_id,
'processedAt': datetime.utcnow().isoformat(),
'result': result,
'ttl': int((datetime.utcnow() + timedelta(days=7)).timestamp())
}
)
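
The check-then-write pattern above leaves a small race window if two deliveries of the same event are processed concurrently. A variant that claims the event ID atomically with a DynamoDB conditional put; this is a sketch against the same processed-events table:

import boto3
from datetime import datetime, timedelta
from botocore.exceptions import ClientError

def claim_event(event_id: str) -> bool:
    """Return True if this is the first time we see event_id, False on a duplicate delivery."""
    table = boto3.resource('dynamodb').Table('processed-events')
    try:
        table.put_item(
            Item={
                'eventId': event_id,
                'processedAt': datetime.utcnow().isoformat(),
                'ttl': int((datetime.utcnow() + timedelta(days=7)).timestamp())
            },
            ConditionExpression='attribute_not_exists(eventId)'
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return False  # already claimed by another invocation
        raise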

Step 6: Observability

You can’t manage what you can’t see. Event-driven architectures need excellent observability.

CloudWatch Metrics

EventBridge publishes metrics automatically, but add custom metrics for business events:

import boto3
cloudwatch = boto3.client('cloudwatch')
def publish_business_metrics(event_type: str, properties: dict):
"""Publish custom business metrics."""
cloudwatch.put_metric_data(
Namespace='MyCompany/Events',
MetricData=[
{
'MetricName': 'EventsProcessed',
'Dimensions': [
{'Name': 'EventType', 'Value': event_type},
{'Name': 'Service', 'Value': 'inventory'}
],
'Value': 1,
'Unit': 'Count'
},
{
'MetricName': 'OrderValue',
'Dimensions': [
{'Name': 'Currency', 'Value': properties.get('currency', 'USD')}
],
'Value': properties.get('totalAmount', 0),
'Unit': 'None'
}
]
)

Distributed Tracing with X-Ray

Enable X-Ray tracing across your event-driven services:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all

# Patch all supported libraries
patch_all()

@xray_recorder.capture('process_order_created')
def process_order_created(event, context):
    # Add correlation ID as annotation
    correlation_id = event['detail']['metadata']['correlationId']
    xray_recorder.current_subsegment().put_annotation('correlationId', correlation_id)

    # Your processing logic
    with xray_recorder.in_subsegment('reserve_inventory'):
        reserve_inventory(event['detail']['items'])

    with xray_recorder.in_subsegment('publish_event'):
        publish_event(...)

CloudWatch Dashboard

Create a dashboard for your event-driven system:

resource "aws_cloudwatch_dashboard" "events" {
dashboard_name = "event-driven-system"
dashboard_body = jsonencode({
widgets = [
{
type = "metric"
x = 0
y = 0
width = 12
height = 6
properties = {
title = "Events Published"
region = "us-east-1"
metrics = [
["AWS/Events", "Invocations", "EventBusName", "mycompany-events"]
]
period = 60
stat = "Sum"
}
},
{
type = "metric"
x = 12
y = 0
width = 12
height = 6
properties = {
title = "Failed Invocations"
region = "us-east-1"
metrics = [
["AWS/Events", "FailedInvocations", "EventBusName", "mycompany-events"]
]
period = 60
stat = "Sum"
}
},
{
type = "metric"
x = 0
y = 6
width = 24
height = 6
properties = {
title = "Event Processing Latency by Service"
region = "us-east-1"
metrics = [
["AWS/Lambda", "Duration", "FunctionName", "inventory-processor"],
["AWS/Lambda", "Duration", "FunctionName", "notification-processor"],
["AWS/Lambda", "Duration", "FunctionName", "analytics-processor"]
]
period = 60
stat = "Average"
}
}
]
})
}

Step 7: Testing Event-Driven Systems

Testing event-driven architectures requires different strategies than traditional synchronous systems.

Unit Testing Event Handlers

# test_inventory_handler.py
import pytest
from unittest.mock import patch

# InsufficientInventoryError is assumed to be defined alongside the handler
from inventory_service.handler import process_order_created, InsufficientInventoryError

@pytest.fixture
def order_created_event():
    return {
        'id': 'test-event-123',
        'source': 'com.mycompany.orders',
        'detail-type': 'Order Created',
        'detail': {
            'orderId': 'ORD-TEST',
            'customerId': 'CUST-123',
            'items': [
                {'productId': 'PROD-1', 'quantity': 2, 'price': 29.99}
            ],
            'totalAmount': 59.98,
            'metadata': {
                'correlationId': 'req-test'
            }
        }
    }

@patch('inventory_service.handler.reserve_inventory')
@patch('inventory_service.handler.publish_event')
def test_process_order_reserves_inventory(mock_publish, mock_reserve, order_created_event):
    result = process_order_created(order_created_event, None)

    assert result['status'] == 'success'
    mock_reserve.assert_called_once_with(
        product_id='PROD-1',
        quantity=2,
        order_id='ORD-TEST'
    )
    mock_publish.assert_called_once()

@patch('inventory_service.handler.reserve_inventory')
@patch('inventory_service.handler.publish_event')
def test_insufficient_inventory_publishes_failure(mock_publish, mock_reserve, order_created_event):
    mock_reserve.side_effect = InsufficientInventoryError("Out of stock", ['PROD-1'])

    result = process_order_created(order_created_event, None)

    assert result['status'] == 'failed'
    # Verify failure event was published
    call_args = mock_publish.call_args
    assert call_args[1]['detail_type'] == 'Inventory Reservation Failed'

Integration Testing with LocalStack

# test_integration.py
import boto3
import pytest
import json
@pytest.fixture(scope='session')
def localstack_eventbridge():
    """Set up LocalStack EventBridge for testing."""
    client = boto3.client(
        'events',
        endpoint_url='http://localhost:4566',
        region_name='us-east-1'
    )

    # Create test event bus
    client.create_event_bus(Name='test-events')

    yield client

    # Cleanup
    client.delete_event_bus(Name='test-events')

def test_event_routing(localstack_eventbridge):
    """Test that events are routed correctly."""
    # Create a rule that sends to SQS for testing
    localstack_eventbridge.put_rule(
        Name='test-rule',
        EventBusName='test-events',
        EventPattern=json.dumps({
            'source': ['com.mycompany.orders'],
            'detail-type': ['Order Created']
        })
    )

    # Publish test event
    localstack_eventbridge.put_events(
        Entries=[{
            'Source': 'com.mycompany.orders',
            'DetailType': 'Order Created',
            'Detail': json.dumps({'orderId': 'TEST-123'}),
            'EventBusName': 'test-events'
        }]
    )

    # Verify event was received (check target queue)
    # ...

Common Patterns and Anti-Patterns

Let me share some patterns I’ve learned from running event-driven systems in production.

Pattern: Event Sourcing Light

Store events alongside state changes for debugging:

def create_order(order_data):
    order_id = generate_order_id()

    # Save state
    save_to_database(order_id, order_data)

    # Also save the event
    save_event({
        'eventType': 'OrderCreated',
        'entityId': order_id,
        'data': order_data,
        'timestamp': datetime.utcnow()
    })

    # Publish to EventBridge
    publish_event(...)

Pattern: Saga for Distributed Transactions

When you need coordination across services, model the happy path and its compensations explicitly (a compensation handler sketch follows the diagram):

Order Created
  └─> Inventory Reserved (success)
        ├─> Payment Processed (success)
        │     └─> Order Confirmed
        └─> Payment Failed
              └─> Release Inventory (compensation)
                    └─> Order Cancelled
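
To make the compensation branch concrete, here is a minimal sketch of a handler that reacts to a hypothetical Payment Failed event. release_inventory is an assumed helper, publish_event is the helper used in the earlier examples (its keyword arguments here are assumptions), and the event shape mirrors the Order Created detail:

def process_payment_failed(event, context):
    """Compensation step: undo the inventory reservation, then close out the saga."""
    detail = event['detail']
    order_id = detail['orderId']

    # Compensate the earlier "Inventory Reserved" step (assumed helper)
    release_inventory(order_id=order_id, items=detail.get('items', []))

    # Publish the terminal event so downstream consumers (refunds, emails) can react
    publish_event(
        source='com.mycompany.orders',
        detail_type='Order Cancelled',
        detail={
            'orderId': order_id,
            'reason': detail.get('failureReason', 'payment_failed'),
            'metadata': detail.get('metadata', {})
        }
    )
    return {'status': 'compensated', 'orderId': order_id}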

Anti-Pattern: Event Chains

Avoid long chains where each service publishes an event that triggers the next; prefer orchestration for complex workflows (a sketch follows):

# BAD: Long chain creates debugging nightmare
A -> B -> C -> D -> E

# BETTER: Use orchestration (Step Functions) for complex workflows
A -> Step Functions orchestrates B, C, D, E
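
With orchestration, the order service starts one state machine execution instead of publishing an event per hop. A minimal sketch, assuming a state machine that wraps the downstream steps (the ARN below is a placeholder):

import json
import boto3

sfn = boto3.client('stepfunctions')

def start_order_workflow(order):
    """Start one orchestrated workflow instead of a chain of events."""
    response = sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:order-fulfillment',  # placeholder
        name=f"order-{order['orderId']}",  # one execution per order keeps retries idempotent
        input=json.dumps(order)
    )
    return response['executionArn']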

Anti-Pattern: Giant Events

Don’t embed entire database records in events:

// BAD
{
  "customer": {
    "id": "123",
    "name": "...",
    "address": "...",
    "creditHistory": [...],  // 50KB of data
    "orderHistory": [...]    // Another 100KB
  }
}

// GOOD
{
  "customerId": "123",
  "customerName": "John Doe"  // Only what consumers need
}

Conclusion

Event-driven architecture with EventBridge has transformed how I build distributed systems. The decoupling is real. Services can be developed, deployed, and scaled independently. New capabilities can be added without touching existing services.

But it’s not magic. You need to think carefully about event design, handle failures gracefully, and invest in observability. The debugging story is different. You can’t just step through code. You need to trace events across services.

Start small. Pick one synchronous integration in your system and convert it to events. Feel the pain points. Build the tooling. Then expand.

The investment pays off. Systems become more resilient, more scalable, and paradoxically, simpler to understand once you internalize the patterns.

Regards,
Osama

Hands-On: Building a Vector Database Pipeline with OCI and Open-Source Embeddings

Introduction

Vector databases are rapidly becoming a central element in AI workflows: they store embeddings (numeric vector representations of text, images, or other data) and enable semantic similarity search. In this post you’ll walk through a hands-on example of building a vector database pipeline on Oracle Database 23ai (or Autonomous Database on Oracle Cloud Infrastructure) that covers:

  1. Generating embeddings with an open-source model.
  2. Loading embeddings into the vector-enabled database.
  3. Constructing vector indexes and performing similarity queries.
  4. Integrating with metadata to produce hybrid search.
  5. Discussing performance, scalability, maintenance and best practices.

Earlier posts on this blog cover vector search in theory (the VECTOR data type, indexing, RAG); this one emphasises step-by-step code, pipeline construction, and a hybrid-search use case, so the two complement rather than overlap.

1. Pipeline Overview

Here’s the architecture of the pipeline we’ll build:

  • Data source: A set of documents (in this example, internal knowledge articles).
  • Embedding generation: Use an open-source sentence-transformer (e.g., all-MiniLM-L12-v2) to convert each document’s text into a 384-dimensional vector.
  • Storage: Use Oracle’s VECTOR data type in a table that also holds metadata (title, date, department).
  • Indexing: Create a vector index (approximate nearest-neighbour) for fast similarity search.
  • Querying: Accept a search query (text), embed it, and run a similarity search among documents. Combine vector similarity with metadata filters (e.g., department = “Legal”).
  • Serving: Return top K results ranked by semantic similarity and metadata weight.

Here is a conceptual diagram:

Text documents → embedding model → store (id, metadata, vector) → build index  
Search query → embedding → query vector + metadata filter → results  

2. Setup & Embedding Generation

Prerequisites

  • Provision Oracle Database 23ai / Autonomous Database on OCI (or a sharded/VM deployment that supports the VECTOR type).
  • Ensure the database supports the VECTOR column type and vector indexing.
  • Python environment with sentence-transformers and the python-oracledb driver (the successor to cx_Oracle).

Embedding generation (Python)

from sentence_transformers import SentenceTransformer
import oracledb

# Load model
model = SentenceTransformer('all-MiniLM-L12-v2')

# Sample documents
docs = [
    {"id": 1, "title": "Employee onboarding policy", "dept": "HR", "text": "..."},
    {"id": 2, "title": "Vendor contract guidelines", "dept": "Legal", "text": "..."},
    # … more rows
]

# Generate embeddings
for doc in docs:
    vec = model.encode(doc['text']).tolist()
    doc['embed'] = vec

# Connect to Oracle DB
conn = oracledb.connect(user="vector_usr", password="pwd", dsn="your_dsn")
cursor = conn.cursor()

# Create table
cursor.execute("""
  CREATE TABLE kb_documents (
    doc_id     NUMBER PRIMARY KEY,
    title      VARCHAR2(500),
    dept       VARCHAR2(100),
    content    CLOB,
    doc_vector VECTOR
  )
""")
conn.commit()

# Insert rows
for doc in docs:
    cursor.execute("""
      INSERT INTO kb_documents(doc_id, title, dept, content, doc_vector)
      VALUES(:1, :2, :3, :4, :5)
    """, (doc['id'], doc['title'], doc['dept'], doc['text'], doc['embed']))
conn.commit()

Why this matters

  • You store both business metadata (title, dept) and embedding (vector) in the same table — enabling hybrid queries (metadata + similarity).
  • Using a stable, open-source embedding model ensures reproducible vectors; you can later upgrade to a newer model version and re-embed the corpus as requirements evolve.

3. Vector Indexing & Similarity Querying

Create vector index

Once vectors are stored, you create a vector index for fast search.

CREATE VECTOR INDEX idx_kb_vector
  ON kb_documents (doc_vector)
  ORGANIZATION INMEMORY NEIGHBOR GRAPH
  DISTANCE COSINE
  WITH TARGET ACCURACY 95;

Running a query: semantic search + metadata filter

Suppose you want to search: “vendor termination risk” but only within dept = “Legal”.

query = "vendor termination risk"
query_vec = model.encode([query]).tolist()[0]

cursor.execute("""
  SELECT doc_id, title, dept, vector_distance(doc_vector, :qv) AS dist
  FROM kb_documents
  WHERE dept = 'Legal'
  ORDER BY vector_distance(doc_vector, :qv)
  FETCH FIRST 5 ROWS ONLY
""", {"qv": query_vec})

for row in cursor:
    print(row)

Explanation

  • vector_distance computes the distance between two vectors (lower means more similar when using the cosine metric).
  • We combine a standard filter WHERE dept = 'Legal' with the vector search.
  • The result returns the closest (by meaning) documents among the “Legal” department.

4. Enhancements & Production Considerations

Chunking & embedding size

  • For large documents (e.g., whitepapers), chunk them into ~512-token segments before embedding and store each segment as a separate row keyed to the parent document id (see the sketch below).
  • Maintain a model_version column so you know which embedding model produced each vector.
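
A minimal chunking sketch, reusing the model, cursor, conn, and docs objects from section 2 and assuming a hypothetical kb_chunks table with parent_doc_id, chunk_seq, content, chunk_vector, and model_version columns:

MODEL_NAME = 'all-MiniLM-L12-v2'

def chunk_text(text, max_words=400):
    """Naive word-based chunking; swap in a token-aware splitter for production."""
    words = text.split()
    return [' '.join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

for doc in docs:
    for seq, chunk in enumerate(chunk_text(doc['text'])):
        vec = model.encode(chunk).tolist()
        cursor.execute("""
          INSERT INTO kb_chunks(parent_doc_id, chunk_seq, content, chunk_vector, model_version)
          VALUES(:1, :2, :3, :4, :5)
        """, (doc['id'], seq, chunk, vec, MODEL_NAME))
conn.commit()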

Hybrid ranking

You may want to combine semantic similarity + recency or popularity. For example:

SELECT doc_id, title,
       vector_distance(doc_vector, :qv) * 0.7
       + ((sysdate - created_date) / 365) * 0.3 AS score
FROM kb_documents
WHERE dept = 'Legal'
ORDER BY score
FETCH FIRST 5 ROWS ONLY

Here semantic distance carries 70% of the score and document age in years (via a created_date column you would add to the table) carries 30%. Because rows are ordered by ascending score, older documents are pushed down the ranking, which favours recent content; invert or re-weight the age term to match your business logic.

Scaling

  • With millions of vectors, approximate nearest-neighbour (ANN) indexing is crucial; tune the index (for example its target accuracy and, depending on index type, the HNSW graph or IVF partition settings).
  • Monitor latency of vector_distance queries, and monitor index size/maintenance cost.
  • Consider sharding or partitioning the embedding table (by dept, date) if usage grows.

Maintenance

  • When you retrain or change the embedding model version: re-compute embeddings, then drop and rebuild the vector indexes (see the sketch after this list).
  • Monitor performance drift: track metrics like top-K retrieval relevance, query latency, user feedback.
  • Maintain metadata hygiene: e.g., ensure each row has a valid dept, tag, creation date.
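
As a sketch of that re-embedding step, reusing the connection from section 2 and assuming kb_documents has been extended with the model_version column suggested above:

from sentence_transformers import SentenceTransformer

NEW_MODEL = 'all-MiniLM-L12-v2'  # hypothetical: the model you are upgrading to
new_model = SentenceTransformer(NEW_MODEL)

# Find rows embedded with an older (or unknown) model version
cursor.execute("""
  SELECT doc_id, content FROM kb_documents
  WHERE model_version IS NULL OR model_version != :mv
""", {"mv": NEW_MODEL})
rows = cursor.fetchall()

for doc_id, content in rows:
    text = content.read() if hasattr(content, 'read') else content  # CLOBs may arrive as LOB objects
    vec = new_model.encode(text).tolist()
    cursor.execute("""
      UPDATE kb_documents
      SET doc_vector = :1, model_version = :2
      WHERE doc_id = :3
    """, (vec, NEW_MODEL, doc_id))
conn.commit()

# Finally, drop and recreate the vector index so the ANN structure matches the new vectors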

Regards
Osama

Advanced AWS Lambda Layer Optimization: Performance, Cost, and Deployment Strategies

Lambda Layers are one of AWS Lambda’s most powerful yet underutilized features. While many developers use them for basic dependency sharing, there’s a wealth of optimization opportunities that can dramatically improve performance, reduce costs, and streamline deployments. This deep-dive explores advanced techniques for maximizing Lambda Layer efficiency in production environments.

Understanding Lambda Layer Architecture at Scale

Layer Loading Mechanics

When a Lambda function cold starts, AWS loads layers in sequential order before initializing your function code. Each layer is extracted to the /opt directory, with later layers potentially overwriting files from earlier ones. Understanding this process is crucial for optimization:

# Layer structure in /opt
/opt/
├── lib/                 # Shared libraries
├── bin/                 # Executables
├── python/              # Python packages (for Python runtime)
├── nodejs/              # Node.js modules (for Node.js runtime)
└── extensions/          # Lambda extensions
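
A quick way to confirm this layout from inside a Python function is to check where a layer-provided module actually resolves from at runtime; a minimal sketch (requests stands in for any module shipped in a layer):

import sys
import importlib.util

def handler(event, context):
    """Report where a layer-provided module resolves from at runtime."""
    spec = importlib.util.find_spec('requests')  # example: a module expected to live in a layer
    return {
        'module_location': spec.origin if spec else 'not found',
        'layer_paths_on_sys_path': [p for p in sys.path if p.startswith('/opt')]
    }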

Memory and Performance Impact

Layers contribute to your function’s total package size and memory footprint. Each layer is cached locally on the execution environment, but the initial extraction during cold starts affects performance (a measurement sketch follows the list):

  • Cold start penalty: +50-200ms per additional layer
  • Memory overhead: 10-50MB per layer depending on contents
  • Network transfer: Layers are downloaded to execution environment
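
One way to quantify the cold-start cost for a specific function is to pull Init Duration from Lambda’s REPORT log lines with CloudWatch Logs Insights. A minimal sketch, assuming the default /aws/lambda/<function> log group naming:

import time
import boto3

logs = boto3.client('logs')

def average_init_duration(function_name: str, hours: int = 24) -> float:
    """Average init duration (ms) over recent cold starts, via Logs Insights."""
    end = int(time.time())
    start = end - hours * 3600
    query = logs.start_query(
        logGroupName=f"/aws/lambda/{function_name}",
        startTime=start,
        endTime=end,
        queryString='filter @type = "REPORT" and ispresent(@initDuration) '
                    '| stats avg(@initDuration) as avgInit'
    )
    # Poll until the query completes (simplified; add a timeout in real use)
    while True:
        result = logs.get_query_results(queryId=query['queryId'])
        if result['status'] in ('Complete', 'Failed', 'Cancelled'):
            break
        time.sleep(1)
    rows = result.get('results', [])
    return float(rows[0][0]['value']) if rows and rows[0] else 0.0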

Performance Optimization Strategies

1. Layer Consolidation Patterns

Instead of creating multiple small layers, consolidate related dependencies:

# Inefficient: Multiple small layers
# Layer 1: requests (2MB)
# Layer 2: boto3 extensions (1MB) 
# Layer 3: custom utilities (500KB)

# Optimized: Single consolidated layer
# Layer 1: All dependencies (3.5MB) - reduces cold start overhead

2. Selective Dependency Inclusion

Strip unnecessary components from dependencies to minimize layer size:

#!/bin/bash
# Example: Creating optimized Python layer
mkdir -p layer/python

# Install with no cache, compile, or docs
pip install --target layer/python --no-cache-dir --compile requests urllib3

# Remove unnecessary components
find layer/python -name "*.pyc" -delete
find layer/python -name "*.pyo" -delete
find layer/python -name "__pycache__" -type d -exec rm -rf {} +
find layer/python -name "*.dist-info" -type d -exec rm -rf {} +
find layer/python -name "tests" -type d -exec rm -rf {} +

# Compress for deployment
cd layer && zip -r9 ../optimized-layer.zip .

3. Runtime-Specific Optimizations

Python Runtime Optimization

# build_layer.py - run at packaging time, before zipping the layer.
# /opt is read-only inside the Lambda execution environment, so bytecode
# must be pre-compiled here rather than at function initialization.
import os
import compileall

def optimize_layer(layer_path: str = 'layer/python'):
    """Pre-compile Python files so the runtime can skip compilation on first import."""
    if os.path.exists(layer_path):
        compileall.compile_dir(layer_path, force=True, quiet=True)

if __name__ == '__main__':
    optimize_layer()

Node.js Runtime Optimization

// package.json for layer
{
  "name": "optimized-layer",
  "version": "1.0.0",
  "main": "index.js",
  "scripts": {
    "build": "npm ci --production && npm prune --production"
  },
  "dependencies": {
    "aws-sdk": "^2.1000.0"
  },
  "devDependencies": {}
}

Cost Optimization Techniques

1. Layer Versioning Strategy

Implement a strategic versioning approach to minimize storage costs:

# CloudFormation template for layer versioning
LayerVersion:
  Type: AWS::Lambda::LayerVersion
  Properties:
    LayerName: !Sub "${Environment}-optimized-layer"
    Content:
      S3Bucket: !Ref LayerArtifactBucket
      S3Key: !Sub "layers/${LayerHash}.zip"
    CompatibleRuntimes:
      - python3.9
      - python3.10
    Description: !Sub "Optimized layer v${LayerVersion} - ${CommitSHA}"

# Cleanup policy for old versions
LayerCleanupFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.9
    Handler: cleanup.handler
    Code:
      ZipFile: |
        import boto3
        import json

        def handler(event, context):
            lambda_client = boto3.client('lambda')
            layer_name = event['LayerName']
            keep_versions = int(event.get('KeepVersions', 5))

            # List all layer versions
            versions = lambda_client.list_layer_versions(
                LayerName=layer_name
            )['LayerVersions']

            # Keep only the latest N versions
            if len(versions) > keep_versions:
                for version in versions[keep_versions:]:
                    lambda_client.delete_layer_version(
                        LayerName=layer_name,
                        VersionNumber=version['Version']
                    )

            return {'deleted_versions': len(versions) - keep_versions}

2. Cross-Account Layer Sharing

Reduce duplication across accounts by sharing layers:

import boto3

def share_layer_across_accounts(layer_arn, target_accounts, regions):
    """Share layer across multiple accounts and regions"""

    for region in regions:
        lambda_client = boto3.client('lambda', region_name=region)

        for account_id in target_accounts:
            try:
                # Add permission for cross-account access
                lambda_client.add_layer_version_permission(
                    LayerName=layer_arn.split(':')[6],
                    VersionNumber=int(layer_arn.split(':')[7]),
                    StatementId=f"share-with-{account_id}",
                    Action="lambda:GetLayerVersion",
                    Principal=account_id
                )

                print(f"Shared layer {layer_arn} with account {account_id} in {region}")

            except Exception as e:
                print(f"Failed to share with {account_id}: {str(e)}")

Advanced Deployment Patterns

1. Blue-Green Layer Deployments

Implement safe layer updates using blue-green deployment patterns:

# deploy_layer.py
import boto3
import json
from datetime import datetime
from typing import Dict, List

class LayerDeploymentManager:
    def __init__(self, layer_name: str, region: str):
        self.lambda_client = boto3.client('lambda', region_name=region)
        self.layer_name = layer_name

    def deploy_new_version(self, layer_zip_path: str) -> str:
        """Deploy new layer version"""

        with open(layer_zip_path, 'rb') as f:
            layer_content = f.read()

        response = self.lambda_client.publish_layer_version(
            LayerName=self.layer_name,
            Content={'ZipFile': layer_content},
            CompatibleRuntimes=['python3.9'],
            Description=f"Deployed at {datetime.utcnow().isoformat()}"
        )

        return response['LayerVersionArn']

    def gradual_rollout(self, new_layer_arn: str, function_names: List[str], 
                       rollout_percentage: int = 20):
        """Gradually roll out new layer to functions"""

        import random

        # Calculate number of functions to update
        update_count = max(1, len(function_names) * rollout_percentage // 100)
        functions_to_update = random.sample(function_names, update_count)

        for function_name in functions_to_update:
            try:
                # Update function configuration
                self.lambda_client.update_function_configuration(
                    FunctionName=function_name,
                    Layers=[new_layer_arn]
                )

                # Add monitoring tag
                self.lambda_client.tag_resource(
                    Resource=f"arn:aws:lambda:{boto3.Session().region_name}:{boto3.client('sts').get_caller_identity()['Account']}:function:{function_name}",
                    Tags={
                        'LayerRolloutBatch': str(rollout_percentage),
                        'LayerVersion': new_layer_arn.split(':')[-1]
                    }
                )

            except Exception as e:
                print(f"Failed to update {function_name}: {str(e)}")

        return functions_to_update

2. Automated Layer Testing

Implement comprehensive testing before layer deployment:

# layer_test_framework.py
import io
import json
import zipfile
import boto3
from typing import Any, Dict, List

class LayerTester:
    def __init__(self, layer_arn: str):
        self.layer_arn = layer_arn
        self.lambda_client = boto3.client('lambda')

    def create_test_function(self, test_code: str, runtime: str = 'python3.9') -> str:
        """Create temporary function for testing layer"""

        function_name = f"layer-test-{self.layer_arn.split(':')[-1]}"

        # Lambda expects a zip archive, not raw source, so package index.py in memory
        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, 'w') as zf:
            zf.writestr('index.py', test_code)

        # Create test function
        self.lambda_client.create_function(
            FunctionName=function_name,
            Runtime=runtime,
            Role='arn:aws:iam::ACCOUNT:role/lambda-execution-role',  # Your execution role
            Handler='index.handler',
            Code={'ZipFile': zip_buffer.getvalue()},
            Layers=[self.layer_arn],
            Timeout=30,
            MemorySize=128
        )

        # Wait until the function is ready to be invoked
        self.lambda_client.get_waiter('function_active_v2').wait(FunctionName=function_name)

        return function_name

    def test_layer_functionality(self, test_cases: List[Dict[str, Any]]) -> Dict[str, bool]:
        """Run functional tests on layer"""

        test_code = """
import json
import sys
import importlib.util

def handler(event, context):
    test_type = event.get('test_type')

    if test_type == 'import_test':
        try:
            module_name = event['module']
            __import__(module_name)
            return {'success': True, 'message': f'Successfully imported {module_name}'}
        except ImportError as e:
            return {'success': False, 'error': str(e)}

    elif test_type == 'performance_test':
        import time
        start_time = time.time()

        # Simulate workload
        for i in range(1000):
            pass

        execution_time = time.time() - start_time
        return {'success': True, 'execution_time': execution_time}

    return {'success': False, 'error': 'Unknown test type'}
"""

        function_name = self.create_test_function(test_code)
        results = {}

        try:
            for test_case in test_cases:
                response = self.lambda_client.invoke(
                    FunctionName=function_name,
                    Payload=json.dumps(test_case)
                )

                result = json.loads(response['Payload'].read())
                results[test_case['test_name']] = result['success']

        finally:
            # Cleanup test function
            self.lambda_client.delete_function(FunctionName=function_name)

        return results

# Usage example
test_cases = [
    {
        'test_name': 'requests_import',
        'test_type': 'import_test',
        'module': 'requests'
    },
    {
        'test_name': 'performance_baseline',
        'test_type': 'performance_test'
    }
]

tester = LayerTester('arn:aws:lambda:us-east-1:123456789:layer:my-layer:1')
results = tester.test_layer_functionality(test_cases)

Monitoring and Observability

1. Layer Performance Metrics

Create custom CloudWatch metrics for layer performance:

import boto3
import json
from datetime import datetime

def publish_layer_metrics(layer_arn: str, function_name: str, 
                         cold_start_duration: float, layer_size: int):
    """Publish custom metrics for layer performance"""

    cloudwatch = boto3.client('cloudwatch')

    metrics = [
        {
            'MetricName': 'LayerColdStartDuration',
            'Value': cold_start_duration,
            'Unit': 'Milliseconds',
            'Dimensions': [
                {'Name': 'LayerArn', 'Value': layer_arn},
                {'Name': 'FunctionName', 'Value': function_name}
            ]
        },
        {
            'MetricName': 'LayerSize',
            'Value': layer_size,
            'Unit': 'Bytes',
            'Dimensions': [
                {'Name': 'LayerArn', 'Value': layer_arn}
            ]
        }
    ]

    cloudwatch.put_metric_data(
        Namespace='AWS/Lambda/Layers',
        MetricData=metrics
    )

2. Layer Usage Analytics

Track layer adoption and performance across your organization:

import boto3
import pandas as pd
from collections import defaultdict

def analyze_layer_usage():
    """Analyze layer usage across all functions"""

    lambda_client = boto3.client('lambda')
    layer_usage = defaultdict(list)

    # Get all functions
    paginator = lambda_client.get_paginator('list_functions')

    for page in paginator.paginate():
        for function in page['Functions']:
            function_name = function['FunctionName']

            # Get function configuration
            config = lambda_client.get_function_configuration(
                FunctionName=function_name
            )

            layers = config.get('Layers', [])
            for layer in layers:
                layer_arn = layer['Arn']
                layer_usage[layer_arn].append({
                    'function_name': function_name,
                    'runtime': config['Runtime'],
                    'memory_size': config['MemorySize'],
                    'last_modified': config['LastModified']
                })

    # Generate usage report
    usage_report = []
    for layer_arn, functions in layer_usage.items():
        usage_report.append({
            'layer_arn': layer_arn,
            'function_count': len(functions),
            'total_memory': sum(f['memory_size'] for f in functions),
            'runtimes': list(set(f['runtime'] for f in functions))
        })

    return pd.DataFrame(usage_report)

# Generate and save report
df = analyze_layer_usage()
df.to_csv('layer_usage_report.csv', index=False)

Security Best Practices

1. Layer Content Validation

Implement security scanning for layer contents:

import zipfile
from typing import Any, Dict

class LayerSecurityScanner:
    def __init__(self):
        self.suspicious_patterns = [
            b'eval(',
            b'exec(',
            b'__import__',
            b'subprocess.',
            b'os.system',
            b'shell=True'
        ]

    def scan_layer_content(self, layer_zip_path: str) -> Dict[str, Any]:
        """Scan layer for security issues"""

        scan_results = {
            'suspicious_files': [],
            'file_count': 0,
            'total_size': 0,
            'security_score': 100
        }

        with zipfile.ZipFile(layer_zip_path, 'r') as zip_file:
            for file_info in zip_file.filelist:
                scan_results['file_count'] += 1
                scan_results['total_size'] += file_info.file_size

                # Extract and scan file content
                with zip_file.open(file_info) as f:
                    try:
                        content = f.read()

                        # Check for suspicious patterns
                        for pattern in self.suspicious_patterns:
                            if pattern in content:
                                scan_results['suspicious_files'].append({
                                    'file': file_info.filename,
                                    'pattern': pattern.decode('utf-8', errors='ignore'),
                                    'severity': 'HIGH'
                                })
                                scan_results['security_score'] -= 10

                    except Exception as e:
                        # Binary files or other issues
                        continue

        return scan_results

2. Layer Access Control

Implement fine-grained access control for layers:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowLayerUsage",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT:role/lambda-execution-role"
      },
      "Action": "lambda:GetLayerVersion",
      "Resource": "arn:aws:lambda:*:ACCOUNT:layer:secure-layer:*",
      "Condition": {
        "StringEquals": {
          "lambda:FunctionTag/Environment": ["production", "staging"]
        }
      }
    }
  ]
}

Conclusion

Advanced Lambda Layer optimization requires a holistic approach combining performance engineering, cost management, and operational excellence. By implementing these strategies, you can achieve:

  • 50-70% reduction in cold start times through layer consolidation
  • 30-40% cost savings through strategic versioning and sharing
  • Improved reliability through comprehensive testing and monitoring
  • Enhanced security through content validation and access controls

The key is to treat layers as critical infrastructure components that require the same level of attention as your application code. Start with performance profiling to identify bottlenecks, implement gradual rollout strategies for safety, and continuously monitor the impact of optimizations.

Remember that layer optimization is an iterative process. As your application evolves and AWS introduces new features, revisit your layer strategy to ensure you’re maximizing the benefits of this powerful Lambda capability.


This post explores advanced Lambda Layer optimization techniques beyond basic usage patterns. For organizations running Lambda at scale, these strategies can deliver significant performance and cost improvements while maintaining high reliability standards.