Building a Multi-Cloud Architecture with OCI and AWS: A Real-World Integration Guide

I’ll tell you something that might sound controversial in cloud circles: the best cloud is often more than one cloud.

I’ve worked with dozens of enterprises over the years, and here’s what I’ve noticed. Some started with AWS years ago and built their entire infrastructure there. Then they realized Oracle Autonomous Database or Exadata could dramatically improve their database performance. Others were Oracle shops that wanted to leverage AWS’s machine learning services or global edge network.

The question isn’t really “which cloud is better?” The question is “how do we get the best of both?”

In this article, I’ll walk you through building a practical multi-cloud architecture connecting OCI and AWS. We’ll cover secure networking, data synchronization, identity federation, and the operational realities of running workloads across both platforms.

Why Multi-Cloud Actually Makes Sense

Let me be clear about something. Multi-cloud for its own sake is a terrible idea. It adds complexity, increases operational burden, and creates more things that can break. But multi-cloud for the right reasons? That’s a different story.

Here are legitimate reasons I’ve seen organizations adopt OCI and AWS together:

Database Performance: Oracle Autonomous Database and Exadata Cloud Service are genuinely difficult to match for Oracle workloads. If you’re running complex OLTP or analytics on Oracle, OCI’s database offerings are purpose-built for that.

AWS Ecosystem: AWS has services that simply don’t exist elsewhere. SageMaker for ML, Lambda’s maturity, CloudFront’s global presence, or specialized services like Rekognition and Comprehend.

Vendor Negotiation: Having workloads on multiple clouds gives you negotiating leverage. I’ve seen organizations save millions in licensing by demonstrating they could move workloads.

Acquisition and Mergers: Company A runs on AWS, Company B runs on OCI. Now they’re one company. Multi-cloud by necessity.

Regulatory Requirements: Some industries require data sovereignty or specific compliance certifications that might be easier to achieve with a particular provider in a particular region.

If none of these apply to you, stick with one cloud. Seriously. But if they do, keep reading.

Architecture Overview

Let’s design a realistic scenario. We have an e-commerce company with:

  • Application tier running on AWS (EKS, Lambda, API Gateway)
  • Core transactional database on OCI (Autonomous Transaction Processing)
  • Data warehouse on OCI (Autonomous Data Warehouse)
  • Machine learning workloads on AWS (SageMaker)
  • Shared data that needs to flow between both clouds


Setting Up Cross-Cloud Networking

The foundation of any multi-cloud architecture is networking. You need a secure, reliable, and performant connection between clouds.

Option 1: IPSec VPN (Good for Starting Out)

IPSec VPN is the quickest way to connect AWS and OCI. It runs over the public internet but encrypts everything. Good for development, testing, or low-bandwidth production workloads.

On OCI Side:

First, create a Dynamic Routing Gateway (DRG) and attach it to your VCN:

bash

# Create DRG
oci network drg create \
--compartment-id $COMPARTMENT_ID \
--display-name "aws-interconnect-drg"
# Attach DRG to VCN
oci network drg-attachment create \
--drg-id $DRG_ID \
--vcn-id $VCN_ID \
--display-name "vcn-attachment"

Create a Customer Premises Equipment (CPE) object representing AWS:

bash

# Create CPE for AWS VPN endpoint
oci network cpe create \
--compartment-id $COMPARTMENT_ID \
--ip-address $AWS_VPN_PUBLIC_IP \
--display-name "aws-vpn-endpoint"

Create the IPSec connection:

bash

# Create IPSec connection
oci network ip-sec-connection create \
--compartment-id $COMPARTMENT_ID \
--cpe-id $CPE_ID \
--drg-id $DRG_ID \
--static-routes '["10.1.0.0/16"]' \
--display-name "oci-to-aws-vpn"

On AWS Side:

Create a Customer Gateway pointing to OCI:

bash

# Create Customer Gateway
aws ec2 create-customer-gateway \
--type ipsec.1 \
--public-ip $OCI_VPN_PUBLIC_IP \
--bgp-asn 65000
# Create VPN Gateway
aws ec2 create-vpn-gateway \
--type ipsec.1
# Attach to VPC
aws ec2 attach-vpn-gateway \
--vpn-gateway-id $VGW_ID \
--vpc-id $VPC_ID
# Create VPN Connection
aws ec2 create-vpn-connection \
--type ipsec.1 \
--customer-gateway-id $CGW_ID \
--vpn-gateway-id $VGW_ID \
--options '{"StaticRoutesOnly": true}'

Update route tables on both sides:

bash

# AWS: Add route to OCI CIDR
aws ec2 create-route \
--route-table-id $ROUTE_TABLE_ID \
--destination-cidr-block 10.2.0.0/16 \
--gateway-id $VGW_ID
# OCI: Add route to AWS CIDR
oci network route-table update \
--rt-id $ROUTE_TABLE_ID \
--route-rules '[{
"destination": "10.1.0.0/16",
"destinationType": "CIDR_BLOCK",
"networkEntityId": "'$DRG_ID'"
}]'

Option 2: Private Connectivity (Production Recommended)

For production workloads, you want dedicated private connectivity. This means OCI FastConnect paired with AWS Direct Connect, meeting at a common colocation facility.

The good news is that Oracle and AWS both have presence in major colocation providers like Equinix. The setup involves:

  1. Establishing FastConnect to your colocation
  2. Establishing Direct Connect to the same colocation
  3. Connecting them via a cross-connect in the facility

hcl

# Terraform for FastConnect virtual circuit
resource "oci_core_virtual_circuit" "aws_interconnect" {
compartment_id = var.compartment_id
display_name = "aws-fastconnect"
type = "PRIVATE"
bandwidth_shape_name = "1 Gbps"
cross_connect_mappings {
customer_bgp_peering_ip = "169.254.100.1/30"
oracle_bgp_peering_ip = "169.254.100.2/30"
}
customer_asn = "65001"
gateway_id = oci_core_drg.main.id
provider_name = "Equinix"
region = "Dubai"
}

hcl

# Terraform for AWS Direct Connect
resource "aws_dx_connection" "oci_interconnect" {
name = "oci-direct-connect"
bandwidth = "1Gbps"
location = "Equinix DX1"
provider_name = "Equinix"
}
resource "aws_dx_private_virtual_interface" "oci" {
connection_id = aws_dx_connection.oci_interconnect.id
name = "oci-vif"
vlan = 4094
address_family = "ipv4"
bgp_asn = 65002
amazon_address = "169.254.100.5/30"
customer_address = "169.254.100.6/30"
dx_gateway_id = aws_dx_gateway.main.id
}

Honestly, setting this up involves coordination with both cloud providers and the colocation facility. Budget 4-8 weeks for the physical connectivity and plan for redundancy from day one.

Database Connectivity from AWS to OCI

Now that we have network connectivity, let’s connect AWS applications to OCI databases.

Configuring Autonomous Database for External Access

First, enable private endpoint access for your Autonomous Database:

bash

# Update ADB to use private endpoint
oci db autonomous-database update \
--autonomous-database-id $ADB_ID \
--is-access-control-enabled true \
--whitelisted-ips '["10.1.0.0/16"]' \ # AWS VPC CIDR
--is-mtls-connection-required false # Allow TLS without mTLS for simplicity

Get the connection string:

bash

oci db autonomous-database get \
--autonomous-database-id $ADB_ID \
--query 'data."connection-strings".profiles[?consumer=="LOW"].value | [0]'

Application Configuration on AWS

Here’s a practical Python example for connecting from AWS Lambda to OCI Autonomous Database:

python

# lambda_function.py
import cx_Oracle
import os
import boto3
from botocore.exceptions import ClientError
def get_db_credentials():
"""Retrieve database credentials from AWS Secrets Manager"""
secret_name = "oci-adb-credentials"
region_name = "us-east-1"
session = boto3.session.Session()
client = session.client(
service_name='secretsmanager',
region_name=region_name
)
try:
response = client.get_secret_value(SecretId=secret_name)
return json.loads(response['SecretString'])
except ClientError as e:
raise e
def handler(event, context):
# Get credentials
creds = get_db_credentials()
# Connection string format for Autonomous DB
dsn = """(description=
(retry_count=20)(retry_delay=3)
(address=(protocol=tcps)(port=1522)
(host=adb.me-dubai-1.oraclecloud.com))
(connect_data=(service_name=xxx_atp_low.adb.oraclecloud.com))
(security=(ssl_server_dn_match=yes)))"""
connection = cx_Oracle.connect(
user=creds['username'],
password=creds['password'],
dsn=dsn,
encoding="UTF-8"
)
cursor = connection.cursor()
cursor.execute("SELECT * FROM orders WHERE order_date = TRUNC(SYSDATE)")
results = []
for row in cursor:
results.append({
'order_id': row[0],
'customer_id': row[1],
'amount': float(row[2])
})
cursor.close()
connection.close()
return {
'statusCode': 200,
'body': json.dumps(results)
}

For containerized applications on EKS, use a connection pool:

python

# db_pool.py
import cx_Oracle
import os
class OCIDatabasePool:
_pool = None
@classmethod
def get_pool(cls):
if cls._pool is None:
cls._pool = cx_Oracle.SessionPool(
user=os.environ['OCI_DB_USER'],
password=os.environ['OCI_DB_PASSWORD'],
dsn=os.environ['OCI_DB_DSN'],
min=2,
max=10,
increment=1,
encoding="UTF-8",
threaded=True,
getmode=cx_Oracle.SPOOL_ATTRVAL_WAIT
)
return cls._pool
@classmethod
def get_connection(cls):
return cls.get_pool().acquire()
@classmethod
def release_connection(cls, connection):
cls.get_pool().release(connection)

Kubernetes deployment for the application:

yaml

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: order-service
template:
metadata:
labels:
app: order-service
spec:
containers:
- name: order-service
image: 123456789.dkr.ecr.us-east-1.amazonaws.com/order-service:v1.0
ports:
- containerPort: 8080
env:
- name: OCI_DB_USER
valueFrom:
secretKeyRef:
name: oci-db-credentials
key: username
- name: OCI_DB_PASSWORD
valueFrom:
secretKeyRef:
name: oci-db-credentials
key: password
- name: OCI_DB_DSN
valueFrom:
configMapKeyRef:
name: oci-db-config
key: dsn
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 1000m
memory: 1Gi
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10

Data Synchronization Between Clouds

Real multi-cloud architectures need data flowing between clouds. Here are practical patterns:

Pattern 1: Event-Driven Sync with Kafka

Use a managed Kafka service as the bridge:

python

# AWS Lambda producer - sends events to Kafka
from kafka import KafkaProducer
import json
producer = KafkaProducer(
bootstrap_servers=['kafka-broker-1:9092', 'kafka-broker-2:9092'],
value_serializer=lambda v: json.dumps(v).encode('utf-8'),
security_protocol='SASL_SSL',
sasl_mechanism='PLAIN',
sasl_plain_username=os.environ['KAFKA_USER'],
sasl_plain_password=os.environ['KAFKA_PASSWORD']
)
def handler(event, context):
# Process order and send to Kafka for OCI consumption
order_data = process_order(event)
producer.send(
'orders-topic',
key=str(order_data['order_id']).encode(),
value=order_data
)
producer.flush()
return {'statusCode': 200}

OCI side consumer using OCI Functions:

python

# OCI Function consumer
import io
import json
import logging
import cx_Oracle
from kafka import KafkaConsumer
def handler(ctx, data: io.BytesIO = None):
consumer = KafkaConsumer(
'orders-topic',
bootstrap_servers=['kafka-broker-1:9092'],
auto_offset_reset='earliest',
enable_auto_commit=True,
group_id='oci-order-processor',
value_deserializer=lambda x: json.loads(x.decode('utf-8'))
)
connection = get_adb_connection()
cursor = connection.cursor()
for message in consumer:
order = message.value
cursor.execute("""
MERGE INTO orders o
USING (SELECT :order_id AS order_id FROM dual) src
ON (o.order_id = src.order_id)
WHEN MATCHED THEN
UPDATE SET amount = :amount, status = :status, updated_at = SYSDATE
WHEN NOT MATCHED THEN
INSERT (order_id, customer_id, amount, status, created_at)
VALUES (:order_id, :customer_id, :amount, :status, SYSDATE)
""", order)
connection.commit()
cursor.close()
connection.close()

Pattern 2: Scheduled Batch Sync

For less time-sensitive data, batch synchronization is simpler and more cost-effective:

python

# AWS Step Functions state machine for batch sync
{
"Comment": "Sync data from AWS to OCI",
"StartAt": "ExtractFromAWS",
"States": {
"ExtractFromAWS": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:extract-data",
"Next": "UploadToS3"
},
"UploadToS3": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:upload-to-s3",
"Next": "CopyToOCI"
},
"CopyToOCI": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:copy-to-oci-bucket",
"Next": "LoadToADB"
},
"LoadToADB": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789:function:load-to-adb",
"End": true
}
}
}

The Lambda function to copy data to OCI Object Storage:

python

# copy_to_oci.py
import boto3
import oci
import os
def handler(event, context):
# Get file from S3
s3 = boto3.client('s3')
s3_object = s3.get_object(
Bucket=event['bucket'],
Key=event['key']
)
file_content = s3_object['Body'].read()
# Upload to OCI Object Storage
config = oci.config.from_file()
object_storage = oci.object_storage.ObjectStorageClient(config)
namespace = object_storage.get_namespace().data
object_storage.put_object(
namespace_name=namespace,
bucket_name="data-sync-bucket",
object_name=event['key'],
put_object_body=file_content
)
return {
'oci_bucket': 'data-sync-bucket',
'object_name': event['key']
}

Load into Autonomous Database using DBMS_CLOUD:

sql

-- Create credential for OCI Object Storage access
BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'OCI_CRED',
username => 'your_oci_username',
password => 'your_auth_token'
);
END;
/
-- Load data from Object Storage
BEGIN
DBMS_CLOUD.COPY_DATA(
table_name => 'ORDERS_STAGING',
credential_name => 'OCI_CRED',
file_uri_list => 'https://objectstorage.me-dubai-1.oraclecloud.com/n/namespace/b/data-sync-bucket/o/orders_*.csv',
format => JSON_OBJECT(
'type' VALUE 'CSV',
'skipheaders' VALUE '1',
'dateformat' VALUE 'YYYY-MM-DD'
)
);
END;
/
-- Merge staging into production
MERGE INTO orders o
USING orders_staging s
ON (o.order_id = s.order_id)
WHEN MATCHED THEN
UPDATE SET o.amount = s.amount, o.status = s.status
WHEN NOT MATCHED THEN
INSERT (order_id, customer_id, amount, status)
VALUES (s.order_id, s.customer_id, s.amount, s.status);

Identity Federation

Managing identities across clouds is a headache unless you set up proper federation. Here’s how to enable SSO between AWS and OCI using a common identity provider.

Using Azure AD as Common IdP (Yes, a Third Cloud)

This is actually quite common. Many enterprises use Azure AD for identity even if their workloads run elsewhere.

Configure OCI to Trust Azure AD:

bash

# Create Identity Provider in OCI
oci iam identity-provider create-saml2-identity-provider \
--compartment-id $TENANCY_ID \
--name "AzureAD-Federation" \
--description "Federation with Azure AD" \
--product-type "IDCS" \
--metadata-url "https://login.microsoftonline.com/$TENANT_ID/federationmetadata/2007-06/federationmetadata.xml"

Configure AWS to Trust Azure AD:

bash

# Create SAML provider in AWS
aws iam create-saml-provider \
--saml-metadata-document file://azure-ad-metadata.xml \
--name AzureAD-Federation
# Create role for federated users
aws iam create-role \
--role-name AzureAD-Admins \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Federated": "arn:aws:iam::123456789:saml-provider/AzureAD-Federation"},
"Action": "sts:AssumeRoleWithSAML",
"Condition": {
"StringEquals": {
"SAML:aud": "https://signin.aws.amazon.com/saml"
}
}
}]
}'

Now your team can use the same Azure AD credentials to access both clouds.

Monitoring Across Clouds

You need unified observability. Here’s a practical approach using Grafana as the common dashboard:

yaml

# docker-compose.yml for centralized Grafana
version: '3.8'
services:
grafana:
image: grafana/grafana:latest
ports:
- "3000:3000"
volumes:
- grafana-data:/var/lib/grafana
- ./provisioning:/etc/grafana/provisioning
environment:
- GF_SECURITY_ADMIN_PASSWORD=secure_password
- GF_INSTALL_PLUGINS=oci-metrics-datasource
volumes:
grafana-data:

Configure data sources:

yaml

# provisioning/datasources/datasources.yaml
apiVersion: 1
datasources:
- name: AWS-CloudWatch
type: cloudwatch
access: proxy
jsonData:
authType: keys
defaultRegion: us-east-1
secureJsonData:
accessKey: ${AWS_ACCESS_KEY}
secretKey: ${AWS_SECRET_KEY}
- name: OCI-Monitoring
type: oci-metrics-datasource
access: proxy
jsonData:
tenancyOCID: ${OCI_TENANCY_OCID}
userOCID: ${OCI_USER_OCID}
region: me-dubai-1
secureJsonData:
privateKey: ${OCI_PRIVATE_KEY}

Create a unified dashboard that shows both clouds:

json

{
"title": "Multi-Cloud Overview",
"panels": [
{
"title": "AWS EKS CPU Utilization",
"datasource": "AWS-CloudWatch",
"targets": [{
"namespace": "AWS/EKS",
"metricName": "node_cpu_utilization",
"dimensions": {"ClusterName": "production"}
}]
},
{
"title": "OCI Autonomous DB Sessions",
"datasource": "OCI-Monitoring",
"targets": [{
"namespace": "oci_autonomous_database",
"metric": "CurrentOpenSessionCount",
"resourceGroup": "production-adb"
}]
},
{
"title": "Cross-Cloud Latency",
"datasource": "Prometheus",
"targets": [{
"expr": "histogram_quantile(0.95, rate(cross_cloud_request_duration_seconds_bucket[5m]))"
}]
}
]
}

Cost Management

Multi-cloud cost visibility is challenging. Here’s a practical approach:

python

# cost_aggregator.py
import boto3
import oci
from datetime import datetime, timedelta
def get_aws_costs(start_date, end_date):
client = boto3.client('ce')
response = client.get_cost_and_usage(
TimePeriod={
'Start': start_date.strftime('%Y-%m-%d'),
'End': end_date.strftime('%Y-%m-%d')
},
Granularity='DAILY',
Metrics=['UnblendedCost'],
GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}]
)
return response['ResultsByTime']
def get_oci_costs(start_date, end_date):
config = oci.config.from_file()
usage_api = oci.usage_api.UsageapiClient(config)
response = usage_api.request_summarized_usages(
request_summarized_usages_details=oci.usage_api.models.RequestSummarizedUsagesDetails(
tenant_id=config['tenancy'],
time_usage_started=start_date,
time_usage_ended=end_date,
granularity="DAILY",
group_by=["service"]
)
)
return response.data.items
def generate_report():
end_date = datetime.now()
start_date = end_date - timedelta(days=30)
aws_costs = get_aws_costs(start_date, end_date)
oci_costs = get_oci_costs(start_date, end_date)
total_aws = sum(float(day['Total']['UnblendedCost']['Amount']) for day in aws_costs)
total_oci = sum(item.computed_amount for item in oci_costs)
print(f"30-Day Multi-Cloud Cost Summary")
print(f"{'='*40}")
print(f"AWS Total: ${total_aws:,.2f}")
print(f"OCI Total: ${total_oci:,.2f}")
print(f"Combined Total: ${total_aws + total_oci:,.2f}")

Lessons Learned

After running multi-cloud architectures for several years, here’s what I’ve learned:

Network is everything. Invest in proper connectivity upfront. The $500/month you save on VPN versus dedicated connectivity will cost you thousands in debugging performance issues.

Pick one cloud for each workload type. Don’t run the same thing in both clouds. Use OCI for Oracle databases, AWS for its unique services. Avoid the temptation to replicate everything everywhere.

Standardize your tooling. Terraform works on both clouds. Use it. Same for monitoring, logging, and CI/CD. The more consistent your tooling, the less your team has to context-switch.

Document your data flows. Know exactly what data goes where and why. This will save you during security audits and incident response.

Test cross-cloud failures. What happens when the VPN goes down? Can your application degrade gracefully? Find out before your customers do.

Conclusion

Multi-cloud between OCI and AWS isn’t simple, but it’s absolutely achievable. The key is having clear reasons for using each cloud, solid networking fundamentals, and consistent operational practices.

Start small. Connect one application to one database across clouds. Get that working reliably before expanding. Build your team’s confidence and expertise incrementally.

The organizations that succeed with multi-cloud are the ones that treat it as an architectural choice, not a checkbox. They know exactly why they need both clouds and have designed their systems accordingly.

Regards,
Osama

Deep Dive into Oracle Kubernetes Engine Security and Networking in Production

Oracle Kubernetes Engine is often introduced as a managed Kubernetes service, but its real strength only becomes clear when you operate it in production. OKE tightly integrates with OCI networking, identity, and security services, which gives you a very different operational model compared to other managed Kubernetes platforms.

This article walks through OKE from a production perspective, focusing on security boundaries, networking design, ingress exposure, private access, and mutual TLS. The goal is not to explain Kubernetes basics, but to explain how OKE behaves when you run regulated, enterprise workloads.

Understanding the OKE Networking Model

OKE does not abstract networking away from you. Every cluster is deeply tied to OCI VCN constructs.

Core Components

An OKE cluster consists of:

  • A managed Kubernetes control plane
  • Worker nodes running in OCI subnets
  • OCI networking primitives controlling traffic flow

Key OCI resources involved:

  • Virtual Cloud Network
  • Subnets for control plane and workers
  • Network Security Groups
  • Route tables
  • OCI Load Balancers

Unlike some platforms, security in OKE is enforced at multiple layers simultaneously.

Worker Node and Pod Networking

OKE uses OCI VCN-native networking. Pods receive IPs from the subnet CIDR through the OCI CNI plugin.

What this means in practice

  • Pods are first-class citizens on the VCN
  • Pod IPs are routable within the VCN
  • Network policies and OCI NSGs both apply

Example subnet design:

VCN: 10.0.0.0/16

Worker Subnet: 10.0.10.0/24
Load Balancer Subnet: 10.0.20.0/24
Private Endpoint Subnet: 10.0.30.0/24

This design allows you to:

  • Keep workers private
  • Expose only ingress through OCI Load Balancer
  • Control east-west traffic using Kubernetes NetworkPolicies and OCI NSGs together

Security Boundaries in OKE

Security in OKE is layered by design.

Layer 1: OCI IAM and Compartments

OKE clusters live inside OCI compartments. IAM policies control:

  • Who can create or modify clusters
  • Who can access worker nodes
  • Who can manage load balancers and subnets

Example IAM policy snippet:

Allow group OKE-Admins to manage cluster-family in compartment OKE-PROD
Allow group OKE-Admins to manage virtual-network-family in compartment OKE-PROD

This separation is critical for regulated environments.

Layer 2: Network Security Groups

Network Security Groups act as virtual firewalls at the VNIC level.

Typical NSG rules:

  • Allow node-to-node communication
  • Allow ingress from load balancer subnet only
  • Block all public inbound traffic

Example inbound NSG rule:

Source: 10.0.20.0/24
Protocol: TCP
Port: 443

This ensures only the OCI Load Balancer can reach your ingress controller.

Layer 3: Kubernetes Network Policies

NetworkPolicies control pod-level traffic.

Example policy allowing traffic only from ingress namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: app-prod
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: ingress

This blocks all lateral movement by default.

Ingress Design in OKE

OKE integrates natively with OCI Load Balancer.

Public vs Private Ingress

You can deploy ingress in two modes:

  • Public Load Balancer
  • Internal Load Balancer

For production workloads, private ingress is strongly recommended.

Example service annotation for private ingress:

service.beta.kubernetes.io/oci-load-balancer-internal: "true"
service.beta.kubernetes.io/oci-load-balancer-subnet1: ocid1.subnet.oc1..

This ensures the load balancer has no public IP.

Private Access to the Cluster Control Plane

OKE supports private API endpoints.

When enabled:

  • The Kubernetes API is accessible only from the VCN
  • No public endpoint exists

This is critical for Zero Trust environments.

Operational impact:

  • kubectl access requires VPN, Bastion, or OCI Cloud Shell inside the VCN
  • CI/CD runners must have private connectivity

This dramatically reduces the attack surface.

Mutual TLS Inside OKE

TLS termination at ingress is not enough for sensitive workloads. Many enterprises require mTLS between services.

Typical mTLS Architecture

  • TLS termination at ingress
  • Internal mTLS between services
  • Certificate management via Vault or cert-manager

Example cert-manager issuer using OCI Vault:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: oci-vault-issuer
spec:
  vault:
    server: https://vault.oci.oraclecloud.com
    path: pki/sign/oke

Each service receives:

  • Its own certificate
  • Short-lived credentials
  • Automatic rotation

Traffic Flow Example

End-to-end request path:

  1. Client connects to OCI Load Balancer
  2. Load Balancer forwards traffic to NGINX Ingress
  3. Ingress enforces TLS and headers
  4. Service-to-service traffic uses mTLS
  5. NetworkPolicy restricts lateral movement
  6. NSGs enforce VCN-level boundaries

Every hop is authenticated and encrypted.


Observability and Security Visibility

OKE integrates with:

  • OCI Logging
  • OCI Flow Logs
  • Kubernetes audit logs

This allows:

  • Tracking ingress traffic
  • Detecting unauthorized access attempts
  • Correlating pod-level events with network flows

Regards
Osama

Setting up a High-Availability (HA) Architecture with OCI Load Balancer and Compute Instances

Ensuring high availability (HA) for your applications is critical in today’s cloud-first environment. Oracle Cloud Infrastructure (OCI) provides robust tools such as Load Balancers and Compute Instances to help you create a resilient, highly available architecture for your applications. In this post, we’ll walk through the steps to set up an HA architecture using OCI Load Balancer with multiple compute instances across availability domains for fault tolerance.

Prerequisites

  • OCI Account: A working Oracle Cloud Infrastructure account.
  • OCI CLI: Installed and configured with necessary permissions.
  • Terraform: Installed and set up for provisioning infrastructure.
  • Basic knowledge of Load Balancers and Compute Instances in OCI.

Step 1: Set Up a Virtual Cloud Network (VCN)

A VCN is required to house your compute instances and load balancers. To begin, create a new VCN with subnets in different availability domains (ADs) for high availability.

Terraform Configuration (vcn.tf):

resource "oci_core_virtual_network" "vcn" {
  compartment_id = "<compartment_ocid>"
  cidr_block     = "10.0.0.0/16"
  display_name   = "HA-Virtual-Network"
}

resource "oci_core_subnet" "subnet1" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.1.0/24"
  availability_domain = "AD-1"
  display_name        = "HA-Subnet-AD1"
}

resource "oci_core_subnet" "subnet2" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.2.0/24"
  availability_domain = "AD-2"
  display_name        = "HA-Subnet-AD2"
}

Step 2: Provision Compute Instances

Create two compute instances (one in each subnet) to ensure redundancy.

Terraform Configuration (compute.tf):

resource "oci_core_instance" "instance1" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-1"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-1"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet1.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

resource "oci_core_instance" "instance2" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-2"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-2"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet2.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

Step 3: Set Up the OCI Load Balancer

Now, configure the OCI Load Balancer to distribute traffic between the compute instances in both availability domains.

Terraform Configuration (load_balancer.tf):

resource "oci_load_balancer_load_balancer" "ha_lb" {
  compartment_id = "<compartment_ocid>"
  display_name   = "HA-Load-Balancer"
  shape           = "100Mbps"

  subnet_ids = [
    oci_core_subnet.subnet1.id,
    oci_core_subnet.subnet2.id
  ]

  backend_sets {
    name = "backend-set-1"

    backends {
      ip_address = oci_core_instance.instance1.private_ip
      port = 80
    }

    backends {
      ip_address = oci_core_instance.instance2.private_ip
      port = 80
    }

    policy = "ROUND_ROBIN"
    health_checker {
      port = 80
      protocol = "HTTP"
      url_path = "/health"
      retries = 3
      timeout_in_seconds = 10
      interval_in_seconds = 5
    }
  }
}

resource "oci_load_balancer_listener" "ha_listener" {
  load_balancer_id = oci_load_balancer_load_balancer.ha_lb.id
  name = "http-listener"
  default_backend_set_name = "backend-set-1"
  port = 80
  protocol = "HTTP"
}

Step 4: Set Up Health Checks for High Availability

Health checks are critical to ensure that the load balancer sends traffic only to healthy instances. The health check configuration is included in the backend set definition above, but you can customize it as needed.
Step 5: Testing and Validation

Once all resources are provisioned, test the HA architecture:

Verify Load Balancer Health: Ensure that the backend instances are marked as healthy by checking the load balancer’s health checks.

oci load-balancer backend-set get --load-balancer-id <load_balancer_id> --name backend-set-1
  1. Access the Application: Test accessing your application through the Load Balancer’s public IP. The Load Balancer should evenly distribute traffic across the two compute instances.
  2. Failover Testing: Manually shut down one of the instances to verify that the Load Balancer reroutes traffic to the other instance.

Automating Oracle Cloud Networking with OCI Service Gateway and Terraform

Oracle Cloud Infrastructure (OCI) offers a wide range of services that enable users to create secure, scalable cloud environments. One crucial aspect of a cloud deployment is ensuring secure connectivity between services without relying on public internet access. In this blog post, we’ll walk through how to set up and manage OCI Service Gateway for secure, private access to OCI services using Terraform. This step-by-step guide is intended for cloud engineers looking to leverage automation to create robust networking configurations in OCI.

Step 1: Setting up Your Environment

Before deploying the OCI Service Gateway and other networking components with Terraform, you need to set up a few prerequisites:

  1. Terraform Installation: Make sure Terraform is installed on your local machine. You can download it from Terraform’s official site.
  2. OCI CLI and API Key: Install the OCI CLI and set up your authentication key. The key must be configured in your OCI console.
  3. OCI Terraform Provider: You will also need to download the OCI Terraform provider by adding the following configuration to your provider.tf file:
provider "oci" {
  tenancy_ocid     = "<TENANCY_OCID>"
  user_ocid        = "<USER_OCID>"
  fingerprint      = "<FINGERPRINT>"
  private_key_path = "<PRIVATE_KEY_PATH>"
  region           = "us-ashburn-1"
}

Step 2: Defining the Infrastructure

The key to deploying the Service Gateway and related infrastructure is defining the resources in a main.tf file. Below is an example to create a VCN, subnets, and a Service Gateway:

resource "oci_core_vcn" "example_vcn" {
  cidr_block     = "10.0.0.0/16"
  compartment_id = "<COMPARTMENT_OCID>"
  display_name   = "example-vcn"
}

resource "oci_core_subnet" "example_subnet" {
  vcn_id             = oci_core_vcn.example_vcn.id
  compartment_id     = "<COMPARTMENT_OCID>"
  cidr_block         = "10.0.1.0/24"
  availability_domain = "<AVAILABILITY_DOMAIN>"
  display_name       = "example-subnet"
  prohibit_public_ip_on_vnic = true
}

resource "oci_core_service_gateway" "example_service_gateway" {
  vcn_id         = oci_core_vcn.example_vcn.id
  compartment_id = "<COMPARTMENT_OCID>"
  services {
    service_id = "all-oracle-services-in-region"
  }
  display_name  = "example-service-gateway"
}

resource "oci_core_route_table" "example_route_table" {
  vcn_id         = oci_core_vcn.example_vcn.id
  compartment_id = "<COMPARTMENT_OCID>"
  display_name   = "example-route-table"
  route_rules {
    destination       = "all-oracle-services-in-region"
    destination_type  = "SERVICE_CIDR_BLOCK"
    network_entity_id = oci_core_service_gateway.example_service_gateway.id
  }
}

Explanation:

  • oci_core_vcn: Defines the Virtual Cloud Network (VCN) where all resources will reside.
  • oci_core_subnet: Creates a subnet within the VCN to host compute instances or other resources.
  • oci_core_service_gateway: Configures a Service Gateway to allow private access to Oracle services such as Object Storage.
  • oci_core_route_table: Configures the route table to direct traffic through the Service Gateway for services within OCI.

Step 3: Variables for Reusability

To make the code reusable, it’s best to define variables in a variables.tf file:

variable "compartment_ocid" {
  description = "The OCID of the compartment to create resources in"
  type        = string
}

variable "availability_domain" {
  description = "The Availability Domain to launch resources in"
  type        = string
}

variable "vcn_cidr" {
  description = "The CIDR block for the VCN"
  type        = string
  default     = "10.0.0.0/16"
}

This allows you to easily modify parameters like compartment ID, availability domain, and VCN CIDR without touching the core logic.

Step 4: Running the Terraform Script

  1. Initialize TerraformTo start using Terraform with OCI, initialize your working directory using:
terraform init
  1. This command downloads the necessary providers and prepares your environment.
  2. Plan the DeploymentBefore applying changes, always run the terraform plan command. This will provide an overview of what resources will be created.
terraform plan -var-file="config.tfvars"

Apply the Changes

Once you’re confident with the plan, apply it to create your Service Gateway and networking resources:

terraform apply -var-file="config.tfvars"

Step 5: Verification

After deployment, you can verify your resources via the OCI Console. Navigate to Networking > Virtual Cloud Networks to see your VCN, subnets, and the Service Gateway. You can also validate the route table settings to ensure that the traffic routes correctly to Oracle services.

Step 6: Destroy the Infrastructure

To clean up the resources and avoid any unwanted charges, you can use the terraform destroy command:

terraform destroy -var-file="config.tfvars"

Regards
Osama

Automating Block Volume Backups in Oracle Cloud Infrastructure (OCI) using CLI and Terraform

Briefly introduce the importance of block volumes in OCI and why automated backups are essential.Mention that this blog will cover two methods: using the OCI CLI and Terraform for automation.

Automating Block Volume Backups using OCI CLI

Prerequisites:

  • Set up OCI CLI on your machine (brief steps with links).
  • Ensure that you have the right permissions to manage block volumes.

Step-by-step guide:

  • Command to create a block volume
oci bv volume create --compartment-id <your_compartment_ocid> --availability-domain <your_ad> --display-name "MyVolume" --size-in-gbs 50

Command to take a backup of the block volume:

oci bv backup create --volume-id <your_volume_ocid> --display-name "MyVolumeBackup"

Scheduling backups using cron jobs for automation.

  • Example cron job configuration
0 2 * * * /usr/local/bin/oci bv backup create --volume-id <your_volume_ocid> --display-name "ScheduledBackup" >> /var/log/oci_backup.log 2>&1

Automating Block Volume Backups using Terraform

Prerequisites

  1. OCI Credentials: Make sure you have the proper API keys and permissions configured in your OCI tenancy.
  2. Terraform Setup: Terraform should be installed and configured to interact with OCI, including the OCI provider setup in your environment.
Step 1: Define the OCI Block Volume Resource

First, define the block volume that you want to automate backups for. Here’s an example of a simple block volume resource in Terraform:

resource "oci_core_volume" "my_block_volume" {
  availability_domain = "your-availability-domain"
  compartment_id      = "ocid1.compartment.oc1..your-compartment-id"
  display_name        = "my_block_volume"
  size_in_gbs         = 50
}
Step 2: Define a Backup Policy

OCI provides predefined backup policies such as gold, silver, and bronze, which define how frequently backups are taken. You can create a custom backup policy as well, but for simplicity, we’ll use one of the predefined policies in this example. The Terraform resource oci_core_volume_backup_policy_assignment will assign a backup policy to the block volume.

Here’s an example to assign the gold backup policy to the block volume:

resource "oci_core_volume_backup_policy_assignment" "backup_assignment" {
  volume_id       = oci_core_volume.my_block_volume.id
  policy_id       = data.oci_core_volume_backup_policy.gold.id
}

data "oci_core_volume_backup_policy" "gold" {
  name = "gold"
}
Step 3: Custom Backup Policy (Optional)

If you need a custom backup policy rather than using the predefined gold, silver, or bronze policies, you can define a custom backup policy using OCI’s native scheduling.

You can create a custom schedule by combining these elements in your oci_core_volume_backup_policy resource.

resource "oci_core_volume_backup_policy" "custom_backup_policy" {
  compartment_id = "ocid1.compartment.oc1..your-compartment-id"
  display_name   = "CustomBackupPolicy"

  schedules {
    backup_type = "INCREMENTAL"
    period      = "ONE_DAY"
    retention_duration = "THIRTY_DAYS"
  }

  schedules {
    backup_type = "FULL"
    period      = "ONE_WEEK"
    retention_duration = "NINETY_DAYS"
  }
}

You can then assign this policy to the block volume using the same method as earlier.

Step 4: Apply the Terraform Configuration

Once your Terraform configuration is ready, apply it using the standard Terraform workflow:

  1. Initialize Terraform:
terraform init

Plan the Terraform deployment:

terraform plan

Apply the Terraform plan:

terraform apply

This process will automatically provision your block volumes and assign the specified backup policy.



Regards
Osama

Automating Cloud Infrastructure Management with OCI Resource Manager

Setting Up OCI Resource Manager

Creating a Stack:

  • Log in to the OCI Console.
  • Navigate to Resource ManagerStacksCreate Stack.
  • Upload your Terraform configuration file.

Example Terraform Configuration:

provider "oci" {
region = "us-ashburn-1"
}

resource "oci_core_instance" "my_instance" {
availability_domain = "AD-1"
compartment_id = "<compartment_OCID>"
shape = "VM.Standard2.1"
display_name = "MyInstance"
image_id = "<image_OCID>"
subnet_id = "<subnet_OCID>"

source_details {
source_type = "image"
image_id = "<image_OCID>"
}

metadata = {
ssh_authorized_keys = file("~/.ssh/id_rsa.pub")
}
}

Deploying Infrastructure with Resource Manager

Creating a Job:

oci resource-manager stack create-job --stack-id <stack_OCID> --display-name "MyDeploymentJob" --operation-type APPLY

Monitoring Deployment:

oci resource-manager job list --stack-id <stack_OCID>

Managing and Updating Infrastructure

  • Updating a Stack:
    • Modify the Terraform configuration file.
    • Navigate to Resource ManagerStacksUpdate Stack.
    • Upload the updated Terraform configuration file and apply changes.

Destroying Infrastructure:

oci resource-manager stack create-job --stack-id <stack_OCID> --display-name "DestroyJob" --operation-type DESTROY

Integrating with CI/CD Pipelines

Example Integration with GitHub Actions:

name: Deploy to OCI

on:
push:
branches:
- main

jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2

- name: Set up Terraform
uses: hashicorp/setup-terraform@v1

- name: Terraform Init
run: terraform init

- name: Terraform Apply
run: terraform apply -auto-approve
env:
OCI_REGION: ${{ secrets.OCI_REGION }}
OCI_TENANCY_OCID: ${{ secrets.OCI_TENANCY_OCID }}
OCI_USER_OCID: ${{ secrets.OCI_USER_OCID }}
OCI_FINGERPRINT: ${{ secrets.OCI_FINGERPRINT }}
OCI_PRIVATE_KEY_PATH: ${{ secrets.OCI_PRIVATE_KEY_PATH }}
OCI_PRIVATE_KEY_PASSPHRASE: ${{ secrets.OCI_PRIVATE_KEY_PASSPHRASE }}

Thank you

Osama

DubOPS Event

DubOps is a unique event that brings together DevOps, IT operations, and software development experts to share their knowledge and insights with the community. This event provides a platform for attendees to learn about the latest trends and best practices in the industry, as well as network with peers and thought leaders.

Registration for the Dubops event is now open, and we encourage anyone interested in attending to sign up early, as space is limited. Don’t miss this chance to expand your knowledge, connect with peers, and stay ahead of the curve in the ever-changing world of DevOps and IT operations.

Date: May 11th, 2023
Time: 18:00 – 21:00
Location: Zabeel House, Dubai, UAE
Registration link: https://lnkd.in/dCd7V-vv
We look forward to seeing you there!

Regards

Osama

Multi POd Example – k8s

Create a Multi-Container Pod

  1. Create a YAML file named multi.yml:
apiVersion: v1
kind: Pod
metadata:
  name: multi
  namespace: baz
spec:
  containers:
  - name: nginx
    image: nginx
  - name: redis
    image: redis

Create a Complex Multi-Container Pod

apiVersion: v1
kind: Pod
metadata:
  name: logging-sidecar
  namespace: baz
spec:
  containers:
  - name: busybox1
    image: busybox
    command: ['sh', '-c', 'while true; do echo Logging data > /output/output.log; sleep 5; done']
    volumeMounts:
    - name: sharedvol
      mountPath: /output
  - name: sidecar
    image: busybox
    command: ['sh', '-c', 'tail -f /input/output.log']
    volumeMounts:
    - name: sharedvol
      mountPath: /input
  volumes:
  - name: sharedvol
    emptyDir: {}

FREE LEARNING ON UDEMY

The below is now Free courses on Udemy, not sure till when so enjoy as you can.

Free learning on Udemy DevOps Tutorials for Absolute Beginner

  1. DevOps – The Introduction
    https://lnkd.in/dD79ZpJF
  2. CI CD pipeline – Devops Automation in 1 hr
    https://lnkd.in/dMQEGJBN
  3. DevOps Crash Course
    https://lnkd.in/dt5CmYSN
  4. DevOps 101
    https://lnkd.in/dhyzHVQh
  5. DevOps on AWS: Code, Build, and Test (Course 1 of 3)
    https://lnkd.in/dV6NbWRJ
  6. Free Devops Interview Questions and Answers
    https://lnkd.in/dsQu76qm
  7. DevOps Tools for Beginners: Ansible in 1 hour
    https://lnkd.in/dKgMap-r
  8. DevOps on AWS: Release and Deploy (Course 2 of 3)
    https://lnkd.in/dzQuM4Ht
  9. DevOps on AWS: Operate and Monitor (Course 3 of 3)
    https://lnkd.in/d_P9wUgg
  10. Introduction to DevOps, Habits and Practices
    https://lnkd.in/dsvQQcYj
  11. Amazon AWS Cloud IAM Hands-On
    https://lnkd.in/ddSBhiST
  12. DevOps : CI/CD with Jenkins
    https://lnkd.in/d3qvi-Az
  13. Introduction to YAML – A hands -on course
    https://lnkd.in/d4ypNfGF
  14. Kubernetes: Getting Started
    https://lnkd.in/d_JQi6wF
  15. Docker Tutorial for Beginners practical hands on -Devops
    https://lnkd.in/dbSJ-zfX
  16. Ansible for the Absolute Beginner – DevOps
    https://lnkd.in/dn_w3bsK
  17. Docker, Docker SWARM and Kubernetes crash course for DevOps
    https://lnkd.in/dFirktd3
  18. Understanding Docker in about an Hour
    https://lnkd.in/dNBvbgqJ
  19. Learn terraform by setting up Highly available wordpress
    https://lnkd.in/d-AaXDT2
  20. Use Ansible with Amazon Web Services
    https://lnkd.in/d6VfZi7d
  21. GIT Crash Course
    https://lnkd.in/ddzznGuV
  22. Maven Quick Start: A Fast Introduction to Maven by Example
    https://lnkd.in/dhVam3zC
  23. Master Amazon EC2 Basics with 10 Labs
    https://lnkd.in/d9jQ6cmN
  24. Amazon Web Services (AWS): CloudFormation
    https://lnkd.in/dAc65c-H
  25. Just enough Ansible to be dangerous
    https://lnkd.in/dXaWmX5d
  26. AZ-900 Microsoft Azure Fundamentals
    https://lnkd.in/dcdae_VZ
  27. Deploy Azure Virtual Desktop for beginners
    https://lnkd.in/dQsbzHes
  28. Apache Maven for Beginners
    https://lnkd.in/dWTK6dxn
  29. AWS Certified Solutions Architect Associate Introduction
    https://lnkd.in/d4eR5gsW
  30. Microsoft Azure fundamentals Az900 crash course
    https://lnkd.in/de5GBCEB
  31. Azure Real World Hand-on Training For Beginners.
    https://lnkd.in/dB3VM7f7
  32. Introduction to Linux Shell Scripting
    https://lnkd.in/dCb4BkvH
  33. Create a 3-Tier Application Using Azure Virtual Machines
    https://lnkd.in/dfMtuW8C
  34. AWS VPC and VPC Peering Demo
    https://lnkd.in/dxnraPHf
  35. Amazon Web Services (AWS) EC2: An Introduction
    https://lnkd.in/drUvNuFk
  36. Hosting your static website on Amazon AWS S3 service
    https://lnkd.in/dBw4RKs2
  37. Mobaxterm Powerful tools to access Linux and Unix
    https://lnkd.in/dzKTB4xw
  38. Getting started with Cloud Computing using Microsoft Azure
    https://lnkd.in/dViDqS2t
  39. Cloud Computing Fundamental
    https://lnkd.in/d9ZY_Kdq

Cheers
Osama

K8s Example

Create a Service Account

It’s super simple command

kubectl create sa webautomation -n web

Create a ClusterRole That Provides Read Access to Pods

  1. Define the ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list"]

Bind the ClusterRole to the Service Account to Only Read Pods in the web Namespace

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rb-pod-reader
  namespace: web
subjects:
- kind: ServiceAccount
  name: webautomation
roleRef:
  kind: ClusterRole
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Cheers

Osama