Building a Multi-Cloud Secrets Management Strategy with HashiCorp Vault

Posted on February 22, 2026 by Osama Mustafa in Application

Let me ask you something. Where are your database passwords right now? Your API keys? Your TLS certificates?

If you’re like most teams I’ve worked with, the honest answer is “scattered everywhere.” Some are in environment variables. Some are in Kubernetes secrets (base64 encoded, which isn’t encryption by the way). A few are probably still hardcoded in configuration files that someone committed to Git three years ago.

I’m not judging. We’ve all been there. But as your infrastructure grows across multiple clouds, this approach becomes a ticking time bomb. One leaked credential can compromise everything.

In this article, I’ll show you how to build a centralized secrets management strategy using HashiCorp Vault. We’ll deploy it properly, integrate it with AWS, Azure, and GCP, and set up dynamic secrets that rotate automatically. No more shared passwords. No more “who has access to what” mysteries.

Why Vault? Why Now?

Before we dive into implementation, let me explain why I recommend Vault over cloud-native solutions like AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager.

Don’t get me wrong. Those services are excellent. If you’re running entirely on one cloud, they might be all you need. But here’s the reality for most organizations:

You have workloads on AWS. Your data team uses GCP for BigQuery. Your enterprise applications run on Azure. Maybe you still have some on-premises systems. And you need a consistent way to manage secrets across all of them.

Vault gives you that single control plane. One audit log. One policy engine. One place to rotate credentials. And it integrates with everything.

Architecture Overview

Here’s what we’re building:

The key principle here is that applications never store long-lived credentials. Instead, they authenticate to Vault and receive short-lived, automatically rotated credentials for the specific resources they need.

Building a Multi-Cloud Secrets Management Strategy with HashiCorp Vault

Let me ask you something. Where are your database passwords right now? Your API keys? Your TLS certificates?

I’m not judging. We’ve all been there. But as your infrastructure grows across multiple clouds, this approach becomes a ticking time bomb. One leaked credential can compromise everything.

Why Vault? Why Now?

Before we dive into implementation, let me explain why I recommend Vault over cloud-native solutions like AWS Secrets Manager, Azure Key Vault, or GCP Secret Manager.

Don’t get me wrong. Those services are excellent. If you’re running entirely on one cloud, they might be all you need. But here’s the reality for most organizations:

Vault gives you that single control plane. One audit log. One policy engine. One place to rotate credentials. And it integrates with everything.

Architecture Overview

Here’s what we’re building:

Step 1: Deploy Vault on Kubernetes

I prefer running Vault on Kubernetes because it gives you high availability, easy scaling, and integrates beautifully with your existing workloads. We’ll use the official Helm chart.

Prerequisites

You’ll need a Kubernetes cluster. Any managed Kubernetes service works: EKS, AKS, GKE, or even OKE. For this guide, I’ll use commands that work across all of them.

Create the Namespace and Storage

bash

			
kubectl create namespace vault
# Create storage class for Vault data
# This example uses AWS EBS, adjust for your cloud
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vault-storage
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF

		

Configure Vault Helm Values

yaml

			
# vault-values.yaml
global:
  enabled: true
  tlsDisable: false
injector:
  enabled: true
  replicas: 2
  
  resources:
    requests:
      memory: 256Mi
      cpu: 250m
    limits:
      memory: 512Mi
      cpu: 500m
server:
  enabled: true
  
  # Run 3 replicas for high availability
  ha:
    enabled: true
    replicas: 3
    
    # Use Raft for integrated storage
    raft:
      enabled: true
      setNodeId: true
      
      config: |
        ui = true
        
        listener "tcp" {
          tls_disable = false
          address = "[::]:8200"
          cluster_address = "[::]:8201"
          tls_cert_file = "/vault/userconfig/vault-tls/tls.crt"
          tls_key_file = "/vault/userconfig/vault-tls/tls.key"
        }
        
        storage "raft" {
          path = "/vault/data"
          
          retry_join {
            leader_api_addr = "https://vault-0.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls/ca.crt"
          }
          retry_join {
            leader_api_addr = "https://vault-1.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls/ca.crt"
          }
          retry_join {
            leader_api_addr = "https://vault-2.vault-internal:8200"
            leader_ca_cert_file = "/vault/userconfig/vault-tls/ca.crt"
          }
        }
        
        service_registration "kubernetes" {}
        
        seal "awskms" {
          region     = "us-east-1"
          kms_key_id = "alias/vault-unseal-key"
        }
  
  resources:
    requests:
      memory: 1Gi
      cpu: 500m
    limits:
      memory: 2Gi
      cpu: 2000m
  
  dataStorage:
    enabled: true
    size: 20Gi
    storageClass: vault-storage
  
  auditStorage:
    enabled: true
    size: 10Gi
    storageClass: vault-storage
  # Service account for cloud integrations
  serviceAccount:
    create: true
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/vault-server-role
ui:
  enabled: true
  serviceType: LoadBalancer
  
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"

		

Generate TLS Certificates

Vault should always use TLS. Here’s how to create certificates using cert-manager:

yaml

			
# vault-certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: vault-tls
  namespace: vault
spec:
  secretName: vault-tls
  duration: 8760h # 1 year
  renewBefore: 720h # 30 days
  subject:
    organizations:
      - YourCompany
  commonName: vault.vault.svc.cluster.local
  dnsNames:
    - vault
    - vault.vault
    - vault.vault.svc
    - vault.vault.svc.cluster.local
    - vault-0.vault-internal
    - vault-1.vault-internal
    - vault-2.vault-internal
    - "*.vault-internal"
  ipAddresses:
    - 127.0.0.1
  issuerRef:
    name: cluster-issuer
    kind: ClusterIssuer

		

Install Vault

bash

			
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
helm install vault hashicorp/vault \
  --namespace vault \
  --values vault-values.yaml \
  --version 0.27.0

		

Initialize and Unseal

This is a one-time operation. Keep these keys safe. I mean really safe. Like offline, in multiple secure locations.

bash

			
# Initialize Vault
kubectl exec -n vault vault-0 -- vault operator init \
  -key-shares=5 \
  -key-threshold=3 \
  -format=json > vault-init.json
# The output contains your unseal keys and root token
# Store these securely!
# If not using auto-unseal, you'd need to unseal manually:
# kubectl exec -n vault vault-0 -- vault operator unseal <key1>
# kubectl exec -n vault vault-0 -- vault operator unseal <key2>
# kubectl exec -n vault vault-0 -- vault operator unseal <key3>
# With AWS KMS auto-unseal configured, Vault unseals automatically

		

Step 2: Configure Authentication Methods

Now we need to tell Vault how applications will authenticate. This is where it gets interesting.

Kubernetes Authentication

Applications running in Kubernetes can authenticate using their service account tokens. No passwords needed.

bash

			
# Enable Kubernetes auth
vault auth enable kubernetes
# Configure it to trust our cluster
vault write auth/kubernetes/config \
  kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
  token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  issuer="https://kubernetes.default.svc.cluster.local"

		

AWS IAM Authentication

For workloads running on EC2, Lambda, or ECS, they can authenticate using their IAM roles.

bash

			
# Enable AWS auth
vault auth enable aws
# Configure AWS credentials for Vault to verify requests
vault write auth/aws/config/client \
  secret_key=$AWS_SECRET_KEY \
  access_key=$AWS_ACCESS_KEY
# Create a role that EC2 instances can use
vault write auth/aws/role/ec2-app-role \
  auth_type=iam \
  bound_iam_principal_arn="arn:aws:iam::ACCOUNT_ID:role/app-server-role" \
  policies=app-policy \
  ttl=1h

		

Azure Authentication

For Azure workloads using Managed Identities:

bash

			
# Enable Azure auth
vault auth enable azure
# Configure Azure
vault write auth/azure/config \
  tenant_id=$AZURE_TENANT_ID \
  resource="https://management.azure.com/" \
  client_id=$AZURE_CLIENT_ID \
  client_secret=$AZURE_CLIENT_SECRET
# Create a role for Azure VMs
vault write auth/azure/role/azure-app-role \
  policies=app-policy \
  bound_subscription_ids=$AZURE_SUBSCRIPTION_ID \
  bound_resource_groups=production-rg \
  ttl=1h

		

GCP Authentication

For GCP workloads using service accounts:

bash

			
# Enable GCP auth
vault auth enable gcp
# Configure GCP
vault write auth/gcp/config \
  credentials=@gcp-credentials.json
# Create a role for GCE instances
vault write auth/gcp/role/gce-app-role \
  type="gce" \
  policies=app-policy \
  bound_projects="my-project-id" \
  bound_zones="us-central1-a,us-central1-b" \
  ttl=1h

		

Step 3: Set Up Dynamic Secrets

Here’s where the magic happens. Instead of storing static database passwords, Vault can generate unique credentials on demand and revoke them automatically when they expire.

Dynamic AWS Credentials

bash

			
# Enable AWS secrets engine
vault secrets enable aws
# Configure root credentials (Vault uses these to create dynamic creds)
vault write aws/config/root \
  access_key=$AWS_ACCESS_KEY \
  secret_key=$AWS_SECRET_KEY \
  region=us-east-1
# Create a role that generates S3 read-only credentials
vault write aws/roles/s3-reader \
  credential_type=iam_user \
  policy_document=-<<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ]
    }
  ]
}
EOF
# Now any authenticated client can get temporary AWS credentials
vault read aws/creds/s3-reader
# Returns:
# access_key     AKIA...
# secret_key     xyz123...
# lease_duration 1h
# These credentials will be automatically revoked after 1 hour

		

Dynamic Database Credentials

This is probably my favorite feature. Every time an application needs to connect to a database, it gets a unique username and password that only it knows.

bash

			
# Enable database secrets engine
vault secrets enable database
# Configure PostgreSQL connection
vault write database/config/production-postgres \
  plugin_name=postgresql-database-plugin \
  allowed_roles="app-readonly,app-readwrite" \
  connection_url="postgresql://{{username}}:{{password}}@db.example.com:5432/appdb?sslmode=require" \
  username="vault_admin" \
  password="vault_admin_password"
# Create a read-only role
vault write database/roles/app-readonly \
  db_name=production-postgres \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
    GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"
# Create a read-write role
vault write database/roles/app-readwrite \
  db_name=production-postgres \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
    GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  revocation_statements="DROP ROLE IF EXISTS \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"

		

Now when your application requests credentials:

bash

			
vault read database/creds/app-readonly
# Returns:
# username    v-kubernetes-app-readonly-abc123
# password    A1B2C3D4E5F6...
# lease_duration 1h

		

Every request gets a different username and password. If credentials are compromised, they expire automatically. And you have a complete audit trail of who accessed what, when.

Dynamic Azure Credentials

bash

			
# Enable Azure secrets engine
vault secrets enable azure
# Configure Azure
vault write azure/config \
  subscription_id=$AZURE_SUBSCRIPTION_ID \
  tenant_id=$AZURE_TENANT_ID \
  client_id=$AZURE_CLIENT_ID \
  client_secret=$AZURE_CLIENT_SECRET
# Create a role that generates Azure Service Principals
vault write azure/roles/contributor \
  ttl=1h \
  azure_roles=-<<EOF
[
  {
    "role_name": "Contributor",
    "scope": "/subscriptions/$AZURE_SUBSCRIPTION_ID/resourceGroups/production-rg"
  }
]
EOF

		

Step 4: Application Integration

Let’s see how applications actually use Vault. I’ll show you several patterns.

Pattern 1: Vault Agent Sidecar (Kubernetes)

This is my recommended approach for Kubernetes. Vault Agent runs alongside your application and handles authentication and secret retrieval automatically.

yaml

			
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    metadata:
      annotations:
        # These annotations tell Vault Agent what to do
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "my-app-role"
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/app-readonly"
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/app-readonly" -}}
          export DB_USERNAME="{{ .Data.username }}"
          export DB_PASSWORD="{{ .Data.password }}"
          {{- end }}
    spec:
      serviceAccountName: my-app
      containers:
      - name: my-app
        image: my-app:latest
        command: ["/bin/sh", "-c"]
        args:
          - source /vault/secrets/db-creds && ./start-app.sh

		

When this pod starts, Vault Agent automatically:

Authenticates to Vault using the Kubernetes service account
Retrieves database credentials
Writes them to /vault/secrets/db-creds
Renews the credentials before they expire
Updates the file when credentials change

Your application just reads from a file. It doesn’t need to know anything about Vault.

Pattern 2: Direct SDK Integration

For applications that need more control, you can use the Vault SDK directly:

python

			
# Python example
import hvac
import os
def get_vault_client():
    """Create Vault client using Kubernetes auth."""
    client = hvac.Client(url=os.environ['VAULT_ADDR'])
    
    # Read the service account token
    with open('/var/run/secrets/kubernetes.io/serviceaccount/token') as f:
        jwt = f.read()
    
    # Authenticate to Vault
    client.auth.kubernetes.login(
        role='my-app-role',
        jwt=jwt,
        mount_point='kubernetes'
    )
    
    return client
def get_database_credentials():
    """Get dynamic database credentials."""
    client = get_vault_client()
    
    # Request new database credentials
    response = client.secrets.database.generate_credentials(
        name='app-readonly',
        mount_point='database'
    )
    
    return {
        'username': response['data']['username'],
        'password': response['data']['password'],
        'lease_id': response['lease_id'],
        'lease_duration': response['lease_duration']
    }
def connect_to_database():
    """Connect to database with dynamic credentials."""
    creds = get_database_credentials()
    
    connection = psycopg2.connect(
        host='db.example.com',
        database='appdb',
        user=creds['username'],
        password=creds['password']
    )
    
    return connection

		

Pattern 3: External Secrets Operator

If you prefer Kubernetes-native secrets, use External Secrets Operator to sync Vault secrets to Kubernetes:

yaml

			
# external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend
  target:
    name: app-secrets
    creationPolicy: Owner
  data:
  - secretKey: api-key
    remoteRef:
      key: secret/data/app/api-key
      property: value
  - secretKey: db-password
    remoteRef:
      key: secret/data/app/database
      property: password

		

Step 5: Policies and Access Control

Vault policies determine who can access what. Be specific and follow the principle of least privilege.

hcl

			
# app-policy.hcl
# Allow reading dynamic database credentials
path "database/creds/app-readonly" {
  capabilities = ["read"]
}
# Allow reading application secrets
path "secret/data/app/*" {
  capabilities = ["read", "list"]
}
# Deny access to admin paths
path "sys/*" {
  capabilities = ["deny"]
}
# Allow the app to renew its own token
path "auth/token/renew-self" {
  capabilities = ["update"]
}

		

Apply the policy:

bash

			
vault policy write app-policy app-policy.hcl
# Create a Kubernetes auth role that uses this policy
vault write auth/kubernetes/role/my-app-role \
  bound_service_account_names=my-app \
  bound_service_account_namespaces=production \
  policies=app-policy \
  ttl=1h

		

Step 6: Monitoring and Audit

You need visibility into who’s accessing secrets. Enable audit logging:

bash

			
# Enable file audit device
vault audit enable file file_path=/vault/audit/vault-audit.log
# Enable syslog for centralized logging
vault audit enable syslog tag="vault" facility="AUTH"

For monitoring, Vault exposes Prometheus metrics:

yaml

			
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: vault
  namespace: vault
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: vault
  endpoints:
  - port: http
    path: /v1/sys/metrics
    params:
      format: ["prometheus"]
    scheme: https
    tlsConfig:
      insecureSkipVerify: true

		

Key metrics to alert on:

yaml

			
# Prometheus alerting rules
groups:
- name: vault
  rules:
  - alert: VaultSealed
    expr: vault_core_unsealed == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Vault is sealed"
      description: "Vault instance {{ $labels.instance }} is sealed and unable to serve requests"
  
  - alert: VaultTooManyPendingTokens
    expr: vault_token_count > 10000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Too many Vault tokens"
      description: "Vault has {{ $value }} active tokens. Consider reducing TTLs."
  
  - alert: VaultLeadershipLost
    expr: increase(vault_core_leadership_lost_count[5m]) > 0
    labels:
      severity: warning
    annotations:
      summary: "Vault leadership changes detected"

		

Common Mistakes to Avoid

Let me save you some headaches by sharing mistakes I’ve seen (and made):

Mistake 1: Using the root token for applications

The root token has unlimited access. Create specific policies and tokens for each application.

Mistake 2: Not rotating the root token

After initial setup, generate a new root token and revoke the original:

bash

			
vault operator generate-root -init
# Follow the process to generate a new root token
vault token revoke <old-root-token>

Mistake 3: Setting TTLs too long

Short TTLs mean compromised credentials are valid for less time. Start with 1 hour and adjust based on your needs.

Mistake 4: Not testing recovery procedures

Practice unsealing Vault. Practice recovering from backup. Do it regularly. The worst time to learn is during an actual incident.

Mistake 5: Storing unseal keys together

Distribute unseal keys to different people in different locations. Use a threshold scheme (3 of 5) so no single person can unseal Vault.

Regards, Enjoy the Cloud
Osama

Deep Dive into Oracle Kubernetes Engine Security and Networking in Production

Posted on December 22, 2025 by Osama Mustafa in Cloud, OCI

Oracle Kubernetes Engine is often introduced as a managed Kubernetes service, but its real strength only becomes clear when you operate it in production. OKE tightly integrates with OCI networking, identity, and security services, which gives you a very different operational model compared to other managed Kubernetes platforms.

This article walks through OKE from a production perspective, focusing on security boundaries, networking design, ingress exposure, private access, and mutual TLS. The goal is not to explain Kubernetes basics, but to explain how OKE behaves when you run regulated, enterprise workloads.

Understanding the OKE Networking Model

OKE does not abstract networking away from you. Every cluster is deeply tied to OCI VCN constructs.

Core Components

An OKE cluster consists of:

A managed Kubernetes control plane
Worker nodes running in OCI subnets
OCI networking primitives controlling traffic flow

Key OCI resources involved:

Virtual Cloud Network
Subnets for control plane and workers
Network Security Groups
Route tables
OCI Load Balancers

Unlike some platforms, security in OKE is enforced at multiple layers simultaneously.

Worker Node and Pod Networking

OKE uses OCI VCN-native networking. Pods receive IPs from the subnet CIDR through the OCI CNI plugin.

What this means in practice

Pods are first-class citizens on the VCN
Pod IPs are routable within the VCN
Network policies and OCI NSGs both apply

Example subnet design:

VCN: 10.0.0.0/16

Worker Subnet: 10.0.10.0/24
Load Balancer Subnet: 10.0.20.0/24
Private Endpoint Subnet: 10.0.30.0/24

This design allows you to:

Keep workers private
Expose only ingress through OCI Load Balancer
Control east-west traffic using Kubernetes NetworkPolicies and OCI NSGs together

Security Boundaries in OKE

Security in OKE is layered by design.

Layer 1: OCI IAM and Compartments

OKE clusters live inside OCI compartments. IAM policies control:

Who can create or modify clusters
Who can access worker nodes
Who can manage load balancers and subnets

Example IAM policy snippet:

Allow group OKE-Admins to manage cluster-family in compartment OKE-PROD
Allow group OKE-Admins to manage virtual-network-family in compartment OKE-PROD

This separation is critical for regulated environments.

Layer 2: Network Security Groups

Network Security Groups act as virtual firewalls at the VNIC level.

Typical NSG rules:

Allow node-to-node communication
Allow ingress from load balancer subnet only
Block all public inbound traffic

Example inbound NSG rule:

Source: 10.0.20.0/24
Protocol: TCP
Port: 443

This ensures only the OCI Load Balancer can reach your ingress controller.

Layer 3: Kubernetes Network Policies

NetworkPolicies control pod-level traffic.

Example policy allowing traffic only from ingress namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress
  namespace: app-prod
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              role: ingress

This blocks all lateral movement by default.

Ingress Design in OKE

OKE integrates natively with OCI Load Balancer.

Public vs Private Ingress

You can deploy ingress in two modes:

Public Load Balancer
Internal Load Balancer

For production workloads, private ingress is strongly recommended.

Example service annotation for private ingress:

service.beta.kubernetes.io/oci-load-balancer-internal: "true"
service.beta.kubernetes.io/oci-load-balancer-subnet1: ocid1.subnet.oc1..

This ensures the load balancer has no public IP.

Private Access to the Cluster Control Plane

OKE supports private API endpoints.

When enabled:

The Kubernetes API is accessible only from the VCN
No public endpoint exists

This is critical for Zero Trust environments.

Operational impact:

kubectl access requires VPN, Bastion, or OCI Cloud Shell inside the VCN
CI/CD runners must have private connectivity

This dramatically reduces the attack surface.

Mutual TLS Inside OKE

TLS termination at ingress is not enough for sensitive workloads. Many enterprises require mTLS between services.

Typical mTLS Architecture

TLS termination at ingress
Internal mTLS between services
Certificate management via Vault or cert-manager

Example cert-manager issuer using OCI Vault:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: oci-vault-issuer
spec:
  vault:
    server: https://vault.oci.oraclecloud.com
    path: pki/sign/oke

Each service receives:

Its own certificate
Short-lived credentials
Automatic rotation

Traffic Flow Example

End-to-end request path:

Client connects to OCI Load Balancer
Load Balancer forwards traffic to NGINX Ingress
Ingress enforces TLS and headers
Service-to-service traffic uses mTLS
NetworkPolicy restricts lateral movement
NSGs enforce VCN-level boundaries

Every hop is authenticated and encrypted.

Observability and Security Visibility

OKE integrates with:

OCI Logging
OCI Flow Logs
Kubernetes audit logs

This allows:

Tracking ingress traffic
Detecting unauthorized access attempts
Correlating pod-level events with network flows

Regards
Osama

Setting up a High-Availability (HA) Architecture with OCI Load Balancer and Compute Instances

Posted on February 25, 2025 by Osama Mustafa in Cloud, OCI

Ensuring high availability (HA) for your applications is critical in today’s cloud-first environment. Oracle Cloud Infrastructure (OCI) provides robust tools such as Load Balancers and Compute Instances to help you create a resilient, highly available architecture for your applications. In this post, we’ll walk through the steps to set up an HA architecture using OCI Load Balancer with multiple compute instances across availability domains for fault tolerance.

Prerequisites

OCI Account: A working Oracle Cloud Infrastructure account.
OCI CLI: Installed and configured with necessary permissions.
Terraform: Installed and set up for provisioning infrastructure.
Basic knowledge of Load Balancers and Compute Instances in OCI.

Step 1: Set Up a Virtual Cloud Network (VCN)

A VCN is required to house your compute instances and load balancers. To begin, create a new VCN with subnets in different availability domains (ADs) for high availability.

Terraform Configuration (vcn.tf):

resource "oci_core_virtual_network" "vcn" {
  compartment_id = "<compartment_ocid>"
  cidr_block     = "10.0.0.0/16"
  display_name   = "HA-Virtual-Network"
}

resource "oci_core_subnet" "subnet1" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.1.0/24"
  availability_domain = "AD-1"
  display_name        = "HA-Subnet-AD1"
}

resource "oci_core_subnet" "subnet2" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.2.0/24"
  availability_domain = "AD-2"
  display_name        = "HA-Subnet-AD2"
}

Step 2: Provision Compute Instances

Create two compute instances (one in each subnet) to ensure redundancy.

Terraform Configuration (compute.tf):

resource "oci_core_instance" "instance1" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-1"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-1"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet1.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

resource "oci_core_instance" "instance2" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-2"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-2"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet2.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

Step 3: Set Up the OCI Load Balancer

Now, configure the OCI Load Balancer to distribute traffic between the compute instances in both availability domains.

Terraform Configuration (load_balancer.tf):

resource "oci_load_balancer_load_balancer" "ha_lb" {
  compartment_id = "<compartment_ocid>"
  display_name   = "HA-Load-Balancer"
  shape           = "100Mbps"

  subnet_ids = [
    oci_core_subnet.subnet1.id,
    oci_core_subnet.subnet2.id
  ]

  backend_sets {
    name = "backend-set-1"

    backends {
      ip_address = oci_core_instance.instance1.private_ip
      port = 80
    }

    backends {
      ip_address = oci_core_instance.instance2.private_ip
      port = 80
    }

    policy = "ROUND_ROBIN"
    health_checker {
      port = 80
      protocol = "HTTP"
      url_path = "/health"
      retries = 3
      timeout_in_seconds = 10
      interval_in_seconds = 5
    }
  }
}

resource "oci_load_balancer_listener" "ha_listener" {
  load_balancer_id = oci_load_balancer_load_balancer.ha_lb.id
  name = "http-listener"
  default_backend_set_name = "backend-set-1"
  port = 80
  protocol = "HTTP"
}

Step 4: Set Up Health Checks for High Availability

Health checks are critical to ensure that the load balancer sends traffic only to healthy instances. The health check configuration is included in the backend set definition above, but you can customize it as needed.
Step 5: Testing and Validation

Once all resources are provisioned, test the HA architecture:

Verify Load Balancer Health: Ensure that the backend instances are marked as healthy by checking the load balancer’s health checks.

oci load-balancer backend-set get --load-balancer-id <load_balancer_id> --name backend-set-1

Access the Application: Test accessing your application through the Load Balancer’s public IP. The Load Balancer should evenly distribute traffic across the two compute instances.
Failover Testing: Manually shut down one of the instances to verify that the Load Balancer reroutes traffic to the other instance.

Automating Block Volume Backups in Oracle Cloud Infrastructure (OCI) using CLI and Terraform

Posted on October 9, 2024 by Osama Mustafa in Cloud, OCI

Briefly introduce the importance of block volumes in OCI and why automated backups are essential.Mention that this blog will cover two methods: using the OCI CLI and Terraform for automation.

Automating Block Volume Backups using OCI CLI

Prerequisites:

Set up OCI CLI on your machine (brief steps with links).
Ensure that you have the right permissions to manage block volumes.

Step-by-step guide:

Command to create a block volume

oci bv volume create --compartment-id <your_compartment_ocid> --availability-domain <your_ad> --display-name "MyVolume" --size-in-gbs 50

Command to take a backup of the block volume:

oci bv backup create --volume-id <your_volume_ocid> --display-name "MyVolumeBackup"

Scheduling backups using cron jobs for automation.

Example cron job configuration

0 2 * * * /usr/local/bin/oci bv backup create --volume-id <your_volume_ocid> --display-name "ScheduledBackup" >> /var/log/oci_backup.log 2>&1

Automating Block Volume Backups using Terraform

Prerequisites

OCI Credentials: Make sure you have the proper API keys and permissions configured in your OCI tenancy.
Terraform Setup: Terraform should be installed and configured to interact with OCI, including the OCI provider setup in your environment.

Step 1: Define the OCI Block Volume Resource

First, define the block volume that you want to automate backups for. Here’s an example of a simple block volume resource in Terraform:

resource "oci_core_volume" "my_block_volume" {
  availability_domain = "your-availability-domain"
  compartment_id      = "ocid1.compartment.oc1..your-compartment-id"
  display_name        = "my_block_volume"
  size_in_gbs         = 50
}

Step 2: Define a Backup Policy

OCI provides predefined backup policies such as gold, silver, and bronze, which define how frequently backups are taken. You can create a custom backup policy as well, but for simplicity, we’ll use one of the predefined policies in this example. The Terraform resource oci_core_volume_backup_policy_assignment will assign a backup policy to the block volume.

Here’s an example to assign the gold backup policy to the block volume:

resource "oci_core_volume_backup_policy_assignment" "backup_assignment" {
  volume_id       = oci_core_volume.my_block_volume.id
  policy_id       = data.oci_core_volume_backup_policy.gold.id
}

data "oci_core_volume_backup_policy" "gold" {
  name = "gold"
}

Step 3: Custom Backup Policy (Optional)

If you need a custom backup policy rather than using the predefined gold, silver, or bronze policies, you can define a custom backup policy using OCI’s native scheduling.

You can create a custom schedule by combining these elements in your oci_core_volume_backup_policy resource.

resource "oci_core_volume_backup_policy" "custom_backup_policy" {
  compartment_id = "ocid1.compartment.oc1..your-compartment-id"
  display_name   = "CustomBackupPolicy"

  schedules {
    backup_type = "INCREMENTAL"
    period      = "ONE_DAY"
    retention_duration = "THIRTY_DAYS"
  }

  schedules {
    backup_type = "FULL"
    period      = "ONE_WEEK"
    retention_duration = "NINETY_DAYS"
  }
}

You can then assign this policy to the block volume using the same method as earlier.

Step 4: Apply the Terraform Configuration

Once your Terraform configuration is ready, apply it using the standard Terraform workflow:

Initialize Terraform:

terraform init

Plan the Terraform deployment:

terraform plan

Apply the Terraform plan:

terraform apply

This process will automatically provision your block volumes and assign the specified backup policy.

Regards
Osama

Automating Cloud Infrastructure Management with OCI Resource Manager

Posted on July 20, 2024 by Osama Mustafa in Uncategorized

Setting Up OCI Resource Manager

Creating a Stack:

Log in to the OCI Console.
Navigate to Resource Manager → Stacks → Create Stack.
Upload your Terraform configuration file.

Example Terraform Configuration:

provider "oci" {
  region = "us-ashburn-1"
}

resource "oci_core_instance" "my_instance" {
  availability_domain = "AD-1"
  compartment_id = "<compartment_OCID>"
  shape = "VM.Standard2.1"
  display_name = "MyInstance"
  image_id = "<image_OCID>"
  subnet_id = "<subnet_OCID>"

  source_details {
    source_type = "image"
    image_id = "<image_OCID>"
  }

  metadata = {
    ssh_authorized_keys = file("~/.ssh/id_rsa.pub")
  }
}

Deploying Infrastructure with Resource Manager

Creating a Job:

oci resource-manager stack create-job --stack-id <stack_OCID> --display-name "MyDeploymentJob" --operation-type APPLY

Monitoring Deployment:

oci resource-manager job list --stack-id <stack_OCID>

Managing and Updating Infrastructure

Updating a Stack:
- Modify the Terraform configuration file.
- Navigate to Resource Manager → Stacks → Update Stack.
- Upload the updated Terraform configuration file and apply changes.

Destroying Infrastructure:

oci resource-manager stack create-job --stack-id <stack_OCID> --display-name "DestroyJob" --operation-type DESTROY

Integrating with CI/CD Pipelines

Example Integration with GitHub Actions:

name: Deploy to OCI

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v1

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve
        env:
          OCI_REGION: ${{ secrets.OCI_REGION }}
          OCI_TENANCY_OCID: ${{ secrets.OCI_TENANCY_OCID }}
          OCI_USER_OCID: ${{ secrets.OCI_USER_OCID }}
          OCI_FINGERPRINT: ${{ secrets.OCI_FINGERPRINT }}
          OCI_PRIVATE_KEY_PATH: ${{ secrets.OCI_PRIVATE_KEY_PATH }}
          OCI_PRIVATE_KEY_PASSPHRASE: ${{ secrets.OCI_PRIVATE_KEY_PASSPHRASE }}

Thank you

Osama

Connect to AKS cluster nodes

Posted on March 10, 2023 by Osama Mustafa in Azure, Cloud

sometimes you need to access AKS worker node to troubelshoot, but how to do that with AKS

Run the below command

kubectl get nodes

Output will give an idea about the worker nodes you have

Run a container image on the node by issuing the kubectl debug command in order to establish a connection to it. The following command begins the process of connecting to a privileged container that has been started on your node.

kubectl debug node/<node-name-you-wish-to-connect> -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0

Regards

Osama

Storing Container Data in Azure Blob Storage

Posted on July 3, 2021July 3, 2021 by Osama Mustafa in DevOps Tools

This time how to store your data to Azure Blog Storage 👍

Let’s start

Configuration

Obtain the Azure login credentials

az login

Copy the code provided by the command.
Open a browser and navigate to https://microsoft.com/devicelogin.
Enter the code copied in a previous step and click Next.
Use the login credentials from the lab page to finish logging in.
Switch back to the terminal and wait for the confirmation.

Storage

Find the name of the Storage account

 az storage account list | grep name | head -1

Copy the name of the Storage account to the clipboard.

Export the Storage account name

 export AZURE_STORAGE_ACCOUNT=<COPIED_STORAGE_ACCOUNT_NAME>

Retrieve the Storage access key

az storage account keys list --account-name=$AZURE_STORAGE_ACCOUNT

Copy the key1 “value” for later use.

Export the key value

export AZURE_STORAGE_ACCESS_KEY=<KEY1_VALUE>

Install blobfuse

sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
sudo yum install blobfuse fuse -y

Modify the fuse.conf configuration file

sudo sed -ri 's/# user_allow_other/user_allow_other/' /etc/fuse.conf

Use Azure Blob container Storage

Create necessary directories

sudo mkdir -p /mnt/Osama /mnt/blobfusetmp

Change ownership of the directories

sudo chown cloud_user /mnt/Osama/ /mnt/blobfusetmp/

Mount the Blob Storage from Azure

blobfuse /mnt/Osama --container-name=website --tmp-path=/mnt/blobfusetmp -o allow_other

Copy What you want to the files into the Blob Storage container for example website files.

 cp -r ~/web/* /mnt/Osama/

Verify the copy worked

ll /mnt/Osama/

Verify the files made it to Azure Blob Storage

az storage blob list -c website --output table

Finally, Run a Docker container using the azure blob storage

docker run -d --name web1 -p 80:80 --mount type=bind,source=/mnt/Osama,target=/usr/local/apache2/htdocs,readonly httpd:2.4

Enjoy 🎉😁

Osama

Azure Terraform Code

Posted on March 19, 2021March 19, 2021 by Osama Mustafa in DevOps Tools

This is post to share complete setup of Azure infrastructure , that contains multiple VMS and subnets.

Code is uploaded to GitHub Here.

Enjoy the Auttomation

Osama

Setting up a Jenkins-Based Continuous Delivery Pipeline with Docker

Posted on August 4, 2020 by Osama Mustafa in Cloud

As an important step in agile development, continuous integration is designed to maintain high quality while accelerating product iteration. Every time when the codes are updated, an automatic test is performed to test the codes and function validity. The codes can only be delivered and deployed after they pass the automatic test, This post describes how to combine Jenkins, one of the most popular integration tools, with Alibaba Cloud Container Service to realize automatic test and image building pushing.

Deploying Jenkins Applications and the Slave Nodes

1. Create a Jenkins orchestration template.

Create a new template and create the orchestration based on the following content.

jenkins:  image: 'registry.aliyuncs.com/acs-sample/jenkins:latest'  ports:      - '8080:8080'      - '50000:50000'  volumes:      - /var/lib/docker/jenkins:/var/jenkins_home  privileged: true  restart: always   labels:      aliyun.scale: '1'      aliyun.probe.url: 'tcp://container:8080'      aliyun.probe.initial_delay_seconds: '10'      aliyun.routing.port_8080: jenkins  links:      - slave-nodejs slave-nodejs:  image: 'registry.aliyuncs.com/acs-sample/jenkins-slave-dind-nodejs'  restart: always   volumes:      - /var/run/docker.sock:/var/run/docker.sock  labels:      aliyun.scale: '1'

2. Use the template to create Jenkins applications and slave nodes.

You can also directly use a Jenkins sample template provided by Alibaba Cloud Container Service to create Jenkins applications and slave nodes.

3. After the successful creation, Jenkins applications and slave nodes will be displayed in the service list.

4. After opening the access endpoint provided by the Container Service, you can use the Jenkins application deployed just now.

Realizing Automatic Test and Automatic Build and Push of Image

Configure the slave container as the slave node of the Jenkins application.

Open the Jenkins application and enter the System Settings interface. Select Manage Node > Create Node, and configure corresponding parameters. See the figure below.

Note: Label is the only identifier of the slave. The slave container and Jenkins container run on the Alibaba Cloud platform at the same time. Therefore, you can fill in a container node IP address that is inaccessible to the Internet to isolate the test environment.

Use the jenkins account and password (the initial password is jenkins) in Dockerfile for the creation of the slave-nodejs image when adding Credential. Image Dockerfile address HERE

1. Create a project to implement the automatic test.

Create an item and choose to build a software project of free style.
Enter the project name and select a node for running the project. In this example, enter the slave-nodejs-ut node created above.

Configure the source code management and code branch. In this example, use GitHub to manage source codes.

Configure the trigger for building. In this example, automatically trigger project execution by combining GitHub Webhooks and services.

Add the Jenkins service hook to GitHub to implement automatic triggering.

Click the Settings tab on the Github project homepage, and click Webhooks & services > Add service and select Jenkins (Git plugin). Enter ${Jenkins IP}/github-webhook/ in the Jenkins hook URL dialog box.

1. http://jenkins.cd****************.cn-beijing.alicontainer.com/github-webhook/

Add a build step of Executes shell type and write shell scripts to execute the test.

The command in this example is as follows.

1. pwd
2. ls
3. cd chapter2
4. npm test

Create a project to automatically build and push images.

Create an item and choose to build a software project of free style.
Enter the project name and select a node for running the project. In this example, enter the slave-nodejs-ut node created above.
Configure the source code management and code branch. In this example, use GitHub to manage source codes.
Add the following trigger and set it to implement automatic image building only after success of the unit test.

Write shell scripts for building and pushing images.

The command in this example is as follows.

a.cd chapter2 b.docker build -t registry.aliyuncs.com/qinyujia-test/nodejs-demo . c.docker login -u ${yourAccount} -p ${yourPassword} registry.aliyuncs.com d.docker push registry.aliyuncs.com/qinyujia-test/nodejs-demo

Automatically Redeploy the Application

Deploy the application for the first time

Use the orchestration template to deploy the image created above to the Container Service and create the nodejs-demo application.

Example

1. 
2. express:
3. image: 'registry.aliyuncs.com/qinyujia-test/nodejs-demo'
4. expose:
5. - '22'
6. - '3000'
7. restart: always
8. labels:
9. aliyun.routing.port_3000: express
10.

1. Select the application nodejs-demo just created, and create the trigger.

Add a line to the shell scripts you wrote in Realize automatic test and automatic build and push of image. The address is the trigger link given by the trigger created above.

i.curl 'https://cs.console.aliyun.com/hook/trigger?triggerUrl=***==&secret=***'

Change the Command in the example from Realize automatic test and automatic build and push of image as follows.

i. cd chapter2
ii. docker build -t registry.aliyuncs.com/qinyujia-test/nodejs-demo .
iii. docker login -u ${yourAccount} -p ${yourPassword} registry.aliyuncs.com iv.docker push registry.aliyuncs.com/qinyujia-test/nodejs-demo
v. curl 'https://cs.console.aliyun.com/hook/trigger?triggerUrl=***==&secret=***'

After pushing the image, Jenkins automatically triggers redeployment of the nodejs-demo application.

Configure The Email Notification for the Results

If you want to send the unit test or image configuration results to relevant developers or project execution initiators through email, perform the following configurations.

On the Jenkins homepage, click System Management > System Settings, and configure a Jenkins system administrator email.

Install the Extended Email Notification plugin, configure SMTP server and other relevant information, and set the default recipient list. See the figure below.

The above example shows the parameter settings of the Jenkins application system. The following example shows the relevant configurations for Jenkins projects whose results are to be pushed through email.

1. Add post-building operation steps in the Jenkins project, select Editable Email Notification, and enter a recipient list.

2. Add a mailing trigger.

Cheers

Osama

Create a Serverless Website with Alibaba Cloud Function Compute

Posted on July 30, 2020 by Osama Mustafa in Cloud

Regarding to Wikipedia, Serverless computing is a cloud computing execution model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity

Today i will show you an example how to create serverless website but this time not using Amazon AWS, Azure or OCI but Alibaba Cloud Provider.

Create a Function Compute Service

Go to the console page and click through to Function Compute.

Click the add button beside Services.

In the Service slide out, give your service a name, an optional description, and then slide open the Advanced Settings.

In Advanced Settings you can grant access for Functions to the Internet, to VPC resources, and you can attach storage and a log service to a Function. You can also configure roles.

For our tutorial, we will need Internet access so make sure this configuration is on.

We will leave VPC and Log Configs as they are.

In the Role Config section, select Create New Role, and in the dropdown list pick AliyunOSSReadOnlyAccess as we will be accessing our static webpages from an Object Storage Service bucket.

Click Authorize.

You will see a summary of the Role you created.

Click Confirm Authorization Policy.

You have successfully added the Role to the Service.

Click OK.

ou will see the details of the Function Compute Service you just created.

Now let’s create a Function in the Service. Click the add button next to Functions.

You will see the Create Function process. The first part of the process is Function Template.

There are many Function Templates available, including an empty Function for writing your own bespoke Functions.

Alibaba Cloud-supplied Template Functions are very useful as they have relevant method invocation and demo code for getting started quickly with Function Compute.

let’s choose the flask-web Function written in Python2.7.

Click Select.

We are now at the Configure Triggers section of creating a Function.

Select HTTP Trigger from the dropdown list. Give the Trigger a name and choose Authorization details (anonymous does not require authorization).

Choose your HTTP methods and click Next. We are going to build a simple web-form application so we will need both the GET and POST HTTP methods.

Now we arrive at the Configure Function Settings.

Give the Function a name then scroll down to Code details.

We’ll leave the supplied code for now. Scroll down to below the code sample.

You will see Environment Variable input options and Runtime Environment details.

Click Next.

Click Next at Configure Function Permissions.

Verify the Configuration details and click Create.

You will arrive at the Function’s IDE. Here you can enter new code, edit the code directly, upload code folders, run, test, and fix your code.

Scroll down.

Copy the URL as we will need to add this to our static webpages so they can connect to our Function Compute Service and Function.

Set Up and Configure an OSS Bucket

Click through to Object Storage Service on the Products page.

If you haven’t yet activated Object Storage Service, go ahead and activate it. In the OSS console, click Create Bucket.

Choose a name for the OSS Bucket and pick the region – you cannot change the region later. Select the Storage Class – you also cannot change this later.

We have selected Public Read for the Access Control List.

When you’re ready, click OK.

You will see the Overview page for your bucket. Make a note of the public Internet URL.

In the Files tab, upload your static web files.

I uploaded a simple index.html homepage and a background picture.

<script type="text/javascript">
        const functionURL = '<<Function URL>>';
        const doHome = new XMLHttpRequest();
doHome.open('GET', functionURL, true);
doHome.onload = function () {    
document.getElementById('home_message').innerHTML = doHome.responseText;
        };
        doHome.send();
</script>

In Basic Settings, click Configure to configure your Static Pages.

Add the homepage details and click Save.

Now go to a new browser window and access the OSS URL you saved earlier.

Back in the Function Compute console, you can now test the flask-app paths directly from the code.

We already tested index.html with no Path variable. Next, we test the app route signin with GET and check the Headers and status code.

The signin page code is working correctly. You can also check the Body to make sure the correct HTML will render on the page. Notice that because I entered the path variable, signin is appended to the URL.

Of course, any errors you encounter will show up in the Logs section for easy debugging.

Now, let’s test this page on the Internet.

If you get an error here, implement a soft link for the page in OSS. Go to the OSS bucket and click More dropdown for the HTML file in question and choose Set soft link.

Give the link a name and click OK.

A link file will appear in the list of static files and you will now be able to access the page online with the relevant soft link and it will render as above.

Back in Function Compute, we can test the POST method in the console with the correct username and password details in the same way.

Add the POST variables to the form upload section in the Body tab.

Now you can test this function online.

Cheers

Osama