Building Enterprise Event-Driven Architectures with OCI Functions and Streaming

Oracle Cloud Infrastructure Functions provides a powerful serverless computing platform that integrates seamlessly with OCI’s event-driven services. This deep-dive explores advanced patterns for building resilient, scalable event-driven architectures using OCI Functions, Streaming, Events, and Notifications services with real-time data processing capabilities.

OCI Functions Architecture and Event Integration

OCI Functions operates on a containerized execution model where each function runs in isolated containers managed by the Fn Project runtime. The service automatically handles scaling, from zero to thousands of concurrent executions, based on incoming event volume.

The event integration architecture centers around multiple trigger mechanisms. HTTP triggers provide direct REST API endpoints for synchronous invocations. OCI Events service enables asynchronous function execution based on resource state changes across OCI services. Streaming triggers process high-volume data streams in real-time, while Object Storage triggers respond to bucket operations.

Unlike traditional serverless platforms, OCI Functions provides deep integration with Oracle’s enterprise services stack, including Autonomous Database, Analytics Cloud, and Integration Cloud. This native integration eliminates the need for complex authentication mechanisms and network configurations typically required in multi-cloud architectures.
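For instance, a synchronous HTTP-style invocation can be issued from the Python SDK. The sketch below is illustrative; the invoke endpoint and function OCID are placeholders:

import oci

# A minimal sketch of a synchronous function invocation via the Python SDK.
# The invoke endpoint and FUNCTION_OCID below are placeholders.
config = oci.config.from_file()
client = oci.functions.FunctionsInvokeClient(
    config, service_endpoint="https://<invoke-endpoint>.oraclecloud.com")

response = client.invoke_function(
    function_id="<FUNCTION_OCID>",
    invoke_function_body=b'{"ping": true}')
print(response.data.text)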

Advanced Streaming Integration Patterns

OCI Streaming provides Apache Kafka-compatible message streaming with enterprise-grade durability and performance. Functions can consume streaming data using multiple consumption patterns, each optimized for specific use cases.

Single-partition consumption works well where message sequence matters: the function processes messages sequentially from a single partition, ensuring strict ordering but limiting throughput to single-function concurrency.

Multi-partition consumption enables parallel processing across multiple partitions, dramatically increasing throughput for scenarios where ordering across the entire stream isn't critical. Each partition can trigger a separate function instance, enabling horizontal scaling based on partition count.

Batch processing consumption accumulates messages over configurable time windows or message-count thresholds before triggering function execution. This pattern optimizes for cost efficiency and reduces per-invocation overhead in high-volume scenarios, as the sketch below illustrates.
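A minimal sketch of batch consumption with the Python SDK's group cursors; the group name and instance name are illustrative, and the client is assumed to have been constructed with the stream's messages endpoint:

import base64

import oci

def read_batch(stream_client: oci.streaming.StreamClient,
               stream_id: str, batch_size: int = 100) -> list:
    """Fetch up to batch_size messages using a consumer-group cursor."""
    # Group cursors track committed offsets per consumer group
    details = oci.streaming.models.CreateGroupCursorDetails(
        group_name="fraud-detectors",   # illustrative group name
        instance_name="worker-1",       # illustrative instance name
        type="TRIM_HORIZON",            # start from the oldest retained message
        commit_on_get=True,             # advance the offset as messages are read
    )
    cursor = stream_client.create_group_cursor(stream_id, details).data.value

    # One call returns up to batch_size messages; values are base64-encoded
    messages = stream_client.get_messages(stream_id, cursor, limit=batch_size).data
    return [base64.b64decode(m.value).decode("utf-8") for m in messages]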

Production Implementation Example

Here’s a comprehensive implementation of a real-time fraud detection system using OCI Functions with streaming integration:

Infrastructure as Code Setup

# fn.yaml - Function Configuration
schema_version: 20180708
name: fraud-detection-app
version: 0.0.1
runtime: python
build_image: fnproject/python:3.9-dev
run_image: fnproject/python:3.9
entrypoint: /python/bin/fdk /function/func.py handler
memory: 512
timeout: 300
config:
  STREAM_OCID: ${STREAM_OCID}
  STREAM_ENDPOINT: ${STREAM_ENDPOINT}
  DB_USER: ${DB_USER}
  DB_PASSWORD: ${DB_PASSWORD}
  DB_CONNECTION_STRING: ${DB_CONNECTION_STRING}
  NOTIFICATION_TOPIC_OCID: ${NOTIFICATION_TOPIC_OCID}
  COMPARTMENT_OCID: ${COMPARTMENT_OCID}
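# NOTE: fn.yaml natively defines only HTTP-style triggers; the streaming
# trigger block below is illustrative, and stream-driven invocation is
# typically wired up outside the function metadata (e.g. Service Connector Hub).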
triggers:
  - name: fraud-detection-trigger
    type: oracle-streaming
    source: ${STREAM_OCID}
    config:
      batchSize: 10
      parallelism: 5
      startingPosition: LATEST

Function Implementation with Advanced Error Handling

import asyncio
import io
import json
import logging
import os
from datetime import datetime, timedelta
from typing import Dict, List

import cx_Oracle
import oci

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class FraudDetectionProcessor:
    def __init__(self):
        self.signer = oci.auth.signers.get_resource_principals_signer()
        # StreamClient needs the stream's messages endpoint in addition to the
        # OCID; STREAM_ENDPOINT is expected in the application config.
        self.streaming_client = oci.streaming.StreamClient(
            {}, service_endpoint=os.environ['STREAM_ENDPOINT'], signer=self.signer)
        self.ons_client = oci.ons.NotificationDataPlaneClient({}, signer=self.signer)
        # Note: post_metric_data is served from the region's
        # telemetry-ingestion endpoint; override service_endpoint if needed.
        self.monitoring_client = oci.monitoring.MonitoringClient({}, signer=self.signer)
        
        # Database connection pool
        self.connection_pool = self._create_db_pool()
        
        # Fraud detection thresholds
        self.velocity_threshold = 5        # max transactions per 5-minute window
        self.amount_threshold = 1000.0     # flag amounts above this value
        self.geo_velocity_threshold = 100  # km/hour between consecutive transactions
        
    def _create_db_pool(self):
        """Create a database session pool for high concurrency"""
        try:
            pool = cx_Oracle.SessionPool(
                user=os.environ['DB_USER'],
                password=os.environ['DB_PASSWORD'],
                dsn=os.environ['DB_CONNECTION_STRING'],
                min=2,
                max=10,
                increment=1,
                threaded=True
            )
            return pool
        except Exception as e:
            logger.error(f"Failed to create DB pool: {str(e)}")
            raise

    async def process_transaction_batch(self, transactions: List[Dict]) -> List[Dict]:
        """Process batch of transactions for fraud detection"""
        results = []
        
        # Process transactions concurrently
        tasks = [self._analyze_transaction(tx) for tx in transactions]
        analysis_results = await asyncio.gather(*tasks, return_exceptions=True)
        
        for i, result in enumerate(analysis_results):
            if isinstance(result, Exception):
                logger.error(f"Error processing transaction {transactions[i]['transaction_id']}: {str(result)}")
                results.append({
                    'transaction_id': transactions[i]['transaction_id'],
                    'status': 'error',
                    'error': str(result)
                })
            else:
                results.append(result)
        
        return results

    async def _analyze_transaction(self, transaction: Dict) -> Dict:
        """Analyze individual transaction for fraud indicators"""
        transaction_id = transaction['transaction_id']
        user_id = transaction['user_id']
        amount = float(transaction['amount'])
        location = transaction.get('location', {})
        timestamp = datetime.fromisoformat(transaction['timestamp'])
        
        fraud_score = 0
        fraud_indicators = []
        
        # Velocity analysis
        velocity_score = await self._check_velocity_fraud(user_id, timestamp)
        fraud_score += velocity_score
        if velocity_score > 0:
            fraud_indicators.append('high_velocity')
        
        # Amount analysis
        if amount > self.amount_threshold:
            amount_score = min((amount / self.amount_threshold) * 10, 50)
            fraud_score += amount_score
            fraud_indicators.append('high_amount')
        
        # Geographic analysis
        geo_score = await self._check_geographic_fraud(user_id, location, timestamp)
        fraud_score += geo_score
        if geo_score > 0:
            fraud_indicators.append('geographic_anomaly')
        
        # Pattern analysis
        pattern_score = await self._check_pattern_fraud(user_id, transaction)
        fraud_score += pattern_score
        if pattern_score > 0:
            fraud_indicators.append('suspicious_pattern')
        
        # Determine fraud status
        if fraud_score >= 70:
            status = 'blocked'
        elif fraud_score >= 40:
            status = 'review'
        else:
            status = 'approved'
        
        result = {
            'transaction_id': transaction_id,
            'user_id': user_id,
            'fraud_score': fraud_score,
            'fraud_indicators': fraud_indicators,
            'status': status,
            'processed_at': datetime.utcnow().isoformat()
        }
        
        # Store results and trigger alerts if needed
        await self._store_analysis_result(result)
        if status in ['blocked', 'review']:
            await self._trigger_fraud_alert(result, transaction)
        
        return result

    async def _check_velocity_fraud(self, user_id: str, timestamp: datetime) -> float:
        """Check transaction velocity for fraud indicators"""
        try:
            connection = self.connection_pool.acquire()
            cursor = connection.cursor()
            
            # Check transactions in last 5 minutes
            time_window = timestamp - timedelta(minutes=5)
            
            cursor.execute("""
                SELECT COUNT(*) 
                FROM transactions 
                WHERE user_id = :user_id 
                AND transaction_time > :time_window
                AND transaction_time <= :current_time
            """, {
                'user_id': user_id,
                'time_window': time_window,
                'current_time': timestamp
            })
            
            count = cursor.fetchone()[0]
            cursor.close()
            self.connection_pool.release(connection)
            
            if count >= self.velocity_threshold:
                return min(count * 5, 30)  # Cap at 30 points
            return 0
            
        except Exception as e:
            logger.error(f"Velocity check error for user {user_id}: {str(e)}")
            return 0

    async def _check_geographic_fraud(self, user_id: str, location: Dict, timestamp: datetime) -> float:
        """Check for impossible geographic velocity"""
        if not location or 'latitude' not in location:
            return 0
            
        try:
            connection = self.connection_pool.acquire()
            cursor = connection.cursor()
            
            # Get last transaction location within 2 hours
            time_window = timestamp - timedelta(hours=2)
            
            cursor.execute("""
                SELECT latitude, longitude, transaction_time
                FROM transactions 
                WHERE user_id = :user_id 
                AND transaction_time > :time_window
                AND transaction_time < :current_time
                AND latitude IS NOT NULL
                ORDER BY transaction_time DESC
                FETCH FIRST 1 ROW ONLY
            """, {
                'user_id': user_id,
                'time_window': time_window,
                'current_time': timestamp
            })
            
            result = cursor.fetchone()
            cursor.close()
            self.connection_pool.release(connection)
            
            if result:
                last_lat, last_lon, last_time = result
                distance = self._calculate_distance(
                    last_lat, last_lon,
                    location['latitude'], location['longitude']
                )
                
                time_diff = (timestamp - last_time).total_seconds() / 3600  # hours
                if time_diff > 0:
                    velocity = distance / time_diff  # km/hour
                    if velocity > self.geo_velocity_threshold:
                        return min(velocity / 10, 40)  # Cap at 40 points
            
            return 0
            
        except Exception as e:
            logger.error(f"Geographic check error for user {user_id}: {str(e)}")
            return 0

    def _calculate_distance(self, lat1: float, lon1: float, lat2: float, lon2: float) -> float:
        """Calculate distance between two points using Haversine formula"""
        from math import radians, cos, sin, asin, sqrt
        
        lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
        dlat = lat2 - lat1
        dlon = lon2 - lon1
        a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
        return 2 * asin(sqrt(a)) * 6371  # Earth radius in km

    async def _check_pattern_fraud(self, user_id: str, transaction: Dict) -> float:
        """Check for suspicious transaction patterns"""
        try:
            # Check for round-number bias (a common fraud indicator) before
            # acquiring a database connection, since no query is needed
            amount = float(transaction['amount'])
            if amount % 100 == 0 and amount >= 500:
                return 15

            connection = self.connection_pool.acquire()
            cursor = connection.cursor()

            # Check for repeated exact amounts over the last 7 days
            cursor.execute("""
                SELECT COUNT(*) 
                FROM transactions 
                WHERE user_id = :user_id 
                AND amount = :amount
                AND transaction_time > SYSDATE - 7
            """, {
                'user_id': user_id,
                'amount': amount
            })
            
            repeat_count = cursor.fetchone()[0]
            cursor.close()
            self.connection_pool.release(connection)
            
            if repeat_count >= 3:
                return min(repeat_count * 5, 25)
            
            return 0
            
        except Exception as e:
            logger.error(f"Pattern check error for user {user_id}: {str(e)}")
            return 0

    async def _store_analysis_result(self, result: Dict):
        """Store fraud analysis result in database"""
        try:
            connection = self.connection_pool.acquire()
            cursor = connection.cursor()
            
            cursor.execute("""
                INSERT INTO fraud_analysis 
                (transaction_id, user_id, fraud_score, fraud_indicators, 
                 status, processed_at, created_at)
                VALUES (:transaction_id, :user_id, :fraud_score, 
                        :fraud_indicators, :status, :processed_at, SYSDATE)
            """, {
                'transaction_id': result['transaction_id'],
                'user_id': result['user_id'],
                'fraud_score': result['fraud_score'],
                'fraud_indicators': ','.join(result['fraud_indicators']),
                'status': result['status'],
                'processed_at': result['processed_at']
            })
            
            connection.commit()
            cursor.close()
            self.connection_pool.release(connection)
            
        except Exception as e:
            logger.error(f"Failed to store analysis result: {str(e)}")

    async def _trigger_fraud_alert(self, result: Dict, transaction: Dict):
        """Trigger fraud alert through OCI Notifications"""
        try:
            message = {
                'alert_type': 'fraud_detection',
                'transaction_id': result['transaction_id'],
                'user_id': result['user_id'],
                'fraud_score': result['fraud_score'],
                'status': result['status'],
                'amount': transaction['amount'],
                'indicators': result['fraud_indicators'],
                'timestamp': result['processed_at']
            }
            
            # Publish to ONS topic
            self.ons_client.publish_message(
                topic_id=os.environ['NOTIFICATION_TOPIC_OCID'],
                message_details=oci.ons.models.MessageDetails(
                    body=json.dumps(message),
                    title=f"Fraud Alert - {result['status'].upper()}"
                )
            )
            
            logger.info(f"Fraud alert sent for transaction {result['transaction_id']}")
            
            # Send custom metrics
            await self._send_metrics(result)
            
        except Exception as e:
            logger.error(f"Failed to send fraud alert: {str(e)}")

    async def _send_metrics(self, result: Dict):
        """Send custom metrics to OCI Monitoring"""
        try:
            metric_data = [
                oci.monitoring.models.MetricDataDetails(
                    namespace="fraud_detection",
                    compartment_id=os.environ['COMPARTMENT_OCID'],
                    name="fraud_score",
                    dimensions={'status': result['status']},
                    datapoints=[
                        oci.monitoring.models.Datapoint(
                            timestamp=datetime.utcnow(),
                            value=result['fraud_score']
                        )
                    ]
                )
            ]
            
            self.monitoring_client.post_metric_data(
                post_metric_data_details=oci.monitoring.models.PostMetricDataDetails(
                    metric_data=metric_data
                )
            )
            
        except Exception as e:
            logger.error(f"Failed to send metrics: {str(e)}")

# Function handler
def handler(ctx, data: io.BytesIO = None):
    """Main function handler for OCI Functions"""
    processor = FraudDetectionProcessor()
    
    try:
        # Parse streaming data
        streaming_data = json.loads(data.getvalue())
        transactions = []
        
        # Extract transactions from stream messages
        for message in streaming_data.get('messages', []):
            transaction_data = json.loads(message['value'])
            transactions.append(transaction_data)
        
        # Process transactions on a fresh event loop
        results = asyncio.run(
            processor.process_transaction_batch(transactions)
        )
        
        # Prepare response
        response = {
            'processed_count': len(results),
            'results': results,
            'processing_time': datetime.utcnow().isoformat()
        }
        
        logger.info(f"Processed {len(results)} transactions")
        return response
        
    except Exception as e:
        logger.error(f"Function execution error: {str(e)}")
        return {
            'error': str(e),
            'timestamp': datetime.utcnow().isoformat()
        }

Deployment and Configuration Scripts

#!/bin/bash
# deploy.sh - Automated deployment script

set -e

# Configuration
APP_NAME="fraud-detection-app"
FUNCTION_NAME="fraud-processor"
COMPARTMENT_OCID="your-compartment-ocid"
STREAM_OCID="your-stream-ocid"
STREAM_ENDPOINT="your-stream-messages-endpoint"
NOTIFICATION_TOPIC_OCID="your-notification-topic-ocid"

echo "Deploying fraud detection function..."

# Create application if it doesn't exist
fn create app $APP_NAME --annotation oracle.com/oci/subnetIds='["subnet-ocid"]' || true

# Configure environment variables (in production, prefer OCI Vault for DB credentials)
fn config app $APP_NAME STREAM_OCID $STREAM_OCID
fn config app $APP_NAME STREAM_ENDPOINT $STREAM_ENDPOINT
fn config app $APP_NAME NOTIFICATION_TOPIC_OCID $NOTIFICATION_TOPIC_OCID
fn config app $APP_NAME COMPARTMENT_OCID $COMPARTMENT_OCID
fn config app $APP_NAME DB_USER $DB_USER
fn config app $APP_NAME DB_PASSWORD $DB_PASSWORD
fn config app $APP_NAME DB_CONNECTION_STRING $DB_CONNECTION_STRING

# Deploy function
fn deploy --app $APP_NAME --no-bump

# Create trigger for streaming
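# NOTE: the "oci fn trigger" command below is illustrative; in current OCI
# tooling the stream-to-function connection is usually created through
# Service Connector Hub ("oci sch service-connector create") or the console.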
echo "Creating streaming trigger..."
oci fn trigger create \
    --display-name "fraud-detection-trigger" \
    --function-id $(fn inspect fn $APP_NAME $FUNCTION_NAME | jq -r '.id') \
    --type streaming \
    --source-details '{"streamId":"'$STREAM_OCID'","batchSizeInKbs":64,"batchTimeInSeconds":5}'

echo "Deployment completed successfully!"

Monitoring and Observability

Production serverless architectures require comprehensive monitoring and observability. OCI Functions integrates with multiple observability services to provide complete visibility into function performance and business metrics.

Function-level metrics automatically track invocation count, duration, errors, and memory utilization. These metrics feed into custom dashboards for operational monitoring and capacity planning.

Distributed tracing capabilities track request flows across multiple functions and services, essential for debugging complex event-driven workflows. Integration with OCI Application Performance Monitoring provides detailed transaction traces with performance bottleneck identification.

Custom business metrics can be published using the OCI Monitoring service, enabling tracking of domain-specific KPIs like fraud detection rates, processing latency, and accuracy metrics.

Error Handling and Resilience Patterns

Enterprise-grade serverless applications require robust error handling and resilience patterns. Dead letter queues capture failed events for later processing or manual investigation. Circuit breaker patterns prevent cascade failures by temporarily disabling failing downstream services.
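A circuit breaker can be as small as a failure counter and a timestamp; a minimal sketch, with illustrative thresholds:

import time

# A minimal circuit-breaker sketch: after failure_threshold consecutive
# failures the breaker opens and calls fail fast until reset_timeout seconds
# pass, at which point one trial ("half-open") call is allowed through.
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: downstream call skipped")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result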

Exponential backoff with jitter implements reliable retry mechanisms for transient failures. The implementation includes configurable retry limits and backoff multipliers to balance between quick recovery and system stability.
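A minimal sketch of that retry policy using full jitter; the limits shown are examples, not recommendations:

import random
import time

# Retry func up to max_retries times, sleeping a random amount up to a
# capped exponential delay between attempts ("full jitter").
def retry_with_backoff(func, max_retries=5, base_delay=0.5, max_delay=30.0):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))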

Bulkhead isolation separates different workload types using separate function applications and resource pools, preventing resource contention between critical and non-critical workloads.
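Within a single runtime, the same idea can be approximated with bounded concurrency per workload class; a small sketch using asyncio semaphores, with illustrative pool sizes:

import asyncio

# Separate semaphores bound the concurrency of each workload class so a
# burst of background work cannot starve critical processing.
critical_pool = asyncio.Semaphore(8)     # illustrative size
background_pool = asyncio.Semaphore(2)   # illustrative size

async def run_bulkheaded(pool: asyncio.Semaphore, coro):
    # usage: await run_bulkheaded(critical_pool, analyze_transaction(tx)),
    # where analyze_transaction is any coroutine of your own
    async with pool:
        return await coro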

This comprehensive approach to OCI Functions enables building production-grade, event-driven architectures that can handle enterprise-scale workloads with high reliability and performance requirements.

Advanced OCI Container Engine (OKE) with Network Security and Observability

Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE) provides enterprise-grade Kubernetes clusters with deep integration into OCI’s native services. This comprehensive guide explores advanced OKE configurations, focusing on network security policies, observability integration, and automated deployment strategies that enterprise teams need for production workloads.

OKE Architecture Deep Dive

OKE operates on a managed control plane architecture where Oracle handles the Kubernetes master nodes, etcd, and API server components. This design eliminates operational overhead while providing high availability across multiple availability domains.

The service integrates seamlessly with OCI’s networking fabric, allowing granular control over pod-to-pod communication, ingress traffic management, and service mesh implementations. Unlike managed Kubernetes services from other providers, OKE provides native integration with Oracle’s enterprise security stack, including Identity and Access Management (IAM), Key Management Service (KMS), and Web Application Firewall (WAF).

Worker nodes run on OCI Compute instances, providing flexibility in choosing instance shapes, including bare metal, GPU-enabled, and ARM-based Ampere processors. The networking layer supports both flannel and OCI VCN-native pod networking, enabling direct integration with existing network security policies.

Advanced Networking Configuration

OKE’s network architecture supports multiple pod networking modes. The VCN-native pod networking mode assigns each pod an IP address from your VCN’s CIDR range, enabling direct application of network security lists and route tables to pod traffic.

This approach provides several advantages over traditional overlay networking. Security policies become more granular since you can apply network security lists directly to pod traffic. Network troubleshooting becomes simpler as pod traffic flows through standard OCI networking constructs. Integration with existing network monitoring tools works seamlessly since pod traffic appears as regular VCN traffic.

Load balancing integrates deeply with OCI’s Load Balancing service, supporting both Layer 4 and Layer 7 load balancing with SSL termination, session persistence, and health checking capabilities.

Production-Ready Implementation Example

Here’s a comprehensive example that demonstrates deploying a highly available OKE cluster with advanced security and monitoring configurations:

Terraform Configuration for OKE Cluster

# OKE Cluster with Enhanced Security
resource "oci_containerengine_cluster" "production_cluster" {
  compartment_id     = var.compartment_id
  kubernetes_version = var.kubernetes_version
  name              = "production-oke-cluster"
  vcn_id            = oci_core_vcn.oke_vcn.id

  endpoint_config {
    is_public_ip_enabled = false
    subnet_id           = oci_core_subnet.oke_api_subnet.id
    nsg_ids             = [oci_core_network_security_group.oke_api_nsg.id]
  }

  cluster_pod_network_options {
    cni_type = "OCI_VCN_IP_NATIVE"
  }

  options {
    service_lb_subnet_ids = [oci_core_subnet.oke_lb_subnet.id]
    
    kubernetes_network_config {
      pods_cidr     = "10.244.0.0/16"
      services_cidr = "10.96.0.0/16"
    }

    add_ons {
      is_kubernetes_dashboard_enabled = false
      is_tiller_enabled              = false
    }

    admission_controller_options {
      is_pod_security_policy_enabled = true
    }
  }

  kms_key_id = oci_kms_key.oke_encryption_key.id
}

# Node Pool with Mixed Instance Types
resource "oci_containerengine_node_pool" "production_node_pool" {
  cluster_id         = oci_containerengine_cluster.production_cluster.id
  compartment_id     = var.compartment_id
  kubernetes_version = var.kubernetes_version
  name              = "production-workers"

  node_config_details {
    placement_configs {
      availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
      subnet_id          = oci_core_subnet.oke_worker_subnet.id
    }
    placement_configs {
      availability_domain = data.oci_identity_availability_domains.ads.availability_domains[1].name
      subnet_id          = oci_core_subnet.oke_worker_subnet.id
    }
    
    size                    = 3
    nsg_ids                = [oci_core_network_security_group.oke_worker_nsg.id]
    is_pv_encryption_in_transit_enabled = true
  }

  node_shape = "VM.Standard.E4.Flex"
  
  node_shape_config {
    ocpus         = 2
    memory_in_gbs = 16
  }

  node_source_details {
    image_id                = data.oci_containerengine_node_pool_option.oke_node_pool_option.sources[0].image_id
    source_type            = "IMAGE"
    boot_volume_size_in_gbs = 100
  }

  initial_node_labels {
    key   = "environment"
    value = "production"
  }

  ssh_public_key = var.ssh_public_key
}

# Network Security Group for API Server
resource "oci_core_network_security_group" "oke_api_nsg" {
  compartment_id = var.compartment_id
  vcn_id        = oci_core_vcn.oke_vcn.id
  display_name  = "oke-api-nsg"
}

resource "oci_core_network_security_group_security_rule" "oke_api_ingress" {
  network_security_group_id = oci_core_network_security_group.oke_api_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6"
  source                   = "10.0.0.0/16"
  source_type              = "CIDR_BLOCK"
  
  tcp_options {
    destination_port_range {
      max = 6443
      min = 6443
    }
  }
}

# Network Security Group for Worker Nodes
resource "oci_core_network_security_group" "oke_worker_nsg" {
  compartment_id = var.compartment_id
  vcn_id        = oci_core_vcn.oke_vcn.id
  display_name  = "oke-worker-nsg"
}

# Allow pod-to-pod communication
resource "oci_core_network_security_group_security_rule" "worker_pod_communication" {
  network_security_group_id = oci_core_network_security_group.oke_worker_nsg.id
  direction                 = "INGRESS"
  protocol                  = "all"
  source                   = oci_core_network_security_group.oke_worker_nsg.id
  source_type              = "NETWORK_SECURITY_GROUP"
}

# KMS Key for Cluster Encryption
resource "oci_kms_key" "oke_encryption_key" {
  compartment_id = var.compartment_id
  display_name   = "oke-cluster-encryption-key"
  
  key_shape {
    algorithm = "AES"
    length    = 256
  }
  
  management_endpoint = oci_kms_vault.oke_vault.management_endpoint
}

resource "oci_kms_vault" "oke_vault" {
  compartment_id = var.compartment_id
  display_name   = "oke-vault"
  vault_type     = "DEFAULT"
}

Kubernetes Manifests with Network Policies

# Network Policy for Application Isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: webapp-network-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: webapp
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
        - podSelector:
            matchLabels:
              app: webapp-frontend
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to: []
      ports:
        - protocol: TCP
          port: 443
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53

---
# Pod Security Policy
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'

---
# Deployment with Security Context
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-webapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
      containers:
        - name: webapp
          image: nginx:1.21-alpine
          ports:
            - containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 250m
              memory: 256Mi
          volumeMounts:
            - name: tmp-volume
              mountPath: /tmp
            - name: cache-volume
              mountPath: /var/cache/nginx
      volumes:
        - name: tmp-volume
          emptyDir: {}
        - name: cache-volume
          emptyDir: {}

Monitoring and Observability Integration

OKE integrates natively with OCI Monitoring, Logging, and Logging Analytics services. This integration provides comprehensive observability without requiring additional third-party tools or complex configurations.

The monitoring integration automatically collects cluster-level metrics including CPU utilization, memory consumption, network throughput, and storage IOPS across all worker nodes. Custom metrics can be published using the OCI Monitoring SDK, enabling application-specific dashboards and alerting rules.

Logging integration captures both system logs from Kubernetes components and application logs from pods. The unified logging agent automatically forwards logs to OCI Logging service, where they can be searched, filtered, and analyzed using structured queries.

Security Best Practices Implementation

Enterprise OKE deployments require multiple layers of security controls. Network-level security starts with proper subnet segmentation, placing API servers in private subnets accessible only through bastion hosts or VPN connections.

Pod Security Policies enforce runtime security constraints, preventing privileged containers and restricting volume types; note that PodSecurityPolicy is deprecated in recent Kubernetes releases in favor of the built-in Pod Security admission controller, which newer clusters should use instead. Network policies provide microsegmentation within the cluster, controlling pod-to-pod communication based on labels and namespaces.

Image security scanning integrates with OCI Container Registry’s vulnerability scanning capabilities, automatically checking container images for known vulnerabilities before deployment.

Automated CI/CD Integration

OKE clusters integrate seamlessly with OCI DevOps service for automated application deployment pipelines. The integration supports GitOps workflows, blue-green deployments, and automated rollback mechanisms.

Pipeline configurations can reference OCI Vault secrets for secure credential management, ensuring sensitive information never appears in deployment manifests or pipeline configurations.

Performance Optimization Strategies

Production OKE deployments benefit from several performance optimization techniques. Node pool configurations should match application requirements, using compute-optimized instances for CPU-intensive workloads and memory-optimized instances for data processing applications.

Pod disruption budgets ensure application availability during cluster maintenance operations. Horizontal Pod Autoscaling automatically adjusts replica counts based on CPU or memory utilization, while Cluster Autoscaling adds or removes worker nodes based on resource demands.

This comprehensive approach to OKE deployment provides enterprise-grade container orchestration with robust security, monitoring, and automation capabilities, enabling organizations to run production workloads confidently in Oracle Cloud Infrastructure.

Delete All VCNs in OCI Using a Bash Script

The script below lists all VCNs in a compartment and deletes them together with the resources attached to them, scoped by COMPARTMENT_OCID.

Note: I wrote these scripts to perform the tasks listed below; they can be updated and expanded to fit your needs. Feel free to do so, and please credit the source.

Complete Resource Deletion Chain: the script handles deletion in the proper order:

  • Compute instances first
  • Clean route tables and security lists
  • Load balancers
  • Gateways (NAT, Internet, Service, DRG attachments)
  • Subnets
  • Custom security lists, route tables, and DHCP options
  • Finally, the VCN itself
#!/bin/bash

# ✅ Set this to the target compartment OCID
COMPARTMENT_OCID="Set Your OCID Here"

# (Optional) Force region
export OCI_CLI_REGION=me-jeddah-1

echo "📍 Region: $OCI_CLI_REGION"
echo "📦 Compartment: $COMPARTMENT_OCID"
echo "⚠️  WARNING: This will delete ALL VCNs and related resources in the compartment!"
echo "Press Ctrl+C within 10 seconds to cancel..."
sleep 10

# Function to wait for resource deletion
wait_for_deletion() {
    local resource_id=$1
    local resource_type=$2
    local max_attempts=30
    local attempt=1
    
    echo "    ⏳ Waiting for $resource_type deletion..."
    while [ $attempt -le $max_attempts ]; do
        # Most "oci network <type> get" commands take --<type>-id; note that
        # route-table and dhcp-options use the short flags --rt-id / --dhcp-id
        # and would need special-casing here.
        if ! oci network $resource_type get --${resource_type}-id "$resource_id" &>/dev/null; then
            echo "    ✅ $resource_type deleted successfully"
            return 0
        fi
        sleep 10
        ((attempt++))
    done
    echo "    ⚠️  Timeout waiting for $resource_type deletion"
    return 1
}

# Function to check if resource is default
is_default_resource() {
    local resource_id=$1
    local resource_type=$2
    
    case $resource_type in
        "security-list")
            result=$(oci network security-list get --security-list-id "$resource_id" --query "data.\"display-name\"" --raw-output 2>/dev/null)
            [[ "$result" == "Default Security List"* ]]
            ;;
        "route-table")
            result=$(oci network route-table get --rt-id "$resource_id" --query "data.\"display-name\"" --raw-output 2>/dev/null)
            [[ "$result" == "Default Route Table"* ]]
            ;;
        "dhcp-options")
            result=$(oci network dhcp-options get --dhcp-id "$resource_id" --query "data.\"display-name\"" --raw-output 2>/dev/null)
            [[ "$result" == "Default DHCP Options"* ]]
            ;;
        *)
            false
            ;;
    esac
}

# Function to clean all route tables in a VCN
clean_all_route_tables() {
    local VCN_ID=$1
    echo "  🧹 Cleaning all route tables..."
    
    local RT_IDS=$(oci network route-table list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for RT_ID in $RT_IDS; do
        if [ -n "$RT_ID" ]; then
            echo "    🔧 Clearing routes in route table: $RT_ID"
            oci network route-table update --rt-id "$RT_ID" --route-rules '[]' --force &>/dev/null || true
        fi
    done
    
    # Wait a bit for route updates to propagate
    sleep 5
}

# Function to clean all security lists in a VCN
clean_all_security_lists() {
    local VCN_ID=$1
    echo "  🧹 Cleaning all security lists..."
    
    local SL_IDS=$(oci network security-list list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for SL_ID in $SL_IDS; do
        if [ -n "$SL_ID" ]; then
            echo "    🔧 Clearing rules in security list: $SL_ID"
            oci network security-list update \
                --security-list-id "$SL_ID" \
                --egress-security-rules '[]' \
                --ingress-security-rules '[]' \
                --force &>/dev/null || true
        fi
    done
    
    # Wait a bit for security list updates to propagate
    sleep 5
}

# Function to delete compute instances in subnets
delete_compute_instances() {
    local VCN_ID=$1
    echo "  🖥️  Checking for compute instances..."
    
    local INSTANCES=$(oci compute instance list \
        --compartment-id "$COMPARTMENT_OCID" \
        --query "data[?\"lifecycle-state\" != 'TERMINATED'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for INSTANCE_ID in $INSTANCES; do
        if [ -n "$INSTANCE_ID" ]; then
            # Check if instance is in this VCN
            local INSTANCE_VCN=$(oci compute instance list-vnics \
                --instance-id "$INSTANCE_ID" \
                --query "data[0].\"vcn-id\"" \
                --raw-output 2>/dev/null)
            
            if [[ "$INSTANCE_VCN" == "$VCN_ID" ]]; then
                echo "    🔻 Terminating compute instance: $INSTANCE_ID"
                oci compute instance terminate --instance-id "$INSTANCE_ID" --force &>/dev/null || true
            fi
        fi
    done
}

# Main cleanup function for a single VCN
cleanup_vcn() {
    local VCN_ID=$1
    echo -e "\n🧹 Cleaning resources for VCN: $VCN_ID"
    
    # Step 1: Delete compute instances first
    delete_compute_instances "$VCN_ID"
    
    # Step 2: Clean route tables and security lists
    clean_all_route_tables "$VCN_ID"
    clean_all_security_lists "$VCN_ID"
    
    # Step 3: Delete Load Balancers
    echo "  🔻 Deleting load balancers..."
    local LBS=$(oci lb load-balancer list \
        --compartment-id "$COMPARTMENT_OCID" \
        --query "data[?\"lifecycle-state\" == 'ACTIVE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for LB_ID in $LBS; do
        if [ -n "$LB_ID" ]; then
            echo "    🔻 Deleting Load Balancer: $LB_ID"
            oci lb load-balancer delete --load-balancer-id "$LB_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 4: Delete NAT Gateways
    echo "  🔻 Deleting NAT gateways..."
    local NAT_GWS=$(oci network nat-gateway list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for NAT_ID in $NAT_GWS; do
        if [ -n "$NAT_ID" ]; then
            echo "    🔻 Deleting NAT Gateway: $NAT_ID"
            oci network nat-gateway delete --nat-gateway-id "$NAT_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 5: Delete DRG Attachments
    echo "  🔻 Deleting DRG attachments..."
    local DRG_ATTACHMENTS=$(oci network drg-attachment list \
        --compartment-id "$COMPARTMENT_OCID" \
        --query "data[?\"vcn-id\" == '$VCN_ID' && \"lifecycle-state\" == 'ATTACHED'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for DRG_ATTACHMENT_ID in $DRG_ATTACHMENTS; do
        if [ -n "$DRG_ATTACHMENT_ID" ]; then
            echo "    🔻 Deleting DRG Attachment: $DRG_ATTACHMENT_ID"
            oci network drg-attachment delete --drg-attachment-id "$DRG_ATTACHMENT_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 6: Delete Internet Gateways
    echo "  🔻 Deleting internet gateways..."
    local IGWS=$(oci network internet-gateway list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for IGW_ID in $IGWS; do
        if [ -n "$IGW_ID" ]; then
            echo "    🔻 Deleting Internet Gateway: $IGW_ID"
            oci network internet-gateway delete --ig-id "$IGW_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 7: Delete Service Gateways
    echo "  🔻 Deleting service gateways..."
    local SGWS=$(oci network service-gateway list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for SGW_ID in $SGWS; do
        if [ -n "$SGW_ID" ]; then
            echo "    🔻 Deleting Service Gateway: $SGW_ID"
            oci network service-gateway delete --service-gateway-id "$SGW_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 8: Wait for gateways to be deleted
    echo "  ⏳ Waiting for gateways to be deleted..."
    sleep 30
    
    # Step 9: Delete Subnets
    echo "  🔻 Deleting subnets..."
    local SUBNETS=$(oci network subnet list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for SUBNET_ID in $SUBNETS; do
        if [ -n "$SUBNET_ID" ]; then
            echo "    🔻 Deleting Subnet: $SUBNET_ID"
            oci network subnet delete --subnet-id "$SUBNET_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 10: Wait for subnets to be deleted
    echo "  ⏳ Waiting for subnets to be deleted..."
    sleep 30
    
    # Step 11: Delete non-default Security Lists
    echo "  🔻 Deleting custom security lists..."
    local SL_IDS=$(oci network security-list list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for SL_ID in $SL_IDS; do
        if [ -n "$SL_ID" ] && ! is_default_resource "$SL_ID" "security-list"; then
            echo "    🔻 Deleting Security List: $SL_ID"
            oci network security-list delete --security-list-id "$SL_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 12: Delete non-default Route Tables
    echo "  🔻 Deleting custom route tables..."
    local RT_IDS=$(oci network route-table list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for RT_ID in $RT_IDS; do
        if [ -n "$RT_ID" ] && ! is_default_resource "$RT_ID" "route-table"; then
            echo "    🔻 Deleting Route Table: $RT_ID"
            oci network route-table delete --rt-id "$RT_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 13: Delete non-default DHCP Options
    echo "  🔻 Deleting custom DHCP options..."
    local DHCP_IDS=$(oci network dhcp-options list \
        --compartment-id "$COMPARTMENT_OCID" \
        --vcn-id "$VCN_ID" \
        --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
        --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)
    
    for DHCP_ID in $DHCP_IDS; do
        if [ -n "$DHCP_ID" ] && ! is_default_resource "$DHCP_ID" "dhcp-options"; then
            echo "    🔻 Deleting DHCP Options: $DHCP_ID"
            oci network dhcp-options delete --dhcp-id "$DHCP_ID" --force &>/dev/null || true
        fi
    done
    
    # Step 14: Wait before attempting VCN deletion
    echo "  ⏳ Waiting for all resources to be cleaned up..."
    sleep 60
    
    # Step 15: Finally, delete the VCN
    echo "  🔻 Deleting VCN: $VCN_ID"
    local max_attempts=5
    local attempt=1
    
    while [ $attempt -le $max_attempts ]; do
        if oci network vcn delete --vcn-id "$VCN_ID" --force &>/dev/null; then
            echo "    ✅ VCN deletion initiated successfully"
            break
        else
            echo "    ⚠️  VCN deletion attempt $attempt failed, retrying in 30 seconds..."
            sleep 30
            ((attempt++))
        fi
    done
    
    if [ $attempt -gt $max_attempts ]; then
        echo "    ❌ Failed to delete VCN after $max_attempts attempts"
        echo "    💡 You may need to manually check for remaining dependencies"
    fi
}

# Main execution
echo -e "\n🚀 Starting VCN cleanup process..."

# Fetch all VCNs in the compartment
VCN_IDS=$(oci network vcn list \
    --compartment-id "$COMPARTMENT_OCID" \
    --query "data[?\"lifecycle-state\" == 'AVAILABLE'].id" \
    --raw-output 2>/dev/null | jq -r '.[]' 2>/dev/null)

if [ -z "$VCN_IDS" ]; then
    echo "📭 No VCNs found in compartment $COMPARTMENT_OCID"
    exit 0
fi

echo "📋 Found VCNs to delete:"
for VCN_ID in $VCN_IDS; do
    VCN_NAME=$(oci network vcn get --vcn-id "$VCN_ID" --query "data.\"display-name\"" --raw-output 2>/dev/null)
    echo "  - $VCN_NAME ($VCN_ID)"
done

# Process each VCN
for VCN_ID in $VCN_IDS; do
    if [ -n "$VCN_ID" ]; then
        cleanup_vcn "$VCN_ID"
    fi
done

echo -e "\n✅ Cleanup complete for compartment: $COMPARTMENT_OCID"
echo "🔍 You may want to verify in the OCI Console that all resources have been deleted."

Regards

Implementing OCI Logging Analytics for Proactive Incident Detection

Oracle Cloud Infrastructure (OCI) Logging Analytics is a powerful service that helps organizations aggregate, analyze, and act on log data from across their OCI resources. In this guide, we’ll walk through setting up Logging Analytics to detect and alert on suspicious activities, using Terraform for automation and a real-world example for context.

Step 1: Enable OCI Logging Analytics

  1. Navigate to the OCI Console:
    Go to Observability & Management > Logging Analytics.
  2. Create a Log Group:

oci logging-analytics log-group create \
  --compartment-id <your-compartment-ocid> \
  --display-name "Security-Logs" \
  --description "Logs for security monitoring"

Step 2: Ingest Logs from OCI Audit Service

Configure the OCI Audit service to forward logs to Logging Analytics:

Create a Service Connector:

resource "oci_sch_service_connector" "audit_to_la" {
  compartment_id = var.compartment_ocid
  display_name  = "Audit-to-Logging-Analytics"
  source {
    kind = "logging"
    log_sources {
      compartment_id = var.tenant_ocid
      log_group_id   = oci_logging_log_group.audit_logs.id
    }
  }
  target {
    kind = "loggingAnalytics"
    log_group_id = oci_logging_analytics_log_group.security_logs.id
  }
}

Step 3: Create Custom Detection Rules

Example: Detect repeated failed login attempts (brute-force attacks).

  1. Use OCI Query Language (OCIQL):
SELECT * 
FROM AuditLogs 
WHERE eventName = 'Login' AND action = 'FAIL' 
GROUP BY actorName 
HAVING COUNT(*) > 5
  2. Set Up Alerts:
    Configure an OCI Notification topic to trigger emails or PagerDuty alerts when the rule matches, as sketched below.
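A minimal sketch of publishing such an alert with the Python SDK; the topic OCID, rule name, and payload fields are illustrative:

import json

import oci

config = oci.config.from_file()
ons = oci.ons.NotificationDataPlaneClient(config)

def send_alert(topic_ocid: str, actor: str, failed_count: int):
    """Publish a brute-force alert to an OCI Notifications topic."""
    body = json.dumps({
        "rule": "brute-force-login",   # illustrative rule name
        "actor": actor,
        "failed_logins": failed_count,
    })
    ons.publish_message(
        topic_id=topic_ocid,
        message_details=oci.ons.models.MessageDetails(
            body=body,
            title="Brute-force login detected"),
    )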

Step 4: Visualize with Dashboards

Create a dashboard to monitor security events:

  • Metrics: Failed logins, API calls from unusual IPs.

Enjoy
Osama

Setting up a High-Availability (HA) Architecture with OCI Load Balancer and Compute Instances

Ensuring high availability (HA) for your applications is critical in today’s cloud-first environment. Oracle Cloud Infrastructure (OCI) provides robust tools such as Load Balancers and Compute Instances to help you create a resilient, highly available architecture for your applications. In this post, we’ll walk through the steps to set up an HA architecture using OCI Load Balancer with multiple compute instances across availability domains for fault tolerance.

Prerequisites

  • OCI Account: A working Oracle Cloud Infrastructure account.
  • OCI CLI: Installed and configured with necessary permissions.
  • Terraform: Installed and set up for provisioning infrastructure.
  • Basic knowledge of Load Balancers and Compute Instances in OCI.

Step 1: Set Up a Virtual Cloud Network (VCN)

A VCN is required to house your compute instances and load balancers. To begin, create a new VCN with subnets in different availability domains (ADs) for high availability.

Terraform Configuration (vcn.tf):

resource "oci_core_virtual_network" "vcn" {
  compartment_id = "<compartment_ocid>"
  cidr_block     = "10.0.0.0/16"
  display_name   = "HA-Virtual-Network"
}

resource "oci_core_subnet" "subnet1" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.1.0/24"
  availability_domain = "AD-1"
  display_name        = "HA-Subnet-AD1"
}

resource "oci_core_subnet" "subnet2" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.2.0/24"
  availability_domain = "AD-2"
  display_name        = "HA-Subnet-AD2"
}

Step 2: Provision Compute Instances

Create two compute instances (one in each subnet) to ensure redundancy.

Terraform Configuration (compute.tf):

resource "oci_core_instance" "instance1" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-1"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-1"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet1.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

resource "oci_core_instance" "instance2" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-2"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-2"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet2.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

Step 3: Set Up the OCI Load Balancer

Now, configure the OCI Load Balancer to distribute traffic between the compute instances in both availability domains.

Terraform Configuration (load_balancer.tf):

resource "oci_load_balancer_load_balancer" "ha_lb" {
  compartment_id = "<compartment_ocid>"
  display_name   = "HA-Load-Balancer"
  shape           = "100Mbps"

  subnet_ids = [
    oci_core_subnet.subnet1.id,
    oci_core_subnet.subnet2.id
  ]

  backend_sets {
    name = "backend-set-1"

    backends {
      ip_address = oci_core_instance.instance1.private_ip
      port = 80
    }

    backends {
      ip_address = oci_core_instance.instance2.private_ip
      port = 80
    }

    policy = "ROUND_ROBIN"
    health_checker {
      port = 80
      protocol = "HTTP"
      url_path = "/health"
      retries = 3
      timeout_in_seconds = 10
      interval_in_seconds = 5
    }
  }
}

resource "oci_load_balancer_listener" "ha_listener" {
  load_balancer_id = oci_load_balancer_load_balancer.ha_lb.id
  name = "http-listener"
  default_backend_set_name = "backend-set-1"
  port = 80
  protocol = "HTTP"
}

Step 4: Set Up Health Checks for High Availability

Health checks are critical to ensure that the load balancer sends traffic only to healthy instances. The health check configuration is included in the backend set definition above, but you can customize it as needed.

Step 5: Testing and Validation

Once all resources are provisioned, test the HA architecture:

  1. Verify Load Balancer Health: Ensure that the backend instances are marked as healthy by checking the load balancer’s health checks:

oci load-balancer backend-set get --load-balancer-id <load_balancer_id> --name backend-set-1
  2. Access the Application: Test accessing your application through the Load Balancer’s public IP. The Load Balancer should distribute traffic evenly across the two compute instances.
  3. Failover Testing: Manually shut down one of the instances to verify that the Load Balancer reroutes traffic to the remaining instance.

Automating Oracle Cloud Networking with OCI Service Gateway and Terraform

Oracle Cloud Infrastructure (OCI) offers a wide range of services that enable users to create secure, scalable cloud environments. One crucial aspect of a cloud deployment is ensuring secure connectivity between services without relying on public internet access. In this blog post, we’ll walk through how to set up and manage OCI Service Gateway for secure, private access to OCI services using Terraform. This step-by-step guide is intended for cloud engineers looking to leverage automation to create robust networking configurations in OCI.

Step 1: Setting up Your Environment

Before deploying the OCI Service Gateway and other networking components with Terraform, you need to set up a few prerequisites:

  1. Terraform Installation: Make sure Terraform is installed on your local machine. You can download it from Terraform’s official site.
  2. OCI CLI and API Key: Install the OCI CLI and set up your authentication key. The key must be configured in your OCI console.
  3. OCI Terraform Provider: You will also need to download the OCI Terraform provider by adding the following configuration to your provider.tf file:
provider "oci" {
  tenancy_ocid     = "<TENANCY_OCID>"
  user_ocid        = "<USER_OCID>"
  fingerprint      = "<FINGERPRINT>"
  private_key_path = "<PRIVATE_KEY_PATH>"
  region           = "us-ashburn-1"
}

Step 2: Defining the Infrastructure

The key to deploying the Service Gateway and related infrastructure is defining the resources in a main.tf file. Below is an example to create a VCN, subnets, and a Service Gateway:

resource "oci_core_vcn" "example_vcn" {
  cidr_block     = "10.0.0.0/16"
  compartment_id = "<COMPARTMENT_OCID>"
  display_name   = "example-vcn"
}

resource "oci_core_subnet" "example_subnet" {
  vcn_id             = oci_core_vcn.example_vcn.id
  compartment_id     = "<COMPARTMENT_OCID>"
  cidr_block         = "10.0.1.0/24"
  availability_domain = "<AVAILABILITY_DOMAIN>"
  display_name       = "example-subnet"
  prohibit_public_ip_on_vnic = true
}

resource "oci_core_service_gateway" "example_service_gateway" {
  vcn_id         = oci_core_vcn.example_vcn.id
  compartment_id = "<COMPARTMENT_OCID>"
  services {
    service_id = "all-oracle-services-in-region"
  }
  display_name  = "example-service-gateway"
}

resource "oci_core_route_table" "example_route_table" {
  vcn_id         = oci_core_vcn.example_vcn.id
  compartment_id = "<COMPARTMENT_OCID>"
  display_name   = "example-route-table"
  route_rules {
    destination       = "all-oracle-services-in-region"
    destination_type  = "SERVICE_CIDR_BLOCK"
    network_entity_id = oci_core_service_gateway.example_service_gateway.id
  }
}

Explanation:

  • oci_core_vcn: Defines the Virtual Cloud Network (VCN) where all resources will reside.
  • oci_core_subnet: Creates a subnet within the VCN to host compute instances or other resources.
  • oci_core_service_gateway: Configures a Service Gateway to allow private access to Oracle services such as Object Storage.
  • oci_core_route_table: Configures the route table to direct traffic through the Service Gateway for services within OCI.

Step 3: Variables for Reusability

To make the code reusable, it’s best to define variables in a variables.tf file:

variable "compartment_ocid" {
  description = "The OCID of the compartment to create resources in"
  type        = string
}

variable "availability_domain" {
  description = "The Availability Domain to launch resources in"
  type        = string
}

variable "vcn_cidr" {
  description = "The CIDR block for the VCN"
  type        = string
  default     = "10.0.0.0/16"
}

This allows you to easily modify parameters like compartment ID, availability domain, and VCN CIDR without touching the core logic.

Step 4: Running the Terraform Script

  1. Initialize Terraform: To start using Terraform with OCI, initialize your working directory. This downloads the necessary providers and prepares your environment:
terraform init
  2. Plan the Deployment: Before applying changes, always run the terraform plan command. This provides an overview of the resources that will be created:
terraform plan -var-file="config.tfvars"

Apply the Changes

Once you’re confident with the plan, apply it to create your Service Gateway and networking resources:

terraform apply -var-file="config.tfvars"

Step 5: Verification

After deployment, you can verify your resources via the OCI Console. Navigate to Networking > Virtual Cloud Networks to see your VCN, subnets, and the Service Gateway. You can also validate the route table settings to ensure that the traffic routes correctly to Oracle services.

Step 6: Destroy the Infrastructure

To clean up the resources and avoid any unwanted charges, you can use the terraform destroy command:

terraform destroy -var-file="config.tfvars"

Regards
Osama

Automating Block Volume Backups in Oracle Cloud Infrastructure (OCI) using CLI and Terraform

Block volumes are a core building block for compute workloads in OCI, and automated backups are essential for protecting against accidental deletion and data corruption. This post covers two methods for automating block volume backups: the OCI CLI and Terraform.

Automating Block Volume Backups using OCI CLI

Prerequisites:

  • Set up the OCI CLI on your machine.
  • Ensure that you have the right permissions to manage block volumes.

Step-by-step guide:

  • Command to create a block volume:
oci bv volume create --compartment-id <your_compartment_ocid> --availability-domain <your_ad> --display-name "MyVolume" --size-in-gbs 50

  • Take an on-demand backup of the block volume:

oci bv backup create --volume-id <your_volume_ocid> --display-name "MyVolumeBackup"

To automate the backups, schedule the same command with a cron job. The example below takes a backup every day at 2:00 AM and appends its output to a log file; make sure the user running the job has a valid OCI CLI configuration:
0 2 * * * /usr/local/bin/oci bv backup create --volume-id <your_volume_ocid> --display-name "ScheduledBackup" >> /var/log/oci_backup.log 2>&1

Automating Block Volume Backups using Terraform

Prerequisites

  1. OCI Credentials: Make sure you have the proper API keys and permissions configured in your OCI tenancy.
  2. Terraform Setup: Terraform should be installed and configured to interact with OCI, including the OCI provider setup in your environment.

Step 1: Define the OCI Block Volume Resource

First, define the block volume that you want to automate backups for. Here’s an example of a simple block volume resource in Terraform:

resource "oci_core_volume" "my_block_volume" {
  availability_domain = "your-availability-domain"
  compartment_id      = "ocid1.compartment.oc1..your-compartment-id"
  display_name        = "my_block_volume"
  size_in_gbs         = 50
}

Step 2: Define a Backup Policy

OCI provides predefined backup policies such as gold, silver, and bronze, which define how frequently backups are taken. You can create a custom backup policy as well, but for simplicity, we’ll use one of the predefined policies in this example. The Terraform resource oci_core_volume_backup_policy_assignment will assign a backup policy to the block volume.

Here’s an example to assign the gold backup policy to the block volume:

resource "oci_core_volume_backup_policy_assignment" "backup_assignment" {
  volume_id       = oci_core_volume.my_block_volume.id
  policy_id       = data.oci_core_volume_backup_policy.gold.id
}

data "oci_core_volume_backup_policy" "gold" {
  name = "gold"
}
Step 3: Custom Backup Policy (Optional)

If you need a custom backup policy rather than the predefined gold, silver, or bronze policies, you can define one with the oci_core_volume_backup_policy resource and OCI’s native scheduling. Each schedules block combines a backup type, a frequency, and a retention period, and a single policy can carry several schedules:

resource "oci_core_volume_backup_policy" "custom_backup_policy" {
  compartment_id = "ocid1.compartment.oc1..your-compartment-id"
  display_name   = "CustomBackupPolicy"

  schedules {
    backup_type = "INCREMENTAL"
    period      = "ONE_DAY"
    retention_duration = "THIRTY_DAYS"
  }

  schedules {
    backup_type = "FULL"
    period      = "ONE_WEEK"
    retention_duration = "NINETY_DAYS"
  }
}

You can then assign this policy to the block volume using the same method as earlier.
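
As a sketch, the assignment reuses the same resource type as Step 2, pointed at the custom policy:

resource "oci_core_volume_backup_policy_assignment" "custom_assignment" {
  volume_id = oci_core_volume.my_block_volume.id
  policy_id = oci_core_volume_backup_policy.custom_backup_policy.id
}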

Step 4: Apply the Terraform Configuration

Once your Terraform configuration is ready, apply it using the standard Terraform workflow:

  1. Initialize Terraform:
terraform init

  2. Plan the Terraform deployment:
terraform plan

  3. Apply the Terraform plan:
terraform apply

This process will automatically provision your block volumes and assign the specified backup policy.



Regards
Osama

Implementing Zero Trust Architecture in OCI

In this blog, we will explore how to implement a Zero Trust architecture in Oracle Cloud Infrastructure (OCI). We’ll cover key principles of Zero Trust, configuring identity and access management, securing the network, and monitoring for continuous security assurance.

Introduction to Zero Trust Architecture

  • Overview of Zero Trust principles: “Never trust, always verify.”
  • Importance of Zero Trust in modern cloud environments.
  • Key components: Identity, network, data, device, and workloads.

Identity and Access Management (IAM) in OCI

Configuring IAM for Zero Trust

  1. Set Up OCI IAM:
    • Use OCI IAM to manage identities and enforce strict authentication.
    • Configure Multi-Factor Authentication (MFA) for all users.
  2. Conditional Access Policies:
    • Implement policies that require additional verification for high-risk actions.

Securing OCI Network with Micro-Segmentation

Implementing Micro-Segmentation

  1. VCN and Subnet Segmentation:
    • Create Virtual Cloud Networks (VCNs) and segment your network by function, sensitivity, and environment.
  2. Network Security Groups (NSGs):
    • Apply Network Security Groups to enforce micro-segmentation policies within your VCN, as sketched below.
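
A minimal Terraform sketch, assuming the example VCN from earlier and an app/database tier split (the tier names and port 1521 are illustrative):

resource "oci_core_network_security_group" "app_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.example_vcn.id
  display_name   = "app-tier-nsg"
}

resource "oci_core_network_security_group" "db_nsg" {
  compartment_id = var.compartment_ocid
  vcn_id         = oci_core_vcn.example_vcn.id
  display_name   = "db-tier-nsg"
}

# Only the app tier may reach the database listener; no CIDR-wide rules
resource "oci_core_network_security_group_security_rule" "db_ingress_from_app" {
  network_security_group_id = oci_core_network_security_group.db_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6" # TCP
  source_type               = "NETWORK_SECURITY_GROUP"
  source                    = oci_core_network_security_group.app_nsg.id
  tcp_options {
    destination_port_range {
      min = 1521
      max = 1521
    }
  }
}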

Implementing Least Privilege Access

Access Control Policies

  1. Define Fine-Grained IAM Policies:
    • Use OCI IAM to define least privilege policies that restrict user and service access based on specific needs.
  2. Role-Based Access Control (RBAC):
    • Implement RBAC to ensure users have only the permissions necessary for their roles (see the policy sketch after this list).
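
A minimal sketch of such a policy in Terraform; the group name, compartment name, and the MFA condition are illustrative assumptions to adapt:

resource "oci_identity_policy" "least_privilege" {
  compartment_id = var.tenancy_ocid # policies can attach to the tenancy or a compartment
  name           = "app-operators-least-privilege"
  description    = "Read-only by default; instance management only with MFA"
  statements = [
    "Allow group AppOperators to read all-resources in compartment app-compartment",
    "Allow group AppOperators to manage instance-family in compartment app-compartment where request.user.mfaTotpVerified = 'true'",
  ]
}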

Continuous Monitoring and Threat Detection

Monitoring with Oracle Cloud Guard

  1. Enable Cloud Guard:
    • Use Oracle Cloud Guard to monitor and automatically respond to potential security threats (a minimal enablement sketch follows this list).
  2. Logging and Auditing:
    • Enable OCI Logging and Audit services to keep track of all access and configuration changes.
  3. Integrate with SIEM:
    • Integrate with Security Information and Event Management (SIEM) tools for comprehensive threat detection and incident response.
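
As a minimal sketch, Cloud Guard can be switched on with Terraform; the tenancy variable and reporting region are placeholders to adapt:

resource "oci_cloud_guard_cloud_guard_configuration" "enable_cloud_guard" {
  compartment_id   = var.tenancy_ocid # Cloud Guard is configured at the tenancy level
  reporting_region = "us-ashburn-1"   # placeholder: use your tenancy's home region
  status           = "ENABLED"
}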

Integrating with Third-Party Security Tools

Using External Security Services

  1. Integrate Third-Party Identity Providers:
    • Use OCI’s integration capabilities to bring in third-party identity providers like Okta or Azure AD.
  2. Connect with External Threat Detection Services:
    • Utilize third-party threat detection tools for enhanced monitoring and incident response.

Regards
Osama

Enhancing Security with OCI Vault for Secrets Management

This blog will explore how to enhance the security of your Oracle Cloud Infrastructure (OCI) applications by implementing OCI Vault for secrets management. We’ll cover setting up OCI Vault, managing secrets and encryption keys, integrating with other OCI services, and best practices for secure secrets management.

Setting Up OCI Vault

Create a Vault:

  • Navigate to the OCI Console, go to “Security,” and select “Vault.”
  • Click “Create Vault,” name it, choose the compartment, and select the type (Virtual Private Vault for added security).
  • Define backup and key rotation policies.

Create Keys:

  • Within the Vault, select “Create Key.”
  • Define the key’s attributes (e.g., name, algorithm, key length) and create the key.

Create Secrets:

  • Navigate to the “Secrets” section within the Vault.
  • Click “Create Secret,” provide a name, and input the secret data (e.g., API keys, passwords).
  • Choose the encryption key you created earlier for securing the secret.

Managing Secrets and Encryption Keys

Rotating Keys and Secrets

  • Configure automatic rotation for encryption keys and secrets based on your organization’s security policies.
  • Rotate secrets manually as needed via the OCI Console or CLI.

Access Controls

  • Use OCI Identity and Access Management (IAM) to define who can access the Vault, keys, and secrets.
  • Implement fine-grained permissions to control access to specific secrets or keys, as sketched below.
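
For example, a sketch of a read-only policy in Terraform (the group and compartment names are placeholders):

resource "oci_identity_policy" "secret_readers" {
  compartment_id = var.compartment_id
  name           = "secret-readers"
  description    = "Applications may read secrets but cannot manage keys or the vault"
  statements = [
    "Allow group AppServers to read secret-family in compartment security-compartment",
  ]
}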

Integrating OCI Vault with Other Services

OCI Compute

  • Securely inject secrets into OCI Compute instances at runtime using OCI Vault.
  • Example: Retrieve a database password from OCI Vault within a Compute instance using an SDK or CLI.

OCI Kubernetes (OKE)

  • Integrate OCI Vault with OKE for managing secrets in containerized applications.
  • Example: Use a sidecar container to fetch secrets from OCI Vault and inject them into application pods.

Automating Secrets Management

Using Terraform

  • Automate the creation and management of OCI Vault, keys, and secrets using Terraform.
  • Example Terraform snippet for creating a secret:
resource "oci_kms_vault" "example_vault" {
  compartment_id = var.compartment_id
  display_name   = "example_vault"
  vault_type     = "DEFAULT"
}

resource "oci_kms_key" "example_key" {
  management_endpoint = oci_kms_vault.example_vault.management_endpoint
  key_shape {
    algorithm = "AES"
    length    = 256
  }
}

resource "oci_secrets_secret" "example_secret" {
  compartment_id = var.compartment_id
  vault_id       = oci_kms_vault.example_vault.id
  key_id         = oci_kms_key.example_key.id
  secret_content {
    content = base64encode("super_secret_value")
  }
}

Using OCI SDKs

  • Programmatically manage secrets with OCI SDKs in languages like Python, Java, or Go.
  • Example: Retrieve a secret in Python:
import base64

import oci

config = oci.config.from_file("~/.oci/config", "DEFAULT")
secrets_client = oci.secrets.SecretsClient(config)
secret_id = "<your_secret_ocid>"
response = secrets_client.get_secret_bundle(secret_id)
# The bundle content arrives base64-encoded, so decode it before use
encoded = response.data.secret_bundle_content.content
secret_content = base64.b64decode(encoded).decode("utf-8")

Notes

  • Regularly rotate encryption keys and secrets to minimize exposure.
  • Implement least privilege access controls using OCI IAM.
  • Enable auditing and logging for all key and secret management activities.
  • Use the Virtual Private Vault for sensitive data requiring higher security levels.

Implementing Multi-Region Resiliency with OCI Load Balancer

This blog will focus on building a highly resilient and globally available architecture using Oracle Cloud Infrastructure (OCI) Load Balancer. We’ll cover setting up a multi-region architecture, configuring global load balancing, and managing failover to ensure uninterrupted service availability.

1. Introduction to Multi-Region Resiliency

  • Overview of multi-region architecture benefits.
  • Importance of global availability and disaster recovery in cloud deployments.

2. Setting Up OCI Load Balancer

Step-by-Step Configuration

  1. Create Load Balancer:
    • Navigate to the OCI Console and access the Load Balancer service.
    • Select the load balancer type (public or private), and configure the backend sets and listeners.
  2. Configure Health Checks:
    • Set up health checks for backend servers to ensure only healthy instances receive traffic (a Terraform sketch follows this list).
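
The same configuration can be expressed in Terraform. A minimal sketch, where the flexible-shape bandwidth bounds, subnet variable, backend port, and /health path are all assumptions to adapt:

resource "oci_load_balancer_load_balancer" "regional_lb" {
  compartment_id = var.compartment_ocid
  display_name   = "regional-lb"
  shape          = "flexible"
  subnet_ids     = [var.lb_subnet_ocid]

  shape_details {
    minimum_bandwidth_in_mbps = 10
    maximum_bandwidth_in_mbps = 100
  }
}

resource "oci_load_balancer_backend_set" "app_backends" {
  load_balancer_id = oci_load_balancer_load_balancer.regional_lb.id
  name             = "app-backends"
  policy           = "ROUND_ROBIN"

  # Only instances answering on /health receive traffic
  health_checker {
    protocol = "HTTP"
    port     = 8080
    url_path = "/health"
  }
}

resource "oci_load_balancer_listener" "http_listener" {
  load_balancer_id         = oci_load_balancer_load_balancer.regional_lb.id
  name                     = "http-listener"
  default_backend_set_name = oci_load_balancer_backend_set.app_backends.name
  port                     = 80
  protocol                 = "HTTP"
}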

3. Configuring Global Load Balancing

Cross-Region Load Balancing

  • Deploy load balancers in multiple OCI regions.
  • Use OCI DNS traffic management steering policies to distribute traffic across regions based on geolocation, load, or endpoint health.

4. Implementing DNS Failover

Using OCI DNS

  • Set up DNS zones and records for your application.
  • Implement DNS failover to route traffic to the next healthy region in case of failure.

5. Monitoring and Managing Traffic

Using OCI Monitoring

  • Monitor traffic distribution and load balancer performance using OCI Monitoring.
  • Set up alerts for traffic spikes or health check failures (see the alarm sketch below).
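
As a sketch, an alarm on the load balancer’s UnHealthyBackendServers metric in the oci_lbaas namespace; the topic OCID variable is a placeholder for an existing Notifications topic:

resource "oci_monitoring_alarm" "unhealthy_backends" {
  compartment_id        = var.compartment_ocid
  display_name          = "lb-unhealthy-backends"
  is_enabled            = true
  metric_compartment_id = var.compartment_ocid
  namespace             = "oci_lbaas"
  query                 = "UnHealthyBackendServers[1m].max() > 0"
  severity              = "CRITICAL"
  destinations          = [var.alarm_topic_ocid] # an existing OCI Notifications topic
  body                  = "One or more load balancer backends are failing health checks."
}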

6. Optimizing for Performance and Cost

  • Use auto-scaling to adjust the number of backend instances based on demand.
  • Implement cost-saving strategies, such as traffic routing based on regional costs.

Regards
Osama