Advanced OCI Identity and Access Management: Zero-Trust Security Automation and Governance at Scale

Oracle Cloud Infrastructure’s Identity and Access Management (IAM) service provides enterprise-grade security capabilities that extend far beyond basic user authentication. This comprehensive guide explores advanced IAM automation strategies, zero-trust security implementations, and governance frameworks that enable organizations to maintain security at scale while supporting DevOps velocity and compliance requirements.

OCI IAM Architecture and Zero-Trust Principles

OCI IAM operates on a compartment-based security model that naturally aligns with zero-trust architecture principles. Unlike traditional perimeter-based security models, zero-trust assumes no implicit trust and continuously validates every request based on multiple factors including user identity, device state, location, and resource sensitivity.

The architecture consists of multiple layers of automation. The infrastructure layer manages compute and storage scaling based on workload demands. The database layer continuously optimizes SQL execution plans, indexes, and memory allocation. The security layer automatically applies patches and implements threat detection mechanisms.

Unlike traditional database services, Autonomous Database provides predictable performance through automatic workload management. The service can handle mixed workloads by automatically prioritizing critical transactions and throttling less important background processes during peak periods.

Resource allocation occurs dynamically across CPU, memory, and I/O subsystems. The machine learning algorithms analyze query patterns and automatically adjust resource distribution to optimize for current workload characteristics while maintaining performance SLAs.

Fleet Management and Automation Strategies

Managing multiple Autonomous Databases across development, testing, and production environments requires sophisticated automation strategies. Fleet management enables consistent configuration, monitoring, and lifecycle management across database instances.

Automated provisioning workflows ensure new database instances follow organizational standards for security, backup policies, and resource allocation. Template-based deployment eliminates configuration drift and reduces manual errors during database creation.

Cross-database monitoring provides unified visibility into performance metrics, resource utilization, and cost optimization opportunities across the entire database fleet. Centralized alerting ensures rapid response to performance degradation or security incidents.

Production Implementation Example

Here’s a comprehensive implementation of automated Autonomous Database fleet management with advanced monitoring and optimization:

Terraform Infrastructure for Database Fleet

# Variables for fleet configuration
variable "database_environments" {
  description = "Database environments configuration"
  type = map(object({
    cpu_core_count          = number
    data_storage_size_in_tbs = number
    display_name           = string
    db_name               = string
    admin_password        = string
    db_workload           = string
    license_model         = string
    whitelisted_ips       = list(string)
    auto_scaling_enabled  = bool
    backup_retention_days = number
  }))
  default = {
    production = {
      cpu_core_count          = 4
      data_storage_size_in_tbs = 2
      display_name           = "Production ADB"
      db_name               = "PRODADB"
      admin_password        = "ComplexPassword123!"
      db_workload           = "OLTP"
      license_model         = "LICENSE_INCLUDED"
      whitelisted_ips       = ["10.0.0.0/16"]
      auto_scaling_enabled  = true
      backup_retention_days = 30
    }
    staging = {
      cpu_core_count          = 2
      data_storage_size_in_tbs = 1
      display_name           = "Staging ADB"
      db_name               = "STAGINGADB"
      admin_password        = "ComplexPassword123!"
      db_workload           = "OLTP"
      license_model         = "LICENSE_INCLUDED"
      whitelisted_ips       = ["10.0.0.0/16"]
      auto_scaling_enabled  = false
      backup_retention_days = 7
    }
  }
}

# Autonomous Database instances
resource "oci_database_autonomous_database" "fleet_databases" {
  for_each = var.database_environments
  
  compartment_id              = var.compartment_id
  cpu_core_count             = each.value.cpu_core_count
  data_storage_size_in_tbs   = each.value.data_storage_size_in_tbs
  db_name                    = each.value.db_name
  display_name               = each.value.display_name
  admin_password             = each.value.admin_password
  db_workload               = each.value.db_workload
  license_model             = each.value.license_model
  is_auto_scaling_enabled   = each.value.auto_scaling_enabled
  
  # Network security
  whitelisted_ips = each.value.whitelisted_ips
  subnet_id      = oci_core_subnet.database_subnet.id
  nsg_ids        = [oci_core_network_security_group.database_nsg.id]
  
  # Backup configuration
  backup_config {
    manual_backup_bucket_name = oci_objectstorage_bucket.backup_bucket[each.key].name
    manual_backup_type       = "OBJECT_STORE"
  }
  
  # Enable advanced features
  operations_insights_status = "ENABLED"
  database_management_status = "ENABLED"
  
  # Tags for fleet management
  defined_tags = {
    "Operations.Environment" = each.key
    "Operations.CostCenter" = "Database"
    "Operations.Owner"      = "DBA-Team"
  }
  
  lifecycle {
    ignore_changes = [
      admin_password,
    ]
  }
}

# Dedicated backup buckets per environment
resource "oci_objectstorage_bucket" "backup_bucket" {
  for_each       = var.database_environments
  compartment_id = var.compartment_id
  name          = "${each.key}-adb-backups"
  namespace     = data.oci_objectstorage_namespace.ns.namespace
  
  retention_rules {
    display_name = "backup-retention"
    duration {
      time_amount = each.value.backup_retention_days
      time_unit   = "DAYS"
    }
    time_rule_locked = formatdate("YYYY-MM-DD'T'hh:mm:ss'Z'", timeadd(timestamp(), "24h"))
  }
  
  object_events_enabled = true
  versioning           = "Enabled"
}

# Database monitoring alarms
resource "oci_monitoring_alarm" "cpu_utilization" {
  for_each                = var.database_environments
  compartment_id         = var.compartment_id
  destinations          = [oci_ons_notification_topic.database_alerts.id]
  display_name          = "${each.value.display_name} - High CPU"
  is_enabled            = true
  metric_compartment_id = var.compartment_id
  namespace             = "oci_autonomous_database"
  
  query = "CpuUtilization[5m]{resourceId = \"${oci_database_autonomous_database.fleet_databases[each.key].id}\"}.mean() > 80"
  
  severity = "WARNING"
  
  suppression {
    time_suppress_from  = "0T08:00:00Z"
    time_suppress_until = "0T09:00:00Z"
  }
  
  repeat_notification_duration = "PT2H"
}

resource "oci_monitoring_alarm" "storage_utilization" {
  for_each                = var.database_environments
  compartment_id         = var.compartment_id
  destinations          = [oci_ons_notification_topic.database_alerts.id]
  display_name          = "${each.value.display_name} - High Storage"
  is_enabled            = true
  metric_compartment_id = var.compartment_id
  namespace             = "oci_autonomous_database"
  
  query = "StorageUtilization[5m]{resourceId = \"${oci_database_autonomous_database.fleet_databases[each.key].id}\"}.mean() > 85"
  
  severity = "CRITICAL"
  repeat_notification_duration = "PT30M"
}

# Network Security Group for database access
resource "oci_core_network_security_group" "database_nsg" {
  compartment_id = var.compartment_id
  vcn_id        = oci_core_vcn.database_vcn.id
  display_name  = "database-nsg"
}

resource "oci_core_network_security_group_security_rule" "database_ingress_https" {
  network_security_group_id = oci_core_network_security_group.database_nsg.id
  direction                 = "INGRESS"
  protocol                  = "6"
  source                   = "10.0.0.0/16"
  source_type              = "CIDR_BLOCK"
  
  tcp_options {
    destination_port_range {
      max = 1522
      min = 1521
    }
  }
}

# Notification topic for database alerts
resource "oci_ons_notification_topic" "database_alerts" {
  compartment_id = var.compartment_id
  name          = "database-fleet-alerts"
  description   = "Alerts for Autonomous Database fleet"
}

Advanced Performance Monitoring Script





#!/usr/bin/env python3
"""
Advanced Autonomous Database Fleet Performance Monitor
Provides automated performance analysis, recommendation generation,
and proactive optimization suggestions.
"""

import oci
import json
import logging
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from typing import Dict, List, Tuple, Optional
import cx_Oracle
import asyncio
import aiohttp
from dataclasses import dataclass
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@dataclass
class DatabaseMetrics:
    """Database performance metrics container"""
    database_id: str
    database_name: str
    cpu_utilization: float
    memory_utilization: float
    storage_utilization: float
    active_sessions: int
    blocked_sessions: int
    average_response_time: float
    throughput_transactions: float
    wait_events: Dict[str, float]
    top_sql: List[Dict]
    timestamp: datetime

@dataclass
class PerformanceRecommendation:
    """Performance optimization recommendation"""
    database_id: str
    category: str
    severity: str
    title: str
    description: str
    impact_score: float
    implementation_effort: str
    sql_statements: List[str]

class AutonomousDatabaseFleetMonitor:
    def __init__(self, config_file: str = 'config.json'):
        """Initialize the fleet monitoring system"""
        self.config = self._load_config(config_file)
        self.signer = oci.auth.signers.get_resource_principals_signer()
        
        # Initialize OCI clients
        self.db_client = oci.database.DatabaseClient({}, signer=self.signer)
        self.monitoring_client = oci.monitoring.MonitoringClient({}, signer=self.signer)
        self.ons_client = oci.ons.NotificationDataPlaneClient({}, signer=self.signer)
        
        # Performance thresholds
        self.thresholds = {
            'cpu_warning': 70.0,
            'cpu_critical': 85.0,
            'memory_warning': 75.0,
            'memory_critical': 90.0,
            'storage_warning': 80.0,
            'storage_critical': 90.0,
            'response_time_warning': 2.0,
            'response_time_critical': 5.0
        }
        
        # Initialize database connections cache
        self.db_connections = {}

    def _load_config(self, config_file: str) -> Dict:
        """Load configuration from JSON file"""
        try:
            with open(config_file, 'r') as f:
                return json.load(f)
        except FileNotFoundError:
            logger.error(f"Configuration file {config_file} not found")
            return {}

    async def monitor_fleet(self) -> List[DatabaseMetrics]:
        """Monitor all databases in the fleet"""
        databases = await self._discover_databases()
        monitoring_tasks = [
            self._monitor_database(db) for db in databases
        ]
        
        results = await asyncio.gather(*monitoring_tasks, return_exceptions=True)
        
        # Filter out exceptions and return valid metrics
        valid_metrics = [
            result for result in results 
            if isinstance(result, DatabaseMetrics)
        ]
        
        # Log any errors
        for result in results:
            if isinstance(result, Exception):
                logger.error(f"Monitoring error: {str(result)}")
        
        return valid_metrics

    async def _discover_databases(self) -> List[Dict]:
        """Discover all Autonomous Databases in the compartment"""
        try:
            response = self.db_client.list_autonomous_databases(
                compartment_id=self.config['compartment_id'],
                lifecycle_state='AVAILABLE'
            )
            return response.data
        except Exception as e:
            logger.error(f"Failed to discover databases: {str(e)}")
            return []

    async def _monitor_database(self, database: Dict) -> DatabaseMetrics:
        """Monitor individual database performance"""
        db_id = database.id
        db_name = database.display_name
        
        try:
            # Get connection to database
            connection = await self._get_database_connection(database)
            
            # Collect performance metrics
            cpu_util = await self._get_cpu_utilization(db_id)
            memory_util = await self._get_memory_utilization(connection)
            storage_util = await self._get_storage_utilization(db_id)
            session_metrics = await self._get_session_metrics(connection)
            response_time = await self._get_response_time_metrics(connection)
            throughput = await self._get_throughput_metrics(connection)
            wait_events = await self._get_wait_events(connection)
            top_sql = await self._get_top_sql_statements(connection)
            
            return DatabaseMetrics(
                database_id=db_id,
                database_name=db_name,
                cpu_utilization=cpu_util,
                memory_utilization=memory_util,
                storage_utilization=storage_util,
                active_sessions=session_metrics['active'],
                blocked_sessions=session_metrics['blocked'],
                average_response_time=response_time,
                throughput_transactions=throughput,
                wait_events=wait_events,
                top_sql=top_sql,
                timestamp=datetime.utcnow()
            )
            
        except Exception as e:
            logger.error(f"Error monitoring database {db_name}: {str(e)}")
            raise

    async def _get_database_connection(self, database: Dict):
        """Get or create database connection"""
        db_id = database.id
        
        if db_id not in self.db_connections:
            try:
                # Get connection details
                wallet_response = self.db_client.generate_autonomous_database_wallet(
                    autonomous_database_id=db_id,
                    generate_autonomous_database_wallet_details=oci.database.models.GenerateAutonomousDatabaseWalletDetails(
                        password="WalletPassword123!"
                    )
                )
                
                # Create connection (implementation depends on wallet setup)
                # This is a simplified example
                connection_string = f"{database.connection_urls.sql_dev_web_url}"
                
                connection = cx_Oracle.connect(
                    user="ADMIN",
                    password=self.config['admin_password'],
                    dsn=connection_string
                )
                
                self.db_connections[db_id] = connection
                
            except Exception as e:
                logger.error(f"Failed to connect to database {database.display_name}: {str(e)}")
                raise
        
        return self.db_connections[db_id]

    async def _get_cpu_utilization(self, database_id: str) -> float:
        """Get CPU utilization from OCI Monitoring"""
        try:
            end_time = datetime.utcnow()
            start_time = end_time - timedelta(minutes=5)
            
            response = self.monitoring_client.summarize_metrics_data(
                compartment_id=self.config['compartment_id'],
                summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                    namespace="oci_autonomous_database",
                    query=f'CpuUtilization[1m]{{resourceId = "{database_id}"}}.mean()',
                    start_time=start_time,
                    end_time=end_time
                )
            )
            
            if response.data and response.data[0].aggregated_datapoints:
                latest_datapoint = response.data[0].aggregated_datapoints[-1]
                return latest_datapoint.value
            
            return 0.0
            
        except Exception as e:
            logger.error(f"Failed to get CPU utilization: {str(e)}")
            return 0.0

    async def _get_memory_utilization(self, connection) -> float:
        """Get memory utilization from database"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT ROUND((1 - (bytes_free / bytes_total)) * 100, 2) as memory_usage_pct
                FROM (
                    SELECT SUM(bytes) as bytes_total
                    FROM v$sgainfo
                    WHERE name = 'Maximum SGA Size'
                ), (
                    SELECT SUM(bytes) as bytes_free
                    FROM v$sgastat
                    WHERE name = 'free memory'
                )
            """)
            result = cursor.fetchone()
            cursor.close()
            return float(result[0]) if result else 0.0
        except Exception as e:
            logger.error(f"Failed to get memory utilization: {str(e)}")
            return 0.0

    async def _get_storage_utilization(self, database_id: str) -> float:
        """Get storage utilization from OCI Monitoring"""
        try:
            end_time = datetime.utcnow()
            start_time = end_time - timedelta(minutes=5)
            
            response = self.monitoring_client.summarize_metrics_data(
                compartment_id=self.config['compartment_id'],
                summarize_metrics_data_details=oci.monitoring.models.SummarizeMetricsDataDetails(
                    namespace="oci_autonomous_database",
                    query=f'StorageUtilization[1m]{{resourceId = "{database_id}"}}.mean()',
                    start_time=start_time,
                    end_time=end_time
                )
            )
            
            if response.data and response.data[0].aggregated_datapoints:
                latest_datapoint = response.data[0].aggregated_datapoints[-1]
                return latest_datapoint.value
            
            return 0.0
            
        except Exception as e:
            logger.error(f"Failed to get storage utilization: {str(e)}")
            return 0.0

    async def _get_session_metrics(self, connection) -> Dict[str, int]:
        """Get session metrics from database"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT 
                    COUNT(CASE WHEN status = 'ACTIVE' THEN 1 END) as active_sessions,
                    COUNT(CASE WHEN blocking_session IS NOT NULL THEN 1 END) as blocked_sessions
                FROM v$session
                WHERE type = 'USER'
            """)
            result = cursor.fetchone()
            cursor.close()
            
            return {
                'active': int(result[0]) if result[0] else 0,
                'blocked': int(result[1]) if result[1] else 0
            }
        except Exception as e:
            logger.error(f"Failed to get session metrics: {str(e)}")
            return {'active': 0, 'blocked': 0}

    async def _get_response_time_metrics(self, connection) -> float:
        """Get average response time metrics"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT AVG(elapsed_time) / 1000000 as avg_response_time_seconds
                FROM v$sql
                WHERE last_active_time > SYSDATE - 1/24
                AND executions > 0
            """)
            result = cursor.fetchone()
            cursor.close()
            return float(result[0]) if result and result[0] else 0.0
        except Exception as e:
            logger.error(f"Failed to get response time metrics: {str(e)}")
            return 0.0

    async def _get_throughput_metrics(self, connection) -> float:
        """Get transaction throughput metrics"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT value
                FROM v$sysstat
                WHERE name = 'user commits'
            """)
            result = cursor.fetchone()
            cursor.close()
            return float(result[0]) if result else 0.0
        except Exception as e:
            logger.error(f"Failed to get throughput metrics: {str(e)}")
            return 0.0

    async def _get_wait_events(self, connection) -> Dict[str, float]:
        """Get top wait events"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT event, time_waited_micro / 1000000 as time_waited_seconds
                FROM v$system_event
                WHERE wait_class != 'Idle'
                ORDER BY time_waited_micro DESC
                FETCH FIRST 10 ROWS ONLY
            """)
            results = cursor.fetchall()
            cursor.close()
            
            return {row[0]: float(row[1]) for row in results}
        except Exception as e:
            logger.error(f"Failed to get wait events: {str(e)}")
            return {}

    async def _get_top_sql_statements(self, connection) -> List[Dict]:
        """Get top SQL statements by various metrics"""
        try:
            cursor = connection.cursor()
            cursor.execute("""
                SELECT 
                    sql_id,
                    executions,
                    elapsed_time / 1000000 as elapsed_seconds,
                    cpu_time / 1000000 as cpu_seconds,
                    buffer_gets,
                    disk_reads,
                    SUBSTR(sql_text, 1, 100) as sql_text_preview
                FROM v$sql
                WHERE executions > 0
                ORDER BY elapsed_time DESC
                FETCH FIRST 20 ROWS ONLY
            """)
            results = cursor.fetchall()
            cursor.close()
            
            return [
                {
                    'sql_id': row[0],
                    'executions': int(row[1]),
                    'elapsed_seconds': float(row[2]),
                    'cpu_seconds': float(row[3]),
                    'buffer_gets': int(row[4]),
                    'disk_reads': int(row[5]),
                    'sql_text_preview': row[6]
                }
                for row in results
            ]
        except Exception as e:
            logger.error(f"Failed to get top SQL statements: {str(e)}")
            return []

    async def analyze_performance(self, metrics: List[DatabaseMetrics]) -> List[PerformanceRecommendation]:
        """Analyze performance metrics and generate recommendations"""
        recommendations = []
        
        for metric in metrics:
            # CPU analysis
            if metric.cpu_utilization > self.thresholds['cpu_critical']:
                recommendations.append(
                    PerformanceRecommendation(
                        database_id=metric.database_id,
                        category="CPU",
                        severity="CRITICAL",
                        title="High CPU Utilization",
                        description=f"CPU utilization is {metric.cpu_utilization:.1f}%, exceeding critical threshold",
                        impact_score=0.9,
                        implementation_effort="LOW",
                        sql_statements=["ALTER DATABASE SET auto_scaling = TRUE;"]
                    )
                )
            
            # Memory analysis
            if metric.memory_utilization > self.thresholds['memory_critical']:
                recommendations.append(
                    PerformanceRecommendation(
                        database_id=metric.database_id,
                        category="MEMORY",
                        severity="CRITICAL",
                        title="High Memory Utilization",
                        description=f"Memory utilization is {metric.memory_utilization:.1f}%, consider scaling up",
                        impact_score=0.8,
                        implementation_effort="MEDIUM",
                        sql_statements=["-- Consider increasing CPU cores to get more memory"]
                    )
                )
            
            # Storage analysis
            if metric.storage_utilization > self.thresholds['storage_critical']:
                recommendations.append(
                    PerformanceRecommendation(
                        database_id=metric.database_id,
                        category="STORAGE",
                        severity="CRITICAL",
                        title="High Storage Utilization",
                        description=f"Storage utilization is {metric.storage_utilization:.1f}%, expand storage immediately",
                        impact_score=0.95,
                        implementation_effort="LOW",
                        sql_statements=["-- Storage will auto-expand, monitor costs"]
                    )
                )
            
            # Session analysis
            if metric.blocked_sessions > 0:
                recommendations.append(
                    PerformanceRecommendation(
                        database_id=metric.database_id,
                        category="CONCURRENCY",
                        severity="WARNING",
                        title="Blocked Sessions Detected",
                        description=f"{metric.blocked_sessions} blocked sessions found, investigate locking",
                        impact_score=0.7,
                        implementation_effort="HIGH",
                        sql_statements=[
                            "SELECT * FROM v$lock WHERE block > 0;",
                            "SELECT * FROM v$session WHERE blocking_session IS NOT NULL;"
                        ]
                    )
                )
            
            # Response time analysis
            if metric.average_response_time > self.thresholds['response_time_critical']:
                recommendations.append(
                    PerformanceRecommendation(
                        database_id=metric.database_id,
                        category="PERFORMANCE",
                        severity="WARNING",
                        title="High Response Time",
                        description=f"Average response time is {metric.average_response_time:.2f}s, optimize queries",
                        impact_score=0.6,
                        implementation_effort="HIGH",
                        sql_statements=[
                            "-- Review top SQL statements for optimization opportunities",
                            "-- Consider adding indexes for frequently accessed data"
                        ]
                    )
                )
        
        return recommendations

    async def generate_fleet_report(self, metrics: List[DatabaseMetrics], 
                                  recommendations: List[PerformanceRecommendation]) -> str:
        """Generate comprehensive fleet performance report"""
        report = f"""
# Autonomous Database Fleet Performance Report
Generated: {datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S UTC')}

## Fleet Summary
- Total Databases: {len(metrics)}
- Databases with Issues: {len([m for m in metrics if any(r.database_id == m.database_id for r in recommendations)])}
- Critical Recommendations: {len([r for r in recommendations if r.severity == 'CRITICAL'])}

## Database Performance Overview
"""
        
        for metric in metrics:
            db_recommendations = [r for r in recommendations if r.database_id == metric.database_id]
            critical_issues = len([r for r in db_recommendations if r.severity == 'CRITICAL'])
            
            report += f"""
### {metric.database_name}
- CPU Utilization: {metric.cpu_utilization:.1f}%
- Memory Utilization: {metric.memory_utilization:.1f}%
- Storage Utilization: {metric.storage_utilization:.1f}%
- Active Sessions: {metric.active_sessions}
- Blocked Sessions: {metric.blocked_sessions}
- Average Response Time: {metric.average_response_time:.2f}s
- Critical Issues: {critical_issues}
"""
        
        if recommendations:
            report += "\n## Recommendations\n"
            for rec in sorted(recommendations, key=lambda x: x.impact_score, reverse=True):
                report += f"""
### {rec.title} - {rec.severity}
- Database: {next(m.database_name for m in metrics if m.database_id == rec.database_id)}
- Category: {rec.category}
- Impact Score: {rec.impact_score:.1f}
- Implementation Effort: {rec.implementation_effort}
- Description: {rec.description}
"""
        
        return report

# Main execution function
async def main():
    """Main monitoring execution"""
    monitor = AutonomousDatabaseFleetMonitor()
    
    try:
        # Monitor fleet
        logger.info("Starting fleet monitoring...")
        metrics = await monitor.monitor_fleet()
        logger.info(f"Collected metrics from {len(metrics)} databases")
        
        # Analyze performance
        recommendations = await monitor.analyze_performance(metrics)
        logger.info(f"Generated {len(recommendations)} recommendations")
        
        # Generate report
        report = await monitor.generate_fleet_report(metrics, recommendations)
        
        # Save report
        with open(f"fleet_report_{datetime.utcnow().strftime('%Y%m%d_%H%M%S')}.md", 'w') as f:
            f.write(report)
        
        logger.info("Fleet monitoring completed successfully")
        
    except Exception as e:
        logger.error(f"Fleet monitoring failed: {str(e)}")
        raise

if __name__ == "__main__":
    asyncio.run(main())

Advanced Performance Optimization Techniques

Autonomous Database provides several advanced optimization features that can be leveraged programmatically. Automatic indexing continuously monitors query patterns and creates or drops indexes based on actual usage patterns. This feature eliminates the traditional DBA task of index management while ensuring optimal query performance.

SQL plan management automatically captures and evolves execution plans, preventing performance regressions when statistics change or new Oracle versions are deployed. The system maintains a repository of proven execution plans and automatically selects the best plan for each SQL statement.

Real-time SQL monitoring provides detailed execution statistics for long-running queries, enabling identification of performance bottlenecks during execution rather than after completion. This capability is essential for optimizing complex analytical workloads and batch processing operations.

Automated Scaling and Cost Optimization

Autonomous Database’s auto-scaling feature dynamically adjusts CPU resources based on workload demands, but understanding the patterns enables better cost optimization. Monitoring CPU utilization patterns over time reveals opportunities for right-sizing base allocations while maintaining auto-scaling for peak periods.

Scheduled scaling operations can be implemented to proactively adjust resources for known workload patterns, such as batch processing windows or business reporting cycles. This approach optimizes costs by scaling down during predictable low-usage periods.

Storage auto-expansion occurs automatically, but monitoring growth patterns enables better capacity planning and cost forecasting. Integration with OCI Cost Management APIs provides automated cost tracking and budget alerting capabilities.

Security and Compliance Automation

Database security automation encompasses multiple layers of protection. Automatic patching ensures systems remain current with security updates without manual intervention. Data encryption occurs automatically for data at rest and in transit, with key rotation handled transparently.

Audit logging automation captures all database activities and integrates with OCI Logging Analytics for security event correlation and threat detection. Automated compliance reporting generates audit trails required for regulatory compliance frameworks.

Access control automation integrates with OCI Identity and Access Management to ensure consistent security policies across the database fleet. Database user lifecycle management can be automated through integration with enterprise identity management systems.

This comprehensive approach to Autonomous Database management enables organizations to operate enterprise-scale database fleets with minimal administrative overhead while maintaining optimal performance, security, and cost efficiency.

Integration with DevOps Pipelines

Modern database operations require seamless integration with CI/CD pipelines and DevOps workflows. Autonomous Database supports automated schema migrations and application deployments through integration with OCI DevOps service and popular tools like Jenkins, GitLab CI, and GitHub Actions.

Database schema versioning becomes manageable through automated migration scripts that can be tested in development environments before production deployment. The immutable infrastructure approach ensures consistent database configurations across environments while maintaining data integrity during updates.

Blue-green deployment strategies for database schema changes minimize downtime and provide instant rollback capabilities. The approach involves maintaining parallel database environments and switching traffic after successful validation of schema changes.

Automated Database Lifecycle Management Script





#!/bin/bash
# Database Lifecycle Management Automation
# Handles provisioning, configuration, monitoring, and decommissioning

set -e

# Configuration
ENVIRONMENT=${1:-"development"}
ACTION=${2:-"provision"}
CONFIG_FILE="database-config-${ENVIRONMENT}.json"

# Load configuration
if [[ ! -f "$CONFIG_FILE" ]]; then
    echo "Configuration file $CONFIG_FILE not found"
    exit 1
fi

DATABASE_NAME=$(jq -r '.database_name' "$CONFIG_FILE")
CPU_CORES=$(jq -r '.cpu_cores' "$CONFIG_FILE")
STORAGE_TB=$(jq -r '.storage_tb' "$CONFIG_FILE")
COMPARTMENT_ID=$(jq -r '.compartment_id' "$CONFIG_FILE")

echo "Managing database lifecycle: $DATABASE_NAME ($ENVIRONMENT)"

case $ACTION in
    "provision")
        echo "Provisioning new Autonomous Database..."
        
        # Create database using OCI CLI
        oci db autonomous-database create \
            --compartment-id "$COMPARTMENT_ID" \
            --db-name "$DATABASE_NAME" \
            --display-name "${DATABASE_NAME}-${ENVIRONMENT}" \
            --cpu-core-count "$CPU_CORES" \
            --data-storage-size-in-tbs "$STORAGE_TB" \
            --admin-password "$ADMIN_PASSWORD" \
            --db-workload "OLTP" \
            --is-auto-scaling-enabled true \
            --license-model "LICENSE_INCLUDED" \
            --wait-for-state "AVAILABLE" \
            --max-wait-seconds 3600
        
        echo "Database provisioned successfully"
        
        # Apply initial configuration
        ./configure-database.sh "$DATABASE_NAME" "$ENVIRONMENT"
        
        # Set up monitoring
        ./setup-monitoring.sh "$DATABASE_NAME" "$ENVIRONMENT"
        ;;
        
    "scale")
        echo "Scaling database resources..."
        
        # Get current database OCID
        DB_OCID=$(oci db autonomous-database list \
            --compartment-id "$COMPARTMENT_ID" \
            --display-name "${DATABASE_NAME}-${ENVIRONMENT}" \
            --query 'data[0].id' \
            --raw-output)
        
        # Scale CPU cores
        oci db autonomous-database update \
            --autonomous-database-id "$DB_OCID" \
            --cpu-core-count "$CPU_CORES" \
            --wait-for-state "AVAILABLE"
        
        echo "Database scaled successfully"
        ;;
        
    "backup")
        echo "Creating manual backup..."
        
        DB_OCID=$(oci db autonomous-database list \
            --compartment-id "$COMPARTMENT_ID" \
            --display-name "${DATABASE_NAME}-${ENVIRONMENT}" \
            --query 'data[0].id' \
            --raw-output)
        
        BACKUP_NAME="${DATABASE_NAME}-manual-$(date +%Y%m%d-%H%M%S)"
        
        oci db autonomous-database-backup create \
            --autonomous-database-id "$DB_OCID" \
            --display-name "$BACKUP_NAME" \
            --wait-for-state "ACTIVE"
        
        echo "Backup created: $BACKUP_NAME"
        ;;
        
    "clone")
        echo "Creating database clone..."
        
        SOURCE_DB_OCID=$(oci db autonomous-database list \
            --compartment-id "$COMPARTMENT_ID" \
            --display-name "${DATABASE_NAME}-production" \
            --query 'data[0].id' \
            --raw-output)
        
        CLONE_NAME="${DATABASE_NAME}-${ENVIRONMENT}-$(date +%Y%m%d)"
        
        oci db autonomous-database create-from-clone \
            --compartment-id "$COMPARTMENT_ID" \
            --source-id "$SOURCE_DB_OCID" \
            --db-name "${DATABASE_NAME}CLONE" \
            --display-name "$CLONE_NAME" \
            --admin-password "$ADMIN_PASSWORD" \
            --wait-for-state "AVAILABLE"
        
        echo "Clone created: $CLONE_NAME"
        ;;
        
    "migrate-schema")
        echo "Applying schema migrations..."
        
        # Connect to database and apply migrations
        python3 << EOF
import cx_Oracle
import os
import glob

# Database connection
connection = cx_Oracle.connect(
    user="ADMIN",
    password=os.environ['ADMIN_PASSWORD'],
    dsn=os.environ['DATABASE_CONNECTION_STRING']
)

cursor = connection.cursor()

# Create migration tracking table if not exists
cursor.execute("""
    BEGIN
        EXECUTE IMMEDIATE 'CREATE TABLE schema_migrations (
            version VARCHAR2(50) PRIMARY KEY,
            applied_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            checksum VARCHAR2(64)
        )';
    EXCEPTION
        WHEN OTHERS THEN
            IF SQLCODE != -955 THEN  -- Table already exists
                RAISE;
            END IF;
    END;
""")

# Get applied migrations
cursor.execute("SELECT version FROM schema_migrations ORDER BY version")
applied_migrations = {row[0] for row in cursor.fetchall()}

# Apply new migrations
migration_files = sorted(glob.glob('migrations/*.sql'))
for migration_file in migration_files:
    version = os.path.basename(migration_file).split('_')[0]
    
    if version not in applied_migrations:
        print(f"Applying migration: {migration_file}")
        
        with open(migration_file, 'r') as f:
            migration_sql = f.read()
        
        # Calculate checksum
        import hashlib
        checksum = hashlib.sha256(migration_sql.encode()).hexdigest()
        
        # Apply migration
        for statement in migration_sql.split(';'):
            if statement.strip():
                cursor.execute(statement)
        
        # Record migration
        cursor.execute(
            "INSERT INTO schema_migrations (version, checksum) VALUES (:1, :2)",
            (version, checksum)
        )
        
        connection.commit()
        print(f"Migration {version} applied successfully")

cursor.close()
connection.close()
EOF
        ;;
        
    "performance-report")
        echo "Generating performance report..."
        
        python3 performance_monitor.py --environment "$ENVIRONMENT" --report-type comprehensive
        
        # Upload report to Object Storage
        REPORT_FILE="fleet_report_$(date +%Y%m%d_%H%M%S).md"
        
        oci os object put \
            --bucket-name "database-reports" \
            --name "$REPORT_FILE" \
            --file "$REPORT_FILE"
        
        echo "Performance report uploaded to Object Storage"
        ;;
        
    "decommission")
        echo "Decommissioning database..."
        
        # Create final backup before deletion
        ./database-lifecycle.sh "$ENVIRONMENT" backup
        
        # Get database OCID
        DB_OCID=$(oci db autonomous-database list \
            --compartment-id "$COMPARTMENT_ID" \
            --display-name "${DATABASE_NAME}-${ENVIRONMENT}" \
            --query 'data[0].id' \
            --raw-output)
        
        # Terminate database
        oci db autonomous-database delete \
            --autonomous-database-id "$DB_OCID" \
            --force \
            --wait-for-state "TERMINATED"
        
        echo "Database decommissioned successfully"
        ;;
        
    *)
        echo "Usage: $0 <environment> <action>"
        echo "Actions: provision, scale, backup, clone, migrate-schema, performance-report, decommission"
        exit 1
        ;;
esac

echo "Database lifecycle operation completed successfully"

Advanced Monitoring and Alerting Strategies

Enterprise database monitoring requires sophisticated alerting strategies that go beyond simple threshold-based alerts. Predictive alerting uses machine learning algorithms to identify trends that may lead to performance issues before they impact users.

Anomaly detection compares current performance metrics against historical baselines to identify unusual patterns that may indicate emerging problems. This approach is particularly effective for detecting gradual performance degradation that might not trigger traditional threshold-based alerts.

Correlation analysis across multiple databases in the fleet can identify systematic issues affecting multiple systems simultaneously. This capability is essential for detecting infrastructure-level problems or common configuration issues across the database estate.

Custom Metrics Collection and Analysis

# Custom metrics collection for advanced analytics
class DatabaseMetricsCollector:
    def __init__(self):
        self.metrics_buffer = []
        self.anomaly_detector = IsolationForest(contamination=0.1)
        
    async def collect_custom_metrics(self, connection) -> Dict:
        """Collect custom performance metrics"""
        custom_metrics = {}
        
        # SQL execution patterns
        cursor = connection.cursor()
        cursor.execute("""
            SELECT 
                sql_id,
                plan_hash_value,
                executions,
                elapsed_time,
                cpu_time,
                buffer_gets,
                rows_processed,
                optimizer_cost
            FROM v$sql
            WHERE last_active_time > SYSDATE - 1/24
            AND executions > 10
        """)
        
        sql_metrics = cursor.fetchall()
        custom_metrics['sql_efficiency'] = self._calculate_sql_efficiency(sql_metrics)
        
        # Wait event analysis
        cursor.execute("""
            SELECT event, total_waits, time_waited_micro
            FROM v$system_event
            WHERE wait_class != 'Idle'
            AND total_waits > 0
        """)
        
        wait_events = cursor.fetchall()
        custom_metrics['wait_distribution'] = self._analyze_wait_distribution(wait_events)
        
        # Lock contention analysis
        cursor.execute("""
            SELECT 
                COUNT(*) as total_locks,
                COUNT(CASE WHEN lmode > 0 THEN 1 END) as active_locks,
                COUNT(CASE WHEN request > 0 THEN 1 END) as waiting_locks
            FROM v$lock
        """)
        
        lock_data = cursor.fetchone()
        custom_metrics['lock_contention'] = {
            'total_locks': lock_data[0],
            'active_locks': lock_data[1],
            'waiting_locks': lock_data[2],
            'contention_ratio': lock_data[2] / max(lock_data[0], 1)
        }
        
        cursor.close()
        return custom_metrics
    
    def _calculate_sql_efficiency(self, sql_metrics: List) -> Dict:
        """Calculate SQL execution efficiency metrics"""
        if not sql_metrics:
            return {'average_efficiency': 0, 'inefficient_queries': 0}
        
        efficiency_scores = []
        inefficient_count = 0
        
        for metric in sql_metrics:
            executions = metric[2]
            elapsed_time = metric[3]
            rows_processed = max(metric[6], 1)
            
            # Calculate efficiency as rows per second
            avg_elapsed = elapsed_time / executions / 1000000  # Convert to seconds
            efficiency = rows_processed / max(avg_elapsed, 0.001)
            efficiency_scores.append(efficiency)
            
            # Flag inefficient queries (less than 100 rows per second)
            if efficiency < 100:
                inefficient_count += 1
        
        return {
            'average_efficiency': np.mean(efficiency_scores),
            'inefficient_queries': inefficient_count,
            'efficiency_distribution': np.percentile(efficiency_scores, [25, 50, 75, 95])
        }
    
    def _analyze_wait_distribution(self, wait_events: List) -> Dict:
        """Analyze wait event distribution patterns"""
        if not wait_events:
            return {}
        
        total_wait_time = sum(event[2] for event in wait_events)
        wait_distribution = {}
        
        for event in wait_events:
            event_name = event[0]
            wait_time = event[2]
            percentage = (wait_time / total_wait_time) * 100
            
            wait_distribution[event_name] = {
                'total_waits': event[1],
                'time_waited_micro': wait_time,
                'percentage': percentage
            }
        
        # Identify top wait events
        top_waits = sorted(
            wait_distribution.items(),
            key=lambda x: x[1]['percentage'],
            reverse=True
        )[:5]
        
        return {
            'distribution': wait_distribution,
            'top_wait_events': top_waits,
            'io_intensive': any('read' in event[0].lower() for event in top_waits),
            'cpu_intensive': any('cpu' in event[0].lower() for event in top_waits)
        }
    
    async def detect_anomalies(self, current_metrics: Dict, 
                             historical_metrics: List[Dict]) -> List[Dict]:
        """Detect performance anomalies using machine learning"""
        if len(historical_metrics) < 50:  # Need sufficient historical data
            return []
        
        # Prepare feature vectors
        features = ['cpu_utilization', 'memory_utilization', 'active_sessions', 
                   'average_response_time', 'throughput_transactions']
        
        historical_vectors = []
        for metrics in historical_metrics:
            vector = [metrics.get(feature, 0) for feature in features]
            historical_vectors.append(vector)
        
        current_vector = [current_metrics.get(feature, 0) for feature in features]
        
        # Train anomaly detector
        self.anomaly_detector.fit(historical_vectors)
        
        # Detect anomalies
        is_anomaly = self.anomaly_detector.predict([current_vector])[0] == -1
        anomaly_score = self.anomaly_detector.decision_function([current_vector])[0]
        
        anomalies = []
        if is_anomaly:
            # Identify which metrics are anomalous
            feature_importance = self._calculate_feature_importance(
                current_vector, historical_vectors, features
            )
            
            anomalies.append({
                'type': 'performance_anomaly',
                'severity': 'warning' if anomaly_score > -0.5 else 'critical',
                'score': anomaly_score,
                'affected_metrics': feature_importance,
                'timestamp': datetime.utcnow().isoformat()
            })
        
        return anomalies

Cost Optimization and Resource Management

Autonomous Database cost optimization requires understanding usage patterns and implementing intelligent resource management strategies. The service offers multiple pricing models including OCPU-based pricing for predictable workloads and serverless pricing for variable workloads.

Resource scheduling enables automatic scaling operations based on business requirements. Development and testing environments can be automatically scaled down during non-business hours, while production systems maintain consistent performance levels.

Storage optimization involves monitoring data growth patterns and implementing archival strategies for historical data. Integration with OCI Archive Storage provides cost-effective long-term data retention while maintaining accessibility for compliance requirements.

Cross-region cost analysis helps optimize placement of database instances based on data locality and network costs. Understanding data transfer patterns enables better architectural decisions for multi-region deployments.

Disaster Recovery and Business Continuity

Autonomous Database disaster recovery capabilities extend beyond traditional backup and restore operations. Autonomous Data Guard provides automatic failover capabilities with real-time data synchronization across regions.

Recovery time objectives (RTO) and recovery point objectives (RPO) can be configured based on business requirements. The service supports both automatic and manual failover scenarios, with comprehensive testing capabilities to validate disaster recovery procedures.

Cross-region cloning enables rapid creation of database copies for disaster recovery testing without impacting production operations. This capability is essential for meeting compliance requirements that mandate regular disaster recovery validation.

Backup retention policies can be automated based on regulatory requirements, with automatic lifecycle management transitioning older backups to lower-cost storage tiers while maintaining accessibility for compliance audits.

Regards
Osama

Building a Serverless Event-Driven Architecture with AWS EventBridge, SQS, and Lambda

In this blog, we’ll design a system where:

  1. Events (e.g., order placements, file uploads) are published to EventBridge.
  2. SQS queues act as durable buffers for downstream processing.
  3. Lambda functions consume events and take action (e.g., send notifications, update databases).

Architecture Overview

![EventBridge → SQS → Lambda Architecture]
(Visual: Producers → EventBridge → SQS → Lambda Consumers)

  1. Event Producers (e.g., API Gateway, S3, custom apps) emit events.
  2. EventBridge routes events to targets (e.g., SQS queues).
  3. SQS ensures reliable delivery and decoupling.
  4. Lambda processes events asynchronously.

Step-by-Step Implementation

1. Set Up an EventBridge Event Bus

Create a custom event bus (or use the default one):

aws events create-event-bus --name MyEventBus

2. Define an Event Rule to Route Events to SQS

Create a rule to forward events matching a pattern (e.g., order_placed) to an SQS queue:

aws events put-rule \
  --name "OrderPlacedRule" \
  --event-pattern '{"detail-type": ["order_placed"]}' \
  --event-bus-name "MyEventBus"

3. Create an SQS Queue and Link It to EventBridge

Create a queue and grant EventBridge permission to send messages:

aws sqs create-queue --queue-name OrderProcessingQueue

Attach the queue as a target to the EventBridge rule:

aws events put-targets \
  --rule "OrderPlacedRule" \
  --targets "Id"="OrderQueueTarget","Arn"="arn:aws:sqs:us-east-1:123456789012:OrderProcessingQueue" \
  --event-bus-name "MyEventBus"

4. Write a Lambda Function to Process SQS Messages

Create a Lambda function (process_order.py) to poll the queue and process orders:

import json
import boto3

def lambda_handler(event, context):
    for record in event['Records']:
        message = json.loads(record['body'])
        order_id = message['detail']['orderId']
        
        print(f"Processing order: {order_id}")
        # Add business logic (e.g., update DynamoDB, send SNS notification)
        
    return {"status": "processed"}

5. Configure SQS as a Lambda Trigger

In the AWS Console:

  • Go to Lambda → Add Trigger → SQS.
  • Select OrderProcessingQueue and set batch size (e.g., 10 messages per invocation).

6. Test the Flow

Emit a test event to EventBridge:

aws events put-events \
  --entries '[{
    "EventBusName": "MyEventBus",
    "Source": "my.app",
    "DetailType": "order_placed",
    "Detail": "{ \"orderId\": \"123\", \"amount\": 50 }"
  }]'

Verify the flow:

  1. EventBridge routes the event to SQS.
  2. Lambda picks up the message and logs:
Processing order: 123  

Use Cases

  • Order processing (e.g., e-commerce workflows).
  • File upload pipelines (e.g., resize images after S3 upload).
  • Notifications (e.g., send emails/SMS for system events).

Enjoy
Thank you
Osama

Real-Time Data Processing with AWS Kinesis, Lambda, and DynamoDB

Many applications today require real-time data processing—whether it’s for analytics, monitoring, or triggering actions. AWS provides powerful services like Amazon Kinesis for streaming data, AWS Lambda for serverless processing, and DynamoDB for scalable storage.

In this blog, we’ll build a real-time data pipeline that:

  1. Ingests streaming data (e.g., clickstream, IoT sensor data, or logs) using Kinesis Data Streams.
  2. Processes records in real-time using Lambda.
  3. Stores aggregated results in DynamoDB for querying.

Architecture Overview

![AWS Kinesis + Lambda + DynamoDB Architecture]
(Visual: Kinesis → Lambda → DynamoDB)

  1. Kinesis Data Stream – Captures high-velocity data.
  2. Lambda Function – Processes records as they arrive.
  3. DynamoDB Table – Stores aggregated results (e.g., counts, metrics).

Step-by-Step Implementation

1. Set Up a Kinesis Data Stream

Create a Kinesis stream to ingest data:

aws kinesis create-stream --stream-name ClickStream --shard-count 1

Producers (e.g., web apps, IoT devices) can send data like:

{
  "userId": "user123",
  "action": "click",
  "timestamp": "2024-05-20T12:00:00Z"
}

2. Create a Lambda Function to Process Streams

Write a Python Lambda function (process_stream.py) to:

  • Read records from Kinesis.
  • Aggregate data (e.g., count clicks per user).
  • Update DynamoDB.
import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserClicks')

def lambda_handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['kinesis']['data'])
        user_id = payload['userId']
        
        # Update DynamoDB (increment click count)
        table.update_item(
            Key={'userId': user_id},
            UpdateExpression="ADD clicks :incr",
            ExpressionAttributeValues={':incr': 1}
        )
    return {"status": "success"}

3. Configure Lambda as a Kinesis Consumer

In the AWS Console:

  • Go to Lambda → Create Function → Python.
  • Add Kinesis as the trigger (select your stream).
  • Set batch size (e.g., 100 records per invocation).

4. Set Up DynamoDB for Aggregations

Create a table with userId as the primary key:

aws dynamodb create-table \
    --table-name UserClicks \
    --attribute-definitions AttributeName=userId,AttributeType=S \
    --key-schema AttributeName=userId,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST

5. Test the Pipeline

Send test data to Kinesis:

aws kinesis put-record \
    --stream-name ClickStream \
    --data '{"userId": "user123", "action": "click"}' \
    --partition-key user123

Check DynamoDB for aggregated results:

aws dynamodb get-item --table-name UserClicks --key '{"userId": {"S": "user123"}}'

Output:

{ "userId": "user123", "clicks": 1 }

Use Cases

  • Real-time analytics (e.g., dashboard for user activity).
  • Fraud detection (trigger alerts for unusual patterns).
  • IoT monitoring (process sensor data in real-time).

Enjoy
Thank you
Osama

Building a Scalable Web Application Using AWS Lambda, API Gateway, and DynamoDB

s?

Let’s imagine we want to build a To-Do List Application where users can:

  • Add tasks to their list.
  • View all tasks.
  • Mark tasks as completed.

We’ll use the following architecture:

  1. API Gateway to handle HTTP requests.
  2. Lambda Functions to process business logic.
  3. DynamoDB to store task data.

Step 1: Setting Up DynamoDB

First, we need a database to store our tasks. DynamoDB is an excellent choice because it scales automatically and provides low-latency access.

Creating a DynamoDB Table

  1. Open the AWS Management Console and navigate to DynamoDB .
  2. Click Create Table .
    • Table Name : TodoList
    • Primary Key : id (String)
  3. Enable Auto Scaling for read/write capacity units to ensure the table scales based on demand.

Sample Table Structure

id (Primary Key)task_namestatus
1Buy groceriesPending
2Read a bookCompleted

Step 2: Creating Lambda Functions

Next, we’ll create Lambda functions to handle CRUD operations for our To-Do List application.

Lambda Function: Create Task

This function will insert a new task into the TodoList table.

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TodoList')

def lambda_handler(event, context):
    # Extract task details from the event
    task_name = event['task_name']
    
    # Generate a unique ID for the task
    import uuid
    task_id = str(uuid.uuid4())
    
    # Insert the task into DynamoDB
    table.put_item(
        Item={
            'id': task_id,
            'task_name': task_name,
            'status': 'Pending'
        }
    )
    
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Task created successfully!', 'task_id': task_id})
    }

Lambda Function: Get All Tasks

This function retrieves all tasks from the TodoList table.

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TodoList')

def lambda_handler(event, context):
    # Scan the DynamoDB table
    response = table.scan()
    
    # Return the list of tasks
    return {
        'statusCode': 200,
        'body': json.dumps(response['Items'])
    }

Lambda Function: Update Task Status

This function updates the status of a task (e.g., mark as completed).

import json
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TodoList')

def lambda_handler(event, context):
    # Extract task ID and new status from the event
    task_id = event['id']
    new_status = event['status']
    
    # Update the task in DynamoDB
    table.update_item(
        Key={'id': task_id},
        UpdateExpression='SET #status = :new_status',
        ExpressionAttributeNames={'#status': 'status'},
        ExpressionAttributeValues={':new_status': new_status}
    )
    
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Task updated successfully!'})
    }

Step 3: Configuring API Gateway

Now that we have our Lambda functions, we’ll expose them via API Gateway.

Steps to Set Up API Gateway

  1. Open the AWS Management Console and navigate to API Gateway .
  2. Click Create API and select HTTP API .
  3. Define the following routes:
    • POST /tasks : Maps to the “Create Task” Lambda function.
    • GET /tasks : Maps to the “Get All Tasks” Lambda function.
    • PUT /tasks/{id} : Maps to the “Update Task Status” Lambda function.
  4. Deploy the API and note the endpoint URL.

Step 4: Testing the Application

Once everything is set up, you can test the application using tools like Postman or cURL .

Example Requests

  1. Create a Task
curl -X POST https://<api-id>.execute-api.<region>.amazonaws.com/tasks \
-H "Content-Type: application/json" \
-d '{"task_name": "Buy groceries"}'

Get All Tasks

curl -X GET https://<api-id>.execute-api.<region>.amazonaws.com/tasks

Update Task Status

curl -X PUT https://<api-id>.execute-api.<region>.amazonaws.com/tasks/<task-id> \
-H "Content-Type: application/json" \
-d '{"status": "Completed"}'

Benefits of This Architecture

  1. Scalability : DynamoDB and Lambda automatically scale to handle varying loads.
  2. Cost Efficiency : You only pay for the compute time and storage you use.
  3. Low Maintenance : AWS manages the underlying infrastructure, reducing operational overhead.

Enjoy the cloud 😁
Osama

Setting up a High-Availability (HA) Architecture with OCI Load Balancer and Compute Instances

Ensuring high availability (HA) for your applications is critical in today’s cloud-first environment. Oracle Cloud Infrastructure (OCI) provides robust tools such as Load Balancers and Compute Instances to help you create a resilient, highly available architecture for your applications. In this post, we’ll walk through the steps to set up an HA architecture using OCI Load Balancer with multiple compute instances across availability domains for fault tolerance.

Prerequisites

  • OCI Account: A working Oracle Cloud Infrastructure account.
  • OCI CLI: Installed and configured with necessary permissions.
  • Terraform: Installed and set up for provisioning infrastructure.
  • Basic knowledge of Load Balancers and Compute Instances in OCI.

Step 1: Set Up a Virtual Cloud Network (VCN)

A VCN is required to house your compute instances and load balancers. To begin, create a new VCN with subnets in different availability domains (ADs) for high availability.

Terraform Configuration (vcn.tf):

resource "oci_core_virtual_network" "vcn" {
  compartment_id = "<compartment_ocid>"
  cidr_block     = "10.0.0.0/16"
  display_name   = "HA-Virtual-Network"
}

resource "oci_core_subnet" "subnet1" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.1.0/24"
  availability_domain = "AD-1"
  display_name        = "HA-Subnet-AD1"
}

resource "oci_core_subnet" "subnet2" {
  compartment_id      = "<compartment_ocid>"
  vcn_id              = oci_core_virtual_network.vcn.id
  cidr_block          = "10.0.2.0/24"
  availability_domain = "AD-2"
  display_name        = "HA-Subnet-AD2"
}

Step 2: Provision Compute Instances

Create two compute instances (one in each subnet) to ensure redundancy.

Terraform Configuration (compute.tf):

resource "oci_core_instance" "instance1" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-1"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-1"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet1.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

resource "oci_core_instance" "instance2" {
  compartment_id = "<compartment_ocid>"
  availability_domain = "AD-2"
  shape = "VM.Standard2.1"
  display_name = "HA-Instance-2"
  
  create_vnic_details {
    subnet_id = oci_core_subnet.subnet2.id
    assign_public_ip = true
  }

  source_details {
    source_type = "image"
    source_id = "<image_ocid>"
  }
}

Step 3: Set Up the OCI Load Balancer

Now, configure the OCI Load Balancer to distribute traffic between the compute instances in both availability domains.

Terraform Configuration (load_balancer.tf):

resource "oci_load_balancer_load_balancer" "ha_lb" {
  compartment_id = "<compartment_ocid>"
  display_name   = "HA-Load-Balancer"
  shape           = "100Mbps"

  subnet_ids = [
    oci_core_subnet.subnet1.id,
    oci_core_subnet.subnet2.id
  ]

  backend_sets {
    name = "backend-set-1"

    backends {
      ip_address = oci_core_instance.instance1.private_ip
      port = 80
    }

    backends {
      ip_address = oci_core_instance.instance2.private_ip
      port = 80
    }

    policy = "ROUND_ROBIN"
    health_checker {
      port = 80
      protocol = "HTTP"
      url_path = "/health"
      retries = 3
      timeout_in_seconds = 10
      interval_in_seconds = 5
    }
  }
}

resource "oci_load_balancer_listener" "ha_listener" {
  load_balancer_id = oci_load_balancer_load_balancer.ha_lb.id
  name = "http-listener"
  default_backend_set_name = "backend-set-1"
  port = 80
  protocol = "HTTP"
}

Step 4: Set Up Health Checks for High Availability

Health checks are critical to ensure that the load balancer sends traffic only to healthy instances. The health check configuration is included in the backend set definition above, but you can customize it as needed.
Step 5: Testing and Validation

Once all resources are provisioned, test the HA architecture:

Verify Load Balancer Health: Ensure that the backend instances are marked as healthy by checking the load balancer’s health checks.

oci load-balancer backend-set get --load-balancer-id <load_balancer_id> --name backend-set-1
  1. Access the Application: Test accessing your application through the Load Balancer’s public IP. The Load Balancer should evenly distribute traffic across the two compute instances.
  2. Failover Testing: Manually shut down one of the instances to verify that the Load Balancer reroutes traffic to the other instance.