Building a Secure Data Pipeline with OCI Data Flow and OCI Data Integration

Setting Up OCI Data Flow

Creating a Data Flow Application:

A Data Flow application references Spark code stored in Object Storage; the shapes, Spark version, and file URI below are illustrative:

oci data-flow application create --compartment-id <compartment_OCID> --display-name "MyDataFlowApp" --driver-shape VM.Standard2.1 --executor-shape VM.Standard2.1 --num-executors 2 --spark-version 3.2.1 --language PYTHON --file-uri oci://<bucket>@<namespace>/<application_file> --description "Data processing application"

Creating a Data Flow Run:

oci data-flow run create --compartment-id <compartment_OCID> --application-id <application_OCID> --display-name "MyDataFlowRun" --arguments '["--input", "<input_data_location>", "--output", "<output_data_location>"]' --wait-for-state SUCCEEDED

Setting Up OCI Data Integration

Creating a Data Integration Task:

  • Go to Data Integration > Data Tasks > Create Task.
  • Define your task type (e.g., Copy Data, Data Mapping) and configure source and target data stores.

Setting Up Data Flows:

  • Define and configure data flows that transform and move data between different sources and targets.
  • Example: Copy data from an OCI Object Storage bucket to a database.

Securing Your Data Pipeline

Data Encryption:

  • At Rest: Ensure data stored in OCI Object Storage is encrypted using server-side encryption (a customer-managed key sketch follows this subsection).
  • In Transit: Use HTTPS for secure data transfers between services.

Access Control:

  • Configure IAM policies to restrict access to data sources and pipelines.
  • Example IAM Policy:

allow group <group_name> to manage dis-family in compartment <compartment_name>
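
All Object Storage buckets are encrypted at rest by default with Oracle-managed keys; to use a customer-managed key from OCI Vault instead, associate the key with the bucket (the bucket name and key OCID below are placeholders):

oci os bucket create --compartment-id <compartment_OCID> --name <bucket_name> --kms-key-id <key_OCID>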

Network Security:

  • Use VCNs and subnets to isolate data processing environments.
  • Example: Set up a private endpoint for data flow applications.
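
As a sketch, a Data Flow private endpoint can be created from the CLI; the DNS zones and subnet below are placeholders for your own network:

oci data-flow private-endpoint create --compartment-id <compartment_OCID> --subnet-id <subnet_OCID> --dns-zones '["example.com"]' --display-name "MyPrivateEndpoint"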

Monitoring and Managing Data Pipelines

Monitoring Data Flow Runs:

oci data-flow run list --compartment-id <compartment_OCID> --application-id <application_OCID>

Setting Up Alarms:

  • Use OCI Monitoring to create alarms based on metrics from data flows and integration tasks.

Example Alarm:

oci monitoring alarm create --compartment-id <compartment_OCID> --display-name "HighErrorRate" --metric-compartment-id <compartment_OCID> --namespace <metric_namespace> --query-text "error_rate[1m].mean() > 5" --severity CRITICAL --destinations '["<topic_OCID>"]' --is-enabled true

This example puts in place a secure data pipeline that ingests log data into an OCI Object Storage bucket, processes it with OCI Data Flow, and loads the results into an OCI Autonomous Database using OCI Data Integration. To protect the security and integrity of the data, the pipeline includes access controls, encryption, and monitoring.

Thank you
Osama

Automating Cloud Infrastructure Management with OCI Resource Manager

Setting Up OCI Resource Manager

Creating a Stack:

  • Log in to the OCI Console.
  • Navigate to Resource Manager > Stacks > Create Stack.
  • Upload your Terraform configuration file.

Example Terraform Configuration:

provider "oci" {
region = "us-ashburn-1"
}

resource "oci_core_instance" "my_instance" {
availability_domain = "AD-1"
compartment_id = "<compartment_OCID>"
shape = "VM.Standard2.1"
display_name = "MyInstance"
image_id = "<image_OCID>"
subnet_id = "<subnet_OCID>"

source_details {
source_type = "image"
image_id = "<image_OCID>"
}

metadata = {
ssh_authorized_keys = file("~/.ssh/id_rsa.pub")
}
}

Deploying Infrastructure with Resource Manager

Creating a Job:

oci resource-manager job create-apply-job --stack-id <stack_OCID> --display-name "MyDeploymentJob" --execution-plan-strategy AUTO_APPROVED

Monitoring Deployment:

oci resource-manager job list --stack-id <stack_OCID>

Managing and Updating Infrastructure

  • Updating a Stack:
    • Modify the Terraform configuration file.
    • Navigate to Resource Manager > Stacks > Update Stack.
    • Upload the updated Terraform configuration file and apply the changes (a CLI sketch follows below).
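
A minimal CLI equivalent, assuming the updated Terraform configuration is packaged as a zip file named updated-config.zip:

oci resource-manager stack update --stack-id <stack_OCID> --config-source updated-config.zip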

Destroying Infrastructure:

oci resource-manager job create-destroy-job --stack-id <stack_OCID> --display-name "DestroyJob" --execution-plan-strategy AUTO_APPROVED

Integrating with CI/CD Pipelines

Example Integration with GitHub Actions:

name: Deploy to OCI

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Terraform
        uses: hashicorp/setup-terraform@v1

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve
        env:
          OCI_REGION: ${{ secrets.OCI_REGION }}
          OCI_TENANCY_OCID: ${{ secrets.OCI_TENANCY_OCID }}
          OCI_USER_OCID: ${{ secrets.OCI_USER_OCID }}
          OCI_FINGERPRINT: ${{ secrets.OCI_FINGERPRINT }}
          OCI_PRIVATE_KEY_PATH: ${{ secrets.OCI_PRIVATE_KEY_PATH }}
          OCI_PRIVATE_KEY_PASSPHRASE: ${{ secrets.OCI_PRIVATE_KEY_PASSPHRASE }}

Thank you

Osama

Implementing Serverless Computing with Oracle Functions on OCI

Setting Up Oracle Functions

Installing and Configuring the Fn Project CLI:

Oracle Functions is built on the open-source Fn Project, so function development uses the Fn CLI. After installing it (see the Fn Project documentation for the installer), point the CLI context at your compartment:

fn update context oracle.compartment-id <compartment_OCID>
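
A fuller context setup typically also selects the Oracle provider, the regional Functions endpoint, and a Container Registry (OCIR) repository for function images; the context name, region, and registry path below are placeholders:

fn create context my-context --provider oracle
fn use context my-context
fn update context api-url https://functions.us-ashburn-1.oraclecloud.com
fn update context registry iad.ocir.io/<tenancy_namespace>/<repo_prefix>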

Creating and Deploying Functions

Creating a Function:

fn init --runtime <runtime> myfunction

Deploying Function to OCI:

fn -v deploy --app myapp

Integrating Functions with OCI Services

Triggering Functions from OCI Events:

The Fn CLI does not manage OCI event triggers; instead, create an OCI Events rule that invokes the function when a matching event occurs (the condition and actions JSON below is a sketch):

oci events rule create --compartment-id <compartment_OCID> --display-name "MyFunctionTrigger" --is-enabled true --condition '{"eventType":"com.oraclecloud.objectstorage.createobject"}' --actions '{"actions":[{"actionType":"FAAS","isEnabled":true,"functionId":"<function_OCID>"}]}'

Using Functions with OCI Object Storage:

Pass input, such as an Object Storage bucket and object name, to the function on standard input (the JSON keys below are illustrative):

echo '{"bucketName": "<bucket_name>", "objectName": "<object_name>"}' | fn invoke myapp myfunction

Monitoring and Scaling Functions

Monitoring Function Execution:

Function configuration can be inspected with the Fn CLI, while invocation metrics and logs are surfaced through OCI Monitoring and OCI Logging:

fn inspect function myapp myfunction

Scaling Functions:

Oracle Functions scales automatically with incoming request volume, so there is no minimum or maximum instance count to configure. For latency-sensitive workloads, provisioned concurrency can be enabled on a function to keep execution environments warm.

Thank you

Osama

Configuring and Scaling Kubernetes Applications with Oracle Kubernetes Engine (OKE) in OCI

Overview of Kubernetes and its benefits for container orchestration.

Introduction to Oracle Kubernetes Engine (OKE) in OCI.

Creating an OKE Cluster

oci ce cluster create --compartment-id <compartment_OCID> --name "MyCluster" --vcn-id <VCN_OCID> --kubernetes-version <version> --wait-for-state ACTIVE
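
Once the cluster is ACTIVE, generate a kubeconfig so kubectl can connect to it (the file path below is the conventional default):

oci ce cluster create-kubeconfig --cluster-id <cluster_OCID> --file ~/.kube/config --token-version 2.0.0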

Managing Node Pools

  • Adding a Node Pool:

oci ce node-pool create --compartment-id <compartment_OCID> --cluster-id <cluster_OCID> --name "MyNodePool" --node-image-name "<image_name>" --node-shape "<shape>" --size <node_count>

Scaling Node Pool:

oci ce node-pool update --node-pool-id <node_pool_OCID> --size <new_size>

Deploying Applications

Deploying Application with kubectl:

kubectl create deployment my-app --image=<docker_image>
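
Once deployed, the application can be scaled manually or automatically; the replica counts and CPU target below are illustrative:

kubectl scale deployment my-app --replicas=3
kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80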

Configuring Ingress and Load Balancing

Creating Ingress Controller:

kubectl apply -f ingress-controller.yaml

Exposing Service with LoadBalancer:

kubectl expose deployment my-app --type=LoadBalancer --port=80 --target-port=8080

Implementing Secure Networking with OCI Network Security Groups (NSGs) Using CLI

Introduction

  • Overview of OCI NSGs for network security policies.

Step-by-Step Guide

  1. Creating NSGs
oci network nsg create --compartment-id <compartment_OCID> --vcn-id <VCN_OCID> --display-name "MyNSG" --wait-for-state AVAILABLE

Defining Ingress and Egress Rules

Adding Ingress Rule:

Security rules are passed as a JSON array; protocol "6" is TCP:

oci network nsg rules add --nsg-id <NSG_OCID> --security-rules '[{"direction": "INGRESS", "protocol": "6", "source": "<CIDR_block>", "sourceType": "CIDR_BLOCK", "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}}}]'

Adding Egress Rule:

oci network nsg rules add --nsg-id <NSG_OCID> --security-rules '[{"direction": "EGRESS", "protocol": "6", "destination": "<CIDR_block>", "destinationType": "CIDR_BLOCK", "tcpOptions": {"destinationPortRange": {"min": 80, "max": 80}}}]'

Applying NSGs to Resources

Applying an NSG to an Instance VNIC:

NSGs attach to the VNICs of individual resources rather than to the VCN itself:

oci network vnic update --vnic-id <VNIC_OCID> --nsg-ids '["<NSG_OCID>"]'

This approach secures a web application deployment on OCI by configuring NSGs to allow only specific inbound and outbound traffic flows between instances and the internet, enhancing the network security posture.

Thank you

Osama

Configuring High-Availability Storage with OCI Block Volumes

Creating Block Volumes

oci bv volume create --availability-domain "<AD>" --compartment-id <compartment_OCID> --display-name "MyVolume" --size-in-gbs 50 --wait-for-state AVAILABLE

Attaching Volumes to Instances

oci compute volume-attachment attach --instance-id <instance_OCID> --type paravirtualized --volume-id <volume_OCID> --wait-for-state ATTACHED

Backup Management

OCI Block Volume's point-in-time copies are called backups. Creating a Backup:

oci bv backup create --volume-id <volume_OCID> --display-name "MyBackup" --wait-for-state AVAILABLE

Restoring a Backup:

Restoring creates a new volume from the backup:

oci bv volume create --availability-domain "<AD>" --compartment-id <compartment_OCID> --volume-backup-id <backup_OCID> --display-name "MyRestoredVolume" --wait-for-state AVAILABLE

Thank you

Osama

How to setup the OCI CLI

Setting up the OCI CLI (Command Line Interface) involves several steps to authenticate, configure, and start using it effectively. Here’s a detailed guide to help you set up OCI CLI.

Step 1: Prerequisites

  1. OCI Account: Ensure you have an Oracle Cloud Infrastructure account.
  2. Access: Make sure you have appropriate permissions to create and manage resources.
  3. Operating System: OCI CLI supports Windows, macOS, and Linux distributions.

Step 2: Install OCI CLI

Install Python: OCI CLI requires Python 3.5 or later. Install Python if it’s not already installed:

On Linux:

sudo apt update
sudo apt install python3

On macOS:
Install via Homebrew:

brew install python3

On Windows: Download and install Python from python.org.

Install OCI CLI: Use pip, Python’s package installer, to install OCI CLI:

pip3 install oci-cli
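
You can verify the installation:

oci --version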

Step 3: Configure OCI CLI

  1. Generate API Signing Keys: OCI CLI uses API signing keys for authentication. If you haven’t created keys yet, generate them through the OCI Console:
    • Go to Identity > Users.
    • Select your user.
    • Under Resources, click on API Keys.
    • Generate a new key pair if none exists.

Configure OCI CLI: After installing OCI CLI, configure it with your tenancy, user details, and API key:

  • Open a terminal or command prompt.
  • Run the following command:
oci setup config
  • Enter a location for your config file: Choose a path where OCI CLI configuration will be stored (default is ~/.oci/config).
  • Enter a user OCID: Enter your user OCID (Oracle Cloud Identifier).
  • Enter a tenancy OCID: Enter your tenancy OCID.
  • Enter a region name: Choose the OCI region where your resources are located (e.g., us-ashburn-1).
  • Do you want to generate a new API Signing RSA key pair?: If you haven’t generated API keys, choose yes and follow the prompts.

Once configured, OCI CLI will create a configuration file (config) and a key file (oci_api_key.pem) in the specified location.
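
To confirm that the configuration works, try a simple authenticated call such as retrieving your Object Storage namespace:

oci os ns get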

Thank you

Osama

Exploring Oracle Cloud Infrastructure (OCI)

In today’s rapidly evolving digital landscape, choosing the right cloud infrastructure is crucial for organizations aiming to scale, secure, and innovate efficiently. Oracle Cloud Infrastructure (OCI) stands out as a robust platform offering a comprehensive suite of cloud services tailored for enterprise-grade performance and reliability.

1. Overview of OCI: Oracle Cloud Infrastructure (OCI) provides a highly scalable and secure cloud computing platform designed to meet the needs of both traditional enterprise workloads and modern cloud-native applications. Key components include:

  • Compute Services: OCI offers Virtual Machines (VMs) for general-purpose and high-performance computing, Bare Metal instances for demanding workloads, and Container Engine for Kubernetes clusters.
  • Storage Solutions: Includes Block Volumes for persistent storage, Object Storage for scalable and durable data storage, and File Storage for file-based workloads.
  • Networking Capabilities: Virtual Cloud Network (VCN) enables customizable network topologies with VPN and FastConnect for secure and high-bandwidth connectivity. Load Balancer distributes incoming traffic across multiple instances.
  • Database Options: Features Autonomous Database for self-driving, self-securing, and self-repairing databases, MySQL Database Service for fully managed MySQL databases, and Exadata Cloud Service for high-performance databases.

Example: Implementing Autonomous Database

Autonomous Database handles routine tasks like patching, backups, and updates automatically, allowing IT teams to focus on enhancing customer experiences.

Security and Compliance: OCI provides robust security features such as Identity and Access Management (IAM) for centralized control over access policies, Security Zones for isolating critical workloads, and Web Application Firewall (WAF) for protecting web applications from threats.

Management and Monitoring: OCI’s Management Tools offer comprehensive monitoring, logging, and resource management capabilities. With tools like Oracle Cloud Infrastructure Monitoring and Logging, organizations gain insights into performance metrics and operational logs, ensuring proactive management and troubleshooting.

Integration and Developer Tools: For seamless integration, OCI offers Oracle Integration Cloud and API Gateway, enabling organizations to connect applications and services securely across different environments. Developer Tools like Oracle Cloud Developer Tools and SDKs support agile development and deployment practices.

Oracle Cloud Infrastructure (OCI) emerges as a robust solution for enterprises seeking a secure, scalable, and high-performance cloud platform. Whether it’s deploying mission-critical applications, managing large-scale databases, or ensuring compliance and security, OCI offers the tools and capabilities to drive innovation and business growth.

AWS Data migration tools

AWS offers a wide variety of services and Partner tools to help you migrate your data sets, whether they are files, databases, machine images, block volumes, or even tape backups.

AWS Storage Gateway

AWS Storage Gateway is a service that gives your applications seamless and secure integration between on-premises environments and AWS storage.

It provides you low-latency access to cloud data with a Storage Gateway appliance.

Storage Gateway types

Choose a Storage Gateway type that is the best fit for your workload.

  • Amazon S3 File Gateway
  • Amazon FSx File Gateway
  • Tape Gateway
  • Volume Gateway

The Storage Gateway Appliance supports the following protocols to connect to your local data:

  • NFS or SMB for files
  • iSCSI for volumes
  • iSCSI VTL for tapes

Your storage gateway appliance runs in one of four modes: Amazon S3 File Gateway, Amazon FSx File Gateway, Tape Gateway, or Volume Gateway.

Data moved to AWS using Storage Gateway can be sent to the following destinations through the Storage Gateway managed service:

  • Amazon S3 (Amazon S3 File Gateway, Tape Gateway)
  • Amazon S3 Glacier (Amazon S3 File Gateway, Tape Gateway)
  • Amazon FSx for Windows File Server (Amazon FSx File Gateway)
  • Amazon EBS (Volume Gateway)

AWS DataSync

Manual tasks related to data transfers can slow down migrations and burden IT operations. DataSync facilitates moving large amounts of data between on-premises storage and Amazon S3 and Amazon EFS, or FSx for Windows File Server. By default, data is encrypted in transit using Transport Layer Security (TLS) 1.2. DataSync automatically handles scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network usage. 

Reduce on-premises storage infrastructure by shifting SMB-based data stores and content repositories from file servers and NAS arrays to Amazon S3 and Amazon EFS for analytics.

DataSync deploys as a single software agent that can connect to multiple shared file systems and run multiple tasks. The software agent is typically deployed on premises through a virtual machine to handle the transfer of data over the wide area network (WAN) to AWS. On the AWS side, the agent connects to the DataSync service infrastructure. Because DataSync is a service, there is no infrastructure for customers to set up or maintain in the cloud. DataSync configuration is managed directly from the console.
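
As a sketch, once source and destination locations are defined, a transfer task is created and started from the CLI (the ARNs below are placeholders):

aws datasync create-task --source-location-arn <source_location_ARN> --destination-location-arn <destination_location_ARN> --name "MyTransferTask"
aws datasync start-task-execution --task-arn <task_ARN>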

AWS Snow Family service models

The AWS Snow Family helps customers that need to run operations in austere, non-data center environments and in locations where there’s lack of consistent network connectivity. The AWS Snow Family, comprised of AWS Snowcone, AWS Snowball, and AWS Snowmobile, offers several physical devices and capacity points.

You can check my blog post here about the model https://osamaoracle.com/2023/01/28/aws-snow-family-members/

Regards

Osama

AWS Cloud Storage Overview

There are three types of cloud storage: object, file, and block. Each storage option has a unique combination of performance, durability, cost, and interface.

  • Block storage – Enterprise applications like databases or enterprise resource planning (ERP) systems often require dedicated, low-latency storage for each host. This is similar to direct-attached storage (DAS) or a Storage Area Network (SAN). Block-based cloud storage solutions like Amazon Elastic Block Store (Amazon EBS) are provisioned with each virtual server and offer the ultra-low latency required for high-performance workloads.
  • File storage – Many applications must access shared files and require a file system. This type of storage is often supported with a Network Attached Storage (NAS) server. File storage solutions like Amazon Elastic File System (Amazon EFS) are ideal for use cases such as large content repositories, development environments, media stores, or user home directories.
  • Object storage – Applications developed in the cloud need the vast scalability and metadata of object storage. Object storage solutions like Amazon Simple Storage Service (Amazon S3) are ideal for building modern applications. Amazon S3 provides scale and flexibility. You can use it to import existing data stores for analytics, backup, or archive.

AWS provides you with services for your block, file, and object storage needs.

Amazon S3 use cases

  • Backup and restore
  • Data lake for analytics
  • Media storage
  • Static websites
  • Archiving

Buckets and objects

Amazon S3 stores data as objects within buckets. An object is composed of a file and any metadata that describes that file. The object key is the unique identifier of an object in a bucket, and the combination of a bucket, key, and version ID uniquely identifies each object. An object is addressed through the combination of the web service endpoint, bucket name, key, and optionally, a version (for example, https://<bucket_name>.s3.<region>.amazonaws.com/<object_key>).

To store an object in Amazon S3, upload the file into a bucket. When you upload a file, you can set permissions on the object and add metadata. You can have one or more buckets in your account. For each bucket, you control who can create, delete, and list objects in the bucket.
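
As a sketch, a file can be uploaded with custom metadata from the AWS CLI; the bucket, key, and metadata values below are illustrative:

aws s3api put-object --bucket <bucket_name> --key docs/report.pdf --body report.pdf --metadata team=finance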

Amazon S3 access control

By default, all Amazon S3 resources—buckets, objects, and related resources (for example, lifecycle configuration and website configuration)—are private. Only the resource owner, the AWS account that created the resource, can access it. The resource owner can grant access permissions to others by writing access policies.

AWS provides several different tools to help developers configure buckets for a wide variety of workloads. 

  • Most Amazon S3 use cases do not require public access. 
  • Amazon S3 usually stores data from other applications. Public access is not recommended for these types of buckets. 
  • Amazon S3 includes a block public access feature. This acts as an additional layer of protection to prevent accidental exposure of customer data. 
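
As a sketch, block public access can be enabled on a bucket from the AWS CLI (the bucket name is a placeholder):

aws s3api put-public-access-block --bucket <bucket_name> --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true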

Amazon S3 Event Notifications

Amazon S3 event notifications enable you to receive notifications when certain object events happen in your bucket. A common example is a workflow that converts uploaded images to thumbnails.
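
As a sketch, a notification that invokes a Lambda thumbnail function on object creation could be configured as follows; the function ARN and file name are placeholders:

aws s3api put-bucket-notification-configuration --bucket <bucket_name> --notification-configuration file://notification.json

where notification.json contains:

{
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "<lambda_function_ARN>",
      "Events": ["s3:ObjectCreated:*"]
    }
  ]
}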

Amazon S3 cost factors and best practices

Cost is an important part of choosing the right Amazon S3 storage solution. Some of the Amazon S3 cost factors to consider include the following:

  • Storage – Per-gigabyte cost to hold your objects. You pay for storing objects in your S3 buckets. The rate you’re charged depends on your objects’ size, how long you stored the objects during the month, and the storage class. There are per-request ingest charges when using PUT, COPY, or lifecycle rules to move data into any S3 storage class.
  • Requests and retrievals – The number of API calls: PUT and GET requests. You pay for requests made against your S3 buckets and objects. S3 request costs are based on the request type, and are charged on the quantity of requests. When you use the Amazon S3 console to browse your storage, you incur charges for GET, LIST, and other requests that are made to facilitate browsing.
  • Data transfer – There is usually no charge for data transferred in from the internet; charges for data transferred out vary with the requester's location and the transfer medium.
  • Management and analytics – You pay for the storage management features and analytics that are enabled on your account’s buckets. These features are not discussed in detail in this course.

S3 Replication and S3 Versioning can have a big impact on your AWS bill. Both features create multiple copies of your objects, and you pay for each PUT request in addition to the storage-tier charge. S3 Cross-Region Replication also incurs data transfer charges between AWS Regions.

Shared file systems

Using a fully managed cloud shared file system solution removes complexities, reduces costs, and simplifies management.

Amazon Elastic File System (EFS) 

Amazon EFS provides a scalable, elastic file system for Linux-based workloads for use with AWS Cloud services and on-premises resources. 

You’re able to access your file system across Availability Zones, AWS Regions, and VPCs while sharing files between thousands of EC2 instances and on-premises servers through AWS Direct Connect or AWS VPN. 

You can create a file system, mount the file system on an Amazon EC2 instance, and then read and write data to and from your file system. 
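
As a sketch, mounting an EFS file system on an EC2 instance with the EFS mount helper; the file system ID and mount point are placeholders, and the amazon-efs-utils package is assumed available:

sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs <file_system_id>:/ /mnt/efs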

Amazon EFS provides a shared, persistent layer that allows stateful applications to elastically scale up and down. Examples include DevOps, web serving, web content systems, media processing, machine learning, analytics, search index, and stateful microservices applications. Amazon EFS can support a petabyte-scale file system, and the throughput of the file system also scales with the capacity of the file system.

Because Amazon EFS is serverless, you don’t need to provision or manage the infrastructure or capacity. Amazon EFS file systems can be shared with up to tens of thousands of concurrent clients, no matter the type. These could be traditional EC2 instances, containers running in one of your self-managed clusters or in one of the AWS container services, Amazon ECS, Amazon EKS, and Fargate, or in a serverless function running in Lambda.

Use Amazon EFS to lower your total cost of ownership for shared file storage. Choose Amazon EFS One Zone for data that does not require replication across multiple Availability Zones and save on storage costs. Amazon EFS Standard-Infrequent Access (EFS Standard-IA) and Amazon EFS One Zone-Infrequent Access (EFS One Zone-IA) are storage classes that provide price/performance that is cost-optimized for files not accessed every day.

Use Amazon EFS scaling and automation to save on management costs, and pay only for what you use.

Amazon FSx

With Amazon FSx, you can quickly launch and run feature-rich and high-performing file systems. The service provides you with four file systems to choose from. This choice is based on your familiarity with a given file system or by matching the feature sets, performance profiles, and data management capabilities to your needs.

Amazon FSx for Windows File Server

FSx for Windows File Server provides fully managed Microsoft Windows file servers that are backed by a native Windows file system. Built on Windows Server, Amazon FSx delivers a wide range of administrative features such as data deduplication, end-user file restore, and Microsoft Active Directory.

Amazon FSx for Lustre (FSx for Lustre) 

FSx for Lustre is a fully managed service that provides high-performance, cost-effective storage. FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux, Amazon Linux 2, Red Hat Enterprise Linux (RHEL), CentOS, SUSE Linux, and Ubuntu.

Amazon FSx for NetApp ONTAP

FSx for NetApp ONTAP provides fully managed shared storage in the AWS Cloud with the popular data access and management capabilities of ONTAP.

Amazon FSx for OpenZFS

FSx for OpenZFS provides fully managed shared file storage built on the open-source OpenZFS file system, accessible over NFS.

Regards

Osama