Deploying a Real-Time Feature Store on AWS with Amazon Kinesis Data Streams and Amazon DynamoDB (or Amazon SageMaker Feature Store) for Low-Latency ML Inference

Modern ML inference often depends on up-to-date features (customer behaviour, session counts, recent events) that must be available at low latency when a prediction is served. In this article you’ll learn how to build a real-time feature store on AWS using:

  • Amazon Kinesis Data Streams for streaming events
  • AWS Lambda for processing and feature computation
  • Amazon DynamoDB (or SageMaker Feature Store) for storage of feature vectors
  • Amazon SageMaker Endpoint for low-latency inference

You’ll see end-to-end code snippets and architecture guidance so you can implement this in your environment.

1. Architecture Overview

The pipeline works like this:

  1. Front-end/app produces events (e.g., user click, transaction) → published to Kinesis.
  2. A Lambda function consumes from Kinesis, computes derived features (for example: rolling window counts, recency, session features).
  3. The Lambda writes/updates these features into a DynamoDB table (or directly into SageMaker Feature Store).
  4. When a request arrives for inference, the application fetches the current feature set from DynamoDB (or Feature Store) and calls a SageMaker endpoint.
  5. Optionally, after inference you can stream feedback events for model refinement.

This architecture provides real-time feature freshness and low-latency inference.

2. Setup & Implementation

2.1 Create the Kinesis data stream

aws kinesis create-stream \
  --stream-name UserEventsStream \
  --shard-count 2 \
  --region us-east-1
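
To sanity-check the stream, you can publish a test event with boto3. The field names below are illustrative, but they must match what the Lambda in section 2.3 reads:

import json
import boto3
from datetime import datetime

kinesis = boto3.client('kinesis', region_name='us-east-1')

# Illustrative event shape: userId, eventType, and an ISO-8601 timestamp
event = {
    'userId': 'user-123',
    'eventType': 'click',
    'timestamp': datetime.utcnow().isoformat()
}
kinesis.put_record(
    StreamName='UserEventsStream',
    Data=json.dumps(event).encode(),
    PartitionKey=event['userId']  # events for a user stay on one shard
)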

2.2 Create DynamoDB table for features

aws dynamodb create-table \
  --table-name RealTimeFeatures \
  --attribute-definitions AttributeName=userId,AttributeType=S \
  --key-schema AttributeName=userId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-1
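
Before wiring the table into the pipeline, you can confirm it is active:

aws dynamodb describe-table \
  --table-name RealTimeFeatures \
  --query 'Table.TableStatus' \
  --region us-east-1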

2.3 Lambda function to compute features

Here is a Python snippet (using boto3) that the Kinesis event source triggers. Note that Kinesis delivers record data base64-encoded, so the handler must decode it before parsing:

import base64
import json
import boto3
from datetime import datetime, timedelta

dynamo = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamo.Table('RealTimeFeatures')

def lambda_handler(event, context):
    for record in event['Records']:
        # Kinesis record data arrives base64-encoded
        payload = json.loads(base64.b64decode(record['kinesis']['data']))
        user_id = payload['userId']
        event_type = payload['eventType']
        ts = datetime.fromisoformat(payload['timestamp'])

        # Fetch the user's current feature item (empty dict if none yet)
        resp = table.get_item(Key={'userId': user_id})
        item = resp.get('Item', {})

        # Derive features: e.g., event_count_last_5min, last_event_type
        last_update = item.get('lastUpdate', ts.isoformat())
        count_5min = item.get('count5min', 0)
        then = datetime.fromisoformat(last_update)
        if ts - then < timedelta(minutes=5):
            count_5min += 1
        else:
            # More than 5 minutes since the last event: restart the count
            count_5min = 1

        # Upsert the feature item
        new_item = {
            'userId': user_id,
            'lastEventType': event_type,
            'count5min': count_5min,
            'lastUpdate': ts.isoformat()
        }
        table.put_item(Item=new_item)
    return {'statusCode': 200}
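
To smoke-test the handler locally (a sketch that assumes the RealTimeFeatures table exists and your AWS credentials are configured), you can hand it a synthetic Kinesis event. Note the base64 encoding that the real event source applies:

import base64
import json

# Synthetic event with the same field names the handler expects
sample = {
    'userId': 'user-123',
    'eventType': 'click',
    'timestamp': '2024-01-01T12:00:00'
}
event = {'Records': [{'kinesis': {
    'data': base64.b64encode(json.dumps(sample).encode()).decode()
}}]}
print(lambda_handler(event, None))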

2.4 Deploy and connect Lambda to Kinesis

  • Create the Lambda function in the AWS console or via the CLI.
  • Add the Kinesis stream UserEventsStream as an event source, with a batch size and a starting position of TRIM_HORIZON (or LATEST if you only want new events); see the CLI example after this list.
  • Assign an IAM role allowing kinesis:DescribeStream, kinesis:GetRecords, kinesis:GetShardIterator, kinesis:ListShards, dynamodb:GetItem, and dynamodb:PutItem.
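
For example, assuming the Lambda function is named ComputeFeatures (a placeholder), the mapping can be created via the CLI:

aws lambda create-event-source-mapping \
  --function-name ComputeFeatures \
  --event-source-arn arn:aws:kinesis:us-east-1:123456789012:stream/UserEventsStream \
  --batch-size 100 \
  --starting-position TRIM_HORIZON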

2.5 Prepare SageMaker endpoint for inference

  • Train the model offline (outside the scope of this article) on a training dataset whose feature columns match the real-time features above.
  • Deploy the model as an endpoint, e.g., arn:aws:sagemaker:us-east-1:123456789012:endpoint/RealtimeModel.
  • In your application code, fetch the current features from DynamoDB and then invoke the endpoint, as shown below.
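
A minimal deployment sketch using the SageMaker Python SDK; the container image, model artifact location, and role below are placeholders you would replace with your own:

from sagemaker.model import Model

model = Model(
    image_uri='<your-inference-container-image>',  # placeholder
    model_data='s3://your-bucket/model.tar.gz',    # placeholder
    role='arn:aws:iam::123456789012:role/SageMakerExecutionRole'  # placeholder
)
model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='RealtimeModel'
)

With the endpoint in service, the application-side lookup and invocation looks like this:
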
import json
import boto3

sagemaker = boto3.client('sagemaker-runtime', region_name='us-east-1')
dynamo = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamo.Table('RealTimeFeatures')

def get_prediction(user_id, input_payload):
    # Fetch the latest feature vector for this user
    resp = table.get_item(Key={'userId': user_id})
    features = resp.get('Item', {})
    payload = {
        'features': features,
        'input': input_payload
    }
    response = sagemaker.invoke_endpoint(
        EndpointName='RealtimeModel',
        ContentType='application/json',
        # DynamoDB returns numbers as Decimal, which json cannot
        # serialize natively; default=str handles that
        Body=json.dumps(payload, default=str)
    )
    result = json.loads(response['Body'].read().decode())
    return result
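
A hypothetical call, with a made-up user ID and request payload:

result = get_prediction('user-123', {'amount': 42.0})
print(result)

Keeping the feature lookup on the application side (rather than inside the endpoint) keeps the model container stateless and lets you swap the store, DynamoDB versus SageMaker Feature Store, without redeploying the model.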

Conclusion

In this blog post you learned how to build a real-time feature store on AWS: streaming event ingestion with Kinesis, real-time feature computation with Lambda, storage in DynamoDB, and serving via a SageMaker endpoint, with working code for each stage of the pipeline. With this setup, you’re well-positioned to deliver low-latency, ML-powered applications.

Enjoy the cloud
Osama
