Modern ML inference often depends on up-to-date features (customer behaviour, session counts, recent events) that must be available at low latency. In this article you’ll learn how to build a real-time feature store on AWS using:
- Amazon Kinesis Data Streams for streaming events
- AWS Lambda for processing and feature computation
- Amazon DynamoDB (or SageMaker Feature Store) for storage of feature vectors
- Amazon SageMaker Endpoint for low-latency inference
You’ll see end-to-end code snippets and architecture guidance so you can implement this in your environment.
1. Architecture Overview
The pipeline works like this:
- Front-end/app produces events (e.g., user click, transaction) → published to Kinesis.
- A Lambda function consumes from Kinesis, computes derived features (for example: rolling window counts, recency, session features).
- The Lambda writes/updates these features into a DynamoDB table (or directly into SageMaker Feature Store).
- When a request arrives for inference, the application fetches the current feature set from DynamoDB (or Feature Store) and calls a SageMaker endpoint.
- Optionally, after inference you can stream feedback events for model refinement.
This architecture provides real-time feature freshness and low-latency inference.
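To make the data flow concrete, here is a minimal sketch of the kind of event the rest of this walkthrough assumes. The field names (userId, eventType, timestamp) and the stream name match the examples below; adjust them to your own schema.
import json
from datetime import datetime, timezone
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')
# Example event: the Lambda in section 2.3 expects userId, eventType and an ISO 8601 timestamp
event = {
    'userId': 'user-123',
    'eventType': 'click',
    'timestamp': datetime.now(timezone.utc).isoformat()
}
# Partition by userId so a given user's events stay on one shard and arrive in order
kinesis.put_record(
    StreamName='UserEventsStream',
    Data=json.dumps(event).encode('utf-8'),
    PartitionKey=event['userId']
)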
2. Setup & Implementation
2.1 Create the Kinesis data stream
aws kinesis create-stream \
--stream-name UserEventsStream \
--shard-count 2 \
--region us-east-1
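Stream creation is asynchronous, so it is worth confirming the stream is ACTIVE before attaching consumers. A quick boto3 check (stream name as above):
import boto3

kinesis = boto3.client('kinesis', region_name='us-east-1')
# Blocks until the stream exists and is ACTIVE, then returns
kinesis.get_waiter('stream_exists').wait(StreamName='UserEventsStream')
print('UserEventsStream is active')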
2.2 Create DynamoDB table for features
aws dynamodb create-table \
--table-name RealTimeFeatures \
--attribute-definitions AttributeName=userId,AttributeType=S \
--key-schema AttributeName=userId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region us-east-1
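Each user gets a single item keyed by userId that always holds the latest feature values. Once the Lambda from the next section has processed some events, you can sanity-check the table like this (the user ID is a placeholder):
import boto3

dynamo = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamo.Table('RealTimeFeatures')
# Expect attributes such as lastEventType, count5min and lastUpdate
resp = table.get_item(Key={'userId': 'user-123'})
print(resp.get('Item', 'no features yet for this user'))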
2.3 Lambda function to compute features
Here is a Python snippet (using boto3) that is triggered by Kinesis. Note that Kinesis delivers record data base64-encoded, so the handler decodes it before parsing the JSON:
import base64
import json
import boto3
from datetime import datetime, timedelta

dynamo = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamo.Table('RealTimeFeatures')

def lambda_handler(event, context):
    for record in event['Records']:
        # Kinesis delivers the record payload base64-encoded
        payload = json.loads(base64.b64decode(record['kinesis']['data']))
        user_id = payload['userId']
        event_type = payload['eventType']
        ts = datetime.fromisoformat(payload['timestamp'])
        # Fetch current features (empty dict if this user has none yet)
        resp = table.get_item(Key={'userId': user_id})
        item = resp.get('Item', {})
        # Derive features: e.g., event_count_last_5min, last_event_type
        last_update = item.get('lastUpdate', ts.isoformat())
        count_5min = item.get('count5min', 0)
        then = datetime.fromisoformat(last_update)
        if ts - then < timedelta(minutes=5):
            count_5min += 1
        else:
            count_5min = 1
        # Write the updated feature item back to DynamoDB
        new_item = {
            'userId': user_id,
            'lastEventType': event_type,
            'count5min': int(count_5min),
            'lastUpdate': ts.isoformat()
        }
        table.put_item(Item=new_item)
    return {'statusCode': 200}
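To test the handler locally before deploying, you can feed it a hand-built event that mimics the Kinesis record structure. Only the fields the handler reads are included here; a real event carries additional metadata:
import base64
import json

# Minimal stand-in for a Kinesis event: the data field is base64-encoded JSON,
# exactly as Lambda receives it from the stream
sample_event = {
    'Records': [{
        'kinesis': {
            'data': base64.b64encode(json.dumps({
                'userId': 'user-123',
                'eventType': 'click',
                'timestamp': '2024-01-01T12:00:00+00:00'
            }).encode('utf-8')).decode('utf-8')
        }
    }]
}
print(lambda_handler(sample_event, None))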
2.4 Deploy and connect Lambda to Kinesis
- Create Lambda function in AWS console or via CLI.
- Add the Kinesis stream UserEventsStream as an event source, with an appropriate batch size and starting position = TRIM_HORIZON (a minimal boto3 sketch follows this list).
- Assign an IAM role allowing kinesis:DescribeStream, kinesis:GetRecords, dynamodb:PutItem, etc.
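If you prefer to wire the mapping up programmatically, a boto3 sketch looks like this; the account ID in the stream ARN and the function name RealTimeFeatureComputation are placeholders for your own values:
import boto3

lambda_client = boto3.client('lambda', region_name='us-east-1')
# Connect the Kinesis stream to the feature-computation Lambda (placeholder ARN and name)
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:kinesis:us-east-1:123456789012:stream/UserEventsStream',
    FunctionName='RealTimeFeatureComputation',
    StartingPosition='TRIM_HORIZON',
    BatchSize=100
)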
2.5 Prepare SageMaker endpoint for inference
- Train the model offline (outside the scope of this post), making sure the training dataset uses the same features you compute in real time.
- Deploy the model as an endpoint, e.g., arn:aws:sagemaker:us-east-1:123456789012:endpoint/RealtimeModel.
- In your application code, fetch the current features from DynamoDB and then invoke the endpoint:
import json
from decimal import Decimal
import boto3

sagemaker = boto3.client('sagemaker-runtime', region_name='us-east-1')
dynamo = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamo.Table('RealTimeFeatures')

def get_prediction(user_id, input_payload):
    # Fetch the latest feature item for this user
    resp = table.get_item(Key={'userId': user_id})
    features = resp.get('Item', {})
    # DynamoDB returns numbers as Decimal, which json.dumps cannot serialize
    features = {k: (float(v) if isinstance(v, Decimal) else v) for k, v in features.items()}
    payload = {
        'features': features,
        'input': input_payload
    }
    # Invoke the SageMaker endpoint with the combined payload
    response = sagemaker.invoke_endpoint(
        EndpointName='RealtimeModel',
        ContentType='application/json',
        Body=json.dumps(payload)
    )
    result = json.loads(response['Body'].read().decode())
    return result
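Calling it is then a one-liner; the user ID and input fields are placeholders, and the shape of the result depends on how your model container serializes its output:
prediction = get_prediction('user-123', {'requestedAmount': 250})
print(prediction)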
Conclusion
In this blog post you learned how to build a real-time feature store on AWS: streaming event ingestion with Kinesis, real-time feature computation with Lambda, storage in DynamoDB, and serving via SageMaker. Along the way you saw concrete code examples and architecture guidance you can adapt for production. With this setup, you’re well-positioned to deliver low-latency, ML-powered applications.
Enjoy the cloud
Osama