Amazon Kinesis for data collection and analysis
With Amazon Kinesis, you:
- Collect, process, and analyze data streams in real time. Kinesis has the capacity to process streaming data at any scale. It provides you the flexibility to choose the tools that best suit the requirements of your application in a cost-effective way.
- Ingest real-time data such as video, audio, application logs, website clickstreams, and Internet of Things (IoT) telemetry data. The ingested data can be used for machine learning, analytics, and other applications.
- Can process and analyze data as it arrives, and respond instantly. You don’t have to wait until all data is collected before the processing begins.
Amazon Kinesis Data Streams
To get started using Amazon Kinesis Data Streams, create a stream and specify the number of shards. Each shard is a unit of read and write capacity. Each shard can read up to 1 MB of data per second and write at a rate of 2 MB per second. The total capacity of a stream is the sum of the capacities of its shards. Increase or decrease the number of shards in a stream as needed. Data being written is in the form of a record, which can be up to 1 MB in size.
- Producers write data into the stream. A producer might be an Amazon EC2 instance, a mobile client, an on-premises server, or an IoT device.
- Consumers receive the streaming data that the producers generate. A consumer might be an application running on an EC2 instance or AWS Lambda. If it’s on an Amazon EC2 instance, the application will need to scale as the amount of streaming data increases. If this is the case, run it in an Auto Scaling group.
- Each consumer reads from a particular shard. There might be more than one application processing the same data.
- Another way to write a consumer application is to use AWS Lambda, which lets you run code without having to provision or manage servers.
- The results of the consumer applications can be stored by AWS services such as Amazon S3, Amazon DynamoDB, and Amazon RedShift.
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose starts to process data in near-real time. Kinesis Data Firehose can send records to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service (ES), and any HTTP endpoint owned by you. It can also send records to any of your third-party service providers, including Datadog, New Relic, and Splunk.
Regards
Osama