Every day we generate 2.5 quintillion bytes of data. However, only 1% of that data is processed into meaningful information, largely because of a lack of compute power. Much of it arrives as streaming data from a large number of sources, such as social media feeds, IT logs, IoT telemetry, online gaming, and financial transactions.
Streaming data is generated continuously from various sources, and stream processing technology is used to process it in real time. However, it is expensive to set up the large compute capacity and storage facilities needed to process and store streaming data.
To address this problem, Amazon Web Services offers Amazon Kinesis, a versatile service for the challenges of handling and processing streaming data.
What is Amazon Kinesis?
Amazon Kinesis is a fully managed, scalable service that can ingest, buffer, and process streaming data in real-time.
The Amazon Kinesis services include Kinesis Data Streams, Amazon Data Firehose (formerly Kinesis Data Firehose), Kinesis Video Streams, and Kinesis Data Analytics.
1. Amazon Kinesis Data Streams
Kinesis Data Streams is a real-time streaming service that captures gigabytes of data per second from hundreds of thousands of data sources. A data stream is divided into one or more shards. A shard is the base unit of throughput for Kinesis Data Streams, and each shard provides a fixed unit of capacity.
In provisioned capacity mode, the number of shards must be specified when the stream is created. For reads, each shard supports up to 5 transactions per second, up to a maximum total read rate of 2 MB per second. For writes, each shard supports up to 1,000 records per second, up to a maximum total write rate of 1 MB per second.
The total capacity of a stream is the sum of the capacities of its shards, so adding shards increases the throughput of the stream. A provisioned stream does not scale on its own: if the incoming throughput exceeds the shard capacity, writes are throttled until the stream is resharded.
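The per-shard limits above translate into a simple sizing calculation. The following is a minimal sketch that assumes only the three limits quoted in this section; the function name and the sample workload are hypothetical.

```python
import math

# Per-shard limits quoted in the text above.
WRITE_MB_PER_SHARD = 1.0        # MB per second, writes
WRITE_RECORDS_PER_SHARD = 1000  # records per second, writes
READ_MB_PER_SHARD = 2.0         # MB per second, reads


def required_shards(write_mb_per_s, write_records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_records = math.ceil(write_records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_bytes = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


# Hypothetical workload: 3.5 MB/s in, 6,000 records/s, and two consumers
# that each read the full stream (2 x 3.5 = 7 MB/s out).
shards = required_shards(3.5, 6000, 7.0)
print(shards)  # 6
```

In this example the record rate, not the byte rate, is the binding limit, which is common for workloads made of many small records.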
Kinesis Data Streams High-Level Architecture:
- Real-time performance: Streaming data is available to multiple real-time analytics applications, Amazon S3, and AWS Lambda within 70 milliseconds of the data being collected.
- Data retention: Data is retained for 24 hours by default. Retention can be extended to seven days, or up to 365 days with long-term retention.
- Secure: Data can be encrypted at rest using server-side encryption with AWS KMS keys, and accessed privately through Amazon Virtual Private Cloud (VPC).
- Low cost: Kinesis Data Streams has no upfront cost, and you only pay for the resources you use.
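As an illustration of how a producer might write to a stream, here is a hedged sketch built around the PutRecords API, which accepts up to 500 records per request and requires a partition key for each record (the key determines which shard receives the record). The `client` argument is assumed to behave like `boto3.client("kinesis")`. The helper names, the `device_id` field, and the stream name are hypothetical.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def build_entries(events, key_field):
    """Turn dict events into PutRecords entries (Data bytes plus PartitionKey)."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def send_events(client, stream_name, events, key_field="device_id"):
    """Send events in chunks of up to 500 and return the failed-record count.

    `client` is assumed to expose put_records like boto3.client("kinesis").
    """
    entries = build_entries(events, key_field)
    failed = 0
    for start in range(0, len(entries), MAX_RECORDS_PER_CALL):
        chunk = entries[start:start + MAX_RECORDS_PER_CALL]
        response = client.put_records(StreamName=stream_name, Records=chunk)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed and credentials configured, a call might look like `send_events(boto3.client("kinesis"), "telemetry", events)`. A production producer would also retry the entries reported as failed, which this sketch only counts.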
2. Amazon Data Firehose
Amazon Data Firehose is a near real-time service that takes care of almost everything needed to capture, transform, and store data. It is a fully managed service that scales automatically as data throughput rises or falls, with no administration.
Firehose buffers data before delivering it, which is why its latency is quoted as a minimum of 60 seconds. It can transform and compress the data before loading it. When the destination is S3, the supported compression formats are GZIP, ZIP, and Snappy.
Amazon Data Firehose provides effortless ingestion of data into destinations such as Amazon S3, Amazon Elasticsearch Service, Amazon Redshift, and Splunk.
It can convert the data format before delivery, typically from JSON to Parquet or ORC, but only when the destination is S3. It does not convert CSV to Parquet or other formats directly; instead, an AWS Lambda function can be triggered to convert the CSV to JSON first.
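The CSV-to-JSON step mentioned above can be sketched as a Firehose data-transformation Lambda. Firehose passes each record as base64 text and expects the same `recordId` back with a `result` and re-encoded `data`. The column names below are hypothetical, and the sketch assumes one header-less CSV row per record.

```python
import base64
import csv
import io
import json

# Hypothetical column names; each record is assumed to be one CSV row
# without a header line.
COLUMNS = ["device_id", "timestamp", "temperature"]


def csv_line_to_json(line):
    """Parse one CSV line into a JSON string terminated by a newline."""
    row = next(csv.reader(io.StringIO(line)))
    if len(row) != len(COLUMNS):
        raise ValueError("unexpected column count")
    return json.dumps(dict(zip(COLUMNS, row))) + "\n"


def lambda_handler(event, context):
    """Firehose transformation handler: CSV records in, JSON records out."""
    output = []
    for record in event["records"]:
        try:
            line = base64.b64decode(record["data"]).decode("utf-8").strip()
            payload = csv_line_to_json(line)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, StopIteration, UnicodeDecodeError):
            # Return the original data and mark the record as failed so that
            # Firehose can treat it as a processing failure.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Once the records are JSON, the built-in format conversion to Parquet or ORC can take over for delivery to S3.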
Amazon Data Firehose High-Level architecture:
- Easy to use: Amazon Data Firehose can be set up from the AWS Management Console with just a few clicks.
- Pay for only what you use: You are charged only for the amount of data that is streamed. It can be much cheaper than Kinesis Data Streams, with savings of about 60% or more.
- No ongoing administration: It automatically provisions and scales the compute, memory, and network resources required to load the streaming data.
- Serverless data transformation: Process, transform, and compress the data before loading it into data stores.
3. Amazon Kinesis Data Analytics
Amazon Kinesis Data Analytics is a service for running ETL with SQL queries on streaming data. It analyzes the data and provides insights in real time. Kinesis Data Analytics implements Amazon’s state-of-the-art Random Cut Forest algorithm for anomaly detection. Another machine learning function, Hotspots, locates and returns information about relatively dense regions in the data.
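To make the idea of streaming anomaly detection concrete, the sketch below scores each new value against a short window of recent values. This is a rolling z-score, a much simpler technique swapped in for illustration; it is not Random Cut Forest and not the service's implementation. The function name, window size, threshold, and sample readings are all hypothetical.

```python
import math
from collections import deque


def rolling_anomaly_scores(values, window=5):
    """Score each value by its distance from the mean of the previous `window` values.

    Returns one z-score per input value. The first `window` scores are 0.0
    because there is not yet enough history to compare against.
    """
    history = deque(maxlen=window)
    scores = []
    for value in values:
        if len(history) < window:
            scores.append(0.0)
        else:
            mean = sum(history) / window
            variance = sum((x - mean) ** 2 for x in history) / window
            std = math.sqrt(variance)
            if std == 0:
                # A flat history has no spread; any deviation scores as infinite.
                scores.append(0.0 if value == mean else math.inf)
            else:
                scores.append(abs(value - mean) / std)
        history.append(value)
    return scores


readings = [10, 11, 10, 12, 11, 10, 45, 11, 10]
scores = rolling_anomaly_scores(readings, window=5)
flagged = [i for i, score in enumerate(scores) if score > 3.0]
print(flagged)  # [6]
```

Like the managed service, the sketch processes one value at a time and keeps only a bounded amount of state, which is what makes this style of scoring suitable for streams.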
The architecture of Amazon Kinesis Data Analytics:
- Real-time analytics and ML: Powerful real-time analytics for ETL, and built-in real-time anomaly detection using ML.
- No servers to manage: It runs your streaming applications without requiring you to provision or manage any infrastructure.
- Pay only for what you use: You pay only for the processing resources that your streaming applications use.
4. Amazon Kinesis Video Streams
Amazon Kinesis Video Streams is a high-performance, efficient streaming service for video with very low latency. Video can be streamed securely from sources such as security cameras and stored in data stores for further analysis.
AWS AI services such as Amazon Rekognition can be used to get real-time insights and predictions, as well as to create metadata for the video streams.
The architecture of Kinesis Video Streams:
- Amazon Kinesis Video Streams is serverless, which saves the customer administrative and service management overhead.
- It can stream from millions of devices and supports building real-time applications.
Use Case of Amazon Kinesis
The architecture below depicts a use case that implements the Amazon Kinesis Services for real-time video surveillance.
The surveillance footage from the security camera is ingested by Amazon Kinesis Video Streams.
Amazon Rekognition (a service for object and face detection) identifies faces in the video footage in real time. It can distinguish known from unknown faces by comparing the faces in the video streams against the faces stored in the S3 bucket. Where privacy is essential, it can also provide only the identifying features, without storing the actual footage of the faces in S3.
Amazon Data Firehose streams the analyzed data from Amazon Rekognition to an S3 bucket. A Lambda function is invoked to update the face collection database every time a face is processed and stored in the S3 bucket. Simultaneously, the data analyzed by Amazon Rekognition is written to Kinesis Data Streams, which can trigger a Lambda function that invokes SNS to send a notification to the security system. This allows immediate action in case of a security breach.
This use case demonstrates that integrating the Kinesis services enables real-time analysis, monitoring, and notification, which is not feasible with batch processing methods.
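The notification path in this use case (Kinesis Data Streams to Lambda to SNS) can be sketched as follows. Lambda receives Kinesis records with base64-encoded data under `Records[].kinesis.data`. The payload layout is an assumption based on the Rekognition stream processor output, where each entry of `FaceSearchResponse` carries a `MatchedFaces` list. The topic ARN, function names, and alert format are hypothetical, and `sns_client` is assumed to behave like `boto3.client("sns")`.

```python
import base64
import json

# Hypothetical topic ARN; replace it with a real one.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:security-alerts"


def unknown_face_count(payload):
    """Count detected faces that have no match in the face collection.

    Assumes each entry of "FaceSearchResponse" has a "MatchedFaces" list,
    which is empty when the face is unknown.
    """
    faces = payload.get("FaceSearchResponse", [])
    return sum(1 for face in faces if not face.get("MatchedFaces"))


def handle_event(event, sns_client, topic_arn=TOPIC_ARN):
    """Decode Kinesis records and publish one alert per record with unknown faces."""
    alerts = 0
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        count = unknown_face_count(json.loads(raw))
        if count:
            sns_client.publish(
                TopicArn=topic_arn,
                Subject="Unknown face detected",
                Message=json.dumps({"unknown_faces": count}),
            )
            alerts += 1
    return alerts
```

Inside Lambda, the entry point could be as small as `def lambda_handler(event, context): return handle_event(event, boto3.client("sns"))`. Passing the client in as a parameter keeps the logic testable without AWS credentials.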
Amazon Kinesis handles streaming data and performs ETL and real-time ML with minimal DevOps support. It is flexible and highly reliable, which makes it a very good candidate for processing real-time data and building applications from it.
The transition from batch processing to stream processing is of great value to businesses that rely on time-sensitive data. Amazon Kinesis services support businesses whose decisions depend on real-time applications, and save them from spending time and energy on deploying and managing infrastructure.
Would you like to know more about AWS Kinesis or need any assistance in building a real-time application from streaming data for your business? Contact us at email@example.com.