Sunday, 11 March 2018

Amazon Kinesis Data Firehose

Overview
  •  Delivers streaming data directly to a destination (no need to write a consumer application)

Model
  • Delivery Stream - main entity
    • No need to specify shards or partition keys (scaling is managed by the service)
  • Data record: up to 1,000 KB (before base64 encoding)
  • At-least-once semantics - duplicates possible (like SQS)
  • Retention: up to 24h (data is buffered and redelivered if the destination is unavailable)
    • Retries are automatic
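Writing to a delivery stream needs no partition key, in contrast to Kinesis Data Streams. A minimal Direct PUT sketch (the stream name and payload are assumptions, not from the source):

```python
import json

def build_put_record_request(stream_name: str, payload: dict) -> dict:
    """Build a PutRecord request for a Firehose delivery stream.

    Note: no PartitionKey field is needed -- Firehose manages
    scaling itself, unlike Kinesis Data Streams.
    """
    return {
        "DeliveryStreamName": stream_name,
        # Records are raw bytes; a trailing newline keeps them
        # separable after Firehose concatenates them into S3 objects.
        "Record": {"Data": (json.dumps(payload) + "\n").encode("utf-8")},
    }

# With boto3 this would be sent as:
#   firehose = boto3.client("firehose")
#   firehose.put_record(**build_put_record_request("my-delivery-stream",
#                                                  {"event": "click"}))
```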

Source
  • Direct PUT
    • API
    • AWS IoT
    • CloudWatch Logs (Subscription)
    • CloudWatch Events (Rules)
    • Amazon Kinesis Agent
      • Monitors files and sends records to Kinesis Firehose
      • Handles file rotation, checkpointing
      • Similar to CloudWatch Agent (Logs)
      • Also works with Kinesis streams
  • Kinesis Data Streams
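The Kinesis Agent is driven by a JSON config (by default /etc/aws-kinesis/agent.json); a minimal sketch, where the file pattern and stream name are placeholders:

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/*.log",
      "deliveryStream": "my-delivery-stream"
    }
  ]
}
```

To send to a Kinesis data stream instead, a flow uses the `kinesisStream` key in place of `deliveryStream`.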

Destination
  • S3 bucket
    • records are concatenated into larger objects (no delimiter is inserted; producers typically append their own, e.g. a newline)
    • compression: gzip, zip, snappy
    • needs IAM role
    • Supports encryption (SSE-KMS)
  • Redshift table
    • uses intermediate S3 bucket
    • issues COPY command continuously
      • failed loads are not corrected automatically; skipped objects are listed in a manifest file (S3) for manual re-load
    • Compression: gzip
  • Amazon Elasticsearch Service
  • Splunk
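The S3 bullets above (IAM role, compression) map to fields of the delivery stream's S3 configuration. A hedged sketch with placeholder ARNs (not real resources); with boto3 this dict would be passed as `ExtendedS3DestinationConfiguration` to `create_delivery_stream`:

```python
# Sketch of the S3 destination settings Firehose needs.
# Both ARNs below are placeholders (assumptions).
extended_s3_config = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::my-firehose-bucket",
    # GZIP, ZIP and Snappy are supported for S3;
    # the Redshift path accepts GZIP only.
    "CompressionFormat": "GZIP",
}
```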

Data transformation
  • Invoke Lambda function on every record
  • Source record backup possible
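A transformation Lambda receives records with a base64-encoded `data` field and must return each record with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. A sketch, where the added `transformed` field is an example transformation, not part of the Firehose contract:

```python
import base64
import json

def handler(event, context):
    """Sketch of a Firehose data-transformation Lambda."""
    output = []
    for record in event["records"]:
        # Incoming data is base64-encoded.
        payload = json.loads(base64.b64decode(record["data"]))
        payload["transformed"] = True  # example transformation (assumption)
        output.append({
            "recordId": record["recordId"],   # must echo the incoming id
            "result": "Ok",                   # or "Dropped" / "ProcessingFailed"
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("ascii"),
        })
    return {"records": output}
```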

Buffer
  • Size (1 MB to 128 MB)
  • Time (1 to 15 minutes)
  • Whichever threshold is reached first triggers delivery
  • Buffer size may be raised automatically if delivery falls behind
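These two thresholds correspond to Firehose's BufferingHints in the destination configuration; a sketch, where the concrete values are assumptions within the allowed ranges:

```python
# BufferingHints as used inside a Firehose destination configuration.
# Delivery fires when EITHER threshold is reached first.
buffering_hints = {
    "SizeInMBs": 5,            # allowed range: 1-128
    "IntervalInSeconds": 300,  # allowed range: 60-900 (1-15 minutes)
}
```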
