Saturday, 10 March 2018

AWS Kinesis Data Analytics

Overview
  • Managed stream consumer 
    • Source: Kinesis Data Stream or Kinesis Firehose
  • Processes using SQL code (specified by user)
  • Serverless
  • Use cases
    • Real-time reactive analytics ("needle in the haystack")
    • Aggregation upfront to reduce stress on downstream database
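
The two use cases above can be sketched in plain Python. This is only an illustration of the idea, not Kinesis Data Analytics code: the `events` records, the `status >= 500` alert rule, and the helper names `find_needles` and `aggregate_upfront` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical raw events: one record per page view.
events = [
    {"page": "/home", "status": 200},
    {"page": "/home", "status": 200},
    {"page": "/cart", "status": 500},
    {"page": "/home", "status": 200},
    {"page": "/cart", "status": 200},
]


def find_needles(records):
    """Reactive analytics: keep only the rare records worth reacting to."""
    return [r for r in records if r["status"] >= 500]


def aggregate_upfront(records):
    """Collapse raw records into one row per page before writing downstream."""
    counts = Counter(r["page"] for r in records)
    return [{"page": page, "views": n} for page, n in sorted(counts.items())]


alerts = find_needles(events)
rows = aggregate_upfront(events)
```

Here five raw events become two aggregated rows, which is the sense in which upfront aggregation reduces write load on the downstream database.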

Source
  • Kinesis Data Stream
  • Kinesis Firehose
  • Continuously read

Process
  • STREAM
    • In-application stream (in-memory object)
      • Materialized view of the stream
    • Similar to a table, but data flows continuously
    • Read -> deserialize -> map to stream schema
      • -> in-application error stream when cannot map
    • multiple streams can be JOINed
  • PUMP
    • Continuous "SELECT" statement that inserts its results into another in-application stream
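
The read -> deserialize -> map flow and the pump can be modeled roughly as below. This is a sketch under assumptions, not how the service is implemented: the `SCHEMA` columns, the sample records, and the functions `map_to_schema` and `pump` are hypothetical names chosen for illustration.

```python
import json

# Hypothetical stream schema: column name -> Python type.
SCHEMA = {"ticker": str, "price": float}


def map_to_schema(raw_records, schema):
    """Deserialize each raw record and coerce it to the schema.

    Returns (in_app_stream, error_stream). Records that cannot be
    mapped go to the error stream instead of stopping processing.
    """
    in_app_stream, error_stream = [], []
    for raw in raw_records:
        try:
            parsed = json.loads(raw)
            row = {col: typ(parsed[col]) for col, typ in schema.items()}
            in_app_stream.append(row)
        except (ValueError, KeyError, TypeError) as exc:
            error_stream.append({"raw": raw, "error": type(exc).__name__})
    return in_app_stream, error_stream


def pump(source_rows, predicate):
    """A pump modeled as a generator: yields matching rows as they arrive."""
    for row in source_rows:
        if predicate(row):
            yield row


raw = [
    '{"ticker": "AMZN", "price": "1578.89"}',
    '{"ticker": "MSFT"}',                      # missing column
    'not json',                                # cannot deserialize
    '{"ticker": "AAPL", "price": 179.98}',
]

rows, errors = map_to_schema(raw, SCHEMA)
expensive = list(pump(rows, lambda r: r["price"] > 1000))
```

Two of the four records land in `rows`, the other two in `errors`, and the pump forwards only the rows that match its predicate.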


Output
  • External destination
    • Kinesis Data Stream
    • Kinesis Data Firehose
    • Lambda
  • At-least-once delivery (checkpointing)
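
Why checkpointing gives at-least-once rather than exactly-once delivery can be seen in a small generic model. This is not the service's internal mechanism; `deliver_with_checkpoint` and its `crash_after` fault-injection parameter are hypothetical, and the sketch assumes the checkpoint is saved only after a record has been delivered.

```python
def deliver_with_checkpoint(records, checkpoint, sink, crash_after=None):
    """Deliver records[checkpoint:] to sink, checkpointing after each delivery.

    crash_after (hypothetical fault injection): simulate a crash right
    after delivering the record at that index, before the checkpoint
    for it is saved. Returns the last saved checkpoint.
    """
    for i in range(checkpoint, len(records)):
        sink.append(records[i])      # deliver first ...
        if crash_after == i:
            return checkpoint        # crashed before the checkpoint moved
        checkpoint = i + 1           # ... then record progress
    return checkpoint


records = ["a", "b", "c"]
sink = []

# First run crashes after delivering "b" but before checkpointing it.
cp = deliver_with_checkpoint(records, 0, sink, crash_after=1)
# Restart from the saved checkpoint: "b" is delivered again.
cp = deliver_with_checkpoint(records, cp, sink)

# A destination that needs unique records has to deduplicate itself.
deduplicated = list(dict.fromkeys(sink))
```

No record is lost, but "b" reaches the sink twice, so the destination should be idempotent or deduplicate on its side.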


Windowing
  • See Stream Processing (Windowing)
  • Ingest time
    • exposed as APPROXIMATE_ARRIVAL_TIME (taken from the Kinesis record's ApproximateArrivalTimestamp)
  • Processing time
    • exposed as ROWTIME
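
The choice of time column changes which window a record falls into. The sketch below assumes a 60-second tumbling window and uses hypothetical rows with `ingest_time` and `rowtime` fields (epoch seconds) standing in for the two timestamps above; `tumbling_counts` is an illustrative helper, not a service API.

```python
from collections import defaultdict

WINDOW_SECONDS = 60


def tumbling_counts(rows, time_column):
    """Count rows per 60-second tumbling window, keyed by window start."""
    counts = defaultdict(int)
    for row in rows:
        ts = row[time_column]
        window_start = ts - ts % WINDOW_SECONDS
        counts[window_start] += 1
    return dict(counts)


# Hypothetical rows: the third one arrived at second 118 but was
# processed at second 121, just across a window boundary.
rows = [
    {"ingest_time": 10, "rowtime": 12},
    {"ingest_time": 55, "rowtime": 58},
    {"ingest_time": 118, "rowtime": 121},
]

by_ingest = tumbling_counts(rows, "ingest_time")
by_processing = tumbling_counts(rows, "rowtime")
```

The third row is counted in the window starting at 60 by ingest time, but in the window starting at 120 by processing time.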
