Monday, 12 March 2018

AWS Athena

Overview
  • Interactive Data Query Service
    • Cannot be used to modify data
  • Queries data stored in S3 using SQL
  • Based on Presto
  • Uses schema-on-read

Model
  • Table
    • EXTERNAL (data stored in S3)
  • Partition
    • Up to 20K partitions per table
    • Typically split by date
    • Can use Lambda to automatically create partitions
  • S3 data location
  • Metadata
    • Uses Apache Hive DDL
    • Stored in the AWS Glue Data Catalog if available in the Region
  • SerDe (how to interpret a row)
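
To make the model above concrete, here is a minimal sketch that builds the two Hive DDL statements involved: one defining an external, date-partitioned table over S3, and one registering a single day's partition. The table name, columns, bucket path, and the `dt` partition key are hypothetical, chosen only for illustration; the JSON SerDe class is one of several Athena supports and is assumed here because the notes do not name a data format.

```python
from datetime import date


def create_table_ddl(table, location):
    """Build Hive DDL for an external table partitioned by a date string.

    The columns and the JSON SerDe are illustrative assumptions.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  request_id string,\n"
        "  status int\n"
        ")\n"
        "PARTITIONED BY (dt string)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


def add_partition_ddl(table, location, day):
    """Build the statement that registers one day's S3 prefix as a partition.

    Assumes `location` ends with '/' and data is laid out as dt=YYYY-MM-DD/.
    """
    dt = day.isoformat()
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{dt}') "
        f"LOCATION '{location}dt={dt}/'"
    )


# Hypothetical table name and bucket, for illustration only.
TABLE = "logs"
LOCATION = "s3://example-bucket/logs/"

print(create_table_ddl(TABLE, LOCATION))
print(add_partition_ddl(TABLE, LOCATION, date(2018, 3, 12)))
```

A scheduled Lambda function could submit the second statement once per day, which is one way to create partitions automatically.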


Query Results
  • Stored in S3
    • Can use KMS for encryption
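
As a rough sketch of how the result location and KMS encryption are specified, the function below assembles the parameters for the boto3 Athena client's `start_query_execution` call. The query, database, bucket, and key ARN are hypothetical placeholders; the actual call is left as a comment because it needs boto3 and AWS credentials.

```python
def build_query_request(sql, database, output_location, kms_key_arn=None):
    """Assemble keyword arguments for Athena's start_query_execution."""
    result_config = {"OutputLocation": output_location}
    if kms_key_arn is not None:
        # Server-side encryption of the result files with a KMS key.
        result_config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key_arn,
        }
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": result_config,
    }


# All values below are hypothetical placeholders.
request = build_query_request(
    sql="SELECT status, count(*) FROM logs GROUP BY status",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/example-key-id",
)

# With boto3 installed and credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
```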

Pricing
  • Pay for data scanned in S3
    • $5 per TB of data scanned
    • Minimum 10 MB per query
  • Optimizations
    • Compression (gzip)
    • Partitioning
    • Columnar data formats (e.g. Parquet)
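
The pricing rules above can be sketched as a small calculation. It assumes binary units (1 MB = 1024² bytes, 1 TB = 1024⁴ bytes), which the notes do not specify, and applies only the per-TB rate and the 10 MB minimum listed above; the function names are hypothetical.

```python
PRICE_PER_TB_USD = 5.0           # from the notes: $5 per TB scanned
MIN_BILLED_BYTES = 10 * 1024**2  # from the notes: 10 MB minimum per query
BYTES_PER_TB = 1024**4           # assumption: binary terabyte


def billed_bytes(bytes_scanned):
    """Apply the per-query minimum to the number of bytes scanned."""
    return max(bytes_scanned, MIN_BILLED_BYTES)


def query_cost_usd(bytes_scanned):
    """Estimate the cost of one query from the bytes it scanned."""
    return billed_bytes(bytes_scanned) / BYTES_PER_TB * PRICE_PER_TB_USD


# A full-terabyte scan versus a tiny query that hits the 10 MB minimum.
print(query_cost_usd(1024**4))
print(query_cost_usd(1024))
```

Because cost follows bytes scanned, each listed optimization works by shrinking that number: compression makes the files smaller, partitioning skips prefixes a query does not need, and columnar formats let a query read only the columns it references.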
