Notes on AWS, Big Data, Machine Learning and Leadership: AWS Athena

Monday, 12 March 2018

AWS Athena

Overview

Interactive Data Query Service
- Cannot be used to modify data
Queries data stored in S3
Based on Presto
Uses schema-on-read

Model

Table
- EXTERNAL (data stored in S3)
Partition
- Up to 20K partitions per table
- Typically split by date
- Can use Lambda to automatically create partitions
S3 data location
Metadata
- Uses Apache Hive DDL
- Stored in the AWS Glue Data Catalog if available in the Region
SerDe (how to interpret a row)
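
The pieces above (external table, partition, S3 location, SerDe) come together in a single Hive DDL statement. Below is a minimal sketch, assuming a hypothetical `logs` table of JSON records partitioned by a `dt` date string under a made-up bucket path. The helper names, columns and bucket are illustrative assumptions, not part of any AWS API. The second helper builds the kind of statement a scheduled Lambda could submit to register each day's partition.

```python
def create_table_ddl(table: str, location: str) -> str:
    """Build Hive DDL for an external, date-partitioned JSON table.

    The two columns are a hypothetical schema chosen for illustration.
    `location` is expected to end with a trailing slash.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  request_id string,\n"
        "  status int\n"
        ")\n"
        "PARTITIONED BY (dt string)\n"
        # The SerDe tells Athena how to interpret each row (JSON here).
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


def add_partition_ddl(table: str, location: str, dt: str) -> str:
    """Build the statement that registers one day's partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{dt}') "
        f"LOCATION '{location}dt={dt}/'"
    )


if __name__ == "__main__":
    base = "s3://example-bucket/logs/"  # hypothetical bucket
    print(create_table_ddl("logs", base))
    print(add_partition_ddl("logs", base, "2018-03-12"))
```

Because the schema is applied on read, neither statement touches the data files; they only record metadata about where the data lives and how to parse it.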

Query Results

Stored in S3
- Can use KMS for encryption
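
As a rough illustration of where results go, the sketch below builds the request for boto3's `start_query_execution` call, with an S3 output location and optional KMS encryption. The database, bucket and key values a caller passes in are assumptions for the example, and `build_query_request` and `run_query` are hypothetical helper names. Actually submitting the query requires boto3 and valid AWS credentials.

```python
from typing import Optional


def build_query_request(sql: str, database: str, output_location: str,
                        kms_key: Optional[str] = None) -> dict:
    """Build keyword arguments for the Athena start_query_execution call."""
    result_config = {"OutputLocation": output_location}
    if kms_key is not None:
        # Encrypt the result files written to S3 with a KMS key.
        result_config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key,
        }
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": result_config,
    }


def run_query(request: dict) -> str:
    """Submit the request and return the query execution ID."""
    import boto3  # imported here so the builder works without boto3 installed

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]
```

The call is asynchronous: it returns an execution ID right away, and the result files appear under the output location once the query finishes.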

Pricing

Pay for data scanned in S3
- $5 per TB of data scanned
- Minimum 10 MB per query
Optimizations
- Compression (gzip)
- Partitioning
- Columnar data formats (e.g. Parquet)
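
The pricing rules above reduce to simple arithmetic. The sketch below uses the $5 per TB rate and 10 MB minimum from these notes. It assumes binary units (1 TB = 1024^4 bytes) and ignores any rounding the service applies, so treat the output as an estimate; the function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4                # assumption: binary terabyte
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2    # 10 MB minimum charge per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = estimate_query_cost(1024 ** 4)       # whole 1 TB table
    one_partition = estimate_query_cost(1024 ** 3)   # a single 1 GB partition
    print(f"full scan: ${full_scan:.2f}, one partition: ${one_partition:.4f}")
```

Under these assumptions, a full scan of a 1 TB table costs $5.00, while a query pruned to a 1 GB partition costs about half a cent. That is why compression, partitioning and columnar formats matter: each reduces the bytes a query has to read.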
