Overview
- Interactive Data Query Service
- Cannot be used to modify data
- Queries data stored in S3 using SQL
- Based on Presto
- Uses schema-on-read
Model
- Table
- EXTERNAL (data stored in S3)
- Partition
- Up to 20K partitions per table
- Typically split by date
- Can use Lambda to automatically create partitions
- S3 data location
- Metadata
- Uses Apache Hive DDL
- Stored in AWS Glue if available in Region
- SerDe (how to interpret a row)
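The table model above can be sketched as Hive-style DDL. The snippet below only builds the statements as strings and does not call AWS. The table name `access_logs`, its two columns, the `dt` partition key, and the bucket path are hypothetical, and the OpenX JSON SerDe is just one possible choice of SerDe.

```python
from datetime import date


def create_table_ddl(table: str, location: str) -> str:
    """Build DDL for an external table partitioned by a date string."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  request_id string,\n"  # hypothetical column
        "  status int\n"          # hypothetical column
        ")\n"
        "PARTITIONED BY (dt string)\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{location}'"
    )


def add_partition_ddl(table: str, location: str, day: date) -> str:
    """Build the statement a scheduled job (e.g. a Lambda) could run per day."""
    dt = day.isoformat()
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{dt}') "
        f"LOCATION '{location}dt={dt}/'"
    )


if __name__ == "__main__":
    base = "s3://example-bucket/logs/"  # hypothetical bucket and prefix
    print(create_table_ddl("access_logs", base))
    print(add_partition_ddl("access_logs", base, date(2020, 1, 31)))
```

Because the schema is applied on read, dropping the table or a partition removes only metadata; the objects under the S3 location are left in place.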
Query Results
- Stored in S3
- Can use KMS for encryption
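As a rough sketch of how the result location and KMS encryption are specified, the helper below assembles the arguments for the Athena `StartQueryExecution` API as exposed by boto3. The function names `build_query_request` and `run_query` are made up for illustration, and the database, bucket, and key values in the usage note are placeholders.

```python
def build_query_request(sql, database, output_location, kms_key=None):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    result_config = {"OutputLocation": output_location}
    if kms_key is not None:
        # Ask Athena to encrypt the result objects with the given KMS key.
        result_config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key,
        }
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": result_config,
    }


def run_query(request):
    """Submit the query; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the builder works without the SDK

    client = boto3.client("athena")
    return client.start_query_execution(**request)["QueryExecutionId"]
```

A call might look like `run_query(build_query_request("SELECT 1", "default", "s3://example-bucket/results/", kms_key="alias/example"))`, which returns the query execution ID; the results land as objects under the given S3 prefix.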
Pricing
- Pay for data scanned in S3
- $5 per TB of data scanned
- Minimum 10 MB per query
- Optimizations
- Compression (gzip)
- Partitioning
- Columnar data formats (e.g. Parquet)
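The pricing rule above can be written as a small estimator. This is a sketch using the figures from these notes ($5 per TB, 10 MB minimum per query); it assumes binary units (1 MB = 1024^2 bytes, 1 TB = 1024^4 bytes) and ignores any finer rounding rules, so treat the output as an approximation rather than a bill.

```python
PRICE_PER_TB_USD = 5.0                 # from the notes above
BYTES_PER_MB = 1024 ** 2               # assumption: binary megabytes
BYTES_PER_TB = 1024 ** 4               # assumption: binary terabytes
MIN_BYTES_PER_QUERY = 10 * BYTES_PER_MB


def query_cost_usd(bytes_scanned: int) -> float:
    """Estimate the cost of one query from the bytes it scanned in S3."""
    # Every query is billed for at least the 10 MB minimum.
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    raw = 500 * 1024 ** 3      # hypothetical: 500 GB of uncompressed text
    optimized = raw // 10      # hypothetical: 10x less scanned after optimization
    print(query_cost_usd(raw), query_cost_usd(optimized))
```

Since cost scales with bytes scanned, each optimization listed above (compression, partition pruning, reading only the needed columns) lowers the bill in proportion to the data it lets a query skip, down to the per-query minimum.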