Leader Node
- SQL Endpoint
- Stores Metadata
- Query planning and execution
- Parser
- Produces the initial query tree, which is the input to the optimizer
- Optimizer
- Logical Transformation
- Physical Planning
- Uses statistics from system tables, e.g. the cardinality of the columns
- Execution Engine
- Sends work to the compute nodes (CN)
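
A toy sketch of how statistics can steer physical planning. This is not Redshift's optimizer: `TableStats`, `choose_build_side`, and the row counts are hypothetical names and numbers chosen for illustration, and the rule shown (build the hash table on the smaller input) is only a generic planning heuristic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics, as an optimizer might read them."""
    name: str
    row_count: int


def choose_build_side(left: TableStats, right: TableStats) -> TableStats:
    """Pick the smaller input as the build side of a hash join."""
    return left if left.row_count <= right.row_count else right


# Made-up numbers, for illustration only.
orders = TableStats("orders", 10_000_000)
customers = TableStats("customers", 50_000)

build = choose_build_side(orders, customers)
print(f"build hash table on: {build.name}")
```

With stale or missing statistics the same rule could pick the wrong side, which is one reason planners depend on up-to-date statistics.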
Compute Node
- Local storage
- Columnar Storage
- All the values for column_0, then all the values for column_1, etc.
- Efficient queries, since you typically read only a subset of the columns
- Execute queries in parallel
- Load/backup/restore: S3 | EMR | DynamoDB | SSH
- May talk to an additional layer (see Redshift Spectrum)
- Slice - thread of execution on a node
- Split into slices: 1 slice per core
- DW1: 2 on XL, 16 on 8XL
- DW2: 2 on L, 32 on 8XL
- Allocated resources: CPU, Memory, Disk
- Processes query
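
The columnar layout described above can be illustrated with a small sketch. It shows only the idea (all values of one column stored together), not Redshift's on-disk format; the table contents and the helper names `to_columnar` and `sum_column` are assumptions made for the example.

```python
# The same table in row-oriented form: one tuple per record.
column_names = ("id", "name", "amount")
rows = [
    (1, "alice", 30.0),
    (2, "bob", 12.5),
    (3, "carol", 7.5),
]


def to_columnar(rows, column_names):
    """Store all values of column_0, then all values of column_1, etc."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def sum_column(columns, name):
    """An aggregate over one column reads only that column's values."""
    return sum(columns[name])


columns = to_columnar(rows, column_names)
total = sum_column(columns, "amount")  # never touches "id" or "name"
print(total)
```

In the row layout the same aggregate would have to walk every full record, including the columns it does not need.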
Slice
- Virtual compute unit
- Comparable to YARN containers
- Every physical node has multiple slices
- 2 to 32 per node, depending on instance size
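
A minimal sketch of how slices multiply across a cluster and how work might be split among them. The per-node slice counts come from these notes; the round-robin distribution, the use of threads to stand in for slices, and the names `total_slices`, `distribute`, and `parallel_sum` are assumptions of the example, not a description of Redshift internals.

```python
from concurrent.futures import ThreadPoolExecutor

# Slices per node, as listed in the notes above.
SLICES_PER_NODE = {
    ("DW1", "XL"): 2,
    ("DW1", "8XL"): 16,
    ("DW2", "L"): 2,
    ("DW2", "8XL"): 32,
}


def total_slices(family, size, node_count):
    """Total slices in a cluster = slices per node * number of compute nodes."""
    return SLICES_PER_NODE[(family, size)] * node_count


def distribute(values, slice_count):
    """Round-robin the values across slices (an assumption of this sketch)."""
    return [values[i::slice_count] for i in range(slice_count)]


def parallel_sum(values, slice_count):
    """Each slice sums its own chunk; the partial sums are then combined."""
    chunks = distribute(values, slice_count)
    with ThreadPoolExecutor(max_workers=slice_count) as pool:
        partials = list(pool.map(sum, chunks))
    return sum(partials)


slices = total_slices("DW2", "L", node_count=2)
result = parallel_sum(list(range(1, 101)), slices)
print(slices, result)
```

The final combining step is roughly the kind of work that falls to the leader node once the slices have returned their partial results.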
Hardware
- Dense Compute (SSD)
- Dense Storage (Magnetic)
Disks
- Locally attached
- Only ~1/3 of the disk is exposed for user data
- Partitions
- Local data storage
- Mirrored data storage (remotely accessed)
- Redundancy mechanism
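
As a rough worked example of the ~1/3 figure: the sketch below treats the exposed share as exactly one third and uses a made-up raw capacity, so both the fraction and the number are assumptions rather than published specifications.

```python
from fractions import Fraction

# Assumption: treat the "~1/3" from the notes as exactly one third.
EXPOSED_FRACTION = Fraction(1, 3)


def exposed_capacity_gb(raw_gb, exposed_fraction=EXPOSED_FRACTION):
    """Estimate the share of locally attached disk available for user data."""
    return float(raw_gb * exposed_fraction)


raw_gb = 6000  # hypothetical raw disk per node, in GB
exposed = exposed_capacity_gb(raw_gb)
print(f"{exposed:.0f} GB of {raw_gb} GB exposed for user data")
```

The remainder is taken up by the other partitions, including the mirrored data storage used for redundancy.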
Sizing
- Use >= 2 nodes
- Data redundancy (mirror)
- Leader node is provided for free