Saturday, 10 March 2018

AWS Redshift (cluster)

Leader Node
  • SQL Endpoint
  • Stores Metadata
  • Query planning and execution
    • Parser
    • Initial Query Tree Input to Optimizer
    • Optimizer
      • Logical Transformation
      • Physical Planning
        • Statistics from system tables, e.g. column cardinality
    • Execution Engine
      • Sends work to the compute nodes (CN)
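
The split between a leader that plans and compute nodes that execute can be pictured as a scatter/gather step. The sketch below is only an illustration in plain Python, not Redshift's actual engine: the function names (`compute_node_sum`, `leader_sum`) and the sample data are made up, and threads stand in for compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(rows):
    """Work done on one (simulated) compute node: a partial sum of its local rows."""
    return sum(rows)


def leader_sum(partitions):
    """Simulated leader: send the same step to every node, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(compute_node_sum, partitions))
    return sum(partials)


# Hypothetical data already spread over three compute nodes.
partitions = [[1, 2, 3], [4, 5], [6]]
total = leader_sum(partitions)
print(total)
```

Each node only sees its own partition; the leader's job here is limited to fanning out the work and combining the partial results.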

Compute Node
  • Local storage
  • Columnar Storage
    • All the values for column_0, then all the values for column_1, etc.
    • Efficient, since queries typically read only a subset of the columns
  • Execute queries in parallel
  • Load/backup/restore: S3 | EMR | DynamoDB | SSH
  • May talk to additional layer (see Redshift Spectrum)
  • Slice - thread of execution on a node
    • Split into slices: 1 slice per core
      • DW1: 2 on XL, 16 on 8XL
      • DW2: 2 on L, 32 on 8XL
    • Allocated resources: CPU, Memory, Disk
    • Processes query
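
To make the columnar layout concrete, here is a minimal sketch in plain Python. It assumes a hypothetical three-column table (`id`, `name`, `amount`) and a made-up helper `to_columnar`; it says nothing about how Redshift lays out blocks on disk, only why reading a subset of columns is cheap.

```python
# Three rows of a hypothetical table (id, name, amount).
rows = [
    (1, "a", 10.0),
    (2, "b", 20.0),
    (3, "c", 30.0),
]


def to_columnar(rows, column_names):
    """Turn a list of row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


columns = to_columnar(rows, ["id", "name", "amount"])

# Summing one column touches only that column's values;
# the "id" and "name" values are never read.
total_amount = sum(columns["amount"])
print(total_amount)
```

In the row layout, the same sum would have to walk every full row just to pick out one field.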

Slice
  • Virtual compute unit
    • Comparable to YARN containers
  • Every physical node has multiple slices
    • Depending on instance size 2-32
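
Since every node carries a fixed number of slices, the total parallelism of a cluster is a simple product. A rough sketch, using only the per-node figures listed in these notes; the lookup table and the `total_slices` helper are hypothetical names for illustration:

```python
# Slices per node, taken from the figures in these notes.
SLICES_PER_NODE = {
    ("DW1", "XL"): 2,
    ("DW1", "8XL"): 16,
    ("DW2", "L"): 2,
    ("DW2", "8XL"): 32,
}


def total_slices(family, size, node_count):
    """Total slices in a cluster of identical compute nodes."""
    return SLICES_PER_NODE[(family, size)] * node_count


# Example: a hypothetical cluster of four DW2 8XL compute nodes.
print(total_slices("DW2", "8XL", 4))
```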

Hardware
  • Dense Compute (SSD)
  • Dense Storage (Magnetic)

Disks
  • Locally attached
  • Only ~1/3 of the disk is exposed for user data
  • Partitions
    • Local data storage
    • Mirrored data storage (remotely accessed)
      • Redundancy mechanism
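
The ~1/3 figure gives a quick back-of-the-envelope estimate of usable space. The helper below is a sketch under that assumption only; `usable_capacity_gb`, its parameters, and the example disk size are illustrative, not published numbers.

```python
def usable_capacity_gb(raw_disk_gb_per_node, node_count, exposed_fraction=1 / 3):
    """Estimate user-visible capacity, assuming ~1/3 of raw disk is exposed."""
    return raw_disk_gb_per_node * node_count * exposed_fraction


# Example: two nodes with a hypothetical 3000 GB of raw disk each.
print(round(usable_capacity_gb(3000, 2)))
```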

Sizing
  • Use >= 2 nodes
    • Data redundancy (mirror)
    • Leader node is given for free
