Saturday, 3 February 2018

AWS DynamoDB

Overview
  • NoSQL database
  • Loosely based on original "Dynamo"
  • Single digit millisecond latency
  • Tier 0 - required by EC2 and many other AWS services
  • Replicated in 3 AZs

Table 
  • ~table in RDBMS
  • name must be unique per account within a region (but not globally)
  • Contains Items

Item
  • ~row in RDBMS
  • collection of attributes
    • not all items must have the same attributes
  • has Primary Key
  • max 400KB (total: attribute names, attribute values, etc.)
  • Entire item is always read (see GSI for look-up indexes)
  • Item Collection 
    • All items in the table that have the same Partition Key
    • Must not exceed 10GB (partition limit)

Primary Key 
  • Special attribute(s) on the item
  • Hash Key
    • aka "Partition Key", "Simple Key"
    • Size: 1 - 2048 bytes
    • Single attribute (e.g. UserId)
    • Ordering
      • Unordered hash index
      • Key-Value queries: return item with hash attribute = X
  • Hash and Range Key 
    • aka "Partition and Sort Key", "Composite Key"
    • Range Key size: 1 - 1024 bytes
    • Two attributes (e.g. ForumName = hash attribute, Subject = range attribute)
    • Models 1:N relationships
    • Rich queries on Range Key: "begins with", "between", "top N", etc.
    • Ordering
      • Unordered hash index on hash attribute
      • Sorted range index on the range attribute
        • Range queries: return items with hash attribute = X and range attribute > Y
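
The ordering rules above can be illustrated with a toy in-memory model. This is only a sketch: the class and method names are made up for illustration and say nothing about how DynamoDB stores data internally. It assumes string range keys.

```python
import bisect
from collections import defaultdict


class CompositeKeyTable:
    """Toy model of a hash-and-range table (hypothetical, not the real engine)."""

    def __init__(self):
        # hash attribute -> list of (range attribute, item), kept sorted by range attribute
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query_begins_with(self, hash_key, prefix):
        # "begins with" condition on the range attribute
        rows = self._partitions.get(hash_key, [])
        return [item for rk, item in rows if rk.startswith(prefix)]

    def query_greater_than(self, hash_key, lower):
        # hash attribute = X and range attribute > Y, returned in sorted order
        rows = self._partitions.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_right(keys, lower)
        return [item for _, item in rows[start:]]
```

A query always names exactly one hash value; only the range attribute supports ordered conditions.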

Attribute
  • name-value pair
  • values
    • Scalar Types
      • String
        • UTF-8 binary encoding
      • Number
        • 38 digits precision
        • Sent as String but treated as numbers for math operations
      • Binary
      • Boolean
      • "Null"
    • Collections (can mix types, e.g. String, Number)
      • Lists
        • e.g. FavoriteThings: ["Cookies", "Coffee", 3.14159]
        • DocumentPath: e.g. ThisList[5][11]
      • Maps
        • DocumentPath: e.g. MyMap.nestedField.deeplyNestedField
    • Set Types
      • String Set
      • Number Set
      • Binary Set
      • Empty Sets not supported
    • JSON document
      • Secondary Index can be used to index top-level fields in document
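
To make the type list concrete, here is a rough sketch of how Python values could map to DynamoDB's typed JSON descriptors (S, N, B, BOOL, NULL, L, M, SS, NS, BS). The function name is hypothetical; binary values are kept as raw bytes here, whereas the low-level JSON API Base64-encodes them.

```python
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)


def to_attribute_value(value):
    """Convert a Python value to a typed attribute value (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check before numbers: bool is an int in Python
        return {"BOOL": value}
    if isinstance(value, NUMBER_TYPES):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (bytes, bytearray)):
        return {"B": bytes(value)}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("empty sets are not supported")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, NUMBER_TYPES) and not isinstance(v, bool) for v in value):
            return {"NS": sorted(str(v) for v in value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        raise TypeError("set members must all be strings, numbers, or bytes")
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Lists and maps may mix types freely, while each set must hold a single scalar type.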

TTL
  • One attribute is designated as TTL
    • Timestamp in epoch time
  • Defines expiry time
  • Use cases
    • Garbage collection of unnecessary items (costs, compliance)
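
A minimal sketch of computing the TTL value, assuming a hypothetical attribute name `expires_at`. Expired items are removed by a background process some time after the timestamp passes, so readers may still want to filter them out; `is_expired` illustrates that check.

```python
import time


def expiry_epoch(days, now=None):
    """Return the epoch timestamp (whole seconds) `days` from now."""
    now = time.time() if now is None else now
    return int(now) + days * 24 * 60 * 60


def is_expired(item, ttl_attribute="expires_at", now=None):
    """True if the item carries a TTL value that has already passed."""
    now = time.time() if now is None else now
    ttl = item.get(ttl_attribute)
    return ttl is not None and ttl <= now


# Hypothetical session record that should disappear after 30 days
session = {"SessionId": "abc123", "expires_at": expiry_epoch(30)}
```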

Local Secondary Index
  • Max 5 per table
  • Colocated on the same node as data partition (consistent)
    • Same Partition Key (Hash) but different Sort Key
  • PK of LSI must be a composite key
  • Query on Partition Key + Alternative Sort Key
  • Total size of all indexed items <= 10GB per Partition Key
    • This follows from the 10GB partition limit
  • Must be added when creating table
  • Eventual or Strong consistency
  • Any attributes on the item can be accessed
  • Provisioned Throughput
    • Draw from the Throughput of the table (read/write)

Global Secondary Index (GSI)
  • Max 5 per table 
  • Different Partition Key and/or Sort Key allowed
    • PK of GSI may be simple key
  • Can be added after table is created
  • Eventual consistency
    • Derivative table created under the hood
  • Only projected attributes can be accessed
  • Can model LSI (i.e. GSI is more generic)
    • Use instead of LSI when data size > 10GB
  • Provisioned Throughput
    • independent RCU/WCU
  • Use Cases for the same Primary Key
    • Subset of attributes are projected
      • No need to read the entire item for quick look-up
    • Eventually consistent read replicas
      • Each may have different read throughput
      • Applications do not interfere with each other

Projection 
  • Attributes that are copied (projected) into index
    • In addition to primary key and index key attributes (automatically projected)
  • Trade-off
    • Projected attribute takes up space and costs more WRITE activity
    • Non-projected attribute must be fetched from the table
      • Entire item is retrieved (increased READ activity)
  • ProjectionType
    • KEYS_ONLY: only index and primary keys
    • INCLUDE: keys plus the attributes listed in NonKeyAttributes
    • ALL: all attributes
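
As an illustration of how projection and independent throughput fit together, the sketch below builds one GSI entry in the shape the CreateTable API expects. The helper name, index name, and attribute names are hypothetical, and no AWS call is made.

```python
def gsi_definition(index_name, hash_attr, range_attr=None, projected=None, rcu=5, wcu=5):
    """Build one GlobalSecondaryIndexes entry (plain dict, nothing is sent to AWS)."""
    key_schema = [{"AttributeName": hash_attr, "KeyType": "HASH"}]
    if range_attr:
        key_schema.append({"AttributeName": range_attr, "KeyType": "RANGE"})

    if projected is None:
        projection = {"ProjectionType": "KEYS_ONLY"}
    elif projected == "ALL":
        projection = {"ProjectionType": "ALL"}
    else:
        projection = {"ProjectionType": "INCLUDE", "NonKeyAttributes": list(projected)}

    return {
        "IndexName": index_name,
        "KeySchema": key_schema,
        "Projection": projection,
        # a GSI has its own RCU/WCU, separate from the table
        "ProvisionedThroughput": {"ReadCapacityUnits": rcu, "WriteCapacityUnits": wcu},
    }


# Look-up index that carries only one extra attribute
by_email = gsi_definition("ByEmail", "Email", projected=["DisplayName"])
```

Projecting only `DisplayName` keeps the index small; any other attribute would need a second read against the table.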



Provisioned Throughput
  • Read Capacity Unit (RCU)
    • 1 strongly consistent read/s of item up to 4KB
      • Items larger than 4KB need more units (10KB = 3 units)
    • reads by default are "eventually consistent" - consume 0.5 RCU
  • Write Capacity Unit (WCU)
    • 1 write/s of item up to 1KB
    • ACK'ed on quorum (2 of 3 replicas)
  • RCU and WCU are independent
  • Throughput is divided by partitions, e.g.
    • 400 WCU and 4 partitions
      • Each partition gets 100 WCU
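
The capacity arithmetic above can be written down as a small calculator. A sketch only: the function names are made up, and it covers single-item reads and writes, not Query/Scan or batch accounting.

```python
import math


def read_capacity_units(item_size_bytes, strongly_consistent=False):
    """RCU consumed by one read: 4KB blocks, halved for eventually consistent reads."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    return blocks if strongly_consistent else blocks / 2


def write_capacity_units(item_size_bytes):
    """WCU consumed by one write: 1KB blocks, rounded up."""
    return max(1, math.ceil(item_size_bytes / 1024))


def per_partition_throughput(table_units, partitions):
    """Provisioned throughput is split evenly across partitions."""
    return table_units / partitions


print(read_capacity_units(10 * 1024, strongly_consistent=True))  # 10KB item
print(per_partition_throughput(400, 4))                           # 400 WCU over 4 partitions
```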

Consistency
  • Eventually Consistent - default for Query, GetItem
  • Strongly Consistent - may be less available (also eats more RCU)
  • Atomic counters supported

Partition
  • Internal "shard"
  • Capacity
    • 3000 RCU
    • 1000 WCU
  • By Capacity: RCU/3000 + WCU/1000
  • By Size: Total Size / 10GB
  • Total Partitions = CEILING(MAX(By Capacity, By Size))
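
A worked version of the partition estimate. "By Capacity" is taken here as RCU/3000 + WCU/1000, an assumption derived from the per-partition limits listed above; the real partitioning logic is internal to DynamoDB and may differ.

```python
import math


def estimate_partitions(rcu, wcu, size_gb):
    """Rough partition count from provisioned throughput and table size."""
    by_capacity = rcu / 3000 + wcu / 1000  # assumed formula, see note above
    by_size = size_gb / 10
    return max(1, math.ceil(max(by_capacity, by_size)))


# Throughput-bound table: 3000 RCU + 1000 WCU, 5GB of data
print(estimate_partitions(3000, 1000, 5))
# Size-bound table: modest throughput, 45GB of data
print(estimate_partitions(1000, 500, 45))
```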


DAX (DynamoDB Accelerator)
  • In-memory cache
  • Microsecond latency
  • Write-through
  • Same API so it can just be "enabled"

Global Tables
  • Multi-region, multi-master table
    • Collection of replica tables
  • Replica table (replica)
    • Max one per region
    • Each stores the same set of data
    • Each has an identical schema
  • Write conflict resolution
    • Timestamp (last write wins)
  • Eventually consistent across regions
    • Typically within seconds
      • CloudWatch: ReplicationLatency, PendingReplicationCount
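
Last-write-wins can be sketched with two in-memory "replicas". This is a simplified illustration with hypothetical names; it ignores tie-breaking for equal timestamps and clock skew between regions.

```python
def apply_replicated_write(replica, key, value, timestamp):
    """Apply a write to a replica only if it is newer than what the replica holds."""
    current = replica.get(key)
    if current is None or timestamp > current[1]:
        replica[key] = (value, timestamp)


# Two regions receive the same pair of conflicting writes in opposite order
us_east, eu_west = {}, {}
write_a = ("item-1", "written in us-east-1", 100)
write_b = ("item-1", "written in eu-west-1", 105)

for w in (write_a, write_b):
    apply_replicated_write(us_east, *w)
for w in (write_b, write_a):
    apply_replicated_write(eu_west, *w)

# Both replicas converge on the write with the later timestamp
print(us_east == eu_west)
```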

Backup & Restore
  • On demand backup
  • Instantaneous
    • DynamoDB takes continuous snapshots + change logs
    • A backup just saves a timestamp + metadata
  • No impact on CU
  • Encrypted (SSE, AWS key)
  • Restore
    • From 30 minutes to several hours: O(size)

Scaling
  • Avoid hot partitions (hammering small number of keys)
    • Each partition can handle only fraction of the overall throughput
    • Hash key should have very high cardinality (i.e. unique values)
  • Auto Scaling
    • Automatically manages throughput capacity
    • Scaling policy for table/GSI
      • Minimum/Maximum capacity
      • Target utilization (%)
    • Based on CloudWatch Alarms
  • Bursting
    • Up to 5 minutes (300 seconds) of unused capacity is retained and can be spent on later bursts
    • Useful for rapid bursts (Autoscaling may not react fast enough)
    • Do not rely on it in normal operations
  • Adaptive Capacity
    • Handles imbalanced ("hot") partitions
    • Needs time to adjust
    • You need to have spare capacity on other partitions (cannot exceed total CU)
  • Throttling occurs on:
    • Hot Keys/Partitions
    • Very large bursts
    • Mixing hot and cold data
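
Bursting can be pictured as a bucket of saved-up capacity. The class below is a toy simulation with made-up names, assuming a 300-second retention window and one-second ticks; actual burst behaviour is not guaranteed by DynamoDB.

```python
class BurstBucket:
    """Toy model: unused provisioned capacity accumulates for up to `window_s` seconds."""

    def __init__(self, provisioned, window_s=300):
        self.provisioned = provisioned
        self.max_saved = provisioned * window_s
        self.saved = 0

    def tick(self, demand):
        """Simulate one second of traffic; return (served, throttled) units."""
        available = self.provisioned + self.saved
        served = min(demand, available)
        # whatever was not used this second is saved, up to the window limit
        self.saved = min(self.max_saved, available - served)
        return served, demand - served


bucket = BurstBucket(provisioned=100)
for _ in range(10):                 # 10 idle seconds build up 1000 spare units
    bucket.tick(0)
first_burst = bucket.tick(600)      # absorbed by provisioned + saved capacity
second_burst = bucket.tick(1000)    # savings run out, part of it is throttled
```

Once the savings are spent, throughput falls back to the provisioned rate, which is why bursting should not be relied on in normal operation.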

Concurrency
  • Optimistic locking
    • Designate one attribute as a version
    • Perform conditional write/delete (similar to UPDATE ... WHERE)
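
The version-attribute pattern can be sketched against a plain dict standing in for a table. `ConditionFailed`, `conditional_put`, and `increment_views` are hypothetical names; with the real service the condition would be expressed as a condition on the write request instead.

```python
class ConditionFailed(Exception):
    """Hypothetical stand-in for a rejected conditional write."""


def conditional_put(table, key, new_item, expected_version):
    """Write only if the stored version still matches what the caller read."""
    current = table.get(key)
    current_version = current["version"] if current else None
    if current_version != expected_version:
        raise ConditionFailed(f"expected version {expected_version}, found {current_version}")
    table[key] = new_item


def increment_views(table, key, max_attempts=3):
    """Read-modify-write loop that retries when another writer got in first."""
    for _ in range(max_attempts):
        item = table[key]
        updated = dict(item, views=item["views"] + 1, version=item["version"] + 1)
        try:
            conditional_put(table, key, updated, expected_version=item["version"])
            return updated
        except ConditionFailed:
            continue  # re-read and try again
    raise ConditionFailed("gave up after repeated conflicts")


table = {"post-1": {"views": 0, "version": 1}}
result = increment_views(table, "post-1")
```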

API
  • CRUD for Table
  • CRUD for item
    • Batch Put/Get
    • Query (filtered, unfiltered)
    • Scan
      • Access every item in the table/index
      • ScanFilter - restrict returned items
      • Paging supported (max 1MB per page)
      • Parallel
        • Segment - non-overlapping 
        • TotalSegments - how many workers
        • AWS performs the division itself based on the above
  • Streams
    • List, Describe, GetShardIterator, GetRecords
  • AWS SDK client uses exponential backoff with jitter for retries (see references)
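
For reference, capped exponential backoff with "full jitter" looks roughly like the sketch below. The function name, base delay, and cap are assumptions for illustration; the SDK's own retry logic and defaults may differ.

```python
import random
import time


def call_with_retries(operation, retryable, max_retries=5, base=0.05, cap=20.0,
                      sleep=time.sleep):
    """Call `operation`, retrying on `retryable` exceptions with jittered backoff."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_retries:
                raise  # out of retries: surface the error
            # full jitter: random delay between 0 and the capped exponential ceiling
            ceiling = min(cap, base * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The jitter spreads retries from many clients over time, so throttled callers do not all come back at the same instant.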


Limits
  • Provisioned Throughput
    • Per table 40K RCU + 40K WCU
    • Per account 80K RCU + 80K WCU
    • Soft
    • Decrease ProvisionedThroughput - max 3/day
  • Secondary Index
    • Max 20 projected attributes
    • LSI: for a partition key total size of data in table and LSI must be <= 10GB

Security
  • VPC endpoint available
  • Fine-grained access control
    • permissions for individual items and/or attributes
    • use IAM Conditions
    • examples
      • dynamodb:LeadingKeys - Web Identity Users can see only their own values
      • dynamodb:Attributes - see only specific attributes
  • Encryption at rest
    • Entire table encrypted by default
    • Uses KMS
      • Default - AWS owned key (free but no access control)
      • Optionally - AWS managed key (standard AWS KMS key)
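
Fine-grained access control is expressed as an IAM policy with conditions. The sketch below builds such a policy as a Python dict; the helper name, table ARN, and attribute names are placeholders, and the `${www.amazon.com:user_id}` variable assumes Login with Amazon as the web identity provider.

```python
import json


def per_user_policy(table_arn, allowed_attributes):
    """Policy letting a web-identity user read only their own items and listed attributes."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            "Resource": [table_arn],
            "Condition": {
                "ForAllValues:StringEquals": {
                    # partition key must equal the caller's own user id
                    "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"],
                    # only these attributes may be requested
                    "dynamodb:Attributes": list(allowed_attributes),
                },
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
            },
        }],
    }


policy = per_user_policy(
    "arn:aws:dynamodb:us-west-2:123456789012:table/GameScores",  # placeholder ARN
    ["UserId", "TopScore"],
)
print(json.dumps(policy, indent=2))
```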

References
