Notes on AWS, Big Data, Machine Learning and Leadership: AWS DynamoDB (Design Patterns)

Saturday, 3 February 2018

AWS DynamoDB (Design Patterns)

Write Sharding (Table)

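Append a random value from a known range to the partition key
- e.g. CandidateId + rand(0,n)
Scatter (write) + Gather (read): writes spread across the shards, reads query every shard and aggregate
Use case
- Low key cardinality with heavy write traffic (e.g. voting application for 2 people)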
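A minimal sketch of scatter/gather, using an in-memory dict as a stand-in for the table. The shard count, the "_" separator, and the function names are assumptions made for illustration; they are not from the talk.

```python
import random

N_SHARDS = 10  # assumed shard count (the "n" in rand(0,n)), suffixes 0..n-1

# In-memory stand-in for the table: partition key -> vote counter
table = {}


def sharded_key(candidate_id, n_shards=N_SHARDS):
    """Scatter: append a random suffix from the known range to the key."""
    return f"{candidate_id}_{random.randrange(n_shards)}"


def all_shard_keys(candidate_id, n_shards=N_SHARDS):
    """Every key a vote for this candidate could have been written to."""
    return [f"{candidate_id}_{i}" for i in range(n_shards)]


def record_vote(candidate_id):
    """Write one vote to a randomly chosen shard."""
    key = sharded_key(candidate_id)
    table[key] = table.get(key, 0) + 1


def total_votes(candidate_id):
    """Gather: read every shard and sum the partial counts."""
    return sum(table.get(key, 0) for key in all_shard_keys(candidate_id))
```

The trade-off is that one logical read becomes n reads, so n should be only as large as the write rate requires.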

Write Sharding (GSI)

Create an additional attribute "GSIKey" = rand(0,n)
Create GSI
- Partition Key = GSIKey
- Sort Key = timestamp
Use case
- Query a time range with one query per GSIKey value; avoids full table scan on "timestamp"
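The read side of a sharded GSI can be sketched as one range query per GSIKey value followed by a merge. The index here is simulated with plain Python lists; the shard count and helper names are assumptions for illustration only.

```python
import heapq
import random
from collections import defaultdict

N_SHARDS = 4  # assumed number of GSIKey values, 0..n-1


def with_gsi_key(item, n_shards=N_SHARDS):
    """Return a copy of the item with a random GSIKey attribute added."""
    return {**item, "GSIKey": random.randrange(n_shards)}


def build_index(items):
    """Simulated GSI: GSIKey -> items sorted by timestamp (the sort key)."""
    index = defaultdict(list)
    for item in items:
        index[item["GSIKey"]].append(item)
    for shard in index.values():
        shard.sort(key=lambda it: it["timestamp"])
    return index


def query_time_range(index, start, end, n_shards=N_SHARDS):
    """Query each shard for start <= timestamp <= end, then merge by time."""
    per_shard = [
        [it for it in index.get(shard, []) if start <= it["timestamp"] <= end]
        for shard in range(n_shards)
    ]
    return list(heapq.merge(*per_shard, key=lambda it: it["timestamp"]))
```

Each per-shard result is already ordered by the sort key, so a k-way merge is enough to produce a globally ordered result.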

Time Series Data

Partition by time: one table per period (e.g. April_2015, March_2015)
Precreate tables
Writes go to current table
Lower provisioned RCU/WCU on older tables as they age out (no traffic to them)
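A small sketch of routing writes to the current table and working out which table to precreate next. It assumes monthly tables named like the examples above; the function names are hypothetical.

```python
from datetime import date

# Month names are spelled out so the result does not depend on the locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def table_for(day):
    """Name of the table that holds items written on the given day."""
    return f"{MONTHS[day.month - 1]}_{day.year}"


def next_table(day):
    """Name of the following month's table, to be created ahead of time."""
    if day.month == 12:
        return table_for(date(day.year + 1, 1, 1))
    return table_for(date(day.year, day.month + 1, 1))
```

For example, `table_for(date(2015, 4, 17))` gives `"April_2015"`.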

Sparse Index

Comparable to a filtered index in SQL Server
LSI
- Items with no value for the index's sort key attribute are not indexed
- Example
  - Table: CustomerId, OrderId, OrderOpenDate
  - LSI: CustomerId, OrderOpenDate
    - When the order is closed, delete the "OrderOpenDate" attribute so the item drops out of the index
GSI
- Items with no index key value are not indexed
- Example
  - Table: UserId, Champ attribute (for some items)
  - GSI: PK = Champ, SortKey = UserId
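The behaviour in both examples comes down to one rule: an item appears in the index only if it carries the index key attribute. A toy model of that rule, with lists of dicts standing in for the table and the index (helper names and sample values are made up for illustration):

```python
def sparse_index(items, index_key):
    """Simulated sparse index: only items that have the key attribute appear."""
    return [item for item in items if index_key in item]


def close_order(order):
    """Closing an order removes the attribute, and with it the index entry."""
    order.pop("OrderOpenDate", None)


sample_orders = [
    {"CustomerId": "c1", "OrderId": "o1", "OrderOpenDate": "2018-01-10"},
    {"CustomerId": "c1", "OrderId": "o2"},  # already closed: not indexed
    {"CustomerId": "c2", "OrderId": "o3", "OrderOpenDate": "2018-01-12"},
]

open_orders = sparse_index(sample_orders, "OrderOpenDate")  # o1 and o3 only
```

Because closed orders never reach the index, a query for open orders reads only open orders.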

Geo-Hashing

Split the map into big squares (1,2,3,4)
- Each square contains smaller squares (1:11,12,13,14)
  - Up to desired level of granularity
Store the hash in the SortKey and compare on it, e.g. a prefix match finds items in the same square ("show me neighbours")
Client library available
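The nested-squares idea can be sketched as a quadtree-style hash where every level appends one digit from 1 to 4. This is a simplified illustration, not the encoding used by the client library; the quadrant numbering (1 = NW, 2 = NE, 3 = SW, 4 = SE) and the function names are assumptions.

```python
def quad_hash(lat, lon, depth):
    """Encode a point as a string of digits 1-4, one digit per level."""
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        is_north = lat >= mid_lat
        is_west = lon < mid_lon
        # Assumed numbering: 1 = NW, 2 = NE, 3 = SW, 4 = SE
        if is_north:
            digits.append("1" if is_west else "2")
            south = mid_lat
        else:
            digits.append("3" if is_west else "4")
            north = mid_lat
        if is_west:
            east = mid_lon
        else:
            west = mid_lon
    return "".join(digits)


def in_same_square(hashes, prefix):
    """Prefix comparison on the hash, as a sort-key condition would do."""
    return [h for h in hashes if h.startswith(prefix)]
```

A longer hash always starts with the shorter hash of the same point, which is what makes a prefix comparison on the sort key work. Points on either side of a square boundary do not share a prefix, so a full neighbour search also has to check the adjacent squares.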

Query Filters

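Filters can be expensive: they are applied after the read, so there is less data "on the wire" but the reads still happen
- RCU is consumed for every item read, not only for the items returned
Concatenate attributes to form a useful composite range key
- e.g. Status + date: "PENDING_2015-12-25"
- Query "begins with" PENDING
Use case
- Filtering data in the key condition instead of in a filter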
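A sketch of why the composite key helps: because keys are kept in sorted order, a "begins with" condition can jump to the first match and stop at the first non-match, reading only the matching items. The sorted list below stands in for the range keys of one partition; the helper names are hypothetical.

```python
from bisect import bisect_left


def status_date_key(status, day):
    """Composite range key, e.g. "PENDING_2015-12-25"."""
    return f"{status}_{day}"


def begins_with(sorted_keys, prefix):
    """Return the contiguous run of keys that start with the prefix."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order: no later key can match
        matches.append(key)
    return matches
```

Using ISO dates (YYYY-MM-DD) in the key keeps the items for each status in chronological order as well.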

Vertical Partitioning

Split a large item into multiple smaller items to limit RCU consumption
Use GSI to model M:N relationship
Use case
- Message bodies separated out to a different table
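The message example might look like the sketch below: the small, frequently read metadata is kept apart from the large body, and the two share an id. The attribute names ("MessageId", "Body") and the function name are assumptions for illustration.

```python
def split_message(message):
    """Split one message into a small metadata item and a separate body item."""
    metadata = {k: v for k, v in message.items() if k != "Body"}
    body = {"MessageId": message["MessageId"], "Body": message["Body"]}
    return metadata, body


# Listing an inbox then reads only the small metadata items;
# the body is fetched by MessageId when a message is opened.
```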

Adjacency Lists

Partition on nodeId
Model relationships as edges
- Default (self-edge) to describe node itself
Use Sharded GSI to query on "data"
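A toy adjacency list, with a list of dicts standing in for the table. Each item is an edge; the edge from a node to itself carries the node's own description. Attribute names ("NodeId", "TargetId", "Data") and the sample values are made up for illustration.

```python
edges = [
    {"NodeId": "alice", "TargetId": "alice", "Data": "user"},   # self-edge
    {"NodeId": "alice", "TargetId": "bob", "Data": "follows"},
    {"NodeId": "alice", "TargetId": "carol", "Data": "follows"},
    {"NodeId": "bob", "TargetId": "bob", "Data": "user"},       # self-edge
]


def query_partition(items, node_id):
    """Everything stored under one node: its self-edge and outgoing edges."""
    return [item for item in items if item["NodeId"] == node_id]


def describe(items, node_id):
    """The self-edge describes the node itself (None if the node is absent)."""
    for item in query_partition(items, node_id):
        if item["TargetId"] == node_id:
            return item
    return None


def neighbours(items, node_id):
    """Targets of all outgoing edges, excluding the self-edge."""
    return [
        item["TargetId"]
        for item in query_partition(items, node_id)
        if item["TargetId"] != node_id
    ]
```

A single query on the node's partition returns both the node and its relationships; finding edges by their "Data" value is what the sharded GSI is for.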

References

https://www.youtube.com/watch?v=jzeKPKpucS0
