Notes on AWS, Big Data, Machine Learning and Leadership: AWS S3

Model

Resources
- Bucket - subresources
  - website
  - versioning
  - bucket policy
  - ACL
  - CORS
  - logging
  - event notifications

- Object - subresources
  - ACL
  - restore (when using Glacier restore)

Limits

100 buckets per account (soft limit)

Bucket Addressing

virtual hosting style
- https://BUCKET.s3-eu-west-1.amazonaws.com/FILE
- https://BUCKET.s3.amazonaws.com/FILE
  - DNS has enough information to route to the correct region (will route to eu-west-1)
path style
- https://s3-eu-west-1.amazonaws.com/BUCKET/FILE
  - must specify correct region
  - s3.amazonaws.com refers to us-east-1
  - when wrong region specified you get "301 Moved Permanently"
name globally unique
- must be DNS compliant (except in us-east-1)
- may contain "."
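
The two addressing styles above can be sketched as simple URL builders. This is only an illustration, not an SDK call: the function names are made up for this example, and the host names follow the patterns listed above.

```python
from urllib.parse import quote


def virtual_hosted_url(bucket, key, region=None):
    """Virtual-hosted style: the bucket name is part of the host name."""
    # Without a region the global endpoint is used and DNS routes the request.
    if region:
        host = f"{bucket}.s3-{region}.amazonaws.com"
    else:
        host = f"{bucket}.s3.amazonaws.com"
    return f"https://{host}/{quote(key)}"


def path_style_url(bucket, key, region=None):
    """Path style: the bucket name is the first path segment."""
    # Without a region this addresses us-east-1; a wrong region yields a 301.
    host = f"s3-{region}.amazonaws.com" if region else "s3.amazonaws.com"
    return f"https://{host}/{bucket}/{quote(key)}"


print(virtual_hosted_url("BUCKET", "FILE", region="eu-west-1"))
print(path_style_url("BUCKET", "FILE", region="eu-west-1"))
```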

Request Redirects

DNS is used to route to S3 nodes - temporary errors may occur
Must resend request to different endpoint
- Do not reuse temporary redirect as it may fail in future
Typically happen when bucket just created/deleted (eventual consistency)
Permanent redirect - addressed bucket incorrectly (see bucket addressing)
Use the "Expect: 100-continue" header for PUT requests
- Server decides based on the headers whether it can accept the request
- Avoids sending the body unnecessarily if the request is going to be redirected anyway

Requester Pays

Owner pays for storage, requester pays for data transfer and requests
Requester includes x-amz-request-payer header
Requester must be authenticated by AWS
- Not allowed for anonymous access
  - In particular not allowed for static website hosting

DevPay

"Tenant pays"
Charging for S3-based products
- Once a month Amazon bills your customers
- Deducts a fixed transaction fee and gives you the rest
- Amazon charges you for the S3 costs plus a DevPay percentage fee
- If customers do not pay, their access is cut off
Customer data is isolated (cannot be accessed directly via S3 API)
DevPay Tokens
- Product token - identifies the app
- User token - identifies the user to be charged

Storage Classes

STANDARD - default (S)
- Designed for hot/temporary data
STANDARD_IA - infrequently accessed (IA)
- Suitable for long lived infrequently accessed data
- Storage cheaper than (S)
- Requests more expensive than (S)
- Minimum object size: 128kB
- Minimum 30 days storage charge (not suitable for temporary objects)
GLACIER (G)
- Retrieval takes many hours (async job)
- Cannot be used for initial upload
  - can be used as lifecycle target
  - alternatively upload directly via Glacier API (but S3 does not see it then)
- Object must be restored before accessed
- Object visible in S3
REDUCED_REDUNDANCY (RR)
- Sustains loss of data in a single facility (STANDARD sustains concurrent loss in two)
- Probably being phased out (some regions do not support it)
- 99.99% durability (1 in 10,000 objects can be lost per year)
- When object is lost AWS returns 405 method not allowed
  - 410 (Gone) - would be inappropriate as owner may decide to re-upload
Storage class can be changed
- PUT object copy (request)
  - Destination is the same as source (x-amz-copy-source)
  - Indicate that metadata is copied by using the directive: x-amz-metadata-directive: COPY
  - Set x-amz-storage-class: [STANDARD, REDUCED_REDUNDANCY]
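
The copy-onto-itself request above can be sketched as the set of headers sent with the PUT. This is a minimal illustration that only builds the headers (no request is signed or sent); the function name is hypothetical, and the directive header is spelled x-amz-metadata-directive as in the S3 REST API.

```python
from urllib.parse import quote

# Storage classes mentioned in these notes that can be set directly via PUT copy.
ALLOWED_CLASSES = {"STANDARD", "STANDARD_IA", "REDUCED_REDUNDANCY"}


def storage_class_change_headers(bucket, key, storage_class):
    """Headers for a PUT Object copy where source and destination are the same key."""
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        # Source is the object itself, so the copy only rewrites its storage class.
        "x-amz-copy-source": f"/{bucket}/{quote(key)}",
        "x-amz-metadata-directive": "COPY",
        "x-amz-storage-class": storage_class,
    }


print(storage_class_change_headers("BUCKET", "FILE", "REDUCED_REDUNDANCY"))
```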

Restoring from Glacier

Specify number of days to keep the restored file
- Possible to modify later (until expires)
- Cheapest storage class used (e.g. RR)
Restored objects charged for both S3 and Glacier

Versioning

Multiple versions of file
Enabled on bucket level (versioning-enabled)
- Cannot be disabled once enabled, only suspended
Can be suspended (versioning-suspended)
- stops accruing new versions
- delete can only remove the object with the null versionId; otherwise a delete marker is inserted
Each object version gets a unique versionId (string of up to 1024 bytes)
Listing
- versions treated as separate objects
Deleting
- A simple delete inserts a delete marker
- Specify a versionId to still retrieve an earlier version
- To permanently delete a version, specify its versionId
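
The delete semantics above can be illustrated with a toy in-memory model. This is not the S3 API: the class, its method names and the "v1", "v2" version IDs are invented for the sketch (real version IDs are opaque strings).

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    def __init__(self):
        self._versions = {}  # key -> [(version_id, body)], newest last
        self._counter = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._counter)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple delete: nothing is removed, a delete marker becomes current.
            return self.put(key, DELETE_MARKER)
        # Delete with versionId: that version is permanently removed.
        kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
        self._versions[key] = kept
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            versions = [v for v in versions if v[0] == version_id]
        if not versions or versions[-1][1] is DELETE_MARKER:
            raise KeyError(key)  # S3 would return an error response here
        return versions[-1][1]


bucket = VersionedBucket()
first = bucket.put("a.txt", "one")
bucket.put("a.txt", "two")
marker = bucket.delete("a.txt")        # current version is now a delete marker
print(bucket.get("a.txt", first))      # older versions stay retrievable by ID
bucket.delete("a.txt", marker)         # removing the marker "undeletes" the object
print(bucket.get("a.txt"))
```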

Cross-region replication (CRR)

Enabled on bucket level
- Subset of objects can be replicated (prefix)
Asynchronous copy of all S3 objects from (S)ource bucket to (D)estination bucket
- Can override storage class
- Takes up to several hours
Ownership
- By default ACL is copied
  - (S) account owns the (D) object
  - Can be overridden
Storage Class
- Can be specified
  - DR may want STANDARD_IA
Bi-directional replication
- Master-master
Use cases
- Compliance
- Minimize latency
- Operations (e.g. computing on the same set of resources)
- DR
Requirements
- Versioning must be enabled
- (S) and (D) must be in different regions
- Permissions need to be set up
Not replicable
- Retroactive objects (i.e. created before configuration enabled)
- Objects encrypted with SSE-C
  - SSE-KMS is replicable
    - KMS key is regional so you specify what to use in (D)
- Bucket subresources
  - e.g. Lifecycle configuration
    - (S) and (D) may have different configurations
- Non-customer actions (e.g. lifecycle inserts delete marker)
- Replicas from other buckets (i.e. not transitive)
Status
- GET response includes the "x-amz-replication-status" header
  - S: PENDING, COMPLETED, FAILED
  - D: REPLICA
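
A replication rule with a prefix filter and a storage class override might look roughly like the structure below. Treat it as a sketch: the field names follow the bucket replication configuration document as I understand it, and the role ARN, bucket name and rule ID are placeholders.

```python
import json


def replication_config(role_arn, dest_bucket, prefix="", storage_class=None,
                       rule_id="crr-rule"):
    """Build a replication configuration with a single rule (hypothetical helper)."""
    destination = {"Bucket": f"arn:aws:s3:::{dest_bucket}"}
    if storage_class:
        # Optional override, e.g. STANDARD_IA for a DR copy.
        destination["StorageClass"] = storage_class
    return {
        "Role": role_arn,  # IAM role S3 assumes to replicate on your behalf
        "Rules": [{
            "ID": rule_id,
            "Prefix": prefix,  # empty prefix means all objects
            "Status": "Enabled",
            "Destination": destination,
        }],
    }


config = replication_config(
    "arn:aws:iam::123456789012:role/example-crr-role",  # placeholder ARN
    "example-destination-bucket",                        # placeholder bucket
    prefix="logs/",
    storage_class="STANDARD_IA",
)
print(json.dumps(config, indent=2))
```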

Server Access Logging

Require Log Delivery ACL on a bucket
Best effort
Alternative
- CloudTrail: Data Events

Bit Torrent

Speeds up distribution of large, popular objects
.torrent file - bootstrap information for the file
- Use s3-path?torrent
Need BitTorrent client
Every object that allows anonymous access is available for download
S3 acts as a "seeder"

Performance

No special action needed below 100 requests/s for PUT/LIST/DELETE and 300 requests/s for GET
Rapid increase: ask S3 Team to pre-partition (submit support case)
Key name dictates the S3 partition
- Unlike DynamoDB it does not shard on key hash
  - Hence "List" is available
- Objects are stored lexicographically across partitions
- Recommendation for high-scale workloads
  - Avoid sequential keys (timestamps, ids)
    - They start with the same prefix and land on the same partition
    - Shard keys by prefixing them with a short hash (e.g. first 4 characters of the MD5)
      - Listing becomes very expensive (scan)
    - Group objects by key names (animations, videos)
    - Reverse the key name for better distribution of initial characters
      - e.g. userId=1234 -> 4321
Transfer optimizations
- TCP window scaling (increase initial receive windows WSCALE)
- TCP selective acknowledgment - speed-up recovery after large packet loss
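
The key naming recommendations above (hash prefix, reversed ID) can be sketched as follows. The helper names and the "/" separator between the prefix and the original key are choices made for this example only.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prefix the key with the first hex characters of its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The prefix spreads sequential keys, at the cost of ordered listing.
    return f"{digest[:prefix_len]}/{key}"


def reversed_id_key(user_id, name):
    """Reverse a sequential ID so the leading characters vary between neighbours."""
    return f"{str(user_id)[::-1]}/{name}"


print(hashed_key("2018-03-23/log-0001.txt"))
print(reversed_id_key(1234, "avatar.png"))
```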

Data Consistency Model

Updates are atomic but on single key only
Latest PUT wins
Read-after-write consistency for NEW objects
Eventually consistent (may return stale data)
- list newly written NEW object
- read-after-write for EXISTING objects (i.e. overwrite)
- read-after-delete
- list deleted object

Multi-part upload

Recommended for size > 100 MB
- Max object size = 5TB
Process
- Split file locally
- Initiate (S3 returns uploadId)
- Upload each part (1-10000)
  - Specify part number (1-10000)
    - Determines order
    - Need not be contiguous
  - Can be done in parallel
- Complete the upload by sending the list of (part number, ETag) pairs
ETag is generally not the MD5 of the object for multipart uploads (as is also the case with SSE-C and SSE-KMS encryption)
Orphaned uploads
- Billed as normal objects
- Not visible in S3 console
- May be cleaned-up with lifecycle rule
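
The local part of the process above (splitting, then building the completion list) can be sketched without touching the network. The function names are hypothetical. The 5 MiB minimum part size is an S3 limit not stated in these notes, and the "Parts" / "PartNumber" / "ETag" layout mirrors what AWS SDKs commonly expect when completing an upload, so treat it as an assumption.

```python
MIN_PART_SIZE = 5 * 1024 ** 2   # S3 minimum for every part except the last
MAX_PARTS = 10_000              # part numbers range from 1 to 10000


def plan_parts(total_size, part_size=100 * 1024 ** 2):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    if total_size > part_size * MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    parts = []
    for number, offset in enumerate(range(0, total_size, part_size), start=1):
        parts.append((number, offset, min(part_size, total_size - offset)))
    return parts


def completion_manifest(etags_by_part):
    """Build the part list sent on completion, ordered by part number."""
    return {
        "Parts": [
            {"PartNumber": number, "ETag": etags_by_part[number]}
            for number in sorted(etags_by_part)
        ]
    }


MIB = 1024 ** 2
for part in plan_parts(250 * MIB):
    print(part)
```

Each part can then be uploaded independently (and in parallel), collecting the returned ETag per part number for the completion call.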

Static Website Hosting

Supports GET and HEAD requests (no POST)
Public only content
On error returns HTML (not XML like S3 REST API)
Supports redirects (object and bucket level)
Supports root documents (e.g. index.html)
Does not support SSL
- Use CloudFront on top
Option to redirect all requests to different hostname
- upload content to example.org
- create bucket www.example.org
- redirect www.example.org -> example.org

Event Notifications

Bucket configuration
Types
- object created (Put, Post, Copy, CompleteMultipartUpload)
- object removed (Delete, DeleteMarkerCreated)
- object loss detected (ReducedRedundancyLostObject)
Filters (optional)
- Prefix
- Suffix
Target
- SNS, SQS, Lambda
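
A notification configuration with prefix/suffix filters might be shaped like the sketch below. The field names follow the bucket notification configuration as I understand it, the Lambda ARN is a placeholder, and key_matches is a hypothetical helper that mimics how the filter is applied to an object key.

```python
def notification_config(lambda_arn, prefix="", suffix=""):
    """Notify a Lambda function about created objects, optionally filtered by key."""
    rules = []
    if prefix:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    target = {"LambdaFunctionArn": lambda_arn, "Events": ["s3:ObjectCreated:*"]}
    if rules:
        target["Filter"] = {"Key": {"FilterRules": rules}}
    return {"LambdaFunctionConfigurations": [target]}


def key_matches(key, prefix="", suffix=""):
    """Local equivalent of the prefix/suffix filter."""
    return key.startswith(prefix) and key.endswith(suffix)


config = notification_config(
    "arn:aws:lambda:eu-west-1:123456789012:function:example",  # placeholder ARN
    prefix="images/",
    suffix=".jpg",
)
print(config)
print(key_matches("images/cat.jpg", "images/", ".jpg"))
```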

Lifecycle configuration for versioning-enabled buckets

Acts as Recycle Bin
Action on Current Version
- Transition to the Standard-Infrequent Access
  - X days after the object's creation date
- Archive to the Glacier Storage Class
  - X days after the object's creation date
- Expire
  - Expiring the current version inserts a delete marker, which becomes the new current version
Action on Previous Version
- Transition to the Standard-Infrequent Access
  - X days after object becoming a previous version
- Archive to the Glacier Storage Class
  - X days after object becoming a previous version
- Permanently Delete
  - X days after object becoming a previous version
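
The current and previous version actions above could be combined into one rule along these lines. This is a sketch: the field names follow the bucket lifecycle configuration as I understand it, and the rule ID and day counts are arbitrary example values.

```python
def versioned_lifecycle_rule(ia_days=30, glacier_days=90, purge_days=365, prefix=""):
    """One lifecycle rule covering current and previous (noncurrent) versions."""
    if not ia_days < glacier_days < purge_days:
        raise ValueError("expected ia_days < glacier_days < purge_days")
    return {
        "Rules": [{
            "ID": "recycle-bin",  # arbitrary example ID
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            # Current version: days are counted from the object's creation date.
            "Transitions": [
                {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                {"Days": glacier_days, "StorageClass": "GLACIER"},
            ],
            # Previous versions: days are counted from becoming noncurrent.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": ia_days, "StorageClass": "STANDARD_IA"},
                {"NoncurrentDays": glacier_days, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": purge_days},
        }]
    }


print(versioned_lifecycle_rule())
```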

Object Tagging

Way to organize data
- more flexible than location (bucket/prefix)
Max 10 tags per object
Can grant IAM policy permissions per Tag
Can be used in Lifecycle rules

S3 Inventory

Same set of metadata as LIST API
Format: CSV, ORC
Can be queried by Athena (S3 file)
Split between multiple files
- Manifest files
  - manifest.json
  - symlink.txt - Apache Hive compatible
Report: daily/weekly
Delivery to S3 bucket
Delivery Notification
- Sent when the "checksum" file is written (last step)
- SNS/SQS/Lambda
Eventually consistent
Pricing - cost half of LIST API

Storage Class Analysis

Daily report
Set of heuristics to determine the appropriate Storage Class
- Access
- Retention
Provides lifecycle recommendation
Can be exported to BI tool
Additional pricing

Transfer Acceleration

Uses CloudFront edge network (in reverse)
Upload to closest "PoP"
Uses backbone network to deliver to target
Enable on bucket level (new endpoint)
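
The "new endpoint" mentioned above is a bucket-specific host name. A small sketch of building it, with a hypothetical function name; the check reflects the restriction that accelerated buckets cannot have "." in their name.

```python
from urllib.parse import quote


def accelerate_url(bucket, key):
    """URL of an object via the Transfer Acceleration endpoint of its bucket."""
    if "." in bucket:
        raise ValueError("Transfer Acceleration does not support '.' in bucket names")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("BUCKET", "FILE"))
```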

References

https://forums.aws.amazon.com/ann.jspa?annID=3112
https://aws.amazon.com/blogs/aws/amazon-s3-multipart-upload/
http://docs.aws.amazon.com/AmazonS3/latest/dev/cors.html

Friday, 23 March 2018
