Friday, 23 March 2018

AWS S3

Model
  • Resources
    • Bucket - subresources
      • website 
      • versioning 
      • bucket policy
      • ACL
      • CORS
      • logging
      • event notifications
    • Object - subresources
      • ACL
      • restore (when using Glacier restore)

Limits
  • 100 buckets per account (soft limit)

Bucket Addressing
  • virtual hosting style
  • path style
    • https://s3-eu-west-1.amazonaws.com/BUCKET/FILE 
      • must specify correct region    
      • s3.amazonaws.com refers to us-east-1
      • when wrong region specified you get "301 Moved Permanently"
  • name globally unique
    • must be DNS compliant (except in us-east-1)
    • may contain "."
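
The two addressing styles can be sketched as small URL builders. This is a minimal illustration, not an official client: the function names are made up for this note, and the endpoint patterns are the ones quoted above (region-specific host for path style, global host for us-east-1).

```python
from urllib.parse import quote


def virtual_hosted_url(bucket: str, key: str) -> str:
    """Virtual-hosted style: the bucket name is part of the hostname."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def path_style_url(bucket: str, key: str, region: str) -> str:
    """Path style: the bucket name is the first path segment.

    The host must match the bucket's region; the us-east-1 endpoint
    carries no region component.
    """
    if region == "us-east-1":
        host = "s3.amazonaws.com"
    else:
        host = f"s3-{region}.amazonaws.com"
    return f"https://{host}/{bucket}/{quote(key)}"


print(path_style_url("BUCKET", "FILE", "eu-west-1"))
```

Building a path-style URL with the wrong region is exactly the case that produces the "301 Moved Permanently" response.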

Request Redirects
  • DNS routes requests to S3 facilities - temporary routing errors may occur
  • On a temporary redirect, resend the request to the endpoint given in the response
    • Do not reuse the temporary endpoint for later requests, as it may stop working
  • Typically happens right after a bucket is created or deleted (eventual consistency)
  • Permanent redirect - the bucket was addressed incorrectly (see Bucket Addressing)
  • Use the Expect: 100-continue header for PUT requests
    • Server decides from the headers alone whether it can accept the request
    • Avoids sending the body if the request is going to be redirected anyway

Requester Pays
  • Owner pays for storage, requester pays for data transfer and requests
  • Requester includes x-amz-request-payer header
  • Requester must be authenticated by AWS
    • Not allowed for anonymous access
      • In particular not allowed for static website hosting

DevPay
  • "Tenant pays"
  • Charging for S3 based products
    • Once a month Amazon bills your customers
    • Amazon deducts a fixed transaction fee and gives you the rest
    • Amazon charges you for the S3 costs plus a DevPay percentage fee
    • If customers do not pay, their access is cut off
  • Customer data is isolated (cannot be accessed directly via S3 API)
  • DevPay Tokens
    • Product token - identifies the app
    • User token - identifies the user to be charged

Storage Classes
  • STANDARD - default (S)
    • Designed for hot/temporary data
  • STANDARD_IA - infrequently accessed (IA)
    • Suitable for long-lived, infrequently accessed data
    • Storage cheaper than (S)
    • Requests more expensive than (S)
    • Minimum billable object size: 128 KB
    • Minimum 30 days storage charge (not suitable for temporary objects)
  • GLACIER (G)
    • Retrieval takes many hours (async job)
    • Cannot be used for initial upload
      • can be used as lifecycle target
      • alternatively upload directly via Glacier API (but S3 does not see it then)
    • Object must be restored before accessed
    • Object visible in S3
  • REDUCED_REDUNDANCY (RR)
    • Sustains loss of data in a single facility (STANDARD sustains concurrent loss in two)
    • Probably being phased out (some regions do not support it)
    • 99.99% durability (on average 1 in 10,000 objects may be lost per year)
    • When an object is lost, S3 returns 405 Method Not Allowed
      • 410 (Gone) would be inappropriate, as the owner may decide to re-upload
  • Storage class can be changed
    • PUT object copy (request)
      • Destination is the same as the source (x-amz-copy-source)
      • Indicate that metadata is copied unchanged: x-amz-metadata-directive: COPY
      • Set x-amz-storage-class: [STANDARD, REDUCED_REDUNDANCY]
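
The copy-onto-itself request can be sketched as a header builder. This only assembles the headers of a PUT Object - Copy request and sends nothing; the function name is hypothetical, the directive header is spelled x-amz-metadata-directive in the S3 REST API, and accepting STANDARD_IA as a target class is an assumption beyond the two values listed above.

```python
from urllib.parse import quote

# Assumed set of classes that can be set directly on a copy.
ALLOWED_CLASSES = {"STANDARD", "STANDARD_IA", "REDUCED_REDUNDANCY"}


def storage_class_change_headers(bucket: str, key: str, storage_class: str) -> dict:
    """Headers for a PUT to /key on the same bucket that rewrites the
    object in place with a new storage class."""
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        # Source and destination are the same object.
        "x-amz-copy-source": "/" + quote(f"{bucket}/{key}"),
        # Keep the existing metadata; only the storage class changes.
        "x-amz-metadata-directive": "COPY",
        "x-amz-storage-class": storage_class,
    }


headers = storage_class_change_headers("my-bucket", "logs/a b.txt", "STANDARD_IA")
print(headers)
```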

Restoring from Glacier
  • Specify the number of days to keep the restored copy
    • Can be modified later (until the copy expires)
    • The temporary copy uses the cheapest storage class (e.g. RR)
  • Restored objects are charged for both the S3 copy and the Glacier archive
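
A restore is requested with a POST to the object's ?restore subresource, carrying a small XML body with the number of days. The sketch below only builds that body; the function name is made up for this note, and the request is assumed to be signed and sent by whatever HTTP client is in use.

```python
import xml.etree.ElementTree as ET

S3_XML_NAMESPACE = "http://s3.amazonaws.com/doc/2006-03-01/"


def restore_request_body(days: int) -> bytes:
    """XML body for POST /key?restore: keep the restored copy for `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    root = ET.Element("RestoreRequest", xmlns=S3_XML_NAMESPACE)
    ET.SubElement(root, "Days").text = str(days)
    return ET.tostring(root)


print(restore_request_body(7).decode("ascii"))
```

Sending the same request again with a different number of days is how the retention period is modified before the copy expires.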

Versioning
  • Multiple versions of file 
  • Enabled on bucket level (versioning-enabled)
    • Cannot be disabled once enabled, only suspended
  • Can be suspended (versioning-suspended)
    • Stops accruing new versions
    • Delete can only remove the object with the null versionId; otherwise a delete marker is inserted
  • Each object version gets a unique versionId (string of up to 1024 bytes)
  • Listing
    • versions treated as separate objects
  • Deleting
    • A plain delete makes S3 insert a delete marker
    • Older versions can still be retrieved by specifying versionId
    • To permanently delete a version, specify its versionId
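
The delete-marker behaviour is easier to see in a toy model. The class below is an in-memory simulation written for this note, not the S3 API: version IDs are simple counters and errors are plain KeyError.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Plain GET sees only the current (latest) version.
            if not versions or versions[-1][1] is DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not DELETE_MARKER:
                return body
        raise KeyError(version_id)

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain DELETE only inserts a delete marker.
            return self.put(key, DELETE_MARKER)
        # DELETE with a versionId permanently removes that version.
        self._versions[key] = [
            (vid, body) for vid, body in self._versions.get(key, []) if vid != version_id
        ]
        return version_id


bucket = VersionedBucket()
first = bucket.put("a.txt", b"one")
bucket.put("a.txt", b"two")
marker = bucket.delete("a.txt")       # object now looks deleted
print(bucket.get("a.txt", first))     # old version is still retrievable
bucket.delete("a.txt", marker)        # removing the marker "undeletes"
print(bucket.get("a.txt"))
```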

Cross-region replication (CRR)
  • Enabled on bucket level
    • Subset of objects can be replicated (prefix)
  • Asynchronous copy of all S3 objects from (S)ource bucket to (D)estination bucket
    • Can override storage class
    • Takes up to several hours
  • Ownership
    • By default ACL is copied
      • (S) account owns the (D) object
      • Can be overridden 
  • Storage Class
    • Can be specified
      • DR may want STANDARD_IA
  • Bi-directional replication
    • Master-master
  • Use cases
    • Compliance
    • Minimize latency
    • Operations (e.g. computing on the same set of resources)
    • DR
  • Requirements
    • Versioning must be enabled
    • (S) and (D) must be in different regions
    • Permissions need to be set up
  • Not replicable
    • Pre-existing objects (i.e. created before replication was configured)
    • Objects encrypted with SSE-C
      • SSE-KMS is replicable
        • KMS key is regional so you specify what to use in (D)
    • Bucket subresources
      • e.g. Lifecycle configuration
        • (S) and (D) may have different configurations
    • Non-customer actions (e.g. lifecycle inserts delete marker)
    • Replicas from other buckets  (i.e. not transitive)
  • Status
    • GET response includes the "x-amz-replication-status" header
      • S: PENDING, COMPLETED, FAILED
      • D: REPLICA

Server Access Logging
  • Requires a Log Delivery ACL grant on the target bucket
  • Delivered on a best-effort basis
  • Alternative
    • CloudTrail: Data Events

BitTorrent
  • Speeds up downloads of large, popular objects
  • .torrent file - bootstrap information for the file
    • Use s3-path?torrent
  • Need BitTorrent client
  • Every anonymously readable object is available for download
  • S3 acts as a "seeder"

Performance
  • No special action needed below 100 PUT/LIST/DELETE requests per second and below 300 GET requests per second
  • Rapid increase: ask S3 Team to pre-partition (submit support case)
  • Key name dictates the S3 partition
    • Unlike DynamoDB it does not shard on key hash
      • Hence "List" is available
    • Objects are stored lexicographically across partitions
    • Recommendation for high-scale workloads
      • Avoid sequential keys (timestamps, ids)
        • They start with the same prefix and land on the same partition
        • Shard keys by prefixing a short hash (e.g. first 4 hex digits of the MD5)
          • Listing becomes very expensive (scan)
        • Group objects by key names (animations, videos)
        • Reverse the key name for better distribution of initial characters
          • e.g. userId=1234 -> 4321
  • Transfer optimizations 
    • TCP window scaling (WSCALE) - allows larger receive windows
    • TCP selective acknowledgment - speeds up recovery after large packet loss
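
The key-naming recommendations above (hash prefix, reversed ID) can be sketched as two helpers. Function names and the "/" separator are choices made for this note; the 4-character prefix follows the "4 digit hash" suggestion.

```python
import hashlib


def hashed_key(key: str, digits: int = 4) -> str:
    """Prefix the key with the first hex digits of its MD5 so that
    sequential names spread across partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:digits]
    return f"{prefix}/{key}"


def reversed_id_key(user_id: int, name: str) -> str:
    """Reverse a sequential numeric ID so the leading characters vary."""
    return f"{str(user_id)[::-1]}/{name}"


print(hashed_key("2018-03-23/log-0001.txt"))
print(reversed_id_key(1234, "avatar.png"))  # 4321/avatar.png
```

The trade-off noted in the list still applies: with hashed prefixes, listing objects in their natural order requires scanning every prefix.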

Data Consistency Model
  • Updates are atomic but on single key only
  • Latest PUT wins
  • Read-after-write consistency for NEW objects
  • Eventually consistent (may return stale data)
    • LIST after writing a NEW object
    • read-after-write for EXISTING objects (i.e. overwrite)
    • read-after-delete
    • LIST after delete

Multi-part upload
  • Recommended for objects larger than 100 MB
    • Max object size = 5TB
  • Process
    • Split file locally
    • Initiate (S3 returns uploadId)
    • Upload each part
      • Specify part number (1-10000)
        • Determines order
        • Need not be contiguous
      • Can be done in parallel
    • Complete the upload by sending the list of (part number, ETag) pairs
  • ETag of a multipart object is generally not the MD5 of its content (as is also the case for objects encrypted with SSE-C or SSE-KMS)
  • Orphaned uploads
    • Billed as normal objects
    • Not visible in S3 console
    • May be cleaned-up with lifecycle rule
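
The local "split" step can be sketched as a planner that turns an object size into (part number, offset, length) tuples. The function name is made up for this note; the 10000-part cap comes from the list above, while the 5 MB minimum part size (for all parts except the last) is an S3 limit not mentioned in these notes.

```python
MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000


def plan_parts(object_size: int, part_size: int = 100 * MIB) -> list:
    """Return [(part_number, offset, length), ...] covering the object."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MB minimum")
    count = -(-object_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (number, (number - 1) * part_size,
         min(part_size, object_size - (number - 1) * part_size))
        for number in range(1, count + 1)
    ]


for part in plan_parts(250 * MIB):
    print(part)
```

Each tuple can be uploaded independently and in parallel; the part numbers are what determine the final order.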

Static Website Hosting
  • Supports GET and HEAD requests (no POST)
  • Public content only
  • On error returns HTML (not XML like S3 REST API)
  • Supports redirects (object and bucket level)
  • Supports root documents (e.g. index.html)
  • Website endpoint does not support SSL (HTTPS)
    • Use CloudFront on top
  • Option to redirect all requests to different hostname

Event Notifications
  • Bucket configuration
  • Types
    • object created (Put, Post, Copy, CompleteMultipartUpload)
    • object removed (Delete, DeleteMarkerCreated)
    • object loss detected (ReducedRedundancyLostObject)
  • Filters (optional)
    • Prefix
    • Suffix
  • Target
    • SNS, SQS, Lambda
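
A notification configuration combining event types, filters and a target might look like the sketch below. The dictionary follows the shape accepted by boto3's put_bucket_notification_configuration; the helper name and the queue ARN are hypothetical, and nothing is sent to AWS here.

```python
from typing import List, Optional


def queue_notification(queue_arn: str, events: List[str],
                       prefix: Optional[str] = None,
                       suffix: Optional[str] = None) -> dict:
    """Build a bucket notification configuration with one SQS target."""
    entry = {"QueueArn": queue_arn, "Events": list(events)}
    rules = []
    if prefix is not None:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix is not None:
        rules.append({"Name": "suffix", "Value": suffix})
    if rules:  # filters are optional
        entry["Filter"] = {"Key": {"FilterRules": rules}}
    return {"QueueConfigurations": [entry]}


config = queue_notification(
    "arn:aws:sqs:eu-west-1:123456789012:uploads",  # hypothetical queue
    ["s3:ObjectCreated:*"],
    prefix="images/",
    suffix=".jpg",
)
print(config)
```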

Lifecycle configuration for versioning-enabled buckets
  • Acts as Recycle Bin
  • Action on Current Version
    • Transition to the Standard-Infrequent Access
      • X days after the object's creation date
    • Archive to the Glacier Storage Class
      • X days after the object's creation date
    • Expire
      • Expiring the current version inserts a delete marker, which becomes the new current version
  • Action on Previous Version
    • Transition to the Standard-Infrequent Access
      • X days after the object becomes a previous version
    • Archive to the Glacier Storage Class
      • X days after the object becomes a previous version
    • Permanently Delete
      • X days after the object becomes a previous version
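
The actions above map onto a single lifecycle rule. The sketch below builds one in the shape accepted by boto3's put_bucket_lifecycle_configuration; the helper name, rule ID and day counts are illustrative assumptions, and nothing is sent to AWS.

```python
def recycle_bin_rule(ia_days: int, glacier_days: int,
                     expire_days: int, purge_days: int) -> dict:
    """Lifecycle configuration for a versioning-enabled bucket."""
    if not ia_days < glacier_days < expire_days:
        raise ValueError("expected ia_days < glacier_days < expire_days")
    rule = {
        "ID": "recycle-bin",            # hypothetical rule name
        "Filter": {"Prefix": ""},       # apply to the whole bucket
        "Status": "Enabled",
        # Actions on the current version, counted from creation date.
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},
        # Action on previous versions, counted from when they became non-current.
        "NoncurrentVersionExpiration": {"NoncurrentDays": purge_days},
    }
    return {"Rules": [rule]}


print(recycle_bin_rule(ia_days=30, glacier_days=90, expire_days=365, purge_days=30))
```

With this rule, an expired object stays recoverable as a previous version for purge_days days, which is the "Recycle Bin" effect.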

Object Tagging
  • Way to organize data
    • more flexible than location (bucket/prefix) 
  • Max 10 tags per object
  • Can grant IAM policy permissions per Tag
  • Can be used in Lifecycle rules
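
Tags can be attached at upload time through the x-amz-tagging header, which carries them as a URL-encoded query string. A minimal sketch that enforces the 10-tag limit; the function name is made up for this note.

```python
from urllib.parse import urlencode

MAX_TAGS_PER_OBJECT = 10


def tagging_header(tags: dict) -> str:
    """Value for the x-amz-tagging header of a PUT Object request."""
    if len(tags) > MAX_TAGS_PER_OBJECT:
        raise ValueError("an object can carry at most 10 tags")
    return urlencode(tags)


print(tagging_header({"project": "blue", "env": "prod"}))  # project=blue&env=prod
```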

S3 Inventory
  • Same set of metadata as LIST API
  • Format: CSV, ORC
  • Can be queried by Athena (S3 file)
  • Split between multiple files
    • Manifest files
      • manifest.json 
      • symlink.txt - Apache Hive compatible
  • Report: daily/weekly
  • Delivery to S3 bucket
  • Delivery Notification
    • Triggered when the manifest "checksum" file is written (last step)
    • SNS/SQS/Lambda
  • Eventually consistent
  • Pricing - cost half of LIST API

Storage Class Analysis
  • Daily report
  • Heuristics suggest the appropriate Storage Class based on
    • Access
    • Retention
  • Provides lifecycle recommendation
  • Can be exported to BI tool
  • Additional pricing

Transfer Acceleration
  • Uses CloudFront edge network (in reverse)
  • Upload to closest "PoP"
  • Uses backbone network to deliver to target
  • Enable on bucket level (new endpoint)
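
The "new endpoint" is the bucket name under the s3-accelerate domain. A small sketch of building it; the function name is made up for this note, and the check for periods reflects the requirement that accelerated bucket names be DNS compliant without dots.

```python
from urllib.parse import quote


def accelerate_url(bucket: str, key: str) -> str:
    """URL of an object through the Transfer Acceleration endpoint."""
    if "." in bucket:
        raise ValueError("bucket names with periods cannot use acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"


print(accelerate_url("my-bucket", "videos/intro.mp4"))
```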
