Model
- Resources
- Bucket - subresources
- website
- versioning
- bucket policy
- ACL
- CORS
- logging
- event notifications
- Object - subresources
- ACL
- restore (when using Glacier restore)
Limits
- 100 buckets per account (soft limit)
Bucket Addressing
- virtual hosting style
- path style
- https://s3-eu-west-1.amazonaws.com/BUCKET/FILE
- must specify correct region
- s3.amazonaws.com refers to us-east-1
- when wrong region specified you get "301 Moved Permanently"
- name globally unique
- must be DNS compliant (except for us-east)
- may contain "."
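The two addressing styles can be sketched as URL builders. This is a minimal illustration, assuming the dash-separated regional endpoint form shown above (`s3-REGION.amazonaws.com`); the function names are hypothetical and the DNS check is a simplified version of the real naming rules.

```python
import re
from urllib.parse import quote

# Simplified check: 3-63 chars of lowercase letters, digits, "." and "-",
# starting and ending with a letter or digit. The real rules have more cases.
_DNS_NAME = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def is_dns_compliant(bucket: str) -> bool:
    """Rough DNS-compliance test for a bucket name."""
    return bool(_DNS_NAME.match(bucket)) and ".." not in bucket


def virtual_hosted_url(bucket: str, key: str, region: str) -> str:
    """Virtual-hosted style: the bucket name is part of the host name."""
    return f"https://{bucket}.s3-{region}.amazonaws.com/{quote(key)}"


def path_style_url(bucket: str, key: str, region: str) -> str:
    """Path style: the bucket name is the first path segment."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/{quote(key)}"
```

For example, `path_style_url("BUCKET", "FILE", "eu-west-1")` reproduces the URL listed above.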
Request Redirects
- DNS is used to route to S3 nodes - temporary errors may occur
- Must resend request to different endpoint
- Do not reuse temporary redirect as it may fail in future
- Typically happen when bucket just created/deleted (eventual consistency)
- Permanent redirect - addressed bucket incorrectly (see bucket addressing)
- Use the "Expect: 100-continue" header for PUT requests
- Server decides based on headers if it can accept the requests
- Avoid unnecessary work if the request is to be redirected anyway
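The redirect handling above can be sketched as a small decision function. This is only an illustration: the function and exception names are hypothetical, and it assumes the temporary redirect carries the new endpoint in a `Location` header.

```python
class WrongEndpointError(Exception):
    """Raised when the bucket was addressed through the wrong endpoint."""


def endpoint_for_retry(status, location, original):
    """Pick the endpoint for the next attempt of the same request.

    A 307 target is returned for this one retry only; callers should not
    cache it. A 301 signals an addressing mistake the caller has to fix.
    """
    if status == 307 and location:
        return location
    if status == 301:
        raise WrongEndpointError("bucket addressed via the wrong endpoint")
    return original
```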
Requester Pays
- Owner pays for storage, requester pays for data transfer and requests
- Requester includes x-amz-request-payer header
- Requester must be authenticated by AWS
- Not allowed for anonymous access
- In particular not allowed for static website hosting
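As a small sketch, the requester acknowledges the charges by adding the header with the value `requester` to an authenticated request. The helper name is hypothetical, and signing of the request is left out.

```python
def requester_pays_headers(extra=None):
    """Return request headers with the Requester Pays acknowledgement added."""
    headers = dict(extra or {})
    headers["x-amz-request-payer"] = "requester"
    return headers
```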
DevPay
- "Tenant pays"
- Charging for S3 based products
- Once a month Amazon bills your customers
- Deducts fixed transaction fee and gives you the rest
- Amazon charges you for your S3 costs plus a DevPay percentage fee
- If customers do not pay - their access is cut-off
- Customer data is isolated (cannot be accessed directly via S3 API)
- DevPay Tokens
- Product token - identifies the app
- User token - identifies the user to be charged
Storage Classes
- STANDARD - default (S)
- Designed for hot/temporary data
- STANDARD_IA - infrequently accessed (IA)
- Suitable for long lived infrequently accessed data
- Storage cheaper than (S)
- Requests more expensive than (S)
- Minimum billable object size: 128 KB (smaller objects are charged as 128 KB)
- Minimum 30 days storage charge (not suitable for temporary objects)
- GLACIER (G)
- Retrieval takes many hours (async job)
- Cannot be used for initial upload
- can be used as lifecycle target
- alternatively upload directly via Glacier API (but S3 does not see it then)
- Object must be restored before accessed
- Object visible in S3
- REDUCED_REDUNDANCY (RR)
- Sustains loss of data in a single facility (STANDARD sustains concurrent loss in two)
- Probably being phased-out (some regions do not support it)
- 99.99% durability (on average 1 in 10,000 objects can be lost per year)
- When object is lost AWS returns 405 method not allowed
- 410 (Gone) - would be inappropriate as owner may decide to re-upload
- Storage class can be changed
- PUT object copy (request)
- Destination is the same as source (x-amz-copy-source)
- Indicate it is a copy by using the directive: x-amz-metadata-directive: COPY
- Set x-amz-storage-class: [STANDARD, REDUCED_REDUNDANCY]
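A minimal sketch of the headers for such an in-place copy follows. The helper name is hypothetical; the directive header is spelled `x-amz-metadata-directive` in the S3 REST API, and the allowed values are limited to the two listed above.

```python
from urllib.parse import quote

# Only the two values listed in these notes; other classes are left out here.
ALLOWED_CLASSES = {"STANDARD", "REDUCED_REDUNDANCY"}


def storage_class_change_headers(bucket, key, storage_class):
    """Headers for a PUT Object Copy whose source and destination are the same key."""
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "x-amz-copy-source": "/" + quote(f"{bucket}/{key}"),
        "x-amz-metadata-directive": "COPY",  # keep the existing metadata
        "x-amz-storage-class": storage_class,
    }
```

The request itself would be a signed `PUT` to the same bucket and key.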
Restoring from Glacier
- Specify number of days to keep the restored file
- Possible to modify later (until expires)
- Cheapest storage class used (e.g. RR)
- Restored objects charged for both S3 and Glacier
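A restore is requested with `POST /KEY?restore` and a small XML body giving the number of days. The sketch below only builds that body; the function name is hypothetical and the XML namespace attribute is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def restore_request_body(days: int) -> str:
    """Build the XML body asking S3 to keep the restored copy for `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    root = ET.Element("RestoreRequest")
    ET.SubElement(root, "Days").text = str(days)
    return ET.tostring(root, encoding="unicode")
```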
Versioning
- Multiple versions of file
- Enabled on bucket level (versioning-enabled)
- Can be suspended (versioning-suspended)
- stops accruing new versions (existing versions are kept)
- delete can only remove object with (null) versionId otherwise delete marker is inserted
- Each object version gets a unique versionId (string of up to 1,024 bytes)
- Listing
- versions treated as separate objects
- Deleting
- S3 inserts DELETE MARKER
- You can specify versionId to retrieve it
- To permanently delete specify versionId
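The delete semantics can be illustrated with a toy in-memory model. This is not the S3 API: the class, the `v1`, `v2`, ... version IDs and the sentinel are all made up for illustration.

```python
import itertools

DELETE_MARKER = object()  # sentinel standing in for an S3 delete marker


class VersionedBucket:
    """Toy model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> [(version_id, body), ...], oldest first
        self._counter = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._counter)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain delete: nothing is removed, a delete marker is stacked on top.
            return self.put(key, DELETE_MARKER)
        # Delete with versionId: that version is removed permanently.
        kept = [v for v in self._versions.get(key, []) if v[0] != version_id]
        self._versions[key] = kept
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]  # the current version
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is DELETE_MARKER:
            raise KeyError(key)
        return candidates[0][1]
```

In this model, deleting the marker by its version ID makes the previous version current again.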
Cross-region replication (CRR)
- Enabled on bucket level
- Subset of objects can be replicated (prefix)
- Asynchronous copy of all S3 objects from (S)ource bucket to (D)estination bucket
- Can override storage class
- Takes up to several hours
- Ownership
- By default ACL is copied
- (S) account owns the (D) object
- Can be overridden
- Storage Class
- Bi-directional replication
- Use cases
- Compliance
- Minimize latency
- Operations (e.g. computing on the same set of resources)
- DR
- Requirements
- Versioning must be enabled
- (S) and (D) must be in different regions
- Permissions need to be setup
- Not replicable
- Retroactive objects (i.e. created before configuration enabled)
- Objects encrypted with SSE-C
- SSE-KMS is replicable
- KMS key is regional so you specify what to use in (D)
- Bucket subresources
- e.g. Lifecycle configuration
- (S) and (D) may have different
- Non-customer actions (e.g. lifecycle inserts delete marker)
- Replicas from other buckets (i.e. not transitive)
- Status
- GET Includes "x-amz-replication-status" header in response
- S: PENDING, COMPLETED, FAILED
- D: REPLICA
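A replication configuration for a prefix subset might look like the sketch below. It builds a plain dictionary in the shape commonly passed to an SDK call such as boto3's `put_bucket_replication`; the helper name, rule ID and ARNs are placeholders, and no request is sent.

```python
def replication_config(role_arn, dest_bucket_arn, prefix="", storage_class=None):
    """Build a single-rule replication configuration.

    An empty prefix replicates all new objects; storage_class optionally
    overrides the class used in the destination bucket.
    """
    destination = {"Bucket": dest_bucket_arn}
    if storage_class:
        destination["StorageClass"] = storage_class
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-prefix",
                "Prefix": prefix,
                "Status": "Enabled",
                "Destination": destination,
            }
        ],
    }
```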
Server Access Logging
- Require Log Delivery ACL on a bucket
- Best effort
- Alternative
Bit Torrent
- Speeds up distribution of large, popular objects
- .torrent file - bootstrap information for the file
- Need BitTorrent client
- Every anonymous object available for download
- S3 acts as a "seeder"
Performance
- No special action needed: < 100/s {PUT, LIST, DELETE } && < 300/s GET
- Rapid increase: ask S3 Team to pre-partition (submit support case)
- Key name dictates the S3 partition
- Unlike DynamoDB it does not shard on key hash
- Hence "List" is available
- Objects are stored lexicographically across partitions
- Recommendation for high-scale workloads
- Avoid sequential keys (timestamps, ids)
- They start with the same prefix and land on the same partition
- Shard keys (MD5 - 4 digit hash)
- Listing becomes very expensive (scan)
- Group objects by key names (animations, videos)
- Reverse the key name for better distribution of initial characters
- Transfer optimizations
- TCP window scaling (increase initial receive windows WSCALE)
- TCP selective acknowledgment - speed-up recovery after large packet loss
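The two key-naming tricks above (hash prefix and reversed key) can be sketched as follows. The function names and the `/` separator are illustrative choices, not something S3 prescribes.

```python
import hashlib


def hashed_key(key: str, digits: int = 4) -> str:
    """Prefix the key with the first hex digits of its MD5 to spread partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:digits]
    return f"{prefix}/{key}"


def reversed_key(sequential_id: str) -> str:
    """Reverse a sequential ID so the fast-changing digits come first."""
    return sequential_id[::-1]
```

The hashed form spreads writes evenly but, as noted above, makes listing a logical group of objects expensive.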
Data Consistency Model
- Updates are atomic but on single key only
- Latest PUT wins
- Read-after-write consistency for NEW objects
- Eventually consistent (may return stale data)
- list newly written NEW object
- read-after-write for EXISTING objects (i.e. overwrite)
- read-after-delete
- list deleted object
Multi-part upload
- Recommended for objects larger than 100 MB
- Process
- Split file locally
- Initiate (S3 returns uploadId)
- Upload each part (1-10000)
- Specify part number (1-10000)
- Determines order
- Need not be contiguous
- Can be done in parallel
- Complete the upload by sending the list of (part number, ETag) pairs
- The resulting ETag is generally not the MD5 of the object (as with some SSE-encrypted objects)
- Orphaned uploads
- Billed as normal objects
- Not visible in S3 console
- May be cleaned-up with lifecycle rule
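The local side of the process (splitting and completing) can be sketched without any network calls. The helper names are hypothetical; the 5 MiB minimum part size (for every part except the last) and the 10,000-part cap are S3 limits.

```python
import xml.etree.ElementTree as ET

MIN_PART_SIZE = 5 * 1024 * 1024  # every part except the last
MAX_PARTS = 10_000


def plan_parts(total_size: int, part_size: int):
    """Return (part_number, offset, length) tuples covering the whole file."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 5 MiB minimum")
    count = -(-total_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return [
        (n + 1, n * part_size, min(part_size, total_size - n * part_size))
        for n in range(count)
    ]


def complete_body(etags: dict) -> str:
    """Build the completion XML; etags maps part number -> ETag from each upload."""
    root = ET.Element("CompleteMultipartUpload")
    for number in sorted(etags):  # parts must be listed in ascending order
        part = ET.SubElement(root, "Part")
        ET.SubElement(part, "PartNumber").text = str(number)
        ET.SubElement(part, "ETag").text = etags[number]
    return ET.tostring(root, encoding="unicode")
```

Each planned part can be uploaded in parallel; only the completion step needs all ETags.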
Static Website Hosting
- Supports GET and HEAD requests (no POST)
- Public only content
- On error returns HTML (not XML like S3 REST API)
- Supports redirects (object and bucket level)
- Supports root documents (e.g. index.html)
- Does not support SSL
- Option to redirect all requests to different hostname
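A website configuration can be sketched as a plain dictionary in the shape an SDK call such as boto3's `put_bucket_website` accepts. The helper name and the default document names are assumptions.

```python
def website_config(index="index.html", error=None, redirect_all_to=None):
    """Build a static website configuration.

    If redirect_all_to is given, every request is redirected to that host
    and the index/error documents are not used.
    """
    if redirect_all_to:
        return {"RedirectAllRequestsTo": {"HostName": redirect_all_to}}
    config = {"IndexDocument": {"Suffix": index}}
    if error:
        config["ErrorDocument"] = {"Key": error}  # served as HTML on errors
    return config
```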
Event Notifications
- Bucket configuration
- Types
- object created (Put, Post, Copy, CompleteMultipartUpload)
- object removed (Delete, DeleteMarkerCreated)
- object loss detected (ReducedRedundancyLostObject)
- Filters (optional)
- Target
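As an illustration, a notification configuration with an optional prefix filter could be built like this. The sketch assumes an SQS queue as the target and uses the dictionary shape accepted by boto3's `put_bucket_notification_configuration`; the helper name and ARN are placeholders.

```python
def queue_notification(queue_arn, events, prefix=None):
    """Build a bucket notification configuration that targets one SQS queue."""
    entry = {"QueueArn": queue_arn, "Events": list(events)}
    if prefix is not None:
        # Optional filter: only keys starting with the prefix trigger events.
        entry["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"QueueConfigurations": [entry]}
```

For example, `queue_notification(arn, ["s3:ObjectCreated:*"], prefix="images/")` covers all object-created event types under `images/`.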
Lifecycle configuration for versioning-enabled buckets
- Acts as Recycle Bin
- Action on Current Version
- Transition to the Standard-Infrequent Access
- X days after the object's creation date
- Archive to the Glacier Storage Class
- X days after the object's creation date
- Expire
- Expiring the current version inserts a delete marker, which becomes the new current version
- Action on Previous Version
- Transition to the Standard-Infrequent Access
- X days after the object becomes a previous version
- Archive to the Glacier Storage Class
- X days after the object becomes a previous version
- Permanently Delete
- X days after the object becomes a previous version
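A "recycle bin" rule of this kind can be sketched as a dictionary in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The helper name, rule ID and parameter names are illustrative, and no validation of the day counts is attempted.

```python
def recycle_bin_rule(prefix, ia_days, glacier_days, purge_previous_days):
    """Lifecycle configuration for a versioning-enabled bucket.

    Current versions move to STANDARD_IA and later to GLACIER; previous
    versions are permanently deleted after purge_previous_days.
    """
    return {
        "Rules": [
            {
                "ID": "recycle-bin",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Counted from the object's creation date.
                "Transitions": [
                    {"Days": ia_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_days, "StorageClass": "GLACIER"},
                ],
                # Counted from the day the version became a previous version.
                "NoncurrentVersionExpiration": {"NoncurrentDays": purge_previous_days},
            }
        ]
    }
```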
Object Tagging
- Way to organize data
- more flexible than location (bucket/prefix)
- Max 10 tags per object
- Can grant IAM policy permissions per Tag
- Can be used in Lifecycle rules
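When tags are supplied at upload time they travel in the `x-amz-tagging` header as a URL-encoded query string. The helper below is a hypothetical sketch that also enforces the 10-tag limit noted above.

```python
from urllib.parse import quote, urlencode

MAX_TAGS = 10


def tagging_header(tags: dict) -> str:
    """Encode tags as the value of the x-amz-tagging header."""
    if len(tags) > MAX_TAGS:
        raise ValueError("at most 10 tags per object")
    # quote_via=quote encodes spaces as %20 rather than "+".
    return urlencode(tags, quote_via=quote)
```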
S3 Inventory
- Same set of metadata as LIST API
- Format: CSV, ORC
- Can be queried by Athena (S3 file)
- Split between multiple files
- Manifest files
- manifest.json
- symlink.txt - Apache Hive compatible
- Report: daily/weekly
- Delivery to S3 bucket
- Delivery Notification
- On "checksum" written (last step)
- SNS/SQS/Lambda
- Eventually consistent
- Pricing - cost half of LIST API
Storage Class Analysis
- Daily report
- Set of heuristics what is the appropriate Storage Class
- Provides lifecycle recommendation
- Can be exported to BI tool
- Additional pricing
Transfer Acceleration
- Uses CloudFront edge network (in reverse)
- Upload to closest "PoP"
- Uses backbone network to deliver to target
- Enable on bucket level (new endpoint)
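The accelerated endpoint replaces the regional host name with `s3-accelerate`. A minimal sketch, with a hypothetical helper name; bucket names containing "." cannot be used with acceleration, so they are rejected here.

```python
from urllib.parse import quote


def accelerate_url(bucket: str, key: str) -> str:
    """Build the Transfer Acceleration URL for an object."""
    if "." in bucket:
        raise ValueError("bucket names with '.' are not usable with acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com/{quote(key)}"
```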
References