Model
- Vault
- name must be unique within a region (per account); the same name may be reused in other regions
- container for archives
- analogous to S3 bucket
- max 1000 per region
- Archive
- base unit of storage in Glacier
- immutable (create/delete only)
- can be any data (photo, document, etc.)
- best practice: aggregate data into .zip or .tar
- 32 kB metadata overhead
- Recommended >= 1 MB per archive
- max 40TB per archive
- Upload
- Single request: max 4 GB
- Recommended multi-part for > 100 MB
- Compute and supply tree-hash
- Hash for each megabyte segment and combine in tree fashion
- Inventory
- Updated once per day
- List of all archives
- Inventory date not changed if no add/delete of archives
- Format: CSV or JSON
- A similar concept now exists for S3 (S3 Inventory)
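The advice to aggregate small files follows from the 32 kB per-archive overhead noted above. A minimal arithmetic sketch, assuming the overhead is exactly 32 KiB and is simply added to each archive's stored size; the function names are made up for illustration, and the container overhead of the .zip/.tar itself is ignored:

```python
OVERHEAD_BYTES = 32 * 1024  # assumed per-archive metadata overhead (32 KiB)


def billed_bytes(archive_sizes):
    """Total stored bytes, counting the fixed overhead once per archive."""
    return sum(size + OVERHEAD_BYTES for size in archive_sizes)


def overhead_fraction(archive_size):
    """Share of the stored bytes that is overhead for one archive."""
    return OVERHEAD_BYTES / (archive_size + OVERHEAD_BYTES)


# 1000 files of 10 KiB each: uploaded separately vs. as one aggregate
separate = billed_bytes([10 * 1024] * 1000)
aggregated = billed_bytes([10 * 1024 * 1000])
print(separate, aggregated)
print(f"{overhead_fraction(1024 * 1024):.1%} overhead at 1 MiB")
```

With these assumptions, 1000 small archives store roughly four times as many bytes as a single aggregate, and at 1 MiB the overhead is about 3% of the stored size.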
Jobs
- Executed asynchronously (Job ID returned)
- Associated with vault
- Multiple jobs may be in-progress
- Once a job completes, the user can download its output (available for 24h)
- Types
- Archive Retrieval
- entire archive or a byte range of it (megabyte-aligned), not individual files inside the archive
- Inventory Retrieval (list of archives)
- filter can be applied (e.g. archive creation date)
- May have SNS notifications enabled
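The job flow above (initiate, wait, download) can be sketched with the boto3 Glacier client. This is an illustration, not a production pattern: the helper names, the polling interval, and the example values are assumptions, and in practice an SNS notification would usually replace polling. The boto3 import is kept inside `run_job` so the parameter builders work without AWS access.

```python
import time


def archive_retrieval_params(archive_id, tier="Standard", byte_range=None,
                             sns_topic_arn=None):
    """Build jobParameters for an archive-retrieval job."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if byte_range is not None:
        start, end = byte_range  # inclusive byte offsets, megabyte-aligned
        params["RetrievalByteRange"] = f"{start}-{end}"
    if sns_topic_arn:
        params["SNSTopic"] = sns_topic_arn
    return params


def inventory_retrieval_params(fmt="JSON", start_date=None, end_date=None):
    """Build jobParameters for an inventory-retrieval job with an optional date filter."""
    params = {"Type": "inventory-retrieval", "Format": fmt}
    date_filter = {}
    if start_date:
        date_filter["StartDate"] = start_date  # ISO 8601 string
    if end_date:
        date_filter["EndDate"] = end_date
    if date_filter:
        params["InventoryRetrievalParameters"] = date_filter
    return params


def run_job(vault_name, job_parameters, poll_seconds=900):
    """Start a job, poll until it completes, and return the output bytes."""
    import boto3  # imported lazily; requires AWS credentials at call time

    glacier = boto3.client("glacier")
    job_id = glacier.initiate_job(
        vaultName=vault_name, jobParameters=job_parameters
    )["jobId"]
    while not glacier.describe_job(vaultName=vault_name, jobId=job_id)["Completed"]:
        time.sleep(poll_seconds)
    return glacier.get_job_output(vaultName=vault_name, jobId=job_id)["body"].read()
```

For example, `run_job("my-vault", inventory_retrieval_params(fmt="CSV"))` would fetch the vault inventory as CSV, where `my-vault` is a placeholder name.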
Upload (Tree Hash)
- On upload include 2 headers
- x-amz-content-sha256
- hash of entire payload used for signature calculation
- x-amz-sha256-tree-hash
- specific to archive upload
- main benefit - avoids re-reading a (potentially big) file to calculate its hash
- it's computed piece-meal
- for each chunk of 1MB compute hash (last may be < 1 MB)
- build the next level of tree (compute hash again)
- repeat until you reach top (root)
- Examples
- Single request (6.5 MB)
- 1 request (SHA256 computed 13 times)
- Multi-part
- 2 part requests, each carrying the tree hash of its own part
- Complete Multipart Upload (tree hash of entire archive)
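The tree-hash steps above can be sketched with Python's `hashlib`. This is a minimal in-memory version for illustration; a real uploader would hash the 1 MB chunks as it streams the file. The function name and the returned hash counter are additions for this sketch, and it assumes an unpaired node at the end of a level is promoted unchanged to the next level.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data: bytes):
    """Return (hex root hash, number of SHA-256 computations performed)."""
    # Leaf level: one SHA-256 per 1 MiB chunk (the last chunk may be shorter).
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    count = len(level)

    # Combine adjacent pairs until a single root digest remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            count += 1
        if len(level) % 2:  # odd node out is carried up without rehashing
            next_level.append(level[-1])
        level = next_level
    return level[0].hex(), count


root, hashes = tree_hash(b"a" * (6 * MIB + MIB // 2))  # 6.5 MB archive
print(hashes)  # 7 leaves + 3 + 2 + 1 = 13
```

The count matches the 6.5 MB example: 7 chunk hashes, then 3, 2, and 1 pair hashes on the way to the root. For data of 1 MB or less the tree hash equals the plain SHA-256 of the payload.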
Vault Access Policy
- Resource-based policy
- similar to bucket policy
Vault Lock Policy
- Similar to a vault access policy
- Enforce compliance requirements
- e.g. WORM (Write Once Read Many)
- Once policy is locked it cannot be edited
- Stronger control than vault access policy
- Use case
- time-based data retention rules (deny deletes) but allow read access
- Combine vault lock policy (deny delete) and vault access policy (read)
- Compliance
- Process
- Initiate lock
- Sets to IN_PROGRESS and returns LockId
- Validate and test your policy
- 24-hour window: the lock expires if not completed in time, and can be aborted before then
- Complete the lock process
- Policy elements
- Resource (vault)
- Conditions
- glacier:ArchiveAgeInDays, glacier:ResourceTag
- Action
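A time-based retention rule of the kind described above can be written as a deny-delete statement conditioned on `glacier:ArchiveAgeInDays`. The sketch below builds such a policy and starts the lock with boto3. The helper names, the statement ID, and the example ARN are placeholders, and the policy should be tested during the 24-hour window before the lock is completed, since it cannot be edited afterwards.

```python
import json


def retention_lock_policy(vault_arn, min_age_days):
    """Deny archive deletion until the archive is at least min_age_days old."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",  # illustrative name
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(min_age_days)
                    }
                },
            }
        ],
    }


def initiate_lock(vault_name, policy):
    """Attach the policy in the in-progress state and return the lock ID."""
    import boto3  # imported lazily; requires AWS credentials at call time

    glacier = boto3.client("glacier")
    response = glacier.initiate_vault_lock(
        vaultName=vault_name, policy={"Policy": json.dumps(policy)}
    )
    return response["lockId"]


policy = retention_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault", 365
)
print(json.dumps(policy, indent=2))
```

After validation, `complete_vault_lock(vaultName=..., lockId=...)` makes the policy permanent, while `abort_vault_lock(vaultName=...)` discards it.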
Pricing
- Storage
- ~20% of S3 Standard
- ~50% of S3 IA
- Retrieval: cost depends on how quickly (and how often) data is accessed
- Bulk 5-12h - cheapest
- Standard 3-5h
- Expedited 1-5 minutes
- For archives up to 250 MB
- Larger archives take linearly longer
- Provisioned Capacity Unit available
Glacier Select
- Filtering on Glacier side
- Similar to S3 Select
- Pattern matching
- Auditing
- Data integration
- Allows retrieving a subset of an object instead of the whole archive
S3
- Integrated with S3 (storage class)
- Lifecycle configuration can transition objects from S3 storage classes to the Glacier storage class (getting data back requires a restore request)
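A lifecycle transition to the Glacier storage class can be sketched as a rule passed to the S3 `put_bucket_lifecycle_configuration` call. The rule ID, prefix, bucket name, and day count below are illustrative assumptions, not values from these notes.

```python
def glacier_transition_config(prefix, days):
    """Lifecycle configuration moving objects under prefix to GLACIER after N days."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",  # illustrative ID
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Attach the lifecycle configuration to the bucket."""
    import boto3  # imported lazily; requires AWS credentials at call time

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = glacier_transition_config("logs/", 90)
print(config)
```

Calling `apply_lifecycle("example-bucket", config)` would then archive objects under `logs/` once they are 90 days old.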