Wednesday, 21 February 2018

AWS Glacier


Model
  • Vault
    • name must be unique within a region (per account); the same name may be reused in other regions
    • container for archives
    • analogous to S3 bucket
    • max 1000 per region
  • Archive
    • base unit of storage in Glacier
    • immutable (create/delete only)
    • can be any data (photo, document, etc.)
      • best practice: aggregate data into .zip or .tar
    • 32 KB metadata overhead per archive
      • Recommended >= 1 MB per archive
    • max 40TB per archive
    • Upload
      • Single request: max 4 GB
      • Multi-part recommended for > 100 MB
        • Compute and supply the tree hash
          • Hash each 1 MB segment and combine the hashes in tree fashion
  • Inventory
    • Updated once per day
    • List of all archives
    • Inventory date not changed if no add/delete of archives
    • Format: CSV or JSON
    • A similar concept now exists for S3 (S3 Inventory)
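
The upload limits listed under Archive fit together if you assume the multipart rules as I understand them: at most 10,000 parts per upload, and a part size of 1 MiB times a power of two, up to 4 GiB (10,000 parts of 4 GiB is roughly the 40 TB ceiling). A minimal sketch of picking a part size under those assumptions; `choose_part_size` is a name made up for illustration, not an AWS API:

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000               # assumed multipart limit: parts per upload
MAX_PART_SIZE = 4 * 1024 * MIB   # assumed multipart limit: 4 GiB per part


def choose_part_size(archive_size: int) -> int:
    """Return the smallest power-of-two part size (in bytes, starting at
    1 MiB) that covers archive_size with at most MAX_PARTS parts."""
    part = MIB
    while part * MAX_PARTS < archive_size:
        part *= 2
    if part > MAX_PART_SIZE:
        raise ValueError("archive too large for 10,000 parts of 4 GiB")
    return part
```

For a 100 MiB archive this returns 1 MiB; the part size only doubles once the archive no longer fits in 10,000 parts of the current size.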

Jobs
  • Executed asynchronously (Job ID returned)
  • Associated with vault
    • Multiple jobs may be in-progress
  • When a job completes, the user can download its output (available for 24h)
  • Types
    • Archive Retrieval
      • entire archive or a byte range of the archive (Glacier has no notion of files inside an archive)
    • Inventory Retrieval (list of archives)
      • filter can be applied (e.g. archive creation date)
  • May have SNS notifications enabled
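
The asynchronous flow above (initiate, wait, download) can be sketched as follows. This assumes `client` behaves like a boto3 Glacier client (e.g. `boto3.client("glacier")`) with `initiate_job`, `describe_job` and `get_job_output`; the function name `retrieve_archive` and the polling interval are made up for illustration. Polling is used only to keep the sketch short; an SNS notification would avoid it.

```python
import time


def retrieve_archive(client, vault, archive_id, tier="Bulk", poll_seconds=900):
    """Start an archive-retrieval job, wait until it completes,
    and return the archive bytes.

    client is assumed to expose the boto3 Glacier client methods used below.
    """
    job = client.initiate_job(
        accountId="-",  # "-" means the account of the calling credentials
        vaultName=vault,
        jobParameters={
            "Type": "archive-retrieval",
            "ArchiveId": archive_id,
            "Tier": tier,  # "Bulk", "Standard" or "Expedited"
        },
    )
    job_id = job["jobId"]

    # The job runs asynchronously; poll until it reports completion.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    # Output is only downloadable for a limited time after completion.
    output = client.get_job_output(accountId="-", vaultName=vault, jobId=job_id)
    return output["body"].read()
```

An inventory retrieval would follow the same shape with `"Type": "inventory-retrieval"` and no archive ID.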

Upload (Tree Hash)
  • On upload include 2 headers
    • x-amz-content-sha256
      • hash of entire payload used for signature calculation
    • x-amz-sha256-tree-hash
      • specific to archive upload
      • main benefit: avoids re-reading a (potentially big) file to calculate its hash
        • it's computed piecemeal
      • for each 1 MB chunk compute a SHA-256 hash (the last chunk may be < 1 MB)
        • hash each pair of adjacent hashes to build the next level of the tree
          • repeat until you reach the top (root)
      • Examples
        • Single request (6.5 MB)
          • 1 request (SHA256 computed 13 times)
        • Multi-part
          • 2 part uploads, each carrying the tree hash of its own part
          • Complete Multipart Upload carries the tree hash of the entire archive
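
The tree hash described above can be sketched in a few lines of Python. This is a minimal in-memory version that assumes 1 MiB chunks and that an unpaired hash at the end of a level is carried up unchanged; the function names are made up for illustration, and a real uploader would stream the file rather than hold it in memory.

```python
import hashlib

MIB = 1024 * 1024


def tree_hash(data: bytes) -> str:
    """Return the SHA-256 tree hash of data as a hex string."""
    # Leaves: one SHA-256 digest per 1 MiB chunk (the last may be shorter).
    level = [
        hashlib.sha256(data[i:i + MIB]).digest()
        for i in range(0, len(data), MIB)
    ]
    if not level:  # empty payload: hash of the empty string
        level = [hashlib.sha256(b"").digest()]

    # Combine adjacent pairs until a single root digest remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                next_level.append(pair[0])  # odd one out moves up as-is
        level = next_level
    return level[0].hex()


def count_sha256_calls(size_bytes: int) -> int:
    """Count the SHA-256 computations tree_hash performs for a payload size."""
    nodes = max(1, -(-size_bytes // MIB))  # number of leaves (ceiling division)
    total = nodes
    while nodes > 1:
        total += nodes // 2        # one hash per adjacent pair
        nodes = (nodes + 1) // 2   # pairs plus a possible carried-up node
    return total
```

For the 6.5 MB example there are 7 leaves, then 3, 2 and 1 pair hashes on the levels above, which gives the 13 SHA-256 computations mentioned in the notes. For payloads of 1 MiB or less the tree hash equals the plain SHA-256 of the payload.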

Vault Access Policy
  • Resource-based policy
    • similar to bucket policy

Vault Lock Policy
  • Similar to vault access policy
  • Enforce compliance requirements
    • e.g. WORM (Write Once Read Many)
  • Once policy is locked it cannot be edited
    • Stronger control than vault access policy
  • Use case
    • time-based data retention rules (deny deletes) but allow read access
      • Combine vault lock policy (deny delete) and vault access policy (read)
    • Compliance
  • Process
    • Initiate lock
      • Sets the lock state to InProgress and returns a lock ID
      • Validate and test your policy
      • 24-hour timeout (the lock is aborted if not completed in time)
    • Complete the lock process
  • Policy elements
    • Resource (vault)
    • Conditions
      • glacier:ArchiveAgeInDays, glacier:ResourceTag
    • Action
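
A sketch of what a retention-style lock policy built from these elements might look like: deny `glacier:DeleteArchive` while an archive is younger than a given number of days. The helper name, `Sid`, region, account ID and vault name are placeholders made up for illustration; the condition key is the `glacier:ArchiveAgeInDays` listed above.

```python
import json


def deny_early_delete_policy(region, account_id, vault, min_age_days):
    """Build a vault lock policy that denies archive deletion
    until the archive is at least min_age_days old."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",  # placeholder name
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": f"arn:aws:glacier:{region}:{account_id}:vaults/{vault}",
                "Condition": {
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(min_age_days)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; print the JSON document that would be submitted.
    policy = deny_early_delete_policy("us-east-1", "123456789012", "my-vault", 365)
    print(json.dumps(policy, indent=2))
```

The resulting JSON is what would be submitted when initiating the lock; read access would still be granted separately through the vault access policy.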

Pricing
  • Storage 
    • ~20% of S3 Standard
    • ~50% of S3 IA
  • Retrieval: depends on the retrieval tier (speed)
    • Bulk 5-12h - cheapest
    • Standard 3-5h
    • Expedited 1-5 minutes
      • Up to 250MB objects
        • Larger take linearly longer
      • Provisioned Capacity Unit available

Glacier Select
  • Filtering on Glacier side
  • Similar to S3 Select
    • Pattern matching
    • Auditing 
    • Data integration
  • Allows retrieving (GET) only a subset of an object

S3


  • Integrated with S3 (storage class)
  • Lifecycle configuration can transition objects from S3 storage classes to the Glacier storage class (one-way; getting data back requires a restore)
