Thursday, 28 April 2016

AWS Glacier


Model
  • Vault
    • name must be unique per region (per account); the same name may be reused in other regions
    • container for archives
    • analogous to S3 bucket
    • max 1000 per region
  • Archive
    • base unit of storage in Glacier
    • immutable (create/delete only)
    • can be any data (photo, document, etc.)
      • best practice: aggregate data into .zip or .tar
    • 32 KB metadata overhead per archive
    • max size 40 TB
    • Upload
      • Single request: max 4 GB
      • Multipart upload recommended for archives > 100 MB
        • Compute and supply the tree hash
          • SHA-256 hash of each 1 MB chunk, combined pairwise in tree fashion
  • Inventory
    • Updated once per day
    • List of all archives
    • Inventory date is unchanged if no archives were added or deleted since the last inventory
    • Format: CSV or JSON

Jobs
  • Executed asynchronously (Job ID returned)
  • Associated with vault
    • Multiple jobs may be in-progress
  • Typically 3.5-4.5 hours
  • When a job completes, the user can download its output (available for 24 h)
  • Types
    • Archive Retrieval
      • entire archive or a byte range of it (Glacier has no view of files inside an archive)
    • Inventory Retrieval (list of archives)
      • filter can be applied (e.g. archive creation date)
  • May have SNS notifications enabled
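
The job lifecycle above (initiate, wait, download) can be sketched as follows. This is a minimal illustration, not a production client: it assumes `client` behaves like a boto3 Glacier client (`initiate_job`, `describe_job`, `get_job_output`), and the function name `fetch_inventory` and the polling interval are hypothetical choices. In practice an SNS notification would usually replace the polling loop.

```python
import time


def fetch_inventory(client, vault_name, poll_seconds=900):
    """Start an inventory-retrieval job, wait for it, and return the raw output."""
    # accountId "-" means "the account that owns the credentials in use".
    job = client.initiate_job(
        accountId="-",
        vaultName=vault_name,
        jobParameters={"Type": "inventory-retrieval", "Format": "JSON"},
    )
    job_id = job["jobId"]

    # Jobs run asynchronously and usually take hours, so poll sparingly.
    while True:
        status = client.describe_job(
            accountId="-", vaultName=vault_name, jobId=job_id
        )
        if status["Completed"]:
            break
        time.sleep(poll_seconds)

    if status["StatusCode"] != "Succeeded":
        raise RuntimeError(f"job {job_id} ended with {status['StatusCode']}")

    # The output stays downloadable for a limited time after completion.
    output = client.get_job_output(
        accountId="-", vaultName=vault_name, jobId=job_id
    )
    return output["body"].read()
```

With real credentials this would be called as `fetch_inventory(boto3.client("glacier"), "my-vault")`, where `my-vault` is a placeholder vault name.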

Tree Hash
  • On upload include 2 headers
    • x-amz-content-sha256
      • hash of entire payload used for signature calculation
    • x-amz-sha256-tree-hash
      • specific to archive upload
      • main benefit - avoids re-reading a (potentially big) file to calculate its hash
        • it's computed piece-meal
      • compute a SHA-256 hash for each 1 MB chunk (the last may be < 1 MB)
        • concatenate adjacent pairs of hashes and hash each pair to build the next level of the tree
          • repeat until a single hash (the root) remains
      • Examples
        • Single request (6.5 MB)
          • 1 request (SHA256 computed 13 times)
        • Multi-part
          • 2 part requests, each carrying the tree hash of its own part
          • Complete Multipart Upload (tree hash of entire archive)
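
The tree-hash procedure described above can be sketched in a few lines of Python. This is an illustrative implementation using only `hashlib`; the function names `tree_hash` and `combine_hashes` are my own, and the 1 MB chunk is taken to mean 1 MiB (1,048,576 bytes). It reads the whole payload into memory for brevity; a real uploader would hash chunks as it streams them.

```python
import hashlib

MIB = 1024 * 1024  # leaf chunk size used by the tree hash


def combine_hashes(hashes):
    """Reduce a list of SHA-256 digests pairwise until one root digest remains."""
    level = list(hashes)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # Hash the concatenation of two adjacent digests.
                pair = level[i] + level[i + 1]
                next_level.append(hashlib.sha256(pair).digest())
            else:
                # An unpaired digest is carried up to the next level unchanged.
                next_level.append(level[i])
        level = next_level
    return level[0]


def tree_hash(data: bytes) -> bytes:
    """Return the tree hash of data as raw bytes (hex-encode it for the header)."""
    leaves = [hashlib.sha256(data[i:i + MIB]).digest()
              for i in range(0, len(data), MIB)]
    # An empty payload hashes to SHA-256 of the empty string.
    return combine_hashes(leaves or [hashlib.sha256(b"").digest()])


if __name__ == "__main__":
    payload = b"\x00" * (6 * MIB + MIB // 2)  # 6.5 MiB, as in the example above
    print(tree_hash(payload).hex())
```

For a multipart upload, the archive-level hash can be obtained without re-reading the data by calling `combine_hashes` on the per-part tree hashes, provided every part except the last has the same size and that size is a power of two times 1 MiB.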

Vault Access Policy
  • Resource-based policy (similar to bucket policy)

Vault Lock Policy
  • Similar to a vault access policy
  • Enforce compliance requirements
    • e.g. WORM (Write Once Read Many)
  • Once policy is locked it cannot be edited
    • Stronger control than vault access policy
  • Use case
    • time-based data retention rules (deny deletes) but allow read access
      • Combine vault lock policy (deny delete) and vault access policy (read)
  • Process
    • Initiate lock
      • Sets the lock state to InProgress and returns a lock ID
      • Validate and test your policy
      • Lock attempt expires after 24 hours if not completed
    • Complete the lock process
  • Policy elements
    • Resource (vault)
    • Conditions
      • glacier:ArchiveAgeInDays, glacier:ResourceTag
    • Action
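
As an illustration of how the policy elements above fit together, the sketch below builds a vault lock policy that denies archive deletion until an archive reaches a minimum age. The helper name `deny_early_delete_policy`, the statement ID, and the example ARN are hypothetical; the structure follows the usual IAM policy JSON layout with the `glacier:ArchiveAgeInDays` condition key.

```python
import json


def deny_early_delete_policy(vault_arn, min_age_days):
    """Build a policy document that blocks deletes of archives younger than min_age_days."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",  # hypothetical label
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    # Condition values are written as strings in policy JSON.
                    "NumericLessThan": {
                        "glacier:ArchiveAgeInDays": str(min_age_days)
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN; substitute your own region, account ID and vault name.
    arn = "arn:aws:glacier:us-west-2:123456789012:vaults/examplevault"
    print(json.dumps(deny_early_delete_policy(arn, 365), indent=2))
```

The serialized JSON string is what would be supplied when initiating the lock; read access would be granted separately through the vault access policy.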

Pricing
  • Up to 5% of stored data can be retrieved each month for free
    • Retrieval beyond that is charged per GB
  • The month's peak hourly retrieval rate is billed as if sustained for the whole month
  • Data Retrieval Policy
    • Simplifies cost management by setting limits
      • Free Tier Only
      • Max Retrieval Rate (GB/h)
      • No Retrieval Limit
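
The three data retrieval policy options above map onto a small rule document. The sketch below builds that document in the shape accepted by the Glacier API (strategies `FreeTier`, `BytesPerHour`, `None`); the helper name `retrieval_policy` is hypothetical, and treating 1 GB as 2^30 bytes is an assumption made for the conversion.

```python
GIB = 1024 ** 3  # assumption: 1 GB is treated as 2**30 bytes here


def retrieval_policy(strategy, gb_per_hour=None):
    """Build a data retrieval policy with a single rule."""
    if strategy == "BytesPerHour":
        # Max Retrieval Rate: the limit is expressed in bytes per hour.
        if gb_per_hour is None:
            raise ValueError("BytesPerHour needs a gb_per_hour limit")
        rule = {"Strategy": "BytesPerHour",
                "BytesPerHour": int(gb_per_hour * GIB)}
    elif strategy in ("FreeTier", "None"):
        # Free Tier Only / No Retrieval Limit take no extra parameters.
        rule = {"Strategy": strategy}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return {"Rules": [rule]}


if __name__ == "__main__":
    print(retrieval_policy("FreeTier"))
    print(retrieval_policy("BytesPerHour", gb_per_hour=10))
```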

S3

  • Integrated with S3
  • Lifecycle configuration can transition objects S3 -> Glacier (one-way; objects are brought back with a restore request)
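
A lifecycle rule that moves objects to Glacier might look like the sketch below. It only builds the configuration document in the shape S3's lifecycle API expects; the helper name `glacier_transition_rule`, the rule ID, and the `logs/` prefix are hypothetical values chosen for illustration.

```python
import json


def glacier_transition_rule(rule_id, prefix, after_days):
    """Build a lifecycle configuration with one rule that archives objects to Glacier."""
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # which objects the rule covers
                "Status": "Enabled",
                "Transitions": [
                    # Move matching objects to Glacier once they are after_days old.
                    {"Days": after_days, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = glacier_transition_rule("archive-old-logs", "logs/", 30)
    print(json.dumps(config, indent=2))
```

With boto3 this document would typically be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`.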
