Notes on AWS, Big Data, Machine Learning and Leadership: AWS Glacier

Model

Vault
- name must be unique within a region (per account); the same name may be reused in other regions
- container for archives
- analogous to S3 bucket
- max 1000 per region
Archive
- base unit of storage in Glacier
- immutable (create/delete only)
- can be any data (photo, document, etc.)
  - best practice: aggregate data into .zip or .tar
- 32 KB of index and metadata overhead per archive
  - Recommended >= 1 MB per archive
- max 40 TB per archive
- Upload
  - Single operation: max 4 GB
  - Multipart recommended for > 100 MB
    - Compute and supply the tree hash
      - SHA-256 hash of each 1 MB chunk, combined pairwise in tree fashion
Inventory
- Updated once per day
- List of all archives
- Inventory date does not change if no archives were added or deleted
- Format: CSV or JSON
- A similar concept now exists for S3 (S3 Inventory)

Jobs

Executed asynchronously (Job ID returned)
Associated with vault
- Multiple jobs may be in-progress
When a job completes, the user can download its output (available for 24h)
Types
- Archive Retrieval
  - entire archive or a byte range of it (the range must be megabyte-aligned)
- Inventory Retrieval (list of archives)
  - filter can be applied (e.g. archive creation date)
May have SNS notifications enabled
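
The two job types above can be sketched as the parameter dictionaries passed when a job is initiated. This is a minimal sketch, not a definitive implementation: the helper names `inventory_job` and `archive_job` are hypothetical, and the key names follow the `jobParameters` structure of the Glacier InitiateJob call as I understand it.

```python
from datetime import datetime, timezone

MIB = 1024 * 1024  # ranged retrievals must be aligned to 1 MiB boundaries


def _iso8601(moment: datetime) -> str:
    # Expects a timezone-aware datetime; formats it as a UTC timestamp.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def inventory_job(start=None, end=None, sns_topic=None, fmt="JSON"):
    """Build parameters for an inventory-retrieval job (hypothetical helper)."""
    params = {"Type": "inventory-retrieval", "Format": fmt}
    window = {}
    if start is not None:
        window["StartDate"] = _iso8601(start)  # filter on archive creation date
    if end is not None:
        window["EndDate"] = _iso8601(end)
    if window:
        params["InventoryRetrievalParameters"] = window
    if sns_topic is not None:
        params["SNSTopic"] = sns_topic  # notify this topic when the job completes
    return params


def archive_job(archive_id, first_mib=None, mib_count=None, sns_topic=None):
    """Build parameters for an archive-retrieval job (hypothetical helper)."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if first_mib is not None and mib_count is not None:
        start = first_mib * MIB
        end = start + mib_count * MIB - 1  # the end offset is inclusive
        params["RetrievalByteRange"] = f"{start}-{end}"
    if sns_topic is not None:
        params["SNSTopic"] = sns_topic
    return params
```

With boto3, such a dictionary would be passed as `jobParameters` to `client.initiate_job(accountId="-", vaultName=..., jobParameters=...)`, and the returned job ID is later used with `client.get_job_output(...)` once the job has completed.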

Upload (Tree Hash)
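
The tree hash supplied with an upload can be sketched as follows: hash each 1 MiB chunk with SHA-256, then repeatedly hash adjacent pairs of digests until one root digest remains. This is a minimal in-memory sketch; the function name `tree_hash` is a hypothetical helper, and a real upload of a large archive would read the data in chunks rather than hold it all in memory.

```python
import hashlib

MIB = 1024 * 1024  # the payload is hashed in 1 MiB chunks


def tree_hash(data: bytes) -> str:
    """Return the hex SHA-256 tree hash of data."""
    # Leaf level: one digest per 1 MiB chunk (the last chunk may be shorter).
    level = [hashlib.sha256(data[i:i + MIB]).digest()
             for i in range(0, len(data), MIB)]
    if not level:
        # Empty payload: fall back to the hash of zero bytes.
        level = [hashlib.sha256(b"").digest()]
    # Combine adjacent pairs until a single root digest remains.
    while len(level) > 1:
        paired = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                paired.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                # An unpaired digest is promoted unchanged to the next level.
                paired.append(level[i])
        level = paired
    return level[0].hex()
```

For a payload of 1 MiB or less, the tree hash is simply the SHA-256 of the data, because there is only one leaf.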

Vault Access Policy

Vault Lock Policy

Similar to vault access policy
Enforce compliance requirements
- e.g. WORM (Write Once Read Many)
Once the policy is locked, it cannot be edited
- Stronger control than vault access policy
Use case
- time-based data retention rules (deny deletes) but allow read access
  - Combine vault lock policy (deny delete) and vault access policy (read)
- Compliance
Process
- Initiate lock
  - Sets the lock state to InProgress and returns a lock ID
  - Validate and test your policy
  - 24-hour timeout (the lock is aborted if not completed in time)
- Complete the lock process
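
The two-step process above can be sketched with the boto3 Glacier client calls `initiate_vault_lock`, `complete_vault_lock` and `abort_vault_lock`. This is a sketch under assumptions: `lock_vault` and the `confirm` callback are hypothetical names, and the client is passed in so the flow can be exercised without touching a real vault.

```python
def lock_vault(client, vault_name, policy_json, confirm):
    """Initiate a vault lock, then complete or abort it (hypothetical helper).

    client      -- a Glacier client, e.g. boto3.client("glacier")
    policy_json -- the vault lock policy as a JSON string
    confirm     -- callback returning True once the policy has been tested
    """
    # Step 1: attach the policy; the lock is now in progress and can still be aborted.
    started = client.initiate_vault_lock(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        policy={"Policy": policy_json},
    )
    lock_id = started["lockId"]

    # Step 2: after validating the policy, make it permanent or back out.
    if confirm(vault_name):
        client.complete_vault_lock(
            accountId="-", vaultName=vault_name, lockId=lock_id
        )
        return "Locked"
    client.abort_vault_lock(accountId="-", vaultName=vault_name)
    return "Aborted"
```

Completing the lock is irreversible, so in practice the testing window between the two steps is where the policy should be exercised against real requests.
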
Policy elements
- Resource (vault)
- Conditions
  - glacier:ArchiveAgeInDays, glacier:ResourceTag
- Action

Pricing

Glacier Select
