Tuesday, 3 May 2016

AWS EC2 - EBS

Network block storage as a service.

Model
  • Block Device
    • Moves data in blocks of bits/bytes (does not understand filesystem on top of it)
    • Block is 256KB
    • Supports random I/O and generally uses buffered I/O
  • Replication
    • within the same AZ 
    • write issued to primary and mirror
    • write returns when both are ack'ed
    • underlying technology not disclosed
      • Perhaps synchronous mirroring similar to DRBD

Availability
  • Replicated within AZ
  • 99.999% availability (5 nines)
  • AFR: 0.1-0.5%, provided that you snapshot more often than every 20GB of modified blocks
    • A freshly snapshotted EBS volume has practically 0% AFR (S3 durability)
    • It grows slowly from there
  • RAIDing (1/5/6) does not work as well as on physical volumes due to shared infrastructure
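
As a rough illustration of what "5 nines" means in practice, the availability percentage can be converted into permitted downtime. This is a minimal sketch; the function name is hypothetical and the calculation assumes a 365-day year.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365.0) -> float:
    """Minutes of downtime per period permitted by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100.0
    return unavailable_fraction * days * 24 * 60


# 99.999% availability leaves roughly 5.26 minutes of downtime per year.
print(round(allowed_downtime_minutes(99.999), 2))
```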

IOPS
  • Input/Output operations per second
    • Read or Write
  • Measured in increments of 256KiB
    • 1 * 1024KiB  = 4 IOPS
    • 1024 * 1KiB = 1024 IOPS
  • For very large chunks you may reach throughput limit before IOPS limit
    • Example
      • 1000GiB gp2 has 3000 IOPS limit and throughput 160 MiB/s
      • At 256KiB you reach max throughput at 640 IOPS
      • For a smaller I/O size (e.g. 16KiB) the same volume sustains 3000 IOPS
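
The accounting above can be sketched in a few lines. This is an illustrative model built only from the numbers in these notes (256KiB accounting unit, an IOPS limit and a throughput limit); the function names are hypothetical and this is not an AWS API.

```python
import math

IO_UNIT_KIB = 256  # accounting unit taken from the notes above


def consumed_iops(io_size_kib: float, ops_per_second: int = 1) -> int:
    """IOPS charged for operations of a given size, counted in 256KiB units."""
    units_per_op = max(1, math.ceil(io_size_kib / IO_UNIT_KIB))
    return units_per_op * ops_per_second


def sustainable_iops(io_size_kib: float, iops_limit: float, throughput_mib_s: float) -> float:
    """Operations per second a volume can sustain: whichever limit is hit first."""
    throughput_bound = throughput_mib_s * 1024 / io_size_kib
    return min(iops_limit, throughput_bound)


print(consumed_iops(1024))               # one 1024KiB operation counts as 4
print(consumed_iops(1, 1024))            # 1024 operations of 1KiB count as 1024
print(sustainable_iops(256, 3000, 160))  # throughput-bound: 640.0
print(sustainable_iops(16, 3000, 160))   # IOPS-bound: 3000
```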

Average Queue Length
  • Queue length - number of I/O operations pending on the device
  • The workload must keep enough I/O pending to take full advantage of an EBS volume

Magnetic - standard
  • IOPS around 100 (1 seek = 10msec)
    • Not changeable
    • Can burst to hundreds
  • Throughput: 40-90 MB/s
  • Latency Read: 10-40ms
  • Latency Write: 2-10 ms
  • Size: 1GiB-1TiB

General Purpose (SSD) - gp2
  • IOPS baseline (3 IOPS/GiB)
    • size:IOPS = 1:3
      • e.g. 50 GiB = 150 IOPS
    • At 1000GiB (~1TiB) the baseline reaches 3000 IOPS, so bursting no longer applies
    • At 3,334GiB (~3.3TiB) you reach the cap of 10,000 IOPS
  • Burst: 30 minutes @ 3000
    • 3 credits per GiB per second
    • Initial credit: 5.4M
    • Cap at 5.4M
  • Throughput: 128MB/s - 160MB/s
    • Pairs well with 1000 Mbps EBS optimized
  • Good for boot devices
  • Size: 1GiB - 16TiB
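
The gp2 figures above fit a simple credit-bucket model. The sketch below uses only the numbers from these notes (3 IOPS/GiB, 3000 burst, 10,000 cap, 5.4M credits) and assumes credits refill at the baseline rate while the volume bursts. The names are hypothetical, and any baseline floor AWS may apply to very small volumes is ignored.

```python
GP2_IOPS_PER_GIB = 3
GP2_BURST_IOPS = 3000
GP2_MAX_IOPS = 10_000
GP2_CREDIT_CAP = 5_400_000


def gp2_baseline_iops(size_gib: float) -> float:
    """Baseline IOPS: 3 per GiB, capped at 10,000."""
    return min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib)


def gp2_burst_seconds(size_gib: float, credits: float = GP2_CREDIT_CAP):
    """Seconds a volume can sustain 3000 IOPS before its credits run out.

    Returns None when the baseline is already at or above the burst rate.
    """
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    # Credits drain at the burst rate and refill at the baseline rate.
    return credits / (GP2_BURST_IOPS - baseline)


print(gp2_baseline_iops(50))    # 150
print(gp2_burst_seconds(100))   # 2000.0
print(gp2_burst_seconds(1000))  # None: baseline already 3000
```

The "30 minutes @ 3000" figure corresponds to draining the full bucket with no refill: 5,400,000 / 3000 = 1800 seconds.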

Provisioned IOPS (SSD) - io1
  • precise definition of performance
  • IOPS: 100-20,000
    • Can achieve very high IOPS for small volume size
    • e.g. 50GiB = 1500 IOPS
  • Max ratio size:IOPS = 1:30
    • 100GiB = max 3000 IOPS
    • 666.6GiB ~ max 20,000 IOPS
    • Prevents tiny disks with enormous IOPS
  • Max Throughput: 320 MB/s
  • Size: 4GiB - 16TiB
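
The io1 limits above can be checked with a small helper. This is a sketch based solely on the figures in these notes (IOPS range 100-20,000, at most 30 IOPS per GiB, minimum size 4GiB); the function names are hypothetical.

```python
import math

IO1_MAX_IOPS_PER_GIB = 30
IO1_MIN_IOPS = 100
IO1_MAX_IOPS = 20_000
IO1_MIN_SIZE_GIB = 4


def io1_max_provisionable_iops(size_gib: int) -> int:
    """Highest IOPS value that may be provisioned for a volume of this size."""
    return min(IO1_MAX_IOPS, IO1_MAX_IOPS_PER_GIB * size_gib)


def io1_min_size_gib(iops: int) -> int:
    """Smallest volume size (whole GiB) that can carry the requested IOPS."""
    if not IO1_MIN_IOPS <= iops <= IO1_MAX_IOPS:
        raise ValueError("io1 IOPS must be between 100 and 20,000")
    return max(IO1_MIN_SIZE_GIB, math.ceil(iops / IO1_MAX_IOPS_PER_GIB))


print(io1_max_provisionable_iops(50))   # 1500
print(io1_max_provisionable_iops(100))  # 3000
print(io1_min_size_gib(20_000))         # 667
```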


EBS Encryption
  • Transparent for the OS
  • Data in transit (between the EC2 instance and EBS storage)
    • encryption occurs on the server hosting the EC2 instance
  • Data at rest
    • disk, snapshots
  • protects against physical access
  • Volume Encryption Key 
    • DataKey protected with region-specific CMK
    • Stored encrypted on the disk
    • Stored plaintext in-memory only
  • Performance impact
    • Increased latency
    • Same IOPS
  • Supported on all EBS types (gp2, io1, standard)
  • Certain instance types are supported
    • Must support Intel AES New Instructions (AES-NI)
      • e.g. Not supported for t2 type
  • Encryption Context: EBS volumeId

Snapshots
  • You can resume using the application as soon as CreateSnapshot returns (the list of blocks to copy is fixed at that point)
    • Tool: ec2-consistent-snapshot
  • Volume restored from a snapshot: usable immediately because blocks are lazy loaded; you can read the whole volume upfront to avoid the first-access penalty
  • Can be copied between regions
    • New snapshotId in destination
    • S3-SSE used
    • Up to 5 in progress
    • First is full - subsequent are incremental
    • Encrypted Snapshot
      • You can specify it should be encrypted (if source is not)
      • You can change encryption key
      • Cannot be shared or made public

Prewarming (initialization)
  • EBS volumes created from snapshots have their blocks lazy loaded from S3
    • Performance impact 5-50%
    • Provisioned IOPS volumes may display warning
    • Blocks can be loaded by running
      • dd - standard, but single threaded
      • fio - must be installed first (e.g. via apt-get), multithreaded
  • EBS newly created (blank) does not require initialization
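
The dd approach amounts to reading every block of the device once and discarding the data. A minimal single-threaded sketch of the same idea follows; the function name and the device path in the usage comment are hypothetical, and reading a real block device typically requires root privileges.

```python
def initialize_volume(device_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Read every block of the device once and discard it.

    Returns the total number of bytes read.
    """
    total = 0
    # Unbuffered binary reads, one chunk at a time, until end of device.
    with open(device_path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Hypothetical usage on an attached volume (device name is an assumption):
# initialize_volume("/dev/xvdf")
```

Like dd, this reads sequentially from a single thread, so a multithreaded tool such as fio will usually finish sooner on large volumes.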

Monitoring
  • Frequency
    • Basic (5 minutes)
    • Detailed (1 minute, io1)
  • Metrics
    • io1
      • VolumeThroughputPercentage - delivered/provisioned %
      • VolumeConsumedReadWriteOps - total consumed IOPS (in 256K units)
  • Volume Status Check
    • Automated test running every 5 minutes: PASS or FAIL
    • Status: ok, impaired, insufficient-data
    • Automatically disable IO on failure and create event
      • Stop applications -> enable manually -> run fsck -> start applications
      • Set AutoEnableIO attribute if you don't care about potential corruption
  • Volume I/O Performance check
    • io1 only
    • alerts when performance too low
      • warning if < 50% IOPS
    • Status: normal (ok), degraded (warning), severely degraded (warning), stalled (impaired)
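
The delivered/provisioned ratio and the 50% warning rule above can be illustrated as follows. This is only a sketch of the arithmetic: the real check is performed by AWS, the function names are hypothetical, and the thresholds for the other statuses are not given in these notes.

```python
def throughput_percentage(delivered_iops: float, provisioned_iops: float) -> float:
    """Delivered IOPS as a percentage of provisioned IOPS."""
    if provisioned_iops <= 0:
        raise ValueError("provisioned_iops must be positive")
    return 100.0 * delivered_iops / provisioned_iops


def is_performance_warning(delivered_iops: float, provisioned_iops: float,
                           threshold_percent: float = 50.0) -> bool:
    """True when delivered performance falls below the warning threshold."""
    return throughput_percentage(delivered_iops, provisioned_iops) < threshold_percent


print(throughput_percentage(1500, 3000))   # 50.0
print(is_performance_warning(1400, 3000))  # True
```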

EBS optimized instance
  • Additional, dedicated capacity of EBS I/O
    • Separated from normal network I/O traffic
  • Depending on instance type:
    • Throughput: 500 Mbps - 4000 Mbps, e.g:
      • c4.large = 500 Mbps
      • c4.8xlarge = 4000 Mbps
    • IOPS (16K): 450-3600
  • Some instance types are EBS optimized by default (e.g. c4.large)
  • Instance types with 10 Gbps networking are a different concept
    • network channel is shared 
    • some do not offer EBS-optimized version at all (traffic shared)
    • Example: i2.8xlarge
    • Max
      • Throughput: 800 MB/s (6400 Mbps)
      • IOPS: 48,000
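
Since the dedicated EBS capacity is quoted in megabits per second, it helps to convert it to megabytes per second and to a rough IOPS ceiling for a given I/O size. The sketch below assumes decimal units (1 Mbps = 1,000,000 bits/s) and 16KiB operations. The function names are hypothetical, and the result is only the ceiling implied by bandwidth, not a published per-instance IOPS figure.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbps / 8


def bandwidth_bound_iops(mbps: float, io_size_kib: int = 16) -> int:
    """Upper bound on operations per second that the bandwidth alone allows."""
    bytes_per_second = mbps * 1_000_000 / 8
    return int(bytes_per_second // (io_size_kib * 1024))


print(mbps_to_mb_per_s(500))       # 62.5
print(mbps_to_mb_per_s(4000))      # 500.0
print(bandwidth_bound_iops(500))   # 3814
```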

Deletion
  • When instance is terminated
    • Root volumes default to delete
    • Non-root volumes default to retain

