Monday, 8 August 2016

AWS VPC (VPN)


Virtual Private Gateway (VGW)
  • VPN concentrator on VPC side
  • Max 5 per region
  • Represents 2 distinct endpoints in separate data centers (AZs)
    • Each endpoint in distinct AZ (no control which AZs)
    • Two unique public IP addresses
  • Redundancy 
    • Add second CGW that points to the same VGW
    • This creates additional 2 tunnels (total 4 tunnels)
    • No point in adding more VGW in the same VPC
  • Terminates
    • Hardware VPN connection
    • Direct Connect
  • Acts as ECMP route to the set of WAN routers (not Internet routers, compare with IGW)

Customer Gateway (CGW)
  • Represents customer device in the VPN connection
  • Customer-end router supporting IPSec termination
    • This can be software solution
  • Initiates the VPN connection
  • Device may support static or dynamic routing
  • Max 50 per region

AWS Hardware VPN
  • Uses Virtual Private Gateway
  • IPSec based
    • Pre-shared key generated by AWS
    • Uses tunnel mode (no policy based VPN)
    • Customer Gateway must be able to terminate IPSec
  • NAT Traversal (NAT-T) supported
    • Uses UDP/4500 to encapsulate ESP packet
  • Depends on "Internet weather" (latency, jitter)
  • 1 VPN connection -> 1 VPC
    • 1 VPC max 10 VPN connections
    • Max 50 VPN Connections per region
      • 5 VPC per region * 10 VPN connections each
  • 2 endpoints (tunnels) in different AZs for HA
    • Single customer endpoint at minimum
  • You can maintain your on-premise private IP address (no NAT)
  • Routing type
    • Static 
      • Requires routes (IP prefixes) to be manually specified
      • Consolidate ACLs to cover all IPs
    • Dynamic  (preferred)
      • Customer device must support single-hop BGP
      • BGP based - announces routes to AWS
        • AWS adds them to the route table in VPC, e.g.
          • 192.168.0.0/16 -> VGW
        • Route propagation
          • Anything I receive on VGW via BGP: add to route table
      • Max-prefixes: 100
      • 2 or 4 byte AS (Autonomous System) number for both ends
        • Can be changed on both ends
      • Automatically generates BGP IP addresses (link local) for each end, e.g.
        • Tunnel 1
          • AWS: 169.254.169.1/30 
          • Customer: 169.254.169.2/30
        • Tunnel 2
          • AWS: 169.254.169.5/30
          • Customer: 169.254.169.6/30
  • Connections are initiated by Customer Gateway
    • on-premise router sees interesting traffic and initiates session to VGW
    • Fragment IP packets before encryption
    • Reason why there is no VGW<->VGW Hardware VPN
      • Who would initiate
    • Keepalive required to keep the tunnel up
  • 1 unique IPSec Security Association (SA) pair per tunnel
    • 1 inbound and 1 outbound 
    • Total: 2 unique pairs for 2 tunnels = 4 SA
  • HA
    • Use two customer gateway connections cross wired (i.e. 4 tunnels total)
      • Routing on-premise to decide which one to use
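
The /30 link-local tunnel addressing listed above can be checked with Python's standard ipaddress module. This is a minimal sketch: the function name tunnel_endpoints is hypothetical, and it assumes AWS takes the first usable host and the customer the second, as in the example addresses in these notes.

```python
import ipaddress

def tunnel_endpoints(cidr):
    """Return (aws_ip, customer_ip) for a /30 inside-tunnel network."""
    network = ipaddress.ip_network(cidr)
    if network.prefixlen != 30:
        raise ValueError("inside tunnel networks are expected to be /30")
    # A /30 has exactly two usable host addresses.
    aws_ip, customer_ip = network.hosts()
    return str(aws_ip), str(customer_ip)

tunnel_1 = tunnel_endpoints("169.254.169.0/30")
tunnel_2 = tunnel_endpoints("169.254.169.4/30")
print(tunnel_1, tunnel_2)
```

Each tunnel gets its own /30, so the two BGP sessions never share an address range.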

Software VPN
  • Run EC2 instance acting as VPN server ("Software VPN Appliance")
    • OpenSwan - IPSec based
    • OpenVPN - SSL based
    • Commercial: Fortinet, Cisco, Check Point, Microsoft, Astaro
  • Customer can manage both ends of VPN (unlike AWS Hardware VPN)
    • Responsible for HA


SSL VPN
  • Clientless
    • Uses browser
    • Non Web applications must use Java/ActiveX controls
  • SSL used as tunnel to access internal application
  • May use special "Portal" (Gateway)
  • Layer 6/7
  • Use cases:
    • remote access
    • roaming client

Misc (Data Transfer)

EU prices

EC2 
  • IN to EC2 from
    • Free
      • Internet
      • Another AWS region
      • S3, Glacier, DynamoDB, SES, SQS, SimpleDB
        • same region
      • EC2, RDS, Redshift, ElastiCache, ENI
        • Same AZ and using private IP
    • Paid (1 GB = $0.01)
      • EC2, RDS, Redshift, ElastiCache, ENI
        • same AZ but using public IP or EIP
        • different AZ
        • peered VPC
      • Billing: "Data Transfer" (regional data transfer)
  • OUT from EC2 to
    • Free
      • S3, Glacier, DynamoDB, SES, SQS, SimpleDB
        • same region 
      • EC2, RDS, Redshift, ElastiCache, ENI
        • same AZ and using private IP
      • CloudFront (origin fetch)
    • Paid
      • (1 GB = $0.01)
        • EC2, RDS, Redshift, ElastiCache, ENI
          • same AZ but using public IP or EIP
          • different AZ
          • peered VPC
        • Billing: "Data Transfer" (regional data transfer)
      • (1 GB = $0.02)
        • another AWS region
      • (1 GB = $0.09)
        • Internet
        • Billing: "Data Transfer"  (data transfer out)
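
As a worked example of the EC2 outbound rates above, the sketch below multiplies a volume in GB by the flat per-GB price from these notes. The dictionary keys and the function name transfer_cost are made up for illustration, and the model is an assumption: it ignores volume tiers and any free allowance.

```python
# Per-GB rates in USD, copied from the notes above (EU prices at the time).
EC2_OUT_RATE_PER_GB = {
    "same_az_private_ip": 0.00,
    "same_region_aws_service": 0.00,  # S3, Glacier, DynamoDB, SES, SQS, SimpleDB
    "regional": 0.01,                 # other AZ, public IP/EIP, peered VPC
    "other_region": 0.02,
    "internet": 0.09,
}

def transfer_cost(gigabytes, destination):
    """Return the USD cost of sending `gigabytes` out of EC2 to `destination`."""
    return round(gigabytes * EC2_OUT_RATE_PER_GB[destination], 4)

# 500 GB across AZs plus 100 GB to the Internet in one month.
monthly = transfer_cost(500, "regional") + transfer_cost(100, "internet")
print(monthly)
```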

CloudFront
  • OUT to
    • Internet
      • Paid (1GB = $0.085)
    • Origin
      • Paid (1GB = $0.020)
    • Billing: "CloudFront" section

ELB
  • Paid (1GB = $0.008)
    • Processed traffic (IN/OUT)
  • Billing: "Elastic Compute Cloud" section

Tuesday, 10 May 2016

AWS IAM - STS


Security Token Service (STS)
  • Gives out temporary credentials
  • Global and regional endpoint

Temporary Credentials
  • Similar to Access Keys but short-lived (minutes to hours)
  • Not stored with the user but dynamically generated
  • No need to distribute them
  • Can grant access without AWS identity (so they are basis for federated identity)
  • Have restrictions based on the API used
    • e.g. cannot call GetFederationToken or GetSessionToken with temporary credentials
      • Otherwise you could keep extending your own token

Session
  • Temporary access to AWS
  • Generated by STS (see below)
  • Elements
    • Access Key
    • Secret Access Key
    • Session Token
      • Must be submitted to every API call
    • Expiration (Min/Max/Default)
      • GetFederationToken (15m/36h/12h)
      • AssumeRole* (15m/1h/1h)
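
The expiration limits above can be expressed as a small lookup. This is only a sketch using the figures from these notes; the names DURATION_LIMITS and session_duration are hypothetical and nothing here calls STS.

```python
# (min, max, default) in seconds, taken from the figures in the notes above.
DURATION_LIMITS = {
    "GetFederationToken": (15 * 60, 36 * 3600, 12 * 3600),
    "AssumeRole": (15 * 60, 1 * 3600, 1 * 3600),
}

def session_duration(api, requested=None):
    """Return the session length to request, falling back to the default."""
    minimum, maximum, default = DURATION_LIMITS[api]
    if requested is None:
        return default
    if not minimum <= requested <= maximum:
        raise ValueError(f"{api} accepts {minimum}-{maximum} seconds, got {requested}")
    return requested

print(session_duration("GetFederationToken"))
print(session_duration("AssumeRole", 1800))
```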

Policy scoping
  • Lets you restrict permissions further (effective = role permissions AND passed policy)

GetFederationToken
  • Works within AWS account
  • Up to 36 hours (much longer than others)
  • No MFA
  • Requires AWS credentials
    • Must have union of all policies that you want to grant
  • Policy
    • There are no "role permissions" here so you only get what you specify
    • If no policy specified authenticated user may still get access based on resource policy, e.g.
      • Temporary credentials created for "Susan" (federation token)
      • S3 bucket access for "arn:aws:sts::111122223333:federated-user/Susan"
  • Use cases 
    • server side proxies (must safely store long term credentials)
    • you want to manage permissions within the organization

AssumeRole
  • Works cross-account
  • 15 minutes - 1 hour
  • MFA supported
  • Supports policy scoping
  • Requires AWS credentials
  • Use cases
    • Grant access to resources in different AWS account
    • Enforce MFA authentication for privilege escalations

AssumeRoleWithSAML (SAML 2.0)
  • Works cross-account
  • 15 minutes - 1 hour
  • Does not require AWS credentials (SAML response is cryptographically signed)
  • Uses off-the-shelf software
  • Used for corporate Single Sign On
  • Must configure SAML Identity Provider first
  • Supports policy scoping
  • RoleSessionName is visible in CloudTrail so use correct value for traceability
  • Use cases
    • Enterprise organizations who have software that produces SAML assertions
      • Active Directory -> Active Directory Federation Services
    • Used for corporate Single Sign On (e.g. Isengard)

AssumeRoleWithWebIdentity
  • Works cross-account
  • 15 minutes - 1 hour
  • Does not require AWS credentials
  • Obsoleted by Cognito for mobile scenarios
  • Supports Policy Scoping
    • Request is not signed so make sure no intermediary can alter the policy
  • Use cases
    • Mobile and web users who do not have IAM users

GetSessionToken
  • Give temporary credentials for IAM user
  • 15 minutes - 36 hours
  • Requires AWS Credentials
  • Use cases
    • enforce MFA for privilege escalation
    • untrusted environments (web, mobile)

Single Sign On to the console

  • Temporary credentials can be used for sign-in
  • Endpoint: https://signin.aws.amazon.com/federation
  • Pass temporary credentials

AWS IAM - Federation


Federation 
  • trust relationship between external IdP and Identity Consumer (AWS)
    • i.e. "combining users between one domain (e.g. IAM) and another (e.g. AD)"
    • contrast with delegation (which grants permissions to other users to access your resources)
  • federated user - user managed (authenticated) outside of AWS account
    • e.g. in Active Directory 
  • Enterprise Identity Federation 
    • SAML 2.0
  • Web Identity Federation
    • FB
    • G+
    • Amazon
    • OpenID 2.0
  • Rationale
    • You do not need to manage unique credentials (e.g. IAM users)
      • You manage roles and policies (1:Many reuse)
    • Credentials centrally managed (one castle to defend)
    • Compliance (onboarding/offboarding)

Identity Provider
  • External system 
    • Stores identity information
    • "Speaks" federation protocol
    • Authenticates (various factors)
    • Authorizes
      • coarse grained: does not know details of the customer system
  • (AWS) Metadata about external IdP (model) is configured
    • OpenID Connect
      • Provider Url, Audience
    • SAML
      • Metadata (cryptographic details)

Identity Consumer
  • Customer application
  • Stores references to identity
  • aka Relying Party
    • i.e. relies on Identity Provider 
  • Authorization (fine grained)
    • Knows the details of the customer system

SAML (Security Assertion Markup Language)
  • Identity Provider (IdP) 
    • Active Directory Federation Service (ADFS)
    • Shibboleth
    • G Suite (SAML)
  • Service Provider (SP) - identity consumer
  • Setup
    • Exchange metadata in advance - establishes the contract
      • XML document
        • Encryption keys, signing certificates, endpoints
  • Usage
    • Trade SAML assertion for
      • Cryptographically trusted assertion
        • Uniquely identifies the user
        • Describes authorization information

SAML federation
  • Flow
    • User talks to Identity Provider (IdP)
    • IdP authenticates
    • IdP returns "SAML authentication response"
      • "This is the user in my identity store"
      • "I have authenticated the user"
      • "Here are some attributes about the user (assertions)"
    • Browser POSTs the SAML response to AWS "sign-in" endpoint
      • AWS shows the roles available 
      • Call AssumeRoleWithSAML
    • Constructs the Console Url
  • Use cases
    • AWS Console/API access
    • AWS Services 
    • Cognito (User pool)

OpenID Connect (OIDC)
  • Successor to SAML
  • OpenID Provider (OP) - identity provider
    • Social Identity Providers
      • FB, G+, Login with Amazon
  • Relying Party (RP) - identity consumer
  • Setup
    • Exchange metadata in advance
      • where its endpoints are
    • Register RP with OP
  • Tokens
    • ID Token - user identity
    • Access Token - can be used to call-out to APIs
    • Refresh Token - allows to renew "Access Token"
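
An OIDC ID Token is a JWT, so its claims can be inspected by base64url-decoding the middle segment. The sketch below does exactly that and nothing more: it does NOT verify the signature, so it is only suitable for debugging. The sample token, its claim values and the helper names are made up for illustration.

```python
import base64
import json

def b64url_decode(segment):
    """Decode base64url text that may be missing its '=' padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)

def b64url_encode(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def id_token_claims(id_token):
    """Return the claims of a JWT ID token WITHOUT verifying its signature."""
    _header, payload, _signature = id_token.split(".")
    return json.loads(b64url_decode(payload))

# Made-up, unsigned token used only to exercise the decoder.
sample_claims = {"iss": "https://idp.example.com", "sub": "user-123", "aud": "my-app"}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode("utf-8")),
    b64url_encode(json.dumps(sample_claims).encode("utf-8")),
    "",
])
claims = id_token_claims(sample_token)
print(claims["sub"])
```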

AD/Kerberos Federation
  • Flow
    • On-premises: AD
      • Setup users
    • AWS: Directory Service for Microsoft AD
      • Setup groups
    • Pre-establish trust
      • i.e. Forest trust
    • Kerberos-enabled resource
      • Windows
      • SQL Server
      • Work*

Cross-Account access (XA)
  • Switch role

Custom federation broker (proxy)
  • Build Custom Federation Proxy
    • It uses its own AWS access keys
    • Scoping policy
      • restrict access (broker has wider credentials)
  • AWS Console Federation using Custom Broker (CFP)
    • Flow
      • Browser makes request to CFP
      • Proxy authenticates users with Corporate Directory (CD)
      • CFP enumerates user's groups in CD (or other attributes)
      • CFP lists roles in AWS account
      • CFP chooses which role to assume (or asks user)
      • CFP calls AssumeRole
      • CFP generates Console Url and redirects the user
  • AWS API Federation using Custom Proxy
    • Flow 
      • Uses GetFederationToken (restrict policy scope)
      • Command Line App makes request to CFP
      • CFP authenticates the user
      • CFP gets back entitlements (i.e. IAM Policies)
      • CFP user must have union of all permissions for Federated Users
  • Legacy mechanism (not recommended by AWS)
    • Use SAML or OIDC



AWS IAM


Principal Type
  • AWS Account (root)
  • IAM User
  • Federated User
    • Web Federated User (Login with Amazon, Cognito, Facebook, Google)
    • SAML federated user
  • Assumed Role
  • Role assigned to EC2 instance
  • Anonymous

Credentials
  • Login Profile (username, password)
  • Access Keys
    • API: ListAccessKeys for a given user
      • CreateDate
      • Status (Active/Inactive)
    • Start with "AKIA..."
  • X.509
    • Required by certain EC2 CLI tools and AMI bundle (instance store) tools
    • Rarely used
  • CloudFront key-pair
  • Temporary Access Keys
    • Generated by STS
    • Start with "ASIA..."
    • Must be accompanied by "Session Token"

Role
  • Can be assumed by various entities
  • 2 Policies
    • Trust - "who can assume the role?"
      • e.g. other AWS account
    • Access - "what can principal do?"
      • (e.g. "can upload to S3")
  • Does not have credentials associated
  • Use cases
    • Role for IAM user in other AWS account I own
      • IAM user from other AWS account can access (e.g. cross-account for AWS accounts I own)
    • Role for IAM user for Third Party company that performs a service (e.g. Skeddly)
      • ExternalId should be part of policy condition to prevent "confused deputy" attack
    • Role for AWS service
      • EC2 (Instance Profile), Data Pipeline, Elastic Transcoder, OpsWorks
      • Alternative: Service Linked Role
    • Role for Identity Federation
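
For the third-party use case, the trust policy with an ExternalId condition could look like the sketch below. The account ID and external ID are placeholders, and the function name is hypothetical; the policy is built as a plain dict and serialized to JSON.

```python
import json

def third_party_trust_policy(trusted_account_id, external_id):
    """Trust policy: who may assume the role, and under which condition."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
            "Action": "sts:AssumeRole",
            # The caller must present the agreed ExternalId (confused deputy guard).
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

policy_json = json.dumps(
    third_party_trust_policy("111122223333", "example-external-id"), indent=2
)
print(policy_json)
```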

Service Linked Role
  • Supported by subset of AWS Services (called Linked Services)
    • Lex
    • ElasticBeanstalk
  • AWS manages permission necessary for the Linked Service to work (managed policy)
    • Impossible to remove required permissions
  • Trust policy (who can assume the role) cannot be modified
    • Only Linked AWS Service can assume

Identifiers
  • Each entity has unique ID (e.g. "AIDAJQABLZS4A3QDU576Q")
    • Can be used to disambiguate
  • Friendly Name (e.g. "jsmith", "Administrators", "CloudWatchPolicy")
  • Paths - can be used to indicate division/department
    • No semantics attached (e.g. users with the same path do not share group)

Policy Elements
  • Version  (optional) - language version 
  • Id (optional) - required for some AWS services (e.g. SQS, SNS)
  • Statement (mandatory) - main element
    • Sid (optional) - sub-identifier if Id is used
      • For IAM policy basic alphanumeric string
      • Some services may require it to be unique     
    • Effect
      • allow
      • deny
    • Principal
      • IAM Role Trust Policy - entity who can assume the role
      • Resource Based Policy - entity who can access the resource
        • e.g. S3, SQS, SNS, Glacier, KMS
      • IAM User/Group - not required (implicit - "entity to which the policy is attached to")
    • NotPrincipal - use for exceptions with deny (whitelisting)
    • Action
    • Resource
    • NotResource
    • Condition
      • Multi-key values are OR'ed
      • Conditions are AND'ed with each other (i.e. you must satisfy all of them)
        • e.g. MFA required AND Source IP
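
The AND/OR rule for conditions can be illustrated with a toy matcher. This is a simplification and an assumption: it only does exact string comparison and ignores the real condition operators (IP ranges, dates, and so on). The function name and the sample values are hypothetical.

```python
def condition_matches(condition, context):
    """condition: {key: [allowed values]}; context: {key: actual value}."""
    return all(                      # every key must match (AND)
        context.get(key) in allowed  # any listed value may match (OR)
        for key, allowed in condition.items()
    )

condition = {
    "aws:MultiFactorAuthPresent": ["true"],
    "aws:SourceIp": ["203.0.113.10", "203.0.113.11"],
}

with_mfa = {"aws:MultiFactorAuthPresent": "true", "aws:SourceIp": "203.0.113.11"}
without_mfa = {"aws:MultiFactorAuthPresent": "false", "aws:SourceIp": "203.0.113.11"}
print(condition_matches(condition, with_mfa), condition_matches(condition, without_mfa))
```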

Policy Variables
  • Generalizes policy so that it can apply to multiple entities (e.g. IAM users)
    • ${aws:username}
    • Use Case: home folders in S3
  • Use cases
    • Resource
    • Condition 
      • String Operators: StringEquals, StringLike, StringNotLike, etc.
      • ARN operators: ArnEquals, ArnLike, etc.
  • Variables
    • aws:username, aws:userid, aws:UserAgent, aws:SourceIp, aws:principalType, etc.
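
To make the "home folders in S3" use case concrete, the sketch below expands ${...} placeholders in a resource ARN the way IAM would at evaluation time. It is an illustration only: the bucket name and the function expand_policy_variables are hypothetical, and IAM performs this substitution itself.

```python
import re

def expand_policy_variables(text, context):
    """Replace each ${name} in `text` with the value from `context`."""
    def replace(match):
        name = match.group(1)
        if name not in context:
            raise KeyError(f"no value for policy variable {name}")
        return context[name]
    return re.sub(r"\$\{([^}]+)\}", replace, text)

resource = "arn:aws:s3:::example-bucket/home/${aws:username}/*"
expanded = expand_policy_variables(resource, {"aws:username": "jsmith"})
print(expanded)
```

One policy attached to a group therefore gives every user access to their own prefix only.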

Policy Conditions
  • Optional element
  • Operators
  • Keys 
    • Global
      • aws:CurrentTime
      • aws:EpochTime
      • aws:TokenIssueTime
    • Service specific
      • S3 (examples)
        • s3:x-amz-acl
          • on PUT must specify canned permissions
        • s3:x-amz-server-side-encryption
          • on PUT must specify the header (i.e. encrypt)
        • s3:x-amz-storage-class
          • on PUT enforce storage class
      • EC2 (examples)
        • ec2:Region
        • ec2:InstanceType

Policy Evaluation
  • Deny overrides Allow
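
A toy evaluator for the rule above. It assumes the usual IAM behaviour that a request matching no statement is denied (implicit deny), and it matches actions by exact name only, with no wildcards, resources or conditions. The function name and the sample statements are hypothetical.

```python
def evaluate(statements, action):
    """Each statement is a dict with "Effect" and a list of "Action" names."""
    decision = "Deny"  # nothing matched yet: implicit deny
    for statement in statements:
        if action not in statement["Action"]:
            continue
        if statement["Effect"] == "Deny":
            return "Deny"  # an explicit deny wins regardless of any allow
        decision = "Allow"
    return decision

statements = [
    {"Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"]},
    {"Effect": "Deny", "Action": ["s3:PutObject"]},
]
print(evaluate(statements, "s3:GetObject"), evaluate(statements, "s3:PutObject"))
```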

Effective Permissions (privilege escalation)
  • PutUserPolicy (I can modify my own privileges)
  • CredentialCreation (requesting temporary credentials)
  • PassRole (I can launch my own instance - pass it a high privilege role and get temporary credentials out of it)

Policy Simulator
  • Tool to test policies (existing or new)
    • Does the user have access to action on a resource
      • Explains why access is allowed or denied

Decoding Authorization Message
  • Details about authorization failure (403)
  • Can be decoded with STS (sts:DecodeAuthorizationMessage)

Misc
  • Resource-specific policy vs. Tag based policy
    • Resource - very fine grained control
    • Tag based - logical group (e.g. for a project)



Saturday, 7 May 2016

AWS S3 - Security


Encryption
  • Metadata is never encrypted
  • Server Side (SSE)
    • Possible to enforce with bucket policy (only encrypted data can be uploaded)
    • SSE-S3
      • S3 manages keys (AES-256)
    • SSE-KMS
      • More flexible than SSE-S3 but additional charges (for KMS)
      • Customer can manage or use default KMS key generated for him (aws/s3)
      • ETag is not MD5 hash anymore (as it would be security hole)
      • Headers
        • x-amz-server-side-encryption = aws:kms
        • x-amz-server-side-encryption-aws-kms-key-id
        • x-amz-server-side-encryption-context (do not use sensitive data here)
    • SSE-C
      • Customer provides the key
      • Different objects(versions) may have different key
      • Headers
        • x-amz-server-side-encryption-customer-algorithm = AES256
        • x-amz-server-side-encryption-customer-key
        • x-amz-server-side-encryption-customer-key-MD5
  • Client Side (CSE)
    • Can be used to store sensitive configuration
    • Integrates with KMS
    • Encryption is opaque to S3
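
For SSE-C, the three request headers can be derived from the raw key as sketched below. This assumes the usual encoding: the key is sent base64-encoded and the MD5 header carries the base64-encoded MD5 digest of the raw key. The function name sse_c_headers is hypothetical and no request is sent.

```python
import base64
import hashlib
import os

def sse_c_headers(key):
    """Build the SSE-C request headers for a raw 256-bit key."""
    if len(key) != 32:
        raise ValueError("AES256 needs a 256-bit (32-byte) key")
    key_md5 = hashlib.md5(key).digest()
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode("ascii"),
    }

# Random key for illustration; the same key must be supplied again to read the object.
headers = sse_c_headers(os.urandom(32))
```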

Permissions
  • Places where you setup access permissions
    • Bucket Policy
      • Max 20 kB
    • ACL
      • Bucket ACL
      • Object ACL
    • User IAM Policy
  • Authorities
    • Parent Account Owner
    • Bucket Account Owner
    • Object Account Owner
  • User Context
    • Only when IAM user
  • Bucket Context
  • Object Context
    • Bucket Account Owner can deny access

ACL
  • Bucket and object level
  • Default ACL: grants owner full permissions
  • Max 100 grants per ACL
  • Grantee
    • AWS account
      • can be identified by email address
      • Cannot grant permissions to IAM users
    • Predefined AWS Group
      • Authenticated Users (all AWS accounts) - must have Authentication header
      • All Users (includes Anonymous)
      • Log Delivery Group (WRITE permission enables storing S3 logs)
  • Permissions
    • READ
      • Bucket
        • ListBucket, ListBucketVersions, ListBucketMultiPartUploads
      • Object
        • GetObject, GetObjectVersion, GetObjectTorrent
    • WRITE
      • Bucket
        • PutObject, DeleteObject, DeleteObjectVersion (only when grantee is owner)
    • READ_ACP (read bucket/object ACL)
      • Bucket
        • GetBucketACL
      • Object
        • GetObjectACL, GetObjectACLVersion
    • WRITE_ACP (change bucket/object ACL)
      • Bucket
        • PutBucketACL
      • Object
        • PutObjectACL
  • Canned ACL (predefined grants)
    • private
    • public-read
    • public-read-write
    • aws-exec-read
    • authenticated-read
    • bucket-owner-read
    • bucket-owner-full-control
    • log-delivery-write
  • Use cases
    • Generally prefer Bucket Policy and IAM policy (ACL is legacy mechanism)
    • LogDeliveryGroup must use ACL
    • Bucket Policy limit reached (20kb)
    • Wide variety of permissions on objects (cannot be captured by policy easily)

Pre-signed urls
  • Example
    • https://s3.amazonaws.com/examplebucket/test.txt
      ?X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request
      &X-Amz-Date=20130721T201207Z
      &X-Amz-Expires=86400
      &X-Amz-SignedHeaders=host
      &X-Amz-Signature=<signature-value>  
  • Uploading encrypted object
    • SSE-KMS
    • SSE-S3
    • SSE-C (customer specified key)
      • restricts that upload to specific encryption key
  • Use cases
    • Temporary access to a file (max 7 days)
    • Upload to a bucket without having any AWS credentials
    • Communication mechanism in CloudFormation
      • Signaling
        • CreationPolicy - signaling
      • WaitCondition/WaitHandle
  • Generating
    • Anyone with valid security credentials can create pre-signed url
      • It will only work if my permissions actually allow the upload (otherwise there would be privilege escalation)
    • Java SDK supports creation
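
The expiry of a pre-signed URL follows from two of its query parameters: X-Amz-Date plus X-Amz-Expires seconds. The sketch below reads them from the example URL above (with placeholder credential and signature values) using only the standard library; the function name is hypothetical and the signature is not checked.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 pre-signed URL stops working."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))

url = (
    "https://s3.amazonaws.com/examplebucket/test.txt"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256"
    "&X-Amz-Credential=EXAMPLEKEYID/20130721/us-east-1/s3/aws4_request"
    "&X-Amz-Date=20130721T201207Z"
    "&X-Amz-Expires=86400"
    "&X-Amz-SignedHeaders=host"
    "&X-Amz-Signature=examplesignature"
)
expires_at = presigned_url_expiry(url)
print(expires_at.isoformat())
```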

CORS

  • Cross-origin access to mitigate JavaScript SOP restrictions
    • Preflight (OPTIONS) request to determine access rights
  • Configured on bucket
  • CORSRule
    • Allowed Origin (i.e. requestor domain)
    • Allowed Methods (GET, PUT, POST, ...)
    • Allowed Headers (in the preflight request which headers requestor may ask for)
    • Expose Headers (which headers can be read on the client side)
    • MaxAgeInSeconds - how long preflight response can be cached
  • Use Cases
    • Auto-complete
    • Drag'n'Drop upload to S3
    • Upload progress
    • Update content directly from JS
    • Serving Web Fonts

AWS S3

Model
  • Resources
    • Bucket - subresources
      • website 
      • versioning 
      • bucket policy
      • ACL
      • CORS
      • logging
      • event notifications
    • Object - subresources
      • ACL
      • restore (when using Glacier restore)

Limits
  • 100 buckets per account (soft limit)

Bucket Addressing
  • virtual hosting style
  • path style
    • https://s3-eu-west-1.amazonaws.com/BUCKET/FILE 
      • must specify correct region    
      • s3.amazonaws.com refers to us-east-1
      • when wrong region specified you get "301 Moved Permanently"
  • name globally unique
    • must be DNS compliant (except for us-east)
    • may contain "."
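
The two addressing styles differ only in where the bucket name goes. A minimal sketch, assuming the dash-style regional endpoint used in these notes (s3-REGION.amazonaws.com) and s3.amazonaws.com for us-east-1; the function names and the bucket/key values are hypothetical.

```python
def s3_host(region):
    """Regional endpoint in the form used in these notes."""
    return "s3.amazonaws.com" if region == "us-east-1" else f"s3-{region}.amazonaws.com"

def path_style_url(bucket, key, region):
    # Bucket is the first path segment.
    return f"https://{s3_host(region)}/{bucket}/{key}"

def virtual_hosted_url(bucket, key, region):
    # Bucket becomes part of the host name, hence the DNS-compliance rule.
    return f"https://{bucket}.{s3_host(region)}/{key}"

print(path_style_url("example-bucket", "photos/cat.jpg", "eu-west-1"))
print(virtual_hosted_url("example-bucket", "photos/cat.jpg", "eu-west-1"))
```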

Request Redirects
  • DNS used to route to S3 nodes - temporary errors may occur
  • Must resend request to different endpoint
    • Do not reuse temporary redirect as it may fail in future
  • Typically happen when bucket just created/deleted
  • Permanent redirect - addressed bucket incorrectly (see bucket addressing)
  • Use "Expect: 100-continue" header for PUT requests
    • Server decides based on headers if it can accept the requests
    • Avoid unnecessary work if the request is to be redirected anyway

Requester Pays
  • Owner pays for storage, requester pays for data transfer and requests
  • Requester includes x-amz-request-payer header
  • Requester must be authenticated
  • Not allowed for anonymous access
    • In particular not allowed for static website hosting

DevPay
  • "Tenant pays"
  • Charging for S3 based products
    • Once a month Amazon bills your customers
    • Deducts fixed transaction fee and gives you the rest
    • Amazon charges for my S3 costs + Dev Pay percentage fee
    • If customers do not pay - their access is cut-off
  • Customer data is isolated (cannot be accessed directly via S3 API)
  • DevPay Tokens
    • Product token - identifies the app
    • User token - identifies the user to be charged

Storage Classes
  • STANDARD - default (S)
    • $0.03 / GB
    • $0.004 per 10,000 GET requests
    • $0.005 per 1,000 PUT requests
  • STANDARD_IA - infrequently accessed (IA)
    • Suitable for long lived infrequently accessed data 
    • Requests more expensive than (S)
    • Minimum object size: 128kB
    • Minimum 30 days storage charge
    • $0.0125 / GB
    • $0.01 per 10,000 GET requests
    • $0.01 per 1,000 PUT requests
  • GLACIER (G)
    • Retrieval takes many hours (async job)
    • Cannot be used for initial upload
      • can be used as lifecycle target
      • alternatively upload directly via Glacier API (but S3 does not see it then)
    • Object must be restored before access
    • Object visible in S3
    • $0.007 / GB
  • REDUCED_REDUNDANCY (RR)
    • Sustains loss of data in a single facility (STANDARD sustains concurrent loss in 2 facilities)
    • 99.99% durability (1/10000 objects can be lost per year)
    • $0.0240 / GB
    • $0.004 per 10,000 GET requests
    • When object is lost AWS returns 405 (Method Not Allowed)
      • 410 (Gone) would be inappropriate as owner may decide to re-upload
  • Storage class can be changed
    • PUT Object - Copy (request)
      • Destination is the same as source (x-amz-copy-source)
      • Indicate it is a copy by using directive: x-amz-metadata-directive: COPY
      • Set x-amz-storage-class: [STANDARD, REDUCED_REDUNDANCY]

Restoring from Glacier
  • Job takes 3-5h
  • Specify number of days to keep the restored file
    • Possible to modify later (until expires)
  • Restored objects charged for both S3 and Glacier

Versioning
  • Multiple versions of file 
  • Enabled on bucket level (versioning-enabled)
  • Cannot be disabled
  • Can be suspended (versioning-suspended)
    • stop accruing objects
    • delete can only remove object with (null) versionId otherwise delete marker is inserted
  • Each object gets unique versionId (string of up to 1024 bytes)
  • Listing
    • versions treated as separate objects
  • Deleting
    • S3 inserts DELETE MARKER
    • You can specify versionId to retrieve it
    • To permanently delete specify versionId
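
The delete-marker behaviour can be modelled with a small in-memory class. This is a toy model, not S3: version IDs here are simple counters ("v1", "v2", ...) whereas real ones are opaque strings, and the class and method names are hypothetical.

```python
import itertools

class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body); body None = delete marker
        self._ids = itertools.count(1)

    def _push(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key, body):
        return self._push(key, body)

    def delete(self, key, version_id=None):
        if version_id is None:
            return self._push(key, None)  # plain delete only inserts a delete marker
        # Delete with versionId removes that version permanently.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is not None:
            matches = [body for vid, body in versions if vid == version_id]
            body = matches[0] if matches else None
        else:
            body = versions[-1][1] if versions else None
        if body is None:
            raise KeyError(key)
        return body

bucket = VersionedBucket()
first = bucket.put("a.txt", "one")
marker = bucket.delete("a.txt")               # object now looks deleted
old_body = bucket.get("a.txt", version_id=first)  # older version still retrievable
```

Removing the delete marker by its own version ID makes the previous version current again, which the model reproduces.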

Cross-region replication
  • Enabled on bucket level
    • Subset of objects can be replicated
  • Asynchronous copy of all S3 objects from (S)ource bucket to (D)estination bucket
    • Can override storage class
    • Takes up to several hours
  • Use cases
    • Compliance
    • Minimize latency
    • Operations (e.g. computing on the same set of resources)
    • DR
  • Requirements
    • Versioning must be enabled
    • (S) and (D) must be in different regions
    • Permissions need to be setup
  • Not replicable
    • Retroactive objects (i.e. created before configuration enabled)
    • Objects encrypted with SSE-C or SSE-KMS
    • Bucket subresources
      • e.g. Lifecycle configuration
    • Non-customer actions (e.g. lifecycle inserts delete marker)
    • Replicas from other buckets  (i.e. not transitive)
  • Status
    • GET Includes "x-amz-replication-status" header in response
      • S: PENDING, COMPLETED, FAILED
      • D: REPLICA

Server Access Logging
  • Require Log Delivery ACL on a bucket
  • Best effort

Bit Torrent
  • Speeds-up large and popular object files
  • .torrent file - bootstrap information for the file
    • Use s3-path?torrent
  • Need BitTorrent client
  • Every anonymous object available for download
  • S3 acts as a "seeder"

Performance
  • No special action needed: < 100/s {PUT, LIST, DELETE } && < 300/s GET
  • Rapid increase: ask S3 Team to pre-warm (submit support case)
  • Key name dictates the S3 partition
    • Unlike DynamoDb it does not shard on key hash
      • Hence "List" is available
    • Objects are stored lexicographically across partitions
    • Avoid sequential keys (timestamps, ids)
      • They start with the same prefix and land on the same partition
      • Shard keys (MD5 - 4 digit hash)
        • Listing becomes very expensive (scan)
      • Group objects by key names (animations, videos)
      • Reverse the key name for better distribution of initial characters
        • e.g. userId=1234 -> 4321
  • Transfer optimizations 
    • TCP window scaling (increase initial receive windows WSCALE)
    • TCP selective acknowledgment - speed-up recovery after large packet loss
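
The two key-naming tricks above (hash prefix and reversed ID) can be sketched as follows. The 4-character MD5 prefix follows the note; the "/" separator, the function names and the sample keys are assumptions made for illustration.

```python
import hashlib

def hashed_key(key, prefix_length=4):
    """Prepend a short MD5-derived prefix so keys spread across partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_length]
    return f"{prefix}/{key}"

def reversed_id_key(user_id, suffix):
    """Reverse a sequential ID so the leading characters vary quickly."""
    return f"{str(user_id)[::-1]}/{suffix}"

print(hashed_key("2016-08-08-12-00-00/log.gz"))
print(reversed_id_key(1234, "avatar.png"))
```

The hash prefix is deterministic, so a reader can recompute it from the original key; the cost is that listing by the original prefix is no longer possible.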

Data Consistency Model
  • Updates are atomic but on single key only
  • Latest PUT wins
  • Read-after-write consistency for NEW objects
    • Since 2015/06/19 also for US-Standard (use Northern Virginia endpoint)
  • Eventually consistent (may return stale data)
    • list newly written NEW object 
    • read-after-write for EXISTING objects (i.e. overwrite)
    • read-after-delete
    • list deleted object

Multi-part upload
  • Recommended for size > 100 MBs
  • Process
    • Split file locally
    • Initiate (S3 returns uploadId)
    • Upload each part  (1-10000)
      • Specify part number (1-10000)
        • Determines order
        • May not be contiguous
      • Can be done in parallel
    • Complete the upload by listing each part (part number + ETag)
  • Orphaned uploads may be cleaned-up with lifecycle rule (since 2016/03/19)
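
The client-side bookkeeping of the process above can be sketched without touching S3. Assumptions: the MD5 hex digest stands in for the ETag that S3 would return for each uploaded part, the part size is tiny for readability (real parts other than the last must be at least 5 MB), and the function names are hypothetical.

```python
import hashlib

def split_into_parts(data, part_size):
    """Yield (part_number, chunk) with part numbers starting at 1."""
    for offset in range(0, len(data), part_size):
        yield offset // part_size + 1, data[offset:offset + part_size]

def completion_list(parts):
    """Build the (part number, ETag) list sent when completing the upload."""
    # Assumption: MD5 hex digest stands in for the ETag returned per part.
    return [
        {"PartNumber": number, "ETag": hashlib.md5(chunk).hexdigest()}
        for number, chunk in sorted(parts, key=lambda part: part[0])
    ]

data = b"x" * 25
parts = list(split_into_parts(data, part_size=10))
manifest = completion_list(parts)
print([entry["PartNumber"] for entry in manifest])
```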

Static Website Hosting
  • Supports GET and HEAD requests (no POST)
  • Public only content 
  • On error returns HTML (not XML like S3 REST API)
  • Supports redirects (object and bucket level)
  • Supports root documents (e.g. index.html)
  • Does not support SSL
  • Option to redirect all requests to different hostname

Event Notifications
  • Bucket configuration
  • Types
    • object created (Put, Post, Copy, CompleteMultipartUpload)
    • object removed (Delete, DeleteMarkerCreated)
    • object loss detected (RRSObjectLost)
  • Filters (optional)
    • Prefix
    • Suffix
  • Target
    • SNS, SQS, Lambda

Monitoring
  • Per-bucket CW metrics
    • BucketSizeBytes
    • NumberOfObjects

Lifecycle configuration for versioned enabled buckets
  • Acts as Recycle Bin
  • Action on Current Version
    • Transition to the Standard-Infrequent Access
      • X days after the object's creation date
    • Archive to the Glacier Storage Class
      • X days after the object's creation date
    • Expire
      • Expiring current version will generate new version
  • Action on Previous Version
    • Transition to the Standard-Infrequent Access
      • X days after object becoming a previous version
    • Archive to the Glacier Storage Class
      • X days after object becoming a previous version
    • Permanently Delete
      • X days after object becoming a previous version



Thursday, 5 May 2016

AWS CloudFront

Model
  • Distribution
    • Web: DNS starts with "d" (download)
    • RTMP: DNS starts with "s" (streaming)
  • Origin - place where the authority files are stored
    • S3
    • Custom Web Server
  • Behavior
    • How CF behaves when receives request
    • Path pattern - specifies requests the behaviors applies to
    • Examples
      • Forward Headers/Cookies
      • Minimum/Default/Maximum TTL
      • Restrict Viewer Access (signed Urls only)
  • Integrated with WAF

Forwarding Requests
  • Origin does not see all request data 
  • Forwardable
    • Headers (All, Whitelisted)
    • Cookies
    • Query parameters
  • Forwarding allows caching different object version based on value
    • Increases memory footprint
  • Use cases
    • Prevent hotlinking
    • Allow CORS for everyone

TTL
  • Obey origin response headers (Cache-Control max-age, s-maxage, Expires)
    • max-age is recommended
  • Behavior can override: minimum TTL, default TTL, maximum TTL
    • e.g. when Origin does not set it properly
  • TTL-0
    • Used for Dynamic Content
    • CloudFront still caches the content
    • Makes GET If-Modified-Since every time
      • gives origin a chance to signal content hasn't changed
      • this saves bandwidth as Origin does not have to resend the page
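
How the behavior's three TTL settings interact with the origin's max-age can be approximated as a clamp. This is a simplified model and an assumption on my part: it ignores s-maxage, Expires and directives such as no-cache or private, and the function name is hypothetical.

```python
def effective_ttl(origin_max_age, minimum_ttl, default_ttl, maximum_ttl):
    """Seconds an object stays cached before the edge revalidates it."""
    if origin_max_age is None:
        return default_ttl  # origin sent no caching header
    # Origin value is honoured but kept within [minimum_ttl, maximum_ttl].
    return max(minimum_ttl, min(origin_max_age, maximum_ttl))

ONE_DAY = 86400
ONE_YEAR = 31536000
print(effective_ttl(None, 0, ONE_DAY, ONE_YEAR))
print(effective_ttl(0, 0, ONE_DAY, ONE_YEAR))
print(effective_ttl(10, 60, ONE_DAY, ONE_YEAR))
```

With a minimum TTL of 0 and max-age=0 from the origin, the result is 0, which is the TTL-0 case described above: the object is kept but revalidated on every request.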

Origin
  • S3
    • Origin Access Identity (OAI)
      • special CF user associated with customer Distribution-Origin
        • "Principal":{"CanonicalUser":"79a59d8f8d5218e7cd47ef2be"},
      • change S3 bucket policy to only allow OAI
  • Custom (customer own Web Server)
  • Multiple origins
    • First match (based on path) wins
    • Requires cache behavior for each origin

Signing
  • Restrict access with signed urls or/and signed cookies
    • Date/Time
    • IPs
    • Requires: CloudFront Key-Pair
  • Signed Url
    • Restrict access to individual files
    • Query parameters: Expires, Policy, Signature, Key-Pair-Id
  • Signed Cookies
    • Not supported for RTMP
    • No need to change urls
    • Restrict access to multiple files at once
      • e.g. HLS stream (multiple file segments)
    • User authenticates on the customer site, which then sets the Signed Cookie
  • Process
    • Create public-private key-pair
    • Upload to account (Console)
    • Indicate which AWS accounts can sign (Trusted Signer)
    • Create policy document (i.e. rules of access)
      • SHA1 of policy document signed with private key
    • Include encoded policy document + signature as query string parameters
    • CloudFront verifies policy/signature on access
  • Trusted signer account ID is added to
    • Web - behavior  (can have multiple behaviors)
    • RTMP - distribution
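
The process above can be sketched for a custom-policy signed URL. The RSA step is left out on purpose: sign is a hypothetical callable that should return the RSA-SHA1 signature of the policy made with the CloudFront private key (this needs a crypto library, which is not part of the standard library). The URL, key pair ID and IP range in the usage note are made up.

```python
import base64
import json


def cf_b64(data: bytes) -> str:
    """Base64-encode, then replace the characters that are unsafe in a query string."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def build_policy(resource, expires_epoch, source_ip=None):
    """Policy document: resource, expiry time and optional source IP range."""
    condition = {"DateLessThan": {"AWS:EpochTime": expires_epoch}}
    if source_ip:
        condition["IpAddress"] = {"AWS:SourceIp": source_ip}
    policy = {"Statement": [{"Resource": resource, "Condition": condition}]}
    # Compact separators: the policy is signed and sent without whitespace.
    return json.dumps(policy, separators=(",", ":"))


def signed_url(url, policy, key_pair_id, sign):
    """sign: hypothetical callable, policy bytes -> signature bytes."""
    signature = sign(policy.encode("utf-8"))
    separator = "&" if "?" in url else "?"
    return (
        f"{url}{separator}Policy={cf_b64(policy.encode('utf-8'))}"
        f"&Signature={cf_b64(signature)}&Key-Pair-Id={key_pair_id}"
    )
```

Usage would look like signed_url(url, build_policy(url, 1470700800, "203.0.113.0/24"), "APKAEXAMPLE", sign). With a canned policy the Expires parameter replaces Policy.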

Trusted Signer
  • AWS account with an active CloudFront Key Pair
  • Key Pair allowed for root account only (not IAM user)
  • Max 2 active key pairs at a time
  • Possible to upload your own RSA key

Geoblocking
  • Built-in: country-level (~99.8% accuracy)
  • ThirdParty
    • Use your webserver to build links 

Compression
  • Supported natively 
  • Compressed by edge locations
  • Compressible files 1,000 bytes - 10,000,000 bytes
  • ETAGs are stripped (as "compressed vs non-compressed" should have different values)
  • Enabled on Behavior
  • Custom Origin Compression
    • Still use when file type not supported by CF
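
A rough sketch of the eligibility check for edge compression, using the size limits above. The content type set is an assumed subset for illustration (the real list is longer), and treating the size bounds as inclusive is also an assumption.

```python
# Assumed subset of compressible content types, for illustration only.
COMPRESSIBLE_TYPES = {"text/html", "text/css", "application/javascript", "application/json"}


def edge_can_compress(size_bytes, content_type, accept_encoding, already_encoded=False):
    """Return True when the edge could gzip this response for this viewer."""
    if already_encoded:
        # The origin already compressed the object, nothing left to do at the edge.
        return False
    accepted = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    base_type = content_type.split(";")[0].strip().lower()
    return (
        "gzip" in accepted
        and base_type in COMPRESSIBLE_TYPES
        and 1_000 <= size_bytes <= 10_000_000
    )
```

Files outside the list or size range are the case where compressing at the custom origin is still worthwhile.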

Invalidation
  • Expensive
  • Supports wildcards
    • e.g. "/images/hi-res/*"

SSL
  • Custom SSL certificates 
    • Dedicated IP - $600/month
    • SNI - only newer browsers support it
  • Supports Redirection HTTP->HTTPS on the edge
  • Communication to Origin may use SSLv3, TLSv1, TLSv1.1, TLSv1.2
    • Match Viewer
    • Enforce HTTPS
    • Enforce HTTP

Pricing
  • Classes 
    • All (us, eu, ap-northeast, ap-southeast-1, ap-southeast-2, sa-east)
    • 200 (us, eu, ap-northeast, ap-southeast-1)
    • 100 (us, eu)
    • Viewers in locations not covered by the price class see higher latency
  • Reserved Capacity available

Act as reverse proxy
  • May sit in front of dynamic website
    • cache only certain portions based on rules
    • works like Varnish

Header manipulation

  • Custom Headers can be added/overridden
  • Use case:
    • Add X-Shared-Secret=****** to allow Origin to verify the request is from CF
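
On the origin side the check could look like the sketch below. The secret value is a placeholder, and the function name is made up; hmac.compare_digest is used so the comparison time does not leak how many characters matched.

```python
import hmac

EXPECTED_SECRET = "change-me"  # placeholder, same value as configured in CloudFront


def request_is_from_cloudfront(headers, expected=EXPECTED_SECRET):
    """Return True when the X-Shared-Secret header carries the expected value."""
    supplied = ""
    for name, value in headers.items():
        if name.lower() == "x-shared-secret":  # header names are case-insensitive
            supplied = value
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

Requests failing the check would typically be rejected with 403 before any further processing.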

Wednesday, 4 May 2016

AWS Kinesis Firehose

Simplifies usage of Kinesis Streams by delivering data directly to target (no need to write Consumer)

Model
  • Delivery Stream - main entity
    • No need to specify shards/partition keys
  • Data record - max 1000 KB
  • Destination
    • S3 bucket
      • records are concatenated into larger objects
      • compression: gzip, zip, snappy
      • needs IAM role
      • Supports SSE-KMS
    • Redshift table
      • uses intermediate S3 bucket
      • issues COPY command continuously
        • no error-tolerance
        • skipped objects are written to manifest file (S3)
      • Compression: gzip
  • At-least-once semantics - duplicates possible (like SQS)
  • Retention: 24h (if destination is not available)
    • Retries are automatic

Amazon Kinesis Agent
  • Monitors files and sends records to Kinesis Firehose
  • Handles file rotation, checkpointing
  • Similar to CloudWatch Agent (Logs)
  • Also works with Kinesis streams

Buffer

  • Size (1MB-128MB)
  • Time (60s-900s (15m))
  • Buffer may be raised if delivery falls behind
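
A toy model of the buffering rule: records accumulate and the batch is delivered as soon as either the size or the time threshold is reached. The class name and default values (5 MB, 300 s) are assumptions for illustration. Unlike the real service, this sketch only checks the clock when a record arrives.

```python
class DeliveryBuffer:
    """Collects records and flushes when the size or the time threshold is hit."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_seconds=300):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None  # time of the first record in the current batch

    def add(self, record: bytes, now: float):
        """Add a record; return the concatenated batch if it was flushed, else None."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_seconds:
            return self.flush()
        return None

    def flush(self):
        """Concatenate buffered records into one object and reset the buffer."""
        batch = b"".join(self.records)
        self.records, self.size, self.opened_at = [], 0, None
        return batch
```

The concatenation in flush mirrors how records end up as larger objects in S3.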

AWS SNS (Push)

Overview
  • Supports Mobile and Web push notifications
  • Opt-in
    • Not required by SNS (unlike other subscription types)
    • Required by OS (iOS, Android, Kindle Fire)

Mobile Push notifications
  • Appears on mobile device (notification)
  • Device must have an app installed
  • Received even if the app is not running
  • Clicking the notification typically starts the app on the mobile device
  • Cheaper than SMS and better experience
  • Each installed app must be registered with SNS
  • Examples: Pingdom alerts, HipChat notifications

Web Push notifications
  • Notification displayed by the browser
  • Supported by most major browsers

Push Notification Service (PNS)
  • Platforms
    • Apple Push Notification Service (APNS)
    • Google Cloud Messaging for Android (GCM)
    • Amazon Device Messaging (ADM)
    • Windows Push Notification Service (WNS) for Windows 8+ and Windows Phone 8.1+
    • Microsoft Push Notification Service (MPNS) for Windows Phone 7+
    • Baidu Cloud Push for Android in China
  • PNS maintains connection with app@device
  • AWS object ("Application") must be created for each platform
    • CreatePlatformApplication
      • PlatformPrincipal
      • PlatformCredential

Device Token 
  • Created by PNS 
  • Alternative names
    • Registration Id
  • App receives it when it registers itself with PNS
  • App must send it to the Publisher system (i.e. "our server")
    • POST to Proxy server
    • Cognito - registers directly with SNS
  • Publisher registers app@device with SNS
    • CreatePlatformEndpoint
  • can be subscribed to any number of topics
  • can be target of direct publish (aka "direct addressing")
  • it is like "phone number" in SMS
  • Registering existing device token with SNS
    • AWS Console (single)
    • CreatePlatformEndpoint (several)
    • AWS Console CSV (bulk)
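
A sketch of the publisher side: register a device token as an endpoint, then send one directly addressed push. It assumes sns is a boto3 SNS client (boto3.client("sns")) and an Android (GCM) platform application; the function name and payload layout are illustrative.

```python
import json


def register_and_notify(sns, platform_application_arn, device_token, text):
    """Register a device token with SNS and publish one message to it."""
    endpoint = sns.create_platform_endpoint(
        PlatformApplicationArn=platform_application_arn,
        Token=device_token,
    )
    endpoint_arn = endpoint["EndpointArn"]
    # Per-platform payload; the "GCM" key assumes an Android application.
    payload = {
        "default": text,
        "GCM": json.dumps({"data": {"message": text}}),
    }
    sns.publish(
        TargetArn=endpoint_arn,
        Message=json.dumps(payload),
        MessageStructure="json",
    )
    return endpoint_arn
```

The returned endpoint ARN is what the publisher would store next to the user record, and what can later be subscribed to topics.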

PNS Device Token Feedback
  • Similar to email "hard-bounce" 
  • SNS automatically handles it
    • Disables the endpoint and notifies about the event
    • GCM returns new token: SNS updates existing

TTL
  • useful for time-sensitive messages
  • how much time in seconds PNS has to deliver the message 
    • e.g. drop if device is turned-off
    • relative to Publish time
  • dwell time
    • time between publish and hand-off to PNS
  • Default: 4 weeks
  • 0 value - message is dropped, unless 0 has a special meaning for the PNS
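
The relation between TTL and dwell time can be sketched as simple arithmetic: the time already spent between publish and hand-off is subtracted from the TTL, and a message with nothing left is dropped. Function and constant names are made up, and the PNS-specific meaning of 0 is not modelled.

```python
DEFAULT_TTL_SECONDS = 4 * 7 * 24 * 3600  # 4 weeks


def remaining_ttl(ttl_seconds, published_at, handed_off_at):
    """Return the TTL left for the PNS in seconds, or None if the message is dropped."""
    dwell = handed_off_at - published_at  # time spent before hand-off to the PNS
    remaining = ttl_seconds - dwell
    return remaining if remaining > 0 else None
```

For example, a 60 s TTL with 10 s of dwell time leaves 50 s for the PNS to deliver the message.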

Delivery Status
  • Collect feedback on successful and unsuccessful delivery attempts
  • Specify separate IAM roles for success/failure
  • Specify sampling rate (0-100)
  • AWS writes to CloudWatch Logs
    • Create Metrics/Filters

References
  • https://caniuse.com/#feat=push-api