Tuesday, 3 April 2018

AWS CodePipeline

Overview
  • CI/CD service

Definitions
  • Continuous Integration
    • Integrate work frequently
    • Automated building and testing
  • Continuous Delivery
    • Automated release process
    • Release frequently (in the extreme, on every commit)
    • Typically manual sign-off required before pushing to Production
  • Continuous Deployment
    • Like Continuous Delivery but no sign-off for Production
    • Requires very reliable automated testing

Model
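  • Pipeline
    • Workflow describing how changes move through the release process
  • Revision
    • Change to the source that flows through the pipeline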
  • Stage Transition
    • On completion the pipeline transitions to the next stage
    • Transition can be disabled (this is how you disable a stage)
  • Artifact - object
    • Source, jar, result of a build, etc.

Stage
  • Pipeline contains stages (at least 2)
  • Unique name in the workflow
  • Processes only one revision at a time
    • Revisions processed by previous stage are batched
  • Contains sequence of actions (1+)
    • All must complete OK for the Stage to be completed
  • Examples
    • deployments
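The structural rules above (at least two stages, unique stage names, one or more actions per stage) can be checked with a small sketch. The dictionary layout and the name `validate_pipeline` are hypothetical, chosen for illustration only; this is not the CodePipeline API schema.

```python
def validate_pipeline(pipeline):
    """Return a list of violated rules; an empty list means the layout is valid."""
    errors = []
    stages = pipeline.get("stages", [])
    if len(stages) < 2:
        errors.append("pipeline needs at least 2 stages")
    names = [stage["name"] for stage in stages]
    if len(names) != len(set(names)):
        errors.append("stage names must be unique")
    for stage in stages:
        if not stage.get("actions"):
            errors.append(f"stage {stage['name']} needs at least 1 action")
    return errors


# Hypothetical two-stage pipeline used only to exercise the checks.
example = {
    "stages": [
        {"name": "Source", "actions": ["checkout"]},
        {"name": "Build", "actions": ["compile", "unit-test"]},
    ]
}
print(validate_pipeline(example))  # prints []
```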

Action
  • Element of a Stage
  • May run sequentially or in parallel
  • May have input artifact
  • May have output artifact (unique name)
    • Must match input artifact of the next action
  • Examples
    • Source Action (must be the first one)
    • Build Action
    • Deploy Action

Custom Action
  • Type
    • Build
    • Deploy
    • Test
    • Invoke function
  • Similar to SWF
  • Pipeline

Providers
  • Source Provider
  • code, e.g.
  • Deployment Provider
    • CodeDeploy
    • EB
  • Build And Test Providers
    • CodeBuild, Jenkins, CloudBees (see Integrations)

Integrations
  • AWS
    • S3
    • CloudTrail
    • Code*
    • ElasticBeanstalk
    • KMS
  • Third Party
    • GitHub
    • Jenkins (plugin)
    • Many others

References
  • http://www.stelligent.com/build/aws-codepipeline-released-and-there-was-much-rejoicing/

Friday, 23 March 2018

AWS SNS (SMS)


Overview
  • Worldwide SMS delivery
  • Sending modes
    • Direct
    • Bulk (topic subscribers)
  • DisplayName (first 10 characters) used in text message
  • Phone number in E.164 format (e.g. +48XXXYYYZZ)
  • Character limit
    • 160 GSM characters
    • 140 ASCII
    • 70 for Unicode (UCS-2)
  • Type
    • Promotional
    • Transactional
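A minimal sketch of the two checks implied above: that a phone number looks like E.164, and that a text fits in a single message for a given encoding. The names `is_e164` and `fits_single_message` are hypothetical, and the length check is a simplification that counts Python code points rather than encoded units.

```python
import re

# E.164: a leading "+" and at most 15 digits, the first one non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")

# Single-message character limits as listed in the notes above.
SMS_LIMITS = {"gsm": 160, "ascii": 140, "ucs2": 70}


def is_e164(number):
    """True when the whole string matches the E.164 shape."""
    return E164.fullmatch(number) is not None


def fits_single_message(text, encoding):
    """True when the text is within the single-message limit for the encoding."""
    return len(text) <= SMS_LIMITS[encoding]


print(is_e164("+48123456789"))                # prints True
print(is_e164("0048123456789"))               # prints False
print(fits_single_message("x" * 71, "ucs2"))  # prints False
```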

Opt-out
  • Certain countries require it (e.g. US, Canada)
  • Send STOP or QUIT to unsubscribe

Short code
  • 5-6 digit
  • By default AWS assigns a shared code
  • Dedicated code
    • Assigned exclusively to sender
    • Supports higher throughput
  • Recipient may reply to the message (e.g. opt-out)

Sender Id
  • Up to 11 alphanumeric characters
    • e.g. brand name
  • Only certain countries supported
    • e.g. US is not


Monitoring
  • Sent
  • Failed
  • Delivery rate (share of attempted messages that were delivered)

AWS S3 (Security)


Encryption
  • Metadata is never encrypted
  • Server Side (SSE)
    • Possible to enforce with bucket policy (e.g. only encrypted data can be uploaded)
    • SSE-S3
      • S3 manages keys (AES-256)
    • SSE-KMS
      • More flexible than SSE-S3 but additional charges (for KMS)
      • Customer can manage their own key or use the default KMS key generated for the account (aws/s3)
      • ETag is not MD5 hash anymore (as it would be a security hole)
      • Headers
        • x-amz-server-side-encryption = aws:kms
        • x-amz-server-side-encryption-aws-kms-key-id
        • x-amz-server-side-encryption-context (do not use sensitive data here)
    • SSE-C
      • Customer provides the key
      • Different objects (versions) may have different keys
      • Headers
        • x-amz-server-side-encryption-customer-algorithm = AES256
        • x-amz-server-side-encryption-customer-key
        • x-amz-server-side-encryption-customer-key-MD5
    • Default Encryption
      • Feature to have S3 automatically encrypt the object (SSE-S3, SSE-KMS)
  • Client Side (CSE)
    • Encryption is opaque to S3 (just a blob)
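For SSE-C, the three headers listed above are derived from the raw key: the key itself is sent base64-encoded, together with the base64-encoded MD5 digest of the key so S3 can check it arrived intact. A minimal sketch, where `sse_c_headers` is a hypothetical helper name and the key is random bytes generated for illustration:

```python
import base64
import hashlib
import os


def sse_c_headers(key):
    """Build the SSE-C request headers for a raw 256-bit key."""
    if len(key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte key")
    key_md5 = hashlib.md5(key).digest()
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode("ascii"),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(key_md5).decode("ascii"),
    }


# Random key for illustration; the same key must be supplied again to read the object.
headers = sse_c_headers(os.urandom(32))
for name in headers:
    print(name)
```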

Permissions
  • Places where you set up access permissions
    • Bucket Policy
      • Limited document size
    • ACL
      • Bucket ACL
      • Object ACL
    • User IAM Policy
  • Authorities
    • Parent Account Owner
    • Bucket Account Owner
    • Object Account Owner
  • User Context
    • Only when IAM user
  • Bucket Context
  • Object Context
    • Bucket Account Owner can deny access

ACL
  • Bucket and object level
  • Default ACL: grants owner full permissions
  • Max 100 grants per ACL
  • Grantee
    • AWS account
      • can be identified by email address
      • Cannot grant permissions to IAM users
    • Predefined AWS Group
      • Authenticated Users (any AWS account) - request must be signed (Authorization header)
      • All Users (includes Anonymous)
      • Log Delivery Group (WRITE permission enables storing S3 logs)
  • Permissions
    • READ
      • Bucket
        • ListBucket, ListBucketVersions, ListBucketMultipartUploads
      • Object
        • GetObject, GetObjectVersion, GetObjectTorrent
    • WRITE
      • Bucket
        • PutObject, DeleteObject, DeleteObjectVersion (only when grantee is owner)
    • READ_ACP (read bucket/object ACL)
      • Bucket
        • GetBucketACL
      • Object
        • GetObjectAcl, GetObjectVersionAcl
    • WRITE_ACP (change bucket/object ACL)
      • Bucket
        • PutBucketACL
      • Object
        • PutObjectACL
  • Canned ACL (predefined grants)
    • private
    • public-read
    • public-read-write
    • aws-exec-read
    • authenticated-read
    • bucket-owner-read
    • bucket-owner-full-control
    • log-delivery-write
  • Use cases
    • Generally prefer Bucket Policy and IAM policy (ACL is legacy mechanism)
    • LogDeliveryGroup must use ACL
    • Bucket Policy document limit reached
    • Wide variety of permissions on objects (cannot be captured by policy easily)
    • Used in conjunction with Requester Pays

Pre-signed urls
  • Example
    • https://s3.amazonaws.com/examplebucket/test.txt
      ?X-Amz-Algorithm=AWS4-HMAC-SHA256
      &X-Amz-Credential=<your-access-key-id>/20130721/us-east-1/s3/aws4_request
      &X-Amz-Date=20130721T201207Z
      &X-Amz-Expires=86400
      &X-Amz-SignedHeaders=host
      &X-Amz-Signature=<signature-value>  
  • Uploading encrypted object
    • SSE-KMS
    • SSE-S3
    • SSE-C (customer specified key)
      • restricts that upload to specific encryption key
  • Use cases
    • Restricted download
      • e.g. temporary access to a file (max 7 days)
    • Restricted upload
      • e.g. having any AWS credentials
    • Communication mechanism in CloudFormation
      • Signaling
        • CreationPolicy - signaling
      • WaitCondition/WaitHandle
  • Generating
    • Anyone with valid security credentials can create pre-signed url
      • It only works if the signer's permissions actually allow the operation (otherwise there would be privilege escalation)
    • Java SDK supports creation
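The query parameters in the example URL come from Signature Version 4 query-string signing. The sketch below builds such a URL for a GET using only the standard library, assuming path-style addressing and `host` as the only signed header. The function name `presign_get`, its parameter order, and the credentials are placeholders for illustration; in practice an SDK would normally do this.

```python
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key, msg):
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get(access_key, secret_key, host, bucket, key, region, amz_date, expires):
    """Return a SigV4 query-string-signed GET URL (path-style addressing)."""
    if not 1 <= expires <= 604800:  # 7 days, the maximum noted above
        raise ValueError("expires must be between 1 and 604800 seconds")
    datestamp = amz_date[:8]
    scope = f"{datestamp}/{region}/s3/aws4_request"
    canonical_uri = "/" + quote(f"{bucket}/{key}", safe="/-_.~")
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    # Canonical query string: parameters sorted by name, strictly URL-encoded.
    canonical_query = "&".join(
        f"{quote(k, safe='-_.~')}={quote(v, safe='-_.~')}"
        for k, v in sorted(params.items())
    )
    canonical_request = "\n".join(
        ["GET", canonical_uri, canonical_query, f"host:{host}", "", "host", "UNSIGNED-PAYLOAD"]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Signing key: HMAC chain over date, region, service and a fixed terminator.
    signing_key = _hmac(("AWS4" + secret_key).encode("utf-8"), datestamp)
    for part in (region, "s3", "aws4_request"):
        signing_key = _hmac(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"https://{host}{canonical_uri}?{canonical_query}&X-Amz-Signature={signature}"


# Placeholder credentials, not real ones.
url = presign_get(
    "AKIAIOSFODNN7EXAMPLE", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    "s3.amazonaws.com", "examplebucket", "test.txt",
    "us-east-1", "20130721T201207Z", 86400,
)
print(url)
```

The signature only proves who signed the request; S3 still evaluates the signer's permissions when the URL is used.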

CORS
  • Cross-origin access to mitigate JavaScript SOP restrictions
    • Preflight (OPTIONS) request to determine access rights
  • Configured on bucket
  • CORSRule
    • Allowed Origin (i.e. requestor domain)
    • Allowed Methods (GET, PUT, POST, ...)
    • Allowed Headers (in the preflight request which headers requestor may ask for)
    • Expose Headers (which headers can be read on the client side)
    • MaxAgeSeconds - how long preflight response can be cached
  • Use Cases
    • Auto-complete
    • Drag'n'Drop upload to S3
    • Upload progress
    • Update content directly from JS
    • Serving Web Fonts
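A CORS rule can be written down as a plain dictionary. The key names below follow the JSON shape of the S3 CORS configuration (the max-age field is spelled `MaxAgeSeconds` there). The origin and the helper `preflight_allowed` are hypothetical, and the check is deliberately simplified: it ignores headers and the wildcard patterns S3 allows inside origins.

```python
cors_rule = {
    "AllowedOrigins": ["https://www.example.com"],  # hypothetical requestor domain
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["*"],
    "ExposeHeaders": ["ETag"],
    "MaxAgeSeconds": 3000,
}


def preflight_allowed(rule, origin, method):
    """Simplified preflight decision: exact origin (or "*") and listed method."""
    origin_ok = "*" in rule["AllowedOrigins"] or origin in rule["AllowedOrigins"]
    return origin_ok and method in rule["AllowedMethods"]


print(preflight_allowed(cors_rule, "https://www.example.com", "PUT"))     # prints True
print(preflight_allowed(cors_rule, "https://www.example.com", "DELETE"))  # prints False
```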

VPC Endpoint
  • Allows direct access to S3 from VPC
  • Use case
    • Bypass public Internet
  • Policies
    • S3 bucket policy - who can access me (aws:SourceVpc and aws:SourceVpce)
    • Endpoint policy - whom can I access (e.g. my own buckets only)
  • No need to change DNS name
    • Internally requests are routed differently
  • See also: VPC (Endpoint)
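As an illustration of the bucket-policy side, the sketch below builds a policy that denies every request not arriving through one specific VPC endpoint, using the `aws:SourceVpce` condition key. The bucket name, the endpoint id and the helper name `vpce_only_policy` are hypothetical. A policy like this also blocks console access to the bucket, so treat it as a starting point rather than a recommendation.

```python
import json


def vpce_only_policy(bucket, vpce_id):
    """Bucket policy document denying access unless the request uses the given endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnlessFromEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Hypothetical bucket name and endpoint id.
policy = vpce_only_policy("examplebucket", "vpce-1a2b3c4d")
print(json.dumps(policy, indent=2))
```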

Macie
  • AWS managed service to scan/categorize data in S3
  • See also: Macie

AWS S3

Model
  • Resources
    • Bucket - subresources
      • website 
      • versioning 
      • bucket policy
      • ACL
      • CORS
      • logging
      • event notifications
    • Object - subresources
      • ACL
      • restore (when using Glacier restore)

Limits
  • 100 buckets per account (soft limit)

Bucket Addressing
  • virtual hosting style
  • path style
    • https://s3-eu-west-1.amazonaws.com/BUCKET/FILE 
      • must specify correct region    
      • s3.amazonaws.com refers to us-east-1
      • when wrong region specified you get "301 Moved Permanently"
  • name globally unique
    • must be DNS compliant (except for us-east-1)
    • may contain "."
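The two addressing styles differ only in where the bucket name goes. A small sketch, with hypothetical helper names, that follows the path-style pattern shown above and the `s3.amazonaws.com` host for us-east-1:

```python
def virtual_hosted_url(bucket, key):
    """Virtual-hosted style: the bucket is part of the host name."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


def path_style_url(bucket, key, region):
    """Path style: the bucket is the first path segment and the host names the region."""
    host = "s3.amazonaws.com" if region == "us-east-1" else f"s3-{region}.amazonaws.com"
    return f"https://{host}/{bucket}/{key}"


print(virtual_hosted_url("BUCKET", "FILE"))           # prints https://BUCKET.s3.amazonaws.com/FILE
print(path_style_url("BUCKET", "FILE", "eu-west-1"))  # prints https://s3-eu-west-1.amazonaws.com/BUCKET/FILE
```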

Request Redirects
  • DNS is used to route to S3 nodes - temporary errors may occur
  • Must resend request to different endpoint
    • Do not reuse temporary redirect as it may fail in future
  • Typically happen when bucket just created/deleted (eventual consistency)
  • Permanent redirect - addressed bucket incorrectly (see bucket addressing)
  • Use 100-continue (Expect: 100-continue header) for PUT requests
    • Server decides based on headers if it can accept the requests
    • Avoid unnecessary work if the request is to be redirected anyway

Requester Pays
  • Owner pays for storage, requester pays for data transfer and requests
  • Requester includes x-amz-request-payer header
  • Requester must be authenticated by AWS
    • Not allowed for anonymous access
      • In particular not allowed for static website hosting

DevPay
  • "Tenant pays"
  • Charging for S3 based products
    • Once a month Amazon bills your customers
    • Deducts fixed transaction fee and gives you the rest
    • Amazon charges for my S3 costs + Dev Pay percentage fee
    • If customers do not pay - their access is cut-off
  • Customer data is isolated (cannot be accessed directly via S3 API)
  • DevPay Tokens
    • Product token - identifies the app
    • User token - identifies the user to be charged

Storage Classes
  • STANDARD - default (S)
    • Designed for hot/temporary data
  • STANDARD_IA - infrequently accessed (IA)
    • Suitable for long lived infrequently accessed data 
    • Storage cheaper than (S)
    • Requests more expensive than (S)
    • Minimum object size: 128kB
    • Minimum 30 days storage charge (not suitable for temporary objects)
  • GLACIER (G)
    • Retrieval takes many hours (async job)
    • Cannot be used for initial upload
      • can be used as lifecycle target
      • alternatively upload directly via Glacier API (but S3 does not see it then)
    • Object must be restored before accessed
    • Object visible in S3
  • REDUCED_REDUNDANCY (RR)
    • Sustains loss of data in a single facility (STANDARD sustains concurrent loss in two)
    • Probably being phased-out (some regions do not support it)
    • 99.99% durability (1/10000 can be lost per year)
    • When object is lost AWS returns 405 Method Not Allowed
      • 410 (Gone) would be inappropriate as owner may decide to re-upload
  • Storage class can be changed
    • PUT object copy (request)
      • Destination is the same as source (x-amz-copy-source)
      • Indicate it is a copy by using directive: x-amz-metadata-directive: COPY
      • Set x-amz-storage-class: [STANDARD, REDUCED_REDUNDANCY]
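The in-place copy described above comes down to three request headers on a PUT to the same bucket and key; the metadata directive header is named `x-amz-metadata-directive`. A minimal sketch that only builds the headers; the helper name is hypothetical and the accepted set of classes is an assumption for this example:

```python
# Classes accepted by this sketch (an assumption of the example, not an exhaustive list).
ALLOWED_CLASSES = {"STANDARD", "STANDARD_IA", "REDUCED_REDUNDANCY"}


def storage_class_change_headers(bucket, key, storage_class):
    """Headers for a PUT Object - Copy onto itself that changes the storage class."""
    if storage_class not in ALLOWED_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "x-amz-copy-source": f"/{bucket}/{key}",
        "x-amz-metadata-directive": "COPY",
        "x-amz-storage-class": storage_class,
    }


print(storage_class_change_headers("examplebucket", "test.txt", "STANDARD_IA"))
```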

Restoring from Glacier
  • Specify number of days to keep the restored file
    • Possible to modify later (until expires)
    • Cheapest storage class used (e.g. RR)
  • Restored objects charged for both S3 and Glacier

Versioning
  • Multiple versions of file 
  • Enabled on bucket level (versioning-enabled)
    • Cannot be disabled
  • Can be suspended (versioning-suspended)
    • stops accruing new object versions
    • delete can only remove object with (null) versionId otherwise delete marker is inserted
  • Each object version gets a unique versionId (string of up to 1024 bytes)
  • Listing
    • versions treated as separate objects
  • Deleting
    • S3 inserts DELETE MARKER
    • You can specify versionId to retrieve it
    • To permanently delete specify versionId
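The delete semantics are easier to see in a toy model. The class below is an in-memory illustration of a versioning-enabled bucket, not the S3 API: the class name, the method names and the `v1`, `v2` style ids are all hypothetical.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket."""

    DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key, version_id=None):
        if version_id is None:
            # Plain delete: nothing is removed, a delete marker becomes current.
            return self.put(key, self.DELETE_MARKER)
        # Delete with a versionId: that version is removed permanently.
        self._versions[key] = [v for v in self._versions.get(key, []) if v[0] != version_id]
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            candidates = versions[-1:]
        else:
            candidates = [v for v in versions if v[0] == version_id]
        if not candidates or candidates[0][1] is self.DELETE_MARKER:
            raise KeyError(key)
        return candidates[0][1]


bucket = VersionedBucket()
first = bucket.put("a.txt", "one")
bucket.delete("a.txt")              # inserts a delete marker
print(bucket.get("a.txt", first))   # prints one
```

After the plain delete, `bucket.get("a.txt")` raises `KeyError`, while the older version stays reachable by its id until it is deleted with that id.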

Cross-region replication (CRR)
  • Enabled on bucket level
    • Subset of objects can be replicated (prefix)
  • Asynchronous copy of all S3 objects from (S)ource bucket to (D)estination bucket
    • Can override storage class
    • Takes up to several hours
  • Ownership
    • By default ACL is copied
      • (S) account owns the (D) object
      • Can be overridden 
  • Storage Class
    • Can be specified
      • DR may want STANDARD_IA
  • Bi-directional replication
    • Master-master
  • Use cases
    • Compliance
    • Minimize latency
    • Operations (e.g. computing on the same set of resources)
    • DR
  • Requirements
    • Versioning must be enabled
    • (S) and (D) must be in different regions
    • Permissions need to be set up
  • Not replicable
    • Retroactive objects (i.e. created before configuration enabled)
    • Objects encrypted with SSE-C
      • SSE-KMS is replicable
        • KMS key is regional so you specify what to use in (D)
    • Bucket subresources
      • e.g. Lifecycle configuration
        • (S) and (D) may have different
    • Non-customer actions (e.g. lifecycle inserts delete marker)
    • Replicas from other buckets  (i.e. not transitive)
  • Status
    • GET Includes "x-amz-replication-status" header in response
      • S: PENDING, COMPLETED, FAILED
      • D: REPLICA

Server Access Logging
  • Require Log Delivery ACL on a bucket
  • Best effort
  • Alternative
    • CloudTrail: Data Events

Bit Torrent
  • Speeds up distribution of large, popular objects
  • .torrent file - bootstrap information for the file
    • Use s3-path?torrent
  • Need BitTorrent client
  • Every anonymously readable object is available for download via BitTorrent
  • S3 acts as a "seeder"

Performance
  • No special action needed: < 100/s {PUT, LIST, DELETE } && < 300/s GET
  • Rapid increase: ask S3 Team to pre-partition (submit support case)
  • Key name dictates the S3 partition
    • Unlike DynamoDB it does not shard on key hash
      • Hence "List" is available
    • Objects are stored lexicographically across partitions
    • Recommendation for high-scale workloads
      • Avoid sequential keys (timestamps, ids)
        • They start with the same prefix and land on the same partition
        • Shard keys (MD5 - 4 digit hash)
          • Listing becomes very expensive (scan)
        • Group objects by key names (animations, videos)
        • Reverse the key name for better distribution of initial characters
          • e.g. userId=1234 -> 4321
  • Transfer optimizations 
    • TCP window scaling (increase initial receive windows WSCALE)
    • TCP selective acknowledgment - speed-up recovery after large packet loss
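Two of the key-naming recommendations above can be sketched in a few lines: prefixing a key with a short hash, and reversing a sequential id. The helper names and the "/" separator are assumptions of this example.

```python
import hashlib


def hashed_key(key, prefix_len=4):
    """Prefix the key with the first hex digits of its MD5 to spread keys across partitions."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}/{key}"


def reversed_id_key(user_id, suffix):
    """Reverse a sequential id so the leading characters vary between neighbours."""
    return f"{str(user_id)[::-1]}/{suffix}"


print(reversed_id_key(1234, "photo.jpg"))  # prints 4321/photo.jpg
print(hashed_key("2018-04-03/log-0001.txt"))
```

The hashed variant trades away cheap prefix listing, as noted above: keys that were neighbours no longer share a prefix.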

Data Consistency Model
  • Updates are atomic but on single key only
  • Latest PUT wins
  • Read-after-write consistency for NEW objects
  • Eventually consistent (may return stale data)
    • list newly written NEW object 
    • read-after-write for EXISTING objects (i.e. overwrite)
    • read-after-delete
    • list deleted object

Multi-part upload
  • Recommended for size > 100 MB
    • Max object size = 5TB
  • Process
    • Split file locally
    • Initiate (S3 returns uploadId)
    • Upload each part
      • Specify part number (1-10000)
        • Determines order
        • May not be contiguous
      • Can be done in parallel
    • Complete the upload by listing each part number with its ETag
  • ETag not generally MD5 anymore (like in SSE encryption)
  • Orphaned uploads
    • Billed as normal objects
    • Not visible in S3 console
    • May be cleaned-up with lifecycle rule
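The "split file locally" step can be planned up front as a list of part numbers with byte ranges. A minimal sketch: `plan_parts` is a hypothetical helper, the 100 MiB default part size is chosen for illustration, and the 5 MiB minimum part size is S3's documented limit for all parts except the last, which the notes above do not mention.

```python
import math

MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * 1024 * 1024  # minimum size for every part except the last


def plan_parts(total_size, part_size=100 * 1024 * 1024):
    """Return (part_number, first_byte, last_byte) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part size below the 5 MiB minimum")
    count = math.ceil(total_size / part_size)
    if count > MAX_PARTS:
        raise ValueError("too many parts; use a larger part size")
    parts = []
    for index in range(count):
        first = index * part_size
        last = min(first + part_size, total_size) - 1
        parts.append((index + 1, first, last))
    return parts


for part in plan_parts(250 * 1024 * 1024):
    print(part)
```

Each tuple can be uploaded independently and in parallel; the part number, not the upload order, determines where the bytes end up.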

Static Website Hosting
  • Supports GET and HEAD requests (no POST)
  • Public only content 
  • On error returns HTML (not XML like S3 REST API)
  • Supports redirects (object and bucket level)
  • Supports root documents (e.g. index.html)
  • Does not support SSL
    • Use CloudFront on top
  • Option to redirect all requests to different hostname

Event Notifications
  • Bucket configuration
  • Types
    • object created (Put, Post, Copy, CompleteMultipartUpload)
    • object removed (Delete, DeleteMarkerCreated)
    • object loss detected (ReducedRedundancyLostObject)
  • Filters (optional)
    • Prefix
    • Suffix
  • Target
    • SNS, SQS, Lambda
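A notification configuration with prefix and suffix filters might look like the dictionary below, which follows the JSON shape of the S3 notification configuration. The Lambda ARN and the filter values are hypothetical, and `key_matches` is an illustrative helper showing how the two filter rules combine.

```python
notification = {
    "LambdaFunctionConfigurations": [
        {
            # Hypothetical ARN, for illustration only.
            "LambdaFunctionArn": "arn:aws:lambda:eu-west-1:123456789012:function:thumbnailer",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".jpg"},
                    ]
                }
            },
        }
    ]
}


def key_matches(rules, key):
    """True when the object key satisfies every prefix/suffix rule."""
    for rule in rules:
        if rule["Name"] == "prefix" and not key.startswith(rule["Value"]):
            return False
        if rule["Name"] == "suffix" and not key.endswith(rule["Value"]):
            return False
    return True


rules = notification["LambdaFunctionConfigurations"][0]["Filter"]["Key"]["FilterRules"]
print(key_matches(rules, "images/cat.jpg"))  # prints True
print(key_matches(rules, "docs/cat.jpg"))    # prints False
```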

Lifecycle configuration for versioning-enabled buckets
  • Acts as Recycle Bin
  • Action on Current Version
    • Transition to the Standard-Infrequent Access
      • X days after the object's creation date
    • Archive to the Glacier Storage Class
      • X days after the object's creation date
    • Expire
      • Expiring the current version inserts a delete marker, which becomes the new current version
  • Action on Previous Version
    • Transition to the Standard-Infrequent Access
      • X days after the object becomes a previous version
    • Archive to the Glacier Storage Class
      • X days after the object becomes a previous version
    • Permanently Delete
      • X days after the object becomes a previous version
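Put together, the current-version and previous-version actions above map onto a single lifecycle rule. The dictionary below follows the JSON shape of the S3 lifecycle configuration. The rule id and all day counts are hypothetical values picked for illustration.

```python
import json

lifecycle = {
    "Rules": [
        {
            "ID": "recycle-bin",          # hypothetical rule name
            "Filter": {"Prefix": ""},     # empty prefix: applies to the whole bucket
            "Status": "Enabled",
            # Actions on the current version, counted from creation date.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
            # Actions on previous versions, counted from when they became non-current.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "STANDARD_IA"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 180},
        }
    ]
}

print(json.dumps(lifecycle, indent=2))
```

With these numbers a deleted or overwritten object stays recoverable for 180 days, which is the "Recycle Bin" behaviour described above.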

Object Tagging
  • Way to organize data
    • more flexible than location (bucket/prefix) 
  • Max 10 tags per object
  • Can grant IAM policy permissions per Tag
  • Can be used in Lifecycle rules

S3 Inventory
  • Same set of metadata as LIST API
  • Format: CSV, ORC
  • Can be queried by Athena (S3 file)
  • Split between multiple files
    • Manifest files
      • manifest.json 
      • symlink.txt - Apache Hive compatible
  • Report: daily/weekly
  • Delivery to S3 bucket
  • Delivery Notification
    • On "checksum" written (last step)
    • SNS/SQS/Lambda
  • Eventually consistent
  • Pricing - cost half of LIST API

Storage Class Analysis
  • Daily report
  • Set of heuristics what is the appropriate Storage Class
    • Access
    • Retention
  • Provides lifecycle recommendation
  • Can be exported to BI tool
  • Additional pricing

Transfer Acceleration
  • Uses CloudFront edge network (in reverse)
  • Upload to closest "PoP"
  • Uses backbone network to deliver to target
  • Enable on bucket level (new endpoint)

References