Sunday, 31 December 2017

AWS CloudWatch (Events)

Overview
  • AWS Resources publish information about state changes as CloudWatch Events
  • Target can execute action upon event
  • Rule can route event to Target
  • Use cases
    • Invoke Lambda to modify DNS when EC2 instance is launched
    • Direct CloudTrail records to Kinesis
    • Run SSM command when an instance is launched
    • Log AWS API Calls
  • Near real-time
  • At-least-once delivery (an event may be delivered to a target more than once)


Event
  • Triggered by:
    • AWS resource changes state, e.g.
      • EC2 instance pending->running
      • ASG launches or terminates an instance
      • EBS created a snapshot
      • CodeDeploy instance state change
      • Sign-in to AWS Management Console
      • [many other AWS Services]
    • AWS CloudTrail
      • Can be used as intermediary
      • Read/Write calls supported by CloudTrail can be relayed as Events
    • Customer code publishes event (PutEvents)
    • Scheduled (self-triggered)
      • Cron expressions
      • Rate expressions
  • Uses JSON format
  • Can contain custom payload (useful for Lambda)
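An EC2 state-change event, for example, arrives roughly as the dict below. The field names follow the documented event structure, but the IDs, account, and timestamp are illustrative, and `describe` is a hypothetical helper such as a Lambda handler might use:

```python
# Sketch of an EC2 state-change event as delivered by CloudWatch Events.
# IDs, account number, region, and time are illustrative values.
event = {
    "version": "0",
    "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "111122223333",
    "time": "2017-12-31T12:00:00Z",
    "region": "eu-west-1",
    "resources": ["arn:aws:ec2:eu-west-1:111122223333:instance/i-1234567890abcdef0"],
    "detail": {"instance-id": "i-1234567890abcdef0", "state": "running"},
}

def describe(event):
    """Hypothetical helper: summarize an EC2 state-change event."""
    detail = event["detail"]
    return f"{detail['instance-id']} is now {detail['state']}"
```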

Event Bus
  • Each AWS account has default bus
  • Allows sending events to receiver AWS account
    • On receiver account specify permissions
    • Create a rule 
    • Attach foreign Event Bus as a target

Rule
  • Matches incoming events and routes them to targets
  • Matching is unordered
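The match can be sketched as a recursive subset test: every field the pattern names must be present in the event, with a value among the listed alternatives. Real event patterns support more operators; this is a deliberate simplification:

```python
def pattern_matches(pattern, event):
    """Simplified event-pattern match: each pattern field must exist in the
    event, with a value among the listed alternatives. Nested dicts recurse.
    (Real CloudWatch Events patterns support more features; omitted here.)"""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):
            if not isinstance(event.get(key), dict):
                return False
            if not pattern_matches(allowed, event[key]):
                return False
        elif event.get(key) not in allowed:
            return False
    return True

# Example rule pattern: EC2 instances entering running or stopped state.
rule = {"source": ["aws.ec2"],
        "detail": {"state": ["running", "stopped"]}}
```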

Target
  • Receives event as JSON
    • AWS Systems Manager (Run Command)
    • EC2 API calls
    • ECS tasks
    • Lambda
    • Kinesis Streams 
    • SNS
    • [other AWS Services]
    • Event Bus in another account

References

AWS ELB(NLB)

Overview
  • Operates at OSI Level 4 (connection level)
    • TCP: IP + Port
    • Level 3 would be just IP
  • Full control over IP addresses
    • Single IP address per AZ (VPC subnet)
      • EIP possible to attach
      • No CNAME resolution
  • Long-running connections (months) supported
    • Normally idles after timeout
    • Use cases: IoT, gaming, messaging
    • No idle-timeout configuration
  • Zonality 
    • No cross-zone balancing
      • But fails over to another AZ if all targets unhealthy (Route 53)

Limitations
  • No SSL termination
  • No Backend server encryption

Types
  • Internet-facing
  • Internal

Target Types
  • Instance Id or IP (just like ALB)

Performance
  • Scales to millions of requests
  • Very low latency
  • Handles volatile traffic well
    • Sudden spike (e.g. "flash sales")

Client source IP
  • Unlike other ELBs, it preserves the client source IP address
  • Only applies to targets registered by instance ID (not IP targets)
  • Proxy Protocol still available 
  • No need for X-Forwarded-For
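NLB's Proxy Protocol support is version 2, a binary format; the human-readable v1 line used by the Classic LB illustrates the same idea and is easy to parse, so it serves as the sketch here:

```python
def parse_proxy_v1(header_line):
    """Parse a Proxy Protocol v1 header line of the form
    'PROXY TCP4 <src-ip> <dst-ip> <src-port> <dst-port>\\r\\n'
    and return the original client (source) address.
    Note: NLB itself sends the binary v2 format; v1 shown for illustration."""
    parts = header_line.strip().split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a proxy protocol header")
    _, proto, src_ip, dst_ip, src_port, dst_port = parts
    return src_ip, int(src_port)
```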

Monitoring 
  • VPC flow logs (instead of access logs)
  • CloudWatch

Healthchecks
  • Network level
    • Observes normal (organic) traffic to target 
  • Application level (like CLB/ALB)
    • Synthetic

Pricing
  • NLCU
    • 100K active connections / minute
    • 800 new connections (flows) / second
    • 2.22 Mbps (1 GB / h)
  • Highest dimension used (like in ALB)
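The "highest dimension" billing rule can be sketched as follows; per-NLCU capacities are taken from the bullets above:

```python
# Per-NLCU capacities for TCP traffic, per the dimensions noted above.
NLCU = {"active_per_min": 100_000, "new_per_sec": 800, "gb_per_hour": 1.0}

def nlcus_used(active_per_min, new_per_sec, gb_per_hour):
    """Billing sketch: each dimension is divided by its per-NLCU capacity,
    and only the highest resulting ratio is charged."""
    return max(active_per_min / NLCU["active_per_min"],
               new_per_sec / NLCU["new_per_sec"],
               gb_per_hour / NLCU["gb_per_hour"])
```

For example, 200K active connections/minute dominates 400 new connections/second and 1 GB/h, so two NLCUs are billed.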

References

AWS ELB(ALB)

Overview
  • Layer 7 (advanced)
    • Content based routing
  • Evaluates listener rules
  • Use cases
    • Single LB fronting different types of services (e.g. website, api)
    • Microservices in containers (integrated with ECS)
  • Improved performance over CLB (and cheaper)
  • Integrated with WAF
  • IPv6 support

Types
  • Internet facing
  • Internal

Limitations
  • No backend authentication (unlike CLB)

Listeners
  • HTTP/HTTPS
    • Ports 1-65535
  • HTTPS
    • Multiple certificates possible (SNI)
      • Smart selection of the best certificate for each client
  • WebSockets
    • HTTP (ws://) or HTTPS (wss://)
  • HTTP/2
    • HTTPS listeners only
    • Server-Push not available
  • Has Listener Rules (1+)

Listener Rule
  • Contains
    • Priority
    • Action
      • Always forward request
    • Optional Host
      • Host-based routing
    • Optional Path
      • Path-based routing
  • Default rule has no conditions (catch-all)
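Rule evaluation can be sketched as a priority-ordered scan: the first rule whose optional host and path conditions both match wins, and the condition-less default catches everything else. The wildcard matching here is approximated with `fnmatch`; rule shapes and the priority value for the default are illustrative:

```python
import fnmatch

def choose_target_group(rules, host, path):
    """Sketch of listener-rule evaluation: check rules in priority order,
    skip any whose host/path condition fails, return the first match.
    The default rule has no conditions, so it always matches last."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if "host" in rule and not fnmatch.fnmatch(host, rule["host"]):
            continue
        if "path" in rule and not fnmatch.fnmatch(path, rule["path"]):
            continue
        return rule["target_group"]

rules = [
    {"priority": 10, "host": "api.example.com", "target_group": "api"},
    {"priority": 20, "path": "/img/*", "target_group": "static"},
    {"priority": 50000, "target_group": "web"},  # default: catch-all, evaluated last
]
```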

Target
  • Type
    • EC2 instance
    • IP address
      • Inside/outside VPC (e.g. on-premise)
      • IP must be private
        • ClassicLink instances
        • Peered-VPC
        • On-premise instances (Direct Connection/VPN)
          • Use case: migrate-to-cloud/burst-to-cloud/fail-over-to-cloud
  • State
    • draining
  • Same target may be registered multiple times (different ports) e.g. microservices


Target Group
  • Set of targets
  • Listener rule forwards traffic to Target Group
  • Has its own HealthCheck
    • If no healthy targets still routes traffic
  • You don't need to take the whole instance out of rotation
  • May be attached to Auto Scaling Group

Request Tracing
  • LB injects a header X-Amzn-Trace-Id
  • Supports chaining: Field={Root, Self}
  • Visible in Access Logs ("trace_id")

Sticky Sessions
  • Only LB cookie supported (AWSALB)
  • Websockets are inherently sticky (long-lasting connection)

Healthchecks
  • Ability to define "successful" HTTP status codes

Pricing
  • Per-hour fee
  • LCU
    • Dimensions
      • 3,000 active connections per minute
      • 25 new connections established per second
        • Certificate key size matters (shorter = cheaper)
      • 1,000 rule evaluations per second
      • Data transferred: 2.22 Mbps (= 1 GB/h)
    • Highest dimension used to evaluate number of LCUs consumed 

References



Thursday, 28 December 2017

AWS Batch

Overview
  • Simplifies running batch jobs
  • Provisions EC2 resources
    • Lets you specify the % of Spot Instances
  • Lower level than Hadoop
  • No additional pricing
  • Uses ECS container instances to execute jobs
  • Scales to 100K+ jobs

Batch Computing Advantages
  • You can shift computing when it is cheaper
  • Avoids idling resources + higher efficiency
  • Enables prioritization

Use Cases
  • File uploaded to S3
    • SNS notification
      • Lambda submits a batch job

Job (what)
  • Unit of work
    • Shell script
    • Linux executable
    • Container image (docker)
      • Pulled from internal/external registry
  • Runs as containerized application
  • Has AWS Job Id and Name
  • Can reference other jobs (dependencies)
    • You can chain multiple jobs
  • Parameters can get overridden

Job Definition (how)
  • "Blueprint for resources"
  • Name:RevisionNumber
  • Hardware requirements
    • vCPU
    • Memory
  • Mount points
  • Environment variables
  • jobRoleARN - permission passed to the container

Job States
  • SUBMITTED
    • Added to the queue
    • Upon evaluation by Job Scheduler transitions to 
      • PENDING (has dependencies)
      • RUNNABLE (no dependencies)
  • PENDING
    • cannot run due to dependencies
    • if a dependency fails, the dependent job moves to FAILED, too
  • RUNNABLE
    • no outstanding dependencies
    • can be started as soon as resources are available
  • STARTING
    • scheduled on the host
    • container initialization is underway; transitions to
      • RUNNING
  • RUNNING
    • Running as a container job on ECS container instance
  • SUCCEEDED
    • Job completed with Exit code = 0
    • Logs available in CloudWatch Logs
  • FAILED
    • All available attempts failed 
    • Retry
      • Trigger
        • Exit Code != 0 
        • EC2 instance failure
        • AWS failure
      • Attempts
        • Default: 1, Max: 10
        • AWS_BATCH_JOB_ATTEMPT environment variable passed

Job Queue
  • Place where submitted jobs reside until scheduled
  • Priority value associated
  • Has Compute Environments associated
    • Ordered, Max 3

Scheduler
  • Attached to a Job Queue
  • Decides when and where jobs are run (i.e. what resources)
  • Dependency-aware
  • Runs queues according to priorities
  • FIFO

Compute Environment
  • Same as ECS Cluster
  • Set of compute resources
  • Types
    • Managed
      • Specific Instance Types (multiple) or The Newest
      • Min/Max/Desired vCPUs
    • Unmanaged
      • AWS Batch creates ECS Cluster
      • Use when you need special resources (e.g. EFS, Dedicated Hosts)

Array Job
  • Collection of related jobs submitted as one unit
  • Examples (embarrassingly parallel)
    • Monte Carlo simulations
    • Parametric sweeps
    • Large rendering jobs
  • Submitted like a single job
    • Specify array size
    • AWS_BATCH_JOB_ARRAY_INDEX passed to container
  • Parent Array Job has normal AWS Batch Id (e.g. 1)
    • Children have index appended (e.g. 1:0)
  • Dependency type
    • SEQUENTIAL
      • A:1 cannot start until A:0 succeeds
    • N_TO_N
      • Allows running multi-stage processing
      • Each job corresponds to input split
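The index-based fan-out can be sketched as below: each child container reads AWS_BATCH_JOB_ARRAY_INDEX (injected by AWS Batch) and picks its share of the input. The slice-by-stride scheme is an illustrative choice, not something AWS Batch prescribes:

```python
import os

def my_slice(items, array_size):
    """Select this child's share of the work based on
    AWS_BATCH_JOB_ARRAY_INDEX (set by AWS Batch inside each container).
    Striding by array size is one simple, illustrative partitioning."""
    index = int(os.environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return items[index::array_size]
```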

References

Wednesday, 27 September 2017

AWS EMR

Model
  • EMR is more than "map reduce"
  • Hadoop
    • Moniker for all Open Source big data projects (ecosystem)
      • Extract: Sqoop, MapReduce API
      • Transform & Load: Spark, Cascading, Pig, MR
      • Data Warehouse (file formats): Parquet, ORC, Seq, Text
      • Report Generation: Hive, Spark, Cascading, Pig
      • Ad hoc analysis: Presto, Hive, Spark-SQL, Lingual, Impala
    • Distributed storage and compute
  • EMR manages Hadoop cluster
    • Deploying software bits
    • Managing node lifecycle
    • AWS runs customized version of Hadoop - new release every month
    • Uses Amazon Linux (A-Linux)
  • EMR also supports non-Hadoop distribution MapR
    • no-NameNode architecture 
    • can tolerate multiple failures with automatic failover/failback

Cluster
  • Collection of nodes
  • Master node - management, coordination of slaves
    • Not much processing power required
    • Do not use spot instances
  • Slave nodes
    • Core nodes - run tasks and store data
      • Processing Power + Storage
    • Task nodes (optional) - run tasks
      • Processing Power (no storage)
    • Failed slaves are not automatically replaced
  • Use cases
    • Job flow engine (i.e. schedule jobs)
    • Long running cluster (shared EMR cluster that stays up)
      • e.g. for Facebook Presto
      • can use blue/green deployment for a new cluster

Security
  • Security Groups
    • Master - ingress
      • SSH
      • Various IP ranges belonging to AWS
    • Slave
  • IAM Roles
    • EMR Role - EMR service access on your behalf (i.e. running nodes)
    • EC2 Instance profile - associated with running EMR nodes (i.e. what can be accessed by EC2 instance)
    • Auto Scaling role - allows Auto Scaling to interact with EMR


Job
  • Workflow that represents program executed by EMR
  • Consists of series of steps
    • Step types
      • Streaming program
        • reads standard input
        • runs mapper,
        • runs reducer
        • writes to standard output
      • Hive program
      • Pig program
      • Spark application
      • Custom JAR
        • Java program
        • Bash script
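A streaming step wires any program that reads standard input and writes standard output into the map (or reduce) phase; a minimal word-count mapper sketch:

```python
import sys

def map_lines(lines):
    """Hadoop-streaming-style mapper: emit one tab-separated '<word>\\t1'
    record per word. The framework sorts these records and feeds them,
    grouped by key, to the reducer."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

if __name__ == "__main__":
    # When run as a streaming step, stdin carries the input split.
    for record in map_lines(sys.stdin):
        print(record)
```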

Processing Data
  • Submit jobs directly to installed app (e.g. Hive, Pig)
    • SSH to master
    • Access tools
  • Running steps

Cluster lifecycle (Job flow)
  • STARTING: AWS provisions the cluster, installs Hadoop
  • BOOTSTRAPPING: install additional apps
  • RUNNING: runs all the steps
  • After steps are completed
    • WAITING: if long running persistent cluster
      • SHUTTING_DOWN: manually terminated
        • TERMINATED
    • SHUTTING_DOWN
      • COMPLETED

Cost
  • EMR
  • S3
  • EC2
    • Spot Instances
      • Hadoop is already interruptible so a good fit
      • Do not use spot for master node


Storage
  • Hadoop HDFS
    • native filesystem (also used for HBase)
    • cannot decouple storage from compute 
    • ephemeral (lost when cluster terminated)
    • useful for caching intermediate results
    • replicates data between nodes 
    • Node Types
      • DataNode - stores files' blocks (64MB)
      • NameNode - master for DataNode (tracks which block is where)
  • EMRFS
    • S3 single source of truth: data lake
    • Multiple clusters can work on the same data
    • Consistent View 
      • DynamoDB based index
      • Very fast index
    • Copy
      • s3distcp - efficient, parallel copy of data S3 <-> EMR cluster
  • Combination
    • S3 as input/output
    • HDFS intermediate results
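The 64 MB block accounting noted above can be sketched as follows, assuming HDFS's default 3x replication:

```python
import math

BLOCK_MB = 64  # HDFS block size per the notes above

def block_replicas(file_mb, replication=3):
    """How many block replicas a file consumes across DataNodes,
    assuming a 64 MB block size and HDFS's default 3x replication."""
    return math.ceil(file_mb / BLOCK_MB) * replication
```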

Hadoop YARN
  • "Yet Another Resource Negotiator"
  • Component for managing resources
    • nodes
    • allocating tasks
  • It can be used to run applications not related to Hadoop MapReduce, e.g.
    • Apache Tez 
    • Apache Spark

Tools
  • Hue - UI for Hadoop
  • Hive
    • Uses SQL syntax to generate map reduce jobs
    • Code generation
    • Schedule with the engine
    • Quite slow
    • extensible with Java
    • complex user defined types
    • can access DynamoDB, S3
    • can process very large amounts of data
  • Impala
    • SQL-like language
    • In-memory
    • Uses hive metadata    
    • Bypasses Hadoop MapReduce
    • Only works with HDFS
  • Facebook Presto
    • In-memory 
    • Can work with Hive tables
    • Very low latency
    • Query directly against S3
    • Bypasses MapReduce

References