Thursday, 28 December 2017

AWS Batch

Overview
  • Simplifies running batch jobs
  • Provisions EC2 resources
    • Can use Spot Instances (you set a bid as a % of the On-Demand price)
  • Lower level than Hadoop
  • No additional charge; you pay only for the underlying AWS resources
  • Uses ECS container instances to execute jobs
  • Scales to 100K+ jobs

Batch Computing Advantages
  • You can shift computing to times when it is cheaper
  • Avoids idle resources, which raises efficiency
  • Enables prioritization

Use Cases
  • File uploaded to S3
    • SNS notification
      • Lambda submits a batch job
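
The S3 → SNS → Lambda flow above could be sketched roughly as follows. This is a minimal sketch, not a reference implementation: the queue name `example-queue`, the job definition `example-job-def`, and the `bucket`/`key` parameter names are hypothetical, and it assumes the S3 notification arrives wrapped in an SNS message.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an SNS event that wraps S3 notifications."""
    objects = []
    for sns_record in event.get("Records", []):
        # SNS delivers the S3 notification as a JSON string in Sns.Message.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects


def build_submit_request(bucket, key, job_queue="example-queue",
                         job_definition="example-job-def"):
    """Build keyword arguments for SubmitJob (queue and definition are hypothetical)."""
    return {
        "jobName": "process-upload",
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Assumed to match Ref::bucket / Ref::key placeholders in the job definition.
        "parameters": {"bucket": bucket, "key": key},
    }


def handler(event, context):
    """Lambda entry point: submit one Batch job per uploaded object."""
    import boto3  # imported lazily so the helpers above can be tested offline

    batch = boto3.client("batch")
    job_ids = []
    for bucket, key in extract_s3_objects(event):
        response = batch.submit_job(**build_submit_request(bucket, key))
        job_ids.append(response["jobId"])
    return job_ids
```

The Lambda execution role would need permission to call `batch:SubmitJob`.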

Job (what)
  • Unit of work
    • Shell script
    • Linux executable
    • Container image (docker)
      • Pulled from internal/external registry
  • Runs as containerized application
  • Has AWS Job Id and Name
  • Can reference other jobs (dependencies)
    • You can chain multiple jobs
  • Parameters can be overridden at submission
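
A rough sketch of chaining jobs and overriding settings at submission, assuming the `dependsOn` and `containerOverrides` fields of the SubmitJob request. The helper names and the `submit` callable are hypothetical; in practice `submit` would be something like `boto3.client("batch").submit_job`.

```python
def build_job_request(name, queue, definition, depends_on_ids=(),
                      command=None, environment=None):
    """Build SubmitJob keyword arguments with optional dependencies and overrides."""
    request = {"jobName": name, "jobQueue": queue, "jobDefinition": definition}
    if depends_on_ids:
        # The job stays PENDING until every listed job has succeeded.
        request["dependsOn"] = [{"jobId": job_id} for job_id in depends_on_ids]
    overrides = {}
    if command:
        overrides["command"] = list(command)
    if environment:
        overrides["environment"] = [
            {"name": key, "value": value}
            for key, value in sorted(environment.items())
        ]
    if overrides:
        request["containerOverrides"] = overrides
    return request


def submit_chain(submit, steps, queue):
    """Submit (name, definition) steps in order, each depending on the previous one."""
    previous_id = None
    job_ids = []
    for name, definition in steps:
        deps = [previous_id] if previous_id else []
        response = submit(**build_job_request(name, queue, definition, deps))
        previous_id = response["jobId"]
        job_ids.append(previous_id)
    return job_ids
```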

Job Definition (how)
  • "Blueprint for resources"
  • Name:RevisionNumber
  • Hardware requirements
    • vCPU
    • Memory
  • Mount points
  • Environment variables
  • jobRoleArn - IAM role whose permissions are passed to the container
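
The pieces of a job definition listed above might map onto a RegisterJobDefinition request roughly like this. It is a sketch under assumptions: it uses the `vcpus` and `memory` container fields (newer API versions prefer `resourceRequirements`), and the helper name and its arguments are hypothetical.

```python
def build_job_definition(name, image, vcpus, memory_mib, command,
                         job_role_arn=None, environment=None, mounts=None):
    """Build RegisterJobDefinition keyword arguments for a container job.

    mounts maps a volume name to a (host path, container path) tuple.
    """
    container = {
        "image": image,
        "vcpus": vcpus,
        "memory": memory_mib,  # MiB
        "command": list(command),
    }
    if job_role_arn:
        container["jobRoleArn"] = job_role_arn
    if environment:
        container["environment"] = [
            {"name": key, "value": value}
            for key, value in sorted(environment.items())
        ]
    if mounts:
        container["volumes"] = [
            {"name": volume, "host": {"sourcePath": host_path}}
            for volume, (host_path, _) in sorted(mounts.items())
        ]
        container["mountPoints"] = [
            {"sourceVolume": volume, "containerPath": container_path, "readOnly": False}
            for volume, (_, container_path) in sorted(mounts.items())
        ]
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": container,
    }
```

Passing the result to `boto3.client("batch").register_job_definition(**request)` returns the revision number, which gives the Name:RevisionNumber pair.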

Job States
  • SUBMITTED
    • Added to the queue
    • Upon evaluation by the Job Scheduler, transitions to
      • PENDING (has dependencies)
      • RUNNABLE (no dependencies)
  • PENDING
    • cannot run due to dependencies
    • if a dependency fails, the dependent job moves to FAILED, too
  • RUNNABLE
    • no outstanding dependencies
    • can be started as soon as resources are available
  • STARTING
    • scheduled on the host
    • container initialization is underway; transitions to
      • RUNNING
  • RUNNING
    • Running as a container job on an ECS container instance
  • SUCCEEDED
    • Job completed with Exit code = 0
    • Logs available in CloudWatch Logs
  • FAILED
    • All available attempts failed
    • Retry
      • Trigger
        • Exit code != 0
        • EC2 instance failure
        • AWS failure
      • Attempts
        • Default: 1, Max: 10
        • AWS_BATCH_JOB_ATTEMPT environment variable passed to the container
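
The retry rule above can be modeled in a few lines. This is a simplified sketch, not the service's actual logic: it assumes a failed attempt with attempts remaining puts the job back in RUNNABLE, and the function names are hypothetical.

```python
import os

MAX_ATTEMPTS_LIMIT = 10  # upper bound on attempts noted above


def current_attempt(environ=os.environ):
    """Attempt number as seen inside the container (1 on the first try)."""
    return int(environ.get("AWS_BATCH_JOB_ATTEMPT", "1"))


def state_after_attempt(exit_code, attempt, max_attempts=1):
    """Return the state a job moves to once an attempt has finished."""
    if not 1 <= max_attempts <= MAX_ATTEMPTS_LIMIT:
        raise ValueError("max_attempts must be between 1 and 10")
    if exit_code == 0:
        return "SUCCEEDED"
    if attempt < max_attempts:
        return "RUNNABLE"  # attempts remain, so the job is queued again
    return "FAILED"  # all available attempts failed
```

A job script could call `current_attempt()` to, for example, skip work that an earlier attempt already finished.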

Job Queue
  • Place where submitted jobs reside until scheduled
  • Priority value associated
  • Has Compute Environments associated
    • Ordered, Max 3
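
A queue with its ordered compute environments could be described like this. A minimal sketch assuming the CreateJobQueue request shape; the helper name is hypothetical, and the limit of 3 is the one noted above.

```python
def build_job_queue(name, priority, compute_environments):
    """Build CreateJobQueue keyword arguments.

    compute_environments is a list of names or ARNs in the order they should be tried.
    """
    if not 1 <= len(compute_environments) <= 3:
        raise ValueError("a job queue takes between 1 and 3 compute environments")
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": position, "computeEnvironment": environment}
            for position, environment in enumerate(compute_environments, start=1)
        ],
    }
```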

Scheduler
  • Attached to a Job Queue
  • Decides when and where jobs are run (i.e. what resources)
  • Dependency-aware
  • Evaluates queues in priority order
  • Roughly FIFO within a queue
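
As a toy model of the behaviour described above (priority across queues, FIFO within a queue, dependency-aware), the dispatch order could be simulated like this. It is an illustration only and not the real scheduler: it ignores resources entirely and assumes a higher number means a higher priority.

```python
def dispatch_order(queues):
    """Return the order in which jobs would be dispatched.

    queues maps a queue name to (priority, jobs), where jobs is a list of
    (job name, [names of jobs it depends on]) in submission order.
    """
    # Higher priority queues are looked at first.
    by_priority = sorted(queues.values(), key=lambda queue: -queue[0])
    pending = [list(jobs) for _, jobs in by_priority]
    done = set()
    order = []
    progressed = True
    while progressed:
        progressed = False
        for jobs in pending:
            # FIFO within a queue: take the oldest job whose dependencies are met.
            for position, (name, dependencies) in enumerate(jobs):
                if all(dependency in done for dependency in dependencies):
                    order.append(name)
                    done.add(name)
                    del jobs[position]
                    progressed = True
                    break
            if progressed:
                break  # start again from the highest priority queue
    return order
```

For example, with a high priority queue holding `a` and `b` (where `b` depends on `c`) and a low priority queue holding `c` and `d`, `b` has to wait for `c` even though its queue ranks higher.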

Compute Environment
  • Backed by an ECS cluster
  • Set of compute resources
  • Types
    • Managed
      • Specific instance types (multiple) or the newest
      • Min/Max/Desired vCPUs
    • Unmanaged
      • AWS Batch creates the ECS cluster; you launch and manage the instances yourself
      • Use when you need special resources (e.g. EFS, Dedicated Hosts)
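
A managed compute environment with the Min/Max/Desired vCPU settings might be described as below. This is a sketch assuming the CreateComputeEnvironment request shape; the helper name, the defaults, and all role, subnet, and security group values are hypothetical placeholders.

```python
def build_managed_environment(name, service_role, instance_role, subnets,
                              security_group_ids, min_vcpus=0, max_vcpus=16,
                              desired_vcpus=0, instance_types=("optimal",),
                              spot_bid_percentage=None, spot_fleet_role=None):
    """Build CreateComputeEnvironment keyword arguments for a managed environment."""
    if not min_vcpus <= desired_vcpus <= max_vcpus:
        raise ValueError("expected min_vcpus <= desired_vcpus <= max_vcpus")
    resources = {
        "type": "EC2",
        "minvCpus": min_vcpus,
        "maxvCpus": max_vcpus,
        "desiredvCpus": desired_vcpus,
        "instanceTypes": list(instance_types),
        "subnets": list(subnets),
        "securityGroupIds": list(security_group_ids),
        "instanceRole": instance_role,
    }
    if spot_bid_percentage is not None:
        # Spot: the bid is a percentage of the On-Demand price.
        resources["type"] = "SPOT"
        resources["bidPercentage"] = spot_bid_percentage
        resources["spotIamFleetRole"] = spot_fleet_role
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": resources,
        "serviceRole": service_role,
    }
```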

Array Job
  • Collection of related jobs
  • Examples (embarrassingly parallel)
    • Monte Carlo simulations
    • Parametric sweeps
    • Large rendering jobs
  • Submitted like a single job
    • Specify array size
    • AWS_BATCH_JOB_ARRAY_INDEX passed to container
  • Parent Array Job has normal AWS Batch Id (e.g. 1)
    • Children have index appended (e.g. 1:0)
  • Dependency type
    • SEQUENTIAL
      • A:1 cannot start until A:0 succeeds
    • N_TO_N
      • Lets you run multi-stage processing
      • Child N waits only for child N of the job it depends on, so each index can follow one input split through the stages
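
An array job and the way a child picks its share of the work could be sketched as follows. This assumes the `arrayProperties` and `dependsOn` fields of the SubmitJob request; the helper names and the list of inputs are hypothetical.

```python
import os


def build_array_request(name, queue, definition, size,
                        dependency_type=None, depends_on_job_id=None):
    """Build SubmitJob keyword arguments for an array job of the given size."""
    request = {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "arrayProperties": {"size": size},
    }
    if dependency_type == "SEQUENTIAL":
        # Children of this array run one after another: index 1 waits for index 0.
        request["dependsOn"] = [{"type": "SEQUENTIAL"}]
    elif dependency_type == "N_TO_N":
        # Child N waits for child N of the referenced array job.
        request["dependsOn"] = [{"jobId": depends_on_job_id, "type": "N_TO_N"}]
    return request


def select_split(inputs, environ=os.environ):
    """Inside the container: pick the input that belongs to this child."""
    index = int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return inputs[index]
```

A two-stage pipeline would submit the first array job, then submit a second array of the same size with `dependency_type="N_TO_N"` and the first job's ID.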
