Overview
- Simplifies running batch jobs
- Provisions EC2 resources
- Supports Spot Instances (you set the maximum Spot price as a % of the On-Demand price)
- Lower level than Hadoop
- No additional charge; you pay only for the AWS resources (e.g., EC2 instances) your jobs use
- Uses ECS container instances to execute jobs
- Scales to 100K+ jobs
Batch Computing Advantages
- You can shift computation to times when capacity is cheaper
- Avoids idle resources and improves utilization
- Enables prioritization
Use Cases
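- Example event-driven pipeline:
  - A file is uploaded to S3
  - S3 sends an SNS notification
  - A Lambda function subscribed to the topic submits a batch job
  - An SNS notification is sent (e.g., when the job finishes)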
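The S3 → SNS → Lambda → Batch step above could look roughly like the sketch below. It assumes a hypothetical job queue "my-job-queue" and job definition "my-job-def" (placeholders, not from the original), and passes the bucket and key to the container through hypothetical INPUT_BUCKET / INPUT_KEY environment variables. The Batch client is injectable so the logic can run without AWS; with no client given it falls back to boto3's "batch" client and its submit_job call.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical queue and job definition names; replace with your own.
JOB_QUEUE = "my-job-queue"
JOB_DEFINITION = "my-job-def"


def build_submit_request(bucket: str, key: str) -> dict:
    """Build keyword arguments for the AWS Batch SubmitJob call."""
    # Job names allow only ASCII letters, digits, hyphens and underscores (max 128).
    safe_key = "".join(
        c if (c.isascii() and c.isalnum()) or c in "-_" else "-" for c in key
    )
    return {
        "jobName": f"process-{safe_key}"[:128],
        "jobQueue": JOB_QUEUE,
        "jobDefinition": JOB_DEFINITION,
        # Hypothetical variable names, surfaced to the container as env vars.
        "containerOverrides": {
            "environment": [
                {"name": "INPUT_BUCKET", "value": bucket},
                {"name": "INPUT_KEY", "value": key},
            ]
        },
    }


def handler(event, context, batch_client=None):
    """Lambda entry point: each SNS record wraps one S3 event notification."""
    if batch_client is None:
        import boto3  # only needed when running against real AWS

        batch_client = boto3.client("batch")
    job_ids = []
    for sns_record in event["Records"]:
        # SNS delivers the original S3 event as a JSON string.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (spaces become '+').
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            response = batch_client.submit_job(**build_submit_request(bucket, key))
            job_ids.append(response["jobId"])
    return job_ids
```

Keeping the request builder separate from the handler makes it easy to unit-test the job naming and overrides without touching AWS.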
Job (what)
- Unit of work, submitted as one of:
  - Shell script
  - Linux executable
  - Container (Docker) image, pulled from an internal or external registry
- Runs as a containerized application
- Has an AWS job ID and name
- Can reference other jobs (dependencies), so multiple jobs can be chained
- Parameters can be overridden at submission time
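Chaining and parameter overrides can be sketched as below. A job definition's command can contain Ref::name placeholders whose defaults come from the definition's parameters; values passed at submission win. resolve_command is a local imitation of that substitution (how it treats an unknown placeholder is an assumption here), and submit_chain builds SubmitJob requests where each job lists the previous one in dependsOn. The submit_job argument stands in for boto3's batch client method and is injectable; step names, queue and definition are hypothetical.

```python
def resolve_command(command, defaults, overrides):
    """Substitute Ref::name placeholders, letting submit-time values win."""
    params = {**defaults, **overrides}
    resolved = []
    for arg in command:
        if arg.startswith("Ref::"):
            # Unknown placeholders are left unchanged in this sketch.
            resolved.append(params.get(arg[len("Ref::"):], arg))
        else:
            resolved.append(arg)
    return resolved


def submit_chain(submit_job, steps, job_queue, job_definition):
    """Submit (name, parameters) steps so each one waits for the previous job."""
    job_ids = []
    for name, parameters in steps:
        request = {
            "jobName": name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "parameters": parameters,  # overrides the job definition's defaults
        }
        if job_ids:
            # Dependency on the previously submitted job.
            request["dependsOn"] = [{"jobId": job_ids[-1]}]
        job_ids.append(submit_job(**request)["jobId"])
    return job_ids
```

For example, a command ["process", "Ref::input"] with default input "default.csv" and an override of "today.csv" resolves to ["process", "today.csv"].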
Job Definition (how)
- "Blueprint for resources"
- Referenced as name:revision (e.g., my-job:3)
- Hardware requirements
  - vCPU
  - Memory
- Mount points
- Environment variables
- jobRoleArn: IAM role whose permissions are passed to the container
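A job definition covering the fields above might be built like this. The dictionary follows the shape of the RegisterJobDefinition request (containerProperties with resourceRequirements, volumes, mountPoints, environment, jobRoleArn); the image, role ARN, the "scratch" volume and its paths are hypothetical. parse_job_definition_ref splits the name:revision form; when the revision is omitted, Batch uses the latest active revision.

```python
def parse_job_definition_ref(ref):
    """Split 'name:revision' into (name, revision); revision is None when omitted."""
    name, sep, revision = ref.rpartition(":")
    if not sep or not revision.isdigit():
        # No revision given: AWS Batch uses the latest ACTIVE revision.
        return ref, None
    return name, int(revision)


def job_definition_request(name, image, vcpus, memory_mib, job_role_arn, env=None):
    """Build RegisterJobDefinition arguments for a single-container job."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "containerProperties": {
            "image": image,
            # Hardware requirements (values are passed as strings).
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},  # MiB
            ],
            # Hypothetical host directory exposed to the container via a mount point.
            "volumes": [{"name": "scratch", "host": {"sourcePath": "/scratch"}}],
            "mountPoints": [
                {"sourceVolume": "scratch", "containerPath": "/data", "readOnly": False}
            ],
            "environment": [{"name": k, "value": v} for k, v in (env or {}).items()],
            # IAM role whose permissions the container's AWS calls use.
            "jobRoleArn": job_role_arn,
        },
    }
```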
Job States
- SUBMITTED
  - Added to the queue
  - After evaluation by the scheduler, transitions to
    - PENDING (has dependencies)
    - RUNNABLE (no dependencies)
- PENDING
  - Cannot run yet because of dependencies
  - If a dependency fails, the dependent job also moves to FAILED
- RUNNABLE
  - No outstanding dependencies
  - Can be started as soon as resources are available
- STARTING
  - Scheduled on a host
  - Container initialization is underway; transitions to RUNNING
- RUNNING
  - Running as a container job on an ECS container instance
- SUCCEEDED
  - Job completed with exit code 0
  - Logs are available in CloudWatch Logs
- FAILED
  - All available attempts failed
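The state list above can be read as a small state machine. This is a simplified local model of only the transitions described in these notes (retries and cancellation are left out); the function names are made up for illustration and are not part of any AWS API.

```python
# Simplified model of the transitions in these notes; retries and
# cancellation are deliberately left out.
TRANSITIONS = {
    "SUBMITTED": {"PENDING", "RUNNABLE"},
    "PENDING": {"RUNNABLE", "FAILED"},
    "RUNNABLE": {"STARTING"},
    "STARTING": {"RUNNING"},
    "RUNNING": {"SUCCEEDED", "FAILED"},
    "SUCCEEDED": set(),
    "FAILED": set(),
}


def after_submitted(dependency_states):
    """SUBMITTED -> PENDING if any dependency is unfinished, else RUNNABLE."""
    if any(state != "SUCCEEDED" for state in dependency_states):
        return "PENDING"
    return "RUNNABLE"


def after_pending(dependency_states):
    """A failed dependency fails the dependent job; all succeeded makes it RUNNABLE."""
    if "FAILED" in dependency_states:
        return "FAILED"
    if all(state == "SUCCEEDED" for state in dependency_states):
        return "RUNNABLE"
    return "PENDING"


def is_valid_path(path):
    """Check that each consecutive pair of states is an allowed transition."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))
```

For instance, SUBMITTED → PENDING → RUNNABLE → STARTING → RUNNING → SUCCEEDED is a valid path, while jumping straight from SUBMITTED to RUNNING is not.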
- Retry
  - Triggers
    - Non-zero exit code
    - EC2 instance failure
    - AWS service failure
  - Attempts
    - Default: 1, max: 10
    - AWS_BATCH_JOB_ATTEMPT environment variable is passed to the container (first attempt is 1)
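Retry behaviour can be simulated locally as below. retry_strategy builds the retryStrategy argument (attempts between 1 and 10); run_with_retries is a hypothetical stand-in for the service, covering only the exit-code trigger (instance and service failures are not modelled). Each attempt sees its number in AWS_BATCH_JOB_ATTEMPT, which lets a job behave differently on a retry.

```python
MAX_ATTEMPTS = 10


def retry_strategy(attempts=1):
    """Build the retryStrategy argument used by SubmitJob/RegisterJobDefinition."""
    if not 1 <= attempts <= MAX_ATTEMPTS:
        raise ValueError("attempts must be between 1 and 10")
    return {"attempts": attempts}


def run_with_retries(run_attempt, attempts=1):
    """Simulate retries: run until exit code 0 or the attempts are used up.

    run_attempt receives the environment for one attempt and returns an exit code.
    """
    retry_strategy(attempts)  # reuse the range check
    exit_codes = []
    for attempt in range(1, attempts + 1):
        exit_code = run_attempt({"AWS_BATCH_JOB_ATTEMPT": str(attempt)})
        exit_codes.append(exit_code)
        if exit_code == 0:
            return "SUCCEEDED", exit_codes
        # Non-zero exit with attempts left: the job would go back to RUNNABLE.
    return "FAILED", exit_codes
```

A job that only succeeds on its third attempt ends SUCCEEDED with attempts=3, but FAILED with attempts=2.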
Job Queue
- Place where submitted jobs reside until scheduled
- Has an associated priority value
- Has associated compute environments
  - Ordered (tried in ascending order), max 3
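A job queue with its priority and ordered compute environments could be described as follows. The dictionary mirrors the CreateJobQueue request fields (jobQueueName, state, priority, computeEnvironmentOrder); the queue and compute environment names are hypothetical, and the limit of three compute environments is taken from the notes above.

```python
MAX_COMPUTE_ENVIRONMENTS = 3


def job_queue_request(name, priority, compute_environments):
    """Build CreateJobQueue arguments; compute environments are tried in list order."""
    if not 1 <= len(compute_environments) <= MAX_COMPUTE_ENVIRONMENTS:
        raise ValueError("a job queue takes 1 to 3 compute environments")
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        # Higher values are preferred among queues sharing a compute environment.
        "priority": priority,
        # Lower 'order' values are tried first.
        "computeEnvironmentOrder": [
            {"order": order, "computeEnvironment": ce}
            for order, ce in enumerate(compute_environments, start=1)
        ],
    }
```

A common pattern is to list a Spot compute environment first and an On-Demand one second, so work overflows to On-Demand capacity when Spot is unavailable.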
Scheduler
- Attached to a Job Queue
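- Decides when and where jobs run (i.e., on which compute resources)
- Dependency-aware
- Evaluates job queues in priority order (higher priority value first)
- Within a queue, jobs are scheduled in approximately FIFO order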
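The scheduling rules above (queue priority, FIFO within a queue, dependency awareness) can be illustrated with a toy dispatcher. This is not how the real scheduler is implemented: it dispatches one job at a time, assumes every job succeeds and ignores capacity, whereas AWS Batch runs many jobs in parallel and only approximates FIFO.

```python
def dispatch_order(queues):
    """Return the order in which jobs would be dispatched, one at a time.

    queues maps a queue name to (priority, [(job_id, [dependency ids]), ...]),
    with jobs listed in submission order. Every job is assumed to succeed.
    """
    pending = {name: list(jobs) for name, (_, jobs) in queues.items()}
    by_priority = sorted(queues, key=lambda name: queues[name][0], reverse=True)
    done, order = set(), []
    while any(pending.values()):
        for name in by_priority:
            # FIFO within the queue, skipping jobs whose dependencies are not done.
            ready = next((job for job in pending[name] if set(job[1]) <= done), None)
            if ready is not None:
                pending[name].remove(ready)
                done.add(ready[0])
                order.append(ready[0])
                break
        else:
            raise ValueError("remaining jobs have unsatisfiable dependencies")
    return order
```

With a high-priority queue holding h1 and h2 (h2 depends on l1) and a low-priority queue holding l1 and l2, the order comes out as h1, l1, h2, l2: the blocked high-priority job lets the low-priority queue proceed until its dependency is met.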
Compute Environment
- Maps to an ECS cluster
- Set of compute resources
- Types
  - Managed
    - Specific instance types (one or more) or "optimal" (AWS Batch picks from the C, M, and R instance families)
    - Min/max/desired vCPUs
  - Unmanaged
    - AWS Batch creates the ECS cluster; you launch and manage the instances yourself
    - Use when you need special resources (e.g., EFS, Dedicated Hosts)
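A managed compute environment might be specified as below. Field names follow the CreateComputeEnvironment request (computeResources with type, allocationStrategy, min/desired/max vCPUs, instanceTypes, subnets, securityGroupIds, instanceRole, bidPercentage); the subnet, security group and instance role values are hypothetical. serviceRole is omitted on the assumption that the AWS Batch service-linked role is used. Setting spot_bid_percentage switches to Spot Instances capped at that percentage of the On-Demand price.

```python
def managed_compute_environment_request(
    name,
    instance_role,
    subnets,
    security_group_ids,
    min_vcpus=0,
    desired_vcpus=0,
    max_vcpus=256,
    instance_types=("optimal",),
    spot_bid_percentage=None,
):
    """Build CreateComputeEnvironment arguments for a MANAGED environment.

    spot_bid_percentage: if set, use Spot Instances and cap their price at this
    percentage of the On-Demand price.
    """
    if not 0 <= min_vcpus <= desired_vcpus <= max_vcpus:
        raise ValueError("expected 0 <= min <= desired <= max vCPUs")
    use_spot = spot_bid_percentage is not None
    resources = {
        "type": "SPOT" if use_spot else "EC2",
        "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED" if use_spot else "BEST_FIT_PROGRESSIVE",
        "minvCpus": min_vcpus,
        "desiredvCpus": desired_vcpus,
        "maxvCpus": max_vcpus,
        "instanceTypes": list(instance_types),
        "subnets": list(subnets),
        "securityGroupIds": list(security_group_ids),
        "instanceRole": instance_role,  # instance profile for the ECS container instances
    }
    if use_spot:
        resources["bidPercentage"] = spot_bid_percentage
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": resources,
    }
```

With min_vcpus=0 the environment can scale down to no instances when the queue is empty, which fits the "avoids idle resources" advantage.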
Array Job
- Collection of related but separate jobs that share common parameters (job definition, vCPUs, memory)
- Examples (embarrassingly parallel)
- Monte Carlo simulations
- Parametric sweeps
- Large rendering jobs
- Submitted like a single job
- Specify the array size (2 to 10,000)
- AWS_BATCH_JOB_ARRAY_INDEX environment variable (0-based) is passed to each child container
- Parent array job has a normal AWS Batch job ID (e.g., 1)
- Children have their index appended (e.g., 1:0)
- Dependency type
  - SEQUENTIAL
    - A:1 cannot start until A:0 succeeds (children run in index order)
  - N_TO_N
    - Child B:N waits only for the matching child A:N of the upstream array job
    - Enables multi-stage processing where each index corresponds to one input split
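The array job mechanics above can be sketched locally: child IDs are the parent ID plus ":index", SEQUENTIAL makes each child wait for its previous sibling, and N_TO_N makes child N wait for child N of the upstream array job. my_split shows one hypothetical way a child container could pick its share of the input from AWS_BATCH_JOB_ARRAY_INDEX (round-robin striding; contiguous chunks would work just as well). These helpers are illustrative and not part of any AWS library.

```python
import os

MIN_ARRAY_SIZE, MAX_ARRAY_SIZE = 2, 10_000

# Corresponding SubmitJob arguments (for reference):
#   {"arrayProperties": {"size": 4}, "dependsOn": [{"type": "SEQUENTIAL"}]}
#   {"arrayProperties": {"size": 4}, "dependsOn": [{"jobId": "<job A ID>", "type": "N_TO_N"}]}


def child_job_ids(parent_id, size):
    """Children of an array job: the parent ID with ':index' appended."""
    if not MIN_ARRAY_SIZE <= size <= MAX_ARRAY_SIZE:
        raise ValueError("array size must be between 2 and 10,000")
    return [f"{parent_id}:{index}" for index in range(size)]


def child_dependencies(parent_id, size, dependency_type=None, upstream_id=None):
    """Map each child job to the child jobs it waits for."""
    if dependency_type == "N_TO_N" and upstream_id is None:
        raise ValueError("N_TO_N needs the upstream array job ID (same array size)")
    waits = {}
    for index, child in enumerate(child_job_ids(parent_id, size)):
        if dependency_type == "SEQUENTIAL" and index > 0:
            waits[child] = [f"{parent_id}:{index - 1}"]  # previous sibling
        elif dependency_type == "N_TO_N":
            waits[child] = [f"{upstream_id}:{index}"]  # matching upstream child
        else:
            waits[child] = []
    return waits


def my_split(items, size, env=os.environ):
    """Inside a child container: take every size-th item starting at this child's index."""
    index = int(env["AWS_BATCH_JOB_ARRAY_INDEX"])
    return items[index::size]
```

For an array of size 4 over ten inputs, child index 1 would process items 1, 5 and 9.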