Use cases
- Scaling activity
- Instance replacement
Auto Scaling Group
- Group of resources
- Associated with either
- Launch Configuration
- Launch Template
- Group size
- min
- max
- desired
- used for manual scaling and scheduled scaling
- must satisfy min <= desired <= max
- Networking
- VPC and subnets
- AZ
- (Optional) Placement Group
- Cannot specify multiple AZ if used
- (Optional) ELB
- Healthcheck
- Cooldown period
- Termination Policies
- Suspended Processes
- Instance Protection
- Protects from termination on scale-in
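The group size rule above (min <= desired <= max) can be checked before a manual or scheduled change is submitted. A minimal sketch; `GroupSize` and `clamp_desired` are hypothetical names used only for illustration and are not part of any AWS SDK:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupSize:
    """Hypothetical container for the three group size settings."""
    min_size: int
    max_size: int
    desired: int

    def is_valid(self) -> bool:
        # The desired capacity must lie within [min_size, max_size].
        return 0 <= self.min_size <= self.desired <= self.max_size


def clamp_desired(requested: int, min_size: int, max_size: int) -> int:
    """Pull a requested desired capacity back into the allowed range."""
    return max(min_size, min(requested, max_size))
```

For example, `GroupSize(min_size=1, max_size=5, desired=3).is_valid()` holds, while a desired value of 10 would be clamped to 5 by `clamp_desired(10, 1, 5)`.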
Notification
- Sends SNS notification on following instance events
- launch
- terminate
- failed to launch
- failed to terminate
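The four events above map to notification type identifiers in the Auto Scaling API. The sketch below only builds the request parameters, assuming they would later be passed to the boto3 `put_notification_configuration` call; the helper name `notification_request` and the group name and topic ARN in the usage note are placeholders, not values from these notes:

```python
# Notification type identifiers for the four instance events listed above.
NOTIFICATION_TYPES = {
    "launch": "autoscaling:EC2_INSTANCE_LAUNCH",
    "terminate": "autoscaling:EC2_INSTANCE_TERMINATE",
    "failed to launch": "autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
    "failed to terminate": "autoscaling:EC2_INSTANCE_TERMINATE_ERROR",
}


def notification_request(group_name, topic_arn, events):
    """Build the parameter dict for an SNS notification configuration.

    Hypothetical helper: it performs no AWS call, it only assembles kwargs.
    """
    return {
        "AutoScalingGroupName": group_name,
        "TopicARN": topic_arn,
        "NotificationTypes": [NOTIFICATION_TYPES[event] for event in events],
    }
```

A call such as `notification_request("my-asg", "arn:aws:sns:us-east-1:123456789012:my-topic", ["launch", "terminate"])` yields a dict that could be unpacked into the SDK call with `**`.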
Launch Configuration
- Template (blueprint) for instance to be launched by AS
- AMI
- Instance type
- Spot Instances (Yes/No)
- Detailed monitoring
- Role
- Public IP assignment
- Additional Storage (EBS)
- Security Groups
- Keypair
- Each ASG has exactly 1 (current) launch configuration
- Cannot be edited - must be cloned
- Updating the Launch Configuration does NOT impact existing instances (only newly launched ones)
- As of 2018 recommended to use Launch Templates
Healthcheck
- Instance starts as healthy
- Type
- EC2 (default)
- System Status
- Instance Status
- ELB (optional)
- AS Reports instance as unhealthy if ELB reports OutOfService
- Combined with EC2 healthcheck (logical "AND")
- Custom healthcheck
- Manually notify AS that an instance is healthy/unhealthy (set-instance-health)
- Overrides the health status
- e.g. Can be used to mark the instance as healthy when it is rebooted
- HealthcheckGracePeriod
- Amount of time to wait before AS starts relying on Healthcheck
- By default assumes Healthy
- Used when new instance is started to give it time to prepare itself
- If lifecycle hook is attached the time starts AFTER it is completed
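How the health inputs above combine can be modelled in a few lines. This is a simplified sketch of the rules in these notes, not the service's actual implementation: the function name, its parameters, and the ordering (grace period first, then a custom override, then EC2 and ELB checks) are assumptions made for illustration:

```python
from typing import Optional


def effective_health(
    system_status_ok: bool,
    instance_status_ok: bool,
    elb_in_service: Optional[bool],   # None when no ELB health check is enabled
    seconds_in_service: float,
    grace_period: float,
    custom_override: Optional[str] = None,  # "Healthy" / "Unhealthy" set manually
) -> str:
    # During the grace period the instance is assumed healthy.
    if seconds_in_service < grace_period:
        return "Healthy"
    # A manually reported status (set-instance-health) overrides the checks.
    if custom_override is not None:
        return custom_override
    # EC2 health check: both status checks must pass.
    if not (system_status_ok and instance_status_ok):
        return "Unhealthy"
    # ELB health check is combined with the EC2 check (logical AND).
    if elb_in_service is False:
        return "Unhealthy"
    return "Healthy"
```

With a 300 second grace period, an instance that the ELB reports as OutOfService is still treated as healthy at 10 seconds and becomes unhealthy once the grace period has passed.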
Healthcheck Replacement
- When instance is marked unhealthy it is immediately marked for termination
- e.g. when the instance is stopped
- You can attempt to set the status back to Healthy (SetInstanceHealth), but there is a race condition with the termination
- Subsequently new activity to launch replacement is initiated
- EIP and EBS volumes are NOT attached to replacement instance
- You can handle this via User data script
Auto Scaling Instance Lifecycle
- Pending (instance launched/attached)
- Pending:Wait (lifecycle hook: 1h)
- Pending:Proceed
- InService (passed healthcheck)
- EnteringStandby
- Standby
- Call "ExitStandby" to return to Pending
- Terminating
- Terminating:Wait (lifecycle hook: 1h)
- Terminating:Proceed
- Terminated
- Detaching
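The states above form a small state machine. The sketch below encodes only the transitions named in these notes, assuming lifecycle hooks are attached and complete with CONTINUE; the table and the helper `is_valid_path` are illustrative and deliberately simplified:

```python
# Allowed transitions between the lifecycle states listed above (simplified).
TRANSITIONS = {
    "Pending": ["Pending:Wait"],
    "Pending:Wait": ["Pending:Proceed"],
    "Pending:Proceed": ["InService"],
    "InService": ["EnteringStandby", "Terminating", "Detaching"],
    "EnteringStandby": ["Standby"],
    "Standby": ["Pending"],            # via ExitStandby
    "Terminating": ["Terminating:Wait"],
    "Terminating:Wait": ["Terminating:Proceed"],
    "Terminating:Proceed": ["Terminated"],
    "Terminated": [],
    "Detaching": [],
}


def is_valid_path(states):
    """Return True if every consecutive pair of states is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, [])
        for cur, nxt in zip(states, states[1:])
    )
```

For instance, `["InService", "EnteringStandby", "Standby", "Pending"]` is a valid path, while jumping straight from `InService` to `Standby` is not.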
Lifecycle hooks
- Custom code executed when instance enters "Wait" state
- launch - Pending:Wait, e.g.
- Install additional software with CodeDeploy
- Fill-up cache
- terminate - Terminating:Wait, e.g.
- Analyze crashed instance
- Retrieve Logs
- Copy data out of instance
- AS sends notification to Notification Target:
- Targets
- SQS
- SNS
- CloudWatch Events (e.g. Lambda)
- Code must return result CompleteLifecycleAction
- CONTINUE
- ABANDON
- On Terminating:Wait the instance still terminates, but no other lifecycle hooks are executed
- Timeout: default 60 minutes
- Can be extended with RecordLifecycleActionHeartbeat
- Cooldown starts AFTER hook is completed
- AS is frozen for a longer period of time
- Max 50 hooks per Auto Scaling group
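A hook handler typically reads the notification, does its work, and then reports CONTINUE or ABANDON. The sketch below shows only the last step: it turns a lifecycle notification into the parameters for `CompleteLifecycleAction`. It assumes the raw JSON message body has already been extracted from the SQS/SNS envelope; `build_completion` is a hypothetical helper and makes no AWS call itself:

```python
import json


def build_completion(message_body: str, succeeded: bool) -> dict:
    """Build CompleteLifecycleAction parameters from a hook notification."""
    msg = json.loads(message_body)
    return {
        "LifecycleHookName": msg["LifecycleHookName"],
        "AutoScalingGroupName": msg["AutoScalingGroupName"],
        "LifecycleActionToken": msg["LifecycleActionToken"],
        "InstanceId": msg["EC2InstanceId"],
        # CONTINUE lets the instance proceed; ABANDON gives up on it.
        "LifecycleActionResult": "CONTINUE" if succeeded else "ABANDON",
    }
```

The resulting dict could be unpacked into the boto3 `complete_lifecycle_action` call. For long-running work, the same identifying fields could be used for a `RecordLifecycleActionHeartbeat` request before the timeout expires.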
Instance actions
- Attach existing EC2 instance
- Must not be part of other ASG
- AMI must exist
- Increases Desired Count
- Lifecycle
- Detach EC2 instance from ASG
- Can be used to move to different ASG
- Lifecycle
- Specify if you want to decrement "Desired"
- Standby
- manually take instance out of ASG
- Instance is deregistered from ELB (if applicable)
- No healthcheck performed
- Use cases
- Update or modify instance
- Troubleshoot
- By default "Desired" decremented (i.e. no replacement launched)
ELB integration
- ASG can use multiple ELBs
- Max 50 per ASG
- If any healthcheck fails the instance is marked unhealthy by ASG
- even if all other ELBs consider it OK
- Use case
- Each ELB has a different SSL certificate associated
- ELB points to ASG rather than specific instances inside it
- ASG can re-use
- ELB healthcheck
- Connection Draining (waits before termination)
Scaling
- Methods
- Manual
- Schedule
- Dynamic
- Simple
- Step
- Target Tracking
Scaling Manually
- Manually change the size of the ASG
Scaling by Schedule
- When you can predict exact dates
- Maximum 125 scheduled actions per Auto Scaling group
- Similar to programming a room thermostat
- Group size properties change (min, desired, max)
- Types
- One time - start time
- Recurring - cron syntax
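The two schedule types above differ only in whether a start time or a cron expression is supplied. A minimal sketch that assembles the parameters, assuming they are meant for the boto3 `put_scheduled_update_group_action` call; the helper name and its validation rules are illustrative, not taken from the SDK:

```python
def scheduled_action_request(group_name, action_name, *, recurrence=None,
                             start_time=None, min_size=None, max_size=None,
                             desired=None):
    """Build parameters for a one-time or recurring scheduled action."""
    if recurrence is None and start_time is None:
        raise ValueError("need a start time (one time) or a cron recurrence")
    if recurrence is not None and len(recurrence.split()) != 5:
        raise ValueError("recurrence must be a 5-field cron expression")

    # Only the size properties that were actually given are sent.
    sizes = {"MinSize": min_size, "MaxSize": max_size, "DesiredCapacity": desired}
    sizes = {key: value for key, value in sizes.items() if value is not None}
    if not sizes:
        raise ValueError("at least one of min, max, desired must be set")

    request = {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
        **sizes,
    }
    if recurrence is not None:
        request["Recurrence"] = recurrence
    if start_time is not None:
        request["StartTime"] = start_time
    return request
```

A weekday morning scale-out might be expressed as `scheduled_action_request("my-asg", "weekday-morning", recurrence="0 8 * * 1-5", min_size=2, max_size=10, desired=4)`; the group and action names here are placeholders.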
Scaling by Policy (dynamic)
- Scaling Adjustment Types
- ChangeInCapacity(+/- number_of_instances)
- ExactCapacity(number_of_instances)
- PercentChangeInCapacity(+/- percent_change_in_capacity)
- MinAdjustmentMagnitude (minimum number of instances)
- Rounding
- (0,1) => 1
- (-1,0) => -1
- (-inf,-1] u [1,+inf) => drop fraction part (truncate toward zero)
- Policy Type
- Simple Scaling - single adjustment
- Supports any ALARM
- When the alarm is breached, the defined adjustment occurs (e.g. 3->8)
- Cooldown supported
- Step Scaling
- Recommended
- One or more steps
- Responds to the magnitude of the Alarm (not just binary: ALARM/OK)
- Warm-up supported
- Target Tracking
- Supported metrics
- ALB Request Count per Target
- Average CPU Utilization
- Average Network In
- Average Network Out
- When breached scale-out occurs
- Works like thermostat ("I want average CPU Utilization to be < 50%")
- Warm-up supported
- Scale-in can be disabled
- Based on CloudWatch Alarms
- e.g. CPU Utilization, ELB Latency, ELB RequestCount, SQS ApproximateNumberOfMessagesVisible
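The three adjustment types and the rounding rule for PercentChangeInCapacity can be worked through in code. A minimal sketch of the arithmetic as described in these notes; the function `apply_adjustment` and its parameter names are hypothetical, and the final clamp to the group's min and max is included as an assumption about how the result is bounded:

```python
def apply_adjustment(current, adjustment_type, value, min_size, max_size,
                     min_adjustment_magnitude=0):
    """Return the new desired capacity after one scaling adjustment."""
    if adjustment_type == "ExactCapacity":
        target = int(value)
    elif adjustment_type == "ChangeInCapacity":
        target = current + int(value)
    elif adjustment_type == "PercentChangeInCapacity":
        raw = current * value / 100.0
        if raw == 0:
            change = 0
        elif abs(raw) < 1:
            # Fractions between -1 and 1 become a change of one instance.
            change = 1 if raw > 0 else -1
        else:
            # Otherwise the fractional part is dropped (truncate toward zero).
            change = int(raw)
        # MinAdjustmentMagnitude enforces a minimum step size.
        if change != 0 and abs(change) < min_adjustment_magnitude:
            change = min_adjustment_magnitude if change > 0 else -min_adjustment_magnitude
        target = current + change
    else:
        raise ValueError(f"unknown adjustment type: {adjustment_type}")
    # The result never leaves the [min_size, max_size] range.
    return max(min_size, min(target, max_size))
```

For example, with 10 instances a +35% change gives 3.5, which is truncated to 3, so the group grows to 13; a -5% change gives -0.5, which is rounded to -1, so the group shrinks to 9.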
Oscillations
- Adding/removing instances changes the state of the system
- This may cause oscillation behavior
- In order to damp oscillations two mechanisms are provided
- Cooldown
- Period of time to wait before another scaling action
- i.e. how long to wait for the previous action to take effect
- Damps oscillations
- Supported for Simple Scaling Policy
- Locks the entire ASG
- Warm-up
- Supported by Step Scaling Policy and Target Tracking Policy
- Period of time after adding new instance when it is not counted towards aggregated metrics
- Prevents adding or terminating too many instances
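The two damping mechanisms above can be illustrated with a toy model. This is a sketch under stated assumptions, not the service's logic: times are plain seconds, `cooldown_allows_scaling` and `aggregated_metric` are hypothetical helpers, and the metric is a simple average:

```python
def cooldown_allows_scaling(now, last_activity_finished_at, cooldown_seconds):
    """Cooldown: block new scaling actions until the period has elapsed."""
    return now - last_activity_finished_at >= cooldown_seconds


def aggregated_metric(samples, now, warmup_seconds):
    """Warm-up: average a metric over instances that finished warming up.

    samples is a list of (launched_at, value) pairs, one per instance.
    Returns None when no instance has completed its warm-up yet.
    """
    warm = [value for launched_at, value in samples
            if now - launched_at >= warmup_seconds]
    return sum(warm) / len(warm) if warm else None
```

With a 300 second warm-up, an instance launched 50 seconds ago and reporting 5% CPU is left out of the average, so two older instances at 80% and 60% still aggregate to 70% rather than being pulled down by the newcomer.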
Termination Policy
- How AWS decides which instance to terminate on scale-in
- Firstly - always try to balance AZ (choose random AZ if all have the same instance count)
- Secondly
- Default (OCR)
- OldestLaunchConfiguration
- ClosestToNextInstanceHour
- Random
- Custom
- OldestInstance
- NewestInstance
- OldestLaunchConfiguration
- ClosestToNextInstanceHour
- Multiple policies can be associated with ASG
- e.g. ("OldestLaunchConfiguration", "NewestInstance", "Default")
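The default selection order described above (balance AZs first, then OldestLaunchConfiguration, ClosestToNextInstanceHour, Random) can be sketched as a chain of filters. The `Instance` fields and the function name are illustrative assumptions; in particular `launch_config_age` is a made-up rank standing in for the age of the launch configuration:

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    az: str
    launch_config_age: int  # larger means older launch configuration (illustrative)
    uptime_seconds: int


def pick_for_termination(instances, rng=random):
    """Pick one instance from a non-empty list using the default policy order."""
    # Firstly: choose the AZ with the most instances (random among ties).
    counts = Counter(inst.az for inst in instances)
    most = max(counts.values())
    az = rng.choice(sorted(a for a, c in counts.items() if c == most))
    candidates = [inst for inst in instances if inst.az == az]

    # OldestLaunchConfiguration
    oldest = max(inst.launch_config_age for inst in candidates)
    candidates = [inst for inst in candidates if inst.launch_config_age == oldest]

    # ClosestToNextInstanceHour: furthest into the current hour of uptime.
    closest = max(inst.uptime_seconds % 3600 for inst in candidates)
    candidates = [inst for inst in candidates
                  if inst.uptime_seconds % 3600 == closest]

    # Random among whatever is left.
    return rng.choice(candidates)
```

Each filter only narrows the candidate list, so a later rule acts purely as a tie-breaker for the earlier ones.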
Auto Scaling Processes
- Independent processes (workers) that perform state transitions
- Can be individually suspended/resumed (e.g. for debugging)
- Administrative suspension
- All processes in the group are suspended
- Happens when the group has failed to launch instances for 24h
- Can be resumed
- Types
- Launch - add new instances to the group
- Terminate - removes instances from the group
- Healthcheck - checks the health status
- ReplaceUnhealthy
- Uses: Healthcheck, Terminate, Launch
- AZRebalance
- Balance instance count between AZ
- When AZ is removed from a group
- AZ is failing or has recovered
- Instance is explicitly terminated
- Uses: Launch (before termination)
- Unlike Healthcheck Replacement that kills the instance first
- AlarmNotification
- Accepts and reacts on CW Alarms associated with a group
- Required for executing policies based on ALARM triggers
- ScheduledActions
- Performs scheduled actions
- AddToLoadBalancer
- adds launched instances to ELB
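Suspending individual processes comes down to naming them in a request. A minimal sketch that validates the names and builds the parameters, assuming they are intended for the boto3 `suspend_processes` (or `resume_processes`) call; `suspend_request` is a hypothetical helper, and note that the API spells the health check process `HealthCheck`:

```python
# Process names from the list above, in the spelling the API expects.
PROCESS_NAMES = {
    "Launch",
    "Terminate",
    "HealthCheck",
    "ReplaceUnhealthy",
    "AZRebalance",
    "AlarmNotification",
    "ScheduledActions",
    "AddToLoadBalancer",
}


def suspend_request(group_name, processes):
    """Build the parameter dict for suspending (or resuming) processes."""
    unknown = sorted(set(processes) - PROCESS_NAMES)
    if unknown:
        raise ValueError(f"unknown scaling processes: {unknown}")
    return {
        "AutoScalingGroupName": group_name,
        "ScalingProcesses": sorted(set(processes)),
    }
```

For debugging, suspending `AZRebalance` and `ReplaceUnhealthy` together keeps the group from replacing or moving instances while they are being inspected; `Launch` and `Terminate` are best left running, since the other processes depend on them.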
CloudWatch metrics
- Auto Scaling maintains aggregated instance metrics for all instances in the group (e.g. CPUUtilization)
- Identical to EC2 but the dimension is AutoScalingGroupName (not InstanceId)
- Auto Scaling Metrics
- GroupMinSize
- GroupMaxSize
- GroupDesiredCapacity
- GroupInServiceInstances
- ...
Spot Instances
- Can be used with ASG
- Require separate Launch Configuration
- Specify bid price
- Cannot be modified as Launch Configuration is immutable
- Cannot mix on-demand and spot
- When spot instance is interrupted AS tries to launch replacement