Overview
- Monitoring system
- Supports Dashboards
PutMetricData
- Timestamp
- Accepted range: up to 2 weeks in the past, up to 2 hours in the future
- Max request size: 8 KB (HTTP GET), 40 KB (HTTP POST)
- Required
- Namespace, MetricData (List<MetricDatum>)
- Delay
- 2 minutes before get-metric-statistics returns it
- Data timestamped 24h or more in the past may take at least 48h
- 15 minutes before list-metrics returns it
- Number of data points
- Single
- Multiple
- Max 20 per request
- Statistic set (Min, Max, SampleCount, Sum)
- Average is calculated automatically (Sum / SampleCount)
- Percentiles need raw data
- Exception: SampleCount == 1 OR Min == Max (but these scenarios are of little use)
- None
- Recommended: publish "0" instead of no value to avoid INSUFFICIENT_DATA, but this may be expensive
- Alarms have a property (TreatMissingData) that defines how missing data points are treated
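The PutMetricData rules above can be checked client-side before anything is sent. This is a minimal sketch, assuming metric data is built as plain dicts shaped like the `Namespace`/`MetricData` parameters (the shape boto3 accepts); the helper names `timestamp_in_window`, `batch_requests`, `statistic_set_average` and `percentiles_available` are hypothetical, and the 20-datum limit is the one stated in these notes. Nothing here talks to AWS.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterator, List

# Limits as listed in these notes (hypothetical constants; check current quotas)
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)
MAX_DATUMS_PER_REQUEST = 20


def timestamp_in_window(ts: datetime, now: datetime) -> bool:
    """True if ts lies between 2 weeks in the past and 2 hours in the future."""
    return now - MAX_PAST <= ts <= now + MAX_FUTURE


def batch_requests(namespace: str, datums: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Split datums into PutMetricData-shaped requests of at most 20 datums each."""
    for start in range(0, len(datums), MAX_DATUMS_PER_REQUEST):
        yield {"Namespace": namespace,
               "MetricData": datums[start:start + MAX_DATUMS_PER_REQUEST]}


def statistic_set_average(stats: Dict[str, float]) -> float:
    """Average derived from a statistic set: Sum / SampleCount."""
    return stats["Sum"] / stats["SampleCount"]


def percentiles_available(stats: Dict[str, float]) -> bool:
    """Percentiles from a statistic set only if SampleCount == 1 or Minimum == Maximum."""
    return stats["SampleCount"] == 1 or stats["Minimum"] == stats["Maximum"]


now = datetime.now(timezone.utc)
# 45 single-value datums are split into 3 requests (20 + 20 + 5)
datums = [{"MetricName": "Latency", "Timestamp": now, "Value": float(i), "Unit": "Milliseconds"}
          for i in range(45)]
requests = list(batch_requests("MyApp", datums))
print([len(r["MetricData"]) for r in requests])  # [20, 20, 5]

# One statistic-set value summarising 4 samples
stats = {"SampleCount": 4, "Sum": 10.0, "Minimum": 1.0, "Maximum": 4.0}
print(statistic_set_average(stats))  # 2.5
print(percentiles_available(stats))  # False
print(timestamp_in_window(now - timedelta(days=15), now))  # False
```

The payload-size limits (8 KB GET / 40 KB POST) are not checked here. With boto3, each request dict could be passed as `client.put_metric_data(**request)`; a statistic-set datum would carry `StatisticValues` (`SampleCount`, `Sum`, `Minimum`, `Maximum`) instead of `Value`.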
Monitoring Frequency
- Basic (5 minutes)
- EC2 (default)
- SNS
- Detailed (1 minute)
- EC2 (extra cost)
- Auto Scaling
- ELB
- RDS
- Route 53
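To make the two frequencies concrete, the arithmetic below computes data points per day and a rough minimum time before an alarm can fire. It is a sketch that assumes the alarm period equals the publishing period and ignores ingestion delay; `datapoints_per_day` and `min_seconds_to_alarm` are hypothetical helpers.

```python
# Publishing periods from the list above, in seconds
PERIOD_SECONDS = {"basic": 5 * 60, "detailed": 60}


def datapoints_per_day(monitoring: str) -> int:
    """Number of data points a metric produces in 24 hours."""
    return 24 * 60 * 60 // PERIOD_SECONDS[monitoring]


def min_seconds_to_alarm(monitoring: str, evaluation_periods: int) -> int:
    """Rough time a condition must persist before an alarm needing
    `evaluation_periods` breaching periods can go to ALARM
    (assumes alarm period == publishing period, ignores delays)."""
    return evaluation_periods * PERIOD_SECONDS[monitoring]


for mode in ("basic", "detailed"):
    print(mode, datapoints_per_day(mode), min_seconds_to_alarm(mode, 3))
# basic 288 900
# detailed 1440 180
```

Under these assumptions, an alarm that needs 3 breaching periods reacts in about 15 minutes with basic monitoring and about 3 minutes with detailed monitoring.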
Dimensions
- Name/value pairs associated with a metric
- Categories that can be used to filter metrics
- Each unique combination of dimensions defines a separate metric
- CloudWatch treats metrics with different dimensions as unrelated
- Examples
- EC2 instance ID (InstanceId)
- ELB name (LoadBalancerName)
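The dimension rules above amount to this: a metric's identity is its namespace, its name and its full set of dimensions. The toy in-memory model below is not the CloudWatch API; `MetricId`, `put`, the `MyApp` namespace, the instance IDs and the `Env` dimension are all hypothetical. It shows how adding a dimension yields a separate metric.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class MetricId:
    """Toy model: a metric is identified by namespace, name and its full dimension set."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]]


def metric_id(namespace: str, name: str, dimensions: Dict[str, str]) -> MetricId:
    # frozenset makes the identity independent of dimension order
    return MetricId(namespace, name, frozenset(dimensions.items()))


store: Dict[MetricId, List[float]] = {}


def put(namespace: str, name: str, dimensions: Dict[str, str], value: float) -> None:
    store.setdefault(metric_id(namespace, name, dimensions), []).append(value)


put("MyApp", "MemoryUtilization", {"InstanceId": "i-aaa"}, 40.0)
put("MyApp", "MemoryUtilization", {"InstanceId": "i-bbb"}, 60.0)
# Adding a second dimension creates yet another, unrelated metric
put("MyApp", "MemoryUtilization", {"InstanceId": "i-aaa", "Env": "prod"}, 42.0)

print(len(store))  # 3
print(store[metric_id("MyApp", "MemoryUtilization", {"InstanceId": "i-aaa"})])  # [40.0]
```

The value published with the extra `Env` dimension does not show up under the `InstanceId`-only metric, which is the "no relationship" point in the list above.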
Alarm
- State can be set from the CLI to simulate a state change (set-alarm-state)
- Alarm actions can be disabled from the CLI (disable-alarm-actions)
- The alarm still changes state, but no action is taken
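From the CLI, `aws cloudwatch set-alarm-state` (with `--alarm-name`, `--state-value`, `--state-reason`) forces a state for testing, and `aws cloudwatch disable-alarm-actions` (with `--alarm-names`) turns actions off. The toy model below is a simplified, hypothetical illustration of the resulting behaviour, not CloudWatch's implementation: it runs a single action list on every state change, whereas real alarms keep separate OK, ALARM and INSUFFICIENT_DATA action lists.

```python
from typing import Callable, List

STATES = ("OK", "ALARM", "INSUFFICIENT_DATA")


class Alarm:
    """Toy alarm: state transitions always happen,
    actions run only while actions are enabled."""

    def __init__(self, actions: List[Callable[[str], None]]) -> None:
        self.state = "INSUFFICIENT_DATA"
        self.actions = actions
        self.actions_enabled = True

    def set_state(self, new_state: str) -> None:
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state}")
        changed = new_state != self.state
        self.state = new_state
        # Actions fire only on an actual transition and only if enabled
        if changed and self.actions_enabled:
            for action in self.actions:
                action(new_state)


fired: List[str] = []
alarm = Alarm(actions=[fired.append])
alarm.set_state("ALARM")       # like set-alarm-state: the action runs
alarm.actions_enabled = False  # like disable-alarm-actions
alarm.set_state("OK")          # state changes, but no action is taken
print(alarm.state, fired)      # OK ['ALARM']
```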
Alarm actions
- Types
- Reboot
- Recover (keeps the instance ID and private IP address)
- Restrictions: only EBS-backed instances, newer instance types, shared (default) tenancy
- Max 3 attempts
- Terminate
- Stop
- Use cases
- Stop impaired instance (StatusCheckFailed_System)
- Stop instance with memory leak (custom metrics)
- Stop idle instance (CPUUtilization)
- Stop web servers with unusually high outbound traffic (NetworkOut)
- Terminate instance when a batch job is completed (NetworkOut)
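As an example of the "stop idle instance" use case, the sketch below builds parameters in the shape accepted by PutMetricAlarm (boto3's `put_metric_alarm`) and uses the EC2 action ARN format `arn:aws:automate:<region>:ec2:<action>`. The helper names, the threshold (5% CPU), the 6 x 5-minute evaluation window, the alarm name and the instance ID are assumed values chosen for illustration.

```python
from typing import Any, Dict


def ec2_action_arn(region: str, action: str) -> str:
    """EC2 alarm action ARN, e.g. arn:aws:automate:us-east-1:ec2:stop."""
    if action not in ("stop", "terminate", "reboot", "recover"):
        raise ValueError(f"unsupported EC2 action: {action}")
    return f"arn:aws:automate:{region}:ec2:{action}"


def idle_instance_alarm(region: str, instance_id: str,
                        cpu_threshold: float = 5.0) -> Dict[str, Any]:
    """PutMetricAlarm parameters: stop the instance when average CPU stays
    below cpu_threshold for 6 consecutive 5-minute periods (assumed values)."""
    return {
        "AlarmName": f"stop-idle-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 6,
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [ec2_action_arn(region, "stop")],
    }


params = idle_instance_alarm("us-east-1", "i-0123456789abcdef0")
print(params["AlarmActions"])  # ['arn:aws:automate:us-east-1:ec2:stop']
```

With boto3 these could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`. The other use cases follow the same pattern with a different metric, comparison operator and action (for example `ec2_action_arn(region, "terminate")`).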