Notes on AWS, Big Data, Machine Learning and Leadership: AWS CloudWatch

Monday, 12 March 2018

AWS CloudWatch

Overview

Monitoring system
Supports Dashboards

PutMetricData

Timestamp
- Accepted range: up to 2 weeks in the past and up to 2 hours in the future
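You can check the accepted timestamp window on the client side before calling PutMetricData, so stale data points are caught early. The sketch below uses the 2-week/2-hour limits from these notes (March 2018). It treats both boundaries as inclusive, which is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Limits as noted in this post (2018); check the current PutMetricData docs.
MAX_PAST = timedelta(weeks=2)
MAX_FUTURE = timedelta(hours=2)


def timestamp_accepted(ts: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if ts lies inside the window CloudWatch accepts.

    Boundaries are treated as inclusive (assumption).
    """
    now = now or datetime.now(timezone.utc)
    return now - MAX_PAST <= ts <= now + MAX_FUTURE
```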
Max request size: GET = 8 KB, POST = 40 KB
Required parameters
- Namespace, List<MetricDatum>
Delay
- Up to 2 minutes before get-metric-statistics returns the data
  - Data timestamped 24h or more in the past may take 48h or longer
- Up to 15 minutes before list-metrics returns a new metric
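The two required parameters, Namespace and a list of MetricDatum, map directly onto the keyword arguments of boto3's `put_metric_data`. The helper below only builds that request as a dict and does not call AWS. The namespace `MyApp`, the metric `RequestLatency` and the `Service` dimension in the test are hypothetical examples.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_put_metric_data_request(
    namespace: str,
    metric_name: str,
    value: float,
    unit: Optional[str] = None,
    dimensions: Optional[Dict[str, str]] = None,
    timestamp: Optional[datetime] = None,
) -> dict:
    """Build kwargs for a single-datum PutMetricData call (no network I/O)."""
    datum = {
        "MetricName": metric_name,
        "Value": float(value),
        # Default to "now" in UTC if the caller gives no timestamp.
        "Timestamp": timestamp or datetime.now(timezone.utc),
    }
    if unit is not None:
        datum["Unit"] = unit
    if dimensions:
        # Dimensions are sent as a list of Name/Value pairs.
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return {"Namespace": namespace, "MetricData": [datum]}
```

With credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_data(**request)`. Given the delays listed above, do not expect the point to show up in get-metric-statistics straight away.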
Number of data points
- Single
- Multiple
  - Max 20 per request
- Statistic set (Min, Max, SampleCount, Sum)
  - Average is calculated automatically (Sum / SampleCount)
  - Percentiles need raw data
    - Exception: SampleCount == 1 OR Min == Max (the set then holds a single distinct value, so this is of little practical use)
- None
  - Publishing "0" is recommended to avoid INSUFFICIENT_DATA, but this may be expensive
    - Alarms have a property (TreatMissingData) that controls how missing data is handled
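The statistic-set rules and the 20-datum limit can be sketched in plain Python. `StatisticSet` and `chunk` are hypothetical helpers, not part of any AWS SDK. `to_statistic_values` produces the shape of a MetricDatum's `StatisticValues` field. `percentiles_available` encodes the exception noted above. `chunk` splits data points into batches that fit one request each.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

MAX_DATUMS_PER_REQUEST = 20  # limit as noted in this post (2018)


@dataclass
class StatisticSet:
    sample_count: int
    total: float  # called "Sum" in the API
    minimum: float
    maximum: float

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "StatisticSet":
        # Precondition: values is non-empty (min/max fail on empty input).
        vals = list(values)
        return cls(len(vals), sum(vals), min(vals), max(vals))

    @property
    def average(self) -> float:
        # CloudWatch derives Average as Sum / SampleCount.
        return self.total / self.sample_count

    def percentiles_available(self) -> bool:
        # Percentiles need raw data unless the set holds one distinct value.
        return self.sample_count == 1 or self.minimum == self.maximum

    def to_statistic_values(self) -> dict:
        return {
            "SampleCount": self.sample_count,
            "Sum": self.total,
            "Minimum": self.minimum,
            "Maximum": self.maximum,
        }


def chunk(datums: Iterable, size: int = MAX_DATUMS_PER_REQUEST) -> Iterator[List]:
    """Yield lists of at most `size` data points, one list per request."""
    batch: List = []
    for d in datums:
        batch.append(d)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Pre-aggregating into a statistic set cuts the number of data points sent. The trade-off is that raw-data percentiles are lost.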

Monitoring Frequency

Basic (5 minutes)
- EC2 (default)
- SNS
Detailed (1 minute)
- EC2 (extra cost)
- Auto Scaling
- ELB
- RDS
- Route 53

Dimensions

Name/value pairs associated with a metric
Categories that can be used to filter metrics
Adding a dimension creates a new, unique metric
- CloudWatch treats it as unrelated to metrics that have different dimensions
Examples
- EC2 InstanceId
- ELB name
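One way to see that "adding a dimension creates a unique metric" is to model a metric's identity as namespace plus name plus the full set of dimensions. The helper below is a hypothetical illustration, not CloudWatch code. Two metrics are the same only if every dimension matches, and dimension order does not matter.

```python
from typing import Dict, FrozenSet, Optional, Tuple

MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(namespace: str, name: str, dimensions: Optional[Dict[str, str]] = None) -> MetricKey:
    """Identity of a metric: namespace, name and the complete dimension set."""
    return (namespace, name, frozenset((dimensions or {}).items()))
```

For example, `CPUUtilization` with `InstanceId` alone and `CPUUtilization` with `InstanceId` plus `AutoScalingGroupName` yield different keys. A query must therefore specify the exact dimension combination that was published.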

Alarm

State can be set from the CLI to simulate a transition (set-alarm-state)
Actions can be disabled from the CLI (disable-alarm-actions)
- The alarm still changes state, but no action is taken
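The interaction between a simulated state change and disabled actions can be shown with a simplified model. The real commands are `aws cloudwatch set-alarm-state` and `aws cloudwatch disable-alarm-actions`. The `Alarm` class below is hypothetical: it ignores evaluation periods and separate OK/ALARM action lists, and just records an action whenever the state actually changes while actions are enabled.

```python
from dataclasses import dataclass, field
from typing import List

STATES = ("OK", "ALARM", "INSUFFICIENT_DATA")


@dataclass
class Alarm:
    name: str
    actions_enabled: bool = True
    state: str = "INSUFFICIENT_DATA"
    fired: List[str] = field(default_factory=list)  # log of actions that ran

    def set_state(self, new_state: str, reason: str) -> None:
        """Change state like set-alarm-state; run an action only if enabled."""
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state}")
        if new_state == self.state:
            return  # no transition, so nothing to trigger
        self.state = new_state
        if self.actions_enabled:
            self.fired.append(f"{new_state}: {reason}")
```

With actions disabled, the state history still moves between OK and ALARM, but the `fired` log stays unchanged. This matches the note above.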

Alarm actions

Types
- Reboot
- Recover (keeps: instance ID, private IP address)
  - Restrictions: only EBS-backed instances, newer instance types, shared tenancy
  - Max 3 attempts
- Terminate
- Stop
Use cases
- Stop impaired instance (StatusCheckFailed_System)
- Stop instance with memory leak (custom metrics)
- Stop idle instance (CPUUtilization)
- Stop web servers with unusually high outbound traffic (NetworkOut)
- Terminate an instance when its batch job has completed (NetworkOut)
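The "stop idle instance (CPUUtilization)" use case can be written as a set of keyword arguments for boto3's `put_metric_alarm`. The helper builds the dict only and makes no AWS call. It assumes the EC2 action ARN format `arn:aws:automate:<region>:ec2:<action>`, which you should check against current docs. The 5% threshold, the three 5-minute periods and the alarm-name pattern are hypothetical choices.

```python
VALID_EC2_ACTIONS = ("stop", "terminate", "reboot", "recover")


def ec2_action_arn(region: str, action: str) -> str:
    """ARN for a built-in EC2 alarm action (format assumed, see lead-in)."""
    if action not in VALID_EC2_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return f"arn:aws:automate:{region}:ec2:{action}"


def idle_instance_alarm(instance_id: str, region: str, action: str = "stop",
                        threshold_percent: float = 5.0, periods: int = 3) -> dict:
    """kwargs for put_metric_alarm: act when average CPU stays below threshold."""
    return {
        "AlarmName": f"idle-{action}-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,  # matches basic (5-minute) EC2 monitoring
        "EvaluationPeriods": periods,
        "Threshold": threshold_percent,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": [ec2_action_arn(region, action)],
    }
```

Swapping `action="terminate"` and using NetworkOut instead of CPUUtilization would cover the batch-job case in the same shape.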
