Monday, 12 March 2018

AWS CloudWatch

Overview
  • Monitoring system
  • Supports Dashboards


PutMetricData
  • Timestamp
    • Accepted range: up to 2 weeks in the past, up to 2 hours in the future
  • Max size: GET = 8kB, POST = 40kB
  • Required
    • Namespace, List<MetricDatum>
  • Delay
    • 2 minutes before get-metric-statistics returns it
      • Data timestamped 24h or more in the past can take 48h or more
    • 15 minutes before list-metrics returns it
  • Number of data points
    • Single 
    • Multiple
      • Max 20 per request
    • Statistic set (Min, Max, Sample Count, Sum)
      • Average - automatically calculated (Sum/Sample Count)
      • Percentiles need raw data
        • Exception: Sample Count == 1 OR Min == Max (all samples identical, so the percentile is trivial)
    • None
      • Recommended: publish "0" to avoid INSUFFICIENT_DATA, but this may be expensive
        • Alternatively, each alarm has a setting for how to treat missing data
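
The PutMetricData limits above can be sketched in plain Python. This is a minimal, offline illustration and not the AWS SDK: the helper names (`timestamp_accepted`, `statistic_set`, `average`, `batches`) are hypothetical, and the limit of 20 is the figure quoted in these notes. Only the dictionary keys (SampleCount, Sum, Minimum, Maximum) follow the StatisticValues structure of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterator, List

MAX_DATUMS_PER_REQUEST = 20  # per-request limit quoted in these notes


def timestamp_accepted(ts: datetime, now: datetime) -> bool:
    """True if ts lies within (now - 2 weeks, now + 2 hours)."""
    return now - timedelta(weeks=2) <= ts <= now + timedelta(hours=2)


def statistic_set(values: List[float]) -> Dict[str, float]:
    """Summarise raw samples as a StatisticValues-style dict."""
    if not values:
        raise ValueError("a statistic set needs at least one sample")
    return {
        "SampleCount": float(len(values)),
        "Sum": float(sum(values)),
        "Minimum": float(min(values)),
        "Maximum": float(max(values)),
    }


def average(stats: Dict[str, float]) -> float:
    """Average is derived from the set: Sum / SampleCount."""
    return stats["Sum"] / stats["SampleCount"]


def batches(datums: List[dict],
            size: int = MAX_DATUMS_PER_REQUEST) -> Iterator[List[dict]]:
    """Split a list of MetricDatum dicts into request-sized chunks."""
    for start in range(0, len(datums), size):
        yield datums[start:start + size]
```

Each chunk yielded by `batches` would be sent as the MetricData argument of one request. Note that a statistic set discards the raw samples, which is why percentiles cannot be computed from it.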

Monitoring Frequency
  • Basic  (5 minutes)
    • EC2 (default)
    • SNS
  • Detailed (1 minute)
    • EC2 (extra cost)
    • Auto Scaling
    • ELB
    • RDS
    • Route 53

Dimensions
  • Name/Value pairs associated with metric
  • Category that can be used to filter metrics
  • Adding dimension creates a unique metric
    • No relationship with metrics that have different dimensions
  • Examples
    • EC2 instanceId
    • ELB name
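
One way to picture "adding a dimension creates a unique metric" is to treat the metric identity as a key built from namespace, name, and the full set of dimensions. The sketch below is only a mental model under that assumption, not how CloudWatch stores data; `metric_key` and the "MyApp" namespace are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Tuple

MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(namespace: str, name: str,
               dimensions: Optional[Dict[str, str]] = None) -> MetricKey:
    """Identity of a metric: namespace + name + the exact dimension set."""
    return (namespace, name, frozenset((dimensions or {}).items()))


# Same name, different dimension sets: three unrelated metrics.
plain = metric_key("MyApp", "Latency")
per_instance = metric_key("MyApp", "Latency", {"InstanceId": "i-0abc"})
per_instance_and_elb = metric_key(
    "MyApp", "Latency", {"InstanceId": "i-0abc", "LoadBalancerName": "web"}
)
```

Under this model, data published with one dimension set is not rolled up into the metric with fewer dimensions; each key has to be published and queried on its own.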

Alarm
  • State can be set from the CLI (set-alarm-state) to simulate an alarm
  • Actions can be disabled from the CLI (disable-alarm-actions)
    • Alarm still changes state but no action is taken
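
The same two operations are available from Python through boto3 as `set_alarm_state` and `disable_alarm_actions`. A minimal sketch, assuming the caller passes in a CloudWatch client (for example `boto3.client("cloudwatch")`); the wrapper names `simulate_alarm` and `mute_alarm` and the reason text are hypothetical.

```python
def simulate_alarm(client, alarm_name: str, state: str = "ALARM") -> None:
    """Force an alarm into a state to test its actions.

    The change is temporary: the alarm returns to its real state
    at the next evaluation.
    """
    client.set_alarm_state(
        AlarmName=alarm_name,
        StateValue=state,  # "OK", "ALARM" or "INSUFFICIENT_DATA"
        StateReason="Manual test of alarm actions",
    )


def mute_alarm(client, alarm_name: str) -> None:
    """Keep the alarm evaluating but stop it from running its actions."""
    client.disable_alarm_actions(AlarmNames=[alarm_name])
```

Passing the client in keeps the helpers easy to exercise with a stub, without credentials or network access.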


Alarm actions
  • Types
    • Reboot
    • Recover (maintains: instanceId, private IP address)
      • Restrictions: only EBS-backed instances, newer instance types, shared tenancy
      • Max 3 attempts
    • Terminate
    • Stop
  • Use cases
    • Stop impaired instance (StatusCheckFailed_System)
    • Stop instance with memory leak (custom metrics)
    • Stop idle instance (CPUUtilization)
    • Stop web servers with unusually high out traffic (NetworkOut)
    • Terminate when batch job is completed (NetworkOut)
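
As an illustration of the "stop idle instance" use case, the function below builds the keyword arguments for a PutMetricAlarm call. It is a sketch under assumptions: the function name, alarm name, 5% threshold, and one-hour window are made-up example values, while the parameter names and the `arn:aws:automate:<region>:ec2:stop` action ARN follow the CloudWatch API.

```python
from typing import Any, Dict


def stop_idle_instance_alarm(instance_id: str, region: str,
                             cpu_threshold: float = 5.0) -> Dict[str, Any]:
    """Arguments for an alarm that stops an instance after 1h of low CPU."""
    return {
        "AlarmName": f"stop-idle-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # basic monitoring: 5-minute data points
        "EvaluationPeriods": 12,  # 12 x 5 minutes = 1 hour
        "Threshold": cpu_threshold,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "notBreaching",  # missing data does not trigger
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:stop"],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**stop_idle_instance_alarm(instance_id, region))`. Swapping `stop` for `terminate`, `reboot`, or `recover` in the ARN selects the other action types listed above.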


