AWS CloudWatch Alarm

Deploys an AWS CloudWatch metric alarm that monitors a single metric or metric math expression and triggers actions when the value breaches a threshold. Supports M-of-N evaluation to reduce false positives, and can target SNS topics, Auto Scaling policies, or EC2 automation actions.

What Gets Created

When you deploy an AwsCloudwatchAlarm resource, OpenMCF provisions:

  • CloudWatch Metric Alarm — an aws_cloudwatch_metric_alarm resource configured with the specified metric source (single metric or metric math queries), threshold, evaluation window, and actions

No additional sub-resources are created. The alarm is a standalone monitoring resource.

Prerequisites

  • AWS credentials configured via environment variables or OpenMCF provider config
  • A metric to monitor — the metric must exist in CloudWatch (published by an AWS service or custom application)
  • An SNS topic if configuring alarm actions (the most common action target)
  • IAM permissions — cloudwatch:PutMetricAlarm, cloudwatch:DeleteAlarms, cloudwatch:DescribeAlarms, cloudwatch:TagResource

Quick Start

Create a file alarm.yaml:

apiVersion: aws.openmcf.org/v1
kind: AwsCloudwatchAlarm
metadata:
  name: cpu-high
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsCloudwatchAlarm.cpu-high
spec:
  region: us-west-2
  comparisonOperator: GreaterThanThreshold
  evaluationPeriods: 3
  threshold: 80.0
  metricName: CPUUtilization
  namespace: AWS/EC2
  period: 300
  statistic: Average

Deploy:

openmcf apply -f alarm.yaml

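This creates an alarm that enters the ALARM state when average EC2 CPU utilization exceeds 80% for 3 consecutive 5-minute periods (15 minutes in total). No actions are attached, so the state change is visible in CloudWatch but nothing is notified.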

Configuration Reference

Required Fields

| Field | Type | Description | Validation |
| --- | --- | --- | --- |
| region | string | AWS region where the alarm will be created (e.g., us-west-2). | Required |
| comparisonOperator | string | Arithmetic operation comparing the statistic to the threshold. | One of: GreaterThanOrEqualToThreshold, GreaterThanThreshold, LessThanThreshold, LessThanOrEqualToThreshold, LessThanLowerOrGreaterThanUpperThreshold, LessThanLowerThreshold, GreaterThanUpperThreshold |
| evaluationPeriods | int | Number of consecutive periods over which data is compared to the threshold. | >= 1 |

One of the following metric source modes is required:

Simple Metric Mode — set metricName, namespace, period, and one of statistic or extendedStatistic:

| Field | Type | Description | Validation |
| --- | --- | --- | --- |
| metricName | string | CloudWatch metric name (e.g., CPUUtilization, RequestCount). | Max 255 chars. Mutually exclusive with metricQueries. |
| namespace | string | Metric namespace (e.g., AWS/EC2, AWS/SQS). | Required when metricName is set. Max 255 chars. |
| period | int | Period in seconds for statistic evaluation. | Required when metricName is set. Valid: 10, 20, 30, or multiple of 60. |
| statistic | string | Standard statistic. | One of: SampleCount, Average, Sum, Minimum, Maximum. Mutually exclusive with extendedStatistic. |

Metric Query Mode — set metricQueries for metric math or anomaly detection:

| Field | Type | Description | Validation |
| --- | --- | --- | --- |
| metricQueries | object[] | Metric math expressions or multi-metric queries. | Max 20 items. Mutually exclusive with metricName. |
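
The rules in the two tables above can be checked before running a deploy. The following is a minimal Python sketch of such a pre-check, not OpenMCF's own validation: the function names `is_valid_period` and `metric_source_errors` are hypothetical, and the spec is assumed to be already loaded into a plain dict.

```python
def is_valid_period(seconds: int) -> bool:
    """True for 10, 20, 30, or any positive multiple of 60 (the rule stated above)."""
    return seconds in (10, 20, 30) or (seconds > 0 and seconds % 60 == 0)


def metric_source_errors(spec: dict) -> list[str]:
    """Return rule violations for the metric source fields of a spec dict.

    Hypothetical helper: it only encodes the rules listed on this page.
    """
    errors = []
    has_simple = "metricName" in spec
    has_queries = "metricQueries" in spec

    # The two modes are mutually exclusive, and one of them is required.
    if has_simple == has_queries:
        errors.append("set exactly one of metricName or metricQueries")

    if has_simple:
        for field in ("namespace", "period"):
            if field not in spec:
                errors.append(f"{field} is required when metricName is set")
        if ("statistic" in spec) == ("extendedStatistic" in spec):
            errors.append("set exactly one of statistic or extendedStatistic")
        if "period" in spec and not is_valid_period(spec["period"]):
            errors.append("period must be 10, 20, 30, or a multiple of 60")

    return errors
```

Calling `metric_source_errors` on the Quick Start spec returns an empty list; adding `extendedStatistic` next to `statistic` yields one error.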

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| threshold | double | (none) | Static threshold value. Mutually exclusive with thresholdMetricId. |
| thresholdMetricId | string | (none) | ID of the ANOMALY_DETECTION_BAND function for anomaly detection alarms. Mutually exclusive with threshold. |
| datapointsToAlarm | int | Same as evaluationPeriods | Number of breaching data points to trigger alarm (M-of-N evaluation). Must be <= evaluationPeriods. |
| treatMissingData | string | missing | How missing data is treated. One of: missing, ignore, breaching, notBreaching. |
| actionsEnabled | bool | true | Whether actions execute on state transitions. Set to false during tuning or maintenance. |
| extendedStatistic | string | (none) | Percentile statistic (e.g., p95, p99.9, IQM). Mutually exclusive with statistic. |
| dimensions | map<string,string> | (none) | Dimensions to narrow the metric to a specific resource. |
| unit | string | (none) | Filters data points to matching unit. |
| alarmActions | StringValueOrRef[] | [] | Actions for ALARM transitions. Can reference AwsSnsTopic via valueFrom. Max 5. |
| okActions | StringValueOrRef[] | [] | Actions for OK transitions. Can reference AwsSnsTopic via valueFrom. Max 5. |
| insufficientDataActions | StringValueOrRef[] | [] | Actions for INSUFFICIENT_DATA transitions. Can reference AwsSnsTopic via valueFrom. Max 5. |
| alarmDescription | string | (none) | Human-readable description. Max 1024 chars. |
| evaluateLowSampleCountPercentiles | string | (none) | Percentile alarm behavior with low sample counts. One of: evaluate, ignore. |
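
To make the M-of-N rule behind datapointsToAlarm concrete, here is a minimal Python sketch. The function `is_alarming` and its arguments are hypothetical. It assumes every period has a data point, so it ignores treatMissingData, and it covers only the four static-threshold operators. Treat it as a simplification of what CloudWatch evaluates, not a reimplementation.

```python
import operator

# Static-threshold operators only; the anomaly-detection band operators are left out.
COMPARATORS = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
    "LessThanThreshold": operator.lt,
    "LessThanOrEqualToThreshold": operator.le,
}


def is_alarming(datapoints, threshold, comparison_operator,
                evaluation_periods, datapoints_to_alarm=None):
    """Return True when at least M of the last N data points breach the threshold.

    N is evaluation_periods; M is datapoints_to_alarm, which defaults to N.
    """
    if datapoints_to_alarm is None:
        datapoints_to_alarm = evaluation_periods
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapointsToAlarm must be between 1 and evaluationPeriods")

    breaches = COMPARATORS[comparison_operator]
    window = datapoints[-evaluation_periods:]  # the N most recent periods
    breaching = sum(1 for value in window if breaches(value, threshold))
    return breaching >= datapoints_to_alarm
```

With CPU readings of 85, 70, and 90 against a threshold of 80, the default 3-of-3 rule does not alarm, while `datapoints_to_alarm=2` does. A dip in the middle therefore does not reset a 2-of-3 alarm, but it does reset a 3-of-3 alarm.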

Metric Query Fields

Each item in metricQueries:

| Field | Type | Description |
| --- | --- | --- |
| id | string | Required. Variable name for expressions (e.g., m1, errors). Must start with lowercase letter. |
| expression | string | Metric math expression (e.g., m1/m2*100). Mutually exclusive with metric. |
| metric | object | Raw metric definition. Mutually exclusive with expression. |
| metric.metricName | string | Required. Metric name. |
| metric.namespace | string | Required. Metric namespace. |
| metric.period | int | Required. Period in seconds. |
| metric.stat | string | Required. Statistic (standard or extended). |
| metric.dimensions | map<string,string> | Dimensions for the metric. |
| label | string | Display label for the query. |
| returnData | bool | Set true on exactly one query to use its result as the alarm signal. |
| accountId | string | AWS account ID for cross-account monitoring. |
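
The per-query rules in this table can be checked the same way. The sketch below is illustrative only. `metric_query_errors` is a hypothetical helper that encodes just the constraints stated on this page (the 20-item limit, the id rule, the expression/metric exclusivity, and a single returnData query). It assumes metricQueries has been loaded as a list of dicts.

```python
import re


def metric_query_errors(queries: list[dict]) -> list[str]:
    """Return rule violations for a metricQueries list, per the rules on this page."""
    errors = []
    if len(queries) > 20:
        errors.append("metricQueries allows at most 20 items")

    for query in queries:
        query_id = query.get("id", "")
        # Only the first character is checked, since that is the rule stated here.
        if not re.match(r"[a-z]", query_id):
            errors.append(f"{query_id!r}: id must start with a lowercase letter")
        if ("expression" in query) == ("metric" in query):
            errors.append(f"{query_id!r}: set exactly one of expression or metric")

    returning = [q.get("id") for q in queries if q.get("returnData")]
    if len(returning) != 1:
        errors.append("exactly one query must set returnData: true")
    return errors
```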

Examples

Simple Metric with SNS Notification

An EC2 CPU alarm that notifies an SNS topic when average CPU on one instance exceeds 80% in 2 of the last 3 five-minute periods. Missing data counts as breaching:

apiVersion: aws.openmcf.org/v1
kind: AwsCloudwatchAlarm
metadata:
  name: ec2-cpu-alarm
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.AwsCloudwatchAlarm.ec2-cpu-alarm
spec:
  region: us-west-2
  comparisonOperator: GreaterThanThreshold
  evaluationPeriods: 3
  datapointsToAlarm: 2
  threshold: 80.0
  metricName: CPUUtilization
  namespace: AWS/EC2
  period: 300
  statistic: Average
  dimensions:
    InstanceId: i-0abcdef1234567890
  treatMissingData: breaching
  alarmDescription: "EC2 CPU exceeds 80% for 2 of 3 periods"
  alarmActions:
    - value: arn:aws:sns:us-west-2:123456789012:ops-alerts

Error Rate with Metric Math

Computes ALB 5xx error rate as a percentage and alerts when it exceeds 5%:

apiVersion: aws.openmcf.org/v1
kind: AwsCloudwatchAlarm
metadata:
  name: error-rate-alarm
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.AwsCloudwatchAlarm.error-rate-alarm
spec:
  region: us-west-2
  comparisonOperator: GreaterThanThreshold
  evaluationPeriods: 3
  datapointsToAlarm: 2
  threshold: 5.0
  treatMissingData: notBreaching
  alarmDescription: "ALB 5xx error rate exceeds 5%"
  metricQueries:
    - id: errors
      metric:
        metricName: HTTPCode_Target_5XX_Count
        namespace: AWS/ApplicationELB
        period: 300
        stat: Sum
        dimensions:
          LoadBalancer: app/my-alb/1234567890abcdef
    - id: requests
      metric:
        metricName: RequestCount
        namespace: AWS/ApplicationELB
        period: 300
        stat: Sum
        dimensions:
          LoadBalancer: app/my-alb/1234567890abcdef
    - id: error_rate
      expression: "errors/requests*100"
      label: "Error Rate %"
      returnData: true
  alarmActions:
    - value: arn:aws:sns:us-west-2:123456789012:ops-alerts
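
To see how the expression and the 2-of-3 rule in this example interact, the arithmetic can be worked through by hand. The Python sketch below uses made-up sample counts. Treating a period with zero requests as missing is an assumption of this sketch, chosen to line up with treatMissingData: notBreaching, and is not a statement about CloudWatch's metric math engine.

```python
THRESHOLD = 5.0            # spec.threshold
EVALUATION_PERIODS = 3     # spec.evaluationPeriods
DATAPOINTS_TO_ALARM = 2    # spec.datapointsToAlarm


def error_rate_percent(errors: float, requests: float):
    """Mirror the expression errors/requests*100 for one period.

    Returns None when there were no requests (treated as missing in this sketch).
    """
    if requests == 0:
        return None
    return errors / requests * 100


def breaching_count(samples) -> int:
    """Count periods in the evaluation window whose error rate exceeds the threshold."""
    count = 0
    for errors, requests in samples[-EVALUATION_PERIODS:]:
        rate = error_rate_percent(errors, requests)
        # Missing periods are counted as not breaching.
        if rate is not None and rate > THRESHOLD:
            count += 1
    return count


# Hypothetical (5xx count, request count) pairs for three 5-minute periods.
samples = [(12, 200), (3, 300), (40, 500)]
rates = [error_rate_percent(e, r) for e, r in samples]
in_alarm = breaching_count(samples) >= DATAPOINTS_TO_ALARM
```

The three periods give error rates of roughly 6%, 1%, and 8%. Two of the three exceed 5%, so `in_alarm` is true even though the middle period was healthy.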

Production Multi-Action with Foreign Key References

A production SQS depth alarm using valueFrom to reference an OpenMCF-managed SNS topic, with actions on all three state transitions:

apiVersion: aws.openmcf.org/v1
kind: AwsCloudwatchAlarm
metadata:
  name: sqs-depth-alarm
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.AwsCloudwatchAlarm.sqs-depth-alarm
spec:
  region: us-west-2
  comparisonOperator: GreaterThanOrEqualToThreshold
  evaluationPeriods: 5
  datapointsToAlarm: 3
  threshold: 1000.0
  metricName: ApproximateNumberOfMessagesVisible
  namespace: AWS/SQS
  period: 60
  statistic: Maximum
  treatMissingData: notBreaching
  alarmDescription: "SQS queue depth is at or above 1000; consumers may be backed up"
  alarmActions:
    - valueFrom:
        kind: AwsSnsTopic
        name: ops-critical
        fieldPath: status.outputs.topic_arn
  okActions:
    - valueFrom:
        kind: AwsSnsTopic
        name: ops-resolved
        fieldPath: status.outputs.topic_arn
  insufficientDataActions:
    - valueFrom:
        kind: AwsSnsTopic
        name: ops-warnings
        fieldPath: status.outputs.topic_arn

Stack Outputs

After deployment, the following outputs are available in status.outputs:

| Output | Type | Description |
| --- | --- | --- |
| alarm_arn | string | ARN of the CloudWatch metric alarm. Used by composite alarms and operational tooling. |
| alarm_name | string | Name of the alarm, unique within the AWS account and region. |

Related Components

  • AwsSnsTopic — the most common alarm action target for notifications
  • AwsCloudwatchLogGroup — log storage that generates metrics (via metric filters) for alarm evaluation
  • AwsSqsQueue — queues commonly monitored by depth and age alarms
  • AwsLambda — functions monitored by error and duration alarms
