
AWS Athena Workgroup

Deploys an Amazon Athena workgroup with configurable query result storage, result encryption, per-query cost controls, and optional Apache Spark execution support. By default, the workgroup enforces its configuration, so individual queries cannot override the result location or encryption settings.

What Gets Created

When you deploy an AwsAthenaWorkgroup resource, OpenMCF provisions:

  • Athena Workgroup — an aws_athena_workgroup resource with the specified name, configuration enforcement, engine version, and cost controls
  • Result Configuration — created only when resultConfiguration is set, directs query output to the specified S3 location with optional encryption and ACL settings
  • Engine Version — created only when selectedEngineVersion is set, pins the workgroup to a specific Athena or PySpark engine version
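The conditional behavior in the list above can be sketched as a function that maps the spec onto workgroup arguments. This is a hypothetical illustration, not OpenMCF's module code: the name build_workgroup_args is invented, the snake_case keys assume the argument layout of Terraform's aws_athena_workgroup resource, and only a subset of the spec fields is covered.

```python
from typing import Any


def build_workgroup_args(name: str, spec: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical mapping from an AwsAthenaWorkgroup spec to aws_athena_workgroup-style args."""
    configuration: dict[str, Any] = {
        # Documented component defaults apply when a field is omitted.
        "enforce_workgroup_configuration": spec.get("enforceWorkgroupConfiguration", True),
        "publish_cloudwatch_metrics_enabled": spec.get("publishCloudwatchMetricsEnabled", True),
        "requester_pays_enabled": spec.get("requesterPaysEnabled", False),
    }

    # 0 means "no limit", so the cutoff is only passed through when it is non-zero.
    cutoff = spec.get("bytesScannedCutoffPerQuery", 0)
    if cutoff:
        configuration["bytes_scanned_cutoff_per_query"] = cutoff

    # The result_configuration block exists only when resultConfiguration is set.
    result = spec.get("resultConfiguration")
    if result:
        block: dict[str, Any] = {}
        if "outputLocation" in result:
            block["output_location"] = result["outputLocation"]
        if "encryptionOption" in result:
            encryption: dict[str, Any] = {"encryption_option": result["encryptionOption"]}
            if "kmsKeyArn" in result:
                encryption["kms_key_arn"] = result["kmsKeyArn"]
            block["encryption_configuration"] = encryption
        if "expectedBucketOwner" in result:
            block["expected_bucket_owner"] = result["expectedBucketOwner"]
        if "s3AclOption" in result:
            block["acl_configuration"] = {"s3_acl_option": result["s3AclOption"]}
        configuration["result_configuration"] = block

    # The engine_version block exists only when selectedEngineVersion is set;
    # otherwise Athena falls back to its own default (AUTO).
    engine = spec.get("selectedEngineVersion")
    if engine:
        configuration["engine_version"] = {"selected_engine_version": engine}

    if spec.get("executionRole"):
        configuration["execution_role"] = spec["executionRole"]

    return {
        "name": name,
        "force_destroy": spec.get("forceDestroy", False),
        "configuration": configuration,
    }


quick_start = {"region": "us-east-1",
               "resultConfiguration": {"outputLocation": "s3://my-athena-results/analytics/"}}
print(build_workgroup_args("analytics", quick_start))
```

Keeping optional blocks absent, rather than filling them with empty values, lets the cloud provider apply its own defaults for anything the spec leaves out.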

Prerequisites

  • AWS credentials configured via environment variables or OpenMCF provider config
  • An S3 bucket for storing query results (if setting resultConfiguration.outputLocation)
  • An AWS Glue Data Catalog with databases and tables defined over your S3 data sources
  • A KMS key ARN if using SSE_KMS or CSE_KMS encryption for query results
  • An IAM execution role only if creating a Spark workgroup (standard SQL workgroups do not need one)

Quick Start

Create a file athena-workgroup.yaml:

```yaml
apiVersion: aws.openmcf.org/v1
kind: AwsAthenaWorkgroup
metadata:
  name: analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsAthenaWorkgroup.analytics
spec:
  region: us-east-1
  resultConfiguration:
    outputLocation: "s3://my-athena-results/analytics/"
```

Deploy:

```shell
openmcf apply -f athena-workgroup.yaml
```

This creates an Athena workgroup named analytics with query results stored in S3, configuration enforcement enabled (default), and CloudWatch metrics published (default).
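Every example on this page sets pulumi.openmcf.org/stack.name to <environment>.<kind>.<metadata.name>, for example dev.AwsAthenaWorkgroup.analytics. The helper below reproduces that pattern as observed in these examples. It is a hypothetical convenience (stack_name and pulumi_labels are invented names), and the naming convention is inferred from the examples rather than stated as a rule.

```python
def stack_name(env: str, kind: str, name: str) -> str:
    """Join environment, resource kind, and metadata.name as seen in this page's examples."""
    parts = (env, kind, name)
    if not all(parts):
        raise ValueError("env, kind, and name must all be non-empty")
    return ".".join(parts)


def pulumi_labels(org: str, project: str, env: str, name: str,
                  kind: str = "AwsAthenaWorkgroup") -> dict[str, str]:
    """Labels selecting the Pulumi provisioner, mirroring the Quick Start manifest."""
    return {
        "openmcf.org/provisioner": "pulumi",
        "pulumi.openmcf.org/organization": org,
        "pulumi.openmcf.org/project": project,
        "pulumi.openmcf.org/stack.name": stack_name(env, kind, name),
    }


print(pulumi_labels("my-org", "my-project", "dev", "analytics")["pulumi.openmcf.org/stack.name"])
# dev.AwsAthenaWorkgroup.analytics
```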

Configuration Reference

Required Fields

| Field | Type | Description | Validation |
|---|---|---|---|
| region | string | AWS region where the workgroup will be created (e.g., us-east-1, eu-west-1). | Required; non-empty |

region is the only required field. In practice, most deployments also set resultConfiguration.outputLocation.

Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| resultConfiguration | object | none | Query result storage and encryption settings |
| resultConfiguration.outputLocation | string | none | S3 URI where query results are stored (e.g., s3://bucket/prefix/) |
| resultConfiguration.encryptionOption | string | none | SSE_S3, SSE_KMS, or CSE_KMS |
| resultConfiguration.kmsKeyArn | string | none | KMS key ARN for SSE_KMS or CSE_KMS. Can reference an AwsKmsKey resource via valueFrom |
| resultConfiguration.expectedBucketOwner | string | none | AWS account ID that owns the result bucket, for cross-account S3 buckets |
| resultConfiguration.s3AclOption | string | none | BUCKET_OWNER_FULL_CONTROL, so the bucket owner gets full control of cross-account query results |
| bytesScannedCutoffPerQuery | int64 | 0 (no limit) | Maximum bytes a single query can scan. Must be 0 or >= 10485760 (10 MB) |
| enforceWorkgroupConfiguration | bool | true | Lock workgroup settings so individual queries cannot override them |
| publishCloudwatchMetricsEnabled | bool | true | Publish query metrics to CloudWatch |
| requesterPaysEnabled | bool | false | Allow queries against Requester Pays S3 buckets (the querying account pays for data access) |
| enableMinimumEncryptionConfiguration | bool | false | Require at least SSE_S3 encryption for all query results |
| selectedEngineVersion | string | AUTO | Engine version: Athena engine version 3, PySpark engine version 3, or AUTO |
| forceDestroy | bool | false | Delete the workgroup's named queries and prepared statements when the workgroup is destroyed |
| executionRole | string | none | IAM role ARN for Spark workgroups. Can reference an AwsIamRole resource via valueFrom |
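The validation rules in the tables above can be checked locally before running openmcf apply. The sketch below is a hypothetical pre-flight check, not OpenMCF's own validator: validate_spec is an invented name, the KMS and output-location checks restate the table descriptions, and the Spark check assumes that any selectedEngineVersion beginning with "PySpark" denotes a Spark workgroup, which needs an execution role per the prerequisites.

```python
from typing import Any

MIN_CUTOFF_BYTES = 10 * 1024 * 1024  # 10485760, the smallest non-zero scan cutoff
KMS_OPTIONS = {"SSE_KMS", "CSE_KMS"}
ENCRYPTION_OPTIONS = {"SSE_S3"} | KMS_OPTIONS


def validate_spec(spec: dict[str, Any]) -> list[str]:
    """Return problems found in an AwsAthenaWorkgroup spec (an empty list means none)."""
    problems: list[str] = []
    if not spec.get("region"):
        problems.append("region is required")

    # 0 disables the limit; any other value (including negatives) must meet the minimum.
    cutoff = spec.get("bytesScannedCutoffPerQuery", 0)
    if cutoff != 0 and cutoff < MIN_CUTOFF_BYTES:
        problems.append(f"bytesScannedCutoffPerQuery must be 0 or >= {MIN_CUTOFF_BYTES}")

    result = spec.get("resultConfiguration") or {}
    location = result.get("outputLocation")
    if location is not None and not location.startswith("s3://"):
        problems.append("resultConfiguration.outputLocation must be an s3:// URI")

    option = result.get("encryptionOption")
    if option is not None and option not in ENCRYPTION_OPTIONS:
        problems.append(f"unsupported encryptionOption: {option}")
    # kmsKeyArn may be a literal ARN or a valueFrom reference; either counts as set.
    if option in KMS_OPTIONS and not result.get("kmsKeyArn"):
        problems.append(f"{option} requires resultConfiguration.kmsKeyArn")

    engine = spec.get("selectedEngineVersion", "AUTO")
    if engine.startswith("PySpark") and not spec.get("executionRole"):
        problems.append("Spark workgroups require executionRole")
    return problems


print(validate_spec({"region": "us-east-1", "bytesScannedCutoffPerQuery": 1024}))
```

Returning a list instead of raising on the first error lets a single run report every problem in the manifest.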

Examples

Basic SQL Workgroup

A minimal workgroup directing query results to S3 with all governance defaults.

```yaml
apiVersion: aws.openmcf.org/v1
kind: AwsAthenaWorkgroup
metadata:
  name: analytics-team
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: analytics
    pulumi.openmcf.org/stack.name: dev.AwsAthenaWorkgroup.analytics-team
spec:
  region: us-east-1
  resultConfiguration:
    outputLocation: "s3://my-athena-results/analytics-team/"
```

Cost-Controlled with SSE_S3

Workgroup with a 10 GB per-query scan limit and enforced minimum encryption.

```yaml
apiVersion: aws.openmcf.org/v1
kind: AwsAthenaWorkgroup
metadata:
  name: data-science
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: data
    pulumi.openmcf.org/stack.name: prod.AwsAthenaWorkgroup.data-science
spec:
  region: us-east-1
  resultConfiguration:
    outputLocation: "s3://data-science-results/queries/"
    encryptionOption: SSE_S3
  bytesScannedCutoffPerQuery: 10737418240
  enableMinimumEncryptionConfiguration: true
```
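The byte values in these examples are binary multiples: the 10 GB limit here is 10 × 1024³ bytes. A small conversion helper makes the arithmetic explicit; gib_to_bytes is a hypothetical name used only for illustration.

```python
GIB = 1024 ** 3  # 1,073,741,824 bytes in one gibibyte


def gib_to_bytes(gib: int) -> int:
    """Convert whole GiB into a bytesScannedCutoffPerQuery value (0 keeps 'no limit')."""
    if gib < 0:
        raise ValueError("cutoff cannot be negative")
    return gib * GIB


print(gib_to_bytes(10))  # 10737418240, the value used in this example
print(gib_to_bytes(50))  # 53687091200, the value used in the production example
```

Any whole number of GiB above zero is automatically above the 10485760-byte (10 MB) minimum, so only negative input needs rejecting here.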

Production KMS-Encrypted with valueFrom

Production workgroup with SSE_KMS encryption whose KMS key ARN is read from an AwsKmsKey resource via valueFrom, plus a 50 GB per-query scan limit and a pinned engine version.

```yaml
apiVersion: aws.openmcf.org/v1
kind: AwsAthenaWorkgroup
metadata:
  name: prod-analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: analytics
    pulumi.openmcf.org/stack.name: prod.AwsAthenaWorkgroup.prod-analytics
spec:
  region: us-east-1
  resultConfiguration:
    outputLocation: "s3://prod-athena-results/queries/"
    encryptionOption: SSE_KMS
    kmsKeyArn:
      valueFrom:
        kind: AwsKmsKey
        name: analytics-encryption-key
        fieldPath: status.outputs.key_arn
  bytesScannedCutoffPerQuery: 53687091200
  enforceWorkgroupConfiguration: true
  publishCloudwatchMetricsEnabled: true
  enableMinimumEncryptionConfiguration: true
  selectedEngineVersion: "Athena engine version 3"
```
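The valueFrom block in this example asks OpenMCF to read kmsKeyArn from the status.outputs.key_arn field of the AwsKmsKey resource named analytics-encryption-key. Conceptually, resolving a fieldPath is a walk through nested fields. The sketch below illustrates only that idea: resolve_value_from, resolve_field_path, the in-memory resource map, and the placeholder ARN are all hypothetical and do not describe how OpenMCF actually stores or looks up resources.

```python
from typing import Any


def resolve_field_path(resource: dict[str, Any], field_path: str) -> Any:
    """Walk a dotted path such as 'status.outputs.key_arn' through nested dicts."""
    value: Any = resource
    for key in field_path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"{field_path!r}: missing {key!r}")
        value = value[key]
    return value


def resolve_value_from(ref: dict[str, Any],
                       resources: dict[tuple[str, str], dict[str, Any]]) -> Any:
    """Resolve {'valueFrom': {kind, name, fieldPath}} against a map of deployed resources."""
    source = ref["valueFrom"]
    target = resources[(source["kind"], source["name"])]
    return resolve_field_path(target, source["fieldPath"])


# Hypothetical deployed state; the ARN uses a placeholder account ID.
deployed = {
    ("AwsKmsKey", "analytics-encryption-key"): {
        "status": {"outputs": {"key_arn": "arn:aws:kms:us-east-1:111122223333:key/example"}},
    },
}
ref = {"valueFrom": {"kind": "AwsKmsKey",
                     "name": "analytics-encryption-key",
                     "fieldPath": "status.outputs.key_arn"}}
print(resolve_value_from(ref, deployed))
```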

Stack Outputs

| Output | Type | Description |
|---|---|---|
| workgroup_arn | string | ARN of the Athena workgroup, used in IAM policies and cross-service references |
| workgroup_name | string | Name of the workgroup, passed to Athena API calls (StartQueryExecution, etc.) |
| effective_engine_version | string | Engine version actually in use (resolved from selectedEngineVersion or AUTO) |
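The workgroup_arn and workgroup_name outputs are typically consumed as shown in this sketch. athena_query_policy builds an IAM policy document scoped to one workgroup, and start_query_kwargs builds keyword arguments for boto3's Athena client start_query_execution (QueryString, QueryExecutionContext, WorkGroup). Both helper names, the database name, and the placeholder ARN are hypothetical. A real principal also needs Glue Data Catalog permissions and S3 access to the source data and result bucket, which this policy does not grant.

```python
import json
from typing import Any


def athena_query_policy(workgroup_arn: str) -> dict[str, Any]:
    """IAM policy document allowing a principal to run and read queries in one workgroup."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:StopQueryExecution",
                ],
                "Resource": workgroup_arn,
            }
        ],
    }


def start_query_kwargs(workgroup_name: str, sql: str, database: str) -> dict[str, Any]:
    """Keyword arguments for boto3's athena client start_query_execution."""
    # No ResultConfiguration here: with enforceWorkgroupConfiguration true,
    # the workgroup's own result location and encryption settings apply.
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "WorkGroup": workgroup_name,
    }


arn = "arn:aws:athena:us-east-1:111122223333:workgroup/analytics"
print(json.dumps(athena_query_policy(arn), indent=2))
# With AWS credentials available:
# import boto3
# boto3.client("athena", region_name="us-east-1").start_query_execution(
#     **start_query_kwargs("analytics", "SELECT 1", "my_database"))
```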

Related Components

  • AWS S3 Bucket — S3 bucket for query result storage
  • AWS KMS Key — Customer-managed encryption for query results
  • AWS IAM Role — Execution role for Spark workgroups
  • AWS CloudWatch Log Group — Query execution logging
