AWS Batch Compute Environment

Deploys a MANAGED AWS Batch compute environment together with one or more job queues and an optional fair-share scheduling policy, all from a single resource definition. Supports the EC2, SPOT, FARGATE, and FARGATE_SPOT resource types with automatic vCPU scaling, VPC networking, and multi-queue priority routing.

What Gets Created

When you deploy an AwsBatchComputeEnvironment resource, OpenMCF provisions:

  • Compute Environment — a batch.ComputeEnvironment of type MANAGED with the specified resource type (EC2/SPOT/FARGATE/FARGATE_SPOT), vCPU limits, VPC subnets, security groups, and optional update policy
  • Job Queues — one batch.JobQueue per entry in jobQueues, each referencing the compute environment with configurable priority, state, and optional job-state time-limit actions for automatic cancellation of stuck jobs
  • Scheduling Policy (optional) — a batch.SchedulingPolicy with fair-share configuration when schedulingPolicy is provided, attached to all bundled job queues for capacity distribution across share identifiers

Prerequisites

  • AWS credentials configured via environment variables or OpenMCF provider config
  • At least one VPC subnet (private subnets recommended) — use AwsVpc to provision
  • A security group allowing outbound access for containers — use AwsSecurityGroup to provision
  • For EC2/SPOT types: an ECS instance profile whose IAM role has the AmazonEC2ContainerServiceforEC2Role policy attached
  • For SPOT type: a Spot Fleet IAM role with the AmazonEC2SpotFleetTaggingRole policy attached

Quick Start

Create a file batch.yaml:

apiVersion: aws.openmcf.org/v1
kind: AwsBatchComputeEnvironment
metadata:
  name: my-batch
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsBatchComputeEnvironment.my-batch
spec:
  region: us-west-2
  computeResources:
    type: FARGATE
    maxVcpus: 256
    subnetIds:
      - value: subnet-0a1b2c3d4e5f00001
      - value: subnet-0a1b2c3d4e5f00002
    securityGroupIds:
      - value: sg-0a1b2c3d4e5f00001
  jobQueues:
    - name: default
      priority: 1

Deploy:

openmcf apply -f batch.yaml

This creates a serverless Fargate compute environment with up to 256 vCPUs of capacity and a single default job queue. AWS manages all compute infrastructure — no EC2 instances, patching, or AMI management required.

Configuration Reference

Required Fields

| Field | Type | Description | Validation |
|---|---|---|---|
| region | string | AWS region where the compute environment will be created (e.g., us-west-2, eu-west-1). | Required; non-empty |
| computeResources | object | Infrastructure configuration for the compute environment | Required |
| computeResources.type | string | Compute resource type | Must be EC2, SPOT, FARGATE, or FARGATE_SPOT |
| computeResources.maxVcpus | int32 | Maximum vCPU capacity | >= 1 |
| computeResources.subnetIds | list(StringValueOrRef) | VPC subnets for compute resources | At least 1 required |
| jobQueues | list(object) | Job queues routing to this compute environment | At least 1 required |
| jobQueues[].name | string | Queue name | 1-128 chars, alphanumeric/hyphen/underscore, starts with alphanumeric |
| jobQueues[].priority | int32 | Dispatch priority (higher = higher priority) | Required |

Optional Fields — Top Level

| Field | Type | Default | Description |
|---|---|---|---|
| state | string | ENABLED | Compute environment state: ENABLED or DISABLED |
| serviceRole | StringValueOrRef | Service-linked role | IAM role for AWS Batch to make API calls |
| updatePolicy.terminateJobsOnUpdate | bool | false | Whether to terminate running jobs during infrastructure updates |
| updatePolicy.jobExecutionTimeoutMinutes | int32 | none | Max wait time for jobs during updates (1-360 minutes) |
| schedulingPolicy | object | none | Fair-share scheduling policy (see below) |
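
As a rough sketch of how the top-level options might be combined, the fragment below extends the spec from the Quick Start manifest. The field names come from the table above; the role ARN is a made-up placeholder, and the single `value:` wrapper for serviceRole is an assumption inferred from how subnetIds entries are written in the Quick Start.

```yaml
# Fragment only: merge into the spec of the Quick Start manifest.
spec:
  # Explicitly stated here for clarity; ENABLED is already the default.
  state: ENABLED
  # Assumption: a single StringValueOrRef takes the same `value:` shape
  # as the list entries under subnetIds. The ARN is a placeholder.
  serviceRole:
    value: arn:aws:iam::123456789012:role/my-batch-service-role
  updatePolicy:
    # Let running jobs finish rather than terminating them on update.
    terminateJobsOnUpdate: false
    # Wait up to 30 minutes for jobs during an update (allowed: 1-360).
    jobExecutionTimeoutMinutes: 30
```

Omitting serviceRole leaves the service-linked role in place, which is the default listed in the table.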

Optional Fields — Compute Resources

| Field | Type | Applies To | Description |
|---|---|---|---|
| computeResources.minVcpus | int32 | EC2/SPOT | Minimum vCPUs to maintain (default: 0) |
| computeResources.desiredVcpus | int32 | EC2/SPOT | Initial desired vCPUs |
| computeResources.securityGroupIds | list(StringValueOrRef) | All | VPC security groups |
| computeResources.instanceTypes | list(string) | EC2/SPOT | Instance types (e.g., ["optimal"], ["m5.xlarge", "c5.xlarge"]) |
| computeResources.allocationStrategy | string | EC2/SPOT | BEST_FIT_PROGRESSIVE, SPOT_CAPACITY_OPTIMIZED, or SPOT_PRICE_CAPACITY_OPTIMIZED |
| computeResources.instanceRole | StringValueOrRef | EC2/SPOT | ECS instance profile ARN (required for EC2/SPOT) |
| computeResources.ec2KeyPair | string | EC2/SPOT | SSH key pair name |
| computeResources.bidPercentage | int32 | SPOT | Max % of On-Demand price (0-100) |
| computeResources.spotIamFleetRole | StringValueOrRef | SPOT | Spot Fleet IAM role (required for SPOT) |
| computeResources.launchTemplate | object | EC2/SPOT | Custom launch template (id or name + optional version) |
| computeResources.ec2Configurations | list(object) | EC2/SPOT | AMI customization (max 2 entries) |
| computeResources.resourceTags | map(string) | EC2/SPOT | Tags for launched compute resources |
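
To illustrate how the EC2/SPOT-only fields fit together, here is a minimal sketch of a SPOT compute environment spec. All ARNs, IDs, tag values, and the queue name are placeholders. The `value:` wrapper on instanceRole and spotIamFleetRole is an assumption based on the StringValueOrRef shape shown in the Quick Start.

```yaml
# Sketch of a SPOT spec; replace every ARN and ID with your own.
spec:
  region: us-west-2
  computeResources:
    type: SPOT
    minVcpus: 0          # scale to zero when idle (the documented default)
    maxVcpus: 128
    instanceTypes:
      - optimal          # let AWS Batch pick instance families
    allocationStrategy: SPOT_PRICE_CAPACITY_OPTIMIZED
    bidPercentage: 60    # pay at most 60% of the On-Demand price
    # Required for EC2/SPOT. Assumed `value:` shape; placeholder ARN.
    instanceRole:
      value: arn:aws:iam::123456789012:instance-profile/my-ecs-instance-profile
    # Required for SPOT. Assumed `value:` shape; placeholder ARN.
    spotIamFleetRole:
      value: arn:aws:iam::123456789012:role/my-spot-fleet-role
    subnetIds:
      - value: subnet-0a1b2c3d4e5f00001
    securityGroupIds:
      - value: sg-0a1b2c3d4e5f00001
    resourceTags:
      team: data-platform   # hypothetical tag
  jobQueues:
    - name: spot-default
      priority: 1
```

For an On-Demand EC2 environment, the same sketch should apply with type set to EC2 and the two SPOT-only fields (bidPercentage, spotIamFleetRole) removed.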

Optional Fields — Job Queue

| Field | Type | Default | Description |
|---|---|---|---|
| jobQueues[].state | string | ENABLED | Queue state: ENABLED or DISABLED |
| jobQueues[].jobStateTimeLimitActions | list(object) | none | Auto-cancel jobs stuck in a state |
| jobQueues[].jobStateTimeLimitActions[].action | string | none | Action to take: CANCEL |
| jobQueues[].jobStateTimeLimitActions[].maxTimeSeconds | int32 | none | Time threshold (600-86400 seconds) |
| jobQueues[].jobStateTimeLimitActions[].reason | string | none | Human-readable reason |
| jobQueues[].jobStateTimeLimitActions[].state | string | none | Job state to monitor (e.g., RUNNABLE) |
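
The fragment below sketches two queues with different priorities, one of which cancels jobs that sit in RUNNABLE for too long. Queue names, the threshold, and the reason text are illustrative choices, not values prescribed by OpenMCF.

```yaml
# Fragment only: replaces the jobQueues list in the Quick Start manifest.
spec:
  jobQueues:
    # The higher priority value is dispatched first.
    - name: high-priority
      priority: 10
    - name: low-priority
      priority: 1
      state: ENABLED
      jobStateTimeLimitActions:
        # Cancel jobs stuck in RUNNABLE for more than one hour.
        - action: CANCEL
          state: RUNNABLE
          maxTimeSeconds: 3600   # must be within 600-86400
          reason: Job was RUNNABLE for over one hour without being placed
```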

Optional Fields — Scheduling Policy

| Field | Type | Description |
|---|---|---|
| schedulingPolicy.computeReservation | int32 | % of vCPUs reserved for new share identifiers (0-99) |
| schedulingPolicy.shareDecaySeconds | int32 | Usage history decay period (0-604800 seconds) |
| schedulingPolicy.shareDistributions | list(object) | Weight per share identifier |
| schedulingPolicy.shareDistributions[].shareIdentifier | string | Unique share identifier (supports * wildcard suffix) |
| schedulingPolicy.shareDistributions[].weightFactor | double | Relative share weight (0.0001-999.9999) |
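
A fair-share policy for two teams might look roughly like the sketch below. The share identifiers and numbers are hypothetical. Note that, per the AWS Batch documentation, a lower weightFactor gives an identifier a larger share of compute, which is worth verifying against your own workload before relying on it.

```yaml
# Fragment only: add under spec alongside computeResources and jobQueues.
spec:
  schedulingPolicy:
    # Hold back 10% of vCPUs for share identifiers not yet in use (0-99).
    computeReservation: 10
    # Consider the last hour of usage when balancing shares (0-604800).
    shareDecaySeconds: 3600
    shareDistributions:
      # Hypothetical identifiers; jobs name theirs at submission time.
      - shareIdentifier: analytics
        weightFactor: 1.0
      # Trailing * matches any identifier that starts with "research".
      - shareIdentifier: "research*"
        weightFactor: 0.5   # lower factor, so a larger share than analytics
```

Because the policy is attached to every bundled queue, these weights apply to jobs submitted through any queue in the jobQueues list.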

Outputs

| Output | Type | Description |
|---|---|---|
| compute_environment_arn | string | ARN of the compute environment |
| compute_environment_name | string | Name of the compute environment |
| ecs_cluster_arn | string | ARN of the underlying ECS cluster |
| status | string | Compute environment status |
| job_queue_arns.<name> | string | Per-queue ARN (one entry per queue name) |
| scheduling_policy_arn | string | Scheduling policy ARN (if created) |

Presets

| Name | Description |
|---|---|
| 01-fargate-batch | Serverless Fargate, single queue, zero-management |
| 02-ec2-managed-batch | EC2 with optimal instances, two priority queues, update policy |
| 03-spot-cost-optimized-batch | Spot instances, fair-share scheduling, multi-team capacity |

Design Decisions

Bundling scope. The compute environment and job queues are bundled because a compute environment without a queue is incomplete infrastructure — you cannot submit jobs without a queue. Job definitions are excluded because they represent application-level workloads with independent lifecycles (versioned, frequently updated, reusable across queues).

MANAGED only. UNMANAGED compute environments (where the user manages compute) and EKS-based compute environments are deferred to v2 as they are niche use cases that add significant complexity.

Scheduling policy as top-level. The scheduling policy is defined at the spec level and attached to all bundled queues. Per-queue scheduling policies with external references are deferred to v2.

State defaults. Both compute environment and job queue states default to ENABLED via the OpenMCF middleware default mechanism. State validation is delegated to the AWS API to keep the proto schema simple and forward-compatible.
