AWS Glue Catalog Database

Deploys an AWS Glue Data Catalog database — a metadata namespace that organizes table definitions for data stored in S3, Redshift, RDS, and other data stores. The database is the namespace that Amazon Athena, Glue Crawlers, Glue ETL jobs, and Redshift Spectrum use to discover and query data via database.table naming.

What Gets Created

When you deploy an AwsGlueCatalogDatabase resource, OpenMCF provisions:

Glue Catalog Database — an aws_glue_catalog_database resource registered in the AWS Glue Data Catalog with the specified name, description, and optional default storage location

Prerequisites

AWS credentials configured via environment variables or OpenMCF provider config
An S3 bucket if setting locationUri for default table storage (the bucket must exist before deploying)

Quick Start

Create a file glue-catalog-database.yaml:

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.analytics
spec:
  region: us-east-1
  description: "Analytics data catalog for ad-hoc queries and BI dashboards"

Deploy:

openmcf apply -f glue-catalog-database.yaml

This creates a Glue Data Catalog database named analytics that Athena workgroups, Glue crawlers, and ETL jobs can use as a namespace for table definitions.

Configuration Reference

Required Fields

Field	Type	Description
`region`	`string`	AWS region where the Glue Catalog Database will be created (e.g., `us-east-1`)

Optional Fields

Field	Type	Default	Description
`description`	`string`	`""`	Human-readable description of the database (max 2048 characters, enforced by AWS API)
`locationUri`	`string`	`""`	Default S3 URI for tables created in this database (e.g., `s3://bucket/prefix/`). Tables without an explicit location inherit this path

ForceNew Fields

Database name (from metadata.name) — Cannot be changed after creation. Must be 1-255 characters: lowercase letters, numbers, and underscores only. No uppercase characters.

Examples

Minimal Data Catalog

An empty database for quick experimentation. Tables are added later via Glue Crawlers or DDL statements.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: experiments
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: data
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.experiments
spec:
  region: us-east-1

Descriptive Analytics Database

A database with a description documenting its purpose and contents.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: sales_analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: analytics
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.sales_analytics
spec:
  region: us-east-1
  description: >-
    Sales pipeline data lake — raw ingestion tables, curated transformations,
    and aggregated datasets for BI dashboards and ML feature stores.

Production S3 Data Lake

A production database with a default S3 storage location so all tables inherit a consistent base path. Recommended for organized data lakes.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: prod_warehouse
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: data-platform
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.prod_warehouse
spec:
  region: us-east-1
  description: >-
    Production data warehouse — curated datasets from ETL pipelines.
    Tables populated by Glue crawlers on a daily schedule. Accessed by
    Athena workgroups for ad-hoc analytics and Redshift Spectrum for BI.
  locationUri: "s3://acme-prod-data-lake/warehouse/"

Stack Outputs

Output	Type	Description
`database_name`	`string`	Name of the Glue Data Catalog database, used in Athena queries (`FROM database.table`), Glue crawler configs, and ETL job scripts
`database_arn`	`string`	ARN of the database, used for IAM policies and Lake Formation permissions
`catalog_id`	`string`	ID of the Glue Data Catalog (AWS Account ID), used by downstream resources needing catalog context

AWS Athena Workgroup — Queries data described by tables in this database
AWS S3 Bucket — Storage layer for data referenced by Glue tables
AWS KMS Key — Encryption for data at rest in S3
AWS Redshift Cluster — Redshift Spectrum queries the Glue catalog for external tables

AWS Glue Catalog Database

What Gets Created

When you deploy an AwsGlueCatalogDatabase resource, OpenMCF provisions:

Glue Catalog Database — an aws_glue_catalog_database resource registered in the AWS Glue Data Catalog with the specified name, description, and optional default storage location

Prerequisites

AWS credentials configured via environment variables or OpenMCF provider config
An S3 bucket if setting locationUri for default table storage (the bucket must exist before deploying)

Quick Start

Create a file glue-catalog-database.yaml:

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.analytics
spec:
  region: us-east-1
  description: "Analytics data catalog for ad-hoc queries and BI dashboards"

Deploy:

openmcf apply -f glue-catalog-database.yaml

This creates a Glue Data Catalog database named analytics that Athena workgroups, Glue crawlers, and ETL jobs can use as a namespace for table definitions.

Configuration Reference

Required Fields

Field	Type	Description
`region`	`string`	AWS region where the Glue Catalog Database will be created (e.g., `us-east-1`)

Optional Fields

Field	Type	Default	Description
`description`	`string`	`""`	Human-readable description of the database (max 2048 characters, enforced by AWS API)
`locationUri`	`string`	`""`	Default S3 URI for tables created in this database (e.g., `s3://bucket/prefix/`). Tables without an explicit location inherit this path

ForceNew Fields

Database name (from metadata.name) — Cannot be changed after creation. Must be 1-255 characters: lowercase letters, numbers, and underscores only. No uppercase characters.

Examples

Minimal Data Catalog

An empty database for quick experimentation. Tables are added later via Glue Crawlers or DDL statements.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: experiments
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: data
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.experiments
spec:
  region: us-east-1

Descriptive Analytics Database

A database with a description documenting its purpose and contents.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: sales_analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: analytics
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.sales_analytics
spec:
  region: us-east-1
  description: >-
    Sales pipeline data lake — raw ingestion tables, curated transformations,
    and aggregated datasets for BI dashboards and ML feature stores.

Production S3 Data Lake

A production database with a default S3 storage location so all tables inherit a consistent base path. Recommended for organized data lakes.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: prod_warehouse
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: data-platform
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.prod_warehouse
spec:
  region: us-east-1
  description: >-
    Production data warehouse — curated datasets from ETL pipelines.
    Tables populated by Glue crawlers on a daily schedule. Accessed by
    Athena workgroups for ad-hoc analytics and Redshift Spectrum for BI.
  locationUri: "s3://acme-prod-data-lake/warehouse/"

Stack Outputs

Output	Type	Description
`database_name`	`string`	Name of the Glue Data Catalog database, used in Athena queries (`FROM database.table`), Glue crawler configs, and ETL job scripts
`database_arn`	`string`	ARN of the database, used for IAM policies and Lake Formation permissions
`catalog_id`	`string`	ID of the Glue Data Catalog (AWS Account ID), used by downstream resources needing catalog context

AWS Athena Workgroup — Queries data described by tables in this database
AWS S3 Bucket — Storage layer for data referenced by Glue tables
AWS KMS Key — Encryption for data at rest in S3
AWS Redshift Cluster — Redshift Spectrum queries the Glue catalog for external tables

AWS Glue Catalog Database

What Gets Created

Prerequisites

Quick Start

Configuration Reference

Required Fields

Optional Fields

ForceNew Fields

Examples

Minimal Data Catalog

Descriptive Analytics Database

Production S3 Data Lake

Stack Outputs

AWS HTTP API Gateway

Presets

AWS Glue Catalog Database

What Gets Created

Prerequisites

Quick Start

Configuration Reference

Required Fields

Optional Fields

ForceNew Fields

Examples

Minimal Data Catalog

Descriptive Analytics Database

Production S3 Data Lake

Stack Outputs

AWS HTTP API Gateway

Presets

AWS Glue Catalog Database

What Gets Created

Prerequisites

Quick Start

Configuration Reference

Required Fields

Optional Fields

ForceNew Fields

Examples

Minimal Data Catalog

Descriptive Analytics Database

Production S3 Data Lake

Stack Outputs

Related Components

AWS HTTP API Gateway

Presets

AWS Glue Catalog Database

What Gets Created

Prerequisites

Quick Start

Configuration Reference

Required Fields

Optional Fields

ForceNew Fields

Examples

Minimal Data Catalog

Descriptive Analytics Database

Production S3 Data Lake

Stack Outputs

Related Components

AWS HTTP API Gateway

Presets