AWS Glue Catalog Database

Deploys an AWS Glue Data Catalog database: a metadata namespace that organizes table definitions for data stored in S3, Redshift, RDS, and other data stores. Amazon Athena, Glue crawlers, Glue ETL jobs, and Redshift Spectrum use this namespace to discover and query data through database.table names.

What Gets Created

When you deploy an AwsGlueCatalogDatabase resource, OpenMCF provisions:

  • Glue Catalog Database — an aws_glue_catalog_database resource registered in the AWS Glue Data Catalog with the specified name, description, and optional default storage location
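For orientation, the spec fields correspond roughly to the `DatabaseInput` structure of the Glue `CreateDatabase` API. The sketch below assumes you call that API directly with boto3. It is not how OpenMCF provisions the resource, and `build_database_input` and `create_database` are hypothetical helpers.

```python
"""Sketch: how AwsGlueCatalogDatabase fields could map onto Glue CreateDatabase.

build_database_input and create_database are hypothetical helpers for
illustration only; they are not part of OpenMCF.
"""


def build_database_input(name: str, description: str = "", location_uri: str = "") -> dict:
    """Return a Glue DatabaseInput dict, omitting optional fields left empty."""
    database_input = {"Name": name}  # metadata.name becomes the database name
    if description:
        database_input["Description"] = description  # spec.description
    if location_uri:
        database_input["LocationUri"] = location_uri  # spec.locationUri
    return database_input


def create_database(region: str, database_input: dict) -> None:
    """Create the database directly via the Glue API (needs boto3 and AWS credentials)."""
    import boto3  # imported here so build_database_input has no third-party dependency

    glue = boto3.client("glue", region_name=region)  # spec.region
    glue.create_database(DatabaseInput=database_input)


if __name__ == "__main__":
    print(build_database_input("analytics", "Analytics data catalog for ad-hoc queries and BI dashboards"))
```

Empty optional fields are left out of the request rather than sent as empty strings. This matches the "" defaults in the configuration reference.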

Prerequisites

  • AWS credentials configured via environment variables or OpenMCF provider config
  • An S3 bucket if setting locationUri for default table storage (the bucket must exist before deploying)
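If you use locationUri, a quick pre-flight check can confirm that the bucket is reachable before you deploy. This minimal sketch assumes boto3 and uses S3's HeadBucket call. The helper names `split_s3_uri` and `bucket_exists` are hypothetical.

```python
"""Sketch: pre-flight check that the bucket named in locationUri exists."""
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def bucket_exists(uri: str) -> bool:
    """Return True if HeadBucket succeeds with the current AWS credentials."""
    import boto3
    from botocore.exceptions import ClientError

    bucket, _prefix = split_s3_uri(uri)
    try:
        boto3.client("s3").head_bucket(Bucket=bucket)
        return True
    except ClientError:
        # A missing bucket (404) and a bucket you cannot access (403) both land here.
        return False


if __name__ == "__main__":
    print(split_s3_uri("s3://acme-prod-data-lake/warehouse/"))
```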

Quick Start

Create a file glue-catalog-database.yaml:

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.analytics
spec:
  region: us-east-1
  description: "Analytics data catalog for ad-hoc queries and BI dashboards"

Deploy:

openmcf apply -f glue-catalog-database.yaml

This creates a Glue Data Catalog database named analytics that Athena workgroups, Glue crawlers, and ETL jobs can use as a namespace for table definitions.
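Once the database exists, an Athena query can target it by setting the query execution context. The minimal sketch below makes several assumptions. It uses boto3 and a hypothetical table named `events`. It also assumes the chosen workgroup already has a query result location configured. `athena_query_request` and `run_query` are illustrative helper names.

```python
"""Sketch: query the new database from Athena with boto3."""


def athena_query_request(database: str, sql: str, workgroup: str = "primary") -> dict:
    """Build keyword arguments for the Athena StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {
            "Database": database,  # default database for unqualified table names
            "Catalog": "AwsDataCatalog",  # Athena's name for the Glue Data Catalog
        },
        "WorkGroup": workgroup,  # assumed to have a query result location configured
    }


def run_query(region: str, request: dict) -> str:
    """Submit the query and return its execution ID (needs boto3 and AWS credentials)."""
    import boto3

    athena = boto3.client("athena", region_name=region)
    return athena.start_query_execution(**request)["QueryExecutionId"]


if __name__ == "__main__":
    # "events" is a hypothetical table, e.g. one registered later by a Glue crawler.
    print(athena_query_request("analytics", "SELECT * FROM analytics.events LIMIT 10"))
```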

Configuration Reference

Required Fields

| Field | Type | Description |
|---|---|---|
| region | string | AWS region where the Glue Catalog Database will be created (e.g., us-east-1) |

Optional Fields

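| Field | Type | Default | Description |
|---|---|---|---|
| description | string | "" | Human-readable description of the database (max 2048 characters, enforced by the AWS API) |
| locationUri | string | "" | Default S3 URI for tables created in this database (e.g., s3://bucket/prefix/). Tables created without an explicit location typically inherit this path; the exact behavior depends on the engine that creates the table |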
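The following sketch makes this inheritance concrete. It assumes the Hive/Spark convention of placing a table in a subdirectory named after the table, under the database location. This is an engine convention, not a rule enforced by Glue. `default_table_location` is a hypothetical helper.

```python
"""Sketch: Hive/Spark-style default location for a table in a database.

Assumes the common convention of a per-table subdirectory under the
database's locationUri; default_table_location is a hypothetical helper.
"""
from typing import Optional


def default_table_location(location_uri: str, table: str, explicit: Optional[str] = None) -> str:
    """Return the S3 path a table would use under the assumed convention."""
    if explicit:
        return explicit  # an explicit table LOCATION always wins
    if not location_uri:
        raise ValueError("database has no locationUri and the table sets no location")
    return location_uri.rstrip("/") + "/" + table


if __name__ == "__main__":
    print(default_table_location("s3://acme-prod-data-lake/warehouse/", "orders"))
```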

ForceNew Fields

  • Database name (from metadata.name): cannot be changed after creation. Must be 1-255 characters and contain only lowercase letters, numbers, and underscores.
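A client-side check can catch an invalid name before deployment. Because the name cannot be changed after creation, catching mistakes early matters. The sketch below mirrors the documented limits: the name rule above and the 2048-character description limit from the optional fields. The function and constant names are hypothetical, and the checks are a convenience, not a substitute for the validation OpenMCF and AWS perform.

```python
"""Sketch: client-side checks mirroring the documented name and description limits."""
import re

# 1-255 characters; lowercase letters, digits, and underscores only.
DATABASE_NAME = re.compile(r"[a-z0-9_]{1,255}")
MAX_DESCRIPTION_LENGTH = 2048


def is_valid_database_name(name: str) -> bool:
    """True if metadata.name satisfies the database-name rule above."""
    return DATABASE_NAME.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """True if the description fits within the Glue API limit."""
    return len(description) <= MAX_DESCRIPTION_LENGTH


if __name__ == "__main__":
    for candidate in ("sales_analytics", "Sales-Analytics"):
        print(candidate, is_valid_database_name(candidate))
```

Hyphens are a common pitfall. A metadata.name such as `sales-analytics`, which is valid in many other resource kinds, fails this rule.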

Examples

Minimal Data Catalog

An empty database for quick experimentation. Tables are added later via Glue crawlers or DDL statements.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: experiments
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: data
    pulumi.openmcf.org/stack.name: dev.AwsGlueCatalogDatabase.experiments
spec:
  region: us-east-1

Descriptive Analytics Database

A database with a description documenting its purpose and contents.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: sales_analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: analytics
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.sales_analytics
spec:
  region: us-east-1
  description: >-
    Sales pipeline data lake — raw ingestion tables, curated transformations,
    and aggregated datasets for BI dashboards and ML feature stores.

Production S3 Data Lake

A production database with a default S3 storage location so all tables inherit a consistent base path. Recommended for organized data lakes.

apiVersion: aws.openmcf.org/v1
kind: AwsGlueCatalogDatabase
metadata:
  name: prod_warehouse
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: acme
    pulumi.openmcf.org/project: data-platform
    pulumi.openmcf.org/stack.name: prod.AwsGlueCatalogDatabase.prod_warehouse
spec:
  region: us-east-1
  description: >-
    Production data warehouse — curated datasets from ETL pipelines.
    Tables populated by Glue crawlers on a daily schedule. Accessed by
    Athena workgroups for ad-hoc analytics and Redshift Spectrum for BI.
  locationUri: "s3://acme-prod-data-lake/warehouse/"

Stack Outputs

| Output | Type | Description |
|---|---|---|
| database_name | string | Name of the Glue Data Catalog database, used in Athena queries (FROM database.table), Glue crawler configs, and ETL job scripts |
| database_arn | string | ARN of the database, used for IAM policies and Lake Formation permissions |
| catalog_id | string | ID of the Glue Data Catalog (AWS Account ID), used by downstream resources needing catalog context |
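As an example of consuming database_arn, the sketch below builds a read-only IAM policy for the database and its tables. Glue resource-level permissions for table reads also check the catalog and database ARNs, so the policy derives both from the output. The action list is an assumption; adjust it to your workload. If Lake Formation permissions are enforced, they apply in addition to IAM. Reading the underlying data also needs separate S3 permissions. `glue_read_policy` is a hypothetical helper, and 123456789012 is a placeholder account ID.

```python
"""Sketch: read-only IAM policy for one Glue database, built from database_arn."""
import json


def glue_read_policy(database_arn: str) -> dict:
    """Allow metadata reads on the database and all of its tables."""
    # database_arn looks like arn:<partition>:glue:<region>:<account-id>:database/<name>
    prefix, resource = database_arn.rsplit(":", 1)
    kind, _, name = resource.partition("/")
    if kind != "database" or not name:
        raise ValueError(f"not a Glue database ARN: {database_arn!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetTables", "glue:GetPartitions"],
                "Resource": [
                    f"{prefix}:catalog",  # Glue checks the catalog ARN as well
                    database_arn,
                    f"{prefix}:table/{name}/*",  # every table in this database
                ],
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID (the catalog_id output).
    arn = "arn:aws:glue:us-east-1:123456789012:database/analytics"
    print(json.dumps(glue_read_policy(arn), indent=2))
```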

Related Components

  • AWS Athena Workgroup — Queries data described by tables in this database
  • AWS S3 Bucket — Storage layer for data referenced by Glue tables
  • AWS KMS Key — Encryption for data at rest in S3
  • AWS Redshift Cluster — Redshift Spectrum queries the Glue catalog for external tables
