GCP BigQuery Dataset

Deploys a GCP BigQuery dataset with configurable data location, table lifecycle defaults, access control, and optional CMEK encryption. The dataset serves as the top-level container for tables, views, and routines in BigQuery.

What Gets Created

When you deploy a GcpBigQueryDataset resource, OpenMCF provisions:

  • BigQuery Dataset — a google_bigquery_dataset resource in the specified project and location, tagged with organization, environment, and resource labels
  • Access Control Entries — if the access field is provided, an authoritative set of IAM bindings granting roles to users, groups, domains, special groups, IAM members, or authorized views
  • CMEK Encryption Configuration — if kmsKeyName is provided, all new tables in the dataset default to encryption with the specified Cloud KMS key

Prerequisites

  • GCP credentials configured via environment variables or OpenMCF provider config
  • A GCP project where the dataset will be created
  • A Cloud KMS key if enabling customer-managed encryption (optional)
  • BigQuery API enabled in the target project

Quick Start

Create a file bigquery-dataset.yaml:

apiVersion: gcp.openmcf.org/v1
kind: GcpBigQueryDataset
metadata:
  name: my-dataset
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: dev.GcpBigQueryDataset.my-dataset
spec:
  projectId:
    value: my-gcp-project
  datasetId: analytics_events
  location: US

Deploy:

openmcf apply -f bigquery-dataset.yaml

This creates a BigQuery dataset named analytics_events in the US multi-region with the default access entries: project owners get OWNER, project editors get WRITER, and project viewers get READER.

Configuration Reference

Required Fields

| Field | Type | Description | Validation |
| --- | --- | --- | --- |
| projectId | StringValueOrRef | GCP project where the dataset will be created. Can reference a GcpProject resource via valueFrom. | Required |
| datasetId | string | Unique identifier for the dataset within the project. Only letters, numbers, and underscores. Immutable after creation. | Required; pattern ^[0-9A-Za-z_]+$; max 1024 chars |
| location | string | Geographic location where the dataset resides (e.g., US, EU, us-central1). Immutable after creation. | Required |
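
The datasetId constraints in the table above can be checked locally before running openmcf apply. The following is a minimal sketch, not part of OpenMCF: the function name is_valid_dataset_id and the constant names are hypothetical, and only the pattern and the 1024-character limit come from the table.

```python
import re

# Pattern and length limit copied from the Required Fields table.
DATASET_ID_PATTERN = re.compile(r"^[0-9A-Za-z_]+$")
DATASET_ID_MAX_LEN = 1024


def is_valid_dataset_id(dataset_id: str) -> bool:
    """Return True if dataset_id meets the documented pattern and length limit.

    Hypothetical helper for illustration; OpenMCF performs its own validation.
    """
    if len(dataset_id) > DATASET_ID_MAX_LEN:
        return False
    # fullmatch rejects a trailing newline, which `$` alone would tolerate.
    return DATASET_ID_PATTERN.fullmatch(dataset_id) is not None
```

For example, analytics_events passes, while my-dataset fails because hyphens are outside the allowed character set.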

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| friendlyName | string | (none) | User-friendly display name for the dataset. |
| description | string | (none) | Description of the dataset's contents or purpose. |
| defaultTableExpirationMs | int64 | 0 (no expiration) | Default lifetime for tables in the dataset, in milliseconds. Minimum 3600000 (1 hour). |
| defaultPartitionExpirationMs | int64 | 0 (no expiration) | Default expiration for partitions in partitioned tables, in milliseconds. |
| maxTimeTravelHours | int32 | 168 (7 days) | Hours of time travel for point-in-time snapshots. Range: 48–168. Lower values reduce storage costs. |
| isCaseInsensitive | bool | false | When true, dataset and table names are case-insensitive. Immutable after creation. |
| defaultCollation | string | (none) | Default collation for string columns in new tables. Use und:ci for case-insensitive collation. |
| storageBillingModel | string | LOGICAL | Billing model: LOGICAL (uncompressed bytes) or PHYSICAL (compressed bytes, can reduce costs 60–80%). |
| deleteContentsOnDestroy | bool | false | When true, all tables are deleted when the dataset is destroyed. When false, destroy fails if the dataset contains tables. |
| kmsKeyName | StringValueOrRef | (none) | Cloud KMS key for default table encryption (CMEK). Format: projects/{project}/locations/{location}/keyRings/{keyRing}/cryptoKeys/{key}. Can reference a GcpKmsKey resource via valueFrom. |
| access | GcpBigQueryDatasetAccessEntry[] | Default project access | Authoritative access control entries. Entries not listed here are removed. See access entry fields below. |
| access[].role | string | (none) | IAM role to grant (e.g., OWNER, WRITER, READER, roles/bigquery.dataViewer). Required unless view is set. |
| access[].userByEmail | string | (none) | Email address of a Google Account. |
| access[].groupByEmail | string | (none) | Email address of a Google Group. |
| access[].domain | string | (none) | Domain to grant access to (e.g., example.com). |
| access[].specialGroup | string | (none) | Special group: projectOwners, projectReaders, projectWriters, or allAuthenticatedUsers. |
| access[].iamMember | string | (none) | IAM member expression (e.g., serviceAccount:sa@project.iam.gserviceaccount.com). |
| access[].view.projectId | string | (none) | GCP project containing the authorized view. |
| access[].view.datasetId | string | (none) | Dataset containing the authorized view. |
| access[].view.tableId | string | (none) | Table ID of the authorized view. |
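
Several of the optional fields above carry numeric or enumerated limits. As a rough illustration, the sketch below checks a spec (already loaded into a Python dict) against those limits. The function check_optional_fields and its message wording are hypothetical, and treating 0 as "no expiration" follows the Default column rather than any confirmed OpenMCF behavior.

```python
# Limits taken from the Optional Fields table.
MIN_TABLE_EXPIRATION_MS = 3_600_000        # 1 hour
TIME_TRAVEL_HOURS_MIN, TIME_TRAVEL_HOURS_MAX = 48, 168
STORAGE_BILLING_MODELS = {"LOGICAL", "PHYSICAL"}


def check_optional_fields(spec: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    # 0 means "no expiration" per the table, so only non-zero values are range-checked.
    expiration = spec.get("defaultTableExpirationMs", 0)
    if expiration != 0 and expiration < MIN_TABLE_EXPIRATION_MS:
        problems.append("defaultTableExpirationMs must be 0 or at least 3600000")

    hours = spec.get("maxTimeTravelHours")
    if hours is not None and not TIME_TRAVEL_HOURS_MIN <= hours <= TIME_TRAVEL_HOURS_MAX:
        problems.append("maxTimeTravelHours must be between 48 and 168")

    model = spec.get("storageBillingModel")
    if model is not None and model not in STORAGE_BILLING_MODELS:
        problems.append("storageBillingModel must be LOGICAL or PHYSICAL")

    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass.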

Examples

Dataset with Table Expiration

Automatically delete tables after 90 days, useful for staging or transient data:

apiVersion: gcp.openmcf.org/v1
kind: GcpBigQueryDataset
metadata:
  name: staging-events
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: staging.GcpBigQueryDataset.staging-events
spec:
  projectId:
    value: my-gcp-project
  datasetId: staging_events
  location: us-central1
  friendlyName: Staging Events
  description: Transient event data with 90-day auto-expiration
  defaultTableExpirationMs: 7776000000
  maxTimeTravelHours: 48
  deleteContentsOnDestroy: true
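
The defaultTableExpirationMs value in this manifest is 90 days expressed in milliseconds. A small sketch of that arithmetic follows; the helper names days_to_ms and hours_to_ms are hypothetical and exist only for illustration.

```python
from datetime import timedelta

ONE_MS = timedelta(milliseconds=1)


def days_to_ms(days: int) -> int:
    """Convert whole days to milliseconds using exact timedelta arithmetic."""
    return timedelta(days=days) // ONE_MS


def hours_to_ms(hours: int) -> int:
    """Convert whole hours to milliseconds."""
    return timedelta(hours=hours) // ONE_MS


print(days_to_ms(90))   # 7776000000, the value used in the manifest above
print(hours_to_ms(1))   # 3600000, the documented minimum
```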

Dataset with CMEK Encryption and Physical Billing

Production dataset using customer-managed encryption and physical storage billing for cost optimization:

apiVersion: gcp.openmcf.org/v1
kind: GcpBigQueryDataset
metadata:
  name: prod-analytics
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.GcpBigQueryDataset.prod-analytics
spec:
  projectId:
    value: my-gcp-project
  datasetId: prod_analytics
  location: US
  friendlyName: Production Analytics
  description: Core analytics dataset with CMEK and physical billing
  storageBillingModel: PHYSICAL
  kmsKeyName:
    value: projects/my-gcp-project/locations/us/keyRings/analytics-ring/cryptoKeys/analytics-key
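
The kmsKeyName value follows the documented format projects/{project}/locations/{location}/keyRings/{keyRing}/cryptoKeys/{key}. The sketch below splits such a name into its parts, which can help catch typos before deployment. The function parse_kms_key_name is a hypothetical helper, and the assumption that no component contains a slash is mine, not the source's.

```python
import re

# Mirrors the documented key name format; each component is assumed to be slash-free.
KMS_KEY_PATTERN = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)"
    r"/cryptoKeys/(?P<key>[^/]+)"
)


def parse_kms_key_name(name: str) -> dict:
    """Split a Cloud KMS key name into project, location, key_ring, and key."""
    match = KMS_KEY_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key name: {name!r}")
    return match.groupdict()


parts = parse_kms_key_name(
    "projects/my-gcp-project/locations/us/keyRings/analytics-ring/cryptoKeys/analytics-key"
)
print(parts["location"])  # us
```

The manifest above pairs a dataset in US with a key under locations/us; comparing the parsed location with spec.location case-insensitively is one possible extra check.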

Dataset with Explicit Access Control

Grant access to specific users, groups, and an authorized view:

apiVersion: gcp.openmcf.org/v1
kind: GcpBigQueryDataset
metadata:
  name: finance-data
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.GcpBigQueryDataset.finance-data
spec:
  projectId:
    value: my-gcp-project
  datasetId: finance_data
  location: EU
  friendlyName: Finance Data
  description: Restricted financial data with explicit access grants
  isCaseInsensitive: true
  defaultCollation: "und:ci"
  access:
    - role: OWNER
      userByEmail: data-owner@example.com
    - role: WRITER
      groupByEmail: data-engineers@example.com
    - role: READER
      groupByEmail: analysts@example.com
    - role: READER
      iamMember: "serviceAccount:etl-pipeline@my-gcp-project.iam.gserviceaccount.com"
    - view:
        projectId: my-gcp-project
        datasetId: reporting_views
        tableId: finance_summary
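
Because the access list is authoritative, a malformed entry can change who is able to reach the dataset, so it may be worth linting entries before applying. The sketch below encodes the documented rule that role is required unless view is set. The second check, that every entry names a principal or a view, is an assumption of this sketch rather than a documented OpenMCF rule, and check_access_entries is a hypothetical name.

```python
# Principal fields listed in the Configuration Reference.
PRINCIPAL_FIELDS = ("userByEmail", "groupByEmail", "domain", "specialGroup", "iamMember")


def check_access_entries(entries: list[dict]) -> list[str]:
    """Return a list of problems found in access entries; empty means none found."""
    problems = []
    for index, entry in enumerate(entries):
        has_view = "view" in entry

        # Documented rule: role is required unless view is set.
        if not has_view and not entry.get("role"):
            problems.append(f"access[{index}]: role is required unless view is set")

        # Assumption (not from the docs): an entry should target a principal or a view.
        if not has_view and not any(field in entry for field in PRINCIPAL_FIELDS):
            problems.append(f"access[{index}]: no principal or view specified")
    return problems
```

Run against the five entries in the manifest above, this sketch reports no problems: four entries carry both a role and a principal, and the last one sets view.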

Using Foreign Key References

Reference other OpenMCF-managed resources instead of hardcoding values:

apiVersion: gcp.openmcf.org/v1
kind: GcpBigQueryDataset
metadata:
  name: ref-dataset
  labels:
    openmcf.org/provisioner: pulumi
    pulumi.openmcf.org/organization: my-org
    pulumi.openmcf.org/project: my-project
    pulumi.openmcf.org/stack.name: prod.GcpBigQueryDataset.ref-dataset
spec:
  projectId:
    valueFrom:
      kind: GcpProject
      name: my-project
      field: status.outputs.project_id
  datasetId: warehouse
  location: us-central1
  kmsKeyName:
    valueFrom:
      kind: GcpKmsKey
      name: warehouse-key
      field: status.outputs.key_id

Stack Outputs

After deployment, the following outputs are available in status.outputs:

| Output | Type | Description |
| --- | --- | --- |
| dataset_id | string | The short dataset ID (same as the spec's datasetId input), used in BigQuery SQL queries and API calls |
| self_link | string | Fully qualified URI of the dataset (e.g., https://bigquery.googleapis.com/bigquery/v2/projects/{project}/datasets/{dataset}) |
| project | string | The GCP project that contains this dataset |
| creation_time | int64 | Creation time of the dataset in milliseconds since epoch |
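
Downstream tooling often needs these outputs in another form. As an illustrative sketch, the helpers below rebuild a self_link from the URI pattern shown in the table and turn creation_time into a timezone-aware datetime. Both function names are hypothetical, and how the outputs are fetched from status.outputs is left out.

```python
from datetime import datetime, timezone


def dataset_self_link(project: str, dataset_id: str) -> str:
    """Build the dataset URI following the pattern shown for self_link."""
    return (
        "https://bigquery.googleapis.com/bigquery/v2"
        f"/projects/{project}/datasets/{dataset_id}"
    )


def creation_time_to_datetime(creation_time_ms: int) -> datetime:
    """Convert creation_time (milliseconds since epoch) to a UTC datetime."""
    return datetime.fromtimestamp(creation_time_ms / 1000, tz=timezone.utc)


print(dataset_self_link("my-gcp-project", "analytics_events"))
```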

Related Components

  • GcpProject — provides the GCP project where the dataset is created
  • GcpKmsKeyRing — provides the key ring containing KMS keys for CMEK encryption
  • GcpKmsKey — provides the Cloud KMS encryption key referenced by kmsKeyName
  • GcpServiceAccount — creates service accounts that can be granted dataset access
