Twing Data Logo

Twing Data

BigQuery Integration

This guide will help you integrate Google BigQuery with Twing Data to optimize your data warehouse performance, reduce costs, and gain valuable insights into query patterns and resource utilization.

What Data We Collect

Twing Data analyzes your BigQuery metadata to provide actionable insights. We collect information from these key system tables:

  • INFORMATION_SCHEMA.JOBS_BY_PROJECT - Contains detailed execution metrics for each query, including processing time, bytes processed, and slot utilization. This helps identify performance bottlenecks and cost optimization opportunities.
  • INFORMATION_SCHEMA.JOBS_BY_USER - Provides user-level information to help understand access patterns and resource usage across different teams and applications.
  • INFORMATION_SCHEMA.JOBS_TIMELINE - Tracks query execution stages and timeline, enabling deep analysis of query performance and optimization opportunities.

How We Use Your Data

Twing Data processes your BigQuery metadata to provide:

  • Cost optimization recommendations based on bytes processed and slot utilization
  • Query performance insights to identify slow-running queries and partition pruning opportunities
  • Resource utilization patterns to help optimize slot reservations and on-demand usage
  • Data access pattern insights to improve table partitioning and clustering
  • Historical trend analysis to predict and optimize costs

Live Integration

The live integration creates a dedicated service account with minimal permissions to access only the necessary system views. This ensures secure, continuous monitoring while maintaining strict access controls.

-- Create service account and grant necessary permissions
-- Run these commands in Cloud Shell or using gcloud CLI

# Create service account
gcloud iam service-accounts create twing-data-monitor \
    --display-name="Twing Data Monitor"

# Grant BigQuery Job User role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:twing-data-monitor@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.jobUser"

# Grant BigQuery Resource Viewer role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:twing-data-monitor@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.resourceViewer"

# Create and download service account key
gcloud iam service-accounts keys create twing-data-key.json \
    --iam-account=twing-data-monitor@YOUR_PROJECT_ID.iam.gserviceaccount.com

Note: The service account has read-only access to system views and job information. It cannot access your actual data tables or modify any resources.

One-time Integration

For teams preferring a one-time analysis, this option extracts historical query data for a comprehensive performance audit. The analysis includes query patterns, resource utilization, and optimization opportunities.

-- Query job history and performance metrics
SELECT
  creation_time,
  user_email,
  job_type,
  statement_type,
  total_bytes_processed,
  total_slot_ms,
  cache_hit,
  query,
  job_stages
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND query NOT LIKE '%INFORMATION_SCHEMA%'
ORDER BY creation_time DESC
LIMIT 5000;

-- Get detailed timeline information
SELECT
  job_id,
  period_start,
  period_end,
  active_units,
  timeline
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE period_start >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
ORDER BY period_start DESC;

The extracted data provides insights into query performance, resource utilization, and optimization opportunities while excluding system queries and maintaining data privacy.