Twing Data Logo

Twing Data

Redshift Integration

This guide will help you integrate Amazon Redshift with Twing Data to optimize your data warehouse performance, reduce costs, and gain valuable insights into query patterns and resource utilization.

What Data We Collect

Twing Data analyzes your Redshift metadata to provide actionable insights. We collect information from three key system tables:

  • SYS_QUERY_HISTORY - Contains detailed execution metrics for each query, including runtime, resource consumption, and cache utilization. This helps identify performance bottlenecks and optimization opportunities.
  • SVL_USER_INFO - Provides user-level information to help understand access patterns and resource usage across different teams and applications.
  • SYS_QUERY_TEXT - Stores the full SQL text of queries, enabling deep analysis of query patterns and opportunities for query optimization.

How We Use Your Data

Twing Data processes your Redshift metadata to provide:

  • Performance optimization recommendations based on query patterns and execution metrics
  • Cost-saving opportunities through analysis of resource utilization and cache hit rates
  • Query pattern analysis to identify opportunities for materialized views and other optimizations
  • User access pattern insights to improve security and resource allocation
  • Historical trend analysis to predict and prevent performance issues

Live Integration

The live integration creates a dedicated user with minimal permissions to access only the necessary system tables. This ensures secure, continuous monitoring while maintaining strict access controls.

-- Create user with minimal permissions
CREATE USER twing_user WITH PASSWORD 'RANDOM_PASSWORD_HERE';

-- Grant read-only access to the necessary system tables
ALTER USER twing_user WITH SYSLOG ACCESS UNRESTRICTED;

Note: The created user has read-only access to system tables and cannot access your actual data tables.

One-time Integration

For teams preferring a one-time analysis, this option extracts historical query data for a comprehensive performance audit. The analysis includes query patterns, resource utilization, and optimization opportunities.

-- The query history
select s.user_id,
    u.usename as usename,
    s.query_id,
    s.query_type,
    s.database_name,
    trim(s.status) as status,
    s.result_cache_hit,
    s.start_time,
    s.end_time
from SYS_QUERY_HISTORY s
left join SVL_USER_INFO u on u.usesysid = s.user_id
where s.query_type != 'UTILITY'
and trim(s.query_text) not in (
    'SELECT pg_backend_pid()',
    'select version()',
    'select current_schema()',
    'select current_database()',
    'select current_user()'
    )
order by s.start_time desc
limit 5000; -- Specify limit to match desired rows and file size

-- The query details since Redshift doesn't have the full query text in SYS_QUERY_HISTORY
SELECT query_id, sequence, text
FROM SYS_QUERY_TEXT
where query_id in (
    select s.query_id
    from SYS_QUERY_HISTORY s
    where s.query_type != 'UTILITY'
    and trim(s.query_text) not in (
        'SELECT pg_backend_pid()',
        'select version()',
        'select current_schema()',
        'select current_database()',
        'select current_user()'
    )
    limit 5000
)
order by query_id, sequence;

The extracted data provides insights into query performance, resource utilization, and optimization opportunities while excluding system queries and maintaining data privacy.