Redshift External Tables
Overview
Redshift Spectrum external tables let you query data stored in Amazon S3 without loading it into Redshift’s internal storage. The data remains in S3 as the source of truth, while table schema and metadata are managed by AWS.
With Monte Carlo’s Redshift integration, you can monitor these external tables for schema, volume, and freshness changes.
File Type Support
Redshift supports creating external tables from the following file types:
- CSV
- JSON
- Parquet
- Avro
- ORC
- Delta
- Iceberg
File type support is controlled by Amazon. Please reach out to Amazon if you need additional file formats supported.
Steps
Already have external tables in Redshift?If you already have external tables created in Redshift, skip to Add Redshift integration in Monte Carlo.
- Create IAM Role for Spectrum.
- Create an External schema.
- Create External tables.
- Add Redshift integration in Monte Carlo.
- Create Table Monitors for the external tables.
Create Redshift External Tables
This guide provides an example workflow for creating external tables in Amazon Redshift. You can also complete many of these steps using alternative methods, such as the API, CLI, or infrastructure as code.
For detailed instructions, configuration options, limitations, and other information please refer to the AWS documentation: Getting started with Amazon Redshift Spectrum.
Prerequisites
- You will need a Redshift cluster or serverless workgroup
- You will need superuser access to the Redshift deployment
1. Create IAM Role for Spectrum
Redshift needs permission to read S3 and access Glue.
- Go to IAM → Roles → Create Role.
- Choose AWS Service → Redshift → Redshift Spectrum.
- Attach the following policies:
AmazonS3ReadOnlyAccess(or a more restricted policy if desired)AWSGlueConsoleFullAccess(or more restricted policy if desired such as AWSGlueDataCatalogFullAccess for only catalog access)
- Note the ARN of the IAM role (e.g.,
arn:aws:iam::123456789012:role/RedshiftSpectrumRole). - Attach the role to your Redshift cluster:
-- Connect as a superuser
ALTER CLUSTER <cluster_name> ADD IAM ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';2. Create an External Schema
Redshift needs an external schema that points to your Glue Data Catalog.
CREATE EXTERNAL SCHEMA '<schema_for_external_tables>'
FROM DATA CATALOG
DATABASE 'my_glue_database'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;Notes:
DATABASEis the Glue catalog database where external tables metadata is stored.<schema_for_external_tables>is the name of the schema where the external tables will be created.CREATE EXTERNAL DATABASE IF NOT EXISTSallows Redshift to create the catalog database if it doesn’t exist.
3. Create External Tables
Now define external tables that point to S3 data:
CREATE EXTERNAL TABLE <schema_for_external_tables>.<table_name> (
user_id BIGINT,
ts TIMESTAMP
)
STORED AS <file_format>
LOCATION 's3://my-bucket/events/';Notes:
<schema_for_external_tables>is the schema you created in step 2.<table_name>is the name of the external table you are creating.STORED AS <file_format>is the file format of your data in S3 (CSV, ORC, Parquet, JSON).LOCATION: S3 prefix containing your data files.- Table columns must match the schema of the files. The columns
user_idandtshave just been provided as an example.
For detailed instructions on how to set up external tables in Redshift from your S3 data, see the AWS's Getting started with Amazon Redshift Spectrum guide.
Add Redshift Integration in Monte Carlo
Now that you have external tables created in Redshift, we can add a Redshift integration in Monte Carlo to monitor the external tables.
First, refer to the Connection to Redshiftguide for details on creating a service user, managing permissions, and related setup (such as network configuration).
If you’ve already integrated Redshift with Monte Carlo, you can skip this step. If you’d like to confirm access to a specific asset, you can do so from the Integration Settings page.
Onboard Redshift Integration
Add a Redshift integration using the onboarding wizard. Integrations can also be added via the CLI or API if you prefer.
Deployment: [Optional] If you have multiple deployments already in your Monte Carlo account, choose the deployment you want to use for this integration. If you do not have multiple deployments, this field will not be shown.Integration name: The user-friendly name for this integration.Username: The username of the service user you created.Password: The password of the service user.Host: The hostname of your Redshift cluster or serverless instance.Port: The port of your Redshift cluster or serverless instance, typically 5439.Database: The database name your external schema and tables live in. E.g., created in Step 2.
Create Table Monitors on External tables
With the Redshift integration added, you can now create table monitors for schema, freshness and volume on the external tables. See our guide on creating table monitors.
You can also create additional monitors types if you’d like (e.g., custom).
Updated about 3 hours ago
