Skip to content

Connect Snowflake Open Catalog to Dremio Cloud: Multi-Engine Iceberg Analytics

Published: at 01:00 AM

Snowflake Open Catalog is Snowflake’s managed implementation of the Apache Iceberg REST catalog specification, based on the open-source Apache Polaris project. It serves as a centralized metadata catalog for Apache Iceberg tables, enabling multiple compute engines — including Dremio, Spark, Trino, and Flink — to read from and write to the same Iceberg tables without metadata conflicts.

Dremio Cloud connects to Snowflake Open Catalog as a first-class Iceberg data source. You get full read and write access to Iceberg tables, automatic table maintenance (compaction, manifest optimization, vacuuming), and the ability to federate catalog data with databases, object storage, cloud warehouses, and other catalogs — all through standard SQL.

For organizations already invested in Snowflake, the Open Catalog is a strategic choice for multi-engine interoperability. Unlike Snowflake’s proprietary internal catalog (which is only accessible through Snowflake compute), the Open Catalog exposes Iceberg metadata via a standard REST API. This means you’re not locked into Snowflake compute for every analytical query — Dremio can read the same tables at a fraction of the credit cost for repetitive workloads. Dremio also provides its federated engine, Reflections, governance, and AI capabilities — all without duplicating data or metadata.

Why Snowflake Open Catalog Users Need Dremio

Multi-Engine Strategy Without Vendor Lock-In

Snowflake Open Catalog is designed for multi-engine compatibility, which makes it an ideal complement to Dremio. By connecting Dremio to your Snowflake Open Catalog, you add a query engine that specializes in areas Snowflake doesn’t:

Cost Optimization

Instead of running all workloads through Snowflake credits, offload analytical queries to Dremio. Dremio’s Reflections cache results so repeated queries don’t consume Snowflake credits. For organizations spending significant amounts on Snowflake compute, routing read-heavy analytical workloads through Dremio can reduce overall costs.

Federate with Non-Snowflake Sources

Snowflake’s data sharing works within Snowflake. But what if you need to join your Snowflake Open Catalog data with PostgreSQL application data, MongoDB user profiles, or S3 raw event logs? Dremio’s federation engine does exactly that — no ETL pipelines, no data duplication.

Credential Vending

Snowflake Open Catalog supports credential vending, meaning Dremio doesn’t need separate storage credentials to access the underlying S3, Azure, or GCS data. The catalog provides temporary, scoped credentials for accessing data files. This simplifies security configuration and reduces the credentials you need to manage.

Write Support for External Catalogs

Dremio can write to external Snowflake Open Catalogs, enabling you to create tables, run transformations, and build data pipelines using Dremio’s SQL engine while keeping metadata managed in Snowflake’s catalog.

Prerequisites

Before connecting to Snowflake Open Catalog, confirm you have:

Step-by-Step: Connect Snowflake Open Catalog to Dremio Cloud

1. Add the Source

In the Dremio console, click the ”+” button in the left sidebar and select Snowflake Open Catalog from the catalog source types.

2. Configure Connection Details

3. Select Catalogs

Choose which catalogs to enable:

4. Configure Advanced Settings

Set Reflection Refresh and Metadata schedules. For catalogs with frequently changing tables, more frequent metadata refreshes ensure Dremio sees new tables and schema changes quickly.

5. Set Privileges and Save

Optionally restrict which Dremio users can access this catalog. Click Save.

Query Snowflake Open Catalog Data

-- Query an Iceberg table managed by Snowflake Open Catalog
SELECT customer_id, customer_name, total_spend, signup_date
FROM "sf-open-catalog".analytics.customer_summary
WHERE total_spend > 10000 AND signup_date >= '2024-01-01'
ORDER BY total_spend DESC;

-- Write to an external catalog
INSERT INTO "sf-open-catalog".analytics.monthly_metrics
SELECT
  DATE_TRUNC('month', order_date) AS month,
  COUNT(*) AS order_count,
  SUM(total_amount) AS revenue
FROM "sf-open-catalog".ecommerce.orders
GROUP BY 1;

Federate with Other Sources

Join catalog data with non-Snowflake sources in a single query:

SELECT
  soc.customer_name,
  soc.total_spend AS catalog_spend,
  pg.region,
  pg.account_manager,
  s3.support_ticket_count,
  CASE
    WHEN soc.total_spend > 100000 AND s3.support_ticket_count < 3 THEN 'Platinum'
    WHEN soc.total_spend > 50000 THEN 'Gold'
    WHEN soc.total_spend > 10000 THEN 'Silver'
    ELSE 'Standard'
  END AS customer_tier
FROM "sf-open-catalog".analytics.customer_summary soc
JOIN "postgres-crm".public.customers pg ON soc.customer_id = pg.customer_id
LEFT JOIN "s3-support".tickets.customer_tickets s3 ON soc.customer_id = s3.customer_id
ORDER BY catalog_spend DESC;

Build a Semantic Layer

Create views that combine catalog data with business logic:

CREATE VIEW analytics.gold.customer_health AS
SELECT
  soc.customer_id,
  soc.customer_name,
  soc.total_spend,
  soc.signup_date,
  CASE
    WHEN soc.total_spend > 100000 THEN 'Enterprise'
    WHEN soc.total_spend > 25000 THEN 'Mid-Market'
    ELSE 'SMB'
  END AS customer_segment,
  ROUND(soc.total_spend / GREATEST(DATEDIFF('MONTH', soc.signup_date, CURRENT_DATE), 1), 2) AS monthly_spend_rate
FROM "sf-open-catalog".analytics.customer_summary soc;

Navigate to the Catalog, click Edit (pencil icon) on this view, and Generate Wiki and Generate Tags. This creates the business context that powers Dremio’s AI features.

AI-Powered Analytics

Dremio AI Agent

The AI Agent lets users ask questions in plain English. For example: “Who are our highest-spending enterprise customers?” The Agent reads your wiki descriptions and view definitions to generate the correct SQL. Better wikis produce better results — describe what “enterprise customer” and “monthly spend rate” mean in business terms.

Dremio MCP Server

The Dremio MCP Server extends AI capabilities to Claude, ChatGPT, and other AI chat clients. Connect through the hosted MCP Server:

  1. Create a Native OAuth application in Dremio Cloud
  2. Configure redirect URLs for your AI client (e.g., https://claude.ai/api/mcp/auth_callback)
  3. Connect using mcp.dremio.cloud/mcp/{project_id}

Your team can then ask Claude “Show me customer health trends from our Snowflake catalog data” and get governed, accurate results without writing SQL.

AI SQL Functions

Enrich catalog data with AI inline in your queries:

SELECT
  customer_name,
  total_spend,
  AI_CLASSIFY(
    'Based on spending patterns, classify customer risk of churn',
    'Customer: ' || customer_name || ', Total Spend: $' || CAST(total_spend AS VARCHAR) || ', Months Active: ' || CAST(months_active AS VARCHAR),
    ARRAY['Low Risk', 'Moderate Risk', 'High Risk', 'Critical']
  ) AS churn_risk
FROM "sf-open-catalog".analytics.customer_summary
WHERE total_spend > 5000;

AI_CLASSIFY runs LLM inference in your SQL query. AI_GENERATE produces narrative summaries, and AI_SIMILARITY finds semantic matches between text fields.

Reflections for Performance

Create Reflections on frequently queried views to cache results:

  1. In the Catalog, select the view and click the Reflections tab
  2. Choose Raw Reflection (full cache) or Aggregation Reflection (pre-computed metrics)
  3. Select columns and aggregations to include
  4. Set the Refresh Interval — balance freshness against compute cost
  5. Click Save

BI tools connected via Arrow Flight or ODBC get sub-second responses from Reflections instead of re-reading Iceberg files from storage. This reduces Snowflake credit consumption for workloads routed through Dremio.

Governance Across Snowflake Open Catalog and Other Sources

Dremio’s Fine-Grained Access Control (FGAC) adds governance that spans Snowflake Open Catalog and all other sources:

These policies apply across SQL Runner, BI tools (Arrow Flight/ODBC), AI Agent, and MCP Server.

Connect BI Tools via Arrow Flight

Arrow Flight provides 10-100x faster data transfer than JDBC/ODBC:

All queries benefit from Reflections, governance, and the semantic layer.

VS Code Copilot Integration

Dremio’s VS Code extension with Copilot integration lets developers query Snowflake Open Catalog data from their IDE. Ask Copilot “Show me customer churn risk from the catalog” and get SQL generated using your semantic layer — without switching tools.

When to Use Snowflake Open Catalog vs. Other Catalogs

Use Snowflake Open Catalog when: You’re already in the Snowflake ecosystem and want multi-engine Iceberg access, your team uses Snowflake for data management but needs Dremio for federation and AI.

Use AWS Glue when: You’re AWS-native and want tight integration with EMR, Athena, and S3.

Use Dremio’s Open Catalog when: You want zero-configuration automatic maintenance, Autonomous Reflections, and no external catalog dependencies.

You can connect multiple catalogs simultaneously. Many organizations use Snowflake Open Catalog for shared enterprise data and Dremio’s Open Catalog for Dremio-specific analytical workloads.

Credential Vending in Detail

Credential vending is a key feature of Snowflake Open Catalog that simplifies Dremio’s access to underlying storage. Here’s how it works:

  1. You configure storage in Snowflake Open Catalog — specify the S3, Azure, or GCS bucket where Iceberg data files live.
  2. When Dremio queries a table, it requests access from the catalog API.
  3. Snowflake Open Catalog returns temporary, scoped credentials — short-lived tokens with permissions limited to the specific data files needed.
  4. Dremio uses these credentials to read (or write, for external catalogs) directly from storage.
  5. Credentials expire automatically — no long-lived keys to rotate or manage.

This means your Dremio Cloud connection needs only the catalog API credentials (OAuth), not separate storage credentials for every S3 bucket or Azure container. One connection, automatic credential management, reduced security surface area.

Multi-Engine Architecture with Snowflake Open Catalog

Snowflake Open Catalog enables a powerful multi-engine architecture:

All engines read from the same Iceberg tables managed by Snowflake Open Catalog — no data duplication, no metadata sync issues, no format conversion. Each engine reads the latest table metadata from the catalog and accesses data files via credential vending.

Dremio’s unique contribution to this architecture is federation (joining catalog tables with non-Iceberg sources), AI capabilities (Agent, MCP, SQL Functions), and Reflections (sub-second BI serving without re-reading storage).

Snowflake Open Catalog vs. Apache Polaris

Snowflake Open Catalog is based on the open-source Apache Polaris (incubating) project. Key differences:

FeatureSnowflake Open CatalogApache Polaris (self-managed)
HostingManaged by SnowflakeSelf-hosted
Credential VendingBuilt-inRequires configuration
AuthenticationSnowflake OAuthCustom
SupportSnowflake supportCommunity
CostSnowflake pricingInfrastructure costs

If you use Snowflake’s managed offering, you get turnkey catalog management. If you prefer self-managed, Apache Polaris works with Dremio’s Iceberg REST Catalog connector.

Get Started

Snowflake Open Catalog users can build a truly multi-engine lakehouse — manage Iceberg metadata in Snowflake’s infrastructure while querying with Dremio’s federated engine, AI capabilities, and Reflection-based acceleration.

Connect your Snowflake Open Catalog to Dremio Cloud, build views over your Iceberg tables, and start leveraging AI Agent, MCP Server, and Reflections for cost-optimized analytical serving. The setup takes minutes and works immediately.

Try Dremio Cloud free for 30 days and connect your Snowflake Open Catalog.