# Recce Documentation: Full Corpus

---

## https://docs.reccehq.com/

### What is Recce (Data Review Agent)

No more merging PRs where the pipeline succeeded but the data is quietly wrong.

Recce is a Data Review Agent that automates data validation for pull requests. When you open a PR, it compares your dev environment against production and surfaces schema changes, data diffs, row counts, and downstream impacts. You see what changed, what it affects, and what passed, all before you merge.

Recce is the product; the agent automates validation on your PRs. You can run Recce through Cloud (hosted, automated) or open source (local, manual).

### How Recce Works

When you open a PR with data changes, Recce automatically:

- **Runs data diffing:** the best practice for validating data changes
- **Analyzes impact:** identifies what changed down to the column level using Column-Level Lineage (CLL)
- **Reviews first:** the agent provides a data review summary explaining the change and its impact
- **Surfaces what matters:** shows only impacted items, not every downstream table
- **Opens exploration:** spins up a Recce instance where you can run additional diffs, explore lineage, and investigate deeper

You review the agent's findings, add notes, and approve with confidence, not blind trust.

The review flow: PR created → Recce triggered → agent analyzes production vs. development data → agent generates review summary → human explores in the Recce instance → human reviews and approves → PR merges.

### Automate Agent Data Review with CI/CD

Recce delivers value through CI/CD integration. Without it, you waste time triaging false alerts caused by source data updates and manually comparing environments, hoping you caught everything.
With CI/CD:

- Every PR gets automatic validation
- Base and current environments are set up automatically
- The agent reviews before you do
- Checks accumulate as organizational knowledge (preset checks)

### When to Use Recce

- **Business-critical data:** data that's customer-facing or revenue-impacting
- **Team collaboration:** when reviewers need to understand impact, not just see code changes
- **Standardized validation:** when you need consistent pull request review across senior and junior team members
- **Unknown unknowns:** when you can't predict what might break from a change

### When Not to Use Recce

- Teams that accept errors in production and fix them later
- Exploratory analysis that won't go to production

### FAQ

**Does Recce work without CI/CD?** Yes, you can run Recce locally for dev sessions. But CI/CD unlocks the full value: automatic validation on every PR without manual setup.

**What data platforms does Recce support?** Recce works with data warehouses like Snowflake, BigQuery, Redshift, and Databricks. See Connect to Warehouse for setup.

### Next Steps

- Interactive Demo: Try the Data Review Agent
- Tutorial: Get Started with Recce Cloud
- Blog: The Problem with Data PR Reviews: Where Do You Even Start?

---

## https://docs.reccehq.com/AGENTS/

### Recce documentation, for agents

This is the agent skill file for docs.reccehq.com. It tells AI coding agents how to install Recce, configure it, and use it against a dbt project. If you are a human, the rendered docs site is the friendlier read; this file exists for agents that need a single index.

Recce is a Data Review Agent for dbt pull requests. It compares a base release against the changes in a pull request, surfaces breaking changes, runs row-count and value diffs, and produces a validation checklist that travels with the PR.

### Install

Recce ships in two flavors. Pick one before continuing.

**Recce Cloud (managed, recommended).** No local install. Sign in at https://cloud.reccehq.com/, connect a repository, connect a warehouse, and validation runs on every pull request.
**Recce OSS (local CLI).** Install with uv (recommended) or pip inside the dbt project's virtualenv:

```shell
uv tool install recce
# or
pip install recce
```

Verify with `recce version`. The full OSS setup walkthrough lives at https://docs.reccehq.com/getting-started/oss-setup/; the hands-on tutorial against the Jaffle Shop sample project is at https://docs.reccehq.com/getting-started/jaffle-shop-tutorial/.

### Configuration

Recce reads two files at the root of the dbt project.

`recce.yml` holds preset checks and run parameters. Full reference at https://docs.reccehq.com/technical-concepts/configuration/. Minimal example:

```yaml
checks:
  - name: row count of orders
    type: row_count_diff
    params:
      node: orders
```

`profiles.yml` (dbt's normal profile) must define both a base target (production data) and a current target (the PR's data). Patterns for a shared production base versus per-PR schemas are documented at https://docs.reccehq.com/setup-guides/environment-best-practices/.

For CI/CD, the validation workflow is configured in GitHub Actions or GitLab CI; see https://docs.reccehq.com/setup-guides/setup-ci/ and https://docs.reccehq.com/setup-guides/setup-cd/.

For agents working inside Claude Code, Cursor, or Windsurf, point the editor at the Recce MCP server: https://docs.reccehq.com/setup-guides/mcp-server/. The Claude Code plugin (one-step setup) is at https://docs.reccehq.com/setup-guides/claude-plugin/.

### Usage

Pick the workflow that matches the project.

**Validate a pull request locally (OSS).** From the dbt project root, with both base and current artifacts generated:

```shell
recce run
recce server
```

`recce run` executes every preset check; `recce server` opens the Recce interface for ad hoc lineage, code, and data diffing. Workflow guide: https://docs.reccehq.com/using-recce/oss-workflow/.

**Review a PR in Recce Cloud.** Open the PR's Recce link from the GitHub or GitLab check, walk the validation checklist, and approve or comment per check.
Reviewer workflow: https://docs.reccehq.com/using-recce/data-reviewer/.

**Validate from an AI assistant (MCP).** With the MCP server connected, ask the agent things like "show the lineage diff for the orders model" or "run a value diff on customers between base and current". The agent invokes Recce tools and returns structured results. MCP reference: https://docs.reccehq.com/setup-guides/mcp-server/.

**Common explorations.** Each of these maps to a page in the docs:

- Lineage diff (added, modified, removed models): https://docs.reccehq.com/what-you-can-explore/lineage-diff/
- Code change (SQL and config diff per model): https://docs.reccehq.com/what-you-can-explore/code-change/
- Column-level lineage: https://docs.reccehq.com/what-you-can-explore/column-level-lineage/
- Impact radius (downstream models affected): https://docs.reccehq.com/what-you-can-explore/impact-radius/
- Breaking change analysis: https://docs.reccehq.com/what-you-can-explore/breaking-change-analysis/
- Data diffing (row count, profile, value, top-K, histogram, query): https://docs.reccehq.com/what-you-can-explore/data-diffing/

For the CLI reference, see https://docs.reccehq.com/using-recce/cli-commands/. For terminology, see /whats-recce/glossary/.

### More

- Full machine-readable index: https://docs.reccehq.com/llms.txt
- Full corpus (single text file): https://docs.reccehq.com/llms-full.txt
- Markdown mirror of any page: append `.md` to the page URL (for example `/getting-started/oss-setup.md`)
- Sitemap (markdown): https://docs.reccehq.com/sitemap.md
- Sitemap (XML): https://docs.reccehq.com/sitemap.xml

---

## https://docs.reccehq.com/collaboration/activity/

### Activity

Each check in your checklist has its own Activity panel. It records everything that happens to that specific check (approvals, comments, and updates), giving reviewers context on how the validation evolved.
### What Gets Recorded

Activity captures all events for a check:

- **Created:** when the check is added to the checklist
- **Approvals:** when the check is approved or unapproved
- **Comments:** questions, discussions, and clarifications about the check
- **Description updates:** changes to the check's description

### When to Use

- **Requesting context:** ask the developer about unexpected results, or ask reviewers about acceptable thresholds
- **Documenting decisions:** record how a decision was reached
- **Tracking history:** see who approved, what questions were asked, and how descriptions changed
- **Handoff scenarios:** give the next reviewer context on past decisions

### Sync Comments to GitHub/GitLab

When your Git provider is connected to Recce, comments you post in Activity automatically sync to the PR or MR. Each comment appears as a new comment on GitHub or GitLab, with a link back to the specific check in Recce.

You can @mention teammates using their GitHub or GitLab username (e.g., `@john-doe`). They'll receive a notification through GitHub or GitLab. Use the exact username: Recce doesn't currently map display names to usernames.

### Related

- Checklist: save and track validation checks
- Share: share your session with reviewers

---

## https://docs.reccehq.com/collaboration/checklist/

### Checklist

Save validation checks to track your findings and share them with reviewers. The checklist becomes your proof of correctness for modeling changes.

### How It Works

When you run a diff or query in Recce, you can add the result to your checklist. Each check captures:

- The validation type (schema diff, row count, query, etc.)
- The result at the time of capture
- Your notes explaining what the result means

### Adding Checks

For diffs performed via the Explore Change dropdown menu, click **Add to Checklist** in the results panel.

### Writing Descriptions

Add descriptions to help reviewers understand each check:

- **What changed:** the specific model or column being validated
- **Why it matters:** business context or downstream impact
- **What to verify:** expected behavior or acceptable thresholds

Good descriptions reduce back-and-forth and speed up PR approval.

### Approving Checks

Reviewers approve individual checks as they verify each validation. When configured as a required PR check, all checks must be approved before the PR can be merged. This ensures:

- Every validation is reviewed, not just glanced at
- Multiple reviewers can collaborate on approval
- There is a clear audit trail of who verified what

### Re-running Checks

After making additional changes to your models, re-run checks from the checklist to verify your updates. This lets you iterate until all validations pass. For checks you want to run on every PR automatically, see Preset Checks.

### When to Use

- **During development:** save checks as you validate each change, building evidence as you go
- **Before creating a PR:** compile all validations that prove your changes are correct
- **For recurring validations:** use Preset Checks to automate checks that should run on every PR
- **Stakeholder review:** share your checklist to give reviewers full context

### Related

- Preset Checks: automate recurring validation checks
- Share: share your checklist with reviewers

---

## https://docs.reccehq.com/collaboration/preset-checks/

### Preset Checks

Define validation checks that run automatically for every PR. Preset checks ensure consistent validation across your team.

**Goal:** Configure recurring checks that execute automatically when Recce runs.
### Prerequisites

- A Recce Cloud account, or Recce installed in your dbt project
- At least one validation check you want to automate

### Recce Cloud

Create preset checks directly in the Recce Cloud interface. When a PR is created, preset checks run automatically.

**From the checklist.** Mark any existing check as a preset check:

1. Run a diff or query in your Recce session
2. Add the result to your checklist
3. Open the check menu and select **Mark as Preset Check**

**From project settings.** Create preset checks directly in your project configuration:

1. Navigate to your project's Preset Checks page
2. Click **Add Preset Check**
3. Configure the check type and parameters

When preset checks are configured, they run automatically each time a PR is created.

### Recce OSS

For local Recce, configure preset checks in `recce.yml` and run them manually or in CI.

**Configure in recce.yml.** Start by adding a check to your checklist manually:

1. Run a diff or query in Recce
2. Add the result to your checklist
3. Open the check menu and select **Get Preset Check Template**
4. Copy the YAML config from the dialog
5. Paste the config into `recce.yml` at your project root:

```yaml
# recce.yml
checks:
  - name: Query diff of customers
    description: |
      This is the demo preset check.

      Please run the query and paste the screenshot to the PR comment.
    type: query_diff
    params:
      sql_template: select * from {{ ref("customers") }}
    view_options:
      primary_keys:
        - customer_id
```

**Run preset checks in the Recce server.** When you launch Recce, preset checks appear in your checklist automatically but are not yet executed. Click **Run Query** to execute each check.
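It can help to sanity-check a template's shape before you paste it into `recce.yml`. The sketch below is a hypothetical helper, not part of Recce. It mirrors the preset check above as an inline Python structure, checks that each entry has `name`, `type`, and `params`, and checks that a `query_diff` template references a model through `ref()`. These rules are assumptions drawn from the examples on this page, and Recce's own validation may differ.

```python
import re

# Hypothetical mirror of the recce.yml example above, written as a Python
# structure. It is not loaded from the file; it only shows the same shape.
PRESET_CHECKS = [
    {
        "name": "Query diff of customers",
        "description": "This is the demo preset check.",
        "type": "query_diff",
        "params": {"sql_template": 'select * from {{ ref("customers") }}'},
        "view_options": {"primary_keys": ["customer_id"]},
    },
]

# Keys every check in the examples on this page carries (an assumption).
REQUIRED_KEYS = ("name", "type", "params")

# Matches ref("model") or ref('model') inside a Jinja expression.
REF_PATTERN = re.compile(r"""ref\(\s*['"]([A-Za-z0-9_]+)['"]\s*\)""")


def referenced_models(check):
    """Return the model names a check's sql_template refers to via ref()."""
    sql = check.get("params", {}).get("sql_template", "")
    return REF_PATTERN.findall(sql)


def validate_preset_check(check):
    """Return a list of problems found in one preset check entry (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in check]
    if check.get("type") == "query_diff" and not referenced_models(check):
        problems.append("sql_template does not reference a model via ref()")
    return problems


for check in PRESET_CHECKS:
    issues = validate_preset_check(check)
    summary = "ok" if not issues else "; ".join(issues)
    print(f"{check.get('name', '<unnamed>')}: {summary}")
```

Reading the real `recce.yml` would require a YAML parser such as PyYAML. The sketch keeps the data inline so that it runs with only the standard library.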
**Run preset checks with recce run.** Execute all preset checks from the command line:

```shell
recce run
```

Output:

```text
───────────────────────────────── DBT Artifacts ─────────────────────────────────
Base:
    Manifest: 2024-04-10 08:54:41.546402+00:00
    Catalog: 2024-04-10 08:54:42.251611+00:00
Current:
    Manifest: 2024-04-22 03:24:11.262489+00:00
    Catalog: 2024-04-10 06:15:13.813125+00:00

───────────────────────────────── Preset checks ─────────────────────────────────
                               Recce Preset Checks
──────────────────────────────────────────────────────────────────────────────
  Status      Name                 Type         Execution Time   Failed Reason
──────────────────────────────────────────────────────────────────────────────
  [Success]   Query of customers   Query Diff   0.10 seconds     N/A
──────────────────────────────────────────────────────────────────────────────
The state file is stored at [recce_state.json]
```

View results by launching the server with the state file:

```shell
recce server recce_state.json
```

### Verification

Confirm preset checks work:

1. Add a check config to `recce.yml`
2. Run `recce run`
3. Verify the check appears in the output with `[Success]` status
4. Launch `recce server recce_state.json` and confirm the check appears in your checklist

### Troubleshooting

| Issue | Solution |
| --- | --- |
| Check not appearing | Verify `recce.yml` is in the project root and the YAML syntax is valid |
| Check fails to run | Check that the SQL template references valid models |
| Wrong results | Ensure base and current artifacts are up to date |

### Related

- Checklist: manually add checks during development
- Configuration: full `recce.yml` reference

---

## https://docs.reccehq.com/collaboration/share/

### Share

Share your validation results with team members and stakeholders.

**Goal:** Give reviewers access to your Recce session so they can explore validation results.

### Recce Cloud

Share your session by copying the URL directly from your browser. Team members with organization access can view any session immediately. To invite team members to your organization, see Admin Setup.
**Command line sharing.** For automated workflows or CI pipelines, use `recce share` to upload a state file directly:

```shell
recce share
# With API token
recce share --api-token
```

### Recce OSS

For local Recce sessions, use these sharing methods:

| Method | Best For | Requires |
| --- | --- | --- |
| Copy to Clipboard | Quick screenshots in PR comments | Nothing |
| Upload to Recce Cloud | Full interactive session access | Recce Cloud account |

**Copy to Clipboard.** For quick sharing of specific results, use **Copy to Clipboard** in any diff result. Paste the screenshot directly into PR comments, Slack, or other channels.

**Browser compatibility:** Firefox does not support copying images to the clipboard. Instead, Recce displays a modal where you can download the image or right-click to copy it.

**Upload to Recce Cloud.** When reviewers need full context, upload your session to Recce Cloud. This creates a shareable link with complete access to your validation results. Benefits:

- No setup required for viewers
- Full lineage exploration, query results, and checklists
- Read-only access (secure viewing)
- Simple link sharing via any channel

**Access control:** Anyone with the link can view your session after signing in to Recce Cloud. For restricted access, contact our team.

### Setting Up Cloud Connection

The first time you share via Cloud, you need to connect your local Recce to your Cloud account. This one-time setup enables sharing.

1. **Enable the Cloud connection.** Launch the Recce server and click the **Use Recce Cloud** button if your local installation isn't already connected to Cloud.
2. **Sign in and grant access.** After signing in, authorize your local Recce to connect with Cloud. This authorization enables sharing and secure state file hosting.
3. **Complete the setup.** Refresh the Recce page to activate the Cloud connection. Once connected, the **Share** button becomes available, allowing you to generate shareable links.
**Alternative setup method.** You can also connect to Cloud from the command line:

```shell
recce connect-to-cloud
```

This command handles the sign-in and authorization process directly from your terminal.

### Manual Configuration (Advanced)

For containerized environments or manual setup, configure the connection using your API token.

1. **Retrieve your API token.** Sign in to Cloud and copy your API token from the personal settings page.
2. **Configure the local connection** using one of the following methods.

**Option A: command line flag.** Launch the Recce server with your API token. The token is saved to your profile for future use:

```shell
recce server --api-token
```

**Option B: profile configuration.** Edit your `~/.recce/profile.yml` file to include the API token:

```yaml
api_token:
```

**Configuration file location:**

- Mac/Linux: `cd ~/.recce`
- Windows: `cd ~\.recce` in PowerShell, or navigate to `C:\Users\\.recce`

### Verification

Confirm sharing works:

1. Add a check to your checklist
2. Share via your preferred method (URL for Cloud, Share button for OSS)
3. Open the link in an incognito window
4. Verify you can view the session

### Related

- Admin Setup: invite team members to your organization
- Checklist: save validation checks to share
- Preset Checks: automate recurring checks

---

## https://docs.reccehq.com/getting-started/jaffle-shop-tutorial/

### Jaffle Shop Tutorial (OSS)

**This tutorial is for Recce Open Source.** It uses Recce OSS for local validation. For automated PR validation with Recce Cloud, see Get Started with Cloud.

When you change a dbt model, how do you know what data actually changed? Running your model isn't enough: you need to compare its outputs against the previous version.

**Goal:** Make a model change and validate the data impact using Recce OSS with the dbt Labs example project.

This tutorial uses DuckDB for a local, file-based setup, so you don't need a cloud warehouse or a Recce Cloud account. You'll work with `jaffle_shop_duckdb`, a sample project from dbt Labs.
You'll modify a model, see how the change affects downstream data, and add a validation to your checklist.

### Prerequisites

- Python 3.9+ installed
- Git installed

### Steps

#### 1. Clone Jaffle Shop

```shell
git clone git@github.com:dbt-labs/jaffle_shop_duckdb.git
cd jaffle_shop_duckdb
```

**Expected result:** You're in the `jaffle_shop_duckdb` directory.

#### 2. Set up virtual environment

```shell
python -m venv venv
source venv/bin/activate
```

**Expected result:** Your terminal prompt shows `(venv)`.

#### 3. Install dependencies

```shell
pip install -r requirements.txt
pip install recce
```

**Expected result:** Both dbt and Recce install without errors.

#### 4. Configure DuckDB profile for comparison

Recce compares two environments. Edit `./profiles.yml` to add a `prod` target for the base environment. Add the following under `outputs:`:

```yaml
prod:
  type: duckdb
  path: 'jaffle_shop.duckdb'
  schema: prod
  threads: 24
```

Your complete `profiles.yml` should look like:

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: 'jaffle_shop.duckdb'
      threads: 24
    prod:
      type: duckdb
      path: 'jaffle_shop.duckdb'
      schema: prod
      threads: 24
```

**Expected result:** `profiles.yml` has both `dev` and `prod` targets.

#### 5. Build base environment

Generate the production data and artifacts that Recce uses as the baseline.

```shell
dbt seed --target prod
dbt run --target prod
dbt docs generate --target prod --target-path ./target-base
```

**Expected result:** The `target-base/` folder contains `manifest.json` and `catalog.json`.

#### 6. Make a model change

Edit `./models/staging/stg_payments.sql` to introduce a data change:

```diff
 renamed as (
     payment_method,
-    -- `amount` is currently stored in cents, so we convert it to dollars
-    amount / 100 as amount
+    amount
     from source
 )
```

This removes the cents-to-dollars conversion, so downstream models will now show `amount` values 100x larger.

**Expected result:** `stg_payments.sql` outputs `amount` in cents instead of dollars.

#### 7. Build development environment

```shell
dbt seed
dbt run
dbt docs generate
```

**Expected result:** The `target/` folder contains updated `manifest.json` and `catalog.json`.

#### 8. Start Recce server

```shell
recce server
```

**Expected result:** The server starts at http://0.0.0.0:8000. Open http://localhost:8000 in your browser. The Lineage tab shows `stg_payments` and its downstream models highlighted.

#### 9. Run a Query Diff

Switch to the Query tab and run:

```sql
select * from {{ ref("orders") }} order by 1
```

Click **Run Diff** (or press Cmd+Shift+Enter).

**Expected result:** The Query Diff shows the `amount` column with values 100x larger in the current environment.

#### 10. Add to checklist

Click **Add to Checklist** (blue button, bottom right) to save this validation.

**Expected result:** The Checklist tab shows your saved Query Diff.

### Verify Success

Confirm you completed the tutorial:

- Lineage Diff shows `stg_payments` and downstream models highlighted
- Query Diff on `orders` shows the `amount` change (100x difference)
- The checklist contains your saved validation

### Troubleshooting

| Issue | Solution |
| --- | --- |
| "No artifacts found" error | Run `dbt docs generate` for both prod (`--target-path ./target-base`) and dev |
| Empty Lineage Diff | Ensure you made the model change in step 6 and ran `dbt run` + `dbt docs generate` |
| DuckDB lock error | Close any other processes using `jaffle_shop.duckdb` |

### Next Steps

- OSS Setup: set up Recce with your own dbt project
- Cloud vs Open Source: compare OSS and Cloud features

---

## https://docs.reccehq.com/getting-started/oss-setup/

### Set Up Open Source Recce

When you change data models, you need to compare the data before and after to catch unintended impacts. The open source version lets you run this validation locally.

**Goal:** Install and run Recce locally for manual data validation.

The open source version gives you the core validation engine to run locally. For the full experience with Recce Agent assistance on PRs and during development, see Cloud vs Open Source.

### Prerequisites

- Python 3.9+ installed
- A dbt project with at least one model
- Git installed (for version comparison)

### Steps

#### 1. Install Recce

Install Recce in your dbt project's virtual environment.
```shell
pip install recce
```

**Expected result:** Installation completes without errors.

#### 2. Generate base environment artifacts

Recce compares two states of your dbt project. First, generate artifacts for your base (production) state.

```shell
git checkout main
dbt docs generate --target-path ./target-base
```

**Expected result:** The `target-base/` folder contains `manifest.json` and `catalog.json`.

**Different approaches by environment:**

- File-based (DuckDB): run `dbt build` first to create data. See Jaffle Shop Tutorial.
- Cloud warehouses with dbt Cloud: download artifacts from the dbt Cloud API. See For dbt Cloud Users below.

#### 3. Generate current environment artifacts

Switch to your development branch and generate artifacts for comparison.

```shell
git checkout your-feature-branch
dbt run
dbt docs generate
```

**Expected result:** The `target/` folder contains updated `manifest.json` and `catalog.json`.

#### 4. Start Recce server

Launch the Recce web interface.

```shell
recce server
```

**Expected result:** The server starts and displays: `Recce server is running at http://0.0.0.0:8000`

#### 5. Explore changes in the UI

Open http://localhost:8000 in your browser.

- **Lineage tab:** see which models changed and their downstream impact
- **Query tab:** run SQL queries to compare data between base and current states

**Expected result:** Lineage Diff shows your modified models highlighted.

#### 6. Add validation checks to checklist

After running a query or diff:

1. Review the results
2. Click **Add to Checklist** to save the validation
3. Repeat for each check you want to track

**Expected result:** The checklist shows your saved validations.

### Verify Success

Run `recce server` and confirm you can:

- See Lineage Diff between base and current
- Run a Query Diff on a modified model
- Add the result to your checklist

### Try It: Jaffle Shop Tutorial

Want a hands-on walkthrough with DuckDB? The Jaffle Shop Tutorial guides you through making a model change, comparing data, and validating the impact.

### For dbt Cloud Users

If you use dbt Cloud for CI/CD, download production artifacts instead of generating them locally.
Get artifacts from the dbt Cloud API:

```shell
# Set your dbt Cloud credentials
export DBT_CLOUD_API_TOKEN="your-token"
export DBT_CLOUD_ACCOUNT_ID="your-account-id"
export DBT_CLOUD_JOB_ID="your-production-job-id"

# curl -o does not create missing directories, so create the folder first
mkdir -p target-base

# Download artifacts from your production job
curl -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
  "https://cloud.getdbt.com/api/v2/accounts/$DBT_CLOUD_ACCOUNT_ID/jobs/$DBT_CLOUD_JOB_ID/artifacts/manifest.json" \
  -o target-base/manifest.json
curl -H "Authorization: Token $DBT_CLOUD_API_TOKEN" \
  "https://cloud.getdbt.com/api/v2/accounts/$DBT_CLOUD_ACCOUNT_ID/jobs/$DBT_CLOUD_JOB_ID/artifacts/catalog.json" \
  -o target-base/catalog.json
```

Then generate current artifacts locally (`dbt docs generate`) and run `recce server` as usual.

**Recce Cloud automates this:** with Cloud, the agent retrieves artifacts automatically, so no manual downloads are needed. See Start Free with Cloud.

### Troubleshooting

| Issue | Solution |
| --- | --- |
| "No artifacts found" error | Run `dbt docs generate` for both base and current states |
| Empty Lineage Diff | Ensure your current branch has model changes compared to the base branch |
| Port 8000 already in use | Use `recce server --port 8001` to specify a different port |

### Next Steps

- Cloud vs Open Source: compare OSS and Cloud features
- Start Free with Cloud: get the agent on your PRs and CLI

---

## https://docs.reccehq.com/getting-started/start-free-with-cloud/

### Get Started with Recce Cloud

Set up Cloud to automate data review on every pull request. This guide walks you through each onboarding step.

### Goal

Recce compares Base and Current environments to validate data changes in every PR:

- **Base:** your main branch (production)
- **Current:** your PR branch (development)
- **Per-PR schema:** an isolated database schema created for each pull request, so multiple PRs can validate simultaneously without conflicts

For accurate comparisons, both environments should use consistent data ranges. See Best Practices for Preparing Environments for environment strategies.
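The sketch below makes the per-PR schema idea concrete. It derives one isolated schema name per pull request, and every PR shares the same Base. The `pr_<number>` naming, the `prod` base schema, and both helper functions are hypothetical. Your real names come from the targets in your `profiles.yml` and your CI workflow (see Environment Setup).

```python
import re


def per_pr_schema(pr_number: int, prefix: str = "pr") -> str:
    """Build a hypothetical per-PR schema name such as 'pr_123'."""
    if pr_number <= 0:
        raise ValueError("PR numbers are positive integers")
    # Lowercase the prefix and replace anything outside [a-z0-9_] so the name
    # works as an unquoted identifier in most warehouses.
    safe_prefix = re.sub(r"[^a-z0-9_]", "_", prefix.lower())
    return f"{safe_prefix}_{pr_number}"


def environments(pr_number: int, base_schema: str = "prod") -> dict:
    """Pair the shared Base schema with the isolated Current schema for one PR."""
    return {"base": base_schema, "current": per_pr_schema(pr_number)}


# Two PRs open at the same time write to different schemas but share one Base.
print(environments(123))  # {'base': 'prod', 'current': 'pr_123'}
print(environments(124))  # {'base': 'prod', 'current': 'pr_124'}
```

A separate schema per PR means concurrent validations never overwrite each other's tables. Keeping a single Base means every PR is compared against the same production baseline.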
### Prerequisites

- **Cloud account:** free trial at cloud.reccehq.com
- **dbt project in a git repository that runs successfully:** your environment can execute `dbt build` and `dbt docs generate`
- **Repository admin access for setup:** required to add workflows and secrets
- **Data warehouse:** read access to your warehouse for data diffing

### Onboarding Process Overview

After signing up, you'll enter the onboarding flow:

1. Connect data warehouse
2. Connect Git provider
3. Add Recce to CI/CD
4. Merge the CI/CD change

**Recce Web Agent Setup.** You can use the Recce Web Agent to help automate your setup. Currently it handles step 3 (Add Recce to CI/CD):

- The agent analyzes your repository and CI/CD setup
- You answer clarifying questions the agent asks about your environment strategy
- The agent creates a PR with customized workflow files

The agent covers common setups and continues to expand coverage. If your setup isn't supported yet, the agent directs you to the Setup Guide below for manual configuration. Need help? Contact us at support@reccehq.com.

### Setup Guide

This guide explains each onboarding step in detail. First, go to cloud.reccehq.com and create your free account.

#### 1. Connect Data Warehouse

Provide read-only credentials so Recce can run data diffs against your warehouse. See Connect Data Warehouse.

#### 2. Connect Git Provider

Authorize the Recce app and select the repositories you want to connect. See Connect Your Repository.

#### 3. Add Recce to CI/CD

This step adds CI/CD workflow files to your repository. The web agent detects your setup and guides you through it. For manual setup, follow the linked guides below.

**Choose your setup:**

| Question | If this is you... | Then... |
| --- | --- | --- |
| How do you run dbt? | You own your dbt run (GitHub Actions, GitLab CI, CircleCI) | Continue reading below |
| | You run dbt on a platform (dbt Cloud, Paradime, etc.) | See dbt Cloud Setup |
| How complex is your environment? | Simple (prod and dev targets) | Continue reading below. We use per-PR schemas for fast setup; see Environment Setup for why. |
| | Advanced (multiple schemas, staging environments) | See Environment Setup |
| What's your CI/CD platform? | GitHub Actions | Continue reading below |
| | Other (GitLab CI, CircleCI, etc.) | See Setup CD and Setup CI |

Configure in this order: profile, then CD, then CI. CD establishes the production baseline that CI compares against.

**a. Configure your dbt profile.** Add `ci` and `prod` targets to your `profiles.yml` so Recce can compare base and current environments. See Environment Setup.

**b. Set up baseline updates (CD).** Add a workflow that uploads production artifacts to Cloud after every merge to main. See Setup CD.

**c. Set up PR validation (CI).** Add a workflow that uploads PR branch artifacts so Recce can validate changes before merge. See Setup CI.

Your workflows use `GITHUB_TOKEN` (automatically provided by GitHub Actions) and your existing warehouse credential secrets.

**recce vs recce-cloud:** `pip install recce` installs the open source CLI for local validation. `pip install recce-cloud` installs the CI/CD uploader for Cloud.

#### 4. Merge the CI/CD change

Merge the PR containing the workflow files. After merging:

- The Base workflow automatically uploads your Base to Cloud
- The Current workflow is ready to validate future PRs

In Cloud, verify you see:

- **GitHub Integration:** Connected
- **Warehouse Connection:** Connected
- **Production Metadata:** Updated automatically
- **PR Sessions:** all open PRs appear in the list. Only PRs with uploaded metadata can be launched for review.

#### 5. Final Steps

You can now:

- See data review summaries in PR comments
- Launch a Recce instance to visualize changes
- Review downstream impacts before merging

### Verification Checklist

- **Base workflow:** trigger it manually and check that Base metadata appears in Cloud
- **Current workflow:** create a test PR and verify its PR session appears
- **Data diff:** open the PR session and run a Row Count Diff

### Troubleshooting

| Issue | Solution |
| --- | --- |
| Authentication errors | Confirm the repository is connected in Cloud settings |
| Push to main blocked | Check branch protection rules |
| Secret names don't match | Update the template to use your existing secret names |
| Workflow fails | Check that secrets are configured correctly |
| Artifacts missing | Ensure `dbt docs generate` completes before upload |
| Warehouse connection fails | Check IP allowlisting; add GitHub Actions IP ranges |

### Next Steps

- Environment Setup: configure dbt profiles and CI/CD variables
- Setup CD: detailed CD workflow guide
- Setup CI: detailed CI workflow guide
- Environment Best Practices: strategies for source data and schema management

---

## https://docs.reccehq.com/setup-guides/claude-plugin/

### Recce Claude Plugin for Claude Code

Recce is a dbt data validation tool that compares your development branch against your base branch and surfaces schema changes, row count differences, and data diffs before you merge. The Recce Claude Plugin brings this capability into Claude Code, making it accessible through natural language and interactive slash commands.

If you're reviewing dbt pull requests with Claude Code, the plugin connects Claude directly to your data warehouse so you can ask questions like "What changed in the orders model?" and get validated answers without writing a single query by hand.

### Why use the plugin?

Without Recce, reviewing data changes in a dbt PR means manually querying your warehouse, comparing results across branches, and hoping you've checked the right models. With the Recce Claude Plugin, Claude does this for you.
It analyzes your model lineage, traces Column-Level Lineage, runs Schema Diff, Row Count Diff, Value Diff, and Profile Diff across your modified models, and reports back in plain language.

The plugin also automates the setup that the MCP server otherwise requires you to do manually:

- Checks prerequisites (Python, dbt, Git)
- Installs Recce if needed
- Generates base and current dbt artifacts (the `manifest.json` and `catalog.json` metadata files that Recce needs)
- Starts the MCP server automatically
- Provides slash commands for common validation workflows

**Note:** If you use Cursor, Windsurf, or another AI agent, see the MCP Server page for direct configuration instructions.

### Requirements

- Claude Code 1.0.33 or higher
- Python 3.8+
- dbt (any adapter: Snowflake, BigQuery, Redshift, Databricks, DuckDB, and others)
- Git

### Installation

**Step 1: Add the Recce marketplace.** In Claude Code, run:

```text
/plugin marketplace add DataRecce/recce-claude-plugin
```

**Step 2: Install the plugin.**

```text
/plugin install recce-quickstart@recce-claude-plugin
```

Or use the interactive installer: run `/plugin`, navigate to the Discover tab, find `recce-quickstart`, and press Enter to install.

**Step 3: Verify the installation.** Run `/plugin` and navigate to the Installed tab to confirm `recce-quickstart` appears.

**Installation scopes:** By default, the plugin installs at user scope (available across all projects). You can also install at project scope (`--scope project`) to share it with your team, or at local scope (`--scope local`) for just the current repository.

### Getting started

Make sure you're on your feature branch; Recce compares your current branch against main. Then navigate to your dbt project and run the setup command:

```text
/recce-setup
```

This walks you through:

1. Verifying your dbt project and warehouse connection
2. Generating development dbt artifacts (`target/manifest.json`)
3. Generating base dbt artifacts (`target-base/manifest.json`)
4. Starting the Recce MCP server

When setup completes, you'll see confirmation that the MCP server is running and connected.
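If setup reports a problem, a missing artifact is a likely cause. The sketch below checks that both folders named above exist and contain the two files Recce reads. The folder and file names come from this page. The `missing_artifacts` helper is hypothetical and is not part of the plugin or Recce.

```python
from pathlib import Path

# Folder names used throughout these docs: target/ for Current, target-base/ for Base.
ARTIFACT_DIRS = {"current": "target", "base": "target-base"}
# Files Recce reads from each folder, per this page.
REQUIRED_FILES = ("manifest.json", "catalog.json")


def missing_artifacts(project_root="."):
    """Return relative paths of expected dbt artifacts that do not exist yet."""
    root = Path(project_root)
    missing = []
    for folder in ARTIFACT_DIRS.values():
        for name in REQUIRED_FILES:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    gaps = missing_artifacts()
    if gaps:
        print("Missing artifacts; run dbt docs generate for that environment:")
        for gap in gaps:
            print(f"  {gap}")
    else:
        print("Base and current artifacts found.")
```

Run it from the dbt project root. Any path it lists tells you which environment still needs `dbt docs generate`.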
Once connected, Claude has access to the full set of Recce validation tools, including Lineage Diff, Column-Level Lineage, row-level Value Diff, Top-K Diff, Histogram Diff, and more. Try asking questions about your data changes:

> You: What schema changes happened in my current branch?

Claude calls Recce's `schema_diff` tool behind the scenes and responds with a summary of added, removed, or modified columns across your changed models.

> You: Which downstream columns are affected by my changes to the orders model?

Claude uses `get_cll` (Column-Level Lineage) to trace the impact through your model graph.

> You: Run a Value Diff on the customers model using customer_id as the primary key

Claude runs `value_diff` to compare row-level values between branches and reports per-column match rates.

See the MCP Server page for the full list of available tools and how agents use them in a structured validation workflow.

### Available commands

| Command | Description |
|---|---|
| `/recce-setup` | Guided setup: installs dependencies, generates artifacts, starts the MCP server |
| `/recce-pr [url]` | Analyze the data impact of a pull request (auto-detects the PR from the current branch) |
| `/recce-check [type] [model]` | Run validation checks (`row-count`, `schema`, `profile`, `query-diff`) |
| `/recce-ci` | Generate GitHub Actions workflows for Recce Cloud CI/CD |

### Example workflows

Analyze a pull request:

`/recce-pr https://github.com/your-org/your-repo/pull/123`

Or, if you're already on the PR branch:

`/recce-pr`

Run specific validation checks:

- `/recce-check row-count orders`
- `/recce-check schema customers`
- `/recce-check profile payments`

### Managing the plugin

- Disable the plugin: `/plugin disable recce-quickstart@recce-claude-plugin`
- Re-enable: `/plugin enable recce-quickstart@recce-claude-plugin`
- Uninstall: `/plugin uninstall recce-quickstart@recce-claude-plugin`
- Update to the latest version: `/plugin marketplace update recce-claude-plugin`

### Troubleshooting

Plugin not loading:

- Verify that your Claude Code version is 1.0.33 or higher: `claude --version`
- Check that the plugin is installed: `/plugin` → Installed tab
- Check for errors: `/plugin` → Errors tab

MCP server not starting:

- Make sure you're in a dbt project directory (one that contains `dbt_project.yml`)
- Make sure Recce is installed with MCP support: `pip install 'recce[mcp]'`
- Check whether port 8081 is available, or set a custom port: `RECCE_MCP_PORT=8085`
- Re-run the setup: `/recce-setup`

Commands not recognized:

- Confirm the plugin is enabled: `/plugin` → Installed tab → check status
- Restart Claude Code to reload plugins

### FAQ

**Does the Recce Claude Plugin work with Cursor or Windsurf?**

The plugin is specific to Claude Code. For Cursor and Windsurf, configure Recce using the MCP server directly.

**What dbt adapters does Recce support?**

Recce works with any dbt adapter, including Snowflake, BigQuery, Redshift, Databricks, DuckDB, and others.

**What is the Model Context Protocol (MCP)?**

MCP is an open standard that lets AI agents like Claude Code call external tools. Recce implements an MCP server so Claude can run data diffs against your warehouse on demand.

**Can I use the plugin without Cloud?**

Yes. The plugin works with the open source version for local validation. Cloud adds automated PR review, team collaboration, and persistent validation history.

### Next Steps

- Recce MCP Server: full tool reference, agent workflow guide, and configuration for Cursor, Windsurf, and other AI agents
- Column-Level Lineage: understand how column changes propagate through your models
- Value Diff: row-level data validation
- Profile Diff: statistical profiling comparisons
- CI/CD Setup: automate validation in your workflow

---

## https://docs.reccehq.com/setup-guides/connect-git/

Connect Your Repository

Goal: Connect your GitHub or GitLab repository to Recce Cloud for automated PR data review.

Cloud supports GitHub and GitLab. Using a different provider? Contact us at support@reccehq.com.
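Before connecting, it can help to confirm where the dbt project lives in the repository, especially in a monorepo with several dbt projects (each of which can become its own Cloud project, as described under Monorepo support). A minimal sketch, assuming a local clone of the repository; `find_dbt_projects` is a hypothetical helper, not part of Recce, and it skips `dbt_packages/` and `target/` because installed packages also ship their own `dbt_project.yml`.

```python
import os
from pathlib import Path
from typing import List

# Directories that can contain dbt_project.yml files which are not your own projects
SKIP_DIRS = {".git", "dbt_packages", "target", "node_modules"}


def find_dbt_projects(repo_root: str) -> List[str]:
    """Return repo-relative directories (POSIX style) that contain a dbt_project.yml."""
    root = Path(repo_root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        if "dbt_project.yml" in filenames:
            found.append(Path(dirpath).relative_to(root).as_posix())
    return sorted(found)
```

A single result of `"."` means the project sits at the repository root; several results suggest creating one Cloud project per directory.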
### Prerequisites

- Cloud account (free trial at cloud.reccehq.com)
- Repository admin access (required to authorize the app installation)
- dbt project in the repository

### How It Works

When you connect a Git provider, Cloud maps your setup as follows:

| Git Provider | Cloud |
|---|---|
| Organization | Organization |
| Repository | Project |

Every Cloud account starts with one organization and one project. When you connect your Git provider, you select which organization and repository to link.

Monorepo support: If you have multiple dbt projects in one repository, you can create multiple Cloud projects that connect to the same repo.

### Connect GitHub

#### 1. Authorize the Recce GitHub App

Navigate to Settings → Git Provider in Cloud and click Connect GitHub.

Expected result: The GitHub authorization page opens.

#### 2. Select Organization and Repository

Choose which GitHub organization to connect; this becomes your Cloud organization. Then select the repository containing your dbt project; this becomes your Cloud project.

Expected result: Repository connected. Your Cloud project is ready to use.

### Connect GitLab

GitLab uses Personal Access Tokens (PAT) instead of OAuth. Unlike GitHub, where the Recce GitHub App posts comments as itself, comments posted through the GitLab API appear as the token owner. We recommend creating a dedicated service account so that PR comments appear as a bot rather than as your personal account.

> **Use a shared team email:** When creating the service account, use a shared team email (e.g., data@yourcompany.com) so it isn't tied to any individual.

#### 1. Create a Dedicated Service Account (Recommended)

If you use your personal token, your teammates see PR comments from you rather than from Recce Cloud.
To avoid this, create a GitLab service account for Recce Cloud:

1. In GitLab, navigate to your group → Settings → Service Accounts.
2. Click Add service account.
3. Set the name to Recce Cloud (the username is generated automatically).
4. Click Create service account.
5. (Optional) Edit the account to customize the username and upload a Recce avatar.

Expected result: The service account appears in the group's member list with a "service account" badge.

Add the service account as a Developer member to the projects you want Recce Cloud to access.

> **Availability:** GitLab service accounts are available on GitLab.com Free (up to 100 per group), Premium, and Ultimate. For Self-Managed Free instances where service accounts are unavailable, create a dedicated GitLab user (e.g., recce-cloud-bot) instead.

If you don't have group admin access, you can skip this step and use a personal access token directly. Note that PR comments will then appear as your user account.

#### 2. Create a Personal Access Token

Generate a PAT for the service account (or for your personal account if you skipped Step 1):

- For a service account: navigate to your group → Settings → Service Accounts, select the account, and click Create token.
- For a personal token: navigate to User Settings → Access Tokens → Add new token.

Then:

1. Set a descriptive name (e.g., "Recce Cloud integration").
2. Select the `api` scope (required for posting PR comments). The `read_api` scope is not sufficient.
3. Set an expiration date and click Create.
4. Copy the token immediately; it cannot be viewed again.

#### 3. Add Token to Cloud

Navigate to Settings → Git Provider in Recce Cloud, select GitLab, and paste the token.

### Verify Success

In Cloud, navigate to your repository.
You should see:

- Connection status: "Connected"
- The Project in your Organization is linked to a Git repository

### Troubleshooting

| Issue | Solution |
|---|---|
| Repository not found | Ensure proper permissions are granted (GitLab: token access; GitHub: app authorized) |
| Invalid token (GitLab) | Generate a new token with the `api` scope |
| Cannot post PR comments (GitLab) | Regenerate the token with the `api` scope instead of `read_api` |
| PR comments show as personal user (GitLab) | Create a service account and use its token instead of your personal token |

### Next Steps

- Connect Data Warehouse
- Add Recce to CI/CD

---

## https://docs.reccehq.com/setup-guides/connect-to-warehouse/

Connect Data Warehouse

Goal: Connect your data warehouse to Recce Cloud to enable data diffing on PRs.

Cloud supports Snowflake, Databricks, BigQuery, and Redshift. Using a different warehouse? Contact us at support@reccehq.com.

### Prerequisites

- Warehouse credentials with read access
- Network access configured (IP whitelisting if required)

> **Security:** Cloud queries your warehouse directly to compare Base and Current environments. Recce encrypts and stores credentials securely. Read-only access is sufficient for all data diffing features.

### Connect Snowflake

Option 1: Username/Password

| Field | Description | Example |
|---|---|---|
| Account | Snowflake account identifier | `xxxxxx.us-central1.gcp` |
| Database | Default database | `MY_DB` |
| Schema | Default schema | `PUBLIC` |
| Username | Database username | `MY_USER` |
| Password | Database password | `my_password` |
| Role | Role with read access | `ANALYST_ROLE` |
| Warehouse | Compute warehouse name | `WH_LOAD` |

Option 2: Key Pair Authentication

| Field | Description | Example |
|---|---|---|
| Account | Snowflake account identifier | `xxxxxx.us-central1.gcp` |
| Database | Default database | `MY_DB` |
| Schema | Default schema | `PUBLIC` |
| Username | Service account username | `MY_USER` |
| Private Key | PEM-formatted private key | `-----BEGIN RSA PRIVATE KEY-----...` |
| Passphrase | Key passphrase (if encrypted) | `my_passphrase` |
| Role | Role with read access | `ANALYST_ROLE` |
| Warehouse | Compute warehouse name | `WH_LOAD` |

### Connect Databricks

Option 1: Personal Access Token

| Field | Description | Example |
|---|---|---|
| Host | Workspace URL | `adb-1234567890123456.7.azuredatabricks.net` |
| HTTP Path | SQL warehouse path | `/sql/1.0/warehouses/abc123def456` |
| Token | Personal access token | `dapiXXXXXXXXXXXXXXXXXXXXXXX` |
| Catalog | Unity Catalog name (optional) | `my_catalog` |
| Schema | Default schema | `MY_SCHEMA` |

Option 2: OAuth (M2M)

| Field | Description | Example |
|---|---|---|
| Host | Workspace URL | `adb-1234567890123456.7.azuredatabricks.net` |
| HTTP Path | SQL warehouse path | `/sql/1.0/warehouses/abc123def456` |
| Client ID | Service principal client ID | `12345678-1234-1234-1234-123456789012` |
| Client Secret | Service principal secret | `dose1234567890abcdef` |
| Catalog | Unity Catalog name (optional) | `my_catalog` |
| Schema | Default schema | `MY_SCHEMA` |

Note: OAuth M2M is auto-enabled in Databricks accounts. For setup details, see dbt Databricks setup.

### Connect BigQuery

| Field | Description | Example |
|---|---|---|
| Project | GCP project ID | `my-gcp-project-123456` |
| Dataset | Default BigQuery dataset | `my_dataset` |
| Service Account JSON | Full JSON key file contents | `{"type": "service_account", ...}` |

Note: For authentication, we currently support service account JSON only.

### Connect Redshift

| Field | Description | Example |
|---|---|---|
| Host | Cluster endpoint | `my-cluster.abc123xyz.us-west-2.redshift.amazonaws.com` |
| Port | Database port | `5439` (default) |
| Database | Database name | `analytics_db` |
| Schema | Default schema | `public` |
| Username | Database user | `admin_user` |
| Password | Database password | `my_password` |

Note: We currently support database (password-based) authentication only.

### Save Connection

After entering your connection details, click Save. Cloud runs a connection test automatically and displays "Connected" on success.

### Verify Success

Navigate to Organization Settings in Cloud. Your data warehouse should appear.
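If the BigQuery connection test fails, a common cause is pasting something other than the full key file into Service Account JSON. The sketch below is a local pre-check, not a Recce API, and it does not contact Google. It assumes the standard Google Cloud service-account key layout, which includes `type`, `project_id`, `private_key`, and `client_email`.

```python
import json
from typing import List

# Fields present in a Google Cloud service-account key file
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def validate_service_account_json(raw: str) -> List[str]:
    """Return a list of problems with a pasted key; an empty list means it looks usable."""
    try:
        key = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON: paste the entire key file contents"]
    if not isinstance(key, dict):
        return ["expected a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]
    # A user credential or other JSON file has a different "type"
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"type is {key['type']!r}, expected 'service_account'")
    private_key = key.get("private_key")
    if private_key and "BEGIN PRIVATE KEY" not in str(private_key):
        problems.append("private_key does not look like a PEM private key")
    return problems
```

An empty list only means the file is shaped like a service-account key. Whether the account can read the dataset is still decided by its IAM roles, which the connection test exercises.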
### Troubleshooting

| Issue | Solution |
|---|---|
| Connection refused | Whitelist Cloud IP ranges in your network configuration |
| Authentication failed | Verify credentials and regenerate them if expired |
| Permission denied on table | Grant SELECT permissions on target schemas |

### Next Steps

- Add Recce to CI/CD
- Run Your First Data Diff

---

## https://docs.reccehq.com/setup-guides/dbt-cloud-setup/

dbt Cloud Setup

When your dbt project runs on dbt Cloud, validating PR data changes requires retrieving artifacts from the dbt Cloud API rather than generating them locally.

### Goal

After completing this tutorial, every PR triggers automated data validation. Recce compares your PR changes against production, with results visible in Recce Cloud.

### Prerequisites

- Cloud account: free trial at cloud.reccehq.com
- dbt Cloud account: with CI (continuous integration) and CD (continuous deployment) jobs configured
- dbt Cloud API token: with read access to job artifacts
- GitHub repository: with admin access to add workflows and secrets

### How Recce retrieves dbt Cloud artifacts

Recce needs both base (production) and current (PR) dbt artifacts to compare changes. When you use dbt Cloud, these artifacts live in dbt Cloud's API rather than on your local filesystem. Your GitHub Actions workflows retrieve them via API calls and upload them to Cloud. Two workflows handle this:

- Base workflow (on merge to main): Downloads production artifacts from your CD job → uploads with `recce-cloud upload --type prod`
- PR workflow (on pull request): Downloads PR artifacts from your CI job → uploads with `recce-cloud upload`

### Setup steps

#### 1. Enable "Generate docs on run" in dbt Cloud

Recce requires `catalog.json` for schema comparisons. Enable documentation generation for both your CI and CD jobs in dbt Cloud.
For CD jobs (production):

1. Go to your CD job settings in dbt Cloud.
2. Under Execution settings, enable Generate docs on run.

For CI jobs (pull requests):

1. Go to your CI job settings in dbt Cloud.
2. Under Advanced settings, enable Generate docs on run.

> **Note:** Without this setting, dbt Cloud won't generate `catalog.json`, and Recce won't be able to compare schemas between environments.

#### 2. Get your dbt Cloud credentials

Collect the following from your dbt Cloud account:

| Credential | Where to find it |
|---|---|
| Account ID | URL when viewing any job: `cloud.getdbt.com/deploy/{ACCOUNT_ID}/projects/...` |
| CD Job ID | URL of your production/CD job: `...jobs/{JOB_ID}` |
| CI Job ID | URL of your PR/CI job: `...jobs/{JOB_ID}` |
| API Token | Account Settings > API Tokens > Create Service Token |

> **Tip:** Create a service token with "Job Admin" or "Member" permissions. This allows read access to job artifacts.

#### 3. Configure GitHub secrets

Add the following secrets to your GitHub repository (Settings > Secrets and variables > Actions).

dbt Cloud secrets:

- `DBT_CLOUD_API_TOKEN`: Your dbt Cloud API token
- `DBT_CLOUD_ACCOUNT_ID`: Your dbt Cloud account ID
- `DBT_CLOUD_CD_JOB_ID`: Your production/CD job ID
- `DBT_CLOUD_CI_JOB_ID`: Your PR/CI job ID

> **Note:** `GITHUB_TOKEN` is provided automatically by GitHub Actions; no configuration is needed.

#### 4. Create the base workflow (CD)

Create `.github/workflows/recce-base.yml` to update your production baseline when merging to main.
```yaml
name: Update Base Metadata (dbt Cloud)

on:
  push:
    branches: [main]
  workflow_dispatch:

env:
  DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}"
  DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }}

jobs:
  update-base:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install recce-cloud
        run: pip install recce-cloud
      - name: Retrieve artifacts from CD job
        env:
          DBT_CLOUD_CD_JOB_ID: ${{ secrets.DBT_CLOUD_CD_JOB_ID }}
        run: |
          set -eo pipefail
          # Most recent run of the CD job (newest first, one result)
          CD_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CD_JOB_ID}&order_by=-id&limit=1"
          CD_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CD_RUNS_URL}")
          DBT_CLOUD_CD_RUN_ID=$(echo "${CD_RUNS_RESPONSE}" | jq -r ".data[0].id")
          # jq prints "null" when the job has no runs yet
          if [ -z "${DBT_CLOUD_CD_RUN_ID}" ] || [ "${DBT_CLOUD_CD_RUN_ID}" = "null" ]; then
            echo "CD run not found for job ${DBT_CLOUD_CD_JOB_ID}"
            exit 1
          fi
          mkdir -p target
          for artifact in manifest.json catalog.json; do
            ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CD_RUN_ID}/artifacts/${artifact}"
            curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}"
          done
      - name: Upload to Recce Cloud
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: recce-cloud upload --type prod
```

#### 5. Create the PR workflow (CI)

Create `.github/workflows/recce-pr.yml` to validate PR changes.
```yaml
name: Validate PR (dbt Cloud)

on:
  pull_request:
    branches: [main]

env:
  DBT_CLOUD_API_BASE: "https://cloud.getdbt.com/api/v2/accounts/${{ secrets.DBT_CLOUD_ACCOUNT_ID }}"
  DBT_CLOUD_API_TOKEN: ${{ secrets.DBT_CLOUD_API_TOKEN }}

jobs:
  validate-pr:
    runs-on: ubuntu-latest
    # Example limit so a dbt Cloud CI run that never starts does not hold the runner for hours
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - name: Install recce-cloud
        run: pip install recce-cloud
      - name: Wait for dbt Cloud CI job
        env:
          DBT_CLOUD_CI_JOB_ID: ${{ secrets.DBT_CLOUD_CI_JOB_ID }}
          CURRENT_GITHUB_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          set -eo pipefail
          CI_RUNS_URL="${DBT_CLOUD_API_BASE}/runs/?job_definition_id=${DBT_CLOUD_CI_JOB_ID}&order_by=-id"

          # Print the ID of the newest CI run built from this PR's head commit (empty if none yet)
          fetch_ci_run_id() {
            CI_RUNS_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUNS_URL}")
            echo "${CI_RUNS_RESPONSE}" | jq -r --arg sha "${CURRENT_GITHUB_SHA}" 'first(.data[] | select(.git_sha == $sha) | .id)'
          }

          DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id)
          while [ -z "${DBT_CLOUD_CI_RUN_ID}" ]; do
            echo "Waiting for dbt Cloud CI job to start..."
            sleep 10
            DBT_CLOUD_CI_RUN_ID=$(fetch_ci_run_id)
          done
          echo "DBT_CLOUD_CI_RUN_ID=${DBT_CLOUD_CI_RUN_ID}" >> "$GITHUB_ENV"

          CI_RUN_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/"
          while true; do
            CI_RUN_RESPONSE=$(curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${CI_RUN_URL}")
            CI_RUN_SUCCESS=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and .data.is_success')
            CI_RUN_FAILED=$(echo "${CI_RUN_RESPONSE}" | jq '.data.is_complete and (.data.is_error or .data.is_cancelled)')
            if [ "${CI_RUN_SUCCESS}" = "true" ]; then
              echo "dbt Cloud CI job completed successfully."
              break
            elif [ "${CI_RUN_FAILED}" = "true" ]; then
              status=$(echo "${CI_RUN_RESPONSE}" | jq -r '.data.status_humanized')
              echo "dbt Cloud CI job failed or was cancelled. Status: ${status}"
              exit 1
            fi
            echo "Waiting for dbt Cloud CI job to complete..."
            sleep 10
          done
      - name: Retrieve artifacts from CI job
        run: |
          set -eo pipefail
          mkdir -p target
          for artifact in manifest.json catalog.json; do
            ARTIFACT_URL="${DBT_CLOUD_API_BASE}/runs/${DBT_CLOUD_CI_RUN_ID}/artifacts/${artifact}"
            curl -sSf -H "Authorization: Bearer ${DBT_CLOUD_API_TOKEN}" "${ARTIFACT_URL}" -o "target/${artifact}"
          done
      - name: Upload to Recce Cloud
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: recce-cloud upload
```

### Verification

After setting up:

1. Trigger the base workflow: push to main or run it manually to upload the production baseline.
2. Create a test PR with a small model change.
3. Wait for the dbt Cloud CI job to complete.
4. Check GitHub Actions: the Recce PR workflow waits for dbt Cloud CI and finishes after it completes.
5. Open Cloud: the PR session appears with validation results.

> **Tip:** Run the base workflow first to establish your production baseline. The PR workflow compares against this baseline.

### Troubleshooting

| Issue | Solution |
|---|---|
| "CD run not found" | Ensure your CD job has run on the base branch commit. Try rebasing your PR to trigger a new CD run. |
| "CI job timeout" | The workflow waits for dbt Cloud CI to complete (up to the job's `timeout-minutes`). Check whether your CI job is stuck or taking longer than expected. |
| "Artifact not found" | Verify that "Generate docs on run" is enabled for both CI and CD jobs. |
| "API authentication failed" | Check that your `DBT_CLOUD_API_TOKEN` has the correct permissions and is stored in GitHub secrets. |

### CD job timing considerations

The base workflow retrieves artifacts from the latest CD job run. For accurate comparisons, ensure that your dbt Cloud CD job runs on every merge to main.
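One way to tell whether the baseline still matches main is to compare the `git_sha` of the latest CD run with the commit at the tip of main. A minimal sketch, assuming the runs-list response shape the workflows above already rely on (a `data` array, newest first with `order_by=-id`, whose entries carry `id` and `git_sha`); `baseline_status` is a hypothetical helper and not part of recce-cloud.

```python
import json
from typing import Optional, Tuple


def baseline_status(runs_response: dict, main_head_sha: str) -> Tuple[str, Optional[int]]:
    """Classify the baseline as "missing", "current", or "stale".

    runs_response: parsed JSON from the CD job's runs endpoint, newest run first.
    main_head_sha: commit at the tip of main, e.g. from `git rev-parse origin/main`.
    """
    runs = runs_response.get("data") or []
    if not runs:
        return "missing", None
    latest = runs[0]
    if latest.get("git_sha") == main_head_sha:
        return "current", latest.get("id")
    # The newest CD run was built from a different (usually older) commit than main's tip
    return "stale", latest.get("id")


if __name__ == "__main__":
    sample = json.loads('{"data": [{"id": 42, "git_sha": "abc123"}]}')
    print(baseline_status(sample, "def456"))  # prints ('stale', 42)
```

This only compares commits. It does not check whether that run succeeded; the PR workflow above uses `is_complete` and `is_success` for that purpose, and a similar check could be applied to the CD run.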
If your CD job runs on a schedule:

- The baseline may be outdated compared to the actual main branch.
- Consider triggering the CD job manually before validating PRs.

### Next steps

- Get Started with Cloud - Standard setup for self-hosted dbt
- Configure CD to establish your production baseline
- Configure CI for automated PR validation
- Learn environment strategies for reliable comparisons

---

## https://docs.reccehq.com/setup-guides/environment-advanced/

Advanced Environment Setup

When comparing production (built with `target.name = 'prod'`) against a PR (built with `target.name = 'ci'`), some models produce different SQL, leading to data differences that have nothing to do with your code changes. This guide explains why this happens and how to fix it.

> **Prerequisite:** Complete the Environment Best Practices guide first. This page is for teams that found environment-dependent SQL patterns using the diagnostic grep in that guide.

### Why False Alarms Happen

A false alarm is when Recce reports data differences between base and current that are caused by the environment setup, not by your actual code changes. For example, Recce shows 10,000 fewer rows in a model you never touched, or flags a column value change on every single row. These differences are real in the data, but they don't reflect the impact of your PR.

This happens when dbt models contain environment-dependent SQL: patterns whose output depends on when or where the model is built, not just on the code and data it consumes.

| Pattern | Example | Why it causes false alarms |
|---|---|---|
| `target.name` / `target.schema` | `{% if target.name == 'prod' %}` | Production takes the `if` branch, the PR takes the `else` branch: different SQL |
| `current_date()` / `now()` | `WHERE date >= current_date() - 7` | Production was built yesterday, the PR is built today: different date window |
| `current_timestamp()` | `SELECT current_timestamp() as updated_at` | The timestamp differs between any two builds |
| Limited source data ranges | `{% if target.name != 'prod' %} WHERE order_date >= ... {% endif %}` | Production has all data, CI has a subset: row count differences unrelated to code |

When Recce compares production against your PR, these models run different SQL and produce different output, which Recce reports as a false alarm.

> **Incremental models are fine:** Incremental models are not inherently problematic. In fresh CI builds, such as the session base and PR current builds described below, `is_incremental()` evaluates to false in both, so both run the same code path. The real issue is environment-dependent patterns in any materialization type.

### The Fix: Session Base

Build both environments the same way. Instead of comparing production against the PR, build a dedicated session base for the PR using a CI target, so that base and current produce the same SQL.

#### Configure profiles.yml

Add `pr_base` and `pr_current` targets that mirror your CI target:

```yaml
# profiles.yml
jaffle_shop:
  outputs:
    pr_base:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: COMPUTE_WH
      schema: "{{ env_var('SNOWFLAKE_SCHEMA_BASE') }}"
      threads: 4
    pr_current:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: COMPUTE_WH
      schema: "{{ env_var('SNOWFLAKE_SCHEMA_CURRENT') }}"
      threads: 4
```

In your CI workflow, set both schemas per PR:

```yaml
env:
  SNOWFLAKE_SCHEMA_BASE: "pr_${{ github.event.pull_request.number }}_base"
  SNOWFLAKE_SCHEMA_CURRENT: "pr_${{ github.event.pull_request.number }}"
```

Neither target is named `prod`, so both take the same branch of any `target.name == 'prod'` check and produce the same SQL. If your models test for a specific CI target name instead (for example `target.name == 'ci'`), update those conditions to cover `pr_base` and `pr_current` as well.

Before: Shared Production Base

```mermaid
graph LR
    P1["Production<br/>(target: prod)<br/>→ takes 'if' branch"] --> R1["Recce"]
    C1["PR Current<br/>(target: ci)<br/>→ takes 'else' branch"] --> R1
    R1 --> FA["False alarm:<br/>different SQL branches"]
```

After: Session Base for the PR

```mermaid
graph LR
    P2["Session Base<br/>(target: pr_base)<br/>→ takes 'else' branch"] --> R2["Recce"]
    C2["PR Current<br/>(target: pr_current)<br/>→ takes 'else' branch"] --> R2
    R2 --> OK["Same SQL branches:<br/>differences = real code changes"]
```

#### What Changes in CI

Instead of one build, you run two builds per PR:

- Session base: check out the merge base commit (where the PR branched from main) and build with `--target pr_base`.
- Current: check out the PR branch and build with `--target pr_current`.

Both use CI targets and run in the same CI job, so `target.name`, `current_date()`, and similar expressions resolve the same way, and the remaining differences reflect actual code changes. The exception is columns populated with `current_timestamp()`, which still differ between any two builds.

> **Why merge base, not tip of main?** If main has moved forward since the PR was created (other PRs merged), building from the tip of main would include unrelated changes. The merge base isolates this PR's changes only, matching how GitHub computes its PR diff.

### Optimization: Reduce Data with --sample

Running dbt twice per PR means more warehouse compute and longer CI times. dbt's native `--sample` flag (dbt >= 1.10) injects time-based filters on source tables, so both base and current process the same data window: faster builds, same comparison accuracy.

```shell
# Adjust dates to match your data range: both builds must use the same window
dbt build --target pr_base --sample="{'start': '2025-02-01', 'end': '2025-02-28'}"
dbt build --target pr_current --sample="{'start': '2025-02-01', 'end': '2025-02-28'}"
```

Prerequisites:

- dbt >= 1.10
- `event_time` configured on the source tables you want to sample

> **Always use absolute date ranges:** Relative dates like `--sample="14 days"` use `CURRENT_TIMESTAMP`, which shifts between builds and reintroduces the exact environment dependency you're trying to avoid.

### Optimization: Clone + Selective Rebuild

Instead of building everything twice, clone production and rebuild only the environment-dependent models:

1. `dbt clone --target pr_base --state prod_artifacts/`: copies all models from prod into the `pr_base` schema (zero-copy on Snowflake/BigQuery/Databricks)
2. `dbt build --target pr_base --select tag:environment_dependent`: rebuilds only the flagged models

The deterministic models are already correct as clones. Only the environment-dependent ones need rebuilding with a CI target.
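The selective rebuild only works if every environment-dependent model actually carries the `environment_dependent` tag. A minimal sketch for finding candidates, assuming your models are `.sql` files under `models/`; it applies the patterns from the diagnostic grep in Environment Best Practices, but case-insensitively because SQL is often written in upper case. `find_environment_dependent_models` is a hypothetical helper, and a regex cannot tell whether a match really changes the output, so treat the result as a review list rather than a verdict.

```python
import re
from pathlib import Path
from typing import Dict, List

# The diagnostic grep's patterns: target.name, target.schema, current_date, current_timestamp, now()
ENV_DEPENDENT = re.compile(
    r"target\.name|target\.schema|current_date|current_timestamp|now\(\)",
    re.IGNORECASE,
)


def find_environment_dependent_models(models_dir: str) -> Dict[str, List[str]]:
    """Map model name -> sorted list of matched patterns, for .sql files under models_dir."""
    results = {}
    for sql_file in sorted(Path(models_dir).rglob("*.sql")):
        found = {m.group(0).lower() for m in ENV_DEPENDENT.finditer(sql_file.read_text())}
        if found:
            # dbt names a model after its file name without the .sql extension
            results[sql_file.stem] = sorted(found)
    return results
```

For each reported model that you confirm by reading the SQL, add the tag, for example with `{{ config(tags=['environment_dependent']) }}` at the top of the model, so that `--select tag:environment_dependent` picks it up.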
> **These levers stack:** `--sample` reduces how much data each model processes. Clone + Selective Rebuild reduces which models are built. Combine them to minimize both dimensions of cost.

### CI Configuration Examples

#### Scenario A: Shared Production Base (Simple)

For teams with no environment-dependent SQL patterns. See Environment Best Practices.

#### Scenario B: Session Base, Full Rebuild

For small projects (< 50 models) with environment-dependent SQL.

```yaml
env:
  # The PR head commit also exists for PRs from forks, unlike a branch name on origin
  PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
steps:
  # Full history is needed to compute the merge base
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0
  - name: Build Session Base (from merge base)
    run: |
      MERGE_BASE=$(git merge-base origin/main "$PR_HEAD_SHA")
      git checkout "$MERGE_BASE"
      dbt build --target pr_base
      dbt docs generate --target pr_base
  - name: Upload Session Base Artifacts to Recce Cloud
    run: recce-cloud upload --session-base
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  - name: Build PR Current
    run: |
      git checkout "$PR_HEAD_SHA"
      dbt build --target pr_current
      dbt docs generate --target pr_current
  - name: Upload Current Artifacts to Recce Cloud
    run: recce-cloud upload
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

#### Scenario C: Session Base, Optimized

For large projects: combines clone, selective rebuild, and data sampling.
```yaml
env:
  # The PR head commit also exists for PRs from forks, unlike a branch name on origin
  PR_HEAD_SHA: ${{ github.event.pull_request.head.sha }}
steps:
  # Full history is needed to compute the merge base
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0
  - name: Fetch production artifacts
    run: |
      mkdir -p prod_artifacts/
      aws s3 cp s3://your-bucket/dbt-artifacts/manifest.json prod_artifacts/
  - name: Build Session Base (clone + selective rebuild)
    run: |
      MERGE_BASE=$(git merge-base origin/main "$PR_HEAD_SHA")
      git checkout "$MERGE_BASE"
      dbt clone --target pr_base --state prod_artifacts/
      # Adjust dates to match your data range
      dbt build --target pr_base --select tag:environment_dependent \
        --sample="{'start': '2025-02-01', 'end': '2025-02-28'}"
      dbt docs generate --target pr_base
  - name: Upload Session Base Artifacts to Recce Cloud
    run: recce-cloud upload --session-base
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  - name: Build PR Current
    run: |
      git checkout "$PR_HEAD_SHA"
      # Use the same date range as the session base
      dbt build --target pr_current \
        --sample="{'start': '2025-02-01', 'end': '2025-02-28'}"
      dbt docs generate --target pr_current
  - name: Upload Current Artifacts to Recce Cloud
    run: recce-cloud upload
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

### Next Steps

- Data Developer Workflow: Start validating changes as a data developer
- Data Reviewer Workflow: Review data changes from your team
- Preset Checks: Configure automated validation checks

---

## https://docs.reccehq.com/setup-guides/environment-best-practices/

Environment Best Practices

Recce compares a base environment (your production or staging reference) against a current environment (your PR changes). Reliable environments produce reliable validation results. When source data drifts, branches fall behind, or environments collide, Recce comparisons produce misleading results.

This guide walks you through preparing environments for accurate, efficient data validation, starting with the simplest setup and building up as needed.
### Start Simple: Shared Production Base

Most teams already have two environments running:

- Production: your main branch, built on a schedule (CD job, dbt Cloud, Airflow, etc.)
- PR build: a CI job triggered on each pull request that runs `dbt build`

The simplest Recce setup uses production as your base and the PR build as your current.

```mermaid
graph LR
    A["Production<br/>(scheduled CD job)<br/>= base"] --> C["Recce<br/>Compare"]
    B["PR CI Build<br/>(triggered per PR)<br/>= current"] --> C
```

> **Info:** This is where most teams should start. Production already exists; just point Recce at it. No extra builds, no extra configuration.

### Use Per-PR Schemas

Each PR should have its own isolated schema. This prevents interference between concurrent PRs and makes cleanup straightforward.

```yaml
# profiles.yml
ci:
  schema: "{{ env_var('CI_SCHEMA') }}"
```

```yaml
# CI workflow
env:
  CI_SCHEMA: "pr_${{ github.event.pull_request.number }}"
```

Benefits:

- Complete isolation between PRs
- Parallel validation without conflicts
- Easy cleanup by dropping the schema

See Environment Setup for detailed configuration.

### Slim CI Works Out of the Box

Many teams optimize PR builds using dbt's slim CI pattern, building only the models that changed and their downstream dependencies:

```shell
dbt build --select state:modified+ --defer --state prod-artifacts/
```

Unchanged models are resolved from production via `--defer`, so you only pay for what changed. Recce works with slim CI: it compares whatever was built in the PR environment against production. For many projects, this is the complete setup.

### Keep Your Base Environment Current

The base environment can become outdated in two scenarios:

- New source data: if you update data weekly, update the base environment at least weekly.
- PRs merged to main: the base no longer reflects the latest code.

Configure your CD workflow to run:

- On merge to main (immediate update)
- On schedule (e.g., daily at 2 AM UTC)

See Setup CD for workflow configuration.

### Obtain Artifacts for Environments

Recce uses base and current environment artifacts (`manifest.json`, `catalog.json`) to find the corresponding tables in the data warehouse for comparison.

- Recce Cloud: Automatic artifact management via `recce-cloud upload`. See Setup CD and Setup CI.
- dbt Cloud: Download artifacts from dbt Cloud jobs. See dbt Cloud Setup.

For custom setups, upload artifacts to cloud storage (S3, GCS, Azure Blob) or use GitHub Actions artifacts.
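The lookup described above can be pictured as joining the two manifests on each node's unique ID. A simplified sketch, assuming dbt's manifest layout (a `nodes` mapping keyed by unique ID, where each node has `resource_type`, `database`, `schema`, `alias`, and `name`); Recce's own matching logic may differ, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def model_relations(manifest: dict) -> Dict[str, str]:
    """Map each model's unique_id to the "database.schema.alias" relation it was built into."""
    relations = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        parts = (node.get("database"), node.get("schema"), node.get("alias") or node.get("name"))
        relations[unique_id] = ".".join(p for p in parts if p)
    return relations


def pair_relations(base_manifest: dict, current_manifest: dict) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Pair base and current relations by unique_id; None marks a model present on one side only."""
    base = model_relations(base_manifest)
    current = model_relations(current_manifest)
    return {uid: (base.get(uid), current.get(uid)) for uid in sorted(base.keys() | current.keys())}
```

Loading the two files with `json.load` (for example a production manifest downloaded to `prod-artifacts/` and the PR build's `target/manifest.json`) and passing them in shows each model's base and current table side by side. The same view explains why per-PR schemas matter: if both sides resolved to the same relation, there would be nothing to compare.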
#### Keep PR Branch in Sync

If a PR runs after other PRs have merged to main, the comparison mixes changes from the current PR with changes from those merged PRs. The results then don't accurately reflect the current PR's impact.

- GitHub: enable the branch protection setting that requires branches to be up to date before merging, so outdated PRs are flagged.
- CI check: add a workflow step that fails when the PR branch is behind the base branch:

```yaml
# Assumes the checkout step uses `fetch-depth: 0`, so the full history is available
- name: Check if PR is up-to-date
  if: github.event_name == 'pull_request'
  run: |
    UPSTREAM="${GITHUB_BASE_REF:-main}"
    PR_HEAD="${{ github.event.pull_request.head.sha }}"
    git fetch --no-tags origin "${UPSTREAM}" "${PR_HEAD}"
    # Count commits on the base branch that the PR head does not contain yet
    BEHIND=$(git rev-list --count "${PR_HEAD}..origin/${UPSTREAM}")
    if [ "${BEHIND}" -eq 0 ]; then
      echo "Branch is up-to-date"
    else
      echo "Branch is ${BEHIND} commit(s) behind ${UPSTREAM}"
      exit 1
    fi
```

#### Clean Up PR Environments

As PRs accumulate, so do generated schemas. Implement cleanup to manage warehouse storage.

On PR close: create a workflow that drops the PR schema when the PR closes, using a macro such as:

```sql
{% macro clear_schema(schema_name) %}
  {% set drop_schema_command = "DROP SCHEMA IF EXISTS " ~ schema_name ~ " CASCADE;" %}
  {% do run_query(drop_schema_command) %}
{% endmacro %}
```

Run the cleanup:

```shell
dbt run-operation clear_schema --args "{'schema_name': 'pr_123'}"
```

Scheduled cleanup: remove schemas that have not been used for a week.

#### Example Configuration

| Environment | Schema | When to Run | Count |
| --- | --- | --- | --- |
| Production | `public` | Daily | 1 |
| PR | `pr_` | On push | # of open PRs |

#### Seeing Unexpected Diffs?

If Recce shows large row count differences or data mismatches on models you didn't change, your project may contain environment-dependent SQL. This means patterns that produce different output depending on when or where models are built. Common examples:

- `target.name` / `target.schema`: conditional logic that produces different SQL in prod vs CI (e.g., `{% if target.name == 'prod' %}`)
- `current_date()` / `current_timestamp()` / `now()`: time-dependent filters that shift between builds
- Limited source data ranges: many teams filter sources to recent data in non-prod environments (e.g., `{% if target.name != 'prod' %} WHERE order_date >= ... {% endif %}`). This is a common and sensible cost optimization. However, production then has all the data while CI has only a subset, which produces row count differences unrelated to code changes.

Quick check: scan your project for these patterns:

```shell
grep -rn "target\.name\|target\.schema\|current_date\|current_timestamp\|now()" models/
```

No matches? The shared production base setup above is all you need. You're done.

Matches found? See Advanced Environment Setup for strategies to eliminate false alarms.

#### Next Steps

- Environment Setup: technical configuration for profiles.yml and CI/CD
- Setup CD: configure automatic baseline updates
- Setup CI: configure PR validation
- Advanced Environment Setup: eliminate false alarms from environment-dependent SQL

---

## https://docs.reccehq.com/setup-guides/environment-setup/

Following the onboarding guide? Return to Get Started with Recce Cloud after completing this page.

### Environment Setup

Configure your dbt profiles and CI/CD environment variables for Recce data validation.

#### Goal

Set up isolated schemas for base vs current comparison. After completing this guide, your CI/CD workflows automatically create per-PR schemas and compare them against production.

#### Prerequisites

- dbt project: a working dbt project with profiles.yml configured
- CI/CD platform: GitHub Actions, GitLab CI, or similar
- Warehouse access: credentials with permissions to create schemas dynamically

#### Why separate schemas matter

Recce compares two sets of data to validate changes:

- Base: the production state (main branch)
- Current: the PR branch with your changes

For accurate validation, these must point to different schemas in your warehouse. Without separation, you would compare identical data and miss meaningful differences.

#### How CI/CD works with Recce

Recce uses both continuous delivery (CD) and continuous integration (CI) to automate data validation:

- CD (Continuous Delivery): runs after merge to main. Updates baseline artifacts with the latest production state.
- CI (Continuous Integration): runs on PR. Validates proposed changes against the baseline.

Set up CD first, then CI. CD establishes your baseline (production artifacts), which CI uses for comparison.

#### Configure profiles.yml

Your profiles.yml file defines how dbt connects to your warehouse. Add a `ci` target with a dynamic schema for PR isolation.

```yaml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: COMPUTE_WH
      schema: dev
      threads: 4

    # CI environment with dynamic schema per PR
    ci:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: COMPUTE_WH
      schema: "{{ env_var('SNOWFLAKE_SCHEMA') }}"
      threads: 4

    prod:
      type: snowflake
      account: "{{ env_var('SNOWFLAKE_ACCOUNT') }}"
      user: "{{ env_var('SNOWFLAKE_USER') }}"
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: COMPUTE_WH
      schema: public
      threads: 4
```

After saving, your profile supports three targets: `dev` for local development, `ci` for PR validation, and `prod` for production.

Key points:

- The `ci` target uses `env_var('SNOWFLAKE_SCHEMA')` for dynamic schema assignment (other warehouses use their own variable name).
- The `prod` target uses a fixed schema (`public`) for consistency.
- Adapt this pattern for other warehouses (BigQuery uses `dataset` instead of `schema`).

#### Set CI/CD environment variables

Your CI/CD workflow sets the schema dynamically for each PR. The key configuration:

GitHub Actions:

```yaml
env:
  SNOWFLAKE_SCHEMA: "PR_${{ github.event.pull_request.number }}"
```

GitLab CI:

```yaml
variables:
  SNOWFLAKE_SCHEMA: "MR_${CI_MERGE_REQUEST_IID}"
```

This creates a schema per PR or MR automatically, such as `PR_123` on GitHub or `MR_456` on GitLab. When a PR opens, the workflow sets `SNOWFLAKE_SCHEMA` and dbt writes to that isolated schema.
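The workflow expressions above can also be reproduced in a short Python helper. This is useful when you need to work out which schema a given CI run wrote to, for example when debugging locally. The helper is a hypothetical sketch and is not part of Recce or dbt. It relies only on variables the CI platforms themselves provide: `GITHUB_EVENT_NAME` and `GITHUB_EVENT_PATH` on GitHub Actions, and `CI_MERGE_REQUEST_IID` on GitLab CI. It uses the same `PR_` and `MR_` prefixes as the configuration above.

```python
import json
import os


def ci_schema_name(environ=None):
    """Return the per-PR schema name used by the CI examples (PR_<n> or MR_<iid>)."""
    env = os.environ if environ is None else environ
    # GitHub Actions: the PR number lives in the webhook payload at GITHUB_EVENT_PATH
    if env.get("GITHUB_EVENT_NAME") == "pull_request":
        with open(env["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
            number = json.load(f)["pull_request"]["number"]
        return f"PR_{number}"
    # GitLab CI: merge request pipelines expose the project-level MR number
    iid = env.get("CI_MERGE_REQUEST_IID")
    if iid:
        return f"MR_{iid}"
    raise RuntimeError("not running in a GitHub pull_request or GitLab merge request pipeline")
```

The helper reads the PR number from the event payload instead of relying on the `${{ }}` syntax. That syntax is evaluated only inside workflow files, so it is not available to scripts.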
For complete workflow examples, see Setup CD and Setup CI.

#### Recommended pattern: Schema-per-PR

Create an isolated schema for each PR. This is the recommended approach for teams.

| Base Schema | Current Schema | Example |
| --- | --- | --- |
| `public` (prod) | `pr_123` | PR #123 gets its own schema |

Why this pattern:

- Complete isolation between PRs
- Multiple PRs can run validation in parallel without conflicts
- Easy cleanup by dropping the schema when the PR closes
- Clear audit trail of what data each PR produced

#### Alternative patterns

##### Using staging as base

Instead of comparing against production, compare against a staging environment with limited data.

| Base Schema | Current Schema | Use Case |
| --- | --- | --- |
| `staging` | `pr_123` | Teams wanting faster comparisons |

Pros:

- Faster diffs with limited data ranges
- Consistent source data between base and current
- Reduced warehouse costs

Cons:

- Staging may drift from production
- Issues caught in staging might not reflect production behavior
- Requires maintaining an additional environment

See Environment Best Practices for strategies on limiting data ranges.

##### Shared development schema (not recommended)

A single dev schema is used for all development work.

| Base Schema | Current Schema | Use Case |
| --- | --- | --- |
| `public` (prod) | `dev` | Solo developers only |

Why this is not recommended:

- Multiple PRs overwrite each other's data
- Cannot run parallel validations
- Comparison results may include changes from other work
- Difficult to isolate issues to specific PRs

Only use this pattern for individual local development, not for CI/CD automation.

#### Verification

After configuring your setup, verify that both base and current schemas are accessible.

Check the configuration locally. Set `SNOWFLAKE_SCHEMA` in your shell first, because the `ci` target reads it:

```shell
dbt debug --target ci
```

Verify in the Recce interface: launch Recce and check Environment Info in the top-right corner.
You should see:

- Base: your production schema (e.g., `public`)
- Current: your PR-specific schema (e.g., `pr_123`)

#### Troubleshooting

| Issue | Solution |
| --- | --- |
| Schema creation fails | Verify your CI credentials have `CREATE SCHEMA` permissions |
| Environment variable not found | Check that secrets are configured in your CI/CD platform settings |
| Base and current show the same schema | Ensure `--target ci` is used in CI, not `--target dev` |
| Profile not found | Verify profiles.yml is accessible in CI (check the path or set `DBT_PROFILES_DIR`) |
| Connection timeout | Check that warehouse IP allowlists include the CI runner IP ranges |

#### Next steps

- Get Started with Recce Cloud: complete onboarding guide
- Environment Best Practices: strategies for source data and schema management
- Setup CD: CD workflow for GitHub Actions and GitLab CI
- Setup CI: CI workflow for GitHub Actions and GitLab CI

---

## https://docs.reccehq.com/setup-guides/mcp-server/

### Recce MCP Server

When data models change, downstream dashboards and reports can break without warning. The Recce MCP server lets your AI agent validate those changes before they reach production. It works directly from your editor, through natural language.

MCP (Model Context Protocol) is an open standard that lets AI assistants call external tools. Recce implements an MCP server so your AI agent can run data diffs against your warehouse on your behalf.

Unlike general-purpose database tools, Recce's MCP server is purpose-built for branch comparison. It reads dbt artifacts (`manifest.json`, `catalog.json`) to understand your model graph. This lets your AI agent reason about lineage, column-level changes, and statistical differences, not just raw SQL.

> Claude Code users: skip to the easy path. The Recce Claude Plugin handles all setup automatically (prerequisites, artifact generation, and server startup) in two commands. If you use Claude Code, start there.

#### What you can do

Once connected, ask your AI agent questions like:

- "What schema changes happened in this branch?"
- "Show me the Row Count Diff for all modified models"
- "Are there any breaking column changes in this PR?"
- "Profile the orders table and compare it against production"
- "Which downstream columns are affected by this change?"
- "Run a Value Diff on the orders model and show me which columns changed"
- "Run a custom SQL query against both dev and prod and show the differences"

Your agent translates these into the appropriate Recce tool calls and returns the results directly in your conversation.

#### How it works

Recce compares your current branch against a baseline from your main branch. It needs two sets of dbt artifacts: one for your current work and one for your base branch. The MCP server reads both artifact sets. When your AI agent requests a diff, the server runs it against your warehouse.

#### Prerequisites

Before starting the MCP server, you need dbt artifacts for your current branch. Base artifacts are recommended for full diffing but not required.

##### Generate development artifacts

Run dbt in your current working branch:

```shell
dbt docs generate
```

This creates `target/manifest.json` and `target/catalog.json`.

##### Generate base artifacts

Check out your base branch, then generate its artifacts into a separate directory:

```shell
git checkout main
dbt docs generate --target-path target-base
git checkout -
```

This creates `target-base/manifest.json` and `target-base/catalog.json`. The MCP server compares these two artifact sets to produce diffs.

> Note: If `target-base/` is missing, the MCP server starts in single-environment mode. All tools remain available, but diff results show no changes because both sides reference the same data. Generate base artifacts to enable real comparisons.

#### Installation

Install Recce with the MCP extra dependency:

```shell
pip install 'recce[mcp]'
```

Recce works with all major dbt adapters, including Snowflake, BigQuery, Redshift, Databricks, DuckDB, and others.

#### Configuration

Choose the section for your AI agent below. stdio is simpler (no separate process to manage) and works for most setups.
Use SSE only if you need to share a single Recce server across multiple tools simultaneously.

##### Claude Code

Option A: Recce plugin (recommended). The Recce Claude Plugin provides guided setup, handles prerequisite checks, generates artifacts, and starts the MCP server, all through interactive commands:

```text
/plugin marketplace add DataRecce/recce-claude-plugin
/plugin install recce-quickstart@recce-claude-plugin
/recce-setup
```

See the Claude Plugin guide for full details.

Option B: stdio. Configure Recce as an MCP server with stdio transport. Claude Code automatically launches the server when you start a session.

```shell
cd my-dbt-project/
claude mcp add --scope project recce -- recce mcp-server
```

Then start Claude Code and verify the connection:

```text
claude
> /mcp
╭────────────────────────────────────────────────────────────╮
│ Manage MCP servers                                         │
│                                                            │
│ ❯ 1. recce  ✔ connected · Enter to view details            │
```

Option C: SSE. Launch a standalone MCP server that Claude Code connects to via HTTP. Use this if you want to keep the server running independently or share it across tools.

Start the server in a separate terminal:

```shell
cd my-dbt-project/
recce mcp-server --sse
```

In another terminal, configure Claude Code:

```shell
cd my-dbt-project/
claude mcp add --transport sse --scope project recce http://localhost:8000/sse
```

##### Cursor

Add to `.cursor/mcp.json` in your dbt project:

```json
{
  "mcpServers": {
    "recce": {
      "command": "recce",
      "args": ["mcp-server"]
    }
  }
}
```

Or use SSE mode:

```json
{
  "mcpServers": {
    "recce": {
      "url": "http://localhost:8000/sse"
    }
  }
}
```

##### Windsurf

Add to `~/.codeium/windsurf/mcp_config.json`:

```json
{
  "mcpServers": {
    "recce": {
      "command": "recce",
      "args": ["mcp-server"]
    }
  }
}
```

##### Generic (stdio)

Any MCP-compatible client can use stdio transport:

```json
{
  "command": "recce",
  "args": ["mcp-server"],
  "transport": "stdio"
}
```

##### Generic (SSE)

Start the server:

```shell
recce mcp-server --sse --host localhost --port 8000
```

Connect your client to `http://localhost:8000/sse`.

#### Available tools

The MCP server exposes these tools to your AI agent.
Tools are grouped by availability: some work in all modes, while the diff tools require a running server with warehouse access.

##### Metadata and lineage tools

These tools are always available because they only require dbt artifacts and do not query your data warehouse:

| Tool | Description |
| --- | --- |
| `lineage_diff` | Compare data lineage between base and current branches. Returns nodes with change status and impact analysis |
| `schema_diff` | Detect schema changes (added, removed, or modified columns and type changes) |
| `get_model` | Get column details (names, types, constraints) for a model from both base and current branches |
| `get_cll` | Get Column-Level Lineage (CLL): trace which downstream columns are affected by changes |
| `select_nodes` | Resolve dbt selector expressions to node IDs. Useful for planning before running diffs |
| `get_server_info` | Get server context including adapter type, git branch, and supported tools |

##### Diff tools

These tools query your data warehouse and require an active warehouse connection:

| Tool | Description |
| --- | --- |
| `row_count_diff` | Compare row counts between branches for specified models |
| `profile_diff` | Statistical profiling comparison (min, max, avg, distinct count, nulls, and more) |
| `value_diff` | Compare row-level values using a primary key join. Returns per-column match rates |
| `value_diff_detail` | Get a detailed row-level diff showing the actual changed, added, and removed values |
| `top_k_diff` | Compare top-K categorical value distributions between branches |
| `histogram_diff` | Compare numeric or datetime column distributions as histograms |
| `query` | Run arbitrary SQL against your data warehouse (supports Jinja and dbt macros) |
| `query_diff` | Run the same SQL against both branches and compare results |

##### Check management tools

These tools manage validation checks stored in the running Recce server instance. Checks persist for the life of the server process.

| Tool | Description |
| --- | --- |
| `list_checks` | List all validation checks with their status and approval state |
| `run_check` | Run a specific validation check by ID |
| `create_check` | Create a persistent checklist item from analysis findings. Idempotent: updates existing checks with matching type and parameters |

Checks can also be configured as preset checks in `recce.yml`. See Preset checks for details.

> Note: If base artifacts (`target-base/`) are not present, the server starts in single-environment mode. All tools remain available, but diff results show no changes. Generate base artifacts to enable real comparisons.

#### How agents use these tools

The metadata and diff tools work together in a structured validation workflow. A well-configured AI agent follows this pattern:

##### 1. Understand the change

The agent starts with metadata tools to build context before querying the data warehouse:

- `get_server_info`: confirms the connection is ready and which tools are available
- `lineage_diff`: identifies which models changed and which downstream models are impacted
- `select_nodes`: resolves dbt selectors (like `state:modified+`) to specific node IDs for targeted analysis
- `get_model`: inspects column details of individual models before diffing
- `get_cll`: traces Column-Level Lineage to understand which downstream columns are affected

This planning phase helps the agent skip irrelevant models and focus warehouse queries on what matters.

##### 2. Validate the data

With a clear picture of what changed, the agent runs diff tools to validate the data:

- `schema_diff`: detects structural changes (added, removed, or type-changed columns)
- `row_count_diff`: checks for unexpected volume changes
- `profile_diff`: compares statistical profiles (min, max, avg, distinct count, nulls)
- `value_diff` / `value_diff_detail`: compares actual row-level values using primary keys
- `top_k_diff` / `histogram_diff`: detects distribution shifts in categorical or numeric columns
- `query` / `query_diff`: runs custom SQL for cases not covered by built-in diffs

##### 3. Persist findings as checks

After analysis, the agent calls `create_check` to save important findings as persistent checklist items. Each check runs automatically to produce verifiable evidence. These checks appear in Recce's validation checklist and PR comments, so reviewers can verify the results independently. The agent can also use `list_checks` and `run_check` to work with existing preset checks configured in `recce.yml`.

> Why metadata tools matter: Without `select_nodes` and `get_cll`, an agent would have to guess which models to validate, or diff every model in the project. Metadata tools let the agent focus on what actually changed and what is impacted. This reduces both warehouse costs and response time.

#### Troubleshooting

##### MCP server fails to start

The most common cause is missing dbt artifacts. Check that your dbt artifacts exist:

```shell
ls target/manifest.json
```

If they are missing, run `dbt docs generate` in your current branch. See Prerequisites.

##### Diff results show no changes

If the server starts but all diffs return empty results, you are likely in single-environment mode (missing base artifacts). Follow the Generate base artifacts steps to enable real comparisons.

##### Port already in use (SSE mode)

```shell
# Check what's using port 8000
lsof -i :8000

# Use a different port
recce mcp-server --sse --port 8001
```

##### Warehouse connection errors

The MCP server uses your profiles.yml to connect to your warehouse.
Verify your connection:

```shell
dbt debug
```

##### Prefer guided setup over manual configuration

If you're using Claude Code and running into issues, the Recce Claude Plugin handles prerequisite checks and provides actionable error messages:

```text
/plugin install recce-quickstart@recce-claude-plugin
/recce-setup
```

See the Claude Plugin guide for full setup instructions.

#### FAQ

##### How do I validate data changes in my PR using an AI agent?

Connect Recce's MCP server to your AI agent (Claude Code, Cursor, or Windsurf), then ask questions in natural language. Your agent calls the appropriate validation tools and returns the results.

##### Which dbt adapters work with Recce MCP?

Recce works with all major dbt adapters: Snowflake, BigQuery, Redshift, Databricks, DuckDB, and others.

##### Do I need Recce Cloud to use the MCP server?

No. The MCP server is part of Recce OSS and free to use. Recce Cloud adds automated PR review, team collaboration, and persistent validation history.

##### What is MCP and how does Recce use it?

MCP (Model Context Protocol) is an open standard that allows AI agents to call external tools. Recce implements an MCP server so AI agents can run data diffs against your warehouse on demand.

#### Next Steps

- Recce Claude Plugin: guided setup for Claude Code users with interactive commands
- Column-Level Lineage: trace how column changes propagate through your model graph
- Row Count Diff: understand row count validation
- Profile Diff: statistical profiling comparisons
- Value Diff: row-by-row data validation
- CI/CD Setup: automate validation in your workflow

---

## https://docs.reccehq.com/setup-guides/setup-cd/

Following the onboarding guide? Return to Get Started with Recce Cloud after completing this page.

### Setup CD - Auto-Update Baseline

Manually updating your Recce Cloud baseline after every merge is tedious and error-prone. This guide shows you how to automate baseline updates so your data comparison stays current without manual intervention.
After completing this guide, your continuous deployment (CD) workflow automatically uploads dbt artifacts to Cloud whenever code merges to main.

#### What This Does

Automated Base Session Management eliminates manual baseline maintenance:

- Triggers: merge to main, scheduled updates, and manual runs
- Action: auto-update the base Recce session with the latest production artifacts
- Benefit: a current comparison baseline for all future PRs

#### Prerequisites

Before setting up CD, ensure you have:

- Cloud account: Start free trial
- Repository connected to Cloud: Connect Git Provider
- dbt artifacts: know how to generate `manifest.json` and `catalog.json` from your dbt project
- Environment configured: Environment Setup with a `prod` target for base artifacts

> Environment strategy: This workflow uses the main branch with the `prod` target as the base environment. The base artifacts represent your production state, which PRs compare against. See Environment Setup for profiles.yml configuration.

#### Setup

##### GitHub Actions

Create `.github/workflows/base-workflow.yml`:

```yaml
name: Update Base Metadata

on:
  push:
    branches: ["main"]
  schedule:
    - cron: "0 2 * * *"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}
  cancel-in-progress: true

jobs:
  update-base-session:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
      contents: read

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
          cache: "pip"

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Prepare dbt artifacts
        run: |
          dbt deps
          dbt build --target prod
          dbt docs generate --target prod
        env:
          SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
          SNOWFLAKE_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
          SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
          SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}

      - name: Upload to Recce Cloud
        run: |
          pip install recce-cloud
          recce-cloud upload --type prod
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Key points:

- `dbt build` and `dbt docs generate` create the required artifacts (`manifest.json` and `catalog.json`)
- `recce-cloud upload --type prod` uploads the base metadata to Cloud
- `GITHUB_TOKEN` authenticates with Cloud

##### GitLab CI/CD

Add to your `.gitlab-ci.yml`:

```yaml
stages:
  - build
  - upload

variables:
  DBT_TARGET_PROD: "prod"

# Production build - runs on schedule or main branch push
prod-build:
  stage: build
  image: python:3.11-slim
  script:
    - pip install -r requirements.txt
    - dbt deps
    # Optional: dbt build --target $DBT_TARGET_PROD
    - dbt docs generate --target $DBT_TARGET_PROD
  artifacts:
    paths:
      - target/
    expire_in: 7 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

# Upload to Recce Cloud
recce-upload-prod:
  stage: upload
  image: python:3.11-slim
  script:
    - pip install recce-cloud
    - recce-cloud upload --type prod
  dependencies:
    - prod-build
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
    - if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

Key points:

- Authentication is automatic via `CI_JOB_TOKEN`
- Configure the schedule in CI/CD → Schedules (e.g., `0 2 * * *` for daily at 2 AM UTC)
- `recce-cloud upload --type prod` tells Recce this is a baseline session

#### Platform Comparison

| Aspect | GitHub Actions | GitLab CI/CD |
| --- | --- | --- |
| Config file | `.github/workflows/base-workflow.yml` | `.gitlab-ci.yml` |
| Trigger on merge | `on: push: branches: ["main"]` | `if: $CI_PIPELINE_SOURCE == "push" && $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH` |
| Schedule setup | In workflow YAML (`schedule:`) | In UI: CI/CD → Schedules |
| Authentication | Explicit (`GITHUB_TOKEN`) | Automatic (`CI_JOB_TOKEN`) |
| Manual trigger | `workflow_dispatch:` | Pipeline run from UI |

#### Verification

##### Test the Workflow

GitHub: go to the Actions tab → select "Update Base Metadata" → click Run workflow → monitor for completion.

GitLab: go to CI/CD → Pipelines → click Run pipeline → select the main branch → monitor for completion.

##### Verify Success

Look for these indicators:

- Workflow/pipeline completes without errors
- Base session updated in Cloud

##### Expected Output

When the upload succeeds, you'll see output like this in your workflow logs.

GitHub:

```text
─────────────────────────── CI Environment Detection ───────────────────────────
Platform: github-actions
Session Type: prod
Commit SHA: def456ab...
Source Branch: main
Repository: your-org/your-repo
Info: Using GITHUB_TOKEN for platform-specific authentication
────────────────────────── Creating/touching session ───────────────────────────
Session ID: abc123-def456-ghi789
Uploading manifest from path "target/manifest.json"
Uploading catalog from path "target/catalog.json"
Notifying upload completion...
──────────────────────────── Uploaded Successfully ─────────────────────────────
Uploaded dbt artifacts to Recce Cloud for session ID "abc123-def456-ghi789"
Artifacts from: "/home/runner/work/your-repo/your-repo/target"
```

GitLab:

```text
─────────────────────────── CI Environment Detection ───────────────────────────
Platform: gitlab-ci
Session Type: prod
Commit SHA: a1b2c3d4...
Source Branch: main
Repository: your-org/your-project
Info: Using CI_JOB_TOKEN for platform-specific authentication
────────────────────────── Creating/touching session ───────────────────────────
Session ID: abc123-def456-ghi789
Uploading manifest from path "target/manifest.json"
Uploading catalog from path "target/catalog.json"
Notifying upload completion...
──────────────────────────── Uploaded Successfully ─────────────────────────────
Uploaded dbt artifacts to Recce Cloud for session ID "abc123-def456-ghi789"
Artifacts from: "/builds/your-org/your-project/target"
```

#### Advanced Options

##### Custom Artifact Path

If your dbt artifacts are in a non-standard location:

```shell
recce-cloud upload --type prod --target-path custom-target
```

##### External Artifact Sources

You can download artifacts from external sources before uploading:

```yaml
# GitHub example
- name: Download from dbt Cloud
  run: |
    # Your download logic here
    # Artifacts should end up in target/ directory

- name: Upload to Recce Cloud
  run: |
    pip install recce-cloud
    recce-cloud upload --type prod
```

##### Dry Run Testing

Test your configuration without actually uploading:

```shell
recce-cloud upload --type prod --dry-run
```

#### Troubleshooting

##### Missing dbt artifacts

Error: `Missing manifest.json` or `Missing catalog.json`

Solution: ensure `dbt docs generate` runs successfully before the upload.

GitHub:

```yaml
- name: Prepare dbt artifacts
  run: |
    dbt deps
    dbt docs generate --target prod  # Required
```

GitLab:

```yaml
prod-build:
  script:
    - dbt deps
    - dbt docs generate --target $DBT_TARGET_PROD  # Required
  artifacts:
    paths:
      - target/
```

##### Authentication issues

Error: `Failed to create session: 401 Unauthorized`

Solutions:

- Verify your repository is connected in Cloud settings.
- For GitHub: ensure `GITHUB_TOKEN` is passed explicitly to the upload step and the job has `contents: read` permission.
- For GitLab:
  - Verify the project has GitLab integration configured.
  - Check that you've created a Personal Access Token.
  - Ensure the token has the appropriate scope (`api` or `read_api`).
  - Verify the project is connected in Cloud settings.

##### Upload failures

Error: `Failed to upload manifest/catalog`

Solutions:

- Check network connectivity to Cloud.
- Verify that the artifact files exist in the `target/` directory.
- Review workflow/pipeline logs for detailed error messages.
- For GitLab: ensure artifacts are passed between jobs:

```yaml
prod-build:
  artifacts:
    paths:
      - target/  # Must include dbt artifacts

recce-upload-prod:
  dependencies:
    - prod-build  # Required to access artifacts
```

##### Session not appearing

Issue: the upload succeeds but the session doesn't appear in Cloud.

Solutions:

- Check that you're viewing the correct repository in Cloud.
- Verify that you're looking at the production/base sessions (not PR/MR sessions).
- Check the session filters in Cloud (the session may be hidden by a filter).
- Refresh the Cloud page.

##### Schedule not triggering (GitLab only)

Issue: the scheduled pipeline doesn't run.

Solutions:

- Verify the schedule is Active in CI/CD → Schedules.
- Check the schedule timezone settings (UTC by default).
- Ensure the target branch (`main`) exists.
- Review the project's CI/CD minutes quota.
- Verify the schedule owner has appropriate permissions.

#### Next Steps

Continue with Setup CI to automatically validate PR/MR changes against your updated base session. This completes your CI/CD pipeline by adding automated data validation for every pull request or merge request.

---

## https://docs.reccehq.com/setup-guides/setup-ci/

Following the onboarding guide? Return to Get Started with Recce Cloud after completing this page.

### Setup CI - Auto-Validate PRs

Manual data validation before merging is error-prone and slows down PR reviews. This guide shows you how to set up continuous integration (CI) that automatically validates data changes in every pull request (PR). After completing this guide, your CI workflow validates every PR against your production baseline, with results appearing in Recce Cloud.
What This Does Automated PR Validation prevents data regressions before merge: Triggers : PR opened or updated against main Action : Auto-update Recce session for validation Benefit : Automated data validation and comparison visible in your PR Prerequisites Before setting up CI, ensure you have: Cloud account - Start free trial Repository connected to Cloud - Connect Git Provider dbt artifacts - Know how to generate manifest.json and catalog.json from your dbt project CD configured - Setup CD to establish baseline for comparisons Environment configured - Environment Setup with ci target for per-PR schemas Environment strategy This workflow uses per-PR schemas with the ci target as the current environment. Each PR gets an isolated schema (e.g., pr_123 ) that compares against the base artifacts from CD. See Environment Setup for profiles.yml configuration and why per-PR schemas are recommended. Setup GitHub Actions Create .github/workflows/pr-workflow.yml : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 name : Validate PR Changes on : pull_request : branches : [ "main" ] concurrency : group : ${{ github.workflow }}-${{ github.ref }} cancel-in-progress : true jobs : validate-changes : runs-on : ubuntu-latest timeout-minutes : 45 permissions : contents : read pull-requests : write steps : - name : Checkout PR branch uses : actions/checkout@v4 with : fetch-depth : 2 - name : Setup Python uses : actions/setup-python@v5 with : python-version : "3.11" cache : "pip" - name : Install dependencies run : pip install -r requirements.txt - name : Build current branch artifacts run : | dbt deps dbt build --target ci dbt docs generate --target ci env : SNOWFLAKE_ACCOUNT : ${{ secrets.SNOWFLAKE_ACCOUNT }} SNOWFLAKE_USER : ${{ secrets.SNOWFLAKE_USER }} SNOWFLAKE_PASSWORD : ${{ secrets.SNOWFLAKE_PASSWORD }} SNOWFLAKE_DATABASE : ${{ secrets.SNOWFLAKE_DATABASE }} SNOWFLAKE_WAREHOUSE : 
${{ secrets.SNOWFLAKE_WAREHOUSE }} SNOWFLAKE_SCHEMA : "PR_${{ github.event.pull_request.number }}" - name : Upload to Recce Cloud run : | pip install recce-cloud recce-cloud upload env : GITHUB_TOKEN : ${{ secrets.GITHUB_TOKEN }} Key points: Creates a per-PR schema ( PR_123 , PR_456 , etc.) using the dynamic SNOWFLAKE_SCHEMA environment variable to isolate each PR's data dbt build and dbt docs generate create the required artifacts ( manifest.json and catalog.json ) recce-cloud upload (without --type ) auto-detects this is a PR session GITHUB_TOKEN authenticates with Cloud GitLab CI/CD Add to your .gitlab-ci.yml : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 stages : - build - upload variables : DBT_TARGET : "ci" # MR build - runs on merge requests dbt-build : stage : build image : python:3.11-slim script : - pip install -r requirements.txt - dbt deps # Optional: dbt build --target $DBT_TARGET - dbt docs generate --target $DBT_TARGET artifacts : paths : - target/ expire_in : 1 week rules : - if : $CI_PIPELINE_SOURCE == "merge_request_event" # Upload to Recce Cloud recce-upload : stage : upload image : python:3.11-slim script : - pip install recce-cloud - recce-cloud upload dependencies : - dbt-build rules : - if : $CI_PIPELINE_SOURCE == "merge_request_event" Key points: Authentication is automatic via CI_JOB_TOKEN recce-cloud upload (without --type ) auto-detects this is an MR session dbt docs generate creates the required manifest.json and catalog.json Platform Comparison Aspect GitHub Actions GitLab CI/CD Config file .github/workflows/pr-workflow.yml .gitlab-ci.yml Trigger on: pull_request: if: $CI_PIPELINE_SOURCE == "merge_request_event" Authentication Explicit ( GITHUB_TOKEN ) Automatic ( CI_JOB_TOKEN ) Session type Auto-detected from PR context Auto-detected from MR context Artifact passing Not needed (single job) Use artifacts: + dependencies: Verification Test with a PR GitHub: Create a test PR with small data 
changes
- Check the **Actions** tab for the CI workflow execution
- Verify that validation runs successfully

GitLab:

- Create a test MR with small data changes
- Check **CI/CD → Pipelines** for the workflow execution
- Verify that validation runs successfully

##### Verify Success

Look for these indicators:

- The workflow/pipeline completes without errors
- A PR session is created in Cloud
- The session URL appears in the workflow/pipeline output

##### Expected Output

When the upload succeeds, you'll see output like this in your workflow logs.

GitHub:

```text
─────────────────────────── CI Environment Detection ───────────────────────────
Platform: github-actions
PR Number: 42
PR URL: https://github.com/your-org/your-repo/pull/42
Session Type: cr
Commit SHA: abc123de...
Base Branch: main
Source Branch: feature/your-feature
Repository: your-org/your-repo
Info: Using GITHUB_TOKEN for platform-specific authentication
────────────────────────── Creating/touching session ───────────────────────────
Session ID: f8b0f7ca-ea59-411d-abd8-88b80b9f87ad
Uploading manifest from path "target/manifest.json"
Uploading catalog from path "target/catalog.json"
Notifying upload completion...
──────────────────────────── Uploaded Successfully ─────────────────────────────
Uploaded dbt artifacts to Recce Cloud for session ID "f8b0f7ca-ea59-411d-abd8-88b80b9f87ad"
Artifacts from: "/home/runner/work/your-repo/your-repo/target"
Change request: https://github.com/your-org/your-repo/pull/42
```

GitLab:

```text
─────────────────────────── CI Environment Detection ───────────────────────────
Platform: gitlab-ci
MR Number: 4
MR URL: https://gitlab.com/your-org/your-project/-/merge_requests/4
Session Type: cr
Commit SHA: c928e3d5...
Base Branch: main
Source Branch: feature/your-feature
Repository: your-org/your-project
Info: Using CI_JOB_TOKEN for platform-specific authentication
────────────────────────── Creating/touching session ───────────────────────────
Session ID: f8b0f7ca-ea59-411d-abd8-88b80b9f87ad
Uploading manifest from path "target/manifest.json"
Uploading catalog from path "target/catalog.json"
Notifying upload completion...
──────────────────────────── Uploaded Successfully ─────────────────────────────
Uploaded dbt artifacts to Recce Cloud for session ID "f8b0f7ca-ea59-411d-abd8-88b80b9f87ad"
Artifacts from: "/builds/your-org/your-project/target"
Change request: https://gitlab.com/your-org/your-project/-/merge_requests/4
```

##### Review PR Session

To analyze the changes in detail:

1. Go to Recce Cloud
2. Find the PR session that was created
3. Launch a Recce instance to explore data differences

#### Advanced Options

##### Custom Artifact Path

If your dbt artifacts are in a non-standard location:

```shell
recce-cloud upload --target-path custom-target
```

##### Dry Run Testing

Test your configuration without actually uploading:

```shell
recce-cloud upload --dry-run
```

#### Troubleshooting

If CI is not working, the issue is likely in your CD setup, because most problems are shared between CI and CD. Common issues:

- Missing dbt artifacts
- Authentication failures
- Upload errors
- Sessions not appearing

→ See the Setup CD Troubleshooting section for detailed solutions.
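When a PR session does not appear, it helps to confirm which pull/merge request context the job can actually see. The Expected Output logs show `recce-cloud upload` reporting the platform, number, and branches on its own; the sketch below illustrates how that kind of detection can be done from standard CI variables (`GITHUB_ACTIONS`, `GITHUB_EVENT_NAME`, `GITHUB_REF`, `GITHUB_BASE_REF`, `GITHUB_HEAD_REF` on GitHub; `GITLAB_CI`, `CI_PIPELINE_SOURCE`, `CI_MERGE_REQUEST_IID` and the MR branch variables on GitLab). It is a hypothetical helper for debugging your own pipeline, not recce-cloud's actual detection code; the function name and return shape are assumptions.

```python
import re
from typing import Dict, Mapping, Optional


def detect_change_request(env: Mapping[str, str]) -> Optional[Dict[str, object]]:
    """Hypothetical sketch: infer PR/MR context from standard CI environment variables.

    Returns None when the pipeline is not running for a pull/merge request.
    """
    if env.get("GITHUB_ACTIONS") == "true":
        if env.get("GITHUB_EVENT_NAME") != "pull_request":
            return None
        # On pull_request events GitHub sets GITHUB_REF to "refs/pull/<number>/merge".
        match = re.fullmatch(r"refs/pull/(\d+)/merge", env.get("GITHUB_REF", ""))
        return {
            "platform": "github-actions",
            "number": int(match.group(1)) if match else None,
            "base_branch": env.get("GITHUB_BASE_REF"),
            "source_branch": env.get("GITHUB_HEAD_REF"),
        }
    if env.get("GITLAB_CI") == "true":
        if env.get("CI_PIPELINE_SOURCE") != "merge_request_event":
            return None
        iid = env.get("CI_MERGE_REQUEST_IID")
        return {
            "platform": "gitlab-ci",
            "number": int(iid) if iid else None,
            "base_branch": env.get("CI_MERGE_REQUEST_TARGET_BRANCH_NAME"),
            "source_branch": env.get("CI_MERGE_REQUEST_SOURCE_BRANCH_NAME"),
        }
    return None


if __name__ == "__main__":
    import os

    context = detect_change_request(os.environ)
    print(context or "No PR/MR context detected; check your workflow trigger conditions.")
```

If this prints no context inside your CI job, the trigger conditions (`on: pull_request` or the `merge_request_event` rule) are the first thing to check.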
CI-specific tip: If CD works but CI doesn't, verify:

- The PR trigger conditions in your workflow configuration
- The PR is targeting the correct base branch (usually `main`)
- You're looking at PR sessions in Cloud (not production sessions)

#### Next Steps

After setting up CI, explore these guides:

- Environment Best Practices - Strategies for source data and schema management
- Get Started with Cloud - Complete onboarding guide

---

## https://docs.reccehq.com/technical-concepts/configuration/

### Configuration

This reference documents the `recce.yml` configuration file, which defines preset checks and their parameters for automated data validation.

#### Overview

The Recce config file is `recce.yml`, located in your dbt project root. Use this file to define preset checks that run automatically with `recce server` or `recce run`.

#### File Location

| Path | Description |
|---|---|
| `recce.yml` | Main configuration file in dbt project root |

#### Preset Checks

Preset checks define automated validations that execute when you run `recce server` or `recce run`. Each check specifies a type of comparison and its parameters.

##### Check Structure

```yaml
# recce.yml
checks:
  - name: Query diff of customers
    description: |
      This is the demo preset check.
      Please run the query and paste the screenshot to the PR comment.
    type: query_diff
    params:
      sql_template: select * from {{ ref("customers") }}
    view_options:
      primary_keys:
        - customer_id
```

##### Check Fields

| Field | Description | Type | Required |
|---|---|---|---|
| `name` | The title of the check | string | Yes |
| `description` | The description of the check | string | |
| `type` | The type of the check (see types below) | string | Yes |
| `params` | The parameters for running the check | object | Yes |
| `view_options` | The options for presenting the run result | object | |

#### Check Types

##### Row Count Diff

Compares row counts between base and current environments.

Type: `row_count_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `node_names` | List of node names | string[] | *1 |
| `node_ids` | List of node IDs | string[] | *1 |
| `select` | Node selection syntax. See dbt docs | string | |
| `exclude` | Node exclusion syntax. See dbt docs | string | |
| `packages` | Package filter | string[] | |
| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |

Notes:

*1: If `node_ids` or `node_names` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`.

Examples:

Using node selector:

```yaml
checks:
  - name: Row count for modified tables
    description: Check row counts for all modified table models
    type: row_count_diff
    params:
      select: state:modified,config.materialized:table
      exclude: tag:dev
```

Using node names:

```yaml
checks:
  - name: Row count for key models
    description: Check row counts for customers and orders
    type: row_count_diff
    params:
      node_names: ['customers', 'orders']
```

##### Schema Diff

Compares schema structure between base and current environments.

Type: `schema_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `node_id` | The node ID or list of node IDs to check | string or string[] | *1 |
| `select` | Node selection syntax. See dbt docs | string | |
| `exclude` | Node exclusion syntax. See dbt docs | string | |
| `packages` | Package filter | string[] | |
| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |

Notes:

*1: If `node_id` is specified, it will be used; otherwise, nodes will be selected using the criteria defined by `select`, `exclude`, `packages`, and `view_mode`.

Examples:

Using node selector:

```yaml
checks:
  - name: Schema diff for modified models
    description: Check schema changes for modified models and downstream
    type: schema_diff
    params:
      select: state:modified+
      exclude: tag:dev
```

Using node ID:

```yaml
checks:
  - name: Schema diff for customers
    description: Check schema for customers model
    type: schema_diff
    params:
      node_id: model.jaffle_shop.customers
```

##### Lineage Diff

Compares lineage structure between base and current environments.

Type: `lineage_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `select` | Node selection syntax. See dbt docs | string | |
| `exclude` | Node exclusion syntax. See dbt docs | string | |
| `packages` | Package filter | string[] | |
| `view_mode` | Quick filter for changed models | `all`, `changed_models` | |

Examples:

```yaml
checks:
  - name: Lineage diff for modified models
    description: Check lineage changes for modified models and downstream
    type: lineage_diff
    params:
      select: state:modified+
      exclude: tag:dev
```

##### Query

Executes a custom SQL query in the current environment.

Type: `query`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `sql_template` | SQL statement using Jinja templating | string | Yes |

Examples:

```yaml
checks:
  - name: Customer count
    description: Get total customer count
    type: query
    params:
      sql_template: select count(*) from {{ ref("customers") }}
```

##### Query Diff

Compares query results between base and current environments.

Type: `query_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `sql_template` | SQL statement using Jinja templating | string | Yes |
| `base_sql_template` | SQL statement for base environment (if different) | string | |
| `primary_keys` | Primary keys for record identification | string[] | *1 |

Notes:

*1: If `primary_keys` is specified, the query diff is performed in the warehouse. Otherwise, the query result (up to the first 2,000 records) is returned and the diff is executed on the client side.

Examples:

```yaml
checks:
  - name: Customer data diff
    description: Compare customer data between environments
    type: query_diff
    params:
      sql_template: select * from {{ ref("customers") }}
      primary_keys:
        - customer_id
```

##### Value Diff

Compares values for a specific model between environments.
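Elsewhere in these docs, value diff is summarized as a column-level match percentage. The minimal sketch below makes that idea concrete, assuming both environments' rows are already loaded as Python dicts and that `primary_key` (a single column or a list, as in the parameter table) uniquely identifies a row. It is illustrative only: Recce runs value diff as queries against your warehouse, and `value_diff` here is a hypothetical function, not part of Recce.

```python
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple, Union

Row = Mapping[str, object]


def value_diff(
    base_rows: Iterable[Row],
    current_rows: Iterable[Row],
    primary_key: Union[str, Sequence[str]],
    columns: List[str],
) -> Dict[str, object]:
    """Hypothetical sketch: count added/removed rows and compute a per-column match percentage."""
    key_cols = [primary_key] if isinstance(primary_key, str) else list(primary_key)

    def key_of(row: Row) -> Tuple[object, ...]:
        # Composite keys become tuples so single- and multi-column keys are handled alike.
        return tuple(row[col] for col in key_cols)

    base = {key_of(row): row for row in base_rows}
    current = {key_of(row): row for row in current_rows}
    shared = base.keys() & current.keys()  # only rows present on both sides are compared

    match_pct: Dict[str, float] = {}
    for column in columns:
        matched = sum(1 for key in shared if base[key].get(column) == current[key].get(column))
        # Avoid dividing by zero when no primary keys overlap.
        match_pct[column] = round(100.0 * matched / len(shared), 2) if shared else 0.0

    return {
        "added": len(current.keys() - base.keys()),
        "removed": len(base.keys() - current.keys()),
        "match_pct": match_pct,
    }


if __name__ == "__main__":
    base = [{"customer_id": 1, "clv": 10}, {"customer_id": 2, "clv": 20}]
    current = [{"customer_id": 1, "clv": 10}, {"customer_id": 2, "clv": 25}]
    print(value_diff(base, current, "customer_id", ["clv"]))
```

Rows that exist in only one environment are reported as added or removed rather than counted as mismatches, which keeps the percentage focused on values that changed for the same record.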
Type: `value_diff` or `value_diff_detail`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `model` | The name of the model | string | Yes |
| `primary_key` | Primary key(s) for record identification | string or string[] | Yes |
| `columns` | List of columns to include in diff | string[] | |

Examples:

Value diff summary:

```yaml
checks:
  - name: Customer value diff
    description: Compare customer values
    type: value_diff
    params:
      model: customers
      primary_key: customer_id
```

Value diff with detailed rows:

```yaml
checks:
  - name: Customer value diff (detailed)
    description: Compare customer values with row details
    type: value_diff_detail
    params:
      model: customers
      primary_key: customer_id
```

##### Profile Diff

Compares statistical profiles of a model between environments.

Type: `profile_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `model` | The name of the model | string | Yes |

Examples:

```yaml
checks:
  - name: Customer profile diff
    description: Compare statistical profile of customers
    type: profile_diff
    params:
      model: customers
```

##### Histogram Diff

Compares histogram distributions for a column between environments.

Type: `histogram_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `model` | The name of the model | string | Yes |
| `column_name` | The name of the column | string | Yes |
| `column_type` | The type of the column | string | Yes |

Examples:

```yaml
checks:
  - name: CLV histogram diff
    description: Compare customer lifetime value distribution
    type: histogram_diff
    params:
      model: customers
      column_name: customer_lifetime_value
      column_type: BIGINT
```

##### Top-K Diff

Compares top-K values for a column between environments.
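To show what "compares top-K values" means in practice, here is a minimal sketch that takes the union of each environment's `k` most frequent values and reports both counts side by side. It assumes the column values are already available as Python iterables; Recce computes this against your warehouse, and `top_k_diff` below is a hypothetical function that only mirrors the `k` default of 50 from the parameter table.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_diff(
    base_values: Iterable[Hashable],
    current_values: Iterable[Hashable],
    k: int = 50,
) -> List[Tuple[Hashable, int, int]]:
    """Hypothetical sketch: (value, base_count, current_count) for each side's top-k values."""
    base_counts = Counter(base_values)
    current_counts = Counter(current_values)
    # Union of both top-k lists, so a value that drops out of the top k is still visible.
    top_values = {value for value, _ in base_counts.most_common(k)} | {
        value for value, _ in current_counts.most_common(k)
    }
    # Counter returns 0 for values missing on one side.
    rows = [(value, base_counts[value], current_counts[value]) for value in top_values]
    # Order by current frequency, then base frequency, then value text for a stable result.
    rows.sort(key=lambda row: (-row[2], -row[1], str(row[0])))
    return rows


if __name__ == "__main__":
    for value, base_n, current_n in top_k_diff(["a", "a", "b"], ["a", "b", "b", "c"], k=2):
        print(f"{value}: base={base_n} current={current_n}")
```

Taking the union rather than only the current top-K is a design choice in this sketch: it makes categories that disappear from the current environment show up with a count of 0 instead of silently vanishing.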
Type: `top_k_diff`

Parameters:

| Field | Description | Type | Required |
|---|---|---|---|
| `model` | The name of the model | string | Yes |
| `column_name` | The name of the column | string | Yes |
| `k` | Number of top items to include | number | No (default: 50) |

Examples:

```yaml
checks:
  - name: Top 50 customer values
    description: Compare top 50 customer lifetime values
    type: top_k_diff
    params:
      model: customers
      column_name: customer_lifetime_value
      k: 50
```

#### Default Behavior

- Preset checks are loaded from `recce.yml` when Recce starts
- Checks execute automatically with `recce run`
- Results are stored in the state file
- View options control how results are displayed in the UI

#### Related

- Preset Checks Guide - How to use preset checks in workflows
- State File - Understanding the state file format
- CLI Commands - Command-line options for running checks

---

## https://docs.reccehq.com/technical-concepts/state-file/

### State File

This reference documents the Recce state file format, which stores validation results, checks, and environment information.

#### Overview

The state file represents the serialized state of a Recce instance. It is a JSON-formatted file containing checks, runs, environment artifacts, and runtime information.

#### File Format

| Aspect | Details |
|---|---|
| Format | JSON |
| Default name | `recce_state.json` |
| Location | dbt project root |

#### Contents

The state file contains the following information:

- **Checks**: Data from the checks added to the checklist on the Checklist page
- **Runs**: Each diff execution in Recce corresponds to a run, similar to a query in a data warehouse. Typically, a single run submits a series of queries to the warehouse and retrieves the final results.
- **Environment Artifacts**: The `manifest.json` and `catalog.json` files for both the base and current environments
- **Runtime Information**: Metadata such as Git branch details and pull request (PR) information from the CI runner

#### Saving the State File

There are multiple ways to save the state file.

##### Save from Web UI

Click the **Save** button at the top of the app.
Recce will then continuously write updates to the state file, effectively working like an auto-save feature, and persist the state until the Recce instance is closed. The file is saved with the specified filename in the directory where the `recce server` command is run.

##### Export from Web UI

Click the **Export** button in the top-right corner to download the current Recce state to any location on your machine.

##### Start with State File

Provide a state file as an argument when launching Recce. If the file does not exist, Recce creates a state file and starts with an empty state. If the file exists, Recce loads the state and continues working from it.

```shell
recce server my_recce_state.json
```

#### Using the State File

The state file can be used in several ways:

##### Continue State

Launch Recce with the specified state file to continue from where you left off.

```shell
recce server my_recce_state.json
```

##### Review Mode

Running Recce with the `--review` option enables review mode. In this mode, Recce uses the dbt artifacts in the state file instead of those in the `target/` and `target-base/` directories. This option is useful for keeping development and review sessions separate.

```shell
recce server --review my_recce_state.json
```

##### Import Checklist

To preserve favorite checks across different branches, import a checklist by clicking the **Import** button at the top of the checklist.

##### Continue from recce run

Execute the checks in the specified state file.

```shell
recce run --state-file my_recce_state.json
```

#### Workflow Examples

##### Development Workflow

In the development workflow, the state file acts as a session for developing a feature. It allows you to store checks to verify the diff results against the base environment.
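This workflow relies on the load-or-create behavior described under Start with State File: an existing file is loaded, a missing one is created with an empty state. A minimal Python sketch of that rule follows. The four top-level keys mirror the Contents list above, but their exact names are assumptions for illustration, not Recce's documented schema, and `load_or_create_state` is a hypothetical function.

```python
import copy
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

# Hypothetical empty state: key names are illustrative, not Recce's documented schema.
EMPTY_STATE: Dict[str, Any] = {"checks": [], "runs": [], "artifacts": {}, "runtime_info": {}}


def load_or_create_state(path: str) -> Dict[str, Any]:
    """Load a JSON state file if it exists; otherwise create it with an empty state."""
    state_path = Path(path)
    if state_path.exists():
        return json.loads(state_path.read_text(encoding="utf-8"))
    state = copy.deepcopy(EMPTY_STATE)  # never hand out the shared template itself
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        state_file = str(Path(workdir) / "recce_issue_1.json")
        first = load_or_create_state(state_file)   # created with an empty state
        second = load_or_create_state(state_file)  # loaded back from disk
        print(first == second)
```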
1. Run the Recce server without a state file: `recce server`
2. Add checks to the checklist
3. Save the state by clicking the **Save** or **Export** button
4. Resume your session by launching Recce with the specific state file: `recce server recce_issue_1.json`

##### PR Review Workflow

During the PR review process, the state file serves as a communication medium between the submitter and the reviewer.

1. Start the Recce server without a state file: `recce server`
2. Add checks to the checklist
3. Save the state by clicking the **Save** or **Export** button
4. Share the state file with the reviewer or attach it as a comment in the pull request
5. The reviewer reviews the results using the state file: `recce server --review recce_issue_1.json`

#### CLI Options

| Option | Description |
|---|---|
| `recce server <STATE_FILE>` | Start server with state file |
| `recce server --review <STATE_FILE>` | Start in review mode using state file artifacts |
| `recce run --state-file <STATE_FILE>` | Run checks from state file |

#### Default Behavior

- If no state file is specified, Recce starts with an empty state
- State files are saved to the current working directory by default
- Review mode (`--review`) uses artifacts embedded in the state file

#### Related

- CLI Commands - Command-line options
- Configuration - Preset check configuration
- PR Review Workflow - Using state files in reviews

---

## https://docs.reccehq.com/using-recce/admin-setup/

### Set Up Your Organization

After connecting your Git repo to Recce Cloud, you need to configure your organization so your team can collaborate on PR validation.

Goal: Configure your Cloud organization for team collaboration.

When you sign up for Cloud, you get one organization and one project. After you connect to Git, your organization and project names automatically map to your Git provider's names. You can rename them and invite team members.

#### Prerequisites

- Cloud account with owner/admin access
- Git repository connected to Cloud
- Team members' email addresses

#### Steps

##### 1. Access organization settings

Navigate to your organization configuration.
1. Log in to Cloud
2. Click **Settings → Organization** in the side panel

Expected result: Organization settings page displays your current organization.

##### 2. Rename your organization (optional)

Update the organization name to match your company or team.

1. In Organization Settings, find the **Organization Name** field
2. Enter your preferred name
3. Click **Save**

Expected result: Organization name updates across all Cloud pages.

##### 3. Set up additional projects (monorepo)

For monorepo users: if your repository contains multiple dbt projects, set up additional projects before inviting team members. Skip this step if you have a single dbt project.

1. In Organization Settings, navigate to **Projects**
2. Click **Add Project**
3. Enter the project name and select the subdirectory path
4. Click **Create**

Expected result: New project appears in the project list and sidebar.

##### 4. Rename your project (optional)

Update the project name if needed.

1. In Organization Settings, navigate to **Projects**
2. Click on the project you want to rename
3. Enter the new project name
4. Click **Save**

Expected result: Project name updates in the sidebar and project list.

##### 5. Invite team members

Add collaborators to your organization.

1. In Organization Settings, find the **Members** section
2. Click **Invite Members**
3. Enter email addresses (use the SSO email if members use SSO login)
4. Select a role for each invitee
5. Click **Send Invitation**

Tell invitees that when they log in, a modal appears asking them to accept the invitation. See For Invited Users.

| Role | Permissions |
|---|---|
| Owner | The one who created this organization. Full organization management: update info, manage roles, remove members |
| Admin | Same permissions as Owner |
| Member | Upload metadata, launch Recce instances, view organization info |

SSO login is available on the Team plan and above. See Pricing for plan details.

Expected result: Invitees receive email invitations and see notifications when logged in.
#### Verify Success

Confirm your setup by checking:

- Organization name displays correctly in the sidebar
- Invited members appear in the Members list (pending or active)
- All projects are listed under **Settings → Projects**

#### Troubleshooting

| Issue | Solution |
|---|---|
| Invitation not received | Check spam folder; verify the email address matches the SSO provider |
| Member sees their own org, not company org | They may have signed up with a different email than the one you invited; ask them to log in with the invited email |
| Cannot change organization name | Confirm you have the Admin (or Owner) role |
| Project not appearing | Refresh the page; verify the subdirectory path is correct |

#### For Invited Users

When you receive an invitation:

- **Immediate response:** A notification modal appears on login. Accept or decline directly.
- **Later:** Navigate to **Settings → Organization** to view pending invitations.

#### Next Steps

- Data Developer Workflow: Learn how developers validate changes
- Data Reviewer Workflow: Learn how reviewers approve PRs

---

## https://docs.reccehq.com/using-recce/cli-commands/

### CLI Commands

Reference for the Recce command-line tools.

#### Overview

Recce provides two pip packages:

- **`recce-cloud`** (`pip install recce-cloud`): Lightweight library for Cloud operations. Cloud users only need this package to connect to sessions. Prerequisite: Recce Cloud setup completed.
- **`recce`** (`pip install recce`): Full OSS library with a local server for data validation and diffing. Prerequisite: OSS Setup completed.

#### recce-cloud Commands

Connect to Cloud sessions locally or upload artifacts in CI/CD pipelines.
##### Installation

```shell
pip install recce-cloud
```

##### Connect to Cloud sessions

###### Step 1: Authenticate

```shell
# Check current auth status
recce-cloud login --status

# If not logged in:
recce-cloud login
```

###### Step 2: Initialize the project

Link your local project to your Cloud organization:

```shell
recce-cloud init
```

###### Step 3: Find your session

```shell
# List all sessions
recce-cloud list

# Filter by type
recce-cloud list --type pr
```

###### Step 4: Launch the server

```shell
recce server --session-id <SESSION_ID>
```

The server runs locally on http://localhost:8000 but fetches state from Cloud. Your changes automatically sync back when you close the session.

##### CI/CD integration

Upload dbt artifacts to Cloud in your pipelines:

```shell
# CD workflow: Upload baseline after merge to main
recce-cloud upload --type prod

# CI workflow: Upload PR artifacts (auto-detected)
recce-cloud upload
```

Environment variables:

| Platform | Variable | Description |
|---|---|---|
| GitHub | `GITHUB_TOKEN` | Authentication token (automatic in Actions) |
| GitLab | `CI_JOB_TOKEN` | Authentication token (automatic in CI/CD) |

See Setup CI and Setup CD for complete guides.

##### Command reference

**`recce-cloud login`**

```shell
recce-cloud login [OPTIONS]
```

| Option | Description |
|---|---|
| `--status` | Check current authentication status |

**`recce-cloud init`**

```shell
recce-cloud init
```

Links your local project to a Cloud organization and project.

**`recce-cloud list`**

```shell
recce-cloud list [OPTIONS]
```

| Option | Description |
|---|---|
| `--type` | Filter by session type: `pr`, `dev`, `prod` |

**`recce-cloud upload`**

```shell
recce-cloud upload [OPTIONS]
```

| Option | Description |
|---|---|
| `--type` | Session type: `prod` for the baseline, `dev` for a dev session (see Data Developer Workflow), or omit for auto-detection |
| `--target-path` | Path to dbt artifacts. Default: `target/` |
| `--dry-run` | Test configuration without uploading |

#### recce Commands

Run data validation locally with the full OSS library.

##### Installation

```shell
pip install recce
```

##### Local development workflow

```shell
# Start interactive session
recce server

# Continue from saved state
recce server my_state.json

# Share with reviewer (they load in review mode)
recce server --review my_state.json
```

See OSS Workflow for the complete guide.
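The CI/CD integration above uses two invocations: `recce-cloud upload --type prod` after a merge to main, and a bare `recce-cloud upload` on pull/merge requests. If one shared script drives both pipelines, a helper like the hypothetical sketch below can pick the right one. It uses only the options documented in the command reference (`--type prod`, `--target-path`, `--dry-run`); the branch and event checks rely on standard CI variables (`GITHUB_EVENT_NAME`, `GITHUB_REF_NAME`, `CI_PIPELINE_SOURCE`, `CI_COMMIT_BRANCH`), and the function itself is an assumption, not part of recce-cloud.

```python
import shlex
from typing import List, Mapping, Optional


def build_upload_command(
    env: Mapping[str, str],
    target_path: Optional[str] = None,
    dry_run: bool = False,
    main_branch: str = "main",
) -> Optional[List[str]]:
    """Hypothetical CI helper: choose the CD (baseline) or CI (PR/MR) upload invocation."""
    is_change_request = (
        env.get("GITHUB_EVENT_NAME") == "pull_request"
        or env.get("CI_PIPELINE_SOURCE") == "merge_request_event"
    )
    # GITHUB_REF_NAME (GitHub) and CI_COMMIT_BRANCH (GitLab) hold the branch on push pipelines.
    branch = env.get("GITHUB_REF_NAME") or env.get("CI_COMMIT_BRANCH")

    if is_change_request:
        cmd = ["recce-cloud", "upload"]  # session type is auto-detected
    elif branch == main_branch:
        cmd = ["recce-cloud", "upload", "--type", "prod"]  # refresh the production baseline
    else:
        return None  # other branches: nothing to upload in this sketch

    if target_path:
        cmd += ["--target-path", target_path]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    import os

    command = build_upload_command(os.environ, dry_run=True)
    print("Would run:", shlex.join(command) if command else "nothing (not main, not a PR/MR)")
```

In a real job you would pass the returned list to `subprocess.run(command, check=True)`; the example only prints it so it is safe to try locally.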
##### Command reference

**`recce server`**

```shell
recce server [OPTIONS] [STATE_FILE]
```

Arguments:

| Argument | Description |
|---|---|
| `STATE_FILE` | Optional state file path. Loads it if it exists, creates it if not. |

Options:

| Option | Description |
|---|---|
| `--session-id` | Connect to a Cloud session by ID |
| `--review` | Review mode. Uses artifacts from the state file instead of `target/` |
| `--port` | Port to run the server on. Default: `8000` |
| `--api-token` | API token for Cloud connection |

Notes:

- Runs on http://localhost:8000 by default
- For OSS usage, requires artifacts in `target/` and `target-base/` unless using `--review` mode

**`recce run`**

Executes preset checks and saves results to a state file.

```shell
recce run [OPTIONS]
```

| Option | Description |
|---|---|
| `--state-file` | Path to state file. Default: `recce_state.json` |
| `--github-pull-request-url` | GitHub PR URL for CI context |

**`recce summary`**

Generates a summary report from a state file.

```shell
recce summary <STATE_FILE>
```

**`recce debug`**

Verifies configuration and environment setup by checking for required artifacts and the warehouse connection.

```shell
recce debug
```

**`recce --help`**

Displays all available commands and options.

```shell
recce --help
```

#### Troubleshooting

**Port already in use**

If port 8000 is already in use, specify a different port:

```shell
recce server --port 8001
```

**Summary fails with "state file is required for review mode"**

```text
recce summary recce-state.json
[Error] Failed to generate summary
The state file is required for review mode
```

The `recce summary` command requires artifacts embedded in the state file. State files from local sessions may not include artifacts. Export a state file from a review session or use Cloud for summary generation.

#### Related

- Configuration - Preset check configuration
- State File - State file format

---

## https://docs.reccehq.com/using-recce/data-developer/

### Data Developer Workflow

Validate data changes throughout your development lifecycle. This guide covers validating changes before creating a PR (dev sessions) and iterating on feedback after your PR is open.

Goal: Validate data changes at every stage of development, from local work through PR merge.
#### Prerequisites

- Recce Cloud account
- dbt project with CI/CD configured for Recce
- Access to your data warehouse

#### Development Stages

##### Before PR: Dev Sessions

Validate changes locally before pushing to remote. Dev sessions let you run Recce validation without creating a PR. Since your CD workflow automatically maintains the base environment, you only need to upload your local `target/` artifacts as the current environment to compare against production.

###### When to Use Dev Sessions

- Testing changes before committing
- Validating complex refactoring locally
- Exploring impact without creating a PR
- Sharing work-in-progress with teammates

###### Upload via Web UI

1. Go to Cloud
2. Navigate to your project
3. Click **New Dev Session**
4. Upload your dbt artifacts: `target/manifest.json` and `target/catalog.json`

Expected result: Dev session created. Recce validates your changes against the production base.

###### Upload via CLI

Run from your dbt project directory:

```shell
recce-cloud upload --type dev
```

This uploads your current `target/` artifacts and creates a dev session.

Required files:

| File | Location | Generated by |
|---|---|---|
| `manifest.json` | `target/` | `dbt run`, `dbt build`, or `dbt compile` |
| `catalog.json` | `target/` | `dbt docs generate` |

###### Review Your Changes

After uploading, you can review your changes in Cloud:

1. Trigger agent review: Click **Data Review** to generate a summary of your changes
2. Read the summary: The agent analyzes impact, runs validation checks, and explains what changed
3. Launch Recce instance: Click **Launch Recce** to explore lineage, run data diffs, and investigate deeper

##### After PR: CI/CD Validation

Once you push changes and open a PR, the Recce Agent validates automatically.

###### What Happens

1. Your CI pipeline runs `recce-cloud upload`
2. The agent compares your PR branch against the base branch
3. The agent runs validation checks based on detected changes
4. A data review summary posts to your PR

###### Understanding the Agent Summary

The summary shows key changes, impact analysis, checklist results, and suggested actions. See Reading the Summary for details.
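Most failed dev-session uploads come down to one of the two files in the Required files table being absent. The hypothetical pre-flight check below, a sketch rather than anything shipped with Recce, looks for both files in `target/` before you run `recce-cloud upload --type dev` and names the dbt command that produces each missing one, using the mapping from that table.

```python
from pathlib import Path
from typing import List

# From the "Required files" table: artifact name -> dbt command that generates it.
REQUIRED_ARTIFACTS = {
    "manifest.json": "dbt run, dbt build, or dbt compile",
    "catalog.json": "dbt docs generate",
}


def missing_artifacts(target_dir: str = "target") -> List[str]:
    """Return one hint per required artifact that is missing from target_dir."""
    target = Path(target_dir)
    return [
        f"{target / name} is missing; generate it with {command}"
        for name, command in REQUIRED_ARTIFACTS.items()
        if not (target / name).is_file()
    ]


if __name__ == "__main__":
    problems = missing_artifacts()
    print("\n".join(problems) if problems else "All required artifacts found in target/.")
```

In a CI step you could turn a non-empty result into a non-zero exit code so the job fails before the upload rather than during it.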
###### Fixing Issues

When the agent identifies issues:

1. Review the validation results in the PR comment
2. Click **Launch Recce** to explore details in the web UI
3. Identify the root cause using lineage and data diffs
4. Make fixes in your branch
5. Push changes; the agent re-validates automatically

###### Iterating Until Checks Pass

Each push triggers a new validation cycle:

- Agent re-analyzes your changes
- New validation results post to the PR
- Previous results are updated (not duplicated)
- Continue until all checks pass

#### Validation Techniques

##### Check Lineage First

Start with lineage diff to understand your change scope:

- Modified models highlighted in the DAG
- Downstream impact visible at a glance
- Schema changes shown per model

##### Validate Metadata

Low-cost checks using model metadata. See Data Diffing for details:

- Schema diff: Column additions, removals, type changes
- Row count diff: Record count comparison (uses warehouse metadata)

##### Validate Data

Higher-cost checks that query your warehouse:

- Value diff: Column-level match percentage
- Profile diff: Statistical comparison (count, distinct, min, max, avg)
- Histogram diff: Distribution changes for numeric columns
- Top-K diff: Distribution changes for categorical columns

##### Custom Queries

For flexible validation, use query diff:

```sql
SELECT
  date_trunc('month', order_date) AS month,
  SUM(amount) AS revenue
FROM {{ ref('orders') }}
GROUP BY month
ORDER BY month DESC
```

Add queries to your checklist for repeated use.

#### Add to Checklist

After running validation checks, add them to your checklist for reviewers:

1. Run a validation (row count, profile, value diff, etc.)
2. Click **Add to Checklist** to save the result
3. Add a description explaining what the check validates and what reviewers should look for

Write clear descriptions that help reviewers understand:

- **What changed**: The specific model or column being validated
- **Why it matters**: Business context or downstream impact
- **What to verify**: Expected behavior or acceptable thresholds

Good descriptions reduce back-and-forth and speed up PR approval. See Checklist for more details.

#### Verification

Confirm your workflow works.

Before PR:

- Make a small model change locally
- Generate artifacts: `dbt build && dbt docs generate`
- Upload a dev session: `recce-cloud upload --type dev`
- Verify the session appears in Cloud
- Launch Recce to explore changes, or click **Data Review** to trigger agent validation
- Iterate on your changes until validation passes

After PR:

- Create a PR and confirm the agent posts a summary
- Launch Recce and add validation checks to the checklist
- Push a fix and confirm the agent re-validates
- Confirm reviewers can approve checks

#### Troubleshooting

| Issue | Solution |
|---|---|
| Dev session upload fails | Check artifacts exist in `target/`; run `dbt docs generate` |
| Agent doesn't run on PR | Verify the CI workflow includes `recce-cloud upload` |
| Validation results missing | Check warehouse credentials in CI secrets |
| Summary not appearing | Confirm `GITHUB_TOKEN` has PR write permissions |

#### Next Steps

- Data Reviewer Workflow - How reviewers use Recce
- Admin Setup - Set up your organization
- Data Review Summary - Understanding agent summaries

---

## https://docs.reccehq.com/using-recce/data-reviewer/

### Data Reviewer Workflow

Review data changes in pull requests using Recce. Your admin has set up Recce for your team; here's how to use it as a reviewer.

Goal: Review and approve data changes in PRs with confidence.

#### Prerequisites

- Recce Cloud account (via team invitation)
- Access to the project in Cloud
- PR with Recce validation results

#### Reviewing a PR

##### 1. Find the Data Review Summary

When a PR modifies dbt models, the Recce Agent posts a summary comment.
See Data Review Summary for details on what the agent analyzes.

1. Open the PR in GitHub/GitLab
2. Scroll to the Recce bot comment
3. Review the summary sections

Expected result: Summary shows change overview, impact analysis, and validation results.

##### 2. Understand the Summary

The summary shows key changes, impact analysis, checklist results, and suggested actions. See Reading the Summary for details.

##### 3. Review the Checklist

Verify the checklist covers the impacts identified in the summary. Check each validation result for Pass, Warning, or Fail status. If impacted models lack validation checks, consider running additional diffs in Cloud.

Approve individual checks as you review them to track your progress and signal to other reviewers which validations have been verified.

###### Activity

Alongside the checklist, use the **Activity** panel to track all session events: approvals, comments, and updates. Leave comments, request clarifications, or discuss specific validation results directly in the session.

##### 4. Explore in Cloud

For deeper investigation:

1. Click **Launch Recce** in the PR comment (or go to Cloud)
2. Select the PR session from the list
3. Explore the changes interactively

What you can do:

- View lineage diff to see affected models
- Drill into schema changes per model
- Run additional data diffs (row count, profile, value, etc.)
- Execute custom queries to investigate specific concerns

##### 5. Approve or Request Changes

Based on your review, approve the PR when:

- Validation results meet expectations
- Impact scope is understood and acceptable
- There are no unexpected data changes

Request changes when:

- Validation failures need investigation
- Impact scope is broader than expected
- You have questions about specific changes

#### PR Blocking Checks

Recce checks appear as status checks on your PR. Blocking is enforced automatically by the Recce Cloud GitHub App. When configured as required checks, reviewers must approve all checks in the checklist before the PR can be merged.
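The merge rule above can be stated in a few lines. The sketch below is a hypothetical model for reasoning about it, not Recce's data model or the GitHub App's logic: a PR is mergeable only once every checklist item is approved, and failing checks are listed separately so a reviewer sees what still deserves attention. The `Check` fields and status strings are assumptions based on the Pass/Warning/Fail statuses described in step 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Check:
    """Hypothetical checklist entry; field names are illustrative, not Recce's data model."""

    name: str
    status: str  # "Pass", "Warning", or "Fail", as shown in the checklist
    approved: bool = False


def merge_gate(checks: List[Check]) -> Tuple[bool, List[str], List[str]]:
    """Return (mergeable, unapproved check names, failing check names).

    With required checks configured, every check must be approved before merging.
    Failing checks are reported separately for visibility; only approval gates the merge here.
    """
    unapproved = [check.name for check in checks if not check.approved]
    failing = [check.name for check in checks if check.status == "Fail"]
    return (not unapproved, unapproved, failing)


if __name__ == "__main__":
    checklist = [Check("Row count diff", "Pass", approved=True), Check("Schema diff", "Warning")]
    print(merge_gate(checklist))
```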
#### Common Review Scenarios

##### Schema Changes

When columns are added, removed, or modified:

- Check if downstream models are affected
- Verify the change is intentional
- Confirm breaking changes are coordinated

##### Row Count Differences

When record counts change:

- Determine if the change is expected
- Check if filters or joins were modified
- Verify the magnitude is reasonable

##### Performance Impact

When models are refactored:

- Compare query complexity
- Check for unintended full table scans
- Review impact on downstream refresh times

#### Verification

Confirm you can review PRs:

1. Open a PR with Recce validation results
2. Find the Recce bot comment
3. Click **Launch Recce** to open the session
4. Navigate the lineage and view a diff result

#### Next Steps

- Data Developer Workflow - How developers validate changes
- Data Review Summary - Understanding the agent summary
- Checklist - Track validation checks across PRs
- Share Validation Results - Share findings with your team

---

## https://docs.reccehq.com/using-recce/oss-workflow/

### Data Workflow in OSS

Validate data changes locally using Recce open source (OSS). This guide covers saving your work, managing checklists, and sharing results with reviewers.

Goal: Validate data changes locally and share evidence with PR reviewers.

#### Prerequisites

- Recce OSS installed
- dbt project with base and current environments configured
- Access to your data warehouse

#### Development Workflow

##### 1. Run Validation Checks

Start the Recce server and validate your changes:

```shell
recce server
```

Use lineage diff, schema diff, and data diffs to validate your changes. Add important checks to your checklist with descriptions explaining what reviewers should verify.

##### 2. Iterate as You Develop

When you update your dbt models locally, Recce automatically detects changes to your `target/` artifacts. The lineage diff updates to reflect your latest changes, so you can keep validating as you develop.
1. Make changes to your dbt models
2. Run `dbt build` or `dbt run` to update artifacts
3. Recce refreshes automatically; check the updated lineage and re-run validations

##### 3. Save Your State

Switching branches is often unavoidable during development. Save your current state for future use:

1. Click **Export** in the Recce UI
2. Save the state file (e.g., `recce_issue_123.json`)

To resume later, start Recce with the state file:

```shell
recce server recce_issue_123.json
```

##### 4. Import Checklist

Reuse checks from previous sessions:

1. Go to the Checklist page
2. Click the **Import** icon
3. Select a state file to import checks from

This preserves your favorite checks across branches.

##### 5. Share with Reviewers

When ready for PR review, share your validation results.

As the submitter:

1. Export your state file
2. Attach the state file to your PR comment
3. Use **Copy to Clipboard** to paste screenshots in PR comments

As the reviewer:

1. Download the state file from the PR
2. Run Recce in review mode with that file:

   ```shell
   recce server --review <STATE_FILE>
   ```

The `--review` option uses artifacts from the state file to connect to both base and PR environments.

Required files: you still need `profiles.yml` and `dbt_project.yml` so Recce knows which credentials to use for the data warehouse connection.
#### Validation Techniques

See Data Diffing for available validation methods:

- Schema diff: Column additions, removals, type changes
- Row count diff: Record count comparison
- Value diff: Column-level match percentage
- Profile diff: Statistical comparison
- Query diff: Custom SQL validation

#### Verification

Confirm your workflow works:

1. Make a model change and run `dbt build && dbt docs generate`
2. Start Recce: `recce server`
3. Add a validation check to your checklist
4. Export the state file
5. Start a new Recce session and import the checklist
6. Verify checks imported correctly

#### Next Steps

- Share - More sharing options including Cloud upload
- Checklist - Managing validation checks
- State File - State file reference

---

## https://docs.reccehq.com/what-you-can-explore/breaking-change-analysis/

### Breaking Change Analysis

Breaking Change Analysis examines modified models and categorizes changes into three types:

- Breaking changes
- Partial breaking changes
- Non-breaking changes

It's generally assumed that any modification to a model’s SQL will affect all downstream models. However, not all changes have the same level of impact. For example, formatting adjustments or the addition of a new column should not break downstream dependencies. Breaking change analysis helps you assess whether a change affects downstream models and, if so, to what extent.

#### Usage

Use the Impact Radius view to analyze a change and see the impacted downstream models.

#### Categories of change

##### Non-breaking change

No downstream models are affected. Common cases are adding new columns, comments, or formatting changes that don't alter logic.

Example: Add new columns

Adding a new column like `status` doesn't affect models that don't reference it.

```diff
 select
   user_id,
+  status,
   user_name
 from {{ ref("orders") }}
```

##### Partial breaking change

Only downstream models that reference specific columns are affected. Common cases are removing, renaming, or redefining a column.
Example: removing a column

      select
          user_id,
    -     status,
          order_date
      from {{ ref("orders") }}

Example: renaming a column

      select
          user_id,
    -     status
    +     status as order_status
      from {{ ref("orders") }}

Example: redefining a column

      select
          user_id,
    -     discount
    +     coalesce(discount, 0) as discount
      from {{ ref("orders") }}

##### Breaking change

All downstream models are affected. Common cases are adding a filter condition or adding GROUP BY columns.

Example: adding a filter condition. This may reduce the number of rows, affecting all downstream logic that depends on the original row set.

      select
          user_id,
          order_date
      from {{ ref("orders") }}
    + where status = 'completed'

Example: adding a GROUP BY column. This changes the granularity of the result set, which can break all dependent models.

      select
          user_id,
    +     order_date,
          count(*) as total_orders
      from {{ ref("orders") }}
    - group by user_id
    + group by user_id, order_date

#### Limitations

Our breaking change analysis is intentionally conservative to prioritize safety. As a result, a modified model may be classified as a breaking change when it is actually a non-breaking or partial breaking change. Common cases include:

- Logically equivalent operations, such as changing `a + b` to `b + a`.
- Adding a LEFT JOIN to a table and selecting columns from it. This is often done to enrich the current model with additional dimension data without affecting existing downstream tables.
- Modified Python models and seeds, which are always treated as breaking changes.

#### When to Use

- Determine which downstream models need validation after a change
- Prioritize review effort based on impact severity
- Understand whether a refactor will break dependent models
- Assess risk before merging model changes

#### Technology

Breaking Change Analysis is powered by the SQL analysis and AST diff capabilities of SQLGlot, which it uses to compare the semantic trees of the two SQL versions.
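The following deliberately simplified Python classifier makes the three categories concrete. It is not Recce's algorithm and does not parse SQL, since Recce relies on SQLGlot for that step. Instead, it assumes each version of a model has already been reduced to a hypothetical `ModelSummary` containing three things: the output columns mapped to their expression text, the filter condition, and the GROUP BY keys.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelSummary:
    """Hypothetical, already-parsed view of one version of a model's SQL."""
    columns: dict[str, str]           # output column name -> expression text
    where: Optional[str] = None       # filter condition, if any
    group_by: tuple[str, ...] = ()    # GROUP BY keys


def classify(base: ModelSummary, current: ModelSummary) -> tuple[str, set[str]]:
    """Return (category, affected base columns) for the change from base to current."""
    # A new or changed filter or grain can alter every row, so all downstream models are affected.
    if base.where != current.where or base.group_by != current.group_by:
        return "breaking", set()

    removed = set(base.columns) - set(current.columns)
    modified = {
        name for name in set(base.columns) & set(current.columns)
        if base.columns[name] != current.columns[name]
    }
    if removed or modified:
        # Only downstream consumers of these columns are affected.
        return "partial_breaking", removed | modified

    # Only added columns (or no logical change): existing consumers are unaffected.
    return "non_breaking", set()


if __name__ == "__main__":
    base = ModelSummary({"user_id": "user_id", "status": "status"})
    renamed = ModelSummary({"user_id": "user_id", "order_status": "status"})
    filtered = ModelSummary(dict(base.columns), where="status = 'completed'")
    print(classify(base, renamed))   # ('partial_breaking', {'status'})
    print(classify(base, filtered))  # ('breaking', set())
```

Because the sketch compares expressions as text, a rewrite such as `a + b` to `b + a` counts as a modification. Like the analysis described above, it errs toward reporting impact rather than missing it.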
#### Related

- Impact Radius: visualize affected downstream models
- Column-Level Lineage: trace column dependencies
- Code Change: review the actual SQL modifications

---

## https://docs.reccehq.com/what-you-can-explore/code-change/

### Code Change

The Code Change feature lets you compare the SQL code in your current branch against the base branch, so you can see exactly what was modified in your dbt models.

#### Viewing Code Change

When you identify a modified model in the Lineage Diff, you can examine its specific code changes to understand the nature of the modification.

#### Opening Code Change

To view the code changes for a model:

1. Click any modified (orange) model node in the lineage view.
2. In the node details panel that opens, go to the Code tab.
3. The code diff shows the changes between the branches.

[Figure: Viewing code changes for a modified model]

#### Understanding the Code Diff

The code diff uses standard diff formatting to highlight changes:

- Red lines (with a `-` prefix) show code that was removed.
- Green lines (with a `+` prefix) show code that was added.
- Unchanged lines appear in normal formatting for context.

This visual comparison makes it easy to identify:

- New columns or transformations
- Modified business logic
- Changes to joins or filters
- Updated column names or data types

#### Full Screen View

For complex changes or detailed review, you can expand the code diff to full screen:

1. Click the expand button in the top-right corner of the code diff panel.
2. Review the changes in the larger view for better readability.
3. Use this view when conducting thorough code reviews or sharing changes with team members.

[Figure: Full-screen view for detailed code review]

#### Why Code Diff Matters

Understanding code changes is essential for:

- Impact assessment: determining whether changes affect downstream models or reports
- Code review: validating that modifications align with business requirements
- Collaboration: clearly communicating what changed to stakeholders
- Quality assurance: ensuring that changes don't introduce errors or break existing logic

#### Next Steps

After reviewing code changes, you can:

- Examine the impact radius to see which downstream models are affected.
- Run data diffs to validate that the changes produce the expected results.
- Add your findings to the collaboration checklist for team review.

Best practice: Always review code changes alongside data validation checks to make sure your modifications produce the expected results and don't break downstream dependencies.

#### When to Use

- Review what SQL logic changed before running data diffs
- Understand the scope of a PR during code review
- Identify which columns or joins were modified
- Document changes for team communication

#### Related

- Breaking Change Analysis: classify impact severity
- Impact Radius: see affected downstream models
- Data Diffing: validate data changes

---

## https://docs.reccehq.com/what-you-can-explore/column-level-lineage/

### Column-Level Lineage

Column-Level Lineage shows the upstream and downstream relationships of a column. Common use cases are:

- Source exploration: during development, column-level lineage helps you understand how a column is derived.
- Impact analysis: when you modify the logic of a column, column-level lineage lets you assess the potential impact across the entire DAG.
- Root cause analysis: column-level lineage helps you identify the likely source of an error by tracing lineage at the column level.

#### Usage

Select a node in the lineage DAG, then click the column you want to view. Recce displays the column-level lineage for the selected column. To exit the column-level lineage view, click the close button in the upper-left corner.

#### Transformation Types

Recce also displays a transformation type for each column, which helps you understand how the column was generated or modified.

| Type | Description |
| --- | --- |
| Pass-through | The column is selected directly from the upstream table. |
| Renamed | The column is selected from the upstream table but under a different name. |
| Derived | The column is created by transforming upstream columns, for example through calculations, conditions, functions, or aggregations. |
| Source | The column is not derived from any upstream data. It may originate from a seed or source node, a literal value, or a data generation function. |
| Unknown | No information about the transformation type is available. This can be caused by a parse error or another unknown reason. |

#### When to Use

- Trace where a column's data originates
- Understand which downstream columns depend on a specific column
- Assess the impact of modifying a column's logic
- Debug data quality issues by following the transformation chain

#### Related

- Impact Radius: see column-level impact on downstream models
- Breaking Change Analysis: classify change severity
- Data Diffing: validate column-level data changes

---

## https://docs.reccehq.com/what-you-can-explore/data-diffing/

### Data Diffing

Data diffing validates that your model changes produce the expected results. Each diff type serves a different validation purpose, from quick row counts to detailed value comparisons.

#### Overview

| Diff Type | Purpose | Query Cost | Best For |
| --- | --- | --- | --- |
| Row Count | Compare record counts | Low | Quick sanity check |
| Profile | Column-level statistics | Medium | Distribution analysis |
| Value | Row-by-row comparison | High | Exact match verification |
| Top-K | Categorical distribution | Medium | Categorical columns |
| Histogram | Numeric distribution | Medium | Numeric columns |
| Query | Custom SQL comparison | Varies | Flexible validation |

#### Choosing the Right Diff

A common approach is to start with lightweight checks and drill down progressively as needed. This decision tree suggests a workflow:

1. Start with a Row Count Diff.
   - Counts match? → Run a Profile Diff for deeper statistics.
   - Counts differ?
     - Expected? → Document it in the checklist.
     - Unexpected? → Run a Value Diff to find the specific changes.
2. For specific columns:
   - Categorical → Top-K Diff
   - Numeric → Histogram Diff
   - Custom logic → Query Diff

#### Working with Results

After running any diff, you can rerun, export, or download the results:

- Rerun: refresh the results after model changes
- Export: copy as an image, text, or CSV for PR comments and Slack
- Download: save as CSV, TSV, or Excel for offline analysis

[Figure: Result options: rerun, export, and download]

#### Row Count Diff

Compare the number of rows between the base and current environments.

When to use: a quick check that filters or joins didn't unexpectedly add or remove records.

##### Running Row Count Diff

1. Click a model in the Lineage DAG.
2. Click Explore Change > Row Count Diff.

[Figure: Row Count Diff for a single model]

##### Interpreting Results

| Result | Meaning |
| --- | --- |
| Count unchanged | No net records added or removed |
| Count increased | New records added (check whether this is expected) |
| Count decreased | Records removed (verify filters and joins) |

#### Profile Diff

Compare column-level statistics between environments.

When to use: validate that transformations didn't unexpectedly change data distributions.

##### Statistics Compared

| Statistic | Description |
| --- | --- |
| Row count | Total records |
| Not null % | Proportion of non-null values |
| Distinct % | Proportion of distinct values |
| Distinct count | Number of distinct values |
| Is unique | Whether all values are unique |
| Min / Max | Range of values |
| Average / Median | Central tendency |

##### Running Profile Diff

1. Select a model in the Lineage DAG.
2. Click Explore Change > Profile Diff.

[Figure: Profile Diff showing column statistics]

##### Interpreting Results

Look for unexpected changes in:

- Null rates: did a column become more or less nullable?
- Distinct counts: did the cardinality change unexpectedly?
- Min/Max: did the value ranges shift?

#### Value Diff

Compare actual values row by row using primary keys.

When to use: verify exact data matches when precision matters.

##### How It Works

Value Diff uses primary keys to match records between environments, then compares each column value.
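The key matching described above can be sketched in Python. This in-memory illustration is not how Recce runs Value Diff; Recce compares the data in the warehouse. It assumes rows are plain dictionaries and primary keys are unique, and `id` in the usage example is a hypothetical key column. The function reports the quantities behind the Value Diff result columns: keys added and removed and, for keys present in both environments, the matched count and match percentage of each column.

```python
from typing import Any, Optional

Row = dict[str, Any]


def value_diff(base_rows: list[Row], current_rows: list[Row], pk: str) -> dict[str, Any]:
    """Match rows on a primary key, then count matching values per column."""
    # Assumes the primary key is unique; a duplicate key would keep only its last row.
    base = {row[pk]: row for row in base_rows}
    current = {row[pk]: row for row in current_rows}
    common = base.keys() & current.keys()  # PKs present in both environments

    # Columns come from the base rows; schema changes are out of scope here.
    columns = [col for col in (base_rows[0] if base_rows else {}) if col != pk]
    per_column: dict[str, dict[str, Any]] = {}
    for col in columns:
        matched = sum(1 for key in common if base[key].get(col) == current[key].get(col))
        pct: Optional[float] = round(100 * matched / len(common), 2) if common else None
        per_column[col] = {"matched": matched, "matched_pct": pct}

    return {
        "added": len(current.keys() - base.keys()),    # PKs only in current
        "removed": len(base.keys() - current.keys()),  # PKs only in base
        "columns": per_column,
    }


if __name__ == "__main__":
    base_rows = [
        {"id": 1, "status": "paid", "amount": 10},
        {"id": 2, "status": "paid", "amount": 20},
        {"id": 3, "status": "open", "amount": 5},
    ]
    current_rows = [
        {"id": 1, "status": "paid", "amount": 10},
        {"id": 2, "status": "refunded", "amount": 20},
        {"id": 4, "status": "open", "amount": 7},
    ]
    print(value_diff(base_rows, current_rows, pk="id"))
    # {'added': 1, 'removed': 1, 'columns': {'status': {'matched': 1, 'matched_pct': 50.0},
    #  'amount': {'matched': 2, 'matched_pct': 100.0}}}
```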
Primary keys are auto-detected from columns that have a `unique` test.

[Figure: Value Diff showing match percentages]

##### Result Columns

| Column | Meaning |
| --- | --- |
| Added | PKs present in current but not in base |
| Removed | PKs present in base but not in current |
| Matched | Count of matching values for PKs present in both |
| Matched % | Percentage of matching values for PKs present in both |

##### Viewing Mismatches

Click show mismatched values on a column to see the row-level differences.

#### Top-K Diff

Compare the distribution of a categorical column by showing its most frequent values.

When to use: validate that categorical data (status codes, categories, regions) hasn't shifted unexpectedly.

##### Running Top-K Diff

Via Explore Change:

1. Select a model > Explore Change > Top-K Diff.
2. Select a column.
3. Click Execute.

Via the column menu:

1. Hover over a column in Node Details.
2. Click ... > Top-K Diff.

[Figure: Generate a Top-K Diff from the column menu]

##### Options

| Option | Description |
| --- | --- |
| Top 10 | Default view |
| Top 50 | Expanded view for more categories |

[Figure: Top-K Diff comparing category distributions]

#### Histogram Diff

Compare the distribution of a numeric column using binned histograms.

When to use: validate that numeric distributions (amounts, scores, durations) haven't shifted.

##### Running Histogram Diff

Via Explore Change:

1. Select a model > Explore Change > Histogram Diff.
2. Select a numeric column.
3. Click Execute.

Via the column menu:

1. Hover over a numeric column.
2. Click ... > Histogram Diff.

[Figure: Generate a Histogram Diff from column options]

[Figure: Histogram Diff showing overlaid distributions]

#### Query Diff

Write custom SQL and compare its results between environments.

When to use: flexible validation for complex scenarios that the standard diffs don't cover.
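Without a primary key, a query diff can only compare rows as whole values. The Python sketch below illustrates that keyless, client-side idea under stated assumptions. Each query result has already been fetched as a list of tuples, and both results are truncated to the first 2,000 rows, as in client-side mode. The two results are then compared as multisets with `collections.Counter`, so duplicate rows are counted correctly. Recce's own implementation may work differently.

```python
from collections import Counter
from collections.abc import Sequence

ROW_LIMIT = 2000  # client-side mode compares only the first 2,000 rows


def client_side_diff(
    base_rows: Sequence[tuple],
    current_rows: Sequence[tuple],
    limit: int = ROW_LIMIT,
) -> tuple[list[tuple], list[tuple]]:
    """Return (rows only in base, rows only in current), treating results as multisets."""
    base = Counter(base_rows[:limit])
    current = Counter(current_rows[:limit])
    # Counter subtraction keeps only positive counts, so duplicates are handled correctly.
    only_base = base - current
    only_current = current - base
    return list(only_base.elements()), list(only_current.elements())


if __name__ == "__main__":
    base_rows = [(1, "paid"), (2, "paid"), (2, "paid")]
    current_rows = [(1, "paid"), (2, "paid"), (3, "open")]
    print(client_side_diff(base_rows, current_rows))  # ([(2, 'paid')], [(3, 'open')])
```

Truncation keeps whichever rows arrive first. For the comparison to be meaningful, both queries should return rows in the same order, so adding an ORDER BY to the query is advisable before relying on a keyless comparison.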
##### Running Query Diff

1. Open the Query page.
2. Write SQL using dbt syntax, for example: `select * from {{ ref("mymodel") }}`
3. Click Run Diff.

[Figure: Query Diff interface]

##### Comparison Modes

| Mode | When to Use | How It Works |
| --- | --- | --- |
| Client-side | No primary key | Fetches the first 2,000 rows and compares them locally |
| Warehouse | Primary key specified | Compares in the warehouse and shows only the differences |

##### Keyboard Shortcuts (Mac)

- ⌘ Enter: run query
- ⌘ ⇧ Enter: run query diff

##### Result Options

| Option | Description |
| --- | --- |
| Primary Key | Click the key icon to set the comparison key |
| Pinned Column | Show specific columns first |
| Changed Only | Hide unchanged rows and columns |

[Figure: Query Diff with filtering options]

#### Related

- Validation Techniques: how to use data diffing in your workflow
- Lineage Diff: visualize change impact
- Checklist: save validation results

---

## https://docs.reccehq.com/what-you-can-explore/impact-radius/

### Impact Radius

Impact Radius helps you analyze changes and identify downstream impacts at the column level.

dbt offers a similar capability through the state selector `state:modified+`, which identifies modified nodes and their downstream dependencies. Recce goes further: by analyzing SQL code directly, it enables fine-grained impact radius analysis. This reveals how changes to specific columns ripple through your data pipeline, helping you prioritize which models, and even which columns, deserve closer attention.

[Figure: Impact Radius vs. state:modified+]

#### Usage

##### Show impact radius

1. Click the Impact Radius button in the upper-left corner.
2. The impact radius is displayed.
3. To exit the impact radius view, click the close button in the upper-left corner.

##### Show impact radius for a single changed model

1. Hover over a changed model, then click the target icon, or right-click the model and choose Show Impact Radius.
2. The impact radius for this model is displayed.
3. To exit the impact radius view, click the close button in the upper-left corner.

#### Impact Radius of a Column

The right side of the Column-Level Lineage (CLL) graph represents the impact radius of a selected column.
This view helps you quickly understand what will be affected if that column changes.

##### What does the impact radius include?

- Downstream columns that directly reference the selected column
- Downstream models that directly depend on the selected column
- All indirect downstream columns and models that transitively depend on it

This helps you evaluate both the direct and indirect effects of a column change, making it easier to understand its overall impact.

##### Example: Simplified Model Chain

Given the following models, here's how a change to `stg_orders.status` would impact downstream models:

    -- stg_orders.sql
    select
        order_id,
        customer_id,
        status,
        ...
    from {{ ref("raw_orders") }}

    -- orders.sql
    select
        order_id,
        customer_id,
        status,
        ...
    from {{ ref("stg_orders") }}

    -- customers.sql
    select
        c.customer_id,
        ...
    from {{ ref("stg_customers") }} as c
    join {{ ref("stg_orders") }} as o
        on c.customer_id = o.customer_id
    where o.status = 'completed'
    group by c.customer_id

    -- customer_segments.sql
    select
        customer_id,
        ...
    from {{ ref("customers") }}

The following impact is detected:

- orders: partially impacted. It selects the `status` column directly from `stg_orders` without applying any transformation or filtering logic, so the change is limited to the `status` column.
- customers: fully impacted, because it uses `status` in a WHERE clause (`where o.status = 'completed'`). Any change to the logic of `stg_orders.status` can affect the entire output of the model.
- customer_segments: indirectly impacted. It depends on the `customers` model, which is itself fully impacted. Even though `customer_segments` does not reference `status` directly, changes can still propagate to it through its upstream dependency.
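The example can be reproduced with a small graph walk. The Python sketch below hard-codes it as two hypothetical lookup tables. The first holds model-level edges. The second holds column-level edges that record whether a child merely selects the column (`"select"`) or uses it in a WHERE or GROUP BY (`"filter"`). The sketch illustrates the impact rules rather than Recce's implementation, which derives this information from the SQL itself.

```python
from collections import deque

# Hypothetical column-level lineage for the example:
# (model, column) -> [(child model, how the child uses the column, child output column)]
COLUMN_EDGES = {
    ("stg_orders", "status"): [
        ("orders", "select", "status"),   # passed through as orders.status
        ("customers", "filter", None),    # used in: where o.status = 'completed'
    ],
}

# Model-level lineage for the example: parent -> children.
MODEL_EDGES = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders", "customers"],
    "stg_customers": ["customers"],
    "customers": ["customer_segments"],
}


def impact_radius(changed, column_edges, model_edges):
    """Label downstream models of a changed (model, column) as partial, full, or indirect."""
    impact = {}
    columns = deque([changed])
    fully_impacted = deque()

    # 1. Trace the changed column through column-level lineage.
    while columns:
        for child, usage, out_col in column_edges.get(columns.popleft(), []):
            if usage == "filter":
                # A filter or grouping key can change every row of the child model.
                if impact.get(child) != "full":
                    impact[child] = "full"
                    fully_impacted.append(child)
            else:
                # The child only carries the column forward, so keep tracing it.
                impact.setdefault(child, "partial")
                columns.append((child, out_col))

    # 2. Every model downstream of a fully impacted model is indirectly impacted.
    while fully_impacted:
        for child in model_edges.get(fully_impacted.popleft(), []):
            if impact.get(child) not in ("full", "indirect"):
                impact[child] = "indirect"
                fully_impacted.append(child)
    return impact


if __name__ == "__main__":
    print(impact_radius(("stg_orders", "status"), COLUMN_EDGES, MODEL_EDGES))
    # {'orders': 'partial', 'customers': 'full', 'customer_segments': 'indirect'}
```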
#### How It Works

Two core features power the impact radius analysis.

Breaking Change Analysis classifies modified models into three categories:

- Breaking changes: impact all downstream models.
- Non-breaking changes: do not impact any downstream models.
- Partial breaking changes: impact only the downstream models or columns that depend on the modified columns.

Column-level lineage analyzes your model's SQL to identify column-level dependencies:

- Which upstream columns are used as filters or grouping keys. If those upstream columns change, the current model is impacted.
- Which upstream columns a specific column references. If those upstream columns change, that specific column is impacted.

#### Putting It Together

Using the insights from these two features, Recce determines the impact radius:

- If a model has a breaking change, all downstream models are included in the impact radius.
- If a model has a non-breaking change, only the downstream columns and models of newly added columns are included.
- If a model has a partial breaking change, the downstream columns and models of added, removed, or modified columns are included.

#### When to Use

- Identify which downstream models need validation after a change
- Prioritize data diff effort based on actual impact
- Assess risk before merging model modifications
- Understand the blast radius of column-level changes

#### Related

- Breaking Change Analysis: understand how changes are classified
- Column-Level Lineage: trace column dependencies
- Data Diffing: validate data changes in impacted models

---

## https://docs.reccehq.com/what-you-can-explore/lineage-diff/

### Lineage Diff

The Lineage view shows how your data model changes impact your data pipeline. It visualizes the potential area of impact of your modifications, helping you determine which models need further investigation.
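Lineage Diff starts by comparing the base and current artifacts and labeling each node as added, removed, modified, or unchanged. The Python sketch below assumes each environment's `manifest.json` has already been reduced to a mapping from node ID to content checksum. dbt does record a checksum for each node in the manifest, but the reduction step and the node IDs used here are illustrative. A checksum reflects a node's code, so configuration-only changes would likely need a separate comparison.

```python
def lineage_status(base: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Label every node as added, removed, modified, or unchanged.

    `base` and `current` map node IDs to a content checksum for each environment.
    """
    status = {}
    for node in sorted(base.keys() | current.keys()):
        if node not in base:
            status[node] = "added"       # green in the lineage view
        elif node not in current:
            status[node] = "removed"     # red
        elif base[node] != current[node]:
            status[node] = "modified"    # orange
        else:
            status[node] = "unchanged"   # gray
    return status


if __name__ == "__main__":
    # Hypothetical node IDs and checksums.
    base = {"model.shop.orders": "a1", "model.shop.customers": "b2", "model.shop.legacy": "c3"}
    current = {"model.shop.orders": "a1", "model.shop.customers": "ff", "model.shop.segments": "d4"}
    for node, label in lineage_status(base, current).items():
        print(f"{node}: {label}")
```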
#### How It Works

Recce compares your base and current branch artifacts to identify:

- Dependencies: which models depend on others
- Change impact: how modifications ripple through your pipeline
- Data flow: the path data takes from sources to final outputs

[Figure: Interactive lineage graph showing modified models]

#### Visual Status Indicators

Models are color-coded to indicate their status:

| Color | Status |
| --- | --- |
| Green | Added (new to your project) |
| Red | Removed (deleted from your project) |
| Orange | Modified (changed code or configuration) |
| Gray | Unchanged (shown for context) |

[Figure: Model node with status indicators]

#### Change Detection Icons

Each model displays icons in its bottom-right corner:

- Row count icon: shown when row count differences are detected
- Schema icon: shown when column or data type changes are detected

A grayed-out icon indicates that no change was detected.

[Figure: Model with schema change detected]

#### Filtering and Selection

##### Filter Options

The top control bar offers these filters:

| Filter | Description |
| --- | --- |
| Mode | Changed Models (modified + downstream) or All |
| Package | Filter by dbt package name |
| Select | Select nodes using node selection syntax |
| Exclude | Exclude nodes using node selection syntax |

##### Selecting Models

Click a node to select it, or use Select nodes to select multiple models for batch operations.

##### Row Count Diff by Selector

To run a row count diff on the selected nodes:

1. Use select and exclude to filter the nodes.
2. Click the ... button in the top-right corner.
3. Click Row Count Diff by Selector.

#### Investigating Changes

##### Node Details Panel

Click any model to open the node details panel.

[Figure: Open the node details panel]

From this panel you can:

- View model metadata (type, materialization)
- Examine schema changes
- Run validation checks
- Add findings to your checklist

##### Available Validations

Click Explore Change to access:

- Row Count Diff: compare record counts
- Profile Diff: analyze column statistics
- Value Diff: identify specific value changes
- Top-K Diff: compare the most common values
- Histogram Diff: visualize distributions

[Figure: Node details with exploration options]

#### Schema Diff

Schema diff identifies structural changes to your models:

- Added columns: new fields (shown in green)
- Removed columns: deleted fields (shown in red)
- Renamed columns: changed names (shown with arrows)
- Data type changes: modified column types

[Figure: Interactive schema diff showing column changes]

Requirements: Schema diff requires `catalog.json` in both environments. Run `dbt docs generate` for both environments before starting your Recce session.

#### When to Use

- Starting your review: get an overview of all changes and their downstream impact
- Identifying affected models: find models that need validation
- Understanding dependencies: see how changes propagate through your pipeline
- Scoping your validation: determine which models to diff

#### Related

- Validation Techniques: how to use lineage in your workflow
- Code Change: view SQL changes for a model
- Column-Level Lineage: trace column dependencies
- Multi-Model Selection: batch operations on models
- Data Diffing: validate data changes

---

## https://docs.reccehq.com/what-you-can-explore/multi-models/

### Multi-Models

#### Multi-Model Selection

You can select multiple models in the Lineage DAG. This lets you perform actions such as Row Count Diff or Value Diff on several models at the same time.

##### Select Models Individually

To select multiple models individually, click the checkbox on each model you want to select.
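Selecting a node together with its parents or children is a walk over the DAG. This is the same idea behind dbt's graph operators: `+my_model` adds ancestors, `my_model+` adds descendants, and a number such as `1+my_model` limits the depth. The Python sketch below resolves only those graph operators over a hypothetical parent-to-children map built from the Impact Radius example models. It is not dbt's or Recce's selector and ignores every other selection method.

```python
import re
from collections import deque
from typing import Optional

Graph = dict[str, list[str]]

# [n]+model+[m]: optional ancestor depth, model name, optional descendant depth.
SELECTOR = re.compile(r"^(?:(\d*)(\+))?(\w+)(?:(\+)(\d*))?$")


def walk(start: str, edges: Graph, depth: Optional[int] = None) -> set[str]:
    """Breadth-first walk from start; depth=None means no limit. Excludes start."""
    found: set[str] = set()
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if depth is not None and dist >= depth:
            continue
        for nxt in edges.get(node, []):
            if nxt not in found:
                found.add(nxt)
                queue.append((nxt, dist + 1))
    return found


def invert(children: Graph) -> Graph:
    """Turn a parent -> children map into a child -> parents map."""
    parents: Graph = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def select(selector: str, children: Graph) -> set[str]:
    """Resolve a graph-operator selector such as '+model', 'model+', or '1+model+'."""
    match = SELECTOR.match(selector)
    if match is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    up_depth, up, model, down, down_depth = match.groups()
    selected = {model}
    if up:
        selected |= walk(model, invert(children), int(up_depth) if up_depth else None)
    if down:
        selected |= walk(model, children, int(down_depth) if down_depth else None)
    return selected


if __name__ == "__main__":
    children = {
        "raw_orders": ["stg_orders"],
        "stg_orders": ["orders", "customers"],
        "stg_customers": ["customers"],
        "customers": ["customer_segments"],
    }
    print(sorted(select("1+customers+", children)))
    # ['customer_segments', 'customers', 'stg_customers', 'stg_orders']
```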
[Figure: Select multiple models individually]

##### Select Parent or Child Models

To select a node and all of its parents or children:

1. Click the checkbox on the node.
2. Right-click the node.
3. Choose whether to select its parent or child models.

[Figure: Select a node and its parents or children]

#### Perform Actions on Multiple Models

After selecting the models you want, use the Actions menu at the top right of the screen to run diffs or add checks.

[Figure: Perform actions on multiple models]

##### Example: Row Count Diff

An example of selecting multiple models to run a multi-node Row Count Diff:

[Figure: Perform a Row Count Diff on multiple models]

##### Example: Value Diff

An example of selecting multiple models to run a multi-node Value Diff:

[Figure: Perform a Value Diff on multiple models]

#### Schema and Lineage Diff

From the Lineage DAG, open the Actions dropdown menu and click Lineage Diff or Schema Diff in the Add to Checklist section. This adds:

- Lineage Diff: the current Lineage view, based on your model selection options.
- Schema Diff: a diff of all models if none are selected, otherwise a diff of the selected models only.

[Figure: Add a Lineage Diff check or Schema check via the Actions dropdown menu]

Recce supports dbt node selection in the lineage diff. This lets you target specific resources with data checks by selecting or excluding models.

#### Supported Syntax and Methods

Because Recce uses dbt's built-in node selector, it supports most dbt selection methods. Some examples:

- Select a node: `my_model`
- Select by tag: `tag:nightly`
- Select by wildcard: `customer*`
- Select by graph operators: `my_model+`, `+my_model`, `1+my_model+`
- Select by union: `model1 model2`
- Select by intersection: `stg_invoices+,stg_accounts+`
- Select by state: `state:modified`, `state:modified+`

#### Using the State Method

In dbt, you need to pass the `--state` option on the CLI. Recce uses the base environment as the state, so you can use state selectors on the fly.

#### Removed Models

Another difference is that in dbt, you cannot select removed models.
However, in Recce you can select removed models and also find them using graph operators. This is a notable difference from dbt's node selection.

#### Limitations

- The `result` method is not supported.
- The `source_status` method is not supported.
- YAML selectors are not supported.

#### When to Use

- Run Row Count Diff across multiple related models at once
- Perform a bulk Value Diff on a set of staging models
- Validate all models in a specific path or with a specific tag
- Compare schema changes across an entire group of models

#### Related

- Lineage Diff: visualize model dependencies and changes
- Data Diffing: run diffs on selected models
- Impact Radius: see the downstream effects of changes

---

## https://docs.reccehq.com/what-you-can-explore/summary/

### Data Review Summary

The Data Review Summary provides an AI-powered analysis of your data changes and their impact. The Recce Agent examines your metadata, runs data diffing checks, and delivers insights to guide merge decisions.

#### What It Does

- Identifies what to validate: automatically determines which changes need data diffing
- Runs checks: executes the relevant validation checks and assesses impact
- Explores changes: uses data diffing to surface meaningful differences
- Generates insights: provides recommendations for informed decision-making

[Figure: Example of a Recce agent summary in a GitHub PR comment]

#### How to Generate

##### Automatic (CI/CD)

The summary is generated whenever session metadata is updated, either by:

- Running `recce-cloud upload` in your CI workflow
- Updating session metadata through the Cloud UI

##### Automatic (PR)

The summary is generated automatically when you push a new commit to your PR. Your CI workflow runs `recce-cloud upload`, which triggers the agent to analyze your changes.

##### Manual (PR Comment)

Comment `/recce` on your GitHub PR to generate a new data review summary. The Recce bot responds with a status update while the summary is being generated.
##### View Agent Progress

When you push a commit to your PR, the Recce bot posts progress updates:

- 👀 Changes detected
- 🚀 Summary generating
- 👍 Summary complete

##### Manual (Cloud UI)

Click Data Review in a session to generate a summary on demand.

#### Reading the Summary

The validation summary includes these sections:

| Section | What It Shows |
| --- | --- |
| Summary | Overview of PR changes and key impacts |
| Key Changes | Models modified, file changes, and specific value differences |
| Impact Analysis | Lineage diagram showing the affected downstream dependencies |
| Checklist | Validation results with run status and impact analysis |
| Suggested Actions | Recommended next steps based on the findings |

#### Status Symbols

The agent uses these symbols in the Impact Analysis:

| Symbol | Meaning |
| --- | --- |
| 🔴 | Critical: the issue requires attention before merging |
| ⚠️ | Warning: review recommended but not blocking |
| ✅ | OK: validation passed or the change is acceptable |
| 📝 | Informational: expected change, documented for context |

These differ from the checklist status icons, which show Pass/Warning/Fail for individual checks.

#### Static Summary

Recce OSS generates a static summary from your state file. Unlike the Cloud agent summary, it outputs a Markdown report of your checklist results:

`recce summary recce_state.json`

To save it to a file for PR comments:

`recce summary recce_state.json > recce_summary.md`

#### When to Use

- Reviewing PRs: get a quick understanding of data changes before diving deeper
- Validating impact: see which downstream models are affected
- Communicating changes: share findings with teammates and reviewers
- Deciding to merge: use the insights to make informed merge decisions

#### Related

- Data Developer Workflow: how developers use summaries during development
- Data Reviewer Workflow: how reviewers use summaries for PR review
- Data Diffing: validation techniques used by the agent

---

## https://docs.reccehq.com/whats-recce/cloud-vs-oss/

### Cloud vs Open Source

Validating data changes manually takes time and slows down PR review. Recce is a data validation agent.
Open Source gives you the core validation engine to run yourself; Cloud gives you the full Agent experience, with automated validation on every PR.

- Cloud: You open a PR → The agent validates automatically → A summary is posted to the PR
- Open Source: You open a PR → You run checks manually → You copy the results to the PR

#### The Core Difference

| | Cloud | Open Source |
| --- | --- | --- |
| Experience | The agent works alongside you | You run validation manually |
| PR validation | Agent validates automatically and posts a summary | You run checks and copy the results to the PR |
| During development | CLI + Agent assistance | CLI tools only |
| Learning curve | Agent guides you through validation | Learn the tools and run them yourself |

#### Cloud

Recce Cloud connects to your Git repository and data warehouse so the Recce Agent can validate your data changes automatically. When you open a PR, the agent analyzes your changes, runs validation checks, and posts its findings directly to your PR. No manual work is required.

On pull requests: the Agent runs automatically when you open a PR. It:

- Analyzes your data model changes
- Runs relevant validation checks
- Posts a summary of its findings to your PR
- Updates as you push new commits

During development: the agent works with your CLI through Recce MCP. MCP (Model Context Protocol) is an open standard that lets AI tools call Recce directly. With it, the agent can:

- Answer questions about your changes
- Suggest validation approaches
- Help interpret diff results

For your team:

- Define what "correct" means for your repo with preset checks that apply across all PRs
- Share validation standards as institutional knowledge so everyone validates the same way
- Let developers and reviewers collaborate on validation, going back and forth until the change is verified

Pricing: Cloud is free to start. See Pricing for plan details.
Choose Cloud when:

- You want automated validation on every PR
- You want Agent assistance during development
- Your team reviews data changes in PRs

#### Open Source

The open source version is the core validation engine that you run locally. You control when and how validation happens: run checks, explore results, and decide what to share. Everything stays on your machine unless you export it.

You get:

- Lineage Diff between branches
- Data comparison (row count, schema, profile, value, top-k, and histogram diff)
- Query diff for custom validations
- A checklist to track your checks
- Recce MCP for AI-assisted validation with Claude, Cursor, and other AI clients

Choose OSS when you are:

- Exploring Recce before adopting Cloud
- Working in environments without external connectivity
- Contributing to Recce development

#### Feature Comparison

| Feature | Cloud | OSS |
| --- | --- | --- |
| Lineage Diff | Yes | Yes |
| Data diff (row count, schema, profile, value, top-k, histogram diff) | Yes | Yes |
| Query diff | Yes | Yes |
| Checklist | Yes | Yes |
| Agent on PRs | Yes | No |
| Agent CLI assistance (MCP) | Yes | Yes |
| Preset checks across PRs | Yes | Manual |
| Shared validation standards | Yes | Manual |
| Developer-reviewer collaboration | Yes | Manual |
| PR comments & summaries | Yes | No |
| LLM-powered insights | Yes | No |

#### FAQ

##### Can I start with OSS and upgrade to Cloud later?

Yes. OSS and Cloud use the same validation engine. Your existing checklists and workflows carry over when you connect to Cloud.

##### Does Cloud require a different setup than OSS?

Cloud connects to your Git repository and data warehouse directly. You don't need to generate artifacts locally; the agent handles that automatically.

##### What data does Cloud access?

Cloud accesses your dbt artifacts (`manifest.json`, `catalog.json`) and runs queries against your data warehouse to perform validation. Your data stays in your warehouse.
Getting Started Cloud: Start Free with Cloud OSS: OSS Setup Next Steps Data Developer Workflow : Using Recce throughout development Recce MCP Server : AI-assisted validation with Claude, Cursor, Windsurf Recce Claude Plugin : Guided setup for Claude Code users --- ## https://docs.reccehq.com/whats-recce/community-support/ Community & Support Here's where you can get in touch with the Recce team and find support: Our discord dbt Slack in the #tools-recce channel Email us help@reccehq.com If you believe you have found a bug on our open source, or there is some missing functionality in Recce, please open a GitHub Issue . Recce on the web You can follow along with news about Recce and blogs from our team in the following places: LinkedIn Recce Blog @datarecce on Twitter/X @DataRecce@mastodon.social on Mastodon @datarecce.bsky.social on BlueSky Subscribe to our newsletter Stay updated with Recce news, data engineering insights, and product updates. Sign up for Recce Updates --- ## https://docs.reccehq.com/whats-recce/glossary/ Glossary Quick definitions for the terms used across the Recce docs. Where a concept has its own page, the entry links there. Recce concepts Recce Cloud. The managed version of Recce. Connects to a Git repository and warehouse, runs validation automatically on every pull request, and posts a summary to the PR. See Cloud vs Open Source . Recce OSS. The open source CLI you run locally against a dbt project. Same validation engine as Cloud, manual orchestration. See OSS Setup . Recce interface. The web UI Recce serves (locally via recce server , in the cloud via the Cloud app) for browsing lineage, code, and data diffs. Sometimes called "Recce" alone in conversational context. Data Review Agent. What Recce is. An agent that performs validation on dbt data changes the way a code review agent reviews code: examines the diff, runs checks, summarizes findings. Validation. Running checks that confirm a data change is correct and intended. 
Recce's primary verb. Use in place of "testing" when describing what Recce does. Validation check. A single comparison between base and current data (a row count diff, a value diff, a query) that returns a result the reviewer can approve. Preset check. A validation check defined in recce.yml so it runs automatically on every PR for the project. See Preset Checks . Checklist. The collection of validation checks saved against a PR. Travels with the PR as proof-of-correctness. See Checklist . State file. The on-disk file that holds the checklist, run history, and validation results for a Recce session. Format reference: State File . Recce instance. A running Recce server (local recce server or a Cloud session) that holds a single state file. Validation comparisons Base. The reference side of the comparison. Usually the production warehouse, or whatever the team treats as the source of truth for the PR. Current. The PR's side of the comparison. Usually a per-PR schema in the warehouse with the PR's models built into it. Lineage diff. Visualization of which dbt models were added, modified, or removed between base and current. See Lineage Diff . Code change. The SQL and config diff for a single modified model. See Code Change . Column-level lineage. A lineage view that traces relationships at the column level, not just the model level. See Column-Level Lineage . Impact radius. The set of downstream models and columns affected by a change. See Impact Radius . Breaking change analysis. Automated categorization of a modified model as breaking, non-breaking, or partial breaking. See Breaking Change Analysis . Row count diff. Compares the number of rows in a model between base and current. Fastest sanity check. Profile diff. Column-level statistical comparison (min, max, distinct, null counts) between base and current. Value diff. Row-by-row comparison of values for a primary-key join between base and current. Top-K diff. 
Comparison of the most frequent values in a categorical column.
- Histogram diff. Comparison of the distribution of a numeric column.
- Query diff. A custom SQL query run against both base and current, with the results compared.

The full data-diffing toolkit lives on the Data Diffing page.

dbt and warehouse terms

- dbt. The data transformation tool that Recce wraps validation around. Always lowercased.
- dbt model. A SQL transformation defined in a dbt project. The unit of change Recce validates.
- dbt target. A named warehouse connection defined in `profiles.yml` (for example `dev` or `prod`). Recce uses two targets per PR: one for base, one for current.
- dbt artifacts. The `manifest.json` and `catalog.json` files that dbt produces. Recce reads them to understand the model graph and column types.
- Data warehouse. Snowflake, BigQuery, Databricks, Redshift, or another analytical warehouse. Where Recce queries actually run.
- Warehouse connection. The credentials and configuration that point Recce at a specific warehouse. Configured via dbt's `profiles.yml`.
- Development stage. The phase of work a change is in (development, review, release). Use this in place of "environment" when "environment" might be misread as "warehouse".
- Release. Making data changes live in the production warehouse. Use it in place of "deploy" when talking about data changes ("deploy" is reserved for infrastructure).

Workflow roles

- Data developer. The author of a dbt PR. Runs Recce locally or watches Cloud-generated validation results. Workflow guide: Data Developer.
- Data reviewer. The teammate who reviews the PR. Walks the Recce checklist, approves checks, and leaves comments. Reviewer workflow: Data Reviewer.
- Admin. The person who connects Recce Cloud to the repo and warehouse and manages organization access. Setup guide: Admin Setup.

Automation

- CI (continuous integration). Recce's per-PR automation: runs preset checks when a PR opens or updates and posts the result. Setup: Setup CI.
- CD (continuous delivery).
Recce's post-merge automation: refreshes the base state after merges so the next PR has an up-to-date reference. Setup: Setup CD.
- MCP server. Recce's Model Context Protocol server, used by AI coding agents (Claude Code, Cursor, Windsurf) to call Recce tools through natural language. Setup: MCP Server.
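The glossary's lineage diff and impact radius terms fit together: first classify models as added, removed, or modified, then walk downstream from the changed ones. The sketch below is a model-level illustration only, assuming hypothetical toy graphs keyed by model name, each with a content checksum and a list of upstream parents (roughly the `checksum` and `depends_on` information dbt records in `manifest.json`). It is not how Recce computes lineage, and it ignores column-level lineage entirely; `lineage_diff` and `impact_radius` are illustrative names, not Recce APIs.

```python
from collections import deque

# Hypothetical toy graph: model name -> (SQL checksum, upstream parent names).
# A real project would read this from dbt's manifest.json; this is only a sketch.
Graph = dict[str, tuple[str, list[str]]]


def lineage_diff(base: Graph, current: Graph) -> dict[str, set[str]]:
    """Classify models as added, removed, or modified between base and current."""
    added = set(current) - set(base)
    removed = set(base) - set(current)
    # A model counts as modified when its checksum differs between the two sides.
    modified = {
        name for name in set(base) & set(current)
        if base[name][0] != current[name][0]
    }
    return {"added": added, "removed": removed, "modified": modified}


def impact_radius(current: Graph, changed: set[str]) -> set[str]:
    """Return the changed models plus every model downstream of them (BFS)."""
    # Invert parent edges into child edges so the walk can go downstream.
    children: dict[str, list[str]] = {name: [] for name in current}
    for name, (_, parents) in current.items():
        for parent in parents:
            children.setdefault(parent, []).append(name)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


base = {
    "stg_orders": ("a1", []),
    "orders": ("b1", ["stg_orders"]),
    "revenue": ("c1", ["orders"]),
    "customers": ("d1", []),
}
current = {
    "stg_orders": ("a2", []),             # SQL changed in the PR
    "orders": ("b1", ["stg_orders"]),
    "revenue": ("c1", ["orders"]),
    "customers": ("d1", []),
    "orders_daily": ("e1", ["orders"]),   # new model added in the PR
}

diff = lineage_diff(base, current)
print(diff)
print(sorted(impact_radius(current, diff["modified"] | diff["added"])))
```

In this toy example `customers` falls outside the impact radius because nothing upstream of it changed, which is the same idea behind surfacing only impacted items instead of every downstream table.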
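The row count, profile, value, top-K, and histogram diffs in the glossary differ mainly in what they aggregate before comparing base with current. The following is a minimal in-memory Python sketch, assuming a hypothetical orders-like model with an `id` primary key. Recce runs these comparisons as queries against the base and current warehouse schemas, so none of the function names below are Recce APIs, and real results would come from the warehouse rather than Python lists.

```python
from collections import Counter

# Hypothetical base and current rows for one model (illustration only).
base = [
    {"id": 1, "status": "paid", "amount": 10.0},
    {"id": 2, "status": "paid", "amount": 25.0},
    {"id": 3, "status": "refunded", "amount": 5.0},
]
current = [
    {"id": 1, "status": "paid", "amount": 10.0},
    {"id": 2, "status": "paid", "amount": 27.5},   # value changed
    {"id": 3, "status": "refunded", "amount": 5.0},
    {"id": 4, "status": "paid", "amount": 40.0},   # row added
]


def row_count_diff(base, current):
    """Row count diff: compare the number of rows on each side."""
    return {"base": len(base), "current": len(current)}


def profile(rows, col):
    """One side of a profile diff: min, max, distinct count, null count."""
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "distinct": len(set(non_null)),
        "nulls": len(values) - len(non_null),
    }


def value_diff(base, current, key, col):
    """Value diff: match rows on the primary key and report differences."""
    b = {r[key]: r[col] for r in base}
    c = {r[key]: r[col] for r in current}
    return {
        "added": sorted(c.keys() - b.keys()),
        "removed": sorted(b.keys() - c.keys()),
        "changed": sorted(k for k in b.keys() & c.keys() if b[k] != c[k]),
    }


def top_k_diff(base, current, col, k=2):
    """Top-K diff: the k most frequent values of a categorical column."""
    return {
        "base": Counter(r[col] for r in base).most_common(k),
        "current": Counter(r[col] for r in current).most_common(k),
    }


def histogram(rows, col, edges):
    """One side of a histogram diff: counts per [edges[i], edges[i+1]) bin.

    The last bin also includes its upper edge; values outside edges are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for r in rows:
        v = r[col]
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts


print(row_count_diff(base, current))
print(profile(base, "amount"), profile(current, "amount"))
print(value_diff(base, current, "id", "amount"))
print(top_k_diff(base, current, "status"))
print(histogram(base, "amount", [0, 20, 40]), histogram(current, "amount", [0, 20, 40]))
```

A query diff follows the same pattern with an arbitrary SQL query in place of these fixed aggregations: run it on both sides and compare the result sets.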