Understanding Lineage Diff
The Lineage view is Recce's main interface for visualizing and analyzing how your dbt model changes impact your data pipeline. It shows you the potential area of impact from your modifications, helping you determine which models need further investigation and validation.
What is Data Lineage?
Data lineage tracks the flow and transformation of data through your dbt project. In Recce, the lineage graph shows:
- Dependencies: Which models depend on others
- Change Impact: How modifications ripple through your pipeline
- Data Flow: The path data takes from sources to final outputs
Viewing the Lineage Graph
From the Lineage view, you can determine which models to investigate further and run data validation checks that serve as proof of correctness for your work.
Getting Started
When you first open Recce, the lineage graph automatically loads showing only the models affected by your changes. This focused view helps you quickly understand the impact of your work.
Filter Nodes
In the top control bar, you can change the rules used to filter nodes:
- Mode:
  - Changed Models: Modified nodes, their downstream nodes, and their first-degree parents.
  - All: Show all nodes.
- Package: Filter by dbt package name.
- Select: Include nodes matched by a dbt node selection.
- Exclude: Exclude nodes matched by a dbt node selection.
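The Changed Models rule can be pictured as a small graph traversal. The Python sketch below is illustrative only and is not Recce's implementation. It assumes the lineage is given as a hypothetical map from each node to its direct children, and it reads "their parents" as the direct parents of the modified nodes, not of every downstream node.

```python
from collections import deque


def invert(children: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build a node -> direct parents map from a node -> direct children map."""
    parents: dict[str, list[str]] = {}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def changed_models_view(modified: set[str], children: dict[str, list[str]]) -> set[str]:
    """Hypothetical sketch: modified nodes + everything downstream of them
    + the first-degree parents of the modified nodes."""
    parents = invert(children)
    selected = set(modified)
    # Breadth-first walk along child edges collects every downstream node.
    queue = deque(modified)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    # Only direct parents of the modified nodes are added (no grandparents).
    for node in modified:
        selected.update(parents.get(node, []))
    return selected


# Hypothetical lineage: node -> list of direct children.
lineage = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders"],
    "stg_customers": ["customers"],
    "orders": ["customer_revenue"],
    "customers": ["customer_revenue"],
}
print(sorted(changed_models_view({"orders"}, lineage)))
# ['customer_revenue', 'orders', 'stg_orders']
```

With only orders modified, this sketch keeps stg_orders because it is a direct parent. It leaves out raw_orders and customers, which fall outside that first-degree boundary.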
Select Nodes
Click a node to select it, or click the Select nodes button in the top-right corner to select multiple nodes for further operations. For details, see the Multi Nodes Selections section.
Row Count Diff
A row count diff can be performed on the nodes chosen with the Select and Exclude options.
After selecting nodes, run the row count diff by:
- Clicking the 3 dots (...) button in the top-right corner.
- Clicking Row Count Diff by Selector.
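Conceptually, a row count diff runs a COUNT(*) against each selected model in both environments and compares the results. The sketch below imitates that with two in-memory SQLite databases standing in for the base and current warehouses. The function names and the use of SQLite are assumptions made for illustration; this is not Recce's API.

```python
import sqlite3
from typing import Optional


def row_count(conn: sqlite3.Connection, table: str) -> Optional[int]:
    """Return the table's row count, or None if the table is absent in this environment."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if exists is None:
        return None  # model was added or removed between environments
    # Identifiers cannot be bound as parameters; names come from the trusted selection.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def row_count_diff(base, current, models):
    """Map each model to (base count, current count, change)."""
    result = {}
    for model in models:
        b, c = row_count(base, model), row_count(current, model)
        result[model] = (b, c, None if b is None or c is None else c - b)
    return result


# Hypothetical environments: orders gains a row, new_model exists only in current.
base = sqlite3.connect(":memory:")
current = sqlite3.connect(":memory:")
base.execute("CREATE TABLE orders (id INTEGER)")
base.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
current.execute("CREATE TABLE orders (id INTEGER)")
current.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,), (4,)])
current.execute("CREATE TABLE new_model (id INTEGER)")
print(row_count_diff(base, current, ["orders", "new_model"]))
# {'orders': (3, 4, 1), 'new_model': (None, 0, None)}
```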
Understanding Model Nodes
Visual Status Indicators
Models in the lineage graph are color-coded to indicate their status:
- Green: Added models (new to your project)
- Red: Removed models (deleted from your project)
- Orange: Modified models (changed code or configuration)
- Gray: Unchanged models (shown for context)
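The color coding comes down to comparing each node's definition across the two environments. This simplified sketch assumes each environment's nodes are available as dictionaries keyed by unique ID, shaped like the nodes section of dbt's manifest.json. It treats a node as modified when its checksum or config differs. Recce's real change detection may consider more than these two fields.

```python
def classify_nodes(base_nodes: dict, current_nodes: dict) -> dict[str, str]:
    """Label each node added/removed/modified/unchanged (simplified sketch)."""
    status = {}
    for uid in sorted(base_nodes.keys() | current_nodes.keys()):
        if uid not in base_nodes:
            status[uid] = "added"  # green
        elif uid not in current_nodes:
            status[uid] = "removed"  # red
        else:
            b, c = base_nodes[uid], current_nodes[uid]
            # A changed file checksum (code) or config counts as a modification.
            if b.get("checksum") != c.get("checksum") or b.get("config") != c.get("config"):
                status[uid] = "modified"  # orange
            else:
                status[uid] = "unchanged"  # gray
    return status


# Hypothetical manifest fragments for two environments.
base = {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}, "config": {"materialized": "table"}},
    "model.shop.legacy": {"checksum": {"name": "sha256", "checksum": "bbb"}, "config": {"materialized": "view"}},
    "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "ccc"}, "config": {"materialized": "view"}},
}
current = {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}, "config": {"materialized": "incremental"}},
    "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "ccc"}, "config": {"materialized": "view"}},
    "model.shop.revenue": {"checksum": {"name": "sha256", "checksum": "ddd"}, "config": {"materialized": "table"}},
}
print(classify_nodes(base, current))
# {'model.shop.customers': 'unchanged', 'model.shop.legacy': 'removed',
#  'model.shop.orders': 'modified', 'model.shop.revenue': 'added'}
```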
Change Detection Icons
Each model node displays two icons in the bottom-right corner that indicate detected changes:
- Row Count Icon: Shows when row count differences are detected
- Schema Icon: Shows when column or data type changes are detected
Grayed-out icons indicate no changes were detected in that category.
Row Count Detection
The row count icon only appears after you've run a row count diff on that specific model. This helps you track which models you've already validated.
Investigating Model Changes
Opening the Node Details Panel
Click on any model in the lineage graph to open the node details panel. This is your starting point for deeper analysis.
Schema Diff
Schema diff helps you understand structural changes to your models.
Requirements
Schema diff requires catalog.json files in both your base and current environments. Make sure to run dbt docs generate in both environments before starting your Recce session.
Viewing Schema Changes
Click on a model to view its schema diff in the node details panel.
Types of Schema Changes
Schema diff identifies:
- Added columns: New fields in your model (shown in green)
- Removed columns: Fields that no longer exist (shown in red)
- Renamed columns: Fields that have changed names (shown with arrows)
- Data type changes: Modifications to column types
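Because schema diff is built on catalog.json, the comparison can be sketched directly on that file's structure. Each entry under nodes carries a columns map with a type for every column. The Python below assumes both catalogs are already loaded as dictionaries (for example with json.load) and that the model IDs are hypothetical. It reports added, removed, and type-changed columns. It does not attempt rename detection, so in this sketch a renamed column appears as one removal plus one addition.

```python
def schema_diff(base_catalog: dict, current_catalog: dict, unique_id: str) -> dict:
    """Compare one node's columns across two catalog.json dictionaries (sketch)."""

    def column_types(catalog: dict) -> dict:
        node = catalog.get("nodes", {}).get(unique_id, {})
        return {name: col.get("type") for name, col in node.get("columns", {}).items()}

    base, curr = column_types(base_catalog), column_types(current_catalog)
    return {
        "added": sorted(curr.keys() - base.keys()),
        "removed": sorted(base.keys() - curr.keys()),
        # (column, base type, current type) for columns present on both sides
        "type_changed": sorted(
            (name, base[name], curr[name])
            for name in base.keys() & curr.keys()
            if base[name] != curr[name]
        ),
    }


# Hypothetical catalog fragments for one model in each environment.
base_catalog = {"nodes": {"model.shop.orders": {"columns": {
    "ID": {"type": "INTEGER", "index": 1, "name": "ID"},
    "AMOUNT": {"type": "INTEGER", "index": 2, "name": "AMOUNT"},
    "STATUS": {"type": "TEXT", "index": 3, "name": "STATUS"},
}}}}
current_catalog = {"nodes": {"model.shop.orders": {"columns": {
    "ID": {"type": "INTEGER", "index": 1, "name": "ID"},
    "AMOUNT": {"type": "NUMERIC", "index": 2, "name": "AMOUNT"},
    "ORDER_DATE": {"type": "DATE", "index": 3, "name": "ORDER_DATE"},
}}}}
print(schema_diff(base_catalog, current_catalog, "model.shop.orders"))
# {'added': ['ORDER_DATE'], 'removed': ['STATUS'], 'type_changed': [('AMOUNT', 'INTEGER', 'NUMERIC')]}
```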
Code Diff
Understanding the code changes helps you analyze the root cause of data differences.
From any model's node details panel, you can view the exact code changes that were made. This helps you understand:
- What SQL logic was modified
- How transformations changed
- Why data differences might be occurring
Learn more about viewing and analyzing code changes in the Code Diff guide.
Node Details
Node Details Overview
The node details panel provides comprehensive information about the selected model.
From this panel, you can:
- View model information: Node type, materialization, and basic metadata
- Examine changes: See what specifically changed in the model
- Run validations: Execute pre-built data diffs and custom queries
- Add to checklist: Document important findings for review
Available Data Validation Checks
Click the "Explore Change" button to access pre-built validation checks, so you don't have to write the SQL yourself:
- Row Count Diff: Compare the number of rows between environments
- Profile Diff: Analyze column-level statistics and distributions
- Value Diff: Identify specific value changes between datasets
- Top-K Diff: Compare the most common values in your data
- Histogram Diff: Visualize data distribution changes
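Of these checks, value diff is the hardest to picture. One common approach is to join the two versions of a model on a primary key and measure, for each column, how often the matched rows agree. That approach is assumed here for illustration and is not taken from Recce's implementation. The sketch uses plain lists of dictionaries as hypothetical stand-ins for query results.

```python
def value_diff(base_rows: list, current_rows: list, key: str) -> dict:
    """Join rows on a primary key and report per-column match rates (sketch).

    Assumes the key is unique on each side; a duplicate key keeps its last row.
    """
    base = {row[key]: row for row in base_rows}
    curr = {row[key]: row for row in current_rows}
    shared = base.keys() & curr.keys()
    columns = {c for row in base_rows + current_rows for c in row if c != key}
    match_rate = {}
    for col in sorted(columns):
        # Fraction of rows present on both sides whose values agree in this column.
        matched = sum(1 for k in shared if base[k].get(col) == curr[k].get(col))
        match_rate[col] = matched / len(shared) if shared else None
    return {
        "added": sorted(curr.keys() - base.keys()),
        "removed": sorted(base.keys() - curr.keys()),
        "match_rate": match_rate,
    }


base_rows = [
    {"id": 1, "amount": 10, "status": "paid"},
    {"id": 2, "amount": 20, "status": "paid"},
    {"id": 3, "amount": 5, "status": "refunded"},
]
current_rows = [
    {"id": 1, "amount": 10, "status": "paid"},
    {"id": 2, "amount": 25, "status": "paid"},
    {"id": 4, "amount": 7, "status": "paid"},
]
print(value_diff(base_rows, current_rows, "id"))
# {'added': [4], 'removed': [3], 'match_rate': {'amount': 0.5, 'status': 1.0}}
```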
Custom Query Analysis
Click "Query" to open the query interface where you can:
- Write custom SQL to investigate changes
- Run ad-hoc comparisons between environments
- Validate specific business logic or data quality rules
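At its core, an ad-hoc comparison runs the same SQL in both environments and diffs the result sets. The sketch below uses two in-memory SQLite databases as hypothetical stand-ins for the base and current warehouses. It treats results as multisets, so duplicate rows are counted rather than collapsed. The function name compare_query is illustrative only.

```python
import sqlite3
from collections import Counter


def compare_query(base: sqlite3.Connection, current: sqlite3.Connection, sql: str):
    """Run one query in both environments and return the rows unique to each side."""
    base_rows = Counter(base.execute(sql).fetchall())
    curr_rows = Counter(current.execute(sql).fetchall())
    # Counter subtraction keeps only positive counts, i.e. rows in excess on one side.
    # Sorting assumes the returned values are mutually comparable.
    only_base = sorted((base_rows - curr_rows).elements())
    only_current = sorted((curr_rows - base_rows).elements())
    return only_base, only_current


base = sqlite3.connect(":memory:")
current = sqlite3.connect(":memory:")
for conn, rows in (
    (base, [(1, "paid"), (2, "paid"), (3, "refunded")]),
    (current, [(1, "paid"), (2, "refunded"), (3, "refunded"), (4, "paid")]),
):
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

query = "SELECT status, COUNT(*) FROM orders GROUP BY status"
print(compare_query(base, current, query))
# ([('refunded', 1)], [('refunded', 2)])
```

A grouped query like this one tells you that the number of refunded orders changed while paid orders stayed level. That is the kind of business-rule check a custom query is suited for.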
Building Your Validation Checklist
As you investigate changes, you can add important findings to your checklist for documentation and collaboration purposes.
Collaboration Best Practice
Use the checklist feature to document your validation process. This creates a clear record of what you've tested and verified, making it easier for teammates to review your changes.
Next Steps
After reviewing the lineage changes:
- Validate: Run data diffs on critical models to verify changes are correct
- Document: Add key findings to your checklist with clear descriptions
- Collaborate: Share your analysis with team members for review
- Integrate: Use Recce's workflow integration to automate validation in your CI/CD process
Ready to dive deeper into specific validation techniques? Explore the Data Diffing section to learn about different ways to validate your changes.