Top-K Diff
Top-K Diff compares the distribution of values in a categorical column between the two environments. By default it shows the 10 most frequent values, and the view can be expanded to show the top 50.
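To make the comparison concrete, here is a minimal in-memory sketch of the idea: count each category in both environments, keep categories that appear on either side, and return the K most frequent. `top_k_diff` is a hypothetical helper, not the product's implementation, and ranking by current count (then base count, then name) is an assumption. The default `k=10` mirrors the default view; passing `k=50` corresponds to the expanded view.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def top_k_diff(
    base_values: Iterable[Hashable],
    current_values: Iterable[Hashable],
    k: int = 10,
) -> List[Tuple[Hashable, int, int]]:
    """Return (category, base_count, current_count) rows for the k most frequent categories.

    Hypothetical helper for illustration. The ranking rule (current count,
    then base count, then category name) is an assumption.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    base_counts = Counter(base_values)        # occurrences per category in base
    current_counts = Counter(current_values)  # occurrences per category in current
    # Union of categories from both sides, like a FULL OUTER JOIN on the category.
    categories = set(base_counts) | set(current_counts)
    # Counter returns 0 for a category that is missing on one side.
    rows = [(cat, base_counts[cat], current_counts[cat]) for cat in categories]
    # Most frequent in current first; ties broken by base count, then by name.
    rows.sort(key=lambda row: (-row[2], -row[1], str(row[0])))
    return rows[:k]


base = ["US", "US", "UK", "DE"]
current = ["US", "UK", "UK", "FR"]
print(top_k_diff(base, current, k=3))  # [('UK', 1, 2), ('US', 2, 1), ('FR', 0, 1)]
```

Categories that exist in only one environment (here `FR` and `DE`) still get a row with a zero count on the other side, which is what makes added or disappeared values visible in the diff.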
A Top-K Diff can be generated in two ways.
Via the Explore Change button menu:
- Select the model from the Lineage DAG.
- Click the Explore Change button.
- Click Top-K Diff.
- Select a column to diff.
- Click Execute.
Via the column options menu:
- Select the model from the Lineage DAG.
- Hover over the column in the Node Details panel.
- Click the vertical three-dot (...) menu.
- Click Top-K Diff.
SQL Execution
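Top-K Diff generates SQL queries that compare the most frequent values of a categorical column between environments. The queries group rows by the column's values and count occurrences in each environment, then combine the two results with a FULL OUTER JOIN, so categories that appear in only one environment are still included. The K most frequent categories are returned.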
You can review the exact SQL templates in the TopKDiffTask class.
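The query shape described above can be sketched as a small Python function that renders SQL. This is a hypothetical template written for illustration, not the one in the TopKDiffTask class; the function name `build_top_k_sql`, the CTE and column aliases, and the ordering rule are all assumptions. Identifiers are interpolated as-is, so they must be trusted, already-quoted names.

```python
def build_top_k_sql(base_relation: str, current_relation: str, column: str, k: int = 10) -> str:
    """Render a query comparing the top-k categories of `column` in two relations.

    Hypothetical template for illustration only. Identifiers are not quoted or
    escaped, so callers must pass trusted names.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return f"""
WITH base_counts AS (
    -- Count occurrences of each category in the base environment
    SELECT {column} AS category, COUNT(*) AS base_count
    FROM {base_relation}
    GROUP BY {column}
),
current_counts AS (
    -- Same aggregation for the current environment
    SELECT {column} AS category, COUNT(*) AS current_count
    FROM {current_relation}
    GROUP BY {column}
)
SELECT
    COALESCE(c.category, b.category) AS category,
    COALESCE(b.base_count, 0) AS base_count,
    COALESCE(c.current_count, 0) AS current_count
-- FULL OUTER JOIN keeps categories that exist in only one environment
FROM base_counts AS b
FULL OUTER JOIN current_counts AS c
    ON b.category = c.category
ORDER BY current_count DESC, base_count DESC, category
LIMIT {int(k)}
""".strip()


print(build_top_k_sql("base_orders", "current_orders", "status"))
```

Because the join condition uses `=`, NULL categories from the two sides do not match each other in this sketch; a production template may need a null-safe comparison. The rendered query runs under Python's `sqlite3` module when the linked SQLite is 3.39.0 or newer, the first release with FULL OUTER JOIN.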