Top-K Diff

Top-K Diff compares the distribution of a categorical column between environments. The top 10 values are shown by default, and the view can be expanded to the top 50.

Recce Top-K Diff

A Top-K Diff can be generated in two ways.

Via the Explore Change button menu:

  1. Select the model from the Lineage DAG.
  2. Click the Explore Change button.
  3. Click Top-K Diff.
  4. Select a column to diff.
  5. Click Execute.

Via the column options menu:

  1. Select the model from the Lineage DAG.
  2. Hover over the column in the Node Details panel.
  3. Click the vertical three-dot icon.
  4. Click Top-K Diff.

Generate a Recce Top-K Diff

SQL Execution

Top-K Diff generates SQL queries using FULL OUTER JOIN to compare the most frequent values in categorical columns between environments. The queries group by column values and count occurrences to identify the top K categories.
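The comparison described above can be sketched in plain Python. This is only an illustration of the idea, not Recce's implementation: the function name `top_k_diff`, the ranking by current-environment count, and the use of 0 for a category missing on one side are all assumptions made for the example.

```python
from collections import Counter


def top_k_diff(base_values, current_values, k=10):
    """Return up to k rows of (category, base_count, current_count).

    Hypothetical helper for illustration only; it mimics a grouped count
    on each environment followed by a FULL OUTER JOIN on the category.
    """
    # GROUP BY value, COUNT(*) for each environment.
    base_counts = Counter(base_values)
    current_counts = Counter(current_values)

    # The union of keys plays the role of the FULL OUTER JOIN: a category
    # that exists in only one environment still produces a row.
    categories = set(base_counts) | set(current_counts)
    rows = [
        (c, base_counts.get(c, 0), current_counts.get(c, 0))
        for c in categories
    ]

    # Assumed ranking: current count descending, then base count descending,
    # then category name so the order is deterministic.
    rows.sort(key=lambda r: (-r[2], -r[1], str(r[0])))
    return rows[:k]


base = ["a", "a", "b", "c"]
current = ["a", "b", "b", "d"]
result = top_k_diff(base, current)
# result == [("b", 1, 2), ("a", 2, 1), ("d", 0, 1), ("c", 1, 0)]
```

Categories `c` and `d` each appear in only one environment, yet both remain in the result, which is the reason an outer join is needed rather than an inner join.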

You can review the exact SQL templates in the TopKDiffTask class.