⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@vcruces
Copy link
Contributor

@vcruces vcruces commented Jan 28, 2026

Ticket ENG-2293

Description Of Changes

Adds an is_leaf column to the StagedResource model to efficiently identify leaf fields (fields with no children and non-object data types) in detection/discovery results. This column enables faster queries by avoiding expensive runtime calculations of resource_type = 'Field' AND children = [].
The migration includes conditional logic to create the index directly for smaller tables (<1M rows) or defer it to post-upgrade index creation for larger tables to avoid blocking migrations.
A new post-upgrade backfill system is introduced to populate the is_leaf column for existing data. The backfill is designed to be:

  • Idempotent: Safe to run multiple times
  • Resumable: Can be stopped and restarted, will pick up where it left off
  • Non-blocking: Uses small batches with delays and SKIP LOCKED to minimize database impact
  • Error-resilient: Retries transient errors, tracks failures, fails gracefully
    A new admin API endpoint (POST /admin/backfill) is also added for manual triggering of backfills with configurable batch size and delay parameters.

Code Changes

  • Added is_leaf column to StagedResource model with composite index on (monitor_config_id, is_leaf)
  • Added Alembic migration d05acec55c64 to add the column with conditional index creation
  • Added post_upgrade_backfill.py module with batched backfill infrastructure including Redis locking, retry logic, and progress tracking
  • Added POST /admin/backfill endpoint with BACKFILL_EXEC scope for manual backfill triggering
  • Added entry to post_upgrade_index_creation.py for deferred index creation on large tables
  • Added unit tests for backfill logic and admin endpoint

Steps to Confirm

  • Run migrations and verify is_leaf column is added to stagedresource table
  • Verify existing staged resources with is_leaf = NULL get backfilled on application startup
  • Test manual backfill via POST /api/v1/admin/backfill endpoint (requires backfill:exec scope)
  • Verify backfill correctly sets is_leaf = true for fields with no children and non-object data types
  • Verify backfill correctly sets is_leaf = false for databases, schemas, tables, fields with children, or object-type fields

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Contributor

vercel bot commented Jan 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Actions Updated (UTC)
fides-plus-nightly Ignored Ignored Preview Jan 29, 2026 11:18pm
fides-privacy-center Ignored Ignored Jan 29, 2026 11:18pm

Request Review

@vcruces vcruces force-pushed the ENG-2293 branch 2 times, most recently from 964a8b0 to 46c5634 Compare January 28, 2026 20:36
@vcruces vcruces marked this pull request as ready for review January 28, 2026 22:10
@vcruces vcruces requested a review from a team as a code owner January 28, 2026 22:10
@vcruces vcruces requested review from adamsachs and thabofletcher and removed request for a team and thabofletcher January 28, 2026 22:10
@vcruces vcruces added the do not merge Please don't merge yet, bad things will happen if you do label Jan 28, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 28, 2026

Greptile Overview

Greptile Summary

This PR adds an is_leaf column to the StagedResource model to efficiently identify leaf fields in detection/discovery results. The implementation includes a comprehensive backfill infrastructure for handling large tables without blocking migrations.

Key Changes

  • Added nullable is_leaf Boolean column to stagedresource table to identify fields with no children and non-object data types
  • Composite index ix_stagedresource_monitor_config_is_leaf on (monitor_config_id, is_leaf) with partial index where is_leaf IS NOT NULL
  • Migration includes conditional logic: creates index directly for tables <1M rows, defers to post-startup for larger tables
  • New post_upgrade_backfill.py module with batched backfill infrastructure featuring Redis locking, exponential backoff retry logic, and progress tracking
  • New admin API endpoints: POST /admin/backfill to trigger manual backfills and GET /admin/backfill to check status
  • New BACKFILL_EXEC scope for admin backfill operations
  • Comprehensive test coverage for backfill logic and API endpoints

Implementation Quality

The backfill implementation is production-ready with:

  • Idempotent design (safe to run multiple times)
  • Resumable operation (can be stopped and restarted)
  • Non-blocking batched updates with configurable delays
  • FOR UPDATE SKIP LOCKED to prevent lock contention
  • Redis-based distributed locking to prevent concurrent backfills
  • Transient error detection and retry logic with exponential backoff
  • Progress tracking and detailed logging

Minor Observations

  • The is_leaf column uses nullable=True to track "not yet backfilled" status during migration rollout, which is appropriate for this temporary state but deviates from the style guide preference for non-nullable Boolean columns with defaults
  • Once backfills complete globally, consider a follow-up PR to change the column to nullable=False with server_default="f"

Confidence Score: 4/5

  • This PR is safe to merge with careful monitoring during rollout
  • The implementation is well-designed with comprehensive error handling, testing, and production safeguards. The batched backfill approach with Redis locking and retry logic demonstrates careful consideration of database performance and reliability. The minor style guide deviation regarding nullable Boolean columns is justified for the temporary migration state and does not affect functionality.
  • Pay attention to post_upgrade_backfill.py during rollout to monitor backfill progress and ensure Redis locks are properly released

Important Files Changed

Filename Overview
src/fides/api/alembic/migrations/versions/xx_2026_01_26_1451_d05acec55c64_add_is_leaf_to_stagedresource.py Adds is_leaf column with conditional index creation based on table size (<1M rows)
src/fides/api/migrations/post_upgrade_backfill.py Implements batched backfill infrastructure with Redis locking, retry logic, and progress tracking for populating is_leaf column
src/fides/api/api/v1/endpoints/admin.py Adds POST and GET /admin/backfill endpoints with BACKFILL_EXEC scope for manual backfill triggering and status checking
src/fides/api/models/detection_discovery/core.py Adds is_leaf Boolean column (nullable) to StagedResource with composite index on (monitor_config_id, is_leaf)

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Please don't merge yet, bad things will happen if you do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants