-
Notifications
You must be signed in to change notification settings - Fork 268
Rewrite O(N) recursive templates with O(1) pack expansion #3596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
tenpercent
wants to merge
6
commits into
develop
Choose a base branch
from
mpodkory/recursive-to-pack-expansion
base: develop
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+390
−71
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Jan 16, 2026
59f0c32 to
5190578
Compare
6d792da to
f5ada17
Compare
5190578 to
887bdf2
Compare
2 tasks
887bdf2 to
02e42dc
Compare
f5ada17 to
9942fd6
Compare
9d67d0d to
c4d95f7
Compare
82b6016 to
602c127
Compare
c4d95f7 to
631df4f
Compare
602c127 to
1713ea7
Compare
Replace O(N) recursive template sequence_map_inverse_impl with constexpr function and pack expansion for O(1) template depth. Results: - sequence_map_inverse: 45 instances, 187ms → 7 instances, 10ms (95% reduction)
Use pack expansion with fold expression to compute element space size instead of recursive template or recursive lambda. Results: - calculate_element_space_size: 24 instances, 35ms → 10 instances, 9ms - Max template depth: 24 → 23
Use operator| with fold expression (Seqs{} | ...) to merge sequences
in O(1) template depth instead of O(log N) binary tree recursion.
- Reduces sequence_merge instantiations from 449 to 167 (63% reduction)
- Total template instantiations: 47,186 → 46,974 (-212)
- ADL finds operator| since Sequence is in ck namespace
631df4f to
ce7bc73
Compare
Added tests for sequence_merge: - MergeEmptySequence: tests merging no sequences - MergeFourSequences: tests merging 4 sequences with varying sizes - MergeWithEmptySequences: tests merging with empty sequences included Added tests for sequence_map_inverse: - InverseReverseMap: tests inverting a reverse permutation - InverseSmallMap: tests inverting a 2-element swap - InverseLargerMap: tests inverting a 5-element permutation Co-Authored-By: Claude <noreply@anthropic.com>
…roaches Detailed comments explain: - Why fold expressions achieve O(1) template depth for sequence_merge - Comprehensive comparison of three approaches: recursive O(N), binary tree O(log N), fold expression O(1) - Trade-offs: C++17 requirement, ADL complexity, debugging difficulty - Clear recommendations for when to use each approach - How operator| enables fold expression pattern via ADL This documentation helps maintainers understand the performance vs complexity trade-offs between different sequence merging strategies.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
sequence_map_inverseusing O(1) depth pack expansioncalculate_element_space_sizewith fold expressionsequence_mergeO(log N) recursion with O(1) fold expressionResults
sequence_map_inversecalculate_element_space_sizesequence_mergeWhy It Works
Template recursion requires N template instantiations for N iterations, each with its own overhead. Constexpr loops and fold expressions execute within a single template instantiation, avoiding per-instantiation overhead.
Test Plan
make_naive_tensor_descriptorhelpersPR Stack
This PR is part of the build time optimization effort (issue #3575). All PRs now target develop independently:
__make_integer_seqTracking issue: #3575