-
Notifications
You must be signed in to change notification settings - Fork 268
Optimize sequence_gen and uniform_sequence_gen to reduce template instantiation depth #3585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
a477221 to
57c8cb1
Compare
|
Do you want to add unit tests for this, or just rely on the tests of all the code that depends on this? If it's easy to add unit tests, that's generally better, but I'm also fine with moving fast to cut down compilation times. |
| // generate sequence | ||
| template <index_t NSize, typename F> | ||
| struct sequence_gen | ||
| // Four sequences: direct concatenation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like these specializations. It will be interesting to get a survey of the code to see how often the specializations are used and if these four smallest cases are the most impactful ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using the build traces to drive the optimizations. Maybe removing the unused code is one other aspect which could help with parsing times
let's move fast, in case something important breaks CI will catch it, tests also need maintenance and let's see how much of the metaprogramming is left after the initial sprint |
57c8cb1 to
3d46680
Compare
02dd8d6 to
3dcda67
Compare
…tiation depth This change significantly improves compile-time performance by reducing template instantiation depth for sequence generation and merging operations: Optimizations: - sequence_gen: Reduce instantiation depth from O(log N) to O(1) by using __make_integer_seq to generate indices in a single step, then applying the functor via pack expansion - uniform_sequence_gen: Similarly optimized to O(1) depth using __make_integer_seq with a helper that applies a constant value via pack expansion - sequence_merge: Reduce depth from O(N) to O(log N) using binary tree reduction strategy. Added direct concatenation specializations for 1-4 sequences to avoid recursion in common cases, falling back to binary tree merging for 5+ sequences Documentation: - Added extensive inline comments explaining why sequence_merge cannot achieve O(1) depth like sequence_gen (requires computing cumulative sequence lengths from heterogeneous inputs, inherently requiring recursion) - Documented the binary tree reduction approach and why it's superior to fold expressions for this use case Testing: - Added comprehensive unit tests for uniform_sequence_gen with different values, sizes, and edge cases - Added tests for sequence_gen with custom functors (double, square, identity, constant) to verify the new implementation works with arbitrary functors - Added tests for sequence_merge with 4, 5, and many sequences to verify both the direct concatenation path and binary tree reduction path - Added tests for empty sequence edge cases
3dcda67 to
8826238
Compare
Summary
__make_integer_seqforsequence_genanduniform_sequence_gensequence_mergeusing direct concatenation for small casesMotivation
The recursive
sequence_gen_implcreates deep template chains for large sequences, increasing compile time and memory usage.Why It Works
The compiler builtin
__make_integer_seqgenerates the entire integer sequence in a single step internally, without recursive template instantiation. This reduces instantiation depth from O(N) to O(1), significantly cutting compile time for large sequences.Additionally,
sequence_mergenow uses direct concatenation for 2-3 sequences instead of recursive merge, reducing overhead.Test Plan
sequence_genwith custom functorsPR Stack
This PR is part of the build time optimization effort (issue #3575). All PRs now target develop independently:
__make_integer_seqTracking issue: #3575