⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@tenpercent
Copy link
Contributor

@tenpercent tenpercent commented Jan 16, 2026

Summary

  • Use compiler builtin __make_integer_seq for sequence_gen and uniform_sequence_gen
  • Optimize sequence_merge using direct concatenation for small cases
  • Reduces template instantiation depth from O(N) to O(1)

Motivation

The recursive sequence_gen_impl creates deep template chains for large sequences, increasing compile time and memory usage.

Why It Works

The compiler builtin __make_integer_seq generates the entire integer sequence in a single step internally, without recursive template instantiation. This reduces instantiation depth from O(N) to O(1), significantly cutting compile time for large sequences.

Additionally, sequence_merge now uses direct concatenation for 2-3 sequences instead of recursive merge, reducing overhead.

Test Plan

  • Added 4 unit tests for sequence_gen with custom functors
  • Waiting for full CI

PR Stack

This PR is part of the build time optimization effort (issue #3575). All PRs now target develop independently:

# PR Description Status
1 #3585 sequence_gen with __make_integer_seq This PR
2 #3628 generate_identity_sequences + named functors New (replaces #3588, #3589)
3 #3590 container_concat optimization Updated
4 #3596 O(1) pack expansion rewrites Updated
5 #3600 TensorDescriptor/TensorAdaptor lambda elimination Updated

Tracking issue: #3575

@shumway
Copy link
Collaborator

shumway commented Jan 16, 2026

Do you want to add unit tests for this, or just rely on the tests of all the code that depends on this? If it's easy to add unit tests, that's generally better, but I'm also fine with moving fast to cut down compilation times.

// generate sequence
template <index_t NSize, typename F>
struct sequence_gen
// Four sequences: direct concatenation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like these specializations. It will be interesting to get a survey of the code to see how often the specializations are used and if these four smallest cases are the most impactful ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the build traces to drive the optimizations. Maybe removing the unused code is one other aspect which could help with parsing times

@tenpercent
Copy link
Contributor Author

Do you want to add unit tests for this, or just rely on the tests of all the code that depends on this? If it's easy to add unit tests, that's generally better, but I'm also fine with moving fast to cut down compilation times.

let's move fast, in case something important breaks CI will catch it, tests also need maintenance and let's see how much of the metaprogramming is left after the initial sprint

…tiation depth

This change significantly improves compile-time performance by reducing template
instantiation depth for sequence generation and merging operations:

Optimizations:
- sequence_gen: Reduce instantiation depth from O(log N) to O(1) by using
  __make_integer_seq to generate indices in a single step, then applying the
  functor via pack expansion
- uniform_sequence_gen: Similarly optimized to O(1) depth using __make_integer_seq
  with a helper that applies a constant value via pack expansion
- sequence_merge: Reduce depth from O(N) to O(log N) using binary tree reduction
  strategy. Added direct concatenation specializations for 1-4 sequences to
  avoid recursion in common cases, falling back to binary tree merging for 5+
  sequences

Documentation:
- Added extensive inline comments explaining why sequence_merge cannot achieve
  O(1) depth like sequence_gen (requires computing cumulative sequence lengths
  from heterogeneous inputs, inherently requiring recursion)
- Documented the binary tree reduction approach and why it's superior to fold
  expressions for this use case

Testing:
- Added comprehensive unit tests for uniform_sequence_gen with different values,
  sizes, and edge cases
- Added tests for sequence_gen with custom functors (double, square, identity,
  constant) to verify the new implementation works with arbitrary functors
- Added tests for sequence_merge with 4, 5, and many sequences to verify both
  the direct concatenation path and binary tree reduction path
- Added tests for empty sequence edge cases
@tenpercent tenpercent force-pushed the tenpercent/old-ck-pack-rewrites branch from 3dcda67 to 8826238 Compare January 22, 2026 17:26
@tenpercent tenpercent marked this pull request as draft January 22, 2026 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants