⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@tenpercent
Copy link
Contributor

@tenpercent tenpercent commented Jan 16, 2026

Summary

  • Rewrite sequence_map_inverse using O(1) depth pack expansion
  • Replace O(N) recursive calculate_element_space_size with fold expression
  • Replace sequence_merge O(log N) recursion with O(1) fold expression

Results

Pattern Before After Reduction
sequence_map_inverse 45 inst, 187ms 10 inst, 10ms 95%
calculate_element_space_size 24 inst, 35ms 10 inst, 9ms 73%
sequence_merge O(log N) depth O(1) depth -

Why It Works

Template recursion requires N template instantiations for N iterations, each with its own overhead. Constexpr loops and fold expressions execute within a single template instantiation, avoiding per-instantiation overhead.

Test Plan

  • Added 18 unit tests for make_naive_tensor_descriptor helpers
  • Waiting for full CI

PR Stack

This PR is part of the build time optimization effort (issue #3575). All PRs now target develop independently:

# PR Description Status
1 #3585 sequence_gen with __make_integer_seq Independent
2 #3628 generate_identity_sequences + named functors New (replaces #3588, #3589)
3 #3590 container_concat optimization Independent
4 #3596 O(1) pack expansion rewrites This PR
5 #3600 TensorDescriptor/TensorAdaptor lambda elimination Independent

Tracking issue: #3575

@tenpercent tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 59f0c32 to 5190578 Compare January 16, 2026 17:34
@tenpercent tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from 6d792da to f5ada17 Compare January 16, 2026 20:16
@tenpercent tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 5190578 to 887bdf2 Compare January 16, 2026 20:16
@tenpercent tenpercent marked this pull request as ready for review January 17, 2026 03:41
@tenpercent tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 887bdf2 to 02e42dc Compare January 17, 2026 03:51
@tenpercent tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from f5ada17 to 9942fd6 Compare January 17, 2026 03:51
@tenpercent tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from 9d67d0d to c4d95f7 Compare January 21, 2026 23:43
@tenpercent tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 82b6016 to 602c127 Compare January 21, 2026 23:56
@tenpercent tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from c4d95f7 to 631df4f Compare January 21, 2026 23:57
@tenpercent tenpercent force-pushed the mpodkory/generate-tuple-optimizations branch from 602c127 to 1713ea7 Compare January 22, 2026 01:00
@tenpercent tenpercent changed the base branch from mpodkory/generate-tuple-optimizations to develop January 22, 2026 01:04
Replace O(N) recursive template sequence_map_inverse_impl with
constexpr function and pack expansion for O(1) template depth.

Results:
- sequence_map_inverse: 45 instances, 187ms → 7 instances, 10ms (95% reduction)
Use pack expansion with fold expression to compute element space size
instead of recursive template or recursive lambda.

Results:
- calculate_element_space_size: 24 instances, 35ms → 10 instances, 9ms
- Max template depth: 24 → 23
Use operator| with fold expression (Seqs{} | ...) to merge sequences
in O(1) template depth instead of O(log N) binary tree recursion.

- Reduces sequence_merge instantiations from 449 to 167 (63% reduction)
- Total template instantiations: 47,186 → 46,974 (-212)
- ADL finds operator| since Sequence is in ck namespace
@tenpercent tenpercent force-pushed the mpodkory/recursive-to-pack-expansion branch from 631df4f to ce7bc73 Compare January 22, 2026 01:10
tenpercent and others added 2 commits January 22, 2026 01:26
Added tests for sequence_merge:
- MergeEmptySequence: tests merging no sequences
- MergeFourSequences: tests merging 4 sequences with varying sizes
- MergeWithEmptySequences: tests merging with empty sequences included

Added tests for sequence_map_inverse:
- InverseReverseMap: tests inverting a reverse permutation
- InverseSmallMap: tests inverting a 2-element swap
- InverseLargerMap: tests inverting a 5-element permutation

Co-Authored-By: Claude <noreply@anthropic.com>
…roaches

Detailed comments explain:
- Why fold expressions achieve O(1) template depth for sequence_merge
- Comprehensive comparison of three approaches: recursive O(N), binary tree O(log N), fold expression O(1)
- Trade-offs: C++17 requirement, ADL complexity, debugging difficulty
- Clear recommendations for when to use each approach
- How operator| enables fold expression pattern via ADL

This documentation helps maintainers understand the performance vs complexity
trade-offs between different sequence merging strategies.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants