Describe the bug
Customers are seeing performance regressions when they pad `cu_seqlens` with zero-length sequences to keep it at a fixed length, which is needed to keep CUDA Graphs happy when working with sequence packing (THD format). The attention module itself handles zero-length sequences, but in the profiles the RoPE kernel takes a disproportionately large chunk of time (much larger than attention itself) and drives performance down by roughly 10x.
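For illustration, here is a minimal sketch (plain PyTorch; the shapes and sequence lengths are made up, not taken from the customer workload) of what such a fixed-length `cu_seqlens` looks like: the real sequence boundaries come first, and the tail repeats the last offset so the remaining entries describe zero-length sequences.

```python
import torch

# Illustrative only: 3 real sequences packed into one THD batch, padded out
# to a fixed num_seqs of 8 so the tensor shape stays static for CUDA Graphs.
real_seqlens = torch.tensor([128, 64, 32], dtype=torch.int32, device="cuda")
max_num_seqs = 8

cu_seqlens = torch.zeros(max_num_seqs + 1, dtype=torch.int32, device="cuda")
cu_seqlens[1 : len(real_seqlens) + 1] = torch.cumsum(real_seqlens, dim=0)
# Repeat the final offset for the tail -> zero-length padding sequences.
cu_seqlens[len(real_seqlens) + 1 :] = cu_seqlens[len(real_seqlens)]
# cu_seqlens: [0, 128, 192, 224, 224, 224, 224, 224, 224]
```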
Steps/Code to reproduce bug
Customers reported this bug when using the NeMo-RL and Megatron-LM libraries.
They create a fixed-length `cu_seqlens` when using the THD format (where `num_seqs` can vary from batch to batch).
When there are too many zero-length padding sequences, performance drops significantly.
- [ ] Create a proper repro for this first (a rough sketch of the pattern follows below)
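Until a proper repro is attached, here is a rough sketch of the pattern. Assumptions that may not match the installed TE version or the customer workload: the fused RoPE path is reached through `apply_rotary_pos_emb` from `transformer_engine.pytorch.attention` with `tensor_format="thd"`, `fused=True`, and a `cu_seqlens` argument; the shapes, dtypes, and random `freqs` are purely illustrative.

```python
import torch
from transformer_engine.pytorch.attention import apply_rotary_pos_emb  # assumed import path

total_tokens, heads, head_dim = 224, 16, 128

# THD-format activations: [total_tokens, heads, head_dim]
t = torch.randn(total_tokens, heads, head_dim, device="cuda", dtype=torch.bfloat16)
# Rotary frequencies: [max_seq_len, 1, 1, head_dim] (random values; perf-only repro)
freqs = torch.randn(2048, 1, 1, head_dim, device="cuda", dtype=torch.float32)

# Fixed-length cu_seqlens: 3 real sequences followed by zero-length padding entries.
cu_seqlens = torch.tensor(
    [0, 128, 192, 224, 224, 224, 224, 224, 224],
    dtype=torch.int32, device="cuda",
)

# Expectation: runtime should be driven by the 224 real tokens, not by how many
# trailing zero-length sequences cu_seqlens carries.
out = apply_rotary_pos_emb(
    t, freqs, tensor_format="thd", fused=True, cu_seqlens=cu_seqlens
)
```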
Expected behavior
Padding `cu_seqlens` with zero-length sequences should add negligible overhead; the RoPE kernel should not become the dominant cost relative to attention.
Environment overview (please complete the following information)
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider - AWS, Azure, GCP, Collab)]
- Method of Transformer Engine install: [pip install or from source]. Please specify exact commands you used to install.
- If method of install is [Docker], provide
`docker pull` & `docker run` commands used
Environment details
If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:
- OS version
- PyTorch version
- Python version
- Transformer Engine version
- CUDA version
- CUDNN version
Device details
- GPU model
Additional context
Add any other context about the problem here.