@aitap
Member

@aitap aitap commented Jan 15, 2026

When y has no rows, the data pointers of the columns of unique(y[,...]) are poisoned. Instead of trying to dereference them, pretend that all comparisons fail and fill the return value with NAs.
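The zero-row guard described above can be sketched in plain C, without R's headers. The names `locate` and `NA_INT` and the search itself are assumptions made for illustration, not the code in src/ijoin.c. Only the idea comes from the description: return NA before any read of a zero-length vector.

```c
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_integer_, which is INT_MIN. */
#define NA_INT INT_MIN

/* Hypothetical helper: 1-based position of the last element of the sorted
 * array ux that is <= q, or NA_INT when q lies outside [ux[0], ux[ux_len-1]].
 * With ux_len == 0 the pointer is never dereferenced, so it may be NULL or
 * point at poisoned memory. */
int locate(int q, const int *ux, int ux_len)
{
    if (ux_len == 0) return NA_INT;      /* pretend the comparison failed */
    if (q < ux[0] || q > ux[ux_len - 1]) /* invalid read without the guard */
        return NA_INT;
    int lo = 0, hi = ux_len - 1;
    while (lo < hi) {                    /* binary search, upper midpoint */
        int mid = lo + (hi - lo + 1) / 2;
        if (ux[mid] <= q) lo = mid; else hi = mid - 1;
    }
    return lo + 1;
}
```

Without the first check, the range test would read `ux[0]` and `ux[-1]` of an empty vector, which is the kind of access the stricter bounds checks object to.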

When there are no matches on the non-range columns, the index in from[i] may become 0. So uncomment the (from[i]>0) ? from[i] : 1 checks instead of trusting from[i] and accessing VECTOR_ELT(lookup, -1). I am especially interested in someone double-checking this part because there are other places where from[i] is not checked for being > 0. Are they reachable with from[i] == 0?
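A minimal sketch of the `from[i] > 0` check, again in plain C. The function `total_len` and its parameters are hypothetical. The sketch assumes that `from[i]` is a 1-based group index and that 0 means "no match", as the description suggests. It skips unmatched rows rather than clamping the index to 1, which is one of two possible ways to avoid the `-1` index.

```c
/* Hypothetical helper: sum the sizes of the groups named by from[0..n-1].
 * from[i] is 1-based; a value <= 0 means "no match" and contributes nothing.
 * count[g] is the size of group g+1, for g in 0..ngroups-1. */
long total_len(const int *from, int n, const int *count, int ngroups)
{
    long total = 0;
    for (int i = 0; i < n; ++i) {
        const int k = from[i];
        if (k <= 0 || k > ngroups) continue; /* never evaluate count[-1] */
        total += count[k - 1];
    }
    return total;
}
```

For example, with `count = {3, 5}` and `from = {1, 0, 2}`, the middle row is skipped and the other two contribute 3 and 5.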

Fixes: #7597

aitap added 4 commits January 15, 2026 14:56
If 'ux' contains 0 rows, pretend that all comparisons against its
non-existent elements fail.
This used to happen when from[i] was 0. (No match on non-range columns?)
@aitap aitap requested a review from MichaelChirico as a code owner January 15, 2026 12:28
@codecov

codecov bot commented Jan 15, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.01%. Comparing base (0216983) to head (eab8609).
⚠️ Report is 3 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/ijoin.c | 91.66% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7598      +/-   ##
==========================================
+ Coverage   99.00%   99.01%   +0.01%     
==========================================
  Files          87       87              
  Lines       16893    16896       +3     
==========================================
+ Hits        16725    16730       +5     
+ Misses        168      166       -2     


@github-actions

github-actions bot commented Jan 15, 2026

  • HEAD=fix-7597 slower P<0.001 for melt improved in #5054
    Comparison Plot

Generated via commit eab8609

Download link for the artifact containing the test results: ↓ atime-results.zip

| Task | Duration |
|---|---|
| R setup and installing dependencies | 3 minutes and 3 seconds |
| Installing different package versions | 22 seconds |
| Running and plotting the test cases | 4 minutes and 33 seconds |

@aitap aitap added this to the 1.18.2 milestone Jan 15, 2026
@TysonStanley
Member

Is this ready to merge and be included in 1.18.2 or move to next patch?

@MichaelChirico
Member

I haven't found a chance to review -- @ben-schwen or @jangorecki

@aitap
Member Author

aitap commented Jan 18, 2026

FWIW, insurancerating passes R CMD check with data.table from this branch, so merging it would help #7514 as well.

aitap added 2 commits January 18, 2026 22:37
Technically this one was harmless (and thus not caught by sanitizers)
because the preceding VECSEXP header always contained a 0, preventing
the branch where VECTOR_ELT() would be called with a negative index.
@aitap
Member Author

aitap commented Jan 18, 2026

> there are other places where from[i] is not checked for being > 0.

Most of these had separate checks for either from[i] > 0 or k > 0. One remaining case, type = "any", mult = "first", really did underflow type_count[j-1], but then did not underflow VECTOR_ELT(type_lookup, j-1) in my testing because the former ended up being 0. This isn't caught by sanitizers because type_count is preceded in memory by the SEXP header and thus technically isn't a buffer underflow.
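The reason a sanitizer stays silent here can be illustrated with a toy model. The layout below (a header of `HEADER_INTS` ints followed by the payload) is an assumption for illustration and is not R's real SEXP layout. The function names are hypothetical as well. The point is only that index -1 of the payload still lies inside the same allocation.

```c
#include <stdlib.h>

enum { HEADER_INTS = 4 }; /* assumed header size in ints, not R's real one */

/* Allocate one block holding a zeroed header followed by n payload ints.
 * Returns a pointer to the payload, or NULL on failure. */
int *alloc_vec(size_t n)
{
    int *block = calloc(HEADER_INTS + n, sizeof *block);
    return block ? block + HEADER_INTS : NULL;
}

/* Release a vector obtained from alloc_vec(). */
void free_vec(int *payload)
{
    if (payload) free(payload - HEADER_INTS);
}

/* A logic error, but not an out-of-bounds access: payload[-1] is the last
 * int of the header, which belongs to the same allocated block. */
int read_before_start(const int *payload)
{
    return payload[-1];
}
```

In this model the stray read returns whatever the header happens to hold. Here that is 0 because of `calloc`, which mirrors the observation that the value read in front of `type_count` ended up being 0.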

Comment on lines 264 to 284
for (int i=0; i<rows; ++i) {
  const int len=totlen;
  int wlen=0, j=0, m=0;
  const int k = (from[i]>0) ? from[i] : 1;
  if (k == to[i]) {
    wlen = count[k-1];
  } else if (k < to[i]) {
    tmp1 = VECTOR_ELT(lookup, k-1);
    tmp2 = VECTOR_ELT(type_lookup, to[i]-1);
    while (j<count[k-1] && m<type_count[to[i]-1]) {
      if ( INTEGER(tmp1)[j] == INTEGER(tmp2)[m] ) {
        ++wlen; ++j; ++m;
      } else if ( INTEGER(tmp1)[j] > INTEGER(tmp2)[m] ) {
        ++m;
      } else ++j;
    }
  }
  totlen += wlen;
  if (len == totlen)
    ++totlen;
}
Member

Suggested change
for (int i=0; i<rows; ++i) {
  const int len = totlen;
  const int k = from[i];
  if (k > 0) {
    if (k == to[i]) {
      totlen += count[k-1];
    } else if (k < to[i]) {
      // hoist the data pointers and lengths out of the merge loop
      int *s = INTEGER(VECTOR_ELT(lookup, k-1));
      int *e = INTEGER(VECTOR_ELT(type_lookup, to[i]-1));
      int scount = count[k-1], ecount = type_count[to[i]-1];
      for (int j=0, m=0; j < scount && m < ecount; ) {
        if (s[j] == e[m]) { ++totlen; ++j; ++m; }
        else if (s[j] > e[m]) ++m;
        else ++j; // advance when s[j] < e[m]; otherwise the loop never ends
      }
    }
    if (len == totlen) ++totlen;
  }
}

We are always so picky about moving INTEGER out of loops, but apparently here we did not care yet. Definitely not for this PR but should be followed-up!
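As a standalone illustration of that follow-up, the inner merge can be written against plain `int` pointers so that no accessor is called inside the loop. The function name `count_common` is hypothetical. The sketch assumes both arrays are sorted in increasing order without duplicates, which is what the two-pointer walk relies on.

```c
/* Hypothetical helper: count the values present in both s[0..ns-1] and
 * e[0..ne-1]. Both arrays are assumed sorted ascending and duplicate-free.
 * The caller fetches the data pointers once, before the loop. */
int count_common(const int *s, int ns, const int *e, int ne)
{
    int n = 0;
    for (int j = 0, m = 0; j < ns && m < ne; ) {
        if (s[j] == e[m]) { ++n; ++j; ++m; } /* shared value: advance both */
        else if (s[j] > e[m]) ++m;           /* e is behind: advance e */
        else ++j;                            /* s is behind: advance s */
    }
    return n;
}
```

With `s = {1, 3, 5, 7}` and `e = {3, 4, 7, 9}` the shared values are 3 and 7, so the function returns 2. An empty array on either side returns 0 without reading anything.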

Member

@ben-schwen ben-schwen left a comment

The only line I'm worried about is the creation of the return array, when we skip now (see suggested test).

Besides that LGTM

@aitap feel free to merge once that is cleared

@ben-schwen
Member

> Is this ready to merge and be included in 1.18.2 or move to next patch?

If possible, I would favor this going into 1.18.2. It's a long-standing bug, more than 10 years old, that only surfaced now that R > 4.5.0 introduced stricter out-of-bounds checks.

aitap and others added 5 commits January 19, 2026 08:58
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
Co-authored-by: Benjamin Schwendinger <[email protected]>
The underflow is covered by already existing tests.
@aitap aitap merged commit 35dbf06 into master Jan 19, 2026
12 of 13 checks passed
@aitap aitap deleted the fix-7597 branch January 19, 2026 09:50
ben-schwen added a commit that referenced this pull request Jan 19, 2026
* Add tests

* overlaps: avoid accessing length-0 vectors in ux

If 'ux' contains 0 rows, pretend that all comparisons against its
non-existent elements fail.

* overlaps: avoid 'lookup' list overflow

This used to happen when from[i] was 0. (No match on non-range columns?)

* NEWS entry

* overlaps: uncomment one more underflow test

Technically this one was harmless (and thus not caught by sanitizers)
because the preceding VECSEXP header always contained a 0, preventing
the branch where VECTOR_ELT() would be called with a negative index.

* test formatting

* Update src/ijoin.c

Co-authored-by: Benjamin Schwendinger <[email protected]>

* Update src/ijoin.c

Co-authored-by: Benjamin Schwendinger <[email protected]>

* Update src/ijoin.c

Co-authored-by: Benjamin Schwendinger <[email protected]>

* Update inst/tests/tests.Rraw

* overlaps: uncomment the remaining underflow test

The underflow is covered by already existing tests.

---------

Co-authored-by: Benjamin Schwendinger <[email protected]>
@ben-schwen
Member

@TysonStanley I cherry picked this into 1.18.2

I guess we are good to go to submit


Development

Successfully merging this pull request may close these issues.

foverlaps segfault when y has 0 rows

4 participants