Fix coordinate misalignment in expression merge#550
Fix coordinate misalignment in expression merge#550FBumann wants to merge 12 commits intoPyPSA:masterfrom
Conversation
…ing logic - Mark defensive code paths with pragma: no cover (unreachable in practice)
|
Im not entirely sure if this affects performance, but its at least worth raising i thought |
…ion-based approach. For each non-helper dimension, it computes the union of coordinate values across all datasets. If any mismatch is found, all datasets are reindexed to the union coordinates with FILL_VALUE defaults. This handles both the reordered-coords case and the overlapping-subset case. test/test_linear_expression.py: Added test_merge_with_overlapping_coords that creates variables with ["alice", "bob"] and ["bob", "charlie"], merges them, and verifies correct alignment — bob gets both terms, alice and charlie get only their respective term with fill values for the missing one.
…es) and then patching up mismatches with reindexing, _check_coords_match now checks that actual coordinate values and order are identical. When they're not, override is False and xarray's join='outer' handles alignment correctly. The entire reindex block is gone.
|
this is a very sensible API change! let's pull that in. do you see anything missing here? |
|
I would like to reconsider if there is a vlid use case for this, or if its truly only a bug. |
|
@FabianHofmann
So the behaviour change is for such cases: import pandas as pd
import numpy as np
import linopy
m = linopy.Model()
hours = pd.date_range("2025-01-01", periods=8760, freq="h", name="time")
gen = m.add_variables(coords=[hours], name="gen")
# Suppose demand comes from an external source, sorted differently
demand = pd.Series(
data=np.linspace(0, 100, 8760),
index=hours.sort_values(ascending=False), # reversed!
name="demand",
)
# Before the PR fix, this silently misaligns:
expr = gen - demand
print(expr)If the indexes were not both containing equal keys, nothing changes (shift, subset, etc were always aligned correctly) |
|
yes, I agree. did you test if this now also applies to |
Im not sure. Should i verify the call graph? |
|
That would be awesome, but if not also fine. I guess I have to sit down next week and figure out a consistent convention for all linopy operations anyway |
|
@FabianHofmann I'll see if i find the time. Unfortunately i will be quite busy next week, except friday. So if you need someone to brainstorm with and you can wait until friday tell me ;) |
Summary
Fixes silent data corruption when adding expressions whose coordinates have the same values but in different order (e.g.
['costs', 'Penalty']vs['Penalty', 'costs']). Previously,merge()used positional concatenation viajoin='override', ignoring labels entirely.What changed
When
merge()detects that two datasets share the same coordinate values in a different order, it now reindexes to the first dataset's order before concatenation. This only triggers when the coordinate value sets are identical — different subsets or different sizes are unaffected.What is NOT affected
v.loc[:9] + v.loc[10:]— different value sets, override preservedBug reproduction script
Benchmark script (
dev-scripts/benchmark_merge_alignment.py)Checklist
doc/release_notes.rstis included