⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Fix coordinate misalignment in expression merge#550

Draft
FBumann wants to merge 12 commits intoPyPSA:masterfrom
FBumann:fix/expression-allignment
Draft

Fix coordinate misalignment in expression merge#550
FBumann wants to merge 12 commits intoPyPSA:masterfrom
FBumann:fix/expression-allignment

Conversation

@FBumann
Copy link
Contributor

@FBumann FBumann commented Jan 20, 2026

Summary

Fixes silent data corruption when adding expressions whose coordinates have the same values but in different order (e.g. ['costs', 'Penalty'] vs ['Penalty', 'costs']). Previously, merge() used positional concatenation via join='override', ignoring labels entirely.

What changed

When merge() detects that two datasets share the same coordinate values in a different order, it now reindexes to the first dataset's order before concatenation. This only triggers when the coordinate value sets are identical — different subsets or different sizes are unaffected.

What is NOT affected

  • Expressions with identical coordinates (same values, same order) — fast override path, unchanged
  • Expressions with different dimension sizes — outer join path, unchanged
  • Positional patterns like v.loc[:9] + v.loc[10:] — different value sets, override preserved
Bug reproduction script
import pandas as pd
import linopy

m = linopy.Model()
var = m.add_variables(coords=[pd.Index(['costs', 'Penalty'], name='effect')], name='var')
factors = pd.Series([2.0, 1.0], index=pd.Index(['Penalty', 'costs'], name='effect'))

result = var - 10 * factors

costs = float(result.const.sel(effect='costs'))
penalty = float(result.const.sel(effect='Penalty'))
print(f"costs: {costs} (expected -10), penalty: {penalty} (expected -20)")
assert costs == -10.0 and penalty == -20.0, "BUG: values misaligned!"
Benchmark script (dev-scripts/benchmark_merge_alignment.py)
"""Benchmark merge performance: compare override (same coords) vs outer join paths.

Run from repo root:
    python dev-scripts/benchmark_merge_alignment.py
"""

import statistics
import timeit

import pandas as pd

import linopy

ROUNDS = 10
CALLS_PER_ROUND = 10

m = linopy.Model()

time = pd.RangeIndex(8760, name="time")  # hourly year
space = pd.Index([f"node_{i}" for i in range(100)], name="space")
tech = pd.Index([f"tech_{i}" for i in range(20)], name="tech")

gen = m.add_variables(coords=[time, space, tech], name="gen")
cap = m.add_variables(coords=[space, tech], name="cap")

# Warm up
_ = 1 * gen


def bench(label: str, func, rounds: int = ROUNDS, number: int = CALLS_PER_ROUND):
    times = [
        timeit.timeit(func, number=number) / number * 1000 for _ in range(rounds)
    ]
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    print(f"{label}: {mean:.1f} ± {stdev:.1f} ms  (n={rounds}x{number})")


# 1) Same-shape expressions (override path) — common in energy models
bench(
    "4-term sum (same shape, 8760x100x20)",
    lambda: 3.5 * gen + 1.2 * gen + 0.8 * gen + 2.0 * gen,
)

# 2) Broadcasting dimensions (gen has time, cap doesn't)
bench(
    "2-term sum (broadcasting, gen+cap)",
    lambda: 1 * gen + 1 * cap,
)

# 3) Many small additions
vars_list = [m.add_variables(coords=[time], name=f"v{i}") for i in range(50)]
bench(
    "50-term sum (1D, 8760)",
    lambda: sum(1 * v for v in vars_list),
    number=5,
)

Checklist

  • Code changes are sufficiently documented
  • Unit tests for new features were added
  • A note for the release notes doc/release_notes.rst is included
  • I consent to the release of this PR's code under the MIT license

@FBumann FBumann changed the title First attempt Fix coordinate misalignment in expression merge with join='override' Jan 20, 2026
…ing logic

   - Mark defensive code paths with pragma: no cover (unreachable in practice)
@FBumann
Copy link
Contributor Author

FBumann commented Jan 23, 2026

Im not entirely sure if this affects performance, but its at least worth raising i thought

…ion-based approach. For each non-helper dimension, it computes the union of coordinate values across all

  datasets. If any mismatch is found, all datasets are reindexed to the union coordinates with FILL_VALUE defaults. This handles both the reordered-coords case and the
  overlapping-subset case.

  test/test_linear_expression.py: Added test_merge_with_overlapping_coords that creates variables with ["alice", "bob"] and ["bob", "charlie"], merges them, and verifies
  correct alignment — bob gets both terms, alice and charlie get only their respective term with fill values for the missing one.
…es) and then patching up mismatches with reindexing, _check_coords_match now

  checks that actual coordinate values and order are identical. When they're not, override is False and xarray's join='outer' handles alignment correctly. The entire reindex
  block is gone.
@FBumann FBumann changed the title Fix coordinate misalignment in expression merge with join='override' Fix coordinate misalignment in expression merge Jan 30, 2026
@FBumann FBumann marked this pull request as ready for review January 30, 2026 11:10
@FBumann FBumann marked this pull request as draft January 30, 2026 12:10
@FabianHofmann
Copy link
Collaborator

this is a very sensible API change! let's pull that in. do you see anything missing here?

@FBumann
Copy link
Contributor Author

FBumann commented Feb 6, 2026

I would like to reconsider if there is a vlid use case for this, or if its truly only a bug.

@FBumann
Copy link
Contributor Author

FBumann commented Feb 6, 2026

@FabianHofmann
The behaviour after this PR is as follows:
If we add two expressions with the same dims, we have 2 cases:

  • Same labels, different order → Reindexed to match again (previously aligned by position)
  • Different label subsets (e.g. ['costs'] + ['Penalty']) → this is a genuinely different case. The PR doesn't touch this path; it still goes through the existing override/outer logic.

So the behaviour change is for such cases:

import pandas as pd
import numpy as np
import linopy

m = linopy.Model()
hours = pd.date_range("2025-01-01", periods=8760, freq="h", name="time")

gen = m.add_variables(coords=[hours], name="gen")

# Suppose demand comes from an external source, sorted differently
demand = pd.Series(
    data=np.linspace(0, 100, 8760),
    index=hours.sort_values(ascending=False),  # reversed!
    name="demand",
)

# Before the PR fix, this silently misaligns:
expr = gen - demand
print(expr)

If the indexes were not both containing equal keys, nothing changes (shift, subset, etc were always aligned correctly)
So for me this seems like a proper bugfix.

@FabianHofmann
Copy link
Collaborator

yes, I agree. did you test if this now also applies to lhs == rhs operations and alike?

@FBumann
Copy link
Contributor Author

FBumann commented Feb 6, 2026

yes, I agree. did you test if this now also applies to lhs == rhs operations and alike?

Im not sure. Should i verify the call graph?

@FabianHofmann
Copy link
Collaborator

That would be awesome, but if not also fine. I guess I have to sit down next week and figure out a consistent convention for all linopy operations anyway

@FBumann
Copy link
Contributor Author

FBumann commented Feb 6, 2026

@FabianHofmann I'll see if i find the time. Unfortunately i will be quite busy next week, except friday. So if you need someone to brainstorm with and you can wait until friday tell me ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants