
Conversation

@Chibionos
Contributor

Summary

Replaces hardcoded LLM model version with wildcard (*) in trace test expectations to prevent future test failures when models are updated.

Problem

The simple-local-mcp test expectations were recently updated to use gpt-4.1-mini-2025-04-14, but this approach is brittle: it will break again the next time:

  • A new model version is released
  • LLM Gateway defaults change
  • Test environments use different model configurations

Solution

Use wildcard matching for llm.model_name instead of exact version:

```diff
- "llm.model_name": "gpt-4.1-mini-2025-04-14"
+ "llm.model_name": "*"
```

Also removed exact content matching for the final response, as wording can vary slightly between models.

Benefits

  • Future-proof: won't break on model updates
  • Environment-agnostic: works regardless of which model is configured
  • Lower maintenance: no need to update test expectations when models change
  • Still validates: provider (azure), system (openai), and span structure are still checked

Implementation

The trace assertion logic (trace_assert.py) already supports wildcards:

```python
from typing import Any

def matches_value(expected_value: Any, actual_value: Any) -> bool:
    if expected_value == "*":
        return True  # Accept any value
```
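As a minimal, self-contained sketch of how such a matcher behaves (the real trace_assert.py may handle additional cases; the strict-equality fallback here is an assumption, not the repository's actual code):

```python
from typing import Any

def matches_value(expected_value: Any, actual_value: Any) -> bool:
    """Return True when the expectation accepts the actual value."""
    if expected_value == "*":
        return True  # wildcard: accept any value
    return expected_value == actual_value  # assumed fallback: strict equality

# The wildcard accepts any model name; other attributes still compare exactly.
print(matches_value("*", "gpt-4.1-mini-2025-04-14"))  # True
print(matches_value("azure", "azure"))                # True
print(matches_value("azure", "openai"))               # False
```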

Testing

This change only affects test expectations, not runtime behavior. The wildcard will accept any model name while still validating:

  • Correct span structure and hierarchy
  • Required span attributes
  • Message roles and content
  • Tool invocations
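To illustrate how a wildcard-based check can still enforce structure while ignoring the model name, here is a hypothetical attribute matcher (not taken from the repository):

```python
from typing import Any

def matches_attrs(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    # Every expected key must be present; "*" accepts any value for that key.
    return all(
        key in actual and (value == "*" or value == actual[key])
        for key, value in expected.items()
    )

expected = {"llm.provider": "azure", "llm.system": "openai", "llm.model_name": "*"}
span = {"llm.provider": "azure", "llm.system": "openai",
        "llm.model_name": "gpt-4.1-mini-2025-04-14"}
print(matches_attrs(expected, span))  # True
```

A mismatched provider or a missing required attribute would still fail the check, so the test keeps its teeth even with the model name wildcarded.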

🤖 Generated with Claude Code

Chibionos pushed a commit that referenced this pull request Jan 20, 2026
Extended the wildcard fix to all remaining test files with hardcoded
LLM model versions:
- company-research-agent: gpt-4.1-mini-2025-04-14 → "*"
- init-flow: gpt-4o-mini-2024-07-18 → "*"
- ticket-classification: gpt-4.1-mini-2025-04-14 → "*"

This ensures all trace validation tests are resilient to future LLM
Gateway model changes, preventing CI/CD failures when defaults update.

Related to: #440
@cristipufu cristipufu requested a review from ionmincu January 20, 2026 15:16
Chibi Vikram and others added 2 commits January 21, 2026 06:59
Replace hardcoded model version with wildcard to prevent test failures
when LLM Gateway defaults change. This improves long-term test stability.

Previous fix updated model to gpt-4.1-mini-2025-04-14, but this will break
again on next model update. Wildcard approach is resilient to future changes.

Changes:
- llm.model_name: "gpt-4.1-mini-2025-04-14" → "*"
- Removed exact content match (varies by model wording)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@cristipufu cristipufu force-pushed the fix/mcp-test-model-wildcard branch from b76b0f4 to 548521e Compare January 21, 2026 04:59
@cristipufu cristipufu merged commit bc5978f into main Jan 21, 2026
39 checks passed
@cristipufu cristipufu deleted the fix/mcp-test-model-wildcard branch January 21, 2026 05:06
