response_match_score (ROUGE-1) is not effective for Japanese (even with manual tokenization) #4122

@ftnext

Description

Describe the bug
response_match_score uses rouge-score's English-oriented tokenizer.
Japanese text without whitespace is treated as a single token, so scores are incorrect.
Because there is no built-in Japanese tokenization, we had to pre-tokenize the Japanese input manually and pass it as space-separated text.
This is only a workaround, and even with it the scores are still wrong.

To Reproduce
Minimal example (using pre-tokenized Japanese as a workaround):

from google.adk.evaluation.final_response_match_v1 import _calculate_rouge_1_scores

candidate = "これ は テスト 候補 の 応答"
reference = "これ は テスト の 正解"
score = _calculate_rouge_1_scores(candidate, reference)
print(score)
# Actual output:
# Score(precision=0.0, recall=0.0, fmeasure=0.0)

Even with manually pre-tokenized input, every score comes out as 0.0. In any case, the result depends on manual tokenization rather than on native handling of Japanese text.
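One possible explanation for the all-zero result, offered as an assumption rather than a confirmed diagnosis: if the default tokenizer lowercases the text and keeps only ASCII letters and digits, Japanese characters are discarded before any tokens are compared, whether or not the input contains spaces. The sketch below imitates that kind of normalization. `ascii_only_tokenize` is a hypothetical helper written for illustration, not a function from rouge-score or ADK.

```python
import re


def ascii_only_tokenize(text):
    """Hypothetical stand-in for an English-oriented tokenizer.

    Assumption: the text is lowercased and every run of characters
    outside [a-z0-9] is replaced by a space before splitting.
    """
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return normalized.split()


# Pre-tokenized Japanese: every character falls outside [a-z0-9],
# so no tokens survive the normalization step.
japanese_tokens = ascii_only_tokenize("これ は テスト 候補 の 応答")
print(japanese_tokens)

# Mixed text: only the ASCII parts remain.
mixed_tokens = ascii_only_tokenize("ADK v1 の テスト")
print(mixed_tokens)
```

With an empty token list on both sides there is nothing to overlap, which would be consistent with precision, recall, and F-measure all being 0.0.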

Expected behavior
Japanese should be supported without requiring users to manually insert whitespace.
Ideally response_match_score should accept a tokenizer option or provide language-aware tokenization for ROUGE.

For the pre-tokenized example above, the expected output is:

Score(precision=0.6666666666666666, recall=0.8, fmeasure=0.7272727272727272)
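To show where the expected numbers come from, here is a minimal, self-contained ROUGE-1 sketch with a pluggable tokenizer. It is not the ADK or rouge-score implementation. The `rouge_1` function, its `tokenize` parameter, and the local `Score` tuple are names assumed for this illustration only. With whitespace tokenization, the candidate has 6 tokens, the reference has 5, and 4 tokens overlap (これ, は, テスト, の), giving precision 4/6, recall 4/5, and an F-measure of about 0.727.

```python
from collections import Counter, namedtuple

# Local result type mirroring the field names printed in the issue.
Score = namedtuple("Score", ["precision", "recall", "fmeasure"])


def rouge_1(candidate, reference, tokenize=str.split):
    """Unigram-overlap ROUGE-1 with a caller-supplied tokenizer.

    `tokenize` is any callable mapping a string to a list of tokens.
    The default splits on whitespace only, so non-ASCII tokens are kept.
    """
    cand_tokens = tokenize(candidate)
    ref_tokens = tokenize(reference)
    if not cand_tokens or not ref_tokens:
        return Score(0.0, 0.0, 0.0)

    # Clipped overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    if precision + recall == 0:
        return Score(precision, recall, 0.0)
    fmeasure = 2 * precision * recall / (precision + recall)
    return Score(precision, recall, fmeasure)


score = rouge_1("これ は テスト 候補 の 応答", "これ は テスト の 正解")
print(score)
```

Because the tokenizer is just a callable, a language-aware tokenizer (for example, one backed by a Japanese morphological analyzer) could be passed in place of `str.split`, so that users would not need to insert whitespace by hand.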

Desktop (please complete the following information):

  • OS: macOS
  • Python version(python -V): Python 3.13.8
  • ADK version(pip show google-adk): v1.22.0

Model Information:

  • Are you using LiteLLM: No
  • Which model is being used(e.g. gemini-2.5-pro): N/A (offline metric)

Additional context

Labels: eval [Component] This issue is related to evaluation