Describe the bug
response_match_score uses rouge-score's English-oriented tokenizer.
Japanese text without whitespace is treated as a single token, so scores are incorrect.
Because there is no built-in Japanese tokenization, we had to manually pre-tokenize the Japanese input and pass it in as space-separated text (see the sketch below).
This is only a workaround, and even then the resulting scores are wrong.
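For reference, the snippet below shows the kind of pre-tokenization we mean. It is only an illustration, not something ADK provides today; janome is just one example of a Japanese morphological analyzer.

from janome.tokenizer import Tokenizer

_ja_tokenizer = Tokenizer()

def pretokenize_ja(text: str) -> str:
    # Split Japanese text into morphemes and rejoin with spaces so that
    # whitespace-based tokenization sees individual tokens.
    return " ".join(token.surface for token in _ja_tokenizer.tokenize(text))

print(pretokenize_ja("これはテスト候補の応答"))
# e.g. "これ は テスト 候補 の 応答" (exact segmentation depends on the analyzer)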
To Reproduce
Minimal example (using pre-tokenized Japanese as a workaround):
from google.adk.evaluation.final_response_match_v1 import _calculate_rouge_1_scores
candidate = "これ は テスト 候補 の 応答"
reference = "これ は テスト の 正解"
score = _calculate_rouge_1_scores(candidate, reference)
print(score)
# Output: Score(precision=0.0, recall=0.0, fmeasure=0.0)
Even with the manual pre-tokenization, every score comes back as 0.0, and the result does not reflect how native (unsegmented) Japanese text would behave.
Expected behavior
Japanese should be supported without requiring users to manually insert whitespace.
Ideally, response_match_score should accept a tokenizer option or provide language-aware tokenization for ROUGE (see the sketch below).
With proper Japanese tokenization, the example above should yield something like:
Score(precision=0.6666666666666666, recall=0.8, fmeasure=0.7272727272727272)
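As far as we can tell, rouge-score's RougeScorer already accepts a custom tokenizer object, so response_match_score could expose something similar. The sketch below only illustrates the idea (WhitespaceTokenizer is a hypothetical helper, not an existing ADK or rouge-score class):

from rouge_score import rouge_scorer

class WhitespaceTokenizer:
    # Trust the caller's whitespace instead of stripping non-ASCII
    # characters the way the default tokenizer does.
    def tokenize(self, text):
        return text.split()

scorer = rouge_scorer.RougeScorer(["rouge1"], tokenizer=WhitespaceTokenizer())
result = scorer.score(
    "これ は テスト の 正解",       # reference
    "これ は テスト 候補 の 応答",  # candidate
)
print(result["rouge1"])
# Score(precision=0.6666..., recall=0.8, fmeasure=0.7272...)

With a tokenizer hook like this, ADK could plug in a Japanese morphological analyzer (or any language-specific tokenizer) without changing the metric itself.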
Desktop (please complete the following information):
- OS: macOS
- Python version (python -V): Python 3.13.8
- ADK version (pip show google-adk): v1.22.0
Model Information:
- Are you using LiteLLM: No
- Which model is being used(e.g. gemini-2.5-pro): N/A (offline metric)
Additional context
- The limitation comes from rouge-score's default tokenization, which, as far as we can tell, keeps only ASCII alphanumeric tokens and therefore drops Japanese characters entirely (see the snippet below).
- We had to pass pre-tokenized Japanese to get any meaningful ROUGE-1 result, which is not ideal.
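The snippet below illustrates the root cause as we understand it: the default tokenizer keeps only ASCII alphanumeric tokens, so Japanese input produces an empty token list. This is a sketch based on our reading of rouge-score's behavior.

from rouge_score import tokenizers

tok = tokenizers.DefaultTokenizer()
print(tok.tokenize("これ は テスト 候補 の 応答"))  # -> [] (every token is dropped)
print(tok.tokenize("this is a test response"))        # -> ['this', 'is', 'a', 'test', 'response']

An empty token list for both candidate and reference would explain the all-zero Score in the reproduction above.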