`response_match_score` (ROUGE-1) is not effective for Japanese (even with manual tokenization) #4122

**Describe the bug**
`response_match_score` uses `rouge-score`'s English-oriented tokenizer.
Japanese text without whitespace is not split into meaningful tokens, so scores are incorrect.
Because there is no built-in Japanese tokenization, we manually pre-tokenized the Japanese input and passed it as space-separated text.
This is only a workaround, and even then the score is still wrong.

**To Reproduce**
Minimal example (using pre-tokenized Japanese as a workaround):

```python
from google.adk.evaluation.final_response_match_v1 import _calculate_rouge_1_scores

candidate = "これ は テスト 候補 の 応答"
reference = "これ は テスト の 正解"
score = _calculate_rouge_1_scores(candidate, reference)
print(score)
```

```
Score(precision=0.0, recall=0.0, fmeasure=0.0)
```

Even with pre-tokenized input, every score is 0.0. More generally, any result obtained this way depends on manual tokenization and does not reflect how native Japanese text is handled.

**Expected behavior**
Japanese should be supported without requiring users to manually insert whitespace.
Ideally, `response_match_score` should accept a tokenizer option or provide language-aware tokenization for ROUGE, so that the example above yields:

```
Score(precision=0.6666666666666666, recall=0.8, fmeasure=0.7272727272727272)
```
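For reference, the expected numbers follow from plain unigram overlap on the whitespace-separated tokens: 4 shared tokens, 6 candidate tokens, and 5 reference tokens. The sketch below is only an illustration of that arithmetic with a pluggable tokenizer. `rouge_1` and `Score` are hypothetical helpers written for this issue, not ADK or rouge-score APIs.

```python
from collections import Counter
from typing import Callable, NamedTuple


class Score(NamedTuple):
    precision: float
    recall: float
    fmeasure: float


def rouge_1(
    candidate: str,
    reference: str,
    tokenize: Callable[[str], list[str]] = str.split,
) -> Score:
    """Hypothetical helper: unigram-overlap ROUGE-1 with a pluggable tokenizer."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Clipped overlap: a token counts at most as often as it occurs on both sides.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom else 0.0
    return Score(precision, recall, fmeasure)


# Default tokenizer is whitespace splitting, which keeps Japanese tokens intact.
score = rouge_1("これ は テスト 候補 の 応答", "これ は テスト の 正解")
print(score)  # precision ~0.667, recall 0.8, fmeasure ~0.727
```

A language-aware tokenizer could be supplied through the `tokenize` argument instead of `str.split`, which would remove the need to insert whitespace by hand.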

**Desktop (please complete the following information):**

- OS: macOS
- Python version (python -V): Python 3.13.8
- ADK version (pip show google-adk): v1.22.0

**Model Information:**

- Are you using LiteLLM: No
- Which model is being used (e.g. gemini-2.5-pro): N/A (offline metric)

**Additional context**

- The limitation comes from [rouge-score](https://pypi.org/project/rouge-score/) default tokenization.
    - https://github.com/google-research/google-research/blob/f6a8c2c254838f75b8429c5de456f2c1f2d8dc7c/rouge/tokenize.py#L51-L52
- We had to pass pre-tokenized Japanese to get any meaningful ROUGE-1 result, which is not ideal.
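
To illustrate the limitation, the sketch below reimplements the normalization step with the standard `re` module. The patterns reflect our reading of the linked `tokenize.py` lines and are an assumption, not code imported from rouge-score. `default_like_tokenize` is a hypothetical name used only here.

```python
import re

# Assumption: these patterns mirror the linked tokenize.py; they are not
# imported from rouge-score.
NON_ALPHANUM_RE = re.compile(r"[^a-z0-9]+")
VALID_TOKEN_RE = re.compile(r"^[a-z0-9]+$")


def default_like_tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for an English-oriented default tokenizer."""
    # Lowercase, then replace every run of non-[a-z0-9] characters with a space.
    text = NON_ALPHANUM_RE.sub(" ", text.lower())
    # Split on whitespace and keep only ASCII alphanumeric tokens.
    return [tok for tok in text.split() if VALID_TOKEN_RE.match(tok)]


english_tokens = default_like_tokenize("This is a test")
japanese_tokens = default_like_tokenize("これ は テスト の 正解")
print(english_tokens)   # ['this', 'is', 'a', 'test']
print(japanese_tokens)  # []
```

Under this assumption, Japanese characters are stripped before splitting, so pre-inserted whitespace cannot help, which would be consistent with the all-zero score reported above.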


