⚠ This page is served via a proxy. Original site: https://github.com
This service does not collect credentials or authentication data.
Skip to content

Conversation

@anshul98ks123
Copy link
Contributor

@anshul98ks123 anshul98ks123 commented Jan 20, 2026

Issue

FieldSpec.getStringValue() fails to serialize Scala collections to valid JSON

Description

When jackson-module-scala is on the classpath (via Kafka connectors), Jackson deserializes empty JSON objects {} as Scala collections instead of Java collections. The existing getStringValue() method only handles Java Map/List via instanceof checks, but Scala collections don't implement java.util.Map or java.util.List, causing them to fall through to toString() which produces invalid JSON.

For example, a schema with ComplexFieldSpec:

"complexFieldSpecs": [
    {
        "name": "dimensions",
        "dataType": "MAP",
        "defaultNullValue": {}
    }
]

When Jackson deserializes this with jackson-module-scala on the classpath:

  1. "defaultNullValue": {} → scala.collection.immutable.Map$EmptyMap$
  2. setDefaultNullValue(scalaMap) calls getStringValue(scalaMap)
  3. instanceof Map returns false (Scala Map ≠ java.util.Map)
  4. Falls through to scalaMap.toString() → "Map()"
  5. Later, setDataType(DataType.MAP) triggers getDefaultNullValue() which calls DataType.MAP.convert("Map()")
  6. JsonUtils.stringToObject("Map()", Map.class) fails because "Map()" is not valid JSON
Caused by: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot convert value: 'Map()' to type: MAP
 at [Source: REDACTED; line: 333, column: 33] (through reference chain: 
   TablePreviewApi["tableConfigs"]->TableConfigs["schema"]->Schema["complexFieldSpecs"]
   ->ArrayList[0]->ComplexFieldSpec["dataType"])

Bug

The bug is in FieldSpec.getStringValue():

public static String getStringValue(Object value) {
    if (value instanceof BigDecimal) {
      return ((BigDecimal) value).toPlainString();
    }
    if (value instanceof byte[]) {
      return BytesUtils.toHexString((byte[]) value);
    }
    return value.toString();  // ← BUG: Scala Map.toString() = "Map()"
}

Fix

This PR fixes that by detecting Scala collections via class name and serializing them to JSON

public static String getStringValue(Object value) {
    // ... BigDecimal, byte[] handling ...
    
    // Handle Java collections (Map, List)
    if (value instanceof Map || value instanceof List) {
      try {
        return JsonUtils.objectToString(value);
      } catch (JsonProcessingException e) {
        throw new RuntimeException(
            String.format("Failed to serialize %s to JSON: %s", value.getClass().getName(), e.getMessage()), e);
      }
    }
    // Handle Scala collections - they don't implement java.util.Map/List
    final String className = value.getClass().getName();
    if (className.startsWith("scala.collection")) {
      return serializeScalaCollection(value, className);
    }
    return value.toString();
}

Testing

  • UTs
  • Lint Check
  • Ran STP locally and validated Preview API with following block
{
    "schema": {
        ...
        ...
        "complexFieldSpecs": [
            {
                "fieldType": "COMPLEX",
                "childFieldSpecs": {
                    ...
                    ...
                },
                "singleValueField": true,
                "notNull": false,
                "allowTrailingZeros": false,
                "defaultNullValueString": "{}",
                "name": "address",
                "defaultNullValue": {},
                "dataType": "MAP"
            }
        ]
    }
}

@anshul98ks123 anshul98ks123 force-pushed the preview-serialization-fix branch 2 times, most recently from a2d6be6 to be791e0 Compare January 20, 2026 06:55
@codecov-commenter
Copy link

codecov-commenter commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.20%. Comparing base (57f94f3) to head (b341ac6).
⚠️ Report is 183 commits behind head on master.

Files with missing lines Patch % Lines
...main/java/org/apache/pinot/spi/data/FieldSpec.java 27.27% 15 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17534      +/-   ##
============================================
- Coverage     63.27%   63.20%   -0.07%     
- Complexity     1433     1476      +43     
============================================
  Files          3133     3170      +37     
  Lines        186078   189530    +3452     
  Branches      28408    29002     +594     
============================================
+ Hits         117733   119797    +2064     
- Misses        59279    60423    +1144     
- Partials       9066     9310     +244     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.15% <27.27%> (-0.05%) ⬇️
java-21 63.18% <27.27%> (-0.04%) ⬇️
temurin 63.20% <27.27%> (-0.07%) ⬇️
unittests 63.20% <27.27%> (-0.07%) ⬇️
unittests1 55.53% <27.27%> (-0.14%) ⬇️
unittests2 34.03% <9.09%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@anshul98ks123 anshul98ks123 force-pushed the preview-serialization-fix branch 2 times, most recently from cc63f58 to fe7913b Compare January 20, 2026 09:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a serialization bug in FieldSpec.getStringValue() where Scala collections (when present via jackson-module-scala) are improperly serialized using toString() instead of JSON, causing downstream parsing failures. The fix adds explicit JSON serialization for Java Map/List types and special handling for Scala collections that don't implement java.util.Map/List.

Changes:

  • Modified FieldSpec.getStringValue() to serialize Java collections to JSON using JsonUtils.objectToString()
  • Added Scala collection detection via class name pattern matching (scala.collection.*)
  • Implemented serializeScalaCollection() to handle empty Scala collections by returning appropriate JSON ({} for Maps, [] for Lists/Seq/etc.)
  • Added comprehensive test coverage for default null value conversions, JSON serialization, and Scala collection handling

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java Core fix: Added JSON serialization for Map/List types and Scala collection handling with helper methods
pinot-spi/src/test/java/org/apache/pinot/spi/data/FieldSpecTest.java Comprehensive test suite covering default null values, JSON serialization round-trips, error cases, and Scala collection behavior

Comment on lines +376 to +378
// Check if empty using reflection (all Scala collections have isEmpty method)
try {
final boolean isEmpty = (boolean) scalaCollection.getClass().getMethod("isEmpty").invoke(scalaCollection);
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reflection is used on every Scala collection serialization. Consider caching the isEmpty Method object in a static map keyed by class name to avoid repeated reflection lookups for the same collection types.

Copilot uses AI. Check for mistakes.
Comment on lines +393 to +398
if (className.contains("Map")) {
return "{}";
}
// Nil$ is Scala's empty list singleton
if (className.contains("List") || className.contains("Seq") || className.contains("Set")
|| className.contains("Vector") || className.contains("Array") || className.contains("Buffer")) {
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using contains() for class name matching can produce false positives. For example, a class named MyArrayList or CustomListProcessor would incorrectly match. Consider using more precise pattern matching such as checking for className.matches('.*\\.(List|Seq|Set|Vector|Array|Buffer)(\\$|$)') to match only when these terms appear as actual class name components.

Suggested change
if (className.contains("Map")) {
return "{}";
}
// Nil$ is Scala's empty list singleton
if (className.contains("List") || className.contains("Seq") || className.contains("Set")
|| className.contains("Vector") || className.contains("Array") || className.contains("Buffer")) {
if (className.matches(".*\\.(Map)(\\$|$)")) {
return "{}";
}
// Nil$ is Scala's empty list singleton
if (className.matches(".*\\.(List|Seq|Set|Vector|Array|Buffer)(\\$|$)")) {

Copilot uses AI. Check for mistakes.
* Returns the appropriate empty JSON representation for a Scala collection based on its class name.
*/
private static String getEmptyJsonForScalaCollection(final String className) {
if (className.contains("Map")) {
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using contains('Map') for class name matching can produce false positives. A class like MyMapper or EmailMapperService would incorrectly be identified as a Map type. Use more precise pattern matching such as className.matches('.*\\.Map(\\$|$)') to ensure 'Map' appears as an actual class name component.

Suggested change
if (className.contains("Map")) {
if (className.matches(".*\\.Map(\\$|$)")) {

Copilot uses AI. Check for mistakes.
Comment on lines +783 to +788
final Object scalaStyleMap = new HashMap<String, Object>() {
@Override
public String toString() {
return "Map()"; // This is what Scala Map.toString() returns
}
};
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test verifies behavior for Java HashMap with overridden toString(), but doesn't actually test the Scala collection detection logic since the class name is still java.util.HashMap$1 (anonymous class), not scala.collection.*. The Scala detection code path at line 365 (className.startsWith('scala.collection')) is never exercised. Consider adding a test with a mock class whose name actually starts with 'scala.collection' to cover this branch.

Copilot uses AI. Check for mistakes.
Comment on lines +831 to +832
Map<String, Object> circularMap = new HashMap<>();
circularMap.put("self", circularMap); // Circular reference
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test covers the JsonProcessingException catch block for Java collections, but the similar error path in serializeScalaCollection() (lines 377-384 where reflection fails) is not tested. Add a test that triggers the reflection exception branch to ensure complete coverage of error handling.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss more on the issue. I feel we should store serialized format in "defaultNullValue", instead of the complex structure

// Handle Java collections (Map, List)
if (value instanceof Map || value instanceof List) {
try {
return JsonUtils.objectToString(value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work if we invoke JsonUtils.objectToString() on scala object?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants