Working with structured data in Java

Parse, compare and manipulate JSON-based data

Java is great for implementing business logic, but it has some weak points when it comes to working with datasets. Such tasks usually involve lots of boilerplate code around POJOs and updating each property programmatically. In this post we'll look at how Jackson, Apache Commons Lang, JSON-P and Guava come to the rescue.

All code examples can be found in this GitHub repo.

Working without a schema

Suppose that there's a data source providing records in JSON format:

{
    "name": "John",
    "age": 25,
    "address": {
        "city": "London",
        "country" : "United Kingdom"
    }
}

Jackson is an actively developed JSON parser and serializer library. It has a data binding package that supports JSON among other formats.

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.9</version>
</dependency>

If the schema of the data is not important, or the available fields vary a lot between records, then mapping each record into a Map<String, Object> might be the simplest way to get started.

Map<String, Object> record =
    new ObjectMapper().readValue(new File("record.json"), Map.class);

Update

Jackson can update the values of an object, based on the values in another object or a Map.

For example, we might load another record from a different source or create one programmatically and use its values to modify an existing record:

Map<String, Object> original =
    new ObjectMapper().readValue(new File("record.json"), Map.class);

// {
//    "name": "John",
//    "age": 25,
//    "address": {
//      "city": "London",
//      "country": "United Kingdom"
//    }
// }

Map<String, Object> overrides =
    Map.of("name", "Robert");
Map<String, Object> updated =
    new ObjectMapper().updateValue(original, overrides);

// {
//    "name": "Robert",
//    "age": 25,
//    "address": {
//      "city": "London",
//      "country": "United Kingdom"
//    }
// }

The code snippet above updates the original Map with the values of the overrides Map. If a key is not present in the overrides Map, the corresponding entry of the original is left unmodified. Updating entries in a Map is not a big deal, but the updateValue method can also use Java classes to supply the override values.

Note: the signature of this method is a bit misleading, as it returns the updated object. However, this is simply a reference to the original object, which is modified in place.

Calculate diff of Maps

There are many solutions to calculate the difference between two Maps. In this post, I use the Guava library to do that:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>

The following snippet calculates the difference between two Maps:

MapDifference<String, Object> diff =
    Maps.difference(original, etalon);
Map<String, ValueDifference<Object>> entriesDiffering =
    diff.entriesDiffering();

// diff = {name=(Robert, John)}

Mapping structured data to POJOs

A problem with the previous approach is that it’s hard to work with the mapped data objects. As we use Maps, we are completely bypassing type and schema checking. Also, if we are to do anything with the values, most likely the code has to be polluted with casts and instanceof checks.
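To illustrate the point, here is a minimal plain-Java sketch of reading the nested city value from such a Map. The class and method names (UntypedAccess, cityOf) are made up for this example and do not come from the repo.

```java
import java.util.Map;

final class UntypedAccess {

    // Returns the city of a record, or null if the structure is not what we expect.
    static String cityOf(Map<String, Object> record) {
        Object address = record.get("address");
        if (!(address instanceof Map)) {
            return null; // "address" is missing or is not a nested object
        }
        Object city = ((Map<?, ?>) address).get("city");
        return city instanceof String ? (String) city : null;
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of(
            "name", "John",
            "age", 25,
            "address", Map.of("city", "London", "country", "United Kingdom"));
        System.out.println(cityOf(record)); // London
    }
}
```

Every additional nested field needs a similar chain of checks and casts, which is exactly the noise a typed class removes.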

We can mitigate this problem by reading the value into a Java class:

Person record =
    new ObjectMapper().readValue(new File("record.json"), Person.class);

Out of the box, this works with Plain Old Java Objects (POJOs), meaning the class has to have a default constructor, plus setters and getters for each field we want to work with. This behavior can be fine-tuned with configuration to support immutable classes that only have public final fields.
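The Person class itself is not shown in this post. A minimal sketch that would fit the record above might look like the following; the nested Address class and the exact field types are assumptions derived from the sample JSON, not taken from the repo.

```java
// Hypothetical data classes matching the sample record.
class Person {
    private String name;
    private int age;
    private Address address;

    public Person() { } // default constructor, used when the object is instantiated

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}

class Address {
    private String city;
    private String country;

    public Address() { }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}
```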

An additional benefit of using Java classes is that Jackson will throw an exception (UnrecognizedPropertyException) when it encounters an unknown property.

This is a great safeguard against typos and unsanitized data. If needed, this behavior can be turned off:

objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

Update and partial update

It’s a common use-case to override some properties of an object without affecting others. With Jackson, we can do just that:

Person original =
    new ObjectMapper().readValue(new File("record.json"), Person.class);
Map<String, Object> overrides =
    Map.of("name", "John");
Person updated =
    new ObjectMapper().updateValue(original, overrides);

As a result of this snippet, the name field of the original object is set to “John” without changing its other values. As in the Map-based example, the call to updateValue modifies the original object in place.

This technique can also be used to copy all non-null fields of an object into another object:

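Person original =
    new ObjectMapper().readValue(new File("record.json"), Person.class);
Person etalon = /* … */

// Convert to a typed Map first; with a raw Map.class the stream below would be untyped
// (TypeReference is com.fasterxml.jackson.core.type.TypeReference)
Map<String, Object> etalonAsMap =
    new ObjectMapper()
        .convertValue(etalon, new TypeReference<Map<String, Object>>() {});

// Keep only the entries that actually carry a value
Map<String, Object> overrides =
    etalonAsMap.entrySet().stream()
        .filter(entry -> entry.getValue() != null)
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));

Person updated = new ObjectMapper().updateValue(original, overrides);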

Calculate diff of objects

A typical pain point with data classes is that you have to provide equals and hashCode methods manually in order to work with conventional equality checks. Moreover, these methods only tell whether two objects are equal, not what differs between them.
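For a sense of the boilerplate involved, a hand-written equals and hashCode for a two-field class could look like the sketch below. PostalAddress is a stand-in invented for this illustration.

```java
import java.util.Objects;

// Hypothetical two-field value class with manually written equality.
final class PostalAddress {
    private final String city;
    private final String country;

    PostalAddress(String city, String country) {
        this.city = city;
        this.country = country;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PostalAddress)) {
            return false;
        }
        PostalAddress that = (PostalAddress) other;
        // Every new field has to be added here and in hashCode by hand
        return Objects.equals(city, that.city)
            && Objects.equals(country, that.country);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, country);
    }
}
```

Comparing a London address with a Birmingham one simply yields false, with no hint that only the city differs.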

DiffBuilder from Apache Commons Lang 3 is designed to solve exactly this problem.

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.9</version>
</dependency>

The following snippet illustrates how to get the difference of two objects:

List<Diff<?>> diff =
    new ReflectionDiffBuilder(original, etalon, new RecursiveToStringStyle())
        .build()
        .getDiffs();

The library also has the ReflectionToStringBuilder, which enables pretty-printing objects without an explicit toString method:

String dtoAsString =
    ReflectionToStringBuilder.toString(original, new RecursiveToStringStyle());

As an alternative, Jackson might be used to transform the object to its JSON representation.

Flat versus hierarchical data

All techniques presented in this post support flat data structures, but working with hierarchical data can be more challenging.

For example, merging nested Maps with Jackson works as expected:

Map<String, Object> original =
    new ObjectMapper().readValue(new File("record.json"), Map.class);
Map<String, Object> overrides =
    Map.of("address", Map.of("city", "Birmingham"));
Map<String, Object> updated =
    new ObjectMapper().updateValue(original, overrides);

// updated = {
//    "name": "John",
//    "age": 25,
//    "address": {
//        "city": "Birmingham",
//        "country" : "United Kingdom"
//    }
// }
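To make it explicit what deep merging means here, the following plain-Java sketch merges nested Maps recursively. It is only an illustration of the idea, not how Jackson implements updateValue, and the DeepMerge class name is made up. Unlike updateValue, it returns a new Map and leaves both inputs untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DeepMerge {

    // Returns a new Map: values from overrides win, nested Maps are merged recursively.
    static Map<String, Object> merge(Map<String, Object> original,
                                     Map<String, Object> overrides) {
        Map<String, Object> result = new LinkedHashMap<>(original);
        for (Map.Entry<String, Object> entry : overrides.entrySet()) {
            Object oldValue = result.get(entry.getKey());
            Object newValue = entry.getValue();
            if (oldValue instanceof Map && newValue instanceof Map) {
                // Both sides are nested objects: merge them instead of replacing
                @SuppressWarnings("unchecked")
                Map<String, Object> merged = merge(
                    (Map<String, Object>) oldValue,
                    (Map<String, Object>) newValue);
                result.put(entry.getKey(), merged);
            } else {
                // Scalars (or mismatched types): the override replaces the old value
                result.put(entry.getKey(), newValue);
            }
        }
        return result;
    }
}
```

The unchecked casts assume that nested Maps use String keys, which holds for data parsed from JSON.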

But for POJOs, merging only handles shallow data structures by default. To deep merge POJOs, use one of the following:

  • annotate a specific field with @JsonMerge
  • enable deep merging of a specific type with objectMapper.configOverride(MyNestedClass.class).setMergeable(true);
  • enable deep merging by default with objectMapper.setDefaultMergeable(true);

Without any of these, the whole top-level property is overwritten, and its unspecified nested properties are replaced by null:

// Plain ObjectMapper: deep merging is not enabled
ObjectMapper objectMapper = new ObjectMapper();
// objectMapper.setDefaultMergeable(true); // enabling this would keep "country"
Person original =
    objectMapper.readValue(new File("record.json"), Person.class);
Map<String, Object> overrides =
    Map.of("address", Map.of("city", "Birmingham"));
Person updated =
    objectMapper.updateValue(original, overrides);

// updated = {
//    "name": "John",
//    "age": 25,
//    "address": {
//        "city": "Birmingham",
//        "country" : null      // unrelated nested field got nulled
//    }
// }

On top of that, diffing with Maps.difference and DiffBuilder does not descend into nested structures. If a single property differs in a nested Map or POJO, the whole field is reported as different.

For example:

// diff = {
//   name=(Robert, John),
//   age=(25, 23),
//   address=({city=London, country=United Kingdom}, {city=Birmingham, country=United Kingdom})}

Notice that in the previous example the country field is the same in both addresses, yet it still shows up in the report as noise.

These diff results can be improved with some tricks:

  • if the structured data only has a few, well-defined nested properties, one might diff them separately and merge the diff results
  • when diffing Maps, flatten the hierarchical structure
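The flattening trick could be sketched in plain Java as follows. The MapFlattener name and the "." key separator are arbitrary choices made for this illustration; keys that already contain a dot would become ambiguous, and lists are kept as-is rather than descended into.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapFlattener {

    // Flattens nested Maps into a single-level Map, e.g. address.city -> London.
    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> target = new LinkedHashMap<>();
        flattenInto("", source, target);
        return target;
    }

    private static void flattenInto(String prefix, Map<?, ?> source,
                                    Map<String, Object> target) {
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            String key = prefix + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested object: recurse with the current key as the new prefix
                flattenInto(key + ".", (Map<?, ?>) value, target);
            } else {
                target.put(key, value);
            }
        }
    }
}
```

Passing the flattened versions of both Maps to Maps.difference should then report address.city on its own, instead of the whole address.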

Alternatively, convert the data to JSON and use JSON-P, which does provide deep comparison.

To do that, add the following dependencies:

<dependency>
    <groupId>javax.json</groupId>
    <artifactId>javax.json-api</artifactId>
    <version>1.1.2</version>
</dependency>
<dependency>
    <groupId>org.glassfish</groupId>
    <artifactId>javax.json</artifactId>
    <version>1.1.2</version>
</dependency>

Then, use it as follows:

JsonStructure original = Json.createReader(...).read();
JsonStructure etalon = Json.createReader(...).read();
JsonPatch diff = Json.createDiff(original, etalon);

// diff = [
//   {"op":"replace","path":"/name","value":"John"},
//   {"op":"replace","path":"/age","value":23},
//   {"op":"replace","path":"/address/city","value":"Birmingham"}]

Conclusion

Jackson, Guava, Apache Commons Lang and JSON-P provide handy features to process and update ad-hoc hierarchical data.

October 8, 2019