Working with structured data in Java
Parse, compare and manipulate JSON-based data
Java is great for implementing business logic, but it has some weak points when it comes to working with datasets. Such tasks usually involve lots of boilerplate code around POJOs and updating each property programmatically. In this post we'll look at how Jackson, Apache Commons Lang, JSON-P and Guava come to the rescue.
All code examples can be found in this GitHub repo.
Working without a schema
Suppose that there's a data source providing records in JSON format:
{
"name": "John",
"age": 25,
"address": {
"city": "London",
"country" : "United Kingdom"
}
}
Jackson is an actively developed JSON parser and serializer library. It has a data binding package that supports JSON among other formats.
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.9.9</version>
</dependency>
If the schema of the data is not important, or the available fields vary a lot between records, then mapping each record into a Map<String, Object> might be the simplest way to get started.
Map<String, Object> record =
new ObjectMapper().readValue(new File("record.json"), Map.class);
Update
Jackson can update the values of an object, based on the values in another object or a Map.
For example, we might load another record from a different source or create one programmatically and use its values to modify an existing record:
Map<String, Object> original =
new ObjectMapper().readValue(new File("record.json"), Map.class);
// {
// "name": "John",
// "age": 25,
// "address": {
// "city": "London",
// "country": "United Kingdom"
// }
// }
Map<String, Object> overrides =
Map.of("name", "Robert");
Map<String, Object> updated =
new ObjectMapper().updateValue(original, overrides);
// {
// "name": "Robert",
// "age": 25,
// "address": {
// "city": "London",
// "country": "United Kingdom"
// }
// }
The code snippet above updates the original Map with the values of the overrides Map. If a key is not present in the overrides Map, its value in the original is not modified. Updating entries in a Map is not a big deal, but the updateValue method can also use Java classes to supply the override values.
Note: the signature of this method is a bit misleading, as it returns the updated object. However, this is simply a reference to the original object, which is also modified.
Calculate diff of Maps
There are many solutions to calculate the difference between two Maps. In this post, I use the Guava library to do that:
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>28.1-jre</version>
</dependency>
The following snippet calculates the difference between two Maps:
MapDifference<String, Object> diff =
Maps.difference(original, etalon);
Map<String, ValueDifference<Object>> entriesDiffering =
diff.entriesDiffering();
// entriesDiffering = {name=(Robert, John)}
Mapping structured data to POJOs
A problem with the previous approach is that it’s hard to work with the mapped data objects. As we use Maps, we are completely bypassing type and schema checking. Also, if we are to do anything with the values, most likely the code has to be polluted with casts and instanceof checks.
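To make the cost of untyped access concrete, here is a minimal JDK-only sketch of reading the nested city value from a record held in a Map<String, Object>. The class and method names (UntypedAccess, cityOf) are made up for illustration and are not part of any library.

```java
import java.util.Map;

class UntypedAccess {
    // Reads address.city from an untyped record. Returns null when the
    // path is missing or a value has an unexpected type.
    static String cityOf(Map<String, Object> record) {
        Object address = record.get("address");
        if (!(address instanceof Map)) {
            return null;
        }
        Object city = ((Map<?, ?>) address).get("city");
        return city instanceof String ? (String) city : null;
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of(
                "name", "John",
                "age", 25,
                "address", Map.of("city", "London", "country", "United Kingdom"));
        System.out.println(cityOf(record)); // London
    }
}
```

Every extra level of nesting adds another instanceof check and cast, and a typo in a key only shows up at runtime as a null.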
We can mitigate this problem by reading the value into a Java class:
Person record =
new ObjectMapper().readValue(new File("record.json"), Person.class);
Out of the box, Jackson works with Plain Old Java Objects (POJOs), meaning the class we are using has to have a default constructor, plus setters and getters for each field we want to work with. This behavior can be fine-tuned with configuration to support immutable classes that only have public final fields.
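The Person class itself is not shown in this post, so the following is only a guess at a shape that matches record.json: a default constructor plus a getter and setter per field. The Address class name and the use of Integer for age (so that an unset age is null rather than 0) are assumptions made for this sketch.

```java
// In a real project these would typically be public classes in separate files.
class Address {
    private String city;
    private String country;

    public Address() {
    }

    public String getCity() { return city; }
    public void setCity(String city) { this.city = city; }

    public String getCountry() { return country; }
    public void setCountry(String country) { this.country = country; }
}

class Person {
    private String name;
    private Integer age; // wrapper type, so "not set" is representable as null
    private Address address;

    public Person() {
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }

    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}
```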
An additional benefit of using Java classes is that Jackson will throw an error when it encounters an unknown property.
This is a great safeguard against typos and unsanitized data. If needed, this behavior can be turned off:
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
Update and partial update
It’s a common use-case to override some properties of an object without affecting others. With Jackson, we can do just that:
Person original =
new ObjectMapper().readValue(new File("record.json"), Person.class);
Map<String, Object> overrides =
Map.of("name", "John");
Person updated =
new ObjectMapper().updateValue(original, overrides);
As a result of this snippet, the name field of the original object is set to “John” without changing its other values. As in the Map-based example, updateValue modifies the original object and returns a reference to it.
This technique can also be used to copy all non-null fields of an object into another object:
Person original =
new ObjectMapper().readValue(new File("record.json"), Person.class);
Person etalon = /* … */
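// Convert to a typed Map first: a raw Map cannot be streamed with typed lambdas.
// TypeReference comes from com.fasterxml.jackson.core.type.
Map<String, Object> etalonValues =
new ObjectMapper().convertValue(etalon, new TypeReference<Map<String, Object>>() {});
// Keep only the non-null entries (Collectors.toMap rejects null values anyway).
Map<String, Object> overrides =
etalonValues.entrySet().stream()
.filter(entry -> entry.getValue() != null)
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));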
Person updated = new ObjectMapper().updateValue(original, overrides);
Calculate diff of objects
A typical pain point with data classes is that you have to provide equals and hashCode methods manually in order to work with conventional equality checks. Moreover, an equality check only tells you that two objects differ, not which fields are different.
DiffBuilder from Apache Commons Lang 3 is designed to solve exactly this problem.
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.9</version>
</dependency>
The following snippet illustrates how to get the difference of two objects:
List<Diff<?>> diff =
new ReflectionDiffBuilder<>(original, etalon, new RecursiveToStringStyle())
.build()
.getDiffs();
The library also has ReflectionToStringBuilder, which enables pretty-printing objects without an explicit toString method:
String dtoAsString =
ReflectionToStringBuilder.toString(original, new RecursiveToStringStyle());
As an alternative, Jackson might be used to transform the object to its JSON representation.
Flat versus hierarchical data
All techniques presented in this post support flat data structures, but working with hierarchical data can be more challenging.
For example, merging nested Maps with Jackson works as expected:
Map<String, Object> original =
new ObjectMapper().readValue(new File("record.json"), Map.class);
Map<String, Object> overrides =
Map.of("address", Map.of("city", "Birmingham"));
Map<String, Object> updated =
new ObjectMapper().updateValue(original, overrides);
// updated = {
// "name": "John",
// "age": 25,
// "address": {
// "city": "Birmingham",
// "country" : "United Kingdom"
// }
//}
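To spell out what "deep merge" means for nested Maps, here is a minimal JDK-only sketch of the semantics. It is not Jackson's implementation, and the DeepMerge class is made up for illustration. Unlike updateValue, which modifies the original object, this sketch returns a new Map and leaves its inputs untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DeepMerge {
    // Overrides win for plain values. When both sides hold a Map under the
    // same key, the two Maps are merged recursively instead of replaced.
    static Map<String, Object> merge(Map<String, Object> base, Map<String, Object> overrides) {
        Map<String, Object> result = new LinkedHashMap<>(base);
        for (Map.Entry<String, Object> entry : overrides.entrySet()) {
            Object current = result.get(entry.getKey());
            Object incoming = entry.getValue();
            if (current instanceof Map && incoming instanceof Map) {
                @SuppressWarnings("unchecked") // sketch assumes String keys throughout
                Map<String, Object> merged =
                        merge((Map<String, Object>) current, (Map<String, Object>) incoming);
                result.put(entry.getKey(), merged);
            } else {
                result.put(entry.getKey(), incoming);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> original = Map.of(
                "name", "John",
                "address", Map.of("city", "London", "country", "United Kingdom"));
        Map<String, Object> overrides = Map.of("address", Map.of("city", "Birmingham"));

        Map<?, ?> address = (Map<?, ?>) merge(original, overrides).get("address");
        System.out.println(address.get("city"));    // Birmingham
        System.out.println(address.get("country")); // United Kingdom
    }
}
```

A shallow merge would skip the recursive branch and simply put the incoming Map, which is how the country value gets lost.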
But for POJOs, merging only handles shallow data structures by default. To deep merge POJOs, use one of the following options:
- annotate a specific field with @JsonMerge
- enable deep merging of a specific type with
objectMapper.configOverride(MyNestedClass.class).setMergeable(true);
- enable deep merging by default with
objectMapper.setDefaultMergeable(true);
Without any of these, the whole nested object is replaced, and its unspecified properties end up as null:
// Default configuration: no merge settings are applied
ObjectMapper objectMapper = new ObjectMapper();
Person original =
objectMapper.readValue(new File("record.json"), Person.class);
Map<String, Object> overrides =
Map.of("address", Map.of("city", "Birmingham"));
Person updated =
objectMapper.updateValue(original, overrides);
// updated = {
// "name": "John",
// "age": 25,
// "address": {
// "city": "Birmingham",
// "country" : null // unrelated nested field got nulled
// }
//}
On top of that, diffing with Maps.difference and DiffBuilder does not descend into nested structures. If a single property differs in a nested Map or POJO, the whole field is marked as different. For example:
// diff = {
// name=(Robert, John),
// age=(25, 23),
// address=({city=London, country=United Kingdom}, {city=Birmingham, country=United Kingdom})}
Notice that the country field is the same in both addresses, yet it still appears in the report, adding some noise.
These diff results can be improved with some tricks:
- if the structured data only has a few, well defined nested properties, one might diff them separately and merge the diff results
- when diffing Maps, flatten the hierarchical structure
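The flattening trick can be sketched with the JDK alone. The MapFlattener class below is made up for illustration. It joins nested keys with a dot, which assumes that the keys themselves contain no dots.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapFlattener {
    // Turns {address={city=London}} into {address.city=London}.
    static Map<String, Object> flatten(Map<String, Object> source) {
        Map<String, Object> target = new LinkedHashMap<>();
        flattenInto("", source, target);
        return target;
    }

    private static void flattenInto(String prefix, Map<?, ?> source, Map<String, Object> target) {
        for (Map.Entry<?, ?> entry : source.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Descend into nested Maps; only leaf values end up in the result.
                flattenInto(key, (Map<?, ?>) value, target);
            } else {
                target.put(key, value);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of(
                "name", "John",
                "address", Map.of("city", "London", "country", "United Kingdom"));
        System.out.println(flatten(record).get("address.city")); // London
    }
}
```

If both Maps are flattened like this before being passed to Maps.difference, the report should list only the differing leaf entries, such as address.city, instead of the whole address.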
Alternatively, convert the data to JSON and use JSON-P, which does provide deep comparison.
To do that, add the following dependencies:
<dependency>
<groupId>javax.json</groupId>
<artifactId>javax.json-api</artifactId>
<version>1.1.2</version>
</dependency>
<dependency>
<groupId>org.glassfish</groupId>
<artifactId>javax.json</artifactId>
<version>1.1.2</version>
</dependency>
Then, use it as follows:
JsonStructure original = Json.createReader(...).read();
JsonStructure etalon = Json.createReader(...).read();
JsonPatch diff = Json.createDiff(original, etalon);
// diff = [
// {"op":"replace","path":"/name","value":"John"},
// {"op":"replace","path":"/age","value":23},
// {"op":"replace","path":"/address/city","value":"Birmingham"}]
Conclusion
Jackson, Guava, Apache Commons Lang and JSON-P provide handy features to process, update and compare ad-hoc hierarchical data.