Control your Data Structures

Complex logic should be implemented on data structures you have full control over: an internal Domain Model that you can tailor to your problem to simplify your code.

Key Terms

Here is an (opinionated) list of term definitions used in this article:

Domain = The area in code where you want to keep the most complex part of the logic of your application. A goal of any architecture is to clean up the environment in which the essential complexity of your system is implemented.
Data Transfer Object (DTO) = an object moving across system boundaries, for example marshalled as JSON, XML, or in a binary format.
API = The entry points to a system (REST endpoints or message queues) along with the DTOs involved. Many systems expose a REST API and call APIs of other systems in turn.
Domain Object (e.g. Value Object, Entity, Aggregate) = internal data structures of your application, often persisted in some database but otherwise invisible from outside the system.

Preview

The entire article is based on a single slide from my Design Patterns training. I am struggling, with little success so far, to find time to turn such ideas from my training and talks into blog entries, because I know many developers prefer reading to watching.

Context

The ideas in this article apply best to backend applications that implement (complex) business rules, state transitioning, and persist their data to some database. Complex mobile apps, as well as batch applications, are also in scope, given they implement a considerable amount of internal logic.

However, the article is of little use for designing single-page applications (e.g. React, Angular, Vue, vanilla…) since the amount of internal logic to implement is rather limited in their case, and they also lack internal persistent storage. The same applies to ‘aggregators’, systems that just enrich and combine data coming from other larger systems in order to consolidate a convenient view for their clients.

Reasons to Avoid DTOs

Imagine you start implementing complex application rules on top of DTOs, either those of your own API or those of a 3rd party API you call. For example, you implement elaborate validation and further data enhancement on DTOs, changing their state, or otherwise letting them travel deep through dozens or hundreds of complex lines of code. Many microservices I consulted on and reviewed in my training sessions attempted to do this. But pretty soon you realize that your central logic becomes polluted, and you start hating DTOs, for the reasons listed below. The alternative that I invite you to contemplate is to create new Domain Objects to which you map the DTOs, such that you enter your core logic with objects under your control.

So, why do we hate DTOs?

DTOs are Bloated

Most DTOs exposed by 3rd party APIs are more general and include more fields than the subset of attributes that you need in your application. It’s much easier to understand a method call taking an object containing 3 fields than one taking an argument with 12 fields (plus 2 children lists).

⭐ Favor small cohesive data structures in your Domain.
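A minimal sketch of this idea in Java (assuming Java 16+ for records). `SupplierCustomerDto`, `Customer`, and `CustomerMapper` are hypothetical names invented for illustration, not part of any real API: the DTO stands in for a wide 3rd party structure, and the Domain Object keeps only the three fields the core logic is assumed to need.

```java
// Hypothetical 3rd party DTO: carries far more than our application needs.
class SupplierCustomerDto {
    public String id;
    public String name;
    public String email;
    public String faxNumber;
    public String marketingSegment;
    public String legacyCode;
}

// Domain Object under our control: only the attributes our logic uses.
record Customer(String id, String name, String email) {}

// The mapping happens once, at the boundary of the Domain.
class CustomerMapper {
    static Customer toDomain(SupplierCustomerDto dto) {
        return new Customer(dto.id, dto.name, dto.email);
    }
}
```

Any method that takes a `Customer` now tells its reader exactly which three fields it may depend on.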

DTOs are Flat

A key concern in designing an API is backward compatibility, that is, not breaking existing clients when the API changes tomorrow. A service provider might, for example, send more attributes in its JSON response tomorrow, and no client should complain about that. Contrast the impact of adding such a field with changing the structure of its response from this:

{
  "name": "John DOE"
}

To this:

{  
  "fullName": {
    "firstName" : "John",
    "lastName" : "DOE"
  }
}

The second structure is deeper – it has a nested structure inside. Because of that, many agree that it’s more expressive. But it’s a breaking change in your API that will impact your clients.

A safer way to evolve the structure is to add fields to the same flat structure, like this:

{
  "name": "John DOE", // will be faded out
  "firstName": "John",
  "lastName": "DOE"
}

In short, DTOs are often flat, which can easily degenerate into structures with dozens of attributes. On the other hand, we prefer smaller structures in our Domain, which we get only if we more aggressively model the concepts we discover as nested objects. In other words, extracting a FullName {first,last} or Address{} is a great idea in our Domain.

⭐ Aim for deep expressive models in your Domain.
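As a rough illustration (Java 16+ records assumed), a flat DTO can be mapped into a deeper Domain model at the boundary. `PersonDto`, `Person`, `PersonMapper`, and the address fields are hypothetical; only the `FullName {first,last}` and `Address` ideas come from the text above.

```java
// Hypothetical flat DTO, shaped by backward-compatibility constraints.
record PersonDto(String firstName, String lastName,
                 String street, String city, String zipCode) {}

// Deeper Domain model: each discovered concept gets its own small type.
record FullName(String first, String last) {
    String display() {
        return first + " " + last;
    }
}

record Address(String street, String city, String zipCode) {}

record Person(FullName name, Address address) {}

class PersonMapper {
    static Person toDomain(PersonDto dto) {
        return new Person(
                new FullName(dto.firstName(), dto.lastName()),
                new Address(dto.street(), dto.city(), dto.zipCode()));
    }
}
```

Small types such as `FullName` also give a natural home to tiny bits of logic like `display()`.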

DTOs belong to a Different Bounded Context

Domain-Driven Design stresses that whenever words have different meanings for different parties, those parties might belong in different Bounded Contexts. To give a visual example that is hard to forget, imagine going to a restaurant with your loved one and ordering chicken. And what they bring you (straight from the ChickenFarmAPI) is a ChickenDto which doesn’t quite match your expectations. It’s still “chicken” but from another perspective:

Different meanings of the word “chicken” – in two Bounded Contexts

In code, the service you call might refer to uId as the user login string while you prefer the term username.

⭐ Objects in your Domain should express the reality in your language.
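The `uId` example could be translated at the boundary roughly like this (Java 16+ records assumed; `ExternalUserDto`, `User`, and `UserMapper` are hypothetical names):

```java
// Hypothetical DTO of the service we call: 'uId' is their name for the login string.
record ExternalUserDto(String uId) {}

// In our Domain the same concept is called 'username'.
record User(String username) {}

class UserMapper {
    // The only place in our code that needs to know the foreign term.
    static User toDomain(ExternalUserDto dto) {
        return new User(dto.uId());
    }
}
```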

DTOs are Anemic

I see today mainly 3 ways to create DTOs:

1. Write them manually
2. Import them from a ‘client-library’
3. Generate them from an OpenAPI/Swagger/WSDL spec

Option (3) makes it very hard to add custom code to the DTOs, while (2) makes it frankly impossible. But if you allow these objects to enter the core logic, you have to push behavior out to other places, leading to the dreaded Util or Helper classes, or accumulate tons of procedural code in your Services. Whenever the data structures are anemic, the logic piles up all the details that could otherwise be encapsulated in small handy methods, e.g. Product.canBeActivated():boolean. Explaining further the benefits of OOP in large projects is out of scope for this article.

⭐ Push bits of reusable domain logic inside your Domain Objects.

But even if you manually define the DTOs (1), you should NOT implement any logic on them, remember? (because of the other reasons in this list)
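To make the `Product.canBeActivated()` example concrete, here is one possible sketch of a Domain Object that carries its own small piece of logic. The fields, the `Status` enum, and the activation rule itself are assumptions invented for illustration; a real rule would come from your business.

```java
class Product {
    enum Status { DRAFT, ACTIVE, RETIRED }

    private final String name;
    private final long priceInCents;
    private final Status status;

    Product(String name, long priceInCents, Status status) {
        this.name = name;
        this.priceInCents = priceInCents;
        this.status = status;
    }

    // Hypothetical rule, invented for illustration:
    // only a draft product with a name and a positive price can be activated.
    boolean canBeActivated() {
        return status == Status.DRAFT
                && name != null && !name.isBlank()
                && priceInCents > 0;
    }
}
```

Callers ask `product.canBeActivated()` instead of repeating the three checks in a Service or a Util class.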

DTOs are Mutable

Well, that’s not true for all projects. I recently met an entire group that ‘never saw a setter on a DTO for a long time’. Indeed, it turns out that you can tell some frameworks (like Jackson in Java) to map to immutable structures via some annotations you add to your constructor or using a bit of magic.

However many (older) systems still have both getters and setters on their DTOs, especially if DTOs come bundled in a ‘client-library’ or if they are generated.

⭐ Design as many Domain Objects as possible as immutable structures.
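One way to sketch an immutable Domain Object, assuming Java 16+ records; the `ShippingAddress` type and its `withCity` method are hypothetical. A "change" produces a new instance and leaves the original untouched.

```java
// Records have final fields and no setters.
record ShippingAddress(String street, String city) {

    // Returns a modified copy instead of mutating this instance.
    ShippingAddress withCity(String newCity) {
        return new ShippingAddress(street, newCity);
    }
}
```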

There are a number of tricks here to keep in mind when using an ORM, but these are out of scope for this article. If curious, check this other talk of mine explaining how to model ‘leaf’ Value Objects as persistent embedded immutable objects.

DTOs are unconstrained

Data Transfer Objects, they transfer data. Amazing, huh? 😋 Their responsibility is NOT to validate data or be consistent.

However, in the API we expose, many Java teams choose to add validation annotations (eg @NotNull, @Size, @Pattern, …) to our ‘request’ DTOs. After that, we just ask the framework to check everything. Which is cool. 😎

DTOs of 3rd party APIs, however, typically carry no such constraints.

So any kind of DTO (ours or theirs) can present corrupted data when used inside the Domain.

⭐ In your Domain you only want to work with valid data structures.

But “valid” here means two things:

1. Simple global constraints, like ‘required field’
2. Enforcing consistency rules of my Domain, like ‘activated contract has an Activation Date set’

If you explore the second rule in the context of Domain-Driven Design you will discover Aggregates and Domain Events – but going that deep is just an option.
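Without going as far as Aggregates, the ‘activated contract has an Activation Date set’ rule can be protected by the Domain Object itself. The sketch below is an assumption about how such a `Contract` might look (the class, its `Status` enum, and the `activate` method are hypothetical): status and date can only change together.

```java
import java.time.LocalDate;
import java.util.Objects;
import java.util.Optional;

class Contract {
    enum Status { DRAFT, ACTIVE }

    private Status status = Status.DRAFT;
    private LocalDate activationDate; // stays null while the contract is a DRAFT

    // The only way to become ACTIVE, so an active contract always has a date.
    void activate(LocalDate date) {
        if (status == Status.ACTIVE) {
            throw new IllegalStateException("Contract is already active");
        }
        this.activationDate = Objects.requireNonNull(date, "activation date is required");
        this.status = Status.ACTIVE;
    }

    Status getStatus() {
        return status;
    }

    Optional<LocalDate> getActivationDate() {
        return Optional.ofNullable(activationDate);
    }
}
```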

But here’s a more immediate benefit. If you choose to work with your own Domain Objects instead of DTOs, you can fight the null disease right in their constructors and setters. Once a field is guaranteed never to be null, it can safely be returned to the Domain logic via its getter. If, however, a field may be left unset, you can design getters that return Optional (Java), or a nullable type marked with “?” in Kotlin, TypeScript, and PHP.
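In Java this could look roughly as follows. `Contact` and its fields are hypothetical; `Objects.requireNonNull` and `Optional.ofNullable` are standard JDK calls.

```java
import java.util.Objects;
import java.util.Optional;

class Contact {
    private final String name;  // required: never null after construction
    private final String phone; // optional: may be absent

    Contact(String name, String phone) {
        this.name = Objects.requireNonNull(name, "name is required");
        this.phone = phone;
    }

    // Safe to return directly: the constructor guarantees it is not null.
    String getName() {
        return name;
    }

    // The return type tells the caller that the value may be missing.
    Optional<String> getPhone() {
        return Optional.ofNullable(phone);
    }
}
```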

Why?

To focus your core logic on solving your complex problems, shielded away from nulls, corrupted data, and data inconsistencies. The structures should support you to write cleaner code in the center of your system.

DTOs are out of your control

They might change for external reasons: you are forced to upgrade to V2. If you wrote thousands of lines of logic on those structures, good luck rewriting all your logic on the new structures. I’ve seen some dramatic cases here.

⭐ Build your core logic on stable structures.
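A sketch of how a mapping layer can absorb such an upgrade (Java 16+ records assumed). The V1 and V2 DTO shapes, `Order`, and `OrderAdapter` are all hypothetical; the point is only that both versions map to the same Domain Object, so the core logic does not change when the external API does.

```java
// Hypothetical V1 and V2 DTOs of the same external API.
record OrderDtoV1(String id, String customer, double total) {}
record OrderDtoV2(String orderId, String customerName, long totalCents) {}

// Our stable Domain Object: the core logic is written only against this.
record Order(String id, String customerName, long totalCents) {}

class OrderAdapter {
    static Order fromV1(OrderDtoV1 dto) {
        // V1 is assumed to send the total in currency units; we store cents.
        return new Order(dto.id(), dto.customer(), Math.round(dto.total() * 100));
    }

    static Order fromV2(OrderDtoV2 dto) {
        return new Order(dto.orderId(), dto.customerName(), dto.totalCents());
    }
}
```

Upgrading to V2 then means writing `fromV2` and switching the call at the boundary, rather than rewriting the logic.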

Conclusion

If you are having any of the issues above, then you are probably involving DTOs in complex logic. If you do, you should consider introducing a Domain Object that you extract from the DTO so that you can further clean up your core logic – that place that gives you the most bugs and headaches on change requests.

As with any design idea, I beg you to debate it with your colleagues before applying it, to avoid overengineering.

2 Comments

Maciej says:
08.04.2024 at 21:42


I feel the DTOs are like being a good driver. Much like how you cannot trust other cars to always behave as expected and always trying to be as clear in your intentions as possible, never trust data structures published to you and always be clear and communicate with the data structures you share outside.
1. victorrentea says:
  10.04.2024 at 11:29
  
  
  Nice metaphor.
  Also: Contract vs implementation, one of the oldest design principles of all.