The Intersection of architecture and implementation [eng]
Talk presentation
A common saying by software architects is “that’s an implementation detail”. All too often we treat software architecture and implementation as two separate things, where implementation is something that happens once a software architecture is defined. In fact, it’s the other way around: software architecture should be viewed as a first draft, where implementation reveals more details and refinements. In this provocative keynote Mark Richards discusses the intersection of architecture and implementation, and how the two must be in constant alignment to achieve success. Through real-world examples, he shows how implementation can get out of alignment with the architecture, causing the system to fail to achieve its desired goals. He then shows some techniques and tools to help ensure alignment between architecture and implementation.
- Mark Richards is an experienced, hands-on software architect involved in the architecture, design, and implementation of microservices architectures, service-oriented architectures, and distributed systems in a variety of technologies.
- Mark is the founder of DeveloperToArchitect.com, a free website devoted to helping developers in the journey to becoming a software architect.
- He is the author of numerous technical books and videos, including the Fundamentals of Software Architecture, Software Architecture Fundamentals Video Series, and several books and videos on microservices as well as enterprise messaging.
- In addition to hands-on consulting, Mark is also a conference speaker and trainer, having spoken at hundreds of conferences and user groups around the world on a variety of enterprise-related technical topics.
Talk transcription
Wonderful. Thank you so much, and welcome everyone. Especially, thank you for having me at the FW Days Software Architecture Conference. Today, in this session, I want to talk about the intersection of architecture and implementation: what that means and how important it is for the success of an architecture. Let me show you a scenario you're probably familiar with. We have a business sponsor who says, "We need an architecture that can scale to 500,000 users with an average response time of 600 milliseconds under full load." The architect says, "No problem, I'm on it," and creates an architecture. Now, the architect meets with the development team and says, "Here's the new architecture for the system," and goes about describing the architecture.
Of course, the development team does have some questions. They ask, "What protocol should we use between services?" Now, for the first quiz question: What do you suppose the response is from the software architect? Well, most of you have probably said two words, "It depends," which is usually the response. But there's another response that the software architect gives: "Oh, that's an implementation detail." However, the development team still has some further questions as more and more of the system gets implemented. "By the way, what persistence framework should we probably use?" "Again, that's also an implementation detail," says the architect. And time goes on. The development team asks, "What third-party libraries and services should we probably be using?" Of course, the architect says, again, "That's an implementation detail. Look, architecture is my job. Implementation is yours. Now, go and implement."
Well, of course, the development team starts making their own choices and does, in fact, implement the system. It gets released into production. And right away, this happens. And then a little bit later, this happens. And pretty soon, the entire system comes crashing down. A complete failure. Why does this common thing occur where architecture gets misaligned with implementation? The main reason has to do with isomorphism. Now, isomorphism is a word with Greek roots: it comes from "isos," meaning equal, and "morph," meaning form or shape. This funny word, isomorphism, really asks, "How close does the shape of one thing match the shape of the other thing?" All too often, this is like putting a square peg in a round hole.
For many, many years, I've been talking a lot about architecture to domain isomorphism because when we go to choose a software architecture, how close does the shape of that architecture match the shape of the problem domain? Every problem domain, something we're trying to solve with software, has a particular shape. Every architecture style also has a particular shape. For example, with microservices, the shape of this architecture is single-purpose functions deployed as separate units of software, with each unit owning its own data. That's the shape of this architecture. Does it match the shape of the problem you're trying to solve?
So, in fact, with the traditional n-tiered layered architecture that's still so popular, the shape of this architecture is a single deployable unit with functionality grouped by technical categories. Well, in this session, I'd like to talk about architecture to implementation isomorphism and what this means. How close does the shape of the architecture match the shape of the implementation? Because, as you saw in the intro, too many times these two shapes get out of alignment, and nothing works. So in this session, what I want to show you are some case studies, some real-world examples that have happened, of where the architecture and the implementation can easily get out of alignment.
I want to show you what I mean. In the last part of this session, I want to show you why this misalignment happens so frequently, and I'll share a few techniques for how to really keep architecture in alignment with its corresponding implementation. So let me show you how this can occur so easily, so that you see some examples of really what I mean when I say, "Does the architecture shape match the implementation shape?"
Here's our first case study: The business says, "We're expecting anywhere from several thousand to upwards of a million concurrent customers in the new system; that's our problem." The architect goes to our star ratings chart, which we published in our book "The Fundamentals of Software Architecture," where we rated certain architecture characteristics (ilities) or non-functional requirements. For scalability and elasticity, we took every architecture style and rated them. Five stars mean it's really good, and one star means it's not well supported. We know we need high levels of scalability and elasticity, going from several hundred customers to upwards of a million and then maybe back down to several hundred or thousand. That describes an elastic system.
Well, it turns out, from doing qualitative analysis, microservices seem like a good choice for this kind of problem. As a matter of fact, the architect chooses microservices, which was a good choice for the architecture because as we start increasing our load, these services can all scale due to their fine granularity. The startup time is fast, providing great elasticity, and we're only scaling those particular functions we need to scale. Thumbs up to the software architect; this is a good choice.
So, the developers start implementing this architecture. The first thing they notice is that as customers place orders, each order comes into the order placement service, but that order placement service needs to know what the inventory is. Is the item on back order? How many do we have left in stock? So, order placement has to synchronously ask inventory for the current inventory amounts, and inventory then sends that information back to order placement.
Well, the development team correctly notices this is too slow. There's a lot of latency between these two services, and these services are dependent on one another, hence they're tightly coupled. The development team identifies this as a big problem and considers how to solve it. They correctly notice that these services should be decoupled, and they choose to use in-memory replicated caching, where inventory has an in-memory cache of each item and its current inventory, and then the order placement has a read-only replica. They choose to use Hazelcast, another good choice, to keep these caches always in sync. As inventory gets updated, the inventory service updates its internal memory cache, and Hazelcast will, behind the scenes, keep those caches always in sync. This is much faster, providing nanosecond retrieval of inventory data, and order placement is no longer dependent on inventory.
This is fantastic. It's a great solution. So the development team starts implementing this particular solution. All of this gets released into production, and then this happens. You see, all that inventory is 500 megabytes of data. That's not considered a lot, but, between order placement and inventory, it's one gig. What happens? Well, we start to scale, and we continue to scale, and we continue to get more and more and more customers. You can start to see what's about to happen; all of a sudden, we simply run out of memory and, consequently, the entire system comes crashing down with a 503. This is a great example where the implementation got out of alignment with the architecture. The choice of the architecture to support these high levels of scalability and elasticity was a good choice, and the architecture itself will support these kinds of characteristics even upwards to a million.
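The memory growth in this case study can be sketched as simple arithmetic. This is only an illustration: the 500 MB cache size comes from the talk, while the function names, the instance counts, and the 64 GB budget are assumptions made up for the example.

```python
# Only the 500 MB figure comes from the talk; everything else is hypothetical.
CACHE_MB = 500  # size of the replicated inventory cache held by every instance


def total_cache_memory_mb(inventory_instances: int, order_instances: int) -> int:
    """Each service instance keeps its own full copy of the replicated cache."""
    return (inventory_instances + order_instances) * CACHE_MB


def max_instances(available_mb: int) -> int:
    """Number of cache-holding instances that fit into a memory budget."""
    return available_mb // CACHE_MB


# One instance of each service: the "one gig" mentioned in the talk.
baseline_mb = total_cache_memory_mb(1, 1)

# Hypothetical scale-out: 50 instances of each service.
scaled_mb = total_cache_memory_mb(50, 50)

# Hypothetical 64 GB memory budget for the whole cluster.
ceiling = max_instances(64 * 1024)
```

The point of the sketch is that memory use grows linearly with the number of instances, so an architecture chosen for elasticity ends up with a hard ceiling introduced by the implementation.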
The problem is the implementation put the focus on performance and decoupling, and unfortunately, they got misaligned. We're no longer isomorphic, and we didn't achieve our goals. This is one example of where architecture can get out of alignment with the corresponding implementation.
But I always tend to pick on microservices because that's mostly what I do. Let's look at another case study. Let's say the business says we have a very tight budget and a very tight time frame for this new venture, and we're expecting a significant amount of change to the underlying data structures as we learn more about this new line of business. When we make those database changes, we need those things to happen really fast.
The architect, once again, does qualitative analysis, comparing the quality of one thing to another, and realizes that cost and simplicity are really the main drivers; that's kind of the shape of our problem. But the other shape of that problem is the fact that we're going to experience a lot of change to a particular technical layer. Notice that domain partitioning gets one star for the layered architecture, which means this style is partitioned by technical category rather than by domain, and that matches the kind of change we're expecting. So the architect chooses the layered architecture.
With closed layers, we get layers of isolation; we can isolate specific technical categories, which gives us great change control. We know that we'll be continually and frequently changing the database. By leveraging the layered architecture with closed layers to get those layers of isolation, I can isolate that change only to the persistence layer and move quickly with those changes. This is great stuff. However, during implementation, the presentation developers for the UI say, "Now, you know what, this is way too slow. I'm unnecessarily going through all these layers. I get much better performance simply by calling the database directly."
So they start implementing the solution that way. The back-end developers and shared services say, "You know, this is really annoying, having all of our SQL logic separate from our business logic. It's so much easier to maintain having these together." So they start joining that code together and start implementing that way. Well, of course, this all goes into production and gets released. And you can guess what happens. A change, of course, is made to the database. The persistence layer, of course, has to change. But because the layers of isolation and closed layers were not respected in the implementation of that architecture, change is propagated throughout the entire system. And, of course, this turns into one big epic failure, where stakeholders or product owners, whoever is asking for that database change, are waiting for it forever.
This is another good example where the architecture did support these concerns of isolation and change control, but the implementation did not. It focused rather on performance and convenience, meaning that the architecture, which was sound and the correct choice, was misaligned with the implementation of that architecture. So how does this happen? And how can we prevent this from happening? Let me show you three reasons why this problem occurs and a few techniques for how to resolve this issue and keep architecture and implementation aligned.
The first reason that this occurs is a lack of communication. Let me show you what I mean and share some tips. You see, the software architect says, "Well, the most effective way to deal with that constant database change is to leverage closed layers in the architecture." And that architect is correct. "I've got to let the development team know." This is about communication. But how do I tell them about this? Turns out, you just heard this about 20 minutes ago in the Q&A: architecture decision records. So I'm so happy that you had mentioned these in the prior talk. Architecture decision records, ADRs, to the rescue. Let me show you a couple of aspects of these, though, that you may not have heard of or may not realize.
So an architecture decision record, an ADR, is a short text file. Each decision you make as an architect has a corresponding record, a corresponding file. It has the title of what the decision is; a status, which we'll be looking at, by the way, in the next problem; a context, which is a description of the problem and what alternatives I thought about; the decision, which gives me the justification of what I'm trying to do, the reason why I'm making it; and the consequences, which gives me a placeholder to document my trade-off analysis and the impacts of the decision I made. Let me tell you a little secret. You see, if developers don't know why you made a decision as an architect, they're less likely to agree with that decision. The ADR is where I capture that why, for example, whether I chose to sacrifice performance for better scalability or sacrifice performance for better security. And that was part of my justification, that trade-off analysis. An architecture decision record not only acts as a great communication vehicle to let development teams know why I made a decision, but also that the decision itself was made, and it can absolutely help in aligning the implementation of the architecture with the architecture itself.
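The ADR sections listed above can be sketched as a small data structure that renders to a short text file. This is a minimal illustration, not a standard ADR tool: the `ADR` class, its `render` method, and the wording of the sample record are assumptions, with the sample content loosely based on the closed-layers case study from the talk.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    """Hypothetical container for the five ADR sections named in the talk."""
    title: str
    status: str
    context: str
    decision: str
    consequences: str

    def render(self) -> str:
        """Render the record as a short plain-text document."""
        sections = [
            ("Title", self.title),
            ("Status", self.status),
            ("Context", self.context),
            ("Decision", self.decision),
            ("Consequences", self.consequences),
        ]
        return "\n\n".join(f"{name}\n{body}" for name, body in sections)


# Sample record; the wording is illustrative only.
adr = ADR(
    title="Use closed layers in the layered architecture",
    status="Accepted",
    context="The database structure is expected to change frequently. "
            "Alternative considered: open layers with direct database access.",
    decision="Every request passes through each layer in order, because this "
             "confines database changes to the persistence layer.",
    consequences="Requests pay the cost of passing through every layer "
                 "(performance) in exchange for isolation of change.",
)

text = adr.render()
print(text)
```

Keeping the justification in the decision section and the trade-offs in the consequences section is what lets the record carry the why, not just the what.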
Now, this problem, communication, leads to the second reason why this misalignment occurs. And that is collaboration. Collaboration is much different than communication. Communication is telling somebody something, what we just saw, and being able to justify why I made a decision or why we're doing something. Collaboration is actually working with someone on a solution. Let me show you what I mean. And I'm going to share some tips here as well. So we have the software architect over here and software development over there. Now, these are two separate roles. Even in this conference, we're talking about techniques for software architecture. A software architect determines, for example, what architecture characteristics are important. What architecture style should we choose that's aligned with the business problem? And also, what are those building blocks and components? The architect does all this work and hands these artifacts over to the software developer. Now, the developers take these artifacts and figure out, "How am I going to implement this through class files or class design?" Maybe it's a user interface design. And of course, the implementation itself.
This is exactly why implementation gets misaligned with architecture. And it's this line right here. You see, I drew a gray area between architecture and development. But it's not really a gray area at all, everybody. In fact, it turns out that it's actually a chasm, a big canyon separating these two camps. And while some architecture decisions do effectively get conveyed to software development teams, unfortunately, most of those end up right inside the chasm. Even worse, as the implementation of our architecture happens, those decisions to change things never get to the architect. They go right into that chasm.
To get the proper alignment of implementation and architecture, we first of all need to fill in this chasm and form a bidirectional relationship between software architects and developers, working together on the same virtual team, so that the decisions about an architecture are easily conveyed to development teams, and implementation changes and alignment are easily conveyed to the architect. This also helps facilitate things like leadership, mentoring, and coaching. Now, let me let you in on yet another secret. You see, if developers are not involved in the decision, they will be less likely to follow it. So we had communication, where I described the why. So developers might agree with it, as you can see with the OK here. But later, since they weren't involved in it, they find better ways of doing something.
Let me show you the problem and then describe a solution that we can use. So the software architect says this: "We're going to use request-reply messaging between our services because, here's the reason, it's faster and more scalable." "Actually," says the development team, "we decided to use REST because it's just as fast, and it scales better than messaging, as a matter of fact. Go look it up on Google or ChatGPT." "But, but," says the architect, "this is my decision. And messaging is faster and more scalable in our environment." "Whatever," says the development team, "we're moving forward with REST." What is the architect to do now? This kind of situation and conflict occurs all the time.
One way to solve this problem and help with collaboration, and as a matter of fact, I'll show you two ways, is DDD. Now, most of you might think of DDD as domain-driven design, but here it stands for "demonstration defeats discussion." Let's listen to this conversation. "I ran some benchmarks," says the software architect, "between REST and messaging in our production environment. Take a look at this. So I ran response times against user load in production, and here's what I found."
"So it turns out that this over here is our expected load, and this is our maximum response time for our service level agreement. And as you'll notice, messaging in our environment stays pretty far below that, whereas REST is pretty far above that max. So, based on this," says the architect, "I'm thinking that we should probably use messaging to communicate between our services. What do you think?" Now we see that hook: "What do you think?" I, as the architect, am showing, demonstrating a solution, but instead of telling the development team, I'm asking them. "Well," says the development team, "we were thinking of using REST, but you're right. Based on this data, messaging is a better choice."
You know, there's a second way that we can actually collaborate. And that is, again, architecture decision records to the rescue. You see, we have a status here that's normally proposed, accepted, or superseded. Well, one of the things we can leverage is a status called request for comment, by a certain date. This allows the development team to review the decision, be involved in it, criticize it, validate it, and put in comments about what they think is good or bad. And then by that date, I incorporate all of those comments, which is yet another form of collaboration if we're not together. And that then goes into proposed or accepted. Proposed usually goes to an architecture review board, and from there on to accepted.
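The status flow described above can be sketched as a tiny state machine. This is one possible reading, not a prescribed ADR workflow: the `Status` enum, the `ALLOWED` transition table, the `next_status` helper, and the rule that a request for comment cannot move on before its deadline are all assumptions for illustration.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    RFC = "Request for comment"
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    SUPERSEDED = "Superseded"


# Hypothetical transition table based on the flow described in the talk.
ALLOWED = {
    Status.RFC: {Status.PROPOSED, Status.ACCEPTED},
    Status.PROPOSED: {Status.ACCEPTED},
    Status.ACCEPTED: {Status.SUPERSEDED},
    Status.SUPERSEDED: set(),
}


def next_status(current: Status, target: Status, today: date,
                rfc_deadline: Optional[date] = None) -> Status:
    """Return the target status if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # Assumption: comments stay open until the deadline has been reached.
    if current is Status.RFC and rfc_deadline is not None and today < rfc_deadline:
        raise ValueError("comment period is still open")
    return target


# After the deadline, the record may move on to proposed.
moved = next_status(Status.RFC, Status.PROPOSED,
                    today=date(2024, 3, 15), rfc_deadline=date(2024, 3, 10))
```

Encoding the deadline in the record itself gives developers a clear window in which their comments will be incorporated.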
You know, even a third way of doing this collaboration is something called architecture risk storming, where we can involve developers in a risk storming exercise to assess and help identify risk. Now, I realize I've got about three minutes left of this session before our Q&A, but I will refer you to Simon Brown's book, Software Architecture for Developers, in which he describes this whole process of risk storming. It's a great way of introducing that collaboration with developers, and yet another technique for aligning our implementation with architecture.
So to wrap up, one last reason for this misalignment is the lack of architectural governance, hopefully automated. So the software architect says, "I hope everybody's complying with all that work that we did involving everybody, communicating effectively. And hopefully we're scaling fine. And with all these coding changes, I hope that we're still able to meet our 1 million users." The solution to this is architecture fitness functions. These are an objective integrity assessment of some architecture characteristic. A lot of tools that we use for observability can help us. We can use them to write our own metrics, which are in fact fitness functions, or we can choose to write these ourselves in a custom mode, by leveraging something like Kafka to stream certain observability metrics, which we capture, do our own kind of custom analysis on, whatever that may be, and start tracking over time. And when something goes awry, that's when we notify the architect.
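A custom fitness function of this kind might, for instance, track response times and notify someone when a new sample drifts too far from the historical mean. The sketch below is an assumption-laden illustration: the function names, the two-standard-deviation threshold, the `notify` callback, and the sample values are hypothetical, and a real setup would read metrics from an observability pipeline rather than a list.

```python
from statistics import mean, stdev


def out_of_band(history, latest, max_sigma=2.0):
    """True when `latest` lies more than `max_sigma` standard deviations
    away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > max_sigma * sigma


def check_response_time(history, latest, notify):
    """Fitness function: call `notify` when the latest sample is out of band."""
    if out_of_band(history, latest):
        notify(f"response time {latest} ms is outside the expected band")
        return True
    return False


# Hypothetical response time samples in milliseconds.
samples = [100, 102, 98, 101, 99]
alerts = []

check_response_time(samples, 101, alerts.append)  # within the band, no alert
check_response_time(samples, 150, alerts.append)  # out of band, alert recorded
```

Because the check is objective and automated, it can keep running in production without anyone having to remember to look.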
Two good examples to kind of close here. The first is response time trend analysis for elasticity and scalability. This is something that just continues to run in production. No one's doing a thing. And we notice that while users increase, we keep within one to two standard deviations from the mean. It looks like we had a little hiccup here, but as we increased load, likely what happened was we started new instances. And once those instances started, the response time settled back down. A good example of elasticity. But there's other governance we can do in terms of the structural aspect. Do you remember the problem we had where we had a closed layered architecture? We could leverage tools like ArchUnit, NetArchTest, ArchUnitNET, and Sonargraph to write tests, fitness functions, in an automated fashion so that I can govern and ensure the alignment of the implementation with the architecture. And these are the kind of tests that we can run.
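The structural check for closed layers can be illustrated without any of the tools named above. The following is a plain-Python sketch, not the ArchUnit or NetArchTest API: the layer names, the assumption that each layer lives in a top-level package of the same name, and the `violations` helper are all hypothetical.

```python
import ast

# Hypothetical layers, top to bottom; each is assumed to be a top-level package.
LAYERS = ["presentation", "business", "persistence", "database"]


def imported_layers(source):
    """Return the set of layer packages imported by a piece of source code."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in LAYERS:
                found.add(top)
    return found


def violations(layer, source):
    """Closed layers: code may import only its own layer and the one directly below."""
    index = LAYERS.index(layer)
    allowed = {layer}
    if index + 1 < len(LAYERS):
        allowed.add(LAYERS[index + 1])
    return sorted(imported_layers(source) - allowed)


# A UI module that skips the business layer and goes straight to the database.
ui_code = "from business.orders import place_order\nimport database.connection\n"
ui_violations = violations("presentation", ui_code)
```

Run as part of a build, a check like this fails as soon as the presentation layer reaches past the layer directly beneath it, which is exactly the shortcut from the second case study.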
I kind of want to close, because I am now out of time, with one last statement and quote. And this is from our book, Fundamentals of Software Architecture. Neal Ford and I say this: "Developers should never take components designed by architects as the last word. Rather, the initial design should be viewed as a first draft, where implementation reveals more details and refinements." And the last tip is to embrace continuous architectural change. The architecture must be in alignment with the implementation. And the architecture is not always right. This requires communication, collaboration, and strong automated governance. So thank you all so much. I'm happy to entertain any questions that are up. And that's the talk about alignment.