Successful software design is all about trade-offs. In the typical distributed system (if there is such a thing), recognizing the trade-offs inherent in your architecture is integral to the success of your system. Despite this reality, I see developers, time and time again, choose a particular solution out of an ill-placed belief that it is a "silver bullet", a solution that conquers all, despite the inevitable arrival of changing requirements. Whatever the reasons behind this phenomenon, I'd like to outline a few of the methods I use to make sound, scalable decisions without losing sight of the trade-offs that accompany them. I'd also like to compile (pun intended) the issues at hand into a simple theorem that describes this oft-occurring situation.
First, though, let me be clear about what I'm trying to accomplish by coining a theorem to describe the above reality. My thoughts on this are hardly original. In fact, I'd say these issues are practically standardized; they're the sort the average system architect is simply expected to overcome. What I am proposing, however, is that we can take a number of these descriptive core issues and sum them up in a single, easy-to-remember theorem.
Now, let me explain why trade-offs deserve careful consideration when designing a distributed system. For every problem a system is designed to solve, a limitation is added to the system. By this I mean that if your system solves one issue, the code and/or data put in place to do so may require us to write additional code, or add additional data, to remove its unintended effects. But by adding that extra code and/or data, we've effectively obligated ourselves to write still more code, or add still more data, to remove the consequences of the previous addition; and on and on, until, as you can see, we've hit an infinite loop of adding to remove.
To better explain this, take for example a system designed to solve a real-time messaging problem. The architecture we've settled on is XMPP-based. In doing so, we've effectively abstracted away the database to the point of rendering it nearly useless for reporting purposes. So, by choosing our XMPP-based architecture, we've made it more difficult to report on the events occurring within our system, even though the system very specifically solves the initial problem.
Now let's assume that this system grows large and the inevitable happens: we need to monitor our system and report on its usage. We must now add code to capture every event and dump it into a database. This database does what we need, but then the inevitable happens again: we need to add a feature that involves a separate component, one that runs as a web service and connects to the system via XMPP. So we write code for this feature, and we also extend the monitoring portion of the system so that it can successfully monitor our newly added web service. As you can see, the monitoring portion of our system must itself scale, both with the system's usage and with the system's addition of features. That is our limitation: every addition now requires additional code for the monitoring system as well.
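The pattern above can be sketched in a few lines of code. This is a toy illustration, not real XMPP tooling; the names (`Monitor`, `watch`, `record_event`, the component names) are all hypothetical, chosen only to show how each new component forces a matching change in the monitoring code:

```python
class Monitor:
    """Toy stand-in for the monitoring sub-system described above."""

    def __init__(self):
        self.events = []      # stand-in for the reporting database
        self.watched = set()  # components the monitor knows about

    def watch(self, component_name):
        # Every new component forces a change here: the monitor
        # itself must scale with the system it observes.
        self.watched.add(component_name)

    def record_event(self, component_name, event):
        if component_name not in self.watched:
            raise ValueError(f"no monitoring for {component_name!r}")
        self.events.append((component_name, event))


monitor = Monitor()
monitor.watch("chat-server")
monitor.record_event("chat-server", "user-login")

# Adding the web-service feature means also extending the monitor;
# forgetting to do so would make record_event raise ValueError.
monitor.watch("web-service")
monitor.record_event("web-service", "api-call")
```

The point is not the particular design; it's that the `watch` call is a cost paid on every feature addition, which is exactly the limitation the example describes.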
The above is a simplified example, but I hope it helps you understand the issue: there are trade-offs involved in every aspect of the design and development process. This is not to say that every problem we solve in code will have an equal impact on our use of the system. Sometimes we'll add a feature whose resulting limitation we never actually run into. That is ideal, and we try to write code that produces only those kinds of limitations. What I'm trying to discuss here are the broader design issues that aren't of that ilk.
Breaking this concept down into its core issues is crucial to fully understanding why trade-offs must be recognized within your system. The core issues are as follows:
- Horizontally scaling a system complicates its continued maintenance, as more components are necessarily added to the system, increasing the overall system's size.
- As a system grows in size, monitoring becomes more difficult, eventually becoming a sub-system in and of itself.
- As a system grows in size, adding features becomes a more time-intensive and difficult task. This is due to the instantaneous volume of additional data produced when a feature is rolled out across a system's existing components, especially in a system with a large number of concurrent clients or users.
- As a system grows in size, refactoring or reorganizing the system's components and/or their data involves taking down and subsequently re-launching each affected component and/or its data.
We'll define the terms used above – Data, Horizontal Scalability, and Flexibility – as follows:
- Data – In the context of the above points, data means a system's code, running application instances, and file data (flat files, database data, etc.).
- Horizontal Scalability – To horizontally scale a system is to partition the system's usage along its topographical dimensions, among individual system components.
- Flexibility – A system's flexibility is a function of the ease with which the system's maintenance, monitoring, feature additions, and reorganization are carried out.
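Since Flexibility is defined as a function, we can sketch the claimed relationship as one. This is a toy model, not a measured law: the inverse relationship between component count and flexibility is an illustrative assumption, and `flexibility` is a hypothetical name:

```python
def flexibility(component_count):
    """Toy model: flexibility shrinks as a horizontally scaled
    system accumulates components. The 1/n shape is illustrative
    only; the real relationship is qualitative, not numeric."""
    assert component_count >= 1, "a system has at least one component"
    return 1.0 / component_count


lightly_partitioned = flexibility(2)    # small, barely scaled system
heavily_partitioned = flexibility(20)   # widely scaled-out system

# Scaling out reduced flexibility, per the core issues above.
assert heavily_partitioned < lightly_partitioned
```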
Now that we've properly defined these terms as they're used within the context of the subject at hand, we can state our theorem as follows:

Increasing a system's Horizontal Scalability Decreases the system's overall Flexibility.

I've coined this the I.H.S.D.F. Theorem (pronounced "is-deaf"). It's hardly creative, but I think it's certainly appropriate and easy to pronounce.
I'd like to hammer home that the decrease in overall system flexibility will likely be the result of multiple issues arising from horizontally scaling your system, and that this theorem is meant to state that reality simply. It also captures what is, in fact, the underlying basis of most other distributed-system issues.
I hope that this brief brain-dump has defined an easy-to-remember theorem that you can use to keep the realities of developing distributed systems in mind. I've been using it for a while now, and it has helped keep my head on straight on a few separate occasions.