Blogward

Engineering Chronicles - Chapter 0: Making design decisions with Microservices

We are starting our series on engineering topics by providing insights into the high-level design choices we face daily while building our supply chain software. In future episodes, we will dig into the problems and use cases that we face at Logward.


Architecture

Just as any physical structure would, software systems require architecture. Like many modern software systems, Logward uses a microservice architecture. In essence, this means that we break our system down into smaller, independent and interchangeable modules/services. These modules/services are typically made up of independent code, stored separately from other services, and communicate through APIs.


This is in contrast to a monolithic architecture, which is typically a single cohesive code base stored in the same place, with interconnected and interdependent components.

For us, the decision was easy due to the clear benefits of a microservices approach. These include improved flexibility/agility, increased speed of development, less risk, and scalability. Unlike most monolithic systems, Logward’s systems can function even if some services are non-functional, which not only reduces the risk of system failure, but also makes it easier to add and remove features while they are in development or during updates. It also enables us to focus on core components while relying on third-party apps for certain features. For example, we rely on another company for AIS data from satellites, connecting to them through an API. In the future, if we decided to launch our own satellite fleet (doubtful!), we could simply remove the API connection without affecting the rest of our software. The same is true if we wanted to change providers for a certain service, such as accounting software.

Lastly, from a management perspective it means you can easily create smaller, independent working groups for development, as it reduces the complexity of the decision making and its impacts, thereby removing the need for each engineer/team to know the whole system through and through. 

We could keep going, but let’s move on. In the meantime, a good technical read on the topic can be found at microservices.io. (Ref.)


When we started designing Logward’s architecture, we identified all the initial components we would need to build and realised there would be much more to consider than choosing microservice architecture. The following are the main concepts we seek to address when making decisions.

  1. Synchronous v/s Asynchronous

  2. Complex Event Processing

  3. Loose coupling v/s Tight coupling

  4. Stateful v/s Stateless

1. Synchronous (sync) v/s Asynchronous (async)


Choosing between a synchronous and an asynchronous approach is one of the most fundamental design decisions in computer architecture, especially for microservices, which require communication between systems.


Rather than give you a definition, we'll give you an example. Let's say an OMS (order management system) needs to communicate a transport request to a TMS (transport management system).

  • Sync / Blocking

In the case of a sync approach, the OMS would call the TMS with shipment details and wait till the shipment gets created successfully before performing further actions.


This helps us with transaction rollbacks and error handling. If shipment creation fails, further actions will not be executed, which avoids having to do even more error correction. However, it adds a direct dependency on the TMS's uptime as well as its response times.

Fig: Explaining the synchronous calls

  • Async / Non blocking

When designed using the async approach, the OMS spawns a thread for shipment creation in the TMS and proceeds to further steps, potentially simultaneously. This is also known as non-blocking, and it removes the dependency on uptime and response times. However, the potential problem with this approach is the missing rollback in case shipment validation fails.

Fig: Explaining async calls


This is why it is common practice in the industry to keep the validation synchronous and creation asynchronous.

Fig: Explaining the industry use case - a hybrid approach
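The hybrid approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Oms` and `Tms` classes, their method names and the shipment fields are hypothetical stand-ins, not Logward's actual services. Validation runs synchronously so a bad request fails before anything else happens, while creation is handed to a background thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class ValidationError(Exception):
    """Raised when the (hypothetical) TMS rejects a shipment request."""


class Tms:
    """Toy in-memory stand-in for a transport management system."""

    def __init__(self):
        self.shipments = []

    def validate(self, shipment):
        # Cheap checks that the caller waits for.
        if not shipment.get("origin") or not shipment.get("destination"):
            raise ValidationError("origin and destination are required")

    def create(self, shipment):
        # Slower work that runs in the background.
        self.shipments.append(shipment)
        return len(self.shipments)  # hypothetical shipment id


class Oms:
    """Toy order management system that talks to the TMS."""

    def __init__(self, tms):
        self.tms = tms
        self.pool = ThreadPoolExecutor(max_workers=2)

    def request_transport(self, shipment) -> Future:
        # Sync / blocking: fail fast so nothing needs rolling back later.
        self.tms.validate(shipment)
        # Async / non-blocking: hand creation to a worker thread and carry on.
        return self.pool.submit(self.tms.create, shipment)


oms = Oms(Tms())
future = oms.request_transport({"origin": "Hamburg", "destination": "Singapore"})
shipment_id = future.result()  # block only when the id is actually needed

try:
    oms.request_transport({"origin": "Hamburg"})
    rejected = False
except ValidationError:
    rejected = True  # the invalid request never reached the creation step
```

The trade-off is visible in `request_transport`: the OMS still depends on the TMS being reachable for validation, but no longer waits on the slower creation step.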

2. Complex event processing

One of the key challenges of any software program, including Logward, is handling complex event processing. In other words, it means ensuring that the cascading effects of events from one microservice are acknowledged by all microservices.

A real-life (software is real too…) example would be when a sales team (a stand-in for a microservice) closes a deal for which production capacity has to go up. Planning begins, but in the process, someone forgets to inform the procurement team (another microservice) and the whole expansion plan is at risk.

Fig: Explaining complex event processing

Similarly, in any software system it is critical that each microservice keeps the others updated on relevant information. But how do we coordinate this?


Coordination design solutions usually center around either an orchestration or choreography approach, although in some cases a mix of both is possible.

a. Choreography (aka Mesh architecture)

Choreography refers to a setup in which all the services can talk to each other, and for any given process an owner service is defined which triggers the event. Each service needs to be aware of the other services' response types and domains. Adapters need to be written in the respective language in order for the APIs to function for both incoming and outgoing messages.


This means that in most cases you end up writing client and domain libraries for each of the microservices, as well as ensuring versioning and language compatibility.

Fig: Illustration of a system using Choreography

An example use case for a software system might be placing a booking. The user enters the necessary information into our user interface, which triggers a process.


If we’ve done our job right, the system identifies that the owner for this process is the booking service. In turn, the booking service shares information with the schedules service, rates service, carrier service, workflow service, and so on. If a transaction fails, for example because no rates are found, the rates service will know to return a message to the schedules service instructing it to send a message to the booking service, which results in the user being informed.

In our real-life example of a big sale, perhaps a capacity increase event is always owned by the Production team. Once the Sales department records the sale, the Production team receives the information and begins the process by communicating with the HR department, which in turn reaches out to Finance. Each department knows which telephone number or email address to use to reach the others, and will provide feedback once information is available. Finance will tell HR what the budget can be, HR will hire accordingly and tell Production what the new staffing arrangement will be.
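The booking flow described above can be sketched as a small chain of services that call each other directly. This is a simplified illustration, not Logward's implementation: the class names, the lane codes and the response dictionaries are all assumptions made up for the example, and only three of the services are shown.

```python
class RatesService:
    def __init__(self, rates):
        self.rates = rates  # hypothetical lane -> price table

    def quote(self, lane):
        if lane not in self.rates:
            return {"ok": False, "error": "no rates found"}
        return {"ok": True, "rate": self.rates[lane]}


class SchedulesService:
    def __init__(self, rates_service):
        # Choreography: this service knows its neighbour and its response shape.
        self.rates_service = rates_service

    def plan(self, lane):
        quote = self.rates_service.quote(lane)
        if not quote["ok"]:
            # Pass the failure back up the chain towards the owner service.
            return {"ok": False, "error": quote["error"]}
        return {"ok": True, "rate": quote["rate"]}


class BookingService:
    """Owner of the booking process: it triggers the chain."""

    def __init__(self, schedules_service):
        self.schedules_service = schedules_service

    def place_booking(self, lane):
        plan = self.schedules_service.plan(lane)
        if not plan["ok"]:
            return f"Booking failed: {plan['error']}"
        return f"Booked {lane} at {plan['rate']}"


booking = BookingService(SchedulesService(RatesService({"HAM-SIN": 1200})))
ok_message = booking.place_booking("HAM-SIN")
failed_message = booking.place_booking("HAM-NYC")
```

Note how `SchedulesService` has to understand the shape of the `RatesService` response. That shared knowledge is the reason client and domain libraries tend to multiply in a choreographed system.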

b. Orchestration Architecture

In an orchestration architecture, there is a centralized composite service which triggers and connects the necessary services. Unlike in choreography, the individual services do not communicate with each other, but only with the orchestrator. The advantage here is that the microservices can continue performing their basic functions. As the name suggests, imagine a musical orchestra: rather than the violinists looking to the cellists for cues, they all watch the conductor giving instructions.

Fig: Illustration of a system using Orchestration


Returning to our booking example, after the booking process is started by a user in the customer interface the information is received by the Orchestrator, which then simultaneously contacts the various services, including the schedules, rates, carrier, and workflow services, gathering the necessary data. Once completed, the orchestrator then sends the information to the booking service.


The biggest advantage here is that the orchestrator need not understand the domain and each microservice can be composed of different technology, meaning you don’t have to keep exchanging libraries between microservices.


One of the key disadvantages of this approach is the difficulty of handling transaction rollback. Unlike in the choreography setup, the rates service will not necessarily know whether or not the carrier service was executed properly. Orchestration also increases coupling (which we will discuss below), and the orchestrator becomes the single point of failure. It also increases the response time, since each of the calls to the microservices is sync/blocking.


For our real-life capacity expansion example, the orchestrator might be the Management team, who takes the lead on communicating with the necessary departments and sticking to goals and deadlines.


Below is an example where we get enriched booking information from multiple microservices.

Fig: Usecase of orchestration to get booking details
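As a rough sketch of this use case, the orchestrator below contacts several services at the same time and merges their answers into one enriched booking. The three service functions, their return values and the `Orchestrator` class are hypothetical placeholders invented for the example; a real orchestrator would make network calls and handle failures.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical services: each knows only its own domain, not the others.
def schedules_service(booking_id):
    return {"sailing": "next available"}


def rates_service(booking_id):
    return {"rate": 1200}


def carrier_service(booking_id):
    return {"carrier": "Example Lines"}


class Orchestrator:
    """Central service that triggers the others and merges their answers."""

    def __init__(self, services):
        self.services = services

    def enriched_booking(self, booking_id):
        enriched = {"booking_id": booking_id}
        with ThreadPoolExecutor() as pool:
            # Contact every service at once, then wait for each answer.
            futures = [pool.submit(service, booking_id) for service in self.services]
            for future in futures:
                # The orchestrator merges results without interpreting them.
                enriched.update(future.result())
        return enriched


orchestrator = Orchestrator([schedules_service, rates_service, carrier_service])
details = orchestrator.enriched_booking("BK-1")
```

The services never reference each other, which is the point of the pattern. The cost is also visible: `enriched_booking` cannot return until every service has answered, and nothing works if the orchestrator itself is down.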

c. A hybrid model

In this model, the service which owns the event triggers the orchestrator, which proceeds to handle communication with the other microservices. At Logward we use a mix of both. For example, the booking service triggers an event which is consumed and orchestrated to the Allocation, Scheduler and Payment services.

Fig: Explaining hybrid model of orchestration and choreography

3. Loose-coupling versus Tight-coupling


Another important aspect of microservices architecture is determining how closely you want to couple your services. Loose coupling refers to mostly independent services that are not affected by and do not need to be informed about other services, whereas tight coupling means one service is affected by changes in others.


The tighter the coupling, the more dependency, and therefore the greater the impact of a single issue/error on the rest of the application. It also requires more integration points, as a single service/module requires connections with many others.


Coupling is a very broad term and is subject to interpretation. In the example below, we follow a product in the cart of an e-commerce website through the entire flow, down to the underlying TMS.


Fig: Flow of information


When you want to generate an invoice, you contact multiple microservices to gather information. One way is in an orchestrated fashion, and the other is through choreography. In both cases, invoice generation depends on all the microservices being up and running.

Fig: getInvoice usecase in tightly coupled systems example with orchestration

Fig: getInvoice usecase in tightly coupled systems example in mesh architecture


A loosely coupled system has microservices / sub-systems not dependent on each other for their functionality. A common example would be when you place an order on an e-commerce site and you get an email notification. Your order is processed irrespective of whether the email service works or not.

Fig: Loosely coupled services
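The e-commerce example can be illustrated with a short sketch. The `OrderService` and `EmailService` classes are hypothetical, and parking failed notifications in a list for a later retry is an assumption of this sketch rather than something the setup above prescribes.

```python
class EmailService:
    """Toy notification service that can be switched off to simulate an outage."""

    def __init__(self, up=True):
        self.up = up
        self.sent = []

    def send(self, order_id):
        if not self.up:
            raise ConnectionError("email service unavailable")
        self.sent.append(order_id)


class OrderService:
    def __init__(self, email_service):
        self.email_service = email_service
        self.orders = {}
        self.pending_notifications = []

    def place_order(self, order_id, items):
        # Core work: the order is stored no matter what happens next.
        self.orders[order_id] = items
        try:
            self.email_service.send(order_id)
        except ConnectionError:
            # Loose coupling: an email outage does not fail the order.
            self.pending_notifications.append(order_id)
        return order_id


shop = OrderService(EmailService(up=False))
shop.place_order("ORD-1", ["pallet jack"])
order_saved = "ORD-1" in shop.orders
```

Here the order is saved even though the email service is down, and the missed notification is simply kept for later.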


The advantages of such a setup are:

  • You can clearly segregate tier-1, tier-2, etc. services. E.g. your email service is not as important as your order management service.

  • You can integrate multiple clients with your services. E.g. you have a transport management system (TMS), order management system (OMS), inventory system, etc.

In the diagram below, you can see the information is passed on to a different microservice at the time of creation. This localizes the way the information is used in the system. If you have to generate an invoice, you only call the OMS and it fetches all the data from its DB.

This removes all kinds of dependencies on other systems.

Fig: getInvoice usecase in loosely coupled system
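A minimal sketch of this idea follows, assuming a hypothetical `Oms` class whose "DB" is just an in-memory dictionary and whose field names are invented for the example. The data owned by other systems is copied into the OMS when the order is created, so `get_invoice` never has to call them.

```python
class Oms:
    """Toy order management system with its own local data store."""

    def __init__(self):
        self.db = {}  # in-memory dict standing in for the OMS database

    def create_order(self, order_id, product, price, shipping_cost):
        # Information from other systems is passed in at creation time
        # and stored locally.
        self.db[order_id] = {
            "product": product,
            "price": price,
            "shipping_cost": shipping_cost,
        }

    def get_invoice(self, order_id):
        # Only the local store is read: no calls to other services.
        order = self.db[order_id]
        return {
            "order_id": order_id,
            "product": order["product"],
            "total": order["price"] + order["shipping_cost"],
        }


oms = Oms()
oms.create_order("ORD-7", "forklift", 5000, 250)
invoice = oms.get_invoice("ORD-7")
```

The price of this independence is that the local copy can drift from the source systems if the data changes after creation.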


Perhaps the biggest disadvantage is keeping the data in sync with the downstream systems. However, in most cases, like the example here, the information is stateful, which we'll explain below.

4. Stateful v/s Stateless systems


Finally, it’s important to note the distinction between stateful and stateless systems. Although it sounds like a term from political science, in software architecture it essentially means whether a service stores persistent data or not. 


In the case of stateless systems, the value of the output depends only on the value of the input, at any point in time. In other words, a stateless system typically maintains the data necessary to perform its function, but does not store historical data or maintain and update ongoing requests, such as an e-commerce cart you can return to.


A typical example of a stateless service would be a currency converter. Currency data flows in and is converted using the current exchange rates, and is returned/sent to another service. There are no states associated or stored, and oftentimes only the current exchange rates are referenced.

However, in the case of stateful systems, the output depends on the input as well as on the internal state of the system, since it has some transactional data stored within it. Typical examples of stateful systems might include a session management service, cart management service, payment service, or a workflow service. Due to its function, such a service is required to maintain and update its state as needed.
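The difference is easy to see side by side. In the sketch below, `convert` and `Cart` are hypothetical examples rather than real Logward services: the converter returns the same output for the same input every time, while the cart's answer depends on what was added before.

```python
def convert(amount, rate):
    """Stateless: the output depends only on the inputs."""
    return amount * rate


class Cart:
    """Stateful: the output depends on the input and on earlier calls."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        return len(self.items)


first = convert(100, 1.25)
second = convert(100, 1.25)  # same inputs, same result, every time

cart = Cart()
count_after_first = cart.add("pallet")
count_after_second = cart.add("pallet")  # same input, different result
```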

To make it even more tangible, let’s say you need to build an order/shipment management system, something many supply chain software systems including Logward would have.

When you capture an order from a user, you store this in the order management system. We decouple stateful and stateless behavior of the order into two separate systems.


a. Mostly Stateless: An order store service maintains order information for showing / updating order related data. It need not have the business state of the booking and hence we can call it mostly stateless.

Fig: Getting standard information from booking


b. Stateful: A workflow service manages the status of the order from order placement to order completion and therefore requires both order related data and business status.

Fig: Different states of a booking
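A workflow service of this kind is often modelled as a small state machine. The sketch below is an assumption-heavy illustration: the four statuses and the allowed transitions are invented for the example and are not the actual states used at Logward.

```python
from enum import Enum


class Status(Enum):
    # Hypothetical statuses, from order placement to order completion.
    PLACED = "placed"
    CONFIRMED = "confirmed"
    IN_TRANSIT = "in_transit"
    COMPLETED = "completed"


# Which status may follow which (assumed for this sketch).
ALLOWED = {
    Status.PLACED: {Status.CONFIRMED},
    Status.CONFIRMED: {Status.IN_TRANSIT},
    Status.IN_TRANSIT: {Status.COMPLETED},
    Status.COMPLETED: set(),
}


class WorkflowService:
    """Stateful: remembers the business status of every order."""

    def __init__(self):
        self.status = {}

    def start(self, order_id):
        self.status[order_id] = Status.PLACED

    def advance(self, order_id, new_status):
        current = self.status[order_id]
        if new_status not in ALLOWED[current]:
            raise ValueError(f"cannot go from {current.value} to {new_status.value}")
        self.status[order_id] = new_status


workflow = WorkflowService()
workflow.start("ORD-1")
workflow.advance("ORD-1", Status.CONFIRMED)

try:
    workflow.advance("ORD-1", Status.COMPLETED)  # skips IN_TRANSIT
    skipped = True
except ValueError:
    skipped = False
```

Whether `advance` succeeds depends on the stored status, not just on its arguments, which is exactly what makes the service stateful.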


Conclusion


While this is by no means a comprehensive discussion of what one must consider when building software such as Logward, it does give insight into some of the key components.


Perhaps just as important as the technical aspects is building a team that can learn and adapt over time. In the end, software engineering is an ongoing (never-ending) process, with some design flaws being noticed early and others taking longer to recognize. By having fluid and open communication within the engineering team, we improve our chances of continuously improving our product.


Our next engineering blog post will tell you what happens behind the scenes when you place a booking on Logward.


References:

  1. https://microservices.io/, https://martinfowler.com/articles/microservices.html