Micro-Services Design Patterns
The core concepts and related design patterns
Context: Shared functionality and reusability
Offload shared or specialized service functionality to a gateway proxy.
This pattern can simplify application development by moving shared service functionality, such as the use of SSL certificates, from other parts of the application into the gateway.
Some features are commonly used across multiple services, and these features require configuration, management, and maintenance. A shared or specialized service that is distributed with every application deployment increases the administrative overhead and increases the likelihood of deployment error. Any updates to a shared feature must be deployed across all services that share that feature.
Solution
Offload some features into a gateway, particularly cross-cutting concerns such as certificate management, authentication, SSL termination, monitoring, protocol translation, or throttling.
The following diagram shows a gateway that terminates inbound SSL connections. It requests data on behalf of the original requestor from any HTTP server upstream of the gateway.
Benefits of this pattern include:
Simplify the development of services by removing the need to distribute and maintain supporting resources, such as web server certificates and configuration for secure websites. Simpler configuration results in easier management and scalability and makes service upgrades simpler.
Allow dedicated teams to implement features that require specialized expertise, such as security. This allows your core team to focus on the application functionality, leaving these specialized but cross-cutting concerns to the relevant experts.
Provide some consistency for request and response logging and monitoring. Even if a service is not correctly instrumented, the gateway can be configured to ensure a minimum level of monitoring and logging.
Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. This can provide an additional layer of security, and limit the attack surface of the system.
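The offloading idea can be sketched in a few lines. This is a minimal, illustrative sketch (the `Gateway` class and its routes are hypothetical, not a real framework): cross-cutting concerns such as authentication and logging live in the gateway, so the backend services stay free of them.

```python
# Sketch of gateway offloading: cross-cutting concerns (authentication,
# logging) are handled once at the gateway instead of in every service.
# All names here are illustrative.

class Gateway:
    def __init__(self):
        self.routes = {}   # path -> backend service handler
        self.log = []      # centralized request log (offloaded monitoring)

    def register(self, path, handler):
        self.routes[path] = handler

    def handle(self, path, token):
        # Offloaded concern 1: authentication at the edge
        if token != "valid-token":
            self.log.append((path, 401))
            return 401, "unauthorized"
        # Offloaded concern 2: consistent request logging
        self.log.append((path, 200))
        # The backend handler contains only business logic
        return 200, self.routes[path]()

gateway = Gateway()
gateway.register("/orders", lambda: "order list")

print(gateway.handle("/orders", "valid-token"))   # (200, 'order list')
print(gateway.handle("/orders", "bad-token"))     # (401, 'unauthorized')
```

Because every request passes through `handle`, a minimum level of logging is guaranteed even for services that are not instrumented themselves.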
Applications expose their functionality to clients by accepting and processing requests. In cloud-hosted scenarios, applications expose endpoints that clients connect to, and typically include the code to handle the requests from clients. This code performs authentication and validation, some or all request processing, and is likely to access storage and other services on behalf of the client.
To minimize the risk of clients gaining access to sensitive information and services, decouple hosts or tasks that expose public endpoints from the code that processes requests and accesses storage. You can achieve this by using a façade or a dedicated task that interacts with clients and then hands off the request—perhaps through a decoupled interface—to the hosts or tasks that'll handle the request. The figure provides a high-level overview of this pattern.
The gatekeeper pattern can be used to simply protect storage, or it can be used as a more comprehensive façade to protect all of the functions of the application. The important factors are:
Controlled validation. The gatekeeper validates all requests, and rejects those that don't meet validation requirements.
Limited risk and exposure. The gatekeeper doesn't have access to the credentials or keys used by the trusted host to access storage and services. If the gatekeeper is compromised, the attacker doesn't get access to these credentials or keys.
Appropriate security. The gatekeeper runs in a limited privilege mode, while the rest of the application runs in the full trust mode required to access storage and services. If the gatekeeper is compromised, it can't directly access the application services or data.
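The three factors above can be illustrated with a small sketch. The `Gatekeeper` and `TrustedHost` classes and the validation rule are hypothetical; the point is that the gatekeeper validates and sanitizes requests but never holds the storage credentials, so compromising it yields nothing.

```python
# Sketch of the gatekeeper pattern: validation happens in a component
# that holds no credentials; only the trusted host can reach storage.
# Names and rules are illustrative.

STORAGE = {}  # stands in for a real data store

class TrustedHost:
    def __init__(self, storage_key):
        self._key = storage_key          # credentials live here only

    def write(self, record_id, value):
        STORAGE[(self._key, record_id)] = value
        return "stored"

class Gatekeeper:
    def __init__(self, trusted_host):
        self._host = trusted_host        # decoupled interface; no key held

    def handle(self, record_id, value):
        # Controlled validation: reject requests that fail the rules
        if not isinstance(record_id, str) or not record_id.isalnum():
            return "rejected"
        # Sanitize before handing off to the trusted host
        return self._host.write(record_id, str(value)[:100])

gk = Gatekeeper(TrustedHost(storage_key="secret"))
print(gk.handle("order1", "10 widgets"))   # stored
print(gk.handle("../etc", "attack"))       # rejected
```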
This pattern is helpful for applications that:
handle sensitive information
expose services that require a high degree of protection from malicious attacks
perform mission-critical operations that can't be disrupted
require that request validation is performed separately from the main tasks, or need to centralize this validation to simplify maintenance and administration
Implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern can help to verify that applications and services are performing correctly.
It's a good practice, and often a business requirement, to monitor web applications and back-end services, to ensure they're available and performing correctly. However, it's sometimes more difficult to monitor services running in the cloud than it is to monitor on-premises services. For example, you don't have full control of the hosting environment, and the services typically depend on other services provided by platform vendors and others.
Monitoring can also track specific key metrics to check the solution's service behavior over time.
Implement health monitoring by sending requests to an endpoint on your application. The application should perform the necessary checks and then return an indication of its status.
A health monitoring check typically combines two factors:
The checks (if any) that the application or service performs in response to the request to the health verification endpoint
The analysis of the results by the tool or framework that performs the health verification check
Typical checks that monitoring tools perform include:
Validating the response code. For example, an HTTP response of 200 (OK) indicates that the application responded without error. The monitoring system might also check for other response codes to give more comprehensive results.
Checking the healthy state of integration providers, such as SMS gateways, payment gateways, government gateways, and other business-related gateways.
Checking the content of the response to detect errors, even when the status code is 200 (OK). By checking the content, you can detect errors that affect only a section of the returned web page or service response. For example, you might check the title of a page or look for a specific phrase that indicates that the app returned the correct page.
Checking access to the database and its performance.
Checking resources or services that are located outside the application. An example is a content delivery network that the application uses to deliver content from global caches.
Checking for the expiration of TLS certificates.
Measuring the response time.
Validating the URL that a DNS lookup returns.
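The checks above can be combined in a small health-check client. This is a minimal sketch, assuming a hypothetical `probe()` callable in place of a real HTTP GET against the health endpoint; a real monitor would add the TLS, DNS, and integration-provider checks as well.

```python
# Sketch of a health-check client combining three of the checks listed
# above: response code, response content, and response time. The probe
# and thresholds are illustrative.
import time

def check_health(probe, expected_phrase, max_seconds=2.0):
    """probe() stands in for an HTTP GET to the health endpoint."""
    start = time.monotonic()
    status, body = probe()
    elapsed = time.monotonic() - start
    return {
        "status_ok": status == 200,             # validate the response code
        "content_ok": expected_phrase in body,  # detect partial failures
        "latency_ok": elapsed <= max_seconds,   # measure the response time
    }

# Simulated healthy endpoint
result = check_health(lambda: (200, "service healthy"), "healthy")
print(result)
```

Checking the body content in addition to the status code catches the case the text mentions: an error that affects only part of the response while the server still returns 200 (OK).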
Sample health-check matrix
Since we moved to a database per service, we have a core issue: who makes the change?
Instead of storing just the current state of the data in a domain, use an append-only store to record the full series of actions taken on that data.
Use an append-only store to record the full series of events that describe actions taken on data in a domain, rather than storing just the current state, so that the store can be used to materialize the domain objects. This pattern can simplify tasks in complex domains by avoiding the requirement to synchronize the data model and the business domain; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions.
The CRUD approach has some limitations:
CRUD systems perform update operations directly against a data store, which can slow down performance and responsiveness, and limit scalability, due to the processing overhead it requires.
In a collaborative domain with many concurrent users, data update conflicts are more likely because the update operations take place on a single item of data.
Unless there's an additional auditing mechanism that records the details of each operation in a separate log, history is lost.
The Event Sourcing pattern defines an approach to handling operations on data that's driven by a sequence of events, each of which is recorded in an append-only store. Application code sends a series of events that imperatively describe each action that has occurred on the data to the event store, where they're persisted. Each event represents a set of changes to the data (such as AddedItemToOrder).
Benefits
So, you can use the events to materialize the views without updating the data directly, and with a final process you can update your data models.
This also improves performance and consistency, and enables replaying the events.
A good solution to this problem is to use event sourcing. Event sourcing persists the state of a business entity such as an Order or a Customer as a sequence of state-changing events. Whenever the state of a business entity changes, a new event is appended to the list of events. Since saving an event is a single operation, it is inherently atomic. The application reconstructs an entity's current state by replaying the events.
Applications persist events in an event store, a database of events. The store has an API for adding and retrieving an entity’s events. The event store also behaves like a message broker. It provides an API that enables services to subscribe to events. When a service saves an event in the event store, it is delivered to all interested subscribers.
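A minimal sketch of the two ideas just described: an append-only event store that also notifies subscribers, and state reconstruction by replay. The event names (`AddedItemToOrder`, `RemovedItemFromOrder`) follow the example above; the rest is illustrative.

```python
# Sketch of event sourcing: events are appended, never updated; the
# current state of an entity is rebuilt by replaying its events. The
# store also acts as a simple message broker for subscribers.

event_store = []   # append-only list of (entity_id, event, payload)
subscribers = []   # handlers notified on every saved event

def append_event(entity_id, event, payload):
    record = (entity_id, event, payload)
    event_store.append(record)        # saving an event is a single, atomic op
    for handler in subscribers:       # deliver to interested subscribers
        handler(record)

def replay_order(entity_id):
    """Materialize an order's current state from its event history."""
    items = []
    for eid, event, payload in event_store:
        if eid != entity_id:
            continue
        if event == "AddedItemToOrder":
            items.append(payload)
        elif event == "RemovedItemFromOrder":
            items.remove(payload)
    return items

append_event("order-1", "AddedItemToOrder", "book")
append_event("order-1", "AddedItemToOrder", "pen")
append_event("order-1", "RemovedItemFromOrder", "pen")
print(replay_order("order-1"))   # ['book']
```

Note that the full history survives: even after the "pen" item is removed from the materialized state, all three events remain in the store, which is what gives the pattern its audit trail.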
Finally:
Event Sourcing focuses on data persistence and providing a complete, auditable history of changes by storing events
Coordinate a set of actions across a distributed set of services and other remote resources, attempt to transparently handle faults if any of these actions fail, or undo the effects of the work performed if the system cannot recover from a fault.
This pattern can add resiliency to a distributed system by enabling it to recover and retry actions that fail due to transient exceptions, long-lasting faults, and process failures.
Sample scheduler use cases:
Transactions to be compensated: you define rules for the acceptable delay, since your customer can't wait indefinitely; in addition, some transactions fail and you need to discover them.
Data governance: sometimes your data model violates data governance rules, such as zero or null values, invalid references, or wrong data entry.
Periodic checks in the absence of EDA (event-driven architecture), mainly with third-party integrations: incrementally check the expected responses to complete the scenarios and update the data models.
How can we implement it:
Application schedule
DB Jobs
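An application-scheduled compensation job can be sketched as below. The transaction states, the timeout rule, and the compensating action are all illustrative assumptions; in practice the compensation would be a refund, rollback, or retry against the third party.

```python
# Sketch of a compensation scheduler job: each run scans for failed
# transactions and for pending transactions stuck past a timeout, and
# marks them for compensation. States and thresholds are illustrative.

transactions = [
    {"id": 1, "state": "completed"},
    {"id": 2, "state": "failed"},                       # needs compensation
    {"id": 3, "state": "pending", "age_minutes": 45},   # stuck too long
]

def run_compensation_job(txns, pending_timeout=30):
    """One scheduled execution: returns the ids it compensated."""
    compensated = []
    for txn in txns:
        stuck = (txn["state"] == "pending"
                 and txn.get("age_minutes", 0) > pending_timeout)
        if txn["state"] == "failed" or stuck:
            txn["state"] = "compensated"   # e.g. refund, rollback, or retry
            compensated.append(txn["id"])
    return compensated

print(run_compensation_job(transactions))   # [2, 3]
```

The same logic can equally live in a database job; the application-scheduler variant keeps the compensation rules in code next to the domain model.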
The diagram opposite shows the four pillar resiliency and stability patterns that should be applied to improve solution stability:
Retry pattern: how many times to retry the same behavior, to prevent failure.
Circuit breaker: how to stop unlimited retries of the same requests/jobs, to prevent stalls and locks.
Scheduler job: to compensate failed and incomplete transactions once the retry limit is reached or the circuit breaker opens.
Leader election: at the infrastructure level.
Collaboration among all of the mentioned design patterns should take place, especially in large-scale solutions.
Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those of a lower priority. This pattern is useful in applications that offer different service level guarantees to individual types of client.
Applications can delegate specific tasks to other services, for example, to perform background processing or to integrate with other applications or services. In the cloud, a message queue is typically used to delegate tasks to background processing. In many cases, the order requests are received in by a service isn't important. In some cases, though, it's necessary to prioritize specific requests. These requests should be processed earlier than lower priority requests that were sent previously by the application.
A queue is usually a first-in, first-out (FIFO) structure, and consumers typically receive messages in the same order that they were posted to the queue. However, some message queues support priority messaging. The application posting a message can assign a priority and the messages in the queue are automatically reordered so that those with a higher priority will be received before those with a lower priority. The figure illustrates a queue with priority messaging.
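Priority messaging on a single queue can be sketched with a heap. This is an illustrative in-process model (real systems would use a broker that supports message priority); the tie-breaking sequence number preserves FIFO order within the same priority level.

```python
# Sketch of a single priority queue: messages carry a priority and are
# received highest-priority first; within one priority level, FIFO
# order is kept via a sequence number. Lower number = higher priority.
import heapq

queue = []
seq = 0  # tie-breaker: preserves posting order within a priority level

def post(priority, message):
    global seq
    heapq.heappush(queue, (priority, seq, message))
    seq += 1

def receive():
    return heapq.heappop(queue)[2]

post(2, "report job")
post(0, "OTP code")        # highest priority, even though posted later
post(1, "money transfer")

print([receive() for _ in range(3)])
# ['OTP code', 'money transfer', 'report job']
```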
As we see in the opposite image, the solution is to reorder the queue according to priority, so that the highest-priority messages are always attended to first. This is applicable in cases like OTP authentication, where the code might expire after, for example, 2 minutes, or a money transfer that expires after, say, 30 seconds.
How to implement it?
In systems that don't support priority-based message queues, an alternative solution is to maintain a separate queue for each priority. The application is responsible for posting messages to the appropriate queue. Each queue can have a separate pool of consumers. Higher priority queues can have a larger pool of consumers that run on faster hardware than lower priority queues.
You can do the following:
Create application jobs (Platinum jobs) with short intervals and large batch sizes for executing high-priority tasks, for example: every 10 seconds, take 1000 records.
Create application jobs (Gold jobs) with medium intervals and smaller batch sizes for medium-priority tasks, for example: every 30 seconds, take 30 records.
Create application jobs (Silver jobs) with longer intervals and small batch sizes for low-priority tasks, for example: every 50 seconds, take 20 records.
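The tiered-job approach above can be sketched as follows. The intervals and batch sizes are the examples from the text; the scheduling itself is simulated (each `run_tier` call stands for one scheduled execution), and the tier and queue names are illustrative.

```python
# Sketch of tiered priority jobs: each tier drains its own queue at its
# own interval and batch size. One run_tier() call = one scheduled run.

TIERS = {
    "platinum": {"interval_s": 10, "batch": 1000},  # high priority
    "gold":     {"interval_s": 30, "batch": 30},    # medium priority
    "silver":   {"interval_s": 50, "batch": 20},    # low priority
}

queues = {
    "platinum": list(range(1500)),
    "gold": list(range(100)),
    "silver": list(range(50)),
}

def run_tier(tier):
    """Take one batch from the tier's queue; returns how many were processed."""
    size = TIERS[tier]["batch"]
    batch, queues[tier] = queues[tier][:size], queues[tier][size:]
    return len(batch)

print(run_tier("platinum"))  # 1000
print(run_tier("gold"))      # 30
print(run_tier("silver"))    # 20
```

Because the Platinum tier runs both more often and with larger batches, high-priority work drains far faster than the lower tiers, which is the intended service-level difference.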
Another approach with Kafka
Create multiple partitions for a Kafka topic according to the topic's importance.
Partitions enable parallelism, which allows multiple consumers to work in parallel, increasing performance.
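The Kafka idea can be simulated without a broker. This sketch only models the relationship the text describes: the topic and partition counts are hypothetical, partition assignment is a simple key hash, and in Kafka one consumer per partition (within a consumer group) gives the maximum parallelism.

```python
# Simulated sketch of the Kafka approach: important topics get more
# partitions, so more consumers can work in parallel. No real Kafka
# client is used; partitioning here is a plain key hash.

PARTITIONS = {"otp-topic": 6, "reports-topic": 2}  # illustrative counts

def partition_for(topic, key):
    # Kafka-style: hash the message key onto the topic's partitions
    return hash(key) % PARTITIONS[topic]

def max_parallel_consumers(topic):
    # within one consumer group, parallelism is capped by partition count
    return PARTITIONS[topic]

print(max_parallel_consumers("otp-topic"))      # 6
print(max_parallel_consumers("reports-topic"))  # 2
```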
Using a priority-queuing mechanism can provide the following advantages:
It allows applications to meet business requirements that require the prioritization of availability or performance, such as offering different levels of service to different groups of customers.
It can help to minimize operational costs. If you use the single-queue approach, you can scale back the number of consumers if you need to. High priority messages are still processed first (although possibly more slowly), and lower priority messages might be delayed for longer. If you implement the multiple message queue approach with separate pools of consumers for each queue, you can reduce the pool of consumers for lower priority queues. You can even suspend processing for some very low priority queues by stopping all the consumers that listen for messages on those queues.
The multiple message queue approach can help maximize application performance and scalability by partitioning messages based on processing requirements. For example, you can prioritize critical tasks so that they're handled by receivers that run immediately, and less important background tasks can be handled by receivers that are scheduled to run at times that are less busy.
This pattern is useful in scenarios where:
The system must handle multiple tasks that have different priorities, such as OTPs and requests from VIP customers.
Different users or tenants should be served with different priorities.
Dr. Ghoniem Lawaty
Tech Evangelist @TechHuB Egypt