Building Asynchronous SOA for Modern Applications

Talk presentation

Let's learn about the challenges faced in building asynchronous services. We will dive deep into workflow orchestrators, exploring their role and importance in simplifying our asynchronous systems and solving these challenges. We will address the 'how' behind these orchestrators, elucidating how they effortlessly handle state management, resilience, and monitoring, right out of the box. We will also explore and evaluate different types of workflow orchestrators available today.

Sai Pragna Etikyala
Twilio
  • Sai Pragna Etikyala is a Technical Lead at Twilio, where she currently leads the team responsible for A2P 10DLC compliance for messaging. Leveraging her vast experience with asynchronous systems, she has efficiently streamlined Twilio's complex compliance pipelines using workflow orchestrators, leading to notable improvements in manageability and operational efficiency.
  • Before Twilio, Sai Pragna held key roles at Amazon Web Services, Yahoo, and Cerner. During her tenure at these companies, she developed robust end-to-end solutions and managed complex operations, enriching her expertise not only in asynchronous computing but also in software development, cloud computing, and healthcare IT solutions.

Talk transcription

Hello everyone, I'm Sai Pragna Etikyala, and I'm currently working as a technical lead at Twilio, leading A2P compliance for messaging. I'm thrilled to be here to talk to you about how to build asynchronous SOA and how workflow orchestrators can make it easier. Let's start by defining what service-oriented architecture, or SOA, is. It is a very popular architectural pattern for creating software applications. The application is built from independent, self-contained services that each perform a specific task. Asynchronous SOA takes this one step further. In this design, communication between services doesn't require an immediate response to the request, which means the client can send a request and move on to other tasks without waiting for a reply. This is particularly useful in systems where operations need to occur independently and in parallel, improving performance and scalability.

Now, let's apply this concept to a real-world example. Think of a food delivery application. Once an order is placed, the customer doesn't need to wait for the restaurant to prepare the food before doing other things. They can browse for more items or order from different restaurants. At some point in the future, they receive a notification that the food is ready or has been delivered to their doorstep. This ability to send a request and receive the response at an unspecified later time mirrors asynchronous communication in an SOA. It lets the user do multiple tasks independently without being held up waiting for a response. If you look at this diagram of synchronous and asynchronous SOA, in synchronous SOA the client sends a request and waits for the response to come back immediately. In asynchronous SOA, the client sends a request and probably gets back a 202 Accepted response from the server right away. The actual result arrives later: either the client polls for it, or the server sends an event when it is ready.
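The accept-now, answer-later exchange described above can be sketched with a small in-memory stand-in for a server. This is only an illustration: the `OrderService` class, its method names, and the status values are assumptions made for the example, not part of any real API.

```python
import uuid


class OrderService:
    """In-memory stand-in for an asynchronous order API (hypothetical)."""

    def __init__(self):
        self._orders = {}

    def submit(self, items):
        # Accept the request immediately; the work itself happens later.
        order_id = str(uuid.uuid4())
        self._orders[order_id] = {"items": items, "status": "PENDING"}
        return 202, {"order_id": order_id}

    def complete(self, order_id):
        # Simulates the back end finishing the work at some later time.
        self._orders[order_id]["status"] = "DELIVERED"

    def get_status(self, order_id):
        # Polling endpoint: the client asks whenever it likes.
        return 200, {"status": self._orders[order_id]["status"]}


service = OrderService()
code, body = service.submit(["pizza"])
order_id = body["order_id"]

_, before = service.get_status(order_id)   # client polls while work is pending
service.complete(order_id)                 # back end finishes later
_, after = service.get_status(order_id)    # client polls again

print(code, before["status"], after["status"])  # 202 PENDING DELIVERED
```

In a push-based variant, the server would send an event or call a webhook when the order completes instead of waiting to be polled.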

Now, let's talk about the challenges posed by building asynchronous systems. The first significant hurdle is state management. In an event-driven architecture, like our food delivery example, maintaining and managing the state of each process can be a very complex task. Look at what happens when you place an order: it triggers various stages such as placing, confirming, preparing, and delivering the order. Each of these stages is an independent event, some happening concurrently with others. Ensuring the state of each event is managed accurately across these dispersed processes can be very challenging.

Despite these challenges, it's vital to manage states accurately and provide a seamless experience to our customer. Imagine if your food delivery app shows the order is being prepared while, in reality, it's already out for delivery. That could lead to confusion and potentially a dissatisfied customer. The complexity of states can be very challenging, and the need for state management in asynchronous SOA is something that needs careful handling.

Now, let's look at a potential architecture of the same food delivery service and how we manage state in this event-driven system. Essentially, every time there is an event, it triggers a new operation. Instead of being a sequence of steps, the system is basically a state machine. All the state machine knows is that it received event x while the database was in state y, so it is going to perform action z. It doesn't really understand the order of the sequence that happens in a food delivery system.

Next, let's talk about resiliency. In our food delivery example, there might be scenarios where the payment gets declined or the restaurant fails to receive a notification about the order. In a synchronous system, such an error might halt the entire process. In an asynchronous system, it's critical that we design for resilience: the system should have mechanisms to handle these errors gracefully and recover effectively to keep the process running. For instance, it might retry the failed operation, notify the user about the issue, or even backtrack if necessary. Resiliency allows for more robust operations, even in the face of inevitable errors and exceptions.

The next aspect is traceability, or in other terms, auditability. In a food delivery system such as Uber Eats or DoorDash, there are many moving parts involved, including the customer app, restaurant systems, the delivery partner app, and back-end services. Tracing a specific process or finding the root cause of an issue can feel like searching for a needle in a haystack. Auditability is crucial for quickly identifying and resolving issues. For example, if a customer reports not receiving their food despite the app showing it as delivered, proper traceability is essential to pinpoint where the process went wrong, whether it was an issue with the restaurant preparing the food or a problem in transit with the delivery partner.

The next consideration is maintainability. Asynchronous systems, by their very nature, are complex, and as they evolve and scale, the complexity only increases. Say we want to add a new feature to our food delivery app, like priority delivery for premium customers. In a poorly designed system, that might require a significant overhaul impacting multiple components and services. Onboarding new team members or debugging an issue can become daunting in such a complex system. Addressing the challenge of maintainability is crucial for the long-term viability and flexibility of an asynchronous system.

We've discussed various challenges, such as state management, resiliency, maintainability, and auditability, and how they pose a significant overhead. For example, in an event-driven asynchronous system, the state machine, as mentioned earlier, doesn't understand the sequence of steps. It only knows that it received event x while the database was in state y, so it will perform action z. Because the overall sequence is not written down anywhere, this can hinder scalability: the system becomes harder to comprehend, harder to onboard new team members onto, and harder to change or extend with new features.
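A minimal sketch of such an event-driven state machine is shown below. The state names, event names, and actions are invented for illustration. The point is that the handler only ever sees one row of the table, so an out-of-order event has to be handled as an explicit edge case.

```python
# (current_state, event) -> (action, next_state); all names are illustrative.
TRANSITIONS = {
    ("PLACED", "payment_confirmed"): ("notify_restaurant", "CONFIRMED"),
    ("CONFIRMED", "restaurant_accepted"): ("start_preparation", "PREPARING"),
    ("PREPARING", "food_ready"): ("dispatch_courier", "OUT_FOR_DELIVERY"),
    ("OUT_FOR_DELIVERY", "courier_arrived"): ("notify_customer", "DELIVERED"),
}


def handle_event(state, event):
    """Look up the single transition for this (state, event) pair.

    The handler has no view of the overall order of steps.
    """
    try:
        action, next_state = TRANSITIONS[(state, event)]
    except KeyError:
        # No row for this pair: the event arrived out of order.
        return "ignore_unexpected_event", state
    return action, next_state


in_order = handle_event("PLACED", "payment_confirmed")
out_of_order = handle_event("PLACED", "food_ready")

print(in_order)      # ('notify_restaurant', 'CONFIRMED')
print(out_of_order)  # ('ignore_unexpected_event', 'PLACED')
```

Nothing in the table states the intended sequence directly. A reader has to reconstruct it by following the rows, which is the comprehension cost described above.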

After exploring these challenges, let's look at the solution: workflow orchestrators. These tools are powerful in addressing the complexities of asynchronous systems, bringing order and efficiency. They act as central coordinators, overseeing and guiding the execution of tasks within a workflow, much like a conductor in an orchestra ensuring that each part is performed at the right time and in the correct sequence. Workflow orchestrators enhance the overall efficiency, reliability, and scalability of the system. They also streamline state management, which is one of the hardest parts of a dynamic system like our food delivery example.

In a food delivery system, different states exist, such as "order confirmed," "order being prepared," "order being delivered," or "delay in your order." Workflow orchestrators play a crucial role in managing these states, monitoring and ensuring that each step happens in the correct order. To see what I mean when I say that workflow orchestrators take care of state management, compare this with the previous architecture. There, the state machine had no knowledge of the sequence of events. If an event arrived out of the expected order, the state machine might not handle it appropriately, leading to potential issues. Handling every such edge case by hand means significant development overhead, and it requires knowledge of the entire code base and the full sequence of events.

Workflow orchestrators address these challenges by utilizing predefined workflows, sequences of tasks that represent the stages of the delivery process. These workflows define the order in which tasks should be executed and specify dependencies or conditions between them. This organized flow of states eliminates chaos and potential inconsistencies in state transitions. Workflow orchestrators ensure that all parties involved, including customers, the delivery team, and internal teams, have accurate and timely information about the order status, creating a seamless experience. If you compare the previous state machine diagram with the new workflow orchestrator diagram, you can see that the sequence of steps is now predefined. For example, steps like "initiate order," "confirm order," "preparation," "initiate delivery," and "notify delivery" are part of a structured sequence. The orchestrator executes these steps in order, waiting for events when required, which makes for a more organized and understandable system.
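To make the contrast with the state machine concrete, here is a toy sketch of a workflow written as one sequential function, with a tiny orchestrator that resumes it only when the awaited event arrives. It uses a Python generator purely for illustration. The `Orchestrator` class, the event names, and the step descriptions are assumptions, and unlike a real orchestrator nothing here is persisted durably.

```python
def food_delivery_workflow(order_id):
    """The whole sequence in one place; each `yield` waits for an external event."""
    log = [f"initiate order {order_id}"]
    confirmation = yield "restaurant_confirmed"
    log.append(f"confirm order ({confirmation})")
    yield "food_ready"
    log.append("preparation done")
    log.append("initiate delivery")
    yield "delivered"
    log.append("notify delivery")
    return log


class Orchestrator:
    """Toy orchestrator: runs one workflow, resuming it only on the awaited event."""

    def __init__(self, workflow):
        self._workflow = workflow
        self.waiting_for = next(workflow)  # run until the first wait
        self.result = None

    def signal(self, event, payload=None):
        if event != self.waiting_for:
            return False  # out-of-order event: the workflow does not advance
        try:
            self.waiting_for = self._workflow.send(payload)
        except StopIteration as done:
            self.waiting_for = None
            self.result = done.value
        return True


orch = Orchestrator(food_delivery_workflow("A42"))
early = orch.signal("delivered")            # arrives too early, so it is rejected
orch.signal("restaurant_confirmed", "ok")
orch.signal("food_ready")
orch.signal("delivered")

print(early)        # False
print(orch.result)
```

The order of steps is readable top to bottom in `food_delivery_workflow`, and the out-of-order case is handled once in the orchestrator instead of in every handler.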

Workflow orchestrators also enhance resiliency by handling errors through built-in mechanisms such as automatic retries, timeouts, and backoffs. In the case of errors, such as a failed payment or communication issues with a restaurant, workflow orchestrators take immediate action to recover. They may automatically retry the failed operation or apply a back-off strategy to give the failing dependency time to recover. This ensures that the entire delivery process doesn't come to a halt, so the system keeps operating smoothly even when unexpected errors occur. Configuring retries with a workflow orchestrator is straightforward: settings like maximum retry duration, maximum number of retries, and the interval between retries are easy to specify. For example, if you've ordered a pizza and the payment system is down for longer than two hours, it doesn't make sense to continue with the order, because the customer would be very annoyed waiting that long for a pizza. A maximum retry duration lets you stop at that point. This configurability contributes to the reliability and efficiency of food delivery systems.

The next consideration is enabling traceability or auditability. Traceability is a crucial feature in managing complex asynchronous systems, and workflow orchestrators play a significant role in simplifying this challenge. Orchestrators provide comprehensive logging and monitoring capabilities, making each stage and transition within the system visible and traceable. With these capabilities, developers and system administrators can easily track the flow of processes, identify bottlenecks, and troubleshoot any issues that may arise.
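The retry settings and the audit trail described above can be sketched together in a few lines. This is a simplified illustration, not how any particular orchestrator implements it: `run_with_retries`, its parameter names, and the flaky `charge_card` function are all hypothetical. The two-hour budget mirrors the pizza example.

```python
import time


def run_with_retries(operation, history, max_attempts=5, initial_interval=1.0,
                     backoff=2.0, max_duration=7200.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Retry `operation` with exponential backoff, recording every attempt.

    Appends one entry per attempt to `history` (the audit trail). Re-raises
    the last error once the attempt count or the time budget is exhausted.
    """
    started = clock()
    interval = initial_interval
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
            history.append({"attempt": attempt, "outcome": "ok"})
            return result
        except Exception as exc:
            history.append({"attempt": attempt, "outcome": f"failed: {exc}"})
            # Give up if the next attempt would start after the time budget.
            out_of_time = clock() - started + interval > max_duration
            if attempt == max_attempts or out_of_time:
                raise
            sleep(interval)
            interval *= backoff


calls = {"n": 0}


def charge_card():
    # Hypothetical flaky payment call: fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("payment gateway unavailable")
    return "charged"


audit = []
waits = []  # record the requested delays instead of actually sleeping
result = run_with_retries(charge_card, audit, sleep=waits.append)

print(result, waits)  # charged [1.0, 2.0]
for entry in audit:
    print(entry)
```

Because every attempt lands in `audit`, the same record that drives the retries also answers the question "what happened to this order, and when did it fail?".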

Another important aspect is facilitating maintainability. When it comes to maintaining complex asynchronous systems, workflow orchestrators offer invaluable assistance. One of the challenges in maintaining such systems is making modifications or adding new features without disrupting the entire system. Workflow orchestrators address this challenge by providing clear and code-based workflows, acting as a blueprint for the system. This makes it easy to understand and modify the system. With the help of workflow orchestrators, developers can make changes or introduce new features more efficiently, allowing the system to evolve and adapt to new requirements without causing unnecessary disruptions or delays.

Now, let's discuss how workflow orchestrators boost developer productivity, which is closely related to maintainability. Workflow orchestrators have a transformative effect on the development process. By abstracting away the complexity of asynchronous systems, they provide a high-level programming model that lets developers concentrate on business logic and the core functionality of the application, rather than being bogged down by infrastructure concerns. With the support of workflow orchestrators, developers can work more efficiently, accelerating the development process and reducing time to market.

To summarize, workflow orchestrators offer features such as handling state management out of the box, providing a platform for auditability or traceability, and relieving the development team of the overhead of resiliency. These capabilities collectively contribute to creating efficient, reliable, and adaptable systems in complex asynchronous environments. Essentially, you can set retries or make your workflows more resilient, and these features come out of the box. Workflow orchestrators also provide out-of-the-box metrics, indicating the status of your workflow and highlighting where things may fail or the number of retries that occurred. These metrics make it easy to monitor and analyze the performance of your workflows. You can easily obtain metrics such as the number of food delivery workflows currently open and processing, providing valuable insights into the system's current workload.
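As a rough illustration of the kind of metrics mentioned above, the sketch below aggregates a snapshot of workflow executions. The record layout and status names are made up for the example. Real orchestrators expose this information through their own dashboards and APIs.

```python
from collections import Counter

# Hypothetical snapshot of workflow executions, as an orchestrator might report them.
executions = [
    {"id": "order-1", "status": "RUNNING", "retries": 0},
    {"id": "order-2", "status": "RUNNING", "retries": 3},
    {"id": "order-3", "status": "COMPLETED", "retries": 1},
    {"id": "order-4", "status": "FAILED", "retries": 5},
]

by_status = Counter(e["status"] for e in executions)
open_workflows = by_status["RUNNING"]          # workflows currently in flight
total_retries = sum(e["retries"] for e in executions)

print(open_workflows, total_retries, by_status["FAILED"])  # 2 9 1
```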

Because features like state management, retries, and metrics come out of the box, workflow orchestrators significantly reduce the effort required to build them, leading to increased productivity and accelerated development.

The next consideration is the variety of workflow orchestrator options available in the tech ecosystem. These orchestrators come in various forms, each with unique features and advantages. Some are open source, some are open source with managed offerings, and some are proprietary, each catering to different needs. Let's explore some popular options. The first one is Apache Airflow, a widely used open-source workflow orchestrator. It allows you to define workflows as code using Python, providing flexibility and ease of understanding. Apache Airflow has a vibrant community and extensive plugin support, making seamless integrations with various services possible. It is well-suited for data engineering tasks and scenarios where workflow steps may change frequently.

For those who don't want to manage an Airflow server, astronomer.io offers a fully managed Apache Airflow solution, taking care of hosting. The active open-source community ensures prompt bug fixes and collaborative problem-solving. The next option is Argo, a Kubernetes-native workflow orchestrator designed for cloud-native environments. Argo leverages the power of Kubernetes to manage workflow execution, treating each step as a container. It is well-suited for organizations using Kubernetes, offering scalability, fault tolerance, and flexibility in workflows.

Temporal is another powerful option, focusing on reliability and simplicity. Temporal is an open-source workflow orchestrator that allows developers to define workflows as code. It supports multiple SDKs for different languages, such as Java, .NET, Go, and even a new PHP SDK. This flexibility is particularly useful for complex use cases or business cases with branching logic. In summary, choosing a workflow orchestrator depends on your specific needs and the characteristics of your system. Options like Apache Airflow, Argo, and Temporal provide unique features, making them suitable for different scenarios. Selecting the right workflow orchestrator can contribute significantly to the efficiency, reliability, and maintainability of your asynchronous system.

If you want to handle a lot of edge cases and branching in a complex use case, Temporal is a great choice. Temporal offers long-duration execution with at-least-once semantics and built-in handling of failures and interruptions, and it excels at managing stateful, long-running workflows while ensuring reliability and ease of development. It's particularly useful for scenarios where you need dynamic workflow execution, such as adding new steps to long-running workflows. Temporal has gained significant adoption, especially with the availability of its managed cloud service, which eliminates the need to manage infrastructure. Its scalability, simplicity, and support for defining workflows as code in multiple languages make it suitable for both complex and simple use cases.

Another notable option is AWS Step Functions, a fully managed workflow orchestrator provided by Amazon Web Services. It simplifies the orchestration of complex multi-step applications using visual workflows defined in the Amazon States Language. AWS Step Functions integrates seamlessly with other AWS services, making it an excellent choice for organizations already immersed in the AWS ecosystem. However, it may be less flexible than Temporal, especially if you need to call services outside the AWS environment. When selecting a workflow orchestrator, consider factors such as the complexity of tasks, scalability requirements, fault tolerance, integration with existing infrastructure, and the convenience of the programming model. Each orchestrator, including Argo, Airflow, Temporal, Step Functions, and others, offers unique features and benefits, catering to specific use cases and preferences.
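For a sense of what a Step Functions definition looks like, the sketch below builds a two-step state machine in the Amazon States Language as a Python dictionary and prints it as JSON. The state names and the Lambda ARNs (account ID, region, function names) are placeholders invented for the example. The field names (`StartAt`, `States`, `Type`, `Resource`, `Retry`, `Next`, `End`) are standard Amazon States Language fields.

```python
import json

# Placeholder ARNs: the account, region, and function names are made up.
definition = {
    "Comment": "Food delivery order (illustrative)",
    "StartAt": "ChargePayment",
    "States": {
        "ChargePayment": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:charge-payment",
            # Retry failed tasks with exponential backoff: 2s, 4s, 8s.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            "Next": "NotifyRestaurant",
        },
        "NotifyRestaurant": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-restaurant",
            "End": True,
        },
    },
}

asl_json = json.dumps(definition, indent=2)
print(asl_json)
```

The retry policy is declared alongside the step it applies to, which is the out-of-the-box resiliency configuration discussed earlier.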

Ultimately, whether you choose a fully managed service like AWS Step Functions or a more flexible open-source solution like Apache Airflow, Argo, or Temporal, leveraging workflow orchestrators simplifies the complexities of asynchronous systems. These tools empower developers to build resilient, scalable, and maintainable applications. Choose the orchestrator that aligns with your organization's needs and preferences. In conclusion, harnessing the power of workflow orchestrators empowers organizations to deliver better services, optimize development processes, and stay ahead in today's dynamic landscape. Asynchronous systems, even if they start small and simple, can evolve into complex use cases. Consider using a workflow orchestrator to simplify your development life, especially in event-driven architectures or scenarios involving queues. It's a game-changer in simplifying system development. Thank you for your attention, and I hope this information proves valuable in your future endeavors. See you in the audience section!
