Managing Actions per Second (APS) Limits in Temporal Cloud
If you're running Workflows on Temporal Cloud, you've probably noticed that each Namespace comes with an Actions Per Second (APS) limit. But what exactly does that mean, and why does it matter?
In Temporal, an "action" is any operation that modifies Workflow state or interacts with the Temporal service. Your Namespace's APS limit controls how many of these operations can happen per second across all Workflows within that Namespace. When the APS limit is reached, Temporal begins to throttle requests. Depending on the business priority of the Workflows involved, this throttling may be acceptable, or it may have a significant impact.
The difficulty is that APS consumption isn't always intuitive. A single Workflow Execution generates multiple actions from the moment it starts, and use cases that fit nicely within APS limits at small scale can exhaust those limits as they grow. Many customers are surprised to find they're hitting APS constraints well before they expected to based on their Workflow count alone.
This guide will help you understand why customers hit APS limits, how to design Workflows that use actions efficiently, and what to do when you're approaching capacity. Whether you're just getting started with Temporal Cloud or optimizing an existing deployment, managing APS effectively is key to building scalable, reliable applications.
Understanding Actions in Temporal
Before we dive into why customers hit APS limits, let's talk about what actions are.
What Counts as an Action?
In Temporal, actions are the fundamental operations that drive your Workflows forward. Here's an overview of what counts, with the full list in our documentation.
- Workflows: Starting, completing, and resetting a Workflow, as well as starting Child Workflows, Schedules, and Timers
- Activities: Starting, retrying, Heartbeating
- Signals, Updates, and Queries
Actions that count toward an APS limit are, with a few exceptions, the same as the actions that are billable. The key insight here is that nearly everything that happens in Temporal (state changes, decision points, interactions) is counted as an action.
The Action Multiplier Effect
What this means is that when you start a single Workflow, you're performing more than one action as far as APS is concerned, because a Workflow isn't a single atomic operation: it's a series of events that Temporal orchestrates. Each Activity the Workflow schedules is an action, so a Workflow that kicks off several Activities up front generates a burst of actions the moment it starts. Additionally, there are often business reasons to start multiple Workflows at the same time.
All of these factors compound, multiplying the actions your Namespace consumes per second.
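As a rough, hypothetical illustration (the exact counts depend on your Workflow design): a Workflow that runs 5 Activities, each of which retries once on average, consumes roughly 12 actions per execution: 1 to start, 10 for the Activity starts and retries, and 1 to complete. Start 100 such Workflows in the same second and you've generated on the order of 1,200 actions concentrated in a short window.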
The Effect of Rate Limiting
In Temporal Cloud, the effect of rate limiting is increased latency, not lost work: throttled requests are retried rather than dropped, so Workers simply take longer to complete Workflows.
Common Reasons Customers Hit APS Limits
Now that you understand how actions are defined and how they count toward APS limits, let's look at the patterns that most commonly push customers into APS constraints.
Bursty Traffic
Most businesses don't operate at constant velocity—they have rhythms, cycles, and spikes. These patterns can create APS challenges because Temporal Cloud enforces limits at the per-second level.
Common bursty patterns include:
- Calendar-driven spikes: Month-end financial close processes, quarterly reporting Workflows, payroll that runs on the 1st and 15th, scheduled batch jobs that kick off at midnight. These create predictable but intense load concentrations.
- Event-driven surges: Product launches, marketing campaigns, flash sales, breaking news, or seasonal events like Black Friday.
- Recovery scenarios: When a downstream dependency fails and then recovers, you often get a thundering herd effect—hundreds or thousands of Workflows that were waiting all suddenly resume execution simultaneously, creating an artificial spike in APS consumption.
- Geographic/business hours concentration: Global applications often see load follow the sun, with peak activity during business hours in each region. If your business concentrates in specific markets, you'll see daily peaks rather than even 24/7 distribution.
- Retry storms: When a large number of Workflows are all stuck on a failing Activity and the retry delay is very short, the retries alone can cause a spike in Actions.
- Timer storms: When a large number of Workflows all set a Timer for the exact same time, those Timers fire together and the Activities that follow run together, producing a large number of actions at the same moment.
These types of processes can result in a Namespace that averages 200 APS over a day but spikes to 800 APS or more during its peak hour or peak event.
How to Mitigate
You can’t change the patterns of how customers interact with your systems, but there are some adjustments you can make to your Workflows to make traffic patterns more consistent, especially for use cases where immediate response isn’t necessary.
These adjustments include:
- Implement application-level queuing or rate limiting to smooth out predictable spikes.
- For scheduled batch operations, stagger start times rather than triggering everything at once--implement jitter in your high-volume Schedules.
- Implement jitter when starting Workflows, such as with Start Delay (see the sketch after this list).
- Accept rate limiting for latency-tolerant workloads; throttled work is delayed, not lost.
- Use Provisioned Capacity to raise your Namespace's limits (covered later in this guide).
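Here is a minimal sketch of jittered starts using the Go SDK, assuming a recent SDK version that supports the StartDelay option; the Workflow name, ID, and Task Queue are hypothetical:

```go
package main

import (
	"context"
	"log"
	"math/rand"
	"time"

	"go.temporal.io/sdk/client"
)

func main() {
	c, err := client.Dial(client.Options{}) // connection details omitted
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Spread starts over a 5-minute window instead of launching every
	// Workflow in the same second.
	jitter := time.Duration(rand.Int63n(int64(5 * time.Minute)))

	_, err = c.ExecuteWorkflow(context.Background(), client.StartWorkflowOptions{
		ID:         "monthly-close-acct-123", // hypothetical Workflow ID
		TaskQueue:  "billing",                // hypothetical Task Queue
		StartDelay: jitter,                   // server-side delay before the Workflow begins
	}, "MonthlyCloseWorkflow", "acct-123") // hypothetical Workflow and argument
	if err != nil {
		log.Fatalln("unable to start Workflow:", err)
	}
}
```

Because the delay is applied by the service, the caller returns immediately, and no Worker resources are consumed while the Workflow waits.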
Cascading Workflows and Fan-Out Patterns
Decomposing complex processes into parent and Child Workflows (or with Nexus) is a common and often appropriate pattern, but the APS costs multiply dramatically with depth and fan-out.
Consider an order fulfillment Workflow that spawns Child Workflows for payment processing, inventory management, shipping, and customer notifications. Each Child Workflow goes through its full action lifecycle (start, tasks, activities, completion), and all of those actions count toward the APS limits on your Namespace.
This pattern appears frequently in:
- Batch processing: A parent Workflow processes a file with 1,000 records, spawning a Child Workflow for each record. Batch processing also tends to be bursty, with the load concentrated at the moment each batch begins.
- Map-reduce patterns: Data processing Workflows that fan out to process partitions in parallel, then aggregate results.
This challenge compounds further when you have multiple levels of nesting: parent Workflows that create children, which in turn create their own children.
How to Mitigate
- Evaluate whether Child Workflows are necessary--other options include Activities or Workflows in another Namespace (via Nexus)
- When you do use Child Workflows, limit fan-out size: design a Child Workflow to process its work in batches rather than one Child per work item (see the sketch after this list). This sample application shows more detail.
- Consider flattening deeply nested hierarchies into shallower structures.
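As a rough illustration of batched fan-out, here is a minimal Go sketch; the Workflow names, record type, and batch size are hypothetical:

```go
package batching

import "go.temporal.io/sdk/workflow"

const batchSize = 100

// ProcessFileWorkflow spawns one Child Workflow per batch of records rather
// than one per record: 10 children for 1,000 records instead of 1,000.
func ProcessFileWorkflow(ctx workflow.Context, records []string) error {
	var futures []workflow.ChildWorkflowFuture
	for start := 0; start < len(records); start += batchSize {
		end := start + batchSize
		if end > len(records) {
			end = len(records)
		}
		futures = append(futures, workflow.ExecuteChildWorkflow(ctx, ProcessBatchWorkflow, records[start:end]))
	}
	// Wait for every batch to finish before completing the parent.
	for _, f := range futures {
		if err := f.Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}

// ProcessBatchWorkflow handles one batch of records (body elided).
func ProcessBatchWorkflow(ctx workflow.Context, batch []string) error {
	// ... process the batch, ideally with a small number of Activities ...
	return nil
}
```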
Human-in-the-Loop Processes at Scale
Workflows that incorporate human decision-making--approvals, reviews, manual data entry, quality checks--tend to be long-running and interaction-intensive, which creates sustained APS load.
These Workflows typically receive a Signal for each human decision and serve Queries from UIs that display current state and pending tasks.
At small scale, this is manageable. But when you're running thousands of them at the same time--like a content moderation queue with pending reviews, or a loan approval system processing applications, or a support ticket system managing thousands of open cases--the cumulative APS load from all of those long-running Workflows adds up.
How to Mitigate
- Avoid polling patterns where UIs constantly Query Workflow state. Instead, push state changes to a database that UIs can read, as in the sketch below.
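One minimal sketch of the push model in Go, assuming a hypothetical PublishStatus Activity that upserts the Workflow's state into a database the UI reads:

```go
package review

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// PublishStatus is a hypothetical Activity that writes the current state to
// an external store, so UIs read the store instead of Querying the Workflow.
func PublishStatus(ctx context.Context, workflowID, status string) error {
	// ... e.g. UPDATE reviews SET status = ? WHERE workflow_id = ? ...
	return nil
}

func ReviewWorkflow(ctx workflow.Context, itemID string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)
	wfID := workflow.GetInfo(ctx).WorkflowExecution.ID

	// Push each state change once rather than answering repeated UI Queries.
	if err := workflow.ExecuteActivity(ctx, PublishStatus, wfID, "pending_review").Get(ctx, nil); err != nil {
		return err
	}

	// Block until a human decision arrives as a Signal (hypothetical name).
	var decision string
	workflow.GetSignalChannel(ctx, "review-decision").Receive(ctx, &decision)

	return workflow.ExecuteActivity(ctx, PublishStatus, wfID, decision).Get(ctx, nil)
}
```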
Real-Time SLAs and Deadline Management
Businesses with strict service level agreements often implement active monitoring and escalation in their Workflows. This is generally accomplished by setting a Timer every few minutes to determine whether an SLA deadline is approaching, allowing the Workflow to trigger escalations or alerts.
Each of these monitoring Timers consumes actions. When you have thousands of in-flight Workflows all actively monitoring their own SLAs, the background load becomes significant: you're consuming substantial APS capacity even when Workflows aren't doing their primary work.
How to Mitigate
- Use longer monitoring intervals where possible. For example, check SLAs every 30 minutes rather than every 1 minute.
- Where possible, consolidate Timers. Rather than 10 Timers that check 10 tasks, have 1 Timer and then check those 10 tasks.
- Where possible, have an external system Signal your Workflow rather than using short-lived Timers to poll (see the sketch after this list).
- For retries, use exponential backoff with reasonable initial intervals.
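The sketch below shows the Signal-or-deadline pattern in Go: one Timer set for the SLA deadline itself, rather than a Timer firing every minute. The Workflow name and Signal name are hypothetical:

```go
package sla

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// TaskWithSLAWorkflow waits for either a completion Signal or a single SLA
// Timer, instead of waking up on a short polling interval.
func TaskWithSLAWorkflow(ctx workflow.Context, deadline time.Duration) error {
	done := workflow.GetSignalChannel(ctx, "task-completed") // hypothetical Signal
	slaTimer := workflow.NewTimer(ctx, deadline)             // fires once, at the deadline

	escalated := false
	selector := workflow.NewSelector(ctx)
	selector.AddReceive(done, func(c workflow.ReceiveChannel, more bool) {
		c.Receive(ctx, nil)
	})
	selector.AddFuture(slaTimer, func(f workflow.Future) {
		escalated = true
	})
	selector.Select(ctx) // blocks until the Signal or the Timer, whichever comes first

	if escalated {
		// ... trigger the escalation path, e.g. via an Activity ...
	}
	return nil
}
```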
Additional Design Patterns
Some design patterns can lead to high APS, and they appear consistently across many different types of business use cases.
Many Small Activities
Consider two approaches to processing 1,000 records:
- Approach A: Create a Workflow that spawns 1,000 separate Activities, one per record.
- Approach B: Create a Workflow that spawns 10 Activities, each processing 100 records in a batch.
Approach B clearly consumes far fewer actions and therefore less APS. This is a simple example, but the pattern shows up everywhere: processing individual transactions versus batches, sending individual notifications versus bulk operations, or making separate API calls versus batch endpoints. Each separate Activity adds action overhead.
How to Mitigate
- Consider if you can combine multiple external calls within a single Activity.
- If processing a large amount of data, process it in chunks (see the sketch after this list).
- See How Many Activities should I use in my Temporal Workflow? for more information.
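A minimal Go sketch of Approach B, assuming a hypothetical ProcessChunk Activity and Record type:

```go
package chunking

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

const chunkSize = 100

type Record struct{ ID string }

// ProcessChunk is a hypothetical Activity that handles 100 records in one
// call, so 1,000 records cost 10 Activity starts instead of 1,000.
func ProcessChunk(ctx context.Context, records []Record) error {
	// ... e.g. one batch API call or bulk database write ...
	return nil
}

func ProcessRecordsWorkflow(ctx workflow.Context, records []Record) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 5 * time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	for start := 0; start < len(records); start += chunkSize {
		end := start + chunkSize
		if end > len(records) {
			end = len(records)
		}
		if err := workflow.ExecuteActivity(ctx, ProcessChunk, records[start:end]).Get(ctx, nil); err != nil {
			return err
		}
	}
	return nil
}
```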
Multiple Use Cases in One Namespace
Often when starting with Temporal, the first use case is implemented in a single Namespace, generally one per logical environment. When the second use case is implemented, it runs in the same Namespace, and the same happens for the third, the fourth, and so on.
An APS limit is set per Namespace, so multiple use cases with multiple traffic patterns in the same Namespace can exhaust this limit quickly.
How to Mitigate
Plan for a set of Namespaces (one per environment) per use case. See Temporal guidance for more details.
Provisioned Capacity
If you have a workload that is both latency-sensitive and being rate-limited, you can also use Provisioned Capacity Modes on your Namespace. This allows you to set Temporal Resource Units that will scale up your limits to meet the needs of your specific workloads.
Knowing if You’re Hitting APS Limits
In addition to understanding the patterns that can affect APS limits on a Temporal Namespace, it's also important to know whether you're approaching (or exceeding) these limits. Temporal Cloud provides several metrics that, if tracked, will tell you whether you're being rate limited due to APS. See the documentation on detecting resource exhaustion for an explanation of those metrics, as well as a sample Grafana dashboard that shows how they could be viewed.
Key Takeaways
Let's recap the main reasons customers hit APS limits and how to address them:
| Reason for Hitting APS Limits | How to Address It |
|---|---|
| Bursty Traffic | Implement application-level queuing or rate limiting to smooth spikes; stagger start times and add jitter for scheduled batch operations. |
| Cascading Workflows and Fan-Out Patterns | Evaluate if Child Workflows are necessary (consider activities or another Namespace), limit fan-out size by processing work in batches within a Child Workflow, consider flattening deeply nested hierarchies. |
| Human-in-the-Loop Processes at Scale | Design long-running Workflows to minimize sustained APS load from interaction (avoid polling where UIs constantly Query state; use Signals only for key human inputs). |
| Real-Time SLAs and Deadline Management | Use longer monitoring intervals, consolidate Timers, have external systems Signal the Workflow instead of short-lived polling Timers, and use exponential backoff for retries. |
| Many small Activities | Consider if you can combine multiple external calls within a single Activity. If processing a large amount of data, process it in chunks. |
| Multiple use cases in one Namespace | Plan for a set of Namespaces (one per environment) per use case. |
General Guidance
When designing Temporal Workflows with an eye toward APS limits, ask yourself the following questions:
- How many actions will a single execution of this Workflow consume?
- How many Workflows will typically be running at the same time?
- What happens to APS consumption when (actions per Workflow × number of active Workflows) scales to 100x current volume?
- Are there natural opportunities to combine operations, such as merging Activities or processing chunks of data together?
- Am I polling when I could be using Signals?
- Does this Workflow need to run continuously, or can it be event-driven?
A few hours spent optimizing Workflow design can save you from capacity crunches, emergency limit increases, and potentially significant cost increases down the road.