
Performance Testing Your API Gateway

Performance testing is critical for understanding the real-world performance of your API when using an API gateway like Zuplo. This guide helps you create fair and accurate performance tests that properly measure latency and throughput.

Creating Fair Comparison Tests

When evaluating API gateway performance, it's essential to ensure your tests accurately reflect real-world conditions and provide a fair comparison between direct backend calls and calls through your gateway.

Test Location Matters

One of the most common mistakes in performance testing is running tests from within the same cloud provider network as your backend. This creates artificially low latency results that don't reflect real-world usage.

Never run performance tests from the same cloud provider as your backend. If your backend runs on AWS, don't test from AWS. The same applies to GCP, Azure, or any other provider. If you are using a third-party tool such as k6 or BlazeMeter, be sure to check where their test nodes are located.

Why this matters: When traffic stays within a cloud provider's network, latency is dramatically reduced and more consistent, especially within the same geographical region. Intra-cloud network latency benefits from:

  • Dedicated high-speed interconnects between data centers
  • Optimized routing within the provider's backbone
  • Minimal network hops
  • Consistent, predictable performance with low jitter

Real-world latency measurements make the difference stark: intra-cloud latency within the same region (for example, within North America) can be 50-70% lower than traffic traversing the public internet or between different cloud providers. Additionally, internet traffic experiences significantly higher jitter (variance), making response times less predictable.

This artificial performance boost from testing within the same cloud provider can make it appear that an API gateway adds substantially more latency than it actually does in real-world scenarios where traffic crosses network boundaries.

Ensure Test Equality

Fair comparisons require testing under identical conditions. Here are the key factors to consider:

Authentication Methods

Different authentication methods have different performance characteristics. If you're testing:

  • Backend with IAM/JWT: Processing time varies but is typically minimal for JWT validation
  • Zuplo with API Key authentication: Adds approximately 5-10ms for key validation

To ensure fair testing, use the same authentication method for both tests, or account for the difference in your analysis.
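One practical way to keep the comparison honest is to parameterize the target and credentials so the request itself never changes between runs. The k6 sketch below assumes illustrative TARGET_URL and AUTH_TOKEN environment variables; substitute whatever your tests actually use:

```ts
// Illustrative k6 script: TARGET_URL and AUTH_TOKEN are assumed environment
// variables, supplied per run (direct-backend run vs. gateway run).
import http from "k6/http";
import { check } from "k6";

export default function () {
  const res = http.get(__ENV.TARGET_URL, {
    // Identical auth header in both runs, so neither test pays a
    // validation cost the other doesn't.
    headers: { Authorization: `Bearer ${__ENV.AUTH_TOKEN}` },
  });
  check(res, { "status is 200": (r) => r.status === 200 });
}
```

Run the script twice, changing only the environment variables: once against the backend directly and once against the gateway.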

Request Parameters and Payloads

Always use identical:

  • Request headers
  • Query parameters
  • Request body size and complexity
  • Response size expectations

Scaling Patterns

Test both your backend and gateway-fronted API with the same:

  • Ramp-up patterns
  • Concurrent connection counts
  • Request rates
  • Test duration
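In k6, this can be expressed as a single shared options block, with only the target URL changing between runs (the stage values below are placeholders, not recommendations):

```ts
// Shared k6 load profile: reuse this exact options block for both the
// direct-backend run and the gateway run so ramp-up, concurrency, request
// rate, and duration are identical.
import http from "k6/http";

export const options = {
  stages: [
    { duration: "2m", target: 50 }, // ramp-up to 50 virtual users
    { duration: "5m", target: 50 }, // steady state
    { duration: "1m", target: 0 },  // ramp-down
  ],
};

export default function () {
  http.get(__ENV.TARGET_URL); // only this target differs between runs
}
```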

Account for Additional Layers

Your architecture may include additional layers that affect performance:

  • CDN (Cloudflare, Fastly, etc.): Adds 5-15ms for cache misses
  • WAF (Web Application Firewall): Adds 10-20ms depending on rule complexity
  • DDoS Protection: Usually minimal impact (1-5ms) unless under attack
  • Load Balancers: Adds 1-5ms

Include these layers in both test scenarios or explicitly account for their impact in your analysis. For example, if the gateway path includes a CDN cache miss (~10ms) and a WAF (~15ms), roughly 25ms of any measured difference belongs to those layers, not to the gateway itself.

Understanding Gateway Latency

API gateways necessarily add some latency to process requests. For Zuplo:

  • Base latency: Approximately 20-30ms with no policies
  • Per policy: Most policies add 1-5ms each
  • Complex policies: Authentication, rate limiting, or custom code can add 5-15ms

This latency is the trade-off for the benefits an API gateway provides:

  • Centralized authentication and authorization
  • Rate limiting and quota management
  • Request/response transformation
  • Analytics and monitoring
  • Developer portal and documentation

Policy Impact on Performance

Different policies have varying performance impacts:

Low Impact (0-3ms)

  • Header manipulation
  • Simple request validation
  • Basic routing rules
  • Response caching (for cache hits)

Medium Impact (3-10ms)

  • API key authentication (varies depending on cache hits/replication)
  • Rate limiting checks (0ms with async mode)
  • Request/response logging
  • Simple transformations

Higher Impact (10-20ms)

  • Large payload transformations
  • Custom code that makes external calls

For optimal performance, order your policies from least to most expensive, and use early-exit conditions where possible. For example, validate API keys before performing complex transformations.
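As a generic illustration of this ordering principle (this is not Zuplo's actual policy API, just a self-contained TypeScript sketch with invented policy names), a pipeline can run cheap checks first and return early before any expensive work happens:

```ts
// Generic sketch of cheap-first policy ordering with early exit.
type Policy = (req: Request) => Promise<Response | null>; // null = continue

const requireAuthHeader: Policy = async (req) =>
  req.headers.has("authorization")
    ? null
    : new Response("Unauthorized", { status: 401 }); // near-free header check

const validateApiKey: Policy = async (req) => {
  // Stand-in for a (typically cache-backed) key lookup costing a few ms.
  const ok = req.headers.get("authorization") === "Bearer demo-key";
  return ok ? null : new Response("Invalid API key", { status: 401 });
};

const transformPayload: Policy = async () => {
  // The most expensive step runs last, only for requests that passed
  // every cheaper check above.
  return null;
};

export async function handle(req: Request): Promise<Response> {
  for (const policy of [requireAuthHeader, validateApiKey, transformPayload]) {
    const early = await policy(req);
    if (early) return early; // early exit skips all remaining, pricier policies
  }
  return new Response("forward to backend");
}
```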

Performance Testing Best Practices

1. Choose the Right Testing Tool

Use professional load testing tools that can:

  • Generate consistent load patterns
  • Measure percentile latencies (p50, p95, p99)
  • Handle connection pooling properly
  • Report detailed metrics

Recommended tools:

  • k6 - Modern load testing tool with excellent reporting
  • Apache JMeter - Comprehensive but complex
  • Gatling - High-performance testing framework
  • wrk - Simple but powerful for basic tests
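As a starting point, here is a minimal k6 script with percentile-based thresholds. The URL and latency budgets are placeholders; recent k6 releases can run TypeScript directly, while older ones need the script bundled to JavaScript first:

```ts
// Minimal k6 load test with percentile thresholds and an error-rate check.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 20,
  duration: "3m",
  thresholds: {
    http_req_duration: ["p(95)<300", "p(99)<500"], // illustrative latency budgets
    http_req_failed: ["rate<0.01"],                // fail the run above 1% errors
  },
  // Surface the percentiles that matter in the end-of-test summary.
  summaryTrendStats: ["avg", "med", "p(95)", "p(99)", "max"],
};

export default function () {
  const res = http.get("https://api.example.com/v1/health"); // placeholder URL
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```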

2. Test from Multiple Locations

Run tests from various geographic locations to understand global performance:

  • Use cloud providers different from your backend
  • Test from regions where your users are located
  • Consider using distributed load testing services

3. Measure the Right Metrics

Focus on metrics that matter:

  • Latency percentiles: p50, p95, p99 (not just averages)
  • Throughput: Requests per second at various concurrency levels
  • Error rates: Both 4xx and 5xx responses
  • Time to first byte (TTFB)
  • Total request time
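If you post-process raw latency samples yourself (for example, exported from your load tool), percentiles are straightforward to compute; the sample data below is fabricated for illustration:

```ts
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [42, 45, 47, 51, 55, 63, 71, 88, 120, 410]; // made-up samples
console.log(`p50=${percentile(latencies, 50)}ms p95=${percentile(latencies, 95)}ms`);
// The single 410ms outlier barely moves the average but dominates the tail,
// which is why percentiles represent user experience better than means do.
```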

4. Test Realistic Scenarios

Design tests that reflect actual usage:

  • Mix of different endpoints
  • Realistic payload sizes
  • Actual authentication flows
  • Expected traffic patterns (steady, burst, ramp-up)
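One way to express such a pattern in k6 is a ramping-arrival-rate scenario, which shapes the request rate rather than just the virtual-user count (all rates, durations, and endpoint paths below are placeholders):

```ts
// Illustrative k6 scenario: steady traffic with a short burst, plus a
// rough endpoint mix weighted toward the busiest route.
import http from "k6/http";

export const options = {
  scenarios: {
    realistic_traffic: {
      executor: "ramping-arrival-rate",
      startRate: 10,       // requests per second
      timeUnit: "1s",
      preAllocatedVUs: 100,
      stages: [
        { duration: "2m", target: 50 },   // ramp-up
        { duration: "5m", target: 50 },   // steady state
        { duration: "30s", target: 200 }, // burst
        { duration: "2m", target: 50 },   // recovery
      ],
    },
  },
};

const paths = ["/v1/orders", "/v1/orders", "/v1/products", "/v1/search"]; // invented mix

export default function () {
  http.get(`https://api.example.com${paths[Math.floor(Math.random() * paths.length)]}`);
}
```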

Interpreting Results

When analyzing your performance test results:

  1. Compare percentiles, not averages: p95 and p99 latencies better represent user experience
  2. Account for geographic distribution: Users farther from your infrastructure will see higher latency
  3. Look for anomalies: Sudden spikes might indicate rate limiting or capacity issues

Remember that Zuplo's edge deployment means your API is served from locations globally, which can actually reduce latency for geographically distributed users compared to a single-region backend.

Optimizing for Intra-Cloud Traffic

If your primary use case involves API traffic that stays within a particular cloud provider's network, consider Zuplo's Managed Dedicated deployment options. With Managed Dedicated, Zuplo can be deployed directly to:

  • Your chosen cloud provider (AWS, GCP, Azure, etc.)
  • Your specific regions
  • Your VPC or private network configurations

This deployment model provides:

  • Minimal latency: Your API gateway runs in the same cloud network as your backend
  • Predictable performance: Consistent sub-10ms latency for intra-region traffic
  • Network isolation: Traffic never leaves your cloud provider's backbone
  • Compliance benefits: Data remains within your controlled infrastructure

Managed Dedicated is ideal for organizations with:

  • High-volume internal API traffic
  • Strict latency requirements for service-to-service communication
  • Regulatory requirements for data locality
  • Existing investments in specific cloud providers

For most use cases, where API traffic comes from multiple providers, networks, and geographic locations (mobile apps, web applications, third-party integrations), Zuplo's edge-deployed instances typically provide better overall performance. Edge deployment ensures your API is served from locations closest to your users globally, reducing latency for the majority of real-world traffic patterns.

Cold Starts (Managed Edge Deployments Only)

This section applies only to Zuplo's managed edge (serverless) deployment. If you're running Zuplo in a dedicated environment, cold starts don't apply.

Zuplo's serverless platform automatically scales to handle any load, from zero to billions of requests. However, the first requests after a period of inactivity may experience "cold starts."

Understanding Cold Starts

  • Initial latency: First request may be 100-200ms slower
  • Node lifecycle: Once warm, nodes can serve requests for hours or days
  • Scaling behavior: New nodes spin up automatically based on traffic

Testing with Cold Starts

To accurately test performance:

  1. Run a warm-up phase: Send 100-1000 requests before measuring (scripted in the sketch below)
  2. Measure steady-state: After warm-up, measure consistent performance
  3. Test scaling: Gradually increase load to observe scaling behavior
  4. Account for real-world patterns: Most production APIs stay warm during business hours

For APIs with predictable traffic patterns, consider implementing a simple keep-warm strategy using scheduled synthetic requests during low-traffic periods.
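The warm-up step can be scripted directly in k6: setup() runs once before the load phase, and requests made there carry the tag group "::setup", so they can be separated from steady-state results. TARGET_URL is an assumed environment variable:

```ts
// Warm-up sketch: setup() traffic runs before the measured load phase and
// is tagged (group "::setup") so it can be filtered from steady-state analysis.
import http from "k6/http";

export function setup() {
  // Warm-up: a few hundred requests so measurement starts against warm nodes.
  for (let i = 0; i < 200; i++) {
    http.get(__ENV.TARGET_URL);
  }
}

export const options = { vus: 20, duration: "5m" };

export default function () {
  http.get(__ENV.TARGET_URL); // steady-state traffic, measured after warm-up
}
```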

Summary

Creating fair performance tests requires careful attention to test conditions, understanding of network topology, and realistic expectations about API gateway overhead. By following these guidelines, you'll get accurate measurements that help you make informed decisions about your API architecture.

Remember: Zuplo typically adds only 20-30ms of latency for basic request processing, with additional small increments for some policies. This overhead is often offset by the operational benefits and can even result in better global performance due to edge deployment.
