
Performance Testing Your API Gateway

Performance testing is critical for understanding the real-world performance of your API when using an API gateway like Zuplo. This guide helps you create fair and accurate performance tests that properly measure latency and throughput.

Creating Fair Comparison Tests

When evaluating API gateway performance, it's essential to ensure your tests accurately reflect real-world conditions and provide a fair comparison between direct backend calls and calls through your gateway.

Test Location Matters

One of the most common mistakes in performance testing is running tests from within the same cloud provider network as your backend. This creates artificially low latency results that don't reflect real-world usage.

Never run performance tests from the same cloud provider as your backend. If your backend runs on AWS, don't test from AWS. The same applies to GCP, Azure, or any other provider. If you are using a third-party tool such as k6 or BlazeMeter, be sure to check where their test nodes are located.

Why this matters: When traffic stays within a cloud provider's network, latency is dramatically reduced and more consistent, especially within the same geographical region. Intra-cloud network latency benefits from:

  • Dedicated high-speed interconnects between data centers
  • Optimized routing within the provider's backbone
  • Minimal network hops
  • Consistent, predictable performance with low jitter

Real-world latency measurements make the difference stark: intra-cloud latency within the same region (for example, within North America) can be 50-70% lower than traffic traversing the public internet or between different cloud providers. Additionally, internet traffic experiences significantly higher jitter (variance), making response times less predictable.

This artificial performance boost from testing within the same cloud provider can make it appear that an API gateway adds substantially more latency than it actually does in real-world scenarios where traffic crosses network boundaries.

Ensure Test Equality

Fair comparisons require testing under identical conditions. Here are the key factors to consider:

Authentication Methods

Different authentication methods have different performance characteristics. If you're testing:

  • Backend with IAM/JWT: Processing time varies but is typically minimal for JWT validation
  • Zuplo with API Key authentication: Adds approximately 5-10ms for key validation

To ensure fair testing, use the same authentication method for both tests, or account for the difference in your analysis.
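One practical way to keep the comparison honest is to parameterize the target and credentials so the request itself never changes between runs. The k6 sketch below assumes illustrative TARGET_URL and AUTH_TOKEN environment variables; substitute whatever your tests actually use:

```ts
// Illustrative k6 script: TARGET_URL and AUTH_TOKEN are assumed environment
// variables, supplied per run (direct-backend run vs. gateway run).
import http from "k6/http";
import { check } from "k6";

export default function () {
  const res = http.get(__ENV.TARGET_URL, {
    // Identical auth header in both runs, so neither test pays a
    // validation cost the other doesn't.
    headers: { Authorization: `Bearer ${__ENV.AUTH_TOKEN}` },
  });
  check(res, { "status is 200": (r) => r.status === 200 });
}
```

Run the script twice, changing only the environment variables: once against the backend directly and once against the gateway.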

Request Parameters and Payloads

Always use identical:

  • Request headers
  • Query parameters
  • Request body size and complexity
  • Response size expectations

Scaling Patterns

Test both your backend and gateway-fronted API with the same:

  • Ramp-up patterns
  • Concurrent connection counts
  • Request rates
  • Test duration
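In k6, this can be expressed as a single shared options block, with only the target URL changing between runs (the stage values below are placeholders, not recommendations):

```ts
// Shared k6 load profile: reuse this exact options block for both the
// direct-backend run and the gateway run so ramp-up, concurrency, request
// rate, and duration are identical.
import http from "k6/http";

export const options = {
  stages: [
    { duration: "2m", target: 50 }, // ramp-up to 50 virtual users
    { duration: "5m", target: 50 }, // steady state
    { duration: "1m", target: 0 },  // ramp-down
  ],
};

export default function () {
  http.get(__ENV.TARGET_URL); // only this target differs between runs
}
```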

Account for Additional Layers

Your architecture may include additional layers that affect performance:

  • CDN (Cloudflare, Fastly, etc.): Adds 5-15ms for cache misses
  • WAF (Web Application Firewall): Adds 10-20ms depending on rule complexity
  • DDoS Protection: Usually minimal impact (1-5ms) unless under attack
  • Load Balancers: Adds 1-5ms

Include these layers in both test scenarios or explicitly account for their impact in your analysis. For example, if the gateway path includes a CDN cache miss (~10ms) and a WAF (~15ms), roughly 25ms of any measured difference belongs to those layers, not to the gateway itself.

Understanding Gateway Latency

API gateways necessarily add some latency to process requests. For Zuplo:

  • Base latency: Approximately 20-30ms with no policies
  • Per policy: Most policies add 1-5ms each
  • Complex policies: Authentication, rate limiting, or custom code can add 5-15ms

This latency is the trade-off for the benefits an API gateway provides:

  • Centralized authentication and authorization
  • Rate limiting and quota management
  • Request/response transformation
  • Analytics and monitoring
  • Developer portal and documentation

Policy Impact on Performance

Different policies have varying performance impacts:

Low Impact (0-3ms)

  • Header manipulation
  • Simple request validation
  • Basic routing rules
  • Response caching (for cache hits)

Medium Impact (3-10ms)

  • API key authentication (varies depending on cache hits/replication)
  • Rate limiting checks (0ms with async mode)
  • Request/response logging
  • Simple transformations

Higher Impact (10-20ms)

  • Large payload transformations
  • Custom code that makes external calls

For optimal performance, order your policies from least to most expensive, and use early-exit conditions where possible. For example, validate API keys before performing complex transformations.
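As a generic illustration of this ordering principle (this is not Zuplo's actual policy API, just a self-contained TypeScript sketch with invented policy names), a pipeline can run cheap checks first and return early before any expensive work happens:

```ts
// Generic sketch of cheap-first policy ordering with early exit.
type Policy = (req: Request) => Promise<Response | null>; // null = continue

const requireAuthHeader: Policy = async (req) =>
  req.headers.has("authorization")
    ? null
    : new Response("Unauthorized", { status: 401 }); // near-free header check

const validateApiKey: Policy = async (req) => {
  // Stand-in for a (typically cache-backed) key lookup costing a few ms.
  const ok = req.headers.get("authorization") === "Bearer demo-key";
  return ok ? null : new Response("Invalid API key", { status: 401 });
};

const transformPayload: Policy = async () => {
  // The most expensive step runs last, only for requests that passed
  // every cheaper check above.
  return null;
};

export async function handle(req: Request): Promise<Response> {
  for (const policy of [requireAuthHeader, validateApiKey, transformPayload]) {
    const early = await policy(req);
    if (early) return early; // early exit skips all remaining, pricier policies
  }
  return new Response("forward to backend");
}
```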

Performance Testing Best Practices

1. Choose the Right Testing Tool

Use professional load testing tools that can:

  • Generate consistent load patterns
  • Measure percentile latencies (p50, p95, p99)
  • Handle connection pooling properly
  • Report detailed metrics

Recommended tools:

  • k6 - Modern load testing tool with excellent reporting
  • Apache JMeter - Comprehensive but complex
  • Gatling - High-performance testing framework
  • wrk - Simple but powerful for basic tests
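As a starting point, here is a minimal k6 script with percentile-based thresholds. The URL and latency budgets are placeholders; recent k6 releases can run TypeScript directly, while older ones need the script bundled to JavaScript first:

```ts
// Minimal k6 load test with percentile thresholds and an error-rate check.
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 20,
  duration: "3m",
  thresholds: {
    http_req_duration: ["p(95)<300", "p(99)<500"], // illustrative latency budgets
    http_req_failed: ["rate<0.01"],                // fail the run above 1% errors
  },
  // Surface the percentiles that matter in the end-of-test summary.
  summaryTrendStats: ["avg", "med", "p(95)", "p(99)", "max"],
};

export default function () {
  const res = http.get("https://api.example.com/v1/health"); // placeholder URL
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```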

2. Test from Multiple Locations

Run tests from various geographic locations to understand global performance:

  • Use cloud providers different from your backend
  • Test from regions where your users are located
  • Consider using distributed load testing services

3. Measure the Right Metrics

Focus on metrics that matter:

  • Latency percentiles: p50, p95, p99 (not just averages)
  • Throughput: Requests per second at various concurrency levels
  • Error rates: Both 4xx and 5xx responses
  • Time to first byte (TTFB)
  • Total request time
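If you post-process raw latency samples yourself (for example, exported from your load tool), percentiles are straightforward to compute; the sample data below is fabricated for illustration:

```ts
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [42, 45, 47, 51, 55, 63, 71, 88, 120, 410]; // made-up samples
console.log(`p50=${percentile(latencies, 50)}ms p95=${percentile(latencies, 95)}ms`);
// The single 410ms outlier barely moves the average but dominates the tail,
// which is why percentiles represent user experience better than means do.
```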

4. Test Realistic Scenarios

Design tests that reflect actual usage:

  • Mix of different endpoints
  • Realistic payload sizes
  • Actual authentication flows
  • Expected traffic patterns (steady, burst, ramp-up)
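One way to express such a pattern in k6 is a ramping-arrival-rate scenario, which shapes the request rate rather than just the virtual-user count (all rates, durations, and endpoint paths below are placeholders):

```ts
// Illustrative k6 scenario: steady traffic with a short burst, plus a
// rough endpoint mix weighted toward the busiest route.
import http from "k6/http";

export const options = {
  scenarios: {
    realistic_traffic: {
      executor: "ramping-arrival-rate",
      startRate: 10,       // requests per second
      timeUnit: "1s",
      preAllocatedVUs: 100,
      stages: [
        { duration: "2m", target: 50 },   // ramp-up
        { duration: "5m", target: 50 },   // steady state
        { duration: "30s", target: 200 }, // burst
        { duration: "2m", target: 50 },   // recovery
      ],
    },
  },
};

const paths = ["/v1/orders", "/v1/orders", "/v1/products", "/v1/search"]; // invented mix

export default function () {
  http.get(`https://api.example.com${paths[Math.floor(Math.random() * paths.length)]}`);
}
```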

Interpreting Results

When analyzing your performance test results:

  1. Compare percentiles, not averages: p95 and p99 latencies better represent user experience
  2. Account for geographic distribution: Users farther from your infrastructure will see higher latency
  3. Look for anomalies: Sudden spikes might indicate rate limiting or capacity issues

Remember that Zuplo's edge deployment means your API is served from locations globally, which can actually reduce latency for geographically distributed users compared to a single-region backend.

Optimizing for Intra-Cloud Traffic

If your primary use case involves API traffic that stays within a particular cloud provider's network, consider Zuplo's Managed Dedicated deployment options. With Managed Dedicated, Zuplo can be deployed directly to:

  • Your chosen cloud provider (AWS, GCP, Azure, etc.)
  • Your specific regions
  • Your VPC or private network configurations

This deployment model provides:

  • Minimal latency: Your API gateway runs in the same cloud network as your backend
  • Predictable performance: Consistent sub-10ms latency for intra-region traffic
  • Network isolation: Traffic never leaves your cloud provider's backbone
  • Compliance benefits: Data remains within your controlled infrastructure

Managed Dedicated is ideal for organizations with:

  • High-volume internal API traffic
  • Strict latency requirements for service-to-service communication
  • Regulatory requirements for data locality
  • Existing investments in specific cloud providers

For most use cases, where API traffic comes from multiple providers, networks, and geographic locations (mobile apps, web applications, third-party integrations), Zuplo's edge-deployed instances typically provide better overall performance. Edge deployment ensures your API is served from locations closest to your users globally, reducing latency for the majority of real-world traffic patterns.

Cold Starts (Managed Edge Deployments Only)

This section applies only to Zuplo's managed edge (serverless) deployment. If you're running Zuplo in a dedicated environment, cold starts don't apply.

Zuplo's serverless platform automatically scales to handle any load, from zero to billions of requests. However, the first requests after a period of inactivity may experience "cold starts."

Understanding Cold Starts

  • Initial latency: First request may be 100-200ms slower
  • Node lifecycle: Once warm, nodes can serve requests for hours or days
  • Scaling behavior: New nodes spin up automatically based on traffic

Testing with Cold Starts

To accurately test performance:

  1. Run a warm-up phase: Send 100-1000 requests before measuring (scripted in the sketch below)
  2. Measure steady-state: After warm-up, measure consistent performance
  3. Test scaling: Gradually increase load to observe scaling behavior
  4. Account for real-world patterns: Most production APIs stay warm during business hours

For APIs with predictable traffic patterns, consider implementing a simple keep-warm strategy using scheduled synthetic requests during low-traffic periods.
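The warm-up step can be scripted directly in k6: setup() runs once before the load phase, and requests made there carry the tag group "::setup", so they can be separated from steady-state results. TARGET_URL is an assumed environment variable:

```ts
// Warm-up sketch: setup() traffic runs before the measured load phase and
// is tagged (group "::setup") so it can be filtered from steady-state analysis.
import http from "k6/http";

export function setup() {
  // Warm-up: a few hundred requests so measurement starts against warm nodes.
  for (let i = 0; i < 200; i++) {
    http.get(__ENV.TARGET_URL);
  }
}

export const options = { vus: 20, duration: "5m" };

export default function () {
  http.get(__ENV.TARGET_URL); // steady-state traffic, measured after warm-up
}
```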

Summary

Creating fair performance tests requires careful attention to test conditions, understanding of network topology, and realistic expectations about API gateway overhead. By following these guidelines, you'll get accurate measurements that help you make informed decisions about your API architecture.

Remember: Zuplo typically adds only 20-30ms of latency for basic request processing, with additional small increments for some policies. This overhead is often offset by the operational benefits and can even result in better global performance due to edge deployment.
