Load Balancing Guide

Comprehensive guide to load balancing algorithms, health checking, and backend management in Spooky.

Choose Your Algorithm

If your workload looks like this…	Use this algorithm
Stateless API or service, backends are equal capacity	`round-robin`
Stateless API, but backends vary in speed or size	`latency-aware`
Stateless API, prefer simplest possible config	`random`
Session state lives on the backend (shopping carts, websockets, sticky sessions)	`consistent-hash` with `key: "header:x-session-id"` or `cookie:<name>`
QUIC clients that must hit the same backend for the lifetime of a connection	`sticky-cid`
Backends are heterogeneous — some are slower or smaller	`least-connections`

When in doubt, start with round-robin. It has the lowest overhead, requires no configuration beyond the type name, and performs well for the majority of stateless API workloads. Switch to latency-aware if you observe uneven backend utilization due to speed differences, or to least-connections if request durations are highly variable.

If you are migrating from NGINX least_conn, use least-connections. If you are migrating from NGINX ip_hash or HAProxy balance source, use consistent-hash.

Load Balancing Algorithms

Spooky implements six load balancing algorithms, each optimized for different use cases. Each upstream pool configures its own algorithm independently via load_balancing.type.

Round Robin

Algorithm: Sequential distribution across healthy backends in a circular pattern.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "round-robin"  # Accepts: round-robin, round_robin, rr
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Sequential, predictable distribution pattern - State maintained across requests (counter increments per request) - Equal distribution when all backends have equal weight - Automatically skips unhealthy backends - Counter wraps on overflow (no reset on restart)

Use Cases: - Stateless applications requiring even distribution - Backends with equal capacity and performance - Scenarios where predictable patterns are acceptable - General purpose load balancing

Performance: Very low overhead (simple counter increment)

Consistent Hashing

Algorithm: Hash-based routing using a consistent hash ring with virtual nodes.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "consistent-hash"  # Accepts: consistent-hash, consistent_hash, ch
      key: "header:x-user-id"   # Optional key source: header/cookie/query/path/authority/method/cid
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 200  # 2x virtual nodes = 2x traffic share
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Deterministic routing based on hash key - Same key always routes to same backend (session affinity) - Uses 64 virtual replicas per backend per weight unit - Minimal request redistribution when backends change - FNV-1a hash function for distribution - Automatically skips unhealthy backends

Hash Key Sources:

load_balancing:
  type: "consistent-hash"
  key: "header:x-user-id"       # User ID from header
# key: "header:x-session-id"    # Session ID from header
# key: "cookie:session_id"      # Session cookie
# key: "query:user_id"          # Query parameter
# key: "path"                   # Request path (without query string)
# key: "authority"              # Host/authority
# key: "method"                 # HTTP method
# key: "cid"                    # QUIC connection ID (same as sticky-cid default)

Use Cases: - Applications requiring session affinity - Cache locality optimization (same keys hit same backend caches) - Stateful applications without external session store - Minimizing cache misses during backend changes

Performance: Low overhead (hash computation + BTreeMap lookup)

Random

Algorithm: Random selection from healthy backends.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "random"
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Non-deterministic selection using thread-local RNG - No state maintained between requests - Statistically even distribution over time - No session affinity - Automatically skips unhealthy backends

Use Cases: - Stateless applications - High-throughput scenarios where simplicity matters - Testing and development - Avoiding predictable patterns for security

Performance: Very low overhead (random number generation)

Least Connections

Algorithm: Routes each request to the healthy backend with the fewest active in-flight requests.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "least-connections"  # Accepts: least-connections, least_connections, lc
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Tracks active in-flight request count per backend - Ties broken deterministically by backend index order - Automatically skips unhealthy backends - No session affinity

Use Cases: - Backends with variable request processing time - Workloads where some requests are significantly slower than others - Avoiding hot spots when backends have similar capacity

Performance: Very low overhead (scan healthy backends for minimum active count)

Latency-Aware

Algorithm: Selects healthy backends using a latency score combining EWMA backend latency and active request pressure. Unsampled backends are probed first to avoid cold-start bias.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "latency-aware"  # Accepts: latency-aware, latency_aware, la
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Maintains EWMA latency per backend, updated after each completed request - New backends are probed first (no prior latency sample) before score-based selection - Automatically skips unhealthy backends - No session affinity

Use Cases: - Heterogeneous backends where some are consistently faster - Multi-region or mixed-hardware pools - Situations where you want faster backends to absorb proportionally more traffic automatically

Performance: Low overhead (EWMA update per request, score comparison at pick time)

Sticky CID

Algorithm: Consistent hashing keyed by QUIC connection ID (CID) for connection-level stickiness without application-layer session keys.

Configuration:

upstream:
  api_pool:
    load_balancing:
      type: "sticky-cid"  # Accepts: sticky-cid, sticky_cid, cid-sticky, cid_sticky
    route:
      path_prefix: "/api"
    backends:
      - id: "backend1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "backend2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

Characteristics: - Same QUIC CID always routes to the same backend while backend health is stable - Falls back to another backend if the preferred one is unhealthy - Stickiness is per-connection, not per-session or per-user - Automatically skips unhealthy backends

Use Cases: - Applications that benefit from connection affinity without application-level session tokens - Reducing backend cache churn on a per-connection basis - Stateful HTTP/3 streams where all streams within a QUIC connection should land on the same backend

Performance: Low overhead (hash + BTreeMap lookup, same as consistent-hash)

Algorithm Comparison

Algorithm	Complexity	Session Affinity	State	Distribution	Use Case
Round Robin	O(1)	No	Counter	Even	General purpose, predictable load
Consistent Hash	O(log n)	Yes (request key)	Hash ring	Even (with weight)	Session affinity, cache locality
Random	O(1)	No	None	Statistically even	Stateless, high throughput
Least Connections	O(n)	No	Active count	Demand-driven	Variable request latency
Latency-Aware	O(n)	No	EWMA latency	Latency-proportional	Heterogeneous backends
Sticky CID	O(log n)	Yes (QUIC CID)	Hash ring	Even	Connection-level affinity

Backend Weighting

Only consistent hashing respects backend weights. Round-robin and random algorithms currently ignore weights.

Weight Configuration

backends:
  - id: "small-instance"
    address: "10.0.1.10:8080"
    weight: 50      # Receives 1x traffic
    health_check:
      path: "/health"
      interval: 5000

  - id: "medium-instance"
    address: "10.0.1.11:8080"
    weight: 100     # Receives 2x traffic (2x the first)
    health_check:
      path: "/health"
      interval: 5000

  - id: "large-instance"
    address: "10.0.1.12:8080"
    weight: 200     # Receives 4x traffic (4x the first)
    health_check:
      path: "/health"
      interval: 5000

Weight Behavior: - Round Robin: Weight values are currently ignored - Consistent Hash: Number of virtual nodes = replicas × weight (64 replicas per weight unit) - Random: Weight values are currently ignored - Minimum: Weight values below 1 are clamped to 1

Health Checking

Spooky performs active health checks on all backends. Unhealthy backends are automatically removed from rotation.

Health Check Mechanism

backends:
  - id: "backend1"
    address: "10.0.1.10:8080"
    weight: 100
    health_check:
      path: "/health"              # Health check endpoint path
      interval: 5000               # Check every 5 seconds
      timeout_ms: 2000             # 2 second timeout per check
      failure_threshold: 3         # 3 failures → mark unhealthy
      success_threshold: 2         # 2 successes → mark healthy
      cooldown_ms: 10000           # 10 second cooldown before recovery

Health Check Parameters

path: Endpoint to check (default: "/health") - Must return 2xx status code when healthy - Should be lightweight and fast

interval: Time between checks in milliseconds (default: 5000) - Lower values = faster failure detection - Higher values = lower overhead

timeout_ms: Maximum wait time for response (default: 1000) - Should be less than interval - Failed on timeout

failure_threshold: Consecutive failures to mark unhealthy (default: 3) - Higher values = more tolerance for transient failures - Lower values = faster failure detection

success_threshold: Consecutive successes to mark healthy (default: 2) - Higher values = more confidence before recovery - Lower values = faster recovery

cooldown_ms: Minimum time to stay unhealthy (default: 5000) - Prevents flapping - Gives backend time to recover

Health State Machine

Initial State: Healthy
              |
              | failure_threshold consecutive failures
              v
         Unhealthy (cooldown period)
              |
              | cooldown expires
              v
         Unhealthy (testing)
              |
              | success_threshold consecutive successes
              v
           Healthy

Healthy: Backend receives traffic, failures are counted

Unhealthy (cooldown): Backend removed from rotation, health checks continue but successes are ignored until cooldown expires

Unhealthy (testing): Backend still removed from rotation, consecutive successes are counted toward success_threshold

Backend State Tracking

The load balancer tracks per-backend state:

struct BackendState {
    address: String,
    weight: u32,
    health_check: HealthCheck,
    consecutive_failures: u32,
    health_state: HealthState,  // Healthy | Unhealthy { until, successes }
}

State transitions: - record_success(): Increments success counter, transitions to Healthy if threshold met - record_failure(): Increments failure counter, transitions to Unhealthy if threshold met

Multiple Upstream Pools

Spooky supports multiple upstream pools with independent routing and load balancing configuration. Each pool specifies its own algorithm.

Configuration Example

upstream:
  api_pool:
    load_balancing:
      type: "consistent-hash"  # Session affinity for API requests
    route:
      path_prefix: "/api"
    backends:
      - id: "api1"
        address: "10.0.1.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000
      - id: "api2"
        address: "10.0.1.11:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

  auth_pool:
    load_balancing:
      type: "round-robin"  # Even spread across auth backends
    route:
      path_prefix: "/auth"
    backends:
      - id: "auth1"
        address: "10.0.2.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 5000

  static_pool:
    load_balancing:
      type: "random"  # Stateless static assets, any backend is fine
    route:
      path_prefix: "/static"
    backends:
      - id: "cdn1"
        address: "10.0.3.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 10000

Route Matching: - Routes are evaluated by longest-prefix matching - Route with most specific (longest) path prefix wins - For equal-length prefixes, selection depends on HashMap iteration order - Important: Unmatched routes return an error - configure a catch-all upstream (e.g., path_prefix: "/") to handle all other requests

Monitoring and Observability

Logging

Health check events are logged:

[INFO] Backend backend1 health check passed
[WARN] Backend backend2 health check failed: connection timeout
[INFO] Backend backend2 marked unhealthy after 3 consecutive failures
[INFO] Backend backend2 marked healthy after 2 consecutive successes
[DEBUG] Routing request to backend backend1 (round-robin)
[DEBUG] Routing request to backend backend2 (consistent-hash, key=user:123)

Health Check Monitoring

# Monitor health check activity (systemd)
sudo journalctl -u spooky.service -f | grep -i health

# Monitor health check activity (direct process)
spooky --config config.yaml 2>&1 | grep -i health

# Count backend state changes (systemd)
sudo journalctl -u spooky.service | grep "marked unhealthy\|marked healthy" | tail -20

# Count backend state changes (if redirected to file)
grep "marked unhealthy\|marked healthy" /var/log/spooky/spooky.log | tail -20

# Track specific backend (systemd)
sudo journalctl -u spooky.service | grep "backend1" | grep health

# Track specific backend (if redirected to file)
grep "backend1" /var/log/spooky/spooky.log | grep health

Load Distribution Analysis

# Extract backend selection counts (systemd)
sudo journalctl -u spooky.service | grep "routing to backend" | \
  awk '{print $NF}' | sort | uniq -c | sort -rn

# Monitor routing decisions in real-time (systemd)
sudo journalctl -u spooky.service -f | grep "routing to backend"

# If redirected to file
grep "routing to backend" /var/log/spooky/spooky.log | \
  awk '{print $NF}' | sort | uniq -c | sort -rn

Performance Considerations

Algorithm Performance

Algorithm	Time Complexity	Memory per Backend	Per-Request Cost
Round Robin	O(1)	~8 bytes	Counter increment
Consistent Hash	O(log n)	~4 KB (64 replicas × weight)	Hash + BTreeMap lookup
Random	O(1)	~0 bytes	RNG call

n = number of healthy backends

Backend Pool Operations

// All operations filter to healthy backends first
healthy_indices()              // O(n) - scans all backends
pick_backend(algorithm)        // O(1) or O(log n) depending on algorithm
mark_success(index)            // O(1) - direct index access
mark_failure(index)            // O(1) - direct index access

Scalability

Backend Count: Algorithms scale to hundreds of backends
Health Checks: Run asynchronously, do not block request path
Memory: ~8-12 KB per backend (including health state and hash ring)
CPU: Minimal overhead for all algorithms

Troubleshooting

Uneven Load Distribution

Symptoms: Some backends receive disproportionate traffic

Diagnosis:

# Check backend weights
grep -A 10 "backends:" config.yaml | grep -E "id|weight"

# Monitor actual distribution (systemd)
sudo journalctl -u spooky.service | grep "routing to backend" | \
  awk '{print $NF}' | sort | uniq -c

# If redirected to file
grep "routing to backend" /var/log/spooky/spooky.log | \
  awk '{print $NF}' | sort | uniq -c

# Verify all backends are healthy (systemd)
sudo journalctl -u spooky.service | grep "healthy" | tail -20

# If redirected to file
grep "healthy" /var/log/spooky/spooky.log | tail -20

Solutions: - Verify backend weights are configured correctly - Check for unhealthy backends (temporarily removed from rotation) - For consistent-hash, verify hash keys are well-distributed - For round-robin, ensure sufficient request volume for even distribution

Session Affinity Not Working

Symptoms: Requests from same user/session hit different backends

Diagnosis:

# Verify consistent-hash configuration
grep -A 5 "load_balancing:" config.yaml | grep -E "type|key"

# Check if hash key is present in requests (systemd)
sudo journalctl -u spooky.service -f | grep "consistent-hash"

# Check if hash key is present in requests (if redirected to file)
tail -f /var/log/spooky/spooky.log | grep "consistent-hash"

# Test with known hash key
curl --http3-only -H "X-User-ID: test123" https://localhost:9889/

Solutions: - Ensure load_balancing.type is "consistent-hash" - Note: Hash key is automatically derived from request (authority → path → method) - For session affinity, ensure requests include consistent authority or path components - Configurable key sources are planned for future implementation

Frequent Health Check Failures

Symptoms: Backends repeatedly marked unhealthy despite being functional

Diagnosis:

# Monitor health check failures (systemd)
sudo journalctl -u spooky.service -f | grep "health check failed"

# If redirected to file
tail -f /var/log/spooky/spooky.log | grep "health check failed"

# Test health endpoint directly
curl -v http://10.0.1.10:8080/health

# Check response time
time curl http://10.0.1.10:8080/health

# Verify network connectivity
ping 10.0.1.10
traceroute 10.0.1.10

Solutions: - Increase health check timeout (timeout_ms) if endpoint is slow - Increase failure_threshold to tolerate transient failures - Optimize backend health endpoint performance - Check network latency and packet loss - Verify health check path is correct

Backends Not Recovering

Symptoms: Healthy backends remain marked unhealthy

Diagnosis:

# Check cooldown period
grep -A 10 "health_check:" config.yaml | grep cooldown_ms

# Monitor recovery attempts (systemd)
sudo journalctl -u spooky.service -f | grep -E "marked healthy|success"

# If redirected to file
tail -f /var/log/spooky/spooky.log | grep -E "marked healthy|success"

# Verify backend is actually healthy
curl http://10.0.1.10:8080/health

Solutions: - Reduce cooldown_ms for faster recovery - Reduce success_threshold if being too conservative - Verify backend health endpoint returns 2xx status - Check for health check timeout issues

No Backends Available

Symptoms: All backends marked unhealthy, requests fail

Diagnosis:

# List all backend states (systemd)
sudo journalctl -u spooky.service | grep -i "backend.*health" | tail -20

# If redirected to file
grep -i "backend.*health" /var/log/spooky/spooky.log | tail -20

# Check configuration
grep -A 15 "backends:" config.yaml

# Test each backend directly
for backend in 10.0.1.10:8080 10.0.1.11:8080; do
  echo "Testing $backend"
  curl -v http://$backend/health
done

Solutions: - Fix backend health endpoints - Adjust health check parameters (increase timeout, threshold) - Verify backends are actually running and accessible - Check firewall rules between Spooky and backends

Best Practices

Health Check Configuration

Set timeout_ms < interval to prevent check pileup
Use failure_threshold ≥ 3 to avoid false positives
Set cooldown_ms ≥ 10000 to prevent flapping
Keep health endpoints lightweight (< 100ms response time)

Algorithm Selection

Use Round Robin for simple, even distribution with equal backends
Use Consistent Hash when session affinity or cache locality matters
Use Random for stateless, high-throughput scenarios
Use Least Connections when request durations vary significantly across backends
Use Latency-Aware when backends have different performance characteristics and you want faster ones to absorb more traffic automatically
Use Sticky CID for QUIC-connection affinity without application-layer session keys

Weight Configuration

Base weights on backend capacity (CPU, memory, network)
Start with equal weights, adjust based on monitoring
Use relative weights (100, 200, 400) rather than absolute
For consistent-hash, remember: weight × 64 = number of virtual nodes

Upstream Pool Design

Create separate pools for different services (API, auth, static)
Order routes from most specific to least specific
Use path_prefix for path-based routing
Use host for virtual host routing
Provide a catch-all default pool

Advanced Configuration Examples

Multi-Tier Application

# Global load balancing strategy (applies to all upstream pools)
# Upstream pools can override this with upstream.<name>.load_balancing
load_balancing:
  type: "round-robin"

upstream:
  api_tier:
    route:
      path_prefix: "/api"
    backends:
      - id: "api1"
        address: "10.0.1.10:8080"
        weight: 200
        health_check:
          path: "/health"
          interval: 5000
          timeout_ms: 2000
          failure_threshold: 3
          success_threshold: 2
          cooldown_ms: 15000

  auth_tier:
    route:
      path_prefix: "/auth"
    backends:
      - id: "auth1"
        address: "10.0.2.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 3000
          timeout_ms: 1000
          failure_threshold: 2
          success_threshold: 2
          cooldown_ms: 10000

  static_tier:
    route:
      path_prefix: "/static"
    backends:
      - id: "cdn1"
        address: "10.0.3.10:8080"
        weight: 100
        health_check:
          path: "/health"
          interval: 30000
          timeout_ms: 5000
          failure_threshold: 5
          success_threshold: 1
          cooldown_ms: 60000

Heterogeneous Backend Capacities

upstream:
  prod_pool:
    load_balancing:
      type: "round-robin"
    route:
      path_prefix: "/"
    backends:
      - id: "small-1"
        address: "10.0.1.10:8080"
        weight: 50       # 2 vCPU, 4 GB RAM
        health_check:
          path: "/health"
          interval: 5000

      - id: "medium-1"
        address: "10.0.1.11:8080"
        weight: 100      # 4 vCPU, 8 GB RAM
        health_check:
          path: "/health"
          interval: 5000

      - id: "large-1"
        address: "10.0.1.12:8080"
        weight: 200      # 8 vCPU, 16 GB RAM
        health_check:
          path: "/health"
          interval: 5000

See Basics for general configuration and deployment
Refer to Configuration Reference for complete parameter documentation
See Architecture Guide for internal implementation details

Load Balancing Guide

Choose Your Algorithm

Load Balancing Algorithms

Round Robin

Consistent Hashing

Random

Least Connections

Latency-Aware

Sticky CID

Algorithm Comparison

Backend Weighting

Weight Configuration

Health Checking

Health Check Mechanism

Health Check Parameters

Health State Machine

Backend State Tracking

Multiple Upstream Pools

Configuration Example

Monitoring and Observability

Logging

Health Check Monitoring

Load Distribution Analysis

Performance Considerations

Algorithm Performance

Backend Pool Operations

Scalability

Troubleshooting

Uneven Load Distribution

Session Affinity Not Working

Frequent Health Check Failures

Backends Not Recovering

No Backends Available

Best Practices

Health Check Configuration

Algorithm Selection

Weight Configuration

Upstream Pool Design

Advanced Configuration Examples

Multi-Tier Application

Heterogeneous Backend Capacities

Related Documentation