Preparing for Zenoh: Performance Tuning Deep Dive

Part 2 of 3: Achieving 99.8% bandwidth reduction for WiFi-based robotics

ros2 · zenoh · performance · networking · optimization

Author: Rajesh
Published: December 15, 2025

TL;DR

  • Linux kernel tuning for network buffers: 16MB rx/tx, BBR congestion control
  • Zenoh configuration strategies: optimized (throughput) vs low-latency (control)
  • Result: 50 MB/s → 100 KB/s = 99.8% bandwidth reduction
  • apply_tuning.sh for one-command optimization

The Bandwidth Problem

Our Unitree Go2 robot publishes approximately 50 MB/s of sensor data:

  Sensor                Rate             Bandwidth
  ────────────────────────────────────────────────
  RGB Camera            30 Hz @ 720p     ~27 MB/s
  Depth Camera          30 Hz @ 720p     ~18 MB/s
  LIDAR                 20 Hz            ~2 MB/s
  IMU                   200 Hz           ~100 KB/s
  Odometry              50 Hz            ~50 KB/s
  ────────────────────────────────────────────────
  Total                                  ~50 MB/s

WiFi can’t reliably handle this. Even 802.11ac (WiFi 5) at its theoretical 867 Mbps (~100 MB/s) only achieves 40-60 MB/s of real-world throughput under ideal conditions. Add interference, distance, and multiple clients, and you’re looking at packet loss, latency spikes, and dropped connections.

We need multi-layer optimization.


The Solution Stack

Performance tuning happens at four layers, each building on the previous:

┌─────────────────────────────────────────────────────────────────┐
│  Layer 4: HARDWARE                                              │
│           NIC ring buffers, WiFi power management               │
├─────────────────────────────────────────────────────────────────┤
│  Layer 3: KERNEL                                                │
│           sysctl network parameters, TCP tuning                 │
├─────────────────────────────────────────────────────────────────┤
│  Layer 2: ZENOH                                                 │
│           Buffer sizes, batching, shared memory                 │
├─────────────────────────────────────────────────────────────────┤
│  Layer 1: APPLICATION                                           │
│           Topic filtering (--allow/--deny)     ← BIGGEST IMPACT │
└─────────────────────────────────────────────────────────────────┘

Key insight: Start from the top (application layer). Topic filtering gives you the biggest bang for your buck - often 95%+ reduction before touching anything else.


Layer 1: Topic Filtering (Biggest Impact!)

The Zenoh bridge supports --allow and --deny patterns to filter which topics cross the bridge.

Bandwidth Impact by Configuration

  Configuration      Topics Bridged            Bandwidth    Savings
  ─────────────────────────────────────────────────────────────────
  Unfiltered         All 47 topics             ~50 MB/s     0%
  Color + Odom       2 topics                  ~28 MB/s     44%
  Navigation Only    /cmd_vel, /odom, /scan    ~3 MB/s      94%
  Monitoring Only    /odom, /battery, /imu     ~100 KB/s    99.8%
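
As a sanity check on the table, the monitoring-only savings figure can be reproduced with shell arithmetic (a throwaway sketch, not part of the workshop tooling):

```shell
# Reproduce the 99.8% figure: 100 KB/s filtered vs a 50 MB/s baseline
baseline_kb=$((50 * 1000))                    # 50 MB/s expressed in KB/s
filtered_kb=100
permille=$(( (baseline_kb - filtered_kb) * 1000 / baseline_kb ))
echo "Savings: $((permille / 10)).$((permille % 10))%"   # Savings: 99.8%
```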

Example: Navigation Configuration

# Bridge only what navigation needs
zenoh-bridge-ros2dds \
    --allow "/cmd_vel" \
    --allow "/odom" \
    --allow "/scan" \
    --deny ".*"  # Deny everything else

Example: Monitoring Dashboard

# Minimal data for remote monitoring
zenoh-bridge-ros2dds \
    --allow "/odom" \
    --allow "/battery_state" \
    --allow "/imu/data" \
    --allow "/diagnostics" \
    --deny "/camera/.*" \
    --deny "/scan"

Order Matters!

Filter rules are processed in order. More specific rules should come before general ones:

# CORRECT: Specific allows, then general deny
--allow "/camera/color/compressed" --deny "/camera/.*"

# WRONG: General deny blocks everything
--deny "/camera/.*" --allow "/camera/color/compressed"  # Allow never reached!
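
The first-match-wins behavior can be illustrated with a toy shell function (our sketch, not the bridge’s actual matching code; the default action when no rule matches is an assumption):

```shell
# Toy first-match filter: rules are "allow:REGEX" or "deny:REGEX",
# evaluated in order; the first matching rule decides.
decide() {
  local topic=$1; shift
  local rule action pattern
  for rule in "$@"; do
    action=${rule%%:*}
    pattern=${rule#*:}
    if printf '%s\n' "$topic" | grep -Eq "^${pattern}$"; then
      echo "$action"; return
    fi
  done
  echo "allow"   # default when nothing matches (assumption)
}

# Specific allow first: the compressed topic gets through
decide /camera/color/compressed "allow:/camera/color/compressed" "deny:/camera/.*"   # → allow
# General deny first: the allow rule is never reached
decide /camera/color/compressed "deny:/camera/.*" "allow:/camera/color/compressed"   # → deny
```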

Layer 2: Zenoh Configuration

We provide two pre-configured Zenoh profiles:

High-Throughput: zenoh_optimized.json5

Optimized for bulk sensor data transfer:

{
  mode: "router",
  transport: {
    unicast: {
      max_links: 16,
      lowlatency: false,  // Prioritize throughput
    },
    link: {
      tx: {
        batch_size: 65535,      // Large batches
        queue_size: 256,        // Deep queue
        wait_before_drop: "1ms"
      },
      rx: {
        buffer_size: 262144,    // 256KB receive buffer
        max_message_size: 1073741824  // 1GB max
      }
    }
  },
  scouting: {
    multicast: {
      enabled: false  // Disable for stability
    }
  },
  shared_memory: {
    enabled: true,
    size: "16M"  // 16MB shared memory pool
  }
}

Use case: Streaming camera data, bulk file transfer, high-bandwidth sensor fusion.

Low-Latency: zenoh_lowlatency.json5

Optimized for control commands and real-time telemetry:

{
  mode: "client",
  transport: {
    unicast: {
      lowlatency: true,  // Enable low-latency mode
    },
    link: {
      tx: {
        batch_size: 8192,       // Smaller batches
        queue_size: 64,         // Shallower queue
        wait_before_drop: "100us"  // Send immediately
      },
      rx: {
        buffer_size: 65536,     // 64KB buffer (smaller)
      }
    }
  }
}

Use case: /cmd_vel commands, emergency stops, real-time control loops.

Batching vs Latency Trade-off

Batching enabled (lowlatency: false):

  • Messages are accumulated into batches before sending
  • Higher throughput, but adds ~1ms latency
  • Good for: cameras, LIDAR, bulk data

Batching disabled (lowlatency: true):

  • Messages are sent immediately
  • Lower throughput, but sub-millisecond latency
  • Good for: /cmd_vel, emergency_stop, heartbeat
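
The ~1ms figure falls straight out of the batch size and the stream rate; a rough back-of-the-envelope check (assuming the full 50 MB/s stream feeds a single link):

```shell
# Time to fill one 64 KB batch at a 50 MB/s aggregate rate
batch_bytes=65535
rate_bps=$((50 * 1000 * 1000))               # 50 MB/s in bytes/s
fill_us=$((batch_bytes * 1000000 / rate_bps))
echo "Batch fills in ~${fill_us} us"         # ~1310 us, i.e. roughly 1 ms
```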

Buffer Sizing

The rx/tx buffer sizes should match your expected message sizes:

  Sensor                Message Size    Recommended Buffer
  ────────────────────────────────────────────────────────
  IMU                   ~200 bytes      64 KB
  Odometry              ~400 bytes      64 KB
  LIDAR                 ~30 KB          256 KB
  Camera (720p)         ~900 KB         1 MB
  Camera (compressed)   ~50 KB          256 KB

Rule of thumb: Buffer = 4 × largest expected message size
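
A quick check of the rule against the compressed-camera row above:

```shell
# 4x rule applied to ~50 KB compressed camera frames
largest_msg_kb=50
min_buffer_kb=$((largest_msg_kb * 4))        # 200 KB minimum
echo "Need >= ${min_buffer_kb} KB; the 256 KB tier above covers it"
```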


Layer 3: Linux Kernel Tuning

The kernel’s default network settings are conservative. For high-bandwidth robotics, we need to increase buffer sizes.

Key Parameters

# Network buffer maximums (default is often 208KB)
net.core.rmem_max = 16777216     # 16MB receive buffer max
net.core.wmem_max = 16777216     # 16MB send buffer max
net.core.rmem_default = 1048576  # 1MB default receive
net.core.wmem_default = 1048576  # 1MB default send

# TCP-specific tuning
net.ipv4.tcp_rmem = 4096 1048576 16777216  # min/default/max
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_congestion_control = bbr      # Better for WiFi

# Backlog queue (for burst handling)
net.core.netdev_max_backlog = 10000

# Memory management
vm.swappiness = 10  # Keep data in RAM

The apply_tuning.sh Script

We created a one-command tool to apply all tuning:

# Check current settings (safe, no changes)
./apply_tuning.sh --check

# Apply all tuning (requires sudo)
sudo ./apply_tuning.sh --apply

# Make tuning persistent across reboots
sudo ./apply_tuning.sh --persist

# Reset to system defaults
sudo ./apply_tuning.sh --reset
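
Under the hood, persistence on Linux normally means a sysctl drop-in file; here is a sketch of what --persist plausibly writes (the exact filename is our assumption — check the script for the real path):

```
# /etc/sysctl.d/99-robot-network.conf  (filename is an assumption)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 1048576
net.core.wmem_default = 1048576
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_congestion_control = bbr
vm.swappiness = 10
```

After writing such a file, `sudo sysctl --system` reloads all drop-ins without a reboot.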

Output of --check:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Network Kernel Parameters
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  Parameter                          Current        Recommended
  ─────────────────────────────────────────────────────────────────────
  net.core.rmem_max                  212992         16777216        ⚠
  net.core.wmem_max                  212992         16777216        ⚠
  net.core.rmem_default              212992         1048576         ⚠
  net.core.wmem_default              212992         1048576         ⚠
  net.core.netdev_max_backlog        1000           10000           ⚠
  net.ipv4.tcp_congestion_control    cubic          bbr             ⚠
  vm.swappiness                      60             10              ⚠

  ⚠  7 parameters below recommended values
  ✓  Run 'sudo ./apply_tuning.sh --apply' to optimize

Bandwidth-Delay Product (BDP)

Network buffers should hold at least one BDP worth of data:

BDP = Bandwidth × Round-Trip-Time
    = 100 Mbps × 10ms
    = 1,000,000 bits
    = 125 KB

For WiFi with variable latency (10-50ms), we size buffers at 10× BDP for safety: ~1.25 MB default, 16 MB maximum.
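
The same arithmetic in shell, for plugging in your own link numbers (the 100 Mbps / 10 ms values are just the worked example above):

```shell
# BDP = bandwidth x RTT, converted to bytes
bw_mbps=100
rtt_ms=10
bdp_bytes=$((bw_mbps * 1000000 / 8 * rtt_ms / 1000))
echo "BDP = ${bdp_bytes} bytes"                        # 125000 bytes = 125 KB
echo "10x safety buffer = $((bdp_bytes * 10)) bytes"   # ~1.25 MB
```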

BBR Congestion Control

The default cubic algorithm was designed for wired networks. bbr (Bottleneck Bandwidth and Round-trip propagation time) is Google’s modern algorithm that:

  • Probes for available bandwidth without causing loss
  • Handles WiFi’s variable latency better
  • Recovers from congestion faster

# Enable BBR
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

Swappiness

When memory is tight, Linux can swap out process memory to disk. For real-time robotics, we want to minimize this:

  • Default swappiness=60: Aggressive swapping
  • swappiness=10: Only swap under memory pressure
  • swappiness=0: Never swap (risky - can OOM kill)

We use 10 as a safe compromise.


Layer 4: Hardware Tuning

The final layer involves hardware-level optimizations.

NIC Ring Buffers

Network interface cards have hardware buffers. Larger buffers help with burst traffic:

# Check current ring buffer sizes
ethtool -g eth0

# Increase ring buffers (if supported)
sudo ethtool -G eth0 rx 4096 tx 4096

WiFi Power Management

WiFi chips often enable power saving by default, which adds latency:

# Check power management status
iwconfig wlan0 | grep "Power Management"

# Disable power management (reduces latency)
sudo iwconfig wlan0 power off

Impact: Can reduce WiFi latency from 50-100ms spikes to consistent 5-10ms.
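
Note that `iwconfig ... power off` does not survive a reboot. On systems managed by NetworkManager, a drop-in config makes it permanent — a sketch (the filename is our choice; `wifi.powersave = 2` is NetworkManager’s “disable” value):

```
# /etc/NetworkManager/conf.d/wifi-powersave-off.conf  (filename is our choice)
[connection]
# 2 = disable power saving, 3 = enable
wifi.powersave = 2
```

Restart NetworkManager (`sudo systemctl restart NetworkManager`) for the setting to take effect.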

USB Settings (for RealSense)

The RealSense D435i connects via USB 3.0. Ensure USB autosuspend is disabled:

# Disable USB autosuspend for all devices
echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend

Multi-Host Robot Architecture

📋 From ROSCon India 2025 Workshop (ZettaScale)

This section covers optimal Zenoh architecture for robots with multiple compute units.

Modern robots often have multiple compute units (ARM for sensors, x86 for perception, Jetson for AI). Here’s how to configure Zenoh for this scenario:

The Single Router Pattern

ZettaScale Recommendation: One Router Only!

For multi-host robots, it’s better to have ONE router (on the main compute unit) rather than running routers on every host.

┌─────────────────────────────────────────────────────────────────────────┐
│                    MULTI-HOST ROBOT ARCHITECTURE                        │
│                                                                          │
│    AMR (Main Compute)             ARM Board (Sensors)                   │
│    192.168.1.10                   192.168.1.11                          │
│    ────────────────               ────────────────                       │
│    ┌──────────────┐               ┌──────────────┐                      │
│    │  rmw_zenohd  │◄──── TCP ────►│  ROS nodes   │                      │
│    │  (ROUTER)    │               │  (clients)   │                      │
│    └──────┬───────┘               └──────────────┘                      │
│           │                                                              │
│    ┌──────┴───────┐               Jetson (AI)                           │
│    │  Navigation  │               192.168.1.12                          │
│    │  Perception  │               ────────────────                       │
│    │  Planning    │               ┌──────────────┐                      │
│    └──────────────┘               │  Detection   │                      │
│                                   │  SLAM        │                      │
│                                   │  (clients)   │                      │
│                                   └──────┬───────┘                      │
│                                          │                              │
│                                          └───── TCP ────► to AMR router │
└─────────────────────────────────────────────────────────────────────────┘

Configuration: Main Compute (AMR)

// amr_router.json5 - Runs on main compute unit
{
  mode: "router",
  listen: {
    endpoints: [
      "tcp/192.168.1.10:7447",     // Ethernet interface
      "tcp/127.0.0.1:7447"         // Local processes
    ]
  },
  transport: {
    shared_memory: { enabled: true }  // SHM for local processes
  }
}
# On AMR (main compute)
export ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
ros2 run rmw_zenoh_cpp rmw_zenohd

Configuration: ARM / Jetson (Clients)

// arm_client.json5 - Runs on ARM board
{
  mode: "client",
  connect: {
    endpoints: ["tcp/192.168.1.10:7447"]  // Connect to AMR router
  }
}
# On ARM board
export ZENOH_SESSION_CONFIG_URI=/configs/arm_client.json5
ros2 launch realsense2_camera rs_launch.py

Why Single Router?

  Aspect          Multiple Routers           Single Router
  ──────────────────────────────────────────────────────────────
  Complexity      Router federation config   Simple star topology
  Latency         Inter-router hops          Direct client-router
  Memory          Router per host            One router total
  Debugging       Multiple log sources       Centralized logging
  Failure modes   Router sync issues         Single point of failure

Single Point of Failure

The single-router pattern means the main compute is critical. For production, consider:

  • Running the router in a systemd service with auto-restart
  • Health monitoring with a watchdog
  • Graceful degradation for sensor hosts
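
For the auto-restart piece, a minimal systemd unit sketch (the service name, config path, and ros2 binary path are all assumptions — adapt them to your install):

```
# /etc/systemd/system/zenoh-router.service  (sketch; paths are assumptions)
[Unit]
Description=Zenoh router for multi-host robot
After=network-online.target
Wants=network-online.target

[Service]
Environment=ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
ExecStart=/usr/bin/ros2 run rmw_zenoh_cpp rmw_zenohd
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```

Enable it with `sudo systemctl enable --now zenoh-router`; `Restart=always` handles crashes, while a separate watchdog can cover hangs.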


Putting It All Together

Here’s the complete optimization workflow:

Step 1: Apply Kernel Tuning (Once)

# On the host machine
sudo ./apply_tuning.sh --apply --persist

Step 2: Choose Zenoh Configuration

# In the container

# For high-bandwidth streaming:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_optimized.json5

# For low-latency control:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_lowlatency.json5

Step 3: Apply Topic Filtering

# Start bridge with selective topics
zenoh-bridge-ros2dds \
    -c /workshop3/configs/zenoh_optimized.json5 \
    --allow "/odom" \
    --allow "/cmd_vel" \
    --allow "/scan" \
    --deny ".*"

Step 4: Verify with Monitoring

# On host
./host_monitor.sh --bandwidth eth0

# Expected: Network usage matches only bridged topics
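
If host_monitor.sh isn’t handy, the kernel’s interface counters give the same number with nothing but coreutils (swap the interface name for your robot’s real one, e.g. eth0):

```shell
# One-second RX rate from kernel counters
# "lo" is used here only so the snippet runs anywhere; use your NIC
iface=lo
rx1=$(cat /sys/class/net/"$iface"/statistics/rx_bytes)
sleep 1
rx2=$(cat /sys/class/net/"$iface"/statistics/rx_bytes)
echo "RX: $(( (rx2 - rx1) / 1024 )) KB/s"
```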

Results Summary

  Optimization                  Bandwidth    Reduction
  ────────────────────────────────────────────────────
  Baseline (no optimization)    50 MB/s      -
  + Topic filtering             3 MB/s       94%
  + Zenoh optimized config      2.5 MB/s     95%
  + Kernel tuning               2.4 MB/s     95.2%
  + Navigation-only topics      100 KB/s     99.8%

The biggest win is always topic filtering. Everything else is refinement.


What’s Next

In Part 3, we build a real-time web dashboard that visualizes all this data in a browser - with a 13-layer architecture from ROS2 callbacks to Canvas/SVG visualizations.

See Part 1: Interactive Tools for the tooling infrastructure.

See Part 3: Web Dashboard for the monitoring UI.
