Preparing for Zenoh: Performance Tuning Deep Dive
Part 2 of 3: Achieving 99.8% bandwidth reduction for WiFi-based robotics
TL;DR
- Linux kernel tuning for network buffers: 16MB rx/tx, BBR congestion control
- Zenoh configuration strategies: optimized (throughput) vs low-latency (control)
- Result: 50 MB/s → 100 KB/s = 99.8% bandwidth reduction
- apply_tuning.sh for one-command optimization
The Bandwidth Problem
Our Unitree Go2 robot publishes approximately 50 MB/s of sensor data:
| Sensor | Rate | Bandwidth |
|---|---|---|
| RGB Camera | 30 Hz @ 720p | ~27 MB/s |
| Depth Camera | 30 Hz @ 720p | ~18 MB/s |
| LIDAR | 20 Hz | ~2 MB/s |
| IMU | 200 Hz | ~100 KB/s |
| Odometry | 50 Hz | ~50 KB/s |
| Total | - | ~50 MB/s |
WiFi can’t reliably handle this. Even 802.11ac (WiFi 5) at its theoretical 867 Mbps (~108 MB/s) only achieves 40-60 MB/s of real-world throughput under ideal conditions. Add interference, distance, and multiple clients, and you’re looking at packet loss, latency spikes, and dropped connections.
We need multi-layer optimization.
The Solution Stack
Performance tuning happens at four layers, each building on the previous:
┌─────────────────────────────────────────────────────────────────┐
│ Layer 4: HARDWARE │
│ NIC ring buffers, WiFi power management │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: KERNEL │
│ sysctl network parameters, TCP tuning │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: ZENOH │
│ Buffer sizes, batching, shared memory │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: APPLICATION │
│ Topic filtering (--allow/--deny) ← BIGGEST IMPACT │
└─────────────────────────────────────────────────────────────────┘
Key insight: Start from the top (application layer). Topic filtering gives you the biggest bang for your buck - often 95%+ reduction before touching anything else.
Layer 1: Topic Filtering (Biggest Impact!)
The Zenoh bridge supports --allow and --deny patterns to filter which topics cross the bridge.
Bandwidth Impact by Configuration
| Configuration | Topics Bridged | Bandwidth | Savings |
|---|---|---|---|
| Unfiltered | All 47 topics | ~50 MB/s | 0% |
| Color + Odom | 2 topics | ~28 MB/s | 44% |
| Navigation Only | /cmd_vel, /odom, /scan | ~3 MB/s | 94% |
| Monitoring Only | /odom, /battery, /imu | ~100 KB/s | 99.8% |
Example: Monitoring Dashboard
# Minimal data for remote monitoring
zenoh-bridge-ros2dds \
--allow "/odom" \
--allow "/battery_state" \
--allow "/imu/data" \
--allow "/diagnostics" \
--deny "/camera/.*" \
--deny "/scan"

Filter rules are processed in order. More specific rules should come before general ones:
# CORRECT: Specific allows, then general deny
--allow "/camera/color/compressed" --deny "/camera/.*"
# WRONG: General deny blocks everything
--deny "/camera/.*" --allow "/camera/color/compressed" # Allow never reached!

Layer 2: Zenoh Configuration
We provide two pre-configured Zenoh profiles:
High-Throughput: zenoh_optimized.json5
Optimized for bulk sensor data transfer:
{
mode: "router",
transport: {
unicast: {
max_links: 16,
lowlatency: false, // Prioritize throughput
},
link: {
tx: {
batch_size: 65535, // Large batches
queue_size: 256, // Deep queue
wait_before_drop: "1ms"
},
rx: {
buffer_size: 262144, // 256KB receive buffer
max_message_size: 1073741824 // 1GB max
}
}
},
scouting: {
multicast: {
enabled: false // Disable for stability
}
},
shared_memory: {
enabled: true,
size: "16M" // 16MB shared memory pool
}
}
Use case: Streaming camera data, bulk file transfer, high-bandwidth sensor fusion.
Low-Latency: zenoh_lowlatency.json5
Optimized for control commands and real-time telemetry:
{
mode: "client",
transport: {
unicast: {
lowlatency: true, // Enable low-latency mode
},
link: {
tx: {
batch_size: 8192, // Smaller batches
queue_size: 64, // Shallower queue
wait_before_drop: "100us" // Send immediately
},
rx: {
buffer_size: 65536, // 64KB buffer (smaller)
}
}
}
}
Use case: /cmd_vel commands, emergency stops, real-time control loops.
Batching vs Latency Trade-off
Batching enabled (lowlatency: false):
- Messages are accumulated into batches before sending
- Higher throughput, but adds ~1ms latency
- Good for: cameras, LIDAR, bulk data
Batching disabled (lowlatency: true):
- Messages sent immediately
- Lower throughput, but sub-millisecond latency
- Good for: /cmd_vel, emergency_stop, heartbeat
Buffer Sizing
The rx/tx buffer sizes should match your expected message sizes:
| Sensor | Message Size | Recommended Buffer |
|---|---|---|
| IMU | ~200 bytes | 64 KB |
| Odometry | ~400 bytes | 64 KB |
| LIDAR | ~30 KB | 256 KB |
| Camera (720p) | ~900 KB | 1 MB |
| Camera (compressed) | ~50 KB | 256 KB |
Rule of thumb: Buffer = 4 × largest expected message size
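The rule of thumb can be applied mechanically. A minimal sketch, assuming the common convention of rounding up to the next power of two (our convention, not something Zenoh requires; the table above rounds up further to a few standard sizes):

```shell
# Apply the "buffer = 4 x largest message" rule of thumb, rounding up
# to the next power of two (our convention; Zenoh does not require it).
recommended_buffer() {
    msg_bytes=$1
    target=$(( msg_bytes * 4 ))
    buf=4096
    while [ "$buf" -lt "$target" ]; do
        buf=$(( buf * 2 ))
    done
    echo "$buf"
}

recommended_buffer 30720   # LIDAR scan, ~30 KB  -> 131072 (128 KB)
recommended_buffer 51200   # compressed frame, ~50 KB -> 262144 (256 KB)
```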
Layer 3: Linux Kernel Tuning
The kernel’s default network settings are conservative. For high-bandwidth robotics, we need to increase buffer sizes.
Key Parameters
# Network buffer maximums (default is often 208KB)
net.core.rmem_max = 16777216 # 16MB receive buffer max
net.core.wmem_max = 16777216 # 16MB send buffer max
net.core.rmem_default = 1048576 # 1MB default receive
net.core.wmem_default = 1048576 # 1MB default send
# TCP-specific tuning
net.ipv4.tcp_rmem = 4096 1048576 16777216 # min/default/max
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_congestion_control = bbr # Better for WiFi
# Backlog queue (for burst handling)
net.core.netdev_max_backlog = 10000
# Memory management
vm.swappiness = 10 # Keep data in RAM

The apply_tuning.sh Script
We created a one-command tool to apply all tuning:
# Check current settings (safe, no changes)
./apply_tuning.sh --check
# Apply all tuning (requires sudo)
sudo ./apply_tuning.sh --apply
# Make tuning persistent across reboots
sudo ./apply_tuning.sh --persist
# Reset to system defaults
sudo ./apply_tuning.sh --reset

Output of --check:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Network Kernel Parameters
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Parameter Current Recommended
─────────────────────────────────────────────────────────────────────
net.core.rmem_max 212992 16777216 ⚠
net.core.wmem_max 212992 16777216 ⚠
net.core.rmem_default 212992 1048576 ⚠
net.core.wmem_default 212992 1048576 ⚠
net.core.netdev_max_backlog 1000 10000 ⚠
net.ipv4.tcp_congestion_control cubic bbr ⚠
vm.swappiness 60 10 ⚠
⚠ 7 parameters below recommended values
✓ Run 'sudo ./apply_tuning.sh --apply' to optimize
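The comparison behind that table is straightforward. A minimal sketch of the --check logic (our assumption; the real script reads live values with sysctl -n and may differ in detail):

```shell
# Hypothetical sketch of the --check comparison. On a real system the
# current value would come from: have=$(sysctl -n "$param")
check_param() {
    param=$1; have=$2; want=$3
    if [ "$have" -lt "$want" ]; then
        echo "$param $have $want BELOW"
    else
        echo "$param $have $want OK"
    fi
}

check_param net.core.rmem_max 212992 16777216
check_param net.core.netdev_max_backlog 10000 10000
```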
Bandwidth-Delay Product (BDP)
Network buffers should hold at least one BDP worth of data:
BDP = Bandwidth × Round-Trip-Time
= 100 Mbps × 10ms
= 1,000,000 bits
= 125 KB
For WiFi with variable latency (10-50ms), we size buffers at 10× BDP for safety: ~1.25 MB default, 16 MB maximum.
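The same arithmetic in shell, for plugging in your own link numbers (integer math, so RTT is given in milliseconds):

```shell
# BDP (bytes) = bandwidth (bits/s) * RTT (s) / 8
bandwidth_mbps=100
rtt_ms=10
bdp_bytes=$(( bandwidth_mbps * 1000000 / 8 * rtt_ms / 1000 ))
echo "BDP: $bdp_bytes bytes"                     # 125000 bytes = ~125 KB
echo "10x safety margin: $(( bdp_bytes * 10 )) bytes"  # ~1.25 MB default
```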
BBR Congestion Control
The default cubic algorithm was designed for wired networks. bbr (Bottleneck Bandwidth and RTT) is Google’s modern algorithm that:
- Probes for available bandwidth without causing loss
- Handles WiFi’s variable latency better
- Recovers from congestion faster
# Enable BBR
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

Swappiness
When memory is tight, Linux can swap out process memory to disk. For real-time robotics, we want to minimize this:
- swappiness=60 (default): Aggressive swapping
- swappiness=10: Only swap under memory pressure
- swappiness=0: Never swap (risky - can trigger the OOM killer)
We use 10 as a safe compromise.
Layer 4: Hardware Tuning
The final layer involves hardware-level optimizations.
NIC Ring Buffers
Network interface cards have hardware buffers. Larger buffers help with burst traffic:
# Check current ring buffer sizes
ethtool -g eth0
# Increase ring buffers (if supported)
sudo ethtool -G eth0 rx 4096 tx 4096

WiFi Power Management
WiFi chips often enable power saving by default, which adds latency:
# Check power management status
iwconfig wlan0 | grep "Power Management"
# Disable power management (reduces latency)
sudo iwconfig wlan0 power off

Impact: Can reduce WiFi latency from 50-100ms spikes to a consistent 5-10ms.
USB Settings (for RealSense)
The RealSense D435i connects via USB 3.0. Ensure USB autosuspend is disabled:
# Disable USB autosuspend for all devices
echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend

Multi-Host Robot Architecture
Modern robots often have multiple compute units (ARM for sensors, x86 for perception, Jetson for AI). This section covers how to configure Zenoh for that scenario:
The Single Router Pattern
For multi-host robots, it’s better to have ONE router (on the main compute unit) rather than running routers on every host.
┌─────────────────────────────────────────────────────────────────────────┐
│ MULTI-HOST ROBOT ARCHITECTURE │
│ │
│ AMR (Main Compute) ARM Board (Sensors) │
│ 192.168.1.10 192.168.1.11 │
│ ──────────────── ──────────────── │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ rmw_zenohd │◄──── TCP ────►│ ROS nodes │ │
│ │ (ROUTER) │ │ (clients) │ │
│ └──────┬───────┘ └──────────────┘ │
│ │ │
│ ┌──────┴───────┐ Jetson (AI) │
│ │ Navigation │ 192.168.1.12 │
│ │ Perception │ ──────────────── │
│ │ Planning │ ┌──────────────┐ │
│ └──────────────┘ │ Detection │ │
│ │ SLAM │ │
│ │ (clients) │ │
│ └──────┬───────┘ │
│ │ │
│ └────── TCP ───────────────────┘
│ to AMR router │
└─────────────────────────────────────────────────────────────────────────┘
Configuration: Main Compute (AMR)
// amr_router.json5 - Runs on main compute unit
{
mode: "router",
listen: {
endpoints: [
"tcp/192.168.1.10:7447", // Ethernet interface
"tcp/127.0.0.1:7447" // Local processes
]
},
transport: {
shared_memory: { enabled: true } // SHM for local processes
}
}
# On AMR (main compute)
export ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
ros2 run rmw_zenoh_cpp rmw_zenohd

Configuration: ARM / Jetson (Clients)
// arm_client.json5 - Runs on ARM board
{
mode: "client",
connect: {
endpoints: ["tcp/192.168.1.10:7447"] // Connect to AMR router
}
}
# On ARM board
export ZENOH_SESSION_CONFIG_URI=/configs/arm_client.json5
ros2 launch realsense2_camera rs_launch.py

Why Single Router?
| Aspect | Multiple Routers | Single Router |
|---|---|---|
| Complexity | Router federation config | Simple star topology |
| Latency | Inter-router hops | Direct client-router |
| Memory | Router per host | One router total |
| Debugging | Multiple log sources | Centralized logging |
| Failure modes | Router sync issues | Single point of failure |
The single-router pattern means the main compute is critical. For production, consider:
- Running the router in a systemd service with auto-restart
- Health monitoring with a watchdog
- Graceful degradation for sensor hosts
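For the systemd suggestion above, a minimal unit sketch. The unit name and ExecStart path are our assumptions; point ExecStart at wherever rmw_zenohd lives on your system:

```ini
# /etc/systemd/system/zenoh-router.service (hypothetical name and paths)
[Unit]
Description=Zenoh router for rmw_zenoh
After=network-online.target
Wants=network-online.target

[Service]
Environment=ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
# Adjust to the rmw_zenohd binary location for your ROS 2 distro
ExecStart=/opt/ros/jazzy/lib/rmw_zenoh_cpp/rmw_zenohd
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now zenoh-router so the router comes back automatically after crashes and reboots.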
Link-Level Compression (LZ4)
Zenoh supports transparent link-level compression using LZ4.
Zenoh can compress all traffic at the transport layer - no application changes needed!
How It Works
- Uses LZ4 compression (fast, low CPU overhead)
- Achieves 30-50% bandwidth reduction typically
- Transparent to ROS 2 applications
- Must be enabled on BOTH ends of the Zenoh link
Enable Compression
Via environment variable (quick):
# Enable on client
export ZENOH_CONFIG_OVERRIDE='transport/link/compression/enabled=true'
ros2 run demo_nodes_cpp talker
# Enable on router too!
export ZENOH_CONFIG_OVERRIDE='transport/link/compression/enabled=true'
ros2 run rmw_zenoh_cpp rmw_zenohd

Via config file (recommended):
// zenoh_compressed.json5
{
mode: "client",
connect: { endpoints: ["tcp/192.168.1.10:7447"] },
transport: {
link: {
compression: {
enabled: true // LZ4 compression
}
}
}
}
Compression Impact
| Data Type | Raw Size | Compressed | Reduction |
|---|---|---|---|
| Point clouds | 10 MB | 3-4 MB | ~65% |
| LIDAR scans | 30 KB | 15 KB | ~50% |
| Odometry | 400 B | 300 B | ~25% |
| Images (raw) | 2.76 MB | 1.5 MB | ~45% |
| Images (JPEG) | 230 KB | 220 KB | ~5% (already compressed) |
Enable compression for:
- Multi-host robots over Ethernet
- Cloud connectivity (always!)
- Low-bandwidth wireless links
Skip compression for:
- Local SHM transport (no benefit)
- Already-compressed data (JPEG, H264, ZSTD)
- Ultra-low-latency control (adds ~0.1ms)
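The "already compressed" caveat is easy to demonstrate. A quick illustration using gzip as a stand-in for LZ4 (same principle; gzip is more commonly installed): highly redundant data shrinks dramatically on the first pass, while compressing the result again gains essentially nothing.

```shell
# Redundant data compresses well; compressing the output again does not.
head -c 100000 /dev/zero > raw.bin    # stand-in for redundant sensor data
gzip -c raw.bin > once.gz             # first pass: shrinks dramatically
gzip -c once.gz > twice.gz            # second pass: no further gain
once=$(( $(wc -c < once.gz) ))
twice=$(( $(wc -c < twice.gz) ))
echo "raw: 100000 bytes  once: $once bytes  twice: $twice bytes"
rm -f raw.bin once.gz twice.gz
```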
Putting It All Together
Here’s the complete optimization workflow:
Step 1: Apply Kernel Tuning (Once)
# On the host machine
sudo ./apply_tuning.sh --apply --persist

Step 2: Choose Zenoh Configuration
# In the container
# For high-bandwidth streaming:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_optimized.json5
# For low-latency control:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_lowlatency.json5

Step 3: Apply Topic Filtering
# Start bridge with selective topics
zenoh-bridge-ros2dds \
-c /workshop3/configs/zenoh_optimized.json5 \
--allow "/odom" \
--allow "/cmd_vel" \
--allow "/scan" \
--deny ".*"

Step 4: Verify with Monitoring
# On host
./host_monitor.sh --bandwidth eth0
# Expected: Network usage matches only bridged topics

Results Summary
| Optimization | Bandwidth | Reduction |
|---|---|---|
| Baseline (no optimization) | 50 MB/s | - |
| + Topic filtering | 3 MB/s | 94% |
| + Zenoh optimized config | 2.5 MB/s | 95% |
| + Kernel tuning | 2.4 MB/s | 95.2% |
| + Navigation-only topics | 100 KB/s | 99.8% |
The biggest win is always topic filtering. Everything else is refinement.
What’s Next
In Part 3, we build a real-time web dashboard that visualizes all this data in a browser - with a 13-layer architecture from ROS2 callbacks to Canvas/SVG visualizations.
See Part 1: Interactive Tools for the tooling infrastructure.
See Part 3: Web Dashboard for the monitoring UI.