Preparing for Zenoh: Performance Tuning Deep Dive
Part 2 of 3: Achieving 99.8% bandwidth reduction for WiFi-based robotics
TL;DR
- Linux kernel tuning for network buffers: 16MB rx/tx, BBR congestion control
- Zenoh configuration strategies: optimized (throughput) vs low-latency (control)
- Result: 50 MB/s → 100 KB/s = 99.8% bandwidth reduction
- apply_tuning.sh for one-command optimization
The Bandwidth Problem
Our Unitree Go2 robot publishes approximately 50 MB/s of sensor data:
| Sensor | Rate | Bandwidth |
|---|---|---|
| RGB Camera | 30 Hz @ 720p | ~27 MB/s |
| Depth Camera | 30 Hz @ 720p | ~18 MB/s |
| LIDAR | 20 Hz | ~2 MB/s |
| IMU | 200 Hz | ~100 KB/s |
| Odometry | 50 Hz | ~50 KB/s |
| Total | - | ~50 MB/s |
WiFi can’t reliably handle this. Even 802.11ac (WiFi 5) at its theoretical 867 Mbps (~108 MB/s) only achieves 40-60 MB/s of real-world throughput under ideal conditions. Add interference, distance, and multiple clients, and you’re looking at packet loss, latency spikes, and dropped connections.
We need multi-layer optimization.
The Solution Stack
Performance tuning happens at four layers, each building on the previous:
┌─────────────────────────────────────────────────────────────────┐
│ Layer 4: HARDWARE │
│ NIC ring buffers, WiFi power management │
├─────────────────────────────────────────────────────────────────┤
│ Layer 3: KERNEL │
│ sysctl network parameters, TCP tuning │
├─────────────────────────────────────────────────────────────────┤
│ Layer 2: ZENOH │
│ Buffer sizes, batching, shared memory │
├─────────────────────────────────────────────────────────────────┤
│ Layer 1: APPLICATION │
│ Topic filtering (--allow/--deny) ← BIGGEST IMPACT │
└─────────────────────────────────────────────────────────────────┘
Key insight: Start from the top (application layer). Topic filtering gives you the biggest bang for your buck - often 95%+ reduction before touching anything else.
Layer 1: Topic Filtering (Biggest Impact!)
The Zenoh bridge supports --allow and --deny patterns to filter which topics cross the bridge.
Bandwidth Impact by Configuration
| Configuration | Topics Bridged | Bandwidth | Savings |
|---|---|---|---|
| Unfiltered | All 47 topics | ~50 MB/s | 0% |
| Color + Odom | 2 topics | ~28 MB/s | 44% |
| Navigation Only | /cmd_vel, /odom, /scan | ~3 MB/s | 94% |
| Monitoring Only | /odom, /battery, /imu | ~100 KB/s | 99.8% |
Example: Monitoring Dashboard
# Minimal data for remote monitoring
zenoh-bridge-ros2dds \
--allow "/odom" \
--allow "/battery_state" \
--allow "/imu/data" \
--allow "/diagnostics" \
--deny "/camera/.*" \
--deny "/scan"

Filter rules are processed in order. More specific rules should come before general ones:
# CORRECT: Specific allows, then general deny
--allow "/camera/color/compressed" --deny "/camera/.*"
# WRONG: General deny blocks everything
--deny "/camera/.*" --allow "/camera/color/compressed" # Allow never reached!

Layer 2: Zenoh Configuration
We provide two pre-configured Zenoh profiles:
High-Throughput: zenoh_optimized.json5
Optimized for bulk sensor data transfer:
{
mode: "router",
transport: {
unicast: {
max_links: 16,
lowlatency: false, // Prioritize throughput
},
link: {
tx: {
batch_size: 65535, // Large batches
queue_size: 256, // Deep queue
wait_before_drop: "1ms"
},
rx: {
buffer_size: 262144, // 256KB receive buffer
max_message_size: 1073741824 // 1GB max
}
}
},
scouting: {
multicast: {
enabled: false // Disable for stability
}
},
shared_memory: {
enabled: true,
size: "16M" // 16MB shared memory pool
}
}
Use case: Streaming camera data, bulk file transfer, high-bandwidth sensor fusion.
Low-Latency: zenoh_lowlatency.json5
Optimized for control commands and real-time telemetry:
{
mode: "client",
transport: {
unicast: {
lowlatency: true, // Enable low-latency mode
},
link: {
tx: {
batch_size: 8192, // Smaller batches
queue_size: 64, // Shallower queue
wait_before_drop: "100us" // Send immediately
},
rx: {
buffer_size: 65536, // 64KB buffer (smaller)
}
}
}
}
Use case: /cmd_vel commands, emergency stops, real-time control loops.
Batching vs Latency Trade-off
Batching enabled (lowlatency: false):
- Messages are accumulated into batches before sending
- Higher throughput, but adds ~1ms latency
- Good for: cameras, LIDAR, bulk data
Batching disabled (lowlatency: true):
- Messages sent immediately
- Lower throughput, but sub-millisecond latency
- Good for: /cmd_vel, emergency_stop, heartbeat
Buffer Sizing
The rx/tx buffer sizes should match your expected message sizes:
| Sensor | Message Size | Recommended Buffer |
|---|---|---|
| IMU | ~200 bytes | 64 KB |
| Odometry | ~400 bytes | 64 KB |
| LIDAR | ~30 KB | 256 KB |
| Camera (720p) | ~900 KB | 1 MB |
| Camera (compressed) | ~50 KB | 256 KB |
Rule of thumb: Buffer = 4 × largest expected message size
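The rule of thumb can be applied mechanically. A minimal sketch, assuming the common convention of rounding up to the next power of two (our convention, not something Zenoh requires; the table above rounds up further to a few standard sizes):

```shell
# Apply the "buffer = 4 x largest message" rule of thumb, rounding up
# to the next power of two (our convention; Zenoh does not require it).
recommended_buffer() {
    msg_bytes=$1
    target=$(( msg_bytes * 4 ))
    buf=4096
    while [ "$buf" -lt "$target" ]; do
        buf=$(( buf * 2 ))
    done
    echo "$buf"
}

recommended_buffer 30720   # LIDAR scan, ~30 KB  -> 131072 (128 KB)
recommended_buffer 51200   # compressed frame, ~50 KB -> 262144 (256 KB)
```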
Layer 3: Linux Kernel Tuning
The kernel’s default network settings are conservative. For high-bandwidth robotics, we need to increase buffer sizes.
Key Parameters
# Network buffer maximums (default is often 208KB)
net.core.rmem_max = 16777216 # 16MB receive buffer max
net.core.wmem_max = 16777216 # 16MB send buffer max
net.core.rmem_default = 1048576 # 1MB default receive
net.core.wmem_default = 1048576 # 1MB default send
# TCP-specific tuning
net.ipv4.tcp_rmem = 4096 1048576 16777216 # min/default/max
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.ipv4.tcp_congestion_control = bbr # Better for WiFi
# Backlog queue (for burst handling)
net.core.netdev_max_backlog = 10000
# Memory management
vm.swappiness = 10 # Keep data in RAM

The apply_tuning.sh Script
We created a one-command tool to apply all tuning:
# Check current settings (safe, no changes)
./apply_tuning.sh --check
# Apply all tuning (requires sudo)
sudo ./apply_tuning.sh --apply
# Make tuning persistent across reboots
sudo ./apply_tuning.sh --persist
# Reset to system defaults
sudo ./apply_tuning.sh --reset

Output of --check:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Network Kernel Parameters
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Parameter Current Recommended
─────────────────────────────────────────────────────────────────────
net.core.rmem_max 212992 16777216 ⚠
net.core.wmem_max 212992 16777216 ⚠
net.core.rmem_default 212992 1048576 ⚠
net.core.wmem_default 212992 1048576 ⚠
net.core.netdev_max_backlog 1000 10000 ⚠
net.ipv4.tcp_congestion_control cubic bbr ⚠
vm.swappiness 60 10 ⚠
⚠ 7 parameters below recommended values
✓ Run 'sudo ./apply_tuning.sh --apply' to optimize
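The comparison behind that table is straightforward. A minimal sketch of the --check logic (our assumption; the real script reads live values with sysctl -n and may differ in detail):

```shell
# Hypothetical sketch of the --check comparison. On a real system the
# current value would come from: have=$(sysctl -n "$param")
check_param() {
    param=$1; have=$2; want=$3
    if [ "$have" -lt "$want" ]; then
        echo "$param $have $want BELOW"
    else
        echo "$param $have $want OK"
    fi
}

check_param net.core.rmem_max 212992 16777216
check_param net.core.netdev_max_backlog 10000 10000
```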
Bandwidth-Delay Product (BDP)
Network buffers should hold at least one BDP worth of data:
BDP = Bandwidth × Round-Trip-Time
= 100 Mbps × 10ms
= 1,000,000 bits
= 125 KB
For WiFi with variable latency (10-50ms), we size buffers at 10× BDP for safety: ~1.25 MB default, 16 MB maximum.
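The same arithmetic in shell, for plugging in your own link numbers (integer math, so RTT is given in milliseconds):

```shell
# BDP (bytes) = bandwidth (bits/s) * RTT (s) / 8
bandwidth_mbps=100
rtt_ms=10
bdp_bytes=$(( bandwidth_mbps * 1000000 / 8 * rtt_ms / 1000 ))
echo "BDP: $bdp_bytes bytes"                     # 125000 bytes = ~125 KB
echo "10x safety margin: $(( bdp_bytes * 10 )) bytes"  # ~1.25 MB default
```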
BBR Congestion Control
The default cubic algorithm was designed for wired networks. bbr (Bottleneck Bandwidth and RTT) is Google’s modern algorithm that:
- Probes for available bandwidth without causing loss
- Handles WiFi’s variable latency better
- Recovers from congestion faster
# Enable BBR
sudo modprobe tcp_bbr
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr

Swappiness
When memory is tight, Linux can swap out process memory to disk. For real-time robotics, we want to minimize this:
- swappiness=60 (default): Aggressive swapping
- swappiness=10: Only swap under memory pressure
- swappiness=0: Never swap (risky - can trigger the OOM killer)
We use 10 as a safe compromise.
Layer 4: Hardware Tuning
The final layer involves hardware-level optimizations.
NIC Ring Buffers
Network interface cards have hardware buffers. Larger buffers help with burst traffic:
# Check current ring buffer sizes
ethtool -g eth0
# Increase ring buffers (if supported)
sudo ethtool -G eth0 rx 4096 tx 4096

WiFi Power Management
WiFi chips often enable power saving by default, which adds latency:
# Check power management status
iwconfig wlan0 | grep "Power Management"
# Disable power management (reduces latency)
sudo iwconfig wlan0 power off

Impact: Can reduce WiFi latency from 50-100ms spikes to a consistent 5-10ms.
USB Settings (for RealSense)
The RealSense D435i connects via USB 3.0. Ensure USB autosuspend is disabled:
# Disable USB autosuspend for all devices
echo -1 | sudo tee /sys/module/usbcore/parameters/autosuspend

Multi-Host Robot Architecture
Modern robots often have multiple compute units (ARM for sensors, x86 for perception, Jetson for AI). This section covers how to configure Zenoh for that scenario:
The Single Router Pattern
For multi-host robots, it’s better to have ONE router (on the main compute unit) rather than running routers on every host.
┌─────────────────────────────────────────────────────────────────────────┐
│ MULTI-HOST ROBOT ARCHITECTURE │
│ │
│ AMR (Main Compute) ARM Board (Sensors) │
│ 192.168.1.10 192.168.1.11 │
│ ──────────────── ──────────────── │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ rmw_zenohd │◄──── TCP ────►│ ROS nodes │ │
│ │ (ROUTER) │ │ (clients) │ │
│ └──────┬───────┘ └──────────────┘ │
│ │ │
│ ┌──────┴───────┐ Jetson (AI) │
│ │ Navigation │ 192.168.1.12 │
│ │ Perception │ ──────────────── │
│ │ Planning │ ┌──────────────┐ │
│ └──────────────┘ │ Detection │ │
│ │ SLAM │ │
│ │ (clients) │ │
│ └──────┬───────┘ │
│ │ │
│ └────── TCP ───────────────────┘
│ to AMR router │
└─────────────────────────────────────────────────────────────────────────┘
Configuration: Main Compute (AMR)
// amr_router.json5 - Runs on main compute unit
{
mode: "router",
listen: {
endpoints: [
"tcp/192.168.1.10:7447", // Ethernet interface
"tcp/127.0.0.1:7447" // Local processes
]
},
transport: {
shared_memory: { enabled: true } // SHM for local processes
}
}
# On AMR (main compute)
export ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
ros2 run rmw_zenoh_cpp rmw_zenohd

Configuration: ARM / Jetson (Clients)
// arm_client.json5 - Runs on ARM board
{
mode: "client",
connect: {
endpoints: ["tcp/192.168.1.10:7447"] // Connect to AMR router
}
}
# On ARM board
export ZENOH_SESSION_CONFIG_URI=/configs/arm_client.json5
ros2 launch realsense2_camera rs_launch.py

Why Single Router?
| Aspect | Multiple Routers | Single Router |
|---|---|---|
| Complexity | Router federation config | Simple star topology |
| Latency | Inter-router hops | Direct client-router |
| Memory | Router per host | One router total |
| Debugging | Multiple log sources | Centralized logging |
| Failure modes | Router sync issues | Single point of failure |
The single-router pattern means the main compute is critical. For production, consider:
- Running the router in a systemd service with auto-restart
- Health monitoring with a watchdog
- Graceful degradation for sensor hosts
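For the systemd suggestion above, a minimal unit sketch. The unit name and ExecStart path are our assumptions; point ExecStart at wherever rmw_zenohd lives on your system:

```ini
# /etc/systemd/system/zenoh-router.service (hypothetical name and paths)
[Unit]
Description=Zenoh router for rmw_zenoh
After=network-online.target
Wants=network-online.target

[Service]
Environment=ZENOH_ROUTER_CONFIG_URI=/configs/amr_router.json5
# Adjust to the rmw_zenohd binary location for your ROS 2 distro
ExecStart=/opt/ros/jazzy/lib/rmw_zenoh_cpp/rmw_zenohd
Restart=always
RestartSec=2

[Install]
WantedBy=multi-user.target
```

Enable it with sudo systemctl enable --now zenoh-router so the router comes back automatically after crashes and reboots.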
Link-Level Compression (LZ4)
Zenoh supports transparent link-level compression using LZ4.
Zenoh can compress all traffic at the transport layer - no application changes needed!
How It Works
- Uses LZ4 compression (fast, low CPU overhead)
- Achieves 30-50% bandwidth reduction typically
- Transparent to ROS 2 applications
- Must be enabled on BOTH ends of the Zenoh link
Enable Compression
Via environment variable (quick):
# Enable on client
export ZENOH_CONFIG_OVERRIDE='transport/link/compression/enabled=true'
ros2 run demo_nodes_cpp talker
# Enable on router too!
export ZENOH_CONFIG_OVERRIDE='transport/link/compression/enabled=true'
ros2 run rmw_zenoh_cpp rmw_zenohd

Via config file (recommended):
// zenoh_compressed.json5
{
mode: "client",
connect: { endpoints: ["tcp/192.168.1.10:7447"] },
transport: {
link: {
compression: {
enabled: true // LZ4 compression
}
}
}
}
Compression Impact
| Data Type | Raw Size | Compressed | Reduction |
|---|---|---|---|
| Point clouds | 10 MB | 3-4 MB | ~65% |
| LIDAR scans | 30 KB | 15 KB | ~50% |
| Odometry | 400 B | 300 B | ~25% |
| Images (raw) | 2.76 MB | 1.5 MB | ~45% |
| Images (JPEG) | 230 KB | 220 KB | ~5% (already compressed) |
Enable compression for:
- Multi-host robots over Ethernet
- Cloud connectivity (always!)
- Low-bandwidth wireless links
Skip compression for:
- Local SHM transport (no benefit)
- Already-compressed data (JPEG, H264, ZSTD)
- Ultra-low-latency control (adds ~0.1ms)
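The "already compressed" caveat is easy to demonstrate. A quick illustration using gzip as a stand-in for LZ4 (same principle; gzip is more commonly installed): highly redundant data shrinks dramatically on the first pass, while compressing the result again gains essentially nothing.

```shell
# Redundant data compresses well; compressing the output again does not.
head -c 100000 /dev/zero > raw.bin    # stand-in for redundant sensor data
gzip -c raw.bin > once.gz             # first pass: shrinks dramatically
gzip -c once.gz > twice.gz            # second pass: no further gain
once=$(( $(wc -c < once.gz) ))
twice=$(( $(wc -c < twice.gz) ))
echo "raw: 100000 bytes  once: $once bytes  twice: $twice bytes"
rm -f raw.bin once.gz twice.gz
```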
Putting It All Together
Here’s the complete optimization workflow:
Step 1: Apply Kernel Tuning (Once)
# On the host machine
sudo ./apply_tuning.sh --apply --persist

Step 2: Choose Zenoh Configuration
# In the container
# For high-bandwidth streaming:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_optimized.json5
# For low-latency control:
zenoh-bridge-ros2dds -c /workshop3/configs/zenoh_lowlatency.json5

Step 3: Apply Topic Filtering
# Start bridge with selective topics
zenoh-bridge-ros2dds \
-c /workshop3/configs/zenoh_optimized.json5 \
--allow "/odom" \
--allow "/cmd_vel" \
--allow "/scan" \
--deny ".*"

Step 4: Verify with Monitoring
# On host
./host_monitor.sh --bandwidth eth0
# Expected: Network usage matches only bridged topics

Results Summary
| Optimization | Bandwidth | Reduction |
|---|---|---|
| Baseline (no optimization) | 50 MB/s | - |
| + Topic filtering | 3 MB/s | 94% |
| + Zenoh optimized config | 2.5 MB/s | 95% |
| + Kernel tuning | 2.4 MB/s | 95.2% |
| + Navigation-only topics | 100 KB/s | 99.8% |
The biggest win is always topic filtering. Everything else is refinement.
What’s Next
In Part 3, we build a real-time web dashboard that visualizes all this data in a browser - with a 13-layer architecture from ROS2 callbacks to Canvas/SVG visualizations.
See Part 1: Interactive Tools for the tooling infrastructure.
See Part 3: Web Dashboard for the monitoring UI.