Workshop 3 Preview: Advanced Networking (Exercises 6-8)
Part 3 of 3: Wireless Tuning, Congestion Handling, and NAT Traversal
Series Overview
This is Part 3 (final) of the Workshop 3 exercise preview:
| Part | Exercises | Focus |
|---|---|---|
| Part 1 | 1-3 | Fundamentals: Pub/Sub, QoS, Shared Memory |
| Part 2 | 4-5 | Remote: Cloud Router, mTLS Security |
| Part 3 (This Post) | 6-8 | Advanced: Wireless, Congestion, NAT |
Exercise 6: Wireless Performance Tuning
The WiFi Challenge
WiFi introduces challenges that wired Ethernet doesn’t have:
| Challenge | Impact | Mitigation |
|---|---|---|
| Variable latency | 5-100ms jitter | Increase buffers |
| Packet loss | 1-5% typical | Enable reliability |
| Half-duplex | Can’t TX and RX simultaneously | Reduce bandwidth |
| Interference | Other devices, microwaves | Choose channels wisely |
Bandwidth Reality Check
Theoretical vs Real WiFi throughput:
| WiFi Standard | Theoretical | Real-World | With Interference |
|---|---|---|---|
| 802.11n (WiFi 4) | 300 Mbps | 100-150 Mbps | 50-80 Mbps |
| 802.11ac (WiFi 5) | 867 Mbps | 300-500 Mbps | 150-250 Mbps |
| 802.11ax (WiFi 6) | 1200 Mbps | 500-700 Mbps | 300-400 Mbps |
Robot sensor bandwidth:
| Sensor | Bandwidth |
|---|---|
| 720p Camera @ 30 FPS | ~27 MB/s (216 Mbps) |
| Depth Camera @ 30 FPS | ~18 MB/s (144 Mbps) |
| LIDAR @ 20 Hz | ~2 MB/s (16 Mbps) |
| Total | ~47 MB/s (376 Mbps) |
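The totals are worth sanity-checking against the WiFi table; a quick script using only the figures above:

```python
# Sanity-check of the sensor table against real-world WiFi capacity.
# Figures copied from the tables above; 1 MB/s = 8 Mbps.
sensors_mb_s = {
    "720p camera @ 30 FPS": 27,
    "depth camera @ 30 FPS": 18,
    "lidar @ 20 Hz": 2,
}

total_mb_s = sum(sensors_mb_s.values())
total_mbps = total_mb_s * 8

wifi5_interference_mbps = 250  # top of the "with interference" range for WiFi 5

print(f"Total sensor load: {total_mb_s} MB/s ({total_mbps} Mbps)")
print(f"Fits WiFi 5 under interference: {total_mbps <= wifi5_interference_mbps}")
```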
A Go2 robot publishing full sensor data (~47 MB/s, or 376 Mbps) sits at the upper end of real-world WiFi 5 throughput and exceeds it outright once interference is present. Without optimization, you'll experience:
- Packet loss
- Latency spikes
- Dropped connections
Linux Kernel Tuning
# Increase network buffer sizes
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.core.rmem_default=1048576
sudo sysctl -w net.core.wmem_default=1048576
# Use BBR congestion control (better for WiFi)
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
# Increase backlog for burst handling
sudo sysctl -w net.core.netdev_max_backlog=10000
Topic Filtering: The Biggest Win
Instead of sending everything, filter at the source:
# Before: All 47 topics bridged = 50 MB/s
zenoh-bridge-ros2dds
# After: Only essential topics = 100 KB/s
zenoh-bridge-ros2dds \
--allow "/cmd_vel" \
--allow "/odom" \
--allow "/battery_state" \
--deny ".*"
| Configuration | Bandwidth | Reduction |
|---|---|---|
| All topics | 50 MB/s | - |
| Navigation only | 3 MB/s | 94% |
| Telemetry only | 100 KB/s | 99.8% |
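With `--deny ".*"` as the catch-all, the flags behave like an allow-list. A Python sketch of that matching logic; note the precedence used here (allow patterns checked before the deny catch-all) is an assumption for illustration, not the bridge's documented algorithm:

```python
import re

# Illustrative allow/deny topic filtering. The precedence (allow checked
# before the deny catch-all) is an assumption, not the bridge's documented
# matching algorithm.
ALLOW = [r"/cmd_vel", r"/odom", r"/battery_state"]
DENY = [r".*"]

def is_bridged(topic):
    if any(re.fullmatch(p, topic) for p in ALLOW):
        return True
    return not any(re.fullmatch(p, topic) for p in DENY)

topics = ["/cmd_vel", "/odom", "/camera/image_raw", "/tf"]
bridged = [t for t in topics if is_bridged(t)]
print(bridged)  # ['/cmd_vel', '/odom']
```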
Hands-On Testing: Simulating WiFi Congestion
We tested with a RealSense D435i camera (720p @ 30 FPS) using Linux Traffic Control to simulate WiFi congestion:
# Baseline (no limits)
ros2 topic bw /camera/camera/color/image_raw
# Result: 84-86 MB/s, 30 FPS, 2.76 MB per frame
# Simulate congested WiFi (200 Mbps + 10ms jitter)
tc qdisc add dev lo root netem delay 10ms 5ms rate 200mbit
# Measure again
ros2 topic hz /camera/camera/color/image_raw
# Result: ~5 FPS (83% drop!)
With the 200 Mbps bandwidth limit, our 720p camera dropped from 30 FPS to just 5 FPS. Why? The camera produces 84 MB/s (672 Mbps), but only 200 Mbps was available!
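The ~5 FPS result is consistent with a back-of-envelope ceiling:

```python
# Back-of-envelope ceiling: how many raw frames fit through the tc rate limit?
frame_mb = 2.76   # measured raw 720p frame size in MB
link_mbps = 200   # simulated WiFi budget (tc ... rate 200mbit)

max_fps = link_mbps / (frame_mb * 8)  # MB -> Mb at 8 bits per byte
print(f"Bandwidth-limited ceiling: {max_fps:.1f} FPS")
```

The measured 5 FPS sits below this ~9 FPS ceiling because the added delay, jitter, and transport overhead consume part of the budget.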
The Compression Solution
The D435i publishes multiple formats via image_transport. Here’s what we measured:
| Topic | Size | FPS (Congested) | Compression |
|---|---|---|---|
| `/image_raw` | 2.76 MB | ~5 FPS | 1x (baseline) |
| `/image_raw/zstd` | 1.45 MB | ~12 FPS | 1.9x |
| `/image_raw/compressed` | 0.23 MB | ~30 FPS | 12x smaller! |
| `/depth/image_rect_raw` | 0.81 MB | ~30 FPS | N/A |
The JPEG compressed topic (0.23 MB) achieves full 30 FPS on the same congested network where raw (2.76 MB) only gets 5 FPS. That’s a 12x size reduction with no FPS loss!
Who Does the Compression?
The image_transport ROS 2 package does it automatically! When you launch the camera, these topics appear:
- `/image_raw` - Original uncompressed
- `/image_raw/compressed` - JPEG compressed
- `/image_raw/zstd` - ZSTD compressed
- `/image_raw/theora` - Video codec
No extra configuration needed - just subscribe to /compressed instead of /image_raw.
Traffic Control Quick Reference
# Add bandwidth limit with delay
tc qdisc add dev lo root netem delay 10ms 5ms rate 200mbit
# Remove all tc rules
tc qdisc del dev lo root
# Show current rules
tc qdisc show dev lo
Exercise 7: Congestion Handling
Head-of-Line Blocking
When a network is congested, messages queue up. If one stream blocks, all streams may be affected:
┌─────────────────────────────────────────────────────────────────────────┐
│ HEAD-OF-LINE BLOCKING │
│ │
│ Publishers Queue Network │
│ │
│ /camera ─────► ┌─────────────┐ │
│ (27 MB/s) │ ███████████ │───slow────► WiFi │
│ │ ███████████ │ (blocked) (congested) │
│ /cmd_vel ────► │ ░░░░░░░░░░░ │ │
│ (tiny) └─────────────┘ │
│ ▲ │
│ │ │
│ /cmd_vel is BLOCKED behind /camera data! │
│ Control commands delayed = dangerous for robot! │
└─────────────────────────────────────────────────────────────────────────┘
Priority Queues
Zenoh supports priority levels to ensure critical messages aren’t blocked:
| Priority | Use Case | Example Topics |
|---|---|---|
| Real-time | Emergency, safety | /emergency_stop |
| Interactive | Control commands | /cmd_vel, /arm_control |
| Data | Sensor streams | /camera, /lidar |
| Background | Logs, diagnostics | /rosout, /diagnostics |
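Conceptually, priority dispatch is just a priority queue. A small sketch (illustrative only, not Zenoh's internals) showing a control command overtaking camera frames that were queued earlier:

```python
import heapq

# Illustrative priority queue (not Zenoh internals): lower numbers drain
# first, so a control command overtakes camera frames queued earlier.
INTERACTIVE, DATA = 2, 5

queue = []
seq = 0  # tie-breaker keeps equal-priority messages in FIFO order

def enqueue(priority, topic):
    global seq
    heapq.heappush(queue, (priority, seq, topic))
    seq += 1

for _ in range(3):
    enqueue(DATA, "/camera")      # bulk sensor data arrives first
enqueue(INTERACTIVE, "/cmd_vel")  # control command arrives last

order = [heapq.heappop(queue)[2] for _ in range(len(queue))]
print(order)  # ['/cmd_vel', '/camera', '/camera', '/camera']
```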
Zenoh Priority Configuration
// zenoh_priority.json5
{
// Define priority mappings
qos: {
publication: {
// High priority for control
"rt/cmd_vel": { priority: "real_time" },
"rt/emergency_stop": { priority: "real_time" },
// Normal priority for sensors
"rt/camera/**": { priority: "data" },
"rt/lidar/**": { priority: "data" },
// Low priority for logs
"rt/rosout": { priority: "background" }
}
}
}
Official Zenoh Priority Levels
Zenoh supports these priority levels for controlling delivery and processing order.
| Priority Constant | Value | Use Case |
|---|---|---|
| `Z_PRIORITY_REAL_TIME` | 1 | Emergency stop, safety-critical |
| `Z_PRIORITY_INTERACTIVE_HIGH` | 2 | Control commands (`/cmd_vel`) |
| `Z_PRIORITY_INTERACTIVE_LOW` | 3 | User interactions |
| `Z_PRIORITY_DATA_HIGH` | 4 | High-priority sensor data |
| `Z_PRIORITY_DATA` | 5 | Normal sensor streams |
| `Z_PRIORITY_DATA_LOW` | 6 | Low-priority telemetry |
| `Z_PRIORITY_BACKGROUND` | 7 | Logs, diagnostics |
Per-Topic Priority in Zenoh Session Config:
// zenoh_session_priority.json5
{
mode: "client",
connect: { endpoints: ["tcp/127.0.0.1:7447"] },
// Per-topic QoS configuration
qos: {
publication: {
// Real-time priority for control commands
"rt/cmd_vel": {
priority: 2, // Z_PRIORITY_INTERACTIVE_HIGH
express: true // Bypass batching for latency-critical!
},
"rt/emergency_stop": {
priority: 1, // Z_PRIORITY_REAL_TIME
express: true
},
// Data priority for sensors
"rt/camera/**": { priority: 5 },
"rt/scan": { priority: 5 }
}
}
}
express Mode: Bypass Batching!
The `express: true` setting tells Zenoh to send this message immediately without waiting to batch it with other messages. This reduces latency for critical messages at the cost of slightly less efficient network usage.
Use `express: true` for:
- `/cmd_vel` - Control commands
- `/emergency_stop` - Safety critical
- Heartbeat/liveness messages
Flow Control Strategies
| Strategy | Description | When to Use |
|---|---|---|
| Drop oldest | Discard old messages when queue full | Camera streams |
| Drop newest | Discard new messages when queue full | Rarely useful |
| Block sender | Pause publisher until queue has space | Reliable data |
| Sample | Only keep every Nth message | High-rate sensors |
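The "drop oldest" strategy maps directly onto a bounded queue; a minimal sketch using Python's `deque`:

```python
from collections import deque

# "Drop oldest" as a bounded deque: when the queue is full, appending a new
# frame silently evicts the stalest one, so the link always carries fresh data.
camera_queue = deque(maxlen=5)

for frame_id in range(12):       # publisher outpaces the congested link
    camera_queue.append(frame_id)

print(list(camera_queue))  # [7, 8, 9, 10, 11] - only the newest frames remain
```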
Downsampling: Per-Topic Frequency Control
Zenoh routers can automatically downsample topics when sending to bandwidth-constrained links.
When to Use Downsampling:
- Crossing from wired LAN to wireless (WiFi, GSM, LTE)
- Sending to cloud with limited bandwidth
- Edge computing with limited processing capacity
- Fleet monitoring where full rate isn’t needed
Zenoh Router Downsampling Configuration:
// router_downsample.json5
{
mode: "router",
listen: { endpoints: ["tcp/0.0.0.0:7447"] },
// Downsampling rules for different interfaces
downsampling: [
{
// For wireless interface (low bandwidth)
interfaces: ["tcp/0.0.0.0:7448"], // Wireless client port
rules: [
{
key_exprs: ["rt/camera/**"],
freq: 5 // 30 Hz → 5 Hz
},
{
key_exprs: ["rt/scan"],
freq: 2 // 10 Hz → 2 Hz
},
{
key_exprs: ["rt/odom"],
freq: 10 // Keep at 10 Hz (control needs it)
}
]
}
]
}
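The `freq` rule above can be prototyped in a few lines. A minimal sketch (not the router's actual implementation) that forwards a message only when at least 1/freq seconds have elapsed since the previous one:

```python
# Minimal frequency downsampler (sketch, not the router's implementation):
# forward a message only if at least 1/freq seconds passed since the last one.
class Downsampler:
    def __init__(self, freq_hz):
        self.min_interval_ms = 1000 // freq_hz
        self.last_sent_ms = None

    def should_forward(self, now_ms):
        if self.last_sent_ms is None or now_ms - self.last_sent_ms >= self.min_interval_ms:
            self.last_sent_ms = now_ms
            return True
        return False

ds = Downsampler(freq_hz=5)            # target: 30 Hz stream -> 5 Hz
stamps = [i * 33 for i in range(30)]   # ~30 Hz of messages over one second
forwarded = sum(ds.should_forward(t) for t in stamps)
print(f"{forwarded} of {len(stamps)} messages forwarded")
```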
Comparison: Downsampling vs Compression vs Filtering:
| Technique | Bandwidth Reduction | Quality Loss | Use Case |
|---|---|---|---|
| Filtering (`--allow`/`--deny`) | 100% (blocked) | Complete | Topics not needed remotely |
| Downsampling (freq) | 50-90% | Temporal | Lower frequency acceptable |
| Compression (JPEG/ZSTD) | 80-95% | Minor | Images, point clouds |
| All combined | 99%+ | Varies | WiFi/cellular deployments |
1. Filter first - don't send topics you don't need (`--deny`)
2. Compress images - use `/compressed` instead of `/image_raw`
3. Downsample remaining - reduce frequency for non-critical topics
4. Keep control at full rate - never downsample `/cmd_vel`!
Congestion Control: Per-Topic Dropping Policy
System load is highest at startup - important messages can be dropped. Configure per-topic policies!
The Startup Problem:
At robot startup, multiple high-bandwidth topics start publishing simultaneously:
- PointCloud (10-50 MB/message)
- OccupancyGrid (1-10 MB/message)
- Camera images (1-3 MB/message)
This burst can overwhelm the network, causing important messages to be dropped!
ROS 2 QoS + Zenoh Interaction:
| ROS 2 QoS | Zenoh Behavior | Effect |
|---|---|---|
| `RELIABLE` + `KEEP_ALL` | Forces reliable channel, always blocks | Publisher waits for ACK |
| `RELIABLE` + `KEEP_LAST(n)` | Reliable with limited queue | Old messages dropped |
| `BEST_EFFORT` | No retries, fast | Messages may be lost |
Per-Topic Congestion Policy Configuration:
// zenoh_congestion.json5
{
mode: "client",
connect: { endpoints: ["tcp/127.0.0.1:7447"] },
qos: {
publication: {
// Camera: Best effort, drop oldest (it's temporal data)
"rt/camera/**": {
congestion_control: "drop",
priority: 5
},
// PointCloud: Allow dropping to prevent blocking
"rt/points": {
congestion_control: "drop",
priority: 5
},
// Control commands: Block until delivered (critical!)
"rt/cmd_vel": {
congestion_control: "block",
priority: 2,
express: true
},
// OccupancyGrid: Drop oldest, keep latest
"rt/map": {
congestion_control: "drop",
priority: 6
}
}
}
}
Congestion Control Options:
| `congestion_control` | Behavior | Best For |
|---|---|---|
| `"block"` | Publisher waits until space is available | Control commands, must-deliver |
| `"drop"` | New messages dropped when queue is full | Sensor streams, temporal data |
If camera publishers use `congestion_control: "block"`, a slow subscriber can freeze the entire camera pipeline. Always use `"drop"` for high-bandwidth sensor data.
Hands-On Testing: cmd_vel Under Congestion
We tested with camera (large images) and cmd_vel (tiny control commands) running simultaneously:
# Start cmd_vel publisher
ros2 topic pub /cmd_vel geometry_msgs/msg/Twist \
"{linear: {x: 0.5}, angular: {z: 0.1}}" --rate 10Baseline (No Congestion):
| Topic | Rate | Jitter (std dev) |
|---|---|---|
| `/cmd_vel` | 10 Hz | 0.0003s |
| `/camera/image_raw` | 30 FPS | 0.0005s |
Moderate Congestion (200 Mbps + 10ms delay):
| Topic | Rate | Jitter (std dev) |
|---|---|---|
| `/cmd_vel` | 10 Hz | 0.0035s (10x worse) |
| `/camera/image_raw` | ~5 FPS | 0.04s |
Severe Congestion (50 Mbps + 50ms delay):
| Topic | Rate | Jitter (std dev) | Interval Range |
|---|---|---|---|
| `/cmd_vel` | 10 Hz | 0.022s | 62-135ms (expected: 100ms) |
Surprising finding: cmd_vel (tiny messages) maintained 10 Hz rate even under severe congestion! Unlike classic “head-of-line blocking”, small messages slip through in Zenoh.
But timing consistency degrades significantly:
- Jitter increased 73x (0.0003s → 0.022s)
- Interval variance: ±37ms instead of ±1ms
Impact on robots:
- Jerky movement from inconsistent timing
- Control loop instability
- Path-following errors
Lesson: Priority queues help ensure consistent timing, not just delivery. For safety-critical systems, predictable timing matters as much as throughput.
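The jitter figures above are just the standard deviation of the inter-arrival intervals. A small script reproducing the metric on hypothetical interval samples (values chosen to resemble the 62-135 ms spread, not our actual measurements):

```python
import statistics

# The jitter metric is the standard deviation of inter-arrival intervals.
# Hypothetical /cmd_vel intervals (seconds) shaped like the 62-135 ms spread
# reported under severe congestion; not the actual measurement samples.
intervals = [0.100, 0.062, 0.135, 0.093, 0.128, 0.071, 0.104, 0.110]

mean = statistics.mean(intervals)
jitter = statistics.stdev(intervals)

print(f"mean interval: {mean * 1000:.1f} ms")  # still ~10 Hz on average
print(f"jitter (std dev): {jitter * 1000:.1f} ms")
```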
Exercise 8: NAT Traversal & Namespace Resolution
NAT Traversal Deep Dive
While Exercise 4 introduced cloud routers, Exercise 8 explores direct peer-to-peer connections through NAT.
┌─────────────────────────────────────────────────────────────────────────┐
│ NAT TYPES │
│ │
│ Full Cone NAT Symmetric NAT Port Restricted │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Any host │ │ Specific │ │ Specific │ │
│ │ can reach│ │host:port │ │ port │ │
│ │ mapped │ │ required │ │ required │ │
│ │ port │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ Easy to traverse Hard to traverse Medium difficulty │
└─────────────────────────────────────────────────────────────────────────┘
STUN/TURN Concepts
| Protocol | Purpose | How It Works |
|---|---|---|
| STUN | Discover public IP/port | Client asks STUN server “what’s my public address?” |
| TURN | Relay when direct fails | All traffic goes through TURN server |
| ICE | Try all methods | STUN first, fall back to TURN |
┌─────────────────────────────────────────────────────────────────────────┐
│ ICE CONNECTION ESTABLISHMENT │
│ │
│ 1. Both peers query STUN server for public address │
│ 2. Exchange addresses through signaling (Zenoh router) │
│ 3. Try direct connection (STUN-discovered addresses) │
│ 4. If direct fails, use TURN relay │
│ │
│ Robot ◄────── STUN ──────► Server ◄────── STUN ──────► Laptop │
│ │ │ │
│ └──────────────── Direct P2P (if possible) ───────────┘ │
│ or │
│ └─────────────── TURN Relay (fallback) ───────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Namespace Resolution
When multiple robots connect to the same router, topic collisions occur:
┌─────────────────────────────────────────────────────────────────────────┐
│ NAMESPACE COLLISION PROBLEM │
│ │
│ Robot 1 publishes: /odom │
│ Robot 2 publishes: /odom ──► Which /odom is which?! │
│ Robot 3 publishes: /odom │
└─────────────────────────────────────────────────────────────────────────┘
Solution: Namespace Prefixes
# Robot 1: Prefix all topics with /robot1
ros2 run demo_nodes_cpp talker --ros-args --remap __ns:=/robot1
# Robot 2: Prefix all topics with /robot2
ros2 run demo_nodes_cpp talker --ros-args --remap __ns:=/robot2 --remap __node:=talker2
# Result:
# /robot1/chatter
# /robot2/chatter
Using `--ros-args --remap __ns:=/robotX` is more reliable than the `ROS_NAMESPACE` environment variable, especially with `rmw_zenoh_cpp`.
Zenoh Namespace Mapping
// zenoh_namespace.json5
{
// Map robot-specific namespace to Zenoh key space
ros2dds: {
namespace: "/go2_alpha",
// Topic remapping
pub: {
"/odom": "fleet/go2_alpha/odom",
"/cmd_vel": "fleet/go2_alpha/cmd_vel"
}
}
}
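The mapping above reduces to a simple prefixing rule; a trivial sketch (the helper name is hypothetical, not part of the bridge):

```python
# Hypothetical helper: map a local ROS topic onto the shared fleet key space,
# mirroring the pub remapping in the config above.
def fleet_key(robot_id, topic):
    return f"fleet/{robot_id}{topic}"

print(fleet_key("go2_alpha", "/odom"))     # fleet/go2_alpha/odom
print(fleet_key("go2_alpha", "/cmd_vel"))  # fleet/go2_alpha/cmd_vel
```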
Multi-Robot Fleet Topology
┌─────────────────────────────────────────────────────────────────────────┐
│ FLEET MANAGEMENT TOPOLOGY │
│ │
│ ┌──────────────┐ │
│ │ Fleet Router │ │
│ │ (Cloud) │ │
│ └──────┬───────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ │ │ │ │
│ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ │
│ │ go2_001 │ │ go2_002 │ │ go2_003 │ │
│ │ /odom │ │ /odom │ │ /odom │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Zenoh Key Space: │
│ fleet/go2_001/odom │
│ fleet/go2_002/odom │
│ fleet/go2_003/odom │
└─────────────────────────────────────────────────────────────────────────┘
Hands-On Testing: Multi-Robot Namespaces
We tested namespace resolution with two simulated robots:
# Start two "robots" with different namespaces
ros2 run demo_nodes_cpp talker --ros-args --remap __ns:=/robot1 &
ros2 run demo_nodes_cpp talker --ros-args --remap __ns:=/robot2 --remap __node:=talker2 &
# Check nodes
ros2 node list
# Output:
# /robot1/talker
# /robot2/talker2
# Check topics
ros2 topic list | grep robot
# Output:
# /robot1/chatter
# /robot2/chatter
# Verify independent streams
ros2 topic echo /robot1/chatter --once
# data: 'Hello World: 27'
ros2 topic echo /robot2/chatter --once
# data: 'Hello World: 29'
| Robot | Node Name | Namespace | Topic |
|---|---|---|---|
| Robot 1 | `talker` | `/robot1` | `/robot1/chatter` |
| Robot 2 | `talker2` | `/robot2` | `/robot2/chatter` |
Key Findings:
- Namespaces prevent topic collisions - essential for multi-robot fleets
- Each robot gets isolated topics: `/robot1/odom`, `/robot2/odom`, etc.
- Fleet monitoring: subscribe to patterns like `/robot*/status`
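The `/robot*/status` pattern can be prototyped with Python's `fnmatch`, which stands in for Zenoh key-expression matching here (it only approximates it; `*` in `fnmatch` also crosses `/` boundaries):

```python
from fnmatch import fnmatch

# Fleet-monitoring sketch: match incoming topics against a wildcard pattern.
# fnmatch approximates Zenoh key-expression matching for illustration only.
topics = ["/robot1/status", "/robot2/status", "/robot1/chatter", "/robot2/odom"]
pattern = "/robot*/status"

matches = [t for t in topics if fnmatch(t, pattern)]
print(matches)  # ['/robot1/status', '/robot2/status']
```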
Putting It All Together
The final workshop challenge combines all 8 exercises:
┌─────────────────────────────────────────────────────────────────────────┐
│ COMPLETE WORKSHOP SCENARIO │
│ │
│ Exercise 1-3: Local robot running Zenoh + SHM │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Robot (local) │ │
│ │ ├─ rmw_zenoh_cpp (Ex 1) │ │
│ │ ├─ QoS configured (Ex 2) │ │
│ │ └─ SHM enabled for cameras (Ex 3) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ │ WiFi (Ex 6 tuning) │
│ │ Priority queues (Ex 7) │
│ ▼ │
│ Exercise 4-5: Secure cloud connectivity │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Cloud Router │ │
│ │ ├─ TLS encrypted (Ex 5) │ │
│ │ ├─ mTLS authenticated (Ex 5) │ │
│ │ └─ NAT traversal (Ex 8) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Exercise 8: Multi-robot namespace │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Fleet Dashboard │ │
│ │ ├─ /go2_001/odom │ │
│ │ ├─ /go2_002/odom │ │
│ │ └─ /go2_003/odom │ │
│ └─────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Workshop Preparation Summary
What You’ll Have Learned
| Exercise | Key Skill |
|---|---|
| 1 | Zenoh pub/sub and discovery |
| 2 | QoS configuration and compatibility |
| 3 | Shared memory for zero-copy |
| 4 | Cloud router deployment |
| 5 | mTLS security setup |
| 6 | WiFi optimization and filtering |
| 7 | Priority queues for congestion |
| 8 | Multi-robot namespace management |
Final Checklist
Before attending Workshop 3: