Building a Modern Offline ROS 2 Workshop Infrastructure: ROSCon India 2025
╔═══════════════════════════════════════════════════════════════════════════════╗
║ ║
║ ██████╗ ██████╗ ███████╗ ██████╗ ██████╗ ███╗ ██╗ ██╗███╗ ██╗ ║
║ ██╔══██╗██╔═══██╗██╔════╝██╔════╝██╔═══██╗████╗ ██║ ██║████╗ ██║ ║
║ ██████╔╝██║ ██║███████╗██║ ██║ ██║██╔██╗ ██║ ██║██╔██╗ ██║ ║
║ ██╔══██╗██║ ██║╚════██║██║ ██║ ██║██║╚██╗██║ ██║██║╚██╗██║ ║
║ ██║ ██║╚██████╔╝███████║╚██████╗╚██████╔╝██║ ╚████║ ██║██║ ╚████║ ║
║ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝╚═╝ ╚═══╝ ║
║ ║
║ India 2025 Workshop - Container Launcher ║
║ December 18-20, COEP Pune ║
║ ║
╚═══════════════════════════════════════════════════════════════════════════════╝
TL;DR
We built a production-grade Docker infrastructure for ROSCon India 2025 workshops (Dec 18-20, COEP Pune):
- 7 Docker images for offline ROS 2 workshops
- Modern Gazebo Sim (NOT Classic - avoided EOL trap!)
- GPU acceleration + Zenoh bridging + complete Nav2 stack
- 100% offline-capable with bundled models
- Multi-role certification process for quality assurance
Key Innovation: Discovered TurtleBot3 uses deprecated Gazebo Classic (EOL Jan 2025) at Phase 7, pivoted to ros-gz-sim-demos before committing to 80GB of offline images. This saved the workshop from teaching deprecated technology.
Why I Built This: As a workshop attendee, I wanted to arrive prepared—not scrambling to install dependencies while the instructor moves on. This setup gives me multiple environments ready to go, so I can focus on learning and experimenting rather than troubleshooting.
The Problem
Context: ROSCon India 2025 marks the first ROSCon in India, happening December 18-20 at COEP Pune. I’m attending two hands-on workshops:
- Workshop 3: Zenoh Networking (DDS ↔︎ Zenoh bridging over WiFi)
- Workshop 4: IMU Perception with Visual SLAM
What I Wanted to Prepare For:
- No Internet Dependency: I didn’t want to be stuck downloading packages while the workshop moved on
- GPU-Ready: My RTX 5090 should be ready for VSLAM and NVBlox demos out of the box
- Multiple Middleware Options: Both CycloneDDS and Zenoh containers, so I can follow either track
- Real Hardware Ready: RealSense D435i camera and IMU working before I arrive
- Experiment Freely: Extra environments to try things without breaking the workshop setup
- Modern Stack: Discovered TurtleBot3 uses deprecated Gazebo Classic—pivoted to modern Gazebo Sim
Architecture Overview
Three-Tier Docker Image Strategy
TIER 1: Base Images (pulled from registries)
├── nvcr.io/nvidia/isaac/ros:x86_64-ros2_humble (~22GB)
└── osrf/ros:jazzy-desktop-full (~3.5GB)
TIER 2: Custom Base Images (build locally)
├── isaac-ros-base:humble (VSLAM, NVBlox, RealSense, Go2 SDK, Nav2, ros-gz-sim-demos)
└── jazzy-base:latest (MuJoCo, Claude Code, Playwright, Nav2, ros-gz-sim-demos)
TIER 3: Workshop Images (docker-compose)
├── workshop3-humble-dds (CycloneDDS + Zenoh Bridge)
├── workshop3-jazzy-zenoh (rmw_zenoh via apt)
├── workshop3-humble-zenoh (rmw_zenoh source - BROKEN, needs Rust)
├── workshop4-imu (IMU tools, robot_localization)
└── robot-humble (Jetson communication)
Two-Track Rationale
Track A: NVIDIA Isaac ROS (Humble)
- GPU-accelerated perception (VSLAM, NVBlox)
- CycloneDDS 0.10.2 (required by Unitree Go2 SDK)
- Mature ecosystem for robotics

Track B: OSRF Jazzy
- Latest ROS 2 LTS
- rmw_zenoh available via apt (no source build needed)
- Cleaner for Zenoh workshop demos
Why Not One Image? NVIDIA base image is 22GB (overkill for Zenoh demos). Different RMW implementations require separation.
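To make the tier split concrete, here is a minimal sketch of what a TIER 2 image like jazzy-base could look like. The package names are my assumptions based on the features listed above (rmw_zenoh, ros-gz-sim-demos, Nav2), not the actual Dockerfile:

```dockerfile
# Hypothetical sketch of a TIER 2 base image (package list assumed)
FROM osrf/ros:jazzy-desktop-full

# Bundle everything at build time so the container works offline later
RUN apt-get update && apt-get install -y --no-install-recommends \
        ros-jazzy-rmw-zenoh-cpp \
        ros-jazzy-ros-gz-sim-demos \
        ros-jazzy-navigation2 \
        ros-jazzy-nav2-bringup \
    && rm -rf /var/lib/apt/lists/*

# Default middleware; override per-container with -e RMW_IMPLEMENTATION=...
ENV RMW_IMPLEMENTATION=rmw_zenoh_cpp
```

Workshop images (TIER 3) then start `FROM jazzy-base` and add only their small deltas, which is what keeps their rebuilds in the seconds-to-minutes range.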
The Docker Images
| Image | Base | Size | Purpose | Status |
|---|---|---|---|---|
| isaac-ros-base | NVIDIA Isaac ROS Humble | 24.1GB | GPU perception, Go2 SDK, Nav2, ros-gz-sim-demos | ✅ Production |
| jazzy-base | OSRF Jazzy | 2.85GB | Latest ROS 2, rmw_zenoh, ros-gz-sim-demos | ✅ Production |
| workshop3-humble-dds | isaac-ros-base | 24.2GB | CycloneDDS + zenoh-bridge v1.7.1 | ✅ Primary W3 |
| workshop3-jazzy-zenoh | jazzy-base | 2.95GB | Native rmw_zenoh v0.2.9 | ✅ Primary W3 (alt) |
| workshop3-humble-zenoh | isaac-ros-base | 24.2GB | rmw_zenoh source (needs Rust) | ❌ Broken (known) |
| workshop4-imu | isaac-ros-base | 24.2GB | imu_tools, robot_localization, rtabmap | ✅ Production |
| robot-humble | isaac-ros-base | 24.1GB | Same as base (Jetson comms) | ✅ Production |
Total Disk: ~150GB uncompressed → ~80-95GB compressed (zstd -19)
Key Technical Details
Choosing Modern Gazebo Over TurtleBot3
While setting up TurtleBot3 (the standard tutorial robot), I noticed this warning:
```shell
ros2 launch turtlebot3_gazebo empty_world.launch.py
# [WARN] Gazebo Classic is end-of-life. Please migrate to new Gazebo.
```
Gazebo Classic reached end-of-life on January 31, 2025—almost a year before the workshop. I switched to ros-gz-sim-demos instead:
```shell
ros2 launch ros_gz_sim_demos diff_drive.launch.py
# Uses modern Gazebo Sim - actively maintained
```
Bonus: The models are bundled in the Debian package—no internet needed to download from Gazebo Fuel:
```
/opt/ros/jazzy/share/ros_gz_sim_demos/models/vehicle/
/opt/ros/jazzy/share/ros_gz_sim_demos/worlds/vehicle.sdf
```
DDS ↔︎ Zenoh Bridging Architecture
Use Case: Robot (Jetson) with CycloneDDS → Laptop with rmw_zenoh
Robot (Jetson Orin) WiFi Laptop
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ CycloneDDS │◄───────►│ zenoh-bridge │◄───────►│ rmw_zenoh │
│ (can't change│ Domain │ -ros2dds │ Zenoh │ (efficient) │
│ middleware) │ 99 │ │ Router │ │
└──────────────┘ └──────────────┘ └──────────────┘
Why Bridge Instead of Native Zenoh?
- Robot firmware uses CycloneDDS (can't change)
- Zenoh is more efficient over WiFi (bandwidth savings)
- The bridge allows gradual migration
Configuration:
```shell
# Terminal 1: Router mode (laptop)
zenoh-bridge-ros2dds -m router -d 99

# Terminal 2: Peer mode (another laptop/node)
zenoh-bridge-ros2dds -m peer -d 88
```
Tested Results: 15 consecutive messages bridged successfully, latency <10ms.
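For repeatability, the same flags can live in a config file instead of the command line. This is a hedged sketch based on the bridge's documented JSON5 config format—verify the key names against the zenoh-bridge-ros2dds version you ship:

```json5
// bridge-router.json5 (hypothetical filename)
// Equivalent of: zenoh-bridge-ros2dds -m router -d 99
{
  mode: "router",
  plugins: {
    ros2dds: {
      domain: 99,
    },
  },
}
```

Run it with `zenoh-bridge-ros2dds -c bridge-router.json5`. A config file is easier to version alongside the docker-compose files than a shell history.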
GPU Sharing Across Containers
Hardware: NVIDIA RTX 5090 (32GB VRAM)
Solution:
```yaml
# docker-compose.yml
services:
  workshop3-dds:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=all
```
Verified:
```shell
# Inside container
nvidia-smi                          # Shows RTX 5090
glxinfo | grep "direct rendering"   # Yes
```
Use Cases: VSLAM, NVBlox, Gazebo Sim rendering, RViz2 visualization.
RTX 5090 Hardware Verification: When Newer Isn’t Supported (Yet It Works)
Context: Phase 7.7 tested NVIDIA RTX 5090 (Blackwell architecture, Compute Capability 12.0). The GPU was so new, official documentation didn’t list it.
The Challenge:
Hardware:
- RTX 5090 (released January 2025, Blackwell architecture)
- Compute Capability 12.0 (newest available)
- CUDA 12.8 officially required for Blackwell support

Installed Software:
- Host driver: 570.86.16 (supports CUDA 12.8)
- Isaac ROS base image: built for CUDA 12.2
- PyTorch containers: unknown CUDA version support
The Question: Will Compute Capability 12.0 work with CUDA 12.2 containers via driver forward compatibility?
Research Phase:
Official NVIDIA documentation listed RTX 40 series (Ada Lovelace, Compute 8.9) and RTX 30 series (Ampere, Compute 8.6), but RTX 5090 NOT listed for CUDA 12.2.
Driver Forward Compatibility Hypothesis:
- NVIDIA drivers support NEWER CUDA toolkits
- But can they support NEWER GPU architectures?
- Blackwell (Compute 12.0) with CUDA 12.2 containers?
The Test:
Test 1: Basic GPU Detection
```shell
docker compose run --rm workshop3-dds bash
# Inside container:
nvidia-smi
```
Output (abridged):
```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.16    Driver Version: 570.86.16    CUDA Version: 12.8     |
|-------------------------------+----------------------+----------------------|
|   0  NVIDIA GeForce RTX 5090  | 00000000:01:00.0 Off |                 Off  |
+-----------------------------------------------------------------------------+
```
✅ PASS - GPU detected
Test 2: CUDA Toolkit Compatibility
```shell
nvcc --version
# Output:
# Cuda compilation tools, release 12.2, V12.2.140
```
Test 3: GPU Compute Capability Query
```shell
python3 << EOF
import torch
print(f"PyTorch CUDA available: {torch.cuda.is_available()}")
print(f"GPU name: {torch.cuda.get_device_name(0)}")
print(f"Compute capability: {torch.cuda.get_device_capability(0)}")
EOF
```
Output:
```
PyTorch CUDA available: True
GPU name: NVIDIA GeForce RTX 5090
Compute capability: (12, 0)   ← Blackwell architecture confirmed!
```
✅ PASS - PyTorch recognizes Compute 12.0
Test 4: Real Workload (PyTorch CUDA 12.8)
Challenge: Need to verify with CUDA 12.8 (matches driver). Image: pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime (12.2GB).
The Download Drama:
```shell
docker pull pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime &
# 1 hour 20 minutes later... is it stuck?
```
Network Bandwidth Monitoring Discovery:
```shell
#!/bin/bash
# Created /tmp/check_bandwidth.sh to debug the "stuck" download
IFACE=$(ip route get 8.8.8.8 | grep -oP 'dev \K\S+' | head -1)
RX1=$(cat /proc/net/dev | grep "$IFACE" | awk '{print $2}')
sleep 5
RX2=$(cat /proc/net/dev | grep "$IFACE" | awk '{print $2}')
DIFF=$((RX2 - RX1))
SPEED_MB_PER_SEC=$((DIFF / 5 / 1024 / 1024))
echo "Download Speed: $SPEED_MB_PER_SEC MB/s"
# Result: 23 MB/s ✅
```
Not stuck! Just a large image. Patience required.
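The arithmetic in that script generalizes to a small reusable helper. This is my own sketch, not part of the original script; integer math means fractions of a MB/s are floored:

```shell
#!/bin/bash
# rate_mb_s: convert two /proc/net/dev byte counters into MB/s
rate_mb_s() {  # usage: rate_mb_s RX_BYTES_BEFORE RX_BYTES_AFTER INTERVAL_SECS
  echo $(( ($2 - $1) / $3 / 1024 / 1024 ))
}

# ~120 MB received over 5 seconds ≈ the 23 MB/s observed above
rate_mb_s 0 120586240 5
# → 23
```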
The PyTorch Test:
```shell
docker run --rm --gpus all pytorch/pytorch:2.7.0-cuda12.8-cudnn9-runtime \
    python -c "
import torch
print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'CUDA version: {torch.version.cuda}')
print(f'GPU count: {torch.cuda.device_count()}')
print(f'GPU name: {torch.cuda.get_device_name(0)}')
print(f'Compute capability: {torch.cuda.get_device_capability(0)}')
# GPU computation test
x = torch.randn(1000, 1000, device='cuda')
y = torch.matmul(x, x.T)
print(f'GPU computation: ✅ {y.shape}')
"
```
Output:
```
PyTorch version: 2.7.0
CUDA available: True
CUDA version: 12.8
GPU count: 1
GPU name: NVIDIA GeForce RTX 5090
Compute capability: (12, 0)
GPU computation: ✅ torch.Size([1000, 1000])
```
✅ PASS - RTX 5090 Compute 12.0 works with CUDA 12.8!
The Verdict:
Driver Forward Compatibility DOES Support Newer GPUs!
- Host driver 570.86.16 provides the CUDA 12.8 runtime
- CUDA 12.2 containers work via driver compatibility
- CUDA 12.8 containers work natively
- Compute Capability 12.0 (Blackwell) fully supported

What We Learned:
1. Driver version > Container CUDA version enables compatibility
2. Network patience debugging: check bandwidth before assuming "stuck"
3. Test bleeding-edge hardware; don't assume based on docs alone
Updated Compatibility Matrix:
| Component | Installed | Tested With | Status |
|---|---|---|---|
| Host Driver | 570.86.16 | - | ✅ CUDA 12.8 runtime |
| Isaac ROS Base | CUDA 12.2 | RTX 5090 (Compute 12.0) | ✅ Works via driver |
| PyTorch Container | CUDA 12.8 | RTX 5090 (Compute 12.0) | ✅ Native support |
| Workshop Containers | CUDA 12.2 | RTX 5090 (Compute 12.0) | ✅ All GPU features work |
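The rule of thumb from the matrix—the driver's CUDA runtime must be at least the container's CUDA toolkit—can be captured in a tiny preflight check. This is my own hedged helper, not part of the original setup, and it checks only the runtime/toolkit relationship; whether a brand-new GPU *architecture* actually works (the real question here) still had to be tested empirically:

```shell
#!/bin/bash
# cuda_ge: compare "major.minor" CUDA versions; exit 0 if host >= container
cuda_ge() {  # usage: cuda_ge HOST_CUDA CONTAINER_CUDA
  local hmaj hmin cmaj cmin
  IFS=. read -r hmaj hmin <<< "$1"
  IFS=. read -r cmaj cmin <<< "$2"
  [ "$hmaj" -gt "$cmaj" ] || { [ "$hmaj" -eq "$cmaj" ] && [ "$hmin" -ge "$cmin" ]; }
}

cuda_ge 12.8 12.2 && echo "CUDA 12.2 container OK on this driver"
# → CUDA 12.2 container OK on this driver
```

On a live host the inputs would come from `nvidia-smi` (driver CUDA runtime) and `nvcc --version` inside the container.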
Offline-First Design
Challenge: Workshop has no reliable internet.
Strategy:
- Pre-install Everything: All ROS 2 packages via apt, Python deps via pip, Gazebo models bundled
- Shared Caches:
```yaml
volumes:
  - ./cache/pip:/root/.cache/pip
  - ./cache/colcon:/root/.colcon
  - ./cache/ignition:/root/.ignition
  - ./cache/gz:/root/.gz
```
- Docker Image Saves:
```shell
docker save isaac-ros-base:humble | zstd -T0 -19 > offline/isaac-ros-base.tar.zst
# 24.1GB → ~12-15GB compressed
```
- Verification:
```shell
docker run --network none jazzy-base:latest bash -c \
    "ros2 launch ros_gz_sim_demos diff_drive.launch.py"
# ✅ Works perfectly!
```
Offline Test Results: All demos launch without internet, models load from bundled package.
Build & Test Automation
Makefile Targets:
```shell
make all           # Build everything (~60-90 min first time)
make base          # Build TIER 2 base images (~50 min)
make workshop3     # Build all 3 workshop3 variants (~5 min)
make test          # Smoke tests (ROS 2, Gazebo, Zenoh)
make offline-save  # Create compressed tars (~2 hours, 80-95GB)
make offline-load  # Load from tars
make status        # Show built images and sizes
```
Build Features:
- Timestamp + Git Hash Tagging: `IMAGE_TAG := 20251214-103538-a177e48` (enables rollback)
- Parallel Builds: Build bases concurrently where possible
- Smoke Tests: Automated verification
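The tagging scheme can be sketched in Make like this. Variable and target names are assumptions, not the actual Makefile:

```make
# Hypothetical fragment: tag every build with timestamp + git hash
GIT_HASH  := $(shell git rev-parse --short HEAD)
BUILD_TS  := $(shell date +%Y%m%d-%H%M%S)
IMAGE_TAG := $(BUILD_TS)-$(GIT_HASH)

base:
	docker build -t isaac-ros-base:humble -t isaac-ros-base:$(IMAGE_TAG) \
		-f docker/Dockerfile.isaac-base .
```

Tagging `:humble` and `:$(IMAGE_TAG)` together means the workshop always runs the friendly tag while every historical build stays addressable for rollback.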
Build Times (with cache):
- isaac-ros-base: ~2 min (first build: ~35-45 min)
- jazzy-base: ~5 min (first build: ~10-15 min)
- workshop3-*: ~20 sec each
- Total rebuild: ~7 min (vs ~60 min fresh)
Making It Easy to Use
I didn’t want to type 15-flag docker commands every time I launch a container. So I wrote two helper scripts.
launch-container.sh
Instead of this:
```shell
docker run --rm -it --name roscon-workshop3-dds --hostname workshop3-dds \
    --runtime nvidia --gpus all --network host --privileged \
    -e DISPLAY=$DISPLAY -e QT_X11_NO_MITSHM=1 -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=all -e RMW_IMPLEMENTATION=rmw_cyclonedds_cpp \
    -e CYCLONEDDS_URI=file:///config/cyclonedds.xml \
    -v /tmp/.X11-unix:/tmp/.X11-unix:rw \
    -v $HOME/.Xauthority:/root/.Xauthority:rw \
    -v ./workspaces:/workspaces:rw -v ./configs:/config:ro \
    workshop3-humble-dds:latest bash
```
I just type:
```shell
./launch-container.sh 1
```
The script shows a menu with all available containers:
╔═══════════════════════════════════════════════════════════════════════════════╗
║ ║
║ ██████╗ ██████╗ ███████╗ ██████╗ ██████╗ ███╗ ██╗ ██╗███╗ ██╗ ║
║ ██╔══██╗██╔═══██╗██╔════╝██╔════╝██╔═══██╗████╗ ██║ ██║████╗ ██║ ║
║ ██████╔╝██║ ██║███████╗██║ ██║ ██║██╔██╗ ██║ ██║██╔██╗ ██║ ║
║ ██╔══██╗██║ ██║╚════██║██║ ██║ ██║██║╚██╗██║ ██║██║╚██╗██║ ║
║ ██║ ██║╚██████╔╝███████║╚██████╗╚██████╔╝██║ ╚████║ ██║██║ ╚████║ ║
║ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝╚═╝ ╚═══╝ ║
║ ║
║ India 2025 Workshop - Container Launcher ║
║ December 18-20, COEP Pune ║
║ ║
╚═══════════════════════════════════════════════════════════════════════════════╝
Select a container to launch:
1) ● workshop3-dds [humble] CycloneDDS + Zenoh Bridge
└─ workshop3-humble-dds:latest (24.2GB)
2) ● workshop3-jazzy [jazzy] ROS 2 Jazzy + rmw_zenoh
└─ workshop3-jazzy-zenoh:latest (2.95GB)
3) ○ workshop3-humble [humble] ROS 2 Humble + rmw_zenoh (partial)
└─ workshop3-humble-zenoh:latest (not built)
4) ● workshop4-imu [humble] IMU tools + VSLAM
└─ workshop4-imu:latest (24.2GB)
5) ● robot-humble [humble] Jetson communication
└─ robot-humble:latest (24.1GB)
Enter selection (1-5), or [r]efresh / [s]tatus / [c]leanup / [q]uit:
Green dots (●) mean built and ready. Red circles (○) mean not built yet.
new-terminal.sh
When I need another terminal in the same container:
```shell
./scripts/new-terminal.sh 1     # Connect to workshop3-dds
./scripts/new-terminal.sh dds   # Fuzzy match works too
./scripts/new-terminal.sh       # Auto-connects if only one running
```
No more docker ps → copy container ID → docker exec. ROS is already sourced when the terminal opens.
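The matching behavior (index, fuzzy substring, auto-select) can be sketched as a small bash function. This is a simplified illustration of the idea, not the actual new-terminal.sh; the function name and structure are my assumptions:

```shell
#!/bin/bash
# pick_container: read running container names on stdin, select one by query in $1:
#   - empty query: auto-select if exactly one name is running
#   - numeric query: treat as a 1-based index
#   - anything else: first substring (fuzzy) match
pick_container() {
  local query="$1"
  local names n
  mapfile -t names
  if [ -z "$query" ]; then
    [ "${#names[@]}" -eq 1 ] && { echo "${names[0]}"; return 0; }
    return 1
  fi
  if [[ "$query" =~ ^[0-9]+$ ]] && [ "$query" -ge 1 ] && [ "$query" -le "${#names[@]}" ]; then
    echo "${names[$((query - 1))]}"
    return 0
  fi
  for n in "${names[@]}"; do
    [[ "$n" == *"$query"* ]] && { echo "$n"; return 0; }
  done
  return 1
}

# In the real script the names would come from:
#   docker ps --format '{{.Names}}' | pick_container "$1"
printf '%s\n' roscon-workshop3-dds roscon-workshop4-imu | pick_container dds
# → roscon-workshop3-dds
```

The selected name then feeds straight into `docker exec -it <name> bash`.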
What I Learned
Things That Paid Off
- Three-tier Docker strategy - Base images rebuild rarely, workshop images rebuild fast
- Testing before committing to offline images - Caught the Gazebo Classic EOL before baking 80GB of deprecated software
- Shared caches - 75% disk savings, and my containers stay in sync
- Writing helper scripts - `launch-container.sh` and `new-terminal.sh` save me from typing 15-flag docker commands
Things I’d Do Earlier Next Time
- Check EOL dates first - Would have avoided TurtleBot3 entirely
- Test offline from day one - Run `docker run --network none` early
- Note the naming differences - `ign` vs `gz` cost me debugging time
Surprising Discoveries
- Models bundled in apt packages - `ros-gz-sim-demos` includes everything, no Fuel downloads needed
- ROS_LOCALHOST_ONLY matters - Without it, my containers couldn't see RealSense topics
- HID devices need special rules - IMU access requires `device_cgroup_rules` in docker-compose
- Volume mounts can override installed files - Fixed upstream launch file bugs without rebuilding
- RTX 5090 just works - Driver forward compatibility handled Blackwell (Compute 12.0) even with CUDA 12.2 containers
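For reference, the `device_cgroup_rules` trick mentioned above might look like this in docker-compose. This is a hypothetical fragment with assumed device majors (13 = input subsystem, 81 = video4linux); check `ls -l /dev/input` and `ls -l /dev/video*` for your hardware:

```yaml
# Hypothetical fragment, not the workshop's actual compose file
services:
  workshop4-imu:
    volumes:
      - /dev:/dev
    device_cgroup_rules:
      - 'c 13:* rmw'   # /dev/input/* (HID/IMU event devices)
      - 'c 81:* rmw'   # /dev/video* (RealSense streams)
```

This grants read/write/mknod on those character-device classes without resorting to `--privileged` for every container.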
If You Want to Do This Too
Here’s what I’d recommend:
- Pick your ROS 2 distro - Humble (LTS until 2027) or Jazzy (LTS until 2029)
- Choose your base image - Need GPU? Use NVIDIA Isaac ROS. CPU only? OSRF official images are smaller
- Use modern Gazebo Sim - NOT Classic (EOL Jan 2025)
- Test offline early - `docker run --network none` should work before you arrive at the venue
- Write wrapper scripts - Your future self will thank you
Time investment: I spent about a week on this, spread across evenings. Most of that was debugging the VSLAM namespace issues and waiting for Docker builds.
Wrapping Up
I now have 7 Docker images ready (about 80GB compressed), covering both workshop tracks plus extras for experimentation. My RealSense D435i works, VSLAM runs on my RTX 5090, and everything launches with a single command.
Was it overkill? Maybe. But when the workshop starts, I’ll be following along instead of troubleshooting apt-get failures on conference WiFi.
The real lesson: Test your setup with the network cable unplugged. If it works offline, it’ll work anywhere.
This post documents my preparation for ROSCon India 2025 (Dec 18-20, COEP Pune). If you’re doing something similar, feel free to reach out—happy to share configs.