A4: Sim-to-Real Transfer

Deploying Learned Policies on Real Robots

Learning Objectives

By the end of this lesson, you will be able to:

Explain the sim-to-real gap and its causes
Describe domain randomization techniques
Load and run pre-trained policies using ONNX
Understand safety considerations for real robot deployment

Prerequisites

Completed A3: Reinforcement Learning Fundamentals
Understanding of neural network inference
Basic knowledge of robot safety

Theory: The Sim-to-Real Gap

What is the Sim-to-Real Gap?

The sim-to-real gap refers to differences between simulation and reality that cause trained policies to fail on real robots.

Sim-to-Real Gap

Alt-text: Diagram illustrating the sim-to-real gap. On the left, a simulated robot in a simplified environment. On the right, a real robot with complex physics, sensor noise, and real-world variations. Arrows show the gap between simulation (perfect physics, no noise) and reality (complex physics, sensor noise, actuator delays).

Sources of the Gap

Category	Simulation	Reality
Physics	Simplified models	Complex dynamics
Sensors	Perfect readings	Noise, latency
Actuators	Instant response	Delays, backlash
Environment	Static, known	Dynamic, variable
Contacts	Simplified friction	Complex surface interactions

Consequences

Policy works perfectly in simulation
Policy fails catastrophically on real robot
Robot may fall, collide, or behave erratically

Techniques to Bridge the Gap

1. Domain Randomization

Randomize simulation parameters during training so the policy learns to be robust:

# Domain randomization example
class RandomizedEnv:
    def reset(self):
        # Randomize physics
        self.friction = np.random.uniform(0.5, 1.5)
        self.mass_scale = np.random.uniform(0.8, 1.2)

        # Randomize sensors
        self.sensor_noise_std = np.random.uniform(0.0, 0.1)
        self.sensor_delay = np.random.randint(0, 3)

        # Randomize actuators
        self.motor_strength = np.random.uniform(0.9, 1.1)
        self.action_delay = np.random.randint(0, 2)

Common Randomization Targets:

Friction coefficients
Mass and inertia
Motor strength
Sensor noise
Control delays
Link lengths
External forces

2. System Identification

Measure real robot parameters and match simulation:

# System identification measurements
real_robot_params = {
    'motor_kp': 80.0,    # Measured from step response
    'motor_kd': 5.0,
    'friction': 0.8,     # Measured from sliding test
    'latency_ms': 15,    # Measured from command-response
}

# Update simulation
sim.set_motor_gains(real_robot_params['motor_kp'], real_robot_params['motor_kd'])

3. Progressive Training

Train in stages from simple to realistic:

Stage 1: Perfect simulation, no noise
Stage 2: Add sensor noise
Stage 3: Add actuator delays
Stage 4: Add physics randomization
Stage 5: Deploy to real robot

Loading Pre-Trained Policies

ONNX Format

ONNX (Open Neural Network Exchange) is a standard format for trained models:

Works across frameworks (PyTorch → ONNX → TensorRT)
Optimized for inference
Portable to edge devices

Code Example: Policy Loader

#!/usr/bin/env python3
"""
policy_loader.py

Loads a pre-trained ONNX policy and runs inference.
Demonstrates sim-to-real policy deployment pattern.

Dependencies: onnxruntime, numpy

Usage:
  python3 policy_loader.py
"""

import numpy as np

try:
    import onnxruntime as ort
except ImportError:
    print("Install onnxruntime: pip install onnxruntime")
    raise


class PolicyLoader:
    """Loads and runs inference on an ONNX policy."""

    def __init__(self, model_path: str):
        """Initialize the policy loader.

        Args:
            model_path: Path to .onnx policy file
        """
        # Create inference session
        self.session = ort.InferenceSession(
            model_path,
            providers=['CPUExecutionProvider']  # or CUDAExecutionProvider
        )

        # Get input/output info
        self.input_name = self.session.get_inputs()[0].name
        self.input_shape = self.session.get_inputs()[0].shape
        self.output_name = self.session.get_outputs()[0].name

        print(f"Loaded policy: {model_path}")
        print(f"Input: {self.input_name} {self.input_shape}")
        print(f"Output: {self.output_name}")

    def get_action(self, observation: np.ndarray) -> np.ndarray:
        """Run policy inference to get action.

        Args:
            observation: Current state observation

        Returns:
            Action to execute
        """
        # Ensure correct shape and type
        obs = observation.astype(np.float32)
        if len(obs.shape) == 1:
            obs = obs.reshape(1, -1)  # Add batch dimension

        # Run inference
        outputs = self.session.run(
            [self.output_name],
            {self.input_name: obs}
        )

        action = outputs[0][0]  # Remove batch dimension
        return action


def main():
    """Example usage with dummy observations."""

    # Load the pre-trained policy
    policy = PolicyLoader('pretrained/locomotion_policy.onnx')

    # Simulate observations (replace with real sensor data)
    obs_dim = 48  # Typical for quadruped: joints + velocities + commands
    observation = np.random.randn(obs_dim).astype(np.float32)

    # Get action
    action = policy.get_action(observation)
    print(f"Observation shape: {observation.shape}")
    print(f"Action shape: {action.shape}")
    print(f"Action values: {action[:5]}...")  # First 5 values

    # In real deployment:
    # while True:
    #     observation = get_robot_state()
    #     action = policy.get_action(observation)
    #     send_to_robot(action)


if __name__ == '__main__':
    main()

Safety Considerations

Pre-Deployment Checklist

WARNING: Real robot deployment requires careful safety protocols. Failure to follow safety procedures can result in damage to the robot, environment, or injury to humans.

Emergency stop tested and accessible
Safety perimeter established (no humans in workspace)
Velocity limits configured in hardware
Joint limits enforced in software
Collision detection enabled
Initial tests at reduced speed
Fallback behavior defined

Speed Progression

Stage	Speed	Duration
1	10% max	Until stable
2	25% max	10+ minutes
3	50% max	30+ minutes
4	75% max	Extended testing
5	100% max	Production ready

Monitoring During Deployment

def safe_execution_loop(policy, robot):
    """Safe policy execution with monitoring."""

    MAX_VELOCITY = 0.5  # m/s safety limit
    MAX_TORQUE = 10.0   # Nm safety limit

    while True:
        # Get observation
        obs = robot.get_observation()

        # Get action from policy
        action = policy.get_action(obs)

        # Safety clipping
        action = np.clip(action, -MAX_TORQUE, MAX_TORQUE)

        # Check velocity limits
        if np.any(np.abs(robot.joint_velocities) > MAX_VELOCITY):
            print("VELOCITY LIMIT EXCEEDED - stopping")
            robot.emergency_stop()
            break

        # Execute (with rate limiting)
        robot.send_action(action)
        time.sleep(0.01)  # 100 Hz control

Key Concepts Summary

Concept	Definition
Sim-to-Real Gap	Differences between simulation and reality
Domain Randomization	Randomize simulation during training
System Identification	Measure and match real robot parameters
ONNX	Portable neural network format
Safety Limits	Hardware and software safeguards

Hands-On Exercise

Exercise A4.1: Domain Randomization Design

For a quadruped robot, list 10 parameters you would randomize:

Parameter	Range	Rationale
1.
2.
...

Click for suggested answer

Friction coefficient: 0.5-1.5 (floor surfaces vary)
Base mass: ±20% (payload variations)
Motor strength: ±10% (manufacturing tolerances)
Joint damping: ±30% (wear and temperature)
Sensor noise: 0-5% (sensor quality)
Control latency: 0-20ms (communication delays)
External push force: 0-50N (unexpected contacts)
Ground height: ±5cm (uneven terrain)
Link lengths: ±2% (calibration errors)
Gravity direction: ±5° (slopes)

Exercise A4.2: Safety Protocol

Create a safety checklist for deploying a mobile robot policy:

Hardware safety measures
Software safety limits
Testing progression
Monitoring during operation

AI Agent Assisted Prompts

Prompt 1: Gap Analysis

My quadruped policy trained in Isaac Gym falls over when I deploy it on the
real robot. The simulation uses simplified contact models. What are the most
likely causes and how can I diagnose the specific problem?

Prompt 2: Domain Randomization Strategy

I'm training a manipulation policy for a robot arm. What parameters should
I randomize and what ranges should I use? I want good real-world transfer
without making training too hard.

Prompt 3: Real-Time Inference

My ONNX policy runs at 50ms per inference on CPU but I need 1kHz control.
What optimization strategies can I use? Should I use TensorRT, model
pruning, or simpler architectures?

Summary

In this lesson, you learned:

Sim-to-real gap causes policies to fail on real robots
Domain randomization makes policies robust to variations
System identification matches simulation to reality
ONNX enables portable policy deployment
Safety protocols are essential for real robot deployment

Chapter Complete!

Congratulations! You've completed Chapter 4: AI-Robot Brain!

You now understand:

Perception pipelines and sensor processing
SLAM and map generation
Navigation with Nav2 and behavior trees
Reinforcement learning fundamentals
Sim-to-real transfer techniques

What's Next?

Complete Advanced Exercises
Apply these concepts to your own robot projects
Continue to Chapter 4 for manipulation and grasping

A4: Sim-to-Real Transfer

Learning Objectives​

Prerequisites​

Theory: The Sim-to-Real Gap​

What is the Sim-to-Real Gap?​

Sources of the Gap​

Consequences​

Techniques to Bridge the Gap​

1. Domain Randomization​

2. System Identification​

3. Progressive Training​

Loading Pre-Trained Policies​

ONNX Format​

Code Example: Policy Loader​

Safety Considerations​

Pre-Deployment Checklist​

Speed Progression​

Monitoring During Deployment​

Key Concepts Summary​

Hands-On Exercise​

Exercise A4.1: Domain Randomization Design​

Exercise A4.2: Safety Protocol​

AI Agent Assisted Prompts​

Prompt 1: Gap Analysis​

Prompt 2: Domain Randomization Strategy​

Prompt 3: Real-Time Inference​

Summary​

Chapter Complete!​

What's Next?​