Imitation Learning

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning (Ross, Gordon & Bagnell, 2011)

export TS="$(date +%Y%m%d_%H%M%S)" && \
export DATASET_REPO_ID="${HF_USER}/eval_koch-tshirt-dagger-${TS}" && \
export INTERVENTION_REPO_ID="${HF_USER}/koch-tshirt-dagger-corrections-${TS}" && \
lerobot-record \
  --robot.type=bi_koch_follower \
  --robot.left_arm_port="$FOLLOWER_LEFT_PORT" \
  --robot.right_arm_port="$FOLLOWER_RIGHT_PORT" \
  --robot.id=bimanual_follower \
  --robot.cameras="{ top: {type: opencv, index_or_path: $TOP_CAMERA_INDEX_OR_PATH, width: 640, height: 480, fps: 30}, left_wrist: {type: opencv, index_or_path: $LEFT_WRIST_CAMERA_INDEX_OR_PATH, width: 640, height: 480, fps: 30}, right_wrist: {type: opencv, index_or_path: $RIGHT_WRIST_CAMERA_INDEX_OR_PATH, width: 640, height: 480, fps: 30} }" \
  --teleop.type=bi_koch_leader \
  --teleop.left_arm_port="$LEADER_LEFT_PORT" \
  --teleop.right_arm_port="$LEADER_RIGHT_PORT" \
  --teleop.id=bimanual_leader \
  --teleop.intervention_enabled=true \
  --dataset.repo_id="$DATASET_REPO_ID" \
  --dataset.single_task="Fold the t-shirt and put it in the bin" \
  --dataset.num_episodes=10 \
  --dataset.episode_time_s=60 \
  --dataset.reset_time_s=15 \
  --intervention_repo_id="$INTERVENTION_REPO_ID" \
  --policy.path="${HF_USER}/act_policy_koch-tshirt-folding-v2" \
  --display_data=true \
  --teleop.inverse_follow=false

DAgger = Dataset Aggregation.

First heard about this paper when reading up on ALOHA.

DAgger "helps reduce distribution shift for the learner", but how?

DAgger improves on behavioral cloning by iteratively rolling out the *current* policy, querying the expert for the correct action in the states the policy actually visits, and retraining on the aggregated dataset. The training distribution therefore comes to resemble the observations the trained policy will encounter at test time; the cost is that the expert must be queried online.
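A minimal sketch of that loop, using a toy linear policy and a hand-coded stand-in "expert" (all names here are hypothetical, not lerobot APIs):

```python
import numpy as np

class LinearPolicy:
    """Toy least-squares policy: action = W @ state."""
    def __init__(self, state_dim, action_dim):
        self.W = np.zeros((action_dim, state_dim))

    def fit(self, states, actions):
        # Solve min_W || states @ W.T - actions ||^2
        sol, *_ = np.linalg.lstsq(states, actions, rcond=None)
        self.W = sol.T

    def act(self, state):
        return self.W @ state

def expert(state):
    # Stand-in expert: the "true" controller we want to imitate.
    return -2.0 * state

def rollout(policy, horizon=20, state_dim=3, seed=0):
    # Run the learner's own policy in a toy dynamical system,
    # recording the states it visits.
    rng = np.random.default_rng(seed)
    state = rng.normal(size=state_dim)
    states = []
    for _ in range(horizon):
        states.append(state)
        state = state + 0.1 * policy.act(state) + 0.01 * rng.normal(size=state_dim)
    return np.array(states)

def dagger(iterations=5, state_dim=3):
    policy = LinearPolicy(state_dim, state_dim)
    all_states, all_actions = [], []
    for i in range(iterations):
        # 1. Roll out the CURRENT policy (not the expert), so the data
        #    covers the states the learner actually reaches.
        visited = rollout(policy, seed=i)
        # 2. Query the expert for the correct action at each visited state.
        labels = np.array([expert(s) for s in visited])
        # 3. Aggregate into the growing dataset and retrain.
        all_states.append(visited)
        all_actions.append(labels)
        policy.fit(np.vstack(all_states), np.vstack(all_actions))
    return policy

policy = dagger()
```

Behavioral cloning would only ever fit states from *expert* rollouts; step 1 is the whole difference, and step 2 is why DAgger needs online access to the expert (in the lerobot recording above, that expert is the human teleoperator intervening).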

Krish M mentioned this, and I had been confusing it with IMPALA (a distributed RL architecture, unrelated to imitation learning).