OGBench: Benchmarking Offline Goal-Conditioned RL

Really good as a survey of different goal-conditioned RL implementations.

I’m mostly going to focus on cube-triple for now, trying to understand what the state-based representation actually encodes.

This is the script that they used to generate:

Cube-single

(Pdb) train_dataset['observations'].shape
(1000000, 28)

Cube-double

(Pdb) train_dataset['observations'].shape
(3000000, 37)

Cube-triple

(Pdb) train_dataset['observations'].shape
(3000000, 46)
  • 46-dimensional observations, so each extra cube adds 9 dims (28 → 37 → 46)
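The three shapes above fit a fixed proprioceptive block plus a fixed number of dims per cube. A quick sanity check, with field widths read off the compute_observation() code quoted later in these notes (the helper name expected_obs_dim is mine, not part of OGBench):

```python
# Field widths read off compute_observation() in cube_env.py.
PROPRIO_DIMS = {
    "joint_pos": 6,
    "joint_vel": 6,
    "effector_pos": 3,
    "effector_yaw_cos": 1,
    "effector_yaw_sin": 1,
    "gripper_opening": 1,
    "gripper_contact": 1,
}
PER_CUBE_DIMS = {"pos": 3, "quat": 4, "yaw_cos": 1, "yaw_sin": 1}


def expected_obs_dim(num_cubes: int) -> int:
    """Hypothetical helper: width of the state observation for num_cubes cubes."""
    return sum(PROPRIO_DIMS.values()) + num_cubes * sum(PER_CUBE_DIMS.values())


for name, n in [("cube-single", 1), ("cube-double", 2), ("cube-triple", 3)]:
    print(name, expected_obs_dim(n))  # 28, 37, 46
```

The results match the pdb shapes: 19 proprioceptive dims plus 9 per cube.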
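Layout of the flat state vector for cube-single (length 28), following compute_observation() quoted further down:

0:6 → joint_pos (6)
6:12 → joint_vel (6)
12:15 → effector_pos (x,y,z), centered and scaled
15 → cos(effector_yaw)
16 → sin(effector_yaw)
17 → gripper_opening (scaled)
18 → gripper_contact
19:22 → block_0_pos (x,y,z), centered and scaled
22:26 → block_0_quat (w,x,y,z or env-specific)
26 → cos(block_0_yaw)
27 → sin(block_0_yaw)

Each additional cube appends another 9 dims in the same order (block_1 at 28:37, block_2 at 37:46). Note that gripper_vel is not part of the observation, and each yaw is stored as a (cos, sin) pair rather than a single angle.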

They have a helper function to explain what these keys mean:

env.compute_ob_info()
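To see what each key holds, it helps to list every entry with its shape. A minimal sketch: describe_ob_info is a hypothetical helper of mine, and the dict below is a stand-in whose shapes are inferred from the observation widths, not real env output. With a real environment you would pass the result of env.compute_ob_info() instead (possibly through env.unwrapped if the env is wrapped; I have not checked that here).

```python
import numpy as np


def describe_ob_info(ob_info):
    """Return (key, shape) pairs for a dict of array-likes, sorted by key."""
    return [(key, np.shape(value)) for key, value in sorted(ob_info.items())]


# Stand-in dict (assumption): mimics a few of the keys used by compute_observation().
fake_ob_info = {
    "proprio/joint_pos": np.zeros(6),
    "proprio/effector_pos": np.zeros(3),
    "proprio/effector_yaw": np.zeros(1),
    "privileged/block_0_quat": np.zeros(4),
}

for key, shape in describe_ob_info(fake_ob_info):
    print(f"{key:28s} {shape}")
```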

The actual compute_observation() function returns this: https://github.com/seohongpark/ogbench/blob/master/ogbench/manipspace/envs/cube_env.py#L766

def compute_observation(self):
    if self._ob_type == 'pixels':
        return self.get_pixel_observation()
    else:
        xyz_center = np.array([0.425, 0.0, 0.0])
        xyz_scaler = 10.0
        gripper_scaler = 3.0
 
        ob_info = self.compute_ob_info()
        ob = [
            ob_info['proprio/joint_pos'],
            ob_info['proprio/joint_vel'],
            (ob_info['proprio/effector_pos'] - xyz_center) * xyz_scaler,
            np.cos(ob_info['proprio/effector_yaw']),
            np.sin(ob_info['proprio/effector_yaw']),
            ob_info['proprio/gripper_opening'] * gripper_scaler,
            ob_info['proprio/gripper_contact'],
        ]
        for i in range(self._num_cubes):
            ob.extend(
                [
                    (ob_info[f'privileged/block_{i}_pos'] - xyz_center) * xyz_scaler,
                    ob_info[f'privileged/block_{i}_quat'],
                    np.cos(ob_info[f'privileged/block_{i}_yaw']),
                    np.sin(ob_info[f'privileged/block_{i}_yaw']),
                ]
            )
 
        return np.concatenate(ob)
  • So the observation concatenates the robot's proprioceptive state (19 dims) with each cube's scaled position, quaternion, and cos/sin of yaw (9 dims per cube)
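Since positions are centered and scaled and yaws are stored as (cos, sin) pairs, it is handy to be able to invert the encoding when inspecting dataset rows. A minimal sketch, assuming the index layout implied by the concatenation order above; decode_observation is a hypothetical helper, not an OGBench function, and the constants are copied from compute_observation():

```python
import numpy as np

XYZ_CENTER = np.array([0.425, 0.0, 0.0])  # copied from compute_observation()
XYZ_SCALER = 10.0
GRIPPER_SCALER = 3.0
PROPRIO_DIM = 19  # 6 + 6 + 3 + 1 + 1 + 1 + 1
CUBE_DIM = 9      # 3 + 4 + 1 + 1


def decode_observation(ob):
    """Hypothetical helper: undo the scaling of one flat (1-D) state observation."""
    ob = np.asarray(ob, dtype=np.float64)
    num_cubes, remainder = divmod(ob.shape[-1] - PROPRIO_DIM, CUBE_DIM)
    if remainder != 0 or num_cubes < 0:
        raise ValueError(f"unexpected observation width {ob.shape[-1]}")
    out = {
        "joint_pos": ob[0:6],
        "joint_vel": ob[6:12],
        "effector_pos": ob[12:15] / XYZ_SCALER + XYZ_CENTER,
        "effector_yaw": np.arctan2(ob[16], ob[15]),  # index 15 = cos, 16 = sin
        "gripper_opening": ob[17] / GRIPPER_SCALER,
        "gripper_contact": ob[18],
        "cubes": [],
    }
    for i in range(num_cubes):
        start = PROPRIO_DIM + i * CUBE_DIM
        c = ob[start:start + CUBE_DIM]
        out["cubes"].append({
            "pos": c[0:3] / XYZ_SCALER + XYZ_CENTER,
            "quat": c[3:7],
            "yaw": np.arctan2(c[8], c[7]),  # c[7] = cos, c[8] = sin
        })
    return out


# Synthetic cube-triple row (made-up values) to exercise the round trip.
ob = np.zeros(46)
ob[15], ob[16] = np.cos(0.5), np.sin(0.5)
ob[19:22] = (np.array([0.525, 0.1, 0.02]) - XYZ_CENTER) * XYZ_SCALER
decoded = decode_observation(ob)
print(len(decoded["cubes"]), decoded["effector_yaw"], decoded["cubes"][0]["pos"])
```

The recovered yaws lie in (-pi, pi], so they match the original angles only up to a multiple of 2*pi.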