OGBench: Benchmarking Offline Goal-Conditioned RL
Really good for surveys of different goal-conditioned implementations.
I'm mostly going to focus on cube-triple for now, trying to understand what the state-based representation actually encodes.
This is the script that they used to generate:
Cube-single
(Pdb) train_dataset['observations'].shape
(1000000, 28)
Cube-double
(Pdb) train_dataset['observations'].shape
(3000000, 37)
Cube-triple
(Pdb) train_dataset['observations'].shape
(3000000, 46)
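- 46-dimensional observations for cube-triple. Each extra cube adds 9 dimensions (28 → 37 → 46), which is consistent with 19 proprioceptive dimensions plus 9 per cube.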
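A quick sanity check on those shapes, as a minimal sketch. The per-component widths are read off the layout and the compute_observation() code quoted further down in these notes; the names PROPRIO_DIMS, CUBE_DIMS and expected_obs_dim are made up here for illustration and are not part of OGBench.

```python
# Widths of each component in the state observation, as read off
# compute_observation() (assumption: yaw and gripper entries are scalars).
PROPRIO_DIMS = {
    "joint_pos": 6,
    "joint_vel": 6,
    "effector_pos": 3,
    "effector_yaw_cos": 1,
    "effector_yaw_sin": 1,
    "gripper_opening": 1,
    "gripper_contact": 1,
}
CUBE_DIMS = {"pos": 3, "quat": 4, "yaw_cos": 1, "yaw_sin": 1}


def expected_obs_dim(num_cubes: int) -> int:
    """Proprioceptive width plus one cube block per cube."""
    return sum(PROPRIO_DIMS.values()) + num_cubes * sum(CUBE_DIMS.values())


# Observation widths seen in the pdb sessions above.
observed = {"cube-single": (1, 28), "cube-double": (2, 37), "cube-triple": (3, 46)}
for name, (num_cubes, width) in observed.items():
    assert expected_obs_dim(num_cubes) == width, name
```

All three dataset widths line up with 19 + 9 * num_cubes.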
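Note: this is not the 28-dim dataset observation. compute_observation() (quoted below) has no gripper_vel entry and stores each yaw as a cos/sin pair.
Raw next_ob for a single cube (flat vector of length 27) with this layout: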
0:6 → joint_pos (6)
6:12 → joint_vel (6)
12:15 → effector_pos (x,y,z)
15 → effector_yaw
16 → gripper_opening
17 → gripper_vel
18 → gripper_contact
19:22 → block_0_pos (x,y,z)
22:26 → block_0_quat (w,x,y,z or env-specific; unused here)
26 → block_0_yaw
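The layout above can be written down as named slices, which makes indexing less error-prone. This is only a sketch of the 27-dim listing in these notes: RAW_LAYOUT and split_raw_ob are hypothetical names, and the indices are copied from the list rather than from the OGBench source.

```python
# Named slices for the 27-dim raw layout listed above (single cube).
RAW_LAYOUT = {
    "joint_pos": slice(0, 6),
    "joint_vel": slice(6, 12),
    "effector_pos": slice(12, 15),
    "effector_yaw": slice(15, 16),
    "gripper_opening": slice(16, 17),
    "gripper_vel": slice(17, 18),
    "gripper_contact": slice(18, 19),
    "block_0_pos": slice(19, 22),
    "block_0_quat": slice(22, 26),
    "block_0_yaw": slice(26, 27),
}


def split_raw_ob(next_ob):
    """Split one flat 27-dim vector into a dict of named pieces."""
    if len(next_ob) != 27:
        raise ValueError(f"expected length 27, got {len(next_ob)}")
    return {name: list(next_ob[sl]) for name, sl in RAW_LAYOUT.items()}


# Example: use the index itself as the value to see which entries land where.
parts = split_raw_ob(list(range(27)))
```

Here parts["block_0_pos"] holds the entries at indices 19, 20 and 21.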
They have a helper function to explain what these keys mean:
env.compute_ob_info()
The actual compute_observation() function returns this: https://github.com/seohongpark/ogbench/blob/master/ogbench/manipspace/envs/cube_env.py#L766
def compute_observation(self):
    if self._ob_type == 'pixels':
        return self.get_pixel_observation()
    else:
        xyz_center = np.array([0.425, 0.0, 0.0])
        xyz_scaler = 10.0
        gripper_scaler = 3.0

        ob_info = self.compute_ob_info()
        ob = [
            ob_info['proprio/joint_pos'],
            ob_info['proprio/joint_vel'],
            (ob_info['proprio/effector_pos'] - xyz_center) * xyz_scaler,
            np.cos(ob_info['proprio/effector_yaw']),
            np.sin(ob_info['proprio/effector_yaw']),
            ob_info['proprio/gripper_opening'] * gripper_scaler,
            ob_info['proprio/gripper_contact'],
        ]
        for i in range(self._num_cubes):
            ob.extend(
                [
                    (ob_info[f'privileged/block_{i}_pos'] - xyz_center) * xyz_scaler,
                    ob_info[f'privileged/block_{i}_quat'],
                    np.cos(ob_info[f'privileged/block_{i}_yaw']),
                    np.sin(ob_info[f'privileged/block_{i}_yaw']),
                ]
            )

        return np.concatenate(ob)
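- So the state observation is the 19-dim proprioceptive state (joint pos/vel, centered and scaled effector xyz, cos/sin of effector yaw, scaled gripper opening, gripper contact) followed by one 9-dim block per cube (centered and scaled xyz, quaternion, cos/sin of yaw): 19 + 9 * num_cubes in total.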
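To read a dataset row back in physical units, the transform in compute_observation() can be inverted. The following is a minimal sketch for a single observation row. decode_observation is a hypothetical helper, not an OGBench function. The constants are copied from the code above, and it assumes the yaw and gripper entries are scalars (which the 28/37/46 widths suggest).

```python
import math

# Constants copied from compute_observation() above.
XYZ_CENTER = (0.425, 0.0, 0.0)
XYZ_SCALER = 10.0
GRIPPER_SCALER = 3.0
PROPRIO_DIM = 19  # 6 + 6 + 3 + 1 + 1 + 1 + 1
CUBE_DIM = 9      # 3 + 4 + 1 + 1


def _unscale_xyz(scaled):
    """Invert (xyz - center) * scaler."""
    return [v / XYZ_SCALER + c for v, c in zip(scaled, XYZ_CENTER)]


def decode_observation(ob):
    """Split one flat state observation and undo the scaling."""
    ob = [float(v) for v in ob]
    num_cubes, remainder = divmod(len(ob) - PROPRIO_DIM, CUBE_DIM)
    if num_cubes < 1 or remainder != 0:
        raise ValueError(f"unexpected observation length {len(ob)}")
    decoded = {
        "joint_pos": ob[0:6],
        "joint_vel": ob[6:12],
        "effector_pos": _unscale_xyz(ob[12:15]),
        # Yaw is stored as (cos, sin); atan2(sin, cos) recovers the angle.
        "effector_yaw": math.atan2(ob[16], ob[15]),
        "gripper_opening": ob[17] / GRIPPER_SCALER,
        "gripper_contact": ob[18],
        "cubes": [],
    }
    for i in range(num_cubes):
        base = PROPRIO_DIM + i * CUBE_DIM
        decoded["cubes"].append(
            {
                "pos": _unscale_xyz(ob[base:base + 3]),
                "quat": ob[base + 3:base + 7],
                "yaw": math.atan2(ob[base + 8], ob[base + 7]),
            }
        )
    return decoded
```

Something like decode_observation(train_dataset['observations'][0]) should then give three entries under "cubes" for cube-triple. The recovered yaw lies in (-pi, pi], so it may differ from the simulator's value by a multiple of 2*pi.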