Reward shaping in robotics
Use LLMs to define reward parameters, bridging the gap between high-level language instructions or corrections and low-level robot actions
Reward translator + Motion controller
User instructions + Motion template and rules → Motion description
Motion description + Reward Coder Prompt → Reward function
Both transformations are performed by the LLM
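The two-stage pipeline above can be sketched as follows. This is a minimal illustration, not the paper's actual prompts or code: `call_llm` is a stand-in for a GPT-4 call (here it returns canned text so the structure runs end to end), and the prompt templates and reward API names (`set_torso_height`, `set_forward_velocity`) are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a GPT-4 call; returns canned text so the
    two-stage pipeline can be exercised end to end."""
    if "Instruction:" in prompt:
        # Stage 1 answer: a structured motion description
        return "torso height is 0.3 m; the robot trots forward at 0.5 m/s"
    # Stage 2 answer: reward-setting code against a hypothetical reward API
    return "set_torso_height(0.3)\nset_forward_velocity(0.5)"

MOTION_DESCRIPTOR_PROMPT = (
    "Fill in the motion template below for the given instruction.\n"
    "Template: torso height is [h] m; the robot moves forward at [v] m/s.\n"
    "Instruction: {instruction}\n"
)

REWARD_CODER_PROMPT = (
    "Translate the motion description into reward-setting code.\n"
    "Motion description: {description}\n"
)

def reward_translator(instruction: str) -> str:
    # Stage 1: user instruction + motion template -> motion description
    description = call_llm(MOTION_DESCRIPTOR_PROMPT.format(instruction=instruction))
    # Stage 2: motion description + reward coder prompt -> reward function code
    return call_llm(REWARD_CODER_PROMPT.format(description=description))
```

The returned reward code would then be handed to the motion controller (MuJoCo MPC in the paper) for execution.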
Step-by-step thinking:
Do not generate code directly; first generate a motion description, then generate the code
Generating code directly is difficult for an LLM, but completing a template is much easier
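To make "completing a template" concrete: in a constrained reward design space the reward structure is fixed by hand, and the LLM only fills in targets and weights. The function below is an illustrative sketch (names and terms are assumptions, not the paper's actual reward API).

```python
def reward(torso_height: float, forward_vel: float,
           target_height: float = 0.3, target_vel: float = 0.5,
           w_height: float = 1.0, w_vel: float = 1.0) -> float:
    """Weighted sum of negative squared-error terms.
    The LLM only picks the targets and weights; the structure is fixed."""
    r_height = -w_height * (torso_height - target_height) ** 2
    r_vel = -w_vel * (forward_vel - target_vel) ** 2
    return r_height + r_vel
```

The reward is maximal (zero) when the robot exactly matches the targets, and every deviation is penalized quadratically, which keeps the search space for the LLM small and the optimization landscape smooth.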
First work to apply LLMs to reward shaping for robotic skill synthesis
- Much work (e.g., template design) still needs to be done by humans.
- Constraining the reward design space improves the stability of the system while sacrificing some flexibility.
Environment: MuJoCo MPC
Underlying LLM: GPT-4
2 robots: a quadruped robot (locomotion) and a dexterous manipulator (manipulation)
Comparison and ablation experiments:
- Learn to tell a good story.
- If you have a novel idea but it does not work well, try taking a step back (e.g., the template in this paper) to make it work for now, and figure out how to remove that crutch later.