MPC vs RL or Future Combo Discussion

Hey Everyone,

This post is more of a theoretical question (esp. as I work through this resource). As I’ve been learning more about reinforcement learning and MPC, I’ve begun to wonder about the real differences between them. From what I understand, the advantage of MPC is that it is an online controller that plans into the future at every real time step, and because it keeps adapting to real-world feedback it can still function even if the model’s dynamics are fuzzy or inaccurate. Meanwhile, RL uses a policy generated offline that is ideally generalized for the goal task (e.g. walking) by presenting the agent with randomized environments, system parameters, etc.

My question is: do you think one methodology will overtake the other in practicality and versatility in the future? Will we just have two different methods going forward that we implement based on whatever the task space is? Or will robotics ideally arrive at some kind of fusion, and what would a true fusion of control theory and RL even look like?

I was once thinking of RL as the “link” between different controllers, so you would use an RL policy to decide when to transition between two different locomotion controllers. But now it feels like some RL policies outperform traditional controllers outright (e.g. Cassie running).

What do you guys think?


Hi Roy,

This is a very general question and still an open area of research. The short answer is that both frameworks try to find an optimal control policy. MPC uses a model to plan into the future, solving online an optimal control problem that is cast as a nonlinear program, while RL resorts to trial and error and function approximation to find an optimal policy. Based on my experience, RL can find more performant policies for single tasks (as we have seen in many examples by now, such as the latest running on Cassie), but it does not generalize: one needs to train a new policy for every new task. On the other hand, MPC can potentially generate different motions without changing its structure, but the price to pay is that it is computationally expensive and normally does not admit highly performant policies like RL, due to the assumptions made in writing down the optimal control problem. That is the most general thing I can say, but there is much more detail to it …
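To make the receding-horizon idea concrete, here is a minimal toy sketch of my own (not from anyone's actual implementation): a random-shooting MPC on a 1D point mass, where the planning model deliberately underestimates the plant's mass. Because only the first action is executed and the problem is re-solved at every real time step, the controller still reaches the goal despite the model mismatch. All names (`plan`, `model_step`, `real_step`) are made up for illustration; real MPC stacks would solve the nonlinear program with a proper solver instead of random shooting.

```python
import random

def plan(model_step, cost, x0, horizon=10, samples=200, u_range=(-1.0, 1.0)):
    """Random-shooting MPC: sample action sequences, roll out the *model*,
    and return only the first action of the cheapest sequence."""
    best_u0, best_c = 0.0, float("inf")
    for _ in range(samples):
        seq = [random.uniform(*u_range) for _ in range(horizon)]
        x, c = x0, 0.0
        for u in seq:
            x = model_step(x, u)
            c += cost(x, u)
        if c < best_c:
            best_c, best_u0 = c, seq[0]
    return best_u0

dt = 0.1

def model_step(x, u):            # nominal model used for planning
    p, v = x
    return (p + dt * v, v + dt * u)

def real_step(x, u):             # true plant: 20% heavier than the model assumes
    p, v = x
    return (p + dt * v, v + dt * u / 1.2)

def cost(x, u):                  # drive position to 1.0, penalize speed and effort
    p, v = x
    return (p - 1.0) ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2

random.seed(0)
x = (0.0, 0.0)
for _ in range(100):             # re-plan at every real time step
    u = plan(model_step, cost, x)
    x = real_step(x, u)          # execute only the first planned action

print(x)                         # ends near position 1.0 despite model mismatch
```

An RL policy, by contrast, would amortize all of this into a single offline-trained function `u = pi(x)` that is cheap to evaluate online but baked to the task it was trained on.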



Very nice question and discussion going on here!

In addition to the MPC-RL workshop, another reference I recommend is this presentation from Marco Hutter’s group at an ICRA workshop: [09] M. Hutter, 6th Workshop on Legged Robots ICRA'22 - YouTube