[Figure panels: Example Training Data; Placement Poses Generated by Diffusion-CCSP; Execution Trajectories Planned by RRT]

For example, to place object A into the tray in the figure below, we need to generate a grasping pose for A (grasp A), a placement pose for A (pose A), and a robot arm trajectory.

This paper introduces an approach for learning to solve continuous constraint satisfaction problems (CCSP) in robotic reasoning and planning. Previous methods primarily rely on hand-engineering or learning generators for specific constraint types and then rejecting the value assignments when other constraints are violated. By contrast, our model, the compositional diffusion continuous constraint solver (Diffusion-CCSP), derives global solutions to CCSPs by representing them as factor graphs and combining the energies of diffusion models trained to sample for individual constraint types. Diffusion-CCSP exhibits strong generalization to novel combinations of known constraints, and it can be integrated into a task and motion planner to devise long-horizon plans that include actions with both discrete and continuous parameters.
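The factor-graph idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Factor` class, the two hand-written energy functions, and the simplified `(x, y)` poses are hypothetical stand-ins for the learned diffusion models and full object poses used in the paper. The only point is that one energy per constraint type is reused across factors and summed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # (x, y) only; a simplified stand-in for an object pose


@dataclass(frozen=True)
class Factor:
    ctype: str             # constraint type, shared across problems
    args: Tuple[str, ...]  # names of the decision variables the factor connects


# Hand-written energies (assumption): zero when satisfied, positive otherwise.
def left_of(a: Pose, b: Pose) -> float:
    return max(0.0, a[0] - b[0]) ** 2


def horizontally_aligned(a: Pose, b: Pose) -> float:
    return (a[1] - b[1]) ** 2


ENERGY_BY_TYPE: Dict[str, Callable[[Pose, Pose], float]] = {
    "left-of": left_of,
    "horizontally-aligned": horizontally_aligned,
}


def total_energy(factors: List[Factor], poses: Dict[str, Pose]) -> float:
    """Compose per-constraint energies by summing over the factor graph."""
    return sum(
        ENERGY_BY_TYPE[f.ctype](*(poses[name] for name in f.args)) for f in factors
    )


factors = [
    Factor("left-of", ("yellow", "green")),
    Factor("left-of", ("green", "blue")),
    Factor("horizontally-aligned", ("green", "blue")),
]
good = {"yellow": (0.1, 0.5), "green": (0.4, 0.2), "blue": (0.8, 0.2)}
bad = {"yellow": (0.9, 0.5), "green": (0.4, 0.2), "blue": (0.8, 0.6)}
print(total_energy(factors, good), total_energy(factors, bad))
```

Because the energies are indexed by constraint type rather than by problem, a new problem with more objects or a different constraint graph only needs a new list of factors.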

**Constraints in the above tasks**:

- (a) The triangles must not collide with each other and must fit in the container.
- (b) The configuration of rectangles must satisfy a given set of qualitative constraints, drawn from 13 types in total. For example, *left-in(yellow, box)* and *horizontally-aligned(green, blue)*. The figure illustrates a subset of the 45 constraints present in the problem.
- (c) The boxes must fit in between the shelves in a stable configuration. The arrows show the current support relationships.
- (d) The objects must fit inside the container, and there must exist a packing order such that the gripper won't collide with any object or the container during placement.
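To make the geometric constraints concrete, here is a minimal sketch of a checker for the two simplest ones, collision-free and in-container. It assumes axis-aligned rectangles with no rotation, which is simpler than the triangles and 3D objects in the tasks. The `Rect` class and the function names are hypothetical and not taken from the paper's code.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Rect:
    x: float  # center x
    y: float  # center y
    w: float  # width
    h: float  # height

    @property
    def left(self) -> float:
        return self.x - self.w / 2

    @property
    def right(self) -> float:
        return self.x + self.w / 2

    @property
    def bottom(self) -> float:
        return self.y - self.h / 2

    @property
    def top(self) -> float:
        return self.y + self.h / 2


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the interiors of two axis-aligned rectangles intersect."""
    return (a.left < b.right and b.left < a.right
            and a.bottom < b.top and b.bottom < a.top)


def inside(obj: Rect, container: Rect) -> bool:
    """True if obj lies entirely within container."""
    return (container.left <= obj.left and obj.right <= container.right
            and container.bottom <= obj.bottom and obj.top <= container.top)


def is_valid_packing(objects: List[Rect], container: Rect) -> bool:
    """All objects are inside the container and no pair collides."""
    all_inside = all(inside(o, container) for o in objects)
    any_collision = any(overlaps(a, b) for a, b in combinations(objects, 2))
    return all_inside and not any_collision


tray = Rect(0.5, 0.5, 1.0, 1.0)
print(is_valid_packing([Rect(0.25, 0.5, 0.4, 0.4), Rect(0.75, 0.5, 0.4, 0.4)], tray))
```

A boolean checker like this is what a rejection-based baseline would call on each candidate assignment. It says nothing about stability (Task (c)) or the existence of a feasible packing order (Task (d)).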

Note that:

- **Diffusion-CCSP (ULA)**: our method, using the annealed unadjusted Langevin algorithm (ULA) to sample from the composed diffusion models.
  - This method performs better than the baselines (except on Task (a) when compared with StructDiffusion), especially on harder problems.

- **Diffusion-CCSP (Reverse)**: our method, using the standard reverse diffusion process to sample from the composed diffusion models.
  - This method samples faster than ULA, but is generally found to generalize worse.

- **StructDiffusion**: Instead of composing diffusion models, a single Transformer takes in a sequence of object embeddings, each of which is the concatenation of the geometry embedding and pose embedding (and grasp embedding for Task (d)), plus a time embedding.
  - The Transformer architecture used in StructDiffusion successfully models the relational structure among all pairs of objects. It achieves performance similar to our model on the simpler Task (a), but on more complicated tasks such as Task (b), our compositional diffusion model outperforms it.

- **Rejection-Sampling**: We sequentially sample each decision variable with a generic sampler and check all constraints. For each variable, we draw at most 50 samples.
  - It fails completely on Task (c) and on the hard problems in Tasks (b) and (d).

- OOD = Out of Training Distribution, i.e. more objects in each problem than in the training problems.
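The contrast between sequential rejection sampling and Langevin sampling from a composed energy can be sketched on a toy problem. The sketch below is not the paper's implementation. It assumes three 1D positions in [0, 1] with two hypothetical `left-of` constraints and a made-up `MARGIN`, hand-written quadratic penalty energies in place of learned diffusion models, and a temperature schedule standing in for the diffusion noise levels.

```python
import math
import random

MARGIN = 0.2                # hypothetical minimum gap for the toy left-of constraint
LEFT_OF = [(0, 1), (1, 2)]  # left-of(x0, x1) and left-of(x1, x2)
N_VARS = 3


def violations(x):
    """Amount by which each constraint is violated (0.0 when satisfied)."""
    v = [max(0.0, x[i] + MARGIN - x[j]) for i, j in LEFT_OF]
    v += [max(0.0, -xi) + max(0.0, xi - 1.0) for xi in x]  # stay inside [0, 1]
    return v


def rejection_sample(rng, max_tries=50):
    """Sample variables one by one; give up (None) after max_tries rejections."""
    x = []
    for k in range(N_VARS):
        for _ in range(max_tries):
            trial = x + [rng.uniform(0.0, 1.0)]
            # Check only constraints whose variables are already assigned.
            if all(trial[i] + MARGIN <= trial[j]
                   for i, j in LEFT_OF if i <= k and j <= k):
                x = trial
                break
        else:  # no candidate accepted for variable k; no backtracking
            return None
    return x


def grad_energy(x):
    """Gradient of the summed energy 0.5 * violation**2 over all factors."""
    g = [0.0] * N_VARS
    for i, j in LEFT_OF:
        v = max(0.0, x[i] + MARGIN - x[j])
        g[i] += v
        g[j] -= v
    for k, xk in enumerate(x):
        g[k] += min(0.0, xk) + max(0.0, xk - 1.0)
    return g


def annealed_ula(rng, n_steps=500, step=0.1, t_start=1.0, t_end=1e-6):
    """Unadjusted Langevin updates while the temperature is annealed to ~0."""
    x = [rng.gauss(0.0, 1.0) for _ in range(N_VARS)]
    for k in range(n_steps):
        temp = t_start * (t_end / t_start) ** (k / (n_steps - 1))
        g = grad_energy(x)
        noise = math.sqrt(2.0 * step * temp)
        x = [xi - step * gi + noise * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
    return x


print("ULA sample:", annealed_ula(random.Random(0)))
print("Rejection sample:", rejection_sample(random.Random(1)))
```

The Langevin sampler updates all variables jointly against the summed gradient, whereas the sequential sampler commits to early variables without backtracking and returns `None` when a later variable has no feasible value left.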

```
@inproceedings{yang2023diffusion,
title={{Compositional Diffusion-Based Continuous Constraint Solvers}},
author={Yang, Zhutian and Mao, Jiayuan and Du, Yilun and Wu, Jiajun and Tenenbaum, Joshua B. and Lozano-P{\'e}rez, Tom{\'a}s and Kaelbling, Leslie Pack},
booktitle={Conference on Robot Learning},
year={2023},
}
```