Abstract
In this letter, we address the task of recognizing assembly actions as a structure (e.g., a piece of furniture or a toy block tower) is built up from a set of primitive objects. Recognizing the full range of assembly actions requires perception at a level of spatial detail that has not been attempted in the action recognition literature to date. We extend the fine-grained activity recognition setting to address the task of assembly action recognition in its full generality by unifying assembly actions and kinematic structures within a single framework. We use this framework to develop a general method for recognizing assembly actions from observation sequences, along with observation features that take advantage of a spatial assembly's special structure. Finally, we evaluate our method empirically on two application-driven data sources: (1) an IKEA furniture-assembly dataset and (2) a block-building dataset. On the first, our system recognizes assembly actions with an average framewise accuracy of 70% and an average normalized edit distance of 10%. On the second, which requires fine-grained geometric reasoning to distinguish between assemblies, our system attains an average normalized edit distance of 23%, a relative improvement of 69% over prior work.
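The two metrics reported above, framewise accuracy and normalized edit distance, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes that predictions and ground truth are per-frame label lists of equal length, that the edit distance is the unit-cost Levenshtein distance over segment-level labels (consecutive duplicate frames collapsed), and that normalization divides by the length of the longer segment sequence. The abstract does not state the normalization convention, so that choice and all function names here are hypothetical.

```python
from itertools import groupby


def framewise_accuracy(predicted, reference):
    """Fraction of frames whose predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same number of frames")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def collapse_segments(frame_labels):
    """Merge each run of identical consecutive labels into one segment label."""
    return [label for label, _ in groupby(frame_labels)]


def edit_distance(a, b):
    """Levenshtein distance between two sequences with unit costs."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            substitution = previous[j - 1] + (x != y)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, reference):
    """Segment-level edit distance divided by the longer segment count.

    Assumption: normalizing by the longer sequence is one common convention;
    the paper may normalize differently.
    """
    pred_segments = collapse_segments(predicted)
    ref_segments = collapse_segments(reference)
    longest = max(len(pred_segments), len(ref_segments))
    if longest == 0:
        return 0.0
    return edit_distance(pred_segments, ref_segments) / longest
```

Under these assumptions, a prediction that skips one of three ground-truth segments scores a normalized edit distance of 1/3 however many frames that segment spans. This is why the edit distance complements framewise accuracy: it penalizes errors in the order and identity of actions, not in their duration.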
Original language | English (US) |
---|---|
Article number | 9372803 |
Pages (from-to) | 3728-3735 |
Number of pages | 8 |
Journal | IEEE Robotics and Automation Letters |
Volume | 6 |
Issue number | 2 |
State | Published - Apr 2021 |
Keywords
- Probabilistic Inference
- assembly
- multi-modal perception for HRI
- recognition
- sensor fusion
ASJC Scopus subject areas
- Control and Systems Engineering
- Biomedical Engineering
- Human-Computer Interaction
- Mechanical Engineering
- Computer Vision and Pattern Recognition
- Computer Science Applications
- Control and Optimization
- Artificial Intelligence