Researchers from the Technical University of Berlin and Google have unveiled PaLM-E, an embodied version of the Pathways Language Model (PaLM) and an advancement in human-robot interaction. They claim it can control different robots across multiple environments, showing a level of flexibility previously unseen in robotics. PaLM-E integrates AI-powered vision and language to enable autonomous control, allowing a robot to perform a task based on human voice commands without constant retraining.
The model is able to make use of visual data to enhance its language processing capabilities, resulting in an embodied language model that is both versatile and quantitatively competent. It has been trained on a mixture of tasks across multiple robot embodiments and general vision-language tasks.
Their largest model, PaLM-E-562B, shows capabilities like "multimodal chain-of-thought reasoning" over multiple images, despite being trained only on single-image prompts.
Researchers claim that the model exhibits "positive transfer," meaning it can carry knowledge and skills learned from prior tasks over to new ones, leading to higher performance than single-task robot models. It also displays "multimodal chain-of-thought reasoning," meaning it can analyse a sequence of mixed language and visual inputs, as well as "multi-image inference," where it uses multiple images as input to make an inference or prediction.
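At a high level, PaLM-E works by projecting image features into the language model's token-embedding space and interleaving them with text tokens, so one sequence can mix words and images. The sketch below illustrates that interleaving idea only; all function names, dimensions, and the random stand-in embeddings are illustrative assumptions, not the actual model's API or weights.

```python
# Illustrative sketch of PaLM-E-style multimodal prompting: image features
# are projected into the language model's embedding space and interleaved
# with text-token embeddings into a single input sequence.
# All names and dimensions here are hypothetical.
import numpy as np

EMBED_DIM = 8  # hypothetical LM embedding size

def embed_text(tokens):
    """Stand-in for the LM's token-embedding lookup."""
    rng = np.random.default_rng(0)
    table = {t: rng.standard_normal(EMBED_DIM) for t in set(tokens)}
    return [table[t] for t in tokens]

def project_image(image_features):
    """Stand-in for a learned projection from the vision encoder to LM space."""
    rng = np.random.default_rng(1)
    W = rng.standard_normal((EMBED_DIM, image_features.shape[0]))
    return [W @ image_features]  # one "image token" in embedding space

def build_multimodal_prompt(segments):
    """Interleave text and image segments into one embedding sequence."""
    seq = []
    for kind, payload in segments:
        seq.extend(embed_text(payload) if kind == "text" else project_image(payload))
    return np.stack(seq)

prompt = build_multimodal_prompt([
    ("text", ["What", "is", "in"]),
    ("image", np.ones(4)),  # hypothetical vision-encoder output
    ("text", ["?"]),
])
print(prompt.shape)  # 5 positions (4 text tokens + 1 image token) x EMBED_DIM
```

Because the images become ordinary positions in the token sequence, the same mechanism extends naturally to several images in one prompt, which is what "multi-image inference" refers to.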