In December 2015, Elon Musk and other figures and companies from the technology industry joined forces to announce the creation of OpenAI, a non-profit organization whose goal is to make the results of its artificial intelligence research freely available worldwide.
At the time of its creation, the founders explained that their researchers would be strongly encouraged to publish their work in the form of papers, blog posts, and code, and that any patents would be shared with the world. A few years have now passed, and a few days ago the company announced the availability of a new artificial intelligence algorithm.
OpenAI has announced the availability of a framework that allows robots to learn by imitating what they observe. Generally, for a system to master the various facets of a task and execute it reliably, it requires training on a broad range of examples. OpenAI therefore wanted to speed up learning by allowing robots to learn the way human beings do.
This gave rise to the “one-shot imitation learning” framework. With this algorithm, a human can show a robot how to perform a new task by executing it once in a virtual reality environment. From that single demonstration, the robot can then perform the same task starting from an arbitrary initial configuration.
One can already construct a policy via imitation or reinforcement learning to stack blocks into towers of three. But with this new algorithm, the researchers have succeeded in designing policies that are not specific to a particular task, and that can instead tell a robot what to do in a new instance of a task.
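To make the idea concrete, here is a minimal sketch of what such a demonstration-conditioned policy looks like as an interface: rather than being trained for one fixed task, the policy takes the current observation together with an entire demonstration and outputs the next action. All names, dimensions, and the mean-pooling summary are illustrative assumptions, not OpenAI's actual implementation.

```python
import numpy as np

def demo_conditioned_policy(observation, demonstration, weights):
    """Hypothetical sketch of a policy conditioned on a demonstration."""
    # Summarize the demonstration (here: a simple mean over its steps;
    # the real system uses learned neural networks for this).
    demo_embedding = demonstration.mean(axis=0)
    # Combine the current observation with the demonstration summary.
    features = np.concatenate([observation, demo_embedding])
    # A single linear layer stands in for the learned policy network.
    return weights @ features

rng = np.random.default_rng(0)
obs = rng.normal(size=4)          # current state of the blocks
demo = rng.normal(size=(10, 4))   # 10 recorded steps of one demonstration
W = rng.normal(size=(2, 8))       # toy policy weights -> a 2-D action
action = demo_conditioned_policy(obs, demo, W)
print(action.shape)  # (2,)
```

The key design point is that the demonstration is an input to the policy at run time, so a new task instance only requires a new demonstration, not retraining.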
In the video above, OpenAI demonstrates the training of a policy that solves a different instance of the same task, using as training data the observation of another demonstration.
To stack the blocks, the robot uses an algorithm built on two neural networks: a vision network and an imitation network. The vision network acquires the required capabilities by training on hundreds of simulated images of the task with varied lighting, textures, and object placements. The imitation network observes a demonstration, processes it to infer the intent of the task, and then accomplishes that intent starting from blocks arranged differently.
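The division of labor between the two networks can be sketched as a pipeline: the vision network maps a raw image to a vector of block locations, and the imitation network maps that state plus a demonstration to an action. The toy bodies below (patch pooling, nearest-demo-step lookup) are stand-ins for the learned networks, used only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(1)

def vision_network(image):
    # Stand-in for the learned vision network: collapses a simulated
    # camera image into a small vector of block-location features.
    h, w = image.shape
    return np.array([image[:h // 2].mean(), image[h // 2:].mean(),
                     image[:, :w // 2].mean(), image[:, w // 2:].mean()])

def imitation_network(state, demo_states, demo_actions):
    # Stand-in for the imitation network: infers the demonstrated intent
    # and emits an action for the current (differently arranged) state.
    # Toy rule: reuse the action of the most similar demonstration step.
    dists = np.linalg.norm(demo_states - state, axis=1)
    return demo_actions[np.argmin(dists)]

image = rng.normal(size=(8, 8))          # simulated camera image
demo_states = rng.normal(size=(5, 4))    # states seen during the demo
demo_actions = rng.normal(size=(5, 2))   # actions taken during the demo

state = vision_network(image)
action = imitation_network(state, demo_states, demo_actions)
print(state.shape, action.shape)  # (4,) (2,)
```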
The imitation network relies on a mechanism called “soft attention,” which attends both to the demonstration’s steps and actions, including which blocks should be used in stacking, and to the components of the vector specifying the locations of the various blocks in the environment.
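Soft attention is a standard, well-documented mechanism: a differentiable weighted average whose weights come from a softmax over query–key similarity scores, letting the network focus on the relevant demonstration steps or block coordinates. A minimal sketch (the shapes here are illustrative):

```python
import numpy as np

def soft_attention(query, keys, values):
    # Score each entry by its similarity to the query, turn the scores
    # into a probability distribution with a softmax, then return the
    # weighted average of the values. Fully differentiable, so it can
    # be trained end to end inside a larger network.
    scores = keys @ query                   # one similarity score per entry
    weights = np.exp(scores - scores.max()) # stable softmax
    weights /= weights.sum()                # weights sum to 1
    return weights @ values                 # blended value vector

rng = np.random.default_rng(2)
query = rng.normal(size=3)        # e.g. the imitation net's current step
keys = rng.normal(size=(6, 3))    # e.g. six demonstration steps
values = rng.normal(size=(6, 4))  # features attached to each step
context = soft_attention(query, keys, values)
print(context.shape)  # (4,)
```

Because the output is a soft blend rather than a hard selection, the network can gradually learn which steps and which block coordinates matter.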
The researchers explain that, for the robot to learn a robust policy, a modest amount of noise was injected into the outputs of the scripted policy. This allowed the robot to perform its task properly even when things go wrong. Without this noise injection, the robot would not have been able to generalize what it learned from observing a specific task.
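The noise-injection idea can be sketched as follows: perturb the scripted demonstrator's actions during data collection, so the training data also covers states where the execution has drifted slightly off course and the script is seen recovering from them. The scripted policy below (move straight toward the origin) and the noise scale are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def scripted_policy(state):
    # Hypothetical scripted demonstrator: move straight toward the goal
    # (here the goal is simply the origin).
    goal = np.zeros_like(state)
    return goal - state

def noisy_scripted_policy(state, noise_scale=0.1):
    # Inject a modest amount of Gaussian noise into the script's output.
    # The rollouts then visit slightly perturbed states, and the learned
    # policy gets to observe how the script corrects for them.
    action = scripted_policy(state)
    return action + rng.normal(scale=noise_scale, size=action.shape)

state = np.array([0.5, -0.3])
clean = scripted_policy(state)
noisy = noisy_scripted_policy(state)
print(np.allclose(clean, -state))  # True: the script heads to the origin
```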
Finally, it should be noted that although the “one-shot imitation learning” algorithm was used here to teach a robot to stack colored blocks, it can also be applied to other tasks.