Dactyl learns to solve the object reorientation task entirely in simulation without any human input. After this training phase, the learned policy works on the real robot without any fine-tuning. Learning methods for robotic manipulation face a dilemma. Even modeling what happens when two objects touch — the most basic problem in manipulation — is an active area of research with no widely accepted solution. Our approach, domain randomization, learns in a simulation designed to provide a variety of experiences rather than to maximize realism.
This gives us the best of both approaches: by learning in simulation, we can gather more experience quickly by scaling up, and by de-emphasizing realism, we can tackle problems that simulators model only approximately. OpenAI and others have shown that domain randomization can work on increasingly complex problems; domain randomization was even used to train OpenAI Five.
Here, we wanted to see if scaling up domain randomization could solve a task well beyond the reach of current methods in robotics. We built a simulated version of our robotics setup using the MuJoCo physics engine.
Notes on NIPS – Ilya Kuzovkin
This simulation is only a coarse approximation of the real robot. The simulation can be made more realistic by calibrating its parameters to match robot behavior, but many of these effects simply cannot be modeled accurately in current simulators. Instead, we train the policy on a distribution of simulated environments where the physical and visual attributes are chosen randomly. Randomized values are a natural way to represent the uncertainties that we have about the physical system, and they also prevent overfitting to a single simulated environment.
If a policy can accomplish the task across all of the simulated environments, it will more likely be able to accomplish it in the real world. By building simulations that support transfer, we have reduced the problem of controlling a robot in the real world to accomplishing a task in simulation, which is a problem well-suited for reinforcement learning.
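The per-episode randomization idea can be sketched in a few lines. The parameter names and ranges below are purely illustrative assumptions, not the attributes or values used by Dactyl:

```python
import random

# Illustrative ranges only; the real system randomizes many more
# attributes (masses, friction, damping, visual appearance, ...).
PARAM_RANGES = {
    "object_mass_kg": (0.02, 0.10),
    "friction_coeff": (0.5, 1.5),
    "actuator_gain": (0.8, 1.2),
}

def sample_environment_params(rng=random):
    """Draw one set of physical parameters for a single simulated episode."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

# Every training episode gets a freshly randomized environment.
episode_params = sample_environment_params()
```

A policy that succeeds for every draw from these ranges is less likely to have overfit to any single (inevitably wrong) simulation.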
While the task of manipulating an object in a simulated hand is already somewhat difficult, learning to do so across all combinations of randomized physical parameters is substantially more difficult. To generalize across environments, it is helpful for the policy to be able to take different actions in environments with different dynamics. Because most dynamics parameters cannot be inferred from a single observation, we used an LSTM — a type of neural network with memory — to make it possible for the network to learn about the dynamics of the environment.
The LSTM achieved about twice as many rotations in simulation as a policy without memory. We use a different model architecture, environment, and hyperparameters than OpenAI Five does, but we use the same algorithms and training code.
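A minimal NumPy sketch of the memory mechanism: a single LSTM step whose cell state persists across observations, which is what lets a recurrent policy accumulate information about dynamics it cannot read off any single frame. The dimensions and random weights are illustrative, not Dactyl's architecture:

```python
import numpy as np

def lstm_step(x, h, c, W, b):
    """One LSTM step: the cell state c carries information about the
    environment's dynamics forward across timesteps."""
    z = W @ np.concatenate([x, h]) + b          # all four gates in one matmul
    n = h.size
    i = 1 / (1 + np.exp(-z[:n]))                # input gate
    f = 1 / (1 + np.exp(-z[n:2*n]))             # forget gate
    o = 1 / (1 + np.exp(-z[2*n:3*n]))           # output gate
    g = np.tanh(z[3*n:])                        # candidate cell update
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
obs_dim, hid = 6, 8                             # toy sizes
W = rng.normal(0, 0.1, (4 * hid, obs_dim + hid))
b = np.zeros(4 * hid)
h, c = np.zeros(hid), np.zeros(hid)
for t in range(5):                              # unroll over a short trajectory
    h, c = lstm_step(rng.normal(size=obs_dim), h, c, W, b)
```

A memoryless policy would map each observation to an action independently; here the pair (h, c) is what a memoryless network lacks.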
Rapid, our training system, used a pool of CPU cores and 8 GPUs to train our policy, collecting about one hundred years of experience in 50 hours. For development and testing, we validated our control policy against objects with embedded motion tracking sensors to isolate the performance of our control and vision networks.
Dactyl was designed to be able to manipulate arbitrary objects, not just those that have been specially modified to support tracking. Therefore, Dactyl uses regular RGB camera images to estimate the position and orientation of the object. We train a pose estimator using a convolutional neural network.
The neural network takes the video streams from three cameras positioned around the robot hand and outputs the estimated position and orientation of the object. We use multiple cameras to resolve ambiguities and occlusion.
We again use domain randomization to train this network, only in simulation, using the Unity game development platform, which can model a wider variety of visual phenomena than MuJoCo.

The reality gap occurs when you want to take an agent trained in a simulator into the real world. We might need something other than wheels or tracks to locomote in unknown environments such as other planets. NASA has built a tensegrity robot, the Super Ball Bot (see picture on the right), that is able to move on a larger number of surfaces, is much more resilient, and can be deployed without a parachute from greater heights.
Since no one has a clue how this thing should use its motors to move, RL is the obvious choice — and it works!
It would be cool to see an RL algorithm trained to replicate the behavior of a simple species like C. elegans.

The idea of progressive neural networks (PNN) is to train a new column for each new task. Each new column is connected to all previous columns so that features learnt for the previous tasks can be reused in the new tasks. Once a new column is added, the weights in the previous ones are frozen. Advantages: no forgetting of previous tasks, feature transfer, and new capacity for each new task.
Disadvantages: a separate mechanism is needed to know when to switch between the columns, and the number of parameters grows with the number of tasks (a whole new network for each task). The approach demonstrates the usefulness of relevant intermediate tasks, a bit like curriculum learning. Even better generalization and robustness can be achieved via data augmentation: changing the shapes, sizes, colors, and lighting of the objects in the simulator leads to a better final agent.
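The column scheme can be sketched as follows; the two-layer structure and layer sizes are illustrative assumptions, not the architecture from the PNN paper:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0)

class Column:
    """One task column: a tiny 2-layer MLP. The hidden layer of a new
    column also receives lateral input from all earlier (frozen) columns."""
    def __init__(self, in_dim, hid, out_dim, n_prev):
        self.W1 = rng.normal(0, 0.1, (hid, in_dim))
        # lateral weights from each previous column's hidden layer
        self.U = [rng.normal(0, 0.1, (hid, hid)) for _ in range(n_prev)]
        self.W2 = rng.normal(0, 0.1, (out_dim, hid))
        self.frozen = False

    def forward(self, x, prev_hiddens):
        h = self.W1 @ x
        for U, ph in zip(self.U, prev_hiddens):
            h = h + U @ ph                      # feature reuse across tasks
        h = relu(h)
        return h, self.W2 @ h

def progressive_forward(columns, x):
    hiddens, outputs = [], []
    for col in columns:
        h, y = col.forward(x, hiddens)
        hiddens.append(h)
        outputs.append(y)
    return outputs[-1]                          # newest column owns the task

cols = [Column(4, 8, 2, n_prev=0)]
cols[0].frozen = True                           # task 1 done: freeze its weights
cols.append(Column(4, 8, 2, n_prev=1))          # task 2 gets lateral links
y = progressive_forward(cols, np.ones(4))
```

Only the newest column's weights would be trained; the frozen flag marks which parameters a training loop must skip.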
Speeding up training in a physical environment can be achieved with this technique: the simulator is the first column, and each new task in the real world is an additional column. Imitation learning is also a good way to beat sample complexity. Training in the simulator happens with A3C and is fast and parallelizable. Comparing training for the same task: 24 hours in the simulator with a subsequent PNN for bridging the gap, versus 55 days in the real physical system.
An alternative approach is fine-tuning to the real world after training in a simulator, but the PNN approach beats that by a large margin.

A very nice talk that puts Bayesian ML and deep learning together and demonstrates how they fit. Mention of the Bayesian framework being naturally related to memory, which seems to be an important topic at the moment in the context of Memory Networks and Neural Turing Machines.
Is ensembling a practical replacement for Bayesian methods? Estimating confidence intervals and measuring uncertainty were important points of discussion. And we do impose priors: the whole hierarchical structure of deep convolutional neural networks is a prior, and it happened to be a useful one. A new important trend is to build generative models into algorithms and agents and use their predictions about the world in training and deployment.
Comparison between the actual brain and DNNs seems to be a very popular theme. This is something I am doing myself, and it evoked mixed feelings to see 3 posters doing almost the same thing I am doing. On the other hand, it means that this research direction seemed important to several groups, so it could have merit.
Or it is just an effect of the hype, who knows… Synapse pruning happens with age.
Why could that be useful? Initially everything is connected; with time, active connections persist and useless ones disappear. They experimentally tested the pruning approach (start with a fully connected network and remove connections) against the alternative. The next thing to explore is the optimal rate of pruning. They conducted experiments on mouse whiskers and somatosensory columns: sliced the brain and applied automatic synapse detection to count the number of synapses. Locality-sensitive hashing (LSH) is a common way to assign similar things close to each other in the hash space (the same bin).
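A minimal sketch of the random-projection variant of LSH; the number of hyperplanes and the test vectors are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(x, planes):
    """The hash is the sign pattern of random projections: vectors on the
    same side of every random hyperplane land in the same bucket."""
    return tuple(int(v > 0) for v in planes @ x)

planes = rng.normal(size=(8, 3))      # 8 random hyperplanes in R^3
a = np.array([1.0, 0.9, 1.1])
b = a * 1.01                          # same direction as a -> identical signs
c = -a                                # opposite direction -> every sign flips

same_bucket = lsh_hash(a, planes) == lsh_hash(b, planes)
```

Nearby vectors collide with high probability while distant ones rarely do, which is exactly the locality-preserving property attributed to the fly's olfactory circuit.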
Done via the random projection method. How does a fly do that? The same sparse locality-preserving algorithms seem to appear in other brain areas as well, and might be an important underlying mechanism for some brain processes.

When a human demonstrates behavior, he tends to exaggerate to highlight the behavior patterns that are important.
If your task is just to perform, not to demonstrate, you do it quite differently and focus on optimality.

Analogy between the visual cortex and CNNs: new, larger visual DNNs match the ventral stream better and better. The talk is about temporal coding and predictive coding in the context of deep learning. Next-frame prediction works well on synthetic datasets and natural road images (comma.ai).

Synergy of neuroscience and machine learning: it would be nice to have more in vivo neuroscience experiments that produce data for neuro-inspired algorithms to borrow ideas from.
Brain-like constraints (speed, size, memory efficiency) could be added to algorithmic approaches, and that could lead to more brain-like algorithms. On the other hand, if the goal is not to understand the neural code but to build intelligence, we might not need to limit ourselves to biological constraints. Maybe it is more beneficial to focus on understanding the learning process rather than the final solution. Imagination-based planning, that is, model-based and generative learning, seems important at the moment.
Neuroscientists criticize the simplicity of ANNs. ML people have a hard time understanding what exactly ANNs are doing despite having access to each particular neuron; imagine how hard it is for neuroscience to figure out what neurons are doing without that level of access. Neuroscience could record more area-wide data as opposed to single-unit recordings.
Dense, not strictly hierarchical, networks can outperform hierarchical ones with a smaller number of neurons (there was a paper on DenseNets). A real synapse can be seen as having a state, and this is a promising area to explore. What does neuroscience have to do to be useful for ML? Test the algorithms proposed in ML, especially the biologically plausible ones, in in vivo experiments.
Processes where the intensity of an event depends on the previous events are called Hawkes processes. There are examples of these in our world, for example the spiking of neurons, and there are models designed to fit such processes.

Rapid image categorization is done well by humans: they conducted a set of experiments using Amazon Mechanical Turk.
Up to the 5th layer of a CNN, difficulty correlates; after that, the correlation drops, indicating that the DNN is doing something different from what a human is doing.

Meta-learning, or learning to learn, was a concept that re-emerged here and there as a yet-uncertain-but-conceptually-promising direction. There are various ideas under that umbrella, one of which is to formulate the design of an optimization algorithm as a learning problem. That way, what is learnt are not only the best parameters that minimize the loss function, but also the parameters of the optimizer that learns those parameters, alleviating the need for manual design of the optimization algorithm.
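A toy instance of this idea, on a deliberately trivial inner problem f(w) = w² of my own choosing: the learned "optimizer" has a single parameter, its learning rate, which is itself tuned by gradient descent on the loss obtained after applying the optimizer. All numbers are illustrative:

```python
def inner_step(w, lr):
    """One step of the learned optimizer on f(w) = w^2 (gradient is 2w)."""
    return w - lr * 2 * w

lr = 0.05            # the optimizer's learnable parameter
meta_lr = 0.01       # step size of the outer (meta) optimization
for _ in range(50):
    w0 = 3.0                       # fresh inner problem each meta-iteration
    w1 = inner_step(w0, lr)
    # meta-gradient: d (w1^2) / d lr = 2 * w1 * (-2 * w0)
    lr -= meta_lr * (2 * w1 * (-2 * w0))

# lr converges to 0.5, the one-step-optimal learning rate for f(w) = w^2
```

Real learned optimizers (as in the paper cited below) replace the scalar lr with a recurrent network that maps gradients to updates, but the two-level structure is the same.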
To know more about this and related concepts, read about Learning to learn by gradient descent by gradient descent. There is a company that uses massive distributed computational resources to evaluate learning algorithms using genetic algorithms; just curious to know that such a thing exists.

Lots of data in bioinformatics, lots of experiments, and it all needs to be reproducible.
They work on that by fixing pipelines, open-sourcing data and code, and checking the results of others. Transparency in analysis allows publishing non-perfect results and still moving science forward.

As the title says: they show that for a Netflix-like matrix completion objective function, all local minima are global.

Given a dataset, you pick common samples (prototypes) and the most uncommon examples (criticisms) from each category using Maximum Mean Discrepancy (MMD).
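A sketch of the MMD criterion with an RBF kernel; the toy data, kernel width, and candidate prototype sets are my own assumptions for illustration:

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mmd_sq(X, Z, gamma=0.5):
    """(Biased) squared Maximum Mean Discrepancy between dataset X and a
    candidate prototype set Z: small when Z 'covers' X well."""
    kxx = np.mean([rbf(a, b, gamma) for a in X for b in X])
    kzz = np.mean([rbf(a, b, gamma) for a in Z for b in Z])
    kxz = np.mean([rbf(a, b, gamma) for a in X for b in Z])
    return kxx + kzz - 2 * kxz

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (20, 2))     # toy "dataset"
good = X[:5]                      # prototypes drawn from the data itself
bad = good + 10.0                 # prototypes far away from the data
```

Prototype selection greedily picks the Z that minimizes this quantity; criticisms are the points of X that the chosen prototypes represent worst.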
Examples from ImageNet: prototypes are, for example, the usual pictures of dogs; criticisms are pictures of dogs from behind, from weird angles, etc.

Code completion formulated as an ML task: the input is a piece of code and the output is the next line. Differently from other similar tools, it makes better use of the context to propose the completion.
Having demos was a nice initiative: they show the gap between the all-is-so-well you can observe in the papers and the reality. Nothing special to report here: a few applications in robotics, a few real-time applications of CNNs; everything almost works, but does not amaze.

I have worked on various projects in machine learning and computer science, neuroscience and brain-computer interfaces, reinforcement learning and robotics.
Currently I am focusing on two things: leading the machine learning team at OffWorld Inc.

December 16, Ilya Kuzovkin

Notes on NIPS

General Deep Learning

Tutorial: Building Applications using Deep Learning by Andrew Ng
A presentation with practical tips on applying deep learning and the ways to think about the ML experimental pipeline, evaluation measures, and the concepts of bias and variance in the era of human-level performance.
Fast associative weights act as a Hopfield network by attracting the state most similar to the current input. Outperforms usual LSTMs in high-resolution sampling and asynchronous sampling conditions (synthetic data).

Credit Assignment Beyond Backpropagation by Yoshua Bengio
There are several algorithms for credit assignment: Boltzmann machines (high variance); the REINFORCE algorithm (very high variance, does not scale, susceptible to noise); Actor-Critic models (lower variance, but potentially high bias); and backpropagation, which wins over all the previous ones as it considers only one direction of update and does not waste time exploring other possible directions.
An interesting difference between supervised learning and RL: in supervised learning the training data distribution is fixed, while in RL a change in policy leads to sampling from another distribution, and too big a step in the wrong direction might break everything.

Learning to poke by poking
A robotic hand is trained with a reinforcement learning algorithm to poke objects in such a way that they eventually end up in a required position.

Guided Policy Search
Explored the possibility of demonstrating a correct trajectory, given by a human operator or an expert algorithm, to a reinforcement learning agent.
Using a slow RL algorithm to learn a fast RL algorithm using recurrent neural networks by Ilya Sutskever
All notable RL success stories rely on huge sample complexity: Mirowski et al., Wang et al., Duan et al., Chen et al.

OpenAI Universe
A platform where you can train your RL agents in computer-based environments (games, browsers, word processors, anything!).
Representation learning by SGD in cross-validation by Rich Sutton
An idea, presented by the author of the textbook on reinforcement learning, about a better way to learn representations in deep nets.
Learning to Experiment
An agent has to give answers about physical properties of an object, such as its weight.

Deep Learning for Robotics by Peter Abbeel
We might need something other than wheels or tracks to locomote in unknown environments such as other planets.

RL on simple species
It would be cool to see an RL algorithm trained to replicate the behavior of a simple species like C. elegans.