Publications
-
Hierarchical Few-Shot Imitation with Skill Transition Models
Preprint
Kourosh Hakhamaneshi, Ruihan Zhao, Albert Zhan, Pieter Abbeel, Michael Laskin
A desirable property of autonomous agents is the ability to both solve long-horizon problems and generalize to unseen tasks. Recent advances in data-driven skill learning have shown that extracting behavioral priors from offline data can enable agents to solve challenging long-horizon tasks with reinforcement learning. However, generalization to tasks unseen during behavioral prior training remains an outstanding challenge. To this end, we present Few-shot Imitation with Skill Transition Models (FIST), an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks given a few downstream demonstrations. FIST learns an inverse skill dynamics model, a distance function, and utilizes a semi-parametric approach for imitation. We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments requiring traversing unseen parts of a large maze and 7-DoF robotic arm experiments requiring manipulating previously unseen objects in a kitchen.
-
Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
Preprint
Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, PieterAbbeel, Michael Laskin
A promising approach to solving challenging long-horizon tasks has been to extract behavior priors (skills) by fitting generative models to large offline datasets of demonstrations. However, such generative models inherit the biases of the underlying data and result in poor and unusable skills when trained on imperfect demonstration data. To better align skill extraction with human intent we present Skill Preferences (SkiP), an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data. After extracting human-preferred skills, SkiP also utilizes human feedback to solve down-stream tasks with RL. We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks and substantially outperforms prior leading RL algorithms with human preferences as well as leading skill extraction algorithms without human preferences.
-
JUMBO: Scalable Multi-task Bayesian Optimization using Offline Data
Preprint
Kourosh Hakhamaneshi, Pieter Abbeel, Vladimir Stojanovic, Aditya Grover
The goal of Multi-task Bayesian Optimization (MBO) is to minimize the number of queries required to accurately optimize a target black-box function, given access to offline evaluations of other auxiliary functions. When offline datasets are large, the scalability of prior approaches comes at the expense of expressivity and inference quality. We propose JUMBO, an MBO algorithm that sidesteps these limitations by querying additional data based on a combination of acquisition signals derived from training two Gaussian Processes (GP): a cold-GP operating directly in the input domain and a warm-GP that operates in the feature space of a deep neural network pretrained using the offline data. Such a decomposition can dynamically control the reliability of information derived from the online and offline data and the use of pretrained neural networks permits scalability to large offline datasets. Theoretically, we derive regret bounds for JUMBO and show that it achieves no-regret under conditions analogous to GP-UCB (Srinivas et. al. 2010). Empirically, we demonstrate significant performance improvements over existing approaches on two real-world optimization problems: hyper-parameter optimization and automated circuit design.
-
GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal Black Box Constraint Satisfaction
Preprint
Kourosh Hakhamaneshi, Keertana Settaluri, Pieter Abbeel, Vladimir Stojanovic
In this work we present a new method of black-box optimization and constraint satisfaction. Existing algorithms that have attempted to solve this problem are unable to consider multiple modes, and are not able to adapt to changes in environment dynamics. To address these issues, we developed a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space. We train the model using maximum entropy policy gradient methods from Reinforcement Learning. Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions. We empirically compare our algorithm with variations of CEM, including one with a Gaussian prior with fixed variance, and demonstrate better performance in terms of: number of diverse solutions, better mode discovery in multi-modal problems, and better sample efficiency in certain cases.
-
AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs
DATE 2020
Keertana Settaluri, Ameer Haj-Ali, Qijing Huang, Kourosh Hakhamaneshi, Borivoje Nikolic
Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original design specification. Existing automated solutions that optimize circuit parameters for a given target design specification have limitations of being schematic-only, inaccurate, sample-inefficient or not generalizable. This work presents AutoCkt, a machine learning optimization framework trained using deep reinforcement learning that not only finds post-layout circuit parameters for a given target specification, but also gains knowledge about the entire design space through a sparse subsampling technique. Our results show that for multiple circuit topologies, AutoCkt is able to converge and meet all target specifications on at least 96.3% of tested design goals in schematic simulation, on average 40X faster than a traditional genetic algorithm. Using the Berkeley Analog Generator, AutoCkt is able to design 40 LVS passed operational amplifiers in 68 hours, 9.6X faster than the state-of-the-art when considering layout parasitics.
-
BagNet: Berkeley Analog Generator with Layout Optimizer Boosted with Deep Neural Networks
ICCAD 2019
Kourosh Hakhamaneshi, Nick Werblun, Pieter Abbeel, Vladimir Stojanovic
The discrepancy between post-layout and schematic simulation results continues to widen in analog design due in part to the domination of layout parasitics. This paradigm shift is forcing designers to adopt design methodologies that seamlessly integrate layout effects into the standard design flow. Hence, any simulation-based optimization framework should take into account time-consuming post-layout simulation results. This work presents a learning framework that learns to reduce the number of simulations of evolutionary-based combinatorial optimizers, using a DNN that discriminates against generated samples, before running simulations. Using this approach, the discriminator achieves at least two orders of magnitude improvement on sample efficiency for several large circuit examplses including an optical link receiver layout.