<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>AutoML | Baohe Zhang 张宝赫</title><link>https://2bh.github.io/tags/automl/</link><atom:link href="https://2bh.github.io/tags/automl/index.xml" rel="self" type="application/rss+xml"/><description>AutoML</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en</language><lastBuildDate>Thu, 01 Apr 2021 00:00:00 +0000</lastBuildDate><image><url>https://2bh.github.io/media/icon_hu_87928198b81ce9d.png</url><title>AutoML</title><link>https://2bh.github.io/tags/automl/</link></image><item><title>On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning</title><link>https://2bh.github.io/publication/hpo4mbrl/</link><pubDate>Thu, 01 Apr 2021 00:00:00 +0000</pubDate><guid>https://2bh.github.io/publication/hpo4mbrl/</guid><description>&lt;div class="bg-white p-3 rounded"&gt;
&lt;div class="row align-items-center"&gt;
&lt;div class="col-md-6 text-center mb-3"&gt;
&lt;img src="pbt.png" alt="Population-Based Training (PBT)" class="img-fluid" /&gt;
&lt;/div&gt;
&lt;div class="col-md-6 text-center mb-3"&gt;
&lt;img src="multi-fidelity.png" alt="Multi-fidelity HPO" class="img-fluid" /&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class="row align-items-center"&gt;
&lt;div class="col-md-6 text-center mb-0"&gt;
&lt;img src="hpo4rl_curves.png" alt="HPO for MBRL learning curves" class="img-fluid" /&gt;
&lt;/div&gt;
&lt;div class="col-md-6 text-center mb-0"&gt;
&lt;div class="embed-responsive embed-responsive-16by9"&gt;
&lt;iframe class="embed-responsive-item" src="https://www.youtube.com/embed/ztuyicYEiXw" title="HPO for MBRL talk" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen&gt;&lt;/iframe&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h2 id="highlights"&gt;Highlights&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Why HPO for MBRL?&lt;/strong&gt; Model-based RL pipelines stack dynamics learning and planning, exposing tens of interacting hyperparameters that are usually hand-tuned by experts.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Approach.&lt;/strong&gt; We apply automated HPO, including &lt;em&gt;multi-fidelity&lt;/em&gt; search (top-right) and &lt;em&gt;Population-Based Training&lt;/em&gt; (top-left), to MBRL, and additionally allow hyperparameters to be tuned &lt;em&gt;dynamically&lt;/em&gt; during training. Figures by André Biedenkapp.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Result.&lt;/strong&gt; Automated HPO beats human-expert tuning across MuJoCo tasks (bottom-left), and dynamic tuning yields further gains over the best static configuration. Our results also uncovered a bug in MuJoCo that lets HalfCheetah move wildly, like a &lt;strong&gt;helicopter&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Takeaways.&lt;/strong&gt; The paper also dissects which hyperparameters (planning horizon, learning rate, etc.) drive stability and final reward in MBRL.&lt;/li&gt;
&lt;/ul&gt;</description></item></channel></rss>