Driving Problem

Given a two-layer MLP with a fixed random initialization $\theta_0$ and a target parameter vector $\theta^*$, can we engineer a sequence of inputs $x_1, x_2, \ldots$ such that gradient steps on those inputs drive $\theta_t \to \theta^*$?

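Here we explore the simplest version: at each step $t$, sample a random input $x_t$, compute the parameter gradient $g(x_t)$ at the current $\theta_t$, normalize it to the unit vector $\hat{g}(x_t) = g(x_t)/\|g(x_t)\|$, and step along $\hat{g}(x_t)$ by the amount that most reduces the distance $\|\theta^* - \theta_t\|$.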


The residual gap is
$$\Delta\theta_t = \theta^* - \theta_t,$$
the optimal step size along $\hat{g}(x_t)$ is
$$\alpha_t^* = \langle \Delta\theta_t,\, \hat{g}(x_t) \rangle,$$
and the update is
$$\theta_{t+1} = \theta_t + \alpha_t^*\, \hat{g}(x_t).$$
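
A short derivation of why $\alpha_t^*$ is the best step, assuming $\hat{g}(x_t)$ has unit norm. For a generic step size $\alpha$, the new gap is $\Delta\theta_{t+1} = \Delta\theta_t - \alpha\,\hat{g}(x_t)$, so

$$\|\Delta\theta_{t+1}\|^2 = \|\Delta\theta_t\|^2 - 2\alpha\,\langle \Delta\theta_t,\, \hat{g}(x_t) \rangle + \alpha^2 .$$

This quadratic in $\alpha$ is minimized at $\alpha = \langle \Delta\theta_t,\, \hat{g}(x_t) \rangle = \alpha_t^*$, and substituting back gives

$$\|\Delta\theta_{t+1}\|^2 = \|\Delta\theta_t\|^2 - (\alpha_t^*)^2 = \|\Delta\theta_t\|^2 \left(1 - \cos^2\varphi_t\right),$$

where $\varphi_t$ is the angle between $\Delta\theta_t$ and $\hat{g}(x_t)$. The gap therefore never grows, and each step shrinks it only as much as the sampled gradient direction is aligned with the remaining gap.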

Default settings: input dimension $d_{in} = 5$, hidden dimension $d_h = 10$, output dimension $d_{out} = 1$, and input scale $\sigma = 1.0$. The depth, activation, use of biases, and $x$ distribution are also adjustable.
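
Putting the pieces together: a minimal pure-Python sketch of the random-$x$ procedure at the default sizes ($d_{in} = 5$, $d_h = 10$, $d_{out} = 1$, $\sigma = 1.0$). Several details are not fixed by the text and are assumptions here: the network is $f(x) = \sum_j a_j \tanh(w_j \cdot x)$ (tanh, no biases), $x$ is Gaussian with standard deviation $\sigma$, and $\theta_0$ and $\theta^*$ are both drawn with Gaussian entries at $1/\sqrt{\text{fan-in}}$ scale. All function names are hypothetical.

```python
import math
import random

# Assumed defaults from the settings: d_in = 5, d_h = 10, d_out = 1, sigma = 1.0.
D_IN, D_H, SIGMA = 5, 10, 1.0
N_PARAMS = D_H * D_IN + D_H  # flat theta: W (D_H x D_IN, row-major), then a (D_H); no biases (assumed)


def forward(theta, x):
    """Assumed two-layer MLP: f(x) = sum_j a_j * tanh(w_j . x)."""
    out = 0.0
    for j in range(D_H):
        w_j = theta[j * D_IN:(j + 1) * D_IN]
        out += theta[D_H * D_IN + j] * math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
    return out


def param_grad(theta, x):
    """g(x) = df/dtheta, laid out like theta."""
    grad_W, grad_a = [], []
    for j in range(D_H):
        w_j = theta[j * D_IN:(j + 1) * D_IN]
        a_j = theta[D_H * D_IN + j]
        h = math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
        grad_a.append(h)                                     # df/da_j
        grad_W.extend(a_j * (1.0 - h * h) * xi for xi in x)  # df/dW_jk, using tanh' = 1 - tanh^2
    return grad_W + grad_a


def projection_step(theta, theta_star, x):
    """theta_{t+1} = theta_t + alpha* g_hat(x), with alpha* = <theta* - theta_t, g_hat(x)>."""
    g = param_grad(theta, x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return theta  # zero gradient: no direction to move along
    g_hat = [gi / norm for gi in g]
    alpha = sum((ts - t) * gh for ts, t, gh in zip(theta_star, theta, g_hat))
    return [t + alpha * gh for t, gh in zip(theta, g_hat)]


def gap(theta, theta_star):
    """Residual gap ||theta* - theta||."""
    return math.sqrt(sum((ts - t) ** 2 for ts, t in zip(theta_star, theta)))


def random_params(rng):
    """Hypothetical init: Gaussian entries with 1/sqrt(fan_in) scale."""
    W = [rng.gauss(0.0, 1.0 / math.sqrt(D_IN)) for _ in range(D_H * D_IN)]
    a = [rng.gauss(0.0, 1.0 / math.sqrt(D_H)) for _ in range(D_H)]
    return W + a


def run(num_steps=1000, seed=0):
    """Sample x ~ N(0, SIGMA^2 I) each step; return the history of the residual gap."""
    rng = random.Random(seed)
    theta, theta_star = random_params(rng), random_params(rng)
    gaps = [gap(theta, theta_star)]
    for _ in range(num_steps):
        x = [rng.gauss(0.0, SIGMA) for _ in range(D_IN)]
        theta = projection_step(theta, theta_star, x)
        gaps.append(gap(theta, theta_star))
    return gaps
```

`run()` returns the history of $\|\theta^* - \theta_t\|$. Because each step removes exactly the component of the gap along $\hat{g}(x_t)$, that history should be non-increasing up to floating-point rounding; how far it falls depends on how well the sampled gradient directions line up with the remaining gap.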

Instead of sampling $x$ at random, $x$ can be optimized by gradient ascent, using finite differences on the scalar objective, with learning rate $\eta_x = 0.1$ for 500 steps.
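
A minimal pure-Python sketch of that ascent. The text does not spell out the scalar objective, so this sketch assumes it is $(\alpha^*)^2 = \langle \theta^* - \theta_t,\, \hat{g}(x) \rangle^2$, the squared-gap reduction a step with input $x$ would achieve. The two-layer tanh MLP without biases, the central-difference width `h`, and all function names are likewise hypothetical; $\eta_x = 0.1$ and 500 steps match the stated settings.

```python
import math
import random

D_IN, D_H = 5, 10  # assumed defaults (scalar output)


def param_grad(theta, x):
    """g(x) = df/dtheta for the assumed f(x) = sum_j a_j * tanh(w_j . x), no biases.

    theta is flat: W row-major (D_H x D_IN), then a (D_H).
    """
    grad_W, grad_a = [], []
    for j in range(D_H):
        w_j = theta[j * D_IN:(j + 1) * D_IN]
        a_j = theta[D_H * D_IN + j]
        h = math.tanh(sum(w * xi for w, xi in zip(w_j, x)))
        grad_a.append(h)
        grad_W.extend(a_j * (1.0 - h * h) * xi for xi in x)
    return grad_W + grad_a


def objective(theta, theta_star, x):
    """Hypothetical scalar objective: (alpha*)^2 = <theta* - theta, g_hat(x)>^2."""
    g = param_grad(theta, x)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return 0.0  # degenerate gradient: stepping with this x cannot reduce the gap
    alpha = sum((ts - t) * gi for ts, t, gi in zip(theta_star, theta, g)) / norm
    return alpha * alpha


def ascend_x(theta, theta_star, x0, lr=0.1, steps=500, h=1e-4):
    """Gradient ascent on x with central finite differences; returns the best x seen."""
    x = list(x0)
    best_x, best_val = list(x), objective(theta, theta_star, x)
    for _ in range(steps):
        grad = []
        for k in range(len(x)):
            x_plus, x_minus = list(x), list(x)
            x_plus[k] += h
            x_minus[k] -= h
            diff = objective(theta, theta_star, x_plus) - objective(theta, theta_star, x_minus)
            grad.append(diff / (2 * h))
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
        val = objective(theta, theta_star, x)
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x, best_val


if __name__ == "__main__":
    rng = random.Random(0)
    n = D_H * D_IN + D_H
    theta = [rng.gauss(0.0, 0.5) for _ in range(n)]
    theta_star = [rng.gauss(0.0, 0.5) for _ in range(n)]
    x0 = [rng.gauss(0.0, 1.0) for _ in range(D_IN)]
    x_best, val = ascend_x(theta, theta_star, x0)
    print("objective at x0:", objective(theta, theta_star, x0), "after ascent:", val)
```

Plain ascent with a fixed learning rate can overshoot, so the sketch returns the best iterate it visits rather than the last one. By Cauchy-Schwarz the objective can never exceed $\|\theta^* - \theta_t\|^2$.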

jsi@berkeley.edu
james-simon
gScholar
@learning_mech