Expander Nets


NETWORK
network function
$$\hat{f}(\mathbf{x}) = \mathbf{W}_2 \, \sigma\left(\mathbf{W}_{\mathrm{froz}} \, \mathbf{W}_1 \, \mathbf{x}\right)$$
$$\mathbf{W}_2 \in \mathbb{R}^{1 \times k}, \, \mathbf{W}_{\mathrm{froz}} \in \mathbb{R}^{k \times d},$$
$$\mathbf{W}_1 \in \mathbb{R}^{d \times d}, \, \mathbf{x} \in \mathbb{R}^d$$
nonlinearity
$$\sigma(z) = \sqrt{2} \cos\left(z + \frac{\pi}{4}\right)$$
hidden dimension
$k$: 1000
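A minimal NumPy sketch of the forward pass defined above, mainly to make the shapes concrete. The function name `forward`, the row-per-sample batch layout, and the random seed are illustrative assumptions and do not come from the page.

```python
import numpy as np

def sigma(z):
    """Nonlinearity from the page: sqrt(2) * cos(z + pi/4), applied elementwise."""
    return np.sqrt(2.0) * np.cos(z + np.pi / 4.0)

def forward(W1, W_froz, W2, X):
    """Evaluate f_hat on a batch X of shape (n, d); returns shape (n,).

    Shapes follow the page: W1 is (d, d), W_froz is (k, d), W2 is (1, k).
    """
    pre = X @ W1.T @ W_froz.T          # (n, k) pre-activations W_froz W1 x
    return (sigma(pre) @ W2.T)[:, 0]   # (n,) scalar output per sample

d, k = 3, 1000
rng = np.random.default_rng(0)         # seed chosen only for reproducibility
W1 = np.eye(d)
W_froz = rng.standard_normal((k, d))
W2 = np.zeros((1, k))
X = rng.standard_normal((5, d))        # five arbitrary inputs
out = forward(W1, W_froz, W2, X)       # all zeros while W2 is zero
```

With this nonlinearity $\sigma(0) = 1$, so every hidden unit is active at zero pre-activation.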
DATA
data distribution
$$\mathbf{x} \sim \mathcal{N}(0, \mathbf{\Gamma})$$
data dimension
$d$: 3
data covariance
$$\mathbf{\Gamma} = \mathrm{diag}(\gamma_i)$$
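Because $\mathbf{\Gamma}$ is diagonal, sampling reduces to scaling standard normal draws coordinate by coordinate. The sketch below assumes hypothetical values for $\gamma_i$, since the page does not list them; `sample_data` is an illustrative name.

```python
import numpy as np

def sample_data(n, gammas, rng):
    """Draw n samples x ~ N(0, diag(gammas)); returns an array of shape (n, d)."""
    gammas = np.asarray(gammas, dtype=float)
    return rng.standard_normal((n, gammas.size)) * np.sqrt(gammas)

gammas = [1.0, 0.5, 0.25]              # hypothetical variances, not from the page
rng = np.random.default_rng(0)
X = sample_data(100_000, gammas, rng)
emp_var = X.var(axis=0)                # should be close to gammas
```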
TARGET FUNCTION
target function
$$\begin{align} f_*(\mathbf{x}) &= h_{\boldsymbol{\alpha}}\left(\mathbf{\Gamma}^{-1/2} \mathbf{x}\right) \\ &= h_1(\gamma_1^{-1/2} x_1) \cdot h_2(\gamma_2^{-1/2} x_2) \cdot h_3(\gamma_3^{-1/2} x_3) \end{align}$$
number of terms
1
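The page does not define $h_n$. The sketch below assumes it is the normalized probabilists' Hermite polynomial $\mathrm{He}_n(z)/\sqrt{n!}$, which is a guess. The multi-index $\boldsymbol{\alpha} = (1, 2, 3)$ and the names `h` and `target` are likewise illustrative.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def h(n, z):
    """ASSUMED form of h_n: probabilists' Hermite He_n(z) / sqrt(n!)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select He_n in the Hermite-e series
    return hermeval(z, coeffs) / math.sqrt(math.factorial(n))

def target(X, alpha, gammas):
    """f_*(x) = prod_i h_{alpha_i}(gamma_i^{-1/2} x_i) for a batch X of shape (n, d)."""
    Z = X / np.sqrt(np.asarray(gammas, dtype=float))   # whiten each coordinate
    out = np.ones(X.shape[0])
    for i, n in enumerate(alpha):
        out *= h(n, Z[:, i])
    return out

alpha = (1, 2, 3)                        # hypothetical multi-index
x = np.array([[2.0, 2.0, 2.0]])
val = target(x, alpha, np.ones(3))[0]    # 2 * (3/sqrt(2)) * (2/sqrt(6)) = sqrt(12)
```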
TRAINING
loss
$$\mathcal{L} = \frac{1}{2} \mathbb{E}_\mathbf{x}\left[(f_*(\mathbf{x}) - \hat{f}(\mathbf{x}))^2\right]$$
initialization
$$\begin{align} \mathbf{W}_1(0) &= \mathbf{I}_d \\ \mathbf{W}_{\mathrm{froz},ij} &\sim \mathcal{N}(0, 1) \\ \mathbf{W}_2(0) &= \mathbf{0} \end{align}$$
dynamics
$$\begin{align} \dot{\mathbf{W}}_1 &= -\nabla_{\mathbf{W}_1} \mathcal{L} \\ \dot{\mathbf{W}}_{\mathrm{froz}} &= \mathbf{0} \\ \dot{\mathbf{W}}_2 &= -\frac{1}{k}\nabla_{\mathbf{W}_2} \mathcal{L} \end{align}$$
HYPERPARAMETERS
learning rate
$\eta$: 0.01
batch size
$|B|$: 100
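The page states the dynamics as gradient flow and lists $\eta$ and $|B|$ separately, without saying how they combine. One plausible reading is plain minibatch SGD with step $\eta$ on $\mathbf{W}_1$, step $\eta/k$ on $\mathbf{W}_2$, and $\mathbf{W}_{\mathrm{froz}}$ held fixed. The sketch below implements that reading with hand-written gradients; `sgd_step` and the stand-in batch are assumptions.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def sigma(z):
    return SQRT2 * np.cos(z + np.pi / 4.0)

def dsigma(z):
    # derivative of sqrt(2) * cos(z + pi/4)
    return -SQRT2 * np.sin(z + np.pi / 4.0)

def sgd_step(W1, W_froz, W2, X, y, eta=0.01):
    """One minibatch step on L = 0.5 * mean((y - f_hat)^2). X: (n, d), y: (n,)."""
    n, k = X.shape[0], W_froz.shape[0]
    pre = X @ W1.T @ W_froz.T                      # (n, k)
    err = y - (sigma(pre) @ W2.T)[:, 0]            # (n,) residual f_* - f_hat
    # dL/dW2 = -mean(err * sigma(pre))
    grad_W2 = -(err[:, None] * sigma(pre)).mean(axis=0, keepdims=True)
    # dL/dW1 = -mean(W_froz^T (W2 * sigma'(pre)) err x^T)
    delta = (err[:, None] * dsigma(pre) * W2) @ W_froz   # (n, d)
    grad_W1 = -(delta.T @ X) / n
    # W_froz is frozen; W2 uses the 1/k-scaled step from the dynamics above
    return W1 - eta * grad_W1, W2 - (eta / k) * grad_W2

d, k, batch = 3, 1000, 100
rng = np.random.default_rng(0)
W1, W_froz, W2 = np.eye(d), rng.standard_normal((k, d)), np.zeros((1, k))
X = rng.standard_normal((batch, d))    # stand-in batch (Gamma = I assumed)
y = X[:, 0]                            # stand-in targets, not the page's f_*
W1_new, W2_new = sgd_step(W1, W_froz, W2, X, y)
```

Since $\mathbf{W}_2(0) = \mathbf{0}$, the gradient with respect to $\mathbf{W}_1$ vanishes on the first step, so only $\mathbf{W}_2$ moves initially.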

[Plots: loss $\mathcal{L}$ vs. step; parameter size vs. step. Controls: plot w.r.t. step or $t_{\mathrm{eff}}$; EMA: 0]

modeling ODE
$\mathcal{L} = \frac{1}{2}\left(1 - c a_1^{\alpha_1} \cdots a_d^{\alpha_d} b\right)^2$
dynamics
$$\begin{align} \dot{a}_i &= -\partial_{a_i} \mathcal{L} \\ \dot{b} &= -\partial_b \mathcal{L} \end{align}$$
initialization
$$\begin{align} a_i(0) &= 1 \\ b(0) &= 0 \end{align}$$
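The modeling ODE is small enough to integrate directly. The sketch below uses forward Euler with an arbitrary step size, a hypothetical $c = 1$, and a hypothetical $\boldsymbol{\alpha} = (1, 1, 1)$. None of these choices come from the page, and `euler_flow` is an illustrative name.

```python
import math

def loss(a, b, alpha, c):
    """L = 0.5 * (1 - c * prod_i a_i^alpha_i * b)^2."""
    prod = math.prod(ai ** al for ai, al in zip(a, alpha))
    return 0.5 * (1.0 - c * prod * b) ** 2

def euler_flow(alpha, c, t_max=10.0, dt=0.01):
    """Integrate da_i/dt = -dL/da_i, db/dt = -dL/db from a_i(0) = 1, b(0) = 0."""
    a = [1.0] * len(alpha)
    b = 0.0
    losses = [loss(a, b, alpha, c)]
    for _ in range(int(round(t_max / dt))):
        prod = math.prod(ai ** al for ai, al in zip(a, alpha))
        resid = 1.0 - c * prod * b
        # -dL/da_i = resid * c * b * alpha_i * prod / a_i
        da = [resid * c * b * al * prod / ai for ai, al in zip(a, alpha)]
        # -dL/db = resid * c * prod
        db = resid * c * prod
        a = [ai + dt * dai for ai, dai in zip(a, da)]
        b += dt * db
        losses.append(loss(a, b, alpha, c))
    return a, b, losses

a, b, losses = euler_flow(alpha=(1, 1, 1), c=1.0)   # hypothetical alpha and c
```

At $t = 0$ only $b$ moves, because every $\dot{a}_i$ is proportional to $b$; the $a_i$ start growing once $b$ has left zero.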

number of relevant directions
$d_{\mathrm{rel}} = \#\{\alpha_i > 0\}$
order of $f_*$
$|\boldsymbol{\alpha}| = \sum_i \alpha_i$
order of dynamical system
$\ell = |\boldsymbol{\alpha}| + 1$

mean core parameter at init
$\beta = \frac{1}{d+1} \left(\sum_i \frac{|a_i(0)|}{\sqrt{\alpha_i}} + |b(0)|\right)$
shape parameters
$r_i = \frac{a_i(0)^2}{\alpha_i \beta^2}$
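As a worked instance of the definitions above, take the listed initialization $a_i(0) = 1$, $b(0) = 0$ with $d = 3$ and, as an illustrative assumption not taken from the page, $\boldsymbol{\alpha} = (1, 1, 1)$:

$$\begin{align} d_{\mathrm{rel}} &= 3, \quad |\boldsymbol{\alpha}| = 3, \quad \ell = 4 \\ \beta &= \frac{1}{4}\left(1 + 1 + 1 + 0\right) = \frac{3}{4} \\ r_i &= \frac{1}{1 \cdot (3/4)^2} = \frac{16}{9} \end{align}$$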
shape integral
$F(\mathbf{r}) = \left(\frac{\ell}{2} - 1\right) \int_0^\infty (s + r_b)^{-1/2} \prod_i (s + r_i)^{-\alpha_i/2} \, ds$
rise time
$t_{\mathrm{rise}} = \begin{cases} c^{-2} & \text{if } \ell = 1, \\ -\frac{1}{c} \cdot \log(c\beta) & \text{if } \ell = 2, \\ \frac{1}{\ell - 2} \cdot \frac{1}{c} \cdot \frac{F(\mathbf{r})}{\beta^{\ell-2}} & \text{if } \ell > 2. \end{cases}$
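The rise-time formula can be evaluated numerically. The sketch below rests on several assumptions: $r_b$ is taken to be $b(0)^2/\beta^2$ (the page uses $r_b$ without defining it), every $\alpha_i$ is positive whenever $\ell > 1$, and the integral is approximated with a midpoint rule after the substitutions $s = u^2$, $u = t/(1-t)$. The function names are illustrative.

```python
import math

def shape_integral(alpha, r, r_b, n=20000):
    """F(r) by the midpoint rule on t in (0, 1), with s = u^2 and u = t / (1 - t)."""
    ell = sum(alpha) + 1
    total = 0.0
    for j in range(n):
        t = (j + 0.5) / n
        u = t / (1.0 - t)
        s = u * u
        integrand = (s + r_b) ** -0.5
        for al, ri in zip(alpha, r):
            integrand *= (s + ri) ** (-al / 2.0)
        # Jacobian: ds = 2u du and du = dt / (1 - t)^2
        total += integrand * 2.0 * u / (1.0 - t) ** 2
    return (ell / 2.0 - 1.0) * total / n

def rise_time(alpha, c, a0, b0):
    """t_rise from the piecewise formula above (alpha_i > 0 required if ell > 1)."""
    ell = sum(alpha) + 1
    if ell == 1:
        return c ** -2
    if any(al <= 0 for al in alpha):
        raise ValueError("this sketch requires alpha_i > 0 when ell > 1")
    d = len(alpha)
    beta = (sum(abs(ai) / math.sqrt(al) for ai, al in zip(a0, alpha)) + abs(b0)) / (d + 1)
    if ell == 2:
        return -math.log(c * beta) / c
    r = [ai ** 2 / (al * beta ** 2) for ai, al in zip(a0, alpha)]
    r_b = b0 ** 2 / beta ** 2            # ASSUMPTION: r_b is not defined on the page
    return shape_integral(alpha, r, r_b) / ((ell - 2) * c * beta ** (ell - 2))

# Page initialization a_i(0) = 1, b(0) = 0 with hypothetical alpha and c
t_page_init = rise_time(alpha=(1, 1, 1), c=1.0, a0=(1.0, 1.0, 1.0), b0=0.0)
```

Under the assumed $r_b$, an initialization with all shape parameters equal to 1 gives $F(\mathbf{r}) = 1$, which is a convenient sanity check on the quadrature.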
jsi@berkeley.edu
james-simon
gScholar
@fakejamiesimon