Expander Nets
SETUP
NETWORK
network function
$$\hat{f}(\mathbf{x}) = \mathbf{W}_2 \, \sigma\left(\mathbf{W}_{\mathrm{froz}} \, \mathbf{W}_1 \, \mathbf{x}\right)$$
$$\mathbf{W}_2 \in \mathbb{R}^{1 \times k}, \, \mathbf{W}_{\mathrm{froz}} \in \mathbb{R}^{k \times d},$$
$$\mathbf{W}_1 \in \mathbb{R}^{d \times d}, \, \mathbf{x} \in \mathbb{R}^d$$
nonlinearity
$$\sigma(z) = \sqrt{2} \cos\left(z + \frac{\pi}{4}\right)$$
hidden dimension
$k$:
1000
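The network above can be sketched in plain Python to make the shapes concrete. This is a minimal illustration, not the page's own implementation: the helper names `sigma`, `init_params`, and `forward` are hypothetical, the nonlinearity is assumed to act elementwise, and the initial values follow the initialization stated under TRAINING.

```python
import math
import random

def sigma(z):
    """Nonlinearity sigma(z) = sqrt(2) * cos(z + pi/4)."""
    return math.sqrt(2.0) * math.cos(z + math.pi / 4.0)

def init_params(d=3, k=1000, seed=0):
    """W1 = identity (d x d), W_froz ~ N(0, 1) entrywise (k x d), W2 = zeros (length k)."""
    rng = random.Random(seed)
    W1 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    W_froz = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    W2 = [0.0] * k
    return W1, W_froz, W2

def forward(x, W1, W_froz, W2):
    """f_hat(x) = W2 . sigma(W_froz @ (W1 @ x)), sigma applied elementwise (assumed)."""
    v = [sum(w * xi for w, xi in zip(row, x)) for row in W1]       # W1 @ x, length d
    pre = [sum(w * vi for w, vi in zip(row, v)) for row in W_froz]  # length k
    return sum(w2 * sigma(z) for w2, z in zip(W2, pre))

W1, W_froz, W2 = init_params()
print(forward([0.5, -1.0, 2.0], W1, W_froz, W2))  # W2 = 0 at init, so this prints 0.0
```

Because $\mathbf{W}_2(0) = \mathbf{0}$, the network output is identically zero at initialization regardless of the frozen random features.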
DATA
data distribution
$$\mathbf{x} \sim \mathcal{N}(0, \mathbf{\Gamma})$$
data dimension
$d$:
3
data covariance
$$\mathbf{\Gamma} = \mathrm{diag}(\gamma_i)$$
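Since $\mathbf{\Gamma}$ is diagonal, a sample can be drawn coordinate by coordinate as $x_i = \sqrt{\gamma_i}\, z_i$ with $z_i \sim \mathcal{N}(0, 1)$. The sketch below assumes this; the function name `sample_x` and the values in `gammas` are hypothetical, since the page does not list the $\gamma_i$.

```python
import math
import random

def sample_x(gammas, rng):
    """Draw one x ~ N(0, diag(gammas)) by scaling independent standard normals."""
    return [math.sqrt(g) * rng.gauss(0.0, 1.0) for g in gammas]

gammas = [1.0, 0.5, 0.25]   # hypothetical covariance eigenvalues, d = 3
rng = random.Random(0)
batch = [sample_x(gammas, rng) for _ in range(100)]  # one minibatch of size |B| = 100
```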
TARGET FUNCTION
target function
$$\begin{align}
f_*(\mathbf{x}) &= h_{\boldsymbol{\alpha}}\left(\mathbf{\Gamma}^{-1/2} \mathbf{x}\right) \\
&= h_1(\gamma_1^{-1/2} x_1) \cdot h_2(\gamma_2^{-1/2} x_2) \cdot h_3(\gamma_3^{-1/2} x_3)
\end{align}$$
number of terms
1
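A possible evaluation of the target is sketched below. The page does not define $h_n$; this sketch assumes it is the probabilists' Hermite polynomial $\mathrm{He}_n$ normalized by $\sqrt{n!}$, so that each factor has unit variance under a standard normal input. The names `hermite_normalized` and `target`, and the example `alpha` and `gammas`, are hypothetical.

```python
import math

def hermite_normalized(n, z):
    """He_n(z) / sqrt(n!), with He_n the probabilists' Hermite polynomial (ASSUMED form of h_n)."""
    prev, cur = 0.0, 1.0                      # He_{-1} placeholder and He_0
    for m in range(n):
        prev, cur = cur, z * cur - m * prev   # He_{m+1} = z * He_m - m * He_{m-1}
    return cur / math.sqrt(math.factorial(n))

def target(x, alpha, gammas):
    """f_*(x) = prod_i h_{alpha_i}(x_i / sqrt(gamma_i)): a single product term."""
    out = 1.0
    for xi, ai, gi in zip(x, alpha, gammas):
        out *= hermite_normalized(ai, xi / math.sqrt(gi))
    return out

alpha = (1, 2, 3)           # hypothetical orders, read off the indices h_1, h_2, h_3
gammas = [1.0, 0.5, 0.25]   # hypothetical covariance eigenvalues
value = target([0.3, -0.2, 0.1], alpha, gammas)
```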
TRAINING
loss
$$\mathcal{L} = \frac{1}{2} \mathbb{E}_\mathbf{x}\left[(f_*(\mathbf{x}) - \hat{f}(\mathbf{x}))^2\right]$$
initialization
$$\begin{align}
\mathbf{W}_1(0) &= \mathbf{I}_d \\
\mathbf{W}_{\mathrm{froz},ij} &\sim \mathcal{N}(0, 1) \\
\mathbf{W}_2(0) &= \mathbf{0}
\end{align}$$
dynamics
$$\begin{align}
\dot{\mathbf{W}}_1 &= -\nabla_{\mathbf{W}_1} \mathcal{L} \\
\dot{\mathbf{W}}_{\mathrm{froz}} &= \mathbf{0} \\
\dot{\mathbf{W}}_2 &= -\frac{1}{k}\nabla_{\mathbf{W}_2} \mathcal{L}
\end{align}$$
HYPERPARAMETERS
learning rate
$\eta$:
0.01
batch size
$|B|$:
100
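The training rule can be illustrated with a hand-written minibatch step. This is a sketch under stated assumptions: the gradient flow is discretized with a plain Euler/SGD step of size $\eta$, the expectation in the loss is replaced by a batch mean, and $\mathbf{W}_2$ receives the extra $1/k$ factor from the dynamics. The names `sgd_step` and `dsigma` are hypothetical, and the demo uses a small $k$ and a placeholder target $x_1$ only to keep pure Python fast.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def sigma(z):
    return SQRT2 * math.cos(z + math.pi / 4.0)

def dsigma(z):
    return -SQRT2 * math.sin(z + math.pi / 4.0)

def sgd_step(W1, W_froz, W2, batch, targets, eta=0.01):
    """One in-place minibatch step on W1 and W2; W_froz is never updated."""
    d, k, n = len(W1), len(W2), len(batch)
    g1 = [[0.0] * d for _ in range(d)]   # batch-mean gradient w.r.t. W1
    g2 = [0.0] * k                       # batch-mean gradient w.r.t. W2
    for x, y in zip(batch, targets):
        v = [sum(W1[i][m] * x[m] for m in range(d)) for i in range(d)]
        pre = [sum(W_froz[j][i] * v[i] for i in range(d)) for j in range(k)]
        act = [sigma(z) for z in pre]
        err = y - sum(W2[j] * act[j] for j in range(k))   # f_*(x) - f_hat(x)
        for j in range(k):
            g2[j] -= err * act[j] / n                     # dL/dW2_j = -err * sigma(pre_j)
        for i in range(d):
            back = sum(W2[j] * dsigma(pre[j]) * W_froz[j][i] for j in range(k))
            for m in range(d):
                g1[i][m] -= err * back * x[m] / n         # dL/dW1[i][m]
    for i in range(d):
        for m in range(d):
            W1[i][m] -= eta * g1[i][m]
    for j in range(k):
        W2[j] -= (eta / k) * g2[j]                        # 1/k factor on the W2 update
    return W1, W2

# Demo with hypothetical small sizes (the page uses k = 1000).
rng = random.Random(0)
d, k = 3, 50
W1 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
W_froz = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
W2 = [0.0] * k
batch = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(100)]
targets = [x[0] for x in batch]   # placeholder target, not the page's f_*
sgd_step(W1, W_froz, W2, batch, targets, eta=0.01)
```

With $\mathbf{W}_2(0) = \mathbf{0}$ the gradient with respect to $\mathbf{W}_1$ vanishes on the first step, so only $\mathbf{W}_2$ moves initially.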
SIMULATION
[Plots: loss $\mathcal{L}$ vs. step; parameter size vs. step]
THEORY COMPUTED BELOW THIS LINE
modeling ODE
$\mathcal{L} = \frac{1}{2}\left(1 - c a_1^{\alpha_1} \cdots a_d^{\alpha_d} b\right)^2$
dynamics
$\begin{align}
\dot{a}_i &= -\partial_{a_i} \mathcal{L} \\
\dot{b} &= -\partial_b \mathcal{L}
\end{align}$
initialization
$\begin{align}
a_i(0) &= 1 \\
b(0) &= 0
\end{align}$
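The reduced system can be integrated numerically to see the loss curve it predicts. The sketch below uses forward Euler, which is an assumption of this example rather than the page's method; `simulate`, the step size `dt`, the horizon `t_max`, and the values of `c` and `alpha` are hypothetical. Writing $P = \prod_i a_i^{\alpha_i}$ and $\rho = 1 - cPb$, the gradients are $\dot{a}_i = \rho\, c\, \alpha_i P b / a_i$ and $\dot{b} = \rho\, c\, P$.

```python
def simulate(alpha, c=1.0, dt=1e-3, t_max=10.0):
    """Forward-Euler integration of the modeling ODE from a_i(0) = 1, b(0) = 0."""
    a = [1.0] * len(alpha)
    b = 0.0
    history = []                                   # (time, loss) pairs
    for step in range(int(t_max / dt)):
        prod = 1.0
        for ai, al in zip(a, alpha):
            prod *= ai ** al                       # P = prod_i a_i^alpha_i
        resid = 1.0 - c * prod * b                 # rho = 1 - c * P * b
        history.append((step * dt, 0.5 * resid ** 2))
        da = [resid * c * al * prod * b / ai for ai, al in zip(a, alpha)]
        db = resid * c * prod
        a = [ai + dt * dai for ai, dai in zip(a, da)]
        b += dt * db
    return a, b, history

a_final, b_final, history = simulate(alpha=(1, 2, 3), c=1.0)
```

The loss starts at $\frac{1}{2}$ because $b(0) = 0$, and the time at which it drops is what the rise-time estimate is meant to capture.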
number of relevant directions
$d_{\mathrm{rel}} = \#\{\alpha_i > 0\}$
order of $f_*$
$|\boldsymbol{\alpha}| = \sum_i \alpha_i$
order of dynamical system
$\ell = |\boldsymbol{\alpha}| + 1$
mean core parameter at init
$\beta = \frac{1}{d+1} \left(\sum_i \frac{|a_i(0)|}{\sqrt{\alpha_i}} + |b(0)|\right)$
shape parameters
$r_i = \frac{a_i(0)^2}{\alpha_i \beta^2}, \quad r_b = \frac{b(0)^2}{\beta^2}$
shape integral
$F(\mathbf{r}) = \left(\frac{\ell}{2} - 1\right) \int_0^\infty (s + r_b)^{-1/2} \prod_i (s + r_i)^{-\alpha_i/2} \, ds$
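The quantities $\ell$, $\beta$, $r_i$, and $F(\mathbf{r})$ can be evaluated numerically along the following lines. This is a sketch with several assumptions: all $\alpha_i$ are positive integers, $\ell > 2$ so that the integral converges, and $r_b = b(0)^2 / \beta^2$ (treating $b$ as a direction with exponent 1, which matches the $-1/2$ power in the integrand). The function name `shape_quantities` and the quadrature are hypothetical choices: the substitution $s = u^2$ removes the $s^{-1/2}$ endpoint singularity when $r_b = 0$, and $u = t/(1-t)$ maps the half-line to $[0, 1)$ for a midpoint rule.

```python
import math

def shape_quantities(alpha, a0, b0, n_grid=20000):
    """Return (ell, beta, r, r_b, F) for positive integer alpha_i and ell > 2."""
    d = len(alpha)
    ell = sum(alpha) + 1
    if ell <= 2:
        raise ValueError("the shape integral is only used for ell > 2")
    beta = (sum(abs(a) / math.sqrt(al) for a, al in zip(a0, alpha)) + abs(b0)) / (d + 1)
    r = [a * a / (al * beta ** 2) for a, al in zip(a0, alpha)]
    r_b = b0 * b0 / beta ** 2                      # ASSUMED definition of r_b
    total = 0.0
    h = 1.0 / n_grid
    for j in range(n_grid):
        t = (j + 0.5) * h                          # midpoint in (0, 1)
        u = t / (1.0 - t)                          # u in (0, inf), s = u^2
        g = 2.0 * u / math.sqrt(u * u + r_b)       # (s + r_b)^(-1/2) ds = 2u (u^2 + r_b)^(-1/2) du
        for ri, al in zip(r, alpha):
            g *= (u * u + ri) ** (-al / 2.0)
        total += g / (1.0 - t) ** 2 * h            # du = dt / (1 - t)^2
    F = (ell / 2.0 - 1.0) * total
    return ell, beta, r, r_b, F

# Check against a closed form: d = 1, alpha = (2,), a(0) = 1, b(0) = 0 gives
# beta = 1 / (2 sqrt(2)), r_1 = 4, and F = (1/2) * pi / sqrt(4) = pi / 4.
ell, beta, r, r_b, F = shape_quantities(alpha=(2,), a0=(1.0,), b0=0.0)
```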
rise time
$t_{\mathrm{rise}} = \begin{cases}
c^{-2} & \text{if } \ell = 1, \\
-\frac{1}{c} \cdot \log(c\beta) & \text{if } \ell = 2, \\
\frac{1}{\ell - 2} \cdot \frac{1}{c} \cdot \frac{F(\mathbf{r})}{\beta^{\ell-2}} & \text{if } \ell > 2.
\end{cases}$
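The three cases of the rise-time formula translate directly into a small function. This is a sketch only: `rise_time` is a hypothetical name, `c` is the coefficient appearing in the modeling loss, and `beta` and `F` are assumed to have been computed separately from the definitions above.

```python
import math

def rise_time(ell, c, beta=None, F=None):
    """Rise-time estimate for a dynamical system of order ell."""
    if ell == 1:
        return c ** -2                               # no dependence on beta or F
    if ell == 2:
        return -math.log(c * beta) / c
    return F / ((ell - 2) * c * beta ** (ell - 2))   # ell > 2

t1 = rise_time(1, c=2.0)                                        # 1 / c^2
t2 = rise_time(2, c=1.0, beta=0.5)                              # -log(c * beta) / c
t3 = rise_time(3, c=1.0, beta=1.0 / (2.0 * math.sqrt(2.0)), F=math.pi / 4.0)
```

For $\ell > 2$ the estimate scales as $\beta^{-(\ell - 2)}$, so smaller initial parameters lengthen the plateau before the loss drops.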