Softmax Linear Unit (softmax(x)*x) Neurons – Preliminary Results
Mechanistic Interpretability
In previous work, we’ve found the MLP layers of transformers very difficult to understand. Switching the activation function to softmax(x)*x — which we call “softmax linear units” (SoLU) — seems to help greatly.
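For concreteness, here is a minimal sketch of the activation as described above, in plain NumPy. The function name, the choice of axis, and the max-subtraction stability trick are illustrative assumptions, not details from the post:

```python
import numpy as np

def solu(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax Linear Unit: softmax(x) * x along `axis`."""
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the softmax output.
    z = x - np.max(x, axis=axis, keepdims=True)
    softmax = np.exp(z) / np.sum(np.exp(z), axis=axis, keepdims=True)
    return softmax * x

# Example: one MLP hidden-activation vector.
h = np.array([1.0, -2.0, 3.0, 0.5])
print(solu(h))  # the largest entry dominates; the rest are suppressed
```

Because the softmax is taken across the hidden dimension, SoLU makes the neurons in a layer compete: strongly activating neurons pass through nearly linearly, while the others are pushed toward zero.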
—
As an experiment, we’ve been recording a couple of videos discussing our…