Great explanation, but the last question is quite simple. You determine the weights via brute force: you simply run a large amount of data through the network where you have both the input and the correct output (handwriting to text in this case).
ggambetta•Feb 6, 2026
"Brute force" would be trying random weights and keeping the best performing model. Backpropagation is compute-intensive but I wouldn't call it "brute force".
Ygg2•Feb 6, 2026
"Brute force" here is about the amount of data you're ingesting. It's no AlphaZero, which learns from scratch.
jazzpush2•Feb 6, 2026
What? Either option requires sufficient data. Brute force implies iterating over all combinations until you find the best weights. Back-prop is an optimization technique.
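The difference being argued here can be made concrete. Below is a minimal sketch in plain Python, assuming a hypothetical one-weight model `y_hat = w * x` and a toy dataset where the correct output is `3x` (neither is from the article). It fits the same weight two ways. `random_search` matches the "try random weights and keep the best" description. `gradient_descent` uses the derivative of the loss, which is the quantity backpropagation computes for every weight in a real network.

```python
import random

# Hypothetical toy dataset: each input is paired with its correct output (y = 3x).
data = [(x, 3.0 * x) for x in range(-5, 6)]

def loss(w):
    """Mean squared error of the one-weight model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def random_search(trials, seed=0):
    """'Brute force' in the sense above: try random weights, keep the best one."""
    rng = random.Random(seed)
    best_w = rng.uniform(-10, 10)
    for _ in range(trials - 1):
        w = rng.uniform(-10, 10)
        if loss(w) < loss(best_w):
            best_w = w
    return best_w

def gradient_descent(steps, lr=0.01):
    """Follow the slope of the loss downhill instead of guessing."""
    w = 0.0
    for _ in range(steps):
        # d/dw of mean((w*x - y)^2) is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

print("random search:", random_search(1000))
print("gradient descent:", gradient_descent(100))
```

Both approaches need the same labelled data. Random search only compares loss values, while gradient descent also uses the direction in which the loss decreases. That difference is the main reason gradient-based training scales to networks with many weights, where blind guessing does not.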
Lovely visualization. I like the very concrete depiction of the middle layers "recognizing features", which makes the whole machine feel more plausible. I'm also a fan of visualizing things, but I think it's important to appreciate that some things (like a 10,000-dimensional input vector, or even a 100-dimensional output vector) can't be concretely visualized, and you have to develop intuitions in more roundabout ways.
I hope you make more of these. I'd love to see a transformer presented more clearly.
ge96•Feb 6, 2026
I like the style of the site; it has a "vintage" look.
I don't think it's a moiré effect, but yeah, looking at the pattern
These visualizations remind me of the 3Blue1Brown videos.
giancarlostoro•Feb 6, 2026
I was thinking the same thing. It's at least the same description.
pks016•Feb 6, 2026
Great visualization!
anon291•Feb 6, 2026
Nice visuals, but it misses the mark. Neural networks transform vector spaces and collect points into bins. This visualization shows the structure of the computation. This is akin to displaying a matrix-vector multiplication in Wx + b notation, except W, x, and b have more exciting displays.
It completely misses the mark on what it means to 'weight' (linearly transform), bias (affine transform), and then non-linearly transform (i.e., 'collect') points into bins.
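Here is a minimal sketch of that weight / bias / non-linearity sequence for 2-D points. It assumes a ReLU activation and hand-picked weights, both hypothetical and chosen only for illustration. The weight step is a linear map, adding the bias makes it affine, and the ReLU clamps negative coordinates to zero, so whole regions of the plane land on the same output.

```python
def linear(W, x):
    """Weight step: a linear map (rotate/scale/shear). W is a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def affine(W, x, b):
    """Adding the bias shifts the result, so W x + b is an affine map."""
    return [v + b_i for v, b_i in zip(linear(W, x), b)]

def relu(v):
    """Non-linearity: clamp negative coordinates to 0."""
    return [max(0.0, v_i) for v_i in v]

def layer(W, b, x):
    """One layer: relu(W x + b)."""
    return relu(affine(W, x, b))

# Hypothetical weights and bias, chosen by hand for illustration.
W = [[1.0, 1.0],
     [1.0, -1.0]]
b = [-1.0, 0.0]

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
for p in points:
    print(p, "->", layer(W, b, p))
```

With these weights, `(0.0, 0.0)` and `(-1.0, -1.0)` both map to `[0.0, 0.0]`: two distinct inputs end up in the same "bin" after the non-linearity, which a purely affine map could never do.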
titzer•Feb 6, 2026
> but misses the mark
It doesn't match the pictures in your head, but it nevertheless does present a mental representation the author (and presumably some readers) find useful.
Instead of nitpicking, perhaps pointing to a better visualization (like maybe this video: https://www.youtube.com/watch?v=ChfEO8l-fas) could help others learn. Otherwise it's just frustrating to read comments like this.
artemonster•Feb 6, 2026
I get 3 fps in Chrome, most likely due to disabled hardware acceleration.
nerdsniper•Feb 6, 2026
High FPS in Safari on an M2 MBP.
8cvor6j844qw_d6•Feb 6, 2026
Oh wow, this looks like a 3D render of a perceptron, like the ones I saw when I started reading about neural networks. I guess neural networks are essentially built on that idea? Inputs > weight function to adjust the final output to the desired values?
adammarples•Feb 6, 2026
Yes, vanilla neural networks are just lots of perceptrons
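For reference, a single classic perceptron looks roughly like this: a weighted sum plus a bias, a hard step, and Rosenblatt's error-driven update rule. This is a minimal sketch; the AND dataset, learning rate, and epoch count are illustrative assumptions. Multilayer networks stack many such units but typically replace the step with a differentiable activation so they can be trained with backpropagation.

```python
def predict(weights, bias, x):
    """Classic perceptron: weighted sum plus bias, then a hard step."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train(samples, epochs=10, lr=1.0):
    """Perceptron learning rule: nudge weights only when a prediction is wrong."""
    n = len(samples[0][0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Hypothetical toy task: learn logical AND from its truth table.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(AND)
print([predict(w, b, x) for x, _ in AND])
```

AND is linearly separable, so a single unit can learn it; XOR is not, which is where hidden layers come in.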
sva_•Feb 6, 2026
A neural network is basically a multilayer perceptron
The layers themselves are basically perceptrons, not really any different from a generalized linear model.
The ‘secret sauce’ in a deep network is the hidden layer with a non-linear activation function. Without that you could simplify all the layers to a linear model.
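The collapse described above is easy to check numerically. This minimal sketch uses hypothetical small integer weights so the arithmetic is exact. Without an activation, W2(W1x + b1) + b2 equals (W2W1)x + (W2b1 + b2) for every input, so the two layers are one affine layer in disguise. Inserting a ReLU between them breaks that equality for at least some inputs.

```python
def matvec(M, v):
    """Matrix-vector product with M as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product with both matrices as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0, a) for a in v]

# Hypothetical two-layer network with small integer weights.
W1, b1 = [[1, -2], [3, 1]], [1, -4]
W2, b2 = [[2, 1], [-1, 1]], [0, 5]

def two_layers(x, activation=None):
    """Layer 1, optional activation, then layer 2."""
    h = vadd(matvec(W1, x), b1)
    if activation:
        h = activation(h)
    return vadd(matvec(W2, h), b2)

# Without an activation: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

x = [2, 3]
print("linear stack:", two_layers(x), "collapsed:", vadd(matvec(W, x), b))
print("with ReLU:", two_layers(x, relu))
```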
If you want to understand neural networks, keep going.
<https://visualrambling.space/dithering-part-1/>
<https://visualrambling.space/dithering-part-2/>
That's cool, rendering shades in the old days
Man, those graphics are so damn good.
https://en.wikipedia.org/wiki/Multilayer_perceptron
https://mlu-explain.github.io/neural-networks/