The Geometry of Error: Why Loss Shapes Change

[Animation: a non-convex BCE loss surface over two features (BMI and blood sugar), reshaping as the data shifts]

If you’ve done even a bit of Machine Learning, you know this:

ML works because of loss functions.
They give direction, quite literally: a loss function is the quantity the model tries to minimize. For regression, the most common choice is Mean Squared Error (MSE):

MSE = (1/n) Σ (yᵢ − ŷᵢ)²

It optimizes for the smallest average squared error across training points.
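As a quick sanity check, here is MSE computed directly from its definition (a minimal NumPy sketch; the function name is mine):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Residuals are 1 and -1, so MSE = (1 + 1) / 2 = 1.0
print(mse([3.0, 5.0], [2.0, 6.0]))
```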

But here’s a deeper question:

If MSE is “the same formula”, why does optimization look different every time the dataset changes?

Let’s break it down.

yᵢ = actual target
ŷᵢ = predicted target

For simple Linear Regression (1 feature):

ŷᵢ = w₁xᵢ + w₀

So MSE becomes:

(1/n) Σ (yᵢ − (w₁xᵢ + w₀))²
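Holding the data fixed and varying the parameters turns this formula into a surface over (w₁, w₀). A small sketch (the function name and toy dataset are mine) that evaluates that surface at two parameter points:

```python
import numpy as np

def mse_surface(w1, w0, x, y):
    """MSE of the line w1*x + w0, viewed as a function of (w1, w0)."""
    return float(np.mean((y - (w1 * x + w0)) ** 2))

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])   # generated exactly by y = 2x + 1

print(mse_surface(2.0, 1.0, x, y))  # 0.0: the true line sits at the minimum
print(mse_surface(0.0, 0.0, x, y))  # positive: any other point is uphill
```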

Now notice something critical:

👉 The xᵢ values come from the dataset.

When the dataset changes:

  • xᵢ changes

  • yᵢ changes

  • their distribution changes

Even though the form of MSE stays the same, the surface we optimize over completely changes.

Same loss.
Different geometry.

If the data is well scaled and centered → a round, bowl-shaped surface that is easy to descend.
If the data is poorly scaled or strongly shifted → a stretched, ill-conditioned valley. (For linear regression the surface stays a convex bowl, but it can become so elongated that descent crawls; for nonlinear models the surface can genuinely warp.)

And the more irregular the surface, the harder it is to reach the minimum.
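One way to make "harder" concrete for linear regression: the curvature of the MSE surface is its Hessian, and that Hessian is built entirely from the inputs xᵢ, never the targets. A sketch (function names are mine) comparing centered versus shifted inputs via the Hessian's condition number:

```python
import numpy as np

def mse_hessian(x):
    """Hessian of linear-regression MSE w.r.t. (w1, w0).
    It depends only on the inputs x, not on the targets y."""
    n = len(x)
    return (2.0 / n) * np.array([[np.sum(x * x), np.sum(x)],
                                 [np.sum(x),     n        ]])

centered = np.array([-1.0, 0.0, 1.0])
shifted  = centered + 100.0          # same spread, just moved

print(np.linalg.cond(mse_hessian(centered)))  # small: a round bowl
print(np.linalg.cond(mse_hessian(shifted)))   # huge: a long, narrow valley
```

Same loss formula, same spread of points; merely shifting the inputs turns an easy round bowl into a valley that plain gradient descent crosses in slow zig-zags.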

That’s why we repeatedly need:
• Gradient Descent
• Momentum
• Adam
• Other optimization tricks
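A minimal sketch of one of those tricks, gradient descent with momentum (illustrative, not any library's API): a velocity term accumulates past gradients, which speeds movement along a long valley while damping oscillation across it.

```python
import numpy as np

def gd_momentum(grad, w_init, lr=0.02, beta=0.9, steps=300):
    """Heavy-ball momentum: v blends the old direction with the new gradient."""
    w = np.asarray(w_init, dtype=float)
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)
        w = w - lr * v
    return w

# An elongated quadratic bowl: f(w) = w1**2 + 10 * w2**2
grad = lambda w: np.array([2.0 * w[0], 20.0 * w[1]])
w = gd_momentum(grad, [1.0, 1.0])
print(w)  # close to the minimum at (0, 0)
```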

Because optimization difficulty is not just about the formula. It’s about the geometry induced by the data.

Same loss function.
Different landscape.


© 2035 by The art of ML. Powered and secured by Wix 
