Suppose you want to become a skilled chef. You could get a job as a line cook at a restaurant, and learn all you could from the chef there. You could read cooking books, try lots of different recipes, and slowly develop technical skill. But you probably wouldn’t learn the underlying chemistry of how food interacts with human taste receptors.
Similarly, software engineers rarely learn during their training how NAND gates are physically built, and students of literature rarely train in linguistics. It’s not that this sort of knowledge is useless - in fact, hearing that someone had it would be a pleasant surprise. But it’s rarely a priority.
On the other hand, students of psychology do learn about some basic neurotransmitters, even though they’re focused on a higher-level epiphenomenon. And this is more subjective, but I think great music critics probably have some understanding of music theory, even if they’re unlikely to mention it in their reviews.
So what’s the difference? When is it valuable to understand something at a lower level of abstraction, and when is it merely a bonus? I think it comes down to three things: novelty, certainty, and complexity.
Novelty
If some field of study is new, then there isn’t much knowledge built up about it yet, and it may pay to study the fundamentals. Cooking has been going on forever - you could learn recipes for your entire life and never exhaust them all. On the other hand, you’re unlikely to figure out a fundamentally new cooking technique that leverages a little-known property of umami. So your time is probably well spent just hanging out at the level of cooking and eating, rather than digging deeper.
Suppose, instead, that you’re a competitive pickleball player. Competitive pickleball has only taken off in the last few years, so you might get a strong edge by paying attention to underlying sports fundamentals and the precise boundaries of the rules, to figure out original moves and beat the competition.
All else equal, it’s good to understand newer domains at a deeper level of abstraction, and timeless ones at the object level.
Certainty
If you’re studying theoretical physics, you can bet that your models bottom out in math. If some physics equation uses techniques from calculus, it’s a safe bet you’ll need to understand calculus. The lower level of abstraction is totally unambiguous, and the dependencies are clear.
If you’re studying dance, on the other hand, the lower level of abstraction is a total mystery. The kinesthetics of the human body, as understood by a physical therapist? Which movements best display reproductive fitness? Psychological freedom from inhibition? Ask a dozen people, and you’d get a dozen answers.
When something is unambiguously determined by some set of dependencies, it’s probably helpful to understand those dependencies. If it’s less certain what something arises out of, it might be better to just focus on the thing itself.
Complexity
What’s the point of abstraction, anyway? In some cases, like cooking, an abstraction is closer to our direct experience as human beings. We don’t experience firings of specific taste receptors as such, even if that’s technically what’s going on under the hood. But where abstractions are deliberately man-made, their purpose is to bundle up complexity. A computer programmer uses a high-level programming language because it would be too complex to micromanage the physical flow of electricity through their computer.
So if something’s really, really complicated, it might be best to engage with it in a bundled, abstracted form. Even in domains like math where it’s standard practice for skilled specialists to build up from the fundamental building blocks, you don’t recite the commutative property under your breath every time you do a calculus problem.
If something is secretly not so complicated, after all, and the higher level of abstraction isn’t actually bundling that much detail, it might be worth it to go a level deeper and figure out the nuts and bolts. Calculus courses for math majors cover epsilon-delta proofs for a reason: they’re simply not that complicated, and really do undergird a lot of the subject.
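To make that concrete - the statement below is just the standard definition from real analysis, not anything original to this post - the whole epsilon-delta machinery rests on a single sentence, here stated for the limit of a function:

```latex
% Standard epsilon-delta definition of a limit: the one definition
% that most epsilon-delta proofs in a first calculus course unpack.
\[
\lim_{x \to a} f(x) = L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0\ \exists \delta > 0:\quad
0 < |x - a| < \delta \implies |f(x) - L| < \varepsilon
\]
```

One sentence of quantifiers and inequalities, and it’s the scaffolding that limits, derivatives, and integrals are defined in terms of.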
So what?
As my last post probably makes obvious, I’m trying to figure out how transformers work. I use LLMs often, and I expect them to be a big deal in the coming years, so I want to understand them better. But is going deeper really the best way to do that? Tons of great research on LLMs engages with them purely on their own terms, as input-output devices rather than as aggregations of multivariate calculus and linear algebra.
So, sure. I’ll run LLMs through the framework as an example, and use it to motivate my recent study.
Novelty: LLMs are super new - transformers were introduced to the world just seven years ago, in 2017. And while a lot of dedicated energy (and noise) has gone into their study, there’s only so much to learn at the surface level.
Certainty: It is a matter of fact how transformers work - their outputs are absolutely generated by specific kinds of mathematical transformations (the attention equation below, for instance).
Complexity: It’s not actually that hard to learn how transformers function; undergraduate-level abstract algebra took me six months to learn, while (autoregressive) transformer inference has only taken a week and a half (a rough sketch of that inference loop follows below).
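For concreteness, here is the kind of transformation I mean - this is the standard scaled dot-product attention from the original 2017 transformer paper, quoted for illustration rather than taken from my own notes:

```latex
% Scaled dot-product attention (Vaswani et al., 2017):
% queries Q, keys K, values V; d_k is the key dimension.
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]
```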
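And as a sketch of what "(autoregressive) transformer inference" amounts to in practice - my own illustrative Python, with `model` and `tokenizer` as hypothetical stand-ins rather than any particular library’s API - the loop is just: score the tokens so far, pick the next one, append, repeat:

```python
# Minimal sketch of greedy autoregressive decoding.
# `model` maps a list of token ids to next-token logits, and
# `tokenizer` converts between text and token ids; both are
# hypothetical stand-ins here, not a specific library's interface.

def generate(model, tokenizer, prompt, max_new_tokens=50):
    tokens = tokenizer.encode(prompt)       # prompt -> list of token ids
    for _ in range(max_new_tokens):
        logits = model(tokens)              # one score per vocabulary entry
        next_token = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_token)           # feed the choice back in
        if next_token == tokenizer.eos_id:  # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```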
Did I invent this framework under highly confounded conditions, then use it to justify something I was already doing? Yes. Am I also doing that thing because it’s fun and makes me feel cool? Also yes.
I do think it’s worth it, though. It’s easy to assume that the fundamentals behind something interesting are unmanageably hard, but, well. Only one way to find out! And maybe that, for getting into the weeds, is the best argument of all.