What Survives Scale
There is a question underneath most decisions about what to build in AI right now, and it is rarely asked in plain form: as model capability grows, what becomes worthless, and what does not?
The cleanest articulation of the threat comes from Rich Sutton's essay "The Bitter Lesson." The history of AI keeps teaching the same thing: general methods that ride on computation (search and learning) eventually overtake methods that encode human understanding of a problem. Researchers cannot resist pouring their domain insight into a system. It works in the short term. It becomes the ceiling in the long term, and then a method that simply scales compute walks past it.
If that is true, then the only rational thing to do is to build the part that scale does not eat. That requires knowing how to tell the difference, and the obvious way to tell the difference is wrong in an instructive way.
The first cut, and why it fails
The tempting discriminator is this: encoding a method gets eaten; encoding the structure of the world does not. A workflow that says "first do a literature review, then generate hypotheses, then critique" encodes a method (a human prior about how the work should proceed), and a stronger model will dissolve it by decomposing the task on its own. By contrast, the facts that conclusions rest on evidence, that evidence has a source, and that state changes over time are world-structure, and they stay.
This is a good heuristic. It is not a law, and it fails at exactly the place you need it most.
It fails because world-structure itself gets eaten. The history of science is a graveyard of confidently held ontologies. Phlogiston was world-structure. The ether was world-structure. Vital force was the way the world was assumed to be carved at the joints. Each was an account of what kinds of things exist, and each was overturned. So "ontology survives, method dies" claims too much. Ontologies die all the time.
The better cut
What actually survives is narrower and stranger than "world-structure." It is the class of epistemic constraints, and they survive not because they describe the world correctly but because a stronger model does not make them unnecessary.
Take the rule that you must not let something unverified look verified. A model a hundred times more capable does not retire this rule. It still cannot collapse the gap between a map and the territory; it lives on the map side of that gap like everything else. Stronger models make better maps. They do not make the distinction between "agreed" and "true" obsolete, because that distinction is not about how good the map is: it is about the fact that the map is a map.
So the discriminator is not method versus world-structure. It is domain shortcut versus epistemic constraint. A domain shortcut encodes how some problem is currently solved; it gets absorbed. An epistemic constraint encodes the irreducible structure of not-yet-knowing (provenance, the difference between consensus and truth, honest accounting of what rests on what), and it does not get absorbed, because no amount of capability removes the condition it responds to.
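A small sketch can make the distinction concrete. It assumes that claims are tracked in a simple in-memory ledger. The design is hypothetical and illustrative, not a method the essay prescribes. The names `Status`, `Claim`, and `Ledger`, the three-level ordering, and the example claims are all assumptions. Each claim records its source (provenance) and the claims it rests on. Its effective standing is capped by the weakest claim beneath it. As a result, a conclusion that rests on consensus reads as "agreed," never as "verified," however confidently it was asserted.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    # Hypothetical scale, ordered from weakest to strongest standing.
    # AGREED means some party accepted the claim; only VERIFIED means it
    # was checked against something outside the map.
    UNVERIFIED = 0
    AGREED = 1
    VERIFIED = 2


@dataclass
class Claim:
    text: str
    status: Status  # standing as asserted, before its supports are checked
    source: str  # provenance: where the claim came from
    depends_on: list[str] = field(default_factory=list)


class Ledger:
    def __init__(self) -> None:
        self.claims: dict[str, Claim] = {}

    def add(self, claim_id: str, claim: Claim) -> None:
        # Ids are never overwritten and every dependency must already exist,
        # so "what rests on what" stays explicit and the graph stays acyclic.
        if claim_id in self.claims:
            raise ValueError(f"{claim_id} already recorded")
        missing = [d for d in claim.depends_on if d not in self.claims]
        if missing:
            raise KeyError(f"{claim_id} rests on unknown claims: {missing}")
        self.claims[claim_id] = claim

    def effective_status(self, claim_id: str) -> Status:
        # A claim stands no higher than the weakest claim it rests on.
        claim = self.claims[claim_id]
        supports = [self.effective_status(d) for d in claim.depends_on]
        return min([claim.status, *supports])


# Usage with hypothetical claims.
ledger = Ledger()
ledger.add("obs", Claim("Assay returned 0.8", Status.VERIFIED, source="lab run"))
ledger.add("consensus", Claim("Reviewers accept mechanism M", Status.AGREED, source="review thread"))
ledger.add(
    "conclusion",
    Claim("Effect is explained by M", Status.VERIFIED, source="model summary",
          depends_on=["obs", "consensus"]),
)
print(ledger.effective_status("conclusion").name)  # AGREED
```

The cap is the design choice that carries the argument. A stronger model can add better claims to the ledger. It cannot raise a claim above what its supports justify.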
This is sharper because it explains the failure of the first cut. Ontologies get eaten when they are really domain shortcuts wearing the costume of world-structure: phlogiston was a particular era's best guess about combustion, dressed up as the nature of fire. The constraints that survive are the ones that encode the gap between any model and the world, not any particular model of it.
The unstable corner
There is one place where even this cleaner cut rests on a bet, and honesty requires naming it.
The reason epistemic constraints survive is that the map-territory gap is permanent. And the reason the gap is permanent is that some verifiers are irreducible: there are questions whose ground truth can only come from the real world itself, answering in real time and at its full, irreducible complexity. If that were false (if the territory could be fully simulated, so that a good enough model could always stand in for reality's reply), then the gap would slowly close, "agreed" would creep toward "true," and the surviving constraints would become as quaint as any discarded ontology.
So the whole edifice rests on a single empirical wager: that the territory contains a residue no map can fully replace. There is reason to believe it. Any simulation is itself a map, so using a simulation to stand in for the territory means using a map to verify a map, which is circular. But this is a structural argument from a premise about the world, not a theorem, and a single empirical fact could overturn it in some specific domain.
The honest position is therefore: the discriminator works, and the thing it protects is durable, as long as the territory keeps a residue. Bet that it does. But hold the bet as a bet.
Why this matters for what you build
The practical consequence is that the most defensible thing to build is rarely the most impressive-looking thing. The impressive thing, the system that autonomously does the whole job, is almost always a stack of domain shortcuts that look necessary right now precisely because they fill a gap in general capability. The gap is closing. The shortcuts have an expiration date written on them in invisible ink.
The durable thing is the scaffolding that a more capable model would still need: the honest record of where things stand, the refusal to let the unconfirmed pass as confirmed, the structure that lets work accumulate without quietly corrupting itself. It is less exciting. It is also the part that grows more valuable as the model gets stronger, rather than less, because a stronger model produces more work that needs to be kept honest.
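The same scaffolding can be sketched for how work accumulates over time. This sketch assumes a hypothetical append-only record in which statuses are never edited in place. The current state is derived by replaying events. A retraction is simply another event. A confirmation is rejected unless it names the external check behind it. The names `Event` and `Record`, the three status strings, and the `verifier` field are illustrative assumptions, not an established API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    claim_id: str
    status: str  # "unconfirmed", "confirmed", or "retracted"
    note: str
    verifier: Optional[str] = None  # external check behind a confirmation


class Record:
    """Append-only record: events are added, never edited or deleted."""

    ALLOWED = {"unconfirmed", "confirmed", "retracted"}

    def __init__(self) -> None:
        self._events: list[Event] = []

    def append(self, event: Event) -> None:
        if event.status not in self.ALLOWED:
            raise ValueError(f"unknown status: {event.status}")
        # Refuse to let the unconfirmed pass as confirmed: a confirmation
        # must name the check that produced it.
        if event.status == "confirmed" and not event.verifier:
            raise ValueError(f"{event.claim_id}: confirmation needs a verifier")
        self._events.append(event)

    def current(self, claim_id: str) -> Optional[str]:
        # Replay history; the latest event for the claim determines its state.
        status = None
        for event in self._events:
            if event.claim_id == claim_id:
                status = event.status
        return status

    def history(self, claim_id: str) -> list[Event]:
        # Returns a new list of frozen events, so callers cannot rewrite the record.
        return [e for e in self._events if e.claim_id == claim_id]


# Usage with hypothetical claims and checks.
record = Record()
record.append(Event("h1", "unconfirmed", "proposed by model"))
record.append(Event("h1", "confirmed", "matches experiment", verifier="bench test #1"))
record.append(Event("h1", "retracted", "instrument found miscalibrated"))
print(record.current("h1"))  # retracted
print(len(record.history("h1")))  # 3

try:
    record.append(Event("h2", "confirmed", "looks right"))
except ValueError as err:
    print(err)  # h2: confirmation needs a verifier
```

Because nothing is overwritten, a later reader can see where a claim stands and also how it got there. That is what keeps accumulated work from quietly corrupting itself.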
The deepest version of the bitter lesson may be this: scale does not just beat hand-coded knowledge. It reveals, one capability jump at a time, which of the things we called structure were only scaffolding we built because we did not have enough compute to do without it.