
## Metadata
- Author: [[Leslie Valiant]]
- Full Title: Probably Approximately Correct
- Category: #books
## Highlights
- Those aspects of knowledge for which there is a good predictive theory, typically a mathematical or scientific one, will be called theoryful. The rest will be called theoryless. ([Location 118](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=118))
- This book is based on two central tenets. The first is that the coping mechanisms with which life abounds are all the result of learning from the environment. The second is that this learning is done by concrete mechanisms that can be understood by the methods of computer science. ([Location 139](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=139))
- The variety of applications of computation to domains of human interest is a totally unexpected discovery of the last century. There is no trace of anyone a hundred years ago having anticipated it. It is a truly awesome phenomenon. Each of us can identify our own different way of being impacted by the range of applications that computers now offer. A few years ago I was interested in the capabilities of a certain model of the brain. In a short, hermit-like span of a few weeks I ran a simulation of this model on my laptop and wrote up a paper based on the calculations performed by my laptop. I used a word processor on the same laptop to write and edit the article. I then emailed it off to a journal again from that laptop. This may sound unremarkable to the present-day reader, but a few generations ago, who would have thought that one device could perform such a variety of tasks? Indeed, while for most ideas some long and complex history can be traced, the modern notion of computation emerged remarkably suddenly, and in a most complete form, in a single paper published by Alan Turing in 1936. ([Location 148](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=148))
- At the heart of my thesis here is a mathematical definition of learning. It is called the PAC or the probably approximately correct model of learning, and its main features are the following: The learning process is carried out by a concrete computation that takes a limited number of steps. Organisms cannot spend so long computing that they have no time for anything else or die before they finish. Also, the computation requires only a similarly limited number of interactions with the world during learning. Learning should enable organisms to categorize new information with at most a small error rate. Also, the definition has to acknowledge that induction is not logically fail-safe: If the world suddenly changes, then one should not expect or require good generalization into the future. ([Location 203](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=203))
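
  The highlight states the requirements informally; as a reader's note (not part of the book's text), the standard way they are made precise for a finite hypothesis class H, with error tolerance epsilon and failure probability delta, is the following guarantee and sample bound:

  ```latex
  % PAC guarantee: after m independent examples drawn from an unknown
  % distribution D, the learner outputs a hypothesis h such that
  \Pr\bigl[\operatorname{err}_D(h) \le \varepsilon\bigr] \;\ge\; 1 - \delta
  % For a finite hypothesis class H and a learner that outputs any hypothesis
  % consistent with its sample, it suffices to draw
  m \;\ge\; \frac{1}{\varepsilon}\Bigl(\ln\lvert H\rvert + \ln\tfrac{1}{\delta}\Bigr)
  % examples: polynomial in 1/epsilon and 1/delta, matching the "limited
  % number of interactions" requirement, and the learner's computation is
  % likewise required to take only polynomially many steps.
  ```
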
- The primary purpose of the ecorithm is to change the circuits so that they will behave better in the environment in the future and produce a better outcome for the owner. ([Location 218](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=218))
- In human terms, the fact that we can go through a typical day, one that may include many events and interactions with others, and be seldom surprised is surely a testament to our good predictive talents. Of course, the domains in which we make these reliable predictions often relate only to everyday life—what other people will say or other drivers do. They are mundane, almost by definition. But even mundane predictions become mystifying once one tries to understand the process by which the predictions are being made, or tries to reproduce them in a computer. ([Location 283](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=283))
- From this viewpoint, the general disappointment that the world financial crisis had not been better predicted was not based entirely on naïve illusion. It was based on the well-justified high regard we have for our predictive abilities, and so it would be clearly to our advantage to identify why they failed. It may be that the world was changing in such a random fashion that the past did not even implicitly contain reliable information about the future. Or perhaps the past did indeed contain this information, but it was somehow so complex that it was not practically feasible to dig it out. A third case is that prediction was indeed feasible, but the wrong algorithm or the wrong data had been used. The study of ecorithms is concerned with delineating among these possibilities. ([Location 287](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=287))
- when seeking to understand the fundamental character of life, learning algorithms are a good place to start. ([Location 306](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=306))
- In the words of computer scientist Donald Knuth, “Science is what we understand well enough to explain to a computer. Art is everything else we do.” ([Location 331](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=331))
- That evolution could work in principle in some infinite limit is obvious and needs little discussion. But modern humans are believed to have existed for no more than about 10,000 generations and with modest population sizes for much of that history. Our predecessor species may have had not dissimilar statistics. Theories of evolution that assume unbounded resources for evolution, in generations or population sizes, or those that do not address this issue at all, cannot resolve the central scientific question of whether some instance of natural selection does fit the constraints that have ruled in this universe. ([Location 408](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=408))
- Computer science is no more about computers than astronomy is about telescopes. EDSGER DIJKSTRA ([Location 467](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=467))
- Turing’s paper contained several ingredients that are now seen as fundamental to the study of computation. First, he described a model, now called the Turing machine, that captures the phenomenon he was attempting to describe, namely that of mechanistic step-by-step procedures. Second, he proved a strong possibility result for what can be achieved on his model. In particular, he showed how to design a universal Turing machine that is capable of executing every possible mechanical procedure. This universality property is what enables computer technology to be so pervasively useful, and would be utterly astonishing were it not so commonplace now by virtue of its effectiveness. Third, Turing also proved a strong impossibility result, that not all well-defined mathematical problems can be solved mechanically. ([Location 496](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=496))
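
  As a concrete illustration of the mechanistic step-by-step procedure the highlight describes, here is a minimal Turing machine simulator sketched in Python; the tape encoding and the example machine (which increments a binary number) are illustrative choices, not taken from the book.

  ```python
  # A Turing machine: a finite control, a tape of symbols, and a transition
  # table mapping (state, symbol) -> (new state, symbol to write, head move).
  def run_tm(transitions, tape, state="start", blank="_", max_steps=10_000):
      tape = dict(enumerate(tape))      # sparse tape indexed by integer position
      head = 0
      for _ in range(max_steps):        # crude step bound, since halting cannot
          if state == "halt":           # be decided in general
              break
          symbol = tape.get(head, blank)
          state, write, move = transitions[(state, symbol)]
          tape[head] = write
          head += 1 if move == "R" else -1
      cells = [tape.get(i, blank) for i in range(min(tape), max(tape) + 1)]
      return "".join(cells).strip(blank)

  # Example machine: binary increment. Scan right past the digits, then carry
  # leftward, turning 1s into 0s until a 0 or a blank is reached.
  INCREMENT = {
      ("start", "0"): ("start", "0", "R"),
      ("start", "1"): ("start", "1", "R"),
      ("start", "_"): ("carry", "_", "L"),
      ("carry", "1"): ("carry", "0", "L"),
      ("carry", "0"): ("halt", "1", "R"),
      ("carry", "_"): ("halt", "1", "R"),
  }

  print(run_tm(INCREMENT, "1011"))      # -> 1100
  ```
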
- We have seen that an essential ingredient of the Turing methodology is that of defining a model of computation that captures a real-world phenomenon, in this case that of mechanical processes, including those that no one had (or has yet) envisaged. That last part is crucial: With his machine, Turing aimed to capture all processes a human could exploit while performing a mental task that can be regarded as mechanical as opposed to requiring creativity or inspiration. The audaciousness of the attempt has attracted many who would prove Turing’s machine insufficient to the power Turing claimed for it. However, when different individuals have tried to define their own notions of mechanical processes in hopes of creating models of greater power, all the models they have devised—no matter how different they may seem—could be proved to have no greater capabilities than those of Turing machines. For example, having two tapes, or five tapes, or a two-dimensional tape adds no new power. Similarly, allowing the program to make random decisions, or transitions that have the parallelism suggested by quantum mechanics, also adds no new capabilities. Extensive efforts at finding models that have greater power than Turing machines, but still correspond to what one would instinctively regard as mechanical processes, have all failed. Therefore there is now overwhelming historical evidence that Turing’s notion of computability is highly robust to variation in definition. This has placed Turing computability among the most securely established theories known to science. ([Location 541](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=541))
- For learning and evolution, robust models are as indispensable as they are for general computation. Without this robustness the value of any model or theory is questionable. We are not interested in properties of arbitrary formalisms. We want some assurance that we have captured the characteristics of some real-world phenomenon. Robustness of models is the only known source of such assurance. ([Location 552](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=552))
- Prior to Turing, mathematics was dominated by the continuous mathematics used to describe physics, in which (classically, anyway) changes are thought of as taking place in arbitrarily small, infinitesimal increments. The Turing machine, however, is a discrete model. Before his time, discrete mathematics had been little explored or developed; in fact, a seldom discussed influence of Turing’s work is the rise of discrete mathematics subsequent to it. It is striking that for the phenomena that we shall study here, including learning and evolution, discrete models again provide the most immediate robust models and have been most useful in isolating the basic phenomena. Continuous models are ultimately of at least as great interest, but for the initial explorations necessary to identify the most fundamental concepts they are not the most fruitful. ([Location 591](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=591))
- For long multiplication the actual number of operations on individual pairs of digits may be 4n^2 or 5n^2 or cn^2 for some fixed number c, depending on what exactly you consider an operation. It will not, however, grow faster than n^2, such as n^3 or n^4; nor will it grow more slowly, such as n^1.9. We can describe the order of growth using what is known as O notation, while omitting the less important detail of the value of c. We simply say that the long multiplication algorithm is an O(n^2) algorithm. ([Location 621](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=621))
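
  A small sketch (illustrative, not from the book) that performs schoolbook long multiplication on digit lists and counts the digit-pair operations, making the roughly c·n^2 growth visible:

  ```python
  # Schoolbook long multiplication of two n-digit numbers: exactly n*n
  # elementary digit-pair products, i.e. an O(n^2) algorithm.
  def long_multiply(a_digits, b_digits):
      """Multiply two numbers given as digit lists (least significant digit first)."""
      ops = 0
      result = [0] * (len(a_digits) + len(b_digits))
      for i, a in enumerate(a_digits):
          for j, b in enumerate(b_digits):
              ops += 1                      # one digit-pair multiplication
              result[i + j] += a * b
      carry = 0                             # single linear carry-propagation pass
      for k in range(len(result)):
          total = result[k] + carry
          result[k], carry = total % 10, total // 10
      return result, ops

  for n in (2, 4, 8, 16):
      digits = [7] * n                      # the n-digit number 77...7
      result, ops = long_multiply(digits, digits)
      assert int("".join(map(str, reversed(result)))) == int("7" * n) ** 2
      print(n, ops)                         # ops grows as n^2: 4, 16, 64, 256
  ```
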
- While the boundary between the practically computable and the infeasible is not sharp, the polynomial time criterion is the most convenient place that has been found to put that boundary. Clearly, polynomial time with high exponent, such as n^100, is as infeasible in practice as exponential time, even for modest values of n. However, the polynomial versus exponential distinction has proved very useful, simply because the majority of algorithms that are known for important problems conveniently dichotomize between feasibly low degree polynomials, such as quadratic, O(n^2), and proper exponentials, such as 2^n. Hence, for reasons that are not understood, this polynomial versus exponential criterion is more useful in practice than its bare definition justifies. ([Location 670](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=670))
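
  A quick back-of-the-envelope comparison (a reader's illustration, not from the book) of how a quadratic count, a degree-100 polynomial, and a proper exponential behave at modest input sizes:

  ```python
  # The practical force of the polynomial/exponential boundary: at modest n a
  # quadratic cost stays tiny, while a degree-100 polynomial and a proper
  # exponential are both already astronomical.
  for n in (10, 30, 60, 100):
      print(f"n={n:>3}   n^2={n**2:>6}   n^100={float(n)**100:.1e}   2^n={2**n:.1e}")
  ```
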
- One can extend the definition of standard deterministic Turing machines to allow them to make decisions according to the toss of a coin in this way. These are called randomized Turing machines. The corresponding polynomial class is called BPP, for bounded-error probabilistic polynomial time. It is possible that P and BPP are mathematically identical, in which case every computation that uses randomization could be simulated in polynomial time by one without it. This question of whether P and BPP are equal is currently unresolved. A class broader still than BPP is called BQP, for bounded-error quantum polynomial time. This class is inspired by quantum physics, which posits that a physical system can be in multiple states at the same time, in a certain specific sense. ([Location 689](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=689))
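
  As an illustration of the kind of coin-tossing computation that BPP formalizes, here is a sketch of the Miller–Rabin primality test, a standard randomized algorithm (the example is a reader's addition, not the book's): a "composite" answer is always correct, and the chance of wrongly answering "probably prime" shrinks by a factor of at least four with every additional random round.

  ```python
  import random

  def miller_rabin(n, rounds=20):
      """Randomized primality test: 'composite' is always correct, and the
      chance of wrongly reporting 'probably prime' is at most 4**(-rounds)."""
      if n < 2:
          return False
      for p in (2, 3, 5, 7):                 # dispose of small factors first
          if n % p == 0:
              return n == p
      d, s = n - 1, 0                        # write n - 1 as d * 2**s with d odd
      while d % 2 == 0:
          d, s = d // 2, s + 1
      for _ in range(rounds):
          a = random.randrange(2, n - 1)     # one coin-tossed base per round
          x = pow(a, d, n)
          if x in (1, n - 1):
              continue
          for _ in range(s - 1):
              x = pow(x, 2, n)
              if x == n - 1:
                  break
          else:
              return False                   # witness found: definitely composite
      return True                            # probably prime

  print(miller_rabin(2**61 - 1))   # a Mersenne prime -> True
  print(miller_rabin(29341))       # 29341 = 13 * 37 * 61 -> False (error prob. <= 4**-20)
  ```
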
- One can define the class PhysP to be the maximal class of problems that the physical universe we live in permits to be computed in polynomial time. Identifying the limits of the class PhysP would appear to be one of the great scientific questions of our time. BQP is a natural candidate. If it turns out not to be realizable, then BPP is the most natural alternative known. ([Location 707](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=707))
- One thing we do know is that each of these three classes—P, BPP, and BQP—itself has substantial robustness under variation. Attempts to characterize deterministic, randomized, or quantum computations have yielded just one good candidate computational model for each class. This robustness for the randomized and quantum classes is known only for problems with yes/no answers. This currently leaves two main candidates, BPP and BQP, for the practically computable yes/no problems in this physical world. ([Location 714](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=714))
- A celebrated class of problems is the so-called NP, or nondeterministic polynomial time, class. These are characterized as the problems for which solutions may or may not be hard to find but for which a candidate solution is easily verified. ([Location 752](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=752))
- The primality problem is the problem of determining for an arbitrary number x whether it has factors other than itself and 1. It is an NP problem since this verification of a particular candidate solution can be done in polynomial time (i.e., as we have observed, given an n-digit number x and two further numbers p and q, we can verify whether pq = x in O(n^2) steps). It turns out that, in fact, there do exist some very clever algorithms that can determine in polynomial time whether a number is prime. They reveal whether factors exist, but, curiously, not what these factors are. Hence this particular NP problem of determining primality is in fact also in P. ([Location 762](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=762))
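
  The verification step mentioned in the highlight is tiny in code; this sketch (illustrative, not from the book) checks a claimed factorization p · q = x, the certificate that places the problem in NP, while finding such a p and q is the part for which no fast algorithm is known.

  ```python
  def verify_factor_certificate(x, p, q):
      """Check a claimed witness that x is composite: one big-integer
      multiplication (O(n^2) digit operations by the schoolbook method)
      plus trivial range checks."""
      return 1 < p < x and 1 < q < x and p * q == x

  print(verify_factor_certificate(91, 7, 13))   # True: 7 * 13 = 91
  print(verify_factor_certificate(97, 7, 14))   # False: 97 is prime
  ```
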
- The importance of NP is that it captures the very general process of mental search. We call these problems mental search because they can be solved by searching objects one generates internally in one’s head or computer. ([Location 780](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=780))
- Remarkably, it has been shown that all the members of a very large class of NP problems are in fact provably equivalent to each other, in the sense that a polynomial time algorithm for one would give a polynomial time algorithm for any other. This class has the further remarkable property that each member is provably the hardest member of NP. In other words, for these so-called NP-complete problems, no one currently knows a polynomial time algorithm for any of them, but if someone did find such an algorithm for any one, then polynomial time algorithms would follow for all problems in NP. ([Location 790](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=790))
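
  A reader's sketch (not from the book) of the asymmetry behind NP-completeness, using Boolean satisfiability, the classic NP-complete problem: checking one candidate assignment against the formula takes polynomial time, while the only general method known is to search an exponentially large space of assignments.

  ```python
  from itertools import product

  # A formula in conjunctive normal form: each clause is a list of literals,
  # where +i means "variable i is true" and -i means "variable i is false".
  FORMULA = [[1, -2, 3], [-1, 2], [2, -3], [-2, -3]]

  def satisfies(assignment, formula):
      """Polynomial-time verification of one candidate assignment."""
      return all(any((lit > 0) == assignment[abs(lit)] for lit in clause)
                 for clause in formula)

  def brute_force_sat(formula, num_vars):
      """Exhaustive search: up to 2**num_vars candidate assignments."""
      for values in product([False, True], repeat=num_vars):
          assignment = {i + 1: v for i, v in enumerate(values)}
          if satisfies(assignment, formula):
              return assignment
      return None

  print(brute_force_sat(FORMULA, 3))   # e.g. {1: False, 2: False, 3: False}
  ```
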
- Because intensive efforts to find polynomial time algorithms for NP-complete problems have to date failed, many currently conjecture that P ≠ NP, or equivalently, that no polynomial time algorithm exists for NP-complete problems. (NP-complete problems are similarly conjectured not to be in BPP or BQP either.) Whether this conjecture is in fact a computational law, like Turing’s proven assertion that the Halting Problem is not computable, is potentially resolvable, and I expect that it will be proved or disproved one day. ([Location 809](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=809))
- More interestingly, there are many natural problems where testing whether there exists a solution is in P but counting their number is #P-complete. This means that while the existence of solutions can be detected fast, counting the number of solutions is as hard as for NP-complete problems. Examples of such problems abound in the context of reliability—for example, where one wants to determine the probability that a complex network or system will fail from the failure probabilities of the components. ([Location 827](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=827))
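
  A sketch of the reliability example (the little network, its failure probabilities, and the code are illustrative assumptions, not from the book): checking whether one particular pattern of failed links still connects the two endpoints is a fast graph search, but computing the exact reliability this way sums over all 2^m subsets of links.

  ```python
  from itertools import product

  # A small 'bridge' network whose links work independently with probability 0.9.
  EDGES = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
  P_WORKS = 0.9

  def connected(working_edges, source="s", target="t"):
      """Polynomial-time check: does one failure pattern leave an s-t connection?"""
      reachable = {source}
      changed = True
      while changed:
          changed = False
          for u, v in working_edges:
              if u in reachable and v not in reachable:
                  reachable.add(v)
                  changed = True
              if v in reachable and u not in reachable:
                  reachable.add(u)
                  changed = True
      return target in reachable

  def exact_reliability(edges=EDGES, p=P_WORKS):
      """Exact s-t reliability by brute force: a sum over all 2**m link subsets."""
      total = 0.0
      for states in product([True, False], repeat=len(edges)):
          working = [e for e, up in zip(edges, states) if up]
          prob = 1.0
          for up in states:
              prob *= p if up else (1 - p)
          if connected(working):
              total += prob
      return total

  print(round(exact_reliability(), 4))   # -> 0.9785 for this five-link network
  ```
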
- I hope it will not shock experimental physicists too much if I say that we do not accept their observations unless they are confirmed by theory. ARTHUR EDDINGTON ([Location 985](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=985))
- It may seem elementary, but it is still noteworthy that the way the information is represented in DNA is the same as it is represented in a Turing machine, as a sequence of symbols from a fixed alphabet. In the case of DNA the alphabet has the four symbols A, G, T, C, standing for the four nucleobases. And as Turing showed, a one-dimensional sequence of symbols from a fixed finite alphabet can describe and support all computations. ([Location 999](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=999))
- After Alan Turing died, Max Newman, his mentor and friend, described in an obituary the central theme that had inspired Turing’s many contributions to science: “The varied titles of Turing’s published work disguise its unity of purpose. The central problem with which he started, and to which he constantly returned, is the extent and the limitations of mechanistic explanations of nature.” ([Location 1054](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=1054))
- Some have complained that the favored metaphor for the brain in every age has been the most complicated mechanism known at the time. Since the computer is currently that most complex mechanism, is it not a fallacy to adopt that metaphor? I would argue that the computer analogy goes beyond the fact that the computer is another complicated mechanism. What makes it different this time is the widely agreed universality of computation over all processes that we regard as mechanistic. ([Location 1074](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=1074))
- Turing and von Neumann both shared a second crucial insight: that mathematical logic, from which computation theory had emerged, was not the right grounding for a computational description of either thinking or life. In particular, Turing has the following memorable conclusion to his paper describing noncomputability results in logic: “The results which have been described in this article are mainly of a negative character, setting certain bounds to what we can hope to achieve purely by reasoning. These, and some other results of mathematical logic, may be regarded as going some way toward a demonstration, within mathematics itself, of the inadequacy of ‘reason’ unsupported by common sense.” This passage may be the first occurrence in science of the idea that common sense is somehow superior to reason. It foreshadows ample computational experience in the years that followed. While computers are extremely good at reasoning using mathematical logic, they find common sense much more challenging. ([Location 1078](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=1078))
- The critical idea in PAC learning is that both statistical and computational phenomena are acknowledged, and both are quantified. There have been earlier attempts to model induction using purely computational or purely statistical notions. However, I believe that combining the computational and the statistical provides the key to understanding the rich variety of learning phenomena that exist. ([Location 1353](https://readwise.io/to_kindle?action=open&asin=B00BE650IQ&location=1353))