Linguistics as a Cognitive Science
Linguistics 260 286 01 and 260 386 01
John Goldsmith
Winter 2000
Cobb 214
Last updated: January 10, 2000
Time: Tuesday/Thursday 9 AM - 10:20 AM
Abstract of the course:
This course is intended for the upper-level undergraduate, though we're happy to include graduate students too. I taught this course last year (Fall 1998) when it consisted mainly of graduate students, and the document that follows outlines the material that we covered. When I teach it again in the Winter quarter of 2000, I'll give more background in linguistics as well as in the other topics.
The first large part of the course will cover the history of treatments of mind from a cognitive point of view, divided into roughly four parts: (i) before the 20th Century; (ii) the immediate post-war period, the period of great excitement associated with "cybernetics"; (iii) the first cognitive revolution, lasting roughly from 1956 to 1966; and (iv) the second cognitive revolution, beginning in the late 1980s and lasting through today.
In the second part of the course, we'll talk about neural nets and related ideas, and link these ideas to questions of language and linguistics. We'll look at some particular studies of language from a quantitative point of view, and consider briefly some larger issues regarding evolution, innateness, and complexity.
Two questions will form the unifying thread of the discussion in the class. The first emphasizes the more traditional symbolic way of thinking; the second emphasizes the modes of thinking that will be the focus of the second half of the course:
1. What is a syntactic category, and what is a syntactic constituent? Are there constituents in morphology and phonology in the same sense?
2. What kind of knowledge can be quantified, i.e., be expressed on a scale of 0 to 1? What kind cannot be? Is there a kind of formal device which is not enhanced by thinking about it in a quantifiable fashion?
Texts:
The Mind's New Science, by Howard Gardner, Basic Books
Artificial Intelligence: The Very Idea, by John Haugeland, MIT Press
Recommended:
The Quark and the Jaguar, by Murray Gell-Mann (Freeman)
Organization of the course: Winter 2000
Week 1. Overview. 2000_Class1.ppt - Powerpoint for Class 1 (overview)
Week 2
Powerpoint: Philosophical background
Week 3
History of psychology: See http://serendip.brynmawr.edu/Mind/Table.html
Week 4
Powerpoint: First cognitive revolution
Week 5
Neural networks
Powerpoint: Connectionism 1
Week 6
Week 7
N-spaces: Powerpoint slides on n-spaces, vectors, matrices, and the like
Week 8
Probability: Powerpoint slides on probability
Week 9
a. entropy and information
b. minimum description length
Related topic: Zipf's law: see Wentian Li's page: http://linkage.rockefeller.edu/wli/zipf/
Week 10
Automatic learning of morphology
Linguistics in the context of the cognitive and computational sciences:
Big questions, like the relations among:
Material from last year's syllabus....
Part III Modeling intelligence and modeling learning.
Babies learn human languages; how hard can it be to get a computer to do it?
Innateness, and unsupervised learning.
Evolution and genetic algorithms.
Self-organization and emergence.
Readings: See the end of this document.
More detailed contents of the course as offered in 1998
Part 1: Overview
Week 1. Overview
Readings:
Highly recommended: "Some Historical Notes" by Wilfrid Rall, and "Brain Metaphor and Brain Theory"; these are Chapters 1 and 2 of Computational Neuroscience, ed. Eric Schwartz, MIT Press (1990) -- an excellent collection, by the way, expressing the point of view of the network-inclined in the late 1980s.
and: "The Prehistory of Android Epistemology," by Clark Glymour, Kenneth Ford, and Patrick Hayes. 1995. Reprinted in George F. Luger (ed.) Computation and Intelligence, MIT Press, 1995 (there are several very good papers in this collection).
Two good book-length studies of the rise of "cognitive science" are:
Also very good is Aux origines des sciences cognitives, by Jean-Pierre Dupuy, focusing on the first wave of cognitive science, that which flew under the banner of cybernetics: the days of Wiener, von Neumann, McCulloch and Pitts. (La Découverte 1994).
Highly recommended are several readings in Neurocomputing: Foundations of Research, MIT Press 1988, ed. James A. Anderson and Edward Rosenfeld (note that there are 2 volumes in this series of reprints, both of which are outstanding, and deserve a place in everyone's library):
David Marr, Vision (1982): Chapter 1: The Philosophy and the Approach.
History of psychology: See http://serendip.brynmawr.edu/Mind/Table.html
Some basic themes:
1. Is the mind a computer? Is it like a computer? If so, like what kind of computer? What is a computer? Data, memory, and program in a computer; logical branching in programs.
2. Is knowledge of language a kind of thought or thought process? Is the study of language the study of cognition? Central areas of cognition: reasoning, problem-solving, perception, categorization, memory, learning...music, mathematics.... In what ways is knowledge of language similar to and different from these other faculties? In what way is language a necessary condition for these other faculties? Does that matter? One tradition puts language, thought, and cognition as central exemplars of cognitive processes. Another puts the more "conscious" activities at the center, such as problem-solving, planning, and attention. If we equate cognitive studies with studies of human information-processing, where does that situate cognitive studies?
There are three crude ways for language and cognition to relate to one another: which one is right? or is some combination of 2 and 3 correct?
3. Search versus inference as the central metaphor for modeling cognition.
Pylyshyn (p. 193)
"The discussion of cognition...portrays mental activity as involving consideration ratiocination, or "reasoning through."....I argue that rationality and the application of truth-preserving operations remains the unmarked case....the mind is depicted as continually engaged in rapid, largely unconscious searching, remembering and reasoning and generally in manipulating knowledge -- that is, "cognizing." [what kind of search, though...]
Reading: Marvin Minsky, "Steps Toward Artificial Intelligence" (Proceedings of the Institute of Radio Engineers (IEEE) January 1961), reprinted in Luger, on the notion of search, hill-climbing, solution spaces. Reinforcement and learning. Credit-assignment problem.
4. Algorithms; resources required, and limitations on computational resources.
5. Memory; learning.
(Pylyshyn (65): "...Chomsky and others' methodological injunction not to hypothesize learning mechanisms for the acquisition of certain skills until one has a good theory of the steady-state skill itself.")
6. Time. What role does time play in studies of cognition and language? In what ways do theories of language use the fact that speech occurs in time? Our linguistic representations on a blackboard invoke the implicit assumption that all of a representation is computationally accessible at the same time, and make no use of the unrolling of a representation in time. In the physical world, causality is unidirectional: the past influences the future, but not vice versa. In linguistics, the later can influence the earlier just as easily as the earlier can influence the later. Also: discrete versus continuous character of linguistic time and real time.
7. Maximizing an objective function; the mathematics of that.
8. Marr's three levels of analysis: computation, algorithm, mechanism.
What does this mean in linguistics? Example: Dynamic computational nets (Goldsmith and Larson). Categorization; prototypes; lists of properties; invariant properties.
9. Conscious versus unconscious knowledge.
10. The difference between architectural assumptions (what linguists often call UG) and the search for emergent properties.
11. How do computational systems give linguists the structures they want and/or need? Examples: 1. how to deal with redundancy (e.g., Jackendoff 1975 on the lexicon; lexical phonology on markedness and the lexicon). Also: Linear representations: phonology, syntax; hierarchical structure in syntax; autolexical representations; content-addressable memory for the lexicon: hashes and Hopfield nets.
12. Everyone agrees that scientific analysis involves decomposition into component parts. The symbolic paradigm sees the component parts as computational entities which have labels and labeled pointers (where pointers point to other computational entities). The alternative connectionist (in a very broad sense) paradigm sees the component parts as the "dimensions" (that is, the basis vectors) of a large space, and the system being studied is a location in that n-space.
13. It would be good to talk about desiderata of theory-construction. Chomsky has argued for lean theories with little redundancy and "rich deductive structures". Bill Wimsatt has made the case for "robust" theories -- that are reliable and overdetermined. Early cognitive scientists (notably von Neumann) were concerned with the properties that make for reliability -- in nervous systems, in computers, and so forth. See Wimsatt "Robustness, Reliability, and Overdetermination," in M. Brewer and B. Collins, Eds. Scientific Inquiry and the Social Sciences. Jossey-Bass, San Francisco, 1981, pp. 124-163.
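Themes 3 and 7 above mention search, hill-climbing, and maximizing an objective function. As a minimal sketch of the idea (not Minsky's own formulation), the Python below climbs a toy one-dimensional landscape. The names `hill_climb`, `score`, and `neighbors`, and the `heights` data, are hypothetical and chosen only for illustration.

```python
def hill_climb(score, start, neighbors, max_steps=1000):
    """Greedy hill-climbing: move to the best neighbor until none improves the score."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current  # no neighbor is better: a (possibly local) maximum
        current = best
    return current


# Hypothetical landscape: the objective function is just the height at each position.
heights = [1, 3, 2, 5, 9, 4]


def score(i):
    return heights[i]


def neighbors(i):
    # Positions one step to the left or right that stay inside the landscape.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]


stuck = hill_climb(score, 0, neighbors)  # halts at position 1, a local peak
found = hill_climb(score, 2, neighbors)  # reaches position 4, the global peak
print(stuck, found)
```

Starting from position 0 the climber halts on the small peak and never sees the higher one, which is one reason the choice of search strategy matters.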
Classes this week:
Class 1. Overview of the whole quarter's syllabus.
What are we talking about (when we talk about cognitive science)? Cognitive metatheory: (Baars) "...a belief that psychology studies behavior in order to infer unobservable explanatory constructs, such as 'memory,' 'attention,' and 'meaning.'" (144). "The cognitive revolution took place in many places at the same time, and involved a number of areas, including memory, language, imagery, and attention." (147) "...a metatheory that encourages one to infer unobservable theoretical constructs from empirical observations." (158).
Cognitive science (Gardner): "a contemporary, empirically based effort to answer long-standing epistemological questions -- particularly those concerned with the nature of knowledge, its components, its sources, its development, and its deployment." (6)
To summarize, we can talk of a first cognitive revolution, the one that took place in the 1950s (and which is the hero of Gardner's book, inter alia) -- and a second -- well, if not revolution, then period of civil discontent, starting in the mid 1980s and continuing to this day. The first cognitive revolution was based on the rise of the computer and the metaphor of mind as an information-processor. The second skirmish pits the symbolic view of cognition against...subsymbolic, or dynamical, or connectionist points of view. These two periods are the theme of this course, and our goal is to understand the points of view, and to learn enough about them to let them influence the way we think.
Class 2. The origins of cognitive science.
The rise of the notion of law and science, with Isaac Newton's (1643-1727) work the archetype. Descartes (1596-1650) had resisted the assimilation of mind to matter, in part because of the paradox which is still not resolved: how can two sets of laws, apparently oblivious to each other, both describe the same real-time event: the laws of physics and (say) the laws of baseball? But the successes of physics strengthened the desire to provide a mechanistic account of man's central faculties, his reason (and language). Julien de la Mettrie (1748) L'homme machine. 19th century work on logic: Boole, Peirce. David Hilbert's project in mathematics. Turing. Information theory (Claude Shannon). The rise of the modern computer. Minsky, Papert, McCarthy, Simon.
Psychologists outside the US, primarily: Gestalt psychology in Germany (Kurt Lewin); Jean Piaget in Switzerland; Vygotsky and A.R. Luria in Russia. Starting in the 1950s in the US: George Miller, Jerome Bruner, Herbert Simon.
Viewed up close, the cognitive revolution appeared to many (and appealed to them, for this reason) as a methodological liberation. Baars (op.cit.) makes this point clearly:
The strict behaviorist (such as John B. Watson) imposed strict methodological constraints: The peculiarity of scientific psychology is...that scientists are conducting their research as observers, standing outside their subject matter, even as each scientist is ultimately an insider to the subject matter of psychology. Watson's solution to this dilemma was radical: let us pretend that we are ultimate outsiders, that we are studying animals and people only as bodies moving through space. In Watson's view, then, there is no room in psychology for anything that cannot be externally observed: no room for consciousness, purpose, thought, meaning, feelings, imagery, self, and the like. (43).
Baars: "...cognitive psychology is primarily a metatheory for psychology, one that simply encourages psychologists to do theory, relatively free from prior philosophical constraints." (144) (emphasis in original). On Baars' view, this liberation derived from two trends: "First, developments associated with the theory of computation led some psychologists and neurophysiologists to view the nervous system as a kind of information processor, a theoretical metaphor that made it legitimate to think in terms of goals and representations." (And second, psychologists were able to find ways to use behaviorist experimental paradigms to test such constructs as mental imagery, etc.)
Information theory (Shannon, 1948) performed a slam-dunk repudiation of the notion that cognitive content was ethereal and unquantifiable. Though Shannon's information is not strictly equivalent to the everyday common sense notion of information, it was close enough (and the names were close enough) to make it plain that the shoe was now on the other foot: information was henceforth on the side of hard, cold science.
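To make the point about quantifying information concrete, here is a minimal sketch of Shannon's entropy, the sum over symbols of p * log2(1/p), measured in bits. The function name `entropy_bits` and the coin-flip strings are assumptions made for illustration; probabilities are simply estimated from symbol counts.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Shannon entropy, in bits per symbol, of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = len(symbols)
    probs = [c / total for c in counts.values()]
    return sum(p * math.log2(1 / p) for p in probs)


fair = entropy_bits("HT")       # two equally likely outcomes
biased = entropy_bits("HHHT")   # one outcome three times as likely as the other
certain = entropy_bits("HHHH")  # only one possible outcome
print(fair, biased, certain)
```

The fair coin carries a full bit per flip, the biased coin less than a bit, and the certain outcome carries none: a quantity on a numerical scale, as the paragraph above says.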
1946-1953: a series of Macy Foundation-sponsored meetings bringing together mathematicians, psychologists, etc., including: Gregory Bateson, Margaret Mead, John von Neumann, Norbert Wiener, Warren McCulloch, Walter Pitts, Wolfgang Köhler, Kurt Lewin, and many others -- investigating the parallels between machines and living organisms (Baars, p. 155, citing Heims 1975). See the book by Dupuy cited at the top of this document.
1956: MIT IRE Symposium on information theory: George Miller, Noam Chomsky, Jerome Bruner, Newell, Simon.
von Neumann's posthumous The Computer and the Brain (1958).
Class 3. Some background in philosophy.
Philosophy of mind. Several themes in the history of philosophy have had an impact on the origin of cognitive science. Modern philosophy traditionally begins with Descartes, who grappled with the epistemological side of the rise of capitalism and Protestantism: if tradition does not provide the grounds for certainty, then what does? Descartes' answer was: first of all, introspection does; and second of all, God does -- and way down the list were the senses and perception. The British empiricists, notably Locke and Hume, responded by saying that their introspection granted them no certainty, and that all that they found in their minds was sensory impressions anyway. Kant, at the end of the 18th century, accepted both sides of the argument, and said that there must be a mind that provides a scaffolding, an operating system we might say today, which provides the grounds for the categories of thought that imbue all perception, such as time, unity, and causality. After Kant, there is less of a clear linear evolution of philosophical thought. One tradition following Kant proceeded through Fichte, Hegel, and Karl Marx, coming round full circle to denying Descartes' original obsession with getting rid of the outer world of human history. For Marx, the ground of knowledge is in human history, and the history of human action. Americanized (stripped of anti-capitalist fervor), this led to John Dewey's (1859-1952) version of pragmatism (Dewey was here at the U of C from 1894 to 1904), which viewed philosophy and psychology as two sides of the same subject (William James, leading psychologist of the time, was centrally involved in the development of this form of pragmatism). Modern-day pragmatists such as Richard Rorty continue to be influential, and to link philosophical concerns to current social issues.
This version of pragmatism inherits from its post-Kantian roots (that is, from the Hegel-Marx roots) a rejection of both the familiar views of truth (correspondence theory of truth: statement P is true iff its logical form is in the right relationship to an occurring state of affairs; coherence theory: a statement's truth is determined by its coherence with a wide range of other statements, ranging from low-level observations to high-level generalizations) in favor of a more action-oriented view of the truth: the truth is what works, all things considered. It seems to me that symbolic-cognitivists (e.g., Fodor) maintain the traditional view of knowledge, while the connectionists (etc.) follow a pragmatic line: we don't have prior knowledge of what knowledge-representations must look like; we must simply build our cognitive models, and see how well they work, doing creative things as human beings do (e.g., pattern completion, content-addressable memory). The cognitivist is essentially and ineluctably tied to supporting the centrality of representations, a notion which presumes an outer world such that we can sensibly ask whether it does or doesn't match up with our representations.
Check out: http://cogweb.english.ucsb.edu/CogSci/Empiricism.html
Harry Bracken, in a paper heavily influenced by Chomsky's view of Descartes and of language:
http://www.mullasadra.org/papers/harry_m_bracken.html
Locke: http://cogweb.english.ucsb.edu/CogSci/Locke.html
http://landow.stg.brown.edu/victorian/religion/locke1.html
http://www.maths.tcd.ie/pub/HistMath/People/Leibniz/RouseBall/RB_Leibnitz.html
William James: http://wabakimi.carleton.ca/~dreddick/topic.html
John Searle's celebrated Chinese room conundrum -- does it speak Chinese, or doesn't it? -- and the discussion that followed it touched a nerve that was already bared due to the conflicted feelings about subjectivity in cognitive sciences. Most cognitive scientists have felt a thorough commitment to modeling (cf. Dupuy), some version of functionalism, and thus to simply jettisoning concern for subjectivity. Philosophers are very much divided on this subject, and some feel that it is their professional duty to keep alive the thought that the explanation of consciousness as a subjective phenomenon is the ultimate problem (for example, David Chalmers: see that web page).
Philosophy of science. Another philosophical tradition is important in this story, developing in central Europe towards the end of the 19th century. First, a highly skeptical philosophy of science arose, of which the prodigious Ernst Mach (1838-1916) was a leading voice. He was a thorough-going empiricist of the Humean brand: the foundation of knowledge is sensory impression, on his view. But unlike Hume, he was committed to turning this into a science, not a drawing-room skepticism, and indeed much of sensory psychology finds its roots in this work. Mach's position also led him naturally to the notion that scientific generalization was a compact statement of observations, and to the notion that one could explicitly seek the simplest statement for a given set of observations: a notion that directly leads to contemporary notions such as Minimum Description Length (Rissanen; see Week 6 below). Mach was always on the warpath against belief in abstractions that got too big for their britches: they had to be kept thoroughly in place, and reminded that they were there simply and totally in order to simplify descriptions of the phenomena. In this way he challenged the reality of Newton's absolute space and time, and was instrumental in opening up physicists' thoughts to the possibility of a new conception of space and time, as proposed then by Poincaré and by Einstein. (See http://www.weltkreis.com/mauthner/mach.html, or http://www.weltkreis.com/mauthner/hist/mach4.html) In this sense, Mach's skepticism was extremely liberating: we must never forget that powerful themes get that way because they are, for a time, liberating, even if they later become stultifying. 
In psychology, the emphasis on sensations was liberating early on, but the emphasis on what is physically measurable and the reality only of space and time, as emphasized by the logical positivists (growing out of Vienna, earlier Mach's home base; the Vienna Circle literally grew out of a group named after Mach) such as Rudolf Carnap (who moved to the U of Chicago in 1936), had a natural fit with the American brand of behaviorism established by John B. Watson and later the now better known B. F. Skinner (on logical positivism, see http://www.utm.edu/research/iep/l/logpos.htm). You'll recall that logical positivism was based on the principles that (verificationism) the meaning of a proposition is the method of verifying it; (physicalism) that the observations relevant to verifying are observations in space and time; that what does not have a method of being verified is either logic, convention, or nonsense -- metaphysics is part of nonsense. It was against this prevailing and dominating trend in American academic psychology that the cognitive revolution of the late 1950s took place, and that revolution was a liberating one. In general, the subparts of psychology that are termed cognitive are typically those areas in which, it can be argued, sensory impression is not the main part of the story: attention, memory, and language, for example. Attention is the archetypical case of something that cannot be reduced to sensory impression, for attention is precisely that which, from inside, determines the fate of the incoming sensory impressions.
Week 2 Symbolic approaches.
Reading:
1. "Computing in Cognitive Sciences": Zenon Pylyshyn (in Posner, ed.). Also: Zenon Pylyshyn, Computation and Cognition: toward a foundation for cognitive science. 1984. MIT Press/Bradford Books. Chapter 2 "The Relevance of Computation" is a good excerpt to read; chapter 7 criticizes some version of connectionism.
2. "Symbolic architectures for cognition." By Allen Newell, Paul Rosenbloom, and John Laird (in Posner, ed.)
3. "Grammatical Theory", Tom Wasow (in Posner).
Possible readings:
Fodor: Language of thought.
Lakoff: Linguistics and Natural Logic.
Chomsky, Rules and Representations.
In linguistics, the contrast between classical generative grammar and Principles and Parameters (PP) approaches. Representations built out of basic relations: linear order, constituency, properties as a map from unit to values. Focus on constituency: there should in general be a link between the syntax and the semantics of a representation. (Syntax and semantics in formal systems: semantics as link to world or model outside the formal system.) Pylyshyn (57): "the important thing is that, according to the classical view, certain kinds of systems, including both minds and computers, operate on representations that take the form of symbolic codes....the meaning of a complex expression depends in a systematic way on the meaning of its parts (or constituents)....the classical view assumes that both computers and minds have at least the following distinct levels of organization: the semantic level...the symbol level...and the physical (or biological) level." The semantic level explains (to the human researcher) what is happening at the symbolic level, and (in the case of an artifact -- a computer, not a human) is what we need to prove the accuracy/significance of any algorithm applying at the symbolic level. ("No computation without representation.")
Issue of control structure(s) in symbolic approaches, in CS more generally: feedback from environment. What role does this play in language? General structure of control in computer systems: (1) shift of control from one function/subroutine to another, as determined by the program explicitly; (2) shift instigated by an event outside the computer (e.g., the user at the console); (3) shift instigated within the operating system by a computational need (e.g., call for new memory when there isn't enough; division by zero); (4) shift when parallel threads compete for time or other resources.
A cognitive scientist in general must care not only about what mental processes are possible within the system, but also about what operations will be (or are likely to be) invoked at a certain time (with respect to internal or external conditions).
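The four kinds of control shift listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a model of any particular operating system: the names `run_demo`, `helper`, `on_keypress`, `safe_divide`, and `worker` are hypothetical, and the "external event" is simulated by an ordinary call, since a real one would arrive asynchronously.

```python
import threading


def run_demo():
    log = []

    def helper():
        # (1) Control arrives here because the program explicitly calls this function.
        log.append("explicit call")

    def on_keypress(key):
        # (2) A handler meant to run when an event arrives from outside the program.
        log.append("external event: " + key)

    def safe_divide(a, b):
        # (3) The runtime interrupts normal flow when a computation cannot proceed.
        try:
            return a / b
        except ZeroDivisionError:
            log.append("runtime trap: division by zero")
            return None

    counter = [0]
    lock = threading.Lock()

    def worker(n):
        # (4) Parallel threads compete for a shared resource; the lock arbitrates.
        for _ in range(n):
            with lock:
                counter[0] += 1

    helper()
    on_keypress("q")  # simulated here; a real event would arrive asynchronously
    safe_divide(1, 0)
    threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, counter[0]


log, total = run_demo()
print(log, total)
```

In cases (2) through (4) the program text alone does not say when control will shift, which is the point of asking which operations are likely to be invoked at a given time.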
One attractive aspect of this paradigm (or super-paradigm) is that it allows us to develop theories in which we explain things in terms of principles that make sense at a human level: he makes a phone call because he saw an accident and he wants to call the police. How could we explain that kind of behavior without recourse to knowledge of that sort? We impute such knowledge to the agents of whom (not which!) we develop explanatory models. (...but do we as linguists want to do that? not really; the implicit rule systems we impute do not sound like the kind of knowledge that we might otherwise impute to ourselves or others.)
Functionalism: (NB: this has nothing to do with the kind of functionalism of the Susumu Kuno-Bernard Comrie-Ed Keenan-etc. sort!) Functionalism (term coined by Hilary Putnam) is the view that holds that cognition can be studied as a set of algorithms divorced from the underlying mechanism on which they may be instantiated in humans, computers, or what have you. The hardware/software split is central to this perspective: for the overwhelming majority of purposes, what is interesting about a computer program can be studied entirely independently of what type of machine it is placed on. Indeed, a program written in a high-level language (C, LISP, etc.) will be translated into machine language, and much of the fine (local) structure of that translation is of no interest to the study of the algorithm embodied in the high-level program. (Others will challenge this functionalism, and thus the reliability of the hardware/software distinction for understanding the human brain.)
Discrete symbolic systems: Pylyshyn (p. 50f) "The only property symbol tokens have in these systems is a nominal one: type identity. A particular symbol token has to be recognizable as belonging to a certain type regardless of the context in which it occurs. Another way of stating this is to say that symbol tokens were themselves assumed to be unstructured; one could not say two symbol tokens are more or less similar, or that they are similar in certain ways, any more than one can say two electrons are more or less similar. Further, it was assumed that one could make indistinguishable copies of symbols (in order to identify two marks as distinct occurrences or tokens of the same symbol). The notion of a discrete atomic symbol is the basis of all formal understanding. Indeed, it is the basis of all systems of thought, expression, or calculation for which a notation is available. It is important to stress that such an idea not only has deep roots in what is sometimes called the intellectualist tradition, but that no one has succeeded in defining any other type of atom from which formal understanding can be derived. Small wonder, then, that many of us are reluctant to dispense with this foundation in cognitive psychology under frequent exhortations to accept symbols with such varied intrinsic properties as continuous or analogue properties. Unless these notions can be reduced to either atomic symbol foundations or to physical foundations, they remain intellectual orphans, hence, are a poor basis for explanation. The problem is, such notions lack systematic foundations; we do not know what can be done with them."
Heart of the symbolic view (Pylyshyn): "...if, in order to express some computational regularity in a finitary manner, we must refer to a structure of symbols, then the structure of the expressions must be reflected in the structure of the states of the system, as they were in our arithmetic example. Hence, it is not enough to describe the system as merely transforming state Sn into state Sn+1. ...the machine itself must be designed so its physical operation will be governed not simply by the state of the entire device but by the substates of parts of the device exactly as described by the syntactic, or expression-transforming, description of the rules." (68f.) I.e., if you do good linguistics, you must be describing the machine's architecture. "Thus: to capture the rule-governed quality of computation the process must be viewed in terms of operations on formal expressions rather than in terms of state transitions."(68). "In my view, the ...distinction...between the program and the machine...is absolutely fundamental....The difference between an extremely complex device characterized merely [! JG] as proceeding through distinguishable states (but not processing symbols) and what I call a "computer" is precisely the difference between a device viewed as a complex finite-state automaton and one viewed as a variant of a Turing machine. I shall use the term finite-state automaton as a general way of talking about any device whose operation is described without reference to the application of rules to symbolic expressions. 
Thus, in this view, the "new connectionist" machines...are finite-state automata....the potentially infinite length of the Turing-machine tape serves to force a kind of qualitative organization on the process....the fact that they apply over very large domains ...means that they must be dealt with generatively...[T]he touchstone for the fundamental distinction that I also want to press, namely, the distinction between a strictly finite mechanism...and a finite but unbounded string of symbols."(69f.)
Baars (op.cit): "No-one seriously maintains that humans resemble digital computers, but the nervous system needs to solve many of the same problems that must be solved by computers in performing similar tasks....it is important to understand computational theory because it applies not merely to contemporary computational hardware; rather, it specifies mathematical principles that apply to an infinite class of symbolic devices. If nervous systems are specially adapted to represent and symbolically transform the world of the organism, then the abstract principles of symbol-manipulation must apply to it. ...[this] is the most cogent scientific rationale for the revolution available today...."(148).
Baars (153): There are a number of legitimate and useful ways of analyzing the functioning of any computer (Newell 1981): at the physical level one can study the machine as such (the device level), and somewhat more abstractly, one can view its elements as electrical circuits (the circuit level); as memories with transfers between them (the register-transfer level); in terms of the program (the level); in terms of the program (the symbolic level, the most familiar one); and even further, in terms of the system architecture (the configuration level). According to Newell (1981), "Each level is defined in two ways. First, it can be defined autonomously, without reference to any other level. To an amazing degree, programmers need not know logic circuits, logic designers need not know electrical circuits, managers can operate at the configuration level with no knowledge of programming, and so forth. Second, each level can be reduced to the level below..."
Classes this week
1. Classic organization of a (von Neumann) computer. Sharp distinction between program and data. Another sharp distinction: between hardware and software; this latter is the basis of the functionalist (term due to Hilary Putnam) point of view. Pointers. Logical branching of program. (Test-operate-test-exit, or TOTE, units.)
2. Three levels of description: physical laws governing the hardware; the symbolic/algorithmic level; the semantic level. (Of course, using the term "symbolic" wrt the intermediate level implicitly points to the semantic level.) Can language be significantly studied just at the intermediate level: is there a kind of linguistics to be done without meaning? Some (like the early Chomsky) have argued that it's not obvious how knowledge of semantics would help with many of the bare, basic analytical problems of linguistics (e.g., just how does knowledge of semantics help with the acquisition of Arabic morphology?). In order to explain intelligent behavior, we will need to impute what we can call law-like generalizations to the system at its symbolic level; these generalizations point to other representations, and hence are structurally decomposable. (Generalization: "If I am hungry, I go look for food." This points to a unit "I am hungry"; the original generalization clearly has a structure in which "I am hungry" is a constituent.) Use of pointers in representations facilitates representations with internal structure. (Conversely, not having pointers makes complex structures difficult to model: the connectionist's problem.)
Challenge to Marr's levels: in Computational Neuroscience.
3. Jerry Fodor: the language of thought -- which is innate. In Gardner's words, "people are born with a full set of representations, onto which they can then map any new forms of information that happen to emerge from their experiences in the world." In Fodor's words: "the language of thought may be very [much] like a natural language. It may be that the resources of the inner code are rather directly represented in the resources of the codes we use for communication....[this is] why natural languages are so easy to learn." (1975, p. 156). Great skepticism with regard to the notion that a notion can be learned. Fodor argues that there are psychologically/mentally real representations which possess a structure that is equivalent to some relevant part of the situation or object being represented. (Is that clear? No?). In short: constituent structure is an essential part of mental representations, which in turn are expressed in the language of thought (LOT).
Fodor (LOT): "Contemporary cognitive psychology is . . . by and large conservative in its approach to the commonsense tradition. . . . Cognitive psychologists accept, that is, what the behaviorists were most determined to reject: the facticity of ascription of propositional attitudes to organisms and the consequent necessity of explaining how organisms come to have the attitudes to propositions they do. What is untraditional about the movement . . . is the account of propositional attitudes that it proposes: . . . having a propositional attitude is being in some computational relation to an internal representation. (p.198)"
To which, Dan Dennett replies: "As a bit of sociology of science, this is egregiously tendentious; and ever since, Fodor has been hard pressed to insist that you can't be a proper cognitive scientist unless you accept the "facticity" of propositional attitudes. This comes out most clearly, perhaps, in his recent broadside against the connectionists, who are, by his lights, enemies of cognitive science precisely because they don't accept the facticity of the "classical" mental types and processes." See: http://www.tufts.edu/as/cogstud/papers/granny.htm
Reading: read Language of Thought Hypothesis: State of the Art by Murat Aydede, available at http://plato.stanford.edu/entries/language-thought/
And in addition, the longer version is available at: http://humanities.uchicago.edu/faculty/aydede/LOTH.SEP.html.
Week 3: Connectionist approaches
3. Architecture of Mind: Connectionism:
Reading:
Article by David Rumelhart (in Posner volume) on connectionism.
Section in Ballard book pp. 137-142; and - p. 158
Also recommended:
1. "The Appeal of Parallel Distributed Processing" J. L. McClelland, D. E. Rumelhart, and G. E. Hinton. In PDP I (Chapter 1), 1986.
2. "Connectionism without Tears," Mark S. Seidenberg. pp. 84-122. In Connectionism: theory and practice, ed. Steven Davis. Oxford University Press 1992. This begins: "What accounts for the cool reaction to the emergence of connectionism in the 1980s on the part of people who study language for a living?"
3. "Grammatical Structure and Distributed Representations," by Jeffrey L. Elman. pp. 138-178 in Connectionism: theory and practice, cited above.
4. "Structured Representations in Connectionist Systems?" by Terence Horgan and John Tienson, pp. 195-228, in Connectionism: theory and practice.
5. Hopfield, J.J. 1982. "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences 79:2554-2558, reprinted in Anderson and Rosenfeld as chapter 27.
"PDP Models and General Issues in Cognitive Sciences." Chapter 4 of PDP I. Rumelhart and McClelland. [PDP stands for Parallel Distributed Processing, part of the title of a two-volume book from MIT Press edited by Rumelhart, McClelland, et al.]
Highly recommended: 2 volume collection of papers on Neurocomputing (1 and 2) from MIT Press (ed. James A. Anderson and Edward Rosenfeld); excellent editorial comments, and excellent historical perspective.
Mind as Motion: Explorations in the Dynamics of Cognition, edited by Robert Port and Timothy van Gelder. 1995, MIT Press/ Bradford Books. Very stimulating book.
Language: Strong verb pattern learning. Where does Cognitive Grammar fit in?
On neural nets, some excellent books:
D. Amit 1989 Modeling Brain Function. Cambridge University Press. This is a deep and profound book. Its mathematical look will put off some readers, but once one is accustomed to the notation, the mathematics itself is not difficult. Amit is a physicist by training, and this book has an outstanding discussion of the linkage between neural nets and spin glass models in physics.
Bishop, Christopher M. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
Hertz, John, Anders Krogh, and Richard G. Palmer. Introduction to the Theory of Neural Computation. Santa Fe Institute Studies in the Sciences of Complexity. 1991.
Looney, Carl. G. 1997. Pattern Recognition using Neural Nets: Theory and Algorithms for Engineers and Scientists. Oxford University Press. Excellent, especially on clustering (a core technology in unsupervised learning).
P. Peretto, An Introduction to the Modeling of Neural Networks.
Prototypes in linguistics: George Lakoff, Women, Fire, and Dangerous Things. University of Chicago Press.
Classes this week:
1. A simple recurrent net that acts as a pattern-completer/content-addressable memory. In a network with k units, the current state of a system is a point in k-space. The longer-term structure of the net is a point in k²-space (one dimension for the value of each node-to-node link). Evolution of the network in time can be described as a path through k-space. More generally, we can recognize that the connections can change through time (learning, degradation, ...); hence we should think of the system as evolving through k ⊗ k² space, where the total space is composed of two subspaces, one where movement is fast and one where movement is slow.
2. Dividing the world of neural networks into feed-forward and Hopfield nets; back-propagation. Inclusion of recurrence into what were feed-forward nets (Jordan-Elman style). Strong-verb modeling in neural net.
3. Issue of time: Feldman's 100-step limitation. (Newell 1990, Ballard 1998) Neuron level: 1 ms. per cycle. Simple perceptual tasks take on the order of 30-100 ms. Simple eye-movements, on the order of 300 ms. Cognitive tasks?
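The pattern-completing recurrent net of item 1 can be sketched in a few lines. This is only an illustration under stated assumptions: units take the values +1 and -1, the weights come from a Hebbian outer-product rule, and units are updated one at a time, none of which is specified in the outline above. The names train_hopfield, complete, and energy are hypothetical.

```python
def train_hopfield(patterns):
    """Build a k-by-k weight table (a point in k^2-space) from +1/-1 patterns.

    Assumption: Hebbian outer-product learning with no self-connections.
    """
    k = len(patterns[0])
    weights = [[0.0] * k for _ in range(k)]
    for p in patterns:
        for i in range(k):
            for j in range(k):
                if i != j:
                    weights[i][j] += p[i] * p[j] / k
    return weights


def energy(weights, state):
    """Energy of a state; one-at-a-time updates never increase it."""
    k = len(state)
    return -0.5 * sum(
        weights[i][j] * state[i] * state[j] for i in range(k) for j in range(k)
    )


def complete(weights, cue, max_sweeps=20):
    """Update units one at a time until the state (a point in k-space) stops moving."""
    state = list(cue)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            net_input = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if net_input >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two stored memories over k = 8 units (illustrative data only).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# A corrupted cue: the first memory with one unit flipped.
cue = list(stored[0])
cue[0] = -1
recalled = complete(weights, cue)
```

Here the fast movement is the path of the state through k-space inside complete, while the slow movement would be any change to weights over time, which this sketch holds fixed after training.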
Part II: Natural Computation
Theme: robust algorithms. Robust: opposite of brittle. Robust algorithms ask for, but do not demand, certain levels of information and resources. The quality of the output degrades gracefully under limitations of supplied information or resources. Main resource: Dana Ballard, Introduction to Natural Computation. But a number of other sources will be extremely helpful. (You may find Ballard's syllabus at http://www.cs.rochester.edu/users/faculty/dana/syllabus.html)
Week 4. N-spaces (n-dimensional spaces of high dimension: n>10). Linear algebra; closeness of vectors measured by inner product; measuring angle between two vectors. Neural nets, feedforward and Hopfield nets. Fuzzy logic. Content-addressable memory. Prototypes. Central categories. Solving optimization problems with neural nets. How does this relate to Optimality Theory? Relation to Smolensky's Harmony Theory.
Reading
"An Introduction to linear algebra in Parallel Distributed Processing," by Michael Jordan. Chapter 9, PDP I.
Ballard: Chapter 4. I don't expect this to be terribly clear on the first n readings, even where n is a good-sized number.
Also: P. Kanerva, Sparse Distributed Memory. MIT Press.
Smolensky 1986, Harmony Theory (Chapter 5, PDP I).
D. Amit 1989 Modeling Brain Function. Cambridge University Press.
Autolexical grammar. Prototypes in syntax (e.g., Keenan/Comrie). Elman on syntax (1992). NetTalk.
On Maximum Entropy: http://www.cs.cmu.edu/~aberger/maxent.html
Classes this week:
1. A traveler's guide to n-space.
Representing numbers in n-space. Examples: 1. 26-space: one dimension for each letter. Representing a word as a point in that space. 2. In a corpus of vocabulary-size-N, each word can be associated with a point representing the set of right-neighbors of that word (similarly, of left-neighbors). A line from the origin to such a point is a vector. A vector can be thought of as (a) an ordered set of numbers; (b) a point in an n-space; (c) a distance from the origin plus a particular direction. If we have a set of vectors, and we want to know which of them is closest to some new vector V, we measure or compute the angle between V and all of the old vectors; the one with the smallest angle is the one closest to V. The inner product (scalar, dot) is how you measure the angle. Normalization of vector lengths. Restricting vectors to the n-space [-1, 1]^n. Two special subspaces of that n-space: the set of points whose coordinates add up to 1.0 (distributions) and the set of points where the squares of the coordinates add up to 1.0 (surface of a hypersphere). Prototypes; competitive learning.
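As a rough illustration of the 26-space example and of finding the closest vector by angle, here is a minimal sketch. The helper names (letter_vector, inner_product, angle, closest) and the three sample words are made up for the example; a word is represented simply by its letter counts, which is one possible reading of "one dimension for each letter."

```python
import math
import string
from collections import Counter


def letter_vector(word):
    """Represent a word as a point in 26-space: one coordinate per letter count."""
    counts = Counter(word.lower())
    return [counts.get(letter, 0) for letter in string.ascii_lowercase]


def inner_product(u, v):
    """The inner (scalar, dot) product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def angle(u, v):
    """Angle in radians between two nonzero vectors, from the normalized inner product."""
    cosine = inner_product(u, v) / math.sqrt(inner_product(u, u) * inner_product(v, v))
    # Clamp to [-1, 1] to guard against rounding just outside the domain of acos.
    return math.acos(max(-1.0, min(1.0, cosine)))


def closest(new_vector, labelled_vectors):
    """Return the label whose vector makes the smallest angle with new_vector."""
    return min(labelled_vectors, key=lambda label: angle(new_vector, labelled_vectors[label]))


vectors = {word: letter_vector(word) for word in ["stone", "banana", "syntax"]}
nearest = closest(letter_vector("notes"), vectors)
```

Under this representation "notes" and "stone" land on the same point, since letter order is discarded, so the angle between them is zero.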
2. Matrices.
A network...is a matrix. A state of a network is a vector. A state of a network evolves to its next state by multiplying the vector-state by the matrix-network. The model is thus inherently temporal. We can study the dynamics of such systems by asking questions like: are there states which are fixed points (i.e., a state S whose next step is S, i.e., when you multiply the state vector S by the matrix, you get S again)? (Such states are eigenvectors of the matrix with eigenvalue 1.) Are there loops or near loops? In what sense can such systems be thought of as being repositories of information (compare with a silicon-based von Neumann computer)?
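A toy version of this picture, assuming a purely linear update with no squashing function: the two-unit averaging network below is invented for illustration, and step and is_fixed_point are hypothetical names. The state is treated as a column vector multiplied by the matrix.

```python
def step(network, state):
    """Next state = matrix-network times vector-state."""
    return [sum(w * s for w, s in zip(row, state)) for row in network]


def is_fixed_point(network, state, tolerance=1e-12):
    """True if multiplying the state by the matrix gives the state back."""
    return all(abs(a - b) <= tolerance for a, b in zip(step(network, state), state))


# Each unit's next value is the average of the two current values.
network = [
    [0.5, 0.5],
    [0.5, 0.5],
]

state = [1.0, 0.0]
trajectory = [state]
for _ in range(3):
    state = step(network, state)
    trajectory.append(state)
```

Starting from [1.0, 0.0], the state moves to [0.5, 0.5] after one step and stays there, so [0.5, 0.5] is a fixed point of this particular network.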
3.
Week 5. Probability and Bayesian decision theory.
Probability; information theory and entropy. Hidden Markov models (speech recognition, etc.). Solomonoff's solution to the projection problem for phrase-structure grammars. Principal components analysis.
Reading:
Ballard, Chapter 2 (Fitness), sections 2.1 - 2.5.
Please read Nick Chater, Neural networks: the new statistical models of mind. Chapter 11 of Connectionist Models of Memory and Language. Edited by Joseph P. Levy, Dimitrios Bairaktaris, John A. Bullinaria and Paul Cairns. pp. 207-227. I've xeroxed this, and you can make a copy yourselves if you wish.
A very good introduction to reasoning about probability, and its application in physical contexts to such things as gases, heat, and entropy: Reasoning about luck: probability and its uses in physics. Vinay Ambegaokar, Cambridge University Press, 1996.
Important web site: Bayesian analysis, from Washington University: http://bayes.wustl.edu/
Excellent introduction to probability and statistics: http://quarles.unbc.ca/psyc/gradstudents/stork/Stats/stathome.html
and not to be missed by linguists: The Linguist's Guide to Statistics, by Brigitte Krenn and Christer Samuelsson, which is online at //www.coli.uni-sb.de/{~krenn}. (You can substitute christer for krenn and still get it.)
See also: http://www.princeton.edu/~bayesway/ "A miscellany of work on probabilistic thinking" (and Bayesian thinking)
http://www.vision.irl.cri.nz/research/isa/9596/eoy/node80.html A review of probability theory and stochastic processes.
http://www.soton.ac.uk/~xzhang/signal/part8/text8.html
Other suggestions:
Denning, P.J. "Bayesian Learning." American Scientist 77. 1989. 216-218.
Jaynes, E. T. "Bayesian Methods: General Background." 1986. In Maximum Entropy and Bayesian Methods in Applied Statistics, edited by J.H. Justice. Cambridge University Press, 1-25.
Class 1. What is probability?
There are three traditions of interpretation of what probability is: the frequentist; the mathematical; and the Bayesian. (The third has various other names, like degree of belief, and others.) The frequentist believes that the fundamental meaning of probabilistic statements involves counting occurrences of outcomes in repeatable experiments; outcomes of die-tosses over multiple experiments are a fine prototypical example of a probabilistic case study on this view. The holder of the mathematical view sees probability as a mathematical system that obeys a small number of mathematical axioms (certain subsets of the universe of events are each assigned a number between 0 and 1, which is the subset's probability; the probability of the whole set is 1; and the probability that one or the other of two mutually exclusive events occurs is the sum of their individual probabilities). The Bayesian interpretation of probability is that it is a measure of our confidence in the truth of a statement. This is very important; the frequentist point of view has gotten virtually all of the airtime during this century, but the Bayesian interpretation is now defended by a very active research and application community.
Digression on foundational issues: On the Bayesian account, probability theory is fundamental to our understanding of rationality; indeed, probability theory can be viewed as a quantitative account of rational inference. The probability of P is a measure of our rational grounds for belief of P. (This is sometimes called a subjectivist account, though some who are sympathetic to it reject the term subjective.) Early work in this area -- indeed, up through this century -- was dominated by this view (Jakob Bernoulli, e.g.; see writings by L. Daston, Classical Probability in the Enlightenment, Princeton University Press 1988, and Gigerenzer et al., The Empire of Chance: How Probability Changed Science and Everyday Life, CUP 1989).
It is convenient to speak of a universe of mutually exclusive elementary events comprising a sample space. An event is a set of elementary events. It has a probability, which is the sum of the probabilities of its elementary events. If all the elementary events have the same probability, then the probability of an event is the number of elementary events it includes divided by the total number of elementary events. In linguistics, we usually don't have our elementary events being equi-probable, and we can think of probability as a glob of putty whose total weight is 1.0 and which is divided out among the elementary events. Then the probability of an event is just its mass.
If A and B are events, then the probability of A or B is prob(A) + prob(B) - prob(A and B both occur). The probability that A occurs given B is written prob(A|B), and it's defined as prob(A and B) divided by prob(B). It means that if we restrict ourselves to the cases where the B outcomes are true, then the probability of A in that subuniverse is prob(A|B). This is known as conditional probability.
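These definitions can be checked on a single toss of a fair die. The sketch below is only an illustration: the two events (even, at_most_four) and the helper names prob and prob_given are made up for the example, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Sample space: six equiprobable elementary events (one fair die toss).
sample_space = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event):
    """Probability of an event = sum of the probabilities of its elementary events."""
    return sum(sample_space[outcome] for outcome in event)


def prob_given(a, b):
    """Conditional probability prob(A|B) = prob(A and B) / prob(B)."""
    return prob(a & b) / prob(b)


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}

# prob(A or B) = prob(A) + prob(B) - prob(A and B)
p_union = prob(even) + prob(at_most_four) - prob(even & at_most_four)

# Restricting attention to tosses of 4 or less, how likely is an even face?
p_even_given_low = prob_given(even, at_most_four)
```

To model the linguistic case, where the elementary events are not equiprobable, only the sample_space table would need to change.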
Bayes' law follows from this definition:
prob (A and B) = prob (B and A); so
prob (A|B) prob (B) = prob (B|A) prob (A) (just using the definition of prob (X|Y));
hence prob (A|B) = prob (B|A) prob (A) / prob (B), which is Bayes' law, or rule.

Bayes' rule has particular interest for us if A is a hypothesis and B is evidence. (Extra credit: How can we interpret probability so that both hypotheses and observable events are endowed with the same thing, a "probability"?) Then Bayes' rule can be thought of as telling us this:
The probability of a hypothesis A, given some new data B, is equal to:
(1) the probability that the hypothesis A predicts for the data B, divided by
(2) the probability of that data, times
(3) what we used to think was the probability of that hypothesis before we got the new data.
If we have an explicitly probabilistic hypothesis, then (1) should be easy to compute; (2) is something that we should be able to obtain by long-term observations; (3) is a little harder to know, when you're starting off. Maybe all hypotheses should originally be given equal plausibility = probability. This area remains a bit murky. But Bayes' rule tells you how to update your certainty when given new data.
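A small worked update may make this concrete. Everything in the sketch is assumed for illustration: two rival hypotheses about a die ("fair" and "loaded"), equal initial plausibility for both, and a loaded die that shows a six half the time. The function name bayes_update is hypothetical. One further assumption: the probability of the data is computed here by summing over the rival hypotheses rather than taken from long-term observation.

```python
from fractions import Fraction


def bayes_update(priors, likelihoods):
    """Apply prob(A|B) = prob(B|A) * prob(A) / prob(B) to each hypothesis A.

    priors:      hypothesis -> prob(A), our belief before seeing the data
    likelihoods: hypothesis -> prob(B|A), what each hypothesis predicts for the data
    """
    # Assumption: prob(B) is obtained by summing over the listed hypotheses.
    prob_data = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / prob_data for h in priors}


# All hypotheses start with equal plausibility.
priors = {"fair": Fraction(1, 2), "loaded": Fraction(1, 2)}

# The data B: a single toss comes up six.
likelihood_of_six = {"fair": Fraction(1, 6), "loaded": Fraction(1, 2)}

posterior = bayes_update(priors, likelihood_of_six)
```

After one observed six, belief in the loaded die rises from 1/2 to 3/4; feeding posterior back in as the new priors would repeat the update for the next toss.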
People talk about random variables, which is unfortunate, because random variables aren't variables at all, and maybe they're not even random. What they are is functions. A random variable is a function. It maps from the set of events to the real numbers (the number line). (Its range can either be a discrete set (finite or infinite) or a continuous set.) We'll stick to the discrete case. Most of the time, we'll be able to specify the domain of the function in numerical terms too (as is the case of the die-toss; if we want to do this with words, we'll have to assign each word a number, such as its rank in a frequency list). We can think of an experiment as underlying each random variable. And we care what the probability is that the random variable X takes on some particular value, or more generally, what the probability is that it takes on each of its possible values.
So we can ask, for any given point, what the probability is that the outcome is at this value or less. E.g., we can ask what the probability is that the outcome of one die-toss is 4 or less (answer: 2/3), or 6 or less (answer: 1.0). This gives us a graph that is monotonically increasing (it never goes down); it starts at zero somewhere, and at some point (the maximal value in our domain) it gets to 1.0, and never goes higher.
Talking about random variables is just the probabilist's way of making the problem quantifiable, so that arithmetic techniques will make sense. It allows us to talk about a mean or average value, though whether that is sensible or not depends on how we (empirically) define the random variable in relation to the underlying experiment.
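The cumulative graph and the mean described above can be computed directly for the die-toss. This is a minimal sketch for one fair die; the variable names are chosen for the example.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# Probability that the random variable takes on each particular value.
pmf = [Fraction(1, 6)] * len(faces)

# Probability that the outcome is at this value or less (running total of pmf).
cdf = dict(zip(faces, accumulate(pmf)))

# Mean value: each value weighted by its probability.
mean = sum(face * p for face, p in zip(faces, pmf))
```

The mean here is 7/2, a value no single toss can produce, which is one way to see why the sensibleness of an average depends on how the random variable is defined.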
Remember: random is not a word meaning incomprehensible, and "just" does not deserve to be placed in front of it.
Week 6 Algorithmic complexity
Technical skill is mastery of complexity while creativity is mastery of simplicity. - C. Zeeman
Complexity as a formal notion, and Minimum Description Length (Rissanen).
Shannon's entropy measures the complexity of a message system, using frequencies computed over a representative period (or known a priori). What about the complexity of a single message? -- the algorithmic complexity of a specific answer. That is, suppose our language consists of numbers between 0 and 1, specified to 100 digits of accuracy. If all such numbers are equally likely, we may compute the information of the language-system as -log-base-2 (10^-100), roughly 332 bits per message, but is there a difference of complexity between some random 100-digit number (.03588203301902...) and the number 0.25? Or, for that matter, pi = 3.1415926535...? Surely the answer is yes. The complexity of a number is the length of the shortest way of describing it.
Compression, and compression as generalization. If we're talking about a system in which all of the messages sent are real numbers, then the complexity of an answer is the length of the most parsimonious way of describing how to compute the number.
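Algorithmic complexity itself cannot be computed in general, but an off-the-shelf compressor gives a crude upper bound on description length and shows the contrast between a patterned and a patternless digit string. The sketch below uses Python's zlib as a stand-in, which is an assumption of this example and not something the readings prescribe; compressed_length is a hypothetical helper.

```python
import random
import zlib


def compressed_length(text):
    """Bytes needed by zlib to describe the text: a rough upper bound on its complexity."""
    return len(zlib.compress(text.encode("ascii"), 9))


# 1000 pseudo-random digits (fixed seed so the example is repeatable).
rng = random.Random(0)
random_digits = "".join(rng.choice("0123456789") for _ in range(1000))

# 1000 digits with an obvious regularity: "25" repeated.
repetitive_digits = "25" * 500

random_cost = compressed_length(random_digits)
repetitive_cost = compressed_length(repetitive_digits)
```

Both strings are 1000 characters long, yet the repetitive one compresses to a small fraction of the size of the other: the compressor has, in effect, found the generalization "repeat 25."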
Use of simplicity metric in early generative grammar (Halle, Chomsky), and its replacement by principles and parameters: phonotactics, using features rather than segments.
Chomsky (Language and Responsibility 112) "The central part of this [LSLT] project was an attempt to demonstrate in painstaking detail that the generative grammar I presented was the "simplest possible" grammar in a well-defined technical sense: namely, given a certain framework for the formulation of rules and a precise definition of "simplicity," the grammar was "locally optimal" in the sense that any interchange of order of rules in a tightly ordered system of many rules would lead to a less simple grammar. Reading back into this work the explicit concerns of a later period, one might say, then, that the goal was to show exactly how this grammar with its empirical consequences would be constructed by someone initially equipped with the framework for rules and the definition of simplicity (the evaluation measure), and given a sufficient sample of the data. Actually, this was done in far greater detail and scale than anything I've attempted since, and was far too ambitious, I suppose."
Reading: Ballard, Chapter 2, sections 2.6;
recommended: Carl de Marcken's dissertation. To be found at: http://alpha-bits.ai.mit.edu/people/cgdemarc/cgdemarc.html (in post-script format).
Very good summary at http://www.labs.bt.com/people/mulhaug/docs/tutorials/complexity/complexity.htm
Also, several good papers by Gregory Chaitin, one of the discoverers of this field: see his homepage at http://www.cs.auckland.ac.nz/CDMTCS/chaitin/#BN with links to papers such as http://www.cs.auckland.ac.nz/CDMTCS/chaitin/ieee74.html
Check out this online bibliography of complexity: http://www.cpm.mmu.ac.uk/~bruce/combib/
Boucheron, Stéphane. 1992. Théorie de l'apprentissage: de l'approche formelle aux enjeux cognitifs. Paris: Hermes.
Chaitin, G.J. Algorithmic Information Theory. 1987. Cambridge University Press.
Nicolis, G. and I. Prigogine. Exploring Complexity: An Introduction. New York: Freeman. 1989.
A good popularization: Heinz R. Pagels, The Dreams of Reason: The computer and the rise of the sciences of complexity. 1988. Bantam Books.
Part III Other Big Issues
Complexity
On the web: Bruce Edmonds' page on complexity: http://www.cpm.mmu.ac.uk/~bruce/complink.html
Week 7: Innateness and dynamical systems
A. Innateness. Guest lecture.
B. Unsupervised learning. Ballard, chapter 9. Is unsupervised learning the right way to think about the problem of language acquisition? Must linguists interested in language acquisition make a choice (perhaps a nuanced, parameterized choice) between models incorporating innate principles and principles of unsupervised learning? (yes). Or can principles of unsupervised learning be considered to be innate ideas?
C. Dynamical systems
Reading: See especially the Port and van Gelder collection on this subject.
Petitot cites Zeeman (1977): "What is needed for the brain is a medium-scale theory....The small-scale theory is neurology: the static structure is described by the histology of neurons and synapses, etc., and the dynamic behavior is concerned with the electrochemical activity of the nerve impulse, etc. Meanwhile the large-scale theory is psychology [whatever...JG]: the static structure is described by instinct and memory, and the dynamic behavior is concerned with thinking, feeling, observing, experiencing, responding, remembering, deciding, acting, etc. (...) Question: what type of mathematics therefore should we use to describe the medium-scale dynamic? Answer: the most obvious feature of the brain is its oscillatory nature, and so the most obvious tool to use is differential dynamical systems. In other words for each organ O in the brain we model the states of O by some very high dimensional manifold M and model the activity of O by a dynamic on M (that is a vector field or flow on M). Moreover since the brain contains several hierarchies of strongly connected organs, we should expect to have to use several hierarchies of strongly coupled dynamics." (From Zeeman, C. Catastrophe theory: selected papers 1972-1977. Addison-Wesley (Redwood City CA). Cited by Jean Petitot, in "Morphodynamics and attractor syntax: constituency in visual perception and cognitive grammar." In Port and van Gelder (eds.), cited above.)
Week 8: Evolution and genetic algorithms. Phenotype versus genotype; generalized Darwinism. How does this relate to Chomskian innateness? Guest speaker?
Edelman, G.M. Neural Darwinism. New York; Basic Books, 1987.
Excellent general paper synthesizing genetic algorithms (a.k.a. classifier systems), neural networks, and related systems: J. Doyne Farmer, "A Rosetta Stone for connectionism," in Forrest (ed.). 153-187. (see way below for reference)
Jerry Fodor's review of Steven Pinker's recent book, where Pinker tries to bring evolution into a Chomskian perspective:
Week 9: Self-organization and emergence.
This is the topic that the Santa Fe Institute has made famous, and maybe even popular. Murray Gell-Mann's The Quark and the Jaguar is as good a general introduction to this as one is likely to find, but it aims too low, unfortunately -- very possibly the result of over-eager editors. (That wouldn't have happened if the book had been published in France.)
How rich, specific, and information-laden does the genetic endowment have to be to achieve what appears to be a high degree of fine-tuning in the end-product? (Case of retinotopic images in the brain's visual system; von der Malsburg and Kohonen's analyses of self-organization in the visual systems.) (On von der Malsburg: see, for example, Peretto, section 9.2.1 (pp. 307ff.), and vdM's own paper (1973), reprinted in Anderson and Rosenfeld as chapter 17 ["Self-organization of orientation sensitive cells in the striate cortex." Kybernetik 14: 85-100.] On Kohonen, see e.g., (1982) "Self-organized formation of topologically correct feature maps," Biological Cybernetics 43: 59-69, reprinted in Anderson and Rosenfeld as chapter 30.)
Emergence: examples from simple physical systems, e.g. molecules: rigidity and compressibility are macro-properties of objects that emerge out of the laws governing the interaction of the micro-scale objects comprising the macro-level object. Example: magnetic state of macro-object. Variation in macro-properties based on speed of cooling, despite the uniformity of the laws governing micro-interactions. Effect of dimensionality on emergence of macromagnetism (see Amit, especially section 3.2). Example: content-addressable memory, Hopfield net. Optimization of an objective or energy function emerges out of local decisions by units in a Hopfield net. Hierarchy of explanation: best: structure emerges out of interaction of subparts; next best: structure is postulated (hence explanation is handed over to a theory of evolution or to an anthropic argument); worst: argument is left as a free variable.
Self-organization:
Peretto (cited above), Chapter 9 Self-organization, begins: "A neural network self-organizes if learning proceeds without evaluating the relevance of output states." That is, if the network does some computing and draws some conclusions, it does not have a teacher or oracle that can help it determine if its conclusions are correct or not.
Emergence:
Consider a system
a. composed of a large number of interchangeable (i.e., architecturally or essentially indistinguishable) parts.
b. The parts interact according to simple rules (all fundamentally the same).
c. Each part only interacts with a small subset of the total set of parts (which we may call its neighbors)
d. Viewed from a macro-level, the system as a whole finds a solution to a globally stated problem. The statement of the solution may involve treating as identical any two micro-states that belong to the same macro-state.
The rules (b. above) are considered to be distributed, which is the opposite of centralized. There is no central processor, and no program separate from the simple rules governing the behavior of the individual micro-pieces.
Connectionist systems are a species of these, in which the interactions are limited to sigma-pi style connections (more precisely: each unit has an activation value; the relationship between any pair of units can be fully described by a single value; the activation-update equation for a unit is a function of the activation values of its neighbors and the connection-weights to those neighbors; and we can put some constraints on what such functions can be).
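One common way to write such an activation-update equation is given below, as a sketch and not as the only possibility. The symbols are introduced here for illustration: a_i(t) is the activation of unit i at time t, w_{ij} is the single value describing the connection between units i and j, N(i) is the set of neighbors of unit i, and f is a constrained function such as a threshold or a sigmoid. This is the purely additive ("sigma") case; sigma-pi units would also allow products of neighbor activations inside the sum.

```latex
a_i(t+1) \;=\; f\!\Big( \sum_{j \in N(i)} w_{ij}\, a_j(t) \Big)
```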
Reading:
Good summary/tutorial by Gregory Mulhauser at: http://www.labs.bt.com/people/mulhaug/docs/tutorials/emergence/emergence.htm
Anderson, P.W. "More is Different." Science 177 (1972) 393-396.
Complexity, Entropy, and the Physics of Information. Edited by Wojciech H. Zurek. Santa Fe Institute Studies in the Sciences of Complexity. See especially:
John Archibald Wheeler: "Information, Physics, Quantum: The Search for Links." An absolutely astonishing paper from one of the world's most respected physicists.
Fogelman-Soulié, Françoise (ed.) Les Théories de la complexité: autour de l'oeuvre d'Henri Atlan. Paris: Seuil.1991
Forrest, S. (ed.) Emergent Computation. MIT/ North Holland. 1991. Stimulating collection of papers.
Haken, H. Information and Self-Organization: A Macroscopic Approach to Complex Systems. Berlin: Springer Verlag. 1979.
Mainzer, Klaus. 1994. Thinking in Complexity The complex dynamics of matter, mind, and mankind. Springer Verlag. This book covers everything from pre-Socratics to Haken's synergetics; difficult to categorize.
Kohonen, T. Self-Organization and Associative Memory. New York: Springer. 1989 (3rd edition).
Yates, F.E., ed. Self-Organizing Systems: The Emergence of Order. New York: Plenum. 1987.
Other literature:
Charniak, Eugene. 1993. Statistical Language Learning. MIT Press.
Hutchinson, Alan. 1994. Algorithmic Learning. Oxford University Press. Very lucidly written; graduate text in computer science.
Jelinek, Frederick. 1997. Statistical Methods for Speech Recognition. MIT Press.
On theoretical learning, with an emphasis on PAC (probably approximately correct) learning, an important new paradigm (this is not easy going):
Martin Anthony and Norman Biggs, Computational Learning Theory, 1992. Cambridge University Press.
Readings:
Morphology
Syntax
Class-Based n-gram Models of Natural Language, Peter F. Brown, Vincent Della Pietra, et al. Computational Linguistics 18:4 1992.
Phonology
Unsupervised discovery of phonological categories through supervised learning of morphological rules. Walter Daelemans, Peter Berck, and Steven Gillis. COLING 1996 (Copenhagen).
A list of papers that I will put copies of outside my office for you to borrow and xerox if you wish. Needless to say, I'm recommending this highly.
From Schwartz, ed. Computational Neuroscience:
1. Some historical notes, by Wilfrid Rall.
2. Brain Metaphor and Brain Theory by John G. Daugman
From Connectionism: Theory and Practice:
3. Grammatical Structure and Distributed Representations, by Jeffrey Elman
4. Structured Representations in Connectionist Systems, by Terence Horgan and John Tienson
From PDP Volume 1:
Chapters 1,2,3: 1. The Appeal of Parallel Distributed Processing;
2. A General Framework for Parallel Distributed Processing
3. Distributed Representations