Programming for Linguists: Perl for Language Researchers. By Michael Hammond. Oxford: Blackwell Publishing, 2003. Pp. 219.

Reviewed by John Goldsmith, University of Chicago (book review)

Dept. of Linguistics

1010 East 59th St.

Chicago IL 60637

ja-goldsmith@uchicago.edu

 



1. In case you hadn’t noticed, the world has changed. We now have our own computers, each of us, and we have instant access to the largest collection of documents in the world, documents in more languages than we could have dreamed of. I’m referring to the Internet, of course, and the linguistic resources it makes available to us are staggering. Interested in reduplication in Tagalog? It took me about fifteen minutes of searching to find a couple hundred thousand words in Tagalog so I could set up my own database and scour it for reduplicated forms. Swahili, likewise. It wouldn’t even take that long to find a wordlist in English that matched 50,000 words in orthographic form to a phonemic representation, if you want to study English lexical phonology.

You can do some remarkable things with a good Web browser, and just a word processor like Microsoft Word, and even more if you’re handy with a spreadsheet program (like Excel). Word will let you search for words or patterns, and Excel will let you sort lists easily. But if you want to go past that, and you’re not already a programmer, then you should learn Perl. You really should.

Perl is a language that was written by Larry Wall about 15 years ago. Wall is not only a computer developer but a linguist trained by the Summer Institute of Linguistics (SIL); a blurb on one of his books says that he was a graduate student at both UCLA and Berkeley in linguistics. It won’t mean much to neophytes to say that Perl is written much in the spirit of C (a standard programming language), but it is. (The up-side of that is that someone who knows Perl has a considerable leg up on advancing to C or C++.) Perl is a relatively easy programming language to learn. It runs on every platform I’ve heard of (that means there are versions of it for Windows, Macintosh, Linux, and so on), and it’s free – you can download it off the Internet.

How, then, are you to make good on your determination to follow my good advice, or to follow your own inclination, and learn Perl? You could take a course somewhere, like a community college, or you could read on-line documentation – or, most likely, you’ll buy a book. A book that tells you that you can learn Perl just by working your way through that particular book.

I have taught programming to linguistics students, and I’ve learned programming languages (like Perl) from books. One generalization that emerges from all of this is that buying just one book on a programming language simply isn’t enough. That may sound a bit spendthrift when we recall that the price of a book is likely to start at $30 (Hammond’s book, the subject of this review, is $40), averages $50, and goes as high as $90, but it’s not. With a few good books which complement each other, you can learn material that is almost priceless: in any event, a basic course at a college level will start at several hundred dollars, so you have no call to be cheap on books. Take my advice – if you want to learn Perl, or any other programming language, it will be worth your money to buy two or three books so you can read several different approaches to the same basic programming tasks.

2. Michael Hammond, a phonologist at the University of Arizona, has written a book on Perl for linguists who want to learn to program. It’s a fairly compact book, as computer books go, and was published by Blackwell’s, a publisher better known for its linguistics line than for its computer languages line. He covers the principal aspects of the language that a linguist needs to know, and he touches on a few topics of special interest to linguists in more detail.

If you read Hammond’s book (“PfL”), you will learn the basics of using Perl, and in addition, you’ll learn about one of its major uses, which is in connection with Web pages. Web pages are written in HTML, a “mark-up language”, and Perl is an excellent language for manipulating HTML. You’ll also learn about CGI (“Common Gateway Interface”) programming, commonly done in Perl, which involves programs that animate computationally active Web pages, such as the one you’ll be able to write when you’ve worked your way through PfL.

What makes Perl so good for beginning linguist-programmers is this (in addition to the features I have already mentioned): it shields the neophyte programmer from a lot of complicated detail involved in reading and writing computer files, and even more from the complexities of “memory management” – and it has superb ways of analyzing and dealing with long strings of characters. Whether you want to divide the string up in complicated ways, or modify characters or substrings, or count occurrences of substrings, or reorganize the string entirely: whatever, Perl provides one or more ways of performing that operation smoothly and simply.

What might you, a linguist, want to do? You might want to take a text in a language you are working on (Somali, Swahili, whatever) and write a program that divides the words into morphemes. Writing that program would likely be of value to you, and it would no doubt teach you a lot about the language and about Perl.

3. If there’s one thing I was a bit dissatisfied with about PfL, it was that the examples were not linguistic enough. The simple, early examples in the book often involve arithmetic, like finding prime numbers, or extremely simple word-based operations (type in some words, see them typed back to you in a different order or format). To be sure, there are linguistic projects that are discussed. For example, a program that takes a large text file and divides it up into sentences is first proposed in Chapter 6, and then developed subsequently. But it does not seem to me that Hammond actually ran the program on a number of large pieces of English text, though from my point of view it’s the ability and the willingness to do this that is the motivation for learning Perl. Hammond’s Perl code takes all periods “.” to mark sentence breaks, and of course not all periods really do mark sentence breaks: the previous one earlier in this sentence doesn’t, and the period after an abbreviation does not mark a sentence break, most of the time – though the next one does, e.g. The task of writing a program that can distinguish sentence-final periods from all other periods is quite an interesting and challenging one.
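To make the difficulty concrete, here is a minimal sketch, written in Python rather than Perl and not taken from Hammond's book. It contrasts the every-period-is-a-break approach with a slightly more careful heuristic. The abbreviation list and the function names are my own assumptions, chosen only for illustration.

```python
# Hypothetical, deliberately tiny abbreviation list (an assumption for this
# sketch); a usable list would be far longer and language-specific.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr.", "Mrs.", "p.", "St."}


def naive_split(text):
    """Treat every period as a sentence break."""
    return [piece.strip() for piece in text.split(".") if piece.strip()]


def split_sentences(text):
    """Break after a token ending in a period only when that token is not a
    known abbreviation and the next token starts with an uppercase letter
    (or there is no next token)."""
    tokens = text.split()
    sentences, current = [], []
    for i, token in enumerate(tokens):
        current.append(token)
        is_last = i == len(tokens) - 1
        if token.endswith(".") and token not in ABBREVIATIONS:
            if is_last or tokens[i + 1][0].isupper():
                sentences.append(" ".join(current))
                current = []
    if current:  # keep any trailing material that lacks a final period
        sentences.append(" ".join(current))
    return sentences


sample = "Dr. Smith cited p. 47 of the book. It was short."
print(naive_split(sample))
print(split_sentences(sample))
```

Even this heuristic fails on exactly the case noted above: an abbreviation that happens to end a sentence is never treated as a break, and question marks, quotation marks, and decimal numbers are ignored altogether.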

I would have liked to have seen some examples that actually made a linguist sit up and say, Hey, that’s interesting. Some could be extremely simple. With access (over the Internet) to a list of phonemic representations of English, you could ask, what is the most common consonant in English? What is the most common non-coronal consonant? Does English have a dozen words with two adjacent non-homorganic non-coronal obstruents? What words appear twice in a row in text (e.g., this too too solid flesh), or if this does not seem serious enough, what prepositions are pied-piped in Mark Twain’s Tom Sawyer, and under what conditions?
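The simplest of these questions takes only a few lines. The following is a rough sketch in Python (not Perl, and not from the book) that finds words standing next to a copy of themselves, as in the "too too" example; the function name and the crude definition of a word as a run of letters and apostrophes are assumptions made for illustration.

```python
import re


def adjacent_repeats(text):
    """Return each word that is immediately followed by an identical word."""
    # Crude tokenization (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [first for first, second in zip(words, words[1:]) if first == second]


print(adjacent_repeats("O, that this too too solid flesh would melt"))
```

Run over a large text, a list like this would mix genuine repetitions with typographical slips, which is itself the kind of result that invites a closer look.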

A few things jumped out at me as aspects of the book that seemed less than optimal. The actual Perl code is written in a sans serif font, something like Arial, but in many books of this genre, a font more like Courier is used, a non-proportional font which in some ways is easier for the eye to parse when it is trying to cope with a sequence of unfamiliar symbols. Advice is given, on p. 47, on how to make Perl code more readable to the human being, but it is by no means always taken to heart. The vital subject of concordances is brought up on p. 114, allowing the reader to build word-lists from texts, but the technique that would allow the programmer to sort the words by frequency is not discussed anywhere in the book. That’s the sort of thing that leads to the point I made earlier: buying one book is never enough.
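For readers wondering what sorting by frequency involves, here is a minimal sketch in Python rather than Perl. It is my own illustration, not anything from PfL: count each word, then sort the (word, count) pairs by descending count, breaking ties alphabetically. The function name and the tokenization rule are assumptions.

```python
import re
from collections import Counter


def words_by_frequency(text):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    # Crude tokenization (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Sort on negated count for descending order, then on the word itself.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


for word, count in words_by_frequency("the cat and the dog and the bird"):
    print(count, word)
```

The same idea carries over to Perl, where one would sort the keys of a hash of counts with a comparison on the stored values.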

There are some excellent Perl books on the market: SAMS’ Teach yourself Perl in 21 days is excellent, and Coriolis’ Perl Core Language is a good reference for the beginner and intermediate Perl programmer. Hammond’s Programming for Linguists is a worthwhile introductory book for linguists who want to learn Perl and are looking for a book with more of an eye towards linguists’ concerns.