Introduction to programming for linguists

John Goldsmith
Winter 2003

We will use Perl to learn programming for linguists. The primary text will be Teach Yourself Perl in 21 Days; despite the name, you won't have to teach it to yourself. It is a very good book, both for learning from and for keeping as a reference later on. Perl is an excellent language: well suited to linguists' purposes, relatively easy to learn (compared to C or C++), free for all platforms, and a good stepping stone towards learning C++ if you decide to go more deeply into programming. I haven't ordered copies of the book at the bookstore, because you can get it cheaper online, where you can also find used copies.

Links for you on Perl:

    1. http://www.activestate.com/Products/ActivePerl/Download.html is the best link to use to download Perl to your computer.
    2. Check out http://www.perl.com/ , from O'Reilly publishers, as a first link to information on Perl.
    3. The Comprehensive Perl Archive Network (CPAN) is a good second link to information on Perl.
    4. Nice introduction to Perl from the University of Missouri.
    5. Another very nice introduction, this one from the University of Kansas!
    6. A very good step-by-step guide to getting Perl set up under Windows, by Selena Sol and Nikhil Kaul.
    7. In the way of fun, sort of: John Lawler (University of Michigan) has an interesting discussion of a simple Perl program that creates sentences that...well, just take a look for yourself.

We'll write several linguistic projects this quarter. My goal is to do the following projects with you:

1. A program to read a corpus and produce a list of words with their frequencies, ranked alphabetically or by frequency.
2. A program to test the validity of Zipf's law.
3. A program to strip affixes in English.
4. An Earley parser for a fragment of English.

But whether we get through the fourth project depends on us. If we only get the first three done, I would not be surprised (or disappointed).
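To give a feel for the second project: Zipf's law says, roughly, that a word's frequency is inversely proportional to its rank, so rank times frequency should stay roughly constant down the list. Here is a minimal sketch of a table that lets you eyeball that. The counts are made-up toy numbers and the name zipf_table is my own invention; a real test would use counts taken from a large corpus.

```perl
use strict;
use warnings;

# Hypothetical toy counts; a real test would use counts from a large corpus.
my %count = (the => 120, of => 61, and => 39, to => 31, a => 24);

# Return rows of [rank, word, frequency, rank * frequency],
# with words ranked from most to least frequent.
sub zipf_table {
    my (%freq) = @_;
    my @ranked = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
    my @rows;
    my $rank = 0;
    for my $word (@ranked) {
        $rank++;
        push @rows, [ $rank, $word, $freq{$word}, $rank * $freq{$word} ];
    }
    return @rows;
}

# Print one line per word: rank, word, frequency, rank * frequency.
for my $row (zipf_table(%count)) {
    printf "%4d  %-10s %6d  %8d\n", @$row;
}
```

If the law holds for your corpus, the last column should not drift too far from a constant.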

** Think about making your output be an HTML file. ** This is an easy way to get very professional-looking output with very little trouble. Take a look at a basic primer on HTML -- you'll be surprised how simple it is (for example, check out http://www.utexas.edu/learn/html/ ). All you need to do is have some statements like
print "<FONT COLOR = red>", $MyText, "</FONT>";
and -- BANG! -- your word is printed in red. Remember to make the name of the file that you create end in ".htm".
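Here is a minimal sketch of a complete program along these lines. The file name output.htm and the helper in_color are hypothetical names chosen for the illustration, and the text being printed is just a sample word.

```perl
use strict;
use warnings;

# Wrap a piece of text in a FONT tag of the given color.
sub in_color {
    my ($text, $color) = @_;
    return "<FONT COLOR=\"$color\">$text</FONT>";
}

# "output.htm" is a hypothetical file name; any name ending in ".htm" will do.
my $file = "output.htm";
open(my $out, '>', $file) or die "Cannot write $file: $!";
print $out "<HTML><BODY>\n";
print $out "<P>", in_color("linguistics", "red"), "</P>\n";
print $out "</BODY></HTML>\n";
close($out);
```

Open the resulting file in a browser to see the result. Note that this sketch does not escape characters such as < or & in the text, which you would need to do if your corpus contains them.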

 

The material in the table below follows the textbook closely up to Week 6; please read the material in the text before class during this portion of the course.

Week    
1 Introduction. Downloading and installing Perl. Text editors. Obtaining text files, dictionaries, corpora.
2 Scalar data: strings and numbers. Names of variables for scalars. Initializing variables. Adding, incrementing, counting. Input from keyboard, print to screen, and input from a text file.
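As a rough illustration of this week's ideas (initializing scalar variables, incrementing, adding to a running total, and reading input line by line), here is a small sketch. So that it runs on its own, it reads from a string as though it were a file; the sample text, the file name corpus.txt in the comment, and the name count_lines_and_chars are all assumptions made for the example.

```perl
use strict;
use warnings;

# Count the lines and characters that can be read from a filehandle.
sub count_lines_and_chars {
    my ($fh) = @_;
    my $lines = 0;    # initialize the counters before use
    my $chars = 0;
    while (my $line = <$fh>) {
        chomp $line;               # remove the trailing newline
        $lines++;                  # increment by one
        $chars += length $line;    # add to a running total
    }
    return ($lines, $chars);
}

# For a self-contained demo, read from a string as if it were a file.
# With a real file, use:  open(my $in, '<', "corpus.txt") or die ...;
my $text = "colorless green ideas\nsleep furiously\n";
open(my $in, '<', \$text) or die "Cannot open string: $!";
my ($lines, $chars) = count_lines_and_chars($in);
close($in);
print "Lines: $lines, characters: $chars\n";
```

To read a single line from the keyboard instead, use <STDIN> in place of <$fh>.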
3 Lists: arrays and hashes.
Arrays and hashes are very important in Perl: although they behave differently, they have a lot in common, and both are often described as lists. Lists of words, lists of counts. Splitting a string into a list of words (that is, into an array). First pass through the notion of iterating through a list.

Assignment 1: Write a program to print out the ASCII characters associated with the numbers from 0 to 255.
Assignment 2: Write a program that reads in each word of a corpus, and then prints them out. Optionally, you may make the program smart enough to output each word only once.

Hashes (also known as associative arrays). Using hashes to count words. The keys and the values of a hash. Iterating through the keys of a hash. Printing the words in a hash.
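The ideas above (splitting a string into an array of words, using a hash to count them, and iterating through the keys) fit together roughly as in the following sketch. The sample sentence and the name count_words are assumptions for the illustration, and the sketch makes no attempt to deal with punctuation.

```perl
use strict;
use warnings;

# Count how often each word occurs in a string of text.
sub count_words {
    my ($text) = @_;
    my %count;
    # split with ' ' breaks the string on runs of whitespace
    # and ignores leading whitespace.
    my @words = split ' ', lc $text;
    for my $word (@words) {
        $count{$word}++;    # the word is the key, its count is the value
    }
    return %count;
}

my %count = count_words("the cat saw the dog and the dog saw the cat");

# Iterate through the keys in alphabetical order and print each count.
for my $word (sort keys %count) {
    print "$word\t$count{$word}\n";
}
```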
4 Conditionals and loops.

Assignment 1: Write a program that counts the frequency of each letter in a corpus, and the frequency of each pair of letters; output them in frequency-ranked order.

Assignment 2: Write a program that does the following: First, it reads a corpus. Then, when the user types in a letter L, it outputs the frequency-ranked list of letters that follow L in the corpus, and then it outputs the frequency-ranked list of letters that precede L. ("L" can be any letter that the user types.)

 

Sorting a list alphabetically. Sorting a list using a different relation (e.g., sort induced by hash value).
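Here is a minimal sketch of the two kinds of sorting, assuming a hash of word counts like the one built in Week 3. The counts are made up, and by_frequency is a name chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical counts.
my %count = (dog => 2, the => 4, cat => 2, and => 1);

# Alphabetical order: sort's default comparison is string comparison.
my @alphabetical = sort keys %count;

# Frequency order: compare the hash values numerically, highest first;
# ties are broken alphabetically.
sub by_frequency {
    my (%freq) = @_;
    return sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;
}

print "Alphabetical: @alphabetical\n";
print "By frequency: ", join(" ", by_frequency(%count)), "\n";
```

Inside the sort block, $a and $b stand for the two items being compared; swapping them reverses the order.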
5 Lists: push, pop, shift, unshift, splice; reverse, join, map.
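A short sketch of what each of these list functions does, using a made-up array of words:

```perl
use strict;
use warnings;

my @words = ("green", "ideas", "sleep");

push @words, "furiously";       # add to the end
unshift @words, "colorless";    # add to the front
# @words is now: colorless green ideas sleep furiously

my $last  = pop @words;         # remove and return the last element
my $first = shift @words;       # remove and return the first element
# @words is now: green ideas sleep

# splice(ARRAY, OFFSET, LENGTH, LIST) replaces LENGTH elements at OFFSET.
splice(@words, 1, 1, "thoughts", "never");
# @words is now: green thoughts never sleep

my @backwards = reverse @words;
my @lengths   = map { length($_) } @words;   # apply a block to each element
my $sentence  = join(" ", @words);           # glue elements into one string

print "$sentence\n";
print "Reversed: @backwards\n";
print "Lengths: @lengths\n";
```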

Regular expressions

Examples from class.
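To show the flavor of regular expressions, here is a small sketch that pulls the words out of a line of text and then peels an "-ing" or "-ed" ending off each word. The names tokenize and split_suffix and the sample sentence are assumptions for the illustration. This is a toy, not a real stemmer: it will happily split "red" into "r" + "ed".

```perl
use strict;
use warnings;

# Pull the words out of a line of text, ignoring punctuation and case.
sub tokenize {
    my ($line) = @_;
    # In list context, a /g match returns every substring that matched.
    my @words = lc($line) =~ /[a-z']+/g;
    return @words;
}

# Split a word into a stem and an "-ing" or "-ed" ending, if it has one.
# This is a hypothetical illustration, not a real stemmer.
sub split_suffix {
    my ($word) = @_;
    if ($word =~ /^(\w+?)(ing|ed)$/) {
        return ($1, $2);    # parentheses capture the stem and the suffix
    }
    return ($word, "");
}

for my $word (tokenize("The dogs barked, and the cat was sleeping.")) {
    my ($stem, $suffix) = split_suffix($word);
    print "$word = $stem + $suffix\n" if $suffix ne "";
}
```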

6 Subroutines  
7 Implementing the Porter stemming algorithm  
8    
9 Writing an Earley parser  
10