Introduction to programming for linguists
John Goldsmith
Winter 2003
We will use Perl to learn programming for linguists. The primary text will be Teach Yourself Perl in 21 Days: despite the name, you won't have to teach it to yourself, and it's a very good book, both for learning from and for keeping as a reference book later on. Perl is an excellent language: just suited for linguists' purposes, relatively easy to learn (compared to C or C++), free for all platforms, and a good stepping stone towards learning C++ if you decide to go more deeply into programming. I haven't ordered copies of the book at the bookstore, because you can get it cheaper on-line, and you can also get used copies on-line.
Links for you on Perl:
1. http://www.activestate.com/Products/ActivePerl/Download.html is the best link to use to download Perl to your computer.
2. Check out http://www.perl.com/, from O'Reilly publishers, as a first link to information on Perl.
3. The Comprehensive Perl Archive Network (CPAN) is a good second link to information on Perl.
4. Nice introduction to Perl from the University of Missouri.
5. Another very nice introduction, this one from the University of Kansas!
6. Very good step-by-step explanation of getting Perl set up under Windows by Selena Sol and Nikhil Kaul.
7. In the way of fun, sort of: John Lawler (University of Michigan) has an interesting discussion of a simple Perl program that creates sentences that...well, just take a look for yourself.
We'll write several linguistic projects this quarter. My goal is to do the following projects with you:
1. A program to read a corpus, produce a list of words with frequencies, ranked alphabetically or by frequency.
2. A program to test the validity of Zipf's law.
3. A program to strip affixes in English.
4. An Earley parser for a fragment of English.
But whether we get through the fourth project depends on us. If we only get the first three done, I would not be surprised (or disappointed).
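To give a feel for the first two projects, here is a minimal sketch rather than the program we will build in class. It assumes a tiny made-up corpus held in a string (a real corpus would be read from a file), and the subroutine names `count_words` and `by_frequency` are invented for illustration. The last column prints rank times frequency, which Zipf's law predicts should stay roughly constant in a large corpus; a corpus this small is far too short to test that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each word occurs in a block of text.
# Returns a reference to a hash mapping word => count.
sub count_words {
    my ($text) = @_;
    my %count;
    # Lowercase the text, then pull out runs of letters and apostrophes.
    for my $word ( lc($text) =~ /[a-z']+/g ) {
        $count{$word}++;
    }
    return \%count;
}

# Return the words ranked by descending frequency, ties broken alphabetically.
sub by_frequency {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

# A toy corpus (an assumption for illustration only).
my $corpus = "the cat saw the dog and the dog saw the cat run";
my $count  = count_words($corpus);

print "Alphabetical:\n";
print "  $_\t$count->{$_}\n" for sort keys %$count;

print "By frequency (rank, word, frequency, rank x frequency):\n";
my $rank = 0;
for my $word ( by_frequency($count) ) {
    $rank++;
    my $freq = $count->{$word};
    print "  $rank\t$word\t$freq\t", $rank * $freq, "\n";
}
```

Storing the counts in a hash keeps the counting step to a single line, and the two orderings differ only in the comparison handed to `sort`.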
**Think about making your output be an HTML file.** This is an easy way to get very professional-looking output with very little trouble. Take a look at a basic primer on HTML -- you'll be surprised how simple it is (for example, check out http://www.utexas.edu/learn/html/). All you need to do is have some statements like
print "<FONT COLOR = red>", $MyText, "</FONT>";
and -- BANG! -- your word is printed in red. Remember
to make the name of the file that you create end in ".htm".
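A slightly fuller sketch of the same idea, under a few assumptions: the file name `output.htm` and the helper `colored` are made up for this example, and the helper also escapes `&`, `<` and `>` so that corpus text containing those characters displays literally instead of being read as HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap a piece of text in a FONT tag of the given color.
# The characters &, < and > are escaped so they display literally.
sub colored {
    my ( $text, $color ) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return "<FONT COLOR=\"$color\">$text</FONT>";
}

# Hypothetical output file name; note the ".htm" ending.
my $file = "output.htm";
open( my $out, '>', $file ) or die "Cannot write $file: $!";
print $out "<HTML><BODY>\n";
print $out "<P>", colored( "walk", "red" ), colored( "ing", "blue" ), "</P>\n";
print $out "</BODY></HTML>\n";
close($out);
```

Opening `output.htm` in a browser should show the stem in red and the suffix in blue.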
The material in the table below follows the textbook closely up to Week 6; please read the material in the text before class during this portion of the course.
| Week | | |
|---|---|---|
| 1 | Introduction | Downloading and installing Perl. Text editors. Obtaining text files, dictionaries, corpora. |
| 2 | Scalar data: strings and numbers. Names of variables for scalars. Initializing variables. | Adding, incrementing, counting. Input from keyboard, print to screen, and input from a text file. |
| 3 | Lists: arrays and hashes. Assignment 1: Write a program to print out the ASCII characters associated with the numbers from 0 to 255. | Hashes (also known as associative arrays). Using hashes to count words. The keys and the values of a hash. Iterating through the keys of a hash. Printing the words in a hash. |
| 4 | Conditionals and loops. Assignment 1: Write a program that counts the frequency of each letter in a corpus, and the frequency of each pair of letters; output them in frequency-ranked order. Assignment 2: Write a program that does the following: First, it reads a corpus. Then, when the user types in a letter L, it outputs the frequency-ranked list of letters that follow L in the corpus, and then it outputs the frequency-ranked list of letters that precede L. ("L" can be any letter that the user types.) | Sorting a list alphabetically. Sorting a list using a different relation (e.g., sort induced by hash value). |
| 5 | Lists: push, pop, shift, unshift, splice; reverse, join, map. | Regular expressions |
| 6 | Subroutines | |
| 7 | Implementing the Porter stemming algorithm | |
| 8 | | |
| 9 | Writing an Earley parser | |
| 10 | | |
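As a preview of how the Week 5 and Week 6 topics fit together, here is a small sketch that combines list functions, a regular expression, and a subroutine. The subroutine `split_suffix` and its three-suffix pattern are invented for illustration. This is deliberately naive suffix splitting, not the Porter stemming algorithm, and it will get some words wrong (for example, it would split "glass" into "glas" and "s").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a word into (stem, suffix) using one regular expression.
# The stem must be at least three characters long; if no suffix
# matches, the whole word is returned with an empty suffix.
sub split_suffix {
    my ($word) = @_;
    if ( $word =~ /^(.{3,}?)(ing|ed|s)$/ ) {
        return ( $1, $2 );
    }
    return ( $word, "" );
}

my @words = ( "walked", "sing" );
push @words, "cats";          # walked sing cats
unshift @words, "walking";    # walking walked sing cats

# map applies the block to every element and collects the results.
my @analysed = map {
    my ( $stem, $suffix ) = split_suffix($_);
    $suffix ? "$stem+$suffix" : $stem;
} @words;

print join( " ", @analysed ), "\n";            # walk+ing walk+ed sing cat+s
print join( " ", reverse @analysed ), "\n";    # cat+s sing walk+ed walk+ing

my $last  = pop @words;      # removes and returns "cats"
my $first = shift @words;    # removes and returns "walking"
print "$first ... $last; left: @words\n";
```

Requiring a stem of at least three letters is what keeps "sing" from being split into "s" and "ing"; a real stemmer needs many more conditions of this kind.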