Textes de Français Ancien Database

The Textes de Français Ancien Database is presented by the Laboratoire de Français Ancien (University of Ottawa) and the ARTFL Project (University of Chicago).

Search the 60 documents of the Textes de Français Ancien database, covering the 12th-15th centuries, or consult the Bibliography.

The TFA database contains 63,803 Types (unique word forms) and 1,276,795 Tokens (total words).

Sample search: set Author to renart and Search Corpus for amor, which searches 1 document and finds 35 hits.


Define Corpus (leave blank to search whole database):
Author: (ex. renart)
Title: (ex. chevalier)
Dates: (ex. 1180)
Genre: (ex. dramatic or narrative)
Type of Document: (ex. verse or prose)

Search Corpus:
Word, words or phrases:
Examples: cortois or triste.* or r.ine|princes.e or saint pEre (with Phrase Search)
Notes: The vertical line (|) is the OR operator. A space or carriage return is the AND operator for co-occurrence and phrase searches (see also Pattern Matching). Accented characters are represented by two characters (a\ = à).

Search Options: Phrase Search
Output Options: KWIC Report, Frequency by Title



Send comments or questions to Pierre Kunstmann: kunstman@rogers.wave.ca or Mark Olsen: mark@barkov.uchicago.edu.

Accent Representation

The default keyboard representation of accented characters in X-Mosaic, running under the UNIX operating system, is a two-character string: the letter followed by the appropriate symbol. The symbols are:

grave = backslash.  Example: à --> a\
acute (aigu) = forward slash.  Example: é --> e/
circumflex (circonflexe) = caret.  Example: ê --> e^
cedilla (cédille) = comma.  Example: ç --> c,
diaeresis (tréma) = double quote.  Example: ö --> o"
Capital vowel = matches all accented and unaccented forms.  Example: E --> é ê è and unaccented e.
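The conversion table above can be sketched in Python. This is a minimal illustration, not ARTFL code: it assumes Unicode input, decomposes it with NFD normalization, and replaces each combining accent with the symbol listed above. The name `to_two_char` is hypothetical.

```python
import unicodedata

# Combining accents (after Unicode NFD decomposition) mapped to the ASCII
# symbol that follows the letter in the two-character notation.
ACCENT_SYMBOLS = {
    "\u0300": "\\",  # grave:      à -> a\
    "\u0301": "/",   # acute:      é -> e/
    "\u0302": "^",   # circumflex: ê -> e^
    "\u0327": ",",   # cedilla:    ç -> c,
    "\u0308": '"',   # diaeresis:  ö -> o"
}


def to_two_char(text: str) -> str:
    """Rewrite accented letters as base letter + accent symbol (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Anything not in the table (base letters, other marks) passes through unchanged.
    return "".join(ACCENT_SYMBOLS.get(ch, ch) for ch in decomposed)


print(to_two_char("père"))      # pe\re
print(to_two_char("Français"))  # Franc,ais
```

NFD is used because it splits a precomposed letter such as è into the base letter followed by its accent, which is exactly the order of the two-character notation.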


Define Corpus by Author

Select all of the works in the database written by one or more authors. In the Author: box, type the author's last name without accents, such as renart or coinci. For compound names, enter the most distinctive part of the name: Chrétien de Troyes, for example, should be entered as chretien or troyes. A corpus consisting of the works of three writers can be specified as:

coinci,chretien,renart
This returns the 5 works in the database written by these three authors: two by Gautier de Coinci, two by Chrétien de Troyes, and one by Jean Renart. The comma between the authors' names serves as an "OR" operator.
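The comma-as-OR behaviour can be modelled with a short sketch. It rests on assumptions that are not documented TFA behaviour: each term is matched case- and accent-insensitively as a substring of the author field, which is consistent with entering a partial name such as chretien or troyes. The catalog records and function names are hypothetical.

```python
import unicodedata


def fold(s: str) -> str:
    """Lower-case and strip accents, so 'Chrétien' compares equal to 'chretien'."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def select_by_author(works, query):
    """Keep works whose author field contains ANY comma-separated term."""
    terms = [fold(t.strip()) for t in query.split(",") if t.strip()]
    return [w for w in works if any(t in fold(w["author"]) for t in terms)]


# Hypothetical catalog records, for illustration only (not the TFA catalog).
works = [
    {"author": "Gautier d'Arras", "title": "Eracle"},
    {"author": "Gautier de Coinci", "title": "Work 1"},
    {"author": "Chrétien de Troyes", "title": "Work 2"},
    {"author": "Jean Renart", "title": "Work 3"},
]

for w in select_by_author(works, "coinci,chretien,renart"):
    print(w["author"])
```

Under this substring assumption, entering gautier alone would select both Gautier d'Arras and Gautier de Coinci, which illustrates why the most distinctive part of a name makes the better search term.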


Define Corpus by Title

Select one or more titles in the database. In the Title: box, enter either a single word or a string of words in double quotes, without accented characters. Searching for eracle selects Gautier d'Arras's Eracle. A title search on a word shared by several titles selects every title containing that term. Thus, a title search for yvain selects Chrétien de Troyes's Le Chevalier au Lion (Yvain), in two versions, and a title search for roman creates a corpus of six works. Title searching also supports the comma as an "OR" operator, in the same way as author searching.

A complete title can also be searched by entering it exactly as it is listed in the TFA bibliographies, with or without double quotes.


Define Corpus by Date Range

Select texts by the year the TFA assigns to them, an approximation of the year or years of their composition. Entering 1265-1321 selects all texts with assigned composition dates between these two years. If a text is assigned more than one year of composition, the database considers only the first date. To simplify searching, approximate dates have been converted to precise years: "End of 14th century," for example, is recorded as 1399.
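The date rules above (a year range, only the first of several dates, approximate dates stored as single years) can be sketched as follows. The following are assumptions, not documented TFA behaviour: range bounds are inclusive, a single year such as 1180 selects only that year, and each work's date field is a string whose first four-digit number is the date used for searching. All names and records are hypothetical.

```python
import re


def first_year(date_field: str) -> int:
    """Only the first year counts when a work is assigned several dates."""
    m = re.search(r"\d{4}", date_field)
    if m is None:
        raise ValueError(f"no year in {date_field!r}")
    return int(m.group())


def parse_range(query: str) -> tuple[int, int]:
    """'1265-1321' -> (1265, 1321); a single year '1180' -> (1180, 1180)."""
    parts = [p.strip() for p in query.split("-")]
    return int(parts[0]), int(parts[-1])


def select_by_date(works, query):
    start, end = parse_range(query)  # bounds assumed inclusive
    return [w for w in works if start <= first_year(w["date"]) <= end]


# Hypothetical records; "1399" stands for an approximate "End of 14th century".
works = [
    {"title": "Work A", "date": "1260-1280"},
    {"title": "Work B", "date": "1300"},
    {"title": "Work C", "date": "1399"},
]
print([w["title"] for w in select_by_date(works, "1265-1321")])  # ['Work B']
```

Work A is excluded even though its second date, 1280, falls inside the range, because only its first date is considered.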


Define Corpus by Genre or Type

"Genre": the TFA database currently contains narrative and dramatic works.

"Type of Document" can be either verse or prose.


Word, Co-occurrence and Phrase Searching

You may enter one or more words or patterns for searching. It is important to note that the vertical line (|) serves as the logical OR operator and the space or carriage return serves as the logical AND operator. Thus, chat|chien will search for either chat or chien. By contrast, entering chat chien or

chat
chien
will find occurrences of chat AND chien. By default, the TFA Database searches for co-occurrences within the same sentence, that is, sentences containing both words regardless of their order or proximity. Selecting the Phrase Search option restricts the search to adjacent words in the order specified. Thus, by default, searching for biau cheval finds all sentences containing both biau and cheval (7 occurrences in the entire database). With Phrase Search selected, the search retrieves only the phrase biau cheval (one occurrence).
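The difference between the default co-occurrence search and Phrase Search can be illustrated with a sketch. It assumes that sentences have already been segmented and split into lower-case words on whitespace, and it handles only plain words and the | operator (no pattern matching). The function names and sample sentences are made up.

```python
def term_matches(term: str, word: str) -> bool:
    # "|" separates alternatives inside one term, as in chat|chien.
    return word in term.split("|")


def cooccurrence(sentence: str, query: str) -> bool:
    """True if every query term occurs somewhere in the sentence, in any order."""
    words = sentence.lower().split()
    terms = query.lower().split()  # spaces and newlines both act as AND
    return all(any(term_matches(t, w) for w in words) for t in terms)


def phrase(sentence: str, query: str) -> bool:
    """True if the query terms occur as adjacent words in the given order."""
    words = sentence.lower().split()
    terms = query.lower().split()
    n = len(terms)
    return any(
        all(term_matches(t, w) for t, w in zip(terms, words[i:i + n]))
        for i in range(len(words) - n + 1)
    )


for s in ["li biau cheval", "cheval est biau"]:  # made-up sentences
    print(s, cooccurrence(s, "biau cheval"), phrase(s, "biau cheval"))
```

Both sentences satisfy the co-occurrence test, but only the first contains biau immediately followed by cheval, so only it passes the phrase test.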


Pattern Matching

Because spelling varies considerably in these texts, pattern matching is an important component of searching the TFA database. Pattern matching allows the user to retrieve all the words that fit a defined pattern. The search term c.*rtoi.* matches all words that begin with "c", followed by any number of characters (including none), followed by "rtoi", followed again by any number of characters. This search finds cortois, cortoisement, courtoisement, courtoisie, curtoise, etc. The # sign marks capitalized words and can be used to distinguish between common nouns and proper names: #pierre retrieves only the capitalized occurrences of Pierre.

The pattern matching conventions correspond to UNIX "regular expressions," a pattern matching language which is described in a number of UNIX manuals. Also see Brian Kernighan and Rob Pike, The UNIX Programming Environment (1984): 102-5 for a discussion of regular expressions.

The regular expression operators most commonly used in PhiloLogic searches and wordlist generation are:

. (period)               -- matches any single character;
.* (period asterisk)     -- matches any string of characters;
# (hash mark)            -- matches capitalized words (proper names) only;
E (capital vowel)        -- matches all accented and unaccented forms;
| (vertical line)        -- or: chat|chien;
[a-z] (brackets)         -- matches a single character in the specified range.
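One way to approximate these operators with Python's `re` module is sketched below. It is an illustration under stated assumptions, not PhiloLogic's implementation: words are assumed to be stored in the two-character accent notation, patterns are assumed to be lower case apart from capital vowels, a pattern must match a whole word, and a lower-case pattern matches regardless of capitalization unless it is prefixed with #. The function names are hypothetical.

```python
import re

ACCENT_MARKS = '\\/^,"'  # symbols of the two-character accent notation


def to_regex(body: str) -> str:
    """Translate a search pattern into a Python regular expression."""
    out, in_class = [], False
    for ch in body:
        if ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "]":
            in_class = False
            out.append(ch)
        elif in_class:
            out.append(ch)  # ranges such as [a-z] pass through
        elif ch in "AEIOU":
            # Capital vowel: the bare vowel or the vowel plus any accent symbol.
            out.append(ch.lower() + r'[\\/^"]?')
        elif ch in ACCENT_MARKS:
            out.append(re.escape(ch))  # e.g. the "^" of e^ is literal, not an anchor
        else:
            out.append(ch)  # letters, ".", "*" and "|" keep their regex meaning
    return "".join(out)


def matcher(pattern: str):
    capitalized = pattern.startswith("#")
    regex = re.compile(to_regex(pattern.lstrip("#")))

    def match(word: str) -> bool:
        if capitalized and not word[:1].isupper():
            return False  # "#" keeps capitalized words only
        return regex.fullmatch(word.lower()) is not None

    return match


words = ["cortois", "cortoisement", "courtoisie", "curtoise", "corteis"]
m = matcher("c.*rtoi.*")
print([w for w in words if m(w)])  # ['cortois', 'cortoisement', 'courtoisie', 'curtoise']
```

The accent symbols are escaped because two of them, the backslash and the caret, are themselves regular expression operators in Python.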

Display result frequencies by title

This option does not display text. Instead, it counts the search results in each title and lists the titles in descending order of frequency. You may specify any valid corpus and search pattern(s).
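Conceptually, the Frequency by Title report is a count of hits grouped by title and sorted from most to fewest. A minimal sketch, assuming each hit is available as a (title, word) pair; the titles and hits below are placeholders, not TFA data.

```python
from collections import Counter


def frequency_by_title(hits):
    """hits: iterable of (title, matched_word) pairs; returns [(title, count), ...]."""
    # most_common() sorts by count, highest first.
    return Counter(title for title, _ in hits).most_common()


hits = [("Title A", "amor"), ("Title B", "amor"), ("Title B", "amors"),
        ("Title C", "amor"), ("Title B", "amor"), ("Title C", "amors")]
for title, n in frequency_by_title(hits):
    print(f"{n:4d}  {title}")
```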


Concordance and KWIC reports

The TFA database produces initial text reports in two formats, concordance and Key-Word-In-Context (KWIC); the default is the concordance report. Both reports indicate the number of texts searched, the terms searched for in the corpus, and the total number of occurrences. This general information is followed by a list of occurrences, each preceded by an abbreviated bibliographic citation with page or folio number. The page number is linked to the TFA page server, which retrieves the entire page with the keyword highlighted. An asterisk following the abbreviated title indicates that the edition of the work is under copyright; in this case, no page context is available.

In the concordance report, context consists of about 40 words on each side of the keyword. The KWIC report centers and emphasizes the keyword in a single line of text. At the end of the report, full bibliographic references for each work cited are displayed.
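The two layouts differ mainly in how much context they keep around the keyword. A sketch of both, assuming the text is already tokenized into words: `concordance_context` keeps up to 40 words on each side (following the figure given above), and `kwic_line` pads the left context so that the keyword always starts in the same column. The names and the default column width are hypothetical choices.

```python
def concordance_context(words, i, span=40):
    """Up to `span` words on each side of the keyword at position i."""
    return words[max(0, i - span):i + span + 1]


def kwic_line(words, i, width=30):
    """One line with the keyword in brackets at a fixed column."""
    left = " ".join(words[:i])[-width:]      # end of the left context (may cut a word)
    right = " ".join(words[i + 1:])[:width]  # start of the right context
    return f"{left:>{width}} [{words[i]}] {right}"


words = "a b c d e".split()
for i in range(len(words)):
    print(kwic_line(words, i, width=5))
```

Right-aligning the left context to a fixed width is what makes the keywords line up in a single column down the report.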



Word Frequency Exploder

Pattern: (ex. dieu)


This is a "floating search": it finds all occurrences of the indicated string of characters, whether as whole words or as parts of words. Use the ARTFL conventions for accented characters, except that the capital-vowel convention (matching both unaccented and accented forms) is not implemented. Results are sorted alphabetically, with the number of hits in the first column. This tool is used to check the parsing of input data and to cross-check hit counts returned by the main search engine.
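The floating search and its alphabetical report can be sketched as follows. This assumes the corpus is available as a list of word tokens and uses Python's `re.search`, which finds the pattern anywhere inside a word. Patterns here are Python regular expressions, so accent symbols such as \ and ^ would need escaping, and Python's string ordering may differ from the database's own alphabetical order. The tokens and function names are made up.

```python
import re
from collections import Counter


def explode(tokens, pattern):
    """Count every word form containing the pattern, as a whole word or part of one."""
    regex = re.compile(pattern)
    counts = Counter(t for t in tokens if regex.search(t))
    return sorted(counts.items())  # alphabetical by word form


def report(rows):
    # Number of hits in the first column, then the word form.
    return "\n".join(f"{n:6d}  {word}" for word, n in rows)


tokens = ["dieu", "dieus", "adieu", "deu", "dieu"]  # made-up tokens
print(report(explode(tokens, "dieu")))
```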



email: Mark Olsen, (mark@barkov.uchicago.edu), The ARTFL Project