Using a Concordancer in Literary Studies

by

Maria Rosario Caballero Rodriguez
University Jaume I, Castellón

The aim of this paper is to introduce concordancers to those who do not yet know about them or who, if they do, are not aware of their potential for developing new research techniques. I also attempt to ‘reconcile’ linguistics (including computational linguistics) and literary criticism, on the grounds that a serious linguistic analysis of the texture of the literary piece under discussion is an essential part of literary criticism, even if linguistics is not the only discipline to be brought to bear on it.

The issues to consider here are, then: (1) What are concordancers? (2) How can they be applied in literary studies? (3) What are their main characteristics? To illustrate the application of concordancers to literary studies, I will also comment briefly on my own research.

1.- What is a concordancer?

For those who are not used to working with a concordancer, I should start by saying that it is one of the simplest and, at the same time, most powerful tools for eliciting certain types of information quickly and effectively from the diverse corpora (electronic or not) available nowadays. A concordance is an alphabetical listing of the words in a text (called search words or key words) together with the contexts in which they appear, and a concordancer is the program that produces these listings. Furthermore, even the most basic concordancers can provide frequency lists for all the words in the text or corpus under study, ordered either alphabetically or by each word's frequency of occurrence.
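The two orderings of a frequency list can be illustrated with a minimal Python sketch. It assumes a deliberately simple notion of ‘word’ (lower-cased runs of letters, with an optional internal apostrophe), and the function names are illustrative only; real concordancers define words in their own, more careful ways.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lower-case word tokens (letters, with an optional internal apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_lists(text):
    """Return the word counts twice: alphabetically, and by descending frequency."""
    counts = Counter(tokenize(text))
    alphabetical = sorted(counts.items())
    # Most frequent first; words with the same count are listed alphabetically.
    by_frequency = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return alphabetical, by_frequency

if __name__ == "__main__":
    alpha, freq = frequency_lists("The wolf ran. The girl ran to the wood.")
    print(alpha)  # [('girl', 1), ('ran', 2), ('the', 3), ('to', 1), ('wolf', 1), ('wood', 1)]
    print(freq)   # [('the', 3), ('ran', 2), ('girl', 1), ('to', 1), ('wolf', 1), ('wood', 1)]
```

Breaking ties alphabetically in the frequency ordering keeps the list stable and easy to scan, which is how most frequency lists are presented.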

The working procedure is very similar in all concordancers. The first step is to load the text(s) you want to work with (you need a scanner for this, although many texts can also be downloaded from Project Gutenberg), after which the program's menus are displayed on screen (the number of menus, and how ‘sophisticated’ they are, differs considerably from one concordancer to another). You can then search for a particular word or string, or a collocation, or simply obtain a word frequency count (again, different concordancers allow for different things, but all of them offer at least these basic options).

Usually, the search term is displayed in the centre of a window, along with a context consisting of a set number of preceding and following characters. Once the search is completed, the results may be sorted, saved or printed and, if you do not want to save the whole concordance, you can select how many results (concordance lines) to save. Clicking on a concordance line usually displays the search word with several lines of context in a separate, smaller window (in MonoConc, the upper window), whose contents may also be saved or printed.
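The centred display with a fixed window of characters on either side, and the sorting of the results, can be sketched in Python as follows. This is a minimal illustration, assuming whole-word, case-insensitive matching and sorting on the right-hand context (one common sort option among several); the function names are hypothetical and do not describe any particular program.

```python
import re

def kwic(text, keyword, width=30):
    """Keyword-in-context lines: each hit with `width` characters of context on either side."""
    flat = " ".join(text.split())  # collapse line breaks so each context reads as one line
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Pad the contexts so the search word lines up in a centred column.
        lines.append((left.rjust(width), match.group(), right.ljust(width)))
    return lines

def sort_by_right_context(lines):
    """Sort concordance lines alphabetically by the text following the search word."""
    return sorted(lines, key=lambda line: line[2].lower())

def format_line(line):
    left, hit, right = line
    return f"{left} {hit} {right}"

if __name__ == "__main__":
    sample = "The wolf ran. A grey wolf howled."
    for line in sort_by_right_context(kwic(sample, "wolf", width=5)):
        print(format_line(line))
```

Working in characters rather than words mirrors the display described above; it is also why context edges often cut words in half.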

2.- Applications of concordancers

Concordancers have been widely used in linguistics, above all in text-type studies that rely on quantitative analysis. Yet linguistics is not the only field where concordancers may prove useful; apart from genre studies, translation and literary criticism might also benefit from what these programs offer. Just as stylistic analysis now uses concordancers to detect cases of literary influence (in some cases, mere plagiarism), the same programs can be used to detect cases of intertextuality, be it intertextuality ‘proper’ (specific texts overtly drawn upon within a text) or interdiscursivity (how a discourse type is constituted by a combination of other discourse types). Concordancers do not, of course, work on ‘nothing’: you need an initial ‘intuition’ to start with. Once you have it, however, concordancers are very simple to use and save a great deal of time: collocations, frequencies (and their implications for lexical cohesion), and intertextual links among texts are spotted quickly and efficiently.

When I began my research, I had no idea how helpful a concordancer would prove. My object of study was Angela Carter’s "The Company of Wolves", a tale regarded as one of the landmarks among feminist revisions of fairy tales, a classification which I discussed in my work. Although I did not deny its evident intertextual connections with the famous story of Little Red Riding Hood, I had also found traces of other texts and, above all, of other generic types. The purpose of my work was, then, to establish which generic type(s) were present in the tale, how they showed up in it and, above all, why.

Norman Fairclough’s Critical Discourse Analysis was adopted as the framework for the analysis. This approach attempts both to describe discourse (as discourse analysts do within linguistics) and, above all, to interpret discourse as social practice, with the aim of making its opaque aspects more visible (discourse being determined to a great extent by hegemonic ideologies). A key issue in Fairclough’s framework is the heterogeneous nature of texts, a heterogeneity that participants in discursive practices can exploit for political purposes ("a strategy for dealing with the problematisation of one’s position is to be creative, to put together familiar discourse types in novel combinations as a means of finding new ways of doing things to replace the now-problematic old ones.", Fairclough 1989:171). The first task in my analysis was, then, to establish the degree of heterogeneity of the text under discussion and, therefore, which generic type(s) it drew upon.

To delimit the genres present in the text, I also took into account the multidimensional approach proposed by Biber (1989), which characterises a number of text types in terms of both their linguistic co-occurrence patterns and their shared functions. His approach attempts to identify the linguistic parameters along which genres vary; to do so, it incorporates computer-based quantitative analysis to account for the co-occurrence of bundles of linguistic features in texts, which are later analysed ‘qualitatively’ and interpreted with regard to the underlying function of the texts.

My main working hypothesis was that Carter’s tale was constructed mainly upon two different generic types, religious sermons and folk tales, which, although theoretically and ‘functionally’ different, shared a number of traits that made it possible to combine them within a single text with a clear purpose in mind. To test this hypothesis, I used a concordancer to establish the frequency of lexical items and collocations, together with verb tenses and the like. Once this was done, I realised that the two generic types were mixed in the text in a very interesting way, ‘cross-referring’ to each other as the narrative unfolded, and here, too, the concordancer proved very useful. I also used it to establish the cohesive patterns of the text, as well as certain features of Carter’s style (such as the recurrent use of hypallage, or transferred epithets).

Once this was done, I searched for the explicit intertextual links between Carter’s tale and other texts. Apart from the obvious connections with different versions of Little Red Riding Hood, the text also hinted at links with religious writings, certain poets, some Renaissance works (above all, through the recurrent use of a number of images and expressions), and Gothic novels. I either scanned these texts or downloaded them from Internet sources, and then ran the concordancer on them to spot the occurrences. All these things can of course also be done (and still are) ‘manually’, but at the risk of wasting a considerable amount of time and, more often than not, losing our patience!
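Part of the search for recurring expressions across texts can also be automated. The following Python sketch is not the procedure used in the research described here, which relied on running concordances; it is a deliberately simple, swapped-in technique, assuming that a sequence of n consecutive words (an n-gram) shared by two texts is a useful first pointer to a possible intertextual link. The function names are hypothetical, and every hit would still need to be checked in context.

```python
import re

def ngrams(text, n=3):
    """Set of all sequences of n consecutive lower-case words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def shared_phrases(text_a, text_b, n=3):
    """Word n-grams occurring in both texts, as alphabetically sorted strings."""
    return sorted(" ".join(gram) for gram in ngrams(text_a, n) & ngrams(text_b, n))

if __name__ == "__main__":
    a = "She walked into the dark wood alone."
    b = "Into the dark wood went the hunter."
    print(shared_phrases(a, b))
```

Smaller values of n catch more candidate echoes but also more coincidences (common phrases such as "in the middle of"), so the threshold is a trade-off the analyst has to judge.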

The concordancer used in my research was a commercial program, MonoConc. As my work advanced, however, I became more interested in this kind of program and started to search the Internet for similar resources. As it happens, I found two other concordancers which were rather attractive, relatively easy to use and, above all, free. It occurred to me that other researchers might be interested in knowing a bit more about them: how they work, and what their advantages and disadvantages are when compared with a commercial program.

3.- Types of concordancers

The programs under discussion are MonoConc (commercial) and two free concordancers available on the Internet (ConcApp and Wconcord), all of which were developed for PCs running Windows. I have also included a few useful Internet addresses at the end of this paper for those interested in further information.

3.1.- Commercial Concordancers: MonoConc

MonoConc (designed by Michael Barlow and published by Athelstan) is relatively easy to run and is useful for producing simple concordances (of a word, lexeme, or phrase), which are specified in the Search Parameters option.

The program’s menus become available once you have loaded the text(s) you want to work with. Once a concordance has been produced, the concordance lines are displayed in the main window, and the full context of a selected line can be viewed in the upper window.

You may also want some simple frequency information about your text or corpus. This can be obtained by choosing Frequency from the Collocation menu, and the resulting lists can be arranged alphabetically or in order of frequency. A range of frequency options is also available, such as obtaining a list of only the content words in your text. To do this, you can either edit a list of all the words you do not want to appear in the frequency results, or select a minimum and/or maximum frequency for the words in the list.
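The two filtering options just described, an edited list of excluded words and minimum/maximum frequency thresholds, can be sketched in Python. This is an illustrative sketch only, not MonoConc's implementation; the function name, its parameters and the letters-only tokenizer are assumptions made for the example.

```python
import re
from collections import Counter

def filtered_frequencies(text, stop_words=(), min_freq=1, max_freq=None):
    """Frequency list that skips excluded words and words outside [min_freq, max_freq]."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    excluded = {word.lower() for word in stop_words}
    kept = [
        (word, n)
        for word, n in counts.items()
        if word not in excluded
        and n >= min_freq
        and (max_freq is None or n <= max_freq)
    ]
    # Most frequent first; ties in alphabetical order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    text = "the wolf and the girl and the wolf"
    # Excluding function words leaves (roughly) the content words.
    print(filtered_frequencies(text, stop_words=("the", "and")))
```

An exclusion list gives precise control over which function words disappear, while a maximum frequency is a cruder shortcut: in most texts the very frequent words are function words, but not always.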

The advantages of the program are: (1) it is easy to operate; (2) the upper window for full context display allows quick access to the original text; and (3) lists of words can be edited in the frequency options. Yet the disadvantages outnumber the advantages: MonoConc offers a limited set of search options (simple words, lexemes, or specific strings of words), together with limited editing options (for both the text(s) and the concordances). Besides, you can only work with previously loaded texts; if you change your mind and want to work with other texts from your hard disk, you have to load them each time, which results in a considerable loss of time.

3.2.- Freeware Concordancers: Wconcord, ConcApp

Wconcord

This is a Windows concordancer specially designed for those with little or no previous experience of this type of program. It was developed by Zdenek Martinek from West Bohemia University, Pilsen (Czech Republic) and Les Siegrist from the Technische Hochschule Darmstadt (Germany). The program is available at: http://www.pef.zcu.cz/andy/martinek/wconcord (queries can be sent to the authors at these addresses: martinek@top.cz and siegrist@hrz1.hrz.th.darmstadt.de).

The program allows you to search for words (Word Search), for lexemes together with their conjugated or inflected forms (Lemma), or for ‘complex collocations’ (Advanced Word Search), such as phrases, verb tenses or any other syntactic form consisting of more than one word. Besides, Wconcord provides the user with a number of macros which make the different procedures very easy (the different options can also be selected by clicking the right mouse button).
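How a lemma search gathers the different forms of a word can be illustrated with a small Python sketch. It assumes a hand-made, hypothetical table mapping each lemma to its forms; this is not how Wconcord stores its lemma information, and a real application would need a much fuller word list or a lemmatiser.

```python
import re

# Hypothetical lemma table for illustration: lemma -> set of its inflected forms.
LEMMA_TABLE = {
    "run": {"run", "runs", "ran", "running"},
    "wolf": {"wolf", "wolves", "wolf's"},
}

def lemma_search(text, lemma, table=LEMMA_TABLE):
    """Return (token position, form) for every form of `lemma` found in the text."""
    forms = table.get(lemma, {lemma})  # unknown lemmas fall back to an exact-word search
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [(position, token) for position, token in enumerate(tokens) if token in forms]

if __name__ == "__main__":
    print(lemma_search("Wolves run; the wolf ran.", "wolf"))
```

Irregular forms such as "ran" or "wolves" are exactly what a plain word search misses, which is why an explicit list of forms (however it is obtained) is needed.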

The resulting concordance is displayed much as in MonoConc (two windows, one for the concordance itself and another for the context); the only difference is the number of macros available.

The program also builds frequency lists (Build Word Frequency List, in the Tools menu). In addition, the Statistics option in the Edit menu, apart from giving the total number of words, paragraphs, and sentences in the text, also provides a count of word types (as opposed to word tokens), which makes the analyst’s task a bit easier.
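The distinction between word tokens (every running word) and word types (each distinct word form) can be made concrete in a few lines of Python. The sketch below is an illustration under stated assumptions, not Wconcord's Statistics routine: it uses a letters-only tokenizer, leaves out paragraph and sentence counts, and adds the type/token ratio, a common derived measure that the program may or may not report.

```python
import re

def type_token_stats(text):
    """Count word tokens and distinct word types, plus their ratio."""
    tokens = re.findall(r"[a-z]+", text.lower())
    types = set(tokens)
    # Guard against division by zero for an empty text.
    ratio = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "type_token_ratio": ratio}

if __name__ == "__main__":
    print(type_token_stats("The wolf and the girl"))
```

Here "the" occurs twice, so five tokens yield only four types. Because the ratio falls as texts grow longer, it is only meaningful when comparing texts of similar length.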

One of the most interesting features of Wconcord is the Advanced Search option, which allows for the search of complex strings such as verb tenses in a very ‘open’ way: you can select the number of words occurring between those in the string searched, as well as the order of constituents.
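The principle behind this kind of ‘open’ search can be sketched in Python: look for one word followed by another with at most a given number of words in between, optionally accepting either order. This is a hypothetical illustration of the idea, not Wconcord's own implementation; the function name and parameters are invented for the example.

```python
import re

def gapped_search(text, first, second, max_gap=2, any_order=False):
    """Find `first` followed by `second` with at most `max_gap` words in between.

    If `any_order` is True, `second ... first` sequences are reported as well.
    Each hit is returned as the matched stretch of text, gap words included.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    first, second = first.lower(), second.lower()
    hits = []
    for i, token in enumerate(tokens):
        targets = []
        if token == first:
            targets.append(second)
        if any_order and token == second:
            targets.append(first)
        for target in targets:
            # The next max_gap + 1 tokens: the target may sit right after the word
            # or after up to max_gap intervening words.
            window = tokens[i + 1:i + 2 + max_gap]
            for offset, other in enumerate(window):
                if other == target:
                    hits.append(" ".join(tokens[i:i + 2 + offset]))
                    break
    return hits

if __name__ == "__main__":
    sample = "They have never seen a wolf. We have seen it. Seen? I have."
    # A perfect-tense pattern: "have" followed by "seen" with up to two words between.
    print(gapped_search(sample, "have", "seen"))
```

Allowing a gap catches "have never seen" as well as "have seen", while `any_order` finds inverted sequences; both also increase the number of accidental matches (such as "seen i have" across a sentence boundary in the sample), so the results need checking.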

The program is slightly more advanced than MonoConc, and its advantages include: (1) it combines simplicity with wider search possibilities (the Advanced Search option being a substantial improvement), and (2) its macros and the access to many options through the right mouse button speed up the whole process considerably. The main disadvantages are similar to those of MonoConc: the constraint of having to work with previously loaded texts, and the narrow editing options. Even so, the program is an improvement on MonoConc (and don’t forget it is free!).

ConcApp

Concordance Application (ConcApp) was developed by Chris Greaves, and two different versions are available on the Internet (http://vlc.polyu.edu.hk/pub/concapp/concapp.htm). The first (ConcApp Concordancer Browser Version1) runs under both Windows 3.1 and Windows 95, and has a useful Help file (which does not yet exist for the other version). We will comment on the latest version, whose main attraction is the inclusion of a powerful editing program.

ConcApp allows you to search for a word, a phrase (20 characters maximum), or any word with a given prefix or suffix.
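A prefix/suffix search amounts to filtering the words of a text by how they begin or end. The Python sketch below is a minimal illustration, assuming a letters-only tokenizer and returning each matching word form once; the function name is hypothetical and this is not ConcApp's code.

```python
import re

def affix_search(text, prefix="", suffix=""):
    """Distinct word forms that start with `prefix` and end with `suffix`, alphabetically."""
    tokens = re.findall(r"[a-z]+", text.lower())
    prefix, suffix = prefix.lower(), suffix.lower()
    # An empty prefix or suffix matches every word, so either may be omitted.
    return sorted({t for t in tokens if t.startswith(prefix) and t.endswith(suffix)})

if __name__ == "__main__":
    text = "The wolves howled; the girl smiled and walked home."
    print(affix_search(text, suffix="ed"))
```

A suffix search such as "-ed" is a quick way of surveying regular past-tense forms, although it also picks up adjectives and any other word that happens to end in those letters.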

At first sight, the most striking feature of the program’s menu display is its similarity to the toolbar of a word processor such as Microsoft Word (menus and macros). ConcApp also allows you to work with any text saved on your computer’s hard disk, whether previously loaded or not.

Once a concordance has been produced, the results are displayed on screen.

The program also shows the number of collocates for the different concordances, and it can build a frequency list for them.
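Counting collocates means tallying the words that occur within a fixed span around each occurrence of the search word (the ‘node’). The Python sketch below assumes a span counted in words on either side and a letters-only tokenizer; both choices, and the function name, are illustrative assumptions rather than ConcApp's actual settings.

```python
import re
from collections import Counter

def collocates(text, node, span=2):
    """Count the words within `span` words to the left and right of each `node` occurrence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    # Frequency list of collocates: most frequent first, ties alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    text = "the grey wolf ran and the black wolf ran"
    print(collocates(text, "wolf", span=1))
```

The span is the main design choice: a span of one word finds adjectives and verbs attached directly to the node, while wider spans capture looser associations at the cost of more noise.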

Finally, the program can build frequency lists for all the words in a text, or give the frequency of any given word, for any text on the hard disk.

The program, then, has two fundamental advantages: the possibility of working with any text (without having to load it), and its editing options. Its search options, however, are very similar to those of MonoConc.

3.3.- Conclusions

As we have seen, all concordancers cover a number of basic functions, the main differences among them lying in their editing, searching, and text-loading options and in the facilities they offer to speed up these processes. In this sense, both Wconcord and ConcApp represent a step forward compared with MonoConc (above all, in the number of macros they incorporate). Choosing between the two, however, would be difficult, and the difficulty lies, paradoxically, in their most outstanding features: while Wconcord’s main attraction is its wide range of search options and quick access to its different functions, ConcApp offers more editing options and easier access to different texts. The ideal would be a combination of the two. The choice also depends on one’s own research: if you are working with several texts at once, you may find ConcApp more useful, since it can work with texts from your hard disk without your having to load them each time; if you work with just one text, Wconcord is very convenient. In any case, both are easier to use than MonoConc, and they are free.

Interesting addresses and other concordancers

There is also a growing number of concordancers which have been adapted to specific corpora and which do more than just provide word lists (however sophisticated these might be). Two examples are:

Blueletter Bible Project

A program devised by Mark-Jason Dominus from the University of Pennsylvania (http://www.khouse.org/blueletter) and applied to the King James Authorised Version of the Bible. The program includes other corpora such as the Topical Bible (Nave), Bible Dictionary (Easton), New Topical Textbook (Torrey), Thematic Subject Guide, Dictionary of New Testament Words (Vine) and Bible Names Dictionary (Hitchcock). Here you can look up any quotation or word from the Bible. The program also provides information about the different senses of a word and the corresponding concordances for each sense.

Web Concordance

It was developed in the English Department of the University of Dundee (http://www.dundee.ac.uk/English/) and adapted to the literary study of the following authors: PB Shelley (Selected Poems), ST Coleridge (The Ancyent Marinere), John Keats (The Odes of 1819), William Blake (Songs of Innocence, Songs of Experience), Wordsworth and Coleridge (Lyrical Ballads, 1798), and Gerard Manley Hopkins (Poems, First Edition 1918).

Macintosh concordancers

1. Conc 1.76 (http://www.sil.org/computing/conc/). A Conc Tutorial is also available. More information can be obtained from the Summer Institute of Linguistics in Dallas (Texas), at the following address: http://www.sil.org/

2. Free Text from the University of Michigan, available at the following address: ftp://nora.hd.uib.no/pub/mac/

Interesting sites

Another interesting address is http://www.ldc.upenn.edu/, the site of the Linguistic Data Consortium at the University of Pennsylvania. If you contact them, they will provide you with a password and an ID number, and you will be able to search for concordances in the Brown Corpus and the TIMIT Speech Corpus (the only disadvantage being that you have to be online).

Web Concordancer (run by Chris Greaves) works in a very similar way: there you can search for concordances in the Brown Corpus, the LOB Corpus, or a miscellany of texts and articles.