I start this post by giving a very quick introduction to concordances. If you are already an experienced corpus linguist, you can skip to the final section on categorising concordance lines. I am curious about your own practices for analysing concordance lines: do you print them out and highlight the different patterns? Or do you annotate the lines electronically, using a concordancer or a spreadsheet? Is there any other option that hasn’t occurred to me yet?
The basic display format in corpus linguistics
In the past year or so I was pre-occupied with relatively abstract, ‘big picture’-style analyses of my corpus (basically key key word and collocation analysis), but now I have come across a theme for which a smaller-scale, qualitative analysis is more appropriate. (Once I’ve wrapped it all up, I hope to share some insights. Or you may have to wait for my thesis to get done …).
For me as a corpus linguist, the go-to tool for any qualitative investigation is the concordancer. As the name suggests, it produces concordances. A concordance is the basic display format in corpus linguistics that lists snippets of the text, illustrating the use of a particular word or phrase in a corpus. Concordance analysis has brought the discipline a long way, especially when Sinclair developed very systematic ways of analysing concordance lines for making dictionaries. (Sinclair’s guidelines are recorded in his book Reading Concordances; it’s a shame that Google Books has no preview …).
Consider this quote from Martin Wynne’s (2008, online) handbook chapter on concordancing:
For many linguists, searching and concordancing is what they mean by “doing corpus linguistics”.
The way we read concordance lines is quite different from the way we read a text. This vertical reading may take some time getting used to. Here’s an example, concordance lines for language on WebCorp:
You can also use WebCorp to produce concordance lines from the web; or you can access corpora that are available online with integrated concordance functionality, such as the BNCweb or the BYU corpora. (If you want to run concordances on specialised subcorpora on the BYU interface, you might be interested in the slides and the handout from my session at the University of Birmingham Summer School in Corpus Linguistics this year).
Of course, we often want to use corpus linguistic tools on materials that haven’t been made widely available, because it is often necessary to prepare a corpus from scratch for a particular research question. To create concordances for your own texts you using concordancers like AntConc and WordSmith Tools (which you could buy if your institution doesn’t have a license).
What are your personal preferences for analysing concordance lines?
Concordance analysis is all about viewing a word (or phrase) in its co-text to identify any patterns in the way it is used. It’s often helpful to resort the concordance lines. Concordance tools usually let you resort based on the surrounding words (in positions 1-5 or more on the left and right).
According to Martin Wynne (2008, online),
[t]his type of manual annotation of concordance lines is often done on concordance printouts with a pen. Software which allows the annotation to be done on the electronic concordance data makes it possible to sort on the basis of the annotations, and to thin the concordance to leave only those lines with or without a certain manual categorisation.
Personally, I usually start with a print out of the simple concordance lines. Then, once I have identified some simple categories I often move on to an Excel spreadsheet. I like being able to add columns for categories (I should just not overdo it, like in the photo…). Moreover, in some versions of Excel, it is possible to select and change the font of particular words in the same cell (seems to work on Excel for Mac but not for Windows). That way, I can highlight the word or phrase which prompts the category for the concordance line. It is also possible to assign a concordance to particular categories.
Some concordancers provide functionality for categorizing concordance lines. In WordSmith Tools it is possible to assign categories (‘sets’). I have only recently tried this function and I’m quite impressed with the range of colours that are available, which you can see in the screenshot on the left. More information is available from the manual. BNCweb also provides a (simple) categorisation function with up to 6 categories. In the example from the screenshot below we would distinguish between can as the modal verb and can as the container for a drink. Of course, the modal is much more frequent (in general language usage, not in a text about coke cans…). Therefore all the example concordance lines represent the modal usage.
I am curious about these features and in how far people use them. If you don’t use these functions, how else do you categorise concordance lines? Do you do it manually, after printing out? In practice, how often do you analyse concordance lines? Are they quite important in your research or do you focus on more quantitative aspects, checking concordance lines when necessary?