
On analysing concordance lines

I start this post by giving a very quick introduction to concordances. If you are already an experienced corpus linguist, you can skip to the final section on categorising concordance lines. I am curious about your own practices for analysing concordance lines: do you print them out and highlight the different patterns? Or do you annotate the lines electronically, using a concordancer or a spreadsheet? Is there any other option that hasn’t occurred to me yet?

The basic display format in corpus linguistics

For the past year or so I have been preoccupied with relatively abstract, ‘big picture’-style analyses of my corpus (basically key key word and collocation analysis), but now I have come across a theme for which a smaller-scale, qualitative analysis is more appropriate. (Once I’ve wrapped it all up, I hope to share some insights. Or you may have to wait for my thesis to get done …)

For me as a corpus linguist, the go-to tool for any qualitative investigation is the concordancer. As the name suggests, it produces concordances. A concordance is the basic display format in corpus linguistics that lists snippets of the text, illustrating the use of a particular word or phrase in a corpus. Concordance analysis has brought the discipline a long way, especially when Sinclair developed very systematic ways of analysing concordance lines for making dictionaries. (Sinclair’s guidelines are recorded in his book Reading Concordances; it’s a shame that Google Books has no preview …)
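For readers who like to see the mechanics, here is a minimal sketch of what a concordancer does when it produces keyword-in-context lines. It is not how AntConc, WordSmith Tools or any other real tool is implemented; the function names `kwic` and `format_line`, the character-based context width and the sample sentence are all my own assumptions for illustration.

```python
import re

def kwic(text, node, width=30):
    """Return (left, node, right) tuples for every match of `node`.

    `width` is the number of characters of co-text kept on each side
    (an illustrative choice; real tools often count words instead).
    """
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines

def format_line(left, node, right, width=30):
    """Align the search word in a central column for vertical reading."""
    return f"{left:>{width}}  {node}  {right:<{width}}"

# Made-up sample text, only for demonstration.
sample = ("Language is a system. The study of language is called "
          "linguistics. Every language changes over time.")

for line in kwic(sample, "language"):
    print(format_line(*line))
```

Right-aligning the left co-text and left-aligning the right co-text is what puts the search word in one column, which is the layout that makes vertical reading possible.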

Consider this quote from Martin Wynne’s (2008, online) handbook chapter on concordancing:

For many linguists, searching and concordancing is what they mean by “doing corpus linguistics”.

The way we read concordance lines is quite different from the way we read a text. This vertical reading may take some getting used to. Here’s an example, concordance lines for language on WebCorp:


You can also use WebCorp to produce concordance lines from the web; or you can access corpora that are available online with integrated concordance functionality, such as the BNCweb or the BYU corpora. (If you want to run concordances on specialised subcorpora on the BYU interface, you might be interested in the slides and the handout from my session at the University of Birmingham Summer School in Corpus Linguistics this year.)

Of course, we often want to use corpus linguistic tools on materials that haven’t been made widely available, because it is often necessary to prepare a corpus from scratch for a particular research question. To create concordances for your own texts you can use concordancers like AntConc and WordSmith Tools (which you could buy if your institution doesn’t have a license).

What are your personal preferences for analysing concordance lines?

Concordance analysis is all about viewing a word (or phrase) in its co-text to identify any patterns in the way it is used. It’s often helpful to re-sort the concordance lines. Concordance tools usually let you re-sort based on the surrounding words (in positions 1-5 or more to the left and right of the search word).
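To make the idea of re-sorting concrete, here is a small sketch that sorts concordance lines by the word in a given position to the left or right of the search word. The helper names `context_word` and `sort_lines` and the three example lines are hypothetical; real concordancers offer this through their interface and may handle tokenisation quite differently.

```python
import string

def context_word(context, position, side):
    """Return the word `position` steps away from the search word.

    For the left co-text, position 1 is the word immediately before
    the search word, so the word list is reversed first.
    """
    words = context.split()
    if side == "left":
        words = words[::-1]
    if position <= len(words):
        return words[position - 1].strip(string.punctuation).lower()
    return ""  # co-text too short: such lines sort first

def sort_lines(lines, side="right", position=1):
    """Sort (left, node, right) tuples on one co-text position."""
    def key(line):
        left, _node, right = line
        context = left if side == "left" else right
        return context_word(context, position, side)
    return sorted(lines, key=key)

# Made-up concordance lines for illustration.
lines = [
    ("the study of", "language", "is called linguistics"),
    ("every", "language", "changes over time"),
    ("a foreign", "language", "at school"),
]

for left, node, right in sort_lines(lines, side="right", position=1):
    print(f"{left:>15}  {node}  {right}")
```

Sorting on the first word to the right groups lines such as "language is ..." together, which is often the quickest way to spot a recurring pattern; sorting on the left does the same for modifiers.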


According to Martin Wynne (2008, online),

[t]his type of manual annotation of concordance lines is often done on concordance printouts with a pen. Software which allows the annotation to be done on the electronic concordance data makes it possible to sort on the basis of the annotations, and to thin the concordance to leave only those lines with or without a certain manual categorisation.

Personally, I usually start with a printout of the simple concordance lines. Then, once I have identified some simple categories, I often move on to an Excel spreadsheet. I like being able to add columns for categories (I should just not overdo it, like in the photo…). Moreover, in some versions of Excel, it is possible to select and change the font of particular words in the same cell (seems to work on Excel for Mac but not for Windows). That way, I can highlight the word or phrase which prompts the category for the concordance line. It is also possible to assign a concordance to particular categories.


Some concordancers provide functionality for categorising concordance lines. In WordSmith Tools it is possible to assign categories (‘sets’). I have only recently tried this function and I’m quite impressed with the range of colours that are available, which you can see in the screenshot on the left. More information is available from the manual. BNCweb also provides a (simple) categorisation function with up to 6 categories. In the example from the screenshot below we would distinguish between can as the modal verb and can as the container for a drink. Of course, the modal is much more frequent (in general language usage, not in a text about coke cans…). Therefore all the example concordance lines represent the modal usage.
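If you prefer to script this step, the can example could be handled roughly as follows. This is only a sketch under my own assumptions: the lines and category labels are invented, the function names `thin`, `sort_by_category` and `to_csv` are hypothetical, and none of this reflects how WordSmith Tools or BNCweb store their categories. It shows sorting on the annotation, thinning the concordance to lines with or without a category (the two operations Wynne mentions), and exporting to CSV so that the result can be opened in a spreadsheet.

```python
import csv
import io

# Hand-annotated lines: (left, node, right, category).
# An empty category means the line has not been annotated yet.
annotated = [
    ("I think you", "can", "do it tomorrow", "modal"),
    ("she opened a", "can", "of coke", "container"),
    ("we", "can", "see the pattern here", "modal"),
    ("not sure about this", "can", "yet", ""),
]

def thin(lines, category, keep=True):
    """Keep only lines with (keep=True) or without (keep=False) a category."""
    return [line for line in lines if (line[3] == category) == keep]

def sort_by_category(lines):
    """Sort on the annotation; unannotated lines go to the end."""
    return sorted(lines, key=lambda line: (line[3] == "", line[3]))

def to_csv(lines):
    """Return the lines as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["left", "node", "right", "category"])
    writer.writerows(lines)
    return buffer.getvalue()

print(to_csv(sort_by_category(annotated)))
print(len(thin(annotated, "modal")), "modal lines")
```

Keeping the category in its own column means the same file can be re-sorted or filtered in Excel afterwards, so the scripted and the manual workflow do not exclude each other.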


I am curious about these features and how far people use them. If you don’t use these functions, how else do you categorise concordance lines? Do you do it manually, after printing out? In practice, how often do you analyse concordance lines? Are they quite important in your research or do you focus on more quantitative aspects, checking concordance lines when necessary?

Further reading

Sinclair, J. (2003). Reading Concordances: An Introduction. Harlow: Pearson/Longman.
Wynne, M. (2008). Searching and concordancing. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook (Vol. 1, pp. 706–737). Berlin: Mouton de Gruyter. [pre-publication draft available online]


I am a research fellow on the CLiC Dickens project at the Centre for Corpus Research, University of Birmingham. My research interests focus on the use of corpus linguistic tools to identify meaning in texts. In the CLiC Dickens project we develop and use methods to study the language of literary texts, particularly in Dickens’s and other 19th century fiction. My PhD research seeks to understand connections in discourse through a corpus linguistic approach. Specifically, I study how the concept of surveillance is represented in different types of texts. This blog reflects my personal opinions and not those of my employers.

2 thoughts on “On analysing concordance lines”

  1. My wife and I used to exploit the very randomness of concordance lines to make small “works of art” to frame as Christmas presents for friends. The whole point was to juxtapose lines which were quite disparate, starting with search terms which were surnames (Moss, Bell, Badger) or first names (Heather, Rose), favourite foods, etc. To see what I mean, look at We haven’t actually made any now for ten years or so.


    1. Thanks for sharing, that’s a creative idea! And it reminds me of posters I’ve seen in a bookshop of all words in the first chapter from famous books; for example a Dickens novel, with the words being shaped according to the silhouette of the author’s face. Haven’t seen any concordances, though. They can be pretty to look at, you’re right 🙂

