Posted in Conferences/events, corpus linguistics

Coming up: Corpus Linguistics 2017, Birmingham

Just a quick update: registration for the CL2017 conference at the University of Birmingham (24-28 July) is closing very soon on 30 July. [I shall admit openly that I’m part of the organising committee and therefore advertising it – but I genuinely think this conference will be a good one!] This conference series is one of the biggest corpus linguistics events in Europe; it runs every two years and has been hosted at the universities of Birmingham, Lancaster and Liverpool. The CL2017 programme contains streams related to a variety of CL applications. Of particular interest will be the plenary papers by Susan Conrad, Andrew Hardie, Susan Hunston, Christian Mair, Dan McIntyre and Mike Scott.


Posted in academia, Conferences/events, corpus linguistics

Free teacher workshop: corpus stylistics for the English classroom

I have recently started working as a Research Fellow on the CLiC Dickens project at the Centre for Corpus Research, University of Birmingham. The main focus of this project is the custom-developed CLiC web app, which lets you use corpus tools – e.g. searching, concordancing and finding clusters (repeated phrases) – on Dickens’s novels and other 19th-century fiction.

Next week the CLiC Dickens project is hosting a free workshop for English teachers (and anyone interested in, or researching, teaching methods for literature): ‘Corpus stylistics for the English classroom’ at the University of Birmingham on June 16, 2017. If you’re interested, please do check the event link. Registration is easy & free via email (to me) and refreshments will be provided :).

You can also check out some of the CLiC functionality in this recent video tutorial that introduces the CLiC KWICGrouper, a new approach to sorting concordances! (Read my previous blog post for more information on reading, sorting and analysing concordances.)

As the CLiC Dickens project is about corpus linguistics and meaning, the work is pretty ‘close to home’ in terms of my previous work (it’s also physically in the same department). At the same time, there are some new directions in it for me: corpus stylistics is concerned with meaningful patterns in literature (mainly, anyway) and this is quite different from my PhD research, which looks at non-fiction (academic writing, blog posts and newspaper articles). Moreover, the CLiC project combines its corpus stylistic approach with ‘cognitive poetics’, which is another really exciting direction.

Posted in corpus linguistics, techy

On analysing concordance lines

I start this post by giving a very quick introduction to concordances. If you are already an experienced corpus linguist, you can skip to the final section on categorising concordance lines. I am curious about your own practices for analysing concordance lines: do you print them out and highlight the different patterns? Or do you annotate the lines electronically, using a concordancer or a spreadsheet? Is there any other option that hasn’t occurred to me yet?

The basic display format in corpus linguistics

For the past year or so I have been preoccupied with relatively abstract, ‘big picture’-style analyses of my corpus (basically key key word and collocation analysis), but now I have come across a theme for which a smaller-scale, qualitative analysis is more appropriate. (Once I’ve wrapped it all up, I hope to share some insights. Or you may have to wait for my thesis to get done …).

For me as a corpus linguist, the go-to tool for any qualitative investigation is the concordancer. As the name suggests, it produces concordances. A concordance is the basic display format in corpus linguistics: it lists snippets of text, each illustrating the use of a particular word or phrase in a corpus. Concordance analysis has brought the discipline a long way, especially when Sinclair developed very systematic ways of analysing concordance lines for making dictionaries. (Sinclair’s guidelines are recorded in his book Reading Concordances; it’s a shame that Google Books has no preview …).

Consider this quote from Martin Wynne’s (2008, online) handbook chapter on concordancing:

For many linguists, searching and concordancing is what they mean by “doing corpus linguistics”.

The way we read concordance lines is quite different from the way we read a text. This vertical reading may take some getting used to. Here’s an example: concordance lines for language on WebCorp.

[Screenshot: concordance lines for language on WebCorp]

You can also use WebCorp to produce concordance lines from the web, or you can access corpora that are available online with integrated concordance functionality, such as the BNCweb or the BYU corpora. (If you want to run concordances on specialised subcorpora on the BYU interface, you might be interested in the slides and the handout from my session at the University of Birmingham Summer School in Corpus Linguistics this year.)

Of course, we often want to use corpus linguistic tools on materials that haven’t been made widely available, because it is often necessary to prepare a corpus from scratch for a particular research question. To create concordances for your own texts you can use concordancers like AntConc and WordSmith Tools (the latter of which you could buy if your institution doesn’t have a licence).
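To make the idea of a concordancer a bit more concrete, here is a minimal Python sketch of a key-word-in-context (KWIC) display. It is only an illustration under simple assumptions (plain-text input, a fixed number of characters of context, case-insensitive whole-word matching); the function names `concordance` and `format_line` and the sample text are made up for this example and are not how AntConc, WordSmith Tools or WebCorp work internally.

```python
import re


def concordance(text, node, width=30):
    """Return one (left, match, right) tuple per hit of `node` in `text`."""
    # Whole-word, case-insensitive search for the node word.
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(0), right))
    return lines


def format_line(left, match, right, width=30):
    """Pad the contexts so that the node words line up vertically."""
    return f"{left:>{width}} {match} {right:<{width}}"


# Invented sample text, just for illustration.
sample = (
    "Language is a system. The study of language is called linguistics. "
    "Corpus linguists look at language in use."
)

for line in concordance(sample, "language"):
    print(format_line(*line))
```

Because the left context is right-aligned, the node word ends up in a single column, which is what makes the 'vertical reading' of concordance lines possible.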

What are your personal preferences for analysing concordance lines?

Concordance analysis is all about viewing a word (or phrase) in its co-text to identify any patterns in the way it is used. It’s often helpful to re-sort the concordance lines. Concordance tools usually let you re-sort based on the surrounding words (in positions 1-5 or more on the left and right).
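As a rough sketch of what sorting by position means, the following Python snippet sorts a handful of invented concordance lines by the n-th word to the left or right of the node. The helper names (`context_word`, `sort_lines`) and the example lines are assumptions for illustration only; real concordancers offer more options, such as several sort keys at once.

```python
import string


def context_word(context, position, side):
    """Return the word `position` places away from the node ('' if there is none)."""
    words = [w.strip(string.punctuation) for w in context.lower().split()]
    if side == "left":
        # Position 1 on the left is the word closest to the node.
        words = words[::-1]
    return words[position - 1] if position <= len(words) else ""


def sort_lines(lines, position=1, side="right"):
    """Sort (left, node, right) tuples by a context word on one side."""
    index = 0 if side == "left" else 2
    return sorted(lines, key=lambda line: context_word(line[index], position, side))


# Invented concordance lines as (left context, node, right context).
lines = [
    ("the study of", "language", "is called linguistics"),
    ("corpus linguists look at", "language", "in use"),
    ("a foreign", "language", "teacher explained"),
]

for left, node, right in sort_lines(lines, position=1, side="right"):
    print(f"{left:>25} {node} {right}")
```

Sorting on the first word to the right groups lines such as 'language in' and 'language is' next to each other, which is often the quickest way to spot a recurring pattern.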

According to Martin Wynne (2008, online),

[t]his type of manual annotation of concordance lines is often done on concordance printouts with a pen. Software which allows the annotation to be done on the electronic concordance data makes it possible to sort on the basis of the annotations, and to thin the concordance to leave only those lines with or without a certain manual categorisation.
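The sorting and thinning that Wynne describes can be sketched in a few lines of Python. This is a hedged illustration, not the behaviour of any particular concordancer: the dictionary layout, the function names (`annotate`, `sort_by_category`, `thin`) and the example lines are all invented for this sketch.

```python
# Invented concordance lines; 'category' holds the manual annotation.
lines = [
    {"left": "I think we", "node": "can", "right": "start now", "category": None},
    {"left": "she opened a", "node": "can", "right": "of coke", "category": None},
    {"left": "you", "node": "can", "right": "skip this section", "category": None},
]


def annotate(lines, index, category):
    """Assign a manual category to the line at `index`."""
    lines[index]["category"] = category


def sort_by_category(lines):
    """Group lines by category; lines without a category go last."""
    return sorted(lines, key=lambda l: (l["category"] is None, l["category"] or ""))


def thin(lines, category, keep=True):
    """Keep only the lines with (keep=True) or without (keep=False) a category."""
    return [l for l in lines if (l["category"] == category) == keep]


annotate(lines, 0, "modal")
annotate(lines, 1, "container")
annotate(lines, 2, "modal")

for line in sort_by_category(lines):
    print(line["category"], "|", line["left"], line["node"], line["right"])

print(len(thin(lines, "modal")), "modal lines")
```

Once the categories are stored with the lines, sorting and thinning become simple list operations, which is the main advantage of electronic annotation over pen and paper.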

Personally, I usually start with a printout of the simple concordance lines. Then, once I have identified some simple categories, I often move on to an Excel spreadsheet. I like being able to add columns for categories (I should just not overdo it, like in the photo…). Moreover, in some versions of Excel, it is possible to select and change the font of particular words in the same cell (seems to work on Excel for Mac but not for Windows). That way, I can highlight the word or phrase which prompts the category for the concordance line. It is also possible to assign a concordance line to particular categories.

[Photo: categorising concordance lines]
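If your concordancer cannot export to a spreadsheet directly, a small script can prepare the file. The following is a minimal sketch using Python's standard csv module; the function name `to_csv`, the column names and the example lines are assumptions made for illustration.

```python
import csv
import io


def to_csv(lines, categories=("category",)):
    """Return CSV text with one row per line plus empty category columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["left", "node", "right", *categories])
    for left, node, right in lines:
        # Category cells start empty and are filled in by hand in the spreadsheet.
        writer.writerow([left, node, right, *[""] * len(categories)])
    return buffer.getvalue()


# Invented concordance lines as (left context, node, right context).
lines = [
    ("the study of", "language", "is called linguistics"),
    ("corpus linguists look at", "language", "in use"),
]

print(to_csv(lines, categories=("sense", "notes")))
```

To save the result for Excel, you could write the returned text to a file opened with `open("concordance.csv", "w", newline="", encoding="utf-8-sig")`; the file name here is just a placeholder.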

[Screenshot: set colours in WordSmith Tools]

Some concordancers provide functionality for categorising concordance lines. In WordSmith Tools it is possible to assign categories (‘sets’). I have only recently tried this function and I’m quite impressed with the range of colours that are available, which you can see in the screenshot on the left. More information is available from the manual. BNCweb also provides a (simple) categorisation function with up to 6 categories. In the example from the screenshot below we would distinguish between can as the modal verb and can as the container for a drink. Of course, the modal is much more frequent (in general language usage, not in a text about coke cans…). Therefore all the example concordance lines represent the modal usage.

[Screenshot: categorising concordance lines for can in BNCweb]

I am curious about these features and how far people use them. If you don’t use these functions, how else do you categorise concordance lines? Do you do it manually, after printing out? In practice, how often do you analyse concordance lines? Are they quite important in your research or do you focus on more quantitative aspects, checking concordance lines when necessary?

Further reading

Sinclair, J. (2003). Reading Concordances: An Introduction. Harlow: Pearson/Longman.
Wynne, M. (2008). Searching and concordancing. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook (Vol. 1, pp. 706–737). Berlin: Mouton de Gruyter. [pre-publication draft available online]