Coming up: Corpus Linguistics 2017, Birmingham

Just a quick update: registration for the CL2017 conference at the University of Birmingham (24-28 July) is closing very soon, on 30 July. [I shall admit openly that I’m part of the organising committee and therefore advertising it – but I genuinely think this conference will be a good one!] This conference series, which runs every two years and has been hosted by the universities of Birmingham, Lancaster and Liverpool, is one of the biggest corpus linguistics events in Europe. The CL2017 programme contains streams related to a wide variety of CL applications. Of particular interest will be the plenary papers by Susan Conrad, Andrew Hardie, Susan Hunston, Christian Mair, Dan McIntyre and Mike Scott.


Free teacher workshop: corpus stylistics for the English classroom

I have recently started working as a Research Fellow on the CLiC Dickens project at the Centre for Corpus Research, University of Birmingham. The main focus of this project is the custom-developed CLiC web app, which lets users apply corpus tools – i.e. search, concordance, find clusters (repeated phrases) etc. – to Dickens’s novels and other 19th-century fiction.

Next week the CLiC Dickens project is hosting a free workshop for English teachers (and those interested in or researching teaching methods for literature): ‘Corpus stylistics for the English classroom’ at the University of Birmingham on June 16, 2017. If you’re interested, please do check the event link. Registration is easy and free via email (to me), and refreshments will be provided :).

You can also check out some of the CLiC functionality in this recent video tutorial, which introduces the CLiC KWICGrouper: a new approach to sorting concordances! (Read my previous blog post for more information on reading, sorting and analysing concordances.)

As the CLiC Dickens project is about corpus linguistics and meaning, the work is pretty ‘close to home’ in terms of my previous work (it’s also physically in the same department). At the same time, it takes me in some new directions: corpus stylistics is concerned with meaningful patterns in literature (mainly, anyway), which is quite different from my PhD research on non-fiction (academic writing, blog posts and newspaper articles). Moreover, the CLiC project combines its corpus stylistic approach with ‘cognitive poetics’, which is another really exciting direction.

Quick tip for R: How to save your dataset in a native R format for future work

This is a note to self more than anything else, but maybe someone learning R out there finds it useful, too.

I lost some time recently because I kept running analyses in R and only saving the results as plots and CSVs. As I’m on a budget MacBook with limited memory, I can’t keep many results loaded in R (it all stays in memory). So if I want to go back and change a plot, for example to adjust its dimensions, add a title or even filter the data that goes into a subplot… I have to rerun the analysis.

Saving the results in a CSV file is good for future reference, but it doesn’t solve the problem that we can’t easily (?) recreate the original R objects from it. It seems far easier to save the actual R data in an R format.

In fact, my ‘statistics and programming colleague’ has been providing such R files in the ‘RDS format’ for our project, saving me the time of running the analyses while still giving me the chance to select my own subsets for plots. I’m a bit gutted that I didn’t realise the potential of this function for my own work until today. (I am having to rerun the analyses in order to create nicer plots; but then it’s also better to archive the results in an R format than only in CSV, I suppose, because things do change or I might find mistakes in my methodology later…).

To create an RDS file, use this function (signature from the R documentation; for me it’s usually sufficient to supply just the object and the file path):

saveRDS(object, file = "", ascii = FALSE, version = NULL,
        compress = TRUE, refhook = NULL)

For the technical details you can refer to the R documentation linked above or this post that explains the difference between ‘saveRDS()’ and ‘save()’ in more detail. In a nutshell, ‘save()’ saves the object together with its name, and ‘load()’ restores it under that same name. So, if my original results were called ‘results’ and meanwhile I had created another object called ‘results’, loading the saved version would silently overwrite the new one. With ‘saveRDS()’ we don’t have this problem: ‘readRDS()’ returns the saved object, and we can assign it to whatever name we like.
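To make the difference concrete, here is a minimal sketch (my own illustration, not taken from the R documentation or the linked post). The data frame and its numbers are made up, and temporary files stand in for real file paths. It saves the same object both ways and then shows how ‘load()’ overwrites a newer object that happens to share the name, whereas ‘readRDS()’ lets you pick the name yourself.

```r
# Made-up results object, for illustration only.
results <- data.frame(word = c("surveillance", "privacy"), freq = c(120L, 45L))

# saveRDS() stores a single object WITHOUT its name ...
rds_path <- tempfile(fileext = ".rds")
saveRDS(results, file = rds_path)

# ... so readRDS() returns it and we choose the name on reading it back in.
old_results <- readRDS(rds_path)

# save() stores the object together with its name ("results").
rda_path <- tempfile(fileext = ".RData")
save(results, file = rda_path)

# Meanwhile, a new, different object called "results" has been created.
results <- data.frame(word = "security", freq = 99L)

# load() restores the saved "results" under its old name,
# silently replacing the newer object.
load(rda_path)
print(nrow(results))  # the restored object has 2 rows, not 1
```

In practice that means ‘readRDS()’ is the safer choice when you come back to old results in a session where you have already created new objects.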

Hopefully, this post can be of use to some of you (obviously check what’s most helpful for your work). I’ll start saving all my important R results in this format 🙂

On analysing concordance lines

I start this post by giving a very quick introduction to concordances. If you are already an experienced corpus linguist, you can skip to the final section on categorising concordance lines. I am curious about your own practices for analysing concordance lines: do you print them out and highlight the different patterns? Or do you annotate the lines electronically, using a concordancer or a spreadsheet? Is there any other option that hasn’t occurred to me yet?

The basic display format in corpus linguistics

In the past year or so I was preoccupied with relatively abstract, ‘big picture’-style analyses of my corpus (basically key key word and collocation analysis), but now I have come across a theme for which a smaller-scale, qualitative analysis is more appropriate. (Once I’ve wrapped it all up, I hope to share some insights. Or you may have to wait for my thesis to get done…).

For me as a corpus linguist, the go-to tool for any qualitative investigation is the concordancer. As the name suggests, it produces concordances. A concordance is the basic display format in corpus linguistics: it lists snippets of text, each centred on an occurrence of a particular word or phrase, illustrating how that word or phrase is used in a corpus. Concordance analysis has brought the discipline a long way, especially since Sinclair developed very systematic ways of analysing concordance lines for making dictionaries. (Sinclair’s guidelines are recorded in his book Reading Concordances; it’s a shame that Google Books has no preview…).
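For readers who’d like to see roughly what a concordancer does under the hood, here is a very simplified sketch in R of a KWIC (key word in context) display. It is a toy, not how AntConc or WordSmith Tools actually work: it assumes simple whitespace tokenisation, leaves punctuation attached to words and lower-cases everything. The function names ‘kwic()’ and ‘context()’ and the example sentence are my own, hypothetical inventions.

```r
# Toy KWIC concordancer in base R.
# Assumption: tokens are separated by whitespace; punctuation stays attached.

# Join tokens[from:to] into one string, or return "" if the range is empty
# (e.g. when the node is the very first or last token).
context <- function(tokens, from, to) {
  if (from > to) return("")
  paste(tokens[from:to], collapse = " ")
}

# Hypothetical helper: one row per occurrence of `node`,
# with up to `span` tokens of left and right co-text.
kwic <- function(text, node, span = 5) {
  tokens <- strsplit(tolower(text), "\\s+")[[1]]
  n <- length(tokens)
  hits <- which(tokens == tolower(node))
  data.frame(
    left  = vapply(hits, function(i) context(tokens, max(1, i - span), i - 1), character(1)),
    node  = tokens[hits],
    right = vapply(hits, function(i) context(tokens, i + 1, min(n, i + span)), character(1)),
    stringsAsFactors = FALSE
  )
}

text <- "A concordance shows a word in context. Reading a concordance vertically takes practice."
conc_lines <- kwic(text, "concordance", span = 3)

# Right-align the left co-text so the node words line up vertically,
# which is what makes the 'vertical reading' of concordances possible.
cat(sprintf("%25s  %s  %s", conc_lines$left, conc_lines$node, conc_lines$right), sep = "\n")
```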

Consider this quote from Martin Wynne’s (2008, online) handbook chapter on concordancing:

For many linguists, searching and concordancing is what they mean by “doing corpus linguistics”.

The way we read concordance lines is quite different from the way we read a text. This vertical reading may take some getting used to. Here’s an example, concordance lines for ‘language’ on WebCorp:


You can also use WebCorp to produce concordance lines from the web; or you can access corpora that are available online with integrated concordance functionality, such as BNCweb or the BYU corpora. (If you want to run concordances on specialised subcorpora on the BYU interface, you might be interested in the slides and the handout from my session at the University of Birmingham Summer School in Corpus Linguistics this year.)

Of course, we often want to use corpus linguistic tools on materials that haven’t been made widely available, because it is often necessary to prepare a corpus from scratch for a particular research question. To create concordances for your own texts, you can use concordancers like AntConc and WordSmith Tools (which you can buy if your institution doesn’t have a licence).

What are your personal preferences for analysing concordance lines?

Concordance analysis is all about viewing a word (or phrase) in its co-text to identify patterns in the way it is used. It’s often helpful to re-sort the concordance lines. Concordance tools usually let you re-sort based on the surrounding words (in positions 1-5 or more to the left and right).
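As a rough illustration of what re-sorting does, the sketch below takes a handful of made-up concordance lines for ‘language’ and sorts them by the first word to the right of the node (R1), using the first word to the left (L1) to break ties. The helper ‘word_at()’ is hypothetical; in AntConc or WordSmith Tools you would simply pick the sort positions in the interface.

```r
# Made-up concordance lines for the node "language" (illustration only).
conc <- data.frame(
  left  = c("english is a global", "the study of", "she speaks a foreign"),
  node  = "language",
  right = c("of science", "and its use", "fluently at home"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the word at a given position to the right (R1, R2, ...)
# or to the left (L1, L2, ...) of the node; "" if the co-text is too short.
word_at <- function(context, position, side = c("right", "left")) {
  side <- match.arg(side)
  vapply(strsplit(context, "\\s+"), function(w) {
    if (side == "left") w <- rev(w)  # L1 is the word nearest the node
    if (position <= length(w)) w[position] else ""
  }, character(1))
}

# Re-sort by R1, breaking ties with L1.
conc_sorted <- conc[order(word_at(conc$right, 1), word_at(conc$left, 1, "left")), ]
print(conc_sorted)
```

Changing the position argument (e.g. ‘word_at(conc$right, 2)’) gives an R2 sort, which can bring out longer patterns that an R1 sort splits up.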


According to Martin Wynne (2008, online),

[t]his type of manual annotation of concordance lines is often done on concordance printouts with a pen. Software which allows the annotation to be done on the electronic concordance data makes it possible to sort on the basis of the annotations, and to thin the concordance to leave only those lines with or without a certain manual categorisation.

Personally, I usually start with a printout of the plain concordance lines. Then, once I have identified some simple categories, I often move on to an Excel spreadsheet. I like being able to add columns for categories (I just shouldn’t overdo it, like in the photo…). Moreover, in some versions of Excel it is possible to select and change the font of particular words within a cell (this seems to work in Excel for Mac but not in Excel for Windows). That way, I can highlight the word or phrase that prompts the category for each concordance line. It is also possible to assign a concordance line to particular categories.


Some concordancers provide functionality for categorising concordance lines. In WordSmith Tools it is possible to assign categories (‘sets’). I have only recently tried this function and I’m quite impressed with the range of colours available, which you can see in the screenshot on the left. More information is available from the manual. BNCweb also provides a (simple) categorisation function with up to 6 categories. In the example from the screenshot below we would distinguish between ‘can’ as the modal verb and ‘can’ as the container for a drink. Of course, the modal is much more frequent (in general language usage, not in a text about Coke cans…). Therefore, all the example concordance lines represent the modal usage.
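If your concordancer doesn’t offer categories, the same idea can be approximated in a spreadsheet or in R. The sketch below is my own illustration, not a feature of WordSmith Tools or BNCweb: a few invented concordance lines for ‘can’ are annotated by hand in a category column, then sorted on the annotation and thinned to one category, i.e. the kind of sorting and thinning Wynne (2008) mentions.

```r
# Invented concordance lines for "can", each annotated by hand with a category
# (modal verb vs. container), as one might do in an extra spreadsheet column.
conc <- data.frame(
  left     = c("you", "a", "we", "an empty"),
  node     = "can",
  right    = c("see the", "of coke", "try again", "on the table"),
  category = c("modal", "container", "modal", "container"),
  stringsAsFactors = FALSE
)

# Sort on the basis of the annotations (category first, then right co-text).
by_category <- conc[order(conc$category, conc$right), ]

# Thin the concordance: keep only the lines annotated as modal.
modal_only <- subset(conc, category == "modal")

# How many lines fall into each category?
counts <- table(conc$category)
print(counts)

# Write the annotated lines to a CSV file (a temporary path here) so the
# categorisation can be continued in Excel or another spreadsheet.
write.csv(conc, tempfile(fileext = ".csv"), row.names = FALSE)
```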


I am curious about these features and to what extent people use them. If you don’t use these functions, how else do you categorise concordance lines? Do you do it manually, after printing out? In practice, how often do you analyse concordance lines? Are they quite important in your research or do you focus on more quantitative aspects, checking concordance lines when necessary?

Surveillance and religion workshop

This week I had the opportunity to join the first workshop of the Surveillance & Religion Network, organised by Eric Stoddart of the University of St. Andrews and Susanne Wigorts Yngvesson of Stockholm School of Theology. The ‘Surveillance, Religion, and Security – Workshop One’ took place in Birmingham from 17 to 19 October and was the first of several events for which the organisers secured funding from the AHRC.

My PhD research broadly deals with two areas: corpus linguistic methods (how can we identify patterns of meaning in a discourse?) and their application to surveillance discourse (how is the concept of surveillance discussed in different domains of public discourse?). In the first two years of my PhD I spent most of my time focusing on the methodological concerns. How do I collect relevant texts? How do I need to process these texts? What corpus linguistic methods are out there? How have other researchers applied and developed them? Which methods are most suitable for my project? As I was dealing with these questions, I mainly talked to other linguists.

However, I hadn’t engaged much with the other relevant group: surveillance studies scholars. So, when I saw the CfP for the first workshop from the Surveillance & Religion Network, I considered this a good chance to initiate some dialogue with them. They, I thought, would be more interested in the theme than in the method and would therefore be able to give me more feedback in that regard.

This photo of the refreshments provided at the workshop is, in my view, a good representation of the atmosphere throughout the event: friendly and familiar (the cookies were actually really good!).

Once the event had started, I was happy to discover that my nervousness about attending as a linguist had been unnecessary. The atmosphere at the workshop was very friendly indeed. Attendees came from very mixed backgrounds: academics (sociology, theology, education, archaeology, linguistics), practising clerics and even police.

We thus had an insightful programme full of different perspectives on surveillance. My personal highlight was the public lecture by Professor David Lyon, the director of the Surveillance Studies Centre at Queen’s University, Canada. The lecture was entitled ‘Why surveillance is a religious issue’.

Professor Lyon emphasised one point that was also voiced throughout other sessions in the programme: the increasing ‘surveillance culture’ promotes a climate of suspicion which can only be overcome through a promotion of trust. In Lyon’s view, while surveillance practices can reinforce the marginality of minorities, religious institutions are in a position that allows them to promote trust and hope. Lyon was particularly keen on promoting the idea that we should not give up on our agency, which, as he argues, is in line with the teachings of Abrahamic religions. Indeed, there are small steps we can all take in promoting trust by, for example, campaigning for less surveillance at our workplaces or encouraging our software-developing friends to collect less consumer data. The public lecture was recorded and the audio will be made available soon. (I will post the link here once it is live.)

I have just given the example from David Lyon here, but throughout the workshop we also heard about many ways in which religion and surveillance can be related. For instance, the metaphor of the ‘divine gaze’: how God, in Abrahamic religions, watches over the people. My own contribution was, obviously, linguistic in nature. I presented work related to my PhD thesis: a corpus linguistic analysis of religious themes in surveillance discourse in the academic journal Surveillance & Society and in a collection of blogs. I enjoyed meeting this group of scholars and practitioners who share an interest in surveillance and its social consequences. They also reassured me that my research is of interest to them, as there is not much dialogue between surveillance studies and linguistics.

If you are curious about the relationship between surveillance and religion in particular, you might be interested in the next event by the Surveillance & Religion Network. The ‘Religions Consuming Surveillance – Workshop Two’ is taking place from 20 to 22 March in Edinburgh, and the deadline for the CfP is 15 December. Should you have any experiences related to the theme of surveillance & religion or interdisciplinary encounters, I’d be curious to hear about these in the comments!

Update 26 October: I just found another blog post about Professor Lyon’s public lecture by the organiser of the Open Rights Group Birmingham, Francis Clarke. His attendance (and participation in the question session) is a good example of how academics and public groups, particularly activists, can engage with one another.

University of Birmingham Corpus Linguistics Summer School

This week (20 – 24 June 2016) a corpus linguistics summer school took place at the University of Birmingham Centre for Corpus Research. I was fortunate to be involved in the event.
The schedule was tight, but it seems to have been well worth it, as these tweets from participants suggest:
The full virtual Twitter conversation from throughout the week can be found under the hashtag #ccrss16.
Topics ranged from multiple facets of corpus statistics and their applications in R to Sinclairian lexical items, corpus stylistics and translation studies, specialised corpora and an introduction to Python for corpus linguists. The workshops and talks were held by Johan de Joode, Stefan Evert, Chris Fallaize, Matt Gee, Stefan Th. Gries, Nicholas Groom, Susan Hunston, Andrew Kehoe, Michaela Mahlberg, Lorenzo Mastropierro, Florent Perek, Simon Preston, Pablo Ruano, Adam Schembri, Paul Thompson and me. While most of us are based at UoB, it was great to have colleagues from other institutions, and even from abroad, join us to share their expertise.
My own session was inspired by a talk by Mark Davies at the ICAME 37 conference (Chinese University of Hong Kong, May 2016), where he demoed the new ‘virtual corpus’ feature on the BYU corpus interface. [Click on the links for the PDF versions of my presentation slides and the handout of my session.]
Personally, I enjoyed this week of intense exposure to different aspects of corpus linguistics. Full-week events like conferences and summer schools can be quite draining, as you have to be ‘always on’, responding to new content and people. However, the learning hopefully makes up for that.
The joy of moving on to the next chapter

I’m very happy to share the news that I’m moving on from my first analysis chapter (Chapter 4 in the thesis). On January 31 I was already sharing my frustration about writing this chapter and now, exactly two months later, I finally have a full draft. Actually, I’ve been sitting on this draft for a while, with only a few paragraphs that needed reworking or were still in the shape of bullet points. In the meantime the text has been part of various documents/files. The screenshot here displays the metadata of the current file. I know that at ~17,000 words it’s too long for the final chapter. Then again, this number includes tables that I might shorten/delete/move to the appendix in the final thesis. The document also has a rather long background and methodology section, which I might have to move to the background and methodology chapters of my thesis at a later stage.

For now, though, I’m just really happy that I was psychologically able to call it a ‘full draft’. This means I sent it to a friend today who will have a look at it and give me some comments. She’s also a linguist, but works in a different subfield. I need some distance from this text and – as I’ve been feeling quite insecure – either some confirmation that it is an okay text or some advice on what is needed to clarify things a bit. I won’t go back to it until late April or early May, though.

I think that working on this chapter, or on preparatory stages for it, since September has meant too long a period of intense thinking about this particular aspect of my PhD. My supervisor has been urging me to move on and today I finally felt ready to let it go. I know that it’s nowhere near the shape I need it to be in for my final thesis. Some references probably aren’t as relevant as I first thought and others are lacking. The argumentation may not be clear enough. But I am moving on to the next stage of my analysis, where I’m applying the same method to a different dataset. I am sure this will also give me more ideas for the analysis of the first corpus.

Best of all, I can feel some enthusiasm again! Have you felt tired about any of your chapters? Did it help to move on to something new and return to the work after a couple of weeks? Or have you found it most useful to fully finish one chapter/study before starting something else?

Little cartoon sharing at the end of the leap day (just for fun)

Everyone loves phdcomics, right? They even get included in Grad School workshop presentations…

Lately I’ve come to admire another source of grad student/academic comics, though: have a look at A Prolific Source by Belle Kim, will you? I think you might enjoy it 🙂

Belle Kim’s cartoons are just lovely and they often strike a chord with me. I also like her approach that drawing can help you stay sane. It made me want to start, too. Now here is a very poor first draft. (I HAVE drawn other stuff recently but it’s too cute and non-academic; Chinese-style stickers from WeChat… and I have also jumped on that colouring book bandwagon). Anyway, not trying to do anything professional here, hence also just a cellphone picture, no scan. A really quick drawing to share an anecdote from the end of my leap day.


[By the way, it’s March now! oO *ahhhhhh* *heeeeeelp*]


Leap day = thesis day?!

It’s Monday morning and I should be full of joy about the opportunities ahead. Not only is a new week starting, but today is leap day – what a rare chance to have a leap day during the PhD! (Is it?) Somehow all I want to do is crawl to bed though.

BUT I saw a tweet just now saying “What Leap day means for me: an extra day of thesis writing.. ” (by @A_GowardBrown). I liked that attitude and that got me thinking that I ought to be more positive! After all, the sun is shining here in the English Midlands, I don’t have any appointments or teaching commitments today and I don’t need to sit on a train for hours. All of these rather rare events coming together seem to make this leap day really special with an extra few hours for me to get that chapter draft fixed.

By the way, I wish I knew how to embed the tweet here properly, like a clickable screenshot. Does anyone know?

I don’t have energy for checking now – and it would only be procrastination anyway. So what I’ll try to do is to pretend I’m attending one of the lovely ‘Shut Up & Work’ events at my Grad School’s PGRHub, with a self-enforced schedule and tasks for every working session and plenty of breaks with biscuits and coffee. Perhaps I can move the afternoon session to a cafe.

Happy Shut Up and Working 😉


Flying (and floating) like a kite


Just some quick sharing today. First of all, I’d like to thank everyone who read and commented on yesterday’s post about my feelings related to writing the first analysis chapter. It really feels great to hear back from people who have been through this already or are going through the same sort of thing.

So far I still feel a bit lost – and today some other annoying bits like problems with technology and bureaucracy were added to my plate. It doesn’t help, either, that I have another deadline coming up… in theory it’s all very exciting, but right now things don’t seem to be working out quite ideally just yet. But I’ll try to hang in there and follow everyone’s advice to just try and get something ‘down’.

For now I just wanted to share this silly little drawing. I mentioned this simile to a friend recently (who is also a PhD student) and we got some fun out of it. We sometimes really feel like we’re flying (or floating) in the wind, sometimes way too far in one direction (or so it seems). Then at some point our supervisors may try to pull us back. At the moment I can feel lots of forces pulling on my line. But I do hope that something will pull me back to more familiar heights or ground so that I’ll feel more comfortable soon. If you can relate, I hope you’ll feel that soon as well. Or perhaps you’ve already gotten into this kite thing – in that case happy flying :)!!!