Posted in academia, programming

Trying to take up a coding mindset (as a linguist)

I originally tweeted this photo on the morning when my second R book by Stefan Gries arrived at the same time as the Bloomberg code issue… was that a sign? I haven’t gotten around to starting the second book yet, though, as I am still working through the first one!
I am currently attempting to learn something about the programming language R. Why? Is that even a good idea?

At a few points during the past few years I have considered whether I chose the wrong degree(s). My BA degree was called “English Studies for the Professions (BAESP)” and I really enjoyed it and found everything interesting. At the same time I wanted to get more involved with research and see how linguistics can get really useful. So I moved on to an MA in Applied Linguistics and finally a PhD in the same field. I am really interested in linguistics and think it is a worthwhile area. BUT at times I wonder “Why didn’t I study computational linguistics?” Since my research deals with corpus linguistics this is actually not so much of a stretch. The problem is that I don’t seem to have a computational mindset… So far the only type of computational stuff that I can more or less deal with is interactive. During the MA we did some work with the statistical package SPSS, which used to be command-driven but now has an interactive interface. For corpus linguistic analyses I have used WordSmith Tools, AntConc and SketchEngine, which are all more or less user-friendly. If anything, I get confused by the many buttons and settings on offer.

When and how did I decide to do something about my non-computational situation?
I have been playing with the thought of getting a little bit more tech-savvy (and at the same time brushing up on my understanding of statistics) for a year or so. Throughout my studies I have come across so many studies where people do more interesting stuff than I seem to be able to do, simply because I don’t know how to make something like that happen. An example is a Twitter study that I already quoted in my BA project (which was also about Twitter). For my own project I used an online tool (at the time it was called TAGS v3; now there is TAGS v6) to collect a limited number of Tweets, leading to a small corpus. Michelle Zappavigna (@SMLinguist), in her book Discourse of Twitter and Social Media, however, had access to the infrastructure and support necessary for downloading and compiling a large Twitter corpus containing over 100 million Tweets. She used a Python script and the Twitter API. At that time I thought that I was never going to be able to either do this myself or have the required technical support. While I still don’t know how to do this, my attitude has changed slightly. I’m lucky to be cooperating with people from statistics and programming for a project coordinated by my supervisor. This regular interdisciplinary contact has taught me that there are things that seem infinitely difficult to me but can easily be done by others in a short amount of time with a few lines of code. Moreover, the cooperation is gradually showing me what kinds of things are actually possible with programming. In the meantime I have been wondering whether or not it is worth investing time and energy (and money, I guess) in learning some baby steps in programming when there are so many experts out there. Well, I don’t know, but I am trying to regain some control over my work…

Here are some interesting viewpoints on coders and coding expressed by Paul Ford in that recent Bloomberg code issue:

Coders are people who are willing to work backward to that key press. It takes a certain temperament to page through standards documents, manuals, and documentation and read things like “data fields are transmitted least significant bit first” in the interest of understanding why, when you expected “ü,” you keep getting “?”

[Paul Ford, What Is Code?, Bloomberg Special Double Issue June 15-28, 2015, print p. 24 (digital – free & with really cool animated visualisations! – Section 2.1)]

Regarding the question whether or not to learn coding, Ford says:

There’s likely to be work. But it’s a global industry, and there are thousands of people in India with great degrees. […] I’m happy to have lived through the greatest capital expansion in history, an era in which the entirety of our species began to speak, awkwardly, in digital abstractions, as venture capitalists waddle around like mama birds, dropping blog posts and seed rounds into the mouths of waiting baby bird developers, all of them certain they will grow up to be billionaires. It’s a comedy of ego, made possible by logic gates. I am not smart enough to be rich, but I’m always entertained. I hope you will be, too. Hello, world!

[print pp. 109-112, digital Section 7.5]

Personally, I don’t think I can now start to become ‘a real coder’ and ‘compete’ with all those computer science graduates and other professional coders. BUT the whole thing seems fascinating, and if I know a little bit, some light might be shed on so many areas that are still dark for me.

Why R?
I saw info about the ‘Regression modelling for corpus linguistics’ workshop by the linguist Stefan Gries (held in Lancaster, 20 July) and knew about his books (Quantitative Corpus Linguistics with R – QCLWR – and Statistics for Linguistics with R) so I finally decided to buy them. That’s really the main point for me. [By the way, in the book, Gries argues that R is particularly well-suited for corpus linguistics…] While I know other resources are available, such as MOOCs (I even attempted a MOOC on R but dropped out), I need to see something that’s relevant to my own research (the R MOOC I attempted used data from biology, I believe). Having said that, the MOOC introduced a neat little learning environment called ‘Swirl’ which allows you to “learn R, in R”. I might go back to that at some point. Actually, it’s even hard for me to get through the first 100 pages of Gries’ QCLWR because it’s about the basics with few linguistic applications. But I try to motivate myself to continue by flipping beyond the 100 pages now and then because I can see that I’ll soon (hopefully) be able to apply those basics to linguistic problems (I’m almost at page 96 now – yay!). If someone had made a book about Python for corpus linguistics (is there one?), I might have gone for that instead, because I didn’t really know anything about which language is best to know. However, I am looking forward to a session at the Nottingham Summer School in Corpus Linguistics entitled ‘Essential python for corpus linguists’ run by Johan de Joode.

My main problems so far
Unfortunately, I am still lacking the coding mindset, but I hope that will change after working through the second, more applied linguistics part of QCLWR. I haven’t done proper math since high school, and this step-wise logical thinking about embedding logical/regular expressions and loops and variables and whatnot all feels a bit foreign to me. More often than not I can’t follow the examples at first sight (usually because I have missed a parenthesis somewhere…). Just have a look at an example of the lines that I have been trying to work through… (Gries, 2009: 89):

gsub("(\\w+?)(\\W+\\w*?)\\1(\\W)", "\\1\\2\\1\\3", text, perl=T)
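
To make the pieces of that pattern easier to see, here is a rough sketch of the same regular expression in Python rather than R, written in Python's "verbose" mode so that each part sits on its own line with a comment. This is only an illustration: the name `REPEATED` and the sample sentence are my own assumptions, not something from the book, and the sketch only shows what the three groups capture rather than reproducing the `gsub()` call.

```python
import re

# The pattern from the gsub() call, split up with re.VERBOSE so that
# whitespace inside the pattern is ignored and "#" starts a comment.
REPEATED = re.compile(
    r"""
    (\w+?)        # group 1: one or more word characters, as few as possible
    (\W+\w*?)     # group 2: non-word characters, then optional word characters
    \1            # whatever group 1 captured, appearing a second time
    (\W)          # group 3: a single non-word character
    """,
    re.VERBOSE,
)

# Sample sentence chosen only for illustration.
text = "This is a first example sentence."

match = REPEATED.search(text)
if match:
    print(repr(match.group(0)))  # 'is is '
    print(match.groups())        # ('is', ' ', ' ')
```

Read this way, the pattern looks for a stretch of characters that is repeated after some intervening material: here the "is" at the end of "This" followed by the word "is".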

Trying to keep track of everything that could be potentially useful in my copy of QCLWR with sticky tags.

I also have difficulties with remembering function names and their argument structures and, worse still, I can’t really follow the R/RStudio help entries about the functions. The biggest problem is that it takes me ages to go through the tutorial in Gries’ QCLWR. There are still more than a hundred pages left, including masses of exercises and assignments, and the second book (Statistics for Linguistics with R) is still waiting for me… Obviously this is not even the only task I’m supposed to be doing for my PhD at the moment…
On the bright side, though, I am slowly starting to feel more comfortable staring at condensed strings of digits and characters and slowly picking up the ability to analyse a command string step by step. Once something does work it really delights me.

What are your experiences with starting to code? Do you think it’s worthwhile to invest in these skills? Which programming language are you learning and why? [And sorry for turning this into such a long post…]

Author:

I am a PhD student of Applied Linguistics and specifically looking at surveillance discourses. I want to find out how surveillance is talked about in the media (i.e. specifically newspapers). Moreover, I hope to see differences and similarities in the ways that journalists and academics (surveillance scholars) present the concept of surveillance. Hopefully my work will be relevant to various disciplines. I take a corpus linguistic approach and hope that my research can also contribute to methodological developments in this field.

6 thoughts on “Trying to take up a coding mindset (as a linguist)”

  1. I know you mentioned you tried a MOOC on R, but didn’t get too far, but there’s another MOOC you might want to try. It’s on edX and is called CS50x. It’s run by Harvard, and I believe it was one of the first MOOCs on edX, and one of the ones that helped to make the MOOC scene more mainstream. Because it’s been through a few iterations, they’ve really had a chance to adapt it and improve. I’ve tried a few MOOCs in the past couple of years, but this is the first (and only) one where I’ve thought WOW! and really enjoyed doing it. It’s not a MOOC specifically on coding for linguistics, but it is an introduction to computer science. It uses C to begin with (as well as a few others later on in the course), which is an older language, but almost all of the concepts that they introduce and go over will be relevant to most if not all languages and applications. I think the additional resources and help videos that they have are really excellent, and the professor, David Malan, is obviously an excellent teacher. The best thing about the MOOC is that it is open all year round, so you can start whenever. I think this is a really great course to help get anyone in the coding mindset.

    1. Thank you David that sounds really helpful, I will definitely have a look at it. It’s good that it’s open all year, as part of the reason I dropped out last time was that I missed the first two assignment deadlines and then just totally lost track…
      Sounds like that MOOC would also give me the chance to learn something more general and not tied to only one language. Thanks 🙂 Just need to see how much time I can spare for it…

  2. Regular expressions are a mini programming language in themselves. They’re one of the most difficult parts, in my own experience. For Python, there’s verbose mode, which is much more readable. Not sure if R has a similar mode.
