Blog Post, Process Blog

Four Thoughts on Working With Twitter Data

My digital dissertation on historical thinking, social media, and the digital age primarily utilizes Twitter data to answer questions about students’ understandings of the significance of the practice and content of history. Working with Twitter data is new territory to me but I have a few thoughts on the process of cleaning and organizing the data thus far.

1. Twitter is a Hydra

Hydra: A mythological beastie with many heads. When someone lops off one head, two more appear. Also apparently exhales poisonous fumes. Heracles (Hercules) was only able to destroy the monster with the help of his nephew, who cauterized the stump of each head to prevent new ones growing.

[Image: The Greek hero Hercules battles a many-headed, fire-breathing, serpentine monster called the Hydra. Hercules, John Singer Sargent (public domain), via Wikimedia Commons]

I didn’t know Twitter was going to be a Hydra, partly because the initial collection of tweets was super easy thanks to Ian Milligan, who generously set up and hosted a dnflow server for me this semester.1 (Because digital humanities people are awesome about helping new-to-DH scholars realize their projects.)

My students and I used a class hashtag (#hwc111) to organize our tweets and, once a week or so, I entered the hashtag into a search box on dnflow. The program created analytics regarding the most popular tweets, common images, and the number collected and I downloaded this data into a neat Google Drive folder.

The original data set comprised 10,486 tweets – but I knew that wasn’t all of them. Dnflow had trouble collecting retweets 2 and quoted retweets 3. Plus no one (myself included) tags their tweets perfectly all the time.

My initial, optimistic workflow looked a bit like this:

[GIF: A young Bette Davis walks through a door, closes it, and collapses in hysterical laughter.]

  1. Compile all tweets from dnflow requests into a single spreadsheet.
  2. Review individual feeds.
  3. Add missing tweets.
  4. Categorize tweets based on which question in class, if any, the tweet responded to.

Simple, yes? Hilarious is more like it.

The second task, “review individual feeds,” became an additional four sub-tasks, and “add missing tweets” turned into adding not only un-tagged tweets but also all replies by students, because I decided halfway through that I wanted to explore whether a network existed among students and, if so, what it looks like. I also added new tasks as I started reviewing the data, such as creating a column to describe the media (GIFs, images, quotes from class readings) attached to tweets.
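For readers who prefer a script to a spreadsheet, the compile-and-add steps might be sketched roughly as follows. This is only an illustration under loud assumptions: the column names (`id`, `user`, `text`), the sample rows, and the `merge_tweet_exports` helper are all hypothetical, and the real dnflow downloads may be shaped quite differently.

```python
import csv
import io


def merge_tweet_exports(exports, id_column="id"):
    """Combine several CSV exports into one list of rows, keeping the first copy of each tweet."""
    seen_ids = set()
    merged = []
    for export in exports:
        for row in csv.DictReader(export):
            tweet_id = row[id_column]
            if tweet_id in seen_ids:
                continue  # the same tweet can appear in more than one weekly export
            seen_ids.add(tweet_id)
            merged.append(row)
    return merged


# Two tiny made-up exports standing in for weekly downloads (hypothetical data).
week_1 = io.StringIO("id,user,text\n1,student_a,First post #hwc111\n2,student_b,Reply #hwc111\n")
week_2 = io.StringIO("id,user,text\n2,student_b,Reply #hwc111\n3,student_a,Untagged thought\n")

all_tweets = merge_tweet_exports([week_1, week_2])
print(len(all_tweets))  # tweet 2 appears in both exports but is kept only once
```

Keying on a tweet ID rather than on the tweet text means that missing tweets added by hand later can go through the same function without creating duplicates.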

For this sort of work, the experience and assistance of other people is clearly beneficial. Something like Jessica Otis’s workflow for examining a conference network with Gephi would have been exceptionally helpful at the start of this process (and certainly will be helpful as I experiment with Gephi). To that end, I’ve documented the workflow that emerged for me (really, it’s more of a task list), available via Google Drive. Ideally, this will help other Twitter-data newcomers avoid similar pitfalls in the future.
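As a rough idea of what preparing reply data for Gephi could involve, here is a minimal sketch that turns replies into a weighted edge list. The field names `user` and `in_reply_to`, the sample tweets, and both helper functions are assumptions made for illustration, not part of the documented workflow. The `Source`, `Target`, and `Weight` headers follow the column names Gephi's spreadsheet importer looks for in an edges table.

```python
import csv
import io
from collections import Counter


def reply_edges(tweets):
    """Count how often each user replied to each other user."""
    edges = Counter()
    for tweet in tweets:
        target = tweet.get("in_reply_to")
        if target:  # skip tweets that are not replies
            edges[(tweet["user"], target)] += 1
    return edges


def write_edge_list(edges, out_file):
    """Write the counted replies as a CSV edge table."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Hypothetical sample data: two replies from a to b, one from b to a, one non-reply.
tweets = [
    {"user": "student_a", "in_reply_to": "student_b"},
    {"user": "student_a", "in_reply_to": "student_b"},
    {"user": "student_b", "in_reply_to": "student_a"},
    {"user": "student_c", "in_reply_to": ""},
]
edges = reply_edges(tweets)
buffer = io.StringIO()
write_edge_list(edges, buffer)
print(buffer.getvalue())
```

Treating replies as directed edges keeps the difference between who starts a conversation and who answers it, which matters when asking whether a network exists among students.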

2. How to be a historian who thinks with machines?

I ultimately added 1,671 tweets to the original 10,486 – about a 16% increase in the data set. I’m not sure yet whether or not this is a significant amount. (Though I’m sure some students or colleagues with working knowledge of statistics can tell me…)
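The arithmetic behind that figure is simple enough to check in a few lines. The two counts come from the post; the variable names are just for illustration. The second calculation is an extra way of looking at the same numbers: the added tweets as a share of the final, combined data set rather than as growth over the original.

```python
original = 10_486  # tweets collected via dnflow
added = 1_671      # tweets added by hand during review

increase = added / original          # growth relative to the original collection
share = added / (original + added)   # portion of the final data set that was added

print(f"{increase:.1%}")  # 15.9%
print(f"{share:.1%}")     # 13.7%
```

So the additions are roughly 16 extra tweets for every 100 originally collected, or a little under 14 of every 100 tweets in the combined set.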

I’m used to thinking like a historian in an archive, where documents are rare and every particular piece of evidence matters. This isn’t that kind of project, though. Instead, I’ll be visualizing and analyzing the contents of hundreds of tweets at a time. Will 16 extra tweets make a difference when analyzing a batch of 100?

My guess is that the additions may not make much of a difference to any text analysis involving large segments of the data set. Added tweets might, however, make a difference in the composition of the network of students. The new tweets might also contain some zingy and insightful quotes that allow me to make a point with a bit more panache. Like this one from a student processing the perspective and bias of the Greek historian, Herodotus.

I suspect I’ll return to the question of how much data is worth saving, adding, and exploring. This is an important question in the broader practice of digital history. How should digital historians balance a disciplinary preference for the particularities of individual documents with a methodology that requires setting aside the particular, at least initially, in order to extract generalizations from massive sets of evidence? What will that look like for this particular project?

3. Backtracking is disheartening, but necessary.

While reviewing the individual Twitter feeds created by my students, I came up with a clever idea I believed would expedite the review process: save the missing tweets to a Twitter Moment as a place to store them, and then return to record the retweets after completing the initial review of the feeds.

 

I tried this method over the course of two individual feeds and it seemed to be working. I also double-checked Twitter’s support pages to ensure there was no limit to the number of tweets one could add; no limitation was obviously stated. I continued adding tweets to the Moment as I reviewed my next 30 feeds and then proudly showed off my “Tweets to Add” Moment to my supportive spouse – at which point I discovered that only approximately 50 of the most recently added tweets were in fact saved to the Moment.4

[GIF: Peggy Olson from Mad Men slowly lowers her head to a desk in frustration.]

Ugh.

Paige Morgan and Yvonne Lam, who led the Intro to Data Wrangling workshop at the Digital Humanities Summer Institute, warned participants that starting over is always a possibility. Paige also noted in a recent talk/blog post: “I say that I work with data, but in some ways, it feels more accurate to say that I work with various types of mess.”5

I wholeheartedly agree with acknowledging that starting over happens and that data is usually some type of mess or another. And I suspect this won’t be the last time that happens. Backtracking is disheartening and time-consuming and that emotional toll perhaps could be better acknowledged in DH work – even if it’s a necessary part of the messy digital process.

4. It’s okay to leave some things for later.

In a recent chat with Veronica Armour, an Instructional Designer at Seton Hall University, I asked her what project management training she acquired before moving into her current field of work.6 Her answer was “not much” (which seems quite common), but she did recommend some online courses – particularly those that favor “agile management” over “waterfall management.”

[Image: Streams from a waterfall run down a bright green cliffside. Sean MacEntee, “Waterfall,” CC BY 2.0, via Flickr]

My understanding of these models is quite basic, but here’s the heart of it. Agile models make it possible and even desirable to move forward even if all the pieces aren’t yet in place; the object is to continuously work toward goals by testing, seeking feedback, and testing again as new information or materials become available. Waterfall models, by contrast, require everything from one step of the project to be completed before moving on to the next.

I’m definitely a “waterfall” person when it comes to my own projects. I don’t like the feeling of incompleteness and I prefer to explore every possibility before moving on.

But that is shaping up to be an ineffective way to work with data – especially data that acts like a Hydra. With the next few stages of the project, then, I’m hoping to become more okay with leaving things for later.


Notes:

#dh/#dhist: Party of One

Republished from Storify.

A quick clarification of terms (with thanks to friend and colleague Paul McAfee, who asked me to define them on FB):

  1. DH is “digital humanities” – a broad umbrella term for projects that utilize digital tools and processes to forward research in one or more of the humanities fields.
  2. dhist is one of many hashtags for “digital history” on social media. Digital history is a subset of digital humanities in which digital tools and methods are used to explore historical content.
  3. Imposter Syndrome is the feeling that you aren’t actually capable of doing what you’re supposed to be doing/have chosen to do – and that sooner or later everyone else will figure that out too. It doesn’t always (or even often) mean the person doesn’t know what they’re up to – it’s just a persistent, nagging voice in the head to the effect of “not good enough.”

“This is hard. It isn’t very linear.”

Class Blogging and Non-Linear Storytelling

The World Civ I classes I teach are embarking on the final stage of their blogging project this semester. This is a thoroughly self-directed project. Students can choose any topic within the time frame of the course (10,000 BCE to 1500 CE) and they can present their topic however they choose. Thus far, I have tentative proposals for Instagram feeds, Pinterest boards, Tumblrs, fashion videos, and songs and I am pretty darn excited to see where things go.

I also have a few groups clearly struggling to figure out how digital storytelling works. During our last workshop for the blogging project, I was working with one group to define a topic, a takeaway, and a creative medium they would be comfortable working with. In the midst of the conversation, one member of the group encapsulated the difficulty of pinning down this project. “This is hard,” they said. “It isn’t very linear.”

That, I think, is exactly the difficulty of creating good digital material. It isn’t especially linear and when you’ve really only been taught to think of writing in linear ways (intro, thesis, body, conclusion), it can be incredibly difficult to think about organizing information in a way that is connective but not linear.

The student’s comment prompts a number of questions for me (which I’ll record here in the hopes that I can come back to them sometime):

  • What’s the purpose of trying to think in non-linear ways if it feels so unnatural?
  • How can I teach non-linear and creative thinking?
    • Is there more prep and introduction I can give students to this sort of task?
    • Undoubtedly, yes – but what prep should I give? The 5 Photos exercise might be a good place to start…
  • How do I help students locate models and assistance (outside of the class and myself) for trying to think and create in new ways?

The Dissertation Project and Boundaries of Storytelling

The last question is especially pressing for me as I try to work out the purpose and shape of my dissertation project. My project currently centers on how students understand and express the importance of a particular person, event, or idea in history. My working hypothesis is that the default definition of “historically significant” for most students at the start of a class has to do with whether or not something or someone is relatable.

I suspect this definition is, at least in part, a product of the pervasiveness of social media platforms that encourage us (the students and myself) to react to or comment on everything. I’m wondering if this preference for interactive material prompts us to consider our own reactions to content as co-equal in importance to the content itself. I think that might cause us to filter all information (past and present) through that question of whether or not something is personally relatable.

This is all very tentative stuff at the moment. In order to understand whether that hypothesis is reasonable, I need to ask students what they actually think about history and how they use digital media. No surprise there; lots of historians concerned with what students think about history or digital media have asked them before.

I think, though, that I’d like to play with the linear way historians usually ask students what they think about history and digital media. In academic work about historical thinking (see the work of Sam Wineburg, the edited volume Knowing, Teaching, and Learning History, or the Perspectives series, “Thinking Historically in the Classroom”) or digital media (I’m thinking especially of Henry Jenkins and Mills Kelly here), the research model is usually pretty traditional. The researcher formulates a hypothesis, designs a study, collects the data, analyzes the data, and publishes her/his findings. In the case of works about pedagogy, the author of a book or article might simply reflect on what they’ve observed in their classes.

In either case, the researcher has the final word when it comes to interpretation. That makes sense given the short-term nature of many studies and the clear knowledge difference between, say, primary- or secondary-school students and a researcher with a Ph.D. This process also produces perceptive frameworks for thinking about how people think about history, many of which serve as a springboard for my own work, so I’m not by any means trying to overturn this model altogether.

I am wondering, though, if it might be worthwhile to interrupt the linear research model by asking for feedback from participants about the conclusions of a study. I’m planning to work with adults (most of whom aren’t much younger than me) who possess the self-awareness to tell me if I’m misinterpreting their written or spoken responses in an activity. I’d love to make them collaborators in this process and seek their input throughout the project. I am also unsure how to accomplish that in meaningful ways.

I currently lack a model for that sort of collaboration and there are a lot of questions I would need to address to do this sort of work:

  • Would students (participants) even be interested in providing feedback about conclusions?
  • What would it take to get students to agree to provide regular and helpful feedback about my work? Would I need to incentivize their participation – bribe them with extra credit or the possibility of putting a line on their CVs as collaborators or research assistants of sorts? What are the ethics of that?
  • How would I incorporate their feedback? Who gets the final say about the interpretation of a set of data – the students who provided the data or me as the researcher?
  • How might student feedback shift the trajectory of the research or the shape of activities that are part of the research? [That last question matters immensely since I need to seek the approval of an ethics committee – an institutional review board (IRB) – for every aspect of the project.]

These are hard questions. I don’t know if it will be possible to come up with good answers in the course of this project or if I will be able to put any ideas about these questions into practice. I do think I agree with my student about the difficulties of this unfamiliar territory: “This is hard. It isn’t very linear.”

Things I Found This Week: Dissertation Distractions Edition

The best/worst part of a dissertation project involving social media and technology is the constant discovery of blogs, online journals, and tech tools for teaching and researching. After reading any book or article, after every discussion with a peer or scholar, I’m left with a lengthy list of new resources – which at the moment seem far more interesting and exciting than the harder work of sitting down to, you know, actually think, read, and write about my topic. For instance:

I’m itching to play around with Omeka and Scalar, two resources a fellow Drew student, Jessica Brandt, was good enough to alert me to.

Omeka looks like a more traditional blog/website platform, but it’s designed to assist scholars (amateur and professional) and institutions in creating top-notch online archives, exhibits, and narratives. The platform allows for beautiful image collections, searchable tags, interactive images, and customizable themes, fonts, etc. The software is free and open source and looks like a powerful story-telling tool.

(What Is Omeka from Omeka on Vimeo.)

Scalar is exciting for the way it allows scholars to structure their narratives in fully digital ways. The platform’s purpose, according to creators, is to give “authors tools to structure essay- and book-length works in ways that take advantage of the unique capabilities of digital writing, including nested, recursive, and non-linear formats.” Authors can create multiple paths through the same project, tag paragraphs and sections to create relationships throughout the “document”, and insert multimedia content related to the text portions of the project. My project could, I think, benefit from all of these possibilities and I’m excited to give it a test run once the topic is a little more structured.

(Scalar Platform — Trailer from MA+P @ USC on Vimeo.)

I’m also super tempted to enroll in one (or…all) of the online courses offered by Hybrid Pedagogy. The upcoming course topics are “The Flipped Classroom,” “Teaching with Twitter,” and “Learning Online” – all topics of interest to me and all for very reasonable prices ($250-350, with discounts for adjuncts and students). Unfortunately, I’m pretty sure that adding an online class to the mix of dissertation prospectus + two classes + training for a half-marathon would turn out to be a little much… Alas. I’ll just have to keep an eye on the offerings in the Digital Pedagogy Lab in the future.

Two final resources/projects on my radar this week: Educause Review (my thanks to Gamin Bartle for this one), which looks to be full of all sorts of thoughtful pieces regarding technology and the digital age in education, and the Wikipedia page for the feminist sci-fi film, Advantageous. The film is gorgeous and provocative and made my cyborg-loving self terribly happy. The Wikipedia page doesn’t do it credit – by which I mean the information is super basic. So I added a link for one of the actresses last night and today (or tomorrow or whenever I decide to neglect other work), I’d really like to add a plot summary or something about critical reception, and then maybe begin work on pages for Freya Adams and Samantha Kim. I’ve been meaning to make a foray into editing Wikipedia for a while and this seems like a good place to start.

Leave a comment if you’re using similar resources or if you want to talk ed-tech, social media in the classroom, or dissertations. Or anything else. Now to the real work of the day for me – updating my class website to include the new syllabus and a list of topics for the semester’s portfolio project. (Will provide links for those once they’re ready to go…) I also need to finish off an e-book of last semester’s blog posts for my students – it’s the closest I can get to preserving their work for the moment and that task has been on the back burner for far too long.