Blog Post

Searchable NPR Book Concierge: 2008-2018

Pile of books in shape of Christmas tree
Photo by Toa Heftiba on Unsplash

In this post:

Intro: A reading list for the new year

NPR Love, Meet Dissertation Data Skills

Snapshot of the Last 10 Years: Visualizations

Conclusion

Tables


A reading list for the New Year

It is indeed still Christmastide! So I have a gift for everyone who has already blown through the books they found under the tree (or knows they will in this glorious lull between Christmas and New Year’s).

I web-scraped the NPR Book Concierge and Best Books of the Year pages from 2008 to 2018. And then I compiled them into a nifty spreadsheet that is searchable by tag and author or filterable by any category.

The spreadsheet contains 2323 books from the past 10 years of NPR’s “best of” lists. Search the tags to find new favorites in your most-read genres, or take a chance on the “no-tag” books to discover something new.

For more on the process, keep reading the post. Though I understand if you go off and read the 2000+ awesome books from NPR’s Concierge instead :).

You can view the spreadsheet: 2008-2018 NPR Books or download a copy for yourself: Make a copy of 2008-2018 NPR Books

NPR Love, Meet Dissertation Data Skills

Since this year’s Book Concierge marks the 10th anniversary of NPR’s end-of-year recommendations, I thought it’d be a fun side project to compile the lists from 2008 to 2018. This also gave me a chance to practice three of the digital humanities skills I’ve been using for the dissertation: web scraping, plus tidy data and visualizations in R.

Web Scraping with OutWit Hub

First, I scraped the pages with OutWit Hub. Web scraping is a method of harvesting data from a web page by asking the scraping program to look for specific patterns in the HTML code behind it.

Scraping the 2013 to 2018 pages was a breeze. The web app format NPR has used for the last five years is tagged well. For instance, <author> precedes the author’s name in the source code. The tags made it easy to identify common, unique patterns to plug into OutWit Hub’s desktop app.
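
To make the pattern idea concrete, here is a minimal sketch in Python. It is not OutWit Hub (which is a desktop app) and not the workflow used for this project; it only illustrates pulling out whatever sits between a marker before and a marker after. The sample markup, the `<booktitle>` tag, the second book, and the helper name `extract_between` are all invented for illustration, and NPR's real source will look different.

```python
import re

# Invented sample markup for illustration only; NPR's real pages differ.
SAMPLE_HTML = """
<div class="book">
  <booktitle>Ancillary Justice</booktitle>
  <author>Ann Leckie</author>
</div>
<div class="book">
  <booktitle>A Second Book</booktitle>
  <author>Another Author</author>
</div>
"""


def extract_between(html, marker_before, marker_after):
    """Return every stretch of text found between the two markers."""
    # re.escape treats the markers as literal text, not as regex syntax.
    pattern = re.escape(marker_before) + r"(.*?)" + re.escape(marker_after)
    return [match.strip() for match in re.findall(pattern, html, flags=re.DOTALL)]


titles = extract_between(SAMPLE_HTML, "<booktitle>", "</booktitle>")
authors = extract_between(SAMPLE_HTML, "<author>", "</author>")

# Pair each title with its author, relying on both appearing in page order.
books = list(zip(titles, authors))
for title, author in books:
    print(f"{title} by {author}")
```

Pairing by position only works when every book has both fields in the same order, which is roughly why well-tagged pages are easy to scrape and inconsistent ones are not.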

Scraping the pages for 2008 to 2012, on the other hand, was a pain. (I’m still fairly new to web scraping and it stumped me.) The flow of the pages is inconsistent and I found it tricky to scrape just some links instead of all of them. Honestly, I ended up doing a lot of manual cleanup, especially for 2008, 2009, and 2010. But okay. Worth it.

Tidying Data in RStudio

I also did some data tidying using the tidyverse and tidytext packages in RStudio. (I used Text Mining with R by Julia Silge and David Robinson as my bible for this work, as always.) Essentially, tidy data means that every instance of an observation has its own separate row in a table. For example, a book from the Concierge with two tags appears in two rows rather than one. (One for each tag.) To give a super simple example of untidy vs tidy data:

Figure 1: Untidy Data

Author | Book | Tags
Ann Leckie | Ancillary Justice | science-fiction-and-fantasy, it’s-all-geek-to-me

Figure 2: Tidy Data

Author | Book | Tag
Ann Leckie | Ancillary Justice | science-fiction-and-fantasy
Ann Leckie | Ancillary Justice | it’s-all-geek-to-me
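
The same reshaping can be sketched in a few lines of Python. This is only a stand-in for the R code used in the project, which is not shown in this post; the function name `tidy_by_tag` and the dictionary keys are assumptions made for the example.

```python
# One untidy row: a single book with all of its tags packed into one string.
untidy = [
    {
        "author": "Ann Leckie",
        "book": "Ancillary Justice",
        "tags": "science-fiction-and-fantasy, it’s-all-geek-to-me",
    },
]


def tidy_by_tag(rows):
    """Return one row per (book, tag) pair instead of one row per book."""
    tidy_rows = []
    for row in rows:
        for tag in row["tags"].split(","):
            tag = tag.strip()
            if tag:  # skip empty pieces left by a trailing comma
                tidy_rows.append(
                    {"author": row["author"], "book": row["book"], "tag": tag}
                )
    return tidy_rows


tidy = tidy_by_tag(untidy)
for row in tidy:
    print(row["author"], row["book"], row["tag"], sep=" | ")
```

Once every tag has its own row, counting books per tag or per year becomes a simple group-and-count.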

Snapshot of the last 10 years

I mainly tidied the data so I could show off some visualizations. 🙂 Voilà: total counts by year and by tag, plus the most-used tags by year.

Total number of books per year

Graph: Total books per year

Column graph of total books by year. See Table 1: Total Books by Year for accessible data.

Interestingly, there is a bit of a dip in the number of books in the Concierge this year (2018). Otherwise, the count keeps going steadily up. We should see upwards of 320 in 2019 at least.

Top 25 Tags (2008-2018)

Over the last 10 years, the “best of” lists utilized 74 distinct tags. This gets visually messy, so I’ve opted to show just the top 25 here. I removed the “no-tag” label, which applied to all books in the 2008, 2009, and 2010 “best of” lists.

Please also note there’s some overlap. “Staff-picks” and “npr-staff-picks,” for example, are essentially the same thing. Overall, 760 of the 2323 total books (32.72%) carry one of the “staff-pick” tags.
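
The arithmetic behind that combined figure can be checked with a short Python sketch. The counts come from Table 2; the alias mapping and the function name `merge_aliases` are assumptions for illustration, and simply adding the two counts assumes no single book carries both tags.

```python
from collections import Counter

TOTAL_BOOKS = 2323  # total number of books in the spreadsheet

# Counts for the two overlapping tags, taken from Table 2.
tag_counts = Counter({"staff-picks": 633, "npr-staff-picks": 127})

# Assumption: the older tag is just an earlier name for the newer one.
ALIASES = {"npr-staff-picks": "staff-picks"}


def merge_aliases(counts, aliases):
    """Fold each aliased tag's count into its canonical tag."""
    merged = Counter()
    for tag, count in counts.items():
        merged[aliases.get(tag, tag)] += count
    return merged


merged = merge_aliases(tag_counts, ALIASES)
share = merged["staff-picks"] / TOTAL_BOOKS * 100
print(f"{merged['staff-picks']} of {TOTAL_BOOKS} books ({share:.2f}%)")
```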

Graph: Top 25 Tags (2008-2018)

Column graph of the top 25 tags. See Table 2: Top 25 Tags (2008-2018) for accessible data.

Top 10 Tags by Year (2011-2018)

The most popular tags have shifted from year to year, with some tags fading out altogether over time. The facets below show the most popular tags per year from 2011 to 2018 (since no tags are available for 2008-2010). My apologies for the messy visualizations. I’m still working on how to sort the darn things properly…

Faceted graph of top tags by year. See Table 3: Top 10 Tags by Year for accessible data.
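
The counting and sorting behind a "top tags per year" view can be sketched in Python as follows. This is not the R code that produced the graphs; the sample rows are made up, and the function name `top_tags_by_year` is hypothetical. It counts tags within each year, then sorts each year's tags by count (largest first), breaking ties alphabetically.

```python
from collections import Counter, defaultdict

# Made-up (year, tag) pairs in tidy form: one row per book-tag combination.
tidy_rows = [
    (2011, "nonfiction"), (2011, "nonfiction"), (2011, "poetry"),
    (2012, "fiction"), (2012, "nonfiction"), (2012, "nonfiction"),
    (2012, "fiction"), (2012, "fiction"), (2012, "poetry"),
]


def top_tags_by_year(rows, n=10):
    """Return {year: [(tag, count), ...]} with each year's n most-used tags."""
    counts = defaultdict(Counter)
    for year, tag in rows:
        counts[year][tag] += 1

    result = {}
    for year in sorted(counts):
        # Sort by count descending, then by tag name so ties are stable.
        ranked = sorted(counts[year].items(), key=lambda item: (-item[1], item[0]))
        result[year] = ranked[:n]
    return result


top = top_tags_by_year(tidy_rows, n=2)
for year, ranked in top.items():
    for tag, count in ranked:
        print(year, tag, count)
```

Note that this sketch cuts strictly at n, whereas Table 3 keeps tied tags, which is why some years there list more than ten rows.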

Conclusion

This was a fun project for my winter break and definitely something I’ll use going forward. My next step is to start tracking which books I’ve read and locate the ones I most want to read in my local library.

Suggestions welcome for other improvements. Otherwise, happy reading!


Tables

Table 1: Total Books by Year

Year | Total Books
2008 | 108
2009 | 106
2010 | 128
2011 | 126
2012 | 135
2013 | 205
2014 | 253
2015 | 271
2016 | 309
2017 | 374
2018 | 219

Table 2: Top 25 Tags (2008-2018)

Tag | Total Books
staff-picks | 633
nonfiction | 395
the-dark-side | 377
tales-from-around-the-world | 368
realistic-fiction | 367
family-matters | 363
seriously-great-writing | 360
book-club-ideas | 314
for-history-lovers | 293
eye-opening-reads | 282
identity-and-culture | 272
ladies-first | 261
science-fiction-and-fantasy | 246
rather-long | 184
love-stories | 172
biography-and-memoir | 166
funny-stuff | 150
kids-books | 147
historical-fiction | 146
mysteries-and-thrillers | 143
book-club-ideas | 138
for-art-lovers | 137
npr-staff-picks | 127
poetry | 126
family-matters | 125

Table 3: Top 10 Tags by Year

Year | Tag | Count
2011 | nonfiction | 26
2011 | literary-fiction | 16
2011 | arts-and-entertainment | 15
2011 | historical-fiction | 11
2011 | food-and-wine | 10
2011 | childrens-books | 9
2011 | fiction | 9
2011 | young-adult | 8
2011 | poetry | 6
2011 | mysteries-thrillers-and-crimes | 4
2011 | science-fiction-and-fantasy | 4
2012 | nonfiction | 38
2012 | fiction | 15
2012 | mysteries-thrillers-and-crimes | 12
2012 | science-fiction-and-fantasy | 11
2012 | historical-fiction | 9
2012 | poetry | 9
2012 | arts-and-entertainment | 8
2012 | literary-fiction | 8
2012 | childrens-books | 5
2012 | comics-and-graphic-novels | 5
2013 | for-history-lovers | 60
2013 | seriously-great-writing | 52
2013 | the-dark-side | 52
2013 | family-matters | 51
2013 | book-club-ideas | 49
2013 | realistic-fiction | 42
2013 | tales-from-around-the-world | 36
2013 | npr-staff-picks | 30
2013 | rather-long | 30
2013 | eye-opening-reads | 29
2013 | funny-stuff | 29
2013 | npr-staff-picks | 29
2014 | npr-staff-picks | 98
2014 | realistic-fiction | 76
2014 | tales-from-around-the-world | 71
2014 | the-dark-side | 67
2014 | science-and-society | 61
2014 | seriously-great-writing | 59
2014 | family-matters | 47
2014 | for-history-lovers | 44
2014 | eye-opening-reads | 42
2014 | science-fiction-and-fantasy | 39
2015 | staff-picks | 122
2015 | seriously-great-writing | 72
2015 | realistic-fiction | 70
2015 | nonfiction | 69
2015 | tales-from-around-the-world | 65
2015 | the-dark-side | 60
2015 | family-matters | 58
2015 | science-fiction-and-fantasy | 50
2015 | book-club-ideas | 46
2015 | eye-opening-reads | 39
2016 | staff-picks | 148
2016 | nonfiction | 95
2016 | the-dark-side | 67
2016 | tales-from-around-the-world | 66
2016 | seriously-great-writing | 65
2016 | family-matters | 64
2016 | ladies-first | 64
2016 | identity-and-culture | 58
2016 | realistic-fiction | 58
2016 | book-club-ideas | 55
2017 | staff-picks | 206
2017 | nonfiction | 116
2017 | realistic-fiction | 92
2017 | identity-and-culture | 81
2017 | tales-from-around-the-world | 75
2017 | ladies-first | 74
2017 | the-dark-side | 71
2017 | family-matters | 64
2017 | book-club-ideas | 63
2017 | for-history-lovers | 57
2018 | staff-picks | 157
2018 | nonfiction | 115
2018 | identity-and-culture | 103
2018 | the-states-were-in | 97
2018 | ladies-first | 87
2018 | family-matters | 79
2018 | eye-opening-reads | 75
2018 | book-club-ideas | 69
2018 | realistic-fiction | 61
2018 | the-dark-side | 60