Data scientist with 15 years of experience in quantitative and qualitative data analysis, specializing in data visualization, relational databases, and social-media analytics. Proficient in natural language processing and machine learning methods in R and Python. Experienced storyteller and writer, published in academic, technical, and popular outlets. Work featured in the New York Times, Bloomberg, BuzzFeed News, and Slate.

Select projects featured at github.com/kshaffer and github.com/corpusmusic.


Senior Computational Disinformation Analyst, New Knowledge, 2018–present

  • Studying and combatting disinformation online.
  • Global election monitoring.

Data Scientist and Web Intelligence Analyst, 2017–present

  • Social media analytics, web scraping, dashboards, data visualization, and written reports for New Knowledge, Sovereign Intelligence, Discourse Intelligence, and Southern Poverty Law Center.

Instructional Technology Specialist & Adjunct Instructor of Computer Science, University of Mary Washington, 2016–present

  • Developed an analytics dashboard for UMW Domains (over 2500 student & faculty domains) to analyze usage and predict funding & support needs: umwdomains.com/data.
  • Systems administrator & analyst for UMW Domains & UMW Blogs (WordPress multisite).
  • Data Science Committee member, planning a new master's degree in data science.
  • Teach Intro to Data Science, Digital Storytelling, Intro to Digital Studies, and The Internet.

Instructor of Music Theory & Adjunct Instructor of Computer Science, CU–Boulder, 2013–2016

  • Principal investigator/faculty advisor for research projects using Python, R, cluster analysis, Markov and hidden Markov models, and traditional statistics to model musical and poetic structures: github.com/corpusmusic (repos: bb-cluster & liederCorpusAnalysis).
  • Taught graduate & undergraduate courses in music theory, cognition, and computational analysis.

Education & certifications

Ph.D., M.Phil. & M.A. in music theory, Yale University, 2011

  • First music dissertation at Yale to include computational statistical methods.

Collaborative Institutional Training Initiative (CITI) certification, 2015

  • Social Behavioral Research – Investigators and Key Personnel, Human Research


Data analysis and visualization: quantitative and qualitative analysis of web and business analytics, natural language data, unstructured data, human subject experimental data.

Data mining and wrangling: web scraping; text mining; data transformation (unstructured flat files, multi-layer musical streams, JSON & XML data from APIs); merging and tidying data from multiple sources.

Natural language processing: topic modeling, sentiment analysis, social media analytics.

Machine learning: k-means clustering, word2vec, cosine similarity, topic models, Markov models, regression.

Database querying: MySQL, TeraSoft, RSQLite (databases containing over 100 million records).

Writing: published in a textbook, academic journals, enterprise technology websites, and a trade magazine.

Software: R (tidyverse, ggplot2, Shiny), Python (pandas, scikit-learn, Gensim), SQL, APIs, Git/GitHub, Linux/bash.

Sample projects


Data analytics tools

  • tweetmineR – scrape tweets with Python and analyze them with R.
  • Presidency Project – scrape and analyze content from The American Presidency Project.
  • bb-cluster – mine and analyze chord-progression data from songs in the Billboard Hot 100.
  • Lieder Corpus Analysis – analyze musical and poetic structures in 19th-century German art songs.

Web development tools