Do robots celebrate Christmas?
No. At least, not yet. But that doesn't mean that we can't feed a bunch of lyrics and melodies into an algorithm and have it spit out some Christmas carols!
And that's what I did. Just in time for Advent.
I downloaded lyrics to a few thousand Christian hymns and melodies to a few hundred Christmas songs and used an algorithm called a recurrent neural network (details below for those who are interested) to create new song lyrics and melodies. Then I selected a few stanzas that contained Christmas-y words (like born, child, star, or angel), combined them into a song, and then did the same with the melodies (without the lyrical theming, of course). Finally, I drew upon my years of teaching and tutoring first-year music theory to compose harmony parts to accompany that melody.
The result is the following song! (And raw materials for many, many more...)
What I learned while "composing" this hymn
This process was ... shall we say ... educational. As with most machine-learning/artificial-intelligence projects, I came away with more insight into the source materials and how to frame the problem than with good songs.
The biggest and most surprising thing I learned was that a neural network can learn poetic meter, even if I didn't program it to. The algorithm I used was focused on characters (individual letters, spaces, and punctuation). I knew from past examples that it could pick up on structural cues, but I thought that poetic meter would be a stretch. It turned out that if I fed it a bunch of songs whose lyrics had the same meter, most of the output text consisted of lines that matched that meter. I started with songs in Common Meter (8.6.8.6, or iambic tetrameter alternating with iambic trimeter). It worked pretty well, but given the small amount of input text I had available to me, I decided to try 8.8.8.8 (iambic tetrameter), so that all the lines had the same meter. The results were uncanny. Even the lines of nonsense (and sometimes even the non-words) were in iambic tetrameter most of the time! This was striking, and definitely something to follow up on in the future.
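To make "focused on characters" concrete, here is a minimal sketch of how a character-level model sees a lyric: every distinct character, including spaces and line breaks, gets its own integer ID, and the model only ever learns to predict the next ID. The helper names are hypothetical and this is not Torch RNN's own preprocessing code, just an illustration of the idea.

```python
def build_vocab(text):
    """Map every distinct character (letters, spaces, newlines) to an integer ID."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode(text, vocab):
    """Turn a string into the list of integer IDs a character-level model trains on."""
    return [vocab[ch] for ch in text]


def decode(ids, vocab):
    """Turn a list of integer IDs back into text."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


sample = "O come, O come\n"
vocab = build_vocab(sample)
ids = encode(sample, vocab)
```

Because the line break is just another character in the vocabulary, line length (and therefore meter) is something the model can pick up from the data without being told about it.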
I also learned that applying the algorithm to lyrics, notes, rhythm, and harmony separately can make up for a lack of training data. Since the algorithm I was using was designed for plain-text input, I initially started out thinking I would represent each song with a custom text format that combines the melody and lyrics in a single text stream. The initial results were disastrous. I'm pretty sure I'd need hundreds of thousands of songs in the same poetic and musical meter as source material, which just isn't practical. However, by running the algorithm on the lyrics for 8.8.8.8 songs, then running it on melodies (all transposed to the same key and jury-rigged to 8-syllable lines), taking the best results from each, and combining them, I could get passable results with a lot less source material, and a lot less computation.
I learned that there is no straightforward way to take musical notation data and extract the equivalent of poetic lines. I suspected this, but after trying a bunch of different things (and finding only a small amount of usable source material), I had to settle on a pretty hacky solution (divide each melody into "lines" of eight notes, regardless of the actual musical structure), knowing I could "filter" the results down to only the solutions that matched the desired output. This means that the melodies were "composed" by the algorithm, but it's far from push-button composition. At best, the algorithm's musical output was raw material I could select from. To improve this, I'd have to do a lot of manual musical encoding on the front end — something I might try in the future, when I have more time and want to evaluate how much it would improve the output.
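The "lines of eight notes" hack can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and dropping leftover notes at the end of a melody is an assumption, since the post does not say how incomplete final lines were handled.

```python
def split_into_lines(notes, notes_per_line=8):
    """Split a flat list of note tokens into fixed-length 'lines'.

    Assumption: trailing notes that do not fill a complete line are dropped.
    """
    full = len(notes) - len(notes) % notes_per_line
    return [notes[i:i + notes_per_line] for i in range(0, full, notes_per_line)]


# An 18-note melody: two full lines of eight, plus two leftover notes.
melody = "C4 D4 E4 F4 G4 A4 B4 C5 C5 B4 A4 G4 F4 E4 D4 C4 G4 G4".split()
lines = split_into_lines(melody)

# One line of text per musical "line", mirroring one line of lyrics per poetic line.
text = "\n".join(" ".join(line) for line in lines)
```

Writing one eight-note group per text line gives the melody file the same visual shape as the 8-syllable lyric lines, which is what lets the same character-level model be used for both.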
That said, I also learned that basic text notation for musical notes is a perfectly usable input for a text-oriented algorithm. I used a simple text notation like "C4" (middle C), "B-3" (the B-flat below middle C), and "F#5" (the F-sharp on the top of the treble staff) for notes, all separated by spaces. Since I started with MusicXML files, the Python package music21 provided an easy, automatable way to convert the music notation into that text format.
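For readers who want to see what that conversion amounts to, here is a rough standard-library stand-in (not the music21 code used for this project). It relies on the fact that MusicXML stores each pitch as `step`, optional `alter`, and `octave` elements; the snippet and function names are made up for illustration, and rests are simply skipped, which is an assumption.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written MusicXML fragment (hypothetical example data).
SNIPPET = """
<measure number="1">
  <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
  <note><pitch><step>B</step><alter>-1</alter><octave>3</octave></pitch><duration>1</duration></note>
  <note><rest/><duration>1</duration></note>
  <note><pitch><step>F</step><alter>1</alter><octave>5</octave></pitch><duration>1</duration></note>
</measure>
"""


def pitch_to_token(pitch_el):
    """Build a token such as 'B-3' from a MusicXML <pitch> element."""
    step = pitch_el.findtext("step")
    alter = int(float(pitch_el.findtext("alter", default="0")))
    octave = pitch_el.findtext("octave")
    accidental = "#" * alter if alter > 0 else "-" * (-alter)
    return f"{step}{accidental}{octave}"


def melody_tokens(xml_text):
    """Collect pitch tokens in document order; rests have no <pitch> and are skipped."""
    root = ET.fromstring(xml_text.strip())
    return [pitch_to_token(p) for p in root.iter("pitch")]


tokens = melody_tokens(SNIPPET)
line = " ".join(tokens)
```

The result is a space-separated string of note names, which is all a text-oriented model needs.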
I learned that, even with a relatively small set of source melodies, the algorithm was able to generally distinguish major- from minor-key melodies. I thought this was going to be a sticking point, and originally planned on isolating major from minor melodies to train separate models. When I realized how few melodies I was able to find, I just threw them all at the model and hoped for the best. As it turned out, most of the output melodies were in a major key, with a few chromatic pitches (mostly raised tones like #4), many of which even resolved themselves properly! A few other melodies were full of flats, reflecting (mostly) "natural" minor, again with a few chromatics. There were certainly some awkward mixes in the output, but there were plenty of firmly major and firmly minor melodies to choose from.
I also learned that machine-generated four-part contrapuntal harmony is extremely challenging. There are a few recent solutions out there, but they look more involved than I was prepared to take on this time around, so I haven't dug into the details yet. Generating four-part harmony with the plain-text neural net that works well for the lyrics and melody would require a lot more source material than I was able to find. And, of course, you can't run each vocal part separately and combine them at the end, like you can with the lyrics and tune. To keep things simple this time around, I decided to use the same algorithm to generate lyrics and melody, and then just compose the harmony parts myself. I know it's cheating, and I realize that the harmonization is pretty much what saves the song at the more awkward melodic moments, but this was a good (for me) first step. Perhaps I'll find, or create, a model capable of generating full harmonizations in the future.
Finally, I learned that there is a rich lyric tradition in Christian hymns. I mean, I already knew that, but when you pump a few thousand poems into an algorithm and say "learn the 'rules' governing this style and produce new poems that follow those rules", things really jump out at you. I expected more "fluff" — there's plenty of it in the repertoire — but the phrases and formulations that occur most frequently across the collection are pretty meaty in terms of Christian teaching. Because of their consistent prominence in the source material, those richer turns of phrase appear disproportionately in the output, leading to, at times, some nice lyrics. I mean, it's still mostly nonsense, but it's in the right ballpark.
How I did it
I downloaded approximately 2000 hymn texts in 8.8.8.8 meter (all liturgical seasons) from online hymnals. I also downloaded approximately 300 Christmas and Advent hymn melodies, basically whatever I could find in MusicXML format.
I then made one large text file containing all of the texts, and another text file containing pitches of all of the melodies (transposed to C major or minor), represented as plain text. The pitches were extracted and transposed using the music21 Python package. Basic text manipulation was done in Python. Finding and standardizing the source texts and melodies was, as is typical with these kinds of things, the biggest time suck.
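The transposition itself was handled by music21, but the idea can be sketched with the standard library alone. The version below is a simplified stand-in under several assumptions: the tonic of each melody is already known and passed in by hand, pitches are shifted by whole semitones to the nearest C, and the results are respelled from a fixed table (sharps for C#/F#, flats for E-/A-/B-). All names here are hypothetical.

```python
STEP_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Assumed spelling for each pitch class once everything is in C major or C minor.
C_SPELLINGS = ["C", "C#", "D", "E-", "E", "F", "F#", "G", "A-", "A", "B-", "B"]


def pitch_class(name):
    """Pitch class (0-11) of a note name such as 'G', 'F#', or 'B-'."""
    return (STEP_TO_SEMITONE[name[0]] + name.count("#") - name.count("-")) % 12


def token_to_midi(token):
    """Convert a token such as 'B-3' to a MIDI note number (C4 = 60)."""
    name = token.rstrip("0123456789")
    octave = int(token[len(name):])
    alter = name.count("#") - name.count("-")
    return 12 * (octave + 1) + STEP_TO_SEMITONE[name[0]] + alter


def midi_to_token(midi):
    """Convert a MIDI note number back to a token, using the fixed spelling table."""
    return f"{C_SPELLINGS[midi % 12]}{midi // 12 - 1}"


def transpose_to_c(tokens, tonic):
    """Shift every note so that the given tonic lands on the nearest C."""
    pc = pitch_class(tonic)
    shift = -pc if pc <= 6 else 12 - pc
    return [midi_to_token(token_to_midi(t) + shift) for t in tokens]


major = transpose_to_c("G4 A4 B4 D5 F#4".split(), tonic="G")
minor = transpose_to_c("D4 F4 B-4".split(), tonic="D")
```

Putting every melody on the same tonic means the model sees one consistent set of note names for scale degrees, instead of having to learn each key separately from only a few hundred tunes.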
Then came the main event. I used Torch RNN — a character-level recurrent neural network application — to train models from the source texts and melodies, and then to use the models to generate new texts and melodies. Though it takes a fair bit of computational time to train the model, the instructions are fairly straightforward, and the process itself is simple, if you're familiar with command-line programming.
Since the source texts were not all Christmas- or Advent-themed, and not every song was well-formed, I perused the output text for stanzas that 1) fit 8.8.8.8 meter (which was surprisingly common) and 2) contained a word suggestive of Christmas or Advent ("star", "child", "born", "angel", etc.). I also perused the melody output for four-line segments that started and ended in appropriate places to begin and end a song in that key. Once I had passable exemplars, I pasted together a few of the "good" stanzas, and combined them with one of the output melodies.
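The selection was done by eye, but a first pass over the stanzas could be automated along these lines. This is a rough sketch, not the procedure used here: the syllable counter is a crude vowel-group heuristic that will miscount plenty of English words, the keyword match is a simple prefix test, and all function names are hypothetical.

```python
import re

KEYWORDS = ("star", "child", "born", "angel")


def count_syllables(word):
    """Very rough syllable estimate: count vowel groups, minus a silent final 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def line_syllables(line):
    return sum(count_syllables(w) for w in line.split())


def fits_meter(stanza, meter=(8, 8, 8, 8)):
    """True if the stanza has exactly the expected syllable count on every line."""
    lines = [l for l in stanza.splitlines() if l.strip()]
    return tuple(line_syllables(l) for l in lines) == tuple(meter)


def has_keyword(stanza, keywords=KEYWORDS):
    """True if any word starts with a keyword (so 'angels' matches 'angel')."""
    words = re.findall(r"[a-z]+", stanza.lower())
    return any(w.startswith(k) for w in words for k in keywords)


def select_stanzas(text):
    """Split generated text on blank lines and keep stanzas passing both checks."""
    stanzas = [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]
    return [s for s in stanzas if fits_meter(s) and has_keyword(s)]


generated = """The child is born beneath the star
And angels sing the joyful song
To all the world a light has come
And peace on earth forevermore

The morning light has come to all
And peace on earth forevermore
To all the world a light has come
And joyful song is in the land

The star is bright
The child sleeps"""

kept = select_stanzas(generated)
```

In this made-up sample, only the first stanza passes both checks: the second has the right meter but no keyword, and the third has keywords but the wrong shape. A filter like this would only narrow the pile; picking stanzas that actually read well still takes a human.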
Finally, I composed harmony parts (alto, tenor, and bass) to accompany the machine-generated melody, according to standard tonal music theory and typical choral hymn style.