Last week was my first week at the Recurse Center! I’m having so much fun lol. While here, I’m exploring creative applications of deep learning.
As my starter project, I wanted to generate jazz music using a neural network, specifically an LSTM. LSTM stands for Long Short-Term Memory; it's a type of recurrent neural network capable of processing sequences. You can think of it as having a short-term memory that can learn long-term dependencies.
Using this tutorial as a starting point, I trained an LSTM model on two datasets: Final Fantasy music (conveniently provided by the tutorial, which let me focus on model building rather than finding data) and Herbie Hancock jazz music (my original goal!).
Here are the results:
Final Fantasy
For this composition, I generated a bunch of MIDI files using the model, picked 3 I liked, set them to different instruments, and composited them into one piece.
Herbie Hancock
For these, I didn't compose or edit the songs much. After generating a few MIDI files, I picked some I liked and set each one individually to an instrument I thought sounded nice.
My favorite parts are 0:22-0:45 in the Wurlitzer piece, and the first 5 seconds of the Vibraphone piece.
Honestly, I'm quite happy with the results! This was my first time working with music data. I also had access to the Recurse Center's GPU cluster, which made this project possible. I've pushed my code up to this GitHub repo for reference.
Project components:
Learning about the MIDI file format and how to encode it using the Python Music21 library (there's a sketch of this step after this list)
Finding MIDI files for my training data
Getting set up on the GPU cluster (and using `screen` so I don't disconnect and interrupt my training session when I leave for the day!)
Training the LSTM model using Keras, saving the weights as I go. Learned from a friend: if you have access to a GPU, you'll want to use `CuDNNLSTM` rather than `LSTM` layers to save on training time! Generating doesn't take that long, but it would speed up generation too. (There's a model sketch after this list.)
Generating music using the LSTM model (same architecture, load up the most recent weights file). Using the first 100 notes, predict the next note. Shift the window for the input sequence by one note, and repeat. Stop whenever you feel your song is long enough lol. The songs in this post are about 250 notes each. (Also sketched after this list.)
Opening up the MIDI files in GarageBand so I could play them with various fun instruments and sounds :-)
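For the curious, here's a minimal sketch of the encoding step with Music21, loosely following the tutorial's scheme of pitch strings for notes and dot-joined pitch classes for chords. The file name is just a placeholder:

```python
from music21 import converter, instrument, note, chord

# Parse a MIDI file into a music21 stream ("song.mid" is a placeholder).
midi = converter.parse("song.mid")

# Some files have separate instrument parts; fall back to flat notes if not.
parts = instrument.partitionByInstrument(midi)
elements = parts.parts[0].recurse() if parts else midi.flat.notes

notes = []
for el in elements:
    if isinstance(el, note.Note):
        notes.append(str(el.pitch))  # e.g. "E4"
    elif isinstance(el, chord.Chord):
        # Encode a chord as its pitch classes joined by dots, e.g. "4.7.11"
        notes.append('.'.join(str(n) for n in el.normalOrder))
```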
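And here's roughly what the model setup looked like in spirit: a minimal Keras sketch assuming a GPU is available (that's what makes `CuDNNLSTM` usable). The layer sizes, epoch count, and the dummy training arrays are placeholders, not my exact settings:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import CuDNNLSTM, Dropout, Dense, Activation
from keras.callbacks import ModelCheckpoint

sequence_length = 100  # matches the 100-note input window
n_vocab = 300          # placeholder: number of distinct notes/chords

model = Sequential()
model.add(CuDNNLSTM(512, input_shape=(sequence_length, 1), return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Dummy data so the sketch runs end to end; real input comes from the MIDI prep.
network_input = np.random.rand(1000, sequence_length, 1)
network_output = np.eye(n_vocab)[np.random.randint(0, n_vocab, 1000)]

# Save weights as training goes, so generation can load the latest file.
checkpoint = ModelCheckpoint('weights-{epoch:02d}-{loss:.4f}.hdf5', monitor='loss')
model.fit(network_input, network_output, epochs=200, batch_size=64,
          callbacks=[checkpoint])
```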
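The generation loop is just the sliding-window idea from the list above, continuing from the model sketch (it reuses `model` and `n_vocab`; `int_to_note` and `seed` here are stand-ins for what the data prep would actually produce):

```python
import numpy as np
from music21 import stream, note, chord

# Stand-ins so the sketch runs; real values come from the data prep step.
pitch_names = ['C4', 'D4', 'E4', 'F4', 'G4', 'A4', 'B4']
int_to_note = {i: pitch_names[i % len(pitch_names)] for i in range(n_vocab)}
seed = list(np.random.randint(0, n_vocab, 100))  # first 100 notes of a song

pattern = list(seed)
generated = []
for _ in range(250):  # the songs in this post are about 250 notes each
    # Normalize and reshape to (1, window_length, 1) for the LSTM.
    x = np.reshape(pattern, (1, len(pattern), 1)) / float(n_vocab)
    probs = model.predict(x, verbose=0)[0]
    index = int(np.argmax(probs))        # take the most likely next note
    generated.append(int_to_note[index])
    pattern.append(index)                # shift the input window by one note
    pattern = pattern[1:]

# Write the notes back out as MIDI, offsetting each by 0.5 from the last.
out = stream.Stream()
for i, token in enumerate(generated):
    el = (chord.Chord([int(p) for p in token.split('.')])
          if '.' in token else note.Note(token))
    el.offset = i * 0.5
    out.insert(el.offset, el)
out.write('midi', fp='generated.mid')
```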
There are a number of possible extensions from here. For example, right now the rhythm is pretty straightforward: notes are set to offset from the last note by 0.5 seconds. A possible extension is to encode the rhythms of the training data, which is doable since MIDI files essentially encode notes plus time offsets (see the sketch below). Another extension is to add some music rules, e.g. counterpoint, harmony, consonance, etc., which I would have to research more. It would also be really interesting to train a network on multiple instrumental parts, such as an orchestral score, where different instruments have musical relationships or dependencies with one another.
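As a first step toward that rhythm extension, Music21 already exposes each note's duration and offset, so rhythm-aware tokens might look something like this (again with a placeholder file name; this is a sketch of the idea, not something I've built yet):

```python
from music21 import converter, note, chord

midi = converter.parse("song.mid")  # placeholder file name
events = []
prev_offset = 0.0
for el in midi.flat.notes:
    if isinstance(el, note.Note):
        pitch = str(el.pitch)
    elif isinstance(el, chord.Chord):
        pitch = '.'.join(str(n) for n in el.normalOrder)
    else:
        continue
    # Token = (pitch, duration in quarter notes, gap since the previous note),
    # instead of assuming every note starts 0.5 after the last one.
    events.append((pitch, float(el.quarterLength), float(el.offset) - prev_offset))
    prev_offset = float(el.offset)
```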
Sheet Music
For fun, I used an online converter to generate sheet music for piano from the output MIDI files :-)
Here’s the first song (Wurlitzer Electric):
Here’s the second song (Vibraphone):