This past week, I’ve been playing around with more image processing and generation techniques. In particular, I implemented the neural style transfer algorithm by Gatys, Ecker, and Bethge in PyTorch, following this tutorial. The paper and technique have been around for a few years, but it wasn’t until now, here at Recurse, that I’ve had access to a GPU. This was so much fun to implement and experiment with!
My GitHub repo contains setup and usage instructions, as well as a directory with many results, if you’d like to try it out and explore for yourself!
Model Overview
Neural style transfer takes two images as input and applies the style of one image onto the content of the other. In the example below, the first image is the style input, the second image is the content input, and the third image is the result of the style transfer. (The style image used here is one of my favorite paintings: Nocturne in Black and Gold, the Falling Rocket by James Abbott McNeill Whistler.)
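Under the hood, the algorithm doesn’t train a network at all: it starts from an input image (typically the content image or white noise) and optimizes its pixels directly, minimizing a weighted sum of a content loss and a style loss, as defined in the original paper:

$$
\mathcal{L}_{\text{total}}(\vec{p}, \vec{a}, \vec{x}) = \alpha \, \mathcal{L}_{\text{content}}(\vec{p}, \vec{x}) + \beta \, \mathcal{L}_{\text{style}}(\vec{a}, \vec{x})
$$

where $\vec{p}$ is the content image, $\vec{a}$ is the style image, $\vec{x}$ is the generated image, and the weights $\alpha$ and $\beta$ trade off content fidelity against stylization.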
The approach builds on VGG-19, a convolutional neural network pretrained on millions of images. It’s 19 layers deep and was built by the Visual Geometry Group, hence the name VGG-19.
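To give a feel for the setup, here’s a minimal sketch in PyTorch, assuming the pretrained model from torchvision:

```python
import torchvision.models as models

# Load VGG-19 pretrained on ImageNet; style transfer only needs the
# convolutional feature extractor, not the classifier head.
vgg = models.vgg19(weights="IMAGENET1K_V1").features.eval()

# Freeze the weights: we optimize the input image, never the network.
for param in vgg.parameters():
    param.requires_grad_(False)
```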
For neural style transfer, we modify the network architecture as follows: we insert a content loss layer, computed as the mean squared error between feature maps, after the fourth convolutional layer, and style loss layers, computed as the mean squared error between normalized Gram matrices, after each of the first five convolutional layers.
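Here’s a rough sketch of how those loss layers can be written, in the spirit of the tutorial I followed (the module names and structure here are illustrative, not necessarily what my repo uses):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentLoss(nn.Module):
    """Transparent layer: records the MSE between the current feature
    map and a fixed content target, then passes the features through."""
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x

def gram_matrix(x):
    # Inner products between flattened channel feature maps,
    # normalized by the total number of elements.
    b, c, h, w = x.size()
    features = x.view(b * c, h * w)
    gram = features @ features.t()
    return gram / (b * c * h * w)

class StyleLoss(nn.Module):
    """Same idea as ContentLoss, but compares normalized Gram matrices
    instead of raw feature maps."""
    def __init__(self, target_features):
        super().__init__()
        self.target = gram_matrix(target_features).detach()

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x
```

During the optimization loop, the losses recorded by these layers are summed with the $\alpha$ and $\beta$ weights above and backpropagated into the input image.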
Results
These are some of my favorite images from my explorations. Note that this is a curated selection of results.