Vanilla LSTM with numpy

October 8, 2017

This is inspired by "Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy" by Andrej Karpathy.

This blog post was updated in December 2017 based on feedback from @AlexSherstinsky; thanks!

This is a simple from-scratch implementation of a Long Short-Term Memory (LSTM) module in numpy, written for learning purposes. The network is trained with stochastic gradient descent with a batch size of 1, using the AdaGrad algorithm (with momentum).
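To make the training rule concrete, here is a minimal sketch of a single AdaGrad parameter update. It shows plain AdaGrad only and leaves out any momentum term; the function name `adagrad_update`, the `cache` accumulator, the learning rate and the toy objective are all assumptions made for illustration, not code taken from the notebook.

```python
import numpy as np


def adagrad_update(param, grad, cache, learning_rate=0.1, eps=1e-8):
    """Apply one in-place AdaGrad step (hypothetical helper, no momentum)."""
    # Accumulate the squared gradient for each parameter.
    cache += grad * grad
    # Scale the step by the inverse root of the accumulated squared gradients.
    param -= learning_rate * grad / np.sqrt(cache + eps)
    return param, cache


# Toy objective (assumed for illustration): f(w) = 0.5 * ||w||^2, so grad = w.
w = np.array([1.0, -2.0])
cache = np.zeros_like(w)
for _ in range(100):
    adagrad_update(w, w.copy(), cache)
```

Because the accumulator only grows, the effective step size for each parameter shrinks over time, which is why AdaGrad needs little learning-rate tuning on short runs like this one.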

http://blog.varunajayasiri.com/ml/lstm.svg
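As a rough orientation for what an LSTM module computes, the following sketch runs one forward step of the standard LSTM formulation (forget, input and output gates plus a candidate cell state). The parameter names (`W_f`, `b_f`, and so on), the sizes and the random initialization are assumptions chosen for the example and may differ from the notebook.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell (illustrative sketch)."""
    # Stack the previous hidden state on top of the current input.
    z = np.vstack((h_prev, x))
    f = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    c_bar = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell state
    c = f * c_prev + i * c_bar                          # new cell state
    o = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h = o * np.tanh(c)                                  # new hidden state
    return h, c


# Hypothetical sizes: hidden size 4, input (vocabulary) size 3.
H, X = 4, 3
rng = np.random.default_rng(0)
params = {}
for gate in "fico":
    params["W_" + gate] = rng.standard_normal((H, H + X)) * 0.1
    params["b_" + gate] = np.zeros((H, 1))

x = np.zeros((X, 1))
x[1] = 1.0  # one-hot encoding of a single character
h, c = lstm_step(x, np.zeros((H, 1)), np.zeros((H, 1)), params)
```

Concatenating `h_prev` and `x` into one vector keeps each gate to a single matrix multiplication, which also simplifies the backward pass.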

You can download the Jupyter notebook from http://blog.varunajayasiri.com/ml/numpy_lstm.ipynb

The model usually reaches an error of about 45 after 5000 iterations when tested with a 100,000-character sample from Shakespeare. However, it sometimes gets stuck in a local minimum; reinitialize the weights if this happens.

Place the input text file as `input.txt` in the same folder as the Python code.
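For a character-level model, the text is typically read in full and each distinct character is mapped to an integer index. The sketch below shows one way to do that; the helper names `load_text` and `build_vocab` are hypothetical, and the short sample string stands in for the real file so the example runs without it.

```python
def load_text(path="input.txt"):
    """Read the whole training text from disk (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def build_vocab(text):
    """Map each distinct character to an index, and back (hypothetical helper)."""
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    idx_to_char = {i: ch for i, ch in enumerate(chars)}
    return char_to_idx, idx_to_char


# A short sample stands in for the contents of input.txt here.
sample = "hello"
char_to_idx, idx_to_char = build_vocab(sample)
encoded = [char_to_idx[ch] for ch in sample]
```

With the real data, `build_vocab(load_text())` would produce the same two mappings for the full text, and the vocabulary size is `len(char_to_idx)`.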

VARUNA JAYASIRI
