8. Long Short-Term Memory: LSTM
LSTM (Long Short-Term Memory), introduced in 1997 by S. Hochreiter and J. Schmidhuber, is a type of recurrent neural network (RNN) designed specifically to handle long-term dependencies in sequences.
LSTM networks work by maintaining a memory cell (the cell state) that can carry information across long stretches of a sequence. The flow of information into and out of this cell is controlled by three gates, as illustrated in the sketch after this list: the input gate, the forget gate, and the output gate.
- The input gate controls how much new information is written into the cell state.
- The forget gate controls how much of the existing cell state is discarded.
- The output gate controls how much of the cell state is exposed as the hidden state, i.e. the cell's output.
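To make the gate mechanics concrete, here is a minimal NumPy sketch of a single LSTM forward step. The function name `lstm_step` and the layout of the stacked weight matrix `W` are illustrative choices for this sketch, not the API of any particular library.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM forward step (illustrative sketch).

    x      : input at the current time step, shape (n_input,)
    h_prev : hidden state from the previous step, shape (n_hidden,)
    c_prev : cell state from the previous step, shape (n_hidden,)
    W      : stacked weights for all four gates, shape (4 * n_hidden, n_input + n_hidden)
    b      : stacked biases, shape (4 * n_hidden,)
    """
    n = h_prev.shape[0]
    # All four gate pre-activations computed in one matrix multiply.
    z = W @ np.concatenate([x, h_prev]) + b

    i = sigmoid(z[0 * n:1 * n])   # input gate: how much new information enters
    f = sigmoid(z[1 * n:2 * n])   # forget gate: how much of the old cell state is kept
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much of the cell state is exposed
    g = np.tanh(z[3 * n:4 * n])   # candidate values to write into the cell

    c = f * c_prev + i * g        # new cell state: keep some old memory, add some new
    h = o * np.tanh(c)            # new hidden state, read out through the output gate
    return h, c

# Toy usage: run a random 5-step sequence through the cell.
n_in, n_hid = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * n_hid, n_in + n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)
```

Stacking the four gate pre-activations into a single matrix multiply is a common implementation convenience; using four separate weight matrices (one per gate) is mathematically equivalent.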
There is a lot of good documentation available that explains the LSTM architecture in detail. However, I have not found a clear and concise explanation of LSTM backpropagation. Therefore, the following sections will delve into the backpropagation process and an LSTM implementation in detail.
Chapter Contents