Deep Learning
Notes from MIT's Introduction to Deep Learning course.
Notes also from Andrej Karpathy's YouTube series on neural networks.
Introduction
Artificial Intelligence
- Any technique that enables computers to mimic human behaviour
Machine Learning
- Ability to learn without being explicitly programmed
Deep Learning
- Extract patterns from data using neural networks
Why Deep Learning?
- Hand-engineered features are time-consuming, brittle, and not scalable in practice.
- Can we learn the underlying features directly from the data (low-level, mid-level, and high-level features)?
Why Now?
- Big Data: larger datasets, easier collection and storage.
- Hardware: GPUs, massively parallelized computation.
- Software: improved techniques, new models, toolboxes.
Many Artificial Intelligence tasks can be solved by designing the right set of features to extract for that task, then providing those features to a simple machine learning algorithm.
The neural network abstraction can be viewed as a modular approach to learning algorithms based on continuous optimization over a computational graph of dependencies between inputs and outputs.
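As a minimal sketch of that idea (the graph, values, and variable names here are illustrative, not from the course): a tiny computational graph y = w·x + b with a squared-error loss, where the gradient with respect to each parameter is recovered by walking the graph backwards with the chain rule.

```python
# Toy computational graph: y = w*x + b, loss = (y - t)^2.
# Forward pass computes outputs; backward pass applies the chain rule.
x, w, b, t = 2.0, 3.0, 1.0, 10.0  # input, weight, bias, target

# Forward pass through the graph
y = w * x + b          # prediction node
loss = (y - t) ** 2    # loss node

# Backward pass: chain rule from the loss back to each parameter
dloss_dy = 2 * (y - t)       # d(loss)/dy
dy_dw, dy_db = x, 1.0        # local derivatives of y = w*x + b
dloss_dw = dloss_dy * dy_dw  # chain rule: d(loss)/dw
dloss_db = dloss_dy * dy_db  # chain rule: d(loss)/db

print(loss, dloss_dw, dloss_db)  # 9.0 -12.0 -6.0
```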
Transformers enabled a rapid scaling up of the complexity of language models by increasing the number of parameters in the model, as well as other factors. The parameters can be thought of as connections between words, and models improve by adjusting these connections as they churn through text during training. The more parameters in a model, the more accurately it can make connections, and the closer it comes to passably mimicking human language.
Building Blocks
- Perceptron - the structural building block of deep learning: a weighted sum of inputs plus a bias, passed through an activation function (see the sketch after this list).
- Activation functions - introduce non-linearities into the network (e.g. sigmoid, tanh, ReLU).
- Objective function / Cost function - the loss of our network measures the cost incurred from incorrect predictions.
- Loss optimization - find the network weights that achieve the lowest loss (see the gradient-descent sketch after this list).
- Backpropagation - apply the chain rule backwards through the computational graph to compute the gradient of the loss with respect to every weight.
- Learning Rate - a hyperparameter that scales each gradient update; too small and training converges slowly, too large and it can overshoot or diverge.
- Gradient Descent - iteratively update the weights in the direction opposite the loss gradient: w ← w − η·∇L(w).
- Regularization - techniques such as L2 weight decay, dropout, and early stopping that discourage overfitting and help the network generalize to unseen data.
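A minimal sketch of the perceptron in Python with NumPy, assuming a sigmoid activation; the input, weight, and bias values are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    """Perceptron forward pass: weighted sum of inputs plus a bias,
    passed through a non-linear activation."""
    z = np.dot(w, x) + b   # linear combination
    return sigmoid(z)      # non-linearity

# Illustrative values, not from the course
x = np.array([1.0, 2.0])    # inputs
w = np.array([0.5, -0.3])   # weights
b = 0.1                     # bias
print(perceptron(x, w, b))  # 0.5, since z = 0 here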
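And a minimal gradient-descent loop tying the remaining pieces together (loss, backpropagation via the chain rule, learning rate, weight update). The dataset, hyperparameters, and model (a single sigmoid neuron trained with mean squared error) are illustrative assumptions, not the course's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # toy inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy binary targets

w, b = np.zeros(2), 0.0   # initialize weights and bias
lr = 0.5                  # learning rate: scales each update step

for step in range(200):
    # Forward pass: single sigmoid neuron
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))

    # Loss: mean squared error between predictions and targets
    loss = np.mean((p - y) ** 2)

    # Backpropagation: chain rule through the MSE and the sigmoid
    dp = 2 * (p - y) / len(y)  # d(loss)/dp
    dz = dp * p * (1 - p)      # d(loss)/dz, using sigmoid' = p(1 - p)
    dw = X.T @ dz              # d(loss)/dw
    db = dz.sum()              # d(loss)/db

    # Gradient descent: step opposite the gradient, scaled by lr
    w -= lr * dw
    b -= lr * db

print(loss, w, b)  # loss shrinks as the weights fit the toy data
```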