ML & DL — Convolutional Neural Networks (Part 6)
Convolutional neural networks extract the most useful information for the task at hand.
In this article, you will find:
- A brief introduction to convolutional neural networks,
- Parameters and layers,
- Regularization,
- Implementation of a convolutional neural network with Keras in a Jupyter Notebook,
- Partial summary.
Convolutional neural networks
Convolutional neural networks, or ConvNets, are neural networks that use convolution in place of general matrix multiplication in at least one of their layers [1].
Convolution is a mathematical operation that describes a rule for combining two functions or pieces of information.
S(i,j) = (I*K)(i,j)
where I is the input (the feature map), K is the convolution kernel, and S(i,j) is the resulting map of transformed features.
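The formula above is what deep-learning libraries compute in a convolutional layer (strictly speaking, cross-correlation: the kernel is slid over the input without flipping). A minimal NumPy sketch of a valid 2-D convolution, using a made-up 4x4 input and a 3x3 averaging kernel:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image,
    multiply elementwise, and sum each window."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1      # valid output size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

I = np.arange(16, dtype=float).reshape(4, 4)   # toy input I
K = np.ones((3, 3)) / 9.0                      # toy averaging kernel K
S = conv2d(I, K)                               # feature map S(i, j)
print(S.shape)  # (2, 2)
```

Each output entry S(i, j) is one kernel-sized window of I weighted by K and summed, exactly as in the definition above.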
Parameters
Convolutional layers have parameters that are learned so that these filters are automatically adjusted to extract the most useful information for the task at hand.
- Input is a multidimensional array of data,
- Kernel is a multidimensional array of parameters,
These multidimensional arrays are tensors:
- Time series: a 1D grid of samples at regular time intervals,
- Image data: a 2D grid of pixels.
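As a concrete illustration of these tensor ranks (the shapes here are hypothetical, with MNIST-sized images):

```python
import numpy as np

# 1-D grid: a time series with 1000 regularly spaced steps and 1 channel
series = np.zeros((1000, 1))

# 2-D grid: a 28x28 grayscale image with 1 channel
image = np.zeros((28, 28, 1))

print(series.shape, image.shape)
```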
Layers
- Convolution: extract features from the image,
- Pooling: reduce the size of the input, and
- Dense/Fully connected: connect layers.
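To see how pooling reduces the input size, here is a small NumPy sketch of non-overlapping 2x2 max pooling (the input values are made up):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: each size x size window keeps its maximum."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]  # crop to a multiple of the window size
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 3, 2],
              [6, 2, 0, 1]], dtype=float)
pooled = max_pool2d(x)
print(pooled)  # [[4. 5.]
               #  [6. 3.]]
```

A 4x4 input shrinks to 2x2: each output entry is the maximum of one 2x2 window.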
Regularization
It is used to overcome the problem of overfitting.
In regularization, we penalize the loss by adding an L1 (LASSO) or L2 (Ridge) norm of the weight vector W. These penalties are incorporated into the loss function that the network optimizes.
- L1: the sum of the absolute values of the coefficients.
- L2: the sum of the squared values of the coefficients.
- Dropout: randomly sets a fraction of input units to 0 at each update during training.
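The two norm penalties can be written out directly; a toy NumPy sketch with a made-up weight vector and regularization strength:

```python
import numpy as np

w = np.array([0.5, -1.0, 0.25])   # toy weight vector W
lam = 0.01                        # regularization strength (hyperparameter)

l1_penalty = lam * np.sum(np.abs(w))   # L1 / LASSO term added to the loss
l2_penalty = lam * np.sum(w ** 2)      # L2 / Ridge term added to the loss
print(l1_penalty, l2_penalty)
```

In Keras, these correspond to passing `kernel_regularizer=regularizers.l1(0.01)` or `regularizers.l2(0.01)` to a layer, and to adding a `Dropout(rate)` layer.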
Github code
In this repository, you will find the implementation of a convolutional neural network, step by step, with Keras in Jupyter Notebook.
Model training and evaluation
1. Data: MNIST dataset
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()
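The fit step below expects one-hot labels (`y_train_cat`) and images with an explicit channel axis; a minimal preprocessing sketch, run here on random stand-in data so it does not need to download MNIST:

```python
import numpy as np

# Stand-in for mnist.load_data(); real MNIST images are 28x28 uint8.
X_train = np.random.randint(0, 256, size=(100, 28, 28)).astype('float32')
y_train = np.random.randint(0, 10, size=100)

X_train = X_train.reshape(-1, 28, 28, 1) / 255.0   # add channel axis, scale to [0, 1]
input_shape = (28, 28, 1)                          # passed to the first Conv2D layer

# One-hot encode the labels (keras.utils.to_categorical does the same).
y_train_cat = np.eye(10)[y_train]
print(X_train.shape, y_train_cat.shape)
```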
2. Model:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
# Add the input layer and hidden layer 1
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
# Add hidden layer 2
model.add(Conv2D(64, (3, 3), activation='relu'))
# Flatten convolutional output
model.add(Flatten())
# Add hidden layer 3
model.add(Dense(128, activation='relu'))
# Add the output layer
model.add(Dense(10, activation='softmax'))
3. Compile:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
metrics=['categorical_accuracy'])
4. Fit:
history = model.fit(X_train, y_train_cat,
batch_size=256, epochs=50,
validation_data=(X_test, y_test_cat))
5. Evaluate:
[test_loss, test_acc] = model.evaluate(X_test, y_test_cat)
MNIST results
For all trained models: epochs: 50, batch size: 256, optimizer: RMSprop, and output layer: 10 softmax units.
Partial summary
All theoretical and practical implementations: Linear regression, Logistic regression, Artificial neural networks, Deep neural networks, and Convolutional neural networks.
For those looking for all the articles in our ML & DL series, here is the link.
References
[1] Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.