ML & DL — Artificial Neural Networks (Part 4)
Artificial neural networks are called networks because they are represented by the composition of several different functions.
In this article, you will find:
- A brief introduction to artificial neural networks,
- Graphic representation,
- Activation and cost function,
- Implementation of artificial neural networks with Keras in the Jupyter Notebook,
- Partial summary.
Artificial neural networks
Most real-world problems are not linearly separable. One of the most computationally efficient ways to compute nonlinear hypotheses is to connect many small units, each of which performs a "logistic regression".
Information in the network flows through these functions connected in a chain [1]:
$$\hat{y} = f^{(2)}(f^{(1)}(x))$$

where $x$ is the input layer, $f^{(1)}$ is the hidden layer, and $f^{(2)}$ is the output layer.

Each layer $f^{(i)}$ computes a logistic regression:

$$f^{(i)}(h) = \sigma^{(i)}\left(W^{(i)} h + b^{(i)}\right)$$

where $\sigma^{(i)}$ is the activation function, $W^{(i)}$ is the weight matrix, and $b^{(i)}$ is the bias vector.

That is:

$$\hat{y} = \sigma^{(2)}\left[W^{(2)}\,\sigma^{(1)}\left(W^{(1)} x + b^{(1)}\right) + b^{(2)}\right]$$
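To make the chained composition concrete, here is a minimal numpy sketch (an illustration, not code from this series' repository) of the two-layer forward pass, with hand-picked weights so the network computes XOR, a function no single linear unit can represent:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked parameters: hidden unit 1 ~ OR, hidden unit 2 ~ AND,
# output unit ~ (OR and not AND), i.e. XOR
W1 = np.array([[20.0, 20.0],
               [20.0, 20.0]])
b1 = np.array([-10.0, -30.0])
W2 = np.array([[20.0, -20.0]])
b2 = np.array([-10.0])

def forward(x):
    h = sigmoid(W1 @ x + b1)      # f(1): hidden layer
    return sigmoid(W2 @ h + b2)   # f(2): output layer

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, np.round(forward(np.array(x, dtype=float)), 3))
# (0, 0) -> ~0, (0, 1) -> ~1, (1, 0) -> ~1, (1, 1) -> ~0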
Graphic representation
Activation function
Activation functions introduce non-linearity into artificial neural networks.

For regression problems, the output activation $\sigma^{(2)}$ is the identity function:

$$\sigma(z) = z$$

For classification problems, $\sigma^{(2)}$ is a sigmoid-type function, such as (see the sketch after this list):
- Logistic,
- Tanh,
- Softmax,
- Read more about activation functions.
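To make these options concrete, here is a minimal numpy sketch (illustrative, not from the original article) of the three activations:

import numpy as np

def logistic(z):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real input into (-1, 1)
    return np.tanh(z)

def softmax(z):
    # Maps a vector of scores to a probability distribution;
    # subtracting the max improves numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(logistic(z))  # [0.119 0.5   0.953]
print(tanh(z))      # [-0.964  0.     0.995]
print(softmax(z))   # [0.006 0.047 0.946]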
Cost function
For regression problems, minimize the mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

For classification problems, minimize the cross-entropy:

$$E(y, \hat{y}) = -\sum_{i} y_i \log(\hat{y}_i)$$
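Both costs are straightforward to compute by hand; a minimal numpy sketch (illustrative, not from the article):

import numpy as np

def mse(y, y_hat):
    # Mean squared error for regression
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Cross-entropy for classification: y is a one-hot label,
    # y_hat are predicted probabilities; eps guards against log(0)
    return -np.sum(y * np.log(y_hat + eps))

print(mse(np.array([3.0, 5.0]), np.array([2.5, 5.5])))  # 0.25

y = np.array([0.0, 1.0, 0.0])        # one-hot true label
y_hat = np.array([0.1, 0.8, 0.1])    # predicted probabilities
print(cross_entropy(y, y_hat))       # ~0.223 (= -log(0.8))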
GitHub code
In this repository, you will find the implementation of an artificial neural network, step by step, with Keras in the Jupyter Notebook.
Model training and evaluation
1. Data: MNIST dataset
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
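The fit and evaluate steps below reference flattened 784-dimensional inputs and one-hot targets named y_train_cat and y_test_cat, which the snippets do not create; a minimal preprocessing sketch, assuming the standard Keras to_categorical utility:

from keras.utils import to_categorical

# Flatten the 28x28 images into 784-dimensional vectors, scaled to [0, 1]
X_train = X_train.reshape(-1, 784).astype('float32') / 255.0
X_test = X_test.reshape(-1, 784).astype('float32') / 255.0

# One-hot encode the digit labels (10 classes), as required by
# categorical_crossentropy
y_train_cat = to_categorical(y_train, 10)
y_test_cat = to_categorical(y_test, 10)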
2. Model:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Add the input layer and hidden layer 1 (32 sigmoid units)
model.add(Dense(32, input_shape=(784,), activation='sigmoid'))
# Add the output layer (10 softmax units, one per digit class)
model.add(Dense(10, activation='softmax'))
3. Compile:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
metrics=['categorical_accuracy'])
4. Fit:
history = model.fit(X_train, y_train_cat,
batch_size=256, epochs=50,
validation_data=(X_test, y_test_cat))
5. Evaluate:
test_loss, test_acc = model.evaluate(X_test, y_test_cat)
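As a quick check, the held-out metrics can be printed and the learning curves plotted from the history object returned by fit; a minimal sketch, assuming Keras records the metric under the keys categorical_accuracy and val_categorical_accuracy:

import matplotlib.pyplot as plt

print(f'Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}')

# Training vs. validation accuracy per epoch
plt.plot(history.history['categorical_accuracy'], label='train')
plt.plot(history.history['val_categorical_accuracy'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Categorical accuracy')
plt.legend()
plt.show()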
MNIST results
All models were trained with the same setup: 50 epochs, batch size 256, the RMSProp optimizer, and an output layer of 10 softmax units.
Partial summary
This series covers theoretical and practical implementations of linear regression, logistic regression, artificial neural networks, deep neural networks, and convolutional neural networks.
For those looking for all the articles in our ML & DL series, here is the link.
References
[1] Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press.