In deep learning, it was long believed that the more layers we added, the more accurate the model would become. That changed when we noticed that, beyond a certain point, adding more layers actually made the model less accurate, largely because of what we call the vanishing gradient problem. The gated recurrent unit helped tackle this problem. Let me walk you through the basics of GRUs so you can use them in your next project.
Over time, I learned how GRUs use gating mechanisms to control the flow of information through the network, helping reduce the vanishing gradient problem. That problem commonly occurs in deep RNNs and hampers learning long-range dependencies. Gated recurrent units changed deep learning forever, making them a staple in our industry.
Let me start by giving you a brief idea of GRUs in this tutorial on the GRU neural network. A gated recurrent unit (GRU) is a recurrent neural network (RNN) architecture that aims to solve problems common to plain RNNs, including the vanishing gradient problem and the difficulty of capturing long-term dependencies. GRUs were introduced by Kyunghyun Cho et al. in 2014.
GRUs are similar to Long Short-Term Memory (LSTM) networks in that they use gating mechanisms to control the flow of information inside the network. However, GRUs have a simpler architecture with fewer gating units, resulting in higher computational efficiency.
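If you want a concrete feel for that difference, here is a quick sketch using PyTorch (assuming you have it installed); the layer sizes are arbitrary values I picked for illustration, not anything prescribed by the GRU itself.

# Comparing the parameter counts of a GRU and an LSTM of the same size.
# The dimensions below are illustrative assumptions.
import torch.nn as nn

input_size, hidden_size = 64, 128        # example dimensions
gru = nn.GRU(input_size, hidden_size)    # 3 gates per unit
lstm = nn.LSTM(input_size, hidden_size)  # 4 gates per unit

def count_params(model):
    return sum(p.numel() for p in model.parameters())

print("GRU parameters: ", count_params(gru))   # roughly 3/4 of the LSTM's count
print("LSTM parameters:", count_params(lstm))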
Let's discuss the components used in the gated recurrent unit to better understand how it works. Here are the components of a GRU.
At each time step t, the GRU receives an input vector x(t) containing the current data point in the sequence.
The GRU also accepts the previous hidden state h(t-1) as input, which carries information from the previous time step and helps capture dependencies over time.
The update gate controls how much of the previous hidden state is kept and how much new information is added to the current state. It is calculated as z(t) = σ(W_z · [h(t-1), x(t)]). Here, W_z is the weight matrix for the update gate, σ is the sigmoid activation function, and [h(t-1), x(t)] is the concatenation of the previous hidden state and the current input.
The reset gate controls how much of the previous hidden state should be forgotten. It is calculated as r(t) = σ(W_r · [h(t-1), x(t)]), where W_r is the weight matrix for the reset gate.
The GRU then creates a candidate hidden state based on the reset gate r(t). It is calculated as h̃(t) = tanh(W_h · [r(t) ⊙ h(t-1), x(t)]). Here, W_h is the weight matrix for the candidate memory content, tanh is the hyperbolic tangent activation function, ⊙ denotes element-wise multiplication, and [r(t) ⊙ h(t-1), x(t)] is the concatenation of the reset gate applied to the previous hidden state and the current input.
Finally, the new hidden state h(t) is calculated by combining the previous hidden state h(t-1) and the candidate memory content h̃(t) using the update gate z(t): h(t) = (1 - z(t)) ⊙ h(t-1) + z(t) ⊙ h̃(t).
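To make these equations concrete, here is a minimal NumPy sketch of a single GRU step. The weight shapes and random values are illustrative assumptions, and biases are omitted to match the simplified equations above; treat it as a learning aid rather than a production implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    # Each weight matrix acts on the concatenation [h(t-1), x(t)].
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                   # update gate
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))   # candidate state
    return (1 - z_t) * h_prev + z_t * h_cand                      # new hidden state

# Illustrative sizes: 4-dimensional input, 3-dimensional hidden state.
rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3
W_z, W_r, W_h = (rng.standard_normal((hidden_size, hidden_size + input_size))
                 for _ in range(3))

h = np.zeros(hidden_size)
x = rng.standard_normal(input_size)
print(gru_step(x, h, W_z, W_r, W_h))   # the new hidden state h(t)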
A GRU works like other recurrent neural network architectures: it processes sequential data one element at a time, updating its hidden state using both the current input and the previous hidden state. At each time step t, the GRU generates a candidate activation vector by combining information from the input and the previous hidden state. This candidate vector is then used to update the hidden state for the next time step.
The update and reset gates are used to calculate the candidate activation vector. The reset gate chooses how much of the previous hidden state to forget, while the update gate decides how much of the candidate activation to incorporate into the new hidden state.
Let me explain the maths behind this whole spectacle.
r(t) = σ(W(r) * [h(t-1), x(t)])
z(t) = σ(W(z) * [h(t-1), x(t)])
In the above expressions, W(r) and W(z) are weight matrices. They are learned during training of the neural network.
h̃(t) = tanh(W(h) * [r(t) * h(t-1), x(t)])
Here, W(h) is another weight matrix.
h(t) = (1 - z(t)) * h(t-1) + z(t) * h̃(t)
The end result is a compact architecture that can selectively update its hidden state based on the input and the previous hidden state. This eliminates the need for the separate memory cell state used in LSTMs (Long Short-Term Memory networks).
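To see that difference in code, here is a small PyTorch sketch comparing a GRU cell with an LSTM cell: the GRU returns a single hidden state, while the LSTM also carries a separate cell state. The batch and feature sizes are arbitrary example values.

import torch
import torch.nn as nn

x = torch.randn(8, 10)                        # a batch of 8 inputs with 10 features each
gru_cell = nn.GRUCell(input_size=10, hidden_size=16)
lstm_cell = nn.LSTMCell(input_size=10, hidden_size=16)

h = gru_cell(x)                               # one tensor: the hidden state
h_lstm, c_lstm = lstm_cell(x)                 # two tensors: hidden state and cell state
print(h.shape, h_lstm.shape, c_lstm.shape)    # all torch.Size([8, 16])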
To recap the full process: at each time step t, the GRU receives an input vector x(t), which represents the current data point in the sequence.
It also accepts the previous hidden state h(t-1) as input. This carries information from the preceding time step and helps capture temporal dependencies.
The GRU computes an update gate z(t) to determine how much of the previous hidden state h(t-1) should be kept and how much new information should be added.
The GRU computes a reset gate r(t), which determines how much of the previous hidden state should be forgotten.
Using the reset gate, the GRU then creates the candidate memory content, the new potential hidden state, based on the current input and the previous hidden state.
Finally, the GRU computes the new hidden state h(t) by combining the previous hidden state h(t-1) and the candidate memory content h̃(t) using the update gate z(t).
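In practice, you would rarely hand-code these steps; deep learning frameworks ship ready-made GRU layers. Here is a minimal sketch using PyTorch's nn.GRU on a made-up batch of sequences, just to show the input and output shapes.

import torch
import torch.nn as nn

# Illustrative sizes: a batch of 8 sequences, 20 time steps, 10 features per step.
gru = nn.GRU(input_size=10, hidden_size=32, num_layers=1, batch_first=True)
x = torch.randn(8, 20, 10)

output, h_n = gru(x)
print(output.shape)   # torch.Size([8, 20, 32]) - the hidden state at every time step
print(h_n.shape)      # torch.Size([1, 8, 32])  - the final hidden state for each sequence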
GRU networks have their own set of advantages and disadvantages. Let me discuss them one by one.
The advantages of the GRU neural network are as follows: GRUs have fewer parameters than LSTMs, so they train faster and are more computationally efficient; their gating mechanism mitigates the vanishing gradient problem, which helps them capture longer-range dependencies than a plain RNN; and their simpler structure makes them easier to implement and tune.
Now, let me discuss some of the disadvantages of GRU (gated recurrent unit): because a GRU has no separate memory cell, it can be less expressive than an LSTM and may be outperformed on tasks that demand very long-term memory; and, like any recurrent model, it still processes data sequentially, so training on very long sequences remains relatively slow.
The gated recurrent unit has helped solve a lot of the problems we faced in deep learning, especially by mitigating the vanishing gradient problem. This tutorial should have given you a good idea of how to use GRUs in your next deep learning project.
Just like gated recurrent units, there are a lot of advanced concepts you need to master in deep learning. I would suggest checking out certified courses from reputed platforms. One such platform that comes to mind is upGrad. Its courses are offered in collaboration with some of the best universities around the world and are curated by leading professors in the field.
ResNet, or Residual Network, is a deep neural network architecture specifically developed to solve the problem of vanishing gradients in very deep networks. This problem occurs when training networks with numerous layers because gradients can become exceedingly tiny during backpropagation, making it difficult for the model to learn efficiently.
ResNet architectures can contain anywhere from ten to hundreds of layers. The original ResNet, introduced by Kaiming He et al. in 2015, contains versions with 18, 34, 50, 101, and 152 layers. Later variants and adaptations may include more layers, such as ResNet-1001 or ResNet-200.
ResNet's effectiveness comes from its use of skip connections, which allow for the training of extremely deep networks. This approach helps to solve the vanishing gradient problem, making it easier to train deeper models.
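To illustrate what a skip connection looks like in code, here is a simplified residual block sketch in PyTorch. It captures the core idea of adding the input back to the block's output; it is not the exact layer configuration of any published ResNet variant.

import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # A simplified residual block: output = ReLU(F(x) + x).
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        identity = x                      # the skip connection keeps the original input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity              # gradients can flow directly through this addition
        return self.relu(out)

block = BasicResidualBlock(channels=16)
x = torch.randn(1, 16, 32, 32)            # an illustrative input tensor
print(block(x).shape)                     # torch.Size([1, 16, 32, 32])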
The fundamental difference between VGG and ResNet is their architecture. VGG uses a deep but uniform architecture built from small 3x3 filters, whereas ResNet uses skip connections to ease the training of extremely deep networks, allowing it to overcome the depth limitations faced by networks such as VGG.
ResNet's advantages include successful deep network training with skip connections, which leads to enhanced performance, better feature reuse, and scalability to hundreds of layers without losing performance.
ResNet is often considered better than VGG because it can train deeper networks more successfully, thanks to skip connections that mitigate the vanishing gradient problem. This leads to better performance and richer feature representations.
VGG's architecture is simpler and more uniform, which can make inference (generating predictions on fresh data) straightforward to implement, but it is not necessarily faster: VGG's large convolutional stacks and fully connected layers usually require more computation and memory than a comparable ResNet.
One is not better than the other in every situation. The task and dataset determine whether VGG or ResNet should be used: ResNet is usually favored for deeper networks and tasks that require extracting complex features, whereas VGG may be appropriate for simpler setups or smaller datasets.
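If you want a rough, hands-on comparison, torchvision ships reference implementations of both architectures, and you can simply count their parameters. This assumes a recent torchvision install; weights=None builds each model without downloading pretrained weights.

from torchvision import models

vgg = models.vgg16(weights=None)
resnet = models.resnet50(weights=None)

def count_params(model):
    return sum(p.numel() for p in model.parameters())

print("VGG-16 parameters:   ", count_params(vgg))     # roughly 138 million
print("ResNet-50 parameters:", count_params(resnet))  # roughly 25 million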