PReLU
May 20, 2023
PReLU stands for Parametric Rectified Linear Unit. It is an activation function for artificial neural networks introduced by He et al. in 2015. PReLU extends the standard ReLU function with a learnable parameter, so the slope of the negative part of the function is learned from data rather than fixed. PReLU has been shown to improve the performance of deep neural networks on computer vision tasks such as image classification, face recognition, and object detection.
ReLU
To understand PReLU, we first need to understand ReLU. ReLU stands for Rectified Linear Unit, an activation function commonly used in artificial neural networks. The ReLU function is defined as follows:
f(x) = max(0, x)
The ReLU function sets all negative values to zero and leaves all positive values unchanged. The ReLU function is simple, computationally efficient, and has been shown to work well in practice.
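As a concrete illustration, here is a minimal sketch of the ReLU definition, assuming PyTorch (the library and the sample values are not part of the original text, just a convenient way to evaluate the function elementwise):

import torch

def relu(x):
    # f(x) = max(0, x): negative entries become zero, positive entries pass through
    return torch.clamp(x, min=0)

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # tensor([0., 0., 0., 1., 3.])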
The Problem with ReLU
While ReLU has many advantages, it also has a major drawback: the “dying ReLU” problem. The gradient of the ReLU function is zero for all negative inputs, so if a neuron’s pre-activation becomes consistently negative, its weights receive no updates during the backpropagation phase of training. In other words, the neuron is “dead” and no longer contributes to learning. The dying ReLU problem can significantly slow down training and even prevent the network from converging.
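To make the problem concrete, the following sketch (assuming PyTorch; the weight and input values are made up for illustration) shows a single weight whose pre-activation is negative: the ReLU output and its gradient are both zero, so the weight receives no update.

import torch

w = torch.tensor([-1.5], requires_grad=True)  # hypothetical weight
x = torch.tensor([2.0])                       # hypothetical input

out = torch.relu(w * x)  # pre-activation is -3.0, so the ReLU output is 0
out.backward()

print(out.item())     # 0.0
print(w.grad.item())  # 0.0 -> zero gradient, so this weight is never updated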
PReLU
PReLU is one solution to the dying ReLU problem. Instead of forcing the negative part of the function to zero, PReLU gives it a learnable slope, so the network can learn the optimal value during training. The PReLU function is defined as follows:
f(x) = max(0, x) + alpha * min(0, x)
where alpha is a learnable parameter that is shared across all neurons in the same channel. Because the negative part now has a nonzero slope, gradients keep flowing for negative inputs, which reduces the number of “dead” neurons in the network and improves its overall performance.
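Here is a minimal sketch of this definition, assuming PyTorch (the helper name prelu and the value alpha = 0.25 are just for illustration); PyTorch also ships a built-in nn.PReLU module with a learnable alpha:

import torch
import torch.nn as nn

def prelu(x, alpha):
    # f(x) = max(0, x) + alpha * min(0, x)
    return torch.clamp(x, min=0) + alpha * torch.clamp(x, max=0)

x = torch.tensor([-2.0, -0.5, 0.0, 1.0, 3.0])
print(prelu(x, alpha=0.25))  # tensor([-0.5000, -0.1250, 0.0000, 1.0000, 3.0000])

# Equivalent built-in layer: one learnable alpha shared across channels
layer = nn.PReLU(num_parameters=1, init=0.25)
print(layer(x))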
Training PReLU
During the training process, the value of alpha is updated using backpropagation, just like the weights of the neurons. The update rule is similar to the one used for the weights:
alpha = alpha - learning_rate * delta_alpha
where delta_alpha is the gradient of the loss function with respect to alpha. The value of alpha is usually initialized to a small positive value; He et al. use 0.25, which is also the default in common framework implementations.
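The sketch below (assuming PyTorch; the layer sizes, learning rate, and random data are arbitrary) shows that the PReLU slopes receive gradients and are updated by the optimizer exactly like ordinary weights; in nn.PReLU the learnable alphas are stored in the layer’s weight attribute:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny model with one learnable alpha per hidden unit (num_parameters=8).
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.PReLU(num_parameters=8, init=0.25),
    nn.Linear(8, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)  # random inputs, for illustration only
y = torch.randn(16, 1)  # random targets, for illustration only

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

prelu_layer = model[1]
print(prelu_layer.weight.grad)  # one gradient per alpha
optimizer.step()                # alpha <- alpha - learning_rate * delta_alpha
print(prelu_layer.weight)       # the alphas have moved away from their initial 0.25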
Advantages of PReLU
PReLU has several advantages over ReLU and other activation functions:
- Reduced number of “dead” neurons: PReLU reduces the number of “dead” neurons in the network, which improves the learning speed and overall performance of the network.
- Improved generalization: PReLU improves the generalization performance of the network by introducing more non-linearity and enabling the network to learn more complex functions.
- Negligible overfitting risk: PReLU introduces only one extra parameter per channel, so the additional flexibility comes with very little extra risk of overfitting.
Applications of PReLU
PReLU has been applied to various computer vision tasks such as image classification, face recognition, and object detection. For example, He et al. used PReLU in place of ReLU in their deep convolutional networks (sometimes called PReLU-nets), which were among the first to surpass human-level performance on ImageNet classification. PReLU has also been used in speech recognition, natural language processing, and other domains.