# Activation Function

May 20, 2023

In the field of machine learning and artificial intelligence, an activation function is a mathematical function used to introduce non-linearity into artificial neural networks. It is applied to a neuron's output to determine whether, and how strongly, the neuron activates for a given input.

## Background

Activation functions were first introduced in the context of artificial neural networks in 1943 by Warren McCulloch and Walter Pitts, whose model neurons used a threshold (step) activation. Since then, many activation functions have been proposed and used in machine learning and artificial intelligence.

The activation function is an essential component of an artificial neural network. It determines the output of a neuron based on the inputs received from the previous layer. The output of the neuron is then passed onto the next layer of the network.

## Types of Activation Functions

There are several types of activation functions, including the following:

### Step Function

The step function is one of the oldest and simplest activation functions used in artificial neural networks. It is a binary function that returns 1 if the input exceeds a threshold and 0 otherwise; the threshold is conventionally taken to be zero:

```
f(x) = { 1 if x > 0
       { 0 otherwise
```

The step function's derivative is zero everywhere it is defined, which makes it unsuitable for gradient-based training methods such as backpropagation.
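The step function can be sketched in a few lines; the `threshold` parameter here generalizes the common choice of zero used in the formula above.

```python
def step(x: float, threshold: float = 0.0) -> int:
    """Binary step activation: 1 if the input exceeds the threshold, else 0."""
    return 1 if x > threshold else 0

print(step(2.5))   # input above threshold: neuron fires
print(step(-1.0))  # input below threshold: neuron stays off
```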

### Linear Function

The linear function is another simple activation function that is used in some neural networks. It returns the input value without modifying it.

```
f(x) = x
```

The linear function is differentiable, but it does not introduce non-linearity: a network built entirely from linear activations collapses into a single linear transformation, no matter how many layers it has, which makes it far less expressive than networks using non-linear activations.
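A short sketch of why linear activations add no depth: composing two linear "layers" is itself linear, so the stack is equivalent to a single layer. The scalar weights here are illustrative, not drawn from any real model.

```python
def linear_layer(w: float, x: float) -> float:
    """A toy one-weight layer with a linear (identity) activation."""
    return w * x

x = 3.0
stacked = linear_layer(0.5, linear_layer(2.0, x))  # two layers: w2 * (w1 * x)
merged = linear_layer(0.5 * 2.0, x)                # one layer: (w2 * w1) * x
print(stacked == merged)  # True: the stack collapses to a single linear map
```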

### Sigmoid Function

The sigmoid function is a popular activation function used in many neural networks. It maps any real input to a value between 0 and 1 and is differentiable everywhere, which makes it suitable for use with backpropagation.

```
            1
f(x) = ------------
        1 + e^(-x)
```

The sigmoid function introduces non-linearity into the network, which makes it effective for certain types of problems. However, the sigmoid function can suffer from the problem of vanishing gradients, which occurs when the derivative of the function approaches zero as the input becomes very large or very small.
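The vanishing-gradient behavior is easy to see numerically. This sketch uses the identity f'(x) = f(x) * (1 - f(x)) for the sigmoid's derivative:

```python
import math

def sigmoid(x: float) -> float:
    """Map any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: f'(x) = f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at 0.25 (at x = 0) and shrinks toward zero
# as |x| grows, which is the vanishing-gradient regime.
print(sigmoid_grad(0.0))
print(sigmoid_grad(10.0))
```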

### Rectified Linear Unit Function

The rectified linear unit (ReLU) function is another popular activation function that is used in many neural networks. It returns the input value if it is greater than zero, and returns zero otherwise.

```
f(x) = max(0, x)
```

The ReLU function is simple and computationally efficient, which makes it well suited to large-scale neural networks. It is not differentiable at x = 0, but in practice implementations simply use a subgradient (0 or 1) at that point. A more significant drawback is the "dying ReLU" problem: a neuron whose pre-activation is always negative outputs zero for every input, receives no gradient, and stops learning.
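ReLU and its conventional subgradient can be sketched as follows; the choice of 0 at x = 0 is the usual convention, not a mathematical necessity.

```python
def relu(x: float) -> float:
    """Pass positive inputs through unchanged; clamp the rest to zero."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Conventional subgradient: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0

print(relu(3.0), relu(-2.0))  # 3.0 0.0
```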

### Softmax Function

The softmax function is an activation function that is used in the output layer of a neural network for classification problems. It maps the output of the previous layer to a probability distribution over the classes.

```
            e^(x_i)
f(x_i) = --------------
         sum_j e^(x_j)
```

The softmax function ensures that the output of the network is a valid probability distribution, which makes it suitable for use in classification problems.
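A minimal softmax sketch in plain Python. Subtracting the maximum score before exponentiating is a standard numerical-stability trick; it avoids overflow and leaves the result unchanged.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # shift for numerical stability; result is unaffected
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # largest score receives the largest probability
print(sum(probs))  # sums to 1 (up to floating-point rounding)
```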