Wednesday, December 13, 2017

Neural Network Mathematics

A perceptron is a type of artificial neuron. An artificial neuron mimics a neuron in our brain, as explained in my previous post.

Now we need to understand the magic that lets a neural network predict an output from a given input. To understand it better, we should know the basics of the sigmoid function and the gradient descent algorithm.

In an earlier post I explained the high-level architecture for predicting handwritten digits. This time, I will focus more on the various processing steps that take us from the input to the predicted output.

Below is a pictorial view of the stages of a neural network; I will try to add the code in an upcoming post.

Stage 1: Define the inputs, weights, bias and label the outputs.

Stage 2: Sum the weighted inputs and add the bias.

Stage 3: Calculate the forward pass.

Stage 4: Calculate the backward pass and update the weights.

Stage 5: Repeat the process till we get the desired output.
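The five stages above can be sketched for a single artificial neuron in a few lines of Python. This is only a minimal illustration, not the code promised for the upcoming post: the inputs, initial weights, learning rate and target label are all made-up values.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Stage 1: define inputs, weights, bias and label the output (all made up)
x = np.array([0.5, 0.3])   # example inputs
w = np.array([0.1, 0.8])   # initial weights
b = 0.2                    # bias
target = 1.0               # labelled output
learning_rate = 0.5

for step in range(1000):
    # Stage 2: summation of the weighted inputs plus the bias
    z = np.dot(w, x) + b
    # Stage 3: forward pass through the sigmoid activation
    output = sigmoid(z)
    # Stage 4: backward pass - gradient of the squared error, then update
    error = output - target
    grad = error * output * (1 - output)   # chain rule through the sigmoid
    w -= learning_rate * grad * x
    b -= learning_rate * grad
    # Stage 5: repeat until the output is close enough to the target

print(output)  # approaches the target of 1.0
```

Each pass through the loop nudges the weights in the direction that reduces the error, which is exactly what gradient descent does.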


Monday, December 11, 2017

Rectified Linear Unit Activation Function In Deep Learning

The sigmoid activation function produces an S-shaped curve whose values always lie between 0 and 1. The sigmoid function maps large negative values toward 0 and large positive values toward 1.

But the sigmoid activation function has a major drawback: its gradient is nearly 0 wherever the output saturates near 0 or 1, which means that during backpropagation the weight values barely change and learning stalls. In other words, the sigmoid activation function suffers from the vanishing gradient problem at its extremes.

Below is the image of Sigmoid Curve:
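Since the curve image may not render here, a small numerical check (my own sketch, not part of the original post) shows the saturation that causes the vanishing gradient:

```python
import numpy as np

def sigmoid(z):
    # Standard sigmoid: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_gradient(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1 - s)

# Large negative inputs map close to 0, large positive inputs close to 1
print(sigmoid(-10))  # very close to 0
print(sigmoid(10))   # very close to 1

# At both extremes the gradient is almost 0, so weight updates stall
print(sigmoid_gradient(-10))
print(sigmoid_gradient(10))
```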

To get rid of the above issues, we can use the Rectified Linear Unit activation function, also known as ReLU. The ReLU function has a range between 0 and infinity. Hence the sigmoid activation function can be used to predict values between 0 and 1, whereas ReLU can be used to model any positive real number. The best part of ReLU is that for positive inputs the gradient stays constant at 1, so it does not vanish as the input grows.

ReLU can be defined with the simple mathematical notation below:
ReLU(x) = max(0, x)
The function says that if the input value is 0 or less than 0, ReLU returns 0; otherwise it returns the input value x.
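This definition is a one-liner in NumPy; the sample calls below are my own illustration:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): zero for non-positive inputs, identity otherwise
    return np.maximum(0, x)

print(relu(-3.0))  # 0.0
print(relu(0.0))   # 0.0
print(relu(2.5))   # 2.5
```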

Below is the image of the ReLU curve:


Saturday, December 9, 2017

What is Data Pre-Processing In Machine Learning?

Data pre-processing means applying some mathematical operation to the data without losing its content. For example, suppose we want to do dimensionality reduction so that we can visualize the data more efficiently in a 2D graph. That requires some kind of pre-processing of the data so that we can drop some of it without losing its actual meaning.

Let's take another example from the previous post to get a deeper understanding of data pre-processing. Below is a matrix, or dataset, with the price of a pizza in INR, which depends entirely on its size. In India the price is in INR, but for the United States it would be in dollars and for Dubai it would be in dirhams. The size in India is measured in inches, but in the United States it might be in centimeters. If we are developing some kind of relation between these variables, we can pre-process the data and fit everything between 0 and 1. By doing this, our dependency on INR, dollars, and inches goes away completely.
In this case, we can take the maximum and minimum value of every column and apply min-max scaling, new_value = (value - min) / (max - min), to the existing data. By doing this we get a new form of the data whose values lie between 0 and 1 without losing their actual importance.

The new pre-processed data will look like the table below:

The advantage of pre-processing the data is that everything now fits between 0 and 1, i.e. inside the unit square. Below is a comparison of the data before and after pre-processing: after pre-processing we have good visibility and everything fits nicely in the unit square.
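The column-wise scaling described above can be sketched as follows. The three size/price rows are placeholder values, not the exact table from the post:

```python
import numpy as np

# Placeholder dataset: pizza size (inches) and price (INR)
data = np.array([[4.0, 42.0],
                 [6.0, 62.0],
                 [9.0, 75.0]])

# Min-max scaling, applied per column: x' = (x - min) / (max - min)
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

print(scaled)  # every value now lies between 0 and 1; the units are gone
```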


Wednesday, December 6, 2017

Architecture Of Predicting Hand Written Digits

Logistic regression and the softmax activation function are the most important functions that help us predict handwritten digits from the MNIST dataset. I am using Keras to predict the digits, but for this post I have created a high-level diagram that helps everyone understand which function is required at which layer to predict the MNIST handwritten digits.


Monday, December 4, 2017

When To Use Softmax Activation Algorithm In Deep Learning

In my previous post, we discussed how to use the simple linear regression method when the target variable depends directly on the input variable. When we have to pick a single value out of two, which is normally called binary classification, we have to use the logistic regression method.

In today's post, I am discussing the softmax activation function, which is used when we have to predict one event out of n different events. The softmax function calculates the probability of each target class over all possible target classes for the given inputs.

Let's understand it with an example: we have a list of handwritten digits and we want to predict which of the 10 digits (0-9) has the highest probability of appearing. In this case we have 10 target classes, and out of those 10 we have to pick a single class, which is a single digit. In this case, we will use the softmax activation function.

I have used the below code to create a softmax graph for the numbers from 0 to 9 and found that higher numbers get higher probabilities. So we can use this activation function in deep learning, or in a neural network, when predicting one target class out of multiple target classes.

Use the below code to see the basic functionality of the softmax activation function:
import numpy as np
import matplotlib.pyplot as plt

def softmax(add_inputs):
  # Softmax equation: exponentiate each input, then normalise by the sum
  return np.exp(add_inputs) / np.sum(np.exp(add_inputs))

def line_graph(x, y, x_title, y_title):
  plt.plot(x, y)
  plt.xlabel(x_title)
  plt.ylabel(y_title)
  plt.show()

x = np.arange(0, 10)
y = softmax(x)
line_graph(x, y, "Inputs", "Softmax Probability")


Sunday, December 3, 2017

Predict Probability By Using Logistic Regression In Machine Learning

Logistic regression is used to predict an outcome variable that is categorical. A categorical variable is a variable that can take only specific, limited values, like gender (male or female), yes or no, etc.

We have an example of students who studied for a specific number of hours and, based on that, are marked as pass or fail.

Below is the dataset used for the example:
In the previous post, we saw how to use the linear regression method to solve a problem. Let's use the same linear regression method on the above dataset and plot it.

As per the graph, we can't see any clear relation between pass/fail and the number of hours studied. But let's try to plot it using our equation of a line, as used in the previous post.

As per the above output, linear regression predicts values that fall below 0 and rise above 1. But we need our answer to be either 0 or 1. The predictions given by the linear regression algorithm do not match what we are looking for. So we need a better regression line, one that gives us an output between 0 and 1: not less than 0 and not more than 1.

So logistic regression seems to be the right choice for this example. Most often we want to predict outcomes as yes or no, and in that case we can apply the logistic regression algorithm to get the desired outcome. Logistic regression outcomes always fall between 0 and 1, and it also expresses its predictions in terms of probability. The higher the probability, the more confident the prediction. This is achieved by using the logistic function.

The logistic function is given by

f(x) = L / (1 + e^(-k(x - x0)))

where L is the curve's maximum value, k is the steepness of the curve, and x0 is the x value of the sigmoid's midpoint.

The standard logistic function is called the sigmoid function. Let's substitute the values below into the logistic function and see what the result is:

k = 1, x0 = 0, L = 1

Substituting these values into the logistic function gives the function below, which is nothing but the sigmoid:

f(x) = 1 / (1 + e^(-x))

Let's draw the sigmoid curve and see what it looks like. The sigmoid is not only used to classify 0 or 1; it also tells us the probability of whether a certain event is going to occur.

Now let's solve the above example with logistic regression and see what the curve looks like.

Now I am trying to predict whether a student studying for 8.1 hours will pass or fail, and what the probabilities would be.

I get the answer that this student has a passing probability of 80% and a failing probability of 20%.
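A sketch of how such a prediction can be made with scikit-learn. The hours/pass dataset below is invented for illustration (the post's actual table is an image), so the exact probabilities will differ from the 80%/20% quoted above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset: hours studied vs outcome (0 = fail, 1 = pass)
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0],
                  [6.0], [7.0], [8.0], [9.0], [10.0]])
passed = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Fit the logistic regression model
clf = LogisticRegression()
clf.fit(hours, passed)

# Probabilities of [fail, pass] for a student who studied 8.1 hours
print(clf.predict_proba([[8.1]]))
print(clf.predict([[8.1]]))  # predicted class
```

With this toy data the model predicts "pass" for 8.1 hours, since that is well above the boundary between the failing and passing students.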


Friday, December 1, 2017

Predict Prices By Using Linear Regression Algorithm In Machine Learning

Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous variables where the output variable is directly proportional to the input. The variable we predict is called the criterion variable and lies along the Y axis; the variable we base our predictions on is known as the predictor and lies along the X axis. When there is a single predictor variable, the method is called simple linear regression; when we have more than one predictor, the model is called multiple linear regression.

There are lots of examples of statistical relations: height and weight, speed and petrol consumption, router bandwidth and router CPU load, pizza size and price, etc.

In the previous post, we used the KNN (k-nearest neighbors) algorithm to predict values. In this post, we will use the simple linear regression method to predict pizza prices.

Before moving to the machine learning code, we first need to understand what the equation of a line is. The equation of a line represents the relationship between the X and Y axes. The formula is y = mx + c.

c is the y intercept, the point where the line meets the Y axis.

The slope of the line, m, is the ratio (Y2 - Y1)/(X2 - X1) between two points on the line.
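As a quick worked example with two made-up points, we can compute m and c by hand before handing the job to scikit-learn:

```python
# Two made-up points (x1, y1) and (x2, y2) on a line
x1, y1 = 4.0, 42.0
x2, y2 = 9.0, 75.0

# Slope: m = (y2 - y1) / (x2 - x1)
m = (y2 - y1) / (x2 - x1)

# Y intercept: c = y - m*x, using either point
c = y1 - m * x1

print(round(m, 1))  # 6.6
print(round(c, 1))  # 15.6
```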

With the above equation, once we know the values of m and c, which remain constant, we can get the value of y for every value of x. Now let's take the example of pizza size and price. First we plot the variables from the existing dataset, and after that we will use simple linear regression to predict pizza prices from pizza size.

Below is the graph of pizza size vs pizza price. This example is a good fit for simple linear regression because as the size increases, the price also increases. So we can say there is a direct relationship between pizza size and price.

Now we have to find the values of m and c so that we can predict the price of a pizza of any size. To do this, we will use the scikit-learn library, import the linear regression model, and find the slope and y-intercept values.

Use the below Python code in a Jupyter Notebook to fit the model and predict the prices for pizza sizes from 100 to 110.

import matplotlib.pyplot as plt
from sklearn import linear_model

pizza_size = [[4.0],[4.5],[5.0],[5.6],[5.8],[6.0],[7.2],[7.4],[7.8],[9]]
pizza_price = [42,45,50,55,58,62,65,70,72,75]

print("Pizza Size and Pizza Price")
for row in zip(pizza_size, pizza_price):
    print(row)

#Instantiating the Linear Regression model and fitting it to the data
reg = linear_model.LinearRegression()
reg.fit(pizza_size, pizza_price)

#storing the slope
m = reg.coef_
#storing the y intercept
b = reg.intercept_

print("Slope Of the Line Is:", m)
print("Y intercept is:", b)

#Equation of a straight line is y = m*x + b
#Now we know m and b, so we can compute the points of the fitted line
predicted_values = [m[0] * x[0] + b for x in pizza_size]

#Plot the data points and the fitted straight line
plt.scatter(pizza_size, pizza_price)
plt.plot(pizza_size, predicted_values)
plt.show()

#Predict the pizza prices for sizes from 100 to 110
for i in range(100, 110):
    print("The price of pizza will be:", reg.predict([[i]]))
