## Wednesday, December 27, 2017

### Understanding Back propagation In Neural Networks

Let me start with a question: can a newborn baby think and recognize things on day one? The answer is no. The baby goes through a continuous training process that teaches him or her who the mother, father, brothers and sisters are. Once this training is complete, the connections between the neurons become strong enough that he or she easily recognizes family members.

But what happens when someone shows the baby a face that resembles a known one, such as the mother’s sister, who is not the mother but looks like her? The baby compares the new image with the stored images of the mother and figures out that this is not the mother, even though she looks very similar. This process of rethinking and correcting a wrong guess is what we call back propagation.

The Neural Network Mathematics post explained how neural networks can be trained using simple algorithms. Back propagation is one good way to tell the connections that the current weight and bias values are not good enough and need to change to get better results.

Let’s imagine a three-layer neural network, as shown in the image below, with “w” as weights and “b” as biases. These start as random numbers; we can also draw them from a Gaussian distribution.

To train a network we need to define an error (or loss) function between its actual output and the desired output “d” that the network is supposed to return. Here we define the cost function as the mean squared error. There are other ways to calculate the error, but the basic principle remains the same.

The objective is to minimize the loss, which in turn makes the network more accurate. Once we know the loss, we calculate the gradient of the error with respect to the network’s modifiable weights. In short, back propagation adjusts the weights and biases of the existing network so that its output moves closer to the desired output.
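As a rough sketch of the idea (not the exact network from the image), the snippet below trains a tiny network with one hidden layer using mean squared error and plain gradient descent. The layer sizes, learning rate, OR-gate training data and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(x, d, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a 2-layer network with back propagation; returns the loss per epoch."""
    rng = np.random.default_rng(seed)
    # Weights start as Gaussian random numbers, biases as zeros (an assumption).
    w1 = rng.normal(size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros(1)
    n = x.shape[0]
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        losses.append(np.mean((y - d) ** 2))  # mean squared error
        # Backward pass: chain rule from the loss back to each weight
        delta_out = 2.0 * (y - d) / n * y * (1.0 - y)
        delta_hidden = (delta_out @ w2.T) * h * (1.0 - h)
        # Gradient descent update of weights and biases
        w2 -= lr * (h.T @ delta_out)
        b2 -= lr * delta_out.sum(axis=0)
        w1 -= lr * (x.T @ delta_hidden)
        b1 -= lr * delta_hidden.sum(axis=0)
    return losses

# Hypothetical training data: the logical OR of two inputs
x = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
d = np.array([[0.0], [1.0], [1.0], [1.0]])
losses = train(x, d)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The loss printed at the end should be smaller than the one at the start, which is the whole point of adjusting the weights against the gradient.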

## Sunday, December 24, 2017

### Brief About Block Chain - Technology Used By Crypto Currencies

Hyperledger is the umbrella open source project that The Linux Foundation has created and hosted since 2015. It aims at advancing and promoting cross-industry blockchain technologies to ensure accountability, transparency, and trust among business partners. As a result, Hyperledger makes business networks and transactions more efficient.

These benefits are valued by leaders across many industries, including technology, finance, healthcare, supply chain, and automotive, among several others. Hyperledger offers different blockchain platforms like Iroha, Sawtooth and Fabric.

Looking back to the last half century of computer technologies and architectures, one may observe a trend of fluctuation between the centralization and subsequent decentralization of computing power, storage, infrastructure, protocols, and code.

Mainframe computers are largely centralized. They typically house all computing power, memory, data storage, and code. Access to mainframes is mainly by 'dumb terminals', which only take inputs and outputs, and do not store or process data.

With the advent of personal computers and private networks, similar computational capabilities were now housed on both the clients and the servers. This, in part, gave rise to the 'client-server' architecture, which supported the development of relational database systems. Massive data sets, which had been housed on mainframes, could move onto a distributed architecture. This data could replicate from server to server, and subsets of the data could be accessed and processed on clients, and then synced back to the server.

Over time, Internet and cloud computing architectures enabled global access from a variety of computing devices; whereas mainframes were largely designed to address the needs of large corporations and governments. Even though this 'cloud architecture' is decentralized in terms of hardware, it has given rise to application-level centralization (e.g. Facebook, Twitter, Google, etc). Currently, we are witnessing the transition from centralized computing, storage, and processing to decentralized architectures and systems.

A distributed ledger is a type of data structure which resides across multiple computer devices, generally spread across locations or regions. Distributed Ledger Technology includes blockchain technologies and smart contracts. While distributed ledgers existed prior to Bitcoin, the Bitcoin blockchain marks the convergence of a host of technologies, including timestamping of transactions, Peer-to-Peer (P2P) networks, cryptography, and shared computational power, along with a new consensus algorithm.

In summary, distributed ledger technology generally consists of three basic components:
• A data model that captures the current state of the ledger
• A language of transactions that changes the ledger state
• A protocol used to build consensus among participants around which transactions will be accepted

According to hyperledger.com, "A blockchain is a peer-to-peer distributed ledger forged by consensus, combined with a system for 'smart contracts' and other assistive technologies."

Smart contracts are simply computer programs that execute predefined actions when certain conditions within the system are met. Consensus refers to a system of ensuring that parties agree to a certain state of the system as the true state.

Blockchain is a specific form or subset of distributed ledger technologies, which constructs a chronological chain of blocks, hence the name 'block-chain'. A block refers to a set of transactions that are bundled together and added to the chain at the same time. In the Bitcoin blockchain, the miner nodes bundle unconfirmed and valid transactions into a block. Each block contains a given number of transactions. In the Bitcoin network, miners must solve a cryptographic challenge to propose the next block. This process is known as 'proof of work', and requires significant computing power. We shall discuss proof of work in more detail in the Consensus Algorithms section.

Timestamping is another key feature of blockchain technology. Each block is timestamped, with each new block referring to the previous block. Combined with cryptographic hashes, this timestamped chain of blocks provides an immutable record of all transactions in the network, from the very first (or genesis) block. A block commonly consists of four pieces of metadata:
• The reference to the previous block
• The proof of work, also known as a nonce
• The timestamp
• The Merkle tree root for the transactions included in this block.
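To make the four pieces of metadata concrete, here is a minimal sketch of a block structure in Python. It is a toy model and not how Bitcoin actually serializes or hashes blocks: the JSON header, the single SHA-256 pass, the "leading zeros" difficulty rule and all function names are simplifying assumptions.

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(transactions):
    """Hash transactions pairwise, level by level, until one root hash remains."""
    if not transactions:
        return sha256(b"")
    level = [sha256(tx.encode()) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def block_hash(block):
    """Hash only the four header fields listed above."""
    header = {k: block[k] for k in ("previous_hash", "nonce", "timestamp", "merkle_root")}
    return sha256(json.dumps(header, sort_keys=True).encode())

def mine_block(previous_hash, transactions, difficulty=3):
    """Toy proof of work: increase the nonce until the hash starts with zeros."""
    block = {
        "previous_hash": previous_hash,
        "nonce": 0,
        "timestamp": time.time(),
        "merkle_root": merkle_root(transactions),
        "transactions": transactions,
    }
    while not block_hash(block).startswith("0" * difficulty):
        block["nonce"] += 1
    return block

def chain_is_valid(chain, difficulty=3):
    for i, block in enumerate(chain):
        if block["merkle_root"] != merkle_root(block["transactions"]):
            return False  # transactions were altered after mining
        if not block_hash(block).startswith("0" * difficulty):
            return False  # proof of work does not hold
        if i > 0 and block["previous_hash"] != block_hash(chain[i - 1]):
            return False  # link to the previous block is broken
    return True

# Hypothetical transactions, for illustration only
genesis = mine_block("0" * 64, ["genesis"])
second = mine_block(block_hash(genesis), ["alice pays bob 5", "bob pays carol 2"])
print(chain_is_valid([genesis, second]))
```

Changing any transaction in a mined block changes its Merkle root, so the validity check fails unless the block (and every block after it) is mined again.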

"Merkle trees are used to summarize all the transactions in a block, producing an overall digital fingerprint of the entire set of transactions, providing a very efficient process to verify whether a transaction is included in a block."

#### Transactions

Records of events, cryptographically secured with digital signatures, that are verified, ordered, and bundled together into blocks form the transactions in the blockchain. In the Bitcoin blockchain, transactions involve the transfer of bitcoins, while in other blockchains, transactions may involve the transfer of any asset or a record of some service being rendered. Furthermore, a smart contract within the blockchain may allow automatic execution of transactions upon meeting predefined criteria.

#### Cryptography

Cryptography has a key role to play in both the security and the immutability of the transactions recorded on blockchains. Cryptography is the study of the techniques used to allow secure communication between different parties and to ensure the authenticity and immutability of the data being communicated. For blockchain technologies, cryptography is used to prove that a transaction was created by the right person. It is also used to link transactions into a block in a tamper-proof way, as well as to create the links between blocks that form a blockchain.

## Wednesday, December 13, 2017

### Neural Network Mathematics

A perceptron is a type of artificial neuron. An artificial neuron mimics the neurons in our brain, as explained in my previous post.

Now we need to understand the magic that lets a neural network predict an output based on its input. To understand it better, we should know the basics of the sigmoid function and the gradient descent algorithm.

I have already explained the high-level architecture for predicting handwritten numbers. This time, I will focus on the individual steps that take us from the input to the predicted output.

Below is a pictorial view of the stages of the neural network. I will try to add the code in an upcoming post.

Stage 1: Define the inputs, weights, bias and label the outputs.

Stage 2: Sum all the weighted inputs and add the bias

Stage 3: Calculate the forward pass

Stage 4: Calculate the backward pass and update the weights

Stage 5: Repeat the process till we get the desired output.
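The five stages above can be sketched for a single sigmoid neuron as follows. This is only an illustration under assumptions: the AND-gate data, the squared-error loss, the learning rate and the function names are not from the original diagram.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, bias):
    # Stage 2: weighted sum of the inputs plus the bias
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    # Stage 3: forward pass through the activation function
    return sigmoid(z)

def train_neuron(inputs, labels, learning_rate=0.5, epochs=5000):
    # Stage 1: define the weights and bias (zeros here, for reproducibility)
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    # Stage 5: repeat until the output is good enough (fixed epoch count here)
    for _ in range(epochs):
        for x, label in zip(inputs, labels):
            out = forward(x, weights, bias)
            # Stage 4: backward pass, gradient of 0.5 * (out - label)^2
            delta = (out - label) * out * (1.0 - out)
            weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * delta
    return weights, bias

# Stage 1 (continued): hypothetical inputs and labelled outputs (logical AND)
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_neuron(inputs, labels)
predictions = [1 if forward(x, weights, bias) >= 0.5 else 0 for x in inputs]
print(predictions)
```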

## Monday, December 11, 2017

### Rectified Linear Unit Activation Function In Deep Learning

The sigmoid activation function produces an S-shaped curve whose values always lie between 0 and 1. It maps large negative values close to 0 and large positive values close to 1.

The major drawback of the sigmoid activation function is that its gradient is close to 0 whenever its output is near 0 or 1. During back propagation this means the weights barely change and the network stops learning. This is known as the vanishing gradient problem.

Below is the image of Sigmoid Curve:

To get rid of the above issue, we can use the Rectified Linear Unit activation function, also known as RELU. The RELU function has a range between 0 and infinity. Hence the sigmoid activation function can be used to predict values between 0 and 1, whereas RELU can be used to model positive real numbers. The best part of RELU is that for positive inputs its gradient stays at 1 no matter how large x becomes, so it does not vanish.

RELU can be defined with the simple mathematical notation below:

RELU(x) = max(0, x)

The function says that if the input value is 0 or less, RELU returns 0; otherwise it returns the input value x.
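The difference between the two activation functions is easiest to see by computing their gradients side by side. The snippet below is a small sketch using NumPy; the function names and sample inputs are chosen for illustration, and the RELU gradient at exactly 0 is set to 0 by convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # shrinks towards 0 for large |x|

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (value at x == 0 chosen by convention)
    return (x > 0).astype(float)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print("sigmoid gradient:", sigmoid_grad(x))
print("relu output:     ", relu(x))
print("relu gradient:   ", relu_grad(x))
```

For the inputs -10 and 10 the sigmoid gradient is nearly zero, while the RELU gradient is still 1 at 10.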

Below is the image of the RELU curve:

## Saturday, December 9, 2017

### What is Data Pre-Processing In Machine Learning?

Data pre-processing means applying some mathematical operation to the data without losing its content. For example, suppose we want to do dimensionality reduction so that we can visualize the data more easily in a 2D graph. We then need some kind of pre-processing on the data so that we can drop part of it without losing its actual meaning.

Let’s take another example from the previous post to get a deeper understanding of data pre-processing. Below is the matrix, or dataset, which has the price of a pizza in INR, and the price depends fully on the pizza’s size. In India the price is in INR, but for the United States it will be in Dollars and for Dubai it will be in Dirham. The size in India is in inches, but elsewhere it might be in centimeters.

If we are modelling the relation between size and price, we can pre-process the data and fit everything between 0 and 1. By doing this, our dependency on units such as INR, Dollars and inches goes away. To do so, we take the maximum and minimum value of every column and apply the formula below to the existing data:

$$x_{new} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

This gives us a new form of the data whose values lie between 0 and 1 without losing their relative importance.

The new pre-processed data looks like this:

The advantage of pre-processing the data is that everything now fits between 0 and 1, that is, inside the unit square. Below is a comparison of the data before and after pre-processing; after pre-processing we have good visibility and everything fits well inside the unit square.
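A minimal sketch of this scaling with NumPy is shown below. The pizza sizes and prices are made-up values for illustration (they are not the dataset from the image), and `min_max_scale` is a hypothetical helper name.

```python
import numpy as np

def min_max_scale(data):
    """Scale every column to the range [0, 1] using its own minimum and maximum."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # Guard against division by zero when a column is constant
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (data - col_min) / span

# Hypothetical data: column 0 is size in inches, column 1 is price in INR
pizzas = np.array([
    [6.0, 350.0],
    [8.0, 500.0],
    [10.0, 650.0],
    [12.0, 800.0],
])

scaled = min_max_scale(pizzas)
print(scaled)
```

Converting the price column to another currency, or the size column to centimeters, and scaling again gives the same scaled values, which is why the units no longer matter.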

## Wednesday, December 6, 2017

### Architecture Of Predicting Hand Written Digits

Logistic regression and the softmax activation function are the most important building blocks for predicting handwritten digits from the MNIST dataset. I am using Keras to predict the digits, but for this post I have created a high-level diagram that shows which function is required at which layer to predict the MNIST handwritten digits.

## Monday, December 4, 2017

### When To Use Softmax Activation Algorithm In Deep Learning

In my previous post, we discussed how to use the simple linear regression method when the target variable depends directly on the input variable. When we have to pick a single value out of two, which is normally called binary classification, we use the logistic regression method.

In today’s post, I discuss the softmax activation function, which is used when we have to predict one event out of n different events. The softmax function calculates the probability of each target class over all possible target classes for the given inputs.

Let’s understand it with an example: we have a list of handwritten digits and we want to predict which of the 10 digits (0 - 9) has the highest probability. In this case we have 10 target classes, and out of them we have to pick a single class, which is a single digit. For this we use the softmax activation function.

I have used the code below to plot the softmax function for the numbers 0 to 9 and found that higher input values receive higher probabilities. This means we can use this activation function in deep learning or in a neural network to predict one target class out of multiple target classes.

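Use the code below to see the basic behaviour of the softmax activation function:

    import numpy as np
    import matplotlib.pyplot as plt

    # Softmax equation: exp(x_i) / sum_j exp(x_j)
    def softmax(x):
        # Subtract the maximum before exponentiating for numerical stability
        exps = np.exp(x - np.max(x))
        return exps / np.sum(exps)

    def line_graph(x, y, x_title, y_title):
        plt.plot(x, y)
        plt.xlabel(x_title)
        plt.ylabel(y_title)
        plt.show()

    x = np.arange(0, 10)
    y = softmax(x)
    line_graph(x, y, "Inputs", "Softmax Probability")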

## Sunday, December 3, 2017

### Predict Probability By Using Logistic Regression In Machine Learning

Logistic regression is used to predict an outcome variable that is categorical. A categorical variable can take only a limited set of specific values, such as male or female, or yes or no.

Our example uses students who studied for a certain number of hours and, based on that, are marked as pass or fail.

Below is the dataset used for the example:

In the previous post, we saw how to use the linear regression method to solve a problem. Let’s use the same linear regression method on the above dataset and plot it.

As per the graph, we can’t see a linear relation between pass or fail and the number of hours studied. But let’s try to plot it using the equation of a line from the previous post.

As per the above output, linear regression predicts values that range from 0 to more than 1, but we need an answer that is either 0 or 1. The predictions given by the linear regression algorithm do not match what we are looking for. We need a better curve, one whose output stays between 0 and 1: never less than 0 and never more than 1.

Logistic regression is the right choice for this example. Most often we want to predict an outcome as yes or no. In that case we can apply the logistic regression algorithm to get the desired outcome. Its output always falls between 0 and 1, so it can be read as a probability. The higher the probability, the more confident the prediction. This is achieved by using the logistic function.

The logistic function is given by

$$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$$

where L is the curve’s maximum value, k is the steepness of the curve and x0 is the x value of the sigmoid’s midpoint.

The standard logistic function is called the sigmoid function. Let’s substitute the values below into the logistic function and see what the result is.

k = 1, x0 = 0, L = 1

If we substitute these values into the logistic function, we get the function below, which is the sigmoid:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
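The substitution can be checked with a few lines of Python. The function names below are chosen for illustration; the parameters L, k and x0 follow the definitions in the text.

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic function with maximum L, steepness k and midpoint x0."""
    return L / (1.0 + math.exp(-k * (x - x0)))

def sigmoid(x):
    """Standard logistic function: L = 1, k = 1, x0 = 0."""
    return logistic(x, L=1.0, k=1.0, x0=0.0)

# At the midpoint the curve is always at half of its maximum value
print(sigmoid(0.0))
print(logistic(5.0, L=10.0, k=2.0, x0=5.0))
```

Whatever the steepness, the value at the midpoint x0 is L / 2, which is 0.5 for the sigmoid.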

Let’s draw the sigmoid curve and see what it looks like. The sigmoid is not only used to classify an input as 0 or 1; it also tells us the probability that a certain event will occur.

Now let’s solve the above example with logistic regression and see what the curve looks like.

Next, I try to predict whether a student who studies for 8.1 hours will pass or fail, and with what probability.

The answer is that this student has a pass probability of 80% and a fail probability of 20%.
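For readers who want to try this themselves, here is a minimal sketch that fits a logistic regression with plain gradient descent in NumPy. The hours and pass/fail labels are made up for illustration (the original dataset is only available as an image), so the probabilities it prints will not match the 80% and 20% above. The function names, learning rate and epoch count are assumptions as well.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, lr=0.05, epochs=20000):
    """Fit p(pass) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        p = sigmoid(w * x + b)
        w -= lr * np.mean((p - y) * x)  # gradient with respect to w
        b -= lr * np.mean(p - y)        # gradient with respect to b
    return w, b

def predict_proba(hours_studied, w, b):
    """Return (fail probability, pass probability) for a number of study hours."""
    p_pass = sigmoid(w * hours_studied + b)
    return 1.0 - p_pass, p_pass

# Hypothetical dataset: hours studied and whether the student passed (1) or failed (0)
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0])
passed = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

w, b = fit_logistic(hours, passed)
p_fail, p_pass = predict_proba(8.1, w, b)
print("pass probability:", p_pass, "fail probability:", p_fail)
```

The two probabilities always add up to 1, and the pass probability grows as the number of hours increases.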

## Friday, December 1, 2017

### Predict Prices By Using Linear Regression Algorithm In Machine Learning

Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous variables, where the output variable changes linearly with the input. The variable we are predicting is called the criterion variable and is plotted along the Y axis; the variable we base our predictions on is called the predictor and is plotted along the X axis. When there is a single predictor variable, the method is called simple linear regression; when there is more than one predictor, it is called multiple linear regression.

There are many examples of statistical relations, such as height and weight, speed and petrol consumption, router bandwidth and router CPU usage, pizza size and price, etc.

In the previous post, we used the k-nearest neighbors (KNN) algorithm to predict values. In this post, we will use the simple linear regression method to predict pizza prices.

Before moving to the machine learning code, we first need to understand the equation of a line. The equation of a line represents the relationship between the X and Y axes. The formula is y = mx + c

c is the Y intercept, where the line meets the Y axis.

m is the slope of the line, which for two points on the line is (Y2 – Y1)/(X2 – X1)

With the above equation, once we know the values of m and c, which remain constant, we can get the value of y for every value of x. Now let’s take the example of pizza size and price. First we plot the variables from the existing dataset, and after that we use simple linear regression to predict the pizza price for a given pizza size.
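Before handing the work to a library, it may help to see how m and c can be computed by hand. The sketch below uses the two-point slope formula and the standard least-squares formulas for many points; the sample points are hypothetical and were chosen to lie exactly on y = 2x + 1, and the function names are only illustrative.

```python
def slope_from_two_points(x1, y1, x2, y2):
    """Slope of the line through two points: (Y2 - Y1) / (X2 - X1)."""
    return (y2 - y1) / (x2 - x1)

def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    m = covariance / variance
    c = mean_y - m * mean_x
    return m, c

# Hypothetical points on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = fit_line(xs, ys)
print("slope:", m, "intercept:", c)
print("prediction for x = 10:", m * 10 + c)
```

When the points do not lie exactly on one line, as with real prices, the same formulas return the line that minimizes the squared vertical distances to the points.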

Below is the graph of pizza size vs pizza price. This example is a good fit for the simple linear regression method because the price increases as the size increases. So we can say that there is a direct relationship between pizza size and price.

Now we have to find the values of m and c so that we can predict the price of a pizza of any size. To get this done, we will use the scikit-learn library: we import its linear regression model and read off the slope and the y intercept.

Use the Python code below in a Jupyter Notebook to fit the model and predict the price for pizza sizes from 100 to 109.

    import matplotlib.pyplot as plt
    from sklearn import linear_model

    # Each size is wrapped in its own list because scikit-learn expects a 2D feature array
    pizza_size = [[4.0], [4.5], [5.0], [5.6], [5.8], [6.0], [7.2], [7.4], [7.8]]
    # Assumption: the original listing had ten prices for nine sizes, so the
    # trailing price (75) is left out here to keep both lists the same length
    pizza_price = [42, 45, 50, 55, 58, 62, 65, 70, 72]
    print("Pizza Size and Pizza Price")
    for size, price in zip(pizza_size, pizza_price):
        print(size, '->', price)

    # Instantiate and fit the linear regression model
    reg = linear_model.LinearRegression()
    reg.fit(pizza_size, pizza_price)

    # Store the slope (coef_ holds one value per feature)
    m = reg.coef_[0]
    # Store the y intercept
    b = reg.intercept_

    print("Slope Of the Line Is:", m)
    print("Y intercept is:", b)

    # Plot the existing relationship on the graph
    sizes = [row[0] for row in pizza_size]
    plt.scatter(sizes, pizza_price, color='red')
    # Equation of a straight line is y = m*x + b
    # Now that we know m and b, we can compute the points on the fitted line
    predicted_values = [m * x + b for x in sizes]

    # Plot the straight line
    plt.plot(sizes, predicted_values, 'r--')
    plt.xlabel("pizza_size")
    plt.ylabel("pizza_price")
    plt.show()

    # Predict the pizza prices for sizes 100 to 109
    for i in range(100, 110):
        # predict() expects a 2D array: one row per sample, one column per feature
        print("The price of pizza will be:", reg.predict([[i]])[0])