# A Visual And Interactive Look at Basic Neural Network Math

In the previous post, we looked at the basic concepts of neural networks. Let's now use another example as an excuse to explore some of the basic mathematical ideas involved in prediction with neural networks.

This will be a neural network model building on what we discussed in the previous post, but will have a higher prediction accuracy because it utilizes hidden layers and activation functions.

The dataset we’ll use this time will be the Titanic passenger list from Kaggle. It lists the names and other information of the passengers and shows whether each passenger survived the sinking event or not.

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th… | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |

We won't bother with most of the columns for now. We'll just use the Sex and Age columns as our features, and survival as the label that we'll try to predict.

| Age | Sex | Survived? |
|---|---|---|
| 22 | 0 | 0 |
| 38 | 1 | 1 |
| 26 | 1 | 1 |
| … | … | … |

(891 rows total)

We'll attempt to build a network that predicts whether a passenger survived or not.

Neural networks need their inputs to be numeric, so we had to change the Sex column – male is now 0, female is 1. You'll notice the dataset already uses something similar for the Survived column – survived is 1, did not survive is 0. Calculating a prediction is done by plugging in a value for "age" and "sex"; the calculation then flows from left to right. Before we can use this network for prediction, however, we'll have to run a "training" process that will give us the values for the weights (w) and bias (b).
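As a minimal sketch of that encoding step in plain Python (the dictionary-based rows here are an illustration, not the actual Kaggle CSV handling):

```python
# Map the text categories to numbers so the network can use them.
# A minimal sketch using plain Python; a real dataset would more
# likely be handled with a library such as pandas.
sex_encoding = {"male": 0, "female": 1}

passengers = [
    {"Age": 22.0, "Sex": "male",   "Survived": 0},
    {"Age": 38.0, "Sex": "female", "Survived": 1},
]

encoded = [
    (p["Age"], sex_encoding[p["Sex"]], p["Survived"])
    for p in passengers
]
# encoded is now [(22.0, 0, 0), (38.0, 1, 1)]
```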
Note: we have slightly adjusted the way we represent the networks from the previous post. The bias node, specifically, is more commonly represented like this.

Let's recap the elements that make up this network and how they work. An input neuron is where we plug in an input value (e.g. the age of a person); it's where the calculation starts. The outgoing connection and the rest of the graph tell us what other calculations we need to do to calculate a prediction. If a connection has a weight, then the value is multiplied by that weight as it passes through it.

```python
connection_output = weight * connection_input
```

If a neuron has inputs, it sums their values and sends the sum along its outgoing connection(s).

```python
node_output = input_1 + input_2
```


### Sigmoid

To turn the network's calculation into a probability value between 0 and 1, we have to pass the value from the output layer through a "sigmoid" formula. Sigmoid squashes the output value of a neuron to between 0 and 1 according to a specific curve.

$f(x) = \frac{1}{1 + e^{-x}}$

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

output = sigmoid(value)
```
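A quick check of how sigmoid squashes values (repeating the definition so the snippet runs on its own):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and 0 maps exactly to the midpoint 0.5.
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
```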


### Sigmoid Visualization

To bring it all together, calculating a prediction with this shallow network looks like this:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def calculate_prediction(age, sex, weight_1, weight_2, bias):
    # Multiply the inputs by their weights, then sum the results
    layer_2_node = age * weight_1 + sex * weight_2 + 1 * bias

    prediction = sigmoid(layer_2_node)
    return prediction
```


```python
weight_1 = -0.016852  # Associated with "Age"
weight_2 =  0.704039  # Associated with "Sex" (where male is 0, female is 1)
bias     = -0.116309
```


Intuitively, the weights indicate how much their associated property contributes to the prediction – the odds of survival improve the younger a person is (since a larger age multiplied by the negative weight value gives a bigger negative number), and they improve further if the person is female.

### Prediction Calculation

The trained network now looks like this. It's often useful to apply certain math functions to the weighted outputs; these are called "activation functions" because historically they translated the output of the neuron into either 1 (on/active) or 0 (off).

```python
def activation_function(x):
    # Do something to the value
    ...

weighted_sum = weight * (input_1 + input_2)
output = activation_function(weighted_sum)
```


### ReLU

A leading choice of activation function is called ReLU. It returns 0 if its input is negative, and returns the number itself otherwise. Very simple!

$f(x) = \max(0, x)$

```python
# Naive scalar ReLU implementation. In the real world, most
# calculations are done on vectors.
def relu(x):
    if x < 0:
        return 0
    else:
        return x

output = relu(value)
```
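A small sketch of where ReLU sits in a network: a hidden-layer node applies it to its weighted sum before passing the result on. The weights below are made-up illustration values, not trained ones.

```python
def relu(x):
    return max(0, x)

# A hidden-layer node: weighted sum of inputs, then ReLU.
# The weights here are arbitrary values chosen for illustration.
def hidden_node(age, sex, w_age, w_sex, b):
    return relu(age * w_age + sex * w_sex + b)

# With a negative weighted sum, the node outputs 0 ...
low = hidden_node(22, 0, -0.05, 0.3, 0.1)   # 22*-0.05 + 0.1 = -1.0 -> 0
# ... with a positive one, it passes the sum through unchanged.
high = hidden_node(22, 0, 0.05, 0.3, 0.1)   # 22*0.05 + 0.1 = 1.2
```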


### ReLU Visualization

Interact a little with ReLU to see how it transforms various values.

## Closing

Written on March 23, 2018