
A Neural Network from Scratch in NumPy, and in Keras

A simple neural network in Keras

First, we build a simple neural network with Keras.

# Package imports
# Matplotlib is a matlab like plotting library
import matplotlib
import matplotlib.pyplot as plt
# Numpy handles matrix operations
import numpy as np
# SciKitLearn is a useful machine learning utilities library
import sklearn
# The sklearn dataset module helps generate datasets
import sklearn.datasets
import sklearn.linear_model

# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (16.0, 9.0)
# Generate a dataset
np.random.seed(0)
# The two-moons dataset: two interleaving half-circles with some noise added; the task is to separate the two classes
X, y = sklearn.datasets.make_moons(200, noise=0.15)
y = y.reshape(200,1)
from keras.layers import Dense, Activation
from keras.models import Sequential
model = Sequential()
WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
model.add(Dense(3,input_dim=2))
WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
model.add(Activation('tanh'))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 3)                 9         
_________________________________________________________________
activation_1 (Activation)    (None, 3)                 0         
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 4         
_________________________________________________________________
activation_2 (Activation)    (None, 1)                 0         
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________
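The parameter counts in the summary follow directly from the layer shapes: the first Dense layer has 2 × 3 = 6 weights plus 3 biases (9 parameters), and the second has 3 × 1 = 3 weights plus 1 bias (4 parameters), 13 in total.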
model.compile(optimizer='sgd',loss='binary_crossentropy',metrics=['acc'])
WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3376: The name tf.log is deprecated. Please use tf.math.log instead.

WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/nn_impl.py:180: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
history = model.fit(X,y,epochs=50)
WARNING:tensorflow:From /home/didong/anaconda/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.

Epoch 1/50
200/200 [==============================] - 1s 3ms/step - loss: 0.7140 - acc: 0.3950
Epoch 2/50
200/200 [==============================] - 0s 44us/step - loss: 0.7082 - acc: 0.4100
Epoch 3/50
200/200 [==============================] - 0s 43us/step - loss: 0.7028 - acc: 0.4350
Epoch 4/50
200/200 [==============================] - 0s 43us/step - loss: 0.6974 - acc: 0.4700
Epoch 5/50
200/200 [==============================] - 0s 42us/step - loss: 0.6921 - acc: 0.4950
Epoch 6/50
200/200 [==============================] - 0s 44us/step - loss: 0.6871 - acc: 0.5300
Epoch 7/50
200/200 [==============================] - 0s 42us/step - loss: 0.6820 - acc: 0.5650
Epoch 8/50
200/200 [==============================] - 0s 43us/step - loss: 0.6777 - acc: 0.5900
Epoch 9/50
200/200 [==============================] - 0s 44us/step - loss: 0.6729 - acc: 0.6400
Epoch 10/50
200/200 [==============================] - 0s 42us/step - loss: 0.6685 - acc: 0.6650
Epoch 11/50
200/200 [==============================] - 0s 42us/step - loss: 0.6640 - acc: 0.7050
Epoch 12/50
200/200 [==============================] - 0s 42us/step - loss: 0.6596 - acc: 0.7350
Epoch 13/50
200/200 [==============================] - 0s 42us/step - loss: 0.6554 - acc: 0.7550
Epoch 14/50
200/200 [==============================] - 0s 43us/step - loss: 0.6514 - acc: 0.7750
Epoch 15/50
200/200 [==============================] - 0s 42us/step - loss: 0.6473 - acc: 0.8500
Epoch 16/50
200/200 [==============================] - 0s 42us/step - loss: 0.6434 - acc: 0.8600
Epoch 17/50
200/200 [==============================] - 0s 43us/step - loss: 0.6397 - acc: 0.8450
Epoch 18/50
200/200 [==============================] - 0s 42us/step - loss: 0.6362 - acc: 0.8400
Epoch 19/50
200/200 [==============================] - 0s 41us/step - loss: 0.6323 - acc: 0.8250
Epoch 20/50
200/200 [==============================] - 0s 48us/step - loss: 0.6290 - acc: 0.8150
Epoch 21/50
200/200 [==============================] - 0s 42us/step - loss: 0.6254 - acc: 0.8150
Epoch 22/50
200/200 [==============================] - 0s 42us/step - loss: 0.6222 - acc: 0.8100
Epoch 23/50
200/200 [==============================] - 0s 42us/step - loss: 0.6187 - acc: 0.8100
Epoch 24/50
200/200 [==============================] - 0s 44us/step - loss: 0.6158 - acc: 0.8100
Epoch 25/50
200/200 [==============================] - 0s 43us/step - loss: 0.6126 - acc: 0.8100
Epoch 26/50
200/200 [==============================] - 0s 43us/step - loss: 0.6094 - acc: 0.8050
Epoch 27/50
200/200 [==============================] - 0s 43us/step - loss: 0.6064 - acc: 0.8050
Epoch 28/50
200/200 [==============================] - 0s 43us/step - loss: 0.6031 - acc: 0.8050
Epoch 29/50
200/200 [==============================] - 0s 43us/step - loss: 0.5997 - acc: 0.8050
Epoch 30/50
200/200 [==============================] - 0s 42us/step - loss: 0.5968 - acc: 0.8050
Epoch 31/50
200/200 [==============================] - 0s 43us/step - loss: 0.5939 - acc: 0.8000
Epoch 32/50
200/200 [==============================] - 0s 43us/step - loss: 0.5906 - acc: 0.7900
Epoch 33/50
200/200 [==============================] - 0s 42us/step - loss: 0.5876 - acc: 0.7850
Epoch 34/50
200/200 [==============================] - 0s 43us/step - loss: 0.5846 - acc: 0.7850
Epoch 35/50
200/200 [==============================] - 0s 43us/step - loss: 0.5818 - acc: 0.7850
Epoch 36/50
200/200 [==============================] - 0s 43us/step - loss: 0.5789 - acc: 0.7850
Epoch 37/50
200/200 [==============================] - 0s 43us/step - loss: 0.5760 - acc: 0.7850
Epoch 38/50
200/200 [==============================] - 0s 42us/step - loss: 0.5731 - acc: 0.7800
Epoch 39/50
200/200 [==============================] - 0s 43us/step - loss: 0.5706 - acc: 0.7800
Epoch 40/50
200/200 [==============================] - 0s 42us/step - loss: 0.5680 - acc: 0.7850
Epoch 41/50
200/200 [==============================] - 0s 42us/step - loss: 0.5653 - acc: 0.7850
Epoch 42/50
200/200 [==============================] - 0s 43us/step - loss: 0.5625 - acc: 0.7850
Epoch 43/50
200/200 [==============================] - 0s 43us/step - loss: 0.5597 - acc: 0.7850
Epoch 44/50
200/200 [==============================] - 0s 42us/step - loss: 0.5573 - acc: 0.7850
Epoch 45/50
200/200 [==============================] - 0s 43us/step - loss: 0.5548 - acc: 0.7850
Epoch 46/50
200/200 [==============================] - 0s 43us/step - loss: 0.5523 - acc: 0.7800
Epoch 47/50
200/200 [==============================] - 0s 42us/step - loss: 0.5497 - acc: 0.7750
Epoch 48/50
200/200 [==============================] - 0s 42us/step - loss: 0.5475 - acc: 0.7850
Epoch 49/50
200/200 [==============================] - 0s 42us/step - loss: 0.5450 - acc: 0.7850
Epoch 50/50
200/200 [==============================] - 0s 42us/step - loss: 0.5424 - acc: 0.7900
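The History object returned by model.fit records the per-epoch metrics, so the training curves can be plotted directly. A minimal sketch, assuming the matplotlib import above and the 'loss'/'acc' key names visible in the logs:

# Plot the training curves recorded by model.fit
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['acc'], label='accuracy')
plt.xlabel('epoch')
plt.legend()
plt.show()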

A simple neural network from scratch in NumPy

Next, we reproduce the network above using nothing but NumPy.
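The model to reproduce is the same two-layer network: a hidden layer of 3 tanh units and a single sigmoid output, trained with binary cross-entropy. Writing $a_0 = X$, the forward pass and the loss are:

$$ z_1 = a_0 W_1 + b_1, \quad a_1 = \tanh(z_1), \quad z_2 = a_1 W_2 + b_2, \quad a_2 = \sigma(z_2) $$

$$ L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log a_{2}^{(i)} + (1 - y_i)\log\left(1 - a_{2}^{(i)}\right) \right] $$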

# Package imports
# Matplotlib is a matlab like plotting library
import matplotlib
import matplotlib.pyplot as plt

# Numpy handles matrix operations
import numpy as np

# SciKitLearn is a useful machine learning utilities library
import sklearn
# The sklearn dataset module helps generate datasets
import sklearn.datasets
import sklearn.linear_model

# Display plots inline and change default figure size
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
def sigmoid(x):
    return 1/(1+np.exp(-x))

def bce_loss(y, y_hat):
    # minval keeps the values passed to log from getting so close to zero that they underflow to 0
    minval = 0.000000000001
    N = y.shape[0]
    l = -1/N * np.sum(y * np.log(y_hat.clip(min=minval)) + (1-y) * np.log((1-y_hat).clip(min=minval)))
    return l

def bce_loss_derivative(y,y_hat):
    # For why the derivative takes this form, see https://stats.stackexchange.com/questions/219241/gradient-for-logistic-loss-function
    # Note: y_hat here is the output of the sigmoid, i.e. the p in that link, whereas the link's y_hat is the pre-activation (linear) output
    # So this derivative is taken with respect to the input of the sigmoid, i.e. the linear output z2
    return (y_hat-y)

# Forward propagation
def forward_prop(model,a0):
    # Load parameters from model
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']

    # Linear step
    z1 = a0.dot(W1) + b1

    # First activation function
    a1 = np.tanh(z1)

    # Second linear step
    z2 = a1.dot(W2) + b2

    # Second activation function
    a2 = sigmoid(z2)

    # Intermediate values from the forward pass
    cache = {'a0':a0,'z1':z1,'a1':a1,'z2':z2,'a2':a2}
    return cache
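The $(\hat{y} - y)$ form in bce_loss_derivative comes from combining the sigmoid with binary cross-entropy. For a single example, with $a_2 = \sigma(z_2)$:

$$ L = -\left[ y \log \sigma(z_2) + (1 - y)\log\left(1 - \sigma(z_2)\right) \right], \qquad \frac{\partial L}{\partial z_2} = \sigma(z_2) - y = \hat{y} - y $$

Note that bce_loss averages over the $N$ examples while bce_loss_derivative omits the $1/N$ factor; this simply scales every gradient by $N$, which is absorbed into the learning rate.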
def tanh_derivative(x):
    # For the derivation, see https://socratic.org/questions/what-is-the-derivative-of-tanh-x
    return (1 - np.power(x, 2))
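Since the cache stores the activation $a_1 = \tanh(z_1)$ rather than $z_1$ itself, the derivative is evaluated directly on the activation:

$$ \frac{d}{dz}\tanh(z) = 1 - \tanh^{2}(z) \quad\Rightarrow\quad \tanh'(z_1) = 1 - a_1^{2} $$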
# Backward propagation
def backward_prop(model,cache,y):
    # Load parameters from model
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']

    # Load forward propagation results
    a0, a1, a2 = cache['a0'], cache['a1'], cache['a2']

    # Backpropagation
    # Calculate loss derivative with respect to output
    # This is already the derivative with respect to z2 (see bce_loss_derivative above)
    dz2 = bce_loss_derivative(y=y,y_hat=a2)

    # Calculate loss derivative with respect to second layer weights
    dW2 = (a1.T).dot(dz2)

    # Calculate loss derivative with respect to second layer bias
    # b2 is shared across examples rather than per-example, so the per-example gradients are summed
    db2 = np.sum(dz2, axis=0, keepdims=True)

    # Calculate loss derivative with respect to first layer
    dz1 = dz2.dot(W2.T) * tanh_derivative(a1)

    # Calculate loss derivative with respect to first layer weights
    dW1 = np.dot(a0.T, dz1)

    # Calculate loss derivative with respect to first layer bias
    db1 = np.sum(dz1, axis=0)

    # Store gradients
    grads = {'dW2':dW2,'db2':db2,'dW1':dW1,'db1':db1}
    return grads

# All of the steps above can be read off by walking the forward pass in reverse:
# a0 --W1--b1--> z1 --tanh--> a1 --W2--b2--> z2 --sigmoid--> a2
# This is the forward pass; traversing it backwards gives the backward pass
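As a sanity check, backward_prop can be compared against finite differences once a model and data exist (i.e. after initialize_parameters and the dataset below). This is only a sketch built on the functions defined in this post; since backward_prop omits the $1/N$ averaging that bce_loss applies, the analytic gradients are compared with $N$ times the numerical ones.

# Finite-difference gradient check for backward_prop (a sketch; checks one entry per parameter)
def gradient_check(model, X_, y_, eps=1e-5):
    N = y_.shape[0]
    cache = forward_prop(model, X_)
    grads = backward_prop(model, cache, y_)
    for p_name, g_name in [('W1', 'dW1'), ('b1', 'db1'), ('W2', 'dW2'), ('b2', 'db2')]:
        param = model[p_name]
        analytic = np.asarray(grads[g_name]).reshape(param.shape)
        original = param[0, 0]
        # Nudge a single entry up and down and measure the change in the loss
        param[0, 0] = original + eps
        loss_plus = bce_loss(y_, forward_prop(model, X_)['a2'])
        param[0, 0] = original - eps
        loss_minus = bce_loss(y_, forward_prop(model, X_)['a2'])
        param[0, 0] = original  # restore the original value
        numerical = N * (loss_plus - loss_minus) / (2 * eps)
        print(p_name, 'analytic:', analytic[0, 0], 'numerical:', numerical)

# Example usage once model, X and y are defined: gradient_check(model, X, y)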
# Helper function to plot a decision boundary.
# If you don't fully understand this function don't worry, it just generates the contour plot below.
def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(X[:, 0], X[:, 1], c=y.flatten(), cmap=plt.cm.Spectral)
# Generate a dataset and plot it
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.15)
y = y.reshape(200,1)
plt.scatter(X[:,0], X[:,1], s=40, c=y.flatten(), cmap=plt.cm.Spectral)
<matplotlib.collections.PathCollection at 0x7fb08a5e0240>

[Figure: scatter plot of the two-moons dataset, colored by class]

def predict(model, x):
    # Do forward pass
    c = forward_prop(model,x)
    # Get y_hat
    y_hat = c['a2']

    # Turn values into either 1 or 0
    y_hat[y_hat >= 0.5] = 1
    y_hat[y_hat < 0.5] = 0
    return y_hat
11
def calc_accuracy(model,x,y):
    # Get total number of examples
    m = y.shape[0]
    # Do a prediction with the model
    pred = predict(model,x)
    # Ensure prediction and truth vector y have the same shape
    pred = pred.reshape(y.shape)
    # Calculate the number of wrong examples
    error = np.sum(np.abs(pred-y))
    # Calculate accuracy
    return (m - error)/m * 100
def initialize_parameters(nn_input_dim,nn_hdim,nn_output_dim):
    # First layer weights
    W1 = 2 * np.random.randn(nn_input_dim, nn_hdim) - 1

    # First layer bias
    b1 = np.zeros((1, nn_hdim))

    # Second layer weights
    W2 = 2 * np.random.randn(nn_hdim, nn_output_dim) - 1

    # Second layer bias
    b2 = np.zeros((1, nn_output_dim))

    # Package and return model
    model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    return model
def update_parameters(model,grads,learning_rate):
    # Load parameters
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']

    # Update parameters
    W1 -= learning_rate * grads['dW1']
    b1 -= learning_rate * grads['db1']
    W2 -= learning_rate * grads['dW2']
    b2 -= learning_rate * grads['db2']

    # Store and return parameters
    model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    return model
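The update itself is plain full-batch gradient descent: each parameter takes a step against its gradient, scaled by the learning rate $\eta$:

$$ \theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad \theta \in \{W_1, b_1, W_2, b_2\} $$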
def train(model,X_,y_,learning_rate, num_passes=20000, print_loss=False):
    # Gradient descent. For each batch...
    for i in range(0, num_passes):

        # Forward propagation
        cache = forward_prop(model,X_)

        # Backpropagation
        grads = backward_prop(model,cache,y_)
        # Gradient descent parameter update
        # Assign new parameters to the model
        model = update_parameters(model=model,grads=grads,learning_rate=learning_rate)

        # Print loss & accuracy every 100 iterations
        if print_loss and i % 100 == 0:
            y_hat = cache['a2']
            print('Loss after iteration',i,':',bce_loss(y_,y_hat))
            print('Accuracy after iteration',i,':',calc_accuracy(model,X_,y_),'%')

    return model
# Hyperparameters
hidden_layer_size = 3
# I picked this value because it showed good results in my experiments
learning_rate = 0.01
# Initialize the parameters to random values. We need to learn these.
np.random.seed(0)
# This is what we return at the end
model = initialize_parameters(nn_input_dim=2, nn_hdim=hidden_layer_size, nn_output_dim=1)
model = train(model,X,y,learning_rate=learning_rate,num_passes=1000,print_loss=True)
Loss after iteration 0 : 0.7590872634269914
Accuracy after iteration 0 : 86.5 %
Loss after iteration 100 : 0.2574839032266012
Accuracy after iteration 100 : 87.5 %
Loss after iteration 200 : 0.23296065120486092
Accuracy after iteration 200 : 91.0 %
Loss after iteration 300 : 0.06607469435615165
Accuracy after iteration 300 : 98.5 %
Loss after iteration 400 : 0.039048891767398106
Accuracy after iteration 400 : 99.0 %
Loss after iteration 500 : 0.03162355657934422
Accuracy after iteration 500 : 99.5 %
Loss after iteration 600 : 0.02808346934457852
Accuracy after iteration 600 : 99.5 %
Loss after iteration 700 : 0.02596724219386473
Accuracy after iteration 700 : 99.5 %
Loss after iteration 800 : 0.02453302540660454
Accuracy after iteration 800 : 99.5 %
Loss after iteration 900 : 0.023480001190425943
Accuracy after iteration 900 : 99.5 %
# Plot the decision boundary
plot_decision_boundary(lambda x: predict(model,x))
plt.title("Decision Boundary for hidden layer size 3")
Text(0.5, 1.0, 'Decision Boundary for hidden layer size 3')

[Figure: decision boundary learned by the network, hidden layer size 3]


