Setting up your Machine Learning Application
Bias/Variance trade-off
High bias shows up as a high training-set error; high variance shows up as a dev-set error that is much higher than the training-set error. For example (assuming human/Bayes error is close to 0%):
| | high variance | high bias | high bias & high variance | low bias & low variance |
| --- | --- | --- | --- | --- |
| Train set error | 1% | 15% | 15% | 0.5% |
| Dev set error | 11% | 16% | 30% | 1% |
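A small diagnostic sketch of the same rule of thumb; the helper name `diagnose`, the `gap` threshold, and the assumed `bayes_err` of 0 are my own, not from the notes:

```python
def diagnose(train_err, dev_err, bayes_err=0.0, gap=0.05):
    """Rough bias/variance diagnosis from train/dev errors given as fractions."""
    high_bias = (train_err - bayes_err) > gap       # training error far above optimal error
    high_variance = (dev_err - train_err) > gap     # dev error far above training error
    return high_bias, high_variance

# The four columns of the table above:
for tr, dv in [(0.01, 0.11), (0.15, 0.16), (0.15, 0.30), (0.005, 0.01)]:
    print(tr, dv, diagnose(tr, dv))
```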
Regularization
L2 Regularization
L2 regularization adds a regularization term to the cost function $J$ and optimizes the regularized cost as a whole. It is also called "weight decay": the extra term adds $\frac{\lambda}{m}W^{[l]}$ to each gradient, so every gradient-descent update first shrinks the weights by a factor $(1-\frac{\alpha\lambda}{m})$ before applying the usual step.
$$J(w,b)=\frac{1}{m}\sum_{i=1}^{m}L\big(\hat{y}^{(i)},y^{(i)}\big)+\frac{\lambda}{2m}\|w\|_2^2$$

where $\|w\|_2^2=\sum_{j=1}^{n_x}w_j^2=w^Tw$ for logistic regression; for a layer-$l$ weight matrix the corresponding term uses the Frobenius norm $\|W^{[l]}\|_F^2=\sum_{i=1}^{n^{[l]}}\sum_{j=1}^{n^{[l-1]}}\big(w_{ij}^{[l]}\big)^2$.
The corresponding L1 regularization term is $\frac{\lambda}{2m}\sum_{j=1}^{n_x}|w_j|$, which tends to make $w$ sparse.
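A minimal sketch of the regularized cost and the weight-decay effect, assuming `weights` is a list of layer weight matrices, `cross_entropy_cost` is the unregularized cost, and `lambd`, `m`, `alpha` are the regularization strength, number of examples, and learning rate (all names are mine, not from the notes):

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Add the L2 term (lambda / 2m) * sum over layers of ||W[l]||_F^2 to the cost."""
    l2_term = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + l2_term

# Backprop then gets the extra gradient (lambd / m) * W for each layer, so the update
#   W = W - alpha * (dW_from_backprop + (lambd / m) * W)
# shrinks W by a factor (1 - alpha * lambd / m) each step, hence "weight decay".
```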
Dropout Regularization
Dropout is used mostly in computer vision, where overfitting is common. Its main drawback is that the cost function $J$ is no longer well defined, so dropout is usually turned off (keep_prob = 1) while debugging, e.g. when checking that $J$ decreases monotonically.
"Inverted dropout" for layer $l=3$ with keep_prob = 0.8:
import numpy as np

d3 = np.random.rand(a3.shape[0], a3.shape[1]) < keep_prob  # boolean mask: keep each unit with probability keep_prob
a3 = np.multiply(a3, d3)                                   # zero out the dropped units
a3 /= keep_prob                                            # scale up so E[a3] is unchanged (the "inverted" part)
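For completeness, a sketch of the matching backward-pass step, assuming the upstream gradient of `a3` is stored in `dA3` (a name I'm assuming, not taken from the notes); at test time no mask and no scaling are applied:

```python
dA3 = np.multiply(dA3, d3)  # drop the same units as in the forward pass
dA3 /= keep_prob            # apply the same scaling to the gradients
```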
Early stopping
Stop training as soon as the dev-set error starts to increase, even though the training-set error is still decreasing, and keep the parameters from the best dev-set iteration.
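A minimal early-stopping loop, sketched under my own assumptions: hypothetical helpers `train_one_epoch(params)` and `dev_error(params)`, an initial `params`, and a budget `max_epochs`:

```python
best_dev_err = float("inf")
best_params = params

for epoch in range(max_epochs):
    params = train_one_epoch(params)   # hypothetical: one pass over the training set
    err = dev_error(params)            # hypothetical: error on the dev set
    if err < best_dev_err:
        best_dev_err, best_params = err, params
    else:
        break                          # dev error went up: stop and keep best_params
```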
Normalizing inputs
Preprocess the inputs so that every dimension has mean 0 and variance 1. This makes the contours of the cost function $J$ more symmetric, so gradient descent can use a larger learning rate and needs fewer iterations.
Zero-centering the mean:

$$\mu=\frac{1}{m}\sum_{i=1}^{m}x^{(i)},\qquad x := x-\mu$$

Scaling the variance to 1 (element-wise square, after centering):

$$\sigma^2=\frac{1}{m}\sum_{i=1}^{m}\big(x^{(i)}\big)^2,\qquad x := x/\sigma$$
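A NumPy sketch of the same preprocessing, assuming `X` has shape `(n_x, m)` with one column per example:

```python
import numpy as np

mu = np.mean(X, axis=1, keepdims=True)            # per-feature mean
X = X - mu
sigma2 = np.mean(X ** 2, axis=1, keepdims=True)   # per-feature variance (after centering)
X = X / np.sqrt(sigma2)                           # divide by the standard deviation
# Reuse the same mu and sigma2 to normalize the dev/test sets.
```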
Vanishing and exploding gradients
In a very deep network, weights with $W^{[l]} > I$ (entries slightly larger than 1) make activations and gradients grow exponentially with depth (exploding gradients), while $W^{[l]} < I$ makes them shrink exponentially (vanishing gradients).
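A tiny numerical illustration (my own example, not from the notes) of how this compounds over 50 layers of multiplying by a factor just above or below 1:

```python
print(1.5 ** 50)   # ≈ 6.4e8   -> explodes
print(0.5 ** 50)   # ≈ 8.9e-16 -> vanishes
```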
Weight initialization for deep networks
For $z = w_1x_1 + w_2x_2 + \dots + w_nx_n$, the larger $n$ is, the more terms are summed; to keep $z$ from blowing up, set the variance of each $w_i$ to $\frac{1}{n}$ (or $\frac{2}{n}$ for ReLU activations).
W[l] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * np.sqrt(2 / layer_dims[l - 1])  # He initialization (for ReLU)
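A sketch that applies this initialization layer by layer, assuming `layer_dims = [n_x, n^[1], ..., n^[L]]` is a list of layer sizes (the function name is mine, not from the notes):

```python
import numpy as np

def initialize_parameters_he(layer_dims):
    """He initialization: W[l] ~ randn * sqrt(2 / n^[l-1]); biases start at zero."""
    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = (np.random.randn(layer_dims[l], layer_dims[l - 1])
                                * np.sqrt(2 / layer_dims[l - 1]))
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params
```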
Gradient estimation and gradient checking
The two-sided (centered) difference $\frac{J(\theta+\varepsilon)-J(\theta-\varepsilon)}{2\varepsilon}$ approximates the gradient more accurately than the one-sided difference $\frac{J(\theta+\varepsilon)-J(\theta)}{\varepsilon}$ (error $O(\varepsilon^2)$ vs. $O(\varepsilon)$).
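A quick check (my own example) on $f(\theta)=\theta^3$ at $\theta=1$, where the true derivative is 3:

```python
eps = 1e-2
f = lambda t: t ** 3
one_sided = (f(1 + eps) - f(1)) / eps               # ≈ 3.0301, error O(eps)
two_sided = (f(1 + eps) - f(1 - eps)) / (2 * eps)   # ≈ 3.0001, error O(eps^2)
print(one_sided, two_sided)
```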
Gradient checking
Instructions:
- First compute "gradapprox" using the two-sided formula above and a small value of $\varepsilon$. Here are the Steps to follow:
  - $\theta^{+} = \theta + \varepsilon$
  - $\theta^{-} = \theta - \varepsilon$
  - $J^{+} = J(\theta^{+})$
  - $J^{-} = J(\theta^{-})$
  - $gradapprox = \dfrac{J^{+} - J^{-}}{2\varepsilon}$
- Then compute the gradient using backward propagation, and store the result in a variable “grad”
- Finally, compute the relative difference between “gradapprox” and the “grad” using the following formula:
$$difference = \frac{\|grad - gradapprox\|_2}{\|grad\|_2 + \|gradapprox\|_2}$$
You will need 3 Steps to compute this formula:
- 1’. compute the numerator using np.linalg.norm(…)
- 2’. compute the denominator. You will need to call np.linalg.norm(…) twice.
- 3’. divide them.
- If this difference is small (say less than $10^{-7}$), you can be quite confident that you have computed your gradient correctly. Otherwise, there may be a mistake in the gradient computation.
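A minimal sketch of these steps for a one-dimensional $\theta$, with hypothetical callables `J(theta)` (forward cost) and `backward(theta)` (backprop gradient); the vectorized version loops over every component of $\theta$:

```python
import numpy as np

def gradient_check(J, backward, theta, epsilon=1e-7):
    """Compare the backprop gradient with the two-sided numerical approximation."""
    theta_plus = theta + epsilon
    theta_minus = theta - epsilon
    J_plus = J(theta_plus)
    J_minus = J(theta_minus)
    gradapprox = (J_plus - J_minus) / (2 * epsilon)

    grad = backward(theta)                                           # gradient from backprop

    numerator = np.linalg.norm(grad - gradapprox)                    # step 1'
    denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)  # step 2'
    difference = numerator / denominator                             # step 3'

    if difference < 1e-7:
        print("The gradient is correct! difference =", difference)
    else:
        print("There may be a mistake in the gradient! difference =", difference)
    return difference

# Toy example: J(theta) = 2 * theta, so dJ/dtheta = 2
gradient_check(J=lambda t: 2 * t, backward=lambda t: 2.0, theta=3.0)
```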