feature normalization

Does feature normalization mean changing the shape of the distribution of the data? (My understanding is that feature scaling changes the range of the data, which is different from normalization.) When is feature normalization needed? And when doing feature engineering, should I first look at the distribution of each numerical feature?

Corrections from the instructor are welcome. Thank you!

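Feature normalization and standardization do not change the shape of the data distribution; they only shift and rescale the values. As we covered in class 4, feature normalization is useful for some models (e.g. linear models whose parameters are fit by gradient descent, such as linear regression and logistic regression), because it can speed up the parameter-fitting process. For tree-based models, however, feature normalization provides almost no benefit.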
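
To make the "shape does not change" point concrete, here is a minimal sketch, assuming NumPy and a synthetic right-skewed feature (the lognormal data and the `skewness` helper are illustrative, not from the course material). Standardization and min-max scaling are both linear transforms with a positive scale factor, so the skewness of the feature should stay the same even though its mean, variance, and range change.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment); hypothetical helper."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Synthetic right-skewed feature (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Standardization: zero mean, unit variance.
standardized = (x - x.mean()) / x.std()

# Min-max scaling: values mapped into [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# All three skewness values agree up to floating-point error.
print(skewness(x), skewness(standardized), skewness(min_max))
```

A histogram of each array would look identical apart from the labels on the x-axis.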

Thank you. So feature normalization and feature scaling both change the range of the data and are the same thing, right?

Yes. Both only change the mean and variance, not the shape of the distribution. If you really do need to change the distribution, you can consider a log transform or differencing.
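
As a rough illustration of a transform that does change the shape, here is a small sketch, again assuming NumPy and a synthetic lognormal feature (both the data and the `skewness` helper are assumptions for illustration). Because the log is a nonlinear transform, it pulls in the long right tail and the skewness drops to roughly zero.

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment); hypothetical helper."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5

# Synthetic, strictly positive, right-skewed feature (illustrative only).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

# Log transform: nonlinear, so the shape of the distribution changes.
logged = np.log(x)

# Skewness is large before the transform and close to zero after it.
print(skewness(x), skewness(logged))
```

Note that `np.log` requires strictly positive values; for features that contain zeros, `np.log1p` is a common alternative.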