What is the interviewer really asking with "how do you develop your model" in a case study interview?

techie.student.1 · December 2, 2021, 08:03

I'd like to ask the instructor and everyone here: what are interviewers actually evaluating when they ask about the ML workflow, and which points do I need to hit to meet their expectations? I know the basic ML workflow looks like this:
Data Exploration: check data types, review summary statistics, and look at distributions and correlations
Feature Engineering: handle outliers and missing values; encode categorical variables (one-hot encoding, ordinal encoding)
Standardization or normalization (if needed)
Feature Selection: 1. completeness of features; 2. relationships between variables (correlation); 3. relationships between the independent variables and the dependent variable (filter methods, wrapper methods, model-embedded methods)
Model Selection: choose a baseline [is the baseline used to check for overfitting or underfitting?], run k-fold cross-validation, and tune the selected model on the validation set
Model Validation: ROC AUC, precision-recall curve, recall, precision, and F1 on the test set
Model Serving: online training? personalization? how often to update the model?
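To make the Model Selection and Model Validation steps above concrete, here is a minimal sketch in plain Python. It is not taken from the thread: the data is synthetic, the "model" is a hypothetical one-feature threshold rule, and all function names are made up for illustration. It shows one common reading of the baseline step, where a trivial majority-class predictor sets a floor that any real model should beat under k-fold cross-validation.

```python
import random


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Hypothetical toy model: pick the cutoff t that maximizes training F1 for the rule x >= t."""
    best_t, best_f1 = None, -1.0
    for t in sorted(set(xs)):
        f1 = precision_recall_f1(ys, [int(x >= t) for x in xs])[2]
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def cross_validate(xs, ys, k=5):
    """Return the mean held-out F1 of (majority-class baseline, threshold model)."""
    folds = k_fold_indices(len(xs), k)
    base_f1s, model_f1s = [], []
    for i, val_idx in enumerate(folds):
        # Train on every fold except fold i, then score on fold i.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        x_tr = [xs[j] for j in train_idx]
        y_tr = [ys[j] for j in train_idx]
        x_val = [xs[j] for j in val_idx]
        y_val = [ys[j] for j in val_idx]

        # Baseline: always predict the majority class seen in training.
        majority = int(2 * sum(y_tr) >= len(y_tr))
        base_f1s.append(precision_recall_f1(y_val, [majority] * len(y_val))[2])

        # Model: threshold fitted on the training folds only.
        t = fit_threshold(x_tr, y_tr)
        model_f1s.append(precision_recall_f1(y_val, [int(x >= t) for x in x_val])[2])
    return sum(base_f1s) / k, sum(model_f1s) / k


# Synthetic data (an assumption for this sketch): one feature, label 1 when x >= 6.0,
# with every 17th label flipped to simulate noise.
xs = [i / 10 for i in range(100)]
ys = [int(x >= 6.0) for x in xs]
for i in range(0, 100, 17):
    ys[i] = 1 - ys[i]

baseline_f1, model_f1 = cross_validate(xs, ys)
print(f"baseline F1 = {baseline_f1:.3f}, model F1 = {model_f1:.3f}")
```

If the model cannot clearly beat the baseline on the held-out folds, that usually points to a problem with the features or the labels rather than with tuning. In a real project, a library such as scikit-learn would normally replace these hand-written helpers.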

I'd also like to know whether this workflow is missing anything that interviewers want to test.

miao.wang · December 2, 2021, 08:39

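The workflow you describe matches the diagram I drew for everyone in Techie ML Project 1. In practice, beyond the steps you list, after model validation we usually also run an online A/B test, and only fully ramp up once we have confirmed that the model has a positive impact. You can refer to the thumbnail diagram I drew in this case study article.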
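As a rough illustration of what "confirming a positive impact" can look like, the sketch below runs a two-sided two-proportion z-test on conversion counts from a control and a treatment group. It is a simplification and not the instructor's method: it assumes a binary conversion metric (a continuous metric such as revenue would typically call for a t-test or a bootstrap instead), and the counts, the function name, and the 0.05 significance level are all made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference in conversion rate between control (a) and treatment (b).

    Returns (z, p_value). A positive z means the treatment converted at a higher rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups share one conversion rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 10,000 users per arm, 1,000 vs. 1,100 conversions.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)

ALPHA = 0.05  # assumed significance level
should_ramp_up = z > 0 and p_value < ALPHA
print(f"z = {z:.2f}, p = {p_value:.4f}, ramp up: {should_ramp_up}")
```

A real ramp-up decision would also look at guardrail metrics, sample-size planning, and how long the test ran, none of which this sketch covers.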

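In an actual interview, the interviewer will assess not only the technical content you mention. Equally important (sometimes even more important) is the discussion tied to the specific use case: how we define the label, how we construct features, how we define the project goal, and what to do when the objective optimized by the model's loss function differs from the A/B test's primary metric (revenue). All of these have to be analyzed in the context of the specific application. That said, the technical parts you list are already fairly comprehensive.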
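A toy example of the gap between the model objective and the business metric: a classifier trained on log loss learns purchase probability, but ranking by probability alone can differ from ranking by expected revenue. The items, probabilities, and prices below are hypothetical, and weighting by price is only one of several possible ways to handle the mismatch, not necessarily the one an interviewer has in mind.

```python
# Hypothetical candidates: (item name, predicted purchase probability, price).
candidates = [
    ("sticker", 0.30, 2.0),
    ("headphones", 0.05, 80.0),
    ("phone_case", 0.15, 15.0),
]


def rank_by_probability(items):
    """Order items by the model's predicted purchase probability, highest first."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return [name for name, _, _ in ranked]


def rank_by_expected_revenue(items):
    """Order items by probability * price, i.e. expected revenue per impression."""
    ranked = sorted(items, key=lambda item: item[1] * item[2], reverse=True)
    return [name for name, _, _ in ranked]


print("by probability:     ", rank_by_probability(candidates))
print("by expected revenue:", rank_by_expected_revenue(candidates))
```

The two orderings disagree on which item to show first, which is the kind of discrepancy worth raising when the A/B test's primary metric is revenue.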

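techie.student.1 · December 2, 2021, 08:40

Very helpful, thank you for the guidance!

Do you have any recommended references for feature engineering? For example, ways to think about feature engineering in different domains.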

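miao.wang · December 2, 2021, 08:40

Feature engineering across different domains is really a product-sense topic, and I don't think there are any great books on it. I'd recommend reading the tech blog of the company you are interviewing with; that should help the most.

At Techie we are also actively preparing a course, <数据科学案例分析实战课> (a hands-on data science case study course), that will walk through case study questions from different domains. Once the course is ready we will recommend it to everyone, and we will post an update on the Techie member announcement board.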