[Other] Model evaluation, model selection, and algorithm selection in machine learning

数学不好万岁。 posted on 2016-10-3 12:59:32

Introduction

   Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify. These tuning knobs, the so-called hyperparameters, help us control the behavior of machine learning algorithms when optimizing for performance, finding the right balance between bias and variance. Hyperparameter tuning for performance optimization is an art in itself, and there are no hard-and-fast rules that guarantee best performance on a given dataset. In Part I and Part II, we saw different holdout and bootstrap techniques for estimating the generalization performance of a model. We learned about the bias-variance trade-off, and we computed the uncertainty of our estimates. In this third part, we will focus on different methods of cross-validation for model evaluation and model selection. We will use these cross-validation techniques to rank models from several hyperparameter configurations and estimate how well they generalize to independent datasets.
  About Hyperparameters and Model Selection

   Previously, we used the holdout method or different flavors of bootstrapping to estimate the generalization performance of our predictive models. We split our dataset into two parts: a training and a test dataset. After the machine learning algorithm fit a model to the training set, we evaluated it on the independent test set that we withheld from the machine learning algorithm during model fitting. While we were discussing challenges such as the bias-variance trade-off, we used fixed hyperparameter settings in our learning algorithms, such as the value of k in the k-nearest neighbors algorithm. We defined hyperparameters as the parameters of the learning algorithm itself, which we have to specify a priori — before model fitting. In contrast, we referred to the parameters of our resulting model as the model parameters.
   So, what are hyperparameters, exactly? Considering the k-nearest neighbors algorithm, one example of a hyperparameter is the integer value of k. If we set k=3, the k-nearest neighbors algorithm will predict a class label based on a majority vote among the 3 nearest neighbors in the training set. The distance metric for finding these nearest neighbors is yet another hyperparameter of the algorithm.
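   To make this concrete, here is a minimal sketch of the two-way holdout approach with fixed hyperparameters, assuming scikit-learn and its bundled Iris dataset (neither of which is prescribed by the text; they serve only as an illustration). The number of neighbors k and the distance metric are specified before fitting, and the model is evaluated once on the withheld test set:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small example dataset and create a two-way holdout split.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y)

# Hyperparameters are specified a priori, before model fitting:
# the number of neighbors k and the distance metric.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
knn.fit(X_train, y_train)

# Evaluate on the independent test set withheld during fitting.
print("Test accuracy: %.3f" % knn.score(X_test, y_test))
```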
   

   Now, the k-nearest neighbors algorithm may not be an ideal choice for illustrating the difference between hyperparameters and model parameters, since it is a lazy learner and a nonparametric method. In this context, lazy learning (or instance-based learning) means that there is no training or model fitting stage: a k-nearest neighbors model literally stores or memorizes the training data and uses it only at prediction time. Thus, each training instance represents a parameter in a k-nearest neighbors model. In short, nonparametric models are models that cannot be described by a fixed number of parameters that are adjusted to the training set. The structure of a parametric model, in contrast, is not determined by the training data but is set a priori; furthermore, nonparametric models, unlike parametric methods, do not assume that the data follows certain probability distributions (Bayesian nonparametric methods are an exception, since they do make such assumptions). Hence, we may say that nonparametric methods make fewer assumptions about the data than parametric methods.
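   As a purely illustrative sketch of this point (not taken from the original article), the following minimal k-nearest neighbors classifier shows that "fitting" amounts to memorizing the training data, and that all of the work happens at prediction time:

```python
import numpy as np
from collections import Counter

class LazyKNN:
    """Minimal k-NN classifier: 'training' only memorizes the data."""

    def __init__(self, k=3):
        self.k = k  # hyperparameter, set a priori

    def fit(self, X, y):
        # No optimization happens here; the stored training instances
        # themselves play the role of the model "parameters".
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        preds = []
        for x in np.asarray(X):
            # Euclidean distances to every stored training instance.
            dists = np.sqrt(((self.X_train - x) ** 2).sum(axis=1))
            nearest = np.argsort(dists)[: self.k]
            # Majority vote among the k nearest neighbors.
            preds.append(Counter(self.y_train[nearest]).most_common(1)[0][0])
        return np.array(preds)

if __name__ == "__main__":
    X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
    y = np.array([0, 0, 0, 1, 1, 1])
    print(LazyKNN(k=3).fit(X, y).predict([[2.5], [10.5]]))  # -> [0 1]
```

   Because every stored training instance effectively acts as a parameter, the size of the "model" grows with the training set, which is the hallmark of a nonparametric method.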
   In contrast to k-nearest neighbors, a simple example of a parametric method would be logistic regression, a generalized linear model with a fixed number of model parameters: a weight coefficient for each feature variable in the dataset plus a bias (or intercept) unit. These weight coefficients in logistic regression, the model parameters, are updated by maximizing a log-likelihood function or minimizing the logistic cost. For fitting a model to the training data, a hyperparameter of a logistic regression algorithm could be the number of iterations or passes over the training set (epochs) in a gradient-based optimization. Another example of a hyperparameter would be the value of a regularization parameter such as the lambda term in L2-regularized logistic regression:
   J(\mathbf{w}) = -\sum_{i=1}^{n} \Big[\, y^{(i)} \log\big(\hat{y}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big] + \lambda \lVert \mathbf{w} \rVert_2^2

   where \hat{y}^{(i)} denotes the predicted probability of the positive class for the i-th training example and \lambda controls the strength of the L2 penalty on the weight vector \mathbf{w}.
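   As a hedged illustration of this distinction (again assuming scikit-learn, which the text does not mention), the sketch below fits an L2-regularized logistic regression with stochastic gradient descent; the number of epochs and the regularization strength are hyperparameters chosen before fitting, while the weight coefficients and the bias unit are the model parameters learned from the data. Note that scikit-learn calls the regularization strength alpha rather than lambda, and that recent versions use loss="log_loss" (older versions use loss="log"):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters, specified a priori (before model fitting):
#   max_iter -> number of passes (epochs) over the training set
#   alpha    -> L2 regularization strength (the lambda term above)
logreg = SGDClassifier(loss="log_loss", penalty="l2",
                       alpha=0.01, max_iter=100, random_state=123)
logreg.fit(X, y)

# Model parameters, learned by minimizing the regularized logistic cost:
print("Weight coefficients:", logreg.coef_)
print("Bias (intercept) units:", logreg.intercept_)
```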
   Changing the hyperparameter values when running a learning algorithm over a training set may result in different models. The process of finding the best-performing model from a set of models that were produced by different hyperparameter settings is called model selection. In the next section, we will look at an extension to the holdout method that helps us with this selection process.
  The Three-Way Holdout Method for Hyperparameter Tuning

   In Part I, we learned that resubstitution validation is a bad approach for estimating the generalization performance. Since we want to know how well our model generalizes to new data, we used the holdout method to split the dataset into two parts, a training set and an independent test set. Can we use the holdout method for hyperparameter tuning? The answer is “yes!” However, we have to make a slight modification to our initial approach, the “two-way” split, and split the dataset into three parts: a training, a validation, and a test set.
   We can regard the process of hyperparameter tuning and model selection as a meta-optimization task. While the learning algorithm optimizes an objective function on the training set (with the exception of lazy learners), hyperparameter optimization is yet another task on top of it; here, we typically want to optimize a performance metric such as classification accuracy or the area under a Receiver Operating Characteristic curve. After the tuning stage, selecting a model based on the test set performance seems to be a reasonable approach. However, reusing the test set multiple times would introduce a bias into our final performance estimate and likely result in overly optimistic estimates of the generalization performance — we can say that “the test set leaks information.” To avoid this problem, we could use a three-way split, dividing the dataset into a training, validation, and test dataset. Having a training-validation pair for hyperparameter tuning and model selection allows us to keep the test set “independent” for model evaluation. Now, remember our discussion of the “3 goals” of performance estimation?
  
       
  1. We want to estimate the generalization accuracy, the predictive performance of our model on future (unseen) data.   
  2. We want to increase the predictive performance by tweaking the learning algorithm and selecting the best-performing model from a given hypothesis space.   
  3. We want to identify the machine learning algorithm that is best-suited for the problem at hand; thus, we want to compare different algorithms, selecting the best-performing one as well as the best-performing model from the algorithm’s hypothesis space.  
   The “three-way holdout method” is one way to tackle points 1 and 2 (more on point 3 in the next article, Part IV). However, if we are only interested in point 2, selecting the best model, and do not care so much about an “unbiased” estimate of the generalization performance, we could stick to the two-way split for model selection. Thinking back to our discussion about learning curves and pessimistic biases in Part II, we noted that a machine learning algorithm often benefits from more labeled data; the smaller the dataset, the higher the pessimistic bias and the variance — the sensitivity of our model towards the way we partition the data.
   “There ain’t no such thing as a free lunch.” The three-way holdout method for hyperparameter tuning and model selection is not the only — and certainly often not the best — way to approach this task. In later sections, we will learn about alternative methods and discuss their advantages and trade-offs. However, before we move on to probably the most popular method for model selection, k-fold cross-validation (or sometimes also called “rotation estimation” in older literature), let us have a look at an illustration of the 3-way split holdout method:
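   The same idea can also be expressed as a short code sketch (a minimal example assuming scikit-learn, the Iris dataset, and a small grid of k values for a k-nearest neighbors classifier; none of these choices are prescribed by the text). The training set is used to fit the candidate models, the validation set to pick the best hyperparameter setting, and the untouched test set to estimate the generalization performance of the selected model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Three-way split: 60% training, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=123, stratify=y)
X_valid, X_test, y_valid, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=123, stratify=y_tmp)

# Hyperparameter tuning: fit on the training set,
# compare candidate settings on the validation set.
best_k, best_score = None, -1.0
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    score = model.score(X_valid, y_valid)
    if score > best_score:
        best_k, best_score = k, score

# Model evaluation: the test set is used only once,
# after the best hyperparameter setting has been selected.
final_model = KNeighborsClassifier(n_neighbors=best_k)
final_model.fit(X_train, y_train)
print("Selected k = %d, test accuracy = %.3f"
      % (best_k, final_model.score(X_test, y_test)))
```

   A common refinement is to refit the selected model on the combined training and validation data before the final test-set evaluation, so that no labeled data is wasted.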

hsimy posted on 2016-10-3 14:49:15
Thanks to 数学不好万岁。 for sharing good stuff!

myg61793625 posted on 2016-10-6 01:38:54
Front row, marking my place for posterity!

94507 posted on 2016-10-7 21:06:44
I'll watch everyone bump this!

得得cms123 posted on 2016-10-11 15:56:36
After reading so many posts, this is the first time I've seen content of such high quality!

只是沒有結果 posted on 2016-11-17 14:46:37
What I hate most is when someone points the mouse cursor at my head.
