Fengquan Machinery (How is Fengquan Environmental Protection Power Co., Ltd.?)


  Four pitfalls in the machine learning process:

Data leakage; overfitting; data sampling and splitting; data quality.

  In a recent presentation, Ben Hamner described the common pitfalls in machine learning projects that he and his colleagues have observed during competitions on Kaggle.

  The talk was titled “Machine Learning Gremlins” and was presented in February 2014 at Strata.

  In this post, we look at the pitfalls from Ben’s talk, what they look like, and how to avoid them.

  Machine Learning Process

  Early in the talk, Ben presented a snapshot of the end-to-end process for working through a machine learning problem.

  

  Figure: Machine Learning Process (taken from “Machine Learning Gremlins” by Ben Hamner)

  This snapshot included 9 steps, as follows:

Start with a business problem

Source data

Split data

Select an evaluation metric

Perform feature extraction

Model Training

Feature Selection

Model Selection

Production System

  He commented that the process is iterative rather than linear.

  He also commented that each step in this process can go wrong, derailing the whole project.
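To make the steps above concrete, here is a minimal Python sketch of one pass through them on a toy cat-versus-dog problem. Everything in it is an assumption for illustration, not Ben’s method: the synthetic records, the 60/20/20 split, accuracy as the evaluation metric, and the hypothetical `train_threshold` model. The production step is left out, and in a real project you would loop back to earlier steps whenever a later one exposes a problem.

```python
import random

def make_records(n, seed=0):
    """Source data (hypothetical): raw (weight_kg, noise, label) records, 1 = cat, 0 = dog."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = rng.randint(0, 1)
        weight = rng.gauss(4.0, 1.0) if label == 1 else rng.gauss(20.0, 5.0)
        records.append((weight, rng.random(), label))
    return records

def split_data(records, seed=0):
    """Split data: shuffle once, then 60% train, 20% validation, 20% test."""
    rows = records[:]
    random.Random(seed).shuffle(rows)
    a, b = int(0.6 * len(rows)), int(0.8 * len(rows))
    return rows[:a], rows[a:b], rows[b:]

def accuracy(model, rows):
    """Evaluation metric: fraction of correctly classified rows."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def extract_features(record):
    """Feature extraction: raw record -> (feature tuple, label)."""
    weight, noise, label = record
    return (weight, noise), label

def train_threshold(rows, j):
    """Model training (hypothetical model): cut feature j halfway between the class means."""
    pos = [x[j] for x, y in rows if y == 1]
    neg = [x[j] for x, y in rows if y == 0]
    m_pos, m_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    cut = (m_pos + m_neg) / 2
    if m_pos < m_neg:
        return lambda x: int(x[j] < cut)
    return lambda x: int(x[j] >= cut)

train, val, test = (
    [extract_features(r) for r in part] for part in split_data(make_records(500))
)
# Feature selection and model selection: compare candidates on validation data only.
candidates = {j: train_threshold(train, j) for j in range(2)}
best_j = max(candidates, key=lambda j: accuracy(candidates[j], val))
# Score the chosen model once on the untouched test set.
test_acc = accuracy(candidates[best_j], test)
print(f"selected feature {best_j}, test accuracy {test_acc:.3f}")
```

Keeping the test set out of every selection decision is what makes the final number an honest estimate; if you peek at it while iterating, it quietly becomes a second validation set.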

  Discriminating Dogs and Cats

  Ben presented a case study of building an automatic cat door that lets the cat in and keeps the dog out. This was an instructive example because it touched on several key issues that arise when working on a data problem.

  

  Figure: Discriminating Dogs and Cats (taken from “Machine Learning Gremlins” by Ben Hamner)

  Sample Size

  The first takeaway from this example was that he studied model accuracy as a function of data sample size and showed that more samples correlated with greater accuracy.

  He then added more data until accuracy leveled off. This was a good example of how easy it can be to get a sense of your system’s sensitivity to sample size and adjust accordingly.
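A learning curve like the one Ben described can be sketched in a few lines of Python. This is a minimal illustration under assumed conditions, not a reproduction of his experiment: synthetic one-dimensional data, a nearest-centroid classifier, sample sizes that double each round, and a hypothetical stopping rule that halts once accuracy improves by less than `tol` between rounds.

```python
import random

def make_data(n, rng):
    """Hypothetical 1-D data: class 1 centred at +1, class 0 at -1, unit noise."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(1.0 if y else -1.0, 1.0), y))
    return data

def fit_centroids(rows):
    """Nearest-centroid classifier: remember the mean x of each class."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        means[label] = sum(xs) / len(xs) if xs else 0.0
    return means

def accuracy(means, rows):
    """Predict class 1 when x is closer to the class-1 mean than to the class-0 mean."""
    hits = sum((abs(x - means[1]) < abs(x - means[0])) == bool(y) for x, y in rows)
    return hits / len(rows)

def learning_curve(pool, test, sizes, tol=0.005):
    """Train on growing prefixes of `pool`; stop once the accuracy gain falls below `tol`."""
    history = []
    for n in sizes:
        acc = accuracy(fit_centroids(pool[:n]), test)
        history.append((n, acc))
        if len(history) > 1 and acc - history[-2][1] < tol:
            break  # accuracy has leveled off; more data is not paying for itself
    return history

rng = random.Random(42)
pool, test = make_data(5000, rng), make_data(2000, rng)
history = learning_curve(pool, test, sizes=[10, 20, 40, 80, 160, 320, 640, 1280])
for n, acc in history:
    print(f"{n:5d} samples -> accuracy {acc:.3f}")
```

A single noisy dip can trigger this stopping rule early; averaging several runs per sample size, or requiring two flat rounds in a row, is a more robust variant.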

  Wrong Problem

  The second takeaway from this example was that the system failed: it let in every cat in the neighborhood.

  It was a clever example highlighting the importance of understanding the constraints of the problem that needs to be solved, rather than the problem that you want to solve.

  Pitfalls In Machine Learning Projects

  Ben went on to discuss four common pitfalls when working on machine learning problems.

  Although these problems are common, he pointed out that they can be identified and addressed relatively easily.


  

  Figure: Overfitting (taken from “Machine Learning Gremlins” by Ben Hamner)

Data Leakage: Using data in the model that a production system would not have access to. This is particularly common in time series problems. It can also happen with data such as system IDs that may indicate the class label. Run a model and take a careful look at the attributes that contribute to its success, then sanity check whether they make sense. (See the referenced paper “Leakage in Data Mining”.)

Overfitting: Modeling the training data so closely that the model also captures its noise. The result is a poor ability to generalize to new data. This becomes more of a problem in higher dimensions and with more complex class boundaries.

Data Sampling and Splitting: This is related to data leakage: you need to be very careful that the train, test, and validation sets are indeed independent samples. Time series problems require extra thought and work to ensure that you can replay data to the system chronologically and validate model accuracy.

Data Quality: Check the consistency of your data. Ben gave an example of flight data in which some aircraft were landing before taking off. Inconsistent, duplicate, and corrupt data needs to be identified and explicitly handled, because it can directly hurt model training and the model’s ability to generalize.
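One cheap way to act on the data leakage advice above is to check how well each attribute predicts the label on its own: a field that is almost perfectly predictive by itself, such as a system ID, deserves a hard look. The sketch below assumes synthetic cat/dog records in which a hypothetical `record_id` was issued by separate systems for each class. The single-feature threshold screen and the 0.95 cut-off are illustrative choices of ours, not a technique taken from the talk.

```python
import random

def make_records(n=200, seed=1):
    """Hypothetical cat/dog records; `record_id` leaks the label."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        y = rng.randint(0, 1)  # 1 = cat, 0 = dog
        rows.append({
            "weight": rng.gauss(5.0 if y else 8.0, 2.0),  # genuinely, but weakly, predictive
            "noise": rng.random(),                         # unrelated to the label
            # Leak (assumed): IDs issued by separate cat and dog intake systems.
            "record_id": (1000 if y else 5000) + i,
        })
        labels.append(y)
    return rows, labels

def best_single_threshold_accuracy(values, labels):
    """Best accuracy reachable by thresholding one feature, in either direction."""
    n = len(labels)
    best = 0.0
    for cut in sorted(set(values)):
        hits = sum((v >= cut) == bool(y) for v, y in zip(values, labels))
        best = max(best, hits / n, (n - hits) / n)  # (n - hits) covers the flipped rule
    return best

def flag_suspicious_features(rows, labels, limit=0.95):
    """Return feature names that predict the label almost perfectly on their own."""
    flagged = []
    for name in rows[0]:
        score = best_single_threshold_accuracy([r[name] for r in rows], labels)
        if score >= limit:
            flagged.append(name)
    return flagged

rows, labels = make_records()
print(flag_suspicious_features(rows, labels))
```

A flagged feature is not automatically a leak; the point is to force the question of whether the production system would really have that value at prediction time.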
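The overfitting item above can be illustrated with k-nearest neighbors on noisy labels. In this minimal sketch (synthetic one-dimensional data with 20% of labels flipped, both assumptions for illustration), a 1-nearest-neighbor model memorizes the training set, noise included, and scores perfectly on it, while a 15-neighbor model smooths over the noise. The gap between training and test accuracy is the symptom to watch for.

```python
import random

def make_data(n, rng, flip=0.2):
    """Hypothetical data: true class is x > 0.5, but a `flip` fraction of labels is noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        y = int(x > 0.5)
        if rng.random() < flip:
            y = 1 - y  # label noise that a good model should NOT learn
        data.append((x, y))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return int(2 * sum(y for _, y in nearest) > k)

def accuracy(train, rows, k):
    return sum(knn_predict(train, x, k) == y for x, y in rows) / len(rows)

rng = random.Random(7)
train, test = make_data(200, rng), make_data(1000, rng)
results = {}
for k in (1, 15):
    results[k] = (accuracy(train, train, k), accuracy(train, test, k))
    print(f"k={k:2d}: train {results[k][0]:.3f}, test {results[k][1]:.3f}")
```

Here k plays the role of model complexity in reverse: smaller k means a more flexible boundary, and hence more room to fit noise.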
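For the time series case in the data sampling and splitting item above, one common approach is walk-forward validation: train only on observations before a cut-off, test on the next few, then move the cut-off forward. The sketch below is a minimal version with a hypothetical series and a naive "repeat the last value" forecaster standing in for a real model.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) in time order; training always precedes testing."""
    start = initial_train
    while start < n_samples:
        end = min(start + horizon, n_samples)
        yield list(range(start)), list(range(start, end))
        start = end

def walk_forward_mae(series, initial_train, horizon):
    """Replay the series chronologically, scoring a naive last-value forecast by MAE."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), initial_train, horizon):
        last_seen = series[train_idx[-1]]  # only past observations are visible
        errors.extend(abs(series[i] - last_seen) for i in test_idx)
    return sum(errors) / len(errors)

series = [10, 12, 13, 15, 16, 18, 19, 21, 22, 24]  # hypothetical daily values
for train_idx, test_idx in walk_forward_splits(len(series), initial_train=6, horizon=2):
    print(f"train on t=0..{train_idx[-1]}, test on t={test_idx}")
print(walk_forward_mae(series, initial_train=6, horizon=2))  # 2.0
```

A shuffled split of the same series would let the model train on day 9 while being tested on day 7, which is exactly the kind of leakage the item warns about.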
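The flight example in the data quality item above translates naturally into explicit consistency checks. This sketch assumes hypothetical records with `flight_id`, `takeoff` and `landing` fields, and sorts them into clean rows and flagged rows (exact duplicates, missing timestamps, and landings that are not after takeoff) so that each problem can be handled deliberately rather than silently fed to a model.

```python
from datetime import datetime

def check_flights(records):
    """Separate clean flight records from duplicates and logically impossible ones."""
    seen = set()
    clean, problems = [], []
    for rec in records:
        key = (rec["flight_id"], rec["takeoff"], rec["landing"])
        if key in seen:
            problems.append(("duplicate", rec))
            continue
        seen.add(key)
        if rec["takeoff"] is None or rec["landing"] is None:
            problems.append(("missing timestamp", rec))
        elif rec["landing"] <= rec["takeoff"]:
            problems.append(("landing not after takeoff", rec))
        else:
            clean.append(rec)
    return clean, problems

flights = [  # hypothetical records
    {"flight_id": "A1", "takeoff": datetime(2020, 1, 1, 9, 0), "landing": datetime(2020, 1, 1, 11, 30)},
    {"flight_id": "B2", "takeoff": datetime(2020, 1, 1, 14, 0), "landing": datetime(2020, 1, 1, 13, 15)},
    {"flight_id": "A1", "takeoff": datetime(2020, 1, 1, 9, 0), "landing": datetime(2020, 1, 1, 11, 30)},
    {"flight_id": "C3", "takeoff": None, "landing": datetime(2020, 1, 1, 18, 0)},
]
clean, problems = check_flights(flights)
for reason, rec in problems:
    print(reason, rec["flight_id"])
```

Returning the flagged rows with a reason, instead of dropping them, leaves the decision (fix, impute, or discard) to whoever knows the data.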


Summary

  Ben’s “Machine Learning Gremlins” is a short, practical talk.

  You will get a useful crash course in the common pitfalls we are all susceptible to when working on a data problem.

  Source: machinelearningmastery.
