{"id":54,"date":"2017-05-23T18:56:08","date_gmt":"2017-05-23T18:56:08","guid":{"rendered":"http:\/\/bskog.com\/ai\/?p=54"},"modified":"2017-05-23T18:56:08","modified_gmt":"2017-05-23T18:56:08","slug":"decision-trees","status":"publish","type":"post","link":"http:\/\/bskog.com\/ai\/2017\/05\/23\/decision-trees\/","title":{"rendered":"Decision Trees"},"content":{"rendered":"<p>When you have to deal with non-linear decision making, you can use decision trees to transform it into a linear decision surface.<\/p>\n<p>Let say we have a buddy that goes wakeboarding\u00a0if the weather is sunny but\u00a0not windy. Whenever he sees the sun is up he considers going wakeboarding, but if it is too windy there will be too much waves on the lake and it is not as fun as when the surface is still. This data is not linearly separable, as shown in the following image: We can&#8217;t separate this with one line.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-60\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/Ska\u0308rmavbild-2017-05-23-kl.-21.32.13.png\" alt=\"\" width=\"1472\" height=\"628\" \/><\/p>\n<p>Decision trees\u00a0enables you to\u00a0make several linearly separable decisions one after another. In this case when we look at the data we can clearly see that it follows a pattern. First we can see that for instance if it is windy, he will not go wakeboarding regardless of if it is sunny or not. So by asking ourselves, is it windy? we can get a definite answer if it is windy, then we will not go wakeboarding.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-61\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/Ska\u0308rmavbild-2017-05-23-kl.-21.32.28.png\" alt=\"\" width=\"1472\" height=\"634\" \/>If the answer was yes, we had an answer to our question if we shall go wakeboarding, and if it is not windy, we will ask another question: is it sunny? 
And if not, we won&#8217;t go wakeboarding, but if it is sunny, we&#8217;ll head to the lake.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-64\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/Ska\u0308rmavbild-2017-05-23-kl.-21.42.40.png\" alt=\"\" width=\"1424\" height=\"606\" \/><\/p>\n<p>This way, we turned a data set that is not linearly separable into a series of linear decisions by stepping through a decision tree.<\/p>\n<p>&nbsp;<\/p>\n<p>If you then look at slightly more complex data, you can see that after a certain threshold on x the data behaves differently. Here, too, the data is not linearly separable.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-66\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/Ska\u0308rmavbild-2017-05-23-kl.-21.54.00.png\" alt=\"\" width=\"1422\" height=\"642\" \/><\/p>\n<p>This can be made into linear decisions using a decision tree. <img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-67\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/Ska\u0308rmavbild-2017-05-23-kl.-21.53.42.png\" alt=\"\" width=\"1424\" height=\"628\" \/><\/p>\n<p>Decision trees are easy to understand and interpret since they can be visualized as a tree structure. The data needs little preparation, whereas other methods may require normalisation. As we saw earlier, naive Bayes is good for classifying text; a decision tree is good when it comes to numerical and categorical data. Decision trees are, however, prone to overfitting. If some classes dominate in the training data, the generated tree may become biased; in that case the training data needs to be balanced before training.<\/p>\n<p>You want to find split points and variables that separate the dataset into subsets that are as pure as possible, meaning that each subset preferably contains examples of only one class. 
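The purity idea above can be sketched numerically. This is a minimal illustration, not the post&#8217;s own code: the toy weather observations, the `entropy` helper, and the `information_gain` helper are all invented here for the wakeboarding example, using the entropy and information-gain formulas given below.

```python
from math import log2

def entropy(labels):
    # Entropy of a list of class labels: -sum over classes of p_i * log2(p_i)
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs)

def information_gain(parent, children):
    # entropy(parent) minus the weighted average entropy of the child subsets
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Invented toy observations: (sun, wind) -> did he go wakeboarding?
samples = [
    (('sunny', 'windy'), 'stay'),
    (('sunny', 'calm'), 'go'),
    (('cloudy', 'windy'), 'stay'),
    (('cloudy', 'calm'), 'stay'),
]
labels = [label for _, label in samples]

# Candidate split on the wind feature: windy days vs calm days
windy = [label for (sun, wind), label in samples if wind == 'windy']
calm = [label for (sun, wind), label in samples if wind == 'calm']

print(information_gain(labels, [windy, calm]))
```

The windy subset is pure (entropy 0), so splitting on wind first yields a positive information gain; the tree-building procedure would pick whichever feature maximizes this quantity, then recurse on the impure subsets.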
This is then done recursively with the generated subsets until you have classified the data.<\/p>\n<p>Entropy (a measure of impurity) is what a decision tree uses to determine where to split the data when constructing the tree. Entropy is the opposite of purity: if all examples in a sample are of the same class, the entropy is 0; if the examples are <span style=\"text-decoration: underline\">evenly<\/span> split between all the available classes, the entropy is at its maximum (1 in the two-class case).<\/p>\n<p>A decision tree uses the entropy of a node to decide where to split.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-74\" src=\"http:\/\/bskog.com\/ai\/wp-content\/uploads\/2017\/05\/entropy.png\" alt=\"\" width=\"484\" height=\"198\" \/><\/p>\n<p>p<sub>i<\/sub> is the fraction of examples that belong to class i.<\/p>\n<p>&nbsp;<\/p>\n<p>The way a decision tree places its boundaries (choosing which feature to split on) is by maximizing something called information gain:<\/p>\n<blockquote><p>Information gain = entropy(parent) &#8211; [weighted average] entropy(children)<\/p><\/blockquote>\n<p>&nbsp;<\/p>\n<p>See the example in<a href=\"https:\/\/classroom.udacity.com\/courses\/ud120\/lessons\/2258728540\/concepts\/23916986910923\"> this video <\/a>for an explanation of how information gain is used to choose which feature to split on in a decision tree.<\/p>\n<p>&nbsp;<\/p>\n<p>Here you can find more information about decision trees: <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/tree\">http:\/\/scikit-learn.org\/stable\/modules\/tree<\/a><\/p>\n<p>If you use the <a href=\"http:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier\">DecisionTreeClassifier<\/a> in Scikit-learn you can tune the parameters that control how it behaves when 
splitting the training data into tree branches.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When you have to deal with non-linear decision making, you can use decision trees to break the problem down into a sequence of linear decision surfaces. Let&#8217;s say we have a friend who goes wakeboarding if the weather is sunny but not windy. Whenever he sees the sun is out he considers going wakeboarding, but if it is too windy there &hellip; <\/p>\n<p class=\"link-more\"><a href=\"http:\/\/bskog.com\/ai\/2017\/05\/23\/decision-trees\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Decision Trees&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3,32,34],"tags":[],"class_list":["post-54","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-shallow-machine-learning","category-supervised-machine-learning"],"_links":{"self":[{"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/posts\/54","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/comments?post=54"}],"version-history":[{"count":0,"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/posts\/54\/revisions"}],"wp:attachment":[{"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/media?parent=54"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/categories?post=54"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bskog.com\/ai\/wp-json\/wp\/v2\/tags?post=54"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}