Understanding the C4.5 Decision Tree Algorithm

Sumitkrsharma
3 min read · Jan 25, 2022


The C4.5 algorithm is an improvement over the ID3 algorithm; the “C” indicates that the algorithm is written in C, and 4.5 is the version of the algorithm. The splitting criterion used by C4.5 is the normalized information gain (a difference in entropy, normalized by the split information). The attribute with the highest normalized information gain is chosen to make the decision, and the C4.5 algorithm then recurses on the partitioned sublists.

In-depth understanding of the algorithm:

This algorithm has a few base cases.

All the samples in the list belong to the same class. When this happens, it simply creates a leaf node for the decision tree saying to choose that class.

None of the features provide any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.

An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.

Steps in the algorithm (a Python sketch follows this list):

· Check for the above base cases.

· For each attribute a, find the normalized information gain ratio from splitting on a.

· Let a_best be the attribute with the highest normalized information gain.

· Create a decision node that splits on a_best.

· Recurse on the sublists obtained by splitting on a_best, and add those nodes as children of the new node.
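
To make these steps concrete, here is a minimal Python sketch of the same loop. It is only an illustration, not Quinlan’s original C implementation: the data layout (a list of dicts carrying a “Decision” label), the helper names, and the restriction to discrete attributes are assumptions made for this example.

# Illustrative sketch of the C4.5 recursion (discrete attributes only; not the original C code).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr):
    labels = [r["Decision"] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r["Decision"])
    n = len(rows)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attributes):
    labels = [r["Decision"] for r in rows]
    if len(set(labels)) == 1:                       # base case: one class left -> leaf node
        return labels[0]
    if not attributes:                              # base case: nothing left to split on -> majority class
        return Counter(labels).most_common(1)[0][0]
    a_best = max(attributes, key=lambda a: gain_ratio(rows, a))   # highest normalized information gain
    node = {a_best: {}}
    for value in {r[a_best] for r in rows}:         # recurse on each sublist obtained by splitting on a_best
        subset = [r for r in rows if r[a_best] == value]
        node[a_best][value] = build_tree(subset, [a for a in attributes if a != a_best])
    return node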

Let’s walk through the numerical working of the C4.5 algorithm on an example dataset:

Note: The logarithm with base 2 is used in all calculations.

Example dataset (source): https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/

Entropy(Decision) = ∑ -p(I) . log p(I) = -p(Yes) . log p(Yes) - p(No) . log p(No) = -(9/14) . log(9/14) - (5/14) . log(5/14) = 0.940
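
This value can be checked in a couple of lines of Python (the 9 “Yes” and 5 “No” counts come from the example dataset above):

from math import log2
# Entropy of the Decision column: 9 "Yes" and 5 "No" out of 14 instances, base-2 logarithm.
print(round(-(9/14) * log2(9/14) - (5/14) * log2(5/14), 3))   # prints 0.94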

Here, we need to calculate gain ratios instead of gains.

GainRatio(A) = Gain(A) / SplitInfo(A)

SplitInfo(A) = -∑ (|Dj|/|D|) . log(|Dj|/|D|)

Let’s calculate for Wind Attribute:

Gain(Decision, Wind) = Entropy(Decision) - ∑ ( p(Decision|Wind) . Entropy(Decision|Wind) )

Gain(Decision, Wind) = Entropy(Decision) - [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] - [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

Entropy(Decision|Wind=Weak) = -p(No) . log p(No) - p(Yes) . log p(Yes) = -(2/8) . log(2/8) - (6/8) . log(6/8) = 0.811

Entropy(Decision|Wind=Strong) = -(3/6) . log(3/6) - (3/6) . log(3/6) = 1

Gain(Decision, Wind) = 0.940 - (8/14) . (0.811) - (6/14) . (1) = 0.940 - 0.463 - 0.428 = 0.049

There are 8 decisions for weak wind, and 6 decisions for strong wind.

SplitInfo(Decision, Wind) = -(8/14) . log(8/14) - (6/14) . log(6/14) = 0.461 + 0.524 = 0.985

GainRatio(Decision, Wind) = Gain(Decision, Wind) / SplitInfo(Decision, Wind) = 0.049 / 0.985 = 0.049
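
The whole Wind calculation can be reproduced numerically. The class counts used below (2 “No” / 6 “Yes” under weak wind, 3 / 3 under strong wind) are read off the example dataset:

from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

e_decision = entropy([9, 5])                               # 0.940
e_weak, e_strong = entropy([2, 6]), entropy([3, 3])        # 0.811 and 1.0
gain = e_decision - (8/14) * e_weak - (6/14) * e_strong    # ~0.048
split_info = entropy([8, 6])                               # SplitInfo is the entropy of the split sizes: 0.985
print(round(gain / split_info, 3))                         # gain ratio ~0.049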

Similarly, compute the gain ratios for the remaining attributes: Outlook, Humidity (binary split at a threshold, e.g. Humidity > 80) and Temperature (e.g. Temperature > 83).

Here, let’s consider gain as the splitting criterion; you are encouraged to repeat the same steps with the gain ratio.

If we use gain, then Outlook will be the root node because it has the highest gain value.

Perform similar steps for all attributes under the Outlook branches, and the resulting tree looks like this:

Resulting decision tree (source): https://sefiks.com/2018/05/13/a-step-by-step-c4-5-decision-tree-example/
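
Written in the nested-dict form used by the Python sketch above, that tree looks roughly like this (the Humidity threshold follows the > 80 split mentioned earlier; treat it as an approximation of the linked figure, not an exact reproduction):

# Approximate final tree for the example dataset, in the nested-dict form returned by build_tree.
tree = {"Outlook": {
    "Overcast": "Yes",                                        # overcast days are always "Yes"
    "Sunny": {"Humidity": {"> 80": "No", "<= 80": "Yes"}},
    "Rain":  {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}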

Improvements in C4.5 over ID3:

· Handling both continuous and discrete attributes (see the threshold sketch after this list)

· Handling training data with missing attribute values

· Handling attributes with differing costs.

· Pruning trees after creation
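
For the first improvement, C4.5 turns a continuous attribute into a binary test by trying candidate thresholds between consecutive sorted values and keeping the best one. Below is a rough sketch of that idea; the humidity readings are made up only to show the call, and plain information gain is used for brevity where a full C4.5 applies further criteria.

from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(c) / n * log2(labels.count(c) / n) for c in set(labels))

def best_threshold(values, labels):
    # Candidate thresholds are midpoints between consecutive distinct sorted values;
    # keep the one that maximizes the information gain of the resulting binary split.
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(pairs)
    best = (None, 0.0)
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[1]:
            best = (t, gain)
    return best

# Hypothetical humidity readings, only to show the call:
print(best_threshold([65, 70, 75, 80, 85, 90, 95], ["Yes", "Yes", "Yes", "Yes", "No", "No", "No"]))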

Limitations:

The main limitation of C4.5 lies in its use of information entropy: it gives poor results for attributes with a large number of distinct values.

In this post we examined the C4.5 decision tree algorithm in detail, with its definition, pseudo-code and a numerical example. In coming posts we will continue our learning path and explore more machine learning algorithms.

Thank you learners !!!
