Cross entropy

12/2/2023

Things that confused me about cross-entropy

Every once in a while, I try to better understand cross-entropy by skimming over some Medium posts and StackExchange answers. I always come away only half-understanding it.

A major cause of confusion is that different sources use different notations and conventions. In information theory, the cross-entropy for an event with \(M\) discrete outcome classes is \(H(p, q) = -\sum_{x} p(x) \log q(x)\), where the sum runs over the \(M\) classes, \(p\) is the true distribution, and \(q\) is the approximating distribution.

Superficially, the binary-class formula looks a lot like the first formula, but it’s actually just a clever and somewhat confusing way of writing the second formula for binary classes. Its notation differs from the previous formulas in a few ways:

First, this formula sums over events, whereas the first formula in this section only sums over classes.

Second, because it is summing over events, it uses \(p(y_i)\) rather than \(p(y_j)\). Here, \(p(y_i)\) refers to the probability of a positive result on an event. In previous formulas, \(p(y_j)\) referred to the probability of the particular class being considered in the summation loop for that event.

Third, it’s easier to think of \(y_i\) (when not wrapped by a \(p\)) as a label, rather than an indicator variable. When the event has an outcome of 1, \(y_i\) is 1. When it has an outcome of 0, \(y_i\) is 0. In previous formulas, as you loop through classes, you would set \(y_j\) to 1 whenever the outcome was the class being considered; otherwise you would set it to 0.

The image below summarizes the many confusing differences between these formulas.

I don’t know if there is a single best place to learn about cross-entropy, but below are a few places that were helpful. If you read them, make sure you think about (a) how they are notating the true distribution and the approximating distribution, (b) whether they are writing about single events or multiple events, (c) whether they are writing about binary classes or multiple classes, and (d) whether the \(y\)’s are indicators or labels.

Information theory notation, single event case, multiple classes.
Both notations, both the single and multiple event case, multiple classes.
ML notation, multiple events, binary classes.

When the focusing parameter \(\gamma\) is 0, Focal Loss is equivalent to Cross Entropy. From the experiments, \(\gamma = 2\) worked the best for the authors of the Focal Loss paper. With \(\gamma = 2\), the loss for a predicted probability of 0.9 comes out to 4.5e-4, down-weighted by a factor of 100, and the loss for a predicted probability of 0.6 comes out to 3.5e-2, down-weighted by a factor of 6.25.
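
The Focal Loss figures quoted above can be checked with a short Python sketch. It assumes the basic form FL(p_t) = -(1 - p_t)^gamma * log(p_t), with no class-weighting term, where p_t is the predicted probability of the true class. It also assumes a base-10 logarithm, because that is the base that roughly reproduces the quoted 4.5e-4 and 3.5e-2; a natural logarithm gives larger values. The function names are illustrative and not taken from any library.

```python
import math


def focal_loss(p_t, gamma=2.0, log=math.log10):
    """Basic focal loss for one event (assumed form, no class weighting).

    p_t   -- predicted probability of the true class
    gamma -- focusing parameter; gamma = 0 gives plain cross-entropy
    log   -- logarithm to use (base 10 here, an assumption)
    """
    return -((1.0 - p_t) ** gamma) * log(p_t)


def down_weight_factor(p_t, gamma=2.0):
    """How many times smaller the focal loss is than plain cross-entropy."""
    return 1.0 / (1.0 - p_t) ** gamma


for p_t in (0.9, 0.6):
    print(p_t, focal_loss(p_t), down_weight_factor(p_t))

# With gamma = 0 the modulating factor is 1, so only the log term remains.
print(focal_loss(0.9, gamma=0.0), -math.log10(0.9))
```

Passing `log=math.log` switches to the natural logarithm, which is the more common convention in ML libraries; the down-weighting factors do not depend on the base.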
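
For reference, the ML-notation formulas being compared can be written out as follows. This is a reconstruction from the descriptions in the text, not a copy of the original formulas. It assumes \(N\) events indexed by \(i\) and \(M\) classes indexed by \(j\); the symbol \(H\) and the index ranges are assumptions, and other sources may use different symbols.

Single event, multiple classes, with \(y_j\) as an indicator:

\[ H = -\sum_{j=1}^{M} y_j \log p(y_j) \]

Multiple events, multiple classes, with \(y_{ij}\) as an indicator:

\[ H = -\sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log p(y_{ij}) \]

Multiple events, binary classes, with \(y_i\) as a label and \(p(y_i)\) as the probability of a positive result:

\[ H = -\sum_{i=1}^{N} \left[ y_i \log p(y_i) + (1 - y_i) \log\left(1 - p(y_i)\right) \right] \]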
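
A small numerical check makes the claim concrete that the binary formula is the multi-event, multi-class formula written for two classes. The sketch below uses only the standard library; the function names and the example labels and probabilities are made up for illustration.

```python
import math


def cross_entropy_single(y, p):
    """Single event, multiple classes.

    y -- indicator per class (1 for the class that occurred, else 0)
    p -- predicted probability per class
    Terms with y_j == 0 contribute nothing, so they are skipped.
    """
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y, p) if y_j)


def cross_entropy_events(Y, P):
    """Multiple events, multiple classes: sum the single-event values."""
    return sum(cross_entropy_single(y, p) for y, p in zip(Y, P))


def binary_cross_entropy(labels, p_pos):
    """Multiple events, binary classes.

    labels -- label per event (1 or 0), not an indicator per class
    p_pos  -- predicted probability of a positive result per event
    """
    total = 0.0
    for y_i, p_i in zip(labels, p_pos):
        total -= y_i * math.log(p_i) + (1 - y_i) * math.log(1 - p_i)
    return total


labels = [1, 0, 1]
p_pos = [0.9, 0.2, 0.6]

# Expand labels into indicators with class order [negative, positive].
Y = [[1 - y, y] for y in labels]
P = [[1 - p, p] for p in p_pos]

bce = binary_cross_entropy(labels, p_pos)
ce = cross_entropy_events(Y, P)
print(bce, ce)  # the two values agree up to floating-point error
```

The only difference between the two calls is bookkeeping: the binary version stores one label and one probability per event, while the general version stores an indicator and a probability for every class.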
