160 Questions for Data Science - Part 11

Intro (11/?)

I totally forgot about writing new stuff here, but recently I found these 160 data science interview questions on Hackernoon and decided to answer each of them, to force myself to study all of those interesting topics. I will post my answers (hopefully correct and comprehensible), aiming for ~23 answers every couple of days.

If you spot anything wrong, please contact me!

How to interpret the AU ROC score

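The AU ROC (area under the ROC curve) score ranges from 0 to 1 and summarizes how well a model separates the two classes across all classification thresholds. It can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one: 1 means the classes are perfectly separated, 0.5 means the model is no better than random guessing, and values below 0.5 mean its ranking is inverted. Since it is built from the true positive rate and the false positive rate, each computed within a single class, it is not affected by class imbalance: over-representing one class leaves the AU ROC score unchanged.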
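The probabilistic reading of AU ROC can be checked with a small pure-Python sketch that counts correctly ranked (positive, negative) pairs. The helper name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration; ties get half credit, the usual Mann-Whitney convention. Repeating every negative example ten times, to simulate class imbalance, leaves the score unchanged.

```python
from itertools import product


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5.

    `roc_auc` is a hypothetical helper for illustration, not a library API.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0  # positive ranked above negative
        elif p == n:
            correct += 0.5  # tie: half credit
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
balanced_auc = roc_auc(labels, scores)  # 8 of the 9 pairs are ranked correctly

# Simulate imbalance: repeat every negative example 10 times.
imb_labels = [1, 1, 1] + [0, 0, 0] * 10
imb_scores = [0.9, 0.8, 0.4] + [0.5, 0.3, 0.1] * 10
imbalanced_auc = roc_auc(imb_labels, imb_scores)

print(balanced_auc, imbalanced_auc)  # both equal 8/9 (about 0.889)
```

In practice a library routine such as scikit-learn's `roc_auc_score` would be used; the pairwise loop here is quadratic and only meant to make the interpretation concrete.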

What is the PR (precision-recall) curve

It's the curve obtained by plotting a classification model's precision against its recall as the decision threshold varies. Unlike the ROC curve, it reflects class imbalance, because precision depends on how many negatives are wrongly flagged compared with the positives correctly found. This makes it a better choice than the ROC curve when the classes are imbalanced: the ROC curve is unaffected by the imbalance, so it effectively shows how the model would behave on a balanced dataset rather than on the one it actually faces.
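A minimal pure-Python sketch of how the PR curve is built: sweep the threshold over every distinct score, predict positive when the score is >= the threshold, and record recall and precision. The helper name `pr_curve` and the toy data are hypothetical. Repeating the negatives ten times leaves recall untouched but lowers precision at every threshold that lets negatives through, which is the sensitivity to imbalance that the ROC curve lacks.

```python
def pr_curve(labels, scores):
    """Return a list of (threshold, recall, precision), highest threshold first.

    A sample is predicted positive when its score is >= the threshold.
    `pr_curve` is a hypothetical helper for illustration, not a library API.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        # tp + fp >= 1 because the sample(s) scoring exactly t are predicted positive
        points.append((t, tp / total_pos, tp / (tp + fp)))
    return points


# Hypothetical toy data: 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
balanced = pr_curve(labels, scores)

# Same positives, but every negative repeated 10 times.
imb_labels = [1, 1, 1] + [0, 0, 0] * 10
imb_scores = [0.9, 0.8, 0.4] + [0.5, 0.3, 0.1] * 10
imbalanced = pr_curve(imb_labels, imb_scores)

for (t, r, p), (_, r_imb, p_imb) in zip(balanced, imbalanced):
    print(f"t={t}: recall {r:.2f} vs {r_imb:.2f}, precision {p:.2f} vs {p_imb:.2f}")
```

Both datasets share the same set of distinct scores, so the two curves line up threshold by threshold in the printout.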

What is the area under the PR curve, is it a useful metric

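Yes. Just like the AU ROC, the area under the PR curve summarizes the skill of a model across all thresholds in a single number from 0 to 1, and higher is better. Unlike the AU ROC, though, its value for a random (no-skill) classifier is not 0.5 but the fraction of positive examples in the dataset, so a score should always be judged against that baseline.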
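The area under the PR curve is commonly estimated as average precision, AP = sum over thresholds of (R_n - R_{n-1}) * P_n, which matches the formula documented for scikit-learn's `average_precision_score`. The pure-Python helper below (`average_precision`, a hypothetical name) is only a sketch of that formula on hypothetical toy data. A model that gives every sample the same score, i.e. has no skill, gets an AP equal to the fraction of positives, which shows why the baseline moves with the class balance.

```python
def average_precision(labels, scores):
    """Step-wise area under the PR curve: sum of (R_n - R_{n-1}) * P_n.

    Thresholds sweep every distinct score from high to low; a sample is
    predicted positive when its score is >= the threshold.
    `average_precision` is a hypothetical helper, not a library API.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        recall, precision = tp / total_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # recall step width times its precision
        prev_recall = recall
    return ap


# Hypothetical toy data: balanced (3 vs 3) and imbalanced (3 vs 30).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
imb_labels = [1, 1, 1] + [0, 0, 0] * 10
imb_scores = [0.9, 0.8, 0.4] + [0.5, 0.3, 0.1] * 10

# A no-skill model (constant score) gets the positive fraction: 3/6 vs 3/33.
print(average_precision(labels, [0.5] * len(labels)))
print(average_precision(imb_labels, [0.5] * len(imb_labels)))

# The same ranking model loses AP once negatives dominate (11/12 vs 29/39).
print(average_precision(labels, scores), average_precision(imb_labels, imb_scores))
```

Note that AP about 0.74 on the imbalanced data is still far above its no-skill baseline of about 0.09, while 0.92 on the balanced data sits against a baseline of 0.5.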