160 Questions for Data Science - Part 8

Intro (8/?)

I totally forgot about writing new stuff here, but recently I found these 160 data science interview questions on Hackernoon and decided to try to answer each one of them, in order to force myself to study all of those interesting topics. I will post my answers (hopefully right and comprehensible), trying to write ~23 answers every couple of days.

If you spot anything wrong, contact me please!

What is a sigmoid, what does it do

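A sigmoid is an S-shaped function that has R (the real numbers) as its domain and a bounded codomain that depends on its form: the logistic function, for example, outputs values in (0, 1), while the hyperbolic tangent outputs values in (-1, 1). Thanks to this characteristic, it can be used to squash any real value into a fixed range. In Machine Learning it is typically used to turn a raw score into a probability-like value, which can then be thresholded to pick one of two possible outputs.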
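As a minimal sketch of the idea, here is the logistic function (one common sigmoid) in plain Python. The names logistic and classify, and the 0.5 threshold, are my own choices for illustration and not part of any library.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid: maps any real x to a value between 0 and 1."""
    # Split on the sign of x so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(x: float, threshold: float = 0.5) -> int:
    """Turn the sigmoid output into one of two labels (threshold is an assumption)."""
    return 1 if logistic(x) >= threshold else 0


for value in (-5.0, 0.0, 5.0):
    print(value, logistic(value), classify(value))
```

In floating point, very large or very small inputs saturate at exactly 1.0 or 0.0, even though the mathematical function never reaches those values.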

How do we evaluate classification models

We evaluate classification models by checking how the predictions made by the model compare with the actual values of the samples. The most naive method is to check the ratio between correct predictions and total samples. This value can be useful, but it is also heavily misleading when one class vastly outnumbers the others.

In those cases, there are more precise evaluations one can get from a model: Precision and Recall. The former is the ratio between the correctly predicted members of a class (true positives) and the sum of true positives and samples wrongly predicted as members of that class (false positives): this value represents how 'valuable' a prediction of the model is for a specific class. The latter is the ratio between the true positives and the sum of true positives and the actual members of the class that the model assigned to another class (false negatives): this value represents how capable the model is of recognizing a class.

When evaluating binary models, the recall value for the positive class is called Sensitivity, while the recall value for the negative class is called Specificity.
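To make these ratios concrete, here is a small sketch in plain Python that counts the four outcomes of a binary classifier and derives the metrics from them. The function names and the toy labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the chosen positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of failing when a class is never predicted or never present."""
    return numerator / denominator if denominator else 0.0


# Toy data (hypothetical): 3 actual positives, 5 actual negatives.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(labels, preds)
precision = safe_ratio(tp, tp + fp)    # how trustworthy a positive prediction is
recall = safe_ratio(tp, tp + fn)       # sensitivity: share of positives found
specificity = safe_ratio(tn, tn + fp)  # recall of the negative class

print(precision, recall, specificity)
```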

More complex evaluation tools are the ROC curve and the AUC (the area under that curve). As an example, we use a binary logistic regression: on a plot with the true positive rate (sensitivity) on the y axis and the false positive rate (1 - specificity) on the x axis, one plots the different values that the model gets by changing the probability threshold that separates the two classes. The ideal plot rises steeply towards the top-left corner, the area of the plot that has the highest true positive rate and the lowest false positive rate, and the closer the AUC is to 1, the better the model separates the two classes.
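The threshold sweep can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: labels are 0/1, both classes are present, and the area is computed with the trapezoidal rule. The names roc_points and auc_trapezoid and the toy scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Use each distinct score as a threshold, from strictest to most permissive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Area under the curve, summing one trapezoid per consecutive pair of points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): predicted probabilities for four samples.
roc_labels = [0, 0, 1, 1]
roc_scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(roc_labels, roc_scores)
print(curve)
print(auc_trapezoid(curve))
```

A model that ranks samples at random would give a curve close to the diagonal and an area around 0.5, so that is the baseline to compare against.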

What is accuracy

Accuracy is the ratio between the number of correct predictions and the total number of samples.
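A quick sketch in plain Python shows both the computation and why this number alone can mislead on imbalanced data. The dataset is invented for illustration: 95 negatives, 5 positives, and a "model" that always predicts the negative class.

```python
def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the actual label."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


# Hypothetical imbalanced dataset: 95 negatives and 5 positives.
imbalanced_labels = [0] * 95 + [1] * 5
always_negative = [0] * 100  # a "model" that never predicts the positive class

acc = accuracy(imbalanced_labels, always_negative)
positives_found = sum(
    1 for actual, predicted in zip(imbalanced_labels, always_negative)
    if actual == 1 and predicted == 1
)

print(acc)              # high accuracy...
print(positives_found)  # ...even though no positive sample was recognized
```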