160 Questions for Data Science - Part 7

Intro (7/?)

I had totally forgotten about writing new stuff here, but I recently found these 160 data science interview questions on Hackernoon and decided to try to answer each of them, to force myself to study all of those interesting topics. I will post my answers (hopefully right and comprehensible), trying to write ~2-3 answers every couple of days.

If you spot anything wrong, contact me please!

What is classification, and which models would you use to solve a classification problem?

Classification is a problem where the output of a model is a categorical variable. This means that, whether the input is continuous or discrete, the output must be one of a finite set of discrete values (the classes).

Some simple and explainable models one can use to solve a classification problem are Decision Trees and Logistic Regression. Other models, more complex and harder to explain, are Support Vector Machines, Random Forests and Neural Networks.
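To make the idea of "continuous input, discrete output" concrete, here is a minimal sketch of the simplest possible decision tree: a single split (a "stump") on one feature. The function name `fit_stump` and the toy data are made up for illustration; a real project would more likely use a library implementation.

```python
def fit_stump(xs, ys):
    """Fit a one-split decision tree (a 'stump') on 1-D inputs with 0/1 labels."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        # Try both label assignments for the two sides of the split.
        for left_label in (0, 1):
            right_label = 1 - left_label
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            # Keep the split with the fewest misclassified training samples.
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


# Hypothetical toy data: hours studied (continuous) -> passed the exam (0 or 1).
hours_studied = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 1, 1]

classify = fit_stump(hours_studied, passed)
print(classify(0.8), classify(3.2))  # prints: 0 1
```

The learned rule is easy to read ("if hours <= threshold then fail, else pass"), which is the kind of explainability mentioned above.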

What is logistic regression, and when do we need to use it?

Logistic regression is a model used to solve classification problems. When classifying into n possible classes, the model outputs, for each class, the probability that the input belongs to it. Each output therefore lies in the range [0, 1], and the outputs sum to 1.
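As a rough sketch of where those probabilities come from: in the multi-class case, logistic regression is usually described as computing one real-valued score per class and passing the scores through the softmax function. The scores below are hypothetical numbers, not the output of a trained model.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    # Subtracting the max does not change the result but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for 3 classes.
scores = [2.0, 1.0, -1.0]
probs = softmax(scores)

print(probs)       # one probability per class, each between 0 and 1
print(sum(probs))  # the probabilities add up to 1 (up to rounding)
```

The class with the highest score also gets the highest probability, so picking the most probable class is the same as picking the highest score.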

For example, binary logistic regression with two features can be pictured as a line (the decision boundary) that divides the feature space: samples on one side of the line are assigned to one class, and samples on the other side are assigned to the other class.
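Here is a small sketch of that picture, assuming two features and hand-picked (not trained) parameters `w` and `b`. With these values the decision boundary is the line x1 = x2: points where the weighted sum is positive get probability above 0.5 and are assigned to class 1.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters, chosen by hand for illustration.
w = (1.0, -1.0)
b = 0.0


def predict_proba(x):
    """Probability that the point x = (x1, x2) belongs to class 1."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return sigmoid(z)


def predict(x):
    """Assign class 1 when the probability is at least 0.5, otherwise class 0."""
    return 1 if predict_proba(x) >= 0.5 else 0


print(predict((2.0, 1.0)))        # one side of the line x1 = x2
print(predict((1.0, 2.0)))        # the other side
print(predict_proba((1.0, 1.0)))  # a point exactly on the boundary
```

A point exactly on the line has a weighted sum of 0, which the sigmoid maps to a probability of 0.5.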

We use it to solve classification problems, especially when we want a simple, explainable model that outputs class probabilities.

Is logistic regression a linear model?

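Yes, in the sense that the model first computes a weighted sum of its inputs (each input multiplied by its parameter, plus a bias term) and only then passes that sum through the sigmoid function. The predicted probability itself is not a linear function of the inputs, but the log-odds of the prediction is, and so the decision boundary is linear. The inputs are never multiplied with each other or combined in other non-linear ways.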
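A quick numerical check of this, as a sketch with a single feature and made-up parameters `w` and `b`: moving the input in equal steps changes the log-odds by the same amount each time (the weight `w`), while the probability changes by different amounts.

```python
import math


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical parameters for a one-feature model.
w, b = 2.0, -1.0


def prob(x):
    return sigmoid(w * x + b)


xs = [0.0, 1.0, 2.0]  # equally spaced inputs
probs = [prob(x) for x in xs]
logits = [log_odds(p) for p in probs]

# Differences between consecutive values.
prob_steps = [probs[1] - probs[0], probs[2] - probs[1]]
logit_steps = [logits[1] - logits[0], logits[2] - logits[1]]

print(prob_steps)   # unequal steps: the probability is not linear in x
print(logit_steps)  # both steps are (approximately) w: the log-odds is linear in x
```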