Thông tin tài liệu:
Lecture "Applied data science: Classification" includes content: Classification - logistic regression review ; classification evaluation metrics; the expected value framework;... We invite you to consult!
Nội dung trích xuất từ tài liệu:
Lecture Applied data science: Classification
Classification
Overview
1. Introduction 8. Validation
2. Application 9. Regularisation
3. EDA 10. Clustering
4. Learning Process 11. Evaluation
5. Bias-Variance Tradeoff 12. Deployment
6. Regression (review) 13. Ethics
7. Classification
Lecture outline
- Classification - Logistic regression review
- Classification evaluation metrics
- The expected value framework
Classification problems
Response is categorical, e.g. credit card default (Yes/No), favourite movie types
(Action/Drama/Animation)
Exemplary techniques - logistic regression, classification tree, K-NN, etc.
Logistic regression formulation
Logistic regression coefficients are estimated by
maximising the likelihood function
Logistic regression example
responding
Yes No
student_Yes 127 2817
student_No 206 6850
Total 333 9667
Training set responding
Yes No
student_Yes 84 1959
student_No 150 4808
Total 234 6767
Test set responding
Yes No
student_Yes 43 858
student_No 56 2042
Total 99 2900
Logistic regression results
Logistic regression results interpretation
Prediction from multiple classifiers
The ROC curve
The ROC curve
Each point corresponds to a confusion matrix
Point A is more ‘conservative’ than B, which is
more ‘conservative’ than C
Points that are closer to the upper left are
preferred. Point (0,1) represents the perfect
classifier
Points along the diagonal represent random
guessing - no classifiers should be in the
lower right
The ROC curves from different classifiers
p n
Predicted Yes 46 12
Predicted No 53 2888
The expected value analytical framework
The targeted marketing example.
Assume that we sell the product for $200, production related cost is $100 and
shipping and handling cost is $1. What would be the minimum probability of
responding we should target.
Expected value of a classifier
Expected value of a classifier
From the above example, let’s use 0.35 as the threshold and assume the matrix of
cost/benefit information is as below. What would be total expected value of the
logistic regression classifier per customer?
Actual Yes Actual No
Predicted Yes $99 $-1
Predicted No $0 $0
The profit curves
Actual Yes Actual No Actual Yes Actual No
Predicted Yes $99 $-1 Predicted Yes $99 $-10
Predicted No $0 $0 Predicted No $0 $0