
Lecture Applied data science: Classification

Pages: 18 | File type: PDF | Size: 846.97 KB


Document information:

The lecture "Applied data science: Classification" covers: classification (logistic regression review); classification evaluation metrics; the expected value framework; ...
Content extracted from the document:
Classification

Overview
1. Introduction
2. Application
3. EDA
4. Learning Process
5. Bias-Variance Tradeoff
6. Regression (review)
7. Classification
8. Validation
9. Regularisation
10. Clustering
11. Evaluation
12. Deployment
13. Ethics

Lecture outline
- Classification
- Logistic regression review
- Classification evaluation metrics
- The expected value framework

Classification problems
The response is categorical, e.g. credit card default (Yes/No) or favourite movie type (Action/Drama/Animation).
Example techniques: logistic regression, classification trees, K-NN, etc.

Logistic regression formulation
Logistic regression coefficients are estimated by maximising the likelihood function.

Logistic regression example
Full data set (training + test), counts of the response "responding":

               Yes      No
student_Yes    127    2817
student_No     206    6850
Total          333    9667

Training set:

               Yes      No
student_Yes     84    1959
student_No     150    4808
Total          234    6767

Test set:

               Yes      No
student_Yes     43     858
student_No      56    2042
Total           99    2900

Logistic regression results

Logistic regression results interpretation

Prediction from multiple classifiers

The ROC curve
The ROC curve plots the true positive rate (y-axis) against the false positive rate (x-axis).
- Each point corresponds to a confusion matrix (one classifier, or one threshold of a scoring classifier).
- Point A is more "conservative" than B, which is more "conservative" than C: a more conservative classifier predicts Yes only on strong evidence, so it makes fewer false positives but also finds fewer true positives.
- Points closer to the upper left are preferred; point (0, 1) represents the perfect classifier.
- Points along the diagonal represent random guessing. No classifier should lie in the lower right: it would be worse than random, and inverting its predictions would move it to the upper left.

The ROC curves from different classifiers

Confusion matrix on the test set (p = actual positives, n = actual negatives; the column totals, 99 and 2,900, match the test set):

                  p       n
Predicted Yes    46      12
Predicted No     53    2888

The expected value analytical framework
The targeted marketing example: assume we sell the product for $200, the production-related cost is $100, and the shipping and handling cost is $1. What is the minimum probability of responding at which we should target a customer?

Expected value of a classifier
From the above example, let's use 0.35 as the threshold and assume the cost/benefit matrix below.
What would be the total expected value of the logistic regression classifier per customer?

                 Actual Yes   Actual No
Predicted Yes        $99         -$1
Predicted No          $0          $0

The profit curves
Cost/benefit matrix 1:

                 Actual Yes   Actual No
Predicted Yes        $99         -$1
Predicted No          $0          $0

Cost/benefit matrix 2 (targeting a non-responder costs $10):

                 Actual Yes   Actual No
Predicted Yes        $99        -$10
Predicted No          $0          $0
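The equations on the "Logistic regression formulation" slide did not survive extraction. As a reference point only, here is the standard textbook form of the model and its likelihood for a binary response coded y_i = 1 (Yes) or 0 (No) with predictors X_1, ..., X_p; the slide's own notation may differ.

```latex
\begin{aligned}
p(X) &= \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}} \\
\log \frac{p(X)}{1 - p(X)} &= \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \\
\ell(\beta_0, \ldots, \beta_p) &= \prod_{i:\, y_i = 1} p(x_i) \prod_{i':\, y_{i'} = 0} \bigl(1 - p(x_{i'})\bigr)
\end{aligned}
```

The coefficient estimates are the values of β0, ..., βp that maximise ℓ; the second line shows that each coefficient is an additive effect on the log-odds of Yes.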
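To make the maximum-likelihood step concrete, the sketch below fits a logistic regression to the training-set counts from the "Logistic regression example" slide. It assumes, purely for illustration, that student status is the only predictor; the extract does not show which predictors the lecture's fitted model used. It maximises the binomial log-likelihood with Newton-Raphson on the two grouped rows rather than on 7,001 individual records, and `fit_logistic_grouped` is a hypothetical helper name.

```python
import math

# Training-set counts from the slide: (student indicator, responders, non-responders).
# Assumption for illustration: student status is the only predictor.
TRAIN = [(1, 84, 1959), (0, 150, 4808)]


def sigmoid(z: float) -> float:
    """Logistic function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_grouped(groups, iters: int = 25):
    """Hypothetical helper: maximise the binomial log-likelihood of
    p = sigmoid(b0 + b1 * x) by Newton-Raphson, using grouped counts."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # X^T W X, the negated Hessian
        for x, yes, no in groups:
            n = yes + no
            p = sigmoid(b0 + b1 * x)
            resid = yes - n * p    # observed minus expected responders
            w = n * p * (1.0 - p)  # binomial variance for this group
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X^T W X)^{-1} gradient (2x2 inverse written out)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0, b1 = fit_logistic_grouped(TRAIN)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, odds ratio = {math.exp(b1):.3f}")
print(f"P(Yes | student) = {sigmoid(b0 + b1):.4f}")
print(f"P(Yes | non-student) = {sigmoid(b0):.4f}")
```

Because this model has one parameter per group, the maximum-likelihood fit reproduces the observed response rates (84/2,043 for students, 150/4,958 for non-students), so b1 equals the log odds ratio from the table, about 0.318 (odds ratio about 1.37). Read under this assumption, students' odds of responding are about 37% higher than non-students'.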
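The test-set confusion matrix shown after the ROC slides can be summarised with the usual evaluation metrics. The sketch below treats column p as the positive (Yes) class and column n as the negative (No) class; the helper name `classification_metrics` is illustrative, not from the lecture.

```python
# Test-set confusion matrix from the slide: p = actual Yes, n = actual No.
TP, FP = 46, 12    # predicted Yes
FN, TN = 53, 2888  # predicted No


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Common metrics derived from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),        # true positive rate (recall, sensitivity)
        "fpr": fp / (fp + tn),        # false positive rate (1 - specificity)
        "precision": tp / (tp + fp),  # share of predicted Yes that are actual Yes
    }


m = classification_metrics(TP, FP, FN, TN)
baseline = (FP + TN) / (TP + FP + FN + TN)  # accuracy of always predicting No
for name, value in m.items():
    print(f"{name:>9}: {value:.4f}")
print(f" baseline: {baseline:.4f}")
```

On these counts accuracy is about 97.8%, but a classifier that always predicts No already reaches 2,900/2,999, about 96.7%, while the true positive rate is under 47%. With this much class imbalance, accuracy alone says little, which is why the lecture turns to ROC curves and expected value.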
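Each point on an ROC curve is one confusion matrix. For a classifier that outputs scores, sweeping the threshold from high to low traces the whole curve, moving from conservative (lower left) to liberal (upper right). The sketch below uses hypothetical scores and labels, not data from the lecture, and computes the area under the curve (AUC) with the trapezoidal rule; `roc_points` and `auc` are illustrative names.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping the threshold from the highest score down.
    labels: 1 = positive, 0 = negative. Tied scores cross the threshold together."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical scores and true labels (not from the lecture)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

pts = roc_points(scores, labels)
print(pts)
print(f"AUC = {auc(pts):.3f}")
```

For these eight hypothetical examples the AUC is 0.75, which equals the fraction of (positive, negative) pairs in which the positive example receives the higher score.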
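For the targeted marketing question, targeting a customer who responds earns 200 - 100 - 1 = $99, and targeting one who does not respond costs $1 (the -$1 entry in the cost/benefit matrix). Targeting pays off in expectation when p × 99 - (1 - p) × 1 > 0, i.e. 100p > 1, so the customer's probability of responding must exceed 1%. The sketch below generalises the calculation to any benefit/cost pair; `break_even_probability` is a hypothetical helper name.

```python
from fractions import Fraction

PRICE, PRODUCTION_COST, HANDLING_COST = 200, 100, 1

benefit_if_responds = PRICE - PRODUCTION_COST - HANDLING_COST  # $99
value_if_no_response = -HANDLING_COST                          # -$1


def break_even_probability(benefit: int, cost: int) -> Fraction:
    """Hypothetical helper: the p at which p*benefit + (1 - p)*cost = 0.
    Targeting has positive expected value for any p above this."""
    return Fraction(-cost, benefit - cost)


p_min = break_even_probability(benefit_if_responds, value_if_no_response)
print(p_min, float(p_min))                      # the slide's costs
print(float(break_even_probability(99, -10)))  # profit-curve matrix with -$10
```

With the -$10 cost in the second profit-curve matrix, the break-even probability rises to 10/109, about 9.2%, so fewer customers are worth targeting.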
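The expected value of a classifier weights each cell of the confusion matrix by its entry in the cost/benefit matrix: EV = Σ p(predicted, actual) × v(predicted, actual), with each probability estimated as the cell count divided by the total. The sketch below plugs in the test-set confusion matrix shown after the ROC slides; whether that matrix is the one produced at the 0.35 threshold is an assumption, as the extract does not say.

```python
# Rows: predicted (Yes, No); columns: actual (Yes, No).
# Assumption: this test-set confusion matrix is the one obtained at threshold 0.35.
confusion = [[46, 12],
             [53, 2888]]
# Cost/benefit matrix from the slide, same layout (dollars per customer).
cost_benefit = [[99, -1],
                [0, 0]]


def expected_value(confusion, cost_benefit) -> float:
    """Sum over cells of p(predicted, actual) * value(predicted, actual)."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[i][j] / total * cost_benefit[i][j]
               for i in range(2) for j in range(2))


ev = expected_value(confusion, cost_benefit)
print(f"Expected profit per customer: ${ev:.2f}")
```

Under that assumption the expected profit is (46 × 99 - 12 × 1)/2,999 = 4,542/2,999, about $1.51 per customer. Passing the second profit-curve matrix instead (-$10 for a false positive) gives 4,434/2,999.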
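A profit curve ranks customers by predicted probability, targets the top k, and plots cumulative profit against k (or the fraction targeted). The sketch below compares the two cost/benefit matrices from the profit-curve slide on 14 hypothetical customers; the scores, labels, and helper names are illustrative, not from the lecture.

```python
def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Cumulative profit from targeting the k highest-scoring customers, k = 0..N.
    Customers not targeted contribute $0, matching the 'Predicted No' row."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    profits = [0.0]
    total = 0.0
    for _, responded in ranked:
        total += benefit_tp if responded else cost_fp
        profits.append(total)
    return profits


def best_k(profits):
    """Number of customers to target that maximises profit (first maximum)."""
    return max(range(len(profits)), key=profits.__getitem__)


# 14 hypothetical customers: descending scores, 1 = responded
scores = [1.0 - 0.05 * i for i in range(14)]
labels = [1, 0, 1] + [0] * 10 + [1]

for cost in (-1, -10):
    curve = profit_curve(scores, labels, benefit_tp=99, cost_fp=cost)
    k = best_k(curve)
    print(f"cost per non-responder ${-cost}: best k = {k}, profit = ${curve[k]:.0f}")
```

With a $1 cost per non-responder, profit peaks when all 14 hypothetical customers are targeted; with a $10 cost it peaks after the top 3. A costlier false positive shifts the best cut-off toward targeting fewer customers, consistent with the higher break-even probability.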
