Convexity of ROC curves
Document information:
The paper "Convexity of ROC curves" studies accuracy measures of binary classifiers in the context of artificial intelligence, focusing on receiver operating characteristic (ROC) curves. Using analytical and optimization methods, the authors prove the main result of the paper: "ROC curves of optimal machines are convex".
Content extracted from the document:
National Conference on Earth Sciences and Natural Resources for Sustainable Development (ERSD 2022)

Convexity of ROC curves

Le Bich Phuong^{1,*}, Ha Huu Cao Trinh^{1}, Nguyen Thi Mai Hoa^{2}
1 Hanoi University of Mining and Geology
2 Banking Academy of Vietnam
* Corresponding author. Email: lebichphuong@humg.edu.vn

ABSTRACT
In this work, we are interested in the accuracy measures of binary classifiers in the context of artificial intelligence. More specifically, we are interested in the receiver operating characteristic (ROC) curves. Using analytical and optimization methods, we prove the main result of the paper: "ROC curves of optimal machines are convex".
Keywords: ROC curves; convexity; artificial intelligence.

1. Introduction
Convexity of the ROC curve is not something new: many research papers and monographs have already discussed this property in an empirical way (Nabila Abraham, Naimul Mefraz Khan, 2019; Nabila Abraham, Naimul Mefraz Khan, 2019; Ian Goodfellow, Yoshua Bengio, Aaron Courville, 2016; Max Kuhn, Kjell Johnson, 2013). However, the only place where we found a rigorous theorem on convexity of the ROC curve is (T. Gneiting, Peter Vogel, 2018), where the authors showed that convexity of the ROC curve is equivalent to another natural condition, namely "the conditional event probability (or the likelihood ratio) is nondecreasing". In this paper, we study optimal machines and show that if a machine is optimal then its ROC curve is automatically convex.

2. Preliminaries

2.1. ROC curves
Let us fix some notation for this paper. Denote by $\Omega$ an input space together with some probability measure $P$, and by

$$Y : \Omega \to \{0, 1\} \tag{1}$$

a binary classification problem on $\Omega$. For example, $\Omega$ is the population, and $Y$ is covid-positive (1) or covid-negative (0). $Y$ is often called the ground truth. We want to build a binary machine

$$M : \Omega \to [0, 1] \tag{2}$$

(a test, whose values are in the interval $[0, 1]$) that predicts the value of $Y$. Given a threshold $\sigma \in [0, 1]$, for each element $x \in \Omega$ we put

$$\hat{Y}_\sigma(x) = 1 \ \text{if} \ M(x) \ge \sigma \quad \text{and} \quad \hat{Y}_\sigma(x) = 0 \ \text{if} \ M(x) < \sigma. \tag{3}$$

The performance (accuracy) of the predictor $\hat{Y}_\sigma$ with respect to the ground truth $Y$ can be measured by two basic performance indicators, called the sensitivity (true positive rate) $TP(\sigma)$ and the specificity (true negative rate) $TN(\sigma)$, defined by the following formulas:

$$TP(\sigma) = P(\hat{Y}_\sigma = 1 \mid Y = 1) = \frac{P(M(x) \ge \sigma,\ Y(x) = 1)}{P(Y(x) = 1)}, \tag{4}$$

$$TN(\sigma) = P(\hat{Y}_\sigma = 0 \mid Y = 0) = \frac{P(M(x) < \sigma,\ Y(x) = 0)}{P(Y(x) = 0)}. \tag{5}$$

The curve $ROC : [0, 1] \to [0, 1] \times [0, 1]$ given by the formula

$$ROC(\sigma) = (1 - TN(\sigma),\ TP(\sigma)) \tag{6}$$

is called the ROC (receiver operating characteristic) curve of the machine $M$ in the literature, and is very widely used in many fields. The number $FP(\sigma) = 1 - TN(\sigma)$ is called the false positive rate at the threshold $\sigma$.

The ROC curve goes "backward" from the point $ROC(0) = (1, 1)$ to the point $ROC(1) = (0, 0)$ in the unit square, and the higher the curve, the more accurate the machine. The so-called AUC (area under the curve) is the area of the region under the ROC curve in the unit square, and is a popular measure of the accuracy of the machine. See Figure 1 for an illustration.

2.2. Information projection, sigmoid functions and optimal machines
Conceptually, we can describe a binary machine $M$ as a composition of two steps:

$$M = \Sigma \circ \varphi, \tag{7}$$

where

$$\varphi : \Omega \to \Phi \tag{8}$$

may be called the information projection map from the original data space $\Omega$ to a certain "distilled features space" or information space $\Phi$, and

$$\Sigma : \Phi \to [0, 1] \tag{9}$$

is a function from t ...
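
As a rough numerical illustration of the definitions in Section 2.1, the following Python sketch estimates the true positive rate, the true negative rate and the resulting ROC points from a finite sample, replacing the probabilities by empirical frequencies. The function names (`roc_point`, `roc_curve`, `auc`), the variable name `sigma` for the threshold, the toy data and the use of NumPy are assumptions made for this example and are not taken from the paper. The tie convention (predict 1 when the score is at least the threshold) is likewise an assumption.

```python
import numpy as np


def roc_point(scores, labels, sigma):
    """Empirical (FP, TP) at threshold sigma, predicting 1 iff score >= sigma."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred = scores >= sigma              # thresholded prediction
    pos = labels == 1                   # ground-truth positives
    neg = labels == 0                   # ground-truth negatives
    tp = np.mean(pred[pos])             # frequency of pred = 1 among Y = 1
    tn = np.mean(~pred[neg])            # frequency of pred = 0 among Y = 0
    return 1.0 - tn, tp                 # (false positive rate, true positive rate)


def roc_curve(scores, labels, thresholds):
    """Array of ROC points, one row (FP, TP) per threshold."""
    return np.array([roc_point(scores, labels, s) for s in thresholds])


def auc(points):
    """Area under the sampled ROC points using the trapezoidal rule."""
    # Sort by FP first and TP second so the curve is traversed left to right.
    order = np.lexsort((points[:, 1], points[:, 0]))
    fp, tp = points[order, 0], points[order, 1]
    return float(np.sum(np.diff(fp) * (tp[1:] + tp[:-1]) / 2.0))


if __name__ == "__main__":
    # Hypothetical toy sample: four machine outputs and their ground truth.
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    toy_labels = [0, 0, 1, 1]
    grid = np.linspace(0.0, 1.0, 101)
    curve = roc_curve(toy_scores, toy_labels, grid)
    print("ROC at threshold 0:", curve[0])
    print("ROC at threshold 1:", curve[-1])
    print("AUC estimate:", auc(curve))
```

On a finite sample the curve is a step function, so the trapezoidal area is only an estimate of the AUC, and its value depends on how finely the thresholds are sampled.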