Document information:
The lecture "Applied data science: Evaluation, deployment, ethics" covers the components of evaluation and validation, evaluation datasets, Simpson's paradox, some ethical issues in ML, and more.
Content extracted from the document:
Lecture Applied data science: Evaluation, deployment, ethics
Applied Data Science
Sonpvh, 2022
1. Introduction
2. Application
3. EDA
4. Learning Process
5. Bias-Variance Tradeoff
6. Regression
7. Classification
8. Validation
9. Regularization
10. Clustering
11. Evaluation
12. Deployment
13. Ethics
Example: loan for a specific purpose
Features:
- Loan amount
+ Interest
+/- Duration
Label Definition: Good/Bad user classification (GOOD, BAD)
DATA MODELING ("BLACKBOX")
• Hypothesis
• Algorithm
• Data (Labels - Features)
Label Collection
Evaluation Metrics
Test
Benchmarking
Deployment - Monitor
[Diagram: project workflow]
• Business Understanding
• Data Understanding
• Data Unify
• Data Analysis
• Data Preparation
• Modeling
• Evaluation
• Deployment
BUSINESS UNDERSTANDING
• Purposes
• Target distribution
• Target/nonTarget definition
• Use cases - constraints
• Evaluation
• …
DATA MODELING ("BLACKBOX")
• Hypothesis
• Algorithm
• Data (Labels - Features)
EVALUATION TESTs
MONITORs
DEPLOYMENT
[Diagram: Validation System]
• Supervisory Examination
• Validation of Output
• Validation of Process
• Components: Model Design, Backtest, Benchmarking, Data Quality, Problems & reports, Business User-case
Source: Studies on the Validation of Internal Rating Systems [1]
DATASET
• Time-based Evaluation
• Slice-based Evaluation
• Product-based Evaluation
• Perturbation Evaluation
• What-if Evaluation
• …
BACKTEST
DATA QUALITY METRICS (Dr. Manjunath T.N, 2011 [3])
• Definition Conformance
• Uniqueness
• Completeness
• Derivation Integrity
• Validity (consistency)
• Accuracy
• Accessibility
• Timeliness
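Three of the data quality metrics listed above (completeness, uniqueness, validity) can be expressed as simple ratios. The following is a minimal sketch, not the definitions from [3]: the function names, the `loans` records, and the validity rule are all hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def completeness(records: List[Record], column: str) -> float:
    """Share of records that have a non-missing value in `column`."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(column) is not None)
    return present / len(records)


def uniqueness(records: List[Record], key: str) -> float:
    """Share of non-missing `key` values that are distinct."""
    values = [r[key] for r in records if r.get(key) is not None]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


def validity(records: List[Record], column: str,
             rule: Callable[[Any], bool]) -> float:
    """Share of non-missing values in `column` that satisfy `rule`."""
    values = [r[column] for r in records if r.get(column) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if rule(v)) / len(values)


# Hypothetical toy records (assumption: not taken from the lecture).
loans = [
    {"loan_id": 1, "amount": 1000.0, "duration_months": 12},
    {"loan_id": 2, "amount": None, "duration_months": 24},   # missing amount
    {"loan_id": 2, "amount": 500.0, "duration_months": -3},  # duplicate id, bad duration
    {"loan_id": 4, "amount": 750.0, "duration_months": 6},
]

report = {
    "completeness(amount)": completeness(loans, "amount"),
    "uniqueness(loan_id)": uniqueness(loans, "loan_id"),
    "validity(duration_months)": validity(
        loans, "duration_months", lambda months: months > 0
    ),
}
print(report)
```

Each function returns a value between 0 and 1, so the metrics can be tracked over time or compared against a threshold agreed with the business side.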
Time-based evaluation
[Figure: along the time axis, the TRAIN period is followed by the TEST period]
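A time-based split of this kind can be sketched as follows. This is an illustrative assumption about how such a split might be implemented, not code from the lecture: `time_based_split`, the `issued` field, and the toy loan records are hypothetical.

```python
from datetime import date
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def time_based_split(
    records: List[Record], time_key: str, cutoff: date
) -> Tuple[List[Record], List[Record]]:
    """Put records before `cutoff` in TRAIN and the rest in TEST."""
    train = [r for r in records if r[time_key] < cutoff]
    test = [r for r in records if r[time_key] >= cutoff]
    return train, test


# Hypothetical toy records (assumption: not taken from the lecture).
loans = [
    {"loan_id": 1, "issued": date(2021, 3, 1), "label": "GOOD"},
    {"loan_id": 2, "issued": date(2021, 9, 15), "label": "BAD"},
    {"loan_id": 3, "issued": date(2022, 1, 10), "label": "GOOD"},
    {"loan_id": 4, "issued": date(2022, 4, 2), "label": "GOOD"},
]

train, test = time_based_split(loans, "issued", cutoff=date(2022, 1, 1))
print([r["loan_id"] for r in train], [r["loan_id"] for r in test])
```

Unlike a random split, every TEST record is later than every TRAIN record, which is closer to how a deployed model is used: it is trained on the past and scored on the future.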
Product-based evaluation
[Figure: products Milk A, Milk B, Milk C, Milk D, with a TEST split]
Slice-based evaluation
• Important segmentations: Age, Gender, Location, …
• Models perform differently across time periods, products, and segments
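The point that a model performs differently across products or segments can be checked by computing the same metric per slice. The sketch below uses accuracy as the metric; `accuracy_by_slice` and the toy rows are hypothetical, and the product names are borrowed from the slide only as labels.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def accuracy_by_slice(rows: List[Row], slice_key: str) -> Dict[Any, float]:
    """Accuracy computed separately for each value of `slice_key`."""
    hits: Dict[Any, int] = defaultdict(int)
    totals: Dict[Any, int] = defaultdict(int)
    for row in rows:
        s = row[slice_key]
        totals[s] += 1
        hits[s] += int(row["label"] == row["prediction"])
    return {s: hits[s] / totals[s] for s in totals}


# Hypothetical predictions (assumption: made-up numbers for illustration).
rows = [
    {"product": "Milk A", "label": 1, "prediction": 1},
    {"product": "Milk A", "label": 0, "prediction": 0},
    {"product": "Milk A", "label": 1, "prediction": 1},
    {"product": "Milk A", "label": 0, "prediction": 0},
    {"product": "Milk B", "label": 1, "prediction": 0},
    {"product": "Milk B", "label": 0, "prediction": 0},
    {"product": "Milk B", "label": 1, "prediction": 0},
    {"product": "Milk B", "label": 0, "prediction": 1},
]

overall = sum(r["label"] == r["prediction"] for r in rows) / len(rows)
per_product = accuracy_by_slice(rows, "product")
print(overall)      # 0.625
print(per_product)  # {'Milk A': 1.0, 'Milk B': 0.25}
```

In this toy data the overall accuracy of 0.625 hides a slice where the model is right only a quarter of the time. The same function works for any segmentation column, such as age group, gender, or location.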
...