Evaluates the performance of the confidence scores generated by one or more aggregation methods. Assumes probabilistic confidence scores for the selected metrics.

confidence_score_evaluation(confidence_scores, outcomes)

Arguments

confidence_scores

A dataframe in the format output by the aggreCAT aggregation methods

outcomes

A dataframe with two columns: paper_id (corresponding to the IDs in confidence_scores) and outcome, containing the known outcomes of the replication studies

Value

Evaluated dataframe with four columns: method (character variable describing the aggregation method), AUC (Area Under the Curve (AUC) score of the ROC curve; see ?precrec::auc), Brier_Score (see ?DescTools::BrierScore), and Classification_Accuracy (classification accuracy measured as pcc, the percent correctly classified; see ?rfUtilities::accuracy).

Examples

if (FALSE) {
confidence_score_evaluation(data_ratings,
                            data_outcomes)
}
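As a rough sketch of the expected input shapes, the toy example below builds both dataframes by hand and evaluates them. The score column name (cs), the method label, and all values here are illustrative assumptions, not taken from the package data:

```r
library(aggreCAT)

# Hypothetical aggregated confidence scores, mimicking the format
# produced by the aggreCAT aggregation methods (assumed columns:
# method, paper_id, cs).
confidence_scores <- data.frame(
  method   = "ArMean",                       # assumed method label
  paper_id = c("p1", "p2", "p3", "p4"),
  cs       = c(0.9, 0.2, 0.7, 0.4)           # fabricated scores in [0, 1]
)

# Known replication outcomes keyed by the same paper IDs:
# 1 = replicated, 0 = did not replicate.
outcomes <- data.frame(
  paper_id = c("p1", "p2", "p3", "p4"),
  outcome  = c(1, 0, 1, 0)
)

# Returns one row per method with AUC, Brier_Score,
# and Classification_Accuracy.
confidence_score_evaluation(confidence_scores, outcomes)
```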