model_tests.FEAT.SubgroupDisparity¶
SubgroupDisparity Objects¶
@dataclass
class SubgroupDisparity(ModelTest)
Test whether the maximum difference or ratio of a specified metric between any two subgroups of a protected attribute exceeds the given threshold.
If chi2 is used, the p-value from a chi-square test of independence should be greater than the significance level specified by the threshold.
Arguments:
attr
- Column name of the protected attribute.
metric
- Type of performance metric for the test. For classification problems, choose from 'fpr' (false positive rate), 'fnr' (false negative rate) or 'pr' (positive rate). For regression problems, choose from 'mse' (mean squared error) or 'mae' (mean absolute error).
method
- Type of method for the test. Choose from 'chi2', 'ratio' or 'diff'.
threshold
- Threshold for the maximum difference or ratio, or the significance level of the chi-square test.
test_name
- Name of the test. Default is 'Subgroup Disparity Test'.
test_desc
- Description of the test. If none is provided, a description is generated automatically from the other arguments.
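The 'diff' and 'ratio' methods described above can be illustrated with a minimal sketch. It is not the library's implementation: the record layout, the helper names `false_positive_rates` and `max_disparity`, and the sample data are all hypothetical, and plain Python is used instead of a dataframe to keep the example self-contained.

```python
from itertools import combinations

# Hypothetical records: (subgroup, truth, prediction) with binary labels.
records = [
    ("A", 0, 1), ("A", 0, 0), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
    ("B", 0, 1), ("B", 0, 1), ("B", 0, 0), ("B", 0, 0), ("B", 1, 0),
]

def false_positive_rates(rows):
    """Return {subgroup: FP / (FP + TN)}, computed over rows with truth == 0."""
    counts = {}  # subgroup -> [false positives, actual negatives]
    for group, truth, pred in rows:
        if truth == 0:
            fp_neg = counts.setdefault(group, [0, 0])
            fp_neg[0] += int(pred == 1)
            fp_neg[1] += 1
    return {g: fp / neg for g, (fp, neg) in counts.items()}

def max_disparity(rates, method):
    """Largest pairwise difference ('diff') or ratio ('ratio') between subgroup rates."""
    pairs = list(combinations(rates.values(), 2))
    if method == "diff":
        return max(abs(a - b) for a, b in pairs)
    if method == "ratio":
        # Larger rate over the smaller one; assumes every rate is non-zero.
        return max(max(a, b) / min(a, b) for a, b in pairs)
    raise ValueError(f"unsupported method: {method}")

rates = false_positive_rates(records)
max_diff = max_disparity(rates, "diff")
max_ratio = max_disparity(rates, "ratio")
```

With this sample data the false positive rates are 0.25 for subgroup A and 0.5 for subgroup B, so the maximum difference is 0.25 and the maximum ratio is 2.0.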
get_metric_dict¶
def get_metric_dict(df: pd.DataFrame) -> Tuple[dict, list]
Calculate metric ratio / difference and size for each subgroup of the protected attribute on a given df.
Arguments:
df
- Dataframe.
Returns:
A dictionary of each subgroup and the calculated ratio or difference.
get_contingency_table¶
def get_contingency_table(df: pd.DataFrame) -> list
Obtain the contingency table of the metric of interest for each subgroup of a protected attribute on a given df.
Arguments:
df
- Dataframe.
Returns:
List of metric values.
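To show how a contingency table feeds a chi-square test of independence, here is a rough sketch in plain Python. The table layout (one row per subgroup, columns for false positives and true negatives), the counts, and the helper name `chi_square_independence` are assumptions made for illustration; the source does not specify the layout that `get_contingency_table` returns. No continuity correction is applied, and the p-value shortcut below only holds for one degree of freedom (a 2x2 table).

```python
import math

def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of subgroup and outcome.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are subgroups, columns are [false positives, true negatives].
table = [[10, 90], [25, 75]]
stat, dof = chi_square_independence(table)

# For exactly one degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
p_value = math.erfc(math.sqrt(stat / 2))
```

Here the statistic is roughly 7.79 with one degree of freedom, which gives a p-value below 0.01, so with a significance level of 0.05 the subgroup and the outcome would not look independent.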
plot¶
def plot(alpha: float = 0.05, save_plots: bool = True)
Plot the metric of interest across the attribute subgroups, and their confidence interval bands.
Arguments:
alpha
- Significance level for confidence interval.
save_plots
- If True, saves the plots to the class instance.
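The source does not state how the confidence interval bands are computed. As one possible reading, the sketch below uses the normal-approximation (Wald) interval for a rate-type metric at significance level `alpha`; the function name `proportion_confidence_interval` and the counts are hypothetical.

```python
from statistics import NormalDist

def proportion_confidence_interval(successes, n, alpha=0.05):
    """Wald interval for a proportion, clipped to the range [0, 1]."""
    rate = successes / n
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (rate * (1 - rate) / n) ** 0.5
    return max(0.0, rate - half_width), min(1.0, rate + half_width)

# Hypothetical subgroup: 25 false positives out of 100 actual negatives.
low, high = proportion_confidence_interval(25, 100)
```

For this subgroup the 95% band runs from about 0.165 to about 0.335 around the observed rate of 0.25. A smaller `alpha` widens the band.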
get_result¶
def get_result(df_test_with_output: pd.DataFrame) -> Dict[str, float]
Calculate the maximum ratio or difference between any two subgroups' metrics, or run the chi-square test on them, for a given df.
Arguments:
df_test_with_output
- Dataframe containing protected attributes with "prediction" and "truth" columns.
run¶
def run(df_test_with_output: pd.DataFrame) -> bool
Runs test by calculating result and evaluating if it passes a defined condition.
Arguments:
df_test_with_output
- Dataframe containing protected attributes with "prediction_probas" and "truth" columns. The protected attribute should not be encoded.
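The pass condition implied by the class description can be sketched as follows. This is an interpretation, not the library's code: the function name `passes` is hypothetical, and treating a value exactly equal to the threshold as a pass for 'diff' and 'ratio' is an assumption.

```python
def passes(method, value, threshold):
    """Decide pass/fail from a result value, following the class description."""
    if method in ("diff", "ratio"):
        # Fail when the maximum difference or ratio exceeds the threshold.
        return value <= threshold
    if method == "chi2":
        # Pass when the p-value is greater than the significance level.
        return value > threshold
    raise ValueError(f"unsupported method: {method}")

ratio_ok = passes("ratio", 2.0, 1.5)    # maximum ratio above the threshold
diff_ok = passes("diff", 0.05, 0.1)     # maximum difference within the threshold
chi2_ok = passes("chi2", 0.2, 0.05)     # p-value above the significance level
```

Note that the threshold means different things per method: an upper bound on disparity for 'diff' and 'ratio', and a lower bound on the p-value for 'chi2'.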