mlm_insights.core.metrics.bias_and_fairness package

Submodules

mlm_insights.core.metrics.bias_and_fairness.class_imbalance module

class mlm_insights.core.metrics.bias_and_fairness.class_imbalance.ClassImbalance(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, feature_values_or_threshold: ~typing.List[str] = <factory>, drop_nan_values: bool = False, feature_value_count: int = 0, total: int = 0, nan_count: int = 0)

Bases: MetricBase

Class imbalance (CI) bias occurs when a feature (also referred to as facet) value d has fewer training samples than another feature value a in the dataset.

The formula for the class imbalance measure is:

$$CI = \frac{n_a - n_d}{n_a + n_d}$$

where,

$n_a$ = the number of members of the insensitive group (feature_values_or_threshold)

$n_d$ = the number of members of the sensitive group

Its values range over the interval [-1, 1]. A value close to 0 indicates a balanced feature. A negative value indicates under-representation of the feature_values_or_threshold group. A positive value indicates reverse bias, i.e. bias towards the insensitive group.
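As a worked check of the formula, the sketch below computes CI directly from the size of the feature_values_or_threshold group and the size of the remaining rows. It assumes every row outside the group belongs to the sensitive group and ignores NaN handling; the function name class_imbalance is illustrative and not part of mlm_insights. For the gender column used in the example further down (three 'F' and four 'M' values, with ['F'] as the group), it gives (3 - 4)/(3 + 4), about -0.143.

```python
from typing import Iterable, Sequence


def class_imbalance(values: Iterable[str], group: Sequence[str]) -> float:
    """Illustrative CI = (na - nd) / (na + nd); not the mlm_insights implementation."""
    group_set = set(group)
    values = list(values)
    # na: members of the feature_values_or_threshold (insensitive) group
    n_a = sum(1 for v in values if v in group_set)
    # nd: every other row (assumption: NaN handling is ignored here)
    n_d = len(values) - n_a
    if n_a + n_d == 0:
        raise ValueError("cannot compute class imbalance on an empty column")
    return (n_a - n_d) / (n_a + n_d)


gender = ['M', 'M', 'F', 'F', 'M', 'M', 'F']
print(round(class_imbalance(gender, ['F']), 3))  # -0.143
```

Rounded to one decimal place this is the -0.1 printed by the library example.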

Configuration

feature_values_or_threshold: List[str]
  • list of categorical values present in the dataset for the given feature

drop_nan_values: boolean
  • flag to exclude NaN values when calculating class imbalance

Exceptions

  • InvalidParameterException
    • if the feature_values_or_threshold group is not present

    • if the feature_values_or_threshold group list is empty
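The two validation rules above can be sketched as a small config check. This is a minimal sketch under stated assumptions: the config is treated as a plain dict, InvalidParameterException is a local stand-in class rather than the library's own, the key string "feature_values_or_threshold" is hypothetical (the real key is the library's CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD constant), and "not present" is read as the key being missing from the configuration.

```python
from typing import Any, Dict, List


class InvalidParameterException(Exception):
    """Local stand-in for the library's exception, defined only for this sketch."""


# Hypothetical key name; the library exposes its own constant for this
CONFIG_KEY = "feature_values_or_threshold"


def validate_config(config: Dict[str, Any]) -> List[str]:
    # Rule 1: the feature_values_or_threshold group must be configured
    if CONFIG_KEY not in config:
        raise InvalidParameterException(f"'{CONFIG_KEY}' must be configured")
    values = list(config[CONFIG_KEY])
    # Rule 2: an empty list leaves nothing to define the insensitive group
    if not values:
        raise InvalidParameterException(f"'{CONFIG_KEY}' must not be empty")
    return values
```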

Limitations

Currently supports only a list of categorical values for a given feature.

Returns

class_imbalance_value: float
  • Class imbalance value.

feature_values_or_threshold: List[str]
  • list of categorical values present in the dataset for the given feature

Examples

import pandas as pd

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType
from mlm_insights.core.metrics.bias_and_fairness.class_imbalance import (
    ClassImbalance,
    CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD,
    CLASS_IMBALANCE_VALUE,
)
from mlm_insights.core.metrics.metric_metadata import MetricMetadata


def main():
    input_schema = {
        'transport': FeatureType(data_type=DataType.STRING,
                                 variable_type=VariableType.NOMINAL),

        'gender': FeatureType(data_type=DataType.STRING,
                              variable_type=VariableType.NOMINAL)

    }

    data_frame = pd.DataFrame({'transport': ['bus', 'bus', 'train', 'walk', 'walk', 'car', 'car'],
                               'gender': ['M', 'M', 'F', 'F', 'M', 'M', 'F']})

    metric_details = MetricDetail(univariate_metric={"gender": [
        MetricMetadata(klass=ClassImbalance, config={CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD: ['F']})
    ]}, dataset_metrics=[])

    runner = (
        InsightsBuilder()
        .with_input_schema(input_schema)
        .with_data_frame(data_frame=data_frame)
        .with_metrics(metrics=metric_details)
        .with_engine(engine=EngineDetail(engine_name='native'))
        .build()
    )

    run_result = runner.run()
    profile = run_result.profile
    profile_json = profile.get_feature('gender').get_metric(
        MetricMetadata(klass=ClassImbalance, config={CONFIG_KEY_FOR_FEATURE_VALUES_OR_THRESHOLD: ["F"]})
    ).get_result()

    print(round(profile_json[CLASS_IMBALANCE_VALUE], 1))
    # -0.1


if __name__ == '__main__':
    main()




Returns the standard metric result as:
{
    'metric_name': 'ClassImbalance',
    'metric_description': 'Class Imbalance Bias',
    'variable_count': 2,
    'variable_names': ['class_imbalance_value', 'feature_values_or_threshold'],
    'variable_types': [CONTINUOUS,NOMINAL],
    'variable_dtypes':[FLOAT,STRING],
    'variable_dimensions': [1,2],
    'metric_data': [<float value>],
    'metadata': {},
    'error': None
}
compute(column: Series, **kwargs: Any) → None

Computes the class imbalance statistics of the given column based on the feature_values_or_threshold group values.

Parameters

column: pd.Series

Input column.
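compute() returns None, which suggests it accumulates state in the documented fields (feature_value_count, total, nan_count) rather than returning a result. The sketch below illustrates that pattern; it is not the library's code. ClassImbalanceCounts and _is_missing are hypothetical names, and the treatment of NaN rows (excluded from the sensitive group when drop_nan_values is True, counted in it otherwise) is an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Iterable, List


def _is_missing(value: Any) -> bool:
    # None and float NaN count as missing (numpy.float64 subclasses float)
    return value is None or (isinstance(value, float) and math.isnan(value))


@dataclass
class ClassImbalanceCounts:
    """Hypothetical stand-in mirroring the documented fields; not the library class."""
    feature_values_or_threshold: List[str] = field(default_factory=list)
    drop_nan_values: bool = False
    feature_value_count: int = 0
    total: int = 0
    nan_count: int = 0

    def compute(self, column: Iterable[Any]) -> None:
        # Accumulate counts so compute() can be called once per batch;
        # a pandas Series also works here because it is iterable
        group = set(self.feature_values_or_threshold)
        for value in column:
            self.total += 1
            if _is_missing(value):
                self.nan_count += 1
            elif value in group:
                self.feature_value_count += 1

    def compute_ci(self) -> float:
        # Assumption: with drop_nan_values=True, NaN rows are left out of nd;
        # otherwise they are counted in the sensitive group
        n_a = self.feature_value_count
        n_d = self.total - n_a - (self.nan_count if self.drop_nan_values else 0)
        if n_a + n_d == 0:
            return math.nan
        return (n_a - n_d) / (n_a + n_d)


metric = ClassImbalanceCounts(feature_values_or_threshold=['F'])
metric.compute(['M', 'M', 'F', 'F'])  # first batch
metric.compute(['M', 'M', 'F'])       # second batch
print(round(metric.compute_ci(), 3))  # -0.143
```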

compute_ci() → float
classmethod create(config: Dict[str, ConfigParameter] | None = None) → ClassImbalance

Factory method to create an object. The configuration is available in config.

Returns

ClassImbalance

An instance of ClassImbalance.

drop_nan_values: bool = False
feature_value_count: int = 0
feature_values_or_threshold: List[str]
get_result(**kwargs: Any) → Dict[str, Any]

Returns class imbalance of input data.

Returns

Dict[str, Any]: class imbalance of the data, in the form {"class_imbalance_value": <ci value>, "feature_values_or_threshold": ['value']}

get_standard_metric_result(**kwargs: Any) → StandardMetricResult

Returns the class imbalance metric in the standard metric format.

Returns

StandardMetricResult: class imbalance Metric in standard format.

merge(other_metric: ClassImbalance, **kwargs: Any) → ClassImbalance

Merges two ClassImbalance metrics into a new one without mutating either input.

Parameters

other_metric: ClassImbalance

Other ClassImbalance metric that needs to be merged.

Returns

ClassImbalance

A new instance of ClassImbalance containing the insensitive group list (feature_values_or_threshold), drop_nan_values, and the combined nan_count, feature_value_count and total counts after merging.
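Because CI is a ratio, two partial CI values cannot simply be averaged; merging the underlying counts and recomputing CI gives the same result as computing it on the combined data. The sketch below shows this with a hypothetical frozen dataclass CICounts whose fields mirror the documented attributes. Raising on mismatched configurations and the NaN rule in ci() are assumptions, not documented library behaviour.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class CICounts:
    """Hypothetical count state; field names mirror the documented attributes."""
    feature_values_or_threshold: Tuple[str, ...]
    drop_nan_values: bool = False
    feature_value_count: int = 0
    total: int = 0
    nan_count: int = 0

    def merge(self, other: "CICounts") -> "CICounts":
        # Assumption: only metrics with the same group and NaN policy are mergeable
        if (self.feature_values_or_threshold != other.feature_values_or_threshold
                or self.drop_nan_values != other.drop_nan_values):
            raise ValueError("cannot merge metrics with different configurations")
        # Counts are additive; replace() builds a new instance, inputs stay untouched
        return replace(
            self,
            feature_value_count=self.feature_value_count + other.feature_value_count,
            total=self.total + other.total,
            nan_count=self.nan_count + other.nan_count,
        )

    def ci(self) -> float:
        n_a = self.feature_value_count
        n_d = self.total - n_a - (self.nan_count if self.drop_nan_values else 0)
        if n_a + n_d == 0:
            raise ValueError("no rows to compute class imbalance on")
        return (n_a - n_d) / (n_a + n_d)


left = CICounts(("F",), feature_value_count=2, total=4)   # M, M, F, F
right = CICounts(("F",), feature_value_count=1, total=3)  # M, M, F
merged = left.merge(right)
print(merged.total, merged.feature_value_count, round(merged.ci(), 3))  # 7 3 -0.143
```

Averaging the per-batch values (0 for left, -1/3 for right) would give -1/6, whereas the merged counts reproduce the -1/7 obtained on the full seven-row column.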

nan_count: int = 0
total: int = 0

Module contents