Test Configuration
==================

So far, we have seen computations that produce metric results. As a machine
learning engineer or data scientist, I would like to apply certain tests or
checks on those metrics to understand whether the metric values are breaching
certain thresholds. For example, I might want to be alerted if the `Min`
metric of a feature is beyond a certain threshold, if the Q1 `Quartiles` value
of a feature in a prediction run is not within the threshold of the
corresponding value in the reference profile, or if the precision score of a
classification model deviates by more than 10% from the precision score of
the baseline run.

Tests and test suites enable comprehensive validation of a customer's machine
learning models and data for various types of use cases, such as:

- Data Integrity
- Data Quality
- Model Performance (Classification, Regression)
- Drift
- Correlation, etc.

They provide a structured, easier way to add thresholds on metrics. This can
drive notifications and alerts for continuous model monitoring, allowing
users to take remedial action.

* :ref:`How it works`
* :ref:`Defining Tests for a run`

  - :ref:`Using Application config`
  - :ref:`Using Test config`

* :ref:`Insights Test Types`

  - :ref:`Predicate-based Tests`
  - :ref:`Metric-based Tests`

* :ref:`Understanding Test Configuration`

  - :ref:`Defining Feature Tests`
  - :ref:`Defining Dataset Tests`
  - :ref:`Defining Global Test Tags`
  - :ref:`Defining Predicate-based Tests`
  - :ref:`Defining Metric-based Tests`

* :ref:`List of Available Tests`

  - :ref:`List of Predicate-based Tests`
  - :ref:`List of Metric-based Tests`

* :ref:`Test Results`

  - :ref:`Test Summary`
  - :ref:`Test Result`

.. _How it works:

-----------------
How it works
-----------------

1. The user initiates one or more baseline/prediction runs.
2. The user works with tests, test conditions, thresholds, test results and
   test reports.
3. Each test has:

   * A test condition (implicit or user-provided). Examples of user-provided
     conditions are `>=`, `<=`, etc.
     An implicit condition is used when running tests for a specific metric.
   * A threshold (either user-provided or captured from the reference
     profile). For example, the user can provide a value of 200 when
     evaluating the `Mean` of a feature with a greater-than test.
   * A test configuration. Each test can take a test-specific config which
     tweaks its behavior. For example, when using `TestGreaterThan`, the user
     can decide whether to do a `>` or a `>=` comparison by setting the
     appropriate config.

4. Tests are of various types, allowing flexibility and ease of use:

   - :ref:`Predicate-based Tests`
   - :ref:`Metric-based Tests`

5. Tests are executed, producing test evaluation results. Each test
   evaluation result consists of:

   - Test Name
   - Test Description
   - Test Status (Pass/Fail/Error)
   - Test Assertion (expected vs actual)
   - System Tags
   - User-defined Tags
   - Test Configuration (if any)
   - Test Errors (if any)

6. Test results can be stored in a customer-provided bucket.
7. Further post-processors can be added to push alerts to OCI Monitoring
   based on each test evaluation result.

.. _Defining Tests for a run:

------------------------
Defining Tests for a run
------------------------

Tests can be provided in a declarative fashion using JSON format. All the
tests need to be defined under a new key `test_config` in either the
Application Configuration or a separate Test Configuration.

.. _Using Application config:

++++++++++++++++++++++++
Using Application config
++++++++++++++++++++++++

- All the tests need to be defined under a new key `test_config` in the
  Application Configuration.

.. code-block:: JSON

    {
        "input_schema": {...},
        // other components go here
        "test_config": {}
    }

.. _Using Test config:

++++++++++++++++++
Using Test config
++++++++++++++++++

- Tests can also be defined in a separate test config, in which case the user
  needs to pass the monitor id along with the test config to be used for the
  run.

.. code-block:: JSON

    {
        "monitor_id": "<monitor_id>",
        "test_config": {}
    }

.. note::
    - The monitor referenced by the monitor/application config and by the
      test config must be the same, else a validation error is raised.
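Putting the two pieces together, a minimal standalone test config might look
like the following sketch. The monitor id, feature name, metric and threshold
here are illustrative placeholders, not values from a real setup:

.. code-block:: JSON

    {
        "monitor_id": "<monitor_id>",
        "test_config": {
            "feature_metric_tests": [
                {
                    "feature_name": "Feature_1",
                    "tests": [
                        {
                            "test_name": "TestGreaterThan",
                            "metric_key": "Min",
                            "threshold_value": 100.0
                        }
                    ]
                }
            ]
        }
    }

The `feature_metric_tests` key and the individual test properties are
described in detail in the sections that follow.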
.. note::
    - If a test config is defined in the application config and a test config
      is also provided for the run using the `TEST_CONFIG_LOCATION` input
      parameter, then the tests defined via the `TEST_CONFIG_LOCATION` input
      parameter are the ones evaluated.

.. _Insights Test Types:

-------------------
Insights Test Types
-------------------

Before we take a deep dive into the test configuration schema, this section
explains the test types. Currently, Insights supports the following test
types:

- Predicate-based Tests
- Metric-based Tests

.. _Predicate-based Tests:

+++++++++++++++++++++
Predicate-based Tests
+++++++++++++++++++++

- General-purpose tests that evaluate a single condition against a single
  metric of a feature.
- Each test provides a single predicate (test condition) of the form:
  `lhs <operator> rhs`
- For example, consider a test to evaluate whether the `Mean` of a feature is
  greater than 100.23. In this case:

  - `lhs` is the value of the `Mean` metric,
  - `rhs` is `100.23`,
  - `<operator>` is greater than (`>`)

- For example, `TestGreaterThan` is a predicate-based test which tests if a
  metric value is greater than a specific threshold.
- For a list of all predicate-based tests and their examples, please refer to
  section :ref:`List of Predicate-based Tests`.
- Allows fetching the compared value (`rhs`) from a dynamic source such as a
  reference profile.

.. _Metric-based Tests:

++++++++++++++++++
Metric-based Tests
++++++++++++++++++

- Tests specific to an Insights metric.
- Has a built-in metric key and test condition.
- For example, `TestIsPositive` is a metric-based test which works on the
  `IsPositive` metric only and tests if a feature has all positive values.
- For a list of all metric-based tests and their examples, please refer to
  section :ref:`List of Metric-based Tests`.
- When no threshold values are provided, fetches the built-in metric values
  from the reference profile.
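Because the metric key and condition are built in, a metric-based test can be
configured with as little as its name, as in this sketch (the same
`TestIsPositive` example appears in the list of metric-based tests later in
this page):

.. code-block:: JSON

    {
        "test_name": "TestIsPositive"
    }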
.. note::
    - The metric associated with any metric-based or predicate-based test
      that is configured by the user must be present in the config. For
      example, the `Count` metric should be present in the config if the user
      wishes to run the `TestIsComplete` test.
    - If the metric associated with a particular metric-based or
      predicate-based test is not found during test execution, the test's
      status is set to `ERROR` and error details are added to the test
      result.

.. _Understanding Test Configuration:

----------------------------------
Understanding Test Configuration
----------------------------------

We will now look at the details of the `test_config` key in the sections
below.

.. _Defining Feature Tests:

++++++++++++++++++++++
Defining Feature Tests
++++++++++++++++++++++

- All Insights tests for a specific feature need to be defined under the
  `feature_metric_tests` key. The general structure is as below:

.. code-block:: JSON

    {
        "test_config": {
            "feature_metric_tests": [
                {
                    "feature_name": "Feature_1",
                    "tests": [
                        {
                            // Each test is defined here
                        }
                    ]
                },
                {
                    "feature_name": "Feature_2",
                    "tests": [
                        {
                            // Each test is defined here
                        }
                    ]
                }
            ]
        }
    }

.. note::
    - The feature name provided in the `feature_name` key must be present in
      the Profile, i.e. it should come from features defined either in
      `input_schema` or via conditional features.
    - If the feature provided in `feature_name` is not found during test
      execution, the test's status is set to `ERROR` and error details are
      added to the test result.

.. _Defining Dataset Tests:

++++++++++++++++++++++++
Defining Dataset Tests
++++++++++++++++++++++++

- All Insights tests for the entire dataset need to be defined under the
  `dataset_metric_tests` key.
- Dataset metric tests are evaluated against dataset metrics.
- The general structure is as below:

.. code-block:: JSON

    {
        "test_config": {
            "dataset_metric_tests": [
                {
                    // Each test is defined here
                },
                {
                    // Each test is defined here
                }
            ]
        }
    }
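As a concrete sketch, the following dataset-level test (drawn from the
`TestGreaterThan` examples later in this page) checks that the run's
`RowCount` exceeds the `RowCount` recorded in the reference profile:

.. code-block:: JSON

    {
        "test_config": {
            "dataset_metric_tests": [
                {
                    "test_name": "TestGreaterThan",
                    "metric_key": "RowCount",
                    "threshold_source": "REFERENCE"
                }
            ]
        }
    }

Since `threshold_source` is set to `REFERENCE`, the reference profile must be
made available to the prediction run.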
.. _Defining Global Test Tags:

++++++++++++++++++++++++++
Defining Global Test Tags
++++++++++++++++++++++++++

- The user can set user-defined, free-form tags for all the tests in the
  `tags` key.
- Both key and value can be any user-defined values of type `string` only.
- These tags are then attached to each test and available in each test's
  `TestResult` via the `user_defined_tags` property.
- The general structure is as below:

.. code-block:: JSON

    {
        "test_config": {
            "tags": {
                "tag_1": "tag_1_value",
                "tag_2": "tag_2_value"
            }
        }
    }

.. _Defining Predicate-based Tests:

++++++++++++++++++++++++++++++++
Defining Predicate-based Tests
++++++++++++++++++++++++++++++++

- A predicate-based test is defined under the `tests` key in
  `feature_metric_tests`, and directly in `dataset_metric_tests`.
- The general structure is as shown below:

.. code-block:: JSON

    {
        "test_name": "<test_name>",
        "metric_key": "<metric_key>",
        "threshold_value": "<threshold_value>",
        "threshold_source": "REFERENCE",
        "threshold_metric": "<threshold_metric>",
        "tags": {
            "key_1": "value_1"
        },
        "config": {}
    }

- The details of each of the above properties are described below:

.. list-table::
   :widths: 20 10 40 30
   :header-rows: 1

   * - Key
     - Required
     - Description
     - Examples
   * - test_name
     - Yes
     - * Insights-provided test name.
       * Must be one of the names defined in section
         :ref:`List of Available Tests`
     - `TestGreaterThan`
   * - metric_key
     - Yes
     - * Metric key on which to run the test evaluation.
       * Each Insights metric is emitted in a standard metric result format.
         The metric key must be one of the values in `variable_names`.
       * If a metric has more than one variable, qualify the metric key with
         the metric name.
       * For example, consider the `Quartiles` metric, which emits the metric
         result shown below. To run a test evaluation against, say, the `q1`
         value, set `metric_key` to `<metric_name>.<variable_name>`, i.e.
         `Quartiles.q1`.

       .. code-block:: JSON

           {
               metric_name: 'Quartiles',
               variable_names: ['q1', 'q2', 'q3'],
               // other details omitted for brevity
           }

     - `Min`, `Quartiles.q1`
   * - threshold_value
     - Yes, if `threshold_metric` is not provided.
       Otherwise No
     - * A static user-defined threshold value against which the metric value
         is compared.
       * The type of the threshold value depends on each predicate-based
         test. For example:

         - For `TestIsBetween`, the user needs to provide a range of values
           as `[min, max]`
         - For `TestGreaterThan`, the user needs to provide a single number
           value
     - `100.0`, `[200, 400]`
   * - threshold_source
     - No
     - * Set `threshold_source` to `REFERENCE` to evaluate the metric value
         against the corresponding metric value from the reference profile.
       * When this is set, ensure the reference profile is made available to
         the prediction run.
     - Always set to `REFERENCE`
   * - threshold_metric
     - Yes, if `threshold_value` is not provided. Otherwise No
     - * Set `threshold_metric` to evaluate the metric value against another
         metric.
       * For example, to test whether the `Min` metric is greater than the
         `Mean` metric, set `metric_key` to `Min` and `threshold_metric` to
         `Mean`.
       * When used in conjunction with `threshold_source` set to `REFERENCE`,
         the metric value for the metric provided in `threshold_metric` is
         fetched from the reference profile.
     - `Min`, `Quartiles.q1`
   * - tags
     - No
     - * The user can set user-defined, free-form tags for a specific test in
         the `tags` key.
       * Both key and value can be any user-defined values of type `string`
         only.
       * These tags are then attached to the test and available in the test's
         `TestResult` via the `user_defined_tags` property.
     - .. code-block:: JSON

           "tags": {
               "key_1": "value_1"
           }

       One can provide multiple tags in the above format.

.. _Defining Metric-based Tests:

++++++++++++++++++++++++++++++++
Defining Metric-based Tests
++++++++++++++++++++++++++++++++

- A metric-based test is defined under the `tests` key in
  `feature_metric_tests`.
- The general structure is as shown below:

.. code-block:: JSON

    {
        "test_name": "<test_name>",
        "threshold_value": "<threshold_value>",
        "tags": {
            "key_1": "value_1"
        }
    }

- The details of each of the above properties are described below:
.. list-table::
   :widths: 20 10 40 30
   :header-rows: 1

   * - Key
     - Required
     - Description
     - Examples
   * - test_name
     - Yes
     - * Insights-provided test name.
       * Must be one of the names defined in section
         :ref:`List of Available Tests`
     - `TestNoNewCategory`
   * - threshold_value
     - No
     - * A static user-defined threshold value against which the metric value
         is compared.
       * The type of the threshold value depends on each metric-based test.
         For example:

         - For `TestIsComplete`, the user needs to provide a single number,
           e.g. 100.0
         - For `TestNoNewCategory`, the user needs to provide a list of
           string values
       * When `threshold_value` is not provided, the general behavior is to
         fetch the corresponding metric from the reference profile.
     - `100.0`, `["cat_a", "cat_b"]`
   * - tags
     - No
     - * The user can set user-defined, free-form tags for a specific test in
         the `tags` key.
       * Both key and value can be any user-defined values of type `string`
         only.
       * These tags are then attached to the test and available in the test's
         `TestResult` via the `user_defined_tags` property.
     - .. code-block:: JSON

           "tags": {
               "key_1": "value_1"
           }

       One can provide multiple tags in the above format.

.. _List of Available Tests:

-------------------------
List of Available Tests
-------------------------

.. _List of Predicate-based Tests:

++++++++++++++++++++++++
Predicate-based Tests
++++++++++++++++++++++++

.. list-table::
   :widths: 10 40 25 25
   :header-rows: 1

   * - Test Name
     - Test Description
     - Test Configuration
     - Examples
   * - TestGreaterThan
     - * Tests if the left value is greater than (or equal to) the right
         value.
       * Is of the form `lhs >[=] rhs`, where lhs = left-hand side and
         rhs = right-hand side.
       * Both left and right values must be one of int, float or boolean.
     - * `strictly : bool`
       * When set to true, the condition is `>=`, else the condition is `>`
       * Default value is false
     - Tests whether the Min metric of a feature >= 7500:

       .. code-block:: JSON

           {
               "test_name": "TestGreaterThan",
               "metric_key": "Min",
               "threshold_value": 7500,
               "config": {
                   "strictly": true
               }
           }

       Tests whether the Min metric of a feature > the Median metric of the
       same feature:

       .. code-block:: JSON

           {
               "test_name": "TestGreaterThan",
               "metric_key": "Min",
               "threshold_metric": "Median"
           }

       Tests whether the Min metric of a feature > p25, i.e. the Q1 metric of
       the same feature:

       .. code-block:: JSON

           {
               "test_name": "TestGreaterThan",
               "metric_key": "Min",
               "threshold_metric": "Quartiles.q1"
           }

       Tests whether the Min metric of a feature > p25, i.e. the Q1 metric of
       the reference profile:

       .. code-block:: JSON

           {
               "test_name": "TestGreaterThan",
               "metric_key": "Min",
               "threshold_metric": "Quartiles.q1",
               "threshold_source": "REFERENCE"
           }

       Tests whether the RowCount metric > the RowCount of the reference
       profile:

       .. code-block:: JSON

           {
               "test_name": "TestGreaterThan",
               "metric_key": "RowCount",
               "threshold_source": "REFERENCE"
           }

   * - TestLessThan
     - * Tests if the left value is less than (or equal to) the right value.
       * Is of the form `lhs <[=] rhs`, where lhs = left-hand side and
         rhs = right-hand side.
       * Both left and right values must be one of int, float or boolean.
     - * `strictly : bool`
       * When set to true, the condition is `<=`, else the condition is `<`
       * Default value is false
     - Tests whether the Min metric of a feature <= 7500:

       .. code-block:: JSON

           {
               "test_name": "TestLessThan",
               "metric_key": "Min",
               "threshold_value": 7500,
               "config": {
                   "strictly": true
               }
           }

       Tests whether the Min metric of a feature < the Median metric of the
       same feature:

       .. code-block:: JSON

           {
               "test_name": "TestLessThan",
               "metric_key": "Min",
               "threshold_metric": "Median"
           }

       Tests whether the Min metric of a feature < p25, i.e. the Q1 metric of
       the same feature:

       .. code-block:: JSON

           {
               "test_name": "TestLessThan",
               "metric_key": "Min",
               "threshold_metric": "Quartiles.q1"
           }

       Tests whether the Min metric of a feature < p25, i.e. the Q1 metric of
       the reference profile:

       .. code-block:: JSON

           {
               "test_name": "TestLessThan",
               "metric_key": "Min",
               "threshold_metric": "Quartiles.q1",
               "threshold_source": "REFERENCE"
           }

   * - TestEqual
     - * Tests if the left value is equal to the right value.
       * Is of the form `lhs == rhs`, where lhs = left-hand side and
         rhs = right-hand side.
       * Both left and right values must be one of int, float or boolean.
     - None
     - Tests whether the Min metric of a feature = 7500:

       .. code-block:: JSON

           {
               "test_name": "TestEqual",
               "metric_key": "Min",
               "threshold_value": 7500
           }

   * - TestIsBetween
     - * Tests if a numerical value is between a minimum and maximum value.
       * Is of the form `min <[=] lhs <[=] max`, where lhs = left-hand side
         and min and max define the range of values.
       * lhs must be one of int, float.
       * rhs must be a list of 2 values, each of which must be one of int or
         float.
     - * `strictly : bool`
       * When set to true, the condition is
         (minimum value <= test value <= maximum value)
       * When set to false, the condition is
         (minimum value < test value < maximum value)
       * Default value is false
     - Tests whether the Min metric of a feature lies within the range 7500
       to 8000:

       .. code-block:: JSON

           {
               "test_name": "TestIsBetween",
               "metric_key": "Min",
               "threshold_value": [7500, 8000],
               "config": {
                   "strictly": true
               }
           }

   * - TestDeviation
     - * Tests if the deviation between two values is within a threshold.
       * Both left and right values must be one of int or float.
       * The right value is fetched from the reference profile for the
         configured metric.
     - * `deviation_threshold : float`
       * The threshold against which the computed deviation is compared
       * Default value is 0.1 (i.e. 10%)
     - * Suppose the Mean metric of a feature is 200.0 in the prediction
         profile and 205.0 in the reference profile.
       * The deviation threshold has been set to 10%, or 0.10.
       * The deviation is calculated as (205.0 - 200.0) / 200.0 = 0.025,
         i.e. 2.5%.
       * The actual deviation is less than the deviation threshold, i.e.
         0.025 < 0.10, so the test passes.

       .. code-block:: JSON

           {
               "test_name": "TestDeviation",
               "metric_key": "Mean",
               "config": {
                   "deviation_threshold": 0.10
               }
           }

.. _List of Metric-based Tests:

++++++++++++++++++++++++
Metric-based Tests
++++++++++++++++++++++++

.. list-table::
   :widths: 10 40 10 15 25
   :header-rows: 1

   * - Test Name
     - Test Description
     - Test Configuration
     - Metric
     - Examples
   * - TestIsComplete
     - * Tests whether the completion percentage of a feature is greater
         than the threshold value (in percentage).
       * The threshold value can either be provided via `threshold_value` or
         validated against the completion % in the reference profile.
     - None
     - `Count`
     - Tests whether the completion percentage of a feature >= 95%, i.e. 95%
       of the values are non-NaN:

       .. code-block:: JSON

           {
               "test_name": "TestIsComplete",
               "threshold_value": 95.0
           }

       Tests whether the completion % of a feature >= the completion % of the
       feature in the reference profile:

       .. code-block:: JSON

           {
               "test_name": "TestIsComplete"
           }

   * - TestIsMatchingInferenceType
     - * Tests whether all the values in a feature match a data type
         specified by the threshold value.
       * The threshold value can either be provided via `threshold_value` or
         validated against the inference type in the reference profile.
       * Accepted values for `threshold_value`: Integer, String, Float,
         Boolean, None.
       * The test errors out if `threshold_value` is `None` and no reference
         profile is provided.
     - None
     - `TypeMetric`
     - Tests whether the type of a feature is `Integer`:

       .. code-block:: JSON

           {
               "test_name": "TestIsMatchingInferenceType",
               "threshold_value": "Integer"
           }

       Tests whether the type of a feature matches the type in the reference
       profile:

       .. code-block:: JSON

           {
               "test_name": "TestIsMatchingInferenceType"
           }

   * - TestIsNegative
     - * Tests whether all the values in a feature are negative.
     - None
     - `IsNegative`
     - .. code-block:: JSON

           {
               "test_name": "TestIsNegative"
           }

   * - TestIsPositive
     - * Tests whether all the values in a feature are positive.
     - None
     - `IsPositive`
     -
       .. code-block:: JSON

           {
               "test_name": "TestIsPositive"
           }

   * - TestIsNonZero
     - * Tests whether all the values in a feature are non-zero.
     - None
     - `IsNonZero`
     - .. code-block:: JSON

           {
               "test_name": "TestIsNonZero"
           }

   * - TestNoNewCategory
     - * Tests whether any new categories are found in a feature for a
         prediction run that are not present in `threshold_value`.
       * The test status is set to `FAILED` if new category(ies) are present.
       * The threshold value can either be provided via `threshold_value`
         (must be a list) or validated against the categories found in the
         reference profile.
       * Use this test for categorical features only.
     - None
     - `TopKFrequentElements`
     - Tests whether the categories in a feature match the `threshold_value`
       list values:

       .. code-block:: JSON

           {
               "test_name": "TestNoNewCategory",
               "threshold_value": ["cat_a", "cat_b"]
           }

       Tests whether the categories in a feature match the values present in
       the reference profile:

       .. code-block:: JSON

           {
               "test_name": "TestNoNewCategory"
           }

.. _Test Results:

----------------
Test Results
----------------

In this section, we describe the test results returned after a successful
run.

.. _Test Summary:

+++++++++++++++
Test Summary
+++++++++++++++

The Test Summary returns the following information about the executed tests:

- Count of tests executed
- Count of passed tests
- Count of failed tests
- Count of error tests

  - Tests error out when test validation fails or an error is encountered
    during test execution

.. _Test Result:

++++++++++++++++++++++++
Test Result
++++++++++++++++++++++++

Each test returns a result in a standard format which includes the following
properties:

.. list-table::
   :widths: 10 50 40
   :header-rows: 1

   * - Key
     - Description
     - Example
   * - name
     - Name of the test
     - `TestGreaterThan`, `TestIsPositive`
   * - description
     - * Test description in a structured format.
       * For predicate-based tests, the descriptions are structured in the
         following formats depending on the test configuration:

         - `The <metric> of feature <feature_name> is <value>. Test
           condition : [predicate condition]`
         - `The <metric> of feature <feature_name> is <value>.
           <threshold_metric> of feature <feature_name> is <threshold_value>.
           Test condition : [predicate condition]`
         - `The <metric> of feature <feature_name> is <value>. <metric> of
           feature <feature_name> is <threshold_value> in Reference profile.
           Test condition is [predicate condition]`
         - `The <metric> is <value>. <metric> is <threshold_value> in
           Reference profile. Test condition is [predicate condition]`
     - * The Min of feature feature_1 is 23.45. Test condition : 23.45 >= 4.5
       * The Min of feature feature_1 is 23.45. Median of feature feature_1
         is 34.5. Test condition : 23.45 >= 34.5
       * The Min of feature feature_1 is 23.45. Min of feature feature_1 is
         4.5 in Reference profile. Test condition is 23.45 >= 4.5
       * The RMSE is 23.45. RMSE is 12.34 in Reference profile. Test
         condition is 23.45 >= 12.34
       * The Min of feature feature_1 is 23.45. Test Condition: 23.45
         deviates by +/- 4% from 1.2
   * - status
     - * Each test, when executed, produces a status which is one of the
         following: PASSED, FAILED, ERROR.
       * When a test passes a given condition, the status is set to `PASSED`.
       * When a test fails a given condition, the status is set to `FAILED`.
       * When test execution encounters an error, the status is set to
         `ERROR`.
     -
   * - Test Assertion Info
     - Each test returns the `expected` and `actual` information, which helps
       in understanding why a particular test passed or failed.
     -
   * - error
     - When a test encounters error(s) during its execution, returns an error
       description.
     -