mlm_insights.core.metrics.datetime_metrics package

Submodules

mlm_insights.core.metrics.datetime_metrics.common module

mlm_insights.core.metrics.datetime_metrics.common.get_maximum(max_date_1: str, max_date_2: str, errors: str, date_format: str = '') → str
mlm_insights.core.metrics.datetime_metrics.common.get_maximum_date(column: Series, errors: str, date_format: str = '', unit: str = '', origin: str = '') → Tuple[str, int]
mlm_insights.core.metrics.datetime_metrics.common.get_minimum(min_date_1: str, min_date_2: str, errors: str, date_format: str = '') → str
mlm_insights.core.metrics.datetime_metrics.common.get_minimum_date(column: Series, errors: str, date_format: str = '', unit: str = '', origin: str = '') → Tuple[str, int]
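Judging only by their signatures, get_maximum and get_minimum take two formatted date strings plus the errors and date_format settings and return one string, which is what combining per-partition results needs. The sketch below illustrates that idea under stated assumptions: combine_max and combine_min are hypothetical names, not this module's API; an empty string is assumed to mean "no date yet"; and datetime.strptime stands in for the pandas parsing the metrics use, with errors="coerce" treating unparsable input as missing and any other value re-raising.

```python
from datetime import datetime
from typing import Callable, List, Optional

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def _parse(value: str, date_format: str, errors: str) -> Optional[datetime]:
    """Parse one date string; '' or (with errors='coerce') an unparsable value gives None."""
    if not value:
        return None
    try:
        return datetime.strptime(value, date_format)
    except ValueError:
        if errors == "coerce":
            return None
        raise


def _combine(pick: Callable[[List[datetime]], datetime], date_1: str, date_2: str,
             errors: str, date_format: str) -> str:
    """Apply min or max to the parsable inputs and format the winner back to a string."""
    parsed = [_parse(d, date_format, errors) for d in (date_1, date_2)]
    valid = [d for d in parsed if d is not None]
    return pick(valid).strftime(date_format) if valid else ""


def combine_max(date_1: str, date_2: str, errors: str = "coerce",
                date_format: str = DEFAULT_FORMAT) -> str:
    """Hypothetical counterpart of get_maximum: the later of two formatted dates."""
    return _combine(max, date_1, date_2, errors, date_format)


def combine_min(date_1: str, date_2: str, errors: str = "coerce",
                date_format: str = DEFAULT_FORMAT) -> str:
    """Hypothetical counterpart of get_minimum: the earlier of two formatted dates."""
    return _combine(min, date_1, date_2, errors, date_format)


print(combine_max("2024-08-05 00:00:00", "2025-01-22 00:00:00"))  # 2025-01-22 00:00:00
print(combine_min("", "2024-11-10 00:00:00"))                     # 2024-11-10 00:00:00
```

Returning the result as a string in date_format, rather than a datetime object, keeps the combined value in the same shape as the inputs, so it can be combined again with the next partition.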

mlm_insights.core.metrics.datetime_metrics.constants module

mlm_insights.core.metrics.datetime_metrics.datetime_duration module

class mlm_insights.core.metrics.datetime_metrics.datetime_duration.DateTimeDuration(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, min_date: str = '', max_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', duration_unit: str = 'D', valid_duration_units: ~typing.List[str] = <factory>, invalid_rows_count: int = 0)

Bases: MetricBase

Feature metric that computes the longest duration in a feature, i.e. the difference between its maximum and minimum date values (MAX - MIN).
Missing (NaN) values are excluded from the computation.
It is an exact univariate metric that can process only the DATETIME and TIMESTAMP data types.

Configuration

duration_unit: str
  • Unit for the output duration. Must be one of Y, M, W, D, h, m, s.

  • Default is D, i.e. days.

errors: str
  • Specifies how to handle values that cannot be parsed as datetimes. Default is “coerce”, i.e. non-parsable dates are treated as NaT.

NOTE: For details on these arguments, see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in the feature schema.

date_format: str

  • Format string for datetime. Default is “%Y-%m-%d %H:%M:%S”

unit: str
  • If the input is a timestamp, specifies its unit. Default is “s”.

origin: str
  • If the input is a timestamp, specifies its origin. Default is “unix”.

Returns

max_duration: float
  • Longest duration for the datetime feature, i.e. MAX - MIN, expressed in duration_unit

invalid_rows_count: int
  • Count of values that are not valid datetimes. This includes missing values, invalid dates, and date values whose format differs from the one specified.

date_format: str
  • Format string used in output

unit: str
  • Unit specified in config

origin: str
  • Origin specified in config

duration_unit: str
  • Duration unit specified in config
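To make the relationship between max_duration and duration_unit concrete, here is a standard-library sketch of "MAX - MIN expressed in duration_unit". It is an illustration, not the metric's implementation: duration_in_unit is a hypothetical helper, and it handles only the fixed-length units W, D, h, m and s. This reference does not say how Y (years) and M (months) are converted, so the sketch rejects them rather than guessing a year or month length.

```python
from datetime import datetime, timedelta

# Fixed-length units only; Y (years) and M (months) have no fixed length,
# so this sketch does not guess how the metric converts them.
UNIT_LENGTHS = {
    "W": timedelta(weeks=1),
    "D": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
    "s": timedelta(seconds=1),
}


def duration_in_unit(min_date: str, max_date: str, duration_unit: str = "D",
                     date_format: str = "%Y-%m-%d %H:%M:%S") -> float:
    """Return MAX - MIN as a float count of duration_unit."""
    if duration_unit not in UNIT_LENGTHS:
        raise ValueError(f"duration_unit {duration_unit!r} is not handled by this sketch")
    delta = datetime.strptime(max_date, date_format) - datetime.strptime(min_date, date_format)
    # Dividing one timedelta by another yields a float.
    return delta / UNIT_LENGTHS[duration_unit]


print(duration_in_unit("2024-08-05 00:00:00", "2025-01-22 00:00:00"))       # 170.0
print(duration_in_unit("2024-08-05 00:00:00", "2025-01-22 00:00:00", "h"))  # 4080.0
```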

Examples

import pandas as pd

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_duration import DateTimeDuration
from mlm_insights.core.metrics.metric_metadata import MetricMetadata

def main():
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(univariate_metric=
                                  {"date_created": [MetricMetadata(klass=DateTimeDuration,
                                                                   # Assumed config key, matching the documented duration_unit option
                                                                   config={'duration_unit': 'D'})]},
                                  dataset_metrics=[])

    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeDuration"])


if __name__ == "__main__":
    main()

#
# Returns the standard metric result as:
# {
#     'metric_name': 'DateTimeDuration',
#     'metric_description': 'Feature Metric to compute the longest duration in terms of min and max date values',
#     'variable_count': 6,
#     'variable_names': ['max_duration', 'invalid_rows_count', 'date_format', 'unit', 'origin', 'duration_unit'],
#     'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#     'variable_dtypes': ['FLOAT', 'INTEGER', 'STRING', 'STRING', 'STRING', 'STRING'],
#     'variable_dimensions': [0, 0, 0, 0, 0, 0],
#     'metric_data': [170.0, 0, '%Y-%m-%d %H:%M:%S', 's', 'unix', 'D'],
#     'metadata': {}
# }

compute(column: Series, **kwargs: Any) → None

Computes the minimum and maximum datetime for the dataset. In case of a partitioned dataset, computes the minimum and maximum datetime for the specific partition.

Parameters

column : pd.Series

Input column.

classmethod create(config: Dict[str, ConfigParameter] | None = None) → DateTimeDuration

Factory method to create an object. The configuration is available in config.

Returns

An instance of DateTimeDuration.

date_format: str = '%Y-%m-%d %H:%M:%S'
duration_unit: str = 'D'
errors: str = 'coerce'
get_result(**kwargs: Any) → Dict[str, Any]

Returns the result of the DateTimeDuration metric.

Returns

Dict[str, Any]: the longest duration (MAX - MIN) in duration_unit, along with the other result fields described above.

get_standard_metric_result(**kwargs: Any) → StandardMetricResult

Returns the DateTimeDuration metric in the standard format.

Returns

StandardMetricResult: the DateTimeDuration metric in the standard format.

classmethod get_supported_variable_types() → List[VariableType]

Returns the list of feature variable types supported by the metric.

Returns

List of feature variable types supported by the metric.

invalid_rows_count: int = 0
max_date: str = ''
merge(other_metric: DateTimeDuration, **kwargs: Any) → DateTimeDuration

Merges two DateTimeDuration metrics into one without mutating either of them.

Parameters

other_metric : DateTimeDuration

The other DateTimeDuration metric to merge.

Returns

DateTimeDuration

A new instance of DateTimeDuration after merging.
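A merge that "does not mutate the others" can be pictured as building a fresh object from the two partial states. The sketch below assumes, without confirmation from this reference, that merging keeps the earlier min_date, the later max_date and the sum of invalid_rows_count; DurationState and merge_states are hypothetical names, and the frozen dataclass only illustrates the non-mutating contract.

```python
from dataclasses import dataclass
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass(frozen=True)
class DurationState:
    """Hypothetical stand-in for the partial state kept by DateTimeDuration."""
    min_date: str = ""          # "" means no valid date seen yet
    max_date: str = ""
    invalid_rows_count: int = 0


def _pick(choose, first: str, second: str) -> str:
    """Apply min/max to the non-empty dates, comparing them as datetimes."""
    present = [d for d in (first, second) if d]
    if not present:
        return ""
    return choose(present, key=lambda d: datetime.strptime(d, DATE_FORMAT))


def merge_states(left: DurationState, right: DurationState) -> DurationState:
    """Combine two partitions into a new state; the frozen inputs stay untouched."""
    return DurationState(
        min_date=_pick(min, left.min_date, right.min_date),
        max_date=_pick(max, left.max_date, right.max_date),
        invalid_rows_count=left.invalid_rows_count + right.invalid_rows_count,
    )


part_1 = DurationState("2024-08-05 00:00:00", "2024-11-10 00:00:00", 0)
part_2 = DurationState("2025-01-22 00:00:00", "2025-01-22 00:00:00", 1)
merged = merge_states(part_1, part_2)
print(merged)
```

Because each partition only needs its own min, max and invalid count, the duration itself can be computed once, after all partitions are merged.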

min_date: str = ''
origin: str = 'unix'
unit: str = 's'
valid_duration_units: List[str]

mlm_insights.core.metrics.datetime_metrics.datetime_max module

class mlm_insights.core.metrics.datetime_metrics.datetime_max.DateTimeMax(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, max_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', invalid_rows_count: int = 0)

Bases: MetricBase

Feature metric that computes the maximum datetime in a column.
Missing (NaN) values are excluded from the computation.
It is an exact univariate metric that can process only the DATETIME and TIMESTAMP data types.

Configuration

errors: str
  • Specifies how to handle values that cannot be parsed as datetimes. Default is “coerce”, i.e. non-parsable dates are treated as NaT.

NOTE: For details on these arguments, see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in the feature schema.

date_format: str

  • Format string for datetime, same format will be used in output. Default is “%Y-%m-%d %H:%M:%S”

unit: str
  • If the input is a timestamp, specifies its unit. Default is “s”.

origin: str
  • If the input is a timestamp, specifies its origin. Default is “unix”.

Returns

datetime_max: str
  • Maximum datetime

invalid_rows_count: int
  • Count of values that are not valid datetimes. This includes missing values, invalid dates, and date values whose format differs from the one specified.

date_format: str
  • Format string used in output

unit: str
  • Unit specified in config

origin: str
  • Origin specified in config
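For timestamp input, unit and origin follow the pandas.to_datetime convention: the number counts units elapsed since the origin. As a standard-library illustration of the defaults (origin “unix”, i.e. 1970-01-01 00:00:00 UTC), the hypothetical from_timestamp helper below converts a numeric value to a string in date_format. It supports only the s, ms and us units and the unix origin; it is not how the metric is implemented.

```python
from datetime import datetime, timedelta, timezone

UNIX_ORIGIN = datetime(1970, 1, 1, tzinfo=timezone.utc)  # origin="unix"
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}


def from_timestamp(value: float, unit: str = "s",
                   date_format: str = "%Y-%m-%d %H:%M:%S") -> str:
    """Interpret value as a count of `unit` since the unix epoch and format it."""
    seconds = value / UNITS_PER_SECOND[unit]
    return (UNIX_ORIGIN + timedelta(seconds=seconds)).strftime(date_format)


print(from_timestamp(1737504000))                # 2025-01-22 00:00:00
print(from_timestamp(1737504000000, unit="ms"))  # 2025-01-22 00:00:00
```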

Examples

import pandas as pd

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_max import DateTimeMax
from mlm_insights.core.metrics.metric_metadata import MetricMetadata

def main():
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(univariate_metric=
                                  {"date_created": [MetricMetadata(klass=DateTimeMax)]},
                                  dataset_metrics=[])

    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeMax"])


if __name__ == "__main__":
    main()

# Returns the standard metric result as:
# {
#     'metric_name': 'DateTimeMax',
#     'metric_description': 'Feature Metric to compute maximum date value',
#     'variable_count': 5,
#     'variable_names': ['datetime_max', 'invalid_rows_count', 'date_format', 'unit', 'origin'],
#     'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#     'variable_dtypes': ['STRING', 'INTEGER', 'STRING', 'STRING', 'STRING'],
#     'variable_dimensions': [0, 0, 0, 0, 0],
#     'metric_data': ['2025-01-22 00:00:00', 0, '%Y-%m-%d %H:%M:%S', 's', 'unix'],
#     'metadata': {}
# }
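The standard result stores variable names and values in parallel lists (variable_names and metric_data), so pairing them gives a convenient name-to-value lookup. The sketch assumes the printed value is a dict of the shape shown above; metric_values is a hypothetical helper, and the literal below copies the names and values from that DateTimeMax output.

```python
from typing import Any, Dict


def metric_values(standard_result: Dict[str, Any]) -> Dict[str, Any]:
    """Pair each variable name with its value from a standard metric result."""
    names = standard_result["variable_names"]
    data = standard_result["metric_data"]
    if len(names) != len(data):
        raise ValueError("variable_names and metric_data differ in length")
    return dict(zip(names, data))


# Subset of the DateTimeMax result shown above.
datetime_max_result = {
    "metric_name": "DateTimeMax",
    "variable_names": ["datetime_max", "invalid_rows_count", "date_format", "unit", "origin"],
    "metric_data": ["2025-01-22 00:00:00", 0, "%Y-%m-%d %H:%M:%S", "s", "unix"],
}
values = metric_values(datetime_max_result)
print(values["datetime_max"])  # 2025-01-22 00:00:00
```

The length check guards against a result whose name and value lists are out of step, which zip would otherwise silently truncate.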

compute(column: Series, **kwargs: Any) → None

Computes the maximum datetime for the dataset. In case of a partitioned dataset, computes the maximum datetime for the specific partition.

Parameters

column : pd.Series

Input column.

classmethod create(config: Dict[str, ConfigParameter] | None = None) → DateTimeMax

Factory method to create an object. The configuration is available in config.

Returns

An instance of DateTimeMax.

date_format: str = '%Y-%m-%d %H:%M:%S'
errors: str = 'coerce'
get_result(**kwargs: Any) → Dict[str, Any]

Returns the result of the DateTimeMax metric.

Returns

Dict[str, Any]: the metric result, with the maximum datetime in the specified format.

get_standard_metric_result(**kwargs: Any) → StandardMetricResult

Returns the DateTimeMax metric in the standard format.

Returns

StandardMetricResult: the DateTimeMax metric in the standard format.

classmethod get_supported_variable_types() → List[VariableType]

Returns the list of feature variable types supported by the metric.

Returns

List of feature variable types supported by the metric.

invalid_rows_count: int = 0
max_date: str = ''
merge(other_metric: DateTimeMax, **kwargs: Any) → DateTimeMax

Merges two DateTimeMax metrics into one without mutating either of them.

Parameters

other_metric : DateTimeMax

The other DateTimeMax metric to merge.

Returns

DateTimeMax

A new instance of DateTimeMax after merging.

origin: str = 'unix'
unit: str = 's'

mlm_insights.core.metrics.datetime_metrics.datetime_min module

class mlm_insights.core.metrics.datetime_metrics.datetime_min.DateTimeMin(config: ~typing.Dict[str, ~mlm_insights.constants.definitions.ConfigParameter] = <factory>, min_date: str = '', unit: str = 's', errors: str = 'coerce', date_format: str = '%Y-%m-%d %H:%M:%S', origin: str = 'unix', invalid_rows_count: int = 0)

Bases: MetricBase

Feature metric that computes the minimum datetime in a column.
Missing (NaN) values are excluded from the computation.
It is an exact univariate metric that can process only the DATETIME and TIMESTAMP data types.

Configuration

errors: str
  • Specifies how to handle values that cannot be parsed as datetimes. Default is “coerce”, i.e. non-parsable dates are treated as NaT.

NOTE: For details on these arguments, see https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html
NOTE: This metric relies on the configuration provided in the feature schema.

date_format: str

  • Format string for datetime, same format will be used in output. Default is “%Y-%m-%d %H:%M:%S”

unit: str
  • If the input is a timestamp, specifies its unit. Default is “s”.

origin: str
  • If the input is a timestamp, specifies its origin. Default is “unix”.

Returns

datetime_min: str
  • Minimum datetime

invalid_rows_count: int
  • Count of values that are not valid datetimes. This includes missing values, invalid dates, and date values whose format differs from the one specified.

date_format: str
  • Format string used in output

unit: str
  • Unit specified in config

origin: str
  • Origin specified in config
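invalid_rows_count covers three kinds of values: missing values, impossible dates, and values written in a different format. The standard-library sketch below classifies a list of strings that way. It is illustrative only: count_invalid is a hypothetical helper, datetime.strptime stands in for pandas.to_datetime, and only None is treated as missing (whether the metric also treats empty strings as missing is not stated here).

```python
from datetime import datetime
from typing import Iterable, Optional


def count_invalid(values: Iterable[Optional[str]],
                  date_format: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Count missing values, impossible dates and values in a different format."""
    invalid = 0
    for value in values:
        if value is None:
            invalid += 1  # missing value
            continue
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            # Raised both for impossible dates (e.g. Feb 30) and for format mismatches.
            invalid += 1
    return invalid


sample = ["2024-08-05 00:00:00", "2024-02-30 00:00:00", "05/08/2024", None]
print(count_invalid(sample))  # 3
```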

Examples

import pandas as pd

from mlm_insights.builder.builder_component import MetricDetail, EngineDetail
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.datetime_metrics.datetime_min import DateTimeMin
from mlm_insights.core.metrics.metric_metadata import MetricMetadata

def main():
    input_schema = {
        'date_created': FeatureType(
            data_type=DataType.DATETIME,
            variable_type=VariableType.DATETIME,
            column_type=ColumnType.INPUT,
            config={'date_format': '%Y-%m-%d %H:%M:%S'})
    }
    data_frame = pd.DataFrame({'date_created': ["2024-08-05", "2025-01-22", "2024-11-10", None]})
    metric_details = MetricDetail(univariate_metric=
                                  {"date_created": [MetricMetadata(klass=DateTimeMin)]},
                                  dataset_metrics=[])

    runner = (InsightsBuilder()
              .with_input_schema(input_schema)
              .with_data_frame(data_frame=data_frame)
              .with_metrics(metrics=metric_details)
              .with_engine(engine=EngineDetail(engine_name="native"))
              .build())

    profile_json = runner.run().profile.to_json()
    feature_metrics = profile_json['feature_metrics']
    print(feature_metrics['date_created']["DateTimeMin"])


if __name__ == "__main__":
    main()

#
# Returns the standard metric result as:
# {
#     'metric_name': 'DateTimeMin',
#     'metric_description': 'Feature Metric to compute minimum date value',
#     'variable_count': 5,
#     'variable_names': ['datetime_min', 'invalid_rows_count', 'date_format', 'unit', 'origin'],
#     'variable_types': ['DATETIME', 'DISCRETE', 'NOMINAL', 'NOMINAL', 'NOMINAL'],
#     'variable_dtypes': ['STRING', 'INTEGER', 'STRING', 'STRING', 'STRING'],
#     'variable_dimensions': [0, 0, 0, 0, 0],
#     'metric_data': ['2024-08-05 00:00:00', 0, '%Y-%m-%d %H:%M:%S', 's', 'unix'],
#     'metadata': {}
# }
compute(column: Series, **kwargs: Any) → None

Computes the minimum datetime for the dataset. In case of a partitioned dataset, computes the minimum datetime for the specific partition.

Parameters

column : pd.Series

Input column.

classmethod create(config: Dict[str, ConfigParameter] | None = None) → DateTimeMin

Factory method to create an object. The configuration is available in config.

Returns

An instance of DateTimeMin.

date_format: str = '%Y-%m-%d %H:%M:%S'
errors: str = 'coerce'
get_result(**kwargs: Any) → Dict[str, Any]

Returns the result of the DateTimeMin metric.

Returns

Dict[str, Any]: the metric result, with the minimum datetime in the specified format.

get_standard_metric_result(**kwargs: Any) → StandardMetricResult

Returns the DateTimeMin metric in the standard format.

Returns

StandardMetricResult: the DateTimeMin metric in the standard format.

classmethod get_supported_variable_types() → List[VariableType]

Returns the list of feature variable types supported by the metric.

Returns

List of feature variable types supported by the metric.

invalid_rows_count: int = 0
merge(other_metric: DateTimeMin, **kwargs: Any) → DateTimeMin

Merges two DateTimeMin metrics into one without mutating either of them.

Parameters

other_metric : DateTimeMin

The other DateTimeMin metric to merge.

Returns

DateTimeMin

A new instance of DateTimeMin after merging.

min_date: str = ''
origin: str = 'unix'
unit: str = 's'

Module contents