Release Notes 

24.3.0 

Features and Improvements 

New: Added explanations for forecasting predictions! (Comparative Feature Importance)
- Added a model-agnostic prediction explainer to support forecasting tasks.
- For any data point within the forecasting horizon, this explainer can indicate which factors increased or decreased the model’s prediction compared to a previous reference point.
- The generated explanations can be retrieved either with show-in-notebook (designed based on a custom waterfall plot) or as a DataFrame.
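
To picture what an explanation relative to a previous reference point looks like, the sketch below uses a made-up linear forecaster. For a linear model, the change in prediction between two points splits exactly into one term per feature, weight times change in value, and these are the bars a waterfall plot would show. The model, feature names, and numbers are assumptions for illustration only; this is not the AutoMLx explainer or its API.

```python
# Hypothetical linear forecaster: prediction = bias + sum(weight * feature).
WEIGHTS = {"lag_1": 0.5, "promo": 20.0, "temperature": -1.0}
BIAS = 10.0


def predict(features):
    """Prediction of the toy linear model for one time step."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())


def comparative_contributions(reference, target):
    """Per-feature change in prediction when moving from reference to target."""
    return {name: WEIGHTS[name] * (target[name] - reference[name]) for name in WEIGHTS}


reference_point = {"lag_1": 100.0, "promo": 0.0, "temperature": 20.0}
target_point = {"lag_1": 110.0, "promo": 1.0, "temperature": 25.0}

contributions = comparative_contributions(reference_point, target_point)
delta = predict(target_point) - predict(reference_point)

# Positive terms pushed the forecast up relative to the reference; negative terms pushed it down.
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {value:+.1f}")
print(f"{'total':12s} {delta:+.1f}")
```

For non-linear models the decomposition is no longer exact, which is where a model-agnostic explainer is needed.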

Bug fixes 

Fixed a bug in initializing the Ray engine on macOS
Resolved a bug where plot_forecast would not show y_train for series without a date-time index
Fixed a bug that was causing Pipeline.refit to always reuse the dataset given as input to Pipeline.train when the two functions were called in sequence
Fixed multiple small bugs to improve reproducibility of results from AutoML.
Fixed a race condition occurring when model training jobs are cancelled (e.g., due to time-budgets)

24.2.0 

Features and Improvements 

Added new AutoRecommender class for supporting recommendation tasks
- Added support for ALS, BPR, ItemKNN, and Trexx models.
Made enhancements to time-series preprocessing and ML forecasting models to improve running time and quality of forecast on long time-series and those of irregular periodicity/seasonality.

Bug fixes 

Added option to provide fixed hyperparameters in a model’s search space. A mix of tunable and non-tunable hyperparameters is also supported.
Fixed a bug that was occasionally causing AutoMLx to raise an exception in case of small time budgets or long computations.
Fixed HyperGD indefinite stall when the parameter space contains a categorical variable with only one possible value
Fixed a bug that was incorrectly raising import errors when some optional dependencies were not installed.
Fixed a bug that was causing forecasting on time-series with an int64 index, as well as auto cross validation, to fail.
Fixed the set of supported search strategies for hyperparameter optimization
Fixed a bug that was causing AutoMLx to select more computationally expensive models in case of small time budgets compared to previous versions.

24.1.0 

Features and Improvements 

Added regression-based ML models for forecasting: ExtratreesForecaster, LGBMForecaster, and XGBForecaster.
Added new train_model and evaluate_model_quality functions to simplify AutoML for business users, which feature:
- The ability to accept data as either a path to a CSV file or a pandas dataframe
- The train_model function will automatically identify, tune, train and return the best model it can for the provided data.
- The evaluate_model_quality function will return the score of the given model on a new, user-provided dataset.
Added James-Stein encoding as a default categorical encoder for classification and regression tasks

Bug fixes 

Fixed a bug that was causing feature selection to fail if any of the feature ranking computations failed, even though only one is required.
Fixed a bug that was causing the cache directory cleanup to occasionally fail with the Ray backend engine
Fixed a bug in AutoML that caused sub-optimal models to be returned occasionally when n_algos_tuned > 1.
Fixed a bug that caused the SARIMAX forecasting model to converge to a poor configuration and return NaN values during model tuning.

Possibly breaking changes 

Calling automlx.init(logger=None) will no longer initialize handlers for Python’s root logger. Backward compatibility is achieved by calling automlx.init(logger="auto") (default behaviour, equivalent to automlx.init()), which will initialize the root logger to log to standard output at the specified loglevel.
The argument summary_frame for AutoForecaster.plot_forecast was renamed to predictions.
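
The logger change can be pictured with the standard library alone. The sketch below shows roughly what initializing the root logger to log to standard output at a given level amounts to. The helper name init_root_logger_to_stdout and the message format are hypothetical and are not part of automlx.

```python
import logging
import sys


def init_root_logger_to_stdout(loglevel=logging.INFO):
    """Hypothetical helper: attach a stdout handler to the root logger."""
    root = logging.getLogger()
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root.addHandler(handler)
    root.setLevel(loglevel)
    return handler


handler = init_root_logger_to_stdout(logging.INFO)
logging.getLogger("my_app").info("root logger now writes to stdout")
```

With logger=None, per the note above, no such handler is installed, so an application that wants log output is expected to configure logging itself.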

Miscellaneous 

The sktime library was upgraded to version 0.24.0.

23.4.1 

Bug fixes 

Fixed incorrect package meta-data.

23.4.0 

Features and Improvements 

Added support for image classification task
- Added support for Torch vision ResNet and EfficientNet models
- Image data is lazily loaded from disk
Added new Ray-based engine
- Support for single-machine and distributed execution of multiple concurrent jobs/trials
- Includes utilities to control the AutoMLx temporary caching directory and ray object spilling settings
- Includes support for caching image dataset transformations to disk. Utilities for controlling the cache directory and related security settings are provided.
Added new express AutoClassifier, AutoRegressor, AutoForecaster and AutoAnomalyDetector classes, which can be imported with, for example, from automlx import AutoClassifier
Adaptive sampling is now skipped in AutoML when it is not needed (that is, if feature selection, HPO and threshold tuning are not active).
AutoML Pipeline now accepts a search_strategy parameter, which determines the search algorithm used by the tuning step. These include all sampling strategies from Optuna, for example, TPEs and NSGA-II.
Added ModelBiasMitigator, a Bias Mitigation tool to help improve a trained model’s fairness metric score. It can be imported with from automlx.fairness.bias_mitigation import ModelBiasMitigator
Added a new log level, sensitive_info (15), which is used to prevent exposing sensitive information in higher log levels (info, warning, etc.)
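
The effect of a level with numeric value 15 can be shown with the standard logging module: it sits between DEBUG (10) and INFO (20), so a logger set to INFO or higher drops records logged at that level. The sketch below registers such a level by hand; the upper-case name SENSITIVE_INFO, the logger name, and the messages are assumptions, and AutoMLx may register or expose its level differently.

```python
import io
import logging

SENSITIVE_INFO = 15  # numeric value from the release note; between DEBUG (10) and INFO (20)
logging.addLevelName(SENSITIVE_INFO, "SENSITIVE_INFO")

stream = io.StringIO()
handler = logging.StreamHandler(stream)
logger = logging.getLogger("sensitive_demo")
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

# At INFO, records logged at level 15 are filtered out.
logger.setLevel(logging.INFO)
logger.log(SENSITIVE_INFO, "column names: ssn, salary")
logger.info("training started")
output_at_info = stream.getvalue()

# Lowering the logger to level 15 makes the same kind of record visible.
logger.setLevel(SENSITIVE_INFO)
logger.log(SENSITIVE_INFO, "column names: ssn, salary")
output_at_sensitive = stream.getvalue()
```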
Threshold tuning has been improved to scale the prediction probabilities instead of modifying the prediction threshold. This means that model prediction probabilities are more interpretable when threshold tuning is enabled.
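
The note does not say how the probabilities are scaled. As one possible illustration for the binary case, the sketch below uses a piecewise-linear map that sends a tuned threshold t to 0.5, so that comparing the scaled probability with 0.5 gives the same label as comparing the raw probability with t. The function name and the mapping itself are assumptions, not the AutoMLx implementation.

```python
def rescale_probability(p, tuned_threshold):
    """Map p in [0, 1] so that tuned_threshold lands on 0.5 (monotone, piecewise linear)."""
    if not 0.0 < tuned_threshold < 1.0:
        raise ValueError("tuned_threshold must be strictly between 0 and 1")
    if p <= tuned_threshold:
        return 0.5 * p / tuned_threshold
    return 0.5 + 0.5 * (p - tuned_threshold) / (1.0 - tuned_threshold)


tuned_threshold = 0.3
raw = [0.1, 0.3, 0.45, 0.9]
scaled = [rescale_probability(p, tuned_threshold) for p in raw]

# Both decision rules give the same labels.
labels_from_raw = [p >= tuned_threshold for p in raw]
labels_from_scaled = [p >= 0.5 for p in scaled]
```

After such a rescaling, a positive prediction always corresponds to a reported probability of at least 0.5, which is the interpretability benefit the note describes.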
The time budget for individual AutoML steps can now be controlled by passing a dictionary with the budget for each individual step to the AutoML time_budget argument.
Added support for TreeSHAP, which provides fast local feature importance explanations for tree-based models.
Added install options automlx[classic], automlx[explain] alongside automlx[forecasting], automlx[onnx], automlx[deep-learning], and automlx[viz]. Install options create minimal-sized wheels for the associated task. You can combine install options if combined functionality is desired, e.g., automlx[forecasting,viz].
Added enhancements to speed up adaptive sampling.
Added improvements (for example, lazy loading and prolific, intelligent sampling) that enable AutoML to run on very large datasets (for example, one billion rows) for classification and regression.
Enhanced local feature importance to compute explanations in parallel when multiple rows are explained together.

Bug fixes 

Fixed a bug in SVC and LinearSVC models that caused prediction probabilities (but not predicted labels) to change depending on the rows passed to Pipeline.predict_proba.
The AutoML Pipeline now raises a warning instead of automatically dropping slow models for large datasets, if the user explicitly passes them into the model_list argument.
Fixed a bug in the local feature importance and counterfactual explainers, ensuring target labels can be passed as strings as well as integers.
Addressed a bug related to the rendering of ipywidgets that prevented some explainer visualizations from loading.

Possibly breaking changes 

The AutoMLx Package now needs to be imported as import automlx instead of import automl
Removed support for the following deprecated items:
- Internal (never-documented) attributes of the AutoML pipeline.
- The dask and spark execution engines and related options.
- The ModelTune interface.
- All Pipeline attributes matching *_trials_, which contain information about the trials performed by the AutoML pipeline. These are replaced by two new dataframe attributes, completed_trials_summary_ and completed_trials_detailed_.
- AutoML optimization levels 1 and 2.
- The Pipeline attribute selected_features_. Instead, users should use selected_features_names_ or selected_features_names_raw_ to access the names of the selected engineered or original features, respectively.
ONNX conversion:
- ONNX models produced from Pipeline objects now take as input a dictionary of NumPy arrays instead of a single tensor. Each array corresponds to one input column of the prediction dataframe.
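
As a rough sketch of the new input format, the snippet below turns a pandas dataframe into a dictionary with one NumPy array per column. The helper name, the (n_rows, 1) shape, and the example columns are assumptions. The dtypes and shapes an exported model actually expects should be read from the model's declared inputs.

```python
import numpy as np
import pandas as pd


def dataframe_to_onnx_inputs(df):
    """Hypothetical helper: one (n_rows, 1) array per dataframe column, keyed by column name."""
    return {str(column): df[column].to_numpy().reshape(-1, 1) for column in df.columns}


prediction_df = pd.DataFrame(
    {"age": [31.0, 45.0, 27.0], "city": ["Zurich", "Austin", "Montreal"]}
)
inputs = dataframe_to_onnx_inputs(prediction_df)

for name, array in inputs.items():
    print(name, array.shape, array.dtype)
```

A dictionary of this kind is what onnxruntime's InferenceSession.run accepts as its input feed, provided the keys match the model's input names.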
The y argument within the explain_prediction method of the tabular explainer is deprecated.

23.2.3 

Possibly breaking changes 

The automlx package has been renamed to “oracle-automlx”. You can still import the package with import automl; however, you will need to install it as pip install oracle-automlx.

23.2.2 

Bug fixes 

Fixed a bug that was causing logging messages to be written to stderr rather than stdout by default

23.2.1 

Features and Improvements 

Added install options automlx[forecasting], automlx[onnx], and automlx[deep-learning] alongside automlx[viz]. Install options create minimal-sized wheels for the associated task. You can combine install options if combined functionality is desired, e.g., automlx[forecasting,viz].

Bug fixes 

Fixed a bug where ETSForecaster could fail the entire pipeline when it failed to converge.
Fixed a bug that caused the pipeline to set the forecast horizon to zero when forecasting short time series (fewer than 8 datapoints).
Fixed a bug that could cause model fit failure for some Seasonal Decompose (e.g., STL) models on short series (shorter than 3 times the seasonality period).
Fixed bug where BoxCox transformer could produce NaNs as the result of inverse transformation.
Fixed a bug that caused the advanced feature importance sampling strategies to raise an exception.
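
The BoxCox fix listed above concerns a general property of the inverse transform. For a parameter lam other than 0 the inverse is (lam * y + 1) ** (1 / lam), which is undefined over the reals when lam * y + 1 is negative and 1 / lam is not an integer. The sketch below reproduces the NaN with NumPy and shows one possible guard, clipping the base to a small positive value. The function names and the guard are assumptions; the note does not say how AutoMLx addressed the problem.

```python
import numpy as np


def inverse_boxcox(y, lam):
    """Plain inverse Box-Cox: (lam * y + 1) ** (1 / lam), or exp(y) when lam == 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    with np.errstate(invalid="ignore"):  # silence the warning; NaN is still returned
        return np.power(lam * y + 1.0, 1.0 / lam)


def inverse_boxcox_guarded(y, lam, eps=1e-9):
    """Same inverse, with the base clipped to stay positive (one possible guard)."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.exp(y)
    base = np.maximum(lam * y + 1.0, eps)
    return np.power(base, 1.0 / lam)


lam = 0.4
forecast_in_transformed_space = np.array([2.0, 0.0, -3.0])  # -3.0 gives lam * y + 1 = -0.2

plain = inverse_boxcox(forecast_in_transformed_space, lam)
guarded = inverse_boxcox_guarded(forecast_in_transformed_space, lam)
print(plain)
print(guarded)
```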

Possibly breaking changes 

Deep-learning models for classification (TorchMLPClassifier, CatboostClassifier, TabNetClassifier), regression (TorchMLPRegressor) and anomaly detection (AutoEncoderOD) now require install option automlx[deep-learning].
If a logger is not pre-initialized or a loglevel is not explicitly stated in init(), then we will log to stderr as is the default behavior in the logging module of Python Standard Library.
Changed the initialization of the logging module to:
- no longer log to file by default;
- not overwrite the global logging configuration if it was already setup.

23.2.0 

Features and Improvements 

Added support for TabNet classifier.
- Training TabNet with CPUs is slow, so it is disabled by default until GPU support is added.
- To enable TabNet, add ‘TabNetClassifier’ to the model_list when initializing the AutoML Pipeline.
New counterfactual Explainer (ACE)
- Added the AutoMLx Counterfactual Explainer (ACE) for classification and anomaly detection tasks.
- ACE is faster and finds more valid counterfactuals than DiCE.
- It is guaranteed to find a counterfactual for each query instance if the reference dataset contains an example with the desired class.
Fairness Feature Importance is now available for tabular datasets! MLExplainer has a new explain_model_fairness() function to compute global feature importance attributions for fairness metrics.
Added threshold tuning for binary and multi-class classification tasks. Threshold Tuning can be enabled by passing threshold_tuning=True to the Pipeline object when it is created.
Python 3.10 support added.

Deprecations 

Removed support for the Uber Orbit forecaster due to instability of its built-in Bayesian inference engine.
Added deprecation warnings to objects that will be removed or replaced in 23.4.0.
- Deprecations include:
  - Internal (never-documented) attributes of the AutoML pipeline.
  - The dask and spark execution engines and related options.
  - The ModelTune interface. Similar functionality can be achieved by using the AutoML pipeline and disabling all stages except the tuning stage.
  - All Pipeline attributes matching *_trials_, which contain information about the trials performed by the AutoML pipeline. These will be replaced by two new dataframe attributes, completed_trials_summary_ and completed_trials_detailed_.
  - AutoML optimization levels 1 and 2.
  - The Pipeline attribute selected_features_. Instead, users should use selected_features_names_ or selected_features_names_raw_ to access the names of the selected engineered or raw features, respectively.
Deprecation warnings can be suppressed using from automl import init; init(check_deprecation_warnings=False)

Miscellaneous 

Bump packages
- fbprophet==0.7.1 to prophet==1.1.2
- torch to 1.13.1
- onnx to 1.12.0
- onnxruntime to 1.12.1

Possibly breaking changes 

score_metric is no longer accepted in the MLExplainer factory function. It is now an optional argument to the TabularExplainer’s explain_model and explain_model_fairness methods.

23.1.1 

Features and Improvements 

Unsupervised anomaly detection
- Implemented N-1 experts for hyperparameter tuning
- Added N-1 experts-based contamination factor identification
Overhauled package documentation

Bug fixes 

Fixed a bug in feature importance explainers for when the dataset contains feature names that are numpy integers and an AutoML pipeline is being explained.

23.1.0 

Features and Improvements 

Fairness metrics are now available to measure bias in both datasets and trained models. Fairness metrics can be imported from automl.fairness.metrics.
Explanations can now be computed from custom user-defined metrics.
Introduced max_tuning_trials option that controls maximum HPO trials per algorithm.
New explainer (Counterfactual)
- Added a model-agnostic counterfactual explainer for classification, regression, and anomaly detection tasks.
- The explainer can find diverse counterfactuals for the desired prediction, while the user is able to choose which features to vary and their permitted range.
- Counterfactual explanations can be visualized either with the What-if explainer or as a dataframe.
Added support for a surrogate explainer for local text explanations.
Code updated to comply with security checks with Python Bandit.
Added catboost as a new classification model.

Bug fixes 

Fixed a bug in LIME’s explanation bar chart where annotations were misplaced when dataset feature names were stringified integers.
Fixed a bug where features would be placed incorrectly on plot axes when visualizing explanations for categorical features.
Deleted internal state to reduce memory consumption in explanations
Fixed a bug where dataset downcasting to int32 and float32 was only applied during training but not for doing the final fit or collecting predictions.
Preprocessing of datetime columns is now much faster.
Fixed a bug where dependencies of automl would initialize a rootLogger on import, preventing subsequent applications from using logging.basicConfig().
Fixed a bug where the AutoTune step would override default params even if it did not find any better params than the default ones.
Propagated dataset downcasting to all relevant pipeline stages, potentially reducing memory consumption for very large datasets.
Changed AutoTune behavior to consider using the default hyperparameters scored at the end of the feature selection step if they performed better than those AutoTune tried within the time budget.

Deprecations 

Added deprecation warnings for the following:
- Some attributes in the pipeline that are not publicly documented.
- Attributes of the pipeline containing trial information, which were renamed to completed_trials_summary_ and completed_trials_detailed_. The stage column is renamed to step.
- Optimization levels of 1 and 2.
- Dask and spark engines and engine options.
- The ModelTune class.
To disable the warnings:
- In the initialization, set the argument check_deprecation_warnings to False.

22.4.2 

Features and Improvements 

Added support for explaining selected features in local and global permutation importance, as well as automatically detecting which features were selected by an AutoML model.
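
For readers unfamiliar with the technique, global permutation importance measures how much a model's error grows when one feature column is shuffled, and restricting it to selected features means computing that only for a chosen subset. The sketch below is a generic, self-contained version with a toy model. The function names, data, and error metric are assumptions, and it does not use the AutoMLx explainer API.

```python
import random


def model(row):
    """Toy model that only uses x0 and x1; x2 is ignored."""
    return 3.0 * row[0] + 1.0 * row[1]


def mean_squared_error(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature_indices, n_repeats=20, seed=0):
    """Average increase in error after shuffling each requested feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets)
    importances = {}
    for j in feature_indices:
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(shuffled, targets) - baseline)
        importances[j] = sum(increases) / n_repeats
    return importances


data_rng = random.Random(42)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [model(r) for r in rows]

# Explain only a chosen subset of features, for example those a pipeline selected.
selected = [0, 2]
importances = permutation_importance(rows, targets, selected)
print(importances)
```

The ignored feature x2 gets an importance of zero, while shuffling x0 increases the error.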

Bug fixes 

Fixed a bug in local perturbation-based feature attribution explainers for the n_iter='auto' option that caused the iterations to be set too high.
Enhanced performance of local feature importance explainers to improve running times by batching inference calls together.

22.4.1 

Features and Improvements 

Pipeline now accepts a min_class_instances input argument to manually specify the number of examples every class must have when doing classification. The value for min_class_instances must be at least 2.

Bug fixes 

Fixed a bug where IPython and ipywidgets were not properly guarded as optional dependencies, which made them required.
Fixed a bug introduced by the last dependency update that caused fbprophet to produce forecasts with an incorrect index type when fbprophet was installed manually.

22.4.0 

Features and Improvements 

New feature dependence explainers
- Added an Accumulated Local Effects (ALE) explainer
- ALE explanations can be computed for up to two features if at least one is not categorical.
New explainer (What-IF)
- Added a What-IF explainer for classification and regression tasks
- What-IF explanations include exploration of the behavior of an ML model on a single sample as well as on the entire dataset.
- Sample exploration (edit a sample value and see how the model predictions change) and relationship visualization (how a feature is related to predictions or other features) are supported.
New feature importance aggregators
- Added ALFI (Aggregate Local Feature Importance) that gives a visual summary of multiple local explanations.
New local feature importance explainer
- Added support for surrogate-based (LIME+) local feature importance explainers

Bug fixes 

Import failure due to CUDA: The package no longer crashes when imported on a machine with CUDA installed.
Fixed a bug where TorchMLPClassifier would fail when trying to predict a single instance.
Fixed a bug where OracleAutoMLx_Forecasting.ipynb would fail if visualization packages were not already installed.
Fixed a bug that caused the pipeline.transform to raise an exception if a single row was passed.
Explanation documentation
- Our documentation website (http://automl.oraclecorp.com/) now includes documentation for the explanation objects returned by our explainers.
Enhanced performance of local feature importance explainers to address long running times.
Improved facet visualization for columns with cardinality equal to 1 by choosing appropriate bar widths and padding.

22.3.0 

Features and Improvements 

New Explainer
- Added support for KernelSHAP (a new feature importance tabulator), which provides fast approximations for the Shapley feature importance method.
Support ARM architecture (aarch64)
- Released platform-specific wheel file for ARM machines.

Miscellaneous 

Clarified documentation on the accepted data formats for input datasets and added a more meaningful corresponding error message.

22.2.0 

Features and Improvements 

New profiler
- Profiler tracks CPU and memory utilization
Timeseries forecasting pipeline
- Added support for multivariate datasets
- Added support for exogenous variables
- Enhanced heteroskedasticity detection technique
- Applied Box-Cox transform-inverse_transform with params determined via MLE to handle heteroskedasticity
Explainers / MLX integration
- New global text explainer
  - Added support
- New feature importance attribution explainers
  - Added several local and global feature importance explainers, including permutation importance, exact Shapley, and SHAP-PI.
  - The explainers support classification, regression and anomaly detection
  - The explainers can also be configured to explain the importance of features to any model (explanation_type='observational') as well as for a particular model (explanation_type='interventional').
  - Observational explanations are supported for all tasks; interventional explanations are only supported for classification and regression.
- New feature dependence explainers
  - Added a partial dependence plot (PDP) and individual conditional expectations (ICE) explainer
  - PDP explanations include visualization support for up to 4 dimensions. PDPs in higher dimension can be returned as dataframes.
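
As background for the PDP and ICE explainer listed above, the sketch below computes both for one feature of a toy model. Each ICE curve is a single row's prediction as the feature is swept over a grid, and the PDP is the average of those curves. The model, data, and function names are assumptions for illustration and do not reflect the AutoMLx API.

```python
def model(row):
    """Toy model: linear in x0, quadratic in x1."""
    return 2.0 * row[0] + row[1] ** 2


def partial_dependence(rows, feature_index, grid):
    """Return (pdp, ice_curves) for one feature swept over the grid values."""
    ice_curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_index] = value  # force the feature to the grid value
            curve.append(model(modified))
        ice_curves.append(curve)
    n = len(ice_curves)
    pdp = [sum(curve[k] for curve in ice_curves) / n for k in range(len(grid))]
    return pdp, ice_curves


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]
pdp, ice = partial_dependence(rows, feature_index=0, grid=grid)
print(pdp)
print(ice)
```

Because the toy model has no interaction between x0 and x1, all ICE curves are parallel and the PDP rises by 2.0 per unit of x0.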
Unsupervised Anomaly Detection
- Added N-1 Experts: a new experimental metric for UAD Model Selection
Documentation
- Added a description of the automl init function to the documentation
- Cleaned up documentation for more consistency among different sections and added cross-references

Bug fixes 

Timeseries forecasting pipeline
- Fixed a statsmodels exception raised for some frequencies; users can now pass in timeperiod as a parameter
Preprocessing
- Datetime preprocessor
  - Fixed the bug regarding column expansion and None/Null/Nan values
- Standard preprocessor refitting
  - The standard preprocessor used to first be fit on a subsample of the training set, and then re-fit at the very end of the pipeline using the full training set. This occasionally resulted in a different number of engineered features being produced. As a result, the features identified during the model selection module could no longer exist. The standard preprocessor is now fit only once.
ONNX predictions inconsistency
- Changed the ONNX conversion function to reduce the difference between the ONNX dumped model and the original pipeline object predictions
- Improved ONNX conversion runtime
- ONNX conversion now only requires a sample from the training or test set as input. This sample is used to infer the final types and shapes

Possibly breaking changes 

Removed matplotlib as a dependency of the AutoMLx package
- Forecasting predictions can now be visualized only with plotly, through the same interface as before, automl.utils.plot_forecast. The alternate visualizations that were provided with plotly through automl.utils.plot_forecast_interactive have been removed.
Updated the AutoMLx package dependencies
- All dependency versions have been reviewed and updated to address all known CVEs
- A few unneeded dependencies have also been removed.