biofefi.services.feature_importance package

Submodules

biofefi.services.feature_importance.ensemble_methods module

biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_fuzzy(feature_importance_results, opt: Namespace)
biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_majorityvote(feature_importance_results, logger: Logger)

Calculate the majority vote of feature importance results. Each vector in the feature importance matrix has its features ranked by importance. The final importance of a feature is then the average of its values from the methods that assigned it its most common rank. For example, suppose feature Xi has the rank vector [1, 1, 1, 2], where each rank rk is assigned by a different feature importance method k. The final feature importance value for Xi is the average of the values from the three methods that ranked it 1.

Parameters:
  • feature_importance_results – Dictionary containing feature importance results for each model.

  • logger (Logger) – The logger.

Returns:

Majority vote of feature importance results

Return type:

ensemble_majorityvote
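
The ranking rule described above can be sketched in plain Python. This is an illustration only, not the package's implementation: the function name majority_vote, the nested-dictionary input layout (method name to feature name to score) and the tie handling are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def majority_vote(importances: dict[str, dict[str, float]]) -> dict[str, float]:
    """Hypothetical helper: importances maps method -> {feature -> score}."""
    # Rank features within each method; rank 1 is the most important feature.
    ranks = {}
    for method, scores in importances.items():
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks[method] = {feat: pos for pos, feat in enumerate(ordered, start=1)}

    features = next(iter(importances.values())).keys()
    result = {}
    for feat in features:
        # Most common rank for this feature across methods.
        # Assumption: on a tie, the rank seen first wins.
        counts = Counter(ranks[m][feat] for m in importances)
        modal_rank = counts.most_common(1)[0][0]
        # Average the raw scores from only the methods that gave that rank.
        result[feat] = mean(
            importances[m][feat] for m in importances if ranks[m][feat] == modal_rank
        )
    return result


# Feature "a" is ranked [1, 1, 1, 2] by the four hypothetical methods.
scores = {
    "m1": {"a": 0.5, "b": 0.25},
    "m2": {"a": 0.75, "b": 0.25},
    "m3": {"a": 1.0, "b": 0.5},
    "m4": {"a": 0.25, "b": 0.75},
}
result = majority_vote(scores)
print(result["a"])  # average of 0.5, 0.75 and 1.0
```

Only the three methods that agree on rank 1 contribute to the value for "a"; the dissenting score from m4 is ignored.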

biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_mean(feature_importance_results, logger: Logger)

Calculate the mean of feature importance results.

Parameters:
  • feature_importance_results – Dictionary containing feature importance results for each model.

  • logger (Logger) – The logger.

Returns:

Mean of feature importance results

Return type:

ensemble_mean
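
A minimal sketch of a mean ensemble, assuming the same hypothetical layout of method name to feature name to score. The name ensemble_mean_sketch is invented for the example. Whether the package rescales scores before averaging is not stated here, so the sketch averages the raw values.

```python
from statistics import mean


def ensemble_mean_sketch(importances: dict[str, dict[str, float]]) -> dict[str, float]:
    """Hypothetical helper: average each feature's score across all methods."""
    features = next(iter(importances.values())).keys()
    return {
        feat: mean(scores[feat] for scores in importances.values())
        for feat in features
    }


averaged = ensemble_mean_sketch(
    {
        "m1": {"a": 0.5, "b": 0.25},
        "m2": {"a": 1.0, "b": 0.75},
    }
)
print(averaged)
```

Scores from different importance methods are not necessarily on a common scale, so in practice some normalisation before averaging may be appropriate.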

biofefi.services.feature_importance.global_methods module

biofefi.services.feature_importance.global_methods.calculate_global_shap_values(model, X: DataFrame, shap_reduce_data: int, logger: Logger) → tuple[DataFrame, Any]

Calculate SHAP values for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – The dataset.

  • shap_reduce_data (int) – The percentage of data to use for SHAP calculation.

  • logger (Logger) – The logger.

Returns:

SHAP dataframe and SHAP values.

Return type:

tuple[pd.DataFrame, Any]
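
One plausible reading of shap_reduce_data is that a percentage of the rows is sampled before SHAP values are computed. The sketch below shows only that sampling step, in plain Python. The function name reduce_rows, the seed argument and the rounding rule are assumptions, and no SHAP call is made.

```python
import random


def reduce_rows(rows: list, percent: int, seed: int = 0) -> list:
    """Hypothetical helper: keep roughly `percent` percent of the rows."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    if not rows:
        return []
    # Assumption: always keep at least one row.
    n_keep = max(1, round(len(rows) * percent / 100))
    rng = random.Random(seed)
    # Sample row positions without replacement, then restore original order.
    keep = sorted(rng.sample(range(len(rows)), n_keep))
    return [rows[i] for i in keep]


rows = list(range(200))
kept = reduce_rows(rows, percent=50)
print(len(kept))
```

Sampling fewer rows shortens the SHAP computation at the cost of a noisier estimate.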

biofefi.services.feature_importance.global_methods.calculate_permutation_importance(model, X: DataFrame, y: Series, permutation_importance_scoring: str, permutation_importance_repeat: int, random_state: int, logger: Logger)

Calculate permutation importance for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – Input features.

  • y (pd.Series) – Target variable.

  • permutation_importance_scoring (str) – Permutation importance scoring method.

  • permutation_importance_repeat (int) – Number of repeats for importance scoring.

  • random_state (int) – Seed for the random state.

  • logger (Logger) – The logger.

Returns:

Permutation importance values

Return type:

permutation_importance
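
The idea behind permutation importance can be sketched without any dependencies: shuffle one feature column at a time and measure how much the score drops. The parameter names above resemble those of scikit-learn's permutation_importance, but whether this function wraps it is an assumption. Everything in the sketch, including the toy model and the name permutation_importance_sketch, is hypothetical.

```python
import random
from statistics import mean


def permutation_importance_sketch(predict, X, y, score, n_repeats=5, seed=0):
    """Return the mean score drop per column when that column is shuffled.

    X is a list of rows (lists of floats); a higher score means a better fit.
    """
    rng = random.Random(seed)
    baseline = score(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle a single column, leaving the others untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - score(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


def predict(rows):
    # Toy model: uses only the first column and ignores the second.
    return [2.0 * row[0] for row in rows]


def neg_mse(y_true, y_pred):
    # Negative mean squared error, so that higher is better.
    return -mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


X = [[float(i), float((i * 7) % 10)] for i in range(20)]
y = [2.0 * row[0] for row in X]
importances = permutation_importance_sketch(predict, X, y, neg_mse)
print(importances)
```

Shuffling the column the toy model ignores leaves the score unchanged, so its importance is zero, while shuffling the column it depends on produces a positive drop.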

biofefi.services.feature_importance.local_methods module

biofefi.services.feature_importance.local_methods.calculate_lime_values(model, X: DataFrame, problem_type: ProblemTypes, logger: Logger) → DataFrame

Calculate LIME values for a given model and dataset.

Parameters:
  • model – The model.

  • X (pd.DataFrame) – The dataset.

  • problem_type (ProblemTypes) – The problem type.

  • logger (Logger) – The logger.

Returns:

The LIME values.

Return type:

pd.DataFrame

biofefi.services.feature_importance.local_methods.calculate_local_shap_values(model, X: DataFrame, shap_reduce_data: int, logger: Logger) → tuple[DataFrame, Any]

Calculate local SHAP values for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – The dataset.

  • shap_reduce_data (int) – The percentage of data to use for SHAP calculation.

  • logger (Logger) – The logger.

Returns:

SHAP dataframe and SHAP values.

Return type:

tuple[pd.DataFrame, Any]

biofefi.services.feature_importance.results module

biofefi.services.feature_importance.results.save_fuzzy_sets_plots(universe, membership_functions, x_cols, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)
biofefi.services.feature_importance.results.save_importance_results(feature_importance_df: DataFrame, model_type, importance_type: str, feature_importance_type: str, experiment_name: str, fi_opt: FeatureImportanceOptions, plot_opt: PlottingOptions, logger: Logger, shap_values=None)

Save the feature importance results to a CSV file and save the associated plots.

Parameters:
  • feature_importance_df (pd.DataFrame) – DataFrame of feature importance results.

  • model_type (_type_) – Type of model.

  • importance_type (str) – Type of feature importance method.

  • feature_importance_type (str) – Type of feature importance method (appears to duplicate importance_type).

  • experiment_name (str) – Name of the experiment, to know where to save outputs.

  • fi_opt (FeatureImportanceOptions) – Feature importance options.

  • plot_opt (PlottingOptions) – Plotting options.

  • logger (Logger) – The logger.

  • shap_values (_type_, optional) – SHAP values. Defaults to None.

biofefi.services.feature_importance.results.save_target_clusters_plots(df_cluster, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)

Module contents