helix.services.feature_importance package

Submodules

helix.services.feature_importance.ensemble_methods module

helix.services.feature_importance.ensemble_methods.calculate_ensemble_fuzzy(feature_importance_results, opt: Namespace)

helix.services.feature_importance.ensemble_methods.calculate_ensemble_majorityvote(feature_importance_dict: dict[str, DataFrame], logger: Logger) → tuple[DataFrame, DataFrame]

Calculate the majority vote of feature importance results. Each vector in the feature importance matrix (one per importance method) has its features ranked by importance. The final importance of a feature is the average of its importance values from the methods that assigned it its most common rank. For example, suppose feature Xi has the rank vector [1, 1, 1, 2], where each rank rk is assigned by a different feature importance method k. The final importance of Xi is then the average of the values from the three methods that ranked it 1.

Parameters:
  • feature_importance_dict

    Dictionary where:

    • Keys are model names

    • Values are DataFrames where:

      • Rows are features

      • Columns are importance methods (SHAP, Permutation, etc.)

      • Values are already normalized between 0 and 1

  • logger – Logger instance

Returns:

DataFrame with features as index and a single column containing the majority vote importance.

Return type:

pd.DataFrame
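
The majority vote rule described above can be sketched for a single model as follows. This is an illustration only, not the helix implementation: the function name majority_vote_sketch, the toy values, and the tie-breaking choice (the smallest rank wins when several ranks are equally common) are all assumptions.

```python
import pandas as pd


def majority_vote_sketch(importances: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: rows are features, columns are importance methods."""
    # Rank features within each method; rank 1 is the most important feature.
    ranks = importances.rank(axis=0, ascending=False, method="min").astype(int)

    result = {}
    for feature in importances.index:
        # Most common rank for this feature across methods.
        # Assumption: if several ranks tie, the smallest rank is used.
        top_rank = ranks.loc[feature].mode().iloc[0]
        # Average the importance values from the methods that gave that rank.
        voters = ranks.loc[feature] == top_rank
        result[feature] = importances.loc[feature, voters].mean()
    return pd.Series(result, name="majority_vote")


# Toy, already-normalized importances (made up for illustration).
toy = pd.DataFrame(
    {
        "shap": [0.9, 0.5, 0.1],
        "permutation": [0.8, 0.6, 0.2],
        "lime": [0.7, 0.4, 0.3],
        "other": [0.5, 0.9, 0.1],
    },
    index=["x1", "x2", "x3"],
)
scores = majority_vote_sketch(toy)
```

Here x1 receives the rank vector [1, 1, 1, 2], matching the example in the description, so its score is the mean of the three values from the methods that ranked it 1.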

helix.services.feature_importance.ensemble_methods.calculate_ensemble_mean(feature_importance_dict: dict[str, DataFrame], logger: Logger) → tuple[DataFrame, DataFrame]

Calculate mean of feature importance results across models and methods.

Parameters:
  • feature_importance_dict

    Dictionary where:

    • Keys are model names

    • Values are DataFrames where:

      • Rows are features

      • Columns are importance methods (SHAP, Permutation, etc.)

      • Values are already normalized between 0 and 1

  • logger – Logger instance

Returns:

DataFrame with unique features as index and a single column containing the overall mean importance across all models, methods, and repetitions.

Return type:

pd.DataFrame
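
A minimal sketch of this kind of averaging, assuming the dictionary layout described in the parameters. The function name ensemble_mean_sketch, the output column name, and the toy values are hypothetical, and the real function may differ in detail.

```python
import pandas as pd


def ensemble_mean_sketch(fi_by_model: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical helper: overall mean importance per feature."""
    # Stack every model's table; a feature may then appear on several rows
    # (once per model, and once per repetition if repetitions are present).
    stacked = pd.concat(list(fi_by_model.values()))
    grouped = stacked.groupby(level=0)
    # Sum and count over all rows and methods so that missing values
    # do not distort the overall mean.
    total = grouped.sum().sum(axis=1)
    count = grouped.count().sum(axis=1)
    return (total / count).to_frame("mean_importance")


# Toy, already-normalized importances for two models (made up for illustration).
toy = {
    "model_a": pd.DataFrame(
        {"shap": [0.8, 0.2], "permutation": [0.6, 0.4]}, index=["x1", "x2"]
    ),
    "model_b": pd.DataFrame(
        {"shap": [1.0, 0.0], "permutation": [0.6, 0.2]}, index=["x1", "x2"]
    ),
}
mean_df = ensemble_mean_sketch(toy)
```

Each feature appears once in the result's index, with a single column holding its mean over both models and both methods.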

helix.services.feature_importance.global_methods module

helix.services.feature_importance.global_methods.calculate_global_shap_values(model, X: DataFrame, logger: Logger) → tuple[DataFrame, Any]

Calculate SHAP values for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – The dataset.

  • logger (Logger) – The logger.

Returns:

SHAP dataframe and SHAP values.

Return type:

tuple[pd.DataFrame, Any]
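
The documentation does not say how the returned SHAP dataframe is built from the raw SHAP values. One common convention, shown here purely as an assumption, is to take the mean absolute SHAP value of each feature over all samples. The function name, the column name, and the toy matrix are hypothetical.

```python
import numpy as np
import pandas as pd


def global_importance_from_shap(
    shap_values: np.ndarray, feature_names: list[str]
) -> pd.DataFrame:
    """Hypothetical helper: mean |SHAP| per feature.

    shap_values has shape (n_samples, n_features).
    """
    # Absolute values prevent positive and negative contributions from cancelling.
    mean_abs = np.abs(shap_values).mean(axis=0)
    return pd.DataFrame({"shap_importance": mean_abs}, index=feature_names)


# Toy per-sample SHAP values (made up for illustration).
toy_shap = np.array([[0.2, -0.1], [-0.4, 0.3]])
global_df = global_importance_from_shap(toy_shap, ["x1", "x2"])
```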

helix.services.feature_importance.global_methods.calculate_permutation_importance(model, X: DataFrame, y: Series, permutation_importance_scoring: str, permutation_importance_repeat: int, random_state: int, logger: Logger)

Calculate permutation importance for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – Input features.

  • y (pd.Series) – Target variable.

  • permutation_importance_scoring (str) – Permutation importance scoring method.

  • permutation_importance_repeat (int) – Number of repeats for importance scoring.

  • random_state (int) – Seed for the random state.

  • logger (Logger) – The logger.

Returns:

Permutation importance values

Return type:

permutation_importance
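
To illustrate the general technique, the sketch below computes permutation importance by hand: shuffle one column at a time, re-score, and record the average drop in score over several repeats. It is not the helix implementation. The function name, the negative mean squared error score, and the toy data are assumptions; n_repeats and random_state loosely mirror permutation_importance_repeat and random_state above.

```python
import numpy as np
import pandas as pd


def permutation_importance_sketch(predict, X, y, n_repeats=5, random_state=0):
    """Hypothetical helper: mean score drop when each column is shuffled."""
    rng = np.random.default_rng(random_state)
    y_true = y.to_numpy()

    def score(frame):
        # Negative mean squared error, so that higher is better.
        return -np.mean((predict(frame) - y_true) ** 2)

    baseline = score(X)
    drops = {}
    for col in X.columns:
        col_drops = []
        for _ in range(n_repeats):
            shuffled = X.copy()
            # Break the link between this feature and the target.
            shuffled[col] = rng.permutation(shuffled[col].to_numpy())
            col_drops.append(baseline - score(shuffled))
        drops[col] = float(np.mean(col_drops))
    return pd.Series(drops, name="importance")


# Toy data: the target depends only on "a"; "b" is pure noise.
data_rng = np.random.default_rng(42)
X = pd.DataFrame({"a": data_rng.normal(size=200), "b": data_rng.normal(size=200)})
y = 3.0 * X["a"]


def predict(frame):
    # Stands in for a fitted model's predict method.
    return 3.0 * frame["a"].to_numpy()


importances = permutation_importance_sketch(predict, X, y)
```

Shuffling "b" leaves the predictions unchanged, so its importance is zero, while shuffling "a" degrades the score and yields a positive importance.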

helix.services.feature_importance.local_methods module

helix.services.feature_importance.local_methods.calculate_lime_values(model, X: DataFrame, problem_type: ProblemTypes, logger: Logger) → DataFrame

Calculate LIME values for a given model and dataset.

Parameters:
  • model – The model.

  • X (pd.DataFrame) – The dataset.

  • problem_type (ProblemTypes) – The problem type.

  • logger (Logger) – The logger.

Returns:

The LIME values.

Return type:

pd.DataFrame

helix.services.feature_importance.local_methods.calculate_local_shap_values(model, X: DataFrame, logger: Logger) → tuple[DataFrame, Any]

Calculate local SHAP values for a given model and dataset.

Parameters:
  • model – Model object.

  • X (pd.DataFrame) – The dataset.

  • logger (Logger) – The logger.

Returns:

SHAP dataframe and SHAP values.

Return type:

tuple[pd.DataFrame, Any]

helix.services.feature_importance.results module

helix.services.feature_importance.results.save_fuzzy_sets_plots(universe, membership_functions, x_cols, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)

helix.services.feature_importance.results.save_importance_results(feature_importance_df: DataFrame, model_type, importance_type: str, feature_importance_type: str, experiment_name: str, fi_opt: FeatureImportanceOptions, plot_opt: PlottingOptions, logger: Logger, shap_values=None)

Save the feature importance results to a CSV file and save the corresponding plots.

Parameters:
  • feature_importance_df (pd.DataFrame) – DataFrame of feature importance results.

  • model_type – Type of model.

  • importance_type (str) – Type of feature importance method.

  • feature_importance_type (str) – Type of feature importance method. This description appears to overlap with importance_type.

  • experiment_name (str) – Name of the experiment, to know where to save outputs.

  • fi_opt (FeatureImportanceOptions) – Feature importance options.

  • plot_opt (PlottingOptions) – Plotting options.

  • logger (Logger) – The logger.

  • shap_values (optional) – SHAP values. Defaults to None.

helix.services.feature_importance.results.save_target_clusters_plots(df_cluster, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)

Module contents