helix.services.feature_importance package
Submodules
helix.services.feature_importance.ensemble_methods module
- helix.services.feature_importance.ensemble_methods.calculate_ensemble_fuzzy(feature_importance_results, opt: Namespace)
- helix.services.feature_importance.ensemble_methods.calculate_ensemble_majorityvote(feature_importance_dict: dict[str, DataFrame], logger: Logger) → tuple[DataFrame, DataFrame]
Calculate the majority vote of feature importance results. Each vector in the feature importance matrix has its features ranked by importance. For each feature, the final importance is the average of the importance values from the methods that assigned its most common (modal) rank. For example, suppose feature X_i has the rank vector [1, 1, 1, 2], where each rank r_k is assigned by a different feature importance method k. The most common rank is 1, so the final importance value for X_i is the average of the values from the three methods that ranked it 1.
- Parameters:
feature_importance_dict –
Dictionary where:
- Keys are model names
- Values are DataFrames where:
  - Rows are features
  - Columns are importance methods (SHAP, Permutation, etc.)
  - Values are already normalized between 0 and 1
logger – Logger instance
- Returns:
- DataFrame with features as index and a single column containing the majority vote importance.
- Return type:
pd.DataFrame
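The ranking-and-agreement procedure described above can be sketched in pandas. This is a minimal illustration under stated assumptions, not the helix implementation: it treats every (model, method) column as one importance vector, ranks with pandas `rank(method="min")`, breaks ties between equally common ranks in favour of the better (smaller) rank, and returns a single DataFrame rather than the tuple in the signature. The function name `majority_vote_importance` and the output column `Majority Vote Importance` are hypothetical.

```python
import pandas as pd


def majority_vote_importance(feature_importance_dict: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical sketch: majority-vote ensemble over normalised importance vectors."""
    # Place every model's method columns side by side:
    # rows = features, columns = (model, method) importance vectors.
    matrix = pd.concat(feature_importance_dict, axis=1)
    # Rank features within each vector; rank 1 = most important.
    ranks = matrix.rank(axis=0, ascending=False, method="min")

    result = {}
    for feature in matrix.index:
        feature_ranks = ranks.loc[feature]
        # Most common rank; mode() returns sorted values, so ties
        # between equally frequent ranks resolve to the smaller rank.
        modal_rank = feature_ranks.mode().iloc[0]
        agreeing = feature_ranks == modal_rank
        # Average the importance values of the vectors that agree on that rank.
        result[feature] = matrix.loc[feature][agreeing].mean()

    return pd.DataFrame({"Majority Vote Importance": pd.Series(result)})
```

For instance, with two models each scored by SHAP and permutation importance, a feature ranked [1, 1, 2, 1] across the four vectors receives the mean of the three values that ranked it first, mirroring the example in the docstring.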
- helix.services.feature_importance.ensemble_methods.calculate_ensemble_mean(feature_importance_dict: dict[str, DataFrame], logger: Logger) → tuple[DataFrame, DataFrame]
Calculate the mean of feature importance results across models and methods.
- Parameters:
feature_importance_dict –
Dictionary where:
- Keys are model names
- Values are DataFrames where:
  - Rows are features
  - Columns are importance methods (SHAP, Permutation, etc.)
  - Values are already normalized between 0 and 1
logger – Logger instance
- Returns:
- DataFrame with unique features as index and a single column containing the overall mean importance across all models, methods, and repetitions.
- Return type:
pd.DataFrame
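A minimal pandas sketch of the mean ensemble, assuming each DataFrame in the dictionary follows the layout listed above and that repetitions appear as repeated feature rows. The function name `mean_importance` and the output column `Mean Importance` are hypothetical, and this is not necessarily how helix computes it. Summing and counting per feature (rather than averaging row means) keeps the result equal to the plain mean over every individual importance value, and skips missing values.

```python
import pandas as pd


def mean_importance(feature_importance_dict: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical sketch: overall mean importance per feature."""
    # Stack all models vertically; a feature appears once per model
    # (and again for each repetition, if rows are repeated).
    stacked = pd.concat(list(feature_importance_dict.values()), axis=0)
    grouped = stacked.groupby(level=0)
    # Total of all non-missing values divided by their count, per feature,
    # i.e. the mean across models, methods, and repetitions.
    overall = grouped.sum().sum(axis=1) / grouped.count().sum(axis=1)
    return overall.to_frame(name="Mean Importance")
```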
helix.services.feature_importance.global_methods module
- helix.services.feature_importance.global_methods.calculate_global_shap_values(model, X: DataFrame, logger: Logger) → tuple[DataFrame, Any]
Calculate SHAP values for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – The dataset.
logger (Logger) – The logger.
- Returns:
The SHAP DataFrame and the SHAP values.
- Return type:
tuple[pd.DataFrame, Any]
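The docstring does not say how SHAP values are summarised into a global importance. A common convention is the mean absolute SHAP value of each feature across samples; the sketch below assumes that convention and a NumPy array of per-sample SHAP values with shape `(n_samples, n_features)`. It does not call the `shap` library, and the names `global_shap_importance` and `SHAP Importance` are hypothetical.

```python
import numpy as np
import pandas as pd


def global_shap_importance(shap_values: np.ndarray, feature_names: list[str]) -> pd.DataFrame:
    """Hypothetical sketch: mean |SHAP| per feature as a global importance."""
    # shap_values has shape (n_samples, n_features); average the magnitude
    # of each feature's contribution over all samples.
    importance = np.abs(shap_values).mean(axis=0)
    return pd.DataFrame({"SHAP Importance": importance}, index=feature_names)
```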
- helix.services.feature_importance.global_methods.calculate_permutation_importance(model, X: DataFrame, y: Series, permutation_importance_scoring: str, permutation_importance_repeat: int, random_state: int, logger: Logger)
Calculate permutation importance for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – Input features.
y (pd.Series) – Target variable.
permutation_importance_scoring (str) – Name of the scoring metric used to evaluate the model when features are permuted.
permutation_importance_repeat (int) – Number of times each feature is permuted.
random_state (int) – Seed for the random state.
logger (Logger) – The logger.
- Returns:
Permutation importance values
- Return type:
permutation_importance
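Permutation importance measures how much a model's score drops when one feature's values are shuffled, breaking its relationship with the target; repeating the shuffle gives a mean and spread per feature. The NumPy sketch below shows the technique with hypothetical names: `score` stands in for `permutation_importance_scoring`, `n_repeats` for `permutation_importance_repeat`, and `random_state` seeds the shuffles. It takes arrays rather than DataFrames for brevity. Whether helix delegates to scikit-learn's `sklearn.inspection.permutation_importance` (which the documented return type hints at) is an assumption not confirmed here.

```python
import numpy as np


def r2_score_sketch(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination (R^2), used here as the scoring metric."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def permutation_importance_sketch(predict, X, y, score, n_repeats=5, random_state=None):
    """Return (mean, std) of the score drop per feature over n_repeats shuffles."""
    rng = np.random.default_rng(random_state)
    baseline = score(y, predict(X))
    importances = np.empty((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            # Shuffle a single column so it no longer lines up with y.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importances[j, r] = baseline - score(y, predict(X_perm))
    return importances.mean(axis=1), importances.std(axis=1)
```

A feature the model ignores gets an importance of exactly zero, because shuffling it leaves the predictions unchanged.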
helix.services.feature_importance.local_methods module
- helix.services.feature_importance.local_methods.calculate_lime_values(model, X: DataFrame, problem_type: ProblemTypes, logger: Logger) → DataFrame
Calculate LIME values for a given model and dataset.
- Parameters:
model – The model.
X (pd.DataFrame) – The dataset.
problem_type (ProblemTypes) – The problem type.
logger (Logger) – The logger.
- Returns:
The LIME values.
- Return type:
pd.DataFrame
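LIME explains one prediction by sampling perturbed points around the instance, weighting them by proximity, and fitting a weighted linear surrogate whose coefficients serve as local importances. The sketch below is a simplified, NumPy-only version of that idea, not the `lime` package and not necessarily what helix does: it perturbs with Gaussian noise scaled by each feature's standard deviation in `X_background`, uses an exponential kernel with an arbitrary default width, and assumes `predict` returns one number per row (for a classification problem type, that would presumably be the probability of one class). All names are hypothetical.

```python
import numpy as np


def lime_sketch(predict, x, X_background, n_samples=500, kernel_width=None, random_state=None):
    """Return local linear coefficients explaining predict() around instance x."""
    rng = np.random.default_rng(random_state)
    n_features = x.shape[0]
    scale = X_background.std(axis=0)
    # Sample perturbed points around x, scaled per feature.
    Z = x + rng.normal(size=(n_samples, n_features)) * scale
    # Proximity weights from an exponential kernel on standardised distance.
    dist = np.sqrt((((Z - x) / scale) ** 2).sum(axis=1))
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(n_features)  # arbitrary default width
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Weighted least squares: scale rows by sqrt(weight), then solve.
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(A * sqrt_w[:, None], predict(Z) * sqrt_w, rcond=None)
    return coef[1:]  # drop the intercept; one coefficient per feature
```

When the model is itself linear, the surrogate recovers its weights exactly, which makes a convenient sanity check.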
- helix.services.feature_importance.local_methods.calculate_local_shap_values(model, X: DataFrame, logger: Logger) → tuple[DataFrame, Any]
Calculate local SHAP values for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – The dataset.
logger (Logger) – The logger.
- Returns:
The SHAP DataFrame and the SHAP values.
- Return type:
tuple[pd.DataFrame, Any]
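Local SHAP values are Shapley values: each feature's contribution is its marginal effect on the prediction, averaged with Shapley weights over all subsets of the other features. The brute-force sketch below enumerates every subset, so it is only practical for a handful of features; libraries such as `shap` rely on model-specific or sampling-based approximations instead. As a simplifying assumption, "absent" features are set to a single `baseline` vector rather than averaged over background data, and the name `exact_shapley` is hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance x, with absent features set to baseline."""
    n = x.shape[0]
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                z = baseline.astype(float).copy()
                z[list(subset)] = x[list(subset)]
                without_i = predict(z[None, :])[0]
                z[i] = x[i]
                with_i = predict(z[None, :])[0]
                phi[i] += weight * (with_i - without_i)
    return phi
```

For a linear model f(x) = w·x + b, the result reduces to w_i (x_i - baseline_i), and for any model the values sum to f(x) - f(baseline).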
helix.services.feature_importance.results module
- helix.services.feature_importance.results.save_fuzzy_sets_plots(universe, membership_functions, x_cols, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)
- helix.services.feature_importance.results.save_importance_results(feature_importance_df: DataFrame, model_type, importance_type: str, feature_importance_type: str, experiment_name: str, fi_opt: FeatureImportanceOptions, plot_opt: PlottingOptions, logger: Logger, shap_values=None)
Save the feature importance results to a CSV file and the plots.
- Parameters:
feature_importance_df (pd.DataFrame) – DataFrame of feature importance results.
model_type (_type_) – Type of model.
importance_type (str) – Type of feature importance method.
feature_importance_type (str) – Type of feature importance method (documented identically to importance_type).
experiment_name (str) – Name of the experiment, to know where to save outputs.
fi_opt (FeatureImportanceOptions) – Feature importance options.
plot_opt (PlottingOptions) – Plotting options.
logger (Logger) – The logger.
shap_values (_type_, optional) – SHAP values. Defaults to None.
- helix.services.feature_importance.results.save_target_clusters_plots(df_cluster, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)