biofefi.services.feature_importance package
Submodules
biofefi.services.feature_importance.ensemble_methods module
- biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_fuzzy(feature_importance_results, opt: Namespace)
- biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_majorityvote(feature_importance_results, logger: Logger)
Calculate the majority vote of feature importance results. Each vector in the feature importance matrix has its features ranked by importance. The final importance of a feature is then the average of its importance values over the methods that assigned it its most common rank. For example, suppose feature Xi has the rank vector [1, 1, 1, 2], where each rank rk is established by a different feature importance method k. The final feature importance value for Xi is the average of the values from the three methods that ranked it 1.
- Parameters:
feature_importance_results – Dictionary containing feature importance results for each model.
logger (Logger) – The logger.
- Returns:
Majority vote of feature importance results
- Return type:
ensemble_majorityvote
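The majority-vote rule described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions, not the package's implementation: the helper name `majority_vote_sketch` is hypothetical, and the input is assumed to be a single DataFrame with one row per feature and one column per importance method, rather than the per-model dictionary the real function takes.

```python
import pandas as pd


def majority_vote_sketch(importances: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: rows are features, columns are importance methods."""
    # Rank features within each method; rank 1 is the most important feature.
    ranks = importances.rank(axis=0, ascending=False, method="min").astype(int)

    final = {}
    for feature in importances.index:
        feature_ranks = ranks.loc[feature]
        # Most common rank for this feature (ties resolve to the lowest rank).
        majority_rank = feature_ranks.mode().iloc[0]
        # Average the importance values from the methods that voted for that rank.
        voters = feature_ranks == majority_rank
        final[feature] = importances.loc[feature, voters].mean()
    return pd.Series(final, name="majority_vote")


# Illustrative data: x1 is ranked [1, 1, 1, 2] across the four methods.
scores = pd.DataFrame(
    {
        "method_a": [0.50, 0.30, 0.20],
        "method_b": [0.60, 0.25, 0.15],
        "method_c": [0.40, 0.35, 0.25],
        "method_d": [0.30, 0.45, 0.25],
    },
    index=["x1", "x2", "x3"],
)
result = majority_vote_sketch(scores)
```

Here `x1` receives the average of its values from `method_a`, `method_b`, and `method_c`, the three methods that ranked it first; the value from `method_d` is ignored.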
- biofefi.services.feature_importance.ensemble_methods.calculate_ensemble_mean(feature_importance_results, logger: Logger)
Calculate the mean of feature importance results.
- Parameters:
feature_importance_results – Dictionary containing feature importance results for each model.
logger (Logger) – The logger.
- Returns:
Mean of feature importance results
- Return type:
ensemble_mean
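A mean ensemble of this kind can be sketched as follows. The helper name `ensemble_mean_sketch` and the assumed input shape (a dictionary mapping model names to DataFrames of features by methods) are illustrative guesses; the real function may normalise the values or structure the dictionary differently.

```python
import pandas as pd


def ensemble_mean_sketch(results: dict[str, pd.DataFrame]) -> pd.Series:
    """Hypothetical helper: results maps model name -> DataFrame (features x methods)."""
    # Place every model's method columns side by side, aligned on the feature index.
    combined = pd.concat(results, axis=1)
    # Average across all model/method columns for each feature.
    return combined.mean(axis=1).rename("ensemble_mean")


# Illustrative data: two models, two importance methods each.
results = {
    "model_1": pd.DataFrame(
        {"shap": [0.6, 0.4], "permutation": [0.8, 0.2]}, index=["x1", "x2"]
    ),
    "model_2": pd.DataFrame(
        {"shap": [0.5, 0.5], "permutation": [0.7, 0.3]}, index=["x1", "x2"]
    ),
}
mean_importance = ensemble_mean_sketch(results)
```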
biofefi.services.feature_importance.global_methods module
- biofefi.services.feature_importance.global_methods.calculate_global_shap_values(model, X: DataFrame, shap_reduce_data: int, logger: Logger) → tuple[DataFrame, Any]
Calculate SHAP values for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – The dataset.
shap_reduce_data (int) – The percentage of data to use for SHAP calculation.
logger (Logger) – The logger.
- Returns:
SHAP dataframe and SHAP values.
- Return type:
tuple[pd.DataFrame, Any]
- biofefi.services.feature_importance.global_methods.calculate_permutation_importance(model, X: DataFrame, y: Series, permutation_importance_scoring: str, permutation_importance_repeat: int, random_state: int, logger: Logger)
Calculate permutation importance for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – Input features.
y (pd.Series) – Target variable.
permutation_importance_scoring (str) – Permutation importance scoring method.
permutation_importance_repeat (int) – Number of repeats for importance scoring.
random_state (int) – Seed for the random state.
logger (Logger) – The logger.
- Returns:
Permutation importance values
- Return type:
permutation_importance
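To make the idea of permutation importance concrete, here is a hand-rolled sketch that does not call the package. It shuffles one column at a time and records how much the mean squared error grows, averaged over several repeats. The helper name `permutation_importance_sketch`, the `predict` callable, and the fixed MSE scoring are all assumptions for illustration; how the real function applies its scoring method and repeat count is not shown in this reference.

```python
import numpy as np
import pandas as pd


def permutation_importance_sketch(predict, X, y, n_repeats=5, random_state=0):
    """Hypothetical helper: mean increase in MSE when a column is shuffled."""
    rng = np.random.default_rng(random_state)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = {}
    for column in X.columns:
        increases = []
        for _ in range(n_repeats):
            X_shuffled = X.copy()
            # Shuffling breaks the link between this feature and the target.
            X_shuffled[column] = rng.permutation(X_shuffled[column].to_numpy())
            shuffled_error = np.mean((predict(X_shuffled) - y) ** 2)
            increases.append(shuffled_error - baseline)
        importances[column] = float(np.mean(increases))
    return pd.Series(importances, name="permutation_importance")


# Illustrative data: the target depends on "signal" only.
data_rng = np.random.default_rng(42)
X = pd.DataFrame(
    {"signal": data_rng.normal(size=200), "noise": data_rng.normal(size=200)}
)
y = 3.0 * X["signal"]


def predict(frame):
    # Stand-in for a fitted model that only uses the "signal" column.
    return 3.0 * frame["signal"]


importance = permutation_importance_sketch(predict, X, y)
```

Shuffling `signal` degrades the predictions, so it gets a positive importance, while shuffling `noise` leaves the error unchanged because the stand-in model never reads that column.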
biofefi.services.feature_importance.local_methods module
- biofefi.services.feature_importance.local_methods.calculate_lime_values(model, X: DataFrame, problem_type: ProblemTypes, logger: Logger) → DataFrame
Calculate LIME values for a given model and dataset.
- Parameters:
model – The model.
X (pd.DataFrame) – The dataset.
problem_type (ProblemTypes) – The problem type.
logger (Logger) – The logger.
- Returns:
The LIME values.
- Return type:
pd.DataFrame
- biofefi.services.feature_importance.local_methods.calculate_local_shap_values(model, X: DataFrame, shap_reduce_data: int, logger: Logger) → tuple[DataFrame, Any]
Calculate local SHAP values for a given model and dataset.
- Parameters:
model – Model object.
X (pd.DataFrame) – The dataset.
shap_reduce_data (int) – The percentage of data to use for SHAP calculation.
logger (Logger) – The logger.
- Returns:
SHAP dataframe and SHAP values.
- Return type:
tuple[pd.DataFrame, Any]
biofefi.services.feature_importance.results module
- biofefi.services.feature_importance.results.save_fuzzy_sets_plots(universe, membership_functions, x_cols, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)
- biofefi.services.feature_importance.results.save_importance_results(feature_importance_df: DataFrame, model_type, importance_type: str, feature_importance_type: str, experiment_name: str, fi_opt: FeatureImportanceOptions, plot_opt: PlottingOptions, logger: Logger, shap_values=None)
Save the feature importance results to a CSV file and save the corresponding plots.
- Parameters:
feature_importance_df (pd.DataFrame) – DataFrame of feature importance results.
model_type (_type_) – Type of model.
importance_type (str) – Type of feature importance method.
feature_importance_type (str) – Type of feature importance method (the docstring does not say how this differs from importance_type).
experiment_name (str) – Name of the experiment, to know where to save outputs.
fi_opt (FeatureImportanceOptions) – Feature importance options.
plot_opt (PlottingOptions) – Plotting options.
logger (Logger) – The logger.
shap_values (_type_, optional) – SHAP values. Defaults to None.
- biofefi.services.feature_importance.results.save_target_clusters_plots(df_cluster, exec_opt: ExecutionOptions, plot_opt: PlottingOptions, logger: Logger)