biofefi.services package¶
Subpackages¶
- biofefi.services.feature_importance package
- biofefi.services.machine_learning package
Submodules¶
biofefi.services.configuration module¶
- biofefi.services.configuration.load_data_preprocessing_options(path: Path) PreprocessingOptions¶
Load data preprocessing options from the given path. The path should point to a JSON file containing the options.
- Parameters:
path (Path) – The path to the JSON file containing the options.
- Returns:
The data preprocessing options.
- Return type:
PreprocessingOptions
- biofefi.services.configuration.load_execution_options(path: Path) ExecutionOptions¶
Load experiment execution options from the given path. The path should point to a JSON file containing the options.
- Parameters:
path (Path) – The path to the JSON file containing the options.
- Returns:
The execution options.
- Return type:
ExecutionOptions
- biofefi.services.configuration.load_fi_options(path: Path) FeatureImportanceOptions | None¶
Load feature importance options.
- Parameters:
path (Path) – The path to the feature importance options file.
- Returns:
The feature importance options.
- Return type:
FeatureImportanceOptions | None
- biofefi.services.configuration.load_plot_options(path: Path) PlottingOptions¶
Load plotting options from the given path. The path should point to a JSON file containing the plot options.
- Parameters:
path (Path) – The path to the JSON file containing the options.
- Returns:
The plotting options.
- Return type:
PlottingOptions
- biofefi.services.configuration.save_options(path: Path, options: T)¶
Save options to a JSON file at the specified path.
- Parameters:
path (Path) – The path to the JSON file.
options (T) – The options to save.
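The save/load round trip described for this module can be sketched as follows. This is a minimal illustration, not BioFEFI's implementation: `DemoOptions`, `save_demo_options` and `load_demo_options` are hypothetical names, and treating the options as a plain dataclass is an assumption this page does not confirm.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class DemoOptions:
    """Hypothetical options object; not one of BioFEFI's option classes."""

    plot_title_font_size: int = 14
    save_plots: bool = True


def save_demo_options(path: Path, options: DemoOptions) -> None:
    """Serialise the options to a JSON file at ``path``."""
    with open(path, "w") as f:
        json.dump(asdict(options), f, indent=4)


def load_demo_options(path: Path) -> DemoOptions:
    """Read the JSON file at ``path`` and rebuild the options object."""
    with open(path) as f:
        return DemoOptions(**json.load(f))


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    options_file = Path(tmp) / "demo_options.json"
    save_demo_options(options_file, DemoOptions(plot_title_font_size=18))
    restored = load_demo_options(options_file)
```

Keeping the on-disk format as flat JSON means the file can be inspected or edited by hand between runs.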
biofefi.services.experiments module¶
- biofefi.services.experiments.create_experiment(save_dir: Path, plotting_options: PlottingOptions, execution_options: ExecutionOptions)¶
Create an experiment on disk with its global plotting options saved as a JSON file.
- Parameters:
save_dir (Path) – The path to where the experiment will be created.
plotting_options (PlottingOptions) – The plotting options to save.
- biofefi.services.experiments.delete_previous_fi_results(experiment_path: Path)¶
Delete previous feature importance results.
- Parameters:
experiment_path (Path) – The path to the experiment.
- biofefi.services.experiments.find_previous_fi_results(experiment_path: Path) bool¶
Find previous feature importance results.
- Parameters:
experiment_path (Path) – The path to the experiment.
- Returns:
Whether previous feature importance results exist.
- Return type:
bool
- biofefi.services.experiments.get_experiments(base_dir: Path | None = None) list[str]¶
Get the list of experiments in the BioFEFI experiment directory.
If base_dir is not specified, the default from biofefi_experiments_base_dir is used.
- Parameters:
base_dir (Path | None, optional) – Specify a base directory for experiments. Defaults to None.
- Returns:
The list of experiments.
- Return type:
list[str]
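One plausible way to list experiments is to treat every sub-directory of the base directory as one experiment. The sketch below makes exactly that assumption; `list_experiment_dirs` is a hypothetical helper, and the real get_experiments may apply different rules.

```python
import tempfile
from pathlib import Path


def list_experiment_dirs(base_dir: Path) -> list[str]:
    """Return the sorted names of the sub-directories of ``base_dir``.

    Assumption: each experiment lives in its own sub-directory. A missing
    base directory is treated as "no experiments" in this sketch.
    """
    if not base_dir.is_dir():
        return []
    return sorted(p.name for p in base_dir.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "experiment_a").mkdir()
    (base / "experiment_b").mkdir()
    (base / "notes.txt").write_text("not an experiment")  # plain files are skipped
    found = list_experiment_dirs(base)
    missing = list_experiment_dirs(base / "does_not_exist")
```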
biofefi.services.logs module¶
- biofefi.services.logs.get_logs(log_dir: Path) str¶
Get the latest log file for the latest run to display.
- Parameters:
log_dir (Path) – The directory to search for the latest logs.
- Raises:
NotADirectoryError – log_dir does not point to a directory.
- Returns:
The text of the latest log file.
- Return type:
str
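The behaviour described for get_logs can be approximated as below. This is a sketch under stated assumptions: "latest" is taken to mean the most recently modified file, an empty directory yields an empty string, and `read_latest_log` is a hypothetical name. Only the NotADirectoryError comes from the documentation above.

```python
import os
import tempfile
from pathlib import Path


def read_latest_log(log_dir: Path) -> str:
    """Return the text of the most recently modified file in ``log_dir``."""
    if not log_dir.is_dir():
        raise NotADirectoryError(f"{log_dir} is not a directory")
    log_files = [p for p in log_dir.iterdir() if p.is_file()]
    if not log_files:
        return ""  # choice made for this sketch only
    latest = max(log_files, key=lambda p: p.stat().st_mtime)
    return latest.read_text()


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    older = log_dir / "run_1.log"
    newer = log_dir / "run_2.log"
    older.write_text("first run")
    newer.write_text("second run")
    # Set explicit modification times so the ordering is unambiguous.
    os.utime(older, (100, 100))
    os.utime(newer, (200, 200))
    latest_text = read_latest_log(log_dir)

    # Passing a file instead of a directory raises NotADirectoryError.
    try:
        read_latest_log(older)
        raised = False
    except NotADirectoryError:
        raised = True
```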
biofefi.services.metrics module¶
- biofefi.services.metrics.get_metrics(problem_type: ProblemTypes, logger: object = None) dict¶
Get the metrics functions for a given problem type.
For classification:
- Accuracy
- F1
- Precision
- Recall
- ROC AUC
For regression:
- R2
- MAE
- RMSE
- Parameters:
problem_type (ProblemTypes) – Whether the problem is classification or regression.
logger (object, optional) – The logger. Defaults to None.
- Raises:
ValueError – When you give an incorrect problem type.
- Returns:
A dict of score names and functions.
- Return type:
dict
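The "dict of score names and functions" pattern can be sketched in plain Python. Everything here is illustrative: `DemoProblemType` and `get_demo_metrics` are hypothetical stand-ins for ProblemTypes and get_metrics, the dictionary keys are invented, and only accuracy is shown on the classification side to keep the example short.

```python
import math
from enum import Enum
from typing import Callable


class DemoProblemType(Enum):
    """Hypothetical stand-in for BioFEFI's ProblemTypes."""

    CLASSIFICATION = "classification"
    REGRESSION = "regression"


def accuracy(y_true, y_pred) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def get_demo_metrics(problem_type) -> dict[str, Callable]:
    """Map score names to scoring functions for the given problem type."""
    if problem_type == DemoProblemType.CLASSIFICATION:
        return {"accuracy": accuracy}
    if problem_type == DemoProblemType.REGRESSION:
        return {"R2": r2, "MAE": mae, "RMSE": rmse}
    raise ValueError(f"Problem type {problem_type!r} is not recognised")


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
metrics = get_demo_metrics(DemoProblemType.REGRESSION)
scores = {name: fn(y_true, y_pred) for name, fn in metrics.items()}

# An unknown problem type is rejected with ValueError.
try:
    get_demo_metrics("clustering")
    rejected = False
except ValueError:
    rejected = True
```

Returning functions rather than computed values lets the caller apply the same set of metrics to both the train and test splits.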
biofefi.services.ml_models module¶
- biofefi.services.ml_models.get_model(model_type: type, model_params: dict | None = None) MlModel¶
Produce a machine learning model with the provided parameters, configured for the given problem type.
If the model is to be used in a grid search, specify model_params=None.
- Parameters:
model_type (type) – The Python type (constructor) of the model to instantiate.
model_params (dict, optional) – The parameters to pass to the model constructor. Defaults to None.
- Returns:
A new instance of the requested machine learning model.
- Return type:
MlModel
- biofefi.services.ml_models.get_model_type(model_type: str, problem_type: ProblemTypes) type¶
Fetch the appropriate type for a given model name based on the problem type.
- Parameters:
model_type (str) – The kind of model.
problem_type (ProblemTypes) – Type of problem (classification or regression).
- Raises:
ValueError – If a model type is not recognised or unsupported.
- Returns:
The constructor for a machine learning model class.
- Return type:
type
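Taken together, get_model_type and get_model describe a two-step pattern: look up a constructor by name and problem type, then instantiate it with optional parameters. The sketch below shows that pattern with hypothetical classes, a hypothetical registry and hypothetical function names; the model names BioFEFI actually recognises are not listed on this page.

```python
from typing import Optional


class DemoLinearRegressor:
    """Hypothetical model class standing in for a real estimator."""

    def __init__(self, fit_intercept: bool = True):
        self.fit_intercept = fit_intercept


class DemoLinearClassifier:
    """Hypothetical model class standing in for a real estimator."""

    def __init__(self, C: float = 1.0):
        self.C = C


# (model name, problem type) -> constructor. The names are illustrative only.
DEMO_REGISTRY = {
    ("linear model", "regression"): DemoLinearRegressor,
    ("linear model", "classification"): DemoLinearClassifier,
}


def lookup_model_type(model_name: str, problem_type: str) -> type:
    """Return the constructor registered for the name and problem type."""
    try:
        return DEMO_REGISTRY[(model_name.lower(), problem_type.lower())]
    except KeyError:
        raise ValueError(
            f"Model type {model_name!r} is not recognised for {problem_type!r}"
        ) from None


def build_model(model_type: type, model_params: Optional[dict] = None):
    """Instantiate the model, using constructor defaults when params is None."""
    if model_params is None:
        return model_type()
    return model_type(**model_params)


regressor = build_model(
    lookup_model_type("linear model", "regression"), {"fit_intercept": False}
)
default_classifier = build_model(lookup_model_type("Linear Model", "classification"))

try:
    lookup_model_type("quantum forest", "regression")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```

Leaving `model_params` as None yields an instance with default parameters, which is the form a grid search would typically take over and re-parameterise.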
- biofefi.services.ml_models.load_models(path: Path) dict[str, list]¶
Load pre-trained machine learning models.
- Parameters:
path (Path) – The path to the directory where the models are saved.
- Returns:
The pre-trained models.
- Return type:
dict[str, list]
- biofefi.services.ml_models.load_models_to_explain(path: Path, model_names: list) dict[str, list]¶
Load the pre-trained machine learning models to explain.
- Parameters:
path (Path) – The path to the directory where the models are saved.
model_names (list) – The names of the models to explain.
- Returns:
The pre-trained models.
- Return type:
dict[str, list]
- biofefi.services.ml_models.models_exist(path: Path) bool¶
- biofefi.services.ml_models.save_model(model, path: Path)¶
Save a machine learning model to the given file path.
- Parameters:
model – The model to save. Must be picklable.
path (Path) – The file path to save the model.
- biofefi.services.ml_models.save_models_metrics(metrics: dict, path: Path)¶
Save the statistical metrics of the models to the given file path.
- Parameters:
metrics (dict) – The metrics to save.
path (Path) – The file path to save the metrics.
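Because save_model requires the model to be picklable, a pickle-based round trip is a reasonable guess at how saving and loading fit together. The sketch below assumes pickle for models and JSON for metrics; the helper names, file names and the dict used as a stand-in "model" are all hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_pickled_model(model, path: Path) -> None:
    """Pickle ``model`` to ``path``, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_pickled_model(path: Path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_metrics_json(metrics: dict, path: Path) -> None:
    """Write the metrics dictionary to ``path`` as JSON."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as f:
        json.dump(metrics, f, indent=4)


# Any picklable object works; a dict of coefficients stands in for a model.
demo_model = {"coef": [0.5, -1.2], "intercept": 0.1}
demo_metrics = {"demo model": {"R2": 0.8, "MAE": 0.25}}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "demo_model.pkl"
    metrics_path = Path(tmp) / "results" / "metrics.json"
    save_pickled_model(demo_model, model_path)
    save_metrics_json(demo_metrics, metrics_path)
    restored_model = load_pickled_model(model_path)
    restored_metrics = json.loads(metrics_path.read_text())
```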
biofefi.services.plotting module¶
- biofefi.services.plotting.plot_auc_roc(y_classes_labels: ndarray, y_score_probs: ndarray, set_name: str, model_name: str, directory: Path, plot_opts: PlottingOptions | None = None)¶
Plot the ROC curve for a multi-class classification model.
- Parameters:
y_classes_labels (numpy.ndarray) – The true labels of the classes.
y_score_probs (numpy.ndarray) – The predicted probabilities of the classes.
set_name (str) – The name of the set (train or test).
model_name (str) – The name of the model.
directory (Path) – The directory path to save the plot.
- Returns:
None
- biofefi.services.plotting.plot_confusion_matrix(estimator, X, y, set_name: str, model_name: str, directory: Path, plot_opts: PlottingOptions | None = None)¶
Plot the confusion matrix for a multi-class or binary classification model.
- Parameters:
estimator – The trained model.
X – The features.
y – The true labels.
set_name – The name of the set (train or test).
model_name – The name of the model.
directory – The directory path to save the plot.
plot_opts – Options for styling the plot. Defaults to None.
- Returns:
None
- biofefi.services.plotting.plot_global_shap_importance(shap_values: DataFrame, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure¶
Produce a bar chart of global SHAP values.
- Parameters:
shap_values (pd.DataFrame) – The DataFrame containing the global SHAP values.
plot_opts (PlottingOptions) – The plotting options.
num_features_to_plot (int) – The number of top features to plot.
title (str) – The plot title.
- Returns:
The bar chart of global SHAP values.
- Return type:
Figure
- biofefi.services.plotting.plot_lime_importance(df: DataFrame, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure¶
Plot LIME importance.
- Parameters:
df (pd.DataFrame) – The LIME data to plot
plot_opts (PlottingOptions) – The plotting options.
num_features_to_plot (int) – The top number of features to plot.
title (str) – The title of the plot.
- Returns:
The LIME plot.
- Return type:
Figure
- biofefi.services.plotting.plot_local_shap_importance(shap_values: Explainer, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure¶
Plot a beeswarm plot of the local SHAP values.
- Parameters:
shap_values (shap.Explainer) – The SHAP explainer to produce the plot from.
plot_opts (PlottingOptions) – The plotting options.
num_features_to_plot (int) – The number of top features to plot.
title (str) – The plot title.
- Returns:
The beeswarm plot of local SHAP values.
- Return type:
Figure
- biofefi.services.plotting.plot_scatter(y, yp, r2: float, set_name: str, dependent_variable: str, model_name: str, plot_opts: PlottingOptions | None = None)¶
Plot the predicted y values against the true y values as a scatter plot.
- Parameters:
y – True y values.
yp – Predicted y values.
r2 (float) – R-squared between y and yp.
set_name (str) – “Train” or “Test”.
dependent_variable (str) – The name of the dependent variable.
model_name (str) – Name of the model.
plot_opts (PlottingOptions | None, optional) – Options for styling the plot. Defaults to None.
biofefi.services.preprocessing module¶
- biofefi.services.preprocessing.find_non_numeric_columns(data: DataFrame | Series) List[str]¶
Find non-numeric columns in a DataFrame or check if a Series contains non-numeric values.
- Parameters:
data (Union[pd.DataFrame, pd.Series]) – The DataFrame or Series to check.
- Returns:
If data is a DataFrame, a list of the non-numeric column names.
If data is a Series, [“Series”] if it contains non-numeric values, otherwise an empty list.
- Return type:
List[str]
- biofefi.services.preprocessing.normalise_independent_variables(normalisation_method: str, X)¶
Normalise the independent variables based on the selected method.
- Parameters:
normalisation_method (str) – The normalisation method to use.
X (pd.DataFrame) – The independent variables to normalise.
- Returns:
The normalised independent variables.
- Return type:
pd.DataFrame
- biofefi.services.preprocessing.run_feature_selection(preprocessing_opts: PreprocessingOptions, data: DataFrame) DataFrame¶
Run feature selection on the data based on the selected methods.
- Parameters:
preprocessing_opts (PreprocessingOptions) – The preprocessing options, including the feature selection methods to use.
data (pd.DataFrame) – The data to perform feature selection on.
- Returns:
The processed data.
- Return type:
pd.DataFrame
- biofefi.services.preprocessing.run_preprocessing(data: DataFrame, experiment_path: Path, config: PreprocessingOptions) DataFrame¶
- biofefi.services.preprocessing.transform_dependent_variable(transformation_y_method: str, y)¶
Transform the dependent variable based on the selected method.
- Parameters:
transformation_y_method (str) – The transformation method to use.
y (pd.Series) – The dependent variable to transform.
- Returns:
The transformed dependent variable.
- Return type:
pd.Series
biofefi.services.weights_init module¶
- biofefi.services.weights_init.kaiming_init(m: Module, nonlinearity: str = 'relu') None¶
Initializes the weights of Linear layers using Kaiming initialization.
- Parameters:
m (torch.nn.Module) – The module to initialize.
nonlinearity (str) – The nonlinearity used in the network (e.g. 'relu', 'leaky_relu'). Defaults to "relu".
- Returns:
None
- biofefi.services.weights_init.normal_init(m: Module, mean: float = 0.0, std: float = 0.02) None¶
Initializes the weights of Linear layers using a normal distribution.
- Parameters:
m (torch.nn.Module) – The module to initialize.
mean (float) – The mean of the normal distribution. Defaults to 0.0.
std (float) – The standard deviation of the normal distribution. Defaults to 0.02.
- Returns:
None
- biofefi.services.weights_init.xavier_init(m: Module) None¶
Initializes the weights of Linear layers using Xavier initialization.
- Parameters:
m (torch.nn.Module) – The module to initialize.
- Returns:
None
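For intuition about what these initialisers do, the standard deviations behind the normal variants of Kaiming and Xavier initialisation can be computed by hand. This is a pure-Python sketch of the textbook formulas, not BioFEFI's code: the page does not say whether the normal or uniform variants are used, and all function names below are hypothetical.

```python
import math
import random


def relu_family_gain(nonlinearity: str = "relu", negative_slope: float = 0.01) -> float:
    """Gain applied to the weight scale for ReLU-like nonlinearities."""
    if nonlinearity == "relu":
        return math.sqrt(2.0)
    if nonlinearity == "leaky_relu":
        return math.sqrt(2.0 / (1.0 + negative_slope**2))
    raise ValueError(f"Unsupported nonlinearity: {nonlinearity!r}")


def kaiming_normal_std(fan_in: int, nonlinearity: str = "relu") -> float:
    """Kaiming (He) normal: std = gain / sqrt(fan_in)."""
    return relu_family_gain(nonlinearity) / math.sqrt(fan_in)


def xavier_normal_std(fan_in: int, fan_out: int) -> float:
    """Xavier (Glorot) normal with gain 1: std = sqrt(2 / (fan_in + fan_out))."""
    return math.sqrt(2.0 / (fan_in + fan_out))


def sample_weights(fan_out: int, fan_in: int, std: float, mean: float = 0.0, seed: int = 0):
    """Draw a fan_out x fan_in weight matrix from N(mean, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(mean, std) for _ in range(fan_in)] for _ in range(fan_out)]


# A linear layer with 8 inputs and 4 outputs.
kaiming_std = kaiming_normal_std(fan_in=8)          # sqrt(2) / sqrt(8)
xavier_std = xavier_normal_std(fan_in=8, fan_out=4)  # sqrt(2 / 12)
kaiming_weights = sample_weights(4, 8, kaiming_std)
# Plain normal initialisation with the defaults documented for normal_init.
normal_weights = sample_weights(4, 8, std=0.02, mean=0.0)
```

Kaiming scaling depends only on the number of inputs, while Xavier scaling balances inputs and outputs, which is why the two give different spreads for the same layer.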