biofefi.services package

Subpackages

Submodules

biofefi.services.configuration module

biofefi.services.configuration.load_data_preprocessing_options(path: Path) PreprocessingOptions

Load data preprocessing options from the given path. The path should point to a JSON file containing the options.

Parameters:

path (Path) – The path to the JSON file containing the options.

Returns:

The data preprocessing options.

Return type:

PreprocessingOptions

biofefi.services.configuration.load_execution_options(path: Path) ExecutionOptions

Load experiment execution options from the given path. The path should point to a JSON file containing the options.

Parameters:

path (Path) – The path to the JSON file containing the options.

Returns:

The execution options.

Return type:

ExecutionOptions

biofefi.services.configuration.load_fi_options(path: Path) FeatureImportanceOptions | None

Load feature importance options from the given path.

Parameters:

path (Path) – The path to the feature importance options file.

Returns:

The feature importance options.

Return type:

FeatureImportanceOptions | None

biofefi.services.configuration.load_plot_options(path: Path) PlottingOptions

Load plotting options from the given path. The path should point to a JSON file containing the plot options.

Parameters:

path (Path) – The path to the JSON file containing the options.

Returns:

The plotting options.

Return type:

PlottingOptions

biofefi.services.configuration.save_options(path: Path, options: T)

Save options to a JSON file at the specified path.

Parameters:
  • path (Path) – The path to the JSON file.

  • options (T) – The options to save.
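The save and load functions above are described as JSON round-trips. A minimal sketch of that idea follows, assuming the options objects are plain dataclasses. `ExampleOptions`, its fields, and the helpers `save_options_sketch` and `load_options_sketch` are hypothetical illustrations, not BioFEFI's actual classes or implementation.

```python
import dataclasses
import json
import tempfile
from pathlib import Path
from typing import TypeVar

T = TypeVar("T")


@dataclasses.dataclass
class ExampleOptions:
    # Hypothetical fields, used only for this illustration.
    plot_colour_scheme: str = "default"
    dpi: int = 300


def save_options_sketch(path: Path, options) -> None:
    """Serialise a dataclass instance to a JSON file (assumes dataclass options)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dataclasses.asdict(options), f, indent=2)


def load_options_sketch(path: Path, options_type: type[T]) -> T:
    """Read a JSON file and rebuild the dataclass from its keys."""
    with open(path, encoding="utf-8") as f:
        return options_type(**json.load(f))


# Usage: round-trip the options through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    options_path = Path(tmp) / "plot_options.json"
    opts = ExampleOptions(plot_colour_scheme="viridis", dpi=150)
    save_options_sketch(options_path, opts)
    restored = load_options_sketch(options_path, ExampleOptions)
```

Rebuilding with `options_type(**data)` means an unknown key in the file raises a `TypeError`, which surfaces stale or mistyped option files early.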

biofefi.services.experiments module

biofefi.services.experiments.create_experiment(save_dir: Path, plotting_options: PlottingOptions, execution_options: ExecutionOptions)

Create an experiment on disk with its global plotting options saved as a JSON file.

Parameters:
  • save_dir (Path) – The path to where the experiment will be created.

  • plotting_options (PlottingOptions) – The plotting options to save.

  • execution_options (ExecutionOptions) – The execution options for the experiment.

biofefi.services.experiments.delete_previous_fi_results(experiment_path: Path)

Delete previous feature importance results.

Parameters:

experiment_path (Path) – The path to the experiment.

biofefi.services.experiments.find_previous_fi_results(experiment_path: Path) bool

Find previous feature importance results.

Parameters:

experiment_path (Path) – The path to the experiment.

Returns:

Whether previous feature importance results exist.

Return type:

bool

biofefi.services.experiments.get_experiments(base_dir: Path | None = None) list[str]

Get the list of experiments in the BioFEFI experiment directory.

If base_dir is not specified, the default from biofefi_experiments_base_dir is used.

Parameters:
  • base_dir (Path | None, optional) – Specify a base directory for experiments. Defaults to None.

Returns:

The list of experiments.

Return type:

list[str]
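A minimal sketch of listing experiments, assuming each experiment is stored as its own sub-directory of the base directory and that a missing base directory yields an empty list. `list_experiments_sketch` is a hypothetical helper; the real `get_experiments` may use a different layout or behaviour.

```python
import tempfile
from pathlib import Path


def list_experiments_sketch(base_dir: Path) -> list[str]:
    """Return the sorted names of the sub-directories of base_dir.

    Assumes one directory per experiment; plain files are ignored.
    """
    if not base_dir.is_dir():
        return []
    return sorted(p.name for p in base_dir.iterdir() if p.is_dir())


# Usage: two experiment directories and one stray file.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "exp_b").mkdir()
    (base / "exp_a").mkdir()
    (base / "notes.txt").write_text("not an experiment")
    experiments = list_experiments_sketch(base)
```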

biofefi.services.logs module

biofefi.services.logs.get_logs(log_dir: Path) str

Get the latest log file for the latest run to display.

Parameters:

log_dir (Path) – The directory to search for the latest logs.

Raises:

NotADirectoryError – log_dir does not point to a directory.

Returns:

The text of the latest log file.

Return type:

str
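A minimal sketch of "get the latest log", assuming that "latest" means the most recently modified file in the directory and that an empty directory yields an empty string. Both choices, and the helper name `get_latest_log_text_sketch`, are assumptions; only the `NotADirectoryError` behaviour comes from the documentation above.

```python
import os
import tempfile
import time
from pathlib import Path


def get_latest_log_text_sketch(log_dir: Path) -> str:
    """Return the text of the most recently modified file in log_dir."""
    if not log_dir.is_dir():
        raise NotADirectoryError(f"{log_dir} is not a directory")
    log_files = [p for p in log_dir.iterdir() if p.is_file()]
    if not log_files:
        return ""  # assumption: no logs yet means nothing to display
    latest = max(log_files, key=lambda p: p.stat().st_mtime)
    return latest.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    old_log = log_dir / "run_1.log"
    new_log = log_dir / "run_2.log"
    old_log.write_text("first run", encoding="utf-8")
    new_log.write_text("second run", encoding="utf-8")
    # Force distinct modification times so the result is deterministic.
    now = time.time()
    os.utime(old_log, (now - 60, now - 60))
    os.utime(new_log, (now, now))
    latest_text = get_latest_log_text_sketch(log_dir)
    try:
        get_latest_log_text_sketch(new_log)  # a file, not a directory
        raised = False
    except NotADirectoryError:
        raised = True
```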

biofefi.services.metrics module

biofefi.services.metrics.get_metrics(problem_type: ProblemTypes, logger: object = None) dict

Get the metrics functions for a given problem type.

For classification: Accuracy, F1, Precision, Recall and ROC AUC.

For regression: R2, MAE and RMSE.

Parameters:
  • problem_type (ProblemTypes) – Whether the problem is classification or regression.

  • logger (object, optional) – The logger. Defaults to None.

Raises:

ValueError – If an unsupported problem type is given.

Returns:

A dict of score names and functions.

Return type:

dict
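The returned value is a dictionary of score names mapped to functions. A dependency-free sketch of the regression case follows; the key names and the pure-Python metric implementations are assumptions for illustration. A real implementation would more likely wrap library functions such as scikit-learn's `r2_score` and `mean_absolute_error`.

```python
import math
from typing import Callable, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def get_regression_metrics_sketch() -> dict[str, Callable]:
    """Hypothetical helper: map score names to metric functions."""
    return {"R2": r2, "MAE": mae, "RMSE": rmse}


# Usage: every metric is called the same way, so scoring is one loop.
metrics = get_regression_metrics_sketch()
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
scores = {name: fn(y_true, y_pred) for name, fn in metrics.items()}
```

Keeping a uniform `fn(y_true, y_pred)` signature is what lets callers evaluate any problem type with the same loop.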

biofefi.services.ml_models module

biofefi.services.ml_models.get_model(model_type: type, model_params: dict | None = None) MlModel

Produce an instance of the given machine learning model type, constructed with the provided parameters.

If the model is to be used in a grid search, specify model_params=None.

Parameters:
  • model_type (type) – The Python type (constructor) of the model to instantiate.

  • model_params (dict, optional) – The parameters to pass to the model constructor. Defaults to None.

Returns:

A new instance of the requested machine learning model.

Return type:

MlModel

biofefi.services.ml_models.get_model_type(model_type: str, problem_type: ProblemTypes) type

Fetch the appropriate type for a given model name based on the problem type.

Parameters:
  • model_type (str) – The name of the model.

  • problem_type (ProblemTypes) – Type of problem (classification or regression).

Raises:

ValueError – If a model type is not recognised or unsupported.

Returns:

The constructor for a machine learning model class.

Return type:

type
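`get_model_type` resolves a name to a class and `get_model` instantiates it. A minimal sketch of that two-step pattern follows, using a lookup table keyed by model name and problem type. `ProblemTypes` here is a stand-in whose member names are assumed, and `ToyRegressor`, `ToyClassifier`, `MODEL_REGISTRY` and the `*_sketch` functions are all hypothetical.

```python
from enum import Enum
from typing import Optional


class ProblemTypes(Enum):
    # Hypothetical stand-in for BioFEFI's ProblemTypes; member names are assumed.
    Classification = "classification"
    Regression = "regression"


class ToyRegressor:
    """Hypothetical model class used only for this illustration."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha


class ToyClassifier:
    """Hypothetical model class used only for this illustration."""

    def __init__(self, max_depth: int = 3):
        self.max_depth = max_depth


# Hypothetical registry mapping (model name, problem type) to a model class.
MODEL_REGISTRY = {
    ("toy", ProblemTypes.Regression): ToyRegressor,
    ("toy", ProblemTypes.Classification): ToyClassifier,
}


def get_model_type_sketch(model_type: str, problem_type: ProblemTypes) -> type:
    """Look up the model class for a name and problem type, or raise ValueError."""
    try:
        return MODEL_REGISTRY[(model_type, problem_type)]
    except KeyError:
        raise ValueError(
            f"Model type {model_type!r} is not supported for {problem_type.value}"
        ) from None


def get_model_sketch(model_type: type, model_params: Optional[dict] = None):
    """Instantiate model_type; with model_params=None the constructor defaults are used."""
    return model_type(**(model_params or {}))


# Usage: resolve the class first, then instantiate it.
regressor = get_model_sketch(
    get_model_type_sketch("toy", ProblemTypes.Regression), {"alpha": 0.5}
)
default_classifier = get_model_sketch(
    get_model_type_sketch("toy", ProblemTypes.Classification)
)
```

Building with constructor defaults when `model_params` is `None` fits the documented grid-search advice, presumably because the search supplies the parameters afterwards.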

biofefi.services.ml_models.load_models(path: Path) dict[str, list]

Load pre-trained machine learning models.

Parameters:

path (Path) – The path to the directory where the models are saved.

Returns:

The pre-trained models.

Return type:

dict[str, list]

biofefi.services.ml_models.load_models_to_explain(path: Path, model_names: list) dict[str, list]

Load the pre-trained machine learning models that are to be explained.

Parameters:
  • path (Path) – The path to the directory where the models are saved.

  • model_names (list) – The names of the models to explain.

Returns:

The pre-trained models.

Return type:

dict[str, list]

biofefi.services.ml_models.models_exist(path: Path) bool

biofefi.services.ml_models.save_model(model, path: Path)

Save a machine learning model to the given file path.

Parameters:
  • model – The model to save. Must be picklable.

  • path (Path) – The file path to save the model.

biofefi.services.ml_models.save_models_metrics(metrics: dict, path: Path)

Save the statistical metrics of the models to the given file path.

Parameters:
  • metrics (dict) – The metrics to save.

  • path (Path) – The file path to save the metrics.
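Because `save_model` requires a picklable model, saving and reloading presumably goes through Python's `pickle` module. The sketch below shows that round trip, plus writing a metrics dictionary as JSON; the JSON format for metrics, the file names, and the `*_sketch` helpers are assumptions, and a plain dict stands in for a trained model.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_model_sketch(model, path: Path) -> None:
    """Pickle a model to path; the model must be picklable."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model_sketch(path: Path):
    """Unpickle a model. Only load files you trust: unpickling can run code."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_metrics_sketch(metrics: dict, path: Path) -> None:
    """Write a metrics dictionary to a JSON file (format is an assumption)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metrics, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "toy-0.pkl"
    save_model_sketch({"weights": [0.1, 0.2]}, model_path)  # dict stands in for a model
    reloaded = load_model_sketch(model_path)
    metrics_path = Path(tmp) / "results" / "metrics.json"
    save_metrics_sketch({"toy": {"test": {"R2": 0.9}}}, metrics_path)
    saved_metrics = json.loads(metrics_path.read_text(encoding="utf-8"))
```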

biofefi.services.plotting module

biofefi.services.plotting.plot_auc_roc(y_classes_labels: ndarray, y_score_probs: ndarray, set_name: str, model_name: str, directory: Path, plot_opts: PlottingOptions | None = None)

Plot the ROC curve for a multi-class classification model.

Parameters:
  • y_classes_labels (numpy.ndarray) – The true labels of the classes.

  • y_score_probs (numpy.ndarray) – The predicted probabilities of the classes.

  • set_name (str) – The name of the set (train or test).

  • model_name (str) – The name of the model.

  • directory (Path) – The directory path to save the plot.

  • plot_opts (PlottingOptions | None, optional) – Options for styling the plot. Defaults to None.

Returns:

None

biofefi.services.plotting.plot_confusion_matrix(estimator, X, y, set_name: str, model_name: str, directory: Path, plot_opts: PlottingOptions | None = None)

Plot the confusion matrix for a multi-class or binary classification model.

Parameters:
  • estimator – The trained model.

  • X – The features.

  • y – The true labels.

  • set_name – The name of the set (train or test).

  • model_name – The name of the model.

  • directory – The directory path to save the plot.

  • plot_opts – Options for styling the plot. Defaults to None.

Returns:

None

biofefi.services.plotting.plot_global_shap_importance(shap_values: DataFrame, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure

Produce a bar chart of global SHAP values.

Parameters:
  • shap_values (pd.DataFrame) – The DataFrame containing the global SHAP values.

  • plot_opts (PlottingOptions) – The plotting options.

  • num_features_to_plot (int) – The number of top features to plot.

  • title (str) – The plot title.

Returns:

The bar chart of global SHAP values.

Return type:

Figure

biofefi.services.plotting.plot_lime_importance(df: DataFrame, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure

Plot LIME importance.

Parameters:
  • df (pd.DataFrame) – The LIME data to plot.

  • plot_opts (PlottingOptions) – The plotting options.

  • num_features_to_plot (int) – The number of top features to plot.

  • title (str) – The title of the plot.

Returns:

The LIME plot.

Return type:

Figure

biofefi.services.plotting.plot_local_shap_importance(shap_values: Explainer, plot_opts: PlottingOptions, num_features_to_plot: int, title: str) Figure

Plot a beeswarm plot of the local SHAP values.

Parameters:
  • shap_values (shap.Explainer) – The SHAP explainer to produce the plot from.

  • plot_opts (PlottingOptions) – The plotting options.

  • num_features_to_plot (int) – The number of top features to plot.

  • title (str) – The plot title.

Returns:

The beeswarm plot of local SHAP values.

Return type:

Figure

biofefi.services.plotting.plot_scatter(y, yp, r2: float, set_name: str, dependent_variable: str, model_name: str, plot_opts: PlottingOptions | None = None)

Produce a scatter plot comparing the true and predicted values of the dependent variable.

Parameters:
  • y – True y values.

  • yp – Predicted y values.

  • r2 (float) – R-squared between y and yp.

  • set_name (str) – “Train” or “Test”.

  • dependent_variable (str) – The name of the dependent variable.

  • model_name (str) – Name of the model.

  • plot_opts (PlottingOptions | None, optional) – Options for styling the plot. Defaults to None.

biofefi.services.preprocessing module

biofefi.services.preprocessing.find_non_numeric_columns(data: DataFrame | Series) List[str]

Find non-numeric columns in a DataFrame or check if a Series contains non-numeric values.

Parameters:

data (Union[pd.DataFrame, pd.Series]) – The DataFrame or Series to check.

Returns:

If data is a DataFrame, returns a list of non-numeric column names.

If data is a Series, returns [“Series”] if it contains non-numeric values, else an empty list.

Return type:

List[str]

biofefi.services.preprocessing.normalise_independent_variables(normalisation_method: str, X)

Normalise the independent variables based on the selected method.

Parameters:
  • normalisation_method (str) – The normalisation method to use.

  • X (pd.DataFrame) – The independent variables to normalise.

Returns:

The normalised independent variables.

Return type:

pd.DataFrame
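The normalisation methods are not listed here, so the sketch below uses two common choices, min-max scaling and standardisation (z-scores), under the hypothetical method names "minmax", "standardization" and "none". To stay dependency-free it treats the table as a `{column name: values}` dict instead of a pandas DataFrame; `normalise_column_sketch` is a hypothetical helper, not BioFEFI's implementation.

```python
import statistics


def normalise_column_sketch(values, method: str) -> list:
    """Normalise one numeric column with a named method (method names are hypothetical)."""
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread; map it to zeros instead of dividing by zero.
        return [0.0 if span == 0 else (v - lo) / span for v in values]
    if method == "standardization":
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)  # population standard deviation
        return [0.0 if std == 0 else (v - mean) / std for v in values]
    if method == "none":
        return list(values)
    raise ValueError(f"Unknown normalisation method: {method!r}")


# Usage: normalise every column of a tiny table.
X = {"age": [20.0, 30.0, 40.0], "dose": [1.0, 1.0, 4.0]}
X_minmax = {col: normalise_column_sketch(vals, "minmax") for col, vals in X.items()}
X_std = {col: normalise_column_sketch(vals, "standardization") for col, vals in X.items()}
```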

biofefi.services.preprocessing.run_feature_selection(preprocessing_opts: PreprocessingOptions, data: DataFrame) DataFrame

Run feature selection on the data based on the selected methods.

Parameters:
  • preprocessing_opts (PreprocessingOptions) – The preprocessing options, including the feature selection methods to use.

  • data (pd.DataFrame) – The data to perform feature selection on.

Returns:

The processed data.

Return type:

pd.DataFrame
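The feature selection methods available are not listed here. As one example of the kind of method such a step can run, the sketch below implements a variance threshold: it keeps only columns whose population variance is strictly greater than a threshold, as scikit-learn's `VarianceThreshold` does. It assumes every column is a feature (no target column), uses a `{column name: values}` dict in place of a DataFrame, and `variance_threshold_sketch` is hypothetical.

```python
import statistics


def variance_threshold_sketch(
    columns: dict[str, list[float]], threshold: float
) -> dict[str, list[float]]:
    """Keep only the feature columns whose population variance exceeds threshold."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    }


# Usage: gene_a is constant and gene_c barely varies, so only gene_b survives.
data = {
    "gene_a": [1.0, 1.0, 1.0],
    "gene_b": [0.0, 2.0, 4.0],
    "gene_c": [1.0, 1.1, 0.9],
}
selected = variance_threshold_sketch(data, threshold=0.01)
```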

biofefi.services.preprocessing.run_preprocessing(data: DataFrame, experiment_path: Path, config: PreprocessingOptions) DataFrame

biofefi.services.preprocessing.transform_dependent_variable(transformation_y_method: str, y)

Transform the dependent variable based on the selected method.

Parameters:
  • transformation_y_method (str) – The transformation method to use.

  • y (pd.Series) – The dependent variable to transform.

Returns:

The transformed dependent variable.

Return type:

pd.Series
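The available transformation methods are not listed here. A minimal sketch follows with two common target transforms, log and square root, under the hypothetical method names "log", "sqrt" and "none"; it uses a plain list instead of a pandas Series, and `transform_target_sketch` is hypothetical.

```python
import math


def transform_target_sketch(y, method: str) -> list:
    """Apply a named transformation to the dependent variable (method names are hypothetical)."""
    if method == "log":
        if any(v <= 0 for v in y):
            raise ValueError("The log transform needs strictly positive values")
        return [math.log(v) for v in y]
    if method == "sqrt":
        if any(v < 0 for v in y):
            raise ValueError("The square-root transform needs non-negative values")
        return [math.sqrt(v) for v in y]
    if method == "none":
        return list(y)
    raise ValueError(f"Unknown transformation method: {method!r}")


y = [1.0, 4.0, 9.0]
y_sqrt = transform_target_sketch(y, "sqrt")
y_log = transform_target_sketch(y, "log")
```

Predictions made on the transformed scale need the inverse transform (for example `math.exp` after a log transform, or squaring after a square root) before they are compared with the original values.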

biofefi.services.weights_init module

biofefi.services.weights_init.kaiming_init(m: Module, nonlinearity: str = 'relu') None

Initializes the weights of Linear layers using Kaiming initialization.

Parameters:
  • m (torch.nn.Module) – The module to initialize.

  • nonlinearity (str) – The nonlinearity used in the network (e.g. 'relu' or 'leaky_relu'). Defaults to 'relu'.

Returns:

None

biofefi.services.weights_init.normal_init(m: Module, mean: float = 0.0, std: float = 0.02) None

Initializes the weights of Linear layers using a normal distribution.

Parameters:
  • m (torch.nn.Module) – The module to initialize.

  • mean (float) – The mean of the normal distribution. Defaults to 0.0.

  • std (float) – The standard deviation of the normal distribution. Defaults to 0.02.

Returns:

None

biofefi.services.weights_init.xavier_init(m: Module) None

Initializes the weights of Linear layers using Xavier initialization.

Parameters:

m (torch.nn.Module) – The module to initialize.

Returns:

None
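Each initialiser takes a single module `m`. That suggests, though it is an assumption, that it is meant to be passed to PyTorch's `Module.apply`, as in `model.apply(kaiming_init)`, which calls the function on every submodule. The sketch below does not need PyTorch. It only computes the scales behind two of the schemes: PyTorch's `kaiming_normal_` in `fan_in` mode samples from N(0, std²) with std = gain / sqrt(fan_in), and `xavier_uniform_` samples from U(-a, a) with a = gain · sqrt(6 / (fan_in + fan_out)). The helper names are hypothetical, and whether BioFEFI uses the normal or the uniform variant is not stated here.

```python
import math


def kaiming_normal_std(fan_in: int, gain: float = math.sqrt(2.0)) -> float:
    """Std used by Kaiming (He) normal init in fan_in mode; sqrt(2) is the ReLU gain."""
    return gain / math.sqrt(fan_in)


def xavier_uniform_bound(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Half-width a of U(-a, a) used by Xavier (Glorot) uniform init."""
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


# A Linear layer mapping 128 inputs to 64 outputs has fan_in=128 and fan_out=64.
std = kaiming_normal_std(128)
bound = xavier_uniform_bound(128, 64)
```

Both scales shrink as layers widen, which keeps activation variance roughly constant from layer to layer. By contrast, the fixed `std=0.02` of `normal_init` does not depend on layer size.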

Module contents