tangram.utils.cross_val

tangram.utils.cross_val(adata_sc, adata_sp, cluster_label=None, mode='clusters', scale=True, lambda_d=0, lambda_g1=1, lambda_g2=0, lambda_r=0, lambda_count=1, lambda_f_reg=1, target_count=None, num_epochs=1000, device='cuda:0', learning_rate=0.1, cv_mode='loo', return_gene_pred=False, density_prior=None, random_state=None, verbose=False)

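Executes cross-validation of the Tangram mapping. The genes are split into training and test sets, the mapping is learned on the training genes, and the held-out test genes are scored.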
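The cross-validation runs over genes rather than over cells or spots. The sketch below illustrates that loop in plain Python under stated assumptions; it is not Tangram's implementation. `split_genes`, `cross_validate` and `fit_and_score` are hypothetical names. `fit_and_score` stands in for "map using only the training genes, then score the training and held-out genes". The shuffling and round-robin fold assignment used for ‘10fold’ are also assumptions, as are the dictionary key names.

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def split_genes(
    genes: Sequence[str], cv_mode: str = "loo", n_folds: int = 10, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_genes, test_genes) pairs over the shared genes.

    Hypothetical helper: 'loo' holds out exactly one gene per split;
    '10fold' shuffles the genes (assumption) and holds out one of
    n_folds roughly equal chunks per split.
    """
    genes = list(genes)
    if cv_mode == "loo":
        for i, gene in enumerate(genes):
            yield genes[:i] + genes[i + 1 :], [gene]
    elif cv_mode == "10fold":
        order = genes[:]
        random.Random(seed).shuffle(order)
        # Deal genes round-robin into folds; skip empty folds if there are few genes.
        folds = [order[k::n_folds] for k in range(n_folds)]
        for test in (f for f in folds if f):
            held_out = set(test)
            yield [g for g in order if g not in held_out], test
    else:
        raise ValueError(f"unsupported cv_mode: {cv_mode!r}")


def cross_validate(
    genes: Sequence[str],
    fit_and_score: Callable[[List[str], List[str]], Tuple[float, float]],
    cv_mode: str = "loo",
) -> Dict[str, float]:
    """Average train and test scores over all gene splits.

    fit_and_score(train_genes, test_genes) is a hypothetical stand-in for
    "map using only train_genes, then score train and held-out genes";
    it must return (train_score, test_score).
    """
    train_scores, test_scores = [], []
    for train, test in split_genes(genes, cv_mode):
        train_score, test_score = fit_and_score(train, test)
        train_scores.append(train_score)
        test_scores.append(test_score)
    return {
        "avg_train_score": sum(train_scores) / len(train_scores),
        "avg_test_score": sum(test_scores) / len(test_scores),
    }


if __name__ == "__main__":
    shared = ["g1", "g2", "g3", "g4", "g5"]
    # Toy scorer returning constant scores, only to exercise the loop.
    print(cross_validate(shared, lambda train, test: (0.5, 0.25)))
    # {'avg_train_score': 0.5, 'avg_test_score': 0.25}
```

In ‘loo’ mode the number of splits equals the number of shared genes, so the cost grows linearly with the gene panel. ‘10fold’ caps the number of mapping runs at ten.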

Parameters

adata_sc (AnnData) – single-cell data
adata_sp (AnnData) – spatial gene expression data
cluster_label (str) – Optional. Field in adata_sc.obs at which the single-cell data are aggregated. Only valid when mode is ‘clusters’.
mode (str) – Optional. Tangram mapping mode. Currently supported: ‘cell’, ‘clusters’, ‘constrained’. Default is ‘clusters’.
scale (bool) – Optional. Whether to weight the input single-cell data by the number of cells in each cluster. Only valid when cluster_label is not None. Default is True.
lambda_g1 (float) – Optional. Strength of Tangram loss function. Default is 1.
lambda_d (float) – Optional. Strength of density regularizer. Default is 0.
lambda_g2 (float) – Optional. Strength of voxel-gene regularizer. Default is 0.
lambda_r (float) – Optional. Strength of entropy regularizer. Default is 0.
lambda_count (float) – Optional. Regularizer for the count term. Only valid when mode == ‘constrained’. Default is 1.
lambda_f_reg (float) – Optional. Regularizer for the filter, which promotes Boolean values (0s and 1s) in the filter. Only valid when mode == ‘constrained’. Default is 1.
target_count (int) – Optional. The number of cells to be filtered. Default is None.
num_epochs (int) – Optional. Number of epochs. Default is 1000.
learning_rate (float) – Optional. Learning rate for the optimizer. Default is 0.1.
device (str or torch.device) – Optional. Device on which the mapping is trained. Default is ‘cuda:0’.
cv_mode (str) – Optional. Cross-validation mode. Supported values are ‘loo’ (leave-one-out) and ‘10fold’. Default is ‘loo’.
return_gene_pred (bool) – Optional. Whether to also return the predicted and true spatial expression of the test genes. Only applicable when cv_mode is ‘loo’. Default is False.
density_prior (ndarray or str) – Optional. Spatial density of spots. If a string, the value can be ‘rna_count_based’ or ‘uniform’. If an ndarray, its shape must be (number_spots,) and it must satisfy sum() == 1. If not provided, the density term is ignored.
random_state (int) – Optional. Pass an int to make training reproducible. Default is None.
verbose (bool) – Optional. Whether to print training details. Default is False.
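When density_prior is passed as an ndarray, it must have shape (number_spots,) and sum to 1. The NumPy sketch below builds two such arrays: a uniform prior, and a prior proportional to the total counts per spot. It also includes a validator for the stated constraints. Reading ‘rna_count_based’ as "proportional to total counts per spot" is an assumption here. The helper names `uniform_density`, `count_based_density` and `check_density_prior` are hypothetical.

```python
import numpy as np


def uniform_density(n_spots: int) -> np.ndarray:
    """Equal prior mass on every spot: shape (n_spots,), sums to 1."""
    return np.full(n_spots, 1.0 / n_spots)


def count_based_density(spot_counts) -> np.ndarray:
    """Prior proportional to per-spot total counts, normalized to sum to 1.

    Assumed interpretation of 'rna_count_based'. Accepts any array-like of
    per-spot totals, e.g. the (n_spots, 1) matrix from a sparse X.sum(axis=1).
    """
    counts = np.asarray(spot_counts, dtype=float).ravel()
    total = counts.sum()
    if total <= 0:
        raise ValueError("total count must be positive")
    return counts / total


def check_density_prior(prior, n_spots: int) -> None:
    """Raise ValueError unless prior has shape (n_spots,) and sums to 1."""
    prior = np.asarray(prior, dtype=float)
    if prior.shape != (n_spots,):
        raise ValueError(f"expected shape ({n_spots},), got {prior.shape}")
    if not np.isclose(prior.sum(), 1.0):
        raise ValueError(f"density_prior must sum to 1, got {prior.sum()}")
```

With AnnData, the per-spot totals could be taken from adata_sp.X.sum(axis=1), which works for dense or sparse X. The resulting array would then be checked with check_density_prior(prior, adata_sp.n_obs) before being passed as density_prior.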

Returns

cv_dict (dict) – a dictionary containing cross-validation information (hyperparameters, average test and train scores, etc.).
adata_ge_cv (AnnData) – spatial gene expression predicted by leave-one-out cross-validation. Only returned when return_gene_pred is True and cv_mode is ‘loo’.
test_gene_df (pandas.DataFrame) – dataframe with columns ‘score’, ‘is_training’ and ‘sparsity_sp’ (spatial data sparsity). Only returned when return_gene_pred is True and cv_mode is ‘loo’.

Return type

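cv_dict (dict), or the tuple (cv_dict, adata_ge_cv, test_gene_df) when return_gene_pred is True and cv_mode is ‘loo’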
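The per-gene table can be used to see which held-out genes are predicted poorly, and whether the score tracks spatial sparsity. A minimal pandas sketch follows, assuming test_gene_df is indexed by gene name (not stated above) and using only the documented columns ‘score’ and ‘sparsity_sp’. `summarize_gene_scores` is a hypothetical helper, and the toy table in the test is made-up data for illustration only. In practice the table would come from a call such as `cv_dict, adata_ge_cv, test_gene_df = tangram.utils.cross_val(adata_sc, adata_sp, cv_mode="loo", return_gene_pred=True)`, following the Returns description.

```python
import pandas as pd


def summarize_gene_scores(test_gene_df: pd.DataFrame, n_worst: int = 3) -> dict:
    """Summarize per-gene cross-validation scores.

    Hypothetical helper; assumes gene names are the DataFrame index and
    uses only the documented 'score' and 'sparsity_sp' columns.
    """
    scores = test_gene_df["score"].astype(float)
    sparsity = test_gene_df["sparsity_sp"].astype(float)
    return {
        "mean_score": float(scores.mean()),
        # Pearson correlation between score and spatial sparsity across genes.
        "score_vs_sparsity_corr": float(scores.corr(sparsity)),
        # Lowest-scoring genes, worst first.
        "worst_genes": list(scores.nsmallest(n_worst).index),
    }
```

A strongly negative correlation would suggest that sparse genes drive low test scores. In that case, the average test score in cv_dict is better read alongside the per-gene table than on its own.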