sctk.generate_qc_clusters

sctk.generate_qc_clusters(ad, metrics, aux_ad=None, n_pcs=None, n_neighbors=None, res=0.2, clus_key='qc_cluster', umap_key='X_umap_qc', return_aux=False) → AnnData

Generate quality control (QC) clusters for an AnnData object. The input object is modified in place.

This function clusters the cells of an AnnData object on the specified QC metrics. If no auxiliary AnnData object is provided, the function builds one from those metrics by performing PCA, constructing a nearest-neighbor graph, and computing a UMAP embedding. The cells are then clustered with the Leiden algorithm on that neighbor graph.
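The first two steps of the auxiliary-object construction (PCA, then a nearest-neighbor graph) can be sketched in plain NumPy on a cells-by-metrics matrix. This is a minimal illustration, not sctk's implementation: sctk works through scanpy, and the z-scoring of each metric before PCA is an assumption made here. The helper name `qc_pca_knn` is hypothetical, and the UMAP and Leiden steps are omitted because they need external libraries.

```python
import numpy as np


def qc_pca_knn(qc, n_pcs, n_neighbors):
    """Hypothetical sketch: PCA + kNN graph on a (cells x QC metrics) matrix."""
    qc = np.asarray(qc, dtype=float)
    # Assumption: z-score each metric so no single metric dominates the PCA.
    std = qc.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant metrics
    z = (qc - qc.mean(axis=0)) / std
    # PCA via SVD of the centered matrix; keep the top n_pcs component scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    pcs = u[:, :n_pcs] * s[:n_pcs]
    # Brute-force Euclidean distances between cells in PC space.
    d = np.linalg.norm(pcs[:, None, :] - pcs[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    # For every cell, the indices of its n_neighbors closest cells.
    knn = np.argsort(d, axis=1)[:, :n_neighbors]
    return pcs, knn


# Usage on synthetic data: 200 cells x 5 QC metrics.
rng = np.random.default_rng(0)
qc = rng.normal(size=(200, 5))
pcs, knn = qc_pca_knn(qc, n_pcs=3, n_neighbors=5)
print(pcs.shape, knn.shape)  # (200, 3) (200, 5)
```

Brute-force distances are fine for a toy example but scale quadratically with the number of cells; real pipelines use approximate neighbor search.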

Parameters:

ad – AnnData object to generate QC clusters for.
metrics – List of QC metrics to use for generating QC clusters. Must be present as obs columns.
aux_ad – Optional auxiliary AnnData object to use for generating QC clusters, created by an earlier call to this function and returned if return_aux is set to True. Its neighbor graph will be used for clustering and its UMAP will be transferred to the input object.
n_pcs – Number of principal components to use for PCA. If not provided, this will be set to max(2, len(metrics) - 2).
n_neighbors – Number of nearest neighbors to use for constructing the nearest neighbor graph. If not provided, this will be set to min(max(5, int(ad.n_obs / 500)), 10).
res – Resolution parameter for the Leiden clustering algorithm. Higher values produce more, smaller clusters.
clus_key – Obs column to store the QC clusters in.
umap_key – Obsm key to store the QC UMAP coordinates in.
return_aux – If True, return the auxiliary AnnData object used for generating QC clusters.
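The documented defaults for n_pcs and n_neighbors can be checked with two small helpers that mirror the formulas above. The helper names are hypothetical and not part of sctk. The sketch also shows that n_neighbors is clamped to the range 5 to 10: it stays at 5 below 3,000 cells and reaches 10 at 5,000 cells or more.

```python
def default_n_pcs(metrics):
    # Documented default: max(2, len(metrics) - 2)
    return max(2, len(metrics) - 2)


def default_n_neighbors(n_obs):
    # Documented default: min(max(5, int(n_obs / 500)), 10)
    return min(max(5, int(n_obs / 500)), 10)


metrics_list = ["n_counts", "n_genes", "percent_mito", "percent_ribo", "percent_hb"]
print(default_n_pcs(metrics_list))   # 3
print(default_n_neighbors(2700))     # 5 (pbmc3k has 2,700 cells)
print(default_n_neighbors(4000))     # 8
print(default_n_neighbors(100000))   # 10
```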

Returns:

If return_aux is False, returns None. Otherwise, returns the auxiliary AnnData object used for generating QC clusters.

Raises:

None.

Examples

>>> import scanpy as sc
>>> import sctk
>>> adata = sc.datasets.pbmc3k()
>>> sctk.calculate_qc(adata)
>>> metrics_list = ["n_counts", "n_genes", "percent_mito", "percent_ribo", "percent_hb"]
>>> sctk.generate_qc_clusters(adata, metrics=metrics_list)