rectanglepy.pp.build_rectangle_signatures

rectanglepy.pp.build_rectangle_signatures(adata, cell_type_col='cell_type', bulks=None, *, optimize_cutoffs=True, layer=None, raw=False, p=0.015, lfc=1.5, n_cpus=None, gene_expression_threshold=0.5)

Builds rectangle signatures based on single-cell count data and annotations.

Parameters:
  • adata (AnnData) – The single-cell count data as an AnnData object, with cells as observations and genes as variables. Each entry should be in raw counts.

  • bulks (Optional[DataFrame] (default: None)) – The bulk data as a DataFrame. DataFrame must have the bulk identifier as index and the genes as columns. Each entry should be in transcripts per million (TPM).

  • cell_type_col (str (default: 'cell_type')) – The name of the column in adata.obs that holds the cell type annotation for each cell in the single-cell count data.

  • layer (Optional[str] (default: None)) – The AnnData layer to use for the single-cell data.

  • raw (bool (default: False)) – A flag indicating whether to use the raw AnnData data.

  • optimize_cutoffs (default: True) – Indicates whether to optimize the p-value and log fold change cutoffs using grid search.

  • p (default: 0.015) – The p-value threshold for the differential expression (DE) analysis (only used if optimize_cutoffs is False).

  • lfc (default: 1.5) – The log fold change threshold for the DE analysis (only used if optimize_cutoffs is False).

  • n_cpus (Optional[int] (default: None)) – The number of CPUs to use for the DE analysis. Defaults to the number of CPUs available.

  • gene_expression_threshold (default: 0.5) – The gene expression threshold for the DE analysis. Specifies how many cells need to express a gene for it to be considered in the differential gene expression (DGE) analysis.

Return type:

RectangleSignatureResult

Returns:

The result of the rectangle signature analysis, returned as a RectangleSignatureResult.
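
Example:

The following is a minimal sketch of a call with fixed cutoffs, not an official example. It uses only the parameter names and defaults from the signature above. The helper build_signatures and the dictionary signature_kwargs are hypothetical names introduced for illustration, and the value n_cpus=2 is an arbitrary choice. The caller is assumed to supply an AnnData object whose obs table contains a 'cell_type' column. The package import is deferred into the helper so the snippet can be loaded even where rectanglepy is not installed.

```python
# Keyword arguments taken from the documented signature.
# With optimize_cutoffs=False, the fixed p and lfc cutoffs below are used
# instead of being optimized by grid search.
signature_kwargs = {
    "cell_type_col": "cell_type",      # annotation column (assumed to exist in adata.obs)
    "optimize_cutoffs": False,
    "p": 0.015,                        # documented default
    "lfc": 1.5,                        # documented default
    "n_cpus": 2,                       # arbitrary value chosen for this sketch
    "gene_expression_threshold": 0.5,  # documented default
}


def build_signatures(adata, bulks=None):
    """Hypothetical wrapper: build signatures with the fixed cutoffs above.

    adata is the single-cell AnnData object (raw counts); bulks is an
    optional DataFrame in TPM with bulk identifiers as index and genes
    as columns, as described in the parameter list.
    """
    # Imported lazily so this sketch loads without rectanglepy installed.
    from rectanglepy.pp import build_rectangle_signatures

    return build_rectangle_signatures(adata, bulks=bulks, **signature_kwargs)
```

Calling build_signatures(adata) would then return a RectangleSignatureResult. To let the function choose the cutoffs itself, set "optimize_cutoffs" to True, in which case p and lfc are not used.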