# Usage

TACOS performs differential ploidy analysis to find amplicons whose ploidy differs significantly in the transition between phase 1 (control) and phase 2 (case).

To run TACOS you need a data.frame with the cell name (`cell_name`) in the first column, the cluster name (`label`) in the second column, and the median ploidy of each amplicon for each cell in the remaining columns. This table can be obtained with Mosaic.

| cell_name | label | amplicon_1 | amplicon_2 | amplicon_3 | amplicon_4 | amplicon_n |
|---|---|---|---|---|---|---|
| AACAACCTAAAGCAGCAT-1-P1 | WT | 2 | 2 | 2 | 2 | 2 |
| AACAACCTACACCTCAAC-1-P1 | cluster_2 | 7.111584 | 1.2653305 | 2.5542889 | 1.328682 | 0.6647728 |
| AACAACTGGCCACAAGGT-1-P1 | cluster_2 | 2.633361 | 0.3476699 | 1.9356978 | 2.686965 | 0.3287827 |
| AACAATGCATCCAGCTAT-1-P2 | cluster_3 | 1.494538 | 2.3383582 | 4.1786981 | 2.286371 | 1.5406782 |
| AACAGCAGTATTGGTTCC-1-P2 | cluster_3 | 3.407511 | 3.8793449 | 1.7101954 | 2.249478 | 5.2756396 |
| AACAGCAGTCCTGACCTC-1-P2 | cluster_1 | 5.006996 | 0.4637173 | 0.3197284 | 1.168645 | 1.4187598 |

You also need a data.frame with one row per cluster: the cluster name in the first column and the percentage of cells belonging to that cluster in P1 and P2 in the second and third columns, respectively.

| cluster | freq_P1 | freq_P2 |
|---|---|---|
| WT | 6.349206 | 0 |
| cluster_2 | 0 | 50.49 |
| cluster_3 | 28.673835 | 27.895 |
| cluster_4 | 35.02304 | 21.615 |
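The following base R sketch builds both inputs in memory from a few rows of the examples above, which makes the two expected layouts concrete. The helper `check_inputs()` is hypothetical and not part of TACOS. It checks only the layout described in this section and does not cover any other requirement TACOS may have.

```r
# Toy per-cell ploidy table: cell_name, label, then one numeric column per amplicon
df <- data.frame(
  cell_name  = c("AACAACCTAAAGCAGCAT-1-P1", "AACAACCTACACCTCAAC-1-P1",
                 "AACAATGCATCCAGCTAT-1-P2"),
  label      = c("WT", "cluster_2", "cluster_3"),
  amplicon_1 = c(2, 7.111584, 1.494538),
  amplicon_2 = c(2, 1.2653305, 2.3383582),
  stringsAsFactors = FALSE
)

# Toy cluster-frequency table: percentage of cells per cluster in P1 and P2
freqcluster <- data.frame(
  cluster = c("WT", "cluster_2", "cluster_3"),
  freq_P1 = c(6.349206, 0, 28.673835),
  freq_P2 = c(0, 50.49, 27.895),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a TACOS function): stops with an error if the
# layout differs from the one described above
check_inputs <- function(ploidy, freq) {
  ploidy <- as.data.frame(ploidy)  # also accepts data.tables from fread()
  freq <- as.data.frame(freq)
  stopifnot(
    identical(colnames(ploidy)[1:2], c("cell_name", "label")),
    ncol(ploidy) > 2,
    all(vapply(ploidy[-c(1, 2)], is.numeric, logical(1))),
    ncol(freq) >= 3,
    is.numeric(freq[[2]]), is.numeric(freq[[3]])
  )
  invisible(TRUE)
}

check_inputs(df, freqcluster)
```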

First, load the data into R:

```r
library(TACOS)
library(data.table)  # provides fread()

# Create the output directory
outdir <- "/path/to/outputdir"
dir.create(outdir, recursive = TRUE, showWarnings = FALSE)

# Per-cell ploidy table and per-cluster frequency table
df <- fread("path/to/ploidy.table.csv", sep = ",", header = TRUE)
freqcluster <- fread("path/to/freqcluster.table.txt", sep = "\t", header = TRUE)
clust <- freqcluster$cluster
amplicon <- colnames(df)[-c(1, 2)]  # every column after cell_name and label

# Tag each cluster as phase1, phase2 or not phase-specific
freqcluster <- RelevantClones(freqcluster)
phase1 <- freqcluster$cluster[which(freqcluster$phase == "phase1")]
phase2 <- freqcluster$cluster[which(freqcluster$phase == "phase2")]
```

The function RelevantClones() tags each cluster according to its phase specificity. It compares the percentage of cells in the cluster in P1 and P2 and assigns the cluster to phase1 or phase2. A cluster present in only one phase is tagged with that phase. For clusters present in both phases, RelevantClones() uses a threshold of 15. If the percentage in phase1 minus the percentage in phase2 is > 15, the cluster is tagged "phase1". If the difference is < -15, it is tagged "phase2". If the difference lies between -15 and 15, the cluster is tagged "Not phase-specific clone" and excluded from downstream analysis.

After RelevantClones(), the freqcluster data.frame looks like this:

| cluster | freq_P1 | freq_P2 | phase |
|---|---|---|---|
| WT | 6.349206 | 0 | phase1 |
| cluster_2 | 0 | 50.49 | phase2 |
| cluster_3 | 28.673835 | 27.895 | Not phase-specific clone |
| cluster_4 | 35.02304 | 21.615 | Not phase-specific clone |
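The tagging rule can be sketched in base R as follows. `tag_phase()` is a hypothetical re-implementation written from the description above. It is not the source of RelevantClones(), and some details are assumptions, such as how a difference of exactly 15 is handled. Run on the example frequencies, it reproduces the tags in the table.

```r
# Hypothetical re-implementation of the rule described above (not RelevantClones())
tag_phase <- function(freq, threshold = 15) {
  d <- freq$freq_P1 - freq$freq_P2  # difference P1 - P2
  freq$phase <- ifelse(freq$freq_P2 == 0, "phase1",        # only in P1
                ifelse(freq$freq_P1 == 0, "phase2",        # only in P2
                ifelse(d > threshold, "phase1",
                ifelse(d < -threshold, "phase2", "Not phase-specific clone"))))
  freq
}

freqcluster <- data.frame(
  cluster = c("WT", "cluster_2", "cluster_3", "cluster_4"),
  freq_P1 = c(6.349206, 0, 28.673835, 35.02304),
  freq_P2 = c(0, 50.49, 27.895, 21.615),
  stringsAsFactors = FALSE
)

tagged <- tag_phase(freqcluster)
tagged$phase
# expected: "phase1" "phase2" "Not phase-specific clone" "Not phase-specific clone"

# Only phase-specific clusters are carried forward
phase1 <- tagged$cluster[tagged$phase == "phase1"]  # "WT"
phase2 <- tagged$cluster[tagged$phase == "phase2"]  # "cluster_2"
```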

The function MatrixGen() builds a matrix with one row per amplicon and one column per cluster. Each entry is the median ploidy of that amplicon across the cells of that cluster.

```r
mat <- MatrixGen(dataframe = df, clusters = clust, amplicons = amplicon)
rownames(mat) <- amplicon
colnames(mat) <- clust
```
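To illustrate what MatrixGen() computes, the following base R sketch builds the same kind of amplicon-by-cluster median-ploidy matrix from a toy table. `median_matrix()` and the toy data are hypothetical. The real function may handle empty clusters or missing values differently.

```r
# Hypothetical stand-in for MatrixGen(): median ploidy of each amplicon
# (rows) across the cells of each cluster (columns)
median_matrix <- function(ploidy, clusters, amplicons) {
  ploidy <- as.data.frame(ploidy)
  m <- matrix(NA_real_, nrow = length(amplicons), ncol = length(clusters),
              dimnames = list(amplicons, clusters))
  for (cl in clusters) {
    cells <- ploidy$label == cl  # cells assigned to this cluster
    for (a in amplicons) m[a, cl] <- median(ploidy[cells, a])
  }
  m  # a cluster with no cells gets NA
}

# Toy data: two clusters, two amplicons
toy <- data.frame(
  cell_name = paste0("cell_", 1:5),
  label     = c("A", "A", "B", "B", "B"),
  amp1      = c(1, 3, 2, 2, 5),
  amp2      = c(4, 6, 0, 1, 2),
  stringsAsFactors = FALSE
)
mat <- median_matrix(toy, clusters = c("A", "B"), amplicons = c("amp1", "amp2"))
mat
#      A B
# amp1 2 2
# amp2 5 1
```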

To assess the overall similarity between clusters, we use the R function dist() to compute the Euclidean distance between each pair of clusters.

```r
library(RColorBrewer)  # brewer.pal()
library(pheatmap)      # pheatmap()

# Cluster the samples (clusters) by distance between their median-ploidy profiles
cloneDist <- dist(t(mat))
cloneDistMatrix <- as.matrix(cloneDist)
colors <- colorRampPalette(rev(brewer.pal(9, "Blues")))(255)

cloneheatmap <- pheatmap(cloneDistMatrix,
                         clustering_distance_rows = cloneDist,
                         clustering_distance_cols = cloneDist,
                         color = colors)
```
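dist() computes distances between the rows of its input, so the matrix is transposed to make each cluster a row. This small sketch uses a toy matrix with illustrative values. It checks that the distance between two clusters equals the square root of the summed squared differences of their median ploidies across amplicons.

```r
# Toy amplicon-by-cluster matrix with illustrative values
mat <- matrix(c(2, 2, 2,   # WT
                3, 1, 2,   # cluster_2
                2, 5, 2),  # cluster_3
              nrow = 3,
              dimnames = list(c("amplicon_1", "amplicon_2", "amplicon_3"),
                              c("WT", "cluster_2", "cluster_3")))

# dist() works on rows, so t(mat) puts one cluster per row
d <- as.matrix(dist(t(mat)))  # Euclidean distance is the default

# The same WT vs cluster_2 distance computed by hand
manual <- sqrt(sum((mat[, "WT"] - mat[, "cluster_2"])^2))
all.equal(d["WT", "cluster_2"], manual)  # TRUE (both equal sqrt(2))
```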

Another way to visualize cluster-to-cluster distances is principal component analysis (PCA). TACOS provides the function plotPCA(), which performs PCA and returns two plots: a 2D PCA plot and a plot of the proportion of variance explained by each principal component.

```r
plotPCA(mat, phase1, phase2, height = 7, width = 15)
```
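The following base R sketch shows the two quantities that plotPCA() reports, computed with prcomp() on a random toy matrix. It is not plotPCA()'s implementation. In particular, centring the data without scaling it is an assumption.

```r
# Toy amplicon-by-cluster matrix (random values, for illustration only)
set.seed(1)
mat <- matrix(rnorm(40, mean = 2), nrow = 10,
              dimnames = list(paste0("amplicon_", 1:10), paste0("cluster_", 1:4)))

# PCA with clusters as observations: prcomp() expects observations in rows
pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
barplot(var_explained, names.arg = paste0("PC", seq_along(var_explained)),
        ylab = "Proportion of variance explained")

# 2D PCA: each cluster placed on the first two components
plot(pca$x[, 1], pca$x[, 2], xlab = "PC1", ylab = "PC2", pch = 19)
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```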

The core function of this package is DiffPloidy(). It performs differential ploidy analysis to find amplicons with significantly altered ploidy in the transition from P1 to P2, using the clusters tagged by RelevantClones().
It takes as input the data.frame obtained from Mosaic, the phase1 and phase2 clusters, the amplicons, and the statistical method to use. TACOS currently supports three methods for identifying significantly altered ploidy: the t-test, the Wilcoxon rank sum test and the Kruskal-Wallis test. For all three methods, p-values are corrected for multiple comparisons with the Benjamini-Hochberg procedure.

```r
DiffMat <- DiffPloidy(df, phase1, phase2, amplicons = amplicon, method = "wilcoxon")
```
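The statistical step can be sketched in base R. `diff_ploidy_sketch()` is hypothetical. For each amplicon, it pools all cells from phase1 clusters and compares them with all cells from phase2 clusters, then applies the Benjamini-Hochberg correction with p.adjust(). DiffPloidy() may instead test pairs of clusters, as the `clones` column in its output suggests. Treat this sketch only as an illustration of the test-then-correct pattern.

```r
# Hypothetical sketch (not DiffPloidy()): per-amplicon test of phase1 cells
# versus phase2 cells, followed by Benjamini-Hochberg correction
diff_ploidy_sketch <- function(ploidy, phase1, phase2, amplicons,
                               method = c("wilcoxon", "t.test", "kruskal")) {
  method <- match.arg(method)
  ploidy <- as.data.frame(ploidy)
  in_p1 <- ploidy$label %in% phase1  # cells from phase1 clusters
  in_p2 <- ploidy$label %in% phase2  # cells from phase2 clusters
  pvalue <- vapply(amplicons, function(a) {
    x <- ploidy[in_p1, a]
    y <- ploidy[in_p2, a]
    switch(method,
           wilcoxon = wilcox.test(x, y, exact = FALSE)$p.value,
           t.test   = t.test(x, y)$p.value,
           kruskal  = kruskal.test(list(x, y))$p.value)
  }, numeric(1))
  data.frame(amplicon  = amplicons,
             pvalue    = unname(pvalue),
             padj      = unname(p.adjust(pvalue, method = "BH")),
             row.names = NULL)
}

# Toy data: amplicon_1 shifts from ~2 to ~6 between phases, amplicon_2 does not
set.seed(42)
toy <- data.frame(
  cell_name  = paste0("cell_", 1:40),
  label      = rep(c("WT", "cluster_2"), each = 20),
  amplicon_1 = c(rnorm(20, mean = 2, sd = 0.3), rnorm(20, mean = 6, sd = 0.3)),
  amplicon_2 = rnorm(40, mean = 2, sd = 0.3),
  stringsAsFactors = FALSE
)
res <- diff_ploidy_sketch(toy, phase1 = "WT", phase2 = "cluster_2",
                          amplicons = c("amplicon_1", "amplicon_2"))
res
```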

For example, with the Wilcoxon rank sum test the output of DiffPloidy() looks like this:

| amplicon | ploidy_p1 | min_p1 | max_p1 | ploidy_p2 | min_p2 | max_p2 | wilcoxon_pvalue | padj | clones |
|---|---|---|---|---|---|---|---|---|---|
| amplicon_1 | 1.96 | 0 | 2.91 | 6.61 | 0.19 | 28.78 | 3.64e-9 | 1.13e-8 | WT-cluster_2 |
| amplicon_2 | 2.25 | 0 | 3.32 | 4.32 | 0.89 | 10.25 | 1.23e-6 | 5.68e-8 | WT-cluster_2 |
| amplicon_3 | 2.08 | 0 | 6.74 | 1.3 | 0.00 | 3.25 | 8.64e-4 | 5.23e-3 | WT-cluster_2 |

The function plotAmplicon() draws a boxplot of the ploidy distribution of a given amplicon, either across all clusters or across selected clusters. With specific_clusters = FALSE, the function plots the ploidy distribution across all clusters. Otherwise, specific_clusters accepts the labels of the clusters to include.

```r
plotAmplicon(df, amplicon = "name_of_the_amplicon", specific_clusters = FALSE, jitter = FALSE)
```
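Here is a base R sketch of the same kind of plot, built with boxplot() on a toy table with illustrative values. `plot_amplicon_sketch()` and its `clusters` argument are hypothetical. Passing `NULL` mimics `specific_clusters = FALSE`. The jitter option is not reproduced.

```r
# Hypothetical sketch (not plotAmplicon()): ploidy of one amplicon per cluster
plot_amplicon_sketch <- function(ploidy, amplicon, clusters = NULL) {
  ploidy <- as.data.frame(ploidy)
  # NULL plays the role of specific_clusters = FALSE: keep every cluster
  if (!is.null(clusters)) ploidy <- ploidy[ploidy$label %in% clusters, ]
  values <- split(ploidy[[amplicon]], ploidy$label, drop = TRUE)
  # boxplot() invisibly returns its summary statistics (stats, n, names, ...)
  boxplot(values, ylab = paste("Ploidy of", amplicon), las = 2)
}

toy <- data.frame(
  cell_name  = paste0("cell_", 1:9),
  label      = rep(c("WT", "cluster_2", "cluster_3"), each = 3),
  amplicon_1 = c(2, 2, 2, 6.5, 7.1, 5.9, 1.5, 2.3, 1.9),
  stringsAsFactors = FALSE
)

# All clusters, then only two selected clusters
plot_amplicon_sketch(toy, "amplicon_1")
stats <- plot_amplicon_sketch(toy, "amplicon_1", clusters = c("WT", "cluster_2"))
```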