MARS-seq / MARS-seq2.0

(1) I think the sequence of the "2nd RT Primer" in Supplementary Table S7 of the original publication (Science 343, 776-779 (2014)) is written in the wrong orientation (i.e. it is shown from 3' -> 5').

(2) The authors state in the Supplementary Methods (page 8) that UMIs are 4-8 bp in length, but in the oligo sequences in Supplementary Table S7 they are only 4 bp long. 4-bp UMIs are drawn in this workflow.

(3) The authors state in the Supplementary Methods (page 8) that plate barcodes are 6 bp in length, but in the oligo sequences in Supplementary Table S8 they seem to be 7 bp long (4 bp + 3 Ns). Maybe only 6 bp are used to identify a plate. I'm not entirely sure about this.

(4) In May 2019, MARS-seq2.0 was published in Nature Protocols, and the oligos used in that protocol are almost the same as in the original MARS-seq. The cell barcodes and UMIs are longer in MARS-seq2.0. The improvements are mainly related to throughput, robustness, noise reduction and cost. Check the publication for more details. The exact sequences of the RT1 primers and ligation adaptors can be found in Supplementary Table 1 and Supplementary Table 2 of the Nature Protocols paper.

(5) The oligos used in the original MARS-seq are shown in this workflow.
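
To put the UMI lengths in notes (2) and (4) into perspective: the number of distinct UMIs of length n is 4^n. A minimal Python sketch of that arithmetic (the function name `umi_space` is only illustrative, it is not from either publication):

```python
def umi_space(length: int) -> int:
    """Number of distinct UMIs of a given length over the alphabet A/C/G/T."""
    return 4 ** length

# 4 bp as drawn for MARS-seq, 8 bp as listed for MARS-seq2.0.
for n in (4, 8):
    print(f"{n}-bp UMI: {umi_space(n)} possible sequences")
```

This prints 256 for a 4-bp UMI and 65536 for an 8-bp UMI.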


Adapter and primer sequences:

1st RT primer (MARS-seq): 5'- CGATTGAGGCCGGTAATACGACTCACTATAGGGGCGACGTGTGCTCTTCCGATCT[6-bp cell barcode][4-bp UMI]TTTTTTTTTTTTTTTTTTTTN -3'

1st RT primer (MARS-seq2.0): 5'- CGATTGAGGCCGGTAATACGACTCACTATAGGGGCGACGTGTGCTCTTCCGATCT[7-bp cell barcode][8-bp UMI]TTTTTTTTTTTTTTTTTTTTN -3'

T7 promoter: 5'- TAATACGACTCACTATAGGG -3'

Barcode plate ligation adaptor:


For MARS-seq, they are called lig_NNNX4_ix[1-8]: 5'/5Phos/- [7-bp plate barcode]AGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'

For MARS-seq2.0, they are called lig_N5X4_ix[1-32]: 5'/5Phos/- [9-bp plate barcode]AGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'

2nd RT primer: 5'- CTACACGACGCTCTTCCGATCT -3'

P5_Rd1_PCR primer: 5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

P7_Rd2_PCR primer: 5'- CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT -3'

Illumina TruSeq Read1 primer: 5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT -3'

Illumina TruSeq Read2 primer: 5'- GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT -3'

Illumina P5 adapter: 5'- AATGATACGGCGACCACCGAGATCTACAC -3'

Illumina P7 adapter: 5'- CAAGCAGAAGACGGCATACGAGAT -3'
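
The oligos listed above are related to each other (which is also how the orientation issue in note (1) can be spotted). The following Python sketch checks a few of those relationships. The sequences are copied from the list above; the variable names and the `revcomp` helper are my own and only for illustration.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

T7_PROMOTER = "TAATACGACTCACTATAGGG"
# 1st RT primer without the cell barcode, UMI and oligo-dT
RT1_CONSTANT = "CGATTGAGGCCGGTAATACGACTCACTATAGGGGCGACGTGTGCTCTTCCGATCT"
# Ligation adaptor without the plate barcode and modifications
LIG_CONSTANT = "AGATCGGAAGAGCGTCGTGTAG"
RT2_PRIMER = "CTACACGACGCTCTTCCGATCT"
P5_RD1_PCR = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_RD2_PCR = "CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
READ1 = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"
READ2 = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

checks = {
    "1st RT primer contains the T7 promoter":
        T7_PROMOTER in RT1_CONSTANT,
    "2nd RT primer is the reverse complement of the adaptor constant region":
        RT2_PRIMER == revcomp(LIG_CONSTANT),
    "2nd RT primer is the 3' end of the Read1 primer":
        READ1.endswith(RT2_PRIMER),
    "P5_Rd1_PCR starts with P5 and ends with Read1":
        P5_RD1_PCR.startswith(P5) and P5_RD1_PCR.endswith(READ1),
    "P7_Rd2_PCR is P7 followed by Read2":
        P7_RD2_PCR == P7 + READ2,
}

for label, ok in checks.items():
    print(f"{'OK  ' if ok else 'FAIL'} {label}")
```

All five checks pass with the sequences as written here, i.e. with the 2nd RT primer in the 5' -> 3' orientation.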

Sequence of Barcode plate ligation adaptor:


For MARS-seq:
    lig_NNNX4_ix1 5'-/5Phos/ GACTNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix2 5'-/5Phos/ CATGNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix3 5'-/5Phos/ CCAANNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix4 5'-/5Phos/ CTGTNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix5 5'-/5Phos/ GTAGNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix6 5'-/5Phos/ TGATNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix7 5'-/5Phos/ ATCANNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_NNNX4_ix8 5'-/5Phos/ TAGANNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'

For MARS-seq2.0, there are 32 of them, and there are 5 Ns in each adaptor:
    lig_N5X4_ix1  5'-/5Phos/GACTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix2  5'-/5Phos/CATGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix3  5'-/5Phos/CCAANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix4  5'-/5Phos/CTGTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix5  5'-/5Phos/GTAGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix6  5'-/5Phos/TGATNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix7  5'-/5Phos/ATCANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix8  5'-/5Phos/TAGANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix9  5'-/5Phos/AAGTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix10 5'-/5Phos/GGCGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix11 5'-/5Phos/GTTTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix12 5'-/5Phos/GCGCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix13 5'-/5Phos/GAAANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix14 5'-/5Phos/TACCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix15 5'-/5Phos/CGGANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix16 5'-/5Phos/CCCTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix17 5'-/5Phos/TCAGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix18 5'-/5Phos/CTCGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix19 5'-/5Phos/CTACNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix20 5'-/5Phos/CTTANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix21 5'-/5Phos/TGGCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix22 5'-/5Phos/AGCTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix23 5'-/5Phos/CAGCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix24 5'-/5Phos/ACTTNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix25 5'-/5Phos/TCTANNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix26 5'-/5Phos/ACCGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix27 5'-/5Phos/ATGCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix28 5'-/5Phos/GATCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix29 5'-/5Phos/GGACNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix30 5'-/5Phos/GTCCNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix31 5'-/5Phos/CGAGNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
    lig_N5X4_ix32 5'-/5Phos/GCATNNNNNAGATCGGAAGAGCGTCGTGTAG /3SpC3/-3'
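
All adaptors share the same layout: a 4-bp index, a run of Ns, then the constant region. As a rough sketch (not the authors' code), the MARS-seq adaptors can be rebuilt from the eight 4-bp indices listed above, and the indices can be checked for how different they are from each other. The helper names `build_adaptor` and `hamming` are my own.

```python
from itertools import combinations

LIG_CONSTANT = "AGATCGGAAGAGCGTCGTGTAG"
# 4-bp indices of lig_NNNX4_ix1 ... ix8, copied from the list above
MARS_SEQ_INDICES = ["GACT", "CATG", "CCAA", "CTGT", "GTAG", "TGAT", "ATCA", "TAGA"]

def build_adaptor(index: str, n_random: int) -> str:
    """4-bp index + random Ns + constant region (modifications omitted)."""
    return index + "N" * n_random + LIG_CONSTANT

def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

# MARS-seq uses 3 Ns; MARS-seq2.0 would use n_random=5 with its 32 indices.
adaptors = {
    f"lig_NNNX4_ix{i}": build_adaptor(idx, n_random=3)
    for i, idx in enumerate(MARS_SEQ_INDICES, start=1)
}
min_distance = min(hamming(a, b) for a, b in combinations(MARS_SEQ_INDICES, 2))

for name, seq in adaptors.items():
    print(name, seq)
print("minimum pairwise Hamming distance:", min_distance)
```

For the eight MARS-seq indices the minimum pairwise distance is 3, so a single sequencing error in the index cannot turn one plate index into another.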


Step-by-step library generation (the 5'-/acrydite/iSpPC/ is omitted for simplicity)

(1) Anneal 1st RT primer to mRNA and reverse transcription:


5'- XXXXXXXXXXXXXXXXXXXX(A)n
                 <-----N(T)20[4-bp UMI][6-bp cell barcode]TCTAGCCTTCTCGTGTGCAGCGGGGATATCACTCAGCATAATGGCCGGAGTTAGC -5'

(2) Pool all single cells, and RNaseH and DNA Pol I based second strand synthesis:


5'- XXXXXXXXXXXXXXXXXXXX(pA)[4-bp UMI][6-bp barcode]AGATCGGAAGAGCACACGTCGCCCCTATAGTGAGTCGTATTACCGGCCTCAATCG -3'
3'- XXXXXXXXXXXXXXXXXXXX(dT)[4-bp UMI][6-bp barcode]TCTAGCCTTCTCGTGTGCAGCGGGGATATCACTCAGCATAATGGCCGGAGTTAGC -5'
                                                                         ↵
                                                                    IVT starts from here

(3) T7 in vitro transcription to amplify cDNA (resulting in single stranded RNA):


5'- GGCGACGUGUGCUCUUCCGAUCU[6-bp cell barcode][4-bp UMI](dU)XXXXXXXXXXXXXXXXXXXXX -3'

(4) Heat-fragment the amplified RNA (aRNA), and perform ssDNA/RNA ligation (T4 RNA ligase I) with the lig_NNNX4_ix[1-8] adaptors, which carry the plate barcode:


Due to the 3' block of the lig_NNNX4_ix[1-8], there is only one ligation possibility,
which is the 5' end of the lig_NNNX4_ix[1-8] ligating to 3' of aRNA:

3'- GATGTGCTGCGAGAAGGCTAGA[7-bp plate barcode]XXX...XXX(dU)[4-bp UMI][6-bp cell barcode]UCUAGCCUUCUCGUGUGCAGCGG -5'

(5) Add the 2nd RT primer to reverse transcribe the aRNA:


5'- CTACACGACGCTCTTCCGATCT-------->
3'- GATGTGCTGCGAGAAGGCTAGA[7-bp plate barcode]XXX...XXX(dU)[4-bp UMI][6-bp cell barcode]UCUAGCCUUCUCGUGUGCAGCGG -5'

(6) Resulting first strand cDNA looks like this:


5'- CTACACGACGCTCTTCCGATCT[7-bp plate barcode]XXX...XXX(pA)[4-bp UMI][6-bp cell barcode]AGATCGGAAGAGCACACGTCGCC -3'
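
Steps (3) to (6) can be followed in silico with a small Python sketch. The constant sequences come from the diagrams above; the cell barcode, UMI, transcript fragment and the three random bases are made-up example values, and `reverse_transcribe` is just an illustrative helper.

```python
RNA_TO_CDNA = str.maketrans("ACGUT", "TGCAA")

def reverse_transcribe(template: str) -> str:
    """cDNA (written 5'->3') copied from an RNA or RNA/DNA template written 5'->3'."""
    return template.translate(RNA_TO_CDNA)[::-1]

ARNA_5PRIME = "GGCGACGUGUGCUCUUCCGAUCU"   # start of the aRNA as drawn in step (3)
LIG_CONSTANT = "AGATCGGAAGAGCGTCGTGTAG"    # DNA part of lig_NNNX4_ix[1-8]
RT2_PRIMER = "CTACACGACGCTCTTCCGATCT"

# Hypothetical example values, not taken from the publication.
cell_barcode = "ACGUAC"          # 6 bp, RNA alphabet
umi = "GGUA"                     # 4 bp, RNA alphabet
fragment = "GAUUACAGAUUACA"      # antisense transcript piece left after fragmentation
plate_barcode = "GACT" + "ACG"   # ix1 index plus one possible draw of the NNN

arna = ARNA_5PRIME + cell_barcode + umi + "U" * 20 + fragment   # step (3)
ligated = arna + plate_barcode + LIG_CONSTANT                   # step (4)
cdna = reverse_transcribe(ligated)                              # steps (5)-(6)

print(cdna)
```

The resulting cDNA starts with the 2nd RT primer, followed by the reverse complement of the 7-bp plate barcode (`CGTAGTC` in this example, i.e. the three random bases first and then the index), and ends with `AGATCGGAAGAGCACACGTCGCC`, as drawn in step (6).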

(7) Add P5_Rd1_PCR & P7_Rd2_PCR primers for library preparation and amplification:


5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-------------->
                                    5'- CTACACGACGCTCTTCCGATCT[7-bp plate barcode]XXX...XXX(pA)[4-bp UMI][6-bp cell barcode]AGATCGGAAGAGCACACGTCGCC -3'
                                                                                                              <-------------TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGAGCATACGGCAGAAGACGAAC -5'

(8) Final library structure (I'm not sure what the NNN between the partial Read1 sequence and the 4-bp plate barcode is for):


5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNXXX...XXX(pA)NNNNNNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNXXX...XXX(dT)NNNNNNNNNNTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGAGCATACGGCAGAAGACGAAC -5'
            Illumina P5              Illumina Truseq Read1      7bp     cDNA      4bp   6bp       Illumina Truseq Read2              Illumina P7
                                                               plate              UMI   cell
                                                              barcode                  barcode
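
The PCR in step (7) can be mimicked with a short Python sketch that turns a first-strand cDNA into the top strand of the final library. The example cDNA uses made-up barcode, UMI and insert values, `amplify` is a hypothetical helper, and the 20-nt annealing length of the P7_Rd2_PCR primer is an assumption based on the part of the primer that matches the cDNA 3' end in the diagram.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

P5_RD1_PCR = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
P7_RD2_PCR = "CAAGCAGAAGACGGCATACGAGATGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"
RT2_SITE = "CTACACGACGCTCTTCCGATCT"   # 5' end of the cDNA, shared with P5_Rd1_PCR

def amplify(cdna: str) -> str:
    """Top strand after PCR: primer tails replace the ends of the cDNA."""
    # Assumption: the last 20 nt of P7_Rd2_PCR anneal to the cDNA 3' region.
    p7_site = revcomp(P7_RD2_PCR[-20:])
    start = cdna.index(RT2_SITE) + len(RT2_SITE)
    end = cdna.rindex(p7_site)
    return P5_RD1_PCR + cdna[start:end] + revcomp(P7_RD2_PCR)

# Hypothetical first-strand cDNA laid out as in step (6):
# 2nd RT primer, 7-bp plate barcode, cDNA, poly-A, UMI, cell barcode, 3' constant part.
cdna = (RT2_SITE + "CGTAGTC" + "TGTAATCTGTAATC" + "A" * 20
        + "TACC" + "GTACGT" + "AGATCGGAAGAGCACACGTCGCC")

library = amplify(cdna)
print(library)
```

In the output, the plate barcode sits directly after the 58-nt P5_Rd1_PCR sequence, and the strand ends with the reverse complement of P7_Rd2_PCR, matching the top strand drawn above.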


Library sequencing:

(1) Add the Illumina Truseq Read1 sequencing primer to sequence the first read (bottom strand as template; the first 6-7 bp are the plate barcode, followed by the cDNA sequence):


                         5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT----------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGANNNNNNNXXX...XXX(dT)NNNNNNNNNNTCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGTAGAGCATACGGCAGAAGACGAAC -5'

(2) Cluster regeneration, then add the Illumina Truseq Read2 sequencing primer to sequence read 2 (top strand as template; this read contains the cell barcode and the UMI, followed by some dT depending on the number of cycles):


5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNXXX...XXX(pA)NNNNNNNNNNAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                             <--------------TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'
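
Based on the read layout described above, the two reads could be split roughly as in the Python sketch below. This is not the authors' pipeline: the function and class names are my own, the example reads are made up, and the default lengths (7-bp plate barcode, 6-bp cell barcode, 4-bp UMI) are the original MARS-seq values drawn in this workflow.

```python
from typing import NamedTuple

class Read1(NamedTuple):
    plate_barcode: str
    cdna: str

class Read2(NamedTuple):
    cell_barcode: str
    umi: str
    tail: str

def parse_read1(seq: str, plate_len: int = 7) -> Read1:
    """Read 1: plate barcode first, then the cDNA sequence."""
    return Read1(seq[:plate_len], seq[plate_len:])

def parse_read2(seq: str, cb_len: int = 6, umi_len: int = 4) -> Read2:
    """Read 2: cell barcode, then UMI, then whatever is left (mostly dT)."""
    return Read2(seq[:cb_len], seq[cb_len:cb_len + umi_len], seq[cb_len + umi_len:])

# Made-up example reads.
r1 = parse_read1("CGTAGTC" + "TGTAATCTGTAATC")
r2 = parse_read2("ACGTAC" + "GGTA" + "TTTTTT")

print(r1)
print(r2)
```

For MARS-seq2.0 the same functions could be called with `plate_len=9`, `cb_len=7` and `umi_len=8`, following the lengths given in the oligo list.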