About Illumina sequencing libraries

Illumina sequencing is, at its core, sequencing by synthesis (SBS). Any SBS method requires adaptors, which are short double-stranded DNA oligos whose sequences are known to us. The adaptors are designed by scientists, and a few adaptor sequences are widely used in the NGS field. The following three (really two main types, to be honest) are the most popular ones:

Truseq Single Index Library:


5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
          Illumina P5                   Truseq Read 1                        Truseq Read 2                 i7        Illumina P7

Truseq Dual Index Library:


5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
          Illumina P5               i5            Truseq Read 1                          Truseq Read 2                 i7        Illumina P7

Nextera Dual Index Library:


5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
           Illumina P5              i5             Nextera Read 1                                Nextera Read 2        i7         Illumina P7

There are some other adaptors that can be used, but they have either become obsolete or are rarely used. If you are interested, you can check the Illumina adapter sequences document for full details. In essence, Illumina library preparation is the process of adding those coloured adaptor sequences to both sides of your DNA of interest, which is the "-insert-" bit in the above examples. All the commercial kits you buy for library preparation do EXACTLY that, no matter where you buy them and no matter what you want to sequence. You can also mix different types of adaptors, for the reasons given in the "Library sequencing" section below. For example, you can use Truseq Read 1 on the left-hand side and Nextera Read 2 on the right-hand side, or Nextera Read 1 on the left-hand side and Truseq Read 2 on the right-hand side. This is totally fine, but not recommended for beginners.

The "N"s in the above examples are the indices, or barcodes, that discriminate different samples. The index on the right-hand side is often called "i7", or index 1, and is the index in the P7 primer; the index on the left-hand side is called "i5", or index 2, and is the index in the P5 primer. They are numbered this way because "i7" is sequenced before "i5".
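To make the layout concrete, here is a minimal Python sketch that assembles a dual-index top strand from its parts and derives the bottom strand. The adaptor sequences are copied from the examples above; the function names, the dictionary keys and the "GATTACA" insert are illustrative assumptions, not part of any kit or tool.

```python
# Adaptor parts copied from the top strands (5' -> 3') in the examples above.
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7_ON_TOP = "ATCTCGTATGCCGTCTTCTGCTTG"  # P7 region as written on the top strand
READ1 = {
    "truseq": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "nextera": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
}
READ2 = {
    "truseq": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
    "nextera": "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",
}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def build_top_strand(insert, i7, i5, read1="truseq", read2="truseq"):
    """Return the 5' -> 3' top strand of a dual-index library (hypothetical helper)."""
    return P5 + i5 + READ1[read1] + insert + READ2[read2] + i7 + P7_ON_TOP


def bottom_strand_3to5(top_strand):
    """Complement base by base, giving the bottom strand written 3' -> 5'."""
    return top_strand.translate(_COMPLEMENT)


# A mixed library: Truseq Read 1 on the left, Nextera Read 2 on the right.
# "GATTACA" is a made-up insert used only for illustration.
mixed = build_top_strand("GATTACA", i7="NNNNNNNN", i5="NNNNNNNN",
                         read1="truseq", read2="nextera")
print("5'-", mixed, "-3'")
print("3'-", bottom_strand_3to5(mixed), "-5'")
```

With an empty insert and the default arguments, build_top_strand reproduces the Truseq dual index top strand shown above, minus the "-insert-" placeholder.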

Now, the mission is: how do we add those adaptors? Well, this is where you can become creative. In "A profusion of confusion in NGS methods naming", Hadfield and Retief mentioned over 300 NGS methods in their Enseqlopedia wiki. In this GitHub repository, many single cell genomics methods are documented. ALL of them do one thing and ONLY one thing: add those adaptors to the sides of the DNA they want to sequence. What, then, is the difference among all those methods? They differ in how they add those adaptors. The purpose of this "scg_lib_structs" repository is simply to document how those single cell methods add the coloured adaptors.

Another important detail is that only the following two sequences are required:


Illumina P5 adaptor: 5'- AATGATACGGCGACCACCGAGATCTACAC -3'
Illumina P7 adaptor: 5'- CAAGCAGAAGACGGCATACGAGAT -3'

If you use Illumina sequencing, you have to use them and make sure they are at the sides of your DNA fragments, as shown in the three examples above. The adaptor sequences in the middle, such as Truseq Read 1, Truseq Read 2, Nextera Read 1 and Nextera Read 2, can be changed. If you change them, you have to add your own sequencing primers to the machine during sequencing. A few single cell genomic methods listed on this GitHub page use their own adaptors.
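As a rough sanity check of this rule, the sketch below tests whether a top strand starts with P5 and ends with the reverse complement of P7, which is how P7 appears on the top strand in the three examples. The helper names and the custom library (made-up middle adaptors, indices and insert) are hypothetical.

```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def has_required_adaptors(top_strand):
    """True if the 5' -> 3' top strand starts with P5 and ends with revcomp(P7)."""
    return top_strand.startswith(P5) and top_strand.endswith(reverse_complement(P7))


# Hypothetical library using made-up middle adaptors instead of Truseq/Nextera.
custom = (P5 + "ACGTACGT" + "GGGTTTCCCAAA" + "GATTACA"
          + "TTTGGGAAACCC" + "TGCATGCA" + reverse_complement(P7))

print(has_required_adaptors(custom))       # True
print(has_required_adaptors(custom[:-5]))  # False: the P7 end is truncated
```

The check says nothing about the middle adaptors, which matches the point above: they can be replaced as long as you supply matching sequencing primers.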

Library sequencing:

Once those adaptors are added properly, we are ready to sequence the libraries on Illumina machines. The sequencing primers in the reagents provided by Illumina are actually a mixture of different primers, including Truseq, Nextera and even primers from obsolete kits. Therefore, you can sequence different types of libraries together. For example, Truseq libraries and Nextera libraries can be mixed and sequenced together without any problem. There are some slight differences among Illumina machines in how they sequence Read 1, Read 2, Index 1 and Index 2. Check this document for more details.

(Step 1) Add Read 1 sequencing primer mixture to sequence the first read (bottom strand as template):


Truseq Single Index Library:

                         5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT---->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Truseq Dual Index Library:

                                     5'- ACACTCTTTCCCTACACGACGCTCTTCCGATCT---->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

                                     5'- TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG------>
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
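Step 1 can be sketched in Python as follows. Because the primer anneals to the bottom strand, the extension product has the same sequence as the top strand immediately downstream of the primer, so the sketch simply searches the top strand for each primer in the mix. Modelling annealing as an exact string match, limiting the mix to the two primers drawn above, and the "GATTACAGATTACA" insert are all simplifying assumptions; the function name is hypothetical.

```python
# Read 1 primers from the diagrams above (the real reagent mix contains more).
READ1_PRIMER_MIX = {
    "Truseq Read 1": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "Nextera Read 1": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
}


def sequence_read1(top_strand, cycles):
    """Return (primer name, Read 1 bases), or None if no primer in the mix fits."""
    for name, primer in READ1_PRIMER_MIX.items():
        start = top_strand.find(primer)
        if start != -1:
            first_base = start + len(primer)  # first base after the primer
            return name, top_strand[first_base:first_base + cycles]
    return None


INSERT = "GATTACAGATTACA"  # made-up insert for illustration
truseq_single = ("AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"
                 + INSERT
                 + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG")
nextera_dual = ("AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
                + INSERT
                + "CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG")

print(sequence_read1(truseq_single, 8))  # ('Truseq Read 1', 'GATTACAG')
print(sequence_read1(nextera_dual, 8))   # ('Nextera Read 1', 'GATTACAG')
```

Both libraries yield the same Read 1 from the same mix, which is why Truseq and Nextera libraries can share a run.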

(Step 2) Add Index 1 sequencing primer mixture to sequence the first index (index 1, i7, bottom strand as template):


Truseq Single Index Library:

                                                                   5'- GATCGGAAGAGCACACGTCTGAACTCCAGTCAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Truseq Dual Index Library:

                                                                               5'- GATCGGAAGAGCACACGTCTGAACTCCAGTCAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

                                                                              5'- CTGTCTCTTATACACATCTCCGAGCCCACGAGAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'
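Since the i7 index is what tells samples apart, a small sketch of Step 2 followed by sample assignment may help. The two Index 1 primers are taken from the diagrams above; the barcodes, sample names, insert and function names are hypothetical, and exact matching with an "undetermined" fallback is a simplification of what real demultiplexing software does.

```python
# Index 1 primers from the diagrams above.
INDEX1_PRIMER_MIX = [
    "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",   # Truseq
    "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",  # Nextera
]
# Hypothetical i7 barcodes and sample names, for illustration only.
SAMPLE_BY_I7 = {"ACGTACGT": "sample_A", "TGCATGCA": "sample_B"}
P7_ON_TOP = "ATCTCGTATGCCGTCTTCTGCTTG"


def read_index1(top_strand, index_length=8):
    """Return the bases read directly after the Index 1 primer, or None."""
    for primer in INDEX1_PRIMER_MIX:
        start = top_strand.find(primer)
        if start != -1:
            first_base = start + len(primer)
            return top_strand[first_base:first_base + index_length]
    return None


def assign_sample(top_strand):
    """Look up the i7 read in the sample table; unknown indices are undetermined."""
    return SAMPLE_BY_I7.get(read_index1(top_strand), "undetermined")


# Only the right-hand half of each library matters for this read.
truseq_end = "GATTACA" + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" + "ACGTACGT" + P7_ON_TOP
nextera_end = "GATTACA" + "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC" + "TGCATGCA" + P7_ON_TOP

print(assign_sample(truseq_end))   # sample_A
print(assign_sample(nextera_end))  # sample_B
```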

(Step 3 of MiSeq, HiSeq 2000/2500, MiniSeq (Rapid) and NovaSeq 6000 v1.0) Fold over and sequence the second index (index 2, i5, bottom strand as template):


Truseq Single Index Library (not really needed):

5'- AATGATACGGCGACCACCGAGATCTACAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Truseq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA-insert-TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACAC------->
3'- TTACTATGCCGCTGGTGGCTCTAGATGTGNNNNNNNNAGCAGCCGTCGCAGTCTACACATATTCTCTGTC-insert-GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTGNNNNNNNNTAGAGCATACGGCAGAAGACGAAC -5'

(Step 3 of iSeq 100, MiniSeq (Standard), NextSeq, HiSeq X, HiSeq 3000/4000 and NovaSeq 6000 v1.5) Add Index 2 sequencing primer mixture to sequence the second index (index 2, i5, top strand as template):


Truseq Single Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                     <-------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'


Truseq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                 <-------TGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGA -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                 <-------AGCAGCCGTCGCAGTCTACACATATTCTCTGTC -5'
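A practical consequence of the two versions of Step 3 is that the same i5 index is reported in opposite orientations. When the bottom strand is the template, the read matches the i5 bases as written on the top strand; when the top strand is the template, the read is their reverse complement. The sketch below assumes a dual-index library whose top strand starts with P5; the workflow labels, the function name and the "AAGGCTAT" barcode are illustrative choices.

```python
P5 = "AATGATACGGCGACCACCGAGATCTACAC"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def read_index2(top_strand, workflow, index_length=8):
    """Return the i5 read as reported under the given Step 3 variant.

    workflow="forward": bottom strand as template (fold over, P5 as primer).
    workflow="reverse_complement": top strand as template (Index 2 primer).
    """
    if not top_strand.startswith(P5):
        raise ValueError("top strand does not start with P5")
    i5_on_top = top_strand[len(P5):len(P5) + index_length]
    if workflow == "forward":
        return i5_on_top
    if workflow == "reverse_complement":
        return reverse_complement(i5_on_top)
    raise ValueError("unknown workflow: " + workflow)


# Hypothetical i5 barcode "AAGGCTAT" in a Truseq dual index library.
library = P5 + "AAGGCTAT" + "ACACTCTTTCCCTACACGACGCTCTTCCGATCT" + "GATTACA"

print(read_index2(library, "forward"))             # AAGGCTAT
print(read_index2(library, "reverse_complement"))  # ATAGCCTT
```

This is why the i5 sequences you enter for a run depend on the machine, even though the library itself is unchanged.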

(Step 4) Cluster regeneration, add Read 2 sequencing primer mixture to sequence the second read (top strand as template):


Truseq Single Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                                               <------TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'


Truseq Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNACACTCTTTCCCTACACGACGCTCTTCCGATCT-insert-AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                           <------TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG -5'


Nextera Dual Index Library:

5'- AATGATACGGCGACCACCGAGATCTACACNNNNNNNNTCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-insert-CTGTCTCTTATACACATCTCCGAGCCCACGAGACNNNNNNNNATCTCGTATGCCGTCTTCTGCTTG -3'
                                                                           <------GACAGAGAATATGTGTAGAGGCTCGGGTGCTCTG -5'
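Step 4 differs from Step 1 in direction: the Read 2 primer anneals to the top strand and extends leftwards into the insert, so Read 2 is the reverse complement of the 3' end of the insert. A minimal sketch, again treating annealing as an exact match on the top strand; the function names and the "GATTACAGGG" insert are hypothetical.

```python
# Read 2 primer binding sites as written on the top strand in the diagrams above.
READ2_SITES_ON_TOP = [
    "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",  # Truseq Read 2
    "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",  # Nextera Read 2
]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def sequence_read2(top_strand, cycles):
    """Return Read 2 (5' -> 3'), or None if no Read 2 site is found."""
    for site in READ2_SITES_ON_TOP:
        pos = top_strand.find(site)
        if pos != -1:
            # The bases just upstream of the site serve as the template.
            template = top_strand[max(0, pos - cycles):pos]
            return reverse_complement(template)
    return None


# Made-up insert "GATTACAGGG" between Truseq Read 1 and Truseq Read 2.
top = ("ACACTCTTTCCCTACACGACGCTCTTCCGATCT" + "GATTACAGGG"
       + "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC")

print(sequence_read2(top, 4))   # CCCT
print(sequence_read2(top, 10))  # CCCTGTAATC
```

Note that if the number of cycles exceeds the insert length, the read runs on into the Read 1 adaptor, which is the usual reason adaptor trimming is needed for short inserts.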