What if my data comes from a long-read sequencing technology…
When working with long-read sequencing (LRS), you will typically obtain full-length transcripts whose IDs are specific to your experiment and are therefore not supported by tappAS. You can still use tappAS to study your LRS dataset, but additional steps are required.
To make LRS data compatible with tappAS, we make use of the SQANTI3 and IsoAnnotLite tools.
(The current version of IsoAnnotLite is 2.7.3; make sure you use the latest version of the script.)
SQANTI3 is a pipeline for the structural characterization of isoforms obtained by full-length transcript sequencing. SQANTI3 takes full-length transcript sequences in FASTA format, which can be obtained after Iso-seq3 (PacBio) or FLAIR (Nanopore) processing. The only additional requirement is that genome and transcriptome files are available for the species, so SQANTI3 is currently restricted to sequenced species. SQANTI3 provides a wide range of descriptors of transcript quality and generates a graphical report to aid in the interpretation of LRS results. More information on SQANTI3 can be obtained here.
There are two ways to transform your SQANTI3 output to tappAS:
- Transform structural information. In this scenario, the SQANTI3 transcript types (FSM, ISM, NIC, etc.) are processed into a tappAS GFF3 file containing this information. This file can be fed to tappAS to visualize transcript models and to perform differential expression, isoform usage, and UTR analyses. However, no functional analysis is possible here.
- Transform structural information and add functional data. This is the recommended option if your species is supported by tappAS. In this case, you use the tappAS species-specific GFF3 to map functional elements to your LRS dataset. Please bear in mind that in this case you must run SQANTI3 with the same reference genome annotation as the one used in tappAS.
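Before converting, it can help to see how many transcripts fall into each SQANTI3 structural category. The following is a minimal sketch, not part of SQANTI3 or IsoAnnotLite. It assumes the classification file is tab-separated and has a column named structural_category with values such as full-splice_match; the function name and the toy rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_structural_categories(handle):
    """Count transcripts per structural category in a classification table.

    `handle` is any open text file or file-like object. The column name
    "structural_category" is an assumption about the SQANTI3 output format.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["structural_category"] for row in reader)


# Toy table standing in for a real «_classification.txt» file.
toy_table = io.StringIO(
    "isoform\tstructural_category\n"
    "PB.1.1\tfull-splice_match\n"
    "PB.1.2\tincomplete-splice_match\n"
    "PB.2.1\tfull-splice_match\n"
)

for category, n in count_structural_categories(toy_table).most_common():
    print(category, n)
```

For a real file, open it with open("my_classification.txt", newline="") and pass the handle to the function.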
IsoAnnotLite is a Python 3 script that takes the SQANTI3 output, and optionally a tappAS precomputed GFF3 file, and returns a new GFF3 file fully compatible with the tappAS software.
How to proceed
There are four basic steps you need to follow:
- Run SQANTI3 on your LRS FASTA file to obtain the «_corrected.gtf», «_classification.txt» and «_junctions.txt» files.
- If your species is supported by tappAS, download the corresponding GFF3 here.
- Download the IsoAnnotLite script and unzip it.
- Run the basic IsoAnnotLite command as indicated below (note that all arguments are optional [use -h for more information]).
As an example, the following command calls the script with a GFF3 reference to use as annotation:
python3 IsoAnnotLite.py my_corrected.gtf my_classification.txt my_junctions.txt -gff3 tappAS.gff3 -o output_name_newGFF3 -stdout output_name_statisticalResults -novel -nointronic -saveTranscriptIDs
The -novel argument makes IsoAnnotLite compare every transcript against all the transcripts that belong to the same gene. This procedure takes more time but retrieves more annotations. However, it is not recommended for transcripts that already have a reference in the GFF3 file.
If you do not use it, IsoAnnotLite applies this method only to transcripts that do not have a reference transcript in the SQANTI3 output, or whose reference transcript has no feature information to transfer (novel transcripts).
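The rule described above can be written as a small decision function. This is only an illustration of the described behaviour, not code taken from IsoAnnotLite; the function and parameter names are hypothetical.

```python
def uses_gene_wide_comparison(novel_flag, has_reference, reference_has_features):
    """Return True if a transcript is compared against every transcript of its gene.

    novel_flag: whether -novel was passed on the command line.
    has_reference: whether the SQANTI output assigns a reference transcript.
    reference_has_features: whether that reference carries feature annotations.
    """
    if novel_flag:
        # With -novel, every transcript goes through the gene-wide comparison.
        return True
    # Without -novel, only transcripts treated as novel do.
    return (not has_reference) or (not reference_has_features)


# A transcript with an annotated reference, run without -novel.
print(uses_gene_wide_comparison(False, True, True))
```

Under this reading, running without -novel keeps the slower gene-wide comparison for the transcripts that cannot inherit annotations from a reference.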
Example of use:
python3 IsoAnnotLite.py PacBio_corrected.gtf PacBio_classification.txt PacBio_junctions.txt -gff3 Mus_musculus_Ensembl_86.gff3
Note that SQANTI3 files should be provided in the order indicated above. In case of problems, use the argument «-h» to get help.
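Because the order of the three SQANTI3 files matters, it can be convenient to assemble the command from a single prefix. The sketch below is one way to do that from Python, assuming the three files share a prefix as in the examples above. The helper names are hypothetical, and only the -gff3, -o and -novel flags shown earlier are used.

```python
import shlex
import subprocess
from pathlib import Path


def build_isoannotlite_command(prefix, gff3=None, output=None, novel=False,
                               script="IsoAnnotLite.py"):
    """Build the IsoAnnotLite call with the SQANTI3 files in the required order."""
    cmd = [
        "python3", script,
        f"{prefix}_corrected.gtf",
        f"{prefix}_classification.txt",
        f"{prefix}_junctions.txt",
    ]
    if gff3:
        cmd += ["-gff3", gff3]
    if output:
        cmd += ["-o", output]
    if novel:
        cmd.append("-novel")
    return cmd


def run_isoannotlite(cmd):
    """Check that the script and input files exist, then run the command."""
    missing = [p for p in cmd[1:5] if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("Missing input file(s): " + ", ".join(missing))
    subprocess.run(cmd, check=True)


command = build_isoannotlite_command("PacBio", gff3="Mus_musculus_Ensembl_86.gff3")
print(shlex.join(command))  # show the command without running it
```

Call run_isoannotlite(command) only from the directory that contains IsoAnnotLite.py and the SQANTI3 files; otherwise it stops with a FileNotFoundError before launching anything.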