tRNA 1.0.0
The tRNA
package allows tRNA sequences and structures to be accessed and used
for subsetting. In addition, it provides visualization tools to compare feature
parameters of multiple tRNA sets and correlate them to additional data. The
package expects a GRanges
object with certain columns as input. The following
columns are a requirement: tRNA_length
, tRNA_type
, tRNA_anticodon
,
tRNA_seq
, tRNA_str
, tRNA_CCA.end
.
To work with the tRNA packages, tRNA information can be retrieved in a number of ways:
GRanges
object can be constructed by hand containing the
required colums (see above)import.tRNAscanAsGRanges()
from tRNAscanImport
packageimport.tRNAdb()
from tRNAdbImport
packageFor this vignette a predefined GRanges
object is loaded.
library(tRNA)
data("gr", package = "tRNA", envir = environment())
To retrieve the sequences for individual tRNA structure elements
gettRNAstructureGRanges
or gettRNAstructureSeqs
can be used.
# just get the coordinates of the anticodonloop
gettRNAstructureGRanges(gr,
structure = "anticodonLoop")
## $anticodonLoop
## IRanges object with 299 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## TGG 31 37 7
## TGC 32 38 7
## CAA 31 37 7
## AGA 31 37 7
## TAA 31 37 7
## ... ... ... ...
## CAT 32 38 7
## GAA 31 37 7
## TTA 31 37 7
## TAC 32 38 7
## CAT 32 38 7
gettRNAstructureSeqs(gr,
joinFeatures = TRUE,
structure = "anticodonLoop")
## $anticodonLoop
## A DNAStringSet instance of length 299
## width seq names
## [1] 7 TTTGGGT TGG
## [2] 7 CTTGCAA TGC
## [3] 7 TTCAAGC CAA
## [4] 7 TTAGAAA AGA
## [5] 7 CTTAAGA TAA
## ... ... ...
## [295] 7 CTCATAA CAT
## [296] 7 TTGAAGA GAA
## [297] 7 TTTTAGT TTA
## [298] 7 TTTACAC TAC
## [299] 7 GTCATGA CAT
In addition the sequences can be returned already joined to get a fully blank
padded set of sequences. The boundaries of the individual structures is returned
as metadata of the RNAStringSet
object.
seqs <- gettRNAstructureSeqs(gr[1:10],
joinCompletely = TRUE)
seqs
## A DNAStringSet instance of length 10
## width seq
## [1] 85 GGGCGTGTGGTC-TAGT-GGTAT-GATTCTCG...----GCCTGGGTTCAATTCCCAGCTCGCCCC
## [2] 85 GGGCACATGGCGCAGTT-GGT-AGCGCGCTTC...----GCATCGGTTCGATTCCGGTTGCGTCCA
## [3] 85 GGTTGTTTGGCC-GAGC-GGTAA-GGCGCCTG...-GATGCAAGAGTTCGAATCTCTTAGCAACCA
## [4] 85 GGCAACTTGGCC-GAGT-GGTAA-GGCGAAAG...GCCCGCGCAGGTTCGAGTCCTGCAGTTGTCG
## [5] 85 GGAGGGTTGGCC-GAGT-GGTAA-GGCGGCAG...GTCCGCGCGAGTTCGAACCTCGCATCCTTCA
## [6] 85 GCGGATTTAGCTCAGTT-GGG-AGAGCGCCAG...----GCCTGTGTTCGATCCACAGAATTCGCA
## [7] 85 GGTCTCTTGGCC-CAGTTGGTAA-GGCACCGT...----ACAGCGGTTCGATCCCGCTAGAGACCA
## [8] 85 GCGCAAGTGGTTTAGT--GGT-AAAATCCAAC...-----CCCCGGTTCGATTCCGGGCTTGCGCA
## [9] 85 GGCAACTTGGCC-GAGT-GGTAA-GGCGAAAG...GCCCGCGCAGGTTCGAGTCCTGCAGTTGTCG
## [10] 85 GCTTCTATGGCC-AAGTTGGTAA-GGCGCCAC...----ACATCGGTTCAAATCCGATTGGAAGCA
# getting the tRNA structure boundaries
metadata(seqs)[["tRNA_structures"]]
## IRanges object with 15 ranges and 0 metadata columns:
## start end width
## <integer> <integer> <integer>
## acceptorStem.prime5 1 7 7
## Dprime5 8 9 2
## DStem.prime5 10 13 4
## Dloop 14 23 10
## DStem.prime3 24 27 4
## ... ... ... ...
## TStem.prime5 61 65 5
## Tloop 66 72 7
## TStem.prime3 73 77 5
## acceptorStem.prime3 78 84 7
## discriminator 85 85 1
Be aware, that gettRNAstructureGRanges
and gettRNAstructureSeqs
might not be
working as expected, if the tRNA sequences in questions are armless or deviate a
lot from the canonical tRNA model. It was thouroughly tested using human
mitochondrial tRNA and other tRNAs missing certain features. However, for fringe
cases this might not be the case. Please report these cases, if you find them,
with an example.
The structure information can be queried for subsetting using the function
hasAccpeptorStem()
as an example. Please have a look at?hasAccpeptorStem
for
all available subsetting functions.
gr[hasAcceptorStem(gr, unpaired = TRUE)]
# mismatches and bulged are subsets of unpaired
gr[hasAcceptorStem(gr, mismatches = TRUE)]
gr[hasAcceptorStem(gr, bulged = TRUE)]
# combination of different structure parameters
gr[hasAcceptorStem(gr, mismatches = TRUE) &
hasDloop(gr, length = 8)]
To get an overview of tRNA feature and compare different datasets,
gettRNAFeaturePlots
can be used. It accepts a named GRangesList
as input.
Internally it will calculate a list of features values based on the functions
mentioned above and the data contained in the mcols of the GRanges
objects.
The results can be retrieved without generating plots using gettRNASummary
.
# load tRNA data for E. coli and H. sapiens
data("gr_eco", package = "tRNA", envir = environment())
data("gr_human", package = "tRNA", envir = environment())
# get summary plots
grl <- GRangesList(Sce = gr,
Hsa = gr_human,
Eco = gr_eco)
plots <- gettRNAFeaturePlots(grl)
plots$length
plots$tRNAscan_score
plots$gc
plots$tRNAscan_intron
plots$variableLoop_length
To check whether features correlate with additional scores, optional arguments
can be added to gettRNAFeaturePlots
or used from the score
column of the
GRanges
objects. For the first option provided a list of scores with the same
dimensions as the GRangesList
object as argument scores
and for the latter
set plotScore = TRUE
.
# score column will be used
plots <- gettRNAFeaturePlots(grl,
plotScores = TRUE)
plots <- gettRNAFeaturePlots(grl,
scores = list(runif(length(grl[[1]]),0,100),
runif(length(grl[[2]]),0,100),
runif(length(grl[[3]]),0,100)))
plots$length
plots$variableLoop_length
The plots can be of course modified manually and changed to suit your needs.
plots$length$layers <- plots$length$layers[c(-1,-2)]
plots$length + ggplot2::geom_boxplot()
The colours of the plots can be customized directly on creation with the following options.
options("tRNA_colour_palette")
## $tRNA_colour_palette
## [1] "Set1"
options("tRNA_colour_yes")
## $tRNA_colour_yes
## [1] "green"
options("tRNA_colour_no")
## $tRNA_colour_no
## [1] "red"
To retrieve detailed information on the base pairing gettRNABasePairing()
or
getBasePairing()
can be used depending on the input type. The DotBracket
annotation is expected with pairs of <>{}[]()
(Please note the orientation.
For <>
the orientation is variable, since the tRNAscan files use the ><
annotation by default).
head(gettRNABasePairing(gr)[[1]])
head(getBasePairing(gr[1]$tRNA_str)[[1]])
The loop ids for the structure elements can be retrieved with gettRNALoopIDs()
or getLoopIDs()
. Please not that pseudoloops are currently not correctly
reflected in the output of these functions.
gettRNALoopIDs(gr)[[1]]
## [1] 1 2 3 4 5 5 6 6 6 7 8 9 9 9 9 9 9 9 9 9 9 9 8 7
## [25] 6 10 11 12 13 14 14 14 14 14 14 14 14 14 13 12 11 10 6 6 6 6 15 16
## [49] 17 18 19 19 19 19 19 19 19 19 19 18 17 16 15 6 5 5 4 3 2 1 0
getLoopIDs(gr[1]$tRNA_str)
## $`1`
## [1] 1 2 3 4 5 5 6 6 6 7 8 9 9 9 9 9 9 9 9 9 9 9 8 7
## [25] 6 10 11 12 13 14 14 14 14 14 14 14 14 14 13 12 11 10 6 6 6 6 15 16
## [49] 17 18 19 19 19 19 19 19 19 19 19 18 17 16 15 6 5 5 4 3 2 1 0
sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.8-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.8-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] tRNA_1.0.0 GenomicRanges_1.34.0 GenomeInfoDb_1.18.0
## [4] IRanges_2.16.0 S4Vectors_0.20.0 BiocGenerics_0.28.0
## [7] BiocStyle_2.10.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.19 assertive.properties_0.0-4
## [3] assertive.types_0.0-3 assertive.data.us_0.0-2
## [5] Biostrings_2.50.0 assertthat_0.2.0
## [7] rprojroot_1.3-2 digest_0.6.18
## [9] R6_2.3.0 plyr_1.8.4
## [11] backports_1.1.2 evaluate_0.12
## [13] assertive.code_0.0-3 ggplot2_3.1.0
## [15] highr_0.7 pillar_1.3.0
## [17] assertive.strings_0.0-3 zlibbioc_1.28.0
## [19] rlang_0.3.0.1 lazyeval_0.2.1
## [21] assertive_0.3-5 assertive.data_0.0-1
## [23] rmarkdown_1.10 labeling_0.3
## [25] stringr_1.3.1 RCurl_1.95-4.11
## [27] munsell_0.5.0 compiler_3.5.1
## [29] xfun_0.4 pkgconfig_2.0.2
## [31] htmltools_0.3.6 tidyselect_0.2.5
## [33] tibble_1.4.2 GenomeInfoDbData_1.2.0
## [35] bookdown_0.7 assertive.sets_0.0-3
## [37] codetools_0.2-15 crayon_1.3.4
## [39] dplyr_0.7.7 withr_2.1.2
## [41] bitops_1.0-6 grid_3.5.1
## [43] jsonlite_1.5 assertive.base_0.0-7
## [45] gtable_0.2.0 magrittr_1.5
## [47] assertive.models_0.0-2 scales_1.0.0
## [49] stringi_1.2.4 XVector_0.22.0
## [51] assertive.matrices_0.0-1 bindrcpp_0.2.2
## [53] assertive.reflection_0.0-4 assertive.datetimes_0.0-2
## [55] RColorBrewer_1.1-2 tools_3.5.1
## [57] glue_1.3.0 purrr_0.2.5
## [59] assertive.numbers_0.0-2 yaml_2.2.0
## [61] colorspace_1.3-2 BiocManager_1.30.3
## [63] assertive.files_0.0-2 assertive.data.uk_0.0-2
## [65] knitr_1.20 bindr_0.1.1