Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2013-12

AUTHORS

Choumouss Kamoun, Thibaut Payen, Aurélie Hua-Van, Jonathan Filée

ABSTRACT

BACKGROUND: Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes, inducing duplications, deletions, rearrangements or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods for IS and MITE detection has become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative detection methods that either take advantage of the structural properties of the elements (de novo methods) or use an additional library-based approach built on profile HMM searches.

RESULTS: In this study, we have developed three different workflows dedicated to IS and MITE detection: the first two use de novo methods detecting either repeated sequences or the presence of Inverted Repeats; the third one uses 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve IS and MITE detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copy elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs has even doubled with the discovery of elements displaying very limited sequence similarities with their respective autonomous partners (mainly in the Inverted Repeats of the elements). Concerning metagenomes, with the exception of short-read data (<300 bp), for which both techniques seem equally limited, profile HMM searches considerably improve the detection of transposase-encoding genes (up to +50%) while generating a low level of false positives compared to BLAST-based methods.

CONCLUSION: Compared to classical BLAST-based methods, the sensitivity of the de novo and profile HMM methods developed in this study allows better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believe that future studies involving IS and MITE identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM search should be favored; a BLAST-based step is only useful for the final annotation into groups and families.

PAGES

700

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2164-14-700

DOI

http://dx.doi.org/10.1186/1471-2164-14-700

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1028545197

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/24118975



JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Archaea", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Bacteria", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA Transposable Elements", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Archaeal", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Bacterial", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Inverted Repeat Sequences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Markov Chains", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Metagenome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Data", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Prokaryotic Cells", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Reference Standards", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Reproducibility of Results", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Evolution, G\u00e9nomes, Comportement, Ecologie", 
          "id": "https://www.grid.ac/institutes/grid.463972.d", 
          "name": [
            "Laboratoire Evolution, G\u00e9nomes, Sp\u00e9ciation, CNRS UPR9034/Universit\u00e9 Paris-Sud, Gif-sur-Yvette, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kamoun", 
        "givenName": "Choumouss", 
        "id": "sg:person.0614556141.15", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0614556141.15"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Evolution, G\u00e9nomes, Comportement, Ecologie", 
          "id": "https://www.grid.ac/institutes/grid.463972.d", 
          "name": [
            "Laboratoire Evolution, G\u00e9nomes, Sp\u00e9ciation, CNRS UPR9034/Universit\u00e9 Paris-Sud, Gif-sur-Yvette, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Payen", 
        "givenName": "Thibaut", 
        "id": "sg:person.01133112546.30", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01133112546.30"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Evolution, G\u00e9nomes, Comportement, Ecologie", 
          "id": "https://www.grid.ac/institutes/grid.463972.d", 
          "name": [
            "Laboratoire Evolution, G\u00e9nomes, Sp\u00e9ciation, CNRS UPR9034/Universit\u00e9 Paris-Sud, Gif-sur-Yvette, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hua-Van", 
        "givenName": "Aur\u00e9lie", 
        "id": "sg:person.01330106466.63", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01330106466.63"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Evolution, G\u00e9nomes, Comportement, Ecologie", 
          "id": "https://www.grid.ac/institutes/grid.463972.d", 
          "name": [
            "Laboratoire Evolution, G\u00e9nomes, Sp\u00e9ciation, CNRS UPR9034/Universit\u00e9 Paris-Sud, Gif-sur-Yvette, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fil\u00e9e", 
        "givenName": "Jonathan", 
        "id": "sg:person.01223015443.31", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223015443.31"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1128/mmbr.00031-06", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000098941"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1371/journal.pcbi.1002195", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1001034143"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/bti1018", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1002022998"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkq140", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1003390958"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2164-11-44", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004398472", 
          "https://doi.org/10.1186/1471-2164-11-44"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2011-12-3-r30", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008730363", 
          "https://doi.org/10.1186/gb-2011-12-3-r30"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0022-2836(05)80360-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1013618994"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrmicro1235", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1021596942", 
          "https://doi.org/10.1038/nrmicro1235"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/hdy.2009.165", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024011623", 
          "https://doi.org/10.1038/hdy.2009.165"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/gbe/evr077", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025708121"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/nar/gkh340", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025846396"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btq461", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1025904619"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1073/pnas.0611553104", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026323697"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1128/aem.02181-07", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1026780577"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1038/emboj.2010.241", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032481438"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0168-9525(00)02024-2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035021159"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.molcel.2007.12.008", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1036002400"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/gbe/evr096", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037354727"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.gene.2009.01.019", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037587389"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/molbev/msj085", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1037702207"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2148-8-18", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1042349355", 
          "https://doi.org/10.1186/1471-2148-8-18"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btp033", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044035298"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1111/j.1558-5646.2011.01395.x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1046701297"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/molbev/msm014", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051801709"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://app.dimensions.ai/details/publication/pub.1083313551", 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2013-12", 
    "datePublishedReg": "2013-12-01", 
    "description": "BACKGROUND: Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes inducing duplication, deletion, rearrangement or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods of ISs and MITEs detection become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative methods of detection either taking advantages of the structural properties of the elements (de novo methods) or using an additional library-based method using profile HMM searches.\nRESULTS: In this study, we have developed three different work flows dedicated to ISs and MITEs detection: the first two use de novo methods detecting either repeated sequences or presence of Inverted Repeats; the third one use 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve ISs and MITEs detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copies elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs has even doubled with the discovery of elements displaying very limited sequence similarities with their respective autonomous partners (mainly in the Inverted Repeats of the elements). 
Concerning metagenomes, with the exception of short reads data (<300 bp) for which both techniques seem equally limited, profile HMM searches considerably ameliorate the detection of transposase encoding genes (up to +50%) generating low level of false positives compare to BLAST-based methods.\nCONCLUSION: Compared to classical BLAST-based methods, the sensitivity of de novo and profile HMM methods developed in this study allow a better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believed that future studies implying ISs and MITEs identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM search should be favored, a BLAST-based step is only useful to the final annotation into groups and families.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1471-2164-14-700", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023790", 
        "issn": [
          "1471-2164"
        ], 
        "name": "BMC Genomics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "14"
      }
    ], 
    "name": "Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods", 
    "pagination": "700", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "87fb177a25de75c84629e8bcce72fbfcf058661f83d8a680bcd11832ddf12cfe"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "24118975"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965258"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2164-14-700"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1028545197"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2164-14-700", 
      "https://app.dimensions.ai/details/publication/pub.1028545197"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T01:58", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8700_00000505.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2F1471-2164-14-700"
  }
]
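
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch pulls out the title, DOI, author names and citation ids. The function name summarize_record and the trimmed SAMPLE record are assumptions made for this example; only the key names ("name", "author", "productId", "citation") are taken from the record itself.

```python
import json


def summarize_record(jsonld_text):
    """Return title, DOI, ordered author names and unique citation ids."""
    records = json.loads(jsonld_text)
    record = records[0]  # the record is delivered as a one-element JSON array

    # The DOI is stored under "productId" as a name/value pair.
    doi = next(
        (p["value"][0] for p in record.get("productId", []) if p.get("name") == "doi"),
        None,
    )
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])]
    # dict.fromkeys keeps first-seen order while dropping any repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {
        "title": record.get("name"),
        "doi": doi,
        "authors": authors,
        "citations": citations,
    }


# Hypothetical, heavily trimmed excerpt of the record, for demonstration only.
SAMPLE = json.dumps([
    {
        "name": "Improving prokaryotic transposable elements identification "
                "using a combination of de novo and profile HMM methods",
        "author": [
            {"givenName": "Choumouss", "familyName": "Kamoun"},
            {"givenName": "Thibaut", "familyName": "Payen"},
        ],
        "productId": [
            {"name": "pubmed_id", "value": ["24118975"]},
            {"name": "doi", "value": ["10.1186/1471-2164-14-700"]},
        ],
        "citation": [
            {"id": "sg:pub.10.1038/nrmicro1235"},
            {"id": "sg:pub.10.1038/nrmicro1235"},
            {"id": "sg:pub.10.1038/hdy.2009.165"},
        ],
    }
])

if __name__ == "__main__":
    print(summarize_record(SAMPLE))
```

Only the standard json module is used here; a full JSON-LD processor would additionally resolve the "@context", which this sketch deliberately ignores.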
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2164-14-700'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2164-14-700'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2164-14-700'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2164-14-700'
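
The four curl calls above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. The names ACCEPT_BY_FORMAT, build_request and fetch are assumptions introduced for this example, and the sketch presumes the endpoint still answers as the curl commands describe.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2164-14-700"

# Media types copied from the curl commands; the short keys are illustrative.
ACCEPT_BY_FORMAT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=BASE_URL):
    """Build a request whose Accept header selects the wanted serialization."""
    if fmt not in ACCEPT_BY_FORMAT:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]})


def fetch(fmt, url=BASE_URL, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetch("turtle")[:500])
```

Keeping request construction separate from the network call makes the header logic checkable without contacting the server.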


 

This table displays all metadata directly associated with this object as RDF triples.

228 TRIPLES      21 PREDICATES      69 URIs      36 LITERALS      24 BLANK NODES
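
In the triples listed in the table, the author order is not stored as a plain array but as an RDF collection: each blank node carries an rdf:first value (one author) and an rdf:rest pointer to the next node, ending in rdf:nil. A minimal Python sketch of walking such a list follows; the function name walk_rdf_list and the tuple representation are assumptions, while the node and person identifiers are copied from the table.

```python
def walk_rdf_list(head, triples):
    """Follow rdf:first / rdf:rest links from head and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}

    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen or node not in first or node not in rest:
            raise ValueError(f"malformed RDF list at node {node!r}")
        seen.add(node)  # guards against cyclic lists
        items.append(first[node])
        node = rest[node]
    return items


# (subject, predicate, object) tuples for the author list of this record.
AUTHOR_TRIPLES = [
    ("N4b0137f83de744288a49e1286ac16709", "rdf:first", "sg:person.0614556141.15"),
    ("N4b0137f83de744288a49e1286ac16709", "rdf:rest", "Nfab727b6d0794bc8a164b621794c34b9"),
    ("N701d1032cc674069bf0d859b70216b05", "rdf:first", "sg:person.01223015443.31"),
    ("N701d1032cc674069bf0d859b70216b05", "rdf:rest", "rdf:nil"),
    ("Nc366f8323e7044b995bdabfdbf94f679", "rdf:first", "sg:person.01330106466.63"),
    ("Nc366f8323e7044b995bdabfdbf94f679", "rdf:rest", "N701d1032cc674069bf0d859b70216b05"),
    ("Nfab727b6d0794bc8a164b621794c34b9", "rdf:first", "sg:person.01133112546.30"),
    ("Nfab727b6d0794bc8a164b621794c34b9", "rdf:rest", "Nc366f8323e7044b995bdabfdbf94f679"),
]

if __name__ == "__main__":
    # The head node is the object of the schema:author triple.
    print(walk_rdf_list("N4b0137f83de744288a49e1286ac16709", AUTHOR_TRIPLES))
```

Walking the list from the schema:author head node yields the persons in the same order as the AUTHORS line at the top of the record.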

Subject Predicate Object
1 sg:pub.10.1186/1471-2164-14-700 schema:about N08ed42d5b70f4764956b6b1962a8acb1
2 N0a48880369a3403f991ccba511ed945b
3 N1b6b81dfd8ef4762878bd2b2f022d65b
4 N1bc9df884f7146449d465a9c64debe18
5 N483c04f2effd426789dda7366a754400
6 N63e89734b15e4ff98dd148e4b5776fa4
7 N63ea85ff1872482b9d38586aec378b73
8 N6ee6c099b3c24049afeb39817457a698
9 N8f90b8d5dccc483289f7e962747252a7
10 Nb0af6c4171ba462ba450f13441468df7
11 Nb54d358e8a8440c68629bf932e0843e3
12 Nc4f3809b99cb4757b0adeb6c9047c15a
13 Ne849e0ba200945aa8c721019a88a6194
14 Nf46f55a711df449c8465f5895afd9f52
15 Nfa04ef3e16354447abe5977a867af978
16 anzsrc-for:06
17 anzsrc-for:0604
18 schema:author N4b0137f83de744288a49e1286ac16709
19 schema:citation sg:pub.10.1038/hdy.2009.165
20 sg:pub.10.1038/nrmicro1235
21 sg:pub.10.1186/1471-2148-8-18
22 sg:pub.10.1186/1471-2164-11-44
23 sg:pub.10.1186/gb-2011-12-3-r30
24 https://app.dimensions.ai/details/publication/pub.1083313551
25 https://doi.org/10.1016/j.gene.2009.01.019
26 https://doi.org/10.1016/j.molcel.2007.12.008
27 https://doi.org/10.1016/s0022-2836(05)80360-2
28 https://doi.org/10.1016/s0168-9525(00)02024-2
29 https://doi.org/10.1038/emboj.2010.241
30 https://doi.org/10.1073/pnas.0611553104
31 https://doi.org/10.1093/bioinformatics/bti1018
32 https://doi.org/10.1093/bioinformatics/btp033
33 https://doi.org/10.1093/bioinformatics/btq461
34 https://doi.org/10.1093/gbe/evr077
35 https://doi.org/10.1093/gbe/evr096
36 https://doi.org/10.1093/molbev/msj085
37 https://doi.org/10.1093/molbev/msm014
38 https://doi.org/10.1093/nar/gkh340
39 https://doi.org/10.1093/nar/gkq140
40 https://doi.org/10.1111/j.1558-5646.2011.01395.x
41 https://doi.org/10.1128/aem.02181-07
42 https://doi.org/10.1128/mmbr.00031-06
43 https://doi.org/10.1371/journal.pcbi.1002195
44 schema:datePublished 2013-12
45 schema:datePublishedReg 2013-12-01
46 schema:description BACKGROUND: Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes inducing duplication, deletion, rearrangement or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods of ISs and MITEs detection become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative methods of detection either taking advantages of the structural properties of the elements (de novo methods) or using an additional library-based method using profile HMM searches. RESULTS: In this study, we have developed three different work flows dedicated to ISs and MITEs detection: the first two use de novo methods detecting either repeated sequences or presence of Inverted Repeats; the third one use 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve ISs and MITEs detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copies elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs has even doubled with the discovery of elements displaying very limited sequence similarities with their respective autonomous partners (mainly in the Inverted Repeats of the elements). 
Concerning metagenomes, with the exception of short reads data (<300 bp) for which both techniques seem equally limited, profile HMM searches considerably ameliorate the detection of transposase encoding genes (up to +50%) generating low level of false positives compare to BLAST-based methods. CONCLUSION: Compared to classical BLAST-based methods, the sensitivity of de novo and profile HMM methods developed in this study allow a better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believed that future studies implying ISs and MITEs identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM search should be favored, a BLAST-based step is only useful to the final annotation into groups and families.
47 schema:genre research_article
48 schema:inLanguage en
49 schema:isAccessibleForFree true
50 schema:isPartOf N643bb2d77a404f5181bee14a76bed627
51 Ndf1850afba5b4c91adb281aa11406842
52 sg:journal.1023790
53 schema:name Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods
54 schema:pagination 700
55 schema:productId N706a576af7cd472b883f7bb802e7fa0a
56 N81b1b14fd3434d08867ab5957a987ebe
57 Nc8f6970c8a4a4ae79a7db96d4e9f0269
58 Ndb8d93d87a9e4afd9279ba319998decd
59 Nf36808a89d8047058fe2034870405288
60 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028545197
61 https://doi.org/10.1186/1471-2164-14-700
62 schema:sdDatePublished 2019-04-11T01:58
63 schema:sdLicense https://scigraph.springernature.com/explorer/license/
64 schema:sdPublisher Nae67f11a4fe34ea8976523d56e79f916
65 schema:url http://link.springer.com/10.1186%2F1471-2164-14-700
66 sgo:license sg:explorer/license/
67 sgo:sdDataset articles
68 rdf:type schema:ScholarlyArticle
69 N08ed42d5b70f4764956b6b1962a8acb1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
70 schema:name Genome, Bacterial
71 rdf:type schema:DefinedTerm
72 N0a48880369a3403f991ccba511ed945b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
73 schema:name Databases, Nucleic Acid
74 rdf:type schema:DefinedTerm
75 N1b6b81dfd8ef4762878bd2b2f022d65b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
76 schema:name DNA Transposable Elements
77 rdf:type schema:DefinedTerm
78 N1bc9df884f7146449d465a9c64debe18 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
79 schema:name Prokaryotic Cells
80 rdf:type schema:DefinedTerm
81 N483c04f2effd426789dda7366a754400 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
82 schema:name Molecular Sequence Data
83 rdf:type schema:DefinedTerm
84 N4b0137f83de744288a49e1286ac16709 rdf:first sg:person.0614556141.15
85 rdf:rest Nfab727b6d0794bc8a164b621794c34b9
86 N63e89734b15e4ff98dd148e4b5776fa4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
87 schema:name Inverted Repeat Sequences
88 rdf:type schema:DefinedTerm
89 N63ea85ff1872482b9d38586aec378b73 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
90 schema:name Markov Chains
91 rdf:type schema:DefinedTerm
92 N643bb2d77a404f5181bee14a76bed627 schema:issueNumber 1
93 rdf:type schema:PublicationIssue
94 N6ee6c099b3c24049afeb39817457a698 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
95 schema:name Genome, Archaeal
96 rdf:type schema:DefinedTerm
97 N701d1032cc674069bf0d859b70216b05 rdf:first sg:person.01223015443.31
98 rdf:rest rdf:nil
99 N706a576af7cd472b883f7bb802e7fa0a schema:name doi
100 schema:value 10.1186/1471-2164-14-700
101 rdf:type schema:PropertyValue
102 N81b1b14fd3434d08867ab5957a987ebe schema:name readcube_id
103 schema:value 87fb177a25de75c84629e8bcce72fbfcf058661f83d8a680bcd11832ddf12cfe
104 rdf:type schema:PropertyValue
105 N8f90b8d5dccc483289f7e962747252a7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
106 schema:name Archaea
107 rdf:type schema:DefinedTerm
108 Nae67f11a4fe34ea8976523d56e79f916 schema:name Springer Nature - SN SciGraph project
109 rdf:type schema:Organization
110 Nb0af6c4171ba462ba450f13441468df7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Computational Biology
112 rdf:type schema:DefinedTerm
113 Nb54d358e8a8440c68629bf932e0843e3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
114 schema:name Reproducibility of Results
115 rdf:type schema:DefinedTerm
116 Nc366f8323e7044b995bdabfdbf94f679 rdf:first sg:person.01330106466.63
117 rdf:rest N701d1032cc674069bf0d859b70216b05
118 Nc4f3809b99cb4757b0adeb6c9047c15a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
119 schema:name Metagenome
120 rdf:type schema:DefinedTerm
121 Nc8f6970c8a4a4ae79a7db96d4e9f0269 schema:name pubmed_id
122 schema:value 24118975
123 rdf:type schema:PropertyValue
124 Ndb8d93d87a9e4afd9279ba319998decd schema:name nlm_unique_id
125 schema:value 100965258
126 rdf:type schema:PropertyValue
127 Ndf1850afba5b4c91adb281aa11406842 schema:volumeNumber 14
128 rdf:type schema:PublicationVolume
129 Ne849e0ba200945aa8c721019a88a6194 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
130 schema:name Reference Standards
131 rdf:type schema:DefinedTerm
132 Nf36808a89d8047058fe2034870405288 schema:name dimensions_id
133 schema:value pub.1028545197
134 rdf:type schema:PropertyValue
135 Nf46f55a711df449c8465f5895afd9f52 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
136 schema:name Base Sequence
137 rdf:type schema:DefinedTerm
138 Nfa04ef3e16354447abe5977a867af978 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
139 schema:name Bacteria
140 rdf:type schema:DefinedTerm
141 Nfab727b6d0794bc8a164b621794c34b9 rdf:first sg:person.01133112546.30
142 rdf:rest Nc366f8323e7044b995bdabfdbf94f679
143 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
144 schema:name Biological Sciences
145 rdf:type schema:DefinedTerm
146 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
147 schema:name Genetics
148 rdf:type schema:DefinedTerm
149 sg:journal.1023790 schema:issn 1471-2164
150 schema:name BMC Genomics
151 rdf:type schema:Periodical
152 sg:person.01133112546.30 schema:affiliation https://www.grid.ac/institutes/grid.463972.d
153 schema:familyName Payen
154 schema:givenName Thibaut
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01133112546.30
156 rdf:type schema:Person
157 sg:person.01223015443.31 schema:affiliation https://www.grid.ac/institutes/grid.463972.d
158 schema:familyName Filée
159 schema:givenName Jonathan
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01223015443.31
161 rdf:type schema:Person
162 sg:person.01330106466.63 schema:affiliation https://www.grid.ac/institutes/grid.463972.d
163 schema:familyName Hua-Van
164 schema:givenName Aurélie
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01330106466.63
166 rdf:type schema:Person
167 sg:person.0614556141.15 schema:affiliation https://www.grid.ac/institutes/grid.463972.d
168 schema:familyName Kamoun
169 schema:givenName Choumouss
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0614556141.15
171 rdf:type schema:Person
172 sg:pub.10.1038/hdy.2009.165 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024011623
173 https://doi.org/10.1038/hdy.2009.165
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nrmicro1235 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021596942
176 https://doi.org/10.1038/nrmicro1235
177 rdf:type schema:CreativeWork
178 sg:pub.10.1186/1471-2148-8-18 schema:sameAs https://app.dimensions.ai/details/publication/pub.1042349355
179 https://doi.org/10.1186/1471-2148-8-18
180 rdf:type schema:CreativeWork
181 sg:pub.10.1186/1471-2164-11-44 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004398472
182 https://doi.org/10.1186/1471-2164-11-44
183 rdf:type schema:CreativeWork
184 sg:pub.10.1186/gb-2011-12-3-r30 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008730363
185 https://doi.org/10.1186/gb-2011-12-3-r30
186 rdf:type schema:CreativeWork
187 https://app.dimensions.ai/details/publication/pub.1083313551 rdf:type schema:CreativeWork
188 https://doi.org/10.1016/j.gene.2009.01.019 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037587389
189 rdf:type schema:CreativeWork
190 https://doi.org/10.1016/j.molcel.2007.12.008 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036002400
191 rdf:type schema:CreativeWork
192 https://doi.org/10.1016/s0022-2836(05)80360-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1013618994
193 rdf:type schema:CreativeWork
194 https://doi.org/10.1016/s0168-9525(00)02024-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035021159
195 rdf:type schema:CreativeWork
196 https://doi.org/10.1038/emboj.2010.241 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032481438
197 rdf:type schema:CreativeWork
198 https://doi.org/10.1073/pnas.0611553104 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026323697
199 rdf:type schema:CreativeWork
200 https://doi.org/10.1093/bioinformatics/bti1018 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002022998
201 rdf:type schema:CreativeWork
202 https://doi.org/10.1093/bioinformatics/btp033 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044035298
203 rdf:type schema:CreativeWork
204 https://doi.org/10.1093/bioinformatics/btq461 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025904619
205 rdf:type schema:CreativeWork
206 https://doi.org/10.1093/gbe/evr077 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025708121
207 rdf:type schema:CreativeWork
208 https://doi.org/10.1093/gbe/evr096 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037354727
209 rdf:type schema:CreativeWork
210 https://doi.org/10.1093/molbev/msj085 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037702207
211 rdf:type schema:CreativeWork
212 https://doi.org/10.1093/molbev/msm014 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051801709
213 rdf:type schema:CreativeWork
214 https://doi.org/10.1093/nar/gkh340 schema:sameAs https://app.dimensions.ai/details/publication/pub.1025846396
215 rdf:type schema:CreativeWork
216 https://doi.org/10.1093/nar/gkq140 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003390958
217 rdf:type schema:CreativeWork
218 https://doi.org/10.1111/j.1558-5646.2011.01395.x schema:sameAs https://app.dimensions.ai/details/publication/pub.1046701297
219 rdf:type schema:CreativeWork
220 https://doi.org/10.1128/aem.02181-07 schema:sameAs https://app.dimensions.ai/details/publication/pub.1026780577
221 rdf:type schema:CreativeWork
222 https://doi.org/10.1128/mmbr.00031-06 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000098941
223 rdf:type schema:CreativeWork
224 https://doi.org/10.1371/journal.pcbi.1002195 schema:sameAs https://app.dimensions.ai/details/publication/pub.1001034143
225 rdf:type schema:CreativeWork
226 https://www.grid.ac/institutes/grid.463972.d schema:alternateName Evolution, Génomes, Comportement, Ecologie
227 schema:name Laboratoire Evolution, Génomes, Spéciation, CNRS UPR9034/Université Paris-Sud, Gif-sur-Yvette, France
228 rdf:type schema:Organization
 



