Comparative Analysis Of Completely Sequenced Genomes


Ontology type: schema:MonetaryGrant     


Grant Info

YEARS

2009-2019

FUNDING AMOUNT

26,366,012 USD

ABSTRACT

The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the year in review, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution. To a large extent, we have focused on cancer genome evolution. Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ long short-term memory (LSTM) networks, a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution. 
In another cancer genomics project, we explored proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection.
Additionally, we have continued intensive research into evolutionary genomics of viruses and antivirus defense systems. In particular, we carried out a detailed investigation of CRISPR-Cas systems encoded in mobile genetic elements and involved in counter-defence and other functions. The principal function of CRISPR-Cas systems in archaea and bacteria is defence against mobile genetic elements (MGEs), including viruses, plasmids and transposons. However, the relationships between CRISPR-Cas and MGEs are far more complex. Several classes of MGE contributed to the origin and evolution of CRISPR-Cas, and, conversely, CRISPR-Cas systems and their components were recruited by various MGEs for functions that remain largely uncharacterized. We investigated and substantially expanded the range of CRISPR-Cas components carried by MGEs. Three groups of Tn7-like transposable elements encode 'minimal' type I CRISPR-Cas derivatives capable of target recognition but not cleavage, and another group encodes an inactivated type V variant. These partially inactivated CRISPR-Cas variants might mediate guide RNA-dependent integration of the respective transposons. Numerous plasmids and some prophages encode type IV systems, with similar predicted properties, that appear to contribute to competition among plasmids and between plasmids and viruses. Many prokaryotic viruses also carry CRISPR mini-arrays, some of which recognize other viruses and are implicated in inter-virus conflicts, and solitary repeat units, which could inhibit host CRISPR-Cas systems.
We have also developed a general theory of the origin of viruses from primordial replicators that recruited various cellular proteins for capsid formation. Viruses are ubiquitous parasites of cellular life and the most abundant biological entities on Earth. It is widely accepted that viruses are polyphyletic, but a consensus scenario for their ultimate origin is still lacking. Traditionally, three scenarios for the origin of viruses have been considered: descent from primordial, precellular genetic elements; reductive evolution from cellular ancestors; and escape of genes from cellular hosts, achieving partial replicative autonomy and becoming parasitic genetic elements. These classical scenarios give different timelines for the origin(s) of viruses and do not explain the provenance of the two key functional modules that are responsible, respectively, for viral genome replication and virion morphogenesis.
We developed a 'chimeric' scenario under which different types of primordial, selfish replicons gave rise to viruses by recruiting host proteins for virion formation. We also propose that new groups of viruses have repeatedly emerged at all stages of the evolution of life, often through the displacement of ancestral structural and genome replication genes. Taken together, these studies advance the existing understanding of the general principles and specific aspects of genome evolution in diverse life forms, in particular, viruses and mobile elements, as well as cancer genome evolution.

URL

http://projectreporter.nih.gov/project_info_description.cfm?aid=10007522

Related SciGraph Publications

  • 2019-07-31. Multiple origins of prokaryotic and eukaryotic single-stranded DNA viruses from bacterial and archaeal plasmids in NATURE COMMUNICATIONS
  • 2019-06-05. CRISPR–Cas in mobile genetic elements: counter-defence and beyond in NATURE REVIEWS MICROBIOLOGY
  • 2019-05-29. Origin of viruses: primordial replicators recruiting capsids from hosts in NATURE REVIEWS MICROBIOLOGY
  • 2018-06-20. A distinct abundant group of microbial rhodopsins discovered using functional metagenomics in NATURE
  • 2018-04-11. Taxonomy of the order Mononegavirales: update 2018 in ARCHIVES OF VIROLOGY
  • 2018-04-10. Vast diversity of prokaryotic virus genomes encoding double jelly-roll major capsid proteins uncovered by genomic and metagenomic sequence analysis in VIROLOGY JOURNAL
  • 2018-03-05. Anti-CRISPR proteins encoded by archaeal lytic viruses inhibit subtype I-D immunity in NATURE MICROBIOLOGY
  • 2017-11-13. Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut in NATURE MICROBIOLOGY
  • 2017-05-30. Discovery of extremely halophilic, methyl-reducing euryarchaea provides insights into the evolutionary origin of methanogenesis in NATURE MICROBIOLOGY
  • 2017-02-10. Evolution of RNA- and DNA-guided antivirus defense systems in prokaryotes and eukaryotes: common ancestry vs convergence in BIOLOGY DIRECT
  • 2017-01-23. Diversity and evolution of class 2 CRISPR–Cas systems in NATURE REVIEWS MICROBIOLOGY
  • 2016-07-05. Germline viral “fossils” guide in silico reconstruction of a mid-Cenozoic era marsupial adeno-associated virus in SCIENTIFIC REPORTS
  • 2016-02-24. Just how Lamarckian is CRISPR-Cas immunity: the continuum of evolvability mechanisms in BIOLOGY DIRECT
  • 2015-11-11. A novel group of diverse Polinton-like viruses discovered by metagenome analysis in BMC BIOLOGY
  • 2015-10-05. Archaeal ancestors of eukaryotes: not so elusive any more in BMC BIOLOGY
  • 2015-09-28. An updated evolutionary classification of CRISPR–Cas systems in NATURE REVIEWS MICROBIOLOGY
  • 2015-09-16. Why the Central Dogma: on the nature of the great biological exclusion principle in BIOLOGY DIRECT
  • 2015-07-30. Decrease of mRNA Editing after Spinal Cord Injury is Caused by Down-regulation of ADAR2 that is Triggered by Inflammatory Response in SCIENTIFIC REPORTS
  • 2015-05-10. Ancient systems of sodium/potassium homeostasis as predecessors of membrane bioenergetics in BIOCHEMISTRY (MOSCOW)
  • 2015-04-28. Evolution of the RAG1-RAG2 locus: both proteins came from the same transposon in BIOLOGY DIRECT
  • 2015-04-25. A new family of hybrid virophages from an animal gut metagenome in BIOLOGY DIRECT
  • 2015-04-18. The Turbulent Network Dynamics of Microbial Evolution and the Statistical Tree of Life in JOURNAL OF MOLECULAR EVOLUTION
  • 2015-04-16. Gene-specific selective sweeps in bacteria and archaea caused by negative frequency-dependent selection in BMC BIOLOGY
  • 2015-03-31. Babela massiliensis, a representative of a widespread bacterial phylum with unusual adaptations to parasitism in amoebae in BIOLOGY DIRECT
  • 2015-03-29. Plant viruses of the Amalgaviridae family evolved via recombination between viruses with double-stranded and negative-strand RNA genomes in BIOLOGY DIRECT
  • 2015-03-13. Immunity, suicide or both? Ecological determinants for the combined evolution of anti-pathogen defense systems in BMC EVOLUTIONARY BIOLOGY
  • 2015-02-24. No evidence of inhibition of horizontal gene transfer by CRISPR–Cas on evolutionary timescales in THE ISME JOURNAL: MULTIDISCIPLINARY JOURNAL OF MICROBIAL ECOLOGY
  • 2014-12-22. Polintons: a hotbed of eukaryotic virus, transposon and plasmid evolution in NATURE REVIEWS MICROBIOLOGY
  • 2014-12-09. Evolution of adaptive immunity from transposable elements combined with innate immune systems in NATURE REVIEWS GENETICS
  • 2014-09-05. The evolutionary journey of Argonaute proteins in NATURE STRUCTURAL & MOLECULAR BIOLOGY
  • 2014-08-21. Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes in BMC BIOLOGY
  • 2014-08-12. Dark matter in archaeal genomes: a rich source of novel mobile elements, defense systems and secretory complexes in EXTREMOPHILES
  • 2014-07-02. Pseudo-chaotic oscillations in CRISPR-virus coevolution predicted by bifurcation analysis in BIOLOGY DIRECT
  • 2014-06-18. Evolution of eukaryotic single-stranded DNA viruses of the Bidnaviridae family from genes of four other groups of widely different viruses in SCIENTIFIC REPORTS
  • 2014-05-19. Casposons: a new superfamily of self-synthesizing DNA transposons at the origin of prokaryotic CRISPR-Cas immunity in BMC BIOLOGY
  • 2014-04-29. Conservation of major and minor jelly-roll capsid proteins in Polinton (Maverick) transposons suggests that they are bona fide viruses in BIOLOGY DIRECT
  • 2014-03-13. Classification and quantification of bacteriophage taxa in human gut metagenomes in THE ISME JOURNAL: MULTIDISCIPLINARY JOURNAL OF MICROBIAL ECOLOGY
  • 2013-08-11. Parabolic replicator dynamics and the principle of minimum Tsallis information gain in BIOLOGY DIRECT
  • 2013-06-29. “Megavirales”, a proposed new order for eukaryotic nucleocytoplasmic large DNA viruses in ARCHIVES OF VIROLOGY
  • 2013-05-23. Virophages, polintons, and transpovirons: a complex evolutionary network of diverse selfish genetic elements with different reproduction strategies in VIROLOGY JOURNAL
  • 2013-04-22. Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park in BIOLOGY DIRECT
  • 2013-04-15. Seeing the Tree of Life behind the phylogenetic forest in BMC BIOLOGY
  • 2013-04-04. Mimiviridae: clusters of orthologous genes, reconstruction of gene repertoire evolution and proposed expansion of the giant virus family in VIROLOGY JOURNAL
  • 2013-04-04. Functional and evolutionary implications of gene orthology in NATURE REVIEWS GENETICS
  • 2012-12-14. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer in BIOLOGY DIRECT
  • 2012-10-26. The Role of Energy in the Emergence of Biology from Chemistry in ORIGINS OF LIFE AND EVOLUTION OF BIOSPHERES
  • 2012-10. Open Questions on the Origin of Life at Anoxic Geothermal Fields in ORIGINS OF LIFE AND EVOLUTION OF BIOSPHERES
  • 2012-08-14. Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes in VIROLOGY JOURNAL
  • 2012-06-20. Expanding networks of RNA virus evolution in BMC BIOLOGY
  • 2012-04-16. Origin and evolution of spliceosomal introns in BIOLOGY DIRECT

JSON-LD is the canonical representation for SciGraph data.

    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "type": "DefinedTerm"
          }
        ], 
        "amount": {
          "currency": "USD", 
          "type": "MonetaryAmount", 
          "value": 26366012.0
        }, 
        "description": "The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the year in review, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution. To a large extent, we have focused on cancer genome evolution. Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ long short-term memory (LSTM) networks, a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution. 
In another cancer genomics project, we explored proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection. Additionally, we have continued intensive research into evolutionary genomics of viruses and antivirus defense systems. In particular, we carried out a detailed investigation of CRISPR-Cas systems encoded in mobile genetic elements and involved in counter-defence and other functions. The principal function of CRISPR-Cas systems in archaea and bacteria is defence against mobile genetic elements (MGEs), including viruses, plasmids and transposons. However, the relationships between CRISPR-Cas and MGEs are far more complex. 
Several classes of MGE contributed to the origin and evolution of CRISPR-Cas, and, conversely, CRISPR-Cas systems and their components were recruited by various MGEs for functions that remain largely uncharacterized. We investigated and substantially expanded the range of CRISPR-Cas components carried by MGEs. Three groups of Tn7-like transposable elements encode 'minimal' type I CRISPR-Cas derivatives capable of target recognition but not cleavage, and another group encodes an inactivated type V variant. These partially inactivated CRISPR-Cas variants might mediate guide RNA-dependent integration of the respective transposons. Numerous plasmids and some prophages encode type IV systems, with similar predicted properties, that appear to contribute to competition among plasmids and between plasmids and viruses. Many prokaryotic viruses also carry CRISPR mini-arrays, some of which recognize other viruses and are implicated in inter-virus conflicts, and solitary repeat units, which could inhibit host CRISPR-Cas systems. We also have developed a general theory of the origin of viruses from primordial replicators that various cellular proteins as capsid formation. Viruses are ubiquitous parasites of cellular life and the most abundant biological entities on Earth. It is widely accepted that viruses are polyphyletic, but a consensus scenario for their ultimate origin is still lacking. Traditionally, three scenarios for the origin of viruses have been considered: descent from primordial, precellular genetic elements, reductive evolution from cellular ancestors and escape of genes from cellular hosts, achieving partial replicative autonomy and becoming parasitic genetic elements. These classical scenarios give different timelines for the origin(s) of viruses and do not explain the provenance of the two key functional modules that are responsible, respectively, for viral genome replication and virion morphogenesis. 
We developed a 'chimeric' scenario under which different types of primordial, selfish replicons gave rise to viruses by recruiting host proteins for virion formation. We also propose that new groups of viruses have repeatedly emerged at all stages of the evolution of life, often through the displacement of ancestral structural and genome replication genes. Taken together, these studies advance the existing understanding of the general principles and specific aspects of genome evolution in diverse life forms, in particular, viruses and mobile elements, as well as cancer genome evolution.", 
        "endDate": "2019-01-01", 
        "funder": {
          "id": "http://www.grid.ac/institutes/grid.280285.5", 
          "type": "Organization"
        }, 
        "id": "sg:grant.2726032", 
        "identifier": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "grant.2726032"
            ]
          }, 
          {
            "name": "nih_id", 
            "type": "PropertyValue", 
            "value": [
              "ZIALM000073"
            ]
          }
        ], 
        "inLanguage": [
          "en"
        ], 
        "keywords": [
          "CRISPR-Cas systems", 
          "mobile genetic elements", 
          "cancer genome evolution", 
          "genome evolution", 
          "origin of viruses", 
          "genetic elements", 
          "tumor evolution", 
          "CRISPR-Cas", 
          "repeat instability", 
          "driver genes", 
          "escape of genes", 
          "parasitic genetic elements", 
          "point mutation load", 
          "abundant biological entities", 
          "genomes of bacteria", 
          "host CRISPR-Cas systems", 
          "CRISPR-Cas components", 
          "cancer progression", 
          "diverse life forms", 
          "CRISPR-Cas variants", 
          "cancer genomics projects", 
          "tumor-specific signatures", 
          "type IV systems", 
          "viral genome replication", 
          "adjacent normal tissues", 
          "role of repeats", 
          "primordial replicators", 
          "evolution of life", 
          "evolutionary genomics", 
          "reductive evolution", 
          "repeat content", 
          "cellular life", 
          "cellular ancestor", 
          "transposable elements", 
          "tumorigenic state", 
          "genomics projects", 
          "cellular proteins", 
          "tumor progression", 
          "replication genes", 
          "genomic information", 
          "prokaryotic viruses", 
          "cellular hosts", 
          "repetitive sequences", 
          "occurrence of mutations", 
          "genome replication", 
          "genome research", 
          "genomic signatures", 
          "cancer drivers", 
          "life forms", 
          "virion morphogenesis", 
          "host proteins", 
          "genome", 
          "passenger mutations", 
          "virion formation", 
          "mutation load", 
          "cancer evolution", 
          "high similarity", 
          "mutational data", 
          "sequence of mutations", 
          "genes", 
          "capsid formation", 
          "numerous plasmids", 
          "growth advantage", 
          "mobile elements", 
          "normal tissues", 
          "ubiquitous parasite", 
          "poor patient prognosis", 
          "key functional modules", 
          "archaea", 
          "mutations", 
          "somatic mutations", 
          "plasmid", 
          "consensus scenario", 
          "mutation occurrence", 
          "biological entities", 
          "protein expression", 
          "defense system", 
          "repeat units", 
          "transposon", 
          "functional modules", 
          "hitherto unknown aspects", 
          "adaptive mechanism", 
          "somatic aberrations", 
          "sequence", 
          "protein", 
          "principal function", 
          "bacteria", 
          "compensatory-adaptive mechanisms", 
          "unknown aspects", 
          "eukaryotes", 
          "interactors", 
          "proteome", 
          "evolution", 
          "genomics", 
          "morphogenesis", 
          "virus", 
          "mutational burden", 
          "microsatellites", 
          "ancestor", 
          "prophage", 
          "repeats", 
          "tissue", 
          "new group", 
          "multiple levels", 
          "variants", 
          "variety of studies", 
          "comparative analysis", 
          "progression", 
          "replicon", 
          "origin", 
          "intensive research", 
          "ultimate origin", 
          "parasites", 
          "signatures", 
          "replicators", 
          "host", 
          "replication", 
          "expression", 
          "function", 
          "assembly", 
          "defense", 
          "interaction", 
          "cleavage", 
          "classical scenario", 
          "hotspots", 
          "target recognition", 
          "accumulation", 
          "potential signals", 
          "new opportunities", 
          "formation", 
          "similarity", 
          "wide range", 
          "drivers", 
          "general principles", 
          "aberrations", 
          "cancer", 
          "conditional drivers", 
          "elements", 
          "mechanism", 
          "role", 
          "large extent", 
          "components", 
          "competition", 
          "occurrence", 
          "major new challenges", 
          "complex dynamics", 
          "detailed investigation", 
          "patient prognosis", 
          "escape", 
          "analysis", 
          "variation", 
          "patterns", 
          "inverse relationship", 
          "understanding", 
          "tumors", 
          "variety", 
          "stage", 
          "dynamics", 
          "specific aspects", 
          "signals", 
          "primary tumor", 
          "empirical data", 
          "class", 
          "metastasis", 
          "relationship", 
          "fundamental principles", 
          "data", 
          "levels", 
          "study", 
          "instability", 
          "provenance", 
          "module", 
          "aspects", 
          "system", 
          "range", 
          "content", 
          "different types", 
          "form", 
          "types", 
          "network", 
          "extent", 
          "recognition", 
          "prediction", 
          "review", 
          "group", 
          "descent", 
          "capacity", 
          "database", 
          "derivatives", 
          "rise", 
          "accurate prediction", 
          "Earth", 
          "results", 
          "raw data", 
          "information", 
          "opportunities", 
          "research", 
          "scenarios", 
          "advantages", 
          "investigation", 
          "integration", 
          "hierarchy", 
          "detection", 
          "units", 
          "life", 
          "state", 
          "challenges", 
          "time", 
          "years", 
          "timeline", 
          "properties", 
          "series", 
          "principles", 
          "early detection", 
          "new challenges", 
          "different timelines", 
          "prognosis", 
          "applications", 
          "probability", 
          "instability patterns", 
          "entities", 
          "method", 
          "time series", 
          "conflict", 
          "methodology", 
          "project", 
          "general theory", 
          "difficulties", 
          "accurate diagnosis", 
          "burden", 
          "displacement", 
          "patients", 
          "load", 
          "diagnosis", 
          "theory", 
          "neural network", 
          "recurrent neural network", 
          "autonomy", 
          "short-term memory network", 
          "memory network", 
          "long short-term memory network", 
          "simulation results", 
          "LSTM", 
          "Application of LSTM"
        ], 
        "name": "Comparative Analysis Of Completely Sequenced Genomes", 
        "recipient": [
          {
            "id": "http://www.grid.ac/institutes/grid.280285.5", 
            "type": "Organization"
          }, 
          {
            "affiliation": {
              "id": "http://www.grid.ac/institutes/None", 
              "name": "NATIONAL LIBRARY OF MEDICINE", 
              "type": "Organization"
            }, 
            "familyName": "KOONIN", 
            "givenName": "EUGENE V", 
            "id": "sg:person.01017015051.78", 
            "type": "Person"
          }, 
          {
            "member": "sg:person.01017015051.78", 
            "roleName": "PI", 
            "type": "Role"
          }
        ], 
        "sameAs": [
          "https://app.dimensions.ai/details/grant/grant.2726032"
        ], 
        "sdDataset": "grants", 
        "sdDatePublished": "2022-01-01T19:30", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/grant/grant_33.jsonl", 
        "startDate": "2009-01-01", 
        "type": "MonetaryGrant", 
        "url": "http://projectreporter.nih.gov/project_info_description.cfm?aid=10007522"
      }
    ]
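
    The record above can be read with any JSON parser. The sketch below is illustrative only: it embeds a trimmed copy of the record (just the fields it uses) and pulls out the headline values. The helper name `summarize_grant` and the shape of its return value are assumptions made for this example, not part of SciGraph.

```python
import json

# Trimmed copy of the record shown above: only the fields this sketch reads.
RECORD_JSON = """
[
  {
    "amount": {"currency": "USD", "type": "MonetaryAmount", "value": 26366012.0},
    "endDate": "2019-01-01",
    "id": "sg:grant.2726032",
    "name": "Comparative Analysis Of Completely Sequenced Genomes",
    "recipient": [
      {"id": "http://www.grid.ac/institutes/grid.280285.5", "type": "Organization"},
      {"familyName": "KOONIN", "givenName": "EUGENE V",
       "id": "sg:person.01017015051.78", "type": "Person"},
      {"member": "sg:person.01017015051.78", "roleName": "PI", "type": "Role"}
    ],
    "startDate": "2009-01-01",
    "type": "MonetaryGrant"
  }
]
"""


def summarize_grant(record: dict) -> dict:
    """Collect the headline fields of one MonetaryGrant record (hypothetical helper)."""
    recipients = record.get("recipient", [])
    # Index Person entries by id so Role entries can be resolved against them.
    people = {r["id"]: r for r in recipients if r.get("type") == "Person"}
    # A Role entry points at a Person through its "member" id.
    pi_names = [
        f'{people[r["member"]]["givenName"]} {people[r["member"]]["familyName"]}'
        for r in recipients
        if r.get("type") == "Role"
        and r.get("roleName") == "PI"
        and r.get("member") in people
    ]
    amount = record.get("amount", {})
    return {
        "id": record["id"],
        "name": record["name"],
        "amount": amount.get("value"),
        "currency": amount.get("currency"),
        "years": (record["startDate"][:4], record["endDate"][:4]),
        "principal_investigators": pi_names,
    }


# The top-level JSON value is a list holding a single grant record.
summary = summarize_grant(json.loads(RECORD_JSON)[0])
```

    The sketch treats the record as plain JSON and does not expand it against the `@context`; a full JSON-LD processor would be needed to resolve terms into complete URIs.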
     

    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/grant.2726032'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/grant.2726032'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/grant.2726032'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/grant.2726032'
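
    The same content negotiation can be sketched in Python with the standard library. This is a minimal illustration, assuming the endpoint behaves as the curl commands above describe; its availability has not been verified here. The names `ACCEPT_HEADERS`, `build_request` and `fetch`, and the short format keys, are hypothetical and chosen for this example.

```python
import urllib.request

# Accept headers copied from the curl commands above; the dictionary keys
# are labels invented for this sketch.
ACCEPT_HEADERS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

BASE_URL = "https://scigraph.springernature.com/"


def build_request(record_id: str, fmt: str) -> urllib.request.Request:
    """Build (but do not send) a GET request asking for the given format."""
    if fmt not in ACCEPT_HEADERS:
        raise ValueError(f"unknown format: {fmt!r}")
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT_HEADERS[fmt]},
    )


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the body as text (requires network access)."""
    request = build_request(record_id, fmt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

    For example, `fetch("grant.2726032", "turtle")` would issue the same request as the Turtle curl command. Keeping request construction separate from sending makes the header logic easy to check without touching the network.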


     

    This table displays all metadata directly associated with this object as RDF triples.

    288 TRIPLES      19 PREDICATES      265 URIs      258 LITERALS      5 BLANK NODES

    Subject Predicate Object
    1 sg:grant.2726032 schema:about anzsrc-for:06
    2 schema:amount N83ab8589021d4808b94e72a68df46a88
    3 schema:description The rapidly growing database of completely and nearly completely sequenced genomes of bacteria, archaea, eukaryotes and viruses (several thousand genomes already available and many more in progress) creates both extensive new opportunities and major new challenges for genome research. During the year in review, we performed a variety of studies that took advantage of the genomic information to establish fundamental principles of genome evolution. To a large extent, we have focused on cancer genome evolution. Cancer arises through the accumulation of somatic mutations over time. Understanding the sequence of mutation occurrence during cancer progression can assist early and accurate diagnosis and improve clinical decision-making. Here we employ long short-term memory (LSTM) networks, a class of recurrent neural network, to learn the evolution of a tumor through an ordered sequence of mutations. We demonstrate the capacity of LSTMs to learn complex dynamics of the mutational time series governing tumor progression, allowing accurate prediction of the mutational burden and the occurrence of mutations in the sequence. Using the probabilities learned by the LSTM, we simulate mutational data and show that the simulation results are statistically indistinguishable from the empirical data. We identify passenger mutations that are significantly associated with established cancer drivers in the sequence and demonstrate that the genes carrying these mutations are substantially enriched in interactions with the corresponding driver genes. Breaking the network into modules consisting of driver genes and their interactors, we show that these interactions are associated with poor patient prognosis, thus likely conferring growth advantage for tumor progression. Thus, application of LSTM provides for prediction of numerous additional conditional drivers and reveals hitherto unknown aspects of cancer evolution. 
In another cancer genomics project, we explored proteomic and genomic signatures of repeat instability in cancer and adjacent normal tissues. Repetitive sequences are hotspots of evolution at multiple levels. However, due to difficulties involved in their assembly and analysis, the role of repeats in tumor evolution is poorly understood. We developed a rigorous motif-based methodology to quantify variations in the repeat content, beyond microsatellites, in proteomes and genomes directly from proteomic and genomic raw data. This method was applied to a wide range of tumors and normal tissues. We identify high similarity between repeat instability patterns in tumors and their patient-matched adjacent normal tissues. Nonetheless, tumor-specific signatures both in protein expression and in the genome strongly correlate with cancer progression and robustly predict the tumorigenic state. In a patient, the hierarchy of genomic repeat instability signatures accurately reconstructs tumor evolution, with primary tumors differentiated from metastases. We observe an inverse relationship between repeat instability and point mutation load within and across patients independent of other somatic aberrations. Thus, repeat instability is a distinct, transient, and compensatory adaptive mechanism in tumor evolution and a potential signal for early detection. Additionally, we have continued intensive research into evolutionary genomics of viruses and antivirus defense systems. In particular, we carried out a detailed investigation of CRISPR-Cas systems encoded in mobile genetic elements and involved in counter-defence and other functions. The principal function of CRISPR-Cas systems in archaea and bacteria is defence against mobile genetic elements (MGEs), including viruses, plasmids and transposons. However, the relationships between CRISPR-Cas and MGEs are far more complex. 
Several classes of MGE contributed to the origin and evolution of CRISPR-Cas, and, conversely, CRISPR-Cas systems and their components were recruited by various MGEs for functions that remain largely uncharacterized. We investigated and substantially expanded the range of CRISPR-Cas components carried by MGEs. Three groups of Tn7-like transposable elements encode 'minimal' type I CRISPR-Cas derivatives capable of target recognition but not cleavage, and another group encodes an inactivated type V variant. These partially inactivated CRISPR-Cas variants might mediate guide RNA-dependent integration of the respective transposons. Numerous plasmids and some prophages encode type IV systems, with similar predicted properties, that appear to contribute to competition among plasmids and between plasmids and viruses. Many prokaryotic viruses also carry CRISPR mini-arrays, some of which recognize other viruses and are implicated in inter-virus conflicts, and solitary repeat units, which could inhibit host CRISPR-Cas systems. We have also developed a general theory of the origin of viruses from primordial replicators that recruited various cellular proteins for capsid formation. Viruses are ubiquitous parasites of cellular life and the most abundant biological entities on Earth. It is widely accepted that viruses are polyphyletic, but a consensus scenario for their ultimate origin is still lacking. Traditionally, three scenarios for the origin of viruses have been considered: descent from primordial, precellular genetic elements; reductive evolution from cellular ancestors; and escape of genes from cellular hosts, achieving partial replicative autonomy and becoming parasitic genetic elements. These classical scenarios give different timelines for the origin(s) of viruses and do not explain the provenance of the two key functional modules that are responsible, respectively, for viral genome replication and virion morphogenesis. 
We developed a 'chimeric' scenario under which different types of primordial, selfish replicons gave rise to viruses by recruiting host proteins for virion formation. We also propose that new groups of viruses have repeatedly emerged at all stages of the evolution of life, often through the displacement of ancestral structural and genome replication genes. Taken together, these studies advance the existing understanding of the general principles and specific aspects of genome evolution in diverse life forms, in particular, viruses and mobile elements, as well as cancer genome evolution.
    4 schema:endDate 2019-01-01
    5 schema:funder grid-institutes:grid.280285.5
    6 schema:identifier N0789f6459dda4de6975b865c8fc0ba4f
    7 Nfc7df87e0da3423d9da6c7071b297b0a
    8 schema:inLanguage en
    9 schema:keywords Application of LSTM
    10 CRISPR-Cas
    11 CRISPR-Cas components
    12 CRISPR-Cas systems
    13 CRISPR-Cas variants
    14 Earth
    15 LSTM
    16 aberrations
    17 abundant biological entities
    18 accumulation
    19 accurate diagnosis
    20 accurate prediction
    21 adaptive mechanism
    22 adjacent normal tissues
    23 advantages
    24 analysis
    25 ancestor
    26 applications
    27 archaea
    28 aspects
    29 assembly
    30 autonomy
    31 bacteria
    32 biological entities
    33 burden
    34 cancer
    35 cancer drivers
    36 cancer evolution
    37 cancer genome evolution
    38 cancer genomics projects
    39 cancer progression
    40 capacity
    41 capsid formation
    42 cellular ancestor
    43 cellular hosts
    44 cellular life
    45 cellular proteins
    46 challenges
    47 class
    48 classical scenario
    49 cleavage
    50 comparative analysis
    51 compensatory-adaptive mechanisms
    52 competition
    53 complex dynamics
    54 components
    55 conditional drivers
    56 conflict
    57 consensus scenario
    58 content
    59 data
    60 database
    61 defense
    62 defense system
    63 derivatives
    64 descent
    65 detailed investigation
    66 detection
    67 diagnosis
    68 different timelines
    69 different types
    70 difficulties
    71 displacement
    72 diverse life forms
    73 driver genes
    74 drivers
    75 dynamics
    76 early detection
    77 elements
    78 empirical data
    79 entities
    80 escape
    81 escape of genes
    82 eukaryotes
    83 evolution
    84 evolution of life
    85 evolutionary genomics
    86 expression
    87 extent
    88 form
    89 formation
    90 function
    91 functional modules
    92 fundamental principles
    93 general principles
    94 general theory
    95 genes
    96 genetic elements
    97 genome
    98 genome evolution
    99 genome replication
    100 genome research
    101 genomes of bacteria
    102 genomic information
    103 genomic signatures
    104 genomics
    105 genomics projects
    106 group
    107 growth advantage
    108 hierarchy
    109 high similarity
    110 hitherto unknown aspects
    111 host
    112 host CRISPR-Cas systems
    113 host proteins
    114 hotspots
    115 information
    116 instability
    117 instability patterns
    118 integration
    119 intensive research
    120 interaction
    121 interactors
    122 inverse relationship
    123 investigation
    124 key functional modules
    125 large extent
    126 levels
    127 life
    128 life forms
    129 load
    130 long short-term memory network
    131 major new challenges
    132 mechanism
    133 memory network
    134 metastasis
    135 method
    136 methodology
    137 microsatellites
    138 mobile elements
    139 mobile genetic elements
    140 module
    141 morphogenesis
    142 multiple levels
    143 mutation load
    144 mutation occurrence
    145 mutational burden
    146 mutational data
    147 mutations
    148 network
    149 neural network
    150 new challenges
    151 new group
    152 new opportunities
    153 normal tissues
    154 numerous plasmids
    155 occurrence
    156 occurrence of mutations
    157 opportunities
    158 origin
    159 origin of viruses
    160 parasites
    161 parasitic genetic elements
    162 passenger mutations
    163 patient prognosis
    164 patients
    165 patterns
    166 plasmid
    167 point mutation load
    168 poor patient prognosis
    169 potential signals
    170 prediction
    171 primary tumor
    172 primordial replicators
    173 principal function
    174 principles
    175 probability
    176 prognosis
    177 progression
    178 project
    179 prokaryotic viruses
    180 properties
    181 prophage
    182 protein
    183 protein expression
    184 proteome
    185 provenance
    186 range
    187 raw data
    188 recognition
    189 recurrent neural network
    190 reductive evolution
    191 relationship
    192 repeat content
    193 repeat instability
    194 repeat units
    195 repeats
    196 repetitive sequences
    197 replication
    198 replication genes
    199 replicators
    200 replicon
    201 research
    202 results
    203 review
    204 rise
    205 role
    206 role of repeats
    207 scenarios
    208 sequence
    209 sequence of mutations
    210 series
    211 short-term memory network
    212 signals
    213 signatures
    214 similarity
    215 simulation results
    216 somatic aberrations
    217 somatic mutations
    218 specific aspects
    219 stage
    220 state
    221 study
    222 system
    223 target recognition
    224 theory
    225 time
    226 time series
    227 timeline
    228 tissue
    229 transposable elements
    230 transposon
    231 tumor evolution
    232 tumor progression
    233 tumor-specific signatures
    234 tumorigenic state
    235 tumors
    236 type IV systems
    237 types
    238 ubiquitous parasite
    239 ultimate origin
    240 understanding
    241 units
    242 unknown aspects
    243 variants
    244 variation
    245 variety
    246 variety of studies
    247 viral genome replication
    248 virion formation
    249 virion morphogenesis
    250 virus
    251 wide range
    252 years
    253 schema:name Comparative Analysis Of Completely Sequenced Genomes
    254 schema:recipient N56f576a1706a45e69205620ccdd3fa0c
    255 sg:person.01017015051.78
    256 grid-institutes:grid.280285.5
    257 schema:sameAs https://app.dimensions.ai/details/grant/grant.2726032
    258 schema:sdDatePublished 2022-01-01T19:30
    259 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    260 schema:sdPublisher Nb7656dd2d6b5412294a6b5523941577d
    261 schema:startDate 2009-01-01
    262 schema:url http://projectreporter.nih.gov/project_info_description.cfm?aid=10007522
    263 sgo:license sg:explorer/license/
    264 sgo:sdDataset grants
    265 rdf:type schema:MonetaryGrant
    266 N0789f6459dda4de6975b865c8fc0ba4f schema:name nih_id
    267 schema:value ZIALM000073
    268 rdf:type schema:PropertyValue
    269 N56f576a1706a45e69205620ccdd3fa0c schema:member sg:person.01017015051.78
    270 schema:roleName PI
    271 rdf:type schema:Role
    272 N83ab8589021d4808b94e72a68df46a88 schema:currency USD
    273 schema:value 26366012.0
    274 rdf:type schema:MonetaryAmount
    275 Nb7656dd2d6b5412294a6b5523941577d schema:name Springer Nature - SN SciGraph project
    276 rdf:type schema:Organization
    277 Nfc7df87e0da3423d9da6c7071b297b0a schema:name dimensions_id
    278 schema:value grant.2726032
    279 rdf:type schema:PropertyValue
    280 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    281 rdf:type schema:DefinedTerm
    282 sg:person.01017015051.78 schema:affiliation grid-institutes:None
    283 schema:familyName KOONIN
    284 schema:givenName EUGENE V
    285 rdf:type schema:Person
    286 grid-institutes:None schema:name NATIONAL LIBRARY OF MEDICINE
    287 rdf:type schema:Organization
    288 grid-institutes:grid.280285.5 schema:Organization
     



