A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs


Ontology type: schema:ScholarlyArticle
Open Access: True


Article Info

DATE

2012-07-26

AUTHORS

Morgane Thomas-Chollier, Elodie Darbo, Carl Herrmann, Matthieu Defrance, Denis Thieffry, Jacques van Helden

ABSTRACT

This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation–sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.
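
The abstract mentions exporting peaks and binding sites as BED tracks for the UCSC genome browser. As a rough illustration of what such a track looks like, the sketch below formats features as six-column BED lines (chromosome, 0-based start, exclusive end, name, score, strand). It is not the output code of peak-motifs; the helper names `bed_line` and `bed_track`, the track name, and the coordinates are all hypothetical.

```python
def bed_line(chrom, start, end, name=".", score=0, strand="."):
    """Format one feature as a tab-separated six-column BED line."""
    # BED uses 0-based starts and exclusive ends, so end must exceed start.
    if start < 0 or end <= start:
        raise ValueError("BED intervals need 0 <= start < end")
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


def bed_track(track_name, features):
    """Join a track header line and one BED line per feature."""
    lines = ['track name="{}"'.format(track_name)]
    lines.extend(bed_line(*feature) for feature in features)
    return "\n".join(lines) + "\n"


# Hypothetical coordinates, for illustration only.
example_features = [
    ("chr2L", 1000, 1400, "peak_1", 0, "."),
    ("chr2L", 1180, 1190, "site_1", 0, "+"),
]
example_track = bed_track("example_peaks", example_features)
print(example_track, end="")
```

Text in this form can be saved to a file and loaded as a custom track; which optional columns a given tool fills in is not stated in the abstract.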

PAGES

1551-1568

References to SciGraph publications

  • 2007-06-11. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing in NATURE METHODS
  • 2008-08-17. Genome-Wide Analysis of Transcription Factor Binding Sites Based on ChIP-Seq Data in NATURE METHODS
  • 2008-09-17. Model-based Analysis of ChIP-Seq (MACS) in GENOME BIOLOGY
  • 2010-08-22. ChIP-Seq identification of weakly conserved heart enhancers in NATURE GENETICS
  • 2008. Evaluating the prediction of cis-acting regulatory elements in genome sequences in MODERN GENOME ANNOTATION
  • 2007-09-27. Integration of biological networks and gene expression data using Cytoscape in NATURE PROTOCOLS
  • 2008-09-18. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules in NATURE PROTOCOLS
  • 2009-10-15. Next-generation gap in NATURE METHODS
  • 2009-10-15. Computation for ChIP-seq and RNA-seq studies in NATURE METHODS
  • 2010-08-06. PeakAnalyzer: Genome-wide annotation of chromatin binding and modification loci in BMC BIOINFORMATICS
  • 2009-03-04. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome in GENOME BIOLOGY
  • 2008-09-18. Analyzing multiple data sets by interconnecting RSAT programs via SOAP Web services—an example with ChIP-chip data in NATURE PROTOCOLS
  • 2008-09-18. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences in NATURE PROTOCOLS
  • 2010-08-25. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences in GENOME BIOLOGY
  • 2009-10-15. Focus on next-generation sequencing data analysis in NATURE METHODS
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1038/nprot.2012.088

    DOI

    http://dx.doi.org/10.1038/nprot.2012.088

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1009423541

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/22836136



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Animals", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Binding Sites", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Chromatin Immunoprecipitation", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Drosophila Proteins", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Drosophila melanogaster", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Embryo, Nonmammalian", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Kruppel-Like Transcription Factors", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Mice", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Nucleotide Motifs", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Sequence Analysis, DNA", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Time Factors", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Transcription Factors", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Workflow", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany", 
              "id": "http://www.grid.ac/institutes/grid.419538.2", 
              "name": [
                "Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Thomas-Chollier", 
            "givenName": "Morgane", 
            "id": "sg:person.01020671104.76", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01020671104.76"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France", 
              "id": "http://www.grid.ac/institutes/grid.493853.0", 
              "name": [
                "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Darbo", 
            "givenName": "Elodie", 
            "id": "sg:person.01206135236.18", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01206135236.18"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France", 
              "id": "http://www.grid.ac/institutes/grid.493853.0", 
              "name": [
                "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Herrmann", 
            "givenName": "Carl", 
            "id": "sg:person.01123310611.66", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01123310611.66"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Centro de Ciencias Genomicas, Universidad Nacional Aut\u00f3noma de M\u00e9xico (UNAM), Cuernavaca, Mexico", 
              "id": "http://www.grid.ac/institutes/grid.9486.3", 
              "name": [
                "Centro de Ciencias Genomicas, Universidad Nacional Aut\u00f3noma de M\u00e9xico (UNAM), Cuernavaca, Mexico"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Defrance", 
            "givenName": "Matthieu", 
            "id": "sg:person.01074617571.21", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01074617571.21"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Institut de Biologie de l'Ecole Normale Sup\u00e9rieure\u2014Centre National de la Recherche Scientifique Unit\u00e9 Mixte de Recherche (CNRS UMR) 8197 and INSERM U1024, Paris, France", 
              "id": "http://www.grid.ac/institutes/None", 
              "name": [
                "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France", 
                "Institut de Biologie de l'Ecole Normale Sup\u00e9rieure\u2014Centre National de la Recherche Scientifique Unit\u00e9 Mixte de Recherche (CNRS UMR) 8197 and INSERM U1024, Paris, France"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Thieffry", 
            "givenName": "Denis", 
            "id": "sg:person.0760716207.75", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0760716207.75"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Laboratoire de Bioinformatique des G\u00e9nomes et des R\u00e9seaux (BiGRe), Universit\u00e9 Libre de Bruxelles, Bruxelles, Belgium", 
              "id": "http://www.grid.ac/institutes/grid.4989.c", 
              "name": [
                "Technological Advances for Genomics and Clinics, Institut National de la Sant\u00e9 et de la Recherche M\u00e9dicale (INSERM) U928 and Universit\u00e9 de la M\u00e9diterran\u00e9e, Marseille, France", 
                "Laboratoire de Bioinformatique des G\u00e9nomes et des R\u00e9seaux (BiGRe), Universit\u00e9 Libre de Bruxelles, Bruxelles, Belgium"
              ], 
              "type": "Organization"
            }, 
            "familyName": "van Helden", 
            "givenName": "Jacques", 
            "id": "sg:person.0626672543.46", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0626672543.46"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/nprot.2007.324", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1043279308", 
              "https://doi.org/10.1038/nprot.2007.324"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ng.650", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009136820", 
              "https://doi.org/10.1038/ng.650"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1246", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1037362623", 
              "https://doi.org/10.1038/nmeth.1246"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2009-10-3-r25", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1049583368", 
              "https://doi.org/10.1186/gb-2009-10-3-r25"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nprot.2008.99", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1051387433", 
              "https://doi.org/10.1038/nprot.2008.99"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2008-9-9-r137", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027608848", 
              "https://doi.org/10.1186/gb-2008-9-9-r137"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-11-415", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031367942", 
              "https://doi.org/10.1186/1471-2105-11-415"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nprot.2008.98", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011667398", 
              "https://doi.org/10.1038/nprot.2008.98"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.f.268", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1019077758", 
              "https://doi.org/10.1038/nmeth.f.268"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/gb-2010-11-8-r86", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046347776", 
              "https://doi.org/10.1186/gb-2010-11-8-r86"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-211-75123-7_4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1008345598", 
              "https://doi.org/10.1007/978-3-211-75123-7_4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth1068", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1036304799", 
              "https://doi.org/10.1038/nmeth1068"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.1371", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011651858", 
              "https://doi.org/10.1038/nmeth.1371"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.f.271", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1027512151", 
              "https://doi.org/10.1038/nmeth.f.271"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nprot.2008.97", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1006340595", 
              "https://doi.org/10.1038/nprot.2008.97"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2012-07-26", 
        "datePublishedReg": "2012-07-26", 
        "description": "This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation\u2013sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Kr\u00fcppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.", 
        "genre": "article", 
        "id": "sg:pub.10.1038/nprot.2012.088", 
        "inLanguage": "en", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.3777704", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.6770571", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1037502", 
            "issn": [
              "1754-2189", 
              "1750-2799"
            ], 
            "name": "Nature Protocols", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "8", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "7"
          }
        ], 
        "keywords": [
          "transcription factors", 
          "California Santa Cruz (UCSC) Genome Browser", 
          "Santa Cruz Genome Browser", 
          "ChIP-seq data sets", 
          "transcription factor Kr\u00fcppel", 
          "genomic context", 
          "genome browser", 
          "integrated pipeline", 
          "motif discovery algorithms", 
          "motif", 
          "significant motifs", 
          "positional distribution", 
          "complete workflow", 
          "sequence", 
          "Kr\u00fcppel", 
          "chromatin", 
          "sites", 
          "web server", 
          "peak sets", 
          "enrichment", 
          "factors", 
          "analysis", 
          "data sets", 
          "workflow", 
          "browser", 
          "set", 
          "pipeline", 
          "distribution", 
          "protocol", 
          "discovery algorithm", 
          "database", 
          "time", 
          "peak", 
          "visualization", 
          "context", 
          "technology", 
          "peak center", 
          "min", 
          "track", 
          "center", 
          "server", 
          "University", 
          "algorithm", 
          "computational time", 
          "online integrated pipeline '", 
          "full-size peak sets", 
          "memory-efficient motif discovery algorithms", 
          "BED tracks", 
          "Cruz (UCSC) genome browser", 
          "Drosophila transcription factor Kr\u00fcppel", 
          "factor Kr\u00fcppel", 
          "Regulatory Sequence Analysis Tools (RSAT) Web server", 
          "Sequence Analysis Tools (RSAT) Web server", 
          "Analysis Tools (RSAT) Web server", 
          "Tools (RSAT) Web server", 
          "full-size ChIP-seq (and similar) data sets"
        ], 
        "name": "A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs", 
        "pagination": "1551-1568", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1009423541"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1038/nprot.2012.088"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "22836136"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1038/nprot.2012.088", 
          "https://app.dimensions.ai/details/publication/pub.1009423541"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-01-01T18:26", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_573.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1038/nprot.2012.088"
      }
    ]
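
Because JSON-LD is plain JSON, the record above can be read with any JSON parser. The sketch below is one possible way to pull out author names, product identifiers, and cited DOIs. It works on a trimmed excerpt of the record (two authors and two citations copied from above), and the helper names `author_names`, `product_ids`, and `cited_dois` are illustrative rather than part of any SciGraph API.

```python
import json

# Trimmed excerpt of the record above; only the fields used here are kept.
RECORD_JSON = """
[
  {
    "id": "sg:pub.10.1038/nprot.2012.088",
    "author": [
      {"familyName": "Thomas-Chollier", "givenName": "Morgane", "type": "Person"},
      {"familyName": "van Helden", "givenName": "Jacques", "type": "Person"}
    ],
    "citation": [
      {"id": "sg:pub.10.1038/nprot.2007.324", "type": "CreativeWork"},
      {"id": "sg:pub.10.1186/gb-2008-9-9-r137", "type": "CreativeWork"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/nprot.2012.088"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["22836136"]}
    ]
  }
]
"""


def author_names(record):
    """Return 'given family' strings in the order the record lists them."""
    return ["{} {}".format(a["givenName"], a["familyName"])
            for a in record.get("author", [])]


def product_ids(record):
    """Map each productId name (doi, pubmed_id, ...) to its first value."""
    return {p["name"]: p["value"][0] for p in record.get("productId", [])}


def cited_dois(record):
    """Strip the 'sg:pub.' prefix from citation ids, leaving the DOI part."""
    prefix = "sg:pub."
    return [c["id"][len(prefix):]
            for c in record.get("citation", [])
            if c["id"].startswith(prefix)]


# The document is a JSON array holding a single record object.
record = json.loads(RECORD_JSON)[0]
print(author_names(record))
print(product_ids(record))
print(cited_dois(record))
```

The same functions should apply to the full record, since they only rely on the `author`, `productId`, and `citation` keys shown above.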
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/nprot.2012.088'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/nprot.2012.088'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/nprot.2012.088'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/nprot.2012.088'
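
The four curl commands above differ only in the Accept header, so the same content negotiation can be scripted. The following is a minimal sketch using Python's standard library; the short format labels (`json-ld`, `nt`, `turtle`, `xml`) and the function names `build_request` and `fetch` are assumptions made for this example. It does not check whether the server still answers at this URL.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/nprot.2012.088"

# Accept header values taken from the curl commands above.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Create a GET request asking for the given serialization."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError("unknown format: {!r}".format(fmt))
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building a request involves no network traffic; calling fetch() does.
turtle_request = build_request("turtle")
print(turtle_request.full_url, turtle_request.get_header("Accept"))
```

Calling `fetch("json-ld")` would perform the equivalent of the first curl command; the result can then be passed to `json.loads`.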


     

    This table displays all metadata directly associated to this object as RDF triples.

    295 triples, 22 predicates, 113 URIs, 90 literals, 23 blank nodes

    Subject Predicate Object
    1 sg:pub.10.1038/nprot.2012.088 schema:about N17e3859fde884caf999a5256a36c04d4
    2 N2a2df5a24405418fafee9e8c4ca92e63
    3 N2bf930f1dbac4b2685a4dee4bec6a98d
    4 N4497a185790841c6b3833a90331986fe
    5 N735f8d730ff94cc3b4843191beca1d56
    6 N76ea83965d8d4177be40779022aa10f4
    7 N8e88a44b3ef74de7b1086ef8cb94bfa7
    8 Na74d30ed69e94fad978740b49fc1c3b8
    9 Nade99f81d9fd4d5b97ffe560900f7284
    10 Naff917c39033475cb40aaa4ec4be33c6
    11 Nb1fd272384c041138066b3afdfa551f5
    12 Nb3d5db61a9bd4a8b902053785680f502
    13 Nb86ad13593d04dfe89d23387eb19b9f1
    14 Ncf9464f828d649339cfc1f0524bed893
    15 Ndec501c0171043aeaff71fc9709395e7
    16 Nec6249419c77477d97e1a322d9629611
    17 anzsrc-for:06
    18 anzsrc-for:0604
    19 schema:author Na32ace909cf743bfa4ac125437398649
    20 schema:citation sg:pub.10.1007/978-3-211-75123-7_4
    21 sg:pub.10.1038/ng.650
    22 sg:pub.10.1038/nmeth.1246
    23 sg:pub.10.1038/nmeth.1371
    24 sg:pub.10.1038/nmeth.f.268
    25 sg:pub.10.1038/nmeth.f.271
    26 sg:pub.10.1038/nmeth1068
    27 sg:pub.10.1038/nprot.2007.324
    28 sg:pub.10.1038/nprot.2008.97
    29 sg:pub.10.1038/nprot.2008.98
    30 sg:pub.10.1038/nprot.2008.99
    31 sg:pub.10.1186/1471-2105-11-415
    32 sg:pub.10.1186/gb-2008-9-9-r137
    33 sg:pub.10.1186/gb-2009-10-3-r25
    34 sg:pub.10.1186/gb-2010-11-8-r86
    35 schema:datePublished 2012-07-26
    36 schema:datePublishedReg 2012-07-26
    37 schema:description This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation–sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.
    38 schema:genre article
    39 schema:inLanguage en
    40 schema:isAccessibleForFree true
    41 schema:isPartOf N020548603dc04bbeb6d1b715dddeaebc
    42 Na422ab7dcf444c4b80b6a3ca9358c3ff
    43 sg:journal.1037502
    44 schema:keywords Analysis Tools (RSAT) Web server
    45 BED tracks
    46 California Santa Cruz (UCSC) Genome Browser
    47 ChIP-seq data sets
    48 Cruz (UCSC) genome browser
    49 Drosophila transcription factor Krüppel
    50 Krüppel
    51 Regulatory Sequence Analysis Tools (RSAT) Web server
    52 Santa Cruz Genome Browser
    53 Sequence Analysis Tools (RSAT) Web server
    54 Tools (RSAT) Web server
    55 University
    56 algorithm
    57 analysis
    58 browser
    59 center
    60 chromatin
    61 complete workflow
    62 computational time
    63 context
    64 data sets
    65 database
    66 discovery algorithm
    67 distribution
    68 enrichment
    69 factor Krüppel
    70 factors
    71 full-size ChIP-seq (and similar) data sets
    72 full-size peak sets
    73 genome browser
    74 genomic context
    75 integrated pipeline
    76 memory-efficient motif discovery algorithms
    77 min
    78 motif
    79 motif discovery algorithms
    80 online integrated pipeline '
    81 peak
    82 peak center
    83 peak sets
    84 pipeline
    85 positional distribution
    86 protocol
    87 sequence
    88 server
    89 set
    90 significant motifs
    91 sites
    92 technology
    93 time
    94 track
    95 transcription factor Krüppel
    96 transcription factors
    97 visualization
    98 web server
    99 workflow
    100 schema:name A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs
    101 schema:pagination 1551-1568
    102 schema:productId N0f6a73f66eb648908e985cc9db9a3bfe
    103 N16d1edb2a9214d508658e6e34f2455ff
    104 N876640c2439c4dad8e71d21b0dd74d6e
    105 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009423541
    106 https://doi.org/10.1038/nprot.2012.088
    107 schema:sdDatePublished 2022-01-01T18:26
    108 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    109 schema:sdPublisher N07324714c8ee412891bc86fd3f9a1052
    110 schema:url https://doi.org/10.1038/nprot.2012.088
    111 sgo:license sg:explorer/license/
    112 sgo:sdDataset articles
    113 rdf:type schema:ScholarlyArticle
    114 N020548603dc04bbeb6d1b715dddeaebc schema:issueNumber 8
    115 rdf:type schema:PublicationIssue
    116 N07324714c8ee412891bc86fd3f9a1052 schema:name Springer Nature - SN SciGraph project
    117 rdf:type schema:Organization
    118 N0f6a73f66eb648908e985cc9db9a3bfe schema:name dimensions_id
    119 schema:value pub.1009423541
    120 rdf:type schema:PropertyValue
    121 N16d1edb2a9214d508658e6e34f2455ff schema:name pubmed_id
    122 schema:value 22836136
    123 rdf:type schema:PropertyValue
    124 N17e3859fde884caf999a5256a36c04d4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    125 schema:name Transcription Factors
    126 rdf:type schema:DefinedTerm
    127 N1b177517093d4bbabc13c8ab638a8ee1 rdf:first sg:person.01074617571.21
    128 rdf:rest N8b481b6189d84678985a43c4e87a3fbf
    129 N2a2df5a24405418fafee9e8c4ca92e63 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    130 schema:name Nucleotide Motifs
    131 rdf:type schema:DefinedTerm
    132 N2bf930f1dbac4b2685a4dee4bec6a98d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    133 schema:name Binding Sites
    134 rdf:type schema:DefinedTerm
    135 N4497a185790841c6b3833a90331986fe schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    136 schema:name Drosophila melanogaster
    137 rdf:type schema:DefinedTerm
    138 N576127d996fb42d2a4b0ed391a348afa rdf:first sg:person.01123310611.66
    139 rdf:rest N1b177517093d4bbabc13c8ab638a8ee1
    140 N72991fefabe1441987a65e103817c69e rdf:first sg:person.01206135236.18
    141 rdf:rest N576127d996fb42d2a4b0ed391a348afa
    142 N735f8d730ff94cc3b4843191beca1d56 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    143 schema:name Chromatin Immunoprecipitation
    144 rdf:type schema:DefinedTerm
    145 N76ea83965d8d4177be40779022aa10f4 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    146 schema:name Workflow
    147 rdf:type schema:DefinedTerm
    148 N876640c2439c4dad8e71d21b0dd74d6e schema:name doi
    149 schema:value 10.1038/nprot.2012.088
    150 rdf:type schema:PropertyValue
    151 N8b481b6189d84678985a43c4e87a3fbf rdf:first sg:person.0760716207.75
    152 rdf:rest Na708587ac7f2419fab5cd624b9cee2ef
    153 N8e88a44b3ef74de7b1086ef8cb94bfa7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    154 schema:name Genomics
    155 rdf:type schema:DefinedTerm
    156 Na32ace909cf743bfa4ac125437398649 rdf:first sg:person.01020671104.76
    157 rdf:rest N72991fefabe1441987a65e103817c69e
    158 Na422ab7dcf444c4b80b6a3ca9358c3ff schema:volumeNumber 7
    159 rdf:type schema:PublicationVolume
    160 Na708587ac7f2419fab5cd624b9cee2ef rdf:first sg:person.0626672543.46
    161 rdf:rest rdf:nil
    162 Na74d30ed69e94fad978740b49fc1c3b8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    163 schema:name Algorithms
    164 rdf:type schema:DefinedTerm
    165 Nade99f81d9fd4d5b97ffe560900f7284 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    166 schema:name Mice
    167 rdf:type schema:DefinedTerm
    168 Naff917c39033475cb40aaa4ec4be33c6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    169 schema:name Time Factors
    170 rdf:type schema:DefinedTerm
    171 Nb1fd272384c041138066b3afdfa551f5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    172 schema:name Software
    173 rdf:type schema:DefinedTerm
    174 Nb3d5db61a9bd4a8b902053785680f502 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    175 schema:name Animals
    176 rdf:type schema:DefinedTerm
    177 Nb86ad13593d04dfe89d23387eb19b9f1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    178 schema:name Embryo, Nonmammalian
    179 rdf:type schema:DefinedTerm
    180 Ncf9464f828d649339cfc1f0524bed893 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    181 schema:name Kruppel-Like Transcription Factors
    182 rdf:type schema:DefinedTerm
    183 Ndec501c0171043aeaff71fc9709395e7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    184 schema:name Sequence Analysis, DNA
    185 rdf:type schema:DefinedTerm
    186 Nec6249419c77477d97e1a322d9629611 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    187 schema:name Drosophila Proteins
    188 rdf:type schema:DefinedTerm
    189 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    190 schema:name Biological Sciences
    191 rdf:type schema:DefinedTerm
    192 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    193 schema:name Genetics
    194 rdf:type schema:DefinedTerm
    195 sg:grant.3777704 http://pending.schema.org/fundedItem sg:pub.10.1038/nprot.2012.088
    196 rdf:type schema:MonetaryGrant
    197 sg:grant.6770571 http://pending.schema.org/fundedItem sg:pub.10.1038/nprot.2012.088
    198 rdf:type schema:MonetaryGrant
    199 sg:journal.1037502 schema:issn 1750-2799
    200 1754-2189
    201 schema:name Nature Protocols
    202 schema:publisher Springer Nature
    203 rdf:type schema:Periodical
    204 sg:person.01020671104.76 schema:affiliation grid-institutes:grid.419538.2
    205 schema:familyName Thomas-Chollier
    206 schema:givenName Morgane
    207 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01020671104.76
    208 rdf:type schema:Person
    209 sg:person.01074617571.21 schema:affiliation grid-institutes:grid.9486.3
    210 schema:familyName Defrance
    211 schema:givenName Matthieu
    212 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01074617571.21
    213 rdf:type schema:Person
    214 sg:person.01123310611.66 schema:affiliation grid-institutes:grid.493853.0
    215 schema:familyName Herrmann
    216 schema:givenName Carl
    217 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01123310611.66
    218 rdf:type schema:Person
    219 sg:person.01206135236.18 schema:affiliation grid-institutes:grid.493853.0
    220 schema:familyName Darbo
    221 schema:givenName Elodie
    222 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01206135236.18
    223 rdf:type schema:Person
    224 sg:person.0626672543.46 schema:affiliation grid-institutes:grid.4989.c
    225 schema:familyName van Helden
    226 schema:givenName Jacques
    227 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0626672543.46
    228 rdf:type schema:Person
    229 sg:person.0760716207.75 schema:affiliation grid-institutes:None
    230 schema:familyName Thieffry
    231 schema:givenName Denis
    232 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0760716207.75
    233 rdf:type schema:Person
    234 sg:pub.10.1007/978-3-211-75123-7_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008345598
    235 https://doi.org/10.1007/978-3-211-75123-7_4
    236 rdf:type schema:CreativeWork
    237 sg:pub.10.1038/ng.650 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009136820
    238 https://doi.org/10.1038/ng.650
    239 rdf:type schema:CreativeWork
    240 sg:pub.10.1038/nmeth.1246 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037362623
    241 https://doi.org/10.1038/nmeth.1246
    242 rdf:type schema:CreativeWork
    243 sg:pub.10.1038/nmeth.1371 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011651858
    244 https://doi.org/10.1038/nmeth.1371
    245 rdf:type schema:CreativeWork
    246 sg:pub.10.1038/nmeth.f.268 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019077758
    247 https://doi.org/10.1038/nmeth.f.268
    248 rdf:type schema:CreativeWork
    249 sg:pub.10.1038/nmeth.f.271 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027512151
    250 https://doi.org/10.1038/nmeth.f.271
    251 rdf:type schema:CreativeWork
    252 sg:pub.10.1038/nmeth1068 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036304799
    253 https://doi.org/10.1038/nmeth1068
    254 rdf:type schema:CreativeWork
    255 sg:pub.10.1038/nprot.2007.324 schema:sameAs https://app.dimensions.ai/details/publication/pub.1043279308
    256 https://doi.org/10.1038/nprot.2007.324
    257 rdf:type schema:CreativeWork
    258 sg:pub.10.1038/nprot.2008.97 schema:sameAs https://app.dimensions.ai/details/publication/pub.1006340595
    259 https://doi.org/10.1038/nprot.2008.97
    260 rdf:type schema:CreativeWork
    261 sg:pub.10.1038/nprot.2008.98 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011667398
    262 https://doi.org/10.1038/nprot.2008.98
    263 rdf:type schema:CreativeWork
    264 sg:pub.10.1038/nprot.2008.99 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051387433
    265 https://doi.org/10.1038/nprot.2008.99
    266 rdf:type schema:CreativeWork
    267 sg:pub.10.1186/1471-2105-11-415 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031367942
    268 https://doi.org/10.1186/1471-2105-11-415
    269 rdf:type schema:CreativeWork
    270 sg:pub.10.1186/gb-2008-9-9-r137 schema:sameAs https://app.dimensions.ai/details/publication/pub.1027608848
    271 https://doi.org/10.1186/gb-2008-9-9-r137
    272 rdf:type schema:CreativeWork
    273 sg:pub.10.1186/gb-2009-10-3-r25 schema:sameAs https://app.dimensions.ai/details/publication/pub.1049583368
    274 https://doi.org/10.1186/gb-2009-10-3-r25
    275 rdf:type schema:CreativeWork
    276 sg:pub.10.1186/gb-2010-11-8-r86 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046347776
    277 https://doi.org/10.1186/gb-2010-11-8-r86
    278 rdf:type schema:CreativeWork
    279 grid-institutes:None schema:alternateName Institut de Biologie de l'Ecole Normale Supérieure—Centre National de la Recherche Scientifique Unité Mixte de Recherche (CNRS UMR) 8197 and INSERM U1024, Paris, France
    280 schema:name Institut de Biologie de l'Ecole Normale Supérieure—Centre National de la Recherche Scientifique Unité Mixte de Recherche (CNRS UMR) 8197 and INSERM U1024, Paris, France
    281 Technological Advances for Genomics and Clinics, Institut National de la Santé et de la Recherche Médicale (INSERM) U928 and Université de la Méditerranée, Marseille, France
    282 rdf:type schema:Organization
    283 grid-institutes:grid.419538.2 schema:alternateName Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
    284 schema:name Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
    285 rdf:type schema:Organization
    286 grid-institutes:grid.493853.0 schema:alternateName Technological Advances for Genomics and Clinics, Institut National de la Santé et de la Recherche Médicale (INSERM) U928 and Université de la Méditerranée, Marseille, France
    287 schema:name Technological Advances for Genomics and Clinics, Institut National de la Santé et de la Recherche Médicale (INSERM) U928 and Université de la Méditerranée, Marseille, France
    288 rdf:type schema:Organization
    289 grid-institutes:grid.4989.c schema:alternateName Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium
    290 schema:name Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe), Université Libre de Bruxelles, Bruxelles, Belgium
    291 Technological Advances for Genomics and Clinics, Institut National de la Santé et de la Recherche Médicale (INSERM) U928 and Université de la Méditerranée, Marseille, France
    292 rdf:type schema:Organization
    293 grid-institutes:grid.9486.3 schema:alternateName Centro de Ciencias Genomicas, Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Mexico
    294 schema:name Centro de Ciencias Genomicas, Universidad Nacional Autónoma de México (UNAM), Cuernavaca, Mexico
    295 rdf:type schema:Organization
     



