Identification of transcription factor binding sites in the human genome sequence


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2002-09

AUTHORS

Samuel Levy, Sridhar Hannenhalli

ABSTRACT

The identification of transcription factor binding sites (TFBS) is an important initial step in determining the DNA signals that regulate transcription of the genome. We tested the performance of three distinct computational methods for the identification of TFBS applied to the human genome sequence, as judged by their ability to recover the location of experimentally determined, and uniquely mapped, TFBS taken from the TRANSFAC database. These identification methods all attempt to filter the quantity of TFBS identified by aligning positional weight matrices that describe the binding site and employ either (i) a P-value threshold for accepting a site, (ii) an over-representation measure of neighboring sites, or (iii) conservation with the mouse genome and application of P-value thresholds. The results show that the best recognition of TFBS is achieved by combining the identification of TFBS in regions of human-mouse conservation and also by applying a high stringency P-value to the TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites can be found in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.
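
As a rough illustration of method (i) in the abstract, the sketch below scores sequence windows against a positional weight matrix and keeps windows whose P-value falls under a threshold. It is not the authors' implementation: the count matrix, the uniform background, the pseudocount, and the exhaustive enumeration used for the P-value are all assumptions chosen to keep the example small.

```python
import math
from itertools import product

# Hypothetical count matrix for a toy 4-bp motif (not taken from TRANSFAC).
COUNTS = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # assumed smoothing to avoid log(0)


def log_odds(counts):
    """Convert per-position counts to log2 odds against the background."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + 4 * PSEUDOCOUNT
        pwm.append({
            base: math.log2((col[base] + PSEUDOCOUNT) / total / BACKGROUND)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site))


def p_value(pwm, observed):
    """Fraction of all k-mers scoring at least `observed` (uniform background)."""
    k = len(pwm)
    hits = sum(
        1 for kmer in product("ACGT", repeat=k)
        if score(pwm, kmer) >= observed - 1e-12
    )
    return hits / 4 ** k


def scan(pwm, sequence, threshold):
    """Return (offset, window, P-value) for windows passing the threshold."""
    k = len(pwm)
    results = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        p = p_value(pwm, score(pwm, window))
        if p <= threshold:
            results.append((i, window, p))
    return results


PWM = log_odds(COUNTS)
HITS = scan(PWM, "TTACGTGGACGA", threshold=0.01)
```

Exhaustive enumeration is only practical for short motifs; longer matrices would need a dynamic-programming or sampling estimate of the score distribution.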

PAGES

510-514

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6

DOI

http://dx.doi.org/10.1007/s00335-002-2175-6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1010582603

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/12370781



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Animals", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Binding Sites", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Conserved Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome, Human", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Mice", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Data", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Homology, Nucleic Acid", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Species Specificity", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Transcription Factors", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "name": [
            "Informatics Research, Celera Corporation, 45 West Gude Drive, Rockville, Maryland 20850, USA, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Levy", 
        "givenName": "Samuel", 
        "id": "sg:person.01370520516.74", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01370520516.74"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "Informatics Research, Celera Corporation, 45 West Gude Drive, Rockville, Maryland 20850, USA, US"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hannenhalli", 
        "givenName": "Sridhar", 
        "id": "sg:person.01341565477.18", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18"
        ], 
        "type": "Person"
      }
    ], 
    "datePublished": "2002-09", 
    "datePublishedReg": "2002-09-01", 
    "description": "The identification of transcription factor binding sites (TFBS) is an important initial step in determining the DNA signals that regulate transcription of the genome. We tested the performance of three distinct computational methods for the identification of TFBS applied to the human genome sequence, as judged by their ability to recover the location of experimentally determined, and uniquely mapped, TFBS taken from the TRANSFAC database. These identification methods all attempt to filter the quantity of TFBS identified by aligning positional weight matrices that describe the binding site and employ either (i) a P-value threshold for accepting a site, (ii) an over-representation measure of neighboring sites, or (iii) conservation with the mouse genome and application of P-value thresholds. The results show that the best recognition of TFBS is achieved by combining the identification of TFBS in regions of human-mouse conservation and also by applying a high stringency P-value to the TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites can be found in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s00335-002-2175-6", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1100928", 
        "issn": [
          "0938-8990", 
          "1432-1777"
        ], 
        "name": "Mammalian Genome", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "9", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "13"
      }
    ], 
    "name": "Identification of transcription factor binding sites in the human genome sequence", 
    "pagination": "510-514", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "326a7544e062a1fa6e03f33d631e1e973f1129ffec0c361966823c3273ee277c"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "12370781"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "9100916"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s00335-002-2175-6"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1010582603"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s00335-002-2175-6", 
      "https://app.dimensions.ai/details/publication/pub.1010582603"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T01:06", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8697_00000511.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007%2Fs00335-002-2175-6"
  }
]
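
A minimal sketch of reading this record with Python's standard `json` module. The embedded string is a trimmed copy of the record above (only a few keys and three of the subject terms are kept), and the `summarize` helper is a hypothetical name introduced for illustration.

```python
import json

# Trimmed copy of the JSON-LD record above; most keys are omitted for brevity.
RECORD_JSON = """
[
  {
    "about": [
      {"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604",
       "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
       "name": "Genetics", "type": "DefinedTerm"},
      {"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
       "name": "Binding Sites", "type": "DefinedTerm"},
      {"inDefinedTermSet": "https://www.nlm.nih.gov/mesh/",
       "name": "Transcription Factors", "type": "DefinedTerm"}
    ],
    "author": [
      {"familyName": "Levy", "givenName": "Samuel", "type": "Person"},
      {"familyName": "Hannenhalli", "givenName": "Sridhar", "type": "Person"}
    ],
    "isPartOf": [
      {"id": "sg:journal.1100928", "name": "Mammalian Genome", "type": "Periodical"},
      {"issueNumber": "9", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "13"}
    ],
    "name": "Identification of transcription factor binding sites in the human genome sequence",
    "pagination": "510-514",
    "datePublished": "2002-09"
  }
]
"""

MESH = "https://www.nlm.nih.gov/mesh/"


def summarize(record):
    """Pull a flat citation-style summary out of one record object."""
    parts = {item["type"]: item for item in record["isPartOf"]}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "pages": record["pagination"],
        # Keep only the terms that belong to the MeSH vocabulary.
        "mesh_terms": [
            t["name"] for t in record["about"] if t["inDefinedTermSet"] == MESH
        ],
    }


# The top level of the document is a one-element array.
SUMMARY = summarize(json.loads(RECORD_JSON)[0])
```

Because the volume and issue are separate untyped entries in `isPartOf`, indexing them by their `type` value avoids relying on array order.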
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6'
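
The four curl calls above differ only in the `Accept` header, so the same content negotiation can be sketched with Python's standard `urllib.request`. The `build_request` and `fetch` helpers and the short format keys are hypothetical names, and the sketch assumes the endpoint is still being served at this URL.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s00335-002-2175-6"

# Media types copied from the curl examples above; the keys are our own labels.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Prepare a GET request asking for one serialization of the record."""
    return urllib.request.Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(fmt, timeout=30):
    """Download the record in the requested format (performs network I/O)."""
    with urllib.request.urlopen(build_request(fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (not executed here because it needs network access):
# turtle_text = fetch("turtle")
```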


 

This table displays all metadata directly associated with this object as RDF triples.

125 TRIPLES      20 PREDICATES      41 URIs      33 LITERALS      21 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/s00335-002-2175-6 schema:about N185aeacd9dcb442d8567418a51db4f67
2 N398983ba851749f7bf69e1ec5be57af0
3 N43fa0588148a4ff0ac58e85f4399b640
4 N6802375e8d124a18afe6943144319e06
5 N6cfa2eb1d1bf4d1ea8cfdc8a5ed17c8f
6 N7a8521a5666e40e4ab378e0a59fb5673
7 N8ca30fe930dc45a589df11ea73ef7ee3
8 N9fd8b2daeda846efb730d7a2044e79ee
9 Nb0c8105adfb0418b9939966ff0582122
10 Nb27a8304e27e4e3a928079d19da1532d
11 Nd486dd5bc1a9452db4687f8d8746f9e8
12 Ned1d4a1ede094c8b8719ddbed1d56ed7
13 anzsrc-for:06
14 anzsrc-for:0604
15 schema:author N05de5e7b36cb4f6987a09e85ac7e116a
16 schema:datePublished 2002-09
17 schema:datePublishedReg 2002-09-01
18 schema:description The identification of transcription factor binding sites (TFBS) is an important initial step in determining the DNA signals that regulate transcription of the genome. We tested the performance of three distinct computational methods for the identification of TFBS applied to the human genome sequence, as judged by their ability to recover the location of experimentally determined, and uniquely mapped, TFBS taken from the TRANSFAC database. These identification methods all attempt to filter the quantity of TFBS identified by aligning positional weight matrices that describe the binding site and employ either (i) a P-value threshold for accepting a site, (ii) an over-representation measure of neighboring sites, or (iii) conservation with the mouse genome and application of P-value thresholds. The results show that the best recognition of TFBS is achieved by combining the identification of TFBS in regions of human-mouse conservation and also by applying a high stringency P-value to the TFBS identified in non-coding regions that are not conserved. Additionally, we find that only half of the 481 experimentally mapped sites can be found in sequence regions conserved with mouse, but the predictive power of the binding site identification method is up to threefold higher in the conserved regions.
19 schema:genre research_article
20 schema:inLanguage en
21 schema:isAccessibleForFree false
22 schema:isPartOf N448abd7ee8704337ae23642f2e48c616
23 N46d18e48d02e4468ac734f23a2cf1903
24 sg:journal.1100928
25 schema:name Identification of transcription factor binding sites in the human genome sequence
26 schema:pagination 510-514
27 schema:productId N473ac2ba87564a9f9f402fd6418b3eb4
28 N7266f63fa59b4f9fa9c1828e89ad5f02
29 N76d433f13bd84002b458dd260cf86af1
30 Nbf43b45e30e94a96b3aca423d7d7d5df
31 Nf475fef806f04ed0897da33ce8d9a13d
32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010582603
33 https://doi.org/10.1007/s00335-002-2175-6
34 schema:sdDatePublished 2019-04-11T01:06
35 schema:sdLicense https://scigraph.springernature.com/explorer/license/
36 schema:sdPublisher Nb3830f4813a14b9ca156c40cd55f8674
37 schema:url http://link.springer.com/10.1007%2Fs00335-002-2175-6
38 sgo:license sg:explorer/license/
39 sgo:sdDataset articles
40 rdf:type schema:ScholarlyArticle
41 N05de5e7b36cb4f6987a09e85ac7e116a rdf:first sg:person.01370520516.74
42 rdf:rest N149c1f468d374e768ef46648378ed5dd
43 N149c1f468d374e768ef46648378ed5dd rdf:first sg:person.01341565477.18
44 rdf:rest rdf:nil
45 N185aeacd9dcb442d8567418a51db4f67 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
46 schema:name Molecular Sequence Data
47 rdf:type schema:DefinedTerm
48 N398983ba851749f7bf69e1ec5be57af0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
49 schema:name Species Specificity
50 rdf:type schema:DefinedTerm
51 N43fa0588148a4ff0ac58e85f4399b640 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
52 schema:name DNA
53 rdf:type schema:DefinedTerm
54 N448abd7ee8704337ae23642f2e48c616 schema:volumeNumber 13
55 rdf:type schema:PublicationVolume
56 N46d18e48d02e4468ac734f23a2cf1903 schema:issueNumber 9
57 rdf:type schema:PublicationIssue
58 N473ac2ba87564a9f9f402fd6418b3eb4 schema:name dimensions_id
59 schema:value pub.1010582603
60 rdf:type schema:PropertyValue
61 N6802375e8d124a18afe6943144319e06 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
62 schema:name Base Sequence
63 rdf:type schema:DefinedTerm
64 N6c4e16fc97cb448eb20c783c9fa475ce schema:name Informatics Research, Celera Corporation, 45 West Gude Drive, Rockville, Maryland 20850, USA, US
65 rdf:type schema:Organization
66 N6cfa2eb1d1bf4d1ea8cfdc8a5ed17c8f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
67 schema:name Conserved Sequence
68 rdf:type schema:DefinedTerm
69 N7266f63fa59b4f9fa9c1828e89ad5f02 schema:name doi
70 schema:value 10.1007/s00335-002-2175-6
71 rdf:type schema:PropertyValue
72 N76d433f13bd84002b458dd260cf86af1 schema:name pubmed_id
73 schema:value 12370781
74 rdf:type schema:PropertyValue
75 N7a8521a5666e40e4ab378e0a59fb5673 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
76 schema:name Mice
77 rdf:type schema:DefinedTerm
78 N8ca30fe930dc45a589df11ea73ef7ee3 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
79 schema:name Sequence Homology, Nucleic Acid
80 rdf:type schema:DefinedTerm
81 N9fd8b2daeda846efb730d7a2044e79ee schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
82 schema:name Genome, Human
83 rdf:type schema:DefinedTerm
84 Nb0c8105adfb0418b9939966ff0582122 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
85 schema:name Animals
86 rdf:type schema:DefinedTerm
87 Nb27a8304e27e4e3a928079d19da1532d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
88 schema:name Transcription Factors
89 rdf:type schema:DefinedTerm
90 Nb3830f4813a14b9ca156c40cd55f8674 schema:name Springer Nature - SN SciGraph project
91 rdf:type schema:Organization
92 Nb7b1a3c0ccb84b0ab7c671824a161ce1 schema:name Informatics Research, Celera Corporation, 45 West Gude Drive, Rockville, Maryland 20850, USA, US
93 rdf:type schema:Organization
94 Nbf43b45e30e94a96b3aca423d7d7d5df schema:name nlm_unique_id
95 schema:value 9100916
96 rdf:type schema:PropertyValue
97 Nd486dd5bc1a9452db4687f8d8746f9e8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
98 schema:name Humans
99 rdf:type schema:DefinedTerm
100 Ned1d4a1ede094c8b8719ddbed1d56ed7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Binding Sites
102 rdf:type schema:DefinedTerm
103 Nf475fef806f04ed0897da33ce8d9a13d schema:name readcube_id
104 schema:value 326a7544e062a1fa6e03f33d631e1e973f1129ffec0c361966823c3273ee277c
105 rdf:type schema:PropertyValue
106 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
107 schema:name Biological Sciences
108 rdf:type schema:DefinedTerm
109 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
110 schema:name Genetics
111 rdf:type schema:DefinedTerm
112 sg:journal.1100928 schema:issn 0938-8990
113 1432-1777
114 schema:name Mammalian Genome
115 rdf:type schema:Periodical
116 sg:person.01341565477.18 schema:affiliation Nb7b1a3c0ccb84b0ab7c671824a161ce1
117 schema:familyName Hannenhalli
118 schema:givenName Sridhar
119 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01341565477.18
120 rdf:type schema:Person
121 sg:person.01370520516.74 schema:affiliation N6c4e16fc97cb448eb20c783c9fa475ce
122 schema:familyName Levy
123 schema:givenName Samuel
124 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01370520516.74
125 rdf:type schema:Person
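
In the table above, author order is carried by an RDF collection: `schema:author` points at a blank node, and each node holds one author in `rdf:first` and the next node in `rdf:rest`, ending at `rdf:nil`. The sketch below walks that chain using plain tuples copied from rows 15, 41-44, 117, and 122; the `objects` and `walk_list` helpers are hypothetical names, and a real application would more likely use an RDF library.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

PUB = "sg:pub.10.1007/s00335-002-2175-6"
NODE_1 = "N05de5e7b36cb4f6987a09e85ac7e116a"
NODE_2 = "N149c1f468d374e768ef46648378ed5dd"

# A small subset of the triples listed in the table above.
TRIPLES = [
    (PUB, "schema:author", NODE_1),
    (NODE_1, RDF_FIRST, "sg:person.01370520516.74"),
    (NODE_1, RDF_REST, NODE_2),
    (NODE_2, RDF_FIRST, "sg:person.01341565477.18"),
    (NODE_2, RDF_REST, RDF_NIL),
    ("sg:person.01370520516.74", "schema:familyName", "Levy"),
    ("sg:person.01341565477.18", "schema:familyName", "Hannenhalli"),
]


def objects(triples, subject, predicate):
    """Return every object for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_list(triples, head):
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    items = []
    node = head
    while node != RDF_NIL:
        items.append(objects(triples, node, RDF_FIRST)[0])
        node = objects(triples, node, RDF_REST)[0]
    return items


AUTHOR_IDS = walk_list(TRIPLES, objects(TRIPLES, PUB, "schema:author")[0])
AUTHOR_NAMES = [objects(TRIPLES, a, "schema:familyName")[0] for a in AUTHOR_IDS]
```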
 



