Sigma: multiple alignment of weakly-conserved non-coding DNA sequence


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2006-03-16

AUTHORS

Rahul Siddharthan

ABSTRACT

BACKGROUND: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.

RESULTS: Comparative tests of Sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior.

CONCLUSION: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.

PAGES

143-143

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-7-143

DOI

http://dx.doi.org/10.1186/1471-2105-7-143

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1014283709

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/16542424



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Artificial Intelligence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Base Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Chromosome Mapping", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Conserved Sequence", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Molecular Sequence Data", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Pattern Recognition, Automated", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Alignment", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai 600113, India", 
          "id": "http://www.grid.ac/institutes/grid.462414.1", 
          "name": [
            "Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai 600113, India"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Siddharthan", 
        "givenName": "Rahul", 
        "id": "sg:person.0614124227.85", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0614124227.85"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nature01097", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1030736656", 
          "https://doi.org/10.1038/nature01097"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/bf01206331", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1024230952", 
          "https://doi.org/10.1007/bf01206331"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature01644", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010517605", 
          "https://doi.org/10.1038/nature01644"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-5-170", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1000329510", 
          "https://doi.org/10.1186/1471-2105-5-170"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-5-128", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1020029116", 
          "https://doi.org/10.1186/1471-2105-5-128"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-3-540-32280-1_4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023472380", 
          "https://doi.org/10.1007/978-3-540-32280-1_4"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-4-57", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1028432182", 
          "https://doi.org/10.1186/1471-2105-4-57"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2006-03-16", 
    "datePublishedReg": "2006-03-16", 
    "description": "BACKGROUND: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA.\nRESULTS: Comparative tests of sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high \"sensitivity\" (more bases aligned) with effective filtering of \"incorrect\" alignments. With real data, while \"correctness\" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior.\nCONCLUSION: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-7-143", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "7"
      }
    ], 
    "keywords": [
      "non-coding DNA sequences", 
      "DNA sequences", 
      "intergenic DNA", 
      "protein-coding DNA sequences", 
      "non-coding DNA", 
      "pre-aligned sequences", 
      "comparative genomics", 
      "gene regulation", 
      "intergenic sequences", 
      "protein sequences", 
      "regulatory sites", 
      "multiple alignment", 
      "DNA", 
      "sequence", 
      "motif finders", 
      "genomics", 
      "pairwise alignment method", 
      "local alignment", 
      "new tool", 
      "species", 
      "bioinformatics", 
      "Needleman-Wunsch", 
      "regulation", 
      "real data", 
      "best possible alignment", 
      "alignment", 
      "fragments", 
      "recent algorithms", 
      "background model", 
      "earlier algorithms", 
      "new algorithm", 
      "synthetic data", 
      "evolution", 
      "sites", 
      "effective filtering", 
      "auxiliary files", 
      "algorithm", 
      "alignment method", 
      "step", 
      "possible alignments", 
      "excellent performance", 
      "tool", 
      "sigma", 
      "importance", 
      "comparative tests", 
      "data", 
      "toolbox", 
      "correctness", 
      "significance", 
      "finder", 
      "length", 
      "files", 
      "number", 
      "filtering", 
      "scheme", 
      "strategies", 
      "study", 
      "sensitivity", 
      "performance", 
      "advantages", 
      "peculiarities", 
      "focus", 
      "extension", 
      "method", 
      "model", 
      "gap", 
      "account", 
      "test", 
      "problem", 
      "multiple-sequence alignment focus", 
      "alignment focus", 
      "like pairwise alignment methods", 
      "possible gapless local alignments", 
      "gapless local alignments", 
      "PhyloGibbs motif finder", 
      "Sigma's alignments", 
      "toolbox of bioinformatics"
    ], 
    "name": "Sigma: multiple alignment of weakly-conserved non-coding DNA sequence", 
    "pagination": "143-143", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1014283709"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-7-143"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "16542424"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-7-143", 
      "https://app.dimensions.ai/details/publication/pub.1014283709"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2021-11-01T18:09", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20211101/entities/gbq_results/article/article_418.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-7-143"
  }
]
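
Because JSON-LD is plain JSON, a record like the one above can be read with any standard JSON parser. The following is a minimal sketch in Python, not part of the SciGraph documentation: it embeds a trimmed excerpt of the record (only the fields it uses) and pulls out the identifiers and the journal name. The helper names `product_ids` and `journal_name` are hypothetical and chosen for illustration only.

```python
import json

# Trimmed excerpt of the record above; only the fields used below are kept.
RECORD = json.loads("""
[
  {
    "id": "sg:pub.10.1186/1471-2105-7-143",
    "name": "Sigma: multiple alignment of weakly-conserved non-coding DNA sequence",
    "isPartOf": [
      {"id": "sg:journal.1023786", "issn": ["1471-2105"],
       "name": "BMC Bioinformatics", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "7"}
    ],
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1014283709"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-7-143"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["16542424"]}
    ]
  }
]
""")


def product_ids(article):
    """Map each productId name (e.g. "doi") to its first listed value."""
    return {p["name"]: p["value"][0] for p in article.get("productId", [])}


def journal_name(article):
    """Return the name of the enclosing Periodical, or None if there is none."""
    for part in article.get("isPartOf", []):
        if part.get("type") == "Periodical":
            return part.get("name")
    return None


article = RECORD[0]  # the top-level JSON-LD value is a one-element list
ids = product_ids(article)
print(ids["doi"], ids["pubmed_id"], journal_name(article))
```

Looking up the journal by its `type` rather than by list position is a deliberate choice here, since the order of the `isPartOf` entries is not something the record guarantees.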
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-143'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-143'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-143'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-7-143'
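
The four curl commands above differ only in their `Accept` header, so the same content negotiation can be sketched with Python's standard library. This is an illustrative sketch rather than an official client: the names `BASE_URL`, `ACCEPT`, `build_request` and `fetch` are hypothetical, the UTF-8 decoding of the response is an assumption, and `fetch` only works if the endpoint is reachable from your machine.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Accept headers copied from the curl commands above.
ACCEPT = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build a GET request for one record in the requested RDF serialization."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": ACCEPT[fmt]},
    )


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the response body as text (assumes UTF-8)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; call fetch(...) to download.
req = build_request("pub.10.1186/1471-2105-7-143", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Keeping request construction separate from the network call makes it possible to check the URL and header without contacting the server.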


 

This table displays all metadata directly associated with this object as RDF triples.

206 TRIPLES      22 PREDICATES      120 URIs      105 LITERALS      17 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-7-143 schema:about N0a4ea6a2733043eba7fe38a622b1790a
2 N0d3339bb2df241c298cd0ad7036a2108
3 N0eb8cb05f261462cb620d786b79be053
4 N12b5af52beb24b7fa12a9f4d46f4511c
5 N3f9bb0d661b040e1887b5b633d7291f0
6 N46b1fe15948b41e8a53214fe1b9a2ccd
7 N59140f3da14742a8b1f22e75db7a0713
8 N8bfb93da3a5445058f5cc3c94bc7a24d
9 Na5595f5943b94d6a82f1ca2222bfc793
10 Nefe454c01e3c4ac08efa8457df9155f6
11 anzsrc-for:06
12 anzsrc-for:0604
13 schema:author N7634d8004b97432498ad71a29f309ce1
14 schema:citation sg:pub.10.1007/978-3-540-32280-1_4
15 sg:pub.10.1007/bf01206331
16 sg:pub.10.1038/nature01097
17 sg:pub.10.1038/nature01644
18 sg:pub.10.1186/1471-2105-4-57
19 sg:pub.10.1186/1471-2105-5-128
20 sg:pub.10.1186/1471-2105-5-170
21 schema:datePublished 2006-03-16
22 schema:datePublishedReg 2006-03-16
23 schema:description BACKGROUND: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem acquires importance with the increasing number of published sequences of closely-related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically-related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy earlier used by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. RESULTS: Comparative tests of sigma with five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" can't be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. CONCLUSION: By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
24 schema:genre article
25 schema:inLanguage en
26 schema:isAccessibleForFree true
27 schema:isPartOf N01f9d3317c7d4149a4e3b0c76a1a0218
28 N4872b4eacbf84d6bb73cc84dbebc5bc6
29 sg:journal.1023786
30 schema:keywords DNA
31 DNA sequences
32 Needleman-Wunsch
33 PhyloGibbs motif finder
34 Sigma's alignments
35 account
36 advantages
37 algorithm
38 alignment
39 alignment focus
40 alignment method
41 auxiliary files
42 background model
43 best possible alignment
44 bioinformatics
45 comparative genomics
46 comparative tests
47 correctness
48 data
49 earlier algorithms
50 effective filtering
51 evolution
52 excellent performance
53 extension
54 files
55 filtering
56 finder
57 focus
58 fragments
59 gap
60 gapless local alignments
61 gene regulation
62 genomics
63 importance
64 intergenic DNA
65 intergenic sequences
66 length
67 like pairwise alignment methods
68 local alignment
69 method
70 model
71 motif finders
72 multiple alignment
73 multiple-sequence alignment focus
74 new algorithm
75 new tool
76 non-coding DNA
77 non-coding DNA sequences
78 number
79 pairwise alignment method
80 peculiarities
81 performance
82 possible alignments
83 possible gapless local alignments
84 pre-aligned sequences
85 problem
86 protein sequences
87 protein-coding DNA sequences
88 real data
89 recent algorithms
90 regulation
91 regulatory sites
92 scheme
93 sensitivity
94 sequence
95 sigma
96 significance
97 sites
98 species
99 step
100 strategies
101 study
102 synthetic data
103 test
104 tool
105 toolbox
106 toolbox of bioinformatics
107 schema:name Sigma: multiple alignment of weakly-conserved non-coding DNA sequence
108 schema:pagination 143-143
109 schema:productId N772710a0f6604bfa8f59083283b2fbbc
110 Nb89dbd2080934443a89e3787a64297f2
111 Nf3c4e818dbd94a0381db89253c65b7d9
112 schema:sameAs https://app.dimensions.ai/details/publication/pub.1014283709
113 https://doi.org/10.1186/1471-2105-7-143
114 schema:sdDatePublished 2021-11-01T18:09
115 schema:sdLicense https://scigraph.springernature.com/explorer/license/
116 schema:sdPublisher N7a34b352ee1d430885dd64f004a5c660
117 schema:url https://doi.org/10.1186/1471-2105-7-143
118 sgo:license sg:explorer/license/
119 sgo:sdDataset articles
120 rdf:type schema:ScholarlyArticle
121 N01f9d3317c7d4149a4e3b0c76a1a0218 schema:volumeNumber 7
122 rdf:type schema:PublicationVolume
123 N0a4ea6a2733043eba7fe38a622b1790a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
124 schema:name Pattern Recognition, Automated
125 rdf:type schema:DefinedTerm
126 N0d3339bb2df241c298cd0ad7036a2108 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
127 schema:name Sequence Analysis, DNA
128 rdf:type schema:DefinedTerm
129 N0eb8cb05f261462cb620d786b79be053 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
130 schema:name Chromosome Mapping
131 rdf:type schema:DefinedTerm
132 N12b5af52beb24b7fa12a9f4d46f4511c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
133 schema:name Sequence Alignment
134 rdf:type schema:DefinedTerm
135 N3f9bb0d661b040e1887b5b633d7291f0 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
136 schema:name Molecular Sequence Data
137 rdf:type schema:DefinedTerm
138 N46b1fe15948b41e8a53214fe1b9a2ccd schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
139 schema:name Conserved Sequence
140 rdf:type schema:DefinedTerm
141 N4872b4eacbf84d6bb73cc84dbebc5bc6 schema:issueNumber 1
142 rdf:type schema:PublicationIssue
143 N59140f3da14742a8b1f22e75db7a0713 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
144 schema:name Artificial Intelligence
145 rdf:type schema:DefinedTerm
146 N7634d8004b97432498ad71a29f309ce1 rdf:first sg:person.0614124227.85
147 rdf:rest rdf:nil
148 N772710a0f6604bfa8f59083283b2fbbc schema:name dimensions_id
149 schema:value pub.1014283709
150 rdf:type schema:PropertyValue
151 N7a34b352ee1d430885dd64f004a5c660 schema:name Springer Nature - SN SciGraph project
152 rdf:type schema:Organization
153 N8bfb93da3a5445058f5cc3c94bc7a24d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
154 schema:name Base Sequence
155 rdf:type schema:DefinedTerm
156 Na5595f5943b94d6a82f1ca2222bfc793 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
157 schema:name Algorithms
158 rdf:type schema:DefinedTerm
159 Nb89dbd2080934443a89e3787a64297f2 schema:name pubmed_id
160 schema:value 16542424
161 rdf:type schema:PropertyValue
162 Nefe454c01e3c4ac08efa8457df9155f6 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
163 schema:name Software
164 rdf:type schema:DefinedTerm
165 Nf3c4e818dbd94a0381db89253c65b7d9 schema:name doi
166 schema:value 10.1186/1471-2105-7-143
167 rdf:type schema:PropertyValue
168 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
169 schema:name Biological Sciences
170 rdf:type schema:DefinedTerm
171 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
172 schema:name Genetics
173 rdf:type schema:DefinedTerm
174 sg:journal.1023786 schema:issn 1471-2105
175 schema:name BMC Bioinformatics
176 schema:publisher Springer Nature
177 rdf:type schema:Periodical
178 sg:person.0614124227.85 schema:affiliation grid-institutes:grid.462414.1
179 schema:familyName Siddharthan
180 schema:givenName Rahul
181 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0614124227.85
182 rdf:type schema:Person
183 sg:pub.10.1007/978-3-540-32280-1_4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023472380
184 https://doi.org/10.1007/978-3-540-32280-1_4
185 rdf:type schema:CreativeWork
186 sg:pub.10.1007/bf01206331 schema:sameAs https://app.dimensions.ai/details/publication/pub.1024230952
187 https://doi.org/10.1007/bf01206331
188 rdf:type schema:CreativeWork
189 sg:pub.10.1038/nature01097 schema:sameAs https://app.dimensions.ai/details/publication/pub.1030736656
190 https://doi.org/10.1038/nature01097
191 rdf:type schema:CreativeWork
192 sg:pub.10.1038/nature01644 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010517605
193 https://doi.org/10.1038/nature01644
194 rdf:type schema:CreativeWork
195 sg:pub.10.1186/1471-2105-4-57 schema:sameAs https://app.dimensions.ai/details/publication/pub.1028432182
196 https://doi.org/10.1186/1471-2105-4-57
197 rdf:type schema:CreativeWork
198 sg:pub.10.1186/1471-2105-5-128 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020029116
199 https://doi.org/10.1186/1471-2105-5-128
200 rdf:type schema:CreativeWork
201 sg:pub.10.1186/1471-2105-5-170 schema:sameAs https://app.dimensions.ai/details/publication/pub.1000329510
202 https://doi.org/10.1186/1471-2105-5-170
203 rdf:type schema:CreativeWork
204 grid-institutes:grid.462414.1 schema:alternateName Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai 600113, India
205 schema:name Institute of Mathematical Sciences, CIT Campus, Taramani, Chennai 600113, India
206 rdf:type schema:Organization
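
In the table above, a row that omits the subject (or the subject and the predicate) reuses the value from the row before it, much as Turtle abbreviates repeated terms. The sketch below, which is an illustration and not SciGraph code, expands a few hand-transcribed rows back into full triples. Rows 27-29 and 121-122 are used, with the subject of row 27 filled in from row 1; `None` stands for a blank cell, and the function name `expand` is hypothetical.

```python
# Rows 27-29 and 121-122 of the table above, transcribed by hand.
# None marks a cell the table leaves blank (same value as the previous row).
ROWS = [
    ("sg:pub.10.1186/1471-2105-7-143", "schema:isPartOf", "N01f9d3317c7d4149a4e3b0c76a1a0218"),
    (None, None, "N4872b4eacbf84d6bb73cc84dbebc5bc6"),
    (None, None, "sg:journal.1023786"),
    ("N01f9d3317c7d4149a4e3b0c76a1a0218", "schema:volumeNumber", "7"),
    (None, "rdf:type", "schema:PublicationVolume"),
]


def expand(rows):
    """Yield full (subject, predicate, object) triples, carrying blanks forward."""
    subject = predicate = None
    for s, p, o in rows:
        if s is not None:
            subject = s
        if p is not None:
            predicate = p
        yield (subject, predicate, o)


triples = list(expand(ROWS))
for triple in triples:
    print(" ".join(triple))
```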
 



