# The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices

Ontology type: schema:Chapter

### Chapter Info

DATE

2004

AUTHORS

Piotr Berman , Bhaskar DasGupta , Dhruv Mubayi , Robert Sloan , György Turán , Yi Zhang

ABSTRACT

In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical modelon 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy functionΦ defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function Φ gives an H-H residue contact in the contact graph a value of –1 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all H sequence, the number of H residues in the sequence S is limited by fixing an upper bound λ on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minima of the energy function Φ. In this paper, we prove the following results:(1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, the reduction was not correct and we show that the problem of finding the global energy minima for the PSD problem for 2D lattices can be solved efficiently in polynomial time. But, we show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete by a providing a different reduction from the problem of finding the largest clique on graphs.(2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can indeed be found efficiently by taking appropriate combinations of optimal global energy minima of substrings of the sequence S by providing a polynomial-time approximation scheme (PTAS). Our algorithmic technique to design such a PTAS for finding the global energy minima involves using the shifted slice-and-dice approach in [6,17,18]. This result improves the previous best polynomial-time approximation algorithm for finding the global energy minima in [12] with a performance ratio of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$1\over 2$\end{document}. More... »

PAGES

244-253

### Book

TITLE

Combinatorial Pattern Matching

ISBN

978-3-540-22341-2
978-3-540-27801-6

### Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-540-27801-6_18

DOI

http://dx.doi.org/10.1007/978-3-540-27801-6_18

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1047164753

Indexing Status Check whether this publication has been indexed by Scopus and Web Of Science using the SN Indexing Status Tool
Incoming Citations Browse incoming citations for this publication using opencitations.net

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service:

[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Computation Theory and Mathematics",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Department of Computer Science and Engineering, Pennsylvania State University, 16802, University Park, PA, USA",
"id": "http://www.grid.ac/institutes/grid.29857.31",
"name": [
"Department of Computer Science and Engineering, Pennsylvania State University, 16802, University Park, PA, USA"
],
"type": "Organization"
},
"familyName": "Berman",
"givenName": "Piotr",
"id": "sg:person.01274506210.27",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274506210.27"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA",
"id": "http://www.grid.ac/institutes/grid.185648.6",
"name": [
"Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA"
],
"type": "Organization"
},
"familyName": "DasGupta",
"id": "sg:person.0763403270.10",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0763403270.10"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA",
"id": "http://www.grid.ac/institutes/grid.185648.6",
"name": [
"Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA"
],
"type": "Organization"
},
"familyName": "Mubayi",
"givenName": "Dhruv",
"id": "sg:person.013521032647.30",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013521032647.30"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA",
"id": "http://www.grid.ac/institutes/grid.185648.6",
"name": [
"Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA"
],
"type": "Organization"
},
"familyName": "Sloan",
"givenName": "Robert",
"id": "sg:person.01163451614.72",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01163451614.72"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA",
"id": "http://www.grid.ac/institutes/grid.185648.6",
"name": [
"Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA"
],
"type": "Organization"
},
"familyName": "Tur\u00e1n",
"givenName": "Gy\u00f6rgy",
"id": "sg:person.012465675663.27",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012465675663.27"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA",
"id": "http://www.grid.ac/institutes/grid.185648.6",
"name": [
"Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA"
],
"type": "Organization"
},
"familyName": "Zhang",
"givenName": "Yi",
"type": "Person"
}
],
"datePublished": "2004",
"datePublishedReg": "2004-01-01",
"description": "In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical modelon 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy function\u03a6 defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function \u03a6 gives an H-H residue contact in the contact graph a value of \u20131 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all H sequence, the number of H residues in the sequence S is limited by fixing an upper bound \u03bb on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minima of the energy function \u03a6. In this paper, we prove the following results:(1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in\u00a0[12] was based on the NP-completeness of the same problem on 2D lattices. However, the reduction was not correct and we show that the problem of finding the global energy minima for the PSD problem for 2D lattices can be solved efficiently in polynomial time. But, we show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete by a providing a different reduction from the problem of finding the largest clique on graphs.(2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can indeed be found efficiently by taking appropriate combinations of optimal global energy minima of substrings of the sequence S by providing a polynomial-time approximation scheme (PTAS). Our algorithmic technique to design such a PTAS for finding the global energy minima involves using the shifted slice-and-dice approach in [6,17,18]. This result improves the previous best polynomial-time approximation algorithm for finding the global energy minima in [12] with a performance ratio of \\documentclass[12pt]{minimal}\n\t\t\t\t\\usepackage{amsmath}\n\t\t\t\t\\usepackage{wasysym}\n\t\t\t\t\\usepackage{amsfonts}\n\t\t\t\t\\usepackage{amssymb}\n\t\t\t\t\\usepackage{amsbsy}\n\t\t\t\t\\usepackage{mathrsfs}\n\t\t\t\t\\usepackage{upgreek}\n\t\t\t\t\\setlength{\\oddsidemargin}{-69pt}\n\t\t\t\t\\begin{document}$1\\over 2$\\end{document}.",
"editor": [
{
"familyName": "Sahinalp",
"givenName": "Suleyman Cenk",
"type": "Person"
},
{
"familyName": "Muthukrishnan",
"givenName": "S.",
"type": "Person"
},
{
"familyName": "Dogrusoz",
"givenName": "Ugur",
"type": "Person"
}
],
"genre": "chapter",
"id": "sg:pub.10.1007/978-3-540-27801-6_18",
"inLanguage": "en",
"isAccessibleForFree": false,
"isPartOf": {
"isbn": [
"978-3-540-22341-2",
"978-3-540-27801-6"
],
"name": "Combinatorial Pattern Matching",
"type": "Book"
},
"keywords": [
"dense hydrophobic core",
"NP-completeness",
"amino acid residues",
"NPs",
"polynomial-time approximation algorithm",
"early proof",
"contact graph",
"amino acids",
"hydrophobic core",
"lattice",
"acid residues",
"performance ratio",
"design problem",
"structure",
"appropriate combination",
"scheme",
"polynomial time approximation scheme",
"approximation algorithm",
"paper",
"algorithmic techniques",
"acid",
"core",
"proof",
"ratio",
"technique",
"residues",
"solution",
"reduction",
"global energy minimum",
"polynomial time",
"energy minima",
"algorithm",
"sequence design problem",
"sequence",
"problem",
"hydrophobic residues",
"protein structure",
"combination",
"approach",
"code",
"time",
"way",
"minimum",
"different reductions",
"results",
"terms",
"target structure",
"close approximation",
"one",
"same problem",
"number",
"graph",
"approximation scheme",
"model",
"cliques",
"slices",
"approximation",
"canonical model",
"sequence S",
"global minimum",
"substrings",
"representation",
"best polynomial-time approximation algorithm",
"geometric representation",
"function \u03a6.",
"largest clique",
"PSD problem",
"DICE approach",
"target protein structure",
"solvent-exposed hydrophobic residues"
],
"name": "The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices",
"pagination": "244-253",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1047164753"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/978-3-540-27801-6_18"
]
}
],
"publisher": {
"name": "Springer Nature",
"type": "Organisation"
},
"sameAs": [
"https://doi.org/10.1007/978-3-540-27801-6_18",
"https://app.dimensions.ai/details/publication/pub.1047164753"
],
"sdDataset": "chapters",
"sdDatePublished": "2022-05-20T07:41",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220519/entities/gbq_results/chapter/chapter_12.jsonl",
"type": "Chapter",
"url": "https://doi.org/10.1007/978-3-540-27801-6_18"
}
]

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-27801-6_18'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-27801-6_18'

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-27801-6_18'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-540-27801-6_18'

This table displays all metadata directly associated to this object as RDF triples.

179 TRIPLES      23 PREDICATES      96 URIs      89 LITERALS      7 BLANK NODES

Subject Predicate Object
2 anzsrc-for:0802
3 schema:author N5561e79d08ea404896873d5540a248d1
4 schema:datePublished 2004
5 schema:datePublishedReg 2004-01-01
6 schema:description In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical modelon 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic (H) or polar (P), (iii) an energy functionΦ defined in terms of the target structure that should favor sequences with a dense hydrophobic core and penalize those with many solvent-exposed hydrophobic residues (in the Canonical model, the energy function Φ gives an H-H residue contact in the contact graph a value of –1 and all other contacts a value of 0), and (iv) to prevent the solution from being a biologically meaningless all H sequence, the number of H residues in the sequence S is limited by fixing an upper bound λ on the ratio between H and P amino acids. The sequence S is designed by specifying which residues are H and which ones are P in a way that realizes the global minima of the energy function Φ. In this paper, we prove the following results:(1) An earlier proof of NP-completeness of finding the global energy minima for the PSD problem on 3D lattices in [12] was based on the NP-completeness of the same problem on 2D lattices. However, the reduction was not correct and we show that the problem of finding the global energy minima for the PSD problem for 2D lattices can be solved efficiently in polynomial time. But, we show that the problem of finding the global energy minima for the PSD problem on 3D lattices is indeed NP-complete by a providing a different reduction from the problem of finding the largest clique on graphs.(2) Even though the problem of finding the global energy minima on 3D lattices is NP-complete, we show that an arbitrarily close approximation to the global energy minima can indeed be found efficiently by taking appropriate combinations of optimal global energy minima of substrings of the sequence S by providing a polynomial-time approximation scheme (PTAS). Our algorithmic technique to design such a PTAS for finding the global energy minima involves using the shifted slice-and-dice approach in [6,17,18]. This result improves the previous best polynomial-time approximation algorithm for finding the global energy minima in [12] with a performance ratio of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$1\over 2$\end{document}.
7 schema:editor N7b1f470b1dd1455080251a3ba87571ce
8 schema:genre chapter
9 schema:inLanguage en
10 schema:isAccessibleForFree false
11 schema:isPartOf Nf2873c208e314e3b87a7a95ac9b110d2
12 schema:keywords DICE approach
13 NP-completeness
14 NPs
15 PSD problem
16 acid
17 acid residues
18 algorithm
19 algorithmic techniques
20 amino acid residues
21 amino acids
22 approach
23 appropriate combination
24 approximation
25 approximation algorithm
26 approximation scheme
27 best polynomial-time approximation algorithm
28 canonical model
29 cliques
30 close approximation
31 code
32 combination
33 contact graph
34 core
35 dense hydrophobic core
36 design problem
37 different reductions
38 early proof
39 energy minima
40 function Φ.
41 geometric representation
42 global energy minimum
43 global minimum
44 graph
45 hydrophobic core
46 hydrophobic residues
47 largest clique
48 lattice
49 minimum
50 model
51 number
52 one
53 paper
54 performance ratio
55 polynomial time
56 polynomial time approximation scheme
57 polynomial-time approximation algorithm
58 problem
59 proof
60 protein structure
61 ratio
62 reduction
63 representation
64 residues
65 results
66 same problem
67 scheme
68 sequence
69 sequence S
70 sequence design problem
71 slices
72 solution
73 solvent-exposed hydrophobic residues
74 structure
75 substrings
76 target protein structure
77 target structure
78 technique
79 terms
80 time
81 way
82 schema:name The Protein Sequence Design Problem in Canonical Model on 2D and 3D Lattices
83 schema:pagination 244-253
84 schema:productId N3ac175a0d977409b9ed16b91bea41ec4
85 N8b0ef5dc59f54e75b5a992dfd72c08c5
86 schema:publisher N1f179ea3d9864633a4d75e8a4e6f3b77
87 schema:sameAs https://app.dimensions.ai/details/publication/pub.1047164753
88 https://doi.org/10.1007/978-3-540-27801-6_18
89 schema:sdDatePublished 2022-05-20T07:41
91 schema:sdPublisher Ncf53d2ac94714d78a48e00fe4482cf5a
92 schema:url https://doi.org/10.1007/978-3-540-27801-6_18
94 sgo:sdDataset chapters
95 rdf:type schema:Chapter
96 N1f179ea3d9864633a4d75e8a4e6f3b77 schema:name Springer Nature
97 rdf:type schema:Organisation
98 N21b17b9c32e4484f9c37d4bf61fca694 rdf:first sg:person.012465675663.27
99 rdf:rest Nb462a49c416742398b3dc5536fc04c6d
100 N2249d8bee51f4cf78e431e0ecf425f58 schema:familyName Dogrusoz
101 schema:givenName Ugur
102 rdf:type schema:Person
103 N2ec9bf7a105d4ab8a887534f94ebd2d3 schema:familyName Sahinalp
104 schema:givenName Suleyman Cenk
105 rdf:type schema:Person
106 N3ac175a0d977409b9ed16b91bea41ec4 schema:name doi
107 schema:value 10.1007/978-3-540-27801-6_18
108 rdf:type schema:PropertyValue
109 N3fd58a9e32154596a9b9522e8a6af66f schema:affiliation grid-institutes:grid.185648.6
110 schema:familyName Zhang
111 schema:givenName Yi
112 rdf:type schema:Person
113 N4d84abf633ac4978aac01a010741f7ed rdf:first sg:person.01163451614.72
114 rdf:rest N21b17b9c32e4484f9c37d4bf61fca694
115 N5561e79d08ea404896873d5540a248d1 rdf:first sg:person.01274506210.27
116 rdf:rest Na2776f615d954591bba8b70b64968100
117 N6a93a0e86178477d9b7899db0f1da341 rdf:first sg:person.013521032647.30
118 rdf:rest N4d84abf633ac4978aac01a010741f7ed
119 N7b1f470b1dd1455080251a3ba87571ce rdf:first N2ec9bf7a105d4ab8a887534f94ebd2d3
120 rdf:rest Nc1c25f6d717949dc81acd3e27e22bbb9
121 N8b0ef5dc59f54e75b5a992dfd72c08c5 schema:name dimensions_id
122 schema:value pub.1047164753
123 rdf:type schema:PropertyValue
124 Na2776f615d954591bba8b70b64968100 rdf:first sg:person.0763403270.10
125 rdf:rest N6a93a0e86178477d9b7899db0f1da341
126 Na59250ebe17b49b7b3c160fb596ae7bf schema:familyName Muthukrishnan
127 schema:givenName S.
128 rdf:type schema:Person
129 Nb462a49c416742398b3dc5536fc04c6d rdf:first N3fd58a9e32154596a9b9522e8a6af66f
130 rdf:rest rdf:nil
131 Nbd2a566afecd4afcb2f913076edf6817 rdf:first N2249d8bee51f4cf78e431e0ecf425f58
132 rdf:rest rdf:nil
133 Nc1c25f6d717949dc81acd3e27e22bbb9 rdf:first Na59250ebe17b49b7b3c160fb596ae7bf
134 rdf:rest Nbd2a566afecd4afcb2f913076edf6817
135 Ncf53d2ac94714d78a48e00fe4482cf5a schema:name Springer Nature - SN SciGraph project
136 rdf:type schema:Organization
137 Nf2873c208e314e3b87a7a95ac9b110d2 schema:isbn 978-3-540-22341-2
138 978-3-540-27801-6
139 schema:name Combinatorial Pattern Matching
140 rdf:type schema:Book
141 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
142 schema:name Information and Computing Sciences
143 rdf:type schema:DefinedTerm
144 anzsrc-for:0802 schema:inDefinedTermSet anzsrc-for:
145 schema:name Computation Theory and Mathematics
146 rdf:type schema:DefinedTerm
147 sg:person.01163451614.72 schema:affiliation grid-institutes:grid.185648.6
148 schema:familyName Sloan
149 schema:givenName Robert
150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01163451614.72
151 rdf:type schema:Person
152 sg:person.012465675663.27 schema:affiliation grid-institutes:grid.185648.6
153 schema:familyName Turán
154 schema:givenName György
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012465675663.27
156 rdf:type schema:Person
157 sg:person.01274506210.27 schema:affiliation grid-institutes:grid.29857.31
158 schema:familyName Berman
159 schema:givenName Piotr
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274506210.27
161 rdf:type schema:Person
162 sg:person.013521032647.30 schema:affiliation grid-institutes:grid.185648.6
163 schema:familyName Mubayi
164 schema:givenName Dhruv
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013521032647.30
166 rdf:type schema:Person
167 sg:person.0763403270.10 schema:affiliation grid-institutes:grid.185648.6
168 schema:familyName DasGupta
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0763403270.10
171 rdf:type schema:Person
172 grid-institutes:grid.185648.6 schema:alternateName Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA
173 Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA
174 schema:name Department of Computer Science, University of Illinois at Chicago, 60607-7053, Chicago, IL, USA
175 Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, 60607-7045, Chicago, IL, USA
176 rdf:type schema:Organization
177 grid-institutes:grid.29857.31 schema:alternateName Department of Computer Science and Engineering, Pennsylvania State University, 16802, University Park, PA, USA
178 schema:name Department of Computer Science and Engineering, Pennsylvania State University, 16802, University Park, PA, USA
179 rdf:type schema:Organization