InChIKey collision resistance: an experimental testing


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-12

AUTHORS

Igor Pletnev, Andrey Erin, Alan McNaught, Kirill Blinov, Dmitrii Tchekhovskoi, Steve Heller

ABSTRACT

InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications.

We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body.

From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
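The kind of theoretical estimate the abstract refers to can be illustrated with a standard birthday-style calculation. The sketch below is not the authors' model; it assumes an idealized, uniformly distributed hash. The 37-bit size used for the second block is an assumption taken from general InChIKey documentation and is not stated in this record; the function and constant names are illustrative only.

```python
import math


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n_items values drawn
    uniformly at random from a space of 2**hash_bits hash values."""
    pairs = n_items * (n_items - 1) // 2  # number of distinct pairs (exact integer)
    return pairs / 2 ** hash_bits


def prob_at_least_one_collision(n_items: int, hash_bits: int) -> float:
    """Poisson approximation to the probability of at least one collision."""
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


# ASSUMPTION: the second-block hash carries 37 bits (not stated in this record).
SECOND_BLOCK_BITS = 37
# From the abstract: the full set of Spongistatin I stereoisomers (2**26).
N_ISOMERS = 67_108_864

if __name__ == "__main__":
    pairs = expected_colliding_pairs(N_ISOMERS, SECOND_BLOCK_BITS)
    print(f"expected colliding pairs: {pairs:.1f}")
```

Under these assumptions the estimate comes out close to 2^14 colliding pairs for the full isomer set, which is the sort of figure an exhaustive test of the second block could be compared against.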

PAGES

39

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1758-2946-4-39

DOI

http://dx.doi.org/10.1186/1758-2946-4-39

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1037214563

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/23256896



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Moscow State University", 
          "id": "https://www.grid.ac/institutes/grid.14476.30", 
          "name": [
            "Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russia", 
            "InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pletnev", 
        "givenName": "Igor", 
        "id": "sg:person.015776151425.89", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015776151425.89"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Diabetes Canada", 
          "id": "https://www.grid.ac/institutes/grid.453237.4", 
          "name": [
            "Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Erin", 
        "givenName": "Andrey", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "McNaught", 
        "givenName": "Alan", 
        "id": "sg:person.0637122264.59", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0637122264.59"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Diabetes Canada", 
          "id": "https://www.grid.ac/institutes/grid.453237.4", 
          "name": [
            "Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Blinov", 
        "givenName": "Kirill", 
        "id": "sg:person.01300025613.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300025613.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Standards and Technology", 
          "id": "https://www.grid.ac/institutes/grid.94225.38", 
          "name": [
            "Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tchekhovskoi", 
        "givenName": "Dmitrii", 
        "id": "sg:person.0753350664.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0753350664.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Standards and Technology", 
          "id": "https://www.grid.ac/institutes/grid.94225.38", 
          "name": [
            "Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Heller", 
        "givenName": "Steve", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s1381-141x(99)00002-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005376902"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1021/cr050559n", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053824598"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1021/ci0341060", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1055401645"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-12", 
    "datePublishedReg": "2012-12-01", 
    "description": "InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3\u2009\u00d7\u200910^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1758-2946-4-39", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1042252", 
        "issn": [
          "1758-2946"
        ], 
        "name": "Journal of Cheminformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "4"
      }
    ], 
    "name": "InChIKey collision resistance: an experimental testing", 
    "pagination": "39", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "b146bb6fc3cf3c5589580049ec65830f088cdcfc4ba0bfa0e099008f80e6d333"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "23256896"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "101516718"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1758-2946-4-39"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1037214563"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1758-2946-4-39", 
      "https://app.dimensions.ai/details/publication/pub.1037214563"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T15:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8663_00000514.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2F1758-2946-4-39"
  }
]
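As a minimal sketch of how a record shaped like the one above might be consumed, the following Python reads the JSON-LD text as ordinary JSON and lists the author names. The function name `author_names` and the trimmed `SAMPLE` record are illustrative assumptions; only the key names ("author", "givenName", "familyName") come from the record itself.

```python
import json


def author_names(jsonld_text: str) -> list[str]:
    """Return 'Given Family' strings for every author in a SciGraph-style
    JSON-LD document (a JSON array of record objects)."""
    names = []
    for record in json.loads(jsonld_text):
        for person in record.get("author", []):
            given = person.get("givenName", "")
            family = person.get("familyName", "")
            names.append(f"{given} {family}".strip())
    return names


# Trimmed, illustrative sample that mirrors the structure of the record above.
SAMPLE = """
[
  {
    "author": [
      {"familyName": "Pletnev", "givenName": "Igor", "type": "Person"},
      {"familyName": "Heller", "givenName": "Steve", "type": "Person"}
    ],
    "name": "InChIKey collision resistance: an experimental testing"
  }
]
"""

if __name__ == "__main__":
    for name in author_names(SAMPLE):
        print(name)
```

Because JSON-LD is plain JSON, the standard `json` module is enough for this kind of field extraction; a dedicated JSON-LD processor would only be needed to expand terms against the `@context`.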
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'
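The four curl commands above differ only in the Accept header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch, not an official client: the names `BASE_URL`, `ACCEPT`, `build_request`, and `fetch` are assumptions introduced here, and whether the endpoint still responds is not guaranteed by this record.

```python
from urllib.request import Request, urlopen

# Endpoint and media types are taken from the curl commands above.
BASE_URL = "https://scigraph.springernature.com/"
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str = "json-ld") -> Request:
    """Build (but do not send) a GET request asking for the given RDF format."""
    return Request(BASE_URL + record_id, headers={"Accept": ACCEPT[fmt]})


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urlopen(build_request(record_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Network access is required for this call to succeed.
    print(fetch("pub.10.1186/1758-2946-4-39", "turtle")[:500])
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting the server.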


 

This table displays all metadata directly associated to this object as RDF triples.

119 TRIPLES      21 PREDICATES      32 URIs      21 LITERALS      9 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1758-2946-4-39 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 schema:author N388ec809f4334c12b4b68d3a05315eb9
4 schema:citation https://doi.org/10.1016/s1381-141x(99)00002-7
5 https://doi.org/10.1021/ci0341060
6 https://doi.org/10.1021/cr050559n
7 schema:datePublished 2012-12
8 schema:datePublishedReg 2012-12-01
9 schema:description InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
10 schema:genre research_article
11 schema:inLanguage en
12 schema:isAccessibleForFree true
13 schema:isPartOf N4e231125c35f4f33944c12b557591dfb
14 Ncd4de446e8f24a9aac372b95b73645c0
15 sg:journal.1042252
16 schema:name InChIKey collision resistance: an experimental testing
17 schema:pagination 39
18 schema:productId N280254063fd74416b88752b203c2a239
19 N33e42f3350b04386a72e64a430280a82
20 N8aeb042927c74cdba56ee2331e2ffa14
21 Ncc467773777449da8c68dd4d91f48ec0
22 Nd28a81ef9cbf4065a8e2bbbc89311489
23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037214563
24 https://doi.org/10.1186/1758-2946-4-39
25 schema:sdDatePublished 2019-04-10T15:01
26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
27 schema:sdPublisher N647b96c000794078afd5c2c21ede0ccb
28 schema:url http://link.springer.com/10.1186%2F1758-2946-4-39
29 sgo:license sg:explorer/license/
30 sgo:sdDataset articles
31 rdf:type schema:ScholarlyArticle
32 N0d6e4da8cebd4eae88d9a4271620dc46 rdf:first N4102e3884b5b4c7dacf188e3f6cb3f86
33 rdf:rest rdf:nil
34 N280254063fd74416b88752b203c2a239 schema:name readcube_id
35 schema:value b146bb6fc3cf3c5589580049ec65830f088cdcfc4ba0bfa0e099008f80e6d333
36 rdf:type schema:PropertyValue
37 N33e42f3350b04386a72e64a430280a82 schema:name dimensions_id
38 schema:value pub.1037214563
39 rdf:type schema:PropertyValue
40 N388ec809f4334c12b4b68d3a05315eb9 rdf:first sg:person.015776151425.89
41 rdf:rest Nc1378e16e4ed47b38435f00c097fe835
42 N4102e3884b5b4c7dacf188e3f6cb3f86 schema:affiliation https://www.grid.ac/institutes/grid.94225.38
43 schema:familyName Heller
44 schema:givenName Steve
45 rdf:type schema:Person
46 N43ed2fc04ccc436798d6f88f1181e038 rdf:first sg:person.0753350664.20
47 rdf:rest N0d6e4da8cebd4eae88d9a4271620dc46
48 N4868a12ba3524e7d971b6eabd3084098 schema:name InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany
49 rdf:type schema:Organization
50 N4ab0f83e3bcb4a0a8fe05ca19eaaa608 rdf:first sg:person.0637122264.59
51 rdf:rest Ncd22e1879bed45dc8546d1bd18b414cd
52 N4e231125c35f4f33944c12b557591dfb schema:volumeNumber 4
53 rdf:type schema:PublicationVolume
54 N5509549c78bb4ab499fb3f17c669399b schema:affiliation https://www.grid.ac/institutes/grid.453237.4
55 schema:familyName Erin
56 schema:givenName Andrey
57 rdf:type schema:Person
58 N647b96c000794078afd5c2c21ede0ccb schema:name Springer Nature - SN SciGraph project
59 rdf:type schema:Organization
60 N8aeb042927c74cdba56ee2331e2ffa14 schema:name pubmed_id
61 schema:value 23256896
62 rdf:type schema:PropertyValue
63 Nc1378e16e4ed47b38435f00c097fe835 rdf:first N5509549c78bb4ab499fb3f17c669399b
64 rdf:rest N4ab0f83e3bcb4a0a8fe05ca19eaaa608
65 Ncc467773777449da8c68dd4d91f48ec0 schema:name doi
66 schema:value 10.1186/1758-2946-4-39
67 rdf:type schema:PropertyValue
68 Ncd22e1879bed45dc8546d1bd18b414cd rdf:first sg:person.01300025613.44
69 rdf:rest N43ed2fc04ccc436798d6f88f1181e038
70 Ncd4de446e8f24a9aac372b95b73645c0 schema:issueNumber 1
71 rdf:type schema:PublicationIssue
72 Nd28a81ef9cbf4065a8e2bbbc89311489 schema:name nlm_unique_id
73 schema:value 101516718
74 rdf:type schema:PropertyValue
75 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
76 schema:name Mathematical Sciences
77 rdf:type schema:DefinedTerm
78 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
79 schema:name Statistics
80 rdf:type schema:DefinedTerm
81 sg:journal.1042252 schema:issn 1758-2946
82 schema:name Journal of Cheminformatics
83 rdf:type schema:Periodical
84 sg:person.01300025613.44 schema:affiliation https://www.grid.ac/institutes/grid.453237.4
85 schema:familyName Blinov
86 schema:givenName Kirill
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300025613.44
88 rdf:type schema:Person
89 sg:person.015776151425.89 schema:affiliation https://www.grid.ac/institutes/grid.14476.30
90 schema:familyName Pletnev
91 schema:givenName Igor
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015776151425.89
93 rdf:type schema:Person
94 sg:person.0637122264.59 schema:affiliation N4868a12ba3524e7d971b6eabd3084098
95 schema:familyName McNaught
96 schema:givenName Alan
97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0637122264.59
98 rdf:type schema:Person
99 sg:person.0753350664.20 schema:affiliation https://www.grid.ac/institutes/grid.94225.38
100 schema:familyName Tchekhovskoi
101 schema:givenName Dmitrii
102 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0753350664.20
103 rdf:type schema:Person
104 https://doi.org/10.1016/s1381-141x(99)00002-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005376902
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1021/ci0341060 schema:sameAs https://app.dimensions.ai/details/publication/pub.1055401645
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1021/cr050559n schema:sameAs https://app.dimensions.ai/details/publication/pub.1053824598
109 rdf:type schema:CreativeWork
110 https://www.grid.ac/institutes/grid.14476.30 schema:alternateName Moscow State University
111 schema:name Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russia
112 InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany
113 rdf:type schema:Organization
114 https://www.grid.ac/institutes/grid.453237.4 schema:alternateName Diabetes Canada
115 schema:name Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada
116 rdf:type schema:Organization
117 https://www.grid.ac/institutes/grid.94225.38 schema:alternateName National Institute of Standards and Technology
118 schema:name Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA
119 rdf:type schema:Organization
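Summary counts like the triple and predicate totals shown for this table could, in principle, be recomputed from the N-Triples serialization. The sketch below is a simplified illustration that assumes well-formed input with one triple per line; it is not a full N-Triples parser, and the `example.org` subject and the function name `summarize_ntriples` are hypothetical.

```python
def summarize_ntriples(text: str) -> tuple[int, int]:
    """Count triples and distinct predicates in well-formed N-Triples text.

    Subjects and predicates never contain whitespace in N-Triples, so the
    first two whitespace-separated tokens of each line identify them.
    """
    triple_count = 0
    predicates = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        _subject, predicate, _rest = line.split(None, 2)
        triple_count += 1
        predicates.add(predicate)
    return triple_count, len(predicates)


# Illustrative sample; the subject IRI is hypothetical.
SAMPLE_NT = """
<http://example.org/pub> <http://schema.org/name> "InChIKey collision resistance: an experimental testing" .
<http://example.org/pub> <http://schema.org/pagination> "39" .
_:b0 <http://schema.org/name> "doi" .
"""

if __name__ == "__main__":
    triples, distinct_predicates = summarize_ntriples(SAMPLE_NT)
    print(triples, distinct_predicates)
```

For anything beyond quick counts, an RDF library with a conforming parser would be the safer choice.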
 



