InChIKey collision resistance: an experimental testing


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2012-12

AUTHORS

Igor Pletnev, Andrey Erin, Alan McNaught, Kirill Blinov, Dmitrii Tchekhovskoi, Steve Heller

ABSTRACT

InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
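The kind of theoretical estimate the abstract refers to can be illustrated with a standard birthday-problem calculation. The sketch below is not taken from the paper: it assumes an ideal, uniformly distributed hash, and it assumes the block widths given in the InChI technical documentation (65 bits for the first block, 37 bits for the second), neither of which is stated in this record. The function and constant names are hypothetical.

```python
import math

# Assumed hash widths, taken from the InChI technical documentation
# rather than from this record.
FIRST_BLOCK_BITS = 65   # first block: molecular skeleton
SECOND_BLOCK_BITS = 37  # second block: stereo, tautomeric, etc.


def expected_colliding_pairs(n_items: int, hash_bits: int) -> float:
    """Expected number of colliding pairs among n_items values drawn
    uniformly at random from a space of 2**hash_bits hash values."""
    return n_items * (n_items - 1) / 2 / 2 ** hash_bits


def prob_any_collision(n_items: int, hash_bits: int) -> float:
    """Poisson approximation to the probability of at least one collision."""
    # expm1 keeps precision when the expected pair count is tiny.
    return -math.expm1(-expected_colliding_pairs(n_items, hash_bits))


if __name__ == "__main__":
    spongistatin_isomers = 67_108_864  # 2**26, the full set named in the abstract
    print(expected_colliding_pairs(spongistatin_isomers, SECOND_BLOCK_BITS))
    print(expected_colliding_pairs(30_000_000_000, FIRST_BLOCK_BITS))
    print(prob_any_collision(1_000_000, FIRST_BLOCK_BITS))
```

Under these assumptions the full isomer set gives roughly 2^14 = 16,384 expected colliding pairs in the second block. That figure is an idealized estimate, not a result reported in this record.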

PAGES

39

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1758-2946-4-39

DOI

http://dx.doi.org/10.1186/1758-2946-4-39

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1037214563

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/23256896



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Statistics", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Mathematical Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Moscow State University", 
          "id": "https://www.grid.ac/institutes/grid.14476.30", 
          "name": [
            "Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russia", 
            "InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pletnev", 
        "givenName": "Igor", 
        "id": "sg:person.015776151425.89", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015776151425.89"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Diabetes Canada", 
          "id": "https://www.grid.ac/institutes/grid.453237.4", 
          "name": [
            "Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Erin", 
        "givenName": "Andrey", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "name": [
            "InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany"
          ], 
          "type": "Organization"
        }, 
        "familyName": "McNaught", 
        "givenName": "Alan", 
        "id": "sg:person.0637122264.59", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0637122264.59"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Diabetes Canada", 
          "id": "https://www.grid.ac/institutes/grid.453237.4", 
          "name": [
            "Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Blinov", 
        "givenName": "Kirill", 
        "id": "sg:person.01300025613.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300025613.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Standards and Technology", 
          "id": "https://www.grid.ac/institutes/grid.94225.38", 
          "name": [
            "Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Tchekhovskoi", 
        "givenName": "Dmitrii", 
        "id": "sg:person.0753350664.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0753350664.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "National Institute of Standards and Technology", 
          "id": "https://www.grid.ac/institutes/grid.94225.38", 
          "name": [
            "Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Heller", 
        "givenName": "Steve", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s1381-141x(99)00002-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1005376902"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1021/cr050559n", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1053824598"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1021/ci0341060", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1055401645"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2012-12", 
    "datePublishedReg": "2012-12-01", 
    "description": "InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3\u2009\u00d7\u200910^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/1758-2946-4-39", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1042252", 
        "issn": [
          "1758-2946"
        ], 
        "name": "Journal of Cheminformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "4"
      }
    ], 
    "name": "InChIKey collision resistance: an experimental testing", 
    "pagination": "39", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "b146bb6fc3cf3c5589580049ec65830f088cdcfc4ba0bfa0e099008f80e6d333"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "23256896"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "101516718"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1758-2946-4-39"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1037214563"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1758-2946-4-39", 
      "https://app.dimensions.ai/details/publication/pub.1037214563"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-10T15:01", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8663_00000514.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1186%2F1758-2946-4-39"
  }
]
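A record of this shape can be read with Python's standard `json` module. The sketch below is illustrative only: `SAMPLE` is a hand-trimmed excerpt that keeps just a few of the record's keys, and `summarize` is a hypothetical helper, not part of any SciGraph tooling. It collects author names and citation ids, dropping any id that appears more than once.

```python
import json

# Hand-trimmed excerpt of the record; only a few keys are kept for illustration.
SAMPLE = """
[
  {
    "id": "sg:pub.10.1186/1758-2946-4-39",
    "name": "InChIKey collision resistance: an experimental testing",
    "author": [
      {"familyName": "Pletnev", "givenName": "Igor", "type": "Person"},
      {"familyName": "Erin", "givenName": "Andrey", "type": "Person"}
    ],
    "citation": [
      {"id": "https://doi.org/10.1021/cr050559n", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1021/cr050559n", "type": "CreativeWork"},
      {"id": "https://doi.org/10.1021/ci0341060", "type": "CreativeWork"}
    ]
  }
]
"""


def summarize(record: dict) -> dict:
    """Return the title, author names, and unique citation ids of one record."""
    authors = [
        f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
    ]
    # dict.fromkeys keeps first-seen order while dropping repeated ids.
    citations = list(dict.fromkeys(c["id"] for c in record.get("citation", [])))
    return {"title": record["name"], "authors": authors, "citations": citations}


if __name__ == "__main__":
    records = json.loads(SAMPLE)  # the top level is a list of records
    print(summarize(records[0]))
```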
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1758-2946-4-39'
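The same content negotiation can be sketched in Python with the standard library. This is a minimal sketch, assuming the endpoint still answers as the curl commands above describe. The short format labels in `ACCEPT_TYPES` and the helper names `build_request` and `fetch` are hypothetical. Only the URL and the `Accept` values come from this page.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Hypothetical short labels mapped to the Accept values used in the curl commands.
ACCEPT_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id: str, fmt: str = "json-ld") -> urllib.request.Request:
    """Build (but do not send) a GET request asking for the given RDF format."""
    return urllib.request.Request(
        BASE_URL + record_id, headers={"Accept": ACCEPT_TYPES[fmt]}
    )


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Network access is required here; the service may not be reachable.
    print(fetch("pub.10.1186/1758-2946-4-39", "turtle")[:500])
```

Keeping request construction separate from sending makes it possible to check the header and URL without touching the network.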


 

This table displays all metadata directly associated to this object as RDF triples.

119 TRIPLES      21 PREDICATES      32 URIs      21 LITERALS      9 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1758-2946-4-39 schema:about anzsrc-for:01
2 anzsrc-for:0104
3 schema:author N9b34c948d42c4467bca77bf3ca4abb73
4 schema:citation https://doi.org/10.1016/s1381-141x(99)00002-7
5 https://doi.org/10.1021/ci0341060
6 https://doi.org/10.1021/cr050559n
7 schema:datePublished 2012-12
8 schema:datePublishedReg 2012-12-01
9 schema:description InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
10 schema:genre research_article
11 schema:inLanguage en
12 schema:isAccessibleForFree true
13 schema:isPartOf N2c0b000e82004780a4fc8023a3c9fa12
14 N730443a6db704be89625b09b3a771f6f
15 sg:journal.1042252
16 schema:name InChIKey collision resistance: an experimental testing
17 schema:pagination 39
18 schema:productId N15e668d4aa2b4d0b8d74b58d80de601f
19 N454a39e452ce481bbb888e993aa3225f
20 N7542470b07044c8ba8902a7ede386a60
21 Nb807e4e7e225498f820f00129291c376
22 Nf07367a28106492bb16d86e78c729900
23 schema:sameAs https://app.dimensions.ai/details/publication/pub.1037214563
24 https://doi.org/10.1186/1758-2946-4-39
25 schema:sdDatePublished 2019-04-10T15:01
26 schema:sdLicense https://scigraph.springernature.com/explorer/license/
27 schema:sdPublisher Nbc6a7d309a0e4b2f8a83df6c99ffe657
28 schema:url http://link.springer.com/10.1186%2F1758-2946-4-39
29 sgo:license sg:explorer/license/
30 sgo:sdDataset articles
31 rdf:type schema:ScholarlyArticle
32 N07cfe5d0d51e4388883865864643fd52 schema:affiliation https://www.grid.ac/institutes/grid.94225.38
33 schema:familyName Heller
34 schema:givenName Steve
35 rdf:type schema:Person
36 N15e668d4aa2b4d0b8d74b58d80de601f schema:name readcube_id
37 schema:value b146bb6fc3cf3c5589580049ec65830f088cdcfc4ba0bfa0e099008f80e6d333
38 rdf:type schema:PropertyValue
39 N1a7acd6f5bc746a5ad5233b86646ff13 rdf:first sg:person.0637122264.59
40 rdf:rest Ne6100e93d70841e69fd01c210a56fe3d
41 N2c0b000e82004780a4fc8023a3c9fa12 schema:issueNumber 1
42 rdf:type schema:PublicationIssue
43 N4415f13a1e854181bc94012b8a08fa58 schema:name InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany
44 rdf:type schema:Organization
45 N454a39e452ce481bbb888e993aa3225f schema:name nlm_unique_id
46 schema:value 101516718
47 rdf:type schema:PropertyValue
48 N730443a6db704be89625b09b3a771f6f schema:volumeNumber 4
49 rdf:type schema:PublicationVolume
50 N7542470b07044c8ba8902a7ede386a60 schema:name doi
51 schema:value 10.1186/1758-2946-4-39
52 rdf:type schema:PropertyValue
53 N8e96b686094d4971a5f0d561c2a0c7c9 rdf:first N9273617e2afc4e0db647f4a50cc498db
54 rdf:rest N1a7acd6f5bc746a5ad5233b86646ff13
55 N9273617e2afc4e0db647f4a50cc498db schema:affiliation https://www.grid.ac/institutes/grid.453237.4
56 schema:familyName Erin
57 schema:givenName Andrey
58 rdf:type schema:Person
59 N9b34c948d42c4467bca77bf3ca4abb73 rdf:first sg:person.015776151425.89
60 rdf:rest N8e96b686094d4971a5f0d561c2a0c7c9
61 Nb807e4e7e225498f820f00129291c376 schema:name pubmed_id
62 schema:value 23256896
63 rdf:type schema:PropertyValue
64 Nbc6a7d309a0e4b2f8a83df6c99ffe657 schema:name Springer Nature - SN SciGraph project
65 rdf:type schema:Organization
66 Nc2a7a83bcf28437bb7ba03da8eca84ef rdf:first sg:person.0753350664.20
67 rdf:rest Nffeff4cba85b488cb7938f505fdaaae3
68 Ne6100e93d70841e69fd01c210a56fe3d rdf:first sg:person.01300025613.44
69 rdf:rest Nc2a7a83bcf28437bb7ba03da8eca84ef
70 Nf07367a28106492bb16d86e78c729900 schema:name dimensions_id
71 schema:value pub.1037214563
72 rdf:type schema:PropertyValue
73 Nffeff4cba85b488cb7938f505fdaaae3 rdf:first N07cfe5d0d51e4388883865864643fd52
74 rdf:rest rdf:nil
75 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
76 schema:name Mathematical Sciences
77 rdf:type schema:DefinedTerm
78 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
79 schema:name Statistics
80 rdf:type schema:DefinedTerm
81 sg:journal.1042252 schema:issn 1758-2946
82 schema:name Journal of Cheminformatics
83 rdf:type schema:Periodical
84 sg:person.01300025613.44 schema:affiliation https://www.grid.ac/institutes/grid.453237.4
85 schema:familyName Blinov
86 schema:givenName Kirill
87 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01300025613.44
88 rdf:type schema:Person
89 sg:person.015776151425.89 schema:affiliation https://www.grid.ac/institutes/grid.14476.30
90 schema:familyName Pletnev
91 schema:givenName Igor
92 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015776151425.89
93 rdf:type schema:Person
94 sg:person.0637122264.59 schema:affiliation N4415f13a1e854181bc94012b8a08fa58
95 schema:familyName McNaught
96 schema:givenName Alan
97 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0637122264.59
98 rdf:type schema:Person
99 sg:person.0753350664.20 schema:affiliation https://www.grid.ac/institutes/grid.94225.38
100 schema:familyName Tchekhovskoi
101 schema:givenName Dmitrii
102 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0753350664.20
103 rdf:type schema:Person
104 https://doi.org/10.1016/s1381-141x(99)00002-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005376902
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1021/ci0341060 schema:sameAs https://app.dimensions.ai/details/publication/pub.1055401645
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1021/cr050559n schema:sameAs https://app.dimensions.ai/details/publication/pub.1053824598
109 rdf:type schema:CreativeWork
110 https://www.grid.ac/institutes/grid.14476.30 schema:alternateName Moscow State University
111 schema:name Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russia
112 InChI Trust, c/o FIZ CHEMIE Franklinstrasse 11, 10587, Berlin, Germany
113 rdf:type schema:Organization
114 https://www.grid.ac/institutes/grid.453237.4 schema:alternateName Diabetes Canada
115 schema:name Advanced Chemistry Development, Inc. (ACD/Labs), 8 King Street East, Suite 107, M5C 1B5, Toronto, Canada
116 rdf:type schema:Organization
117 https://www.grid.ac/institutes/grid.94225.38 schema:alternateName National Institute of Standards and Technology
118 schema:name Biomolecular Measurement Division, National Institute of Standards and Technology, 20899-8362, Gaithersburg, MD, USA
119 rdf:type schema:Organization
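In the triples above, the author order is not stored as a flat array. It is an RDF collection: a chain of blank nodes, each with an `rdf:first` item and an `rdf:rest` pointer, ending at `rdf:nil`. The sketch below walks that chain using node identifiers and family names copied by hand from the table. The dictionary layout and the `walk_rdf_list` helper are hypothetical, and a real application would more likely use an RDF library.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the triples table above.
LIST_NODES = {
    "N9b34c948d42c4467bca77bf3ca4abb73": ("sg:person.015776151425.89", "N8e96b686094d4971a5f0d561c2a0c7c9"),
    "N8e96b686094d4971a5f0d561c2a0c7c9": ("N9273617e2afc4e0db647f4a50cc498db", "N1a7acd6f5bc746a5ad5233b86646ff13"),
    "N1a7acd6f5bc746a5ad5233b86646ff13": ("sg:person.0637122264.59", "Ne6100e93d70841e69fd01c210a56fe3d"),
    "Ne6100e93d70841e69fd01c210a56fe3d": ("sg:person.01300025613.44", "Nc2a7a83bcf28437bb7ba03da8eca84ef"),
    "Nc2a7a83bcf28437bb7ba03da8eca84ef": ("sg:person.0753350664.20", "Nffeff4cba85b488cb7938f505fdaaae3"),
    "Nffeff4cba85b488cb7938f505fdaaae3": ("N07cfe5d0d51e4388883865864643fd52", RDF_NIL),
}

# schema:familyName for each person node, also copied from the table.
FAMILY_NAMES = {
    "sg:person.015776151425.89": "Pletnev",
    "N9273617e2afc4e0db647f4a50cc498db": "Erin",
    "sg:person.0637122264.59": "McNaught",
    "sg:person.01300025613.44": "Blinov",
    "sg:person.0753350664.20": "Tchekhovskoi",
    "N07cfe5d0d51e4388883865864643fd52": "Heller",
}


def walk_rdf_list(head: str, nodes: dict) -> list:
    """Follow rdf:rest pointers from head, collecting rdf:first items in order."""
    items, seen, node = [], set(), head
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


if __name__ == "__main__":
    # The object of schema:author in the table is the head of the list.
    head = "N9b34c948d42c4467bca77bf3ca4abb73"
    print([FAMILY_NAMES[p] for p in walk_rdf_list(head, LIST_NODES)])
```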
 



