Sketching algorithms for genomic data analysis and querying in a secure enclave


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2020-03-04

AUTHORS

Can Kockan, Kaiyuan Zhu, Natnatee Dokmai, Nikolai Karpov, M. Oguzhan Kulekci, David P. Woodruff, S. Cenk Sahinalp

ABSTRACT

Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remains prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware–software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors, in particular Intel’s SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel ‘sketching’ algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
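To make the idea of a 'sketch' concrete, here is a minimal count-min sketch in Python. It is a generic textbook structure, not the SkSES algorithm itself, which this record does not describe in detail. The class name, the table dimensions, and the variant keys are all assumptions made up for illustration. The point it shows is that per-variant counts can be approximated in a fixed amount of memory, which is the kind of constraint a memory-limited enclave imposes.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Fixed-size table of counters; estimates never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One salted hash per row stands in for `depth` independent hash functions.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(2, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the row minimum is the tightest bound.
        return min(self.table[row][self._index(row, key)] for row in range(self.depth))


# Hypothetical variant keys (chromosome:position:ref>alt), invented for this example.
stream = ["chr1:1000:A>G"] * 5 + ["chr2:2000:C>T"] * 3 + ["chrX:3000:G>A"]

sketch = CountMinSketch()
exact = Counter()  # kept only to compare against the sketch; a real stream would not store this
for variant in stream:
    sketch.add(variant)
    exact[variant] += 1

# Rank the observed variants by their estimated counts and keep the top two.
top_2 = sorted(exact, key=sketch.estimate, reverse=True)[:2]
```

Memory use depends only on `width * depth`, not on the number of distinct variants. The price is that estimates can overshoot the true count when keys collide.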

PAGES

295-301

References to SciGraph publications

  • 2013-01-01. Streaming fragment assignment for real-time analysis of sequencing experiments in NATURE METHODS
  • 2014. Algorithms in HElib in ADVANCES IN CRYPTOLOGY – CRYPTO 2014
  • 2016-10-24. Comparison of high-throughput sequencing data compression tools in NATURE METHODS
  • 2018-10-11. iDASH secure genome analysis competition 2017 in BMC MEDICAL GENOMICS
  • 2014-03-21. Privacy-Preserving Processing of Raw Genomic Data in DATA PRIVACY MANAGEMENT AND AUTONOMOUS SPONTANEOUS SECURITY
  • 2015-12-21. Secure distributed genome analysis for GWAS and sequence comparison computation in BMC MEDICAL INFORMATICS AND DECISION MAKING
  • 2014-04-20. Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms in NATURE BIOTECHNOLOGY
  • 2017-03-06. Salmon provides fast and bias-aware quantification of transcript expression in NATURE METHODS
  • 2015-03-29. Private Computation on Encrypted Genomic Data in PROGRESS IN CRYPTOLOGY - LATINCRYPT 2014
  • 2002-06-25. Finding Frequent Items in Data Streams in AUTOMATA, LANGUAGES AND PROGRAMMING
  • 2014-10-30. DeeZ: reference-based compression by local assembly in NATURE METHODS
  • 2018-05-07. Secure genome-wide association analysis using multiparty computation in NATURE BIOTECHNOLOGY
  • 2006-07-23. Principal components analysis corrects for stratification in genome-wide association studies in NATURE GENETICS
  • 2013-12-17. Does family always matter? Public genomes and their effect on relatives in GENOME MEDICINE
  • 2014-01-29. Advantages and pitfalls in the application of mixed-model association methods in NATURE GENETICS
  • 2018-02-08. Optimal compressed representation of high throughput sequence data via light assembly in NATURE COMMUNICATIONS
  • 2016-01-14. Privacy-preserving genomic testing in the clinic: a model using HIV treatment in GENETICS IN MEDICINE
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1038/s41592-020-0761-8

    DOI

    http://dx.doi.org/10.1038/s41592-020-0761-8

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1125341870

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/32132732



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Biological Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Genetics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Algorithms", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Computational Biology", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genetic Variation", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genome, Human", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genome-Wide Association Study", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Genotype", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Phenotype", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Polymorphism, Single Nucleotide", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Software", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.48336.3a", 
              "name": [
                "Department of Computer Science, Indiana University, Bloomington, IN, USA", 
                "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kockan", 
            "givenName": "Can", 
            "id": "sg:person.013771640265.57", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013771640265.57"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.48336.3a", 
              "name": [
                "Department of Computer Science, Indiana University, Bloomington, IN, USA", 
                "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Zhu", 
            "givenName": "Kaiyuan", 
            "id": "sg:person.010635173454.00", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010635173454.00"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, Indiana University, Bloomington, IN, USA", 
              "id": "http://www.grid.ac/institutes/grid.411377.7", 
              "name": [
                "Department of Computer Science, Indiana University, Bloomington, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Dokmai", 
            "givenName": "Natnatee", 
            "id": "sg:person.013001332340.02", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013001332340.02"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, Indiana University, Bloomington, IN, USA", 
              "id": "http://www.grid.ac/institutes/grid.411377.7", 
              "name": [
                "Department of Computer Science, Indiana University, Bloomington, IN, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Karpov", 
            "givenName": "Nikolai", 
            "id": "sg:person.013576712740.43", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576712740.43"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Informatics Institute, Istanbul Technical University, Istanbul, Turkey", 
              "id": "http://www.grid.ac/institutes/grid.10516.33", 
              "name": [
                "Informatics Institute, Istanbul Technical University, Istanbul, Turkey"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Kulekci", 
            "givenName": "M. Oguzhan", 
            "id": "sg:person.011026615332.33", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011026615332.33"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA", 
              "id": "http://www.grid.ac/institutes/grid.147455.6", 
              "name": [
                "Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Woodruff", 
            "givenName": "David P.", 
            "id": "sg:person.012727410605.86", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA", 
              "id": "http://www.grid.ac/institutes/grid.48336.3a", 
              "name": [
                "Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Sahinalp", 
            "givenName": "S. Cenk", 
            "id": "sg:person.01132015666.77", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01132015666.77"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1186/gm511", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1050743128", 
              "https://doi.org/10.1186/gm511"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/3-540-45465-9_59", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1002330524", 
              "https://doi.org/10.1007/3-540-45465-9_59"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-642-54568-9_9", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1017707706", 
              "https://doi.org/10.1007/978-3-642-54568-9_9"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.4197", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1084129290", 
              "https://doi.org/10.1038/nmeth.4197"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/gim.2015.167", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046300096", 
              "https://doi.org/10.1038/gim.2015.167"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ng1847", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1031429813", 
              "https://doi.org/10.1038/ng1847"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/s41467-017-02480-6", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1100795961", 
              "https://doi.org/10.1038/s41467-017-02480-6"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.4108", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1103841581", 
              "https://doi.org/10.1038/nbt.4108"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.3133", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009051359", 
              "https://doi.org/10.1038/nmeth.3133"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nbt.2862", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011219673", 
              "https://doi.org/10.1038/nbt.2862"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.4037", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1036940519", 
              "https://doi.org/10.1038/nmeth.4037"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/ng.2876", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016276818", 
              "https://doi.org/10.1038/ng.2876"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12920-018-0396-0", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1107561301", 
              "https://doi.org/10.1186/s12920-018-0396-0"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-662-44371-2_31", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1003203986", 
              "https://doi.org/10.1007/978-3-662-44371-2_31"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1472-6947-15-s5-s4", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1038741423", 
              "https://doi.org/10.1186/1472-6947-15-s5-s4"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2251", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1016190409", 
              "https://doi.org/10.1038/nmeth.2251"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/978-3-319-16295-9_1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015682925", 
              "https://doi.org/10.1007/978-3-319-16295-9_1"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2020-03-04", 
        "datePublishedReg": "2020-03-04", 
        "description": "Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remain prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware\u2013software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors\u2014in particular, Intel\u2019s SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel \u2018sketching\u2019 algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.", 
        "genre": "article", 
        "id": "sg:pub.10.1038/s41592-020-0761-8", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.10017225", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.8555259", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.5541872", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.5544530", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.7912625", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.4108472", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.2522154", 
            "type": "MonetaryGrant"
          }, 
          {
            "id": "sg:grant.12920980", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1033763", 
            "issn": [
              "1548-7091", 
              "1548-7105"
            ], 
            "name": "Nature Methods", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "3", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "17"
          }
        ], 
        "keywords": [
          "secure multiparty computation protocol", 
          "privacy-preserving manner", 
          "sensitive genomic data", 
          "advanced cryptographic protocols", 
          "multiparty computation protocol", 
          "efficient data compression", 
          "severe memory limitations", 
          "essential statistical information", 
          "genomic data sharing", 
          "genomic data analysis", 
          "cryptographic methods", 
          "execution environment", 
          "Intel SGX", 
          "secure enclave", 
          "cryptographic protocols", 
          "privacy concerns", 
          "computation protocol", 
          "data compression", 
          "data sharing", 
          "computational overhead", 
          "VCF files", 
          "memory limitations", 
          "collaborative genome-wide association studies", 
          "hybrid approach", 
          "scale data", 
          "SGX", 
          "statistical information", 
          "algorithm", 
          "current generation microprocessors", 
          "genomic data", 
          "multiple institutions", 
          "genomic variants", 
          "data analysis", 
          "overhead", 
          "sketching", 
          "sharing", 
          "protocol", 
          "files", 
          "reduction method", 
          "microprocessor", 
          "information", 
          "method", 
          "environment", 
          "data", 
          "compression", 
          "orders of magnitude", 
          "limitations", 
          "order", 
          "variants", 
          "manner", 
          "exchange", 
          "time", 
          "concern", 
          "genome-wide association studies", 
          "enclaves", 
          "analysis", 
          "institutions", 
          "TEE", 
          "aim", 
          "association studies", 
          "study", 
          "magnitude", 
          "SKS", 
          "rare disease", 
          "disease", 
          "approach", 
          "cohort"
        ], 
        "name": "Sketching algorithms for genomic data analysis and querying in a secure enclave", 
        "pagination": "295-301", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1125341870"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1038/s41592-020-0761-8"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "32132732"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1038/s41592-020-0761-8", 
          "https://app.dimensions.ai/details/publication/pub.1125341870"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-12-01T06:42", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221201/entities/gbq_results/article/article_870.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1038/s41592-020-0761-8"
      }
    ]
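As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch reads a trimmed copy of it and pulls out the bibliographic fields. The field names (`author`, `isPartOf`, `productId`, `pagination`, `datePublished`) come from the record itself. The helper name `summarize` and the shape of its return value are assumptions chosen for this example.

```python
import json

# Trimmed copy of the record above: only the fields this sketch reads,
# and only the first two of the seven authors.
record_text = """
[
  {
    "author": [
      {"familyName": "Kockan", "givenName": "Can", "type": "Person"},
      {"familyName": "Zhu", "givenName": "Kaiyuan", "type": "Person"}
    ],
    "datePublished": "2020-03-04",
    "isPartOf": [
      {"name": "Nature Methods", "type": "Periodical"},
      {"issueNumber": "3", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "17"}
    ],
    "name": "Sketching algorithms for genomic data analysis and querying in a secure enclave",
    "pagination": "295-301",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/s41592-020-0761-8"]}
    ]
  }
]
"""


def summarize(record):
    """Collect the main bibliographic fields of one SciGraph article record."""
    # isPartOf mixes journal, issue and volume objects; index them by their type.
    parts = {part["type"]: part for part in record.get("isPartOf", [])}
    # productId holds one PropertyValue per identifier scheme (doi, pubmed_id, ...).
    ids = {pid["name"]: pid["value"][0] for pid in record.get("productId", [])}
    return {
        "title": record["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in record["author"]],
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
        "year": record["datePublished"][:4],
        "doi": ids.get("doi"),
    }


# The top level of the document is a list holding a single record.
summary = summarize(json.loads(record_text)[0])
```

Because JSON-LD is ordinary JSON, the standard `json` module is enough here. A dedicated JSON-LD library is only needed if the `@context` has to be expanded.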
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/s41592-020-0761-8'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/s41592-020-0761-8'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/s41592-020-0761-8'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/s41592-020-0761-8'
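The four curl calls above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. The URL and media types are taken from the commands above. The names `build_request`, `fetch`, and `MEDIA_TYPES`, and the short format labels used as dictionary keys, are assumptions introduced for this example. Whether the service still answers at this address is not something the sketch checks.

```python
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Short labels (chosen for this sketch) mapped to the media types used in the curl calls.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Prepare a GET request that asks for one RDF serialization via the Accept header."""
    return urllib.request.Request(
        BASE_URL + record_id, headers={"Accept": MEDIA_TYPES[fmt]}
    )


def fetch(record_id, fmt="json-ld", timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; fetch() would.
request = build_request("pub.10.1038/s41592-020-0761-8", "turtle")
```

Calling `fetch("pub.10.1038/s41592-020-0761-8")` would be the equivalent of the first curl command, under the assumption that the endpoint is reachable.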


     

    This table displays all metadata directly associated with this object as RDF triples.

    308 TRIPLES      21 PREDICATES      120 URIs      95 LITERALS      18 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1038/s41592-020-0761-8 schema:about N081880c5ea4940ceb290abea346e9826
    2 N1ddc80e132ca4b68890bde775e7a69a8
    3 N255225cb8ac146fd932b4b0ff699fc76
    4 N374b990c60c14c3982ac60138d5ec8ce
    5 N450f2d5dfd4f4ce4958df6f34148467c
    6 N475f6f57fd06410f9533cbc06e02cf8c
    7 N520b7b414cbb47fba5934e106a2b0a0b
    8 N5630c594c24041228d01cf99285c1fba
    9 Na9d5c6824f5d440c81876d6c529e0f47
    10 Nbc9849e6435947c29427fb8d5eaef64b
    11 Ndcab1e67a1a94f7ebd6d37c9d6783763
    12 anzsrc-for:06
    13 anzsrc-for:0604
    14 schema:author N4b9b2e8646c240c4b68349fd91828dd4
    15 schema:citation sg:pub.10.1007/3-540-45465-9_59
    16 sg:pub.10.1007/978-3-319-16295-9_1
    17 sg:pub.10.1007/978-3-642-54568-9_9
    18 sg:pub.10.1007/978-3-662-44371-2_31
    19 sg:pub.10.1038/gim.2015.167
    20 sg:pub.10.1038/nbt.2862
    21 sg:pub.10.1038/nbt.4108
    22 sg:pub.10.1038/ng.2876
    23 sg:pub.10.1038/ng1847
    24 sg:pub.10.1038/nmeth.2251
    25 sg:pub.10.1038/nmeth.3133
    26 sg:pub.10.1038/nmeth.4037
    27 sg:pub.10.1038/nmeth.4197
    28 sg:pub.10.1038/s41467-017-02480-6
    29 sg:pub.10.1186/1472-6947-15-s5-s4
    30 sg:pub.10.1186/gm511
    31 sg:pub.10.1186/s12920-018-0396-0
    32 schema:datePublished 2020-03-04
    33 schema:datePublishedReg 2020-03-04
    34 schema:description Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remain prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware–software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors—in particular, Intel’s SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel ‘sketching’ algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
    35 schema:genre article
    36 schema:isAccessibleForFree true
    37 schema:isPartOf Ne46dbe0fdb6443d5bd50e2b7967bcb23
    38 Nee67f54046904986a2e34b63c4baa96b
    39 sg:journal.1033763
    40 schema:keywords Intel SGX
    41 SGX
    42 SKS
    43 TEE
    44 VCF files
    45 advanced cryptographic protocols
    46 aim
    47 algorithm
    48 analysis
    49 approach
    50 association studies
    51 cohort
    52 collaborative genome-wide association studies
    53 compression
    54 computation protocol
    55 computational overhead
    56 concern
    57 cryptographic methods
    58 cryptographic protocols
    59 current generation microprocessors
    60 data
    61 data analysis
    62 data compression
    63 data sharing
    64 disease
    65 efficient data compression
    66 enclaves
    67 environment
    68 essential statistical information
    69 exchange
    70 execution environment
    71 files
    72 genome-wide association studies
    73 genomic data
    74 genomic data analysis
    75 genomic data sharing
    76 genomic variants
    77 hybrid approach
    78 information
    79 institutions
    80 limitations
    81 magnitude
    82 manner
    83 memory limitations
    84 method
    85 microprocessor
    86 multiparty computation protocol
    87 multiple institutions
    88 order
    89 orders of magnitude
    90 overhead
    91 privacy concerns
    92 privacy-preserving manner
    93 protocol
    94 rare disease
    95 reduction method
    96 scale data
    97 secure enclave
    98 secure multiparty computation protocol
    99 sensitive genomic data
    100 severe memory limitations
    101 sharing
    102 sketching
    103 statistical information
    104 study
    105 time
    106 variants
    107 schema:name Sketching algorithms for genomic data analysis and querying in a secure enclave
    108 schema:pagination 295-301
    109 schema:productId N2eeb04a240ec479da3a7384b6fab9b7b
    110 N53c769983a814d8a84ddf0dab1a93540
    111 N918380c37acc4e54882dfbd4ad7550cd
    112 schema:sameAs https://app.dimensions.ai/details/publication/pub.1125341870
    113 https://doi.org/10.1038/s41592-020-0761-8
    114 schema:sdDatePublished 2022-12-01T06:42
    115 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    116 schema:sdPublisher N72ecd1090726428896aa76cb4abf2324
    117 schema:url https://doi.org/10.1038/s41592-020-0761-8
    118 sgo:license sg:explorer/license/
    119 sgo:sdDataset articles
    120 rdf:type schema:ScholarlyArticle
    121 N081880c5ea4940ceb290abea346e9826 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    122 schema:name Genome-Wide Association Study
    123 rdf:type schema:DefinedTerm
    124 N17f399c16b4c499ea6d50e8fe8a4d72c rdf:first sg:person.011026615332.33
    125 rdf:rest Need8d4f98267419eb60c7e0e08f219b3
    126 N1ddc80e132ca4b68890bde775e7a69a8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    127 schema:name Genotype
    128 rdf:type schema:DefinedTerm
    129 N23a9135273554737a6248e17e3d85e4f rdf:first sg:person.013001332340.02
    130 rdf:rest N3f64c1a2697449c6929ccb5c346071f3
    131 N255225cb8ac146fd932b4b0ff699fc76 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    132 schema:name Computational Biology
    133 rdf:type schema:DefinedTerm
    134 N2eeb04a240ec479da3a7384b6fab9b7b schema:name pubmed_id
    135 schema:value 32132732
    136 rdf:type schema:PropertyValue
    137 N374b990c60c14c3982ac60138d5ec8ce schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    138 schema:name Genomics
    139 rdf:type schema:DefinedTerm
    140 N3f64c1a2697449c6929ccb5c346071f3 rdf:first sg:person.013576712740.43
    141 rdf:rest N17f399c16b4c499ea6d50e8fe8a4d72c
    142 N450f2d5dfd4f4ce4958df6f34148467c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    143 schema:name Genetic Variation
    144 rdf:type schema:DefinedTerm
    145 N475f6f57fd06410f9533cbc06e02cf8c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    146 schema:name Algorithms
    147 rdf:type schema:DefinedTerm
    148 N4b9b2e8646c240c4b68349fd91828dd4 rdf:first sg:person.013771640265.57
    149 rdf:rest N8fc99aa8d15545e19cb0967f6e5a2764
    150 N520b7b414cbb47fba5934e106a2b0a0b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    151 schema:name Humans
    152 rdf:type schema:DefinedTerm
    153 N53c769983a814d8a84ddf0dab1a93540 schema:name dimensions_id
    154 schema:value pub.1125341870
    155 rdf:type schema:PropertyValue
    156 N5630c594c24041228d01cf99285c1fba schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    157 schema:name Software
    158 rdf:type schema:DefinedTerm
    159 N72ecd1090726428896aa76cb4abf2324 schema:name Springer Nature - SN SciGraph project
    160 rdf:type schema:Organization
    161 N8fc99aa8d15545e19cb0967f6e5a2764 rdf:first sg:person.010635173454.00
    162 rdf:rest N23a9135273554737a6248e17e3d85e4f
    163 N918380c37acc4e54882dfbd4ad7550cd schema:name doi
    164 schema:value 10.1038/s41592-020-0761-8
    165 rdf:type schema:PropertyValue
    166 Na9d5c6824f5d440c81876d6c529e0f47 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    167 schema:name Genome, Human
    168 rdf:type schema:DefinedTerm
    169 Nbc9849e6435947c29427fb8d5eaef64b schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    170 schema:name Phenotype
    171 rdf:type schema:DefinedTerm
    172 Ndcab1e67a1a94f7ebd6d37c9d6783763 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    173 schema:name Polymorphism, Single Nucleotide
    174 rdf:type schema:DefinedTerm
    175 Ne450e7cabfe64890964454eea64828d7 rdf:first sg:person.01132015666.77
    176 rdf:rest rdf:nil
    177 Ne46dbe0fdb6443d5bd50e2b7967bcb23 schema:volumeNumber 17
    178 rdf:type schema:PublicationVolume
    179 Nee67f54046904986a2e34b63c4baa96b schema:issueNumber 3
    180 rdf:type schema:PublicationIssue
    181 Need8d4f98267419eb60c7e0e08f219b3 rdf:first sg:person.012727410605.86
    182 rdf:rest Ne450e7cabfe64890964454eea64828d7
    183 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
    184 schema:name Biological Sciences
    185 rdf:type schema:DefinedTerm
    186 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
    187 schema:name Genetics
    188 rdf:type schema:DefinedTerm
    189 sg:grant.10017225 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    190 rdf:type schema:MonetaryGrant
    191 sg:grant.12920980 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    192 rdf:type schema:MonetaryGrant
    193 sg:grant.2522154 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    194 rdf:type schema:MonetaryGrant
    195 sg:grant.4108472 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    196 rdf:type schema:MonetaryGrant
    197 sg:grant.5541872 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    198 rdf:type schema:MonetaryGrant
    199 sg:grant.5544530 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    200 rdf:type schema:MonetaryGrant
    201 sg:grant.7912625 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    202 rdf:type schema:MonetaryGrant
    203 sg:grant.8555259 http://pending.schema.org/fundedItem sg:pub.10.1038/s41592-020-0761-8
    204 rdf:type schema:MonetaryGrant
    205 sg:journal.1033763 schema:issn 1548-7091
    206 1548-7105
    207 schema:name Nature Methods
    208 schema:publisher Springer Nature
    209 rdf:type schema:Periodical
    210 sg:person.010635173454.00 schema:affiliation grid-institutes:grid.48336.3a
    211 schema:familyName Zhu
    212 schema:givenName Kaiyuan
    213 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010635173454.00
    214 rdf:type schema:Person
    215 sg:person.011026615332.33 schema:affiliation grid-institutes:grid.10516.33
    216 schema:familyName Kulekci
    217 schema:givenName M. Oguzhan
    218 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011026615332.33
    219 rdf:type schema:Person
    220 sg:person.01132015666.77 schema:affiliation grid-institutes:grid.48336.3a
    221 schema:familyName Sahinalp
    222 schema:givenName S. Cenk
    223 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01132015666.77
    224 rdf:type schema:Person
    225 sg:person.012727410605.86 schema:affiliation grid-institutes:grid.147455.6
    226 schema:familyName Woodruff
    227 schema:givenName David P.
    228 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012727410605.86
    229 rdf:type schema:Person
    230 sg:person.013001332340.02 schema:affiliation grid-institutes:grid.411377.7
    231 schema:familyName Dokmai
    232 schema:givenName Natnatee
    233 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013001332340.02
    234 rdf:type schema:Person
    235 sg:person.013576712740.43 schema:affiliation grid-institutes:grid.411377.7
    236 schema:familyName Karpov
    237 schema:givenName Nikolai
    238 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013576712740.43
    239 rdf:type schema:Person
    240 sg:person.013771640265.57 schema:affiliation grid-institutes:grid.48336.3a
    241 schema:familyName Kockan
    242 schema:givenName Can
    243 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013771640265.57
    244 rdf:type schema:Person
    245 sg:pub.10.1007/3-540-45465-9_59 schema:sameAs https://app.dimensions.ai/details/publication/pub.1002330524
    246 https://doi.org/10.1007/3-540-45465-9_59
    247 rdf:type schema:CreativeWork
    248 sg:pub.10.1007/978-3-319-16295-9_1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015682925
    249 https://doi.org/10.1007/978-3-319-16295-9_1
    250 rdf:type schema:CreativeWork
    251 sg:pub.10.1007/978-3-642-54568-9_9 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017707706
    252 https://doi.org/10.1007/978-3-642-54568-9_9
    253 rdf:type schema:CreativeWork
    254 sg:pub.10.1007/978-3-662-44371-2_31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1003203986
    255 https://doi.org/10.1007/978-3-662-44371-2_31
    256 rdf:type schema:CreativeWork
    257 sg:pub.10.1038/gim.2015.167 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046300096
    258 https://doi.org/10.1038/gim.2015.167
    259 rdf:type schema:CreativeWork
    260 sg:pub.10.1038/nbt.2862 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011219673
    261 https://doi.org/10.1038/nbt.2862
    262 rdf:type schema:CreativeWork
    263 sg:pub.10.1038/nbt.4108 schema:sameAs https://app.dimensions.ai/details/publication/pub.1103841581
    264 https://doi.org/10.1038/nbt.4108
    265 rdf:type schema:CreativeWork
    266 sg:pub.10.1038/ng.2876 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016276818
    267 https://doi.org/10.1038/ng.2876
    268 rdf:type schema:CreativeWork
    269 sg:pub.10.1038/ng1847 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031429813
    270 https://doi.org/10.1038/ng1847
    271 rdf:type schema:CreativeWork
    272 sg:pub.10.1038/nmeth.2251 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016190409
    273 https://doi.org/10.1038/nmeth.2251
    274 rdf:type schema:CreativeWork
    275 sg:pub.10.1038/nmeth.3133 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009051359
    276 https://doi.org/10.1038/nmeth.3133
    277 rdf:type schema:CreativeWork
    278 sg:pub.10.1038/nmeth.4037 schema:sameAs https://app.dimensions.ai/details/publication/pub.1036940519
    279 https://doi.org/10.1038/nmeth.4037
    280 rdf:type schema:CreativeWork
    281 sg:pub.10.1038/nmeth.4197 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084129290
    282 https://doi.org/10.1038/nmeth.4197
    283 rdf:type schema:CreativeWork
    284 sg:pub.10.1038/s41467-017-02480-6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1100795961
    285 https://doi.org/10.1038/s41467-017-02480-6
    286 rdf:type schema:CreativeWork
    287 sg:pub.10.1186/1472-6947-15-s5-s4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038741423
    288 https://doi.org/10.1186/1472-6947-15-s5-s4
    289 rdf:type schema:CreativeWork
    290 sg:pub.10.1186/gm511 schema:sameAs https://app.dimensions.ai/details/publication/pub.1050743128
    291 https://doi.org/10.1186/gm511
    292 rdf:type schema:CreativeWork
    293 sg:pub.10.1186/s12920-018-0396-0 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107561301
    294 https://doi.org/10.1186/s12920-018-0396-0
    295 rdf:type schema:CreativeWork
    296 grid-institutes:grid.10516.33 schema:alternateName Informatics Institute, Istanbul Technical University, Istanbul, Turkey
    297 schema:name Informatics Institute, Istanbul Technical University, Istanbul, Turkey
    298 rdf:type schema:Organization
    299 grid-institutes:grid.147455.6 schema:alternateName Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
    300 schema:name Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
    301 rdf:type schema:Organization
    302 grid-institutes:grid.411377.7 schema:alternateName Department of Computer Science, Indiana University, Bloomington, IN, USA
    303 schema:name Department of Computer Science, Indiana University, Bloomington, IN, USA
    304 rdf:type schema:Organization
    305 grid-institutes:grid.48336.3a schema:alternateName Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
    306 schema:name Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
    307 Department of Computer Science, Indiana University, Bloomington, IN, USA
    308 rdf:type schema:Organization
     



