Systematic auditing is essential to debiasing machine learning in biology


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2021-02-10

AUTHORS

Fatma-Elzahraa Eid, Haitham A. Elmarakeby, Yujia Alina Chan, Nadine Fornelos, Mahmoud ElHefnawi, Eliezer M. Van Allen, Lenwood S. Heath, Kasper Lage

ABSTRACT

Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.

PAGES

183

References to SciGraph publications

  • 2007-07-04. Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method in BMC BIOINFORMATICS
  • 2000-05. Gene Ontology: tool for the unification of biology in NATURE GENETICS
  • 2018-03-05. Using deep learning to model the hierarchical structure and function of a cell in NATURE METHODS
  • 2012-12-07. Flaws in evaluation schemes for pair-input computational predictions in NATURE METHODS
  • 2019-01-28. DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction in SCIENTIFIC REPORTS
  • 2018-07-18. AI can be sexist and racist — it’s time to make it fair in NATURE
  • 2006-03-20. Choosing negative examples for the prediction of protein-protein interactions in BMC BIOINFORMATICS
  • 2013-06-07. Efficient regularized least-squares algorithms for conditional ranking on relational data in MACHINE LEARNING
  • 2009-09-18. NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction in BMC BIOINFORMATICS
  • 2009-11-30. Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior in BMC BIOINFORMATICS
  • 2011-02-20. Navigating the kinome in NATURE CHEMICAL BIOLOGY
  • 1999-06. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices in NATURE BIOTECHNOLOGY
  • 2017-05-25. Sequence-based prediction of protein protein interaction using a deep-learning algorithm in BMC BIOINFORMATICS
  • 2005-05-03. Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications in IMMUNOGENETICS
  • 2007-12-21. A new pairwise kernel for biological network inference with support vector machines in BMC BIOINFORMATICS
Identifiers

    URI

    http://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5

    DOI

    http://dx.doi.org/10.1038/s42003-021-01674-5

    DIMENSIONS

    https://app.dimensions.ai/details/publication/pub.1135302311

    PUBMED

    https://www.ncbi.nlm.nih.gov/pubmed/33568741



    JSON-LD is the canonical representation for SciGraph data.


    [
      {
        "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
        "about": [
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/01", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Mathematical Sciences", 
            "type": "DefinedTerm"
          }, 
          {
            "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0104", 
            "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
            "name": "Statistics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Animals", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Bias", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Data Mining", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Databases, Protein", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Histocompatibility Antigens", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Humans", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Machine Learning", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Pharmaceutical Preparations", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Protein Binding", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Protein Interaction Maps", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Proteins", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Proteome", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Proteomics", 
            "type": "DefinedTerm"
          }, 
          {
            "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
            "name": "Reproducibility of Results", 
            "type": "DefinedTerm"
          }
        ], 
        "author": [
          {
            "affiliation": {
              "alternateName": "Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt", 
              "id": "http://www.grid.ac/institutes/grid.411303.4", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
                "Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Eid", 
            "givenName": "Fatma-Elzahraa", 
            "id": "sg:person.01331366253.46", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01331366253.46"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Dana-Farber Cancer Institute, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.65499.37", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
                "Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt", 
                "Dana-Farber Cancer Institute, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Elmarakeby", 
            "givenName": "Haitham A.", 
            "id": "sg:person.01264227073.34", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01264227073.34"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.66859.34", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Chan", 
            "givenName": "Yujia Alina", 
            "id": "sg:person.01310017500.30", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01310017500.30"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.66859.34", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Fornelos", 
            "givenName": "Nadine", 
            "id": "sg:person.0742031056.69", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0742031056.69"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Informatics and Systems Department, Division of Engineering Research, National Research Centre, Giza, Egypt", 
              "id": "http://www.grid.ac/institutes/grid.419725.c", 
              "name": [
                "Informatics and Systems Department, Division of Engineering Research, National Research Centre, Giza, Egypt"
              ], 
              "type": "Organization"
            }, 
            "familyName": "ElHefnawi", 
            "givenName": "Mahmoud", 
            "id": "sg:person.01010454622.24", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01010454622.24"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Dana-Farber Cancer Institute, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.65499.37", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
                "Dana-Farber Cancer Institute, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Van Allen", 
            "givenName": "Eliezer M.", 
            "id": "sg:person.01274137244.17", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274137244.17"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Virginia Polytechnic Institute and State University, Blacksburg, VA, USA", 
              "id": "http://www.grid.ac/institutes/grid.438526.e", 
              "name": [
                "Virginia Polytechnic Institute and State University, Blacksburg, VA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Heath", 
            "givenName": "Lenwood S.", 
            "id": "sg:person.01121450224.28", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01121450224.28"
            ], 
            "type": "Person"
          }, 
          {
            "affiliation": {
              "alternateName": "Harvard Medical School, Boston, MA, USA", 
              "id": "http://www.grid.ac/institutes/grid.38142.3c", 
              "name": [
                "Broad Institute of MIT and Harvard, Cambridge, MA, USA", 
                "Department of Surgery, Massachusetts General Hospital, Boston, MA, USA", 
                "Harvard Medical School, Boston, MA, USA"
              ], 
              "type": "Organization"
            }, 
            "familyName": "Lage", 
            "givenName": "Kasper", 
            "id": "sg:person.0717272702.13", 
            "sameAs": [
              "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0717272702.13"
            ], 
            "type": "Person"
          }
        ], 
        "citation": [
          {
            "id": "sg:pub.10.1038/nmeth.4627", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1101336342", 
              "https://doi.org/10.1038/nmeth.4627"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-8-238", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1020429024", 
              "https://doi.org/10.1186/1471-2105-8-238"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/75556", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044135237", 
              "https://doi.org/10.1038/75556"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-7-s1-s2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1038206691", 
              "https://doi.org/10.1186/1471-2105-7-s1-s2"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/s41598-018-37214-1", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1111675992", 
              "https://doi.org/10.1038/s41598-018-37214-1"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-10-394", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1046329132", 
              "https://doi.org/10.1186/1471-2105-10-394"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-8-s10-s8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1044521529", 
              "https://doi.org/10.1186/1471-2105-8-s10-s8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s10994-013-5354-7", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1009563914", 
              "https://doi.org/10.1007/s10994-013-5354-7"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/d41586-018-05707-8", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1105609545", 
              "https://doi.org/10.1038/d41586-018-05707-8"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/s12859-017-1700-2", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1085594900", 
              "https://doi.org/10.1186/s12859-017-1700-2"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nchembio.530", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1011297255", 
              "https://doi.org/10.1038/nchembio.530"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1186/1471-2105-10-296", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1021629866", 
              "https://doi.org/10.1186/1471-2105-10-296"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/9858", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1033407907", 
              "https://doi.org/10.1038/9858"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1038/nmeth.2259", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1005052725", 
              "https://doi.org/10.1038/nmeth.2259"
            ], 
            "type": "CreativeWork"
          }, 
          {
            "id": "sg:pub.10.1007/s00251-005-0798-y", 
            "sameAs": [
              "https://app.dimensions.ai/details/publication/pub.1015144264", 
              "https://doi.org/10.1007/s00251-005-0798-y"
            ], 
            "type": "CreativeWork"
          }
        ], 
        "datePublished": "2021-02-10", 
        "datePublishedReg": "2021-02-10", 
        "description": "Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.", 
        "genre": "article", 
        "id": "sg:pub.10.1038/s42003-021-01674-5", 
        "isAccessibleForFree": true, 
        "isFundedItemOf": [
          {
            "id": "sg:grant.5476754", 
            "type": "MonetaryGrant"
          }
        ], 
        "isPartOf": [
          {
            "id": "sg:journal.1300829", 
            "issn": [
              "2399-3642"
            ], 
            "name": "Communications Biology", 
            "publisher": "Springer Nature", 
            "type": "Periodical"
          }, 
          {
            "issueNumber": "1", 
            "type": "PublicationIssue"
          }, 
          {
            "type": "PublicationVolume", 
            "volumeNumber": "4"
          }
        ], 
        "keywords": [
          "ML models", 
          "auditing framework", 
          "machine learning models", 
          "examples of codes", 
          "ML applications", 
          "learning model", 
          "ML process", 
          "new dataset", 
          "prediction performance", 
          "data biases", 
          "biological data", 
          "systematic auditing", 
          "model performance", 
          "auditing", 
          "framework", 
          "general approach", 
          "machine", 
          "dataset", 
          "applications", 
          "performance", 
          "code", 
          "model", 
          "common practice", 
          "data", 
          "protocol", 
          "example", 
          "interest", 
          "process", 
          "biases", 
          "signals", 
          "practice", 
          "tailoring", 
          "life", 
          "guidelines", 
          "understanding", 
          "biomedical applications", 
          "biology", 
          "mL", 
          "unrecognized biases", 
          "detailed protocol", 
          "approach", 
          "therapeutic interest", 
          "insufficient signal"
        ], 
        "name": "Systematic auditing is essential to debiasing machine learning in biology", 
        "pagination": "183", 
        "productId": [
          {
            "name": "dimensions_id", 
            "type": "PropertyValue", 
            "value": [
              "pub.1135302311"
            ]
          }, 
          {
            "name": "doi", 
            "type": "PropertyValue", 
            "value": [
              "10.1038/s42003-021-01674-5"
            ]
          }, 
          {
            "name": "pubmed_id", 
            "type": "PropertyValue", 
            "value": [
              "33568741"
            ]
          }
        ], 
        "sameAs": [
          "https://doi.org/10.1038/s42003-021-01674-5", 
          "https://app.dimensions.ai/details/publication/pub.1135302311"
        ], 
        "sdDataset": "articles", 
        "sdDatePublished": "2022-10-01T06:48", 
        "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
        "sdPublisher": {
          "name": "Springer Nature - SN SciGraph project", 
          "type": "Organization"
        }, 
        "sdSource": "s3://com-springernature-scigraph/baseset/20221001/entities/gbq_results/article/article_884.jsonl", 
        "type": "ScholarlyArticle", 
        "url": "https://doi.org/10.1038/s42003-021-01674-5"
      }
    ]
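Because the record is plain JSON, it can be read with any JSON parser. The following is a minimal sketch, not part of SciGraph itself: it assumes the record is a list holding one article object, as shown above, and uses a shortened excerpt of that record (two of the eight authors) so the example stays self-contained. The names `RECORD_EXCERPT` and `summarize` are hypothetical.

```python
import json

# Shortened excerpt of the JSON-LD record above (illustrative subset only).
RECORD_EXCERPT = """
[
  {
    "author": [
      {"familyName": "Eid", "givenName": "Fatma-Elzahraa", "type": "Person"},
      {"familyName": "Lage", "givenName": "Kasper", "type": "Person"}
    ],
    "name": "Systematic auditing is essential to debiasing machine learning in biology",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1038/s42003-021-01674-5"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["33568741"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(record_json):
    """Return the title, author names, and identifiers of a SciGraph article record."""
    # The record is a JSON array; the article is assumed to be its first element.
    article = json.loads(record_json)[0]
    # Each productId entry carries a name ("doi", "pubmed_id", ...) and a list of values.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])]
    return {"title": article["name"], "authors": authors, "ids": ids}


summary = summarize(RECORD_EXCERPT)
print(summary["ids"]["doi"])
```

The same function should work on the full record, provided the field names match those in the dump above.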
     


    HOW TO GET THIS DATA PROGRAMMATICALLY:

    JSON-LD is a popular format for linked data which is fully compatible with JSON.

    curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5'

    N-Triples is a line-based linked data format ideal for batch operations.

    curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5'

    Turtle is a human-readable linked data format.

    curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5'

    RDF/XML is a standard XML format for linked data.

    curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5'
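The four curl commands above differ only in their `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is an illustrative sketch, not an official client: the helper names `build_request` and `fetch` and the short format keys are assumptions, and whether the SciGraph endpoint still answers these requests is not guaranteed.

```python
import urllib.request

# Media types copied from the curl commands above; the short keys are hypothetical.
RDF_MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}

RECORD_URL = "https://scigraph.springernature.com/pub.10.1038/s42003-021-01674-5"


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the record in the given RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request does not touch the network; only fetch() does.
request = build_request("turtle")
print(request.get_header("Accept"))
```

Calling `fetch("nt")` would correspond to the N-Triples curl command, and likewise for the other keys.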


     

    This table displays all metadata directly associated with this object as RDF triples.

    290 TRIPLES      21 PREDICATES      97 URIs      74 LITERALS      21 BLANK NODES

    Subject Predicate Object
    1 sg:pub.10.1038/s42003-021-01674-5 schema:about N0d9982f34b96447687a116570bf5fcec
    2 N15153f5353ce4f7a85623f34fb336b31
    3 N2a684e47d34d4c94bd42996a5b177ab1
    4 N326d9f26a9774f2089260c032e952f52
    5 N41e64f8599154e028f285393ce14e022
    6 N4aebc50c24944c81a3e29943c05afc1c
    7 N76e76545d5a64356bd95e7eb02c22493
    8 N8407ee6dfdde4fef98a423769faee541
    9 N8d2ea59111c543dcb2fa0718729bd0d5
    10 N97a2e668ebf74abc8b7a3b0f05731ef2
    11 N9ea16820daae46b18716068f90d03b1f
    12 Nadde9e18337d4f7aa8ad487145e9e7a9
    13 Nc3f591d9567840f799b4eceaf7c67d53
    14 Nfb55125d7d034acba028fa86e257bac5
    15 anzsrc-for:01
    16 anzsrc-for:0104
    17 schema:author Nb117955cebff4de0aff707e4f0a06305
    18 schema:citation sg:pub.10.1007/s00251-005-0798-y
    19 sg:pub.10.1007/s10994-013-5354-7
    20 sg:pub.10.1038/75556
    21 sg:pub.10.1038/9858
    22 sg:pub.10.1038/d41586-018-05707-8
    23 sg:pub.10.1038/nchembio.530
    24 sg:pub.10.1038/nmeth.2259
    25 sg:pub.10.1038/nmeth.4627
    26 sg:pub.10.1038/s41598-018-37214-1
    27 sg:pub.10.1186/1471-2105-10-296
    28 sg:pub.10.1186/1471-2105-10-394
    29 sg:pub.10.1186/1471-2105-7-s1-s2
    30 sg:pub.10.1186/1471-2105-8-238
    31 sg:pub.10.1186/1471-2105-8-s10-s8
    32 sg:pub.10.1186/s12859-017-1700-2
    33 schema:datePublished 2021-02-10
    34 schema:datePublishedReg 2021-02-10
    35 schema:description Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.
    36 schema:genre article
    37 schema:isAccessibleForFree true
    38 schema:isPartOf Nb65f985388aa4c89b08ebbe45fda92b6
    39 Ne99a0a4589084e9494bfb72d4cf38c94
    40 sg:journal.1300829
    41 schema:keywords ML applications
    42 ML models
    43 ML process
    44 applications
    45 approach
    46 auditing
    47 auditing framework
    48 biases
    49 biological data
    50 biology
    51 biomedical applications
    52 code
    53 common practice
    54 data
    55 data biases
    56 dataset
    57 detailed protocol
    58 example
    59 examples of codes
    60 framework
    61 general approach
    62 guidelines
    63 insufficient signal
    64 interest
    65 learning model
    66 life
    67 mL
    68 machine
    69 machine learning models
    70 model
    71 model performance
    72 new dataset
    73 performance
    74 practice
    75 prediction performance
    76 process
    77 protocol
    78 signals
    79 systematic auditing
    80 tailoring
    81 therapeutic interest
    82 understanding
    83 unrecognized biases
    84 schema:name Systematic auditing is essential to debiasing machine learning in biology
    85 schema:pagination 183
    86 schema:productId N6edd9890944348acb9123a3a4ec29430
    87 Nd80e68b3da8e463fb8f2644a0b8528fd
    88 Ne0769d4abb9246b78288f51b26d4fd24
    89 schema:sameAs https://app.dimensions.ai/details/publication/pub.1135302311
    90 https://doi.org/10.1038/s42003-021-01674-5
    91 schema:sdDatePublished 2022-10-01T06:48
    92 schema:sdLicense https://scigraph.springernature.com/explorer/license/
    93 schema:sdPublisher Na8162c0f09664c1e915e103699ebfc14
    94 schema:url https://doi.org/10.1038/s42003-021-01674-5
    95 sgo:license sg:explorer/license/
    96 sgo:sdDataset articles
    97 rdf:type schema:ScholarlyArticle
    98 N0d9982f34b96447687a116570bf5fcec schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    99 schema:name Humans
    100 rdf:type schema:DefinedTerm
    101 N15153f5353ce4f7a85623f34fb336b31 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    102 schema:name Proteins
    103 rdf:type schema:DefinedTerm
    104 N2a684e47d34d4c94bd42996a5b177ab1 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    105 schema:name Bias
    106 rdf:type schema:DefinedTerm
    107 N326d9f26a9774f2089260c032e952f52 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    108 schema:name Machine Learning
    109 rdf:type schema:DefinedTerm
    110 N41e64f8599154e028f285393ce14e022 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    111 schema:name Proteome
    112 rdf:type schema:DefinedTerm
    113 N4aebc50c24944c81a3e29943c05afc1c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    114 schema:name Protein Binding
    115 rdf:type schema:DefinedTerm
    116 N552e2dde5fe843dba0cd1aa71b23f96d rdf:first sg:person.0717272702.13
    117 rdf:rest rdf:nil
    118 N6edd9890944348acb9123a3a4ec29430 schema:name doi
    119 schema:value 10.1038/s42003-021-01674-5
    120 rdf:type schema:PropertyValue
    121 N709cbeac6902464a85b0221c4b05a9e5 rdf:first sg:person.0742031056.69
    122 rdf:rest N836f4525912a40e988289e360acc918e
    123 N76e76545d5a64356bd95e7eb02c22493 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    124 schema:name Pharmaceutical Preparations
    125 rdf:type schema:DefinedTerm
    126 N836f4525912a40e988289e360acc918e rdf:first sg:person.01010454622.24
    127 rdf:rest Na3ae9048951c49339278c03e05f87855
    128 N8407ee6dfdde4fef98a423769faee541 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    129 schema:name Protein Interaction Maps
    130 rdf:type schema:DefinedTerm
    131 N8864bfe77dc4409a8fae6535ce75c5d0 rdf:first sg:person.01121450224.28
    132 rdf:rest N552e2dde5fe843dba0cd1aa71b23f96d
    133 N8d2ea59111c543dcb2fa0718729bd0d5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    134 schema:name Proteomics
    135 rdf:type schema:DefinedTerm
    136 N97a2e668ebf74abc8b7a3b0f05731ef2 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    137 schema:name Data Mining
    138 rdf:type schema:DefinedTerm
    139 N9ea16820daae46b18716068f90d03b1f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    140 schema:name Histocompatibility Antigens
    141 rdf:type schema:DefinedTerm
    142 Na3ae9048951c49339278c03e05f87855 rdf:first sg:person.01274137244.17
    143 rdf:rest N8864bfe77dc4409a8fae6535ce75c5d0
    144 Na8162c0f09664c1e915e103699ebfc14 schema:name Springer Nature - SN SciGraph project
    145 rdf:type schema:Organization
    146 Nadde9e18337d4f7aa8ad487145e9e7a9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    147 schema:name Animals
    148 rdf:type schema:DefinedTerm
    149 Nb117955cebff4de0aff707e4f0a06305 rdf:first sg:person.01331366253.46
    150 rdf:rest Nf838abd1ef0e466f9d6b8c47ba2b809f
    151 Nb65f985388aa4c89b08ebbe45fda92b6 schema:volumeNumber 4
    152 rdf:type schema:PublicationVolume
    153 Nc3f591d9567840f799b4eceaf7c67d53 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    154 schema:name Reproducibility of Results
    155 rdf:type schema:DefinedTerm
    156 Nd80e68b3da8e463fb8f2644a0b8528fd schema:name pubmed_id
    157 schema:value 33568741
    158 rdf:type schema:PropertyValue
    159 Ne0769d4abb9246b78288f51b26d4fd24 schema:name dimensions_id
    160 schema:value pub.1135302311
    161 rdf:type schema:PropertyValue
    162 Ne99a0a4589084e9494bfb72d4cf38c94 schema:issueNumber 1
    163 rdf:type schema:PublicationIssue
    164 Nea2188f56aa549efa503f10cafd373ff rdf:first sg:person.01310017500.30
    165 rdf:rest N709cbeac6902464a85b0221c4b05a9e5
    166 Nf838abd1ef0e466f9d6b8c47ba2b809f rdf:first sg:person.01264227073.34
    167 rdf:rest Nea2188f56aa549efa503f10cafd373ff
    168 Nfb55125d7d034acba028fa86e257bac5 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
    169 schema:name Databases, Protein
    170 rdf:type schema:DefinedTerm
    171 anzsrc-for:01 schema:inDefinedTermSet anzsrc-for:
    172 schema:name Mathematical Sciences
    173 rdf:type schema:DefinedTerm
    174 anzsrc-for:0104 schema:inDefinedTermSet anzsrc-for:
    175 schema:name Statistics
    176 rdf:type schema:DefinedTerm
    177 sg:grant.5476754 http://pending.schema.org/fundedItem sg:pub.10.1038/s42003-021-01674-5
    178 rdf:type schema:MonetaryGrant
    179 sg:journal.1300829 schema:issn 2399-3642
    180 schema:name Communications Biology
    181 schema:publisher Springer Nature
    182 rdf:type schema:Periodical
    183 sg:person.01010454622.24 schema:affiliation grid-institutes:grid.419725.c
    184 schema:familyName ElHefnawi
    185 schema:givenName Mahmoud
    186 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01010454622.24
    187 rdf:type schema:Person
    188 sg:person.01121450224.28 schema:affiliation grid-institutes:grid.438526.e
    189 schema:familyName Heath
    190 schema:givenName Lenwood S.
    191 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01121450224.28
    192 rdf:type schema:Person
    193 sg:person.01264227073.34 schema:affiliation grid-institutes:grid.65499.37
    194 schema:familyName Elmarakeby
    195 schema:givenName Haitham A.
    196 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01264227073.34
    197 rdf:type schema:Person
    198 sg:person.01274137244.17 schema:affiliation grid-institutes:grid.65499.37
    199 schema:familyName Van Allen
    200 schema:givenName Eliezer M.
    201 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01274137244.17
    202 rdf:type schema:Person
    203 sg:person.01310017500.30 schema:affiliation grid-institutes:grid.66859.34
    204 schema:familyName Chan
    205 schema:givenName Yujia Alina
    206 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01310017500.30
    207 rdf:type schema:Person
    208 sg:person.01331366253.46 schema:affiliation grid-institutes:grid.411303.4
    209 schema:familyName Eid
    210 schema:givenName Fatma-Elzahraa
    211 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01331366253.46
    212 rdf:type schema:Person
    213 sg:person.0717272702.13 schema:affiliation grid-institutes:grid.38142.3c
    214 schema:familyName Lage
    215 schema:givenName Kasper
    216 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0717272702.13
    217 rdf:type schema:Person
    218 sg:person.0742031056.69 schema:affiliation grid-institutes:grid.66859.34
    219 schema:familyName Fornelos
    220 schema:givenName Nadine
    221 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0742031056.69
    222 rdf:type schema:Person
    223 sg:pub.10.1007/s00251-005-0798-y schema:sameAs https://app.dimensions.ai/details/publication/pub.1015144264
    224 https://doi.org/10.1007/s00251-005-0798-y
    225 rdf:type schema:CreativeWork
    226 sg:pub.10.1007/s10994-013-5354-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009563914
    227 https://doi.org/10.1007/s10994-013-5354-7
    228 rdf:type schema:CreativeWork
    229 sg:pub.10.1038/75556 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044135237
    230 https://doi.org/10.1038/75556
    231 rdf:type schema:CreativeWork
    232 sg:pub.10.1038/9858 schema:sameAs https://app.dimensions.ai/details/publication/pub.1033407907
    233 https://doi.org/10.1038/9858
    234 rdf:type schema:CreativeWork
    235 sg:pub.10.1038/d41586-018-05707-8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1105609545
    236 https://doi.org/10.1038/d41586-018-05707-8
    237 rdf:type schema:CreativeWork
    238 sg:pub.10.1038/nchembio.530 schema:sameAs https://app.dimensions.ai/details/publication/pub.1011297255
    239 https://doi.org/10.1038/nchembio.530
    240 rdf:type schema:CreativeWork
    241 sg:pub.10.1038/nmeth.2259 schema:sameAs https://app.dimensions.ai/details/publication/pub.1005052725
    242 https://doi.org/10.1038/nmeth.2259
    243 rdf:type schema:CreativeWork
    244 sg:pub.10.1038/nmeth.4627 schema:sameAs https://app.dimensions.ai/details/publication/pub.1101336342
    245 https://doi.org/10.1038/nmeth.4627
    246 rdf:type schema:CreativeWork
    247 sg:pub.10.1038/s41598-018-37214-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1111675992
    248 https://doi.org/10.1038/s41598-018-37214-1
    249 rdf:type schema:CreativeWork
    250 sg:pub.10.1186/1471-2105-10-296 schema:sameAs https://app.dimensions.ai/details/publication/pub.1021629866
    251 https://doi.org/10.1186/1471-2105-10-296
    252 rdf:type schema:CreativeWork
    253 sg:pub.10.1186/1471-2105-10-394 schema:sameAs https://app.dimensions.ai/details/publication/pub.1046329132
    254 https://doi.org/10.1186/1471-2105-10-394
    255 rdf:type schema:CreativeWork
    256 sg:pub.10.1186/1471-2105-7-s1-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1038206691
    257 https://doi.org/10.1186/1471-2105-7-s1-s2
    258 rdf:type schema:CreativeWork
    259 sg:pub.10.1186/1471-2105-8-238 schema:sameAs https://app.dimensions.ai/details/publication/pub.1020429024
    260 https://doi.org/10.1186/1471-2105-8-238
    261 rdf:type schema:CreativeWork
    262 sg:pub.10.1186/1471-2105-8-s10-s8 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044521529
    263 https://doi.org/10.1186/1471-2105-8-s10-s8
    264 rdf:type schema:CreativeWork
    265 sg:pub.10.1186/s12859-017-1700-2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1085594900
    266 https://doi.org/10.1186/s12859-017-1700-2
    267 rdf:type schema:CreativeWork
    268 grid-institutes:grid.38142.3c schema:alternateName Harvard Medical School, Boston, MA, USA
    269 schema:name Broad Institute of MIT and Harvard, Cambridge, MA, USA
    270 Department of Surgery, Massachusetts General Hospital, Boston, MA, USA
    271 Harvard Medical School, Boston, MA, USA
    272 rdf:type schema:Organization
    273 grid-institutes:grid.411303.4 schema:alternateName Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
    274 schema:name Broad Institute of MIT and Harvard, Cambridge, MA, USA
    275 Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
    276 rdf:type schema:Organization
    277 grid-institutes:grid.419725.c schema:alternateName Informatics and Systems Department, Division of Engineering Research, National Research Centre, Giza, Egypt
    278 schema:name Informatics and Systems Department, Division of Engineering Research, National Research Centre, Giza, Egypt
    279 rdf:type schema:Organization
    280 grid-institutes:grid.438526.e schema:alternateName Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
    281 schema:name Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
    282 rdf:type schema:Organization
    283 grid-institutes:grid.65499.37 schema:alternateName Dana-Farber Cancer Institute, Boston, MA, USA
    284 schema:name Broad Institute of MIT and Harvard, Cambridge, MA, USA
    285 Dana-Farber Cancer Institute, Boston, MA, USA
    286 Department of Systems and Computer Engineering, Al-Azhar University, Cairo, Egypt
    287 rdf:type schema:Organization
    288 grid-institutes:grid.66859.34 schema:alternateName Broad Institute of MIT and Harvard, Cambridge, MA, USA
    289 schema:name Broad Institute of MIT and Harvard, Cambridge, MA, USA
    290 rdf:type schema:Organization
     



