MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2008-09-01

AUTHORS

Andrew Chatr-aryamontri, Samuel Kerrien, Jyoti Khadake, Sandra Orchard, Arnaud Ceol, Luana Licata, Luisa Castagnoli, Stefano Costa, Cathy Derow, Rachael Huntley, Bruno Aranda, Catherine Leroy, Dave Thorneycroft, Rolf Apweiler, Gianni Cesareni, Henning Hermjakob

ABSTRACT

Background: In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time-consuming, and the coverage of published interaction data is therefore far from complete. Using text-mining tools to identify relevant publications and to assist in the initial information extraction could improve the efficiency of the curation process and, as a consequence, increase database coverage of the data available in the literature. The 2006 BioCreative competition aimed to evaluate text-mining procedures against manual annotation of protein-protein interactions.

Results: To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from the two databases are comparable because they were curated according to the same standards. During manual curation, the major cause of data loss when mining articles for information was ambiguity in mapping gene names to stable UniProtKB database identifiers. It was also observed that most of the interaction information was contained only in the full text of the publication; hence, text mining of protein-protein interaction data will require analysis of the full text of articles and cannot be restricted to the abstract.

Conclusion: The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, the databases will highlight the sentences within articles that describe the interactions, supplying data miners with a high-quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which both databases use to annotate their data content.

PAGES

s5


Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/gb-2008-9-s2-s5

DOI

http://dx.doi.org/10.1186/gb-2008-9-s2-s5

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1022366302

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/18834496



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0806", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information Systems", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Bibliographic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Protein Interaction Mapping", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Proteomics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Societies, Scientific", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Vocabulary, Controlled", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Chatr-aryamontri", 
        "givenName": "Andrew", 
        "id": "sg:person.01130104722.77", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01130104722.77"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Kerrien", 
        "givenName": "Samuel", 
        "id": "sg:person.01250435717.21", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01250435717.21"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Khadake", 
        "givenName": "Jyoti", 
        "id": "sg:person.01021407513.17", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01021407513.17"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Orchard", 
        "givenName": "Sandra", 
        "id": "sg:person.01067522713.08", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01067522713.08"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ceol", 
        "givenName": "Arnaud", 
        "id": "sg:person.01076760745.48", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01076760745.48"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Licata", 
        "givenName": "Luana", 
        "id": "sg:person.0766155045.13", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0766155045.13"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Castagnoli", 
        "givenName": "Luisa", 
        "id": "sg:person.01174107367.97", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01174107367.97"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Costa", 
        "givenName": "Stefano", 
        "id": "sg:person.01307040043.10", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01307040043.10"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Derow", 
        "givenName": "Cathy", 
        "id": "sg:person.01171460461.45", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01171460461.45"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Huntley", 
        "givenName": "Rachael", 
        "id": "sg:person.01120775354.41", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01120775354.41"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Aranda", 
        "givenName": "Bruno", 
        "id": "sg:person.01316551117.75", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01316551117.75"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Leroy", 
        "givenName": "Catherine", 
        "id": "sg:person.0757135413.69", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0757135413.69"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Thorneycroft", 
        "givenName": "Dave", 
        "id": "sg:person.01025250613.49", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01025250613.49"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Apweiler", 
        "givenName": "Rolf", 
        "id": "sg:person.01134215603.98", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01134215603.98"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy", 
          "id": "http://www.grid.ac/institutes/grid.7841.a", 
          "name": [
            "Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Cesareni", 
        "givenName": "Gianni", 
        "id": "sg:person.0703271022.11", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0703271022.11"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK", 
          "id": "http://www.grid.ac/institutes/grid.225360.0", 
          "name": [
            "EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Hermjakob", 
        "givenName": "Henning", 
        "id": "sg:person.01070655672.90", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01070655672.90"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nbt0307-262b", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014250456", 
          "https://doi.org/10.1038/nbt0307-262b"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nbt1324", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032491066", 
          "https://doi.org/10.1038/nbt1324"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1741-7007-5-44", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1035046201", 
          "https://doi.org/10.1186/1741-7007-5-44"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2008-09-01", 
    "datePublishedReg": "2008-09-01", 
    "description": "BackgroundIn the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.ResultsTo aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.ConclusionThe development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. \nFurthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/gb-2008-9-s2-s5", 
    "isAccessibleForFree": true, 
    "isFundedItemOf": [
      {
        "id": "sg:grant.3762696", 
        "type": "MonetaryGrant"
      }
    ], 
    "isPartOf": [
      {
        "id": "sg:journal.1023439", 
        "issn": [
          "1474-760X", 
          "1465-6906"
        ], 
        "name": "Genome Biology", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "Suppl 2", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "9"
      }
    ], 
    "keywords": [
      "text mining community", 
      "text mining tools", 
      "manual annotation", 
      "curation process", 
      "interaction data", 
      "protein-protein interaction task", 
      "text mining procedures", 
      "manual curation process", 
      "dictionary of terms", 
      "molecular interaction data", 
      "information extraction", 
      "data content", 
      "high-quality datasets", 
      "BioCreative challenge", 
      "text mining", 
      "data loss", 
      "database identifiers", 
      "PSI-MI", 
      "test dataset", 
      "interaction information", 
      "algorithm development", 
      "quality datasets", 
      "interaction tasks", 
      "protein-protein interaction data", 
      "protein-protein interaction information", 
      "synonym lists", 
      "manual curation", 
      "annotation", 
      "biological data", 
      "gene names", 
      "dataset", 
      "database coverage", 
      "information", 
      "database", 
      "mining", 
      "curation", 
      "tool", 
      "identifiers", 
      "task", 
      "literature coverage", 
      "dictionary", 
      "pipeline", 
      "data", 
      "vocabulary", 
      "coverage", 
      "protein-protein interactions", 
      "extraction", 
      "sentences", 
      "challenges", 
      "mapping", 
      "same standards", 
      "training", 
      "ambiguity", 
      "process", 
      "efficiency", 
      "community", 
      "list", 
      "standards", 
      "article", 
      "development", 
      "competitors", 
      "relevant publications", 
      "ResultsTo", 
      "name", 
      "publications", 
      "terms", 
      "time", 
      "interaction", 
      "use", 
      "literature", 
      "competition", 
      "content", 
      "comparison", 
      "analysis", 
      "Intact", 
      "procedure", 
      "mint", 
      "ConclusionThe development", 
      "Abstract", 
      "absence", 
      "major cause", 
      "loss", 
      "consequences", 
      "cause", 
      "BackgroundIn"
    ], 
    "name": "MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data", 
    "pagination": "s5", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1022366302"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/gb-2008-9-s2-s5"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "18834496"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/gb-2008-9-s2-s5", 
      "https://app.dimensions.ai/details/publication/pub.1022366302"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-09-02T15:52", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220902/entities/gbq_results/article/article_457.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/gb-2008-9-s2-s5"
  }
]
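Because the record is compact JSON-LD, it can be read with an ordinary JSON parser when only a few fields are needed. The sketch below is a minimal illustration under stated assumptions: `RECORD` is a trimmed excerpt of the record above (only the first two and the last of the sixteen authors are kept, plus the identifiers), and `summarize` is a hypothetical helper, not part of any SciGraph tooling. It matches keys by their compact names and does not expand the `@context`, so a full JSON-LD processor would be needed for anything beyond this kind of field lookup.

```python
import json

# Trimmed excerpt of the JSON-LD record above (assumption: most fields and
# 13 of the 16 authors are omitted for brevity; kept values are verbatim).
RECORD = """
[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
    "author": [
      {"familyName": "Chatr-aryamontri", "givenName": "Andrew",
       "id": "sg:person.01130104722.77", "type": "Person"},
      {"familyName": "Kerrien", "givenName": "Samuel",
       "id": "sg:person.01250435717.21", "type": "Person"},
      {"familyName": "Hermjakob", "givenName": "Henning",
       "id": "sg:person.01070655672.90", "type": "Person"}
    ],
    "datePublished": "2008-09-01",
    "productId": [
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1022366302"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/gb-2008-9-s2-s5"]},
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["18834496"]}
    ],
    "type": "ScholarlyArticle"
  }
]
"""


def summarize(jsonld_text: str) -> dict:
    """Pull a few fields out of a SciGraph-style JSON-LD array (hypothetical helper).

    Reads the document as plain JSON; the @context is not expanded, so keys
    are matched by their compact names only.
    """
    (article,) = json.loads(jsonld_text)  # the record is a one-element array
    # productId is a list of {name, value: [..]} pairs; flatten it to a dict.
    ids = {p["name"]: p["value"][0] for p in article.get("productId", [])}
    # JSON arrays keep their order, so author order is preserved here.
    authors = [f'{a["givenName"]} {a["familyName"]}' for a in article.get("author", [])]
    return {
        "type": article["type"],
        "date": article["datePublished"],
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "authors": authors,
    }


info = summarize(RECORD)
print(info["doi"], info["pubmed_id"])
print(", ".join(info["authors"]))
```

Flattening `productId` into a dictionary keyed by `name` avoids depending on the order in which the identifiers are listed.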
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular JSON-based format for linked data; every JSON-LD document is also valid JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/gb-2008-9-s2-s5'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/gb-2008-9-s2-s5'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/gb-2008-9-s2-s5'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/gb-2008-9-s2-s5'
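The four curl commands differ only in the `Accept` header, which is how the server selects the serialization (HTTP content negotiation). A minimal Python equivalent using only the standard library might look like the sketch below. The names `MEDIA_TYPES`, `build_request`, and `fetch`, and the short format labels used as dictionary keys, are hypothetical; the media types and base URL are copied from the commands above. Whether the SciGraph endpoint still answers these requests is an assumption that has not been checked here, so the example only builds the request and leaves the network call to `fetch`.

```python
import urllib.request

# Accept header per serialization, copied from the curl examples above.
# The dictionary keys are illustrative labels, not names used by SciGraph.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}

BASE = "https://scigraph.springernature.com/"


def build_request(record_id: str, fmt: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiated GET request for a record."""
    return urllib.request.Request(BASE + record_id, headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(record_id: str, fmt: str = "json-ld", timeout: float = 30.0) -> str:
    """Send the request and return the body as text (requires network access)."""
    with urllib.request.urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("pub.10.1186/gb-2008-9-s2-s5", "turtle")
print(req.full_url, req.get_header("Accept"))
```

Separating request construction from sending keeps the header logic testable offline; calling `fetch("pub.10.1186/gb-2008-9-s2-s5", "n-triples")` would perform the same request as the second curl command.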


 

This table displays all metadata directly associated with this object as RDF triples.

296 TRIPLES      21 PREDICATES      119 URIs      107 LITERALS      13 BLANK NODES
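In the triples table, `schema:author` does not point at people directly: it points at a blank node that heads an RDF collection, a linked list in which each node's `rdf:first` is one author and `rdf:rest` is the next node, ending at `rdf:nil`. Author order is therefore recovered by walking that chain. The sketch below copies the sixteen `rdf:first`/`rdf:rest` pairs from the table into a dictionary (an assumption for illustration; blank-node labels such as `N75cdf…` are generated per export and should not be treated as stable identifiers) and follows them from the list head. `walk_collection` is a hypothetical helper name.

```python
RDF_NIL = "rdf:nil"

# blank node -> (rdf:first, rdf:rest), copied from the triples table.
AUTHOR_LIST = {
    "N75cdf424f34e4f1699b59d6e90df81c2": ("sg:person.01130104722.77", "Nd2104873eb684b4a97cd8d23465cf874"),
    "Nd2104873eb684b4a97cd8d23465cf874": ("sg:person.01250435717.21", "Nab0e5a6b6649458daebe79813dd8298a"),
    "Nab0e5a6b6649458daebe79813dd8298a": ("sg:person.01021407513.17", "Nfa10434b5b4848929d1548dbe8582878"),
    "Nfa10434b5b4848929d1548dbe8582878": ("sg:person.01067522713.08", "N2ebd7c382c574fde9a6e3fbeb064019e"),
    "N2ebd7c382c574fde9a6e3fbeb064019e": ("sg:person.01076760745.48", "N23ec253d431d4525b990c6233e259eab"),
    "N23ec253d431d4525b990c6233e259eab": ("sg:person.0766155045.13", "Ndfc0d70dfb874630ac27102dec66aff1"),
    "Ndfc0d70dfb874630ac27102dec66aff1": ("sg:person.01174107367.97", "N64026f72b10a4d5eb38fd8bb5f57c554"),
    "N64026f72b10a4d5eb38fd8bb5f57c554": ("sg:person.01307040043.10", "N1bf24bce5ef84e2eb70870365b1757e4"),
    "N1bf24bce5ef84e2eb70870365b1757e4": ("sg:person.01171460461.45", "Nf2b8ceb6168d487d84e9f623172a35fb"),
    "Nf2b8ceb6168d487d84e9f623172a35fb": ("sg:person.01120775354.41", "N466f52e03f4440c6b7239bc85adb0cac"),
    "N466f52e03f4440c6b7239bc85adb0cac": ("sg:person.01316551117.75", "N449017fe085f41469f1a5693306b5bc8"),
    "N449017fe085f41469f1a5693306b5bc8": ("sg:person.0757135413.69", "Nee7def35b9a24e3e9ab710ec9c78dfe5"),
    "Nee7def35b9a24e3e9ab710ec9c78dfe5": ("sg:person.01025250613.49", "N985a0facfa8f4c3a89a00cd42ee24823"),
    "N985a0facfa8f4c3a89a00cd42ee24823": ("sg:person.01134215603.98", "Neb48e169c0b64a88bd06d9dcdc72db0d"),
    "Neb48e169c0b64a88bd06d9dcdc72db0d": ("sg:person.0703271022.11", "N822fb1a4db624b44885a996a6475bb5d"),
    "N822fb1a4db624b44885a996a6475bb5d": ("sg:person.01070655672.90", RDF_NIL),
}


def walk_collection(head: str, links: dict) -> list:
    """Follow rdf:first/rdf:rest from `head` until rdf:nil; return items in list order."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against malformed (cyclic) lists
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, node = links[node]  # record this item, then advance to rdf:rest
        items.append(first)
    return items


# The list head is the object of schema:author in the first rows of the table.
authors = walk_collection("N75cdf424f34e4f1699b59d6e90df81c2", AUTHOR_LIST)
print(len(authors), authors[0], authors[-1])
```

The recovered order matches the `author` array in the JSON-LD record: Chatr-aryamontri first, Hermjakob last.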

Subject Predicate Object
1 sg:pub.10.1186/gb-2008-9-s2-s5 schema:about N395845644c594472998d1d5124bb887f
2 N566e1a8a57b54cc2a22f975471df6670
3 N7dad5aa357d74feb8c694ea95b4e074a
4 Nb022e55ba0774ec2834f38b171481f4d
5 Nc626dbaf14fc4cc5ad59297bef1bcd25
6 Nf84088901b5442a2849eb8dc837a0883
7 anzsrc-for:08
8 anzsrc-for:0801
9 anzsrc-for:0806
10 schema:author N75cdf424f34e4f1699b59d6e90df81c2
11 schema:citation sg:pub.10.1038/nbt0307-262b
12 sg:pub.10.1038/nbt1324
13 sg:pub.10.1186/1741-7007-5-44
14 schema:datePublished 2008-09-01
15 schema:datePublishedReg 2008-09-01
16 schema:description BackgroundIn the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.ResultsTo aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.ConclusionThe development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.
17 schema:genre article
18 schema:isAccessibleForFree true
19 schema:isPartOf N52c54f8b977d4627bfe666a1c8d37e3e
20 Nb86946e03d454011855ff97f63f3ee33
21 sg:journal.1023439
22 schema:keywords Abstract
23 BackgroundIn
24 BioCreative challenge
25 ConclusionThe development
26 Intact
27 PSI-MI
28 ResultsTo
29 absence
30 algorithm development
31 ambiguity
32 analysis
33 annotation
34 article
35 biological data
36 cause
37 challenges
38 community
39 comparison
40 competition
41 competitors
42 consequences
43 content
44 coverage
45 curation
46 curation process
47 data
48 data content
49 data loss
50 database
51 database coverage
52 database identifiers
53 dataset
54 development
55 dictionary
56 dictionary of terms
57 efficiency
58 extraction
59 gene names
60 high-quality datasets
61 identifiers
62 information
63 information extraction
64 interaction
65 interaction data
66 interaction information
67 interaction tasks
68 list
69 literature
70 literature coverage
71 loss
72 major cause
73 manual annotation
74 manual curation
75 manual curation process
76 mapping
77 mining
78 mint
79 molecular interaction data
80 name
81 pipeline
82 procedure
83 process
84 protein-protein interaction data
85 protein-protein interaction information
86 protein-protein interaction task
87 protein-protein interactions
88 publications
89 quality datasets
90 relevant publications
91 same standards
92 sentences
93 standards
94 synonym lists
95 task
96 terms
97 test dataset
98 text mining
99 text mining community
100 text mining procedures
101 text mining tools
102 time
103 tool
104 training
105 use
106 vocabulary
107 schema:name MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data
108 schema:pagination s5
109 schema:productId N5c97496aa8f0405ea1b88c246135bccc
110 N7209730f434c4ad2bd5818d3ac20d138
111 N99c0998c43034114bd3411a846c3548c
112 schema:sameAs https://app.dimensions.ai/details/publication/pub.1022366302
113 https://doi.org/10.1186/gb-2008-9-s2-s5
114 schema:sdDatePublished 2022-09-02T15:52
115 schema:sdLicense https://scigraph.springernature.com/explorer/license/
116 schema:sdPublisher Ndd2365c5e50c4b4fb61dfb3c98230502
117 schema:url https://doi.org/10.1186/gb-2008-9-s2-s5
118 sgo:license sg:explorer/license/
119 sgo:sdDataset articles
120 rdf:type schema:ScholarlyArticle
121 N1bf24bce5ef84e2eb70870365b1757e4 rdf:first sg:person.01171460461.45
122 rdf:rest Nf2b8ceb6168d487d84e9f623172a35fb
123 N23ec253d431d4525b990c6233e259eab rdf:first sg:person.0766155045.13
124 rdf:rest Ndfc0d70dfb874630ac27102dec66aff1
125 N2ebd7c382c574fde9a6e3fbeb064019e rdf:first sg:person.01076760745.48
126 rdf:rest N23ec253d431d4525b990c6233e259eab
127 N395845644c594472998d1d5124bb887f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
128 schema:name Vocabulary, Controlled
129 rdf:type schema:DefinedTerm
130 N449017fe085f41469f1a5693306b5bc8 rdf:first sg:person.0757135413.69
131 rdf:rest Nee7def35b9a24e3e9ab710ec9c78dfe5
132 N466f52e03f4440c6b7239bc85adb0cac rdf:first sg:person.01316551117.75
133 rdf:rest N449017fe085f41469f1a5693306b5bc8
134 N52c54f8b977d4627bfe666a1c8d37e3e schema:issueNumber Suppl 2
135 rdf:type schema:PublicationIssue
136 N566e1a8a57b54cc2a22f975471df6670 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
137 schema:name Computational Biology
138 rdf:type schema:DefinedTerm
139 N5c97496aa8f0405ea1b88c246135bccc schema:name pubmed_id
140 schema:value 18834496
141 rdf:type schema:PropertyValue
142 N64026f72b10a4d5eb38fd8bb5f57c554 rdf:first sg:person.01307040043.10
143 rdf:rest N1bf24bce5ef84e2eb70870365b1757e4
144 N7209730f434c4ad2bd5818d3ac20d138 schema:name dimensions_id
145 schema:value pub.1022366302
146 rdf:type schema:PropertyValue
147 N75cdf424f34e4f1699b59d6e90df81c2 rdf:first sg:person.01130104722.77
148 rdf:rest Nd2104873eb684b4a97cd8d23465cf874
149 N7dad5aa357d74feb8c694ea95b4e074a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
150 schema:name Societies, Scientific
151 rdf:type schema:DefinedTerm
152 N822fb1a4db624b44885a996a6475bb5d rdf:first sg:person.01070655672.90
153 rdf:rest rdf:nil
154 N985a0facfa8f4c3a89a00cd42ee24823 rdf:first sg:person.01134215603.98
155 rdf:rest Neb48e169c0b64a88bd06d9dcdc72db0d
156 N99c0998c43034114bd3411a846c3548c schema:name doi
157 schema:value 10.1186/gb-2008-9-s2-s5
158 rdf:type schema:PropertyValue
159 Nab0e5a6b6649458daebe79813dd8298a rdf:first sg:person.01021407513.17
160 rdf:rest Nfa10434b5b4848929d1548dbe8582878
161 Nb022e55ba0774ec2834f38b171481f4d schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
162 schema:name Databases, Bibliographic
163 rdf:type schema:DefinedTerm
164 Nb86946e03d454011855ff97f63f3ee33 schema:volumeNumber 9
165 rdf:type schema:PublicationVolume
166 Nc626dbaf14fc4cc5ad59297bef1bcd25 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
167 schema:name Protein Interaction Mapping
168 rdf:type schema:DefinedTerm
169 Nd2104873eb684b4a97cd8d23465cf874 rdf:first sg:person.01250435717.21
170 rdf:rest Nab0e5a6b6649458daebe79813dd8298a
171 Ndd2365c5e50c4b4fb61dfb3c98230502 schema:name Springer Nature - SN SciGraph project
172 rdf:type schema:Organization
173 Ndfc0d70dfb874630ac27102dec66aff1 rdf:first sg:person.01174107367.97
174 rdf:rest N64026f72b10a4d5eb38fd8bb5f57c554
175 Neb48e169c0b64a88bd06d9dcdc72db0d rdf:first sg:person.0703271022.11
176 rdf:rest N822fb1a4db624b44885a996a6475bb5d
177 Nee7def35b9a24e3e9ab710ec9c78dfe5 rdf:first sg:person.01025250613.49
178 rdf:rest N985a0facfa8f4c3a89a00cd42ee24823
179 Nf2b8ceb6168d487d84e9f623172a35fb rdf:first sg:person.01120775354.41
180 rdf:rest N466f52e03f4440c6b7239bc85adb0cac
181 Nf84088901b5442a2849eb8dc837a0883 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
182 schema:name Proteomics
183 rdf:type schema:DefinedTerm
184 Nfa10434b5b4848929d1548dbe8582878 rdf:first sg:person.01067522713.08
185 rdf:rest N2ebd7c382c574fde9a6e3fbeb064019e
186 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
187 schema:name Information and Computing Sciences
188 rdf:type schema:DefinedTerm
189 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
190 schema:name Artificial Intelligence and Image Processing
191 rdf:type schema:DefinedTerm
192 anzsrc-for:0806 schema:inDefinedTermSet anzsrc-for:
193 schema:name Information Systems
194 rdf:type schema:DefinedTerm
195 sg:grant.3762696 http://pending.schema.org/fundedItem sg:pub.10.1186/gb-2008-9-s2-s5
196 rdf:type schema:MonetaryGrant
197 sg:journal.1023439 schema:issn 1465-6906
198 1474-760X
199 schema:name Genome Biology
200 schema:publisher Springer Nature
201 rdf:type schema:Periodical
202 sg:person.01021407513.17 schema:affiliation grid-institutes:grid.225360.0
203 schema:familyName Khadake
204 schema:givenName Jyoti
205 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01021407513.17
206 rdf:type schema:Person
207 sg:person.01025250613.49 schema:affiliation grid-institutes:grid.225360.0
208 schema:familyName Thorneycroft
209 schema:givenName Dave
210 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01025250613.49
211 rdf:type schema:Person
212 sg:person.01067522713.08 schema:affiliation grid-institutes:grid.225360.0
213 schema:familyName Orchard
214 schema:givenName Sandra
215 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01067522713.08
216 rdf:type schema:Person
217 sg:person.01070655672.90 schema:affiliation grid-institutes:grid.225360.0
218 schema:familyName Hermjakob
219 schema:givenName Henning
220 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01070655672.90
221 rdf:type schema:Person
222 sg:person.01076760745.48 schema:affiliation grid-institutes:grid.7841.a
223 schema:familyName Ceol
224 schema:givenName Arnaud
225 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01076760745.48
226 rdf:type schema:Person
227 sg:person.01120775354.41 schema:affiliation grid-institutes:grid.225360.0
228 schema:familyName Huntley
229 schema:givenName Rachael
230 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01120775354.41
231 rdf:type schema:Person
232 sg:person.01130104722.77 schema:affiliation grid-institutes:grid.7841.a
233 schema:familyName Chatr-aryamontri
234 schema:givenName Andrew
235 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01130104722.77
236 rdf:type schema:Person
237 sg:person.01134215603.98 schema:affiliation grid-institutes:grid.225360.0
238 schema:familyName Apweiler
239 schema:givenName Rolf
240 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01134215603.98
241 rdf:type schema:Person
242 sg:person.01171460461.45 schema:affiliation grid-institutes:grid.225360.0
243 schema:familyName Derow
244 schema:givenName Cathy
245 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01171460461.45
246 rdf:type schema:Person
247 sg:person.01174107367.97 schema:affiliation grid-institutes:grid.7841.a
248 schema:familyName Castagnoli
249 schema:givenName Luisa
250 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01174107367.97
251 rdf:type schema:Person
252 sg:person.01250435717.21 schema:affiliation grid-institutes:grid.225360.0
253 schema:familyName Kerrien
254 schema:givenName Samuel
255 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01250435717.21
256 rdf:type schema:Person
257 sg:person.01307040043.10 schema:affiliation grid-institutes:grid.7841.a
258 schema:familyName Costa
259 schema:givenName Stefano
260 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01307040043.10
261 rdf:type schema:Person
262 sg:person.01316551117.75 schema:affiliation grid-institutes:grid.225360.0
263 schema:familyName Aranda
264 schema:givenName Bruno
265 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01316551117.75
266 rdf:type schema:Person
267 sg:person.0703271022.11 schema:affiliation grid-institutes:grid.7841.a
268 schema:familyName Cesareni
269 schema:givenName Gianni
270 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0703271022.11
271 rdf:type schema:Person
272 sg:person.0757135413.69 schema:affiliation grid-institutes:grid.225360.0
273 schema:familyName Leroy
274 schema:givenName Catherine
275 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0757135413.69
276 rdf:type schema:Person
277 sg:person.0766155045.13 schema:affiliation grid-institutes:grid.7841.a
278 schema:familyName Licata
279 schema:givenName Luana
280 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0766155045.13
281 rdf:type schema:Person
282 sg:pub.10.1038/nbt0307-262b schema:sameAs https://app.dimensions.ai/details/publication/pub.1014250456
283 https://doi.org/10.1038/nbt0307-262b
284 rdf:type schema:CreativeWork
285 sg:pub.10.1038/nbt1324 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032491066
286 https://doi.org/10.1038/nbt1324
287 rdf:type schema:CreativeWork
288 sg:pub.10.1186/1741-7007-5-44 schema:sameAs https://app.dimensions.ai/details/publication/pub.1035046201
289 https://doi.org/10.1186/1741-7007-5-44
290 rdf:type schema:CreativeWork
291 grid-institutes:grid.225360.0 schema:alternateName EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK
292 schema:name EMBL - European Bioinformatics Institute, Wellcome Trust Genome Campus, CB10 1SD, Cambridge, UK
293 rdf:type schema:Organization
294 grid-institutes:grid.7841.a schema:alternateName Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy
295 schema:name Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133, Rome, Italy
296 rdf:type schema:Organization
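In this listing, author order does not appear as a property of each person. It appears to be encoded as an RDF collection: a chain of blank nodes in which `rdf:first` points to a person and `rdf:rest` points to the next node, with `rdf:nil` ending the chain. The sketch below shows how such a chain can be walked. It does not use an RDF library; it assumes a few triples from the listing above have been copied by hand into plain Python dicts. The dict layout and the function names `walk_rdf_list` and `author_names` are hypothetical. The node and person identifiers are taken verbatim from lines 136-281 of the listing.

```python
# Minimal sketch: walk an RDF collection (rdf:first / rdf:rest chain) held in plain dicts.
# Assumption: triples were transcribed by hand; no RDF parser is involved.

RDF_NIL = "rdf:nil"

# Blank node -> (rdf:first, rdf:rest), copied from the listing above (tail of the chain only).
list_nodes = {
    "Nee7def35b9a24e3e9ab710ec9c78dfe5": ("sg:person.01025250613.49", "N985a0facfa8f4c3a89a00cd42ee24823"),
    "N985a0facfa8f4c3a89a00cd42ee24823": ("sg:person.01134215603.98", "Neb48e169c0b64a88bd06d9dcdc72db0d"),
    "Neb48e169c0b64a88bd06d9dcdc72db0d": ("sg:person.0703271022.11", "N822fb1a4db624b44885a996a6475bb5d"),
    "N822fb1a4db624b44885a996a6475bb5d": ("sg:person.01070655672.90", RDF_NIL),
}

# Person node -> (schema:givenName, schema:familyName), from the same listing.
person_names = {
    "sg:person.01025250613.49": ("Dave", "Thorneycroft"),
    "sg:person.01134215603.98": ("Rolf", "Apweiler"),
    "sg:person.0703271022.11": ("Gianni", "Cesareni"),
    "sg:person.01070655672.90": ("Henning", "Hermjakob"),
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head until rdf:nil, collecting each rdf:first in order."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic chain
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


def author_names(head, nodes, names):
    """Resolve each person node in the chain to 'givenName familyName'."""
    return [" ".join(names[person]) for person in walk_rdf_list(head, nodes)]


print(author_names("Nee7def35b9a24e3e9ab710ec9c78dfe5", list_nodes, person_names))
# ['Dave Thorneycroft', 'Rolf Apweiler', 'Gianni Cesareni', 'Henning Hermjakob']
```

Starting from `Nee7def35b9a24e3e9ab710ec9c78dfe5`, the walk returns the last four names in the same order as the article's author line. The full chain would begin at whichever node the article's author property references. That node lies outside this part of the listing, so it is not reproduced here.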
 




...