ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2019-12

AUTHORS

Anghong Xiao, Zongze Wu, Shoubin Dong

ABSTRACT

BACKGROUND: Advances in next-generation sequencing enable higher throughput at lower cost. As the basis of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant callers suffer from serious computation bottlenecks, which produce long-tail tasks on large datasets. These bottlenecks prevent high scalability on multi-node, multi-core clusters and lead to long runtimes and inefficient use of computing resources. A highly scalable tool that runs in a distributed environment would therefore be very useful for accelerating variant calling on large-scale genome data.

RESULTS: In this paper, we present ADS-HCSpark, a scalable variant calling tool based on the Apache Spark framework. ADS-HCSpark accelerates variant calling by parallelizing the mainstream GATK HaplotypeCaller algorithm across multiple cores and multiple nodes. To address computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on it is implemented, which achieves good scalability in both single-node and multi-node settings. Because adjacent data blocks must have overlapping boundaries, the Hadoop-BAM library is customized to partition a BAM file into overlapping blocks, further improving the accuracy of variant calling.

CONCLUSIONS: ADS-HCSpark is a scalable variant calling tool based on the Apache Spark framework that parallelizes the GATK HaplotypeCaller algorithm. Evaluated on our cluster, at the best performance achievable on this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller in single-node experiments, and 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA in multi-node experiments, with better scalability and an accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git.

PAGES

76

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0

DOI

http://dx.doi.org/10.1186/s12859-019-2665-0

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1112138957

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/30764760


Indexing Status: Check whether this publication has been indexed by Scopus and Web of Science using the SN Indexing Status Tool.
Incoming Citations: Browse incoming citations for this publication using opencitations.net.

JSON-LD is the canonical representation for SciGraph data.

TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground or Google SDTT.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Genetic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genetic Variation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Haplotypes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "High-Throughput Nucleotide Sequencing", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Time Factors", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Xiao", 
        "givenName": "Anghong", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wu", 
        "givenName": "Zongze", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Dong", 
        "givenName": "Shoubin", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/bioinformatics/bts054", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009109395"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-014-0577-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014312908", 
          "https://doi.org/10.1186/s13059-014-0577-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-s12-s2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017679645", 
          "https://doi.org/10.1186/1471-2105-11-s12-s2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/srep17875", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019760953", 
          "https://doi.org/10.1038/srep17875"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btp352", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023014918"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btu345", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023110507"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.107524.110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032096953"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.virusres.2016.08.004", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034208611"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btv179", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039048989"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ncomms7275", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041805093", 
          "https://doi.org/10.1038/ncomms7275"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-s12-s1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044778751", 
          "https://doi.org/10.1186/1471-2105-11-s12-s1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2934664", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052134007"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/3020078.3021749", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1084677294"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/3107411.3107438", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091243633"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/bibm.2016.7822584", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094603751"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2019-12", 
    "datePublishedReg": "2019-12-01", 
    "description": "BACKGROUND: The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious problem of computation bottlenecks, resulting in some long tail tasks when performing on large datasets. This prevents high scalability on clusters of multi-node and multi-core, and leads to long runtime and inefficient usage of computing resources. Thus, a high scalable tool which could run in distributed environment will be highly useful to accelerate variant calling on large scale genome data.\nRESULTS: In this paper, we present ADS-HCSpark, a scalable tool for variant calling based on Apache Spark framework. ADS-HCSpark accelerates the process of variant calling by implementing the parallelization of mainstream GATK HaplotypeCaller algorithm on multi-core and multi-node. Aiming at solving the problem of computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on adaptive data segmentation is implemented, which achieves good scalability on both single-node and multi-node. For the requirement that adjacent data blocks should have overlapped boundaries, Hadoop-BAM library is customized to implement partitioning BAM file into overlapped blocks, further improving the accuracy of variant calling.\nCONCLUSIONS: ADS-HCSpark is a scalable tool to achieve variant calling based on Apache Spark framework, implementing the parallelization of GATK HaplotypeCaller algorithm. ADS-HCSpark is evaluated on our cluster and in the case of best performance that could be achieved in this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller on single-node experiments, 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA on multi-node experiments, with better scalability and the accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git .", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12859-019-2665-0", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "20"
      }
    ], 
    "name": "ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark", 
    "pagination": "76", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "d515645e39a558cd7ad10f5e538f8f44e670896216abe3673d23676b2b8b4dc1"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "30764760"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12859-019-2665-0"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1112138957"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12859-019-2665-0", 
      "https://app.dimensions.ai/details/publication/pub.1112138957"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T12:10", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000361_0000000361/records_53977_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2Fs12859-019-2665-0"
  }
]
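
As a rough illustration of how a record with this shape might be consumed, the following Python sketch pulls the title, author names and identifiers out of it. The `summarize` helper and the trimmed `SAMPLE` record are hypothetical and exist only for this example; the field names (`name`, `author`, `productId`) are the ones used in the record above.

```python
import json

# Trimmed copy of the record above; only the fields read below are kept.
SAMPLE = json.loads("""
[
  {
    "name": "ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark",
    "author": [
      {"familyName": "Xiao", "givenName": "Anghong", "type": "Person"},
      {"familyName": "Wu", "givenName": "Zongze", "type": "Person"},
      {"familyName": "Dong", "givenName": "Shoubin", "type": "Person"}
    ],
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30764760"]},
      {"name": "doi", "type": "PropertyValue", "value": ["10.1186/s12859-019-2665-0"]}
    ]
  }
]
""")


def summarize(records):
    """Return the title, author names and identifiers of the first record."""
    record = records[0]
    authors = [
        f'{a["givenName"]} {a["familyName"]}' for a in record.get("author", [])
    ]
    # productId is a list of {name, value: [...]} entries; flatten it to a dict.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    return {"title": record["name"], "authors": authors, "ids": ids}


summary = summarize(SAMPLE)
print(summary["authors"])
print(summary["ids"]["doi"])
```

The same function should work on the full record, since it only reads keys that are present there and ignores the rest.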
 

Download the RDF metadata as: json-ld, nt, turtle, xml.

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'
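
The curl commands above all rely on HTTP content negotiation: the URL stays the same and only the `Accept` header changes. A minimal Python sketch of the same idea follows, using only the standard library. The helper names `build_request` and `fetch_json_ld` and the `MEDIA_TYPES` table are assumptions made for this example; the base URL and media types are copied from the curl commands, and whether the service still answers at that address is not something this sketch can guarantee.

```python
import json
import urllib.request

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl commands above, keyed by a short format name.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt="json-ld"):
    """Build a GET request that asks for the record in the given RDF format."""
    return urllib.request.Request(
        BASE_URL + record_id,
        headers={"Accept": MEDIA_TYPES[fmt]},
    )


def fetch_json_ld(record_id, timeout=30):
    """Download and parse the JSON-LD form of a record (needs network access)."""
    with urllib.request.urlopen(build_request(record_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network, so it is safe to inspect.
request = build_request("pub.10.1186/s12859-019-2665-0", "turtle")
print(request.full_url, request.get_header("Accept"))
```

Calling `fetch_json_ld("pub.10.1186/s12859-019-2665-0")` would perform the actual download, corresponding to the first curl command.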


 

This table displays all metadata directly associated to this object as RDF triples.

169 TRIPLES      21 PREDICATES      54 URIs      31 LITERALS      19 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12859-019-2665-0 schema:about N09d2c212feed48a1a2d3f517f51dd581
2 N3920e3b5f26b42da9002c30e1580700a
3 N3c86b499c69b4c74896763c941696bc8
4 N498ad2b4a08d44a5b4e7a0a9c716c776
5 N4e35cd820e6b415e92070fc78938521f
6 N5c3a5470fb33412fb4be37a490402dea
7 N6193eace16ce4033adb28b4ba1138db9
8 N8293bbf0a4c04ec0b38f1b523c2cba87
9 N8bb8bcd4c3be42f0aca8548d28d1dc61
10 Nee7ca44fa85043429beb49cc20ccb330
11 anzsrc-for:08
12 anzsrc-for:0801
13 schema:author N097321d8f41043bfa92ea4fd13a1d1ad
14 schema:citation sg:pub.10.1038/ncomms7275
15 sg:pub.10.1038/srep17875
16 sg:pub.10.1186/1471-2105-11-s12-s1
17 sg:pub.10.1186/1471-2105-11-s12-s2
18 sg:pub.10.1186/s13059-014-0577-x
19 https://doi.org/10.1016/j.virusres.2016.08.004
20 https://doi.org/10.1093/bioinformatics/btp352
21 https://doi.org/10.1093/bioinformatics/bts054
22 https://doi.org/10.1093/bioinformatics/btu345
23 https://doi.org/10.1093/bioinformatics/btv179
24 https://doi.org/10.1101/gr.107524.110
25 https://doi.org/10.1109/bibm.2016.7822584
26 https://doi.org/10.1145/2934664
27 https://doi.org/10.1145/3020078.3021749
28 https://doi.org/10.1145/3107411.3107438
29 schema:datePublished 2019-12
30 schema:datePublishedReg 2019-12-01
31 schema:description BACKGROUND: The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious problem of computation bottlenecks, resulting in some long tail tasks when performing on large datasets. This prevents high scalability on clusters of multi-node and multi-core, and leads to long runtime and inefficient usage of computing resources. Thus, a high scalable tool which could run in distributed environment will be highly useful to accelerate variant calling on large scale genome data. RESULTS: In this paper, we present ADS-HCSpark, a scalable tool for variant calling based on Apache Spark framework. ADS-HCSpark accelerates the process of variant calling by implementing the parallelization of mainstream GATK HaplotypeCaller algorithm on multi-core and multi-node. Aiming at solving the problem of computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on adaptive data segmentation is implemented, which achieves good scalability on both single-node and multi-node. For the requirement that adjacent data blocks should have overlapped boundaries, Hadoop-BAM library is customized to implement partitioning BAM file into overlapped blocks, further improving the accuracy of variant calling. CONCLUSIONS: ADS-HCSpark is a scalable tool to achieve variant calling based on Apache Spark framework, implementing the parallelization of GATK HaplotypeCaller algorithm. ADS-HCSpark is evaluated on our cluster and in the case of best performance that could be achieved in this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller on single-node experiments, 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA on multi-node experiments, with better scalability and the accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git .
32 schema:genre research_article
33 schema:inLanguage en
34 schema:isAccessibleForFree true
35 schema:isPartOf N8cc85a0d72f84270b25d1c8c097f5de2
36 Nd78e4386083c4470b56db211f8feef84
37 sg:journal.1023786
38 schema:name ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark
39 schema:pagination 76
40 schema:productId N055d3ac17dff46f2907f421a3fed1058
41 N27ddb83dec8b41c5adf3dfcf07f1bff0
42 N8b8c1f5f4be74ac6b8f465c18bf5b9e9
43 Nb3286cbedc3f434181f0114aacb3e206
44 Nc09ab73e63a74bf3a00f627e19b31f22
45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112138957
46 https://doi.org/10.1186/s12859-019-2665-0
47 schema:sdDatePublished 2019-04-11T12:10
48 schema:sdLicense https://scigraph.springernature.com/explorer/license/
49 schema:sdPublisher Nbbe6e74392d2422ca0d675d3834a981c
50 schema:url https://link.springer.com/10.1186%2Fs12859-019-2665-0
51 sgo:license sg:explorer/license/
52 sgo:sdDataset articles
53 rdf:type schema:ScholarlyArticle
54 N055d3ac17dff46f2907f421a3fed1058 schema:name dimensions_id
55 schema:value pub.1112138957
56 rdf:type schema:PropertyValue
57 N097321d8f41043bfa92ea4fd13a1d1ad rdf:first Nd6afa3f5b9ac454ab8ef65da2ac3d46f
58 rdf:rest N2c40f188b5a24722b4df0c4857026893
59 N09d2c212feed48a1a2d3f517f51dd581 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
60 schema:name High-Throughput Nucleotide Sequencing
61 rdf:type schema:DefinedTerm
62 N27ddb83dec8b41c5adf3dfcf07f1bff0 schema:name doi
63 schema:value 10.1186/s12859-019-2665-0
64 rdf:type schema:PropertyValue
65 N2c40f188b5a24722b4df0c4857026893 rdf:first N9c2781f47bd64258a10b5a8056e25d1d
66 rdf:rest N6f4fa2cb321a42b6a6a87866de53d45c
67 N3920e3b5f26b42da9002c30e1580700a schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
68 schema:name Algorithms
69 rdf:type schema:DefinedTerm
70 N3c86b499c69b4c74896763c941696bc8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
71 schema:name Genetic Variation
72 rdf:type schema:DefinedTerm
73 N498ad2b4a08d44a5b4e7a0a9c716c776 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
74 schema:name Humans
75 rdf:type schema:DefinedTerm
76 N4e35cd820e6b415e92070fc78938521f schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
77 schema:name Databases, Genetic
78 rdf:type schema:DefinedTerm
79 N5c3a5470fb33412fb4be37a490402dea schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
80 schema:name Time Factors
81 rdf:type schema:DefinedTerm
82 N6193eace16ce4033adb28b4ba1138db9 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
83 schema:name Genome
84 rdf:type schema:DefinedTerm
85 N6f4fa2cb321a42b6a6a87866de53d45c rdf:first N8143f192235a43b887fae91444cec453
86 rdf:rest rdf:nil
87 N8143f192235a43b887fae91444cec453 schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
88 schema:familyName Dong
89 schema:givenName Shoubin
90 rdf:type schema:Person
91 N8293bbf0a4c04ec0b38f1b523c2cba87 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
92 schema:name Sequence Analysis, DNA
93 rdf:type schema:DefinedTerm
94 N8b8c1f5f4be74ac6b8f465c18bf5b9e9 schema:name pubmed_id
95 schema:value 30764760
96 rdf:type schema:PropertyValue
97 N8bb8bcd4c3be42f0aca8548d28d1dc61 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
98 schema:name Haplotypes
99 rdf:type schema:DefinedTerm
100 N8cc85a0d72f84270b25d1c8c097f5de2 schema:volumeNumber 20
101 rdf:type schema:PublicationVolume
102 N9c2781f47bd64258a10b5a8056e25d1d schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
103 schema:familyName Wu
104 schema:givenName Zongze
105 rdf:type schema:Person
106 Nb3286cbedc3f434181f0114aacb3e206 schema:name readcube_id
107 schema:value d515645e39a558cd7ad10f5e538f8f44e670896216abe3673d23676b2b8b4dc1
108 rdf:type schema:PropertyValue
109 Nbbe6e74392d2422ca0d675d3834a981c schema:name Springer Nature - SN SciGraph project
110 rdf:type schema:Organization
111 Nc09ab73e63a74bf3a00f627e19b31f22 schema:name nlm_unique_id
112 schema:value 100965194
113 rdf:type schema:PropertyValue
114 Nd6afa3f5b9ac454ab8ef65da2ac3d46f schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
115 schema:familyName Xiao
116 schema:givenName Anghong
117 rdf:type schema:Person
118 Nd78e4386083c4470b56db211f8feef84 schema:issueNumber 1
119 rdf:type schema:PublicationIssue
120 Nee7ca44fa85043429beb49cc20ccb330 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
121 schema:name Software
122 rdf:type schema:DefinedTerm
123 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
124 schema:name Information and Computing Sciences
125 rdf:type schema:DefinedTerm
126 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
127 schema:name Artificial Intelligence and Image Processing
128 rdf:type schema:DefinedTerm
129 sg:journal.1023786 schema:issn 1471-2105
130 schema:name BMC Bioinformatics
131 rdf:type schema:Periodical
132 sg:pub.10.1038/ncomms7275 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041805093
133 https://doi.org/10.1038/ncomms7275
134 rdf:type schema:CreativeWork
135 sg:pub.10.1038/srep17875 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019760953
136 https://doi.org/10.1038/srep17875
137 rdf:type schema:CreativeWork
138 sg:pub.10.1186/1471-2105-11-s12-s1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044778751
139 https://doi.org/10.1186/1471-2105-11-s12-s1
140 rdf:type schema:CreativeWork
141 sg:pub.10.1186/1471-2105-11-s12-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017679645
142 https://doi.org/10.1186/1471-2105-11-s12-s2
143 rdf:type schema:CreativeWork
144 sg:pub.10.1186/s13059-014-0577-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1014312908
145 https://doi.org/10.1186/s13059-014-0577-x
146 rdf:type schema:CreativeWork
147 https://doi.org/10.1016/j.virusres.2016.08.004 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034208611
148 rdf:type schema:CreativeWork
149 https://doi.org/10.1093/bioinformatics/btp352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023014918
150 rdf:type schema:CreativeWork
151 https://doi.org/10.1093/bioinformatics/bts054 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009109395
152 rdf:type schema:CreativeWork
153 https://doi.org/10.1093/bioinformatics/btu345 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023110507
154 rdf:type schema:CreativeWork
155 https://doi.org/10.1093/bioinformatics/btv179 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039048989
156 rdf:type schema:CreativeWork
157 https://doi.org/10.1101/gr.107524.110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032096953
158 rdf:type schema:CreativeWork
159 https://doi.org/10.1109/bibm.2016.7822584 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094603751
160 rdf:type schema:CreativeWork
161 https://doi.org/10.1145/2934664 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052134007
162 rdf:type schema:CreativeWork
163 https://doi.org/10.1145/3020078.3021749 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084677294
164 rdf:type schema:CreativeWork
165 https://doi.org/10.1145/3107411.3107438 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091243633
166 rdf:type schema:CreativeWork
167 https://www.grid.ac/institutes/grid.79703.3a schema:alternateName South China University of Technology
168 schema:name Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China
169 rdf:type schema:Organization
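
In the table above, the ordered author list is stored as an RDF collection: `schema:author` points at a blank node, and each node carries an `rdf:first` (the item) and an `rdf:rest` (the next node) until `rdf:nil` is reached. The Python sketch below walks that chain to recover the author order. The `TRIPLES` dictionary and the `walk_rdf_list` helper are hypothetical names for this example; the node identifiers and values are copied from rows 13, 57-58, 65-66, 85-86, 88, 103 and 115 of the table.

```python
RDF_NIL = "rdf:nil"
ARTICLE = "sg:pub.10.1186/s12859-019-2665-0"

# (subject, predicate) -> object, copied from the triples table above.
TRIPLES = {
    (ARTICLE, "schema:author"): "N097321d8f41043bfa92ea4fd13a1d1ad",
    ("N097321d8f41043bfa92ea4fd13a1d1ad", "rdf:first"): "Nd6afa3f5b9ac454ab8ef65da2ac3d46f",
    ("N097321d8f41043bfa92ea4fd13a1d1ad", "rdf:rest"): "N2c40f188b5a24722b4df0c4857026893",
    ("N2c40f188b5a24722b4df0c4857026893", "rdf:first"): "N9c2781f47bd64258a10b5a8056e25d1d",
    ("N2c40f188b5a24722b4df0c4857026893", "rdf:rest"): "N6f4fa2cb321a42b6a6a87866de53d45c",
    ("N6f4fa2cb321a42b6a6a87866de53d45c", "rdf:first"): "N8143f192235a43b887fae91444cec453",
    ("N6f4fa2cb321a42b6a6a87866de53d45c", "rdf:rest"): RDF_NIL,
    ("Nd6afa3f5b9ac454ab8ef65da2ac3d46f", "schema:familyName"): "Xiao",
    ("N9c2781f47bd64258a10b5a8056e25d1d", "schema:familyName"): "Wu",
    ("N8143f192235a43b887fae91444cec453", "schema:familyName"): "Dong",
}


def walk_rdf_list(triples, head):
    """Follow rdf:first / rdf:rest from head until rdf:nil; return the items in order."""
    items = []
    node = head
    while node != RDF_NIL:
        items.append(triples[(node, "rdf:first")])
        node = triples[(node, "rdf:rest")]
    return items


author_nodes = walk_rdf_list(TRIPLES, TRIPLES[(ARTICLE, "schema:author")])
family_names = [TRIPLES[(node, "schema:familyName")] for node in author_nodes]
print(family_names)
```

A plain dictionary is enough here because each (subject, predicate) pair in this excerpt has a single object; a real RDF library would be the better choice for the full graph.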
 



