ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2019-12

AUTHORS

Anghong Xiao, Zongze Wu, Shoubin Dong

ABSTRACT

BACKGROUND: The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious problem of computation bottlenecks, resulting in some long tail tasks when performing on large datasets. This prevents high scalability on clusters of multi-node and multi-core, and leads to long runtime and inefficient usage of computing resources. Thus, a high scalable tool which could run in distributed environment will be highly useful to accelerate variant calling on large scale genome data.

RESULTS: In this paper, we present ADS-HCSpark, a scalable tool for variant calling based on Apache Spark framework. ADS-HCSpark accelerates the process of variant calling by implementing the parallelization of mainstream GATK HaplotypeCaller algorithm on multi-core and multi-node. Aiming at solving the problem of computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on adaptive data segmentation is implemented, which achieves good scalability on both single-node and multi-node. For the requirement that adjacent data blocks should have overlapped boundaries, Hadoop-BAM library is customized to implement partitioning BAM file into overlapped blocks, further improving the accuracy of variant calling.

CONCLUSIONS: ADS-HCSpark is a scalable tool to achieve variant calling based on Apache Spark framework, implementing the parallelization of GATK HaplotypeCaller algorithm. ADS-HCSpark is evaluated on our cluster and in the case of best performance that could be achieved in this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller on single-node experiments, 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA on multi-node experiments, with better scalability and the accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git .

PAGES

76

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0

DOI

http://dx.doi.org/10.1186/s12859-019-2665-0

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1112138957

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/30764760



JSON-LD is the canonical representation for SciGraph data.


[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Algorithms", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Databases, Genetic", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genetic Variation", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Haplotypes", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "High-Throughput Nucleotide Sequencing", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Humans", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Time Factors", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Xiao", 
        "givenName": "Anghong", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Wu", 
        "givenName": "Zongze", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "South China University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.79703.3a", 
          "name": [
            "Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Dong", 
        "givenName": "Shoubin", 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1093/bioinformatics/bts054", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1009109395"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/s13059-014-0577-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1014312908", 
          "https://doi.org/10.1186/s13059-014-0577-x"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-s12-s2", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1017679645", 
          "https://doi.org/10.1186/1471-2105-11-s12-s2"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/srep17875", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1019760953", 
          "https://doi.org/10.1038/srep17875"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btp352", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023014918"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btu345", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023110507"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1101/gr.107524.110", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032096953"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/j.virusres.2016.08.004", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1034208611"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1093/bioinformatics/btv179", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1039048989"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/ncomms7275", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041805093", 
          "https://doi.org/10.1038/ncomms7275"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/1471-2105-11-s12-s1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1044778751", 
          "https://doi.org/10.1186/1471-2105-11-s12-s1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/2934664", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1052134007"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/3020078.3021749", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1084677294"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/3107411.3107438", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1091243633"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/bibm.2016.7822584", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094603751"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2019-12", 
    "datePublishedReg": "2019-12-01", 
    "description": "BACKGROUND: The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious problem of computation bottlenecks, resulting in some long tail tasks when performing on large datasets. This prevents high scalability on clusters of multi-node and multi-core, and leads to long runtime and inefficient usage of computing resources. Thus, a high scalable tool which could run in distributed environment will be highly useful to accelerate variant calling on large scale genome data.\nRESULTS: In this paper, we present ADS-HCSpark, a scalable tool for variant calling based on Apache Spark framework. ADS-HCSpark accelerates the process of variant calling by implementing the parallelization of mainstream GATK HaplotypeCaller algorithm on multi-core and multi-node. Aiming at solving the problem of computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on adaptive data segmentation is implemented, which achieves good scalability on both single-node and multi-node. For the requirement that adjacent data blocks should have overlapped boundaries, Hadoop-BAM library is customized to implement partitioning BAM file into overlapped blocks, further improving the accuracy of variant calling.\nCONCLUSIONS: ADS-HCSpark is a scalable tool to achieve variant calling based on Apache Spark framework, implementing the parallelization of GATK HaplotypeCaller algorithm. ADS-HCSpark is evaluated on our cluster and in the case of best performance that could be achieved in this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller on single-node experiments, 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA on multi-node experiments, with better scalability and the accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git .", 
    "genre": "research_article", 
    "id": "sg:pub.10.1186/s12859-019-2665-0", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "20"
      }
    ], 
    "name": "ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark", 
    "pagination": "76", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "d515645e39a558cd7ad10f5e538f8f44e670896216abe3673d23676b2b8b4dc1"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "30764760"
        ]
      }, 
      {
        "name": "nlm_unique_id", 
        "type": "PropertyValue", 
        "value": [
          "100965194"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/s12859-019-2665-0"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1112138957"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/s12859-019-2665-0", 
      "https://app.dimensions.ai/details/publication/pub.1112138957"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T12:10", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000361_0000000361/records_53977_00000001.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://link.springer.com/10.1186%2Fs12859-019-2665-0"
  }
]
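As a minimal sketch of how a record shaped like the one above could be read, the following Python parses a trimmed copy of the JSON-LD and pulls out a few bibliographic fields. The function name `summarize` and the keys of the returned dictionary are hypothetical, chosen only for illustration; the field names (`author`, `isPartOf`, `productId`, `pagination`) are taken from the record itself.

```python
import json

# Trimmed copy of the record above; only the fields used below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Xiao", "givenName": "Anghong", "type": "Person"},
      {"familyName": "Wu", "givenName": "Zongze", "type": "Person"},
      {"familyName": "Dong", "givenName": "Shoubin", "type": "Person"}
    ],
    "isPartOf": [
      {"id": "sg:journal.1023786", "issn": ["1471-2105"],
       "name": "BMC Bioinformatics", "type": "Periodical"},
      {"issueNumber": "1", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "20"}
    ],
    "name": "ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark",
    "pagination": "76",
    "productId": [
      {"name": "pubmed_id", "type": "PropertyValue", "value": ["30764760"]},
      {"name": "doi", "type": "PropertyValue",
       "value": ["10.1186/s12859-019-2665-0"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few bibliographic fields from one SciGraph-style record."""
    # productId is a list of name/value pairs; index it by name.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", [])}
    # isPartOf mixes journal, issue and volume; index it by type.
    parts = {p["type"]: p for p in record.get("isPartOf", [])}
    return {
        "title": record["name"],
        "authors": [
            f'{a["givenName"]} {a["familyName"]}'
            for a in record.get("author", [])
        ],
        "doi": ids.get("doi"),
        "pubmed_id": ids.get("pubmed_id"),
        "journal": parts.get("Periodical", {}).get("name"),
        "volume": parts.get("PublicationVolume", {}).get("volumeNumber"),
        "issue": parts.get("PublicationIssue", {}).get("issueNumber"),
        "pages": record.get("pagination"),
    }


# The top-level JSON value is a list holding a single record.
summary = summarize(json.loads(RECORD_JSON)[0])

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value}")
```

Indexing `productId` by name and `isPartOf` by type avoids relying on the order of entries in those lists, which the record does not appear to guarantee.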
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/s12859-019-2665-0'
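The four curl commands above differ only in the `Accept` header. A rough Python equivalent, using only the standard library, might look like the sketch below. The short format keys (`json-ld`, `nt`, `turtle`, `xml`) and the function names are hypothetical; the media types and URL pattern are the ones shown in the curl commands. Decoding the response as UTF-8 is an assumption, and the sketch does not check whether the endpoint is still being served.

```python
from urllib.request import Request, urlopen

BASE_URL = "https://scigraph.springernature.com/"

# Media types copied from the curl commands above; the keys are
# illustrative labels, not part of any SciGraph API.
MEDIA_TYPES = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(record_id, fmt):
    """Build (but do not send) a request asking for one RDF serialization."""
    return Request(BASE_URL + record_id,
                   headers={"Accept": MEDIA_TYPES[fmt]})


def fetch(record_id, fmt, timeout=30):
    """Send the request and return the body as text (UTF-8 assumed)."""
    with urlopen(build_request(record_id, fmt), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Performs a network call; requires the service to be reachable.
    print(fetch("pub.10.1186/s12859-019-2665-0", "turtle")[:500])
```

Keeping request construction separate from the network call makes the header logic easy to inspect without contacting the server.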


 

This table displays all metadata directly associated to this object as RDF triples.

169 TRIPLES      21 PREDICATES      54 URIs      31 LITERALS      19 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/s12859-019-2665-0 schema:about N047df63ccad44402b31eb6fb36da95ce
2 N0cc92fb6fdce4dc08a44a8e5c2162602
3 N0fc4f66abc6a46fdb25e8804bf1841a8
4 N3ba9cb7e371b48978878630c892a69fa
5 N6ec223e20b0046e88579ac7cbdc566d7
6 N9ace631ab1c84b7fa030092ef3ebcf3e
7 N9cb11e4314b04b22b6337f299c253d9c
8 Nc3a53bca484c46f1aa75e07aba4d34dc
9 Nca2a40e108a645069fa74030b9f8ef24
10 Ne2199c82c7ca42eb8c5b9d8f8c107cbc
11 anzsrc-for:08
12 anzsrc-for:0801
13 schema:author Na8885205eec34f87a7e09e424c30926f
14 schema:citation sg:pub.10.1038/ncomms7275
15 sg:pub.10.1038/srep17875
16 sg:pub.10.1186/1471-2105-11-s12-s1
17 sg:pub.10.1186/1471-2105-11-s12-s2
18 sg:pub.10.1186/s13059-014-0577-x
19 https://doi.org/10.1016/j.virusres.2016.08.004
20 https://doi.org/10.1093/bioinformatics/btp352
21 https://doi.org/10.1093/bioinformatics/bts054
22 https://doi.org/10.1093/bioinformatics/btu345
23 https://doi.org/10.1093/bioinformatics/btv179
24 https://doi.org/10.1101/gr.107524.110
25 https://doi.org/10.1109/bibm.2016.7822584
26 https://doi.org/10.1145/2934664
27 https://doi.org/10.1145/3020078.3021749
28 https://doi.org/10.1145/3107411.3107438
29 schema:datePublished 2019-12
30 schema:datePublishedReg 2019-12-01
31 schema:description BACKGROUND: The advance of next generation sequencing enables higher throughput with lower price, and as the basic of high-throughput sequencing data analysis, variant calling is widely used in disease research, clinical treatment and medicine research. However, current mainstream variant caller tools have a serious problem of computation bottlenecks, resulting in some long tail tasks when performing on large datasets. This prevents high scalability on clusters of multi-node and multi-core, and leads to long runtime and inefficient usage of computing resources. Thus, a high scalable tool which could run in distributed environment will be highly useful to accelerate variant calling on large scale genome data. RESULTS: In this paper, we present ADS-HCSpark, a scalable tool for variant calling based on Apache Spark framework. ADS-HCSpark accelerates the process of variant calling by implementing the parallelization of mainstream GATK HaplotypeCaller algorithm on multi-core and multi-node. Aiming at solving the problem of computation skew in HaplotypeCaller, a parallel strategy of adaptive data segmentation is proposed and a variant calling algorithm based on adaptive data segmentation is implemented, which achieves good scalability on both single-node and multi-node. For the requirement that adjacent data blocks should have overlapped boundaries, Hadoop-BAM library is customized to implement partitioning BAM file into overlapped blocks, further improving the accuracy of variant calling. CONCLUSIONS: ADS-HCSpark is a scalable tool to achieve variant calling based on Apache Spark framework, implementing the parallelization of GATK HaplotypeCaller algorithm. ADS-HCSpark is evaluated on our cluster and in the case of best performance that could be achieved in this experimental platform, ADS-HCSpark is 74% faster than GATK3.8 HaplotypeCaller on single-node experiments, 57% faster than GATK4.0 HaplotypeCallerSpark and 27% faster than SparkGA on multi-node experiments, with better scalability and the accuracy of over 99%. The source code of ADS-HCSpark is publicly available at https://github.com/SCUT-CCNL/ADS-HCSpark.git .
32 schema:genre research_article
33 schema:inLanguage en
34 schema:isAccessibleForFree true
35 schema:isPartOf N1a9aa553afeb47f48d87273fca3154bd
36 Na9f314b2847c4ed4a4f83ce0a57239c4
37 sg:journal.1023786
38 schema:name ADS-HCSpark: A scalable HaplotypeCaller leveraging adaptive data segmentation to accelerate variant calling on Spark
39 schema:pagination 76
40 schema:productId N4820850fe5b64f04b5134206a864dce2
41 N5df4ff5c06d446d8aced6a8322f3e913
42 N73fd9ab1f3254334ba1ea7b95e57833b
43 N9c90b013f5dc4775856605e08a98dc14
44 Na3097804370b438f8b7b28d2ae22d726
45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1112138957
46 https://doi.org/10.1186/s12859-019-2665-0
47 schema:sdDatePublished 2019-04-11T12:10
48 schema:sdLicense https://scigraph.springernature.com/explorer/license/
49 schema:sdPublisher Ndeb5345dba554d83a9b551ea0ed156a0
50 schema:url https://link.springer.com/10.1186%2Fs12859-019-2665-0
51 sgo:license sg:explorer/license/
52 sgo:sdDataset articles
53 rdf:type schema:ScholarlyArticle
54 N047df63ccad44402b31eb6fb36da95ce schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
55 schema:name Algorithms
56 rdf:type schema:DefinedTerm
57 N0cc92fb6fdce4dc08a44a8e5c2162602 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
58 schema:name Haplotypes
59 rdf:type schema:DefinedTerm
60 N0fc4f66abc6a46fdb25e8804bf1841a8 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
61 schema:name Humans
62 rdf:type schema:DefinedTerm
63 N1a9aa553afeb47f48d87273fca3154bd schema:issueNumber 1
64 rdf:type schema:PublicationIssue
65 N1cf03709ca7449839bd494e364c41041 schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
66 schema:familyName Xiao
67 schema:givenName Anghong
68 rdf:type schema:Person
69 N21b7fe43d1014b9c83d6a88c50e7a8ef schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
70 schema:familyName Dong
71 schema:givenName Shoubin
72 rdf:type schema:Person
73 N21bb6c10dbe24c49bc6d431f85b223ce rdf:first N43162c5fdcc441c8af2da812adccca82
74 rdf:rest Nd9152c4b51114e27b802c5bc125c333c
75 N3ba9cb7e371b48978878630c892a69fa schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
76 schema:name Genetic Variation
77 rdf:type schema:DefinedTerm
78 N43162c5fdcc441c8af2da812adccca82 schema:affiliation https://www.grid.ac/institutes/grid.79703.3a
79 schema:familyName Wu
80 schema:givenName Zongze
81 rdf:type schema:Person
82 N4820850fe5b64f04b5134206a864dce2 schema:name pubmed_id
83 schema:value 30764760
84 rdf:type schema:PropertyValue
85 N5df4ff5c06d446d8aced6a8322f3e913 schema:name nlm_unique_id
86 schema:value 100965194
87 rdf:type schema:PropertyValue
88 N6ec223e20b0046e88579ac7cbdc566d7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
89 schema:name Software
90 rdf:type schema:DefinedTerm
91 N73fd9ab1f3254334ba1ea7b95e57833b schema:name doi
92 schema:value 10.1186/s12859-019-2665-0
93 rdf:type schema:PropertyValue
94 N9ace631ab1c84b7fa030092ef3ebcf3e schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
95 schema:name Sequence Analysis, DNA
96 rdf:type schema:DefinedTerm
97 N9c90b013f5dc4775856605e08a98dc14 schema:name readcube_id
98 schema:value d515645e39a558cd7ad10f5e538f8f44e670896216abe3673d23676b2b8b4dc1
99 rdf:type schema:PropertyValue
100 N9cb11e4314b04b22b6337f299c253d9c schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Databases, Genetic
102 rdf:type schema:DefinedTerm
103 Na3097804370b438f8b7b28d2ae22d726 schema:name dimensions_id
104 schema:value pub.1112138957
105 rdf:type schema:PropertyValue
106 Na8885205eec34f87a7e09e424c30926f rdf:first N1cf03709ca7449839bd494e364c41041
107 rdf:rest N21bb6c10dbe24c49bc6d431f85b223ce
108 Na9f314b2847c4ed4a4f83ce0a57239c4 schema:volumeNumber 20
109 rdf:type schema:PublicationVolume
110 Nc3a53bca484c46f1aa75e07aba4d34dc schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Genome
112 rdf:type schema:DefinedTerm
113 Nca2a40e108a645069fa74030b9f8ef24 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
114 schema:name Time Factors
115 rdf:type schema:DefinedTerm
116 Nd9152c4b51114e27b802c5bc125c333c rdf:first N21b7fe43d1014b9c83d6a88c50e7a8ef
117 rdf:rest rdf:nil
118 Ndeb5345dba554d83a9b551ea0ed156a0 schema:name Springer Nature - SN SciGraph project
119 rdf:type schema:Organization
120 Ne2199c82c7ca42eb8c5b9d8f8c107cbc schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
121 schema:name High-Throughput Nucleotide Sequencing
122 rdf:type schema:DefinedTerm
123 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
124 schema:name Information and Computing Sciences
125 rdf:type schema:DefinedTerm
126 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
127 schema:name Artificial Intelligence and Image Processing
128 rdf:type schema:DefinedTerm
129 sg:journal.1023786 schema:issn 1471-2105
130 schema:name BMC Bioinformatics
131 rdf:type schema:Periodical
132 sg:pub.10.1038/ncomms7275 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041805093
133 https://doi.org/10.1038/ncomms7275
134 rdf:type schema:CreativeWork
135 sg:pub.10.1038/srep17875 schema:sameAs https://app.dimensions.ai/details/publication/pub.1019760953
136 https://doi.org/10.1038/srep17875
137 rdf:type schema:CreativeWork
138 sg:pub.10.1186/1471-2105-11-s12-s1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1044778751
139 https://doi.org/10.1186/1471-2105-11-s12-s1
140 rdf:type schema:CreativeWork
141 sg:pub.10.1186/1471-2105-11-s12-s2 schema:sameAs https://app.dimensions.ai/details/publication/pub.1017679645
142 https://doi.org/10.1186/1471-2105-11-s12-s2
143 rdf:type schema:CreativeWork
144 sg:pub.10.1186/s13059-014-0577-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1014312908
145 https://doi.org/10.1186/s13059-014-0577-x
146 rdf:type schema:CreativeWork
147 https://doi.org/10.1016/j.virusres.2016.08.004 schema:sameAs https://app.dimensions.ai/details/publication/pub.1034208611
148 rdf:type schema:CreativeWork
149 https://doi.org/10.1093/bioinformatics/btp352 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023014918
150 rdf:type schema:CreativeWork
151 https://doi.org/10.1093/bioinformatics/bts054 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009109395
152 rdf:type schema:CreativeWork
153 https://doi.org/10.1093/bioinformatics/btu345 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023110507
154 rdf:type schema:CreativeWork
155 https://doi.org/10.1093/bioinformatics/btv179 schema:sameAs https://app.dimensions.ai/details/publication/pub.1039048989
156 rdf:type schema:CreativeWork
157 https://doi.org/10.1101/gr.107524.110 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032096953
158 rdf:type schema:CreativeWork
159 https://doi.org/10.1109/bibm.2016.7822584 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094603751
160 rdf:type schema:CreativeWork
161 https://doi.org/10.1145/2934664 schema:sameAs https://app.dimensions.ai/details/publication/pub.1052134007
162 rdf:type schema:CreativeWork
163 https://doi.org/10.1145/3020078.3021749 schema:sameAs https://app.dimensions.ai/details/publication/pub.1084677294
164 rdf:type schema:CreativeWork
165 https://doi.org/10.1145/3107411.3107438 schema:sameAs https://app.dimensions.ai/details/publication/pub.1091243633
166 rdf:type schema:CreativeWork
167 https://www.grid.ac/institutes/grid.79703.3a schema:alternateName South China University of Technology
168 schema:name Communication & Computer Network Lab of Guangdong, School of Computer Science & Engineering, South China University of Technology, Wushan Road, 510641, Guangzhou, China
169 rdf:type schema:Organization
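In the table above, the author order is encoded as an RDF collection: `schema:author` points at a blank node, and each node carries an `rdf:first` (the item) and an `rdf:rest` (the next node, or `rdf:nil` at the end). The sketch below walks that chain to recover the ordered family names. The triples are copied from the table; the helper names `index_triples` and `walk_list` are hypothetical, and representing triples as plain Python tuples is a simplification rather than how an RDF library would model them.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
ARTICLE = "sg:pub.10.1186/s12859-019-2665-0"

# (subject, predicate, object) rows copied from the table above.
TRIPLES = [
    (ARTICLE, "schema:author", "Na8885205eec34f87a7e09e424c30926f"),
    ("Na8885205eec34f87a7e09e424c30926f", RDF_FIRST,
     "N1cf03709ca7449839bd494e364c41041"),
    ("Na8885205eec34f87a7e09e424c30926f", RDF_REST,
     "N21bb6c10dbe24c49bc6d431f85b223ce"),
    ("N21bb6c10dbe24c49bc6d431f85b223ce", RDF_FIRST,
     "N43162c5fdcc441c8af2da812adccca82"),
    ("N21bb6c10dbe24c49bc6d431f85b223ce", RDF_REST,
     "Nd9152c4b51114e27b802c5bc125c333c"),
    ("Nd9152c4b51114e27b802c5bc125c333c", RDF_FIRST,
     "N21b7fe43d1014b9c83d6a88c50e7a8ef"),
    ("Nd9152c4b51114e27b802c5bc125c333c", RDF_REST, RDF_NIL),
    ("N1cf03709ca7449839bd494e364c41041", "schema:familyName", "Xiao"),
    ("N43162c5fdcc441c8af2da812adccca82", "schema:familyName", "Wu"),
    ("N21b7fe43d1014b9c83d6a88c50e7a8ef", "schema:familyName", "Dong"),
]


def index_triples(triples):
    """Map (subject, predicate) to the list of objects seen for that pair."""
    idx = {}
    for s, p, o in triples:
        idx.setdefault((s, p), []).append(o)
    return idx


def walk_list(idx, head):
    """Follow rdf:first / rdf:rest from head until rdf:nil is reached."""
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against malformed, cyclic lists
            raise ValueError(f"cycle in RDF list at {node}")
        seen.add(node)
        items.append(idx[(node, RDF_FIRST)][0])
        node = idx[(node, RDF_REST)][0]
    return items


idx = index_triples(TRIPLES)
head = idx[(ARTICLE, "schema:author")][0]
family_names = [idx[(n, "schema:familyName")][0] for n in walk_list(idx, head)]

if __name__ == "__main__":
    print(family_names)
```

Walking the list matters because the rows of the table are sorted by node identifier, so the order in which the three `schema:Person` nodes appear there (Xiao, Dong, Wu) is not the author order.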
 



