PileLine: a toolbox to handle genome position information in next-generation sequencing studies


Ontology type: schema:ScholarlyArticle      Open Access: True


Article Info

DATE

2011-01-24

AUTHORS

Daniel Glez-Peña, Gonzalo Gómez-López, Miguel Reboiro-Jato, Florentino Fdez-Riverola, David G Pisano

ABSTRACT

BACKGROUND: Genomic position (GP) files currently used in next-generation sequencing (NGS) studies are always difficult to manipulate due to their huge size and the lack of appropriate tools to properly manage them. The structure of these flat files is based on representing one line per position that has been covered by at least one aligned read, imposing significant restrictions from a computational performance perspective. RESULTS: PileLine implements a flexible command-line toolkit providing specific support to the management, filtering, comparison and annotation of GP files produced by NGS experiments. PileLine tools are coded in Java and run on both UNIX (Linux, Mac OS) and Windows platforms. The set of tools comprising PileLine are designed to be memory efficient by performing fast seek on-disk operations over sorted GP files. CONCLUSIONS: Our novel toolbox has been extensively tested taking into consideration performance issues. It is publicly available at http://sourceforge.net/projects/pilelinetools under the GNU LGPL license. Full documentation including common use cases and guided analysis workflows is available at http://sing.ei.uvigo.es/pileline.
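The abstract says PileLine stays memory efficient by running fast on-disk seek operations over sorted GP files instead of loading them into memory. The sketch below illustrates that general idea in Python. It is not PileLine's Java code. A binary search over byte offsets finds the first line at or after a requested genome position, so only O(log file size) lines are read before the region itself is scanned.

The sketch makes several assumptions, all hypothetical:
- Each line is tab-separated, and its first two columns are the chromosome and a 1-based position.
- The file is sorted by chromosome name (plain string order) and then numerically by position.
- The file contains no blank lines.
- The names `seek_position` and `read_region` are illustrative only.

```python
from __future__ import annotations

import io
import os


def _key(line: bytes) -> tuple[str, int]:
    """Sort key of a GP line: (chromosome, position) from the first two columns."""
    chrom, pos = line.split(b"\t", 2)[:2]
    return chrom.decode(), int(pos)


def _line_start(f, off: int) -> int:
    """Byte offset of the first line that begins at or after offset `off`."""
    if off == 0:
        return 0
    f.seek(off - 1)
    f.readline()  # skip the remainder of the line containing byte off-1
    return f.tell()


def seek_position(f, chrom: str, pos: int) -> int:
    """Position `f` at the first line whose (chrom, pos) key is >= the target.

    Binary search over byte offsets: memory use does not grow with file size.
    """
    target = (chrom, pos)
    size = f.seek(0, os.SEEK_END)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        start = _line_start(f, mid)
        if start < size:
            f.seek(start)
            at_or_after = _key(f.readline()) >= target
        else:
            at_or_after = True  # past the last line counts as "after everything"
        if at_or_after:
            hi = mid
        else:
            lo = mid + 1
    start = _line_start(f, lo)
    f.seek(start)
    return start


def read_region(f, chrom: str, start: int, end: int) -> list[str]:
    """Return the GP lines on `chrom` with start <= position <= end."""
    seek_position(f, chrom, start)
    rows = []
    for line in f:  # iterate forward from the seeked offset only
        if _key(line) > (chrom, end):
            break
        rows.append(line.decode().rstrip("\n"))
    return rows


if __name__ == "__main__":
    demo = io.BytesIO(b"chr1\t5\tA\t3\nchr1\t10\tC\t2\nchr1\t20\tG\t7\nchr2\t3\tT\t1\n")
    print(read_region(demo, "chr1", 6, 20))
```

With a real file, open it in binary mode (`open(path, "rb")`) and pass the handle to `read_region`. The search relies on the sort order, which is why the abstract stresses that the GP files must be sorted.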

PAGES

31-31

Identifiers

URI

http://scigraph.springernature.com/pub.10.1186/1471-2105-12-31

DOI

http://dx.doi.org/10.1186/1471-2105-12-31

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1004470351

PUBMED

https://www.ncbi.nlm.nih.gov/pubmed/21261974


JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/06", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Biological Sciences", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0604", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Genetics", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Computational Biology", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Genome", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Sequence Analysis, DNA", 
        "type": "DefinedTerm"
      }, 
      {
        "inDefinedTermSet": "https://www.nlm.nih.gov/mesh/", 
        "name": "Software", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain", 
          "id": "http://www.grid.ac/institutes/grid.6312.6", 
          "name": [
            "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Glez-Pe\u00f1a", 
        "givenName": "Daniel", 
        "id": "sg:person.014144574663.72", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014144574663.72"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain", 
          "id": "http://www.grid.ac/institutes/grid.7719.8", 
          "name": [
            "Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "G\u00f3mez-L\u00f3pez", 
        "givenName": "Gonzalo", 
        "id": "sg:person.0700636421.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0700636421.39"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain", 
          "id": "http://www.grid.ac/institutes/grid.6312.6", 
          "name": [
            "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Reboiro-Jato", 
        "givenName": "Miguel", 
        "id": "sg:person.01231605303.77", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01231605303.77"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain", 
          "id": "http://www.grid.ac/institutes/grid.6312.6", 
          "name": [
            "Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Fdez-Riverola", 
        "givenName": "Florentino", 
        "id": "sg:person.01277720503.20", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01277720503.20"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain", 
          "id": "http://www.grid.ac/institutes/grid.7719.8", 
          "name": [
            "Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Pisano", 
        "givenName": "David G", 
        "id": "sg:person.01361770421.50", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01361770421.50"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "sg:pub.10.1038/nprot.2009.86", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015642657", 
          "https://doi.org/10.1038/nprot.2009.86"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nature08989", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1051657357", 
          "https://doi.org/10.1038/nature08989"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1186/gb-2009-10-3-r32", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1031680869", 
          "https://doi.org/10.1186/gb-2009-10-3-r32"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1038/nrg2626", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1023911485", 
          "https://doi.org/10.1038/nrg2626"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2011-01-24", 
    "datePublishedReg": "2011-01-24", 
    "description": "BACKGROUND: Genomic position (GP) files currently used in next-generation sequencing (NGS) studies are always difficult to manipulate due to their huge size and the lack of appropriate tools to properly manage them. The structure of these flat files is based on representing one line per position that has been covered by at least one aligned read, imposing significant restrictions from a computational performance perspective.\nRESULTS: PileLine implements a flexible command-line toolkit providing specific support to the management, filtering, comparison and annotation of GP files produced by NGS experiments. PileLine tools are coded in Java and run on both UNIX (Linux, Mac OS) and Windows platforms. The set of tools comprising PileLine are designed to be memory efficient by performing fast seek on-disk operations over sorted GP files.\nCONCLUSIONS: Our novel toolbox has been extensively tested taking into consideration performance issues. It is publicly available at http://sourceforge.net/projects/pilelinetools under the GNU LGPL license. Full documentation including common use cases and guided analysis workflows is available at http://sing.ei.uvigo.es/pileline.", 
    "genre": "article", 
    "id": "sg:pub.10.1186/1471-2105-12-31", 
    "inLanguage": "en", 
    "isAccessibleForFree": true, 
    "isPartOf": [
      {
        "id": "sg:journal.1023786", 
        "issn": [
          "1471-2105"
        ], 
        "name": "BMC Bioinformatics", 
        "publisher": "Springer Nature", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "1", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "12"
      }
    ], 
    "keywords": [
      "command-line toolkit", 
      "common use cases", 
      "GNU LGPL license", 
      "set of tools", 
      "use cases", 
      "flat files", 
      "Windows platform", 
      "LGPL license", 
      "performance issues", 
      "analysis workflow", 
      "huge size", 
      "position information", 
      "files", 
      "NGS experiments", 
      "disc operations", 
      "novel toolbox", 
      "performance perspective", 
      "full documentation", 
      "fast seek", 
      "specific support", 
      "appropriate tool", 
      "UNIX", 
      "toolbox", 
      "tool", 
      "workflow", 
      "Java", 
      "annotation", 
      "toolkit", 
      "platform", 
      "filtering", 
      "seek", 
      "license", 
      "information", 
      "set", 
      "memory", 
      "operation", 
      "issues", 
      "documentation", 
      "reads", 
      "support", 
      "management", 
      "experiments", 
      "next-generation sequencing studies", 
      "significant restrictions", 
      "perspective", 
      "restriction", 
      "position", 
      "lack", 
      "structure", 
      "comparison", 
      "size", 
      "sequencing studies", 
      "cases", 
      "lines", 
      "study", 
      "Genomic position (GP) files", 
      "position (GP) files", 
      "computational performance perspective", 
      "PileLine", 
      "flexible command-line toolkit", 
      "GP files", 
      "PileLine tools", 
      "sorted GP files", 
      "consideration performance issues", 
      "genome position information"
    ], 
    "name": "PileLine: a toolbox to handle genome position information in next-generation sequencing studies", 
    "pagination": "31-31", 
    "productId": [
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1004470351"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1186/1471-2105-12-31"
        ]
      }, 
      {
        "name": "pubmed_id", 
        "type": "PropertyValue", 
        "value": [
          "21261974"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1186/1471-2105-12-31", 
      "https://app.dimensions.ai/details/publication/pub.1004470351"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2022-01-01T18:24", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-springernature-scigraph/baseset/20220101/entities/gbq_results/article/article_532.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "https://doi.org/10.1186/1471-2105-12-31"
  }
]
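Because the JSON-LD record above is plain JSON, its identifiers can be read with any JSON parser. The Python sketch below assumes the record has the shape shown above: a one-element list whose object carries a `productId` list of `PropertyValue` entries. The function name `product_ids` is illustrative.

```python
from __future__ import annotations

import json


def product_ids(jsonld_text: str) -> dict[str, str]:
    """Map each productId name (e.g. doi, pubmed_id, dimensions_id) to its first value."""
    record = json.loads(jsonld_text)[0]  # the record is serialized as a one-element list
    return {entry["name"]: entry["value"][0] for entry in record.get("productId", [])}


if __name__ == "__main__":
    sample = """[{"productId": [
        {"name": "doi", "type": "PropertyValue", "value": ["10.1186/1471-2105-12-31"]},
        {"name": "pubmed_id", "type": "PropertyValue", "value": ["21261974"]}]}]"""
    print(product_ids(sample))
```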
 

HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-31'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-31'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-31'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1186/1471-2105-12-31'
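The four curl commands above differ only in the `Accept` header, so the same content negotiation can be scripted. The Python sketch below uses only the standard library, and the `FORMATS` table restates the media types from the commands above. Calling `fetch` performs a live HTTP request, so it assumes the SciGraph endpoint is still reachable, which may no longer be the case. The names `FORMATS`, `build_request` and `fetch` are illustrative.

```python
from __future__ import annotations

import urllib.request

# Media types taken from the curl examples above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}
RECORD_URL = "https://scigraph.springernature.com/pub.10.1186/1471-2105-12-31"


def build_request(fmt: str, url: str = RECORD_URL) -> urllib.request.Request:
    """Build a GET request that asks for serialization `fmt` via the Accept header."""
    return urllib.request.Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Download the record in the requested serialization (needs network access)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("n-triples")` should return one triple per line, which suits line-oriented batch processing.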


 

This table displays all metadata directly associated with this object as RDF triples.

189 TRIPLES      22 PREDICATES      99 URIs      87 LITERALS      11 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1186/1471-2105-12-31 schema:about N251c0bb7cb604c84ab38a8adde370d05
2 N39994aab615a4e9d925c8b84a35eb923
3 N6b3a7bfc5c304298b10a2caf1e60dbbd
4 Ncb71b68c9a704545968455c8b8636dc7
5 anzsrc-for:06
6 anzsrc-for:0604
7 schema:author Ndb8a3b76600c49ccaf28b42b34a3e103
8 schema:citation sg:pub.10.1038/nature08989
9 sg:pub.10.1038/nprot.2009.86
10 sg:pub.10.1038/nrg2626
11 sg:pub.10.1186/gb-2009-10-3-r32
12 schema:datePublished 2011-01-24
13 schema:datePublishedReg 2011-01-24
14 schema:description BACKGROUND: Genomic position (GP) files currently used in next-generation sequencing (NGS) studies are always difficult to manipulate due to their huge size and the lack of appropriate tools to properly manage them. The structure of these flat files is based on representing one line per position that has been covered by at least one aligned read, imposing significant restrictions from a computational performance perspective. RESULTS: PileLine implements a flexible command-line toolkit providing specific support to the management, filtering, comparison and annotation of GP files produced by NGS experiments. PileLine tools are coded in Java and run on both UNIX (Linux, Mac OS) and Windows platforms. The set of tools comprising PileLine are designed to be memory efficient by performing fast seek on-disk operations over sorted GP files. CONCLUSIONS: Our novel toolbox has been extensively tested taking into consideration performance issues. It is publicly available at http://sourceforge.net/projects/pilelinetools under the GNU LGPL license. Full documentation including common use cases and guided analysis workflows is available at http://sing.ei.uvigo.es/pileline.
15 schema:genre article
16 schema:inLanguage en
17 schema:isAccessibleForFree true
18 schema:isPartOf Nc2836074fec1464e9e12df5d73cbc966
19 Ndaba0958190644b6a816b1c65b3a9b88
20 sg:journal.1023786
21 schema:keywords GNU LGPL license
22 GP files
23 Genomic position (GP) files
24 Java
25 LGPL license
26 NGS experiments
27 PileLine
28 PileLine tools
29 UNIX
30 Windows platform
31 analysis workflow
32 annotation
33 appropriate tool
34 cases
35 command-line toolkit
36 common use cases
37 comparison
38 computational performance perspective
39 consideration performance issues
40 disc operations
41 documentation
42 experiments
43 fast seek
44 files
45 filtering
46 flat files
47 flexible command-line toolkit
48 full documentation
49 genome position information
50 huge size
51 information
52 issues
53 lack
54 license
55 lines
56 management
57 memory
58 next-generation sequencing studies
59 novel toolbox
60 operation
61 performance issues
62 performance perspective
63 perspective
64 platform
65 position
66 position (GP) files
67 position information
68 reads
69 restriction
70 seek
71 sequencing studies
72 set
73 set of tools
74 significant restrictions
75 size
76 sorted GP files
77 specific support
78 structure
79 study
80 support
81 tool
82 toolbox
83 toolkit
84 use cases
85 workflow
86 schema:name PileLine: a toolbox to handle genome position information in next-generation sequencing studies
87 schema:pagination 31-31
88 schema:productId N96bac870924941bc85a4393cc10f45bc
89 N9d5aa0b2f33e4d6c8951ee01f64bd043
90 N9f4109b2b5074b6bb889a173bfcddbc0
91 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004470351
92 https://doi.org/10.1186/1471-2105-12-31
93 schema:sdDatePublished 2022-01-01T18:24
94 schema:sdLicense https://scigraph.springernature.com/explorer/license/
95 schema:sdPublisher N5e0d0944f0e3485e9142a627de05b346
96 schema:url https://doi.org/10.1186/1471-2105-12-31
97 sgo:license sg:explorer/license/
98 sgo:sdDataset articles
99 rdf:type schema:ScholarlyArticle
100 N251c0bb7cb604c84ab38a8adde370d05 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
101 schema:name Genome
102 rdf:type schema:DefinedTerm
103 N39994aab615a4e9d925c8b84a35eb923 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
104 schema:name Sequence Analysis, DNA
105 rdf:type schema:DefinedTerm
106 N4700764fdad74c048550c800b06b12cb rdf:first sg:person.01277720503.20
107 rdf:rest Nc56217fa78ed48c88de3a92a406889b2
108 N5e0d0944f0e3485e9142a627de05b346 schema:name Springer Nature - SN SciGraph project
109 rdf:type schema:Organization
110 N6b3a7bfc5c304298b10a2caf1e60dbbd schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
111 schema:name Software
112 rdf:type schema:DefinedTerm
113 N75a28c8107334a5fa302934dee0e23b3 rdf:first sg:person.0700636421.39
114 rdf:rest Na40b9121c16240d18e533558006c6503
115 N96bac870924941bc85a4393cc10f45bc schema:name doi
116 schema:value 10.1186/1471-2105-12-31
117 rdf:type schema:PropertyValue
118 N9d5aa0b2f33e4d6c8951ee01f64bd043 schema:name dimensions_id
119 schema:value pub.1004470351
120 rdf:type schema:PropertyValue
121 N9f4109b2b5074b6bb889a173bfcddbc0 schema:name pubmed_id
122 schema:value 21261974
123 rdf:type schema:PropertyValue
124 Na40b9121c16240d18e533558006c6503 rdf:first sg:person.01231605303.77
125 rdf:rest N4700764fdad74c048550c800b06b12cb
126 Nc2836074fec1464e9e12df5d73cbc966 schema:issueNumber 1
127 rdf:type schema:PublicationIssue
128 Nc56217fa78ed48c88de3a92a406889b2 rdf:first sg:person.01361770421.50
129 rdf:rest rdf:nil
130 Ncb71b68c9a704545968455c8b8636dc7 schema:inDefinedTermSet https://www.nlm.nih.gov/mesh/
131 schema:name Computational Biology
132 rdf:type schema:DefinedTerm
133 Ndaba0958190644b6a816b1c65b3a9b88 schema:volumeNumber 12
134 rdf:type schema:PublicationVolume
135 Ndb8a3b76600c49ccaf28b42b34a3e103 rdf:first sg:person.014144574663.72
136 rdf:rest N75a28c8107334a5fa302934dee0e23b3
137 anzsrc-for:06 schema:inDefinedTermSet anzsrc-for:
138 schema:name Biological Sciences
139 rdf:type schema:DefinedTerm
140 anzsrc-for:0604 schema:inDefinedTermSet anzsrc-for:
141 schema:name Genetics
142 rdf:type schema:DefinedTerm
143 sg:journal.1023786 schema:issn 1471-2105
144 schema:name BMC Bioinformatics
145 schema:publisher Springer Nature
146 rdf:type schema:Periodical
147 sg:person.01231605303.77 schema:affiliation grid-institutes:grid.6312.6
148 schema:familyName Reboiro-Jato
149 schema:givenName Miguel
150 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01231605303.77
151 rdf:type schema:Person
152 sg:person.01277720503.20 schema:affiliation grid-institutes:grid.6312.6
153 schema:familyName Fdez-Riverola
154 schema:givenName Florentino
155 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01277720503.20
156 rdf:type schema:Person
157 sg:person.01361770421.50 schema:affiliation grid-institutes:grid.7719.8
158 schema:familyName Pisano
159 schema:givenName David G
160 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.01361770421.50
161 rdf:type schema:Person
162 sg:person.014144574663.72 schema:affiliation grid-institutes:grid.6312.6
163 schema:familyName Glez-Peña
164 schema:givenName Daniel
165 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.014144574663.72
166 rdf:type schema:Person
167 sg:person.0700636421.39 schema:affiliation grid-institutes:grid.7719.8
168 schema:familyName Gómez-López
169 schema:givenName Gonzalo
170 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.0700636421.39
171 rdf:type schema:Person
172 sg:pub.10.1038/nature08989 schema:sameAs https://app.dimensions.ai/details/publication/pub.1051657357
173 https://doi.org/10.1038/nature08989
174 rdf:type schema:CreativeWork
175 sg:pub.10.1038/nprot.2009.86 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015642657
176 https://doi.org/10.1038/nprot.2009.86
177 rdf:type schema:CreativeWork
178 sg:pub.10.1038/nrg2626 schema:sameAs https://app.dimensions.ai/details/publication/pub.1023911485
179 https://doi.org/10.1038/nrg2626
180 rdf:type schema:CreativeWork
181 sg:pub.10.1186/gb-2009-10-3-r32 schema:sameAs https://app.dimensions.ai/details/publication/pub.1031680869
182 https://doi.org/10.1186/gb-2009-10-3-r32
183 rdf:type schema:CreativeWork
184 grid-institutes:grid.6312.6 schema:alternateName Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain
185 schema:name Higher Technical School of Computer Engineering, University of Vigo, Ourense, Spain
186 rdf:type schema:Organization
187 grid-institutes:grid.7719.8 schema:alternateName Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
188 schema:name Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
189 rdf:type schema:Organization
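In the triples above, `schema:author` does not point at the authors directly. It points at the blank node `Ndb8a3b76600c49ccaf28b42b34a3e103`, which starts an RDF collection. Each node's `rdf:first` holds one author, and its `rdf:rest` links to the next node, ending at `rdf:nil`. This is how the author order is preserved in RDF.

The Python sketch below recovers the ordered list. It assumes the triples are already available as (subject, predicate, object) string tuples using the prefixed names shown in the table; a real RDF library would handle parsing and prefix expansion. The names `collection_items` and `AUTHOR_LIST` are illustrative.

```python
from __future__ import annotations


def collection_items(triples: list[tuple[str, str, str]], head: str) -> list[str]:
    """Follow rdf:first / rdf:rest links from `head` until rdf:nil."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError(f"cycle in RDF collection at {node}")
        seen.add(node)
        items.append(first[node])
        node = rest[node]
    return items


# The author-list triples copied from the table above.
AUTHOR_LIST = [
    ("Ndb8a3b76600c49ccaf28b42b34a3e103", "rdf:first", "sg:person.014144574663.72"),
    ("Ndb8a3b76600c49ccaf28b42b34a3e103", "rdf:rest", "N75a28c8107334a5fa302934dee0e23b3"),
    ("N75a28c8107334a5fa302934dee0e23b3", "rdf:first", "sg:person.0700636421.39"),
    ("N75a28c8107334a5fa302934dee0e23b3", "rdf:rest", "Na40b9121c16240d18e533558006c6503"),
    ("Na40b9121c16240d18e533558006c6503", "rdf:first", "sg:person.01231605303.77"),
    ("Na40b9121c16240d18e533558006c6503", "rdf:rest", "N4700764fdad74c048550c800b06b12cb"),
    ("N4700764fdad74c048550c800b06b12cb", "rdf:first", "sg:person.01277720503.20"),
    ("N4700764fdad74c048550c800b06b12cb", "rdf:rest", "Nc56217fa78ed48c88de3a92a406889b2"),
    ("Nc56217fa78ed48c88de3a92a406889b2", "rdf:first", "sg:person.01361770421.50"),
    ("Nc56217fa78ed48c88de3a92a406889b2", "rdf:rest", "rdf:nil"),
]

if __name__ == "__main__":
    print(collection_items(AUTHOR_LIST, "Ndb8a3b76600c49ccaf28b42b34a3e103"))
```

Walking the chain from `Ndb8a3b76600c49ccaf28b42b34a3e103` yields the five `sg:person` identifiers in the order Glez-Peña, Gómez-López, Reboiro-Jato, Fdez-Riverola, Pisano. This matches the author line at the top of the record.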
 