A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm


Ontology type: schema:Chapter     


Chapter Info

DATE

2018-11-02

AUTHORS

Quanchao Zhao , Long-long Ma , Lijuan Duan

ABSTRACT

The benchmarking database plays an essential role in evaluating the performance of touching character string segmentation algorithms. In this paper, we present a new database of touching Tibetan character strings. First, using previously proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and screen them to retain the correct touching samples. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two-touching characters and 1,399 images of more than two-touching characters. It can be used to evaluate segmentation algorithms for touching Tibetan character strings. For each image, the annotated ground truth file includes class labels, candidate segment points, the baseline and the average stroke width of a single Tibetan character. According to the type of touching, we divide the touching character strings into three types: AB, OB and BB. We also count the number of samples of each type and find that 76.27% of the samples belong to the third type (BB). Finally, we measure the performance of the over-segmentation algorithm on this database for reference.

PAGES

309-321

Book

TITLE

Pattern Recognition and Computer Vision

ISBN

978-3-030-03340-8
978-3-030-03341-5

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26

DOI

http://dx.doi.org/10.1007/978-3-030-03341-5_26

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1107966566



JSON-LD is the canonical representation for SciGraph data.

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Beijing University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.28703.3e", 
          "name": [
            "Faculty of Information Technology, Beijing University of Technology, Beijing, China", 
            "Beijing Key Laboratory of Trusted Computing, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Zhao", 
        "givenName": "Quanchao", 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Institute of Software", 
          "id": "https://www.grid.ac/institutes/grid.458446.f", 
          "name": [
            "Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Ma", 
        "givenName": "Long-long", 
        "id": "sg:person.016221175360.12", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016221175360.12"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Beijing University of Technology", 
          "id": "https://www.grid.ac/institutes/grid.28703.3e", 
          "name": [
            "Faculty of Information Technology, Beijing University of Technology, Beijing, China", 
            "Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing, China"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Duan", 
        "givenName": "Lijuan", 
        "id": "sg:person.012212011025.96", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012212011025.96"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1016/s0167-8655(02)00240-4", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1010577465"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0031-3203(99)00227-7", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1015719613"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1016/s0031-3203(98)00123-x", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1045836790"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/34.506792", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061156429"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-981-10-7302-1_45", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093080525", 
          "https://doi.org/10.1007/978-981-10-7302-1_45"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/978-981-10-7299-4_29", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093084377", 
          "https://doi.org/10.1007/978-981-10-7299-4_29"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icdar.1995.602080", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093932800"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icdar.2005.35", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094136273"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icfhr.2012.173", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1094249600"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/ismvl.2016.38", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095155044"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icdar.2011.17", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095596395"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2018-11-02", 
    "datePublishedReg": "2018-11-02", 
    "description": "The benchmarking database plays an essential role in evaluating the performance of the touching character string segmentation algorithm. In this paper, we present a new touching Tibetan character strings database. Firstly, using the previous proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and screen out the correct touching samples. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two-touching characters and 1,399 images of more than two-touching characters. It is applicable to evaluate the segmentation algorithms for the touching Tibetan character strings. For each image, the annotated ground truth file includes class labels, candidate segment points, baseline and average stroke width of a Tibetan single character. According to the type of touching, we divide the touching character string into three types: AB, OB and BB. We also count the number of different type of samples and find that 76.27% of the samples belongs to the third type (BB). In the end, we measure the performance of the over-segmentation algorithm on this database for reference.", 
    "editor": [
      {
        "familyName": "Lai", 
        "givenName": "Jian-Huang", 
        "type": "Person"
      }, 
      {
        "familyName": "Liu", 
        "givenName": "Cheng-Lin", 
        "type": "Person"
      }, 
      {
        "familyName": "Chen", 
        "givenName": "Xilin", 
        "type": "Person"
      }, 
      {
        "familyName": "Zhou", 
        "givenName": "Jie", 
        "type": "Person"
      }, 
      {
        "familyName": "Tan", 
        "givenName": "Tieniu", 
        "type": "Person"
      }, 
      {
        "familyName": "Zheng", 
        "givenName": "Nanning", 
        "type": "Person"
      }, 
      {
        "familyName": "Zha", 
        "givenName": "Hongbin", 
        "type": "Person"
      }
    ], 
    "genre": "chapter", 
    "id": "sg:pub.10.1007/978-3-030-03341-5_26", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": {
      "isbn": [
        "978-3-030-03340-8", 
        "978-3-030-03341-5"
      ], 
      "name": "Pattern Recognition and Computer Vision", 
      "type": "Book"
    }, 
    "name": "A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm", 
    "pagination": "309-321", 
    "productId": [
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/978-3-030-03341-5_26"
        ]
      }, 
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "f3d0a93e7102e5d77d1e89732020fe39fe64b5c7caa11b1d7f409ac44e98f52f"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1107966566"
        ]
      }
    ], 
    "publisher": {
      "location": "Cham", 
      "name": "Springer International Publishing", 
      "type": "Organisation"
    }, 
    "sameAs": [
      "https://doi.org/10.1007/978-3-030-03341-5_26", 
      "https://app.dimensions.ai/details/publication/pub.1107966566"
    ], 
    "sdDataset": "chapters", 
    "sdDatePublished": "2019-04-16T04:41", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000322_0000000322/records_65022_00000000.jsonl", 
    "type": "Chapter", 
    "url": "https://link.springer.com/10.1007%2F978-3-030-03341-5_26"
  }
]
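
As a rough illustration of how the JSON-LD record above can be consumed, the following Python sketch pulls the title, author names, page range and DOI out of it. The embedded string is an abbreviated excerpt of the record (only the fields used are kept), and the `summarize` helper is a hypothetical name introduced here, not part of any SciGraph tooling.

```python
import json

# Abbreviated excerpt of the record shown above; only the fields read below are kept.
RECORD_JSON = """
[
  {
    "author": [
      {"familyName": "Zhao", "givenName": "Quanchao", "type": "Person"},
      {"familyName": "Ma", "givenName": "Long-long", "type": "Person"},
      {"familyName": "Duan", "givenName": "Lijuan", "type": "Person"}
    ],
    "name": "A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm",
    "pagination": "309-321",
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/978-3-030-03341-5_26"]},
      {"name": "dimensions_id", "type": "PropertyValue", "value": ["pub.1107966566"]}
    ]
  }
]
"""


def summarize(record):
    """Collect a few human-readable fields from one record (hypothetical helper)."""
    authors = [f"{a['givenName']} {a['familyName']}" for a in record.get("author", [])]
    # Each productId entry carries a name and a list of values; keep the first value.
    ids = {p["name"]: p["value"][0] for p in record.get("productId", []) if p.get("value")}
    return {
        "title": record.get("name"),
        "authors": authors,
        "pages": record.get("pagination"),
        "doi": ids.get("doi"),
    }


# The top-level JSON value is a list holding a single record object.
summary = summarize(json.loads(RECORD_JSON)[0])
print(summary)
```

Reading the identifiers through `productId` rather than by slicing the `id` string keeps the sketch independent of how the `sg:pub.` prefix is formed.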
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular format for linked data which is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26'

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26'
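
The four curl calls above differ only in the `Accept` header, so the same content negotiation can be sketched in Python with the standard library. This is a minimal sketch: the `ACCEPT` table and the `build_request` and `fetch` names are assumptions made for illustration, and whether the endpoint still responds is not something this page confirms, so nothing is fetched unless `fetch` is called explicitly.

```python
import urllib.request

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/978-3-030-03341-5_26"

# Media types taken from the curl commands above, keyed by a short format name.
ACCEPT = {
    "json-ld": "application/ld+json",
    "nt": "application/n-triples",
    "turtle": "text/turtle",
    "xml": "application/rdf+xml",
}


def build_request(fmt, url=RECORD_URL):
    """Build a GET request asking for the given serialization (no network access)."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


def fetch(fmt, url=RECORD_URL, timeout=30):
    """Send the request and return the response body as text (performs network I/O)."""
    with urllib.request.urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only inspect the request here; call fetch("turtle") to actually download.
    req = build_request("turtle")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
```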


 

This table displays all metadata directly associated with this object as RDF triples.

148 TRIPLES      23 PREDICATES      37 URIs      19 LITERALS      8 BLANK NODES

Subject Predicate Object
1 sg:pub.10.1007/978-3-030-03341-5_26 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Na06cf38dea4d495bbe842d960b97306e
4 schema:citation sg:pub.10.1007/978-981-10-7299-4_29
5 sg:pub.10.1007/978-981-10-7302-1_45
6 https://doi.org/10.1016/s0031-3203(98)00123-x
7 https://doi.org/10.1016/s0031-3203(99)00227-7
8 https://doi.org/10.1016/s0167-8655(02)00240-4
9 https://doi.org/10.1109/34.506792
10 https://doi.org/10.1109/icdar.1995.602080
11 https://doi.org/10.1109/icdar.2005.35
12 https://doi.org/10.1109/icdar.2011.17
13 https://doi.org/10.1109/icfhr.2012.173
14 https://doi.org/10.1109/ismvl.2016.38
15 schema:datePublished 2018-11-02
16 schema:datePublishedReg 2018-11-02
17 schema:description The benchmarking database plays an essential role in evaluating the performance of the touching character string segmentation algorithm. In this paper, we present a new touching Tibetan character strings database. Firstly, using the previous proposed layout analysis and text-line segmentation algorithms, we segment scanned images of historical Tibetan documents into text-line images. Then, we find candidate touching Tibetan character strings using connected component analysis and screen out the correct touching samples. Finally, we annotate the data manually and establish the touching character database. The database contains 5,844 images of two-touching characters and 1,399 images of more than two-touching characters. It is applicable to evaluate the segmentation algorithms for the touching Tibetan character strings. For each image, the annotated ground truth file includes class labels, candidate segment points, baseline and average stroke width of a Tibetan single character. According to the type of touching, we divide the touching character string into three types: AB, OB and BB. We also count the number of different type of samples and find that 76.27% of the samples belongs to the third type (BB). In the end, we measure the performance of the over-segmentation algorithm on this database for reference.
18 schema:editor N87c0d0716904428f811603c2a6e5789b
19 schema:genre chapter
20 schema:inLanguage en
21 schema:isAccessibleForFree false
22 schema:isPartOf Nbd7937e1a19b4aa3ad65cb02c6e16e24
23 schema:name A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm
24 schema:pagination 309-321
25 schema:productId N5dbe9aa4f24b4ac1bdf939ffe8ee7f7c
26 N6b007b847e424360af5dfe17c0db4a3a
27 N6b09da9dcf9446689708988544f57add
28 schema:publisher Nc4e7072cc4a44d958e108b2d39134a9b
29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1107966566
30 https://doi.org/10.1007/978-3-030-03341-5_26
31 schema:sdDatePublished 2019-04-16T04:41
32 schema:sdLicense https://scigraph.springernature.com/explorer/license/
33 schema:sdPublisher N5b243066c7e248f395b54837766611c1
34 schema:url https://link.springer.com/10.1007%2F978-3-030-03341-5_26
35 sgo:license sg:explorer/license/
36 sgo:sdDataset chapters
37 rdf:type schema:Chapter
38 N2179a5434e68463cbe7718adbf510e3e schema:affiliation https://www.grid.ac/institutes/grid.28703.3e
39 schema:familyName Zhao
40 schema:givenName Quanchao
41 rdf:type schema:Person
42 N2d86d4849c7a4cd2814df457ca00ab9b schema:familyName Zheng
43 schema:givenName Nanning
44 rdf:type schema:Person
45 N45de03f328ff4be7b7be483d16b7da5c schema:familyName Zhou
46 schema:givenName Jie
47 rdf:type schema:Person
48 N51a4edc1d4434d5da15a07ce6890cb6f rdf:first N8f50b6afdea44a7a919127ba5cad1dd1
49 rdf:rest N78bc8c17f6d64d68a304308bad686657
50 N5b243066c7e248f395b54837766611c1 schema:name Springer Nature - SN SciGraph project
51 rdf:type schema:Organization
52 N5dbe9aa4f24b4ac1bdf939ffe8ee7f7c schema:name dimensions_id
53 schema:value pub.1107966566
54 rdf:type schema:PropertyValue
55 N6b007b847e424360af5dfe17c0db4a3a schema:name doi
56 schema:value 10.1007/978-3-030-03341-5_26
57 rdf:type schema:PropertyValue
58 N6b09da9dcf9446689708988544f57add schema:name readcube_id
59 schema:value f3d0a93e7102e5d77d1e89732020fe39fe64b5c7caa11b1d7f409ac44e98f52f
60 rdf:type schema:PropertyValue
61 N78bc8c17f6d64d68a304308bad686657 rdf:first N998aa2c8b5e649f18447dac392c38364
62 rdf:rest Nf3e8402bc1344bb8bc0e094f94f0bca4
63 N7d4f51b1a6224ad393789c72861b8ee2 schema:familyName Tan
64 schema:givenName Tieniu
65 rdf:type schema:Person
66 N84558bb72d9f427d90f393225023aaae rdf:first N7d4f51b1a6224ad393789c72861b8ee2
67 rdf:rest Nf7a12496a8524ffc89c52e107e0b77ad
68 N87c0d0716904428f811603c2a6e5789b rdf:first Nfd7f6cd9b68b4ea1a374b57c8d7464cf
69 rdf:rest N51a4edc1d4434d5da15a07ce6890cb6f
70 N8f50b6afdea44a7a919127ba5cad1dd1 schema:familyName Liu
71 schema:givenName Cheng-Lin
72 rdf:type schema:Person
73 N96ad7dae390145b08825671c57ec80bd schema:familyName Zha
74 schema:givenName Hongbin
75 rdf:type schema:Person
76 N998aa2c8b5e649f18447dac392c38364 schema:familyName Chen
77 schema:givenName Xilin
78 rdf:type schema:Person
79 Na06cf38dea4d495bbe842d960b97306e rdf:first N2179a5434e68463cbe7718adbf510e3e
80 rdf:rest Nd8abd48cd7794d6fbf15ba3ea79d76ef
81 Nb48ab2931c9d486ebf7a39ee84d70bb0 rdf:first sg:person.012212011025.96
82 rdf:rest rdf:nil
83 Nbd7937e1a19b4aa3ad65cb02c6e16e24 schema:isbn 978-3-030-03340-8
84 978-3-030-03341-5
85 schema:name Pattern Recognition and Computer Vision
86 rdf:type schema:Book
87 Nc4733426adf94c1f9d51ec10fee81688 rdf:first N96ad7dae390145b08825671c57ec80bd
88 rdf:rest rdf:nil
89 Nc4e7072cc4a44d958e108b2d39134a9b schema:location Cham
90 schema:name Springer International Publishing
91 rdf:type schema:Organisation
92 Nd8abd48cd7794d6fbf15ba3ea79d76ef rdf:first sg:person.016221175360.12
93 rdf:rest Nb48ab2931c9d486ebf7a39ee84d70bb0
94 Nf3e8402bc1344bb8bc0e094f94f0bca4 rdf:first N45de03f328ff4be7b7be483d16b7da5c
95 rdf:rest N84558bb72d9f427d90f393225023aaae
96 Nf7a12496a8524ffc89c52e107e0b77ad rdf:first N2d86d4849c7a4cd2814df457ca00ab9b
97 rdf:rest Nc4733426adf94c1f9d51ec10fee81688
98 Nfd7f6cd9b68b4ea1a374b57c8d7464cf schema:familyName Lai
99 schema:givenName Jian-Huang
100 rdf:type schema:Person
101 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
102 schema:name Information and Computing Sciences
103 rdf:type schema:DefinedTerm
104 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
105 schema:name Artificial Intelligence and Image Processing
106 rdf:type schema:DefinedTerm
107 sg:person.012212011025.96 schema:affiliation https://www.grid.ac/institutes/grid.28703.3e
108 schema:familyName Duan
109 schema:givenName Lijuan
110 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.012212011025.96
111 rdf:type schema:Person
112 sg:person.016221175360.12 schema:affiliation https://www.grid.ac/institutes/grid.458446.f
113 schema:familyName Ma
114 schema:givenName Long-long
115 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016221175360.12
116 rdf:type schema:Person
117 sg:pub.10.1007/978-981-10-7299-4_29 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093084377
118 https://doi.org/10.1007/978-981-10-7299-4_29
119 rdf:type schema:CreativeWork
120 sg:pub.10.1007/978-981-10-7302-1_45 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093080525
121 https://doi.org/10.1007/978-981-10-7302-1_45
122 rdf:type schema:CreativeWork
123 https://doi.org/10.1016/s0031-3203(98)00123-x schema:sameAs https://app.dimensions.ai/details/publication/pub.1045836790
124 rdf:type schema:CreativeWork
125 https://doi.org/10.1016/s0031-3203(99)00227-7 schema:sameAs https://app.dimensions.ai/details/publication/pub.1015719613
126 rdf:type schema:CreativeWork
127 https://doi.org/10.1016/s0167-8655(02)00240-4 schema:sameAs https://app.dimensions.ai/details/publication/pub.1010577465
128 rdf:type schema:CreativeWork
129 https://doi.org/10.1109/34.506792 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061156429
130 rdf:type schema:CreativeWork
131 https://doi.org/10.1109/icdar.1995.602080 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093932800
132 rdf:type schema:CreativeWork
133 https://doi.org/10.1109/icdar.2005.35 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094136273
134 rdf:type schema:CreativeWork
135 https://doi.org/10.1109/icdar.2011.17 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095596395
136 rdf:type schema:CreativeWork
137 https://doi.org/10.1109/icfhr.2012.173 schema:sameAs https://app.dimensions.ai/details/publication/pub.1094249600
138 rdf:type schema:CreativeWork
139 https://doi.org/10.1109/ismvl.2016.38 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095155044
140 rdf:type schema:CreativeWork
141 https://www.grid.ac/institutes/grid.28703.3e schema:alternateName Beijing University of Technology
142 schema:name Beijing Key Laboratory of Trusted Computing, Beijing, China
143 Beijing Key Laboratory on Integration and Analysis of Large-Scale Stream Data, Beijing, China
144 Faculty of Information Technology, Beijing University of Technology, Beijing, China
145 rdf:type schema:Organization
146 https://www.grid.ac/institutes/grid.458446.f schema:alternateName Institute of Software
147 schema:name Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences, Beijing, China
148 rdf:type schema:Organization
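
In the table above, the ordered author and editor lists are encoded as RDF collections: each blank node has an `rdf:first` (the item) and an `rdf:rest` (the next node), ending at `rdf:nil`. The Python sketch below walks the author list to recover the original order. The dictionaries are hand-copied from the rows that describe the `schema:author` list and the three author nodes; `walk_rdf_list` is a hypothetical helper, not a library function.

```python
RDF_NIL = "rdf:nil"

# node -> (rdf:first, rdf:rest), copied from the author-list rows of the table above.
AUTHOR_LIST = {
    "Na06cf38dea4d495bbe842d960b97306e": (
        "N2179a5434e68463cbe7718adbf510e3e",
        "Nd8abd48cd7794d6fbf15ba3ea79d76ef",
    ),
    "Nd8abd48cd7794d6fbf15ba3ea79d76ef": (
        "sg:person.016221175360.12",
        "Nb48ab2931c9d486ebf7a39ee84d70bb0",
    ),
    "Nb48ab2931c9d486ebf7a39ee84d70bb0": (
        "sg:person.012212011025.96",
        RDF_NIL,
    ),
}

# schema:familyName of each author node, also copied from the table.
FAMILY_NAME = {
    "N2179a5434e68463cbe7718adbf510e3e": "Zhao",
    "sg:person.016221175360.12": "Ma",
    "sg:person.012212011025.96": "Duan",
}


def walk_rdf_list(head, nodes):
    """Follow rdf:rest links from head until rdf:nil, collecting rdf:first values."""
    items = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
        first, rest = nodes[node]
        items.append(first)
        node = rest
    return items


# The head node is the object of schema:author in the table.
author_nodes = walk_rdf_list("Na06cf38dea4d495bbe842d960b97306e", AUTHOR_LIST)
author_names = [FAMILY_NAME[n] for n in author_nodes]
print(author_names)
```

The recovered order matches the author order given in the JSON-LD `author` array, which is the reason the list encoding exists: plain RDF triples are otherwise unordered.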
 



