Indexing and querying segmented web pages: the BlockWeb Model


Ontology type: schema:ScholarlyArticle     


Article Info

DATE

2011-10

AUTHORS

Emmanuel Bruno, Nicolas Faessel, Hervé Glotin, Jacques Le Maitre, Michel Scholl

ABSTRACT

We present in this paper a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query can be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of blocks to neighboring blocks, where a block b is said to be permeable to a block b′ in the same page if the content of b′ (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described, including the transformation of web pages into block hierarchies, the definition of a dedicated language for expressing indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%.
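The abstract defines permeability only informally. As a rough illustration, the following Python sketch works under assumptions of its own: a block is a tree node with text and an importance weight, permeability is a coefficient in [0, 1] giving the share of a neighboring block's term frequencies that a block inherits at indexing time, and blocks are ranked by importance times cosine similarity with the query. The names `Block`, `index_blocks` and `rank_blocks`, the linear inheritance rule and the scoring formula are all hypothetical; this is not the BlockWeb engine or its indexing-rule language.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """A node in a page's block hierarchy (hypothetical structure)."""
    name: str
    text: str = ""
    importance: float = 1.0  # assumed weight in [0, 1]
    children: list[Block] = field(default_factory=list)

    def walk(self):
        """Yield this block and all its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def terms(block: Block) -> Counter:
    """Term frequencies of the block's own text (naive whitespace tokenizer)."""
    return Counter(block.text.lower().split())


def index_blocks(page: Block, permeability: dict[tuple[str, str], float]) -> dict[str, Counter]:
    """Index each block as its own terms plus a share of its neighbors' terms.

    permeability[(b, src)] is the assumed fraction of src's content that b
    inherits at indexing time; pairs missing from the dict are impermeable.
    """
    blocks = {b.name: b for b in page.walk()}
    index: dict[str, Counter] = {}
    for name, block in blocks.items():
        vec = Counter(terms(block))
        for (target, source), p in permeability.items():
            if target == name:
                for term, tf in terms(blocks[source]).items():
                    vec[term] += p * tf
        index[name] = vec
    return index


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    dot = sum(w * b[t] for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_blocks(page: Block, index: dict[str, Counter], query: str) -> list[tuple[str, float]]:
    """Score blocks by importance * cosine(block, query), best first."""
    q = Counter(query.lower().split())
    importance = {b.name: b.importance for b in page.walk()}
    scores = {name: importance[name] * cosine(vec, q) for name, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


page = Block("page", children=[
    Block("image"),  # an image block with no text of its own
    Block("caption", text="lighthouse at dusk"),
    Block("ad", text="cheap flights", importance=0.2),
])
index = index_blocks(page, {("image", "caption"): 0.8})  # image inherits 80% of its caption
print(rank_blocks(page, index, "lighthouse"))
```

In this toy page the image block has no text, yet it becomes retrievable for "lighthouse" because it is permeable to its caption, and the low-importance ad block is down-weighted for any query. Importance is applied to the score rather than to the term vector because cosine similarity ignores uniform scaling of a vector.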

PAGES

623-649

Identifiers

URI

http://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6

DOI

http://dx.doi.org/10.1007/s11280-011-0124-6

DIMENSIONS

https://app.dimensions.ai/details/publication/pub.1009855802



JSON-LD is the canonical representation for SciGraph data.

TIP: This SciGraph record can also be opened in an external JSON-LD tool such as the JSON-LD Playground or Google's Structured Data Testing Tool (SDTT).

[
  {
    "@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", 
    "about": [
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0801", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Artificial Intelligence and Image Processing", 
        "type": "DefinedTerm"
      }, 
      {
        "id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08", 
        "inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/", 
        "name": "Information and Computing Sciences", 
        "type": "DefinedTerm"
      }
    ], 
    "author": [
      {
        "affiliation": {
          "alternateName": "Laboratoire des Sciences de l'Information et des Syst\u00e8mes", 
          "id": "https://www.grid.ac/institutes/grid.462878.7", 
          "name": [
            "LSIS, Universit\u00e9 du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Bruno", 
        "givenName": "Emmanuel", 
        "id": "sg:person.011735523635.44", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011735523635.44"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Laboratoire des Sciences de l'Information et des Syst\u00e8mes", 
          "id": "https://www.grid.ac/institutes/grid.462878.7", 
          "name": [
            "LSIS, Universit\u00e9 Paul C\u00e9zanne, Avenue Escadrille Normandie-Niemen, 13397, Marseille Cedex 20, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Faessel", 
        "givenName": "Nicolas", 
        "id": "sg:person.010714732035.74", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010714732035.74"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Laboratoire des Sciences de l'Information et des Syst\u00e8mes", 
          "id": "https://www.grid.ac/institutes/grid.462878.7", 
          "name": [
            "LSIS, Universit\u00e9 du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Glotin", 
        "givenName": "Herv\u00e9", 
        "id": "sg:person.016622300103.82", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016622300103.82"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Laboratoire des Sciences de l'Information et des Syst\u00e8mes", 
          "id": "https://www.grid.ac/institutes/grid.462878.7", 
          "name": [
            "LSIS, Universit\u00e9 du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Le Maitre", 
        "givenName": "Jacques", 
        "id": "sg:person.015627601511.39", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015627601511.39"
        ], 
        "type": "Person"
      }, 
      {
        "affiliation": {
          "alternateName": "Conservatoire National des Arts et M\u00e9tiers", 
          "id": "https://www.grid.ac/institutes/grid.36823.3c", 
          "name": [
            "Cedric/Wisdom, CNAM, 292 Rue St Martin, 75141, Paris Cedex 03, France"
          ], 
          "type": "Organization"
        }, 
        "familyName": "Scholl", 
        "givenName": "Michel", 
        "id": "sg:person.016173501771.43", 
        "sameAs": [
          "https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016173501771.43"
        ], 
        "type": "Person"
      }
    ], 
    "citation": [
      {
        "id": "https://doi.org/10.1145/361219.361220", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1004270480"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/3-540-36618-0_6", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1008624834", 
          "https://doi.org/10.1007/3-540-36618-0_6"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/956750.956785", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016384532"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1141753.1141777", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1016797502"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1282280.1282289", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018248675"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1027527.1027747", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1018537760"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "sg:pub.10.1007/s11280-007-0021-1", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1029731426", 
          "https://doi.org/10.1007/s11280-007-0021-1"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/988672.988700", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1032277700"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1145/1600193.1600209", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1041648376"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/tkde.2005.138", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1061661387"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/cbmi.2009.36", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1093930639"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icassp.2008.4517838", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095351518"
        ], 
        "type": "CreativeWork"
      }, 
      {
        "id": "https://doi.org/10.1109/icdar.1995.602059", 
        "sameAs": [
          "https://app.dimensions.ai/details/publication/pub.1095715659"
        ], 
        "type": "CreativeWork"
      }
    ], 
    "datePublished": "2011-10", 
    "datePublishedReg": "2011-10-01", 
    "description": "We present in this paper a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting up a page into blocks has several advantages in terms of page design, indexing and querying such as (i) blocks of a page most similar to a query may be returned instead of the page as a whole (ii) the importance of a block can be taken into account, as well as (iii) the permeability of the blocks to neighbor blocks: a block b is said to be permeable to a block b\u2032 in the same page if b\u2032 content (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described including: the transformation of web pages into blocks hierarchies, the definition of a dedicated language to express indexing rules and the storage of indexed blocks into an XML repository. The model is assessed on a dataset of electronic news, and a dataset drawn from web pages of the ImagEval campaign where it improves by 16% the mean average precision of the baseline.", 
    "genre": "research_article", 
    "id": "sg:pub.10.1007/s11280-011-0124-6", 
    "inLanguage": [
      "en"
    ], 
    "isAccessibleForFree": false, 
    "isPartOf": [
      {
        "id": "sg:journal.1136663", 
        "issn": [
          "1386-145X", 
          "1573-1413"
        ], 
        "name": "World Wide Web", 
        "type": "Periodical"
      }, 
      {
        "issueNumber": "5-6", 
        "type": "PublicationIssue"
      }, 
      {
        "type": "PublicationVolume", 
        "volumeNumber": "14"
      }
    ], 
    "name": "Indexing and querying segmented web pages: the BlockWeb Model", 
    "pagination": "623-649", 
    "productId": [
      {
        "name": "readcube_id", 
        "type": "PropertyValue", 
        "value": [
          "2ef3d16ec7b72638fee6463ef216d7727a021a66494fb97a25ac6ddfbfcef3e0"
        ]
      }, 
      {
        "name": "doi", 
        "type": "PropertyValue", 
        "value": [
          "10.1007/s11280-011-0124-6"
        ]
      }, 
      {
        "name": "dimensions_id", 
        "type": "PropertyValue", 
        "value": [
          "pub.1009855802"
        ]
      }
    ], 
    "sameAs": [
      "https://doi.org/10.1007/s11280-011-0124-6", 
      "https://app.dimensions.ai/details/publication/pub.1009855802"
    ], 
    "sdDataset": "articles", 
    "sdDatePublished": "2019-04-11T00:18", 
    "sdLicense": "https://scigraph.springernature.com/explorer/license/", 
    "sdPublisher": {
      "name": "Springer Nature - SN SciGraph project", 
      "type": "Organization"
    }, 
    "sdSource": "s3://com-uberresearch-data-dimensions-target-20181106-alternative/cleanup/v134/2549eaecd7973599484d7c17b260dba0a4ecb94b/merge/v9/a6c9fde33151104705d4d7ff012ea9563521a3ce/jats-lookup/v90/0000000001_0000000264/records_8695_00000520.jsonl", 
    "type": "ScholarlyArticle", 
    "url": "http://link.springer.com/10.1007%2Fs11280-011-0124-6"
  }
]
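As an illustration of consuming this record, the Python sketch below pulls a citation-style summary out of a trimmed excerpt of the JSON-LD above. It reads the record as plain JSON, relying on the short keys (`name`, `author`, `isPartOf`, `productId`) defined by the SciGraph `@context` rather than expanding them to full IRIs; a JSON-LD processor such as PyLD would be needed for that. The function name `summarize` and the output format are illustrative choices.

```python
import json

# A trimmed excerpt of the SciGraph JSON-LD record above (most fields omitted).
RECORD = r"""
[
  {
    "name": "Indexing and querying segmented web pages: the BlockWeb Model",
    "pagination": "623-649",
    "author": [
      {"familyName": "Bruno", "givenName": "Emmanuel", "type": "Person"},
      {"familyName": "Faessel", "givenName": "Nicolas", "type": "Person"},
      {"familyName": "Glotin", "givenName": "Herv\u00e9", "type": "Person"},
      {"familyName": "Le Maitre", "givenName": "Jacques", "type": "Person"},
      {"familyName": "Scholl", "givenName": "Michel", "type": "Person"}
    ],
    "isPartOf": [
      {"name": "World Wide Web", "type": "Periodical"},
      {"issueNumber": "5-6", "type": "PublicationIssue"},
      {"type": "PublicationVolume", "volumeNumber": "14"}
    ],
    "productId": [
      {"name": "doi", "type": "PropertyValue", "value": ["10.1007/s11280-011-0124-6"]}
    ]
  }
]
"""


def summarize(jsonld_text: str) -> dict:
    """Extract title, ordered authors, venue and DOI from a SciGraph JSON-LD array."""
    article = json.loads(jsonld_text)[0]
    # isPartOf mixes journal, issue and volume objects; key them by their type.
    parts = {p["type"]: p for p in article["isPartOf"]}
    doi = next(p["value"][0] for p in article["productId"] if p["name"] == "doi")
    return {
        "title": article["name"],
        "authors": [f'{a["givenName"]} {a["familyName"]}' for a in article["author"]],
        "journal": parts["Periodical"]["name"],
        "volume": parts["PublicationVolume"]["volumeNumber"],
        "issue": parts["PublicationIssue"]["issueNumber"],
        "pages": article["pagination"],
        "doi": doi,
    }


info = summarize(RECORD)
print(f'{", ".join(info["authors"])}. {info["title"]}. {info["journal"]} '
      f'{info["volume"]}({info["issue"]}):{info["pages"]}. doi:{info["doi"]}')
```

The `\u00e9` escape is kept exactly as it appears in the record; `json.loads` decodes it to "é".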
 


HOW TO GET THIS DATA PROGRAMMATICALLY:

JSON-LD is a popular linked-data format that is fully compatible with JSON.

curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6'

N-Triples is a line-based linked data format ideal for batch operations.

curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6'
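Because N-Triples puts one complete statement per line, a dump can be streamed and split line by line without loading the whole graph. The Python sketch below is a simplified splitter, not a validating parser: it returns terms as raw text and does not unescape literals. The sample lines are illustrative N-Triples built from values in this record; the exact subject and predicate IRIs the endpoint returns are an assumption.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Subject and predicate contain no whitespace; the object runs up to the final " .".
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def split_ntriples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from N-Triples lines, one statement per line.

    Simplified: IRIs keep their <>, literals keep their quotes, escapes and any
    @lang or ^^datatype suffix.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = TRIPLE.match(line)
        if m is None:
            raise ValueError(f"not an N-Triples statement: {line!r}")
        yield m.group(1), m.group(2), m.group(3)


# Illustrative lines; the IRIs are assumed, not copied from the endpoint's output.
SAMPLE = """\
<http://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6> <http://schema.org/pagination> "623-649" .
<http://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6> <http://schema.org/name> "Indexing and querying segmented web pages: the BlockWeb Model" .
_:b0 <http://schema.org/volumeNumber> "14" .
"""

# A typical batch task: count how often each predicate occurs.
predicates = Counter(p for _, p, _ in split_ntriples(SAMPLE.splitlines()))
print(predicates.most_common())
```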

Turtle is a human-readable linked data format.

curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6'

RDF/XML is a standard XML format for linked data.

curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6'
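The four curl commands differ only in the `Accept` header, i.e. they rely on HTTP content negotiation. The Python sketch below does the same with the standard library's `urllib.request`; the helper names `build_request` and `fetch` are hypothetical, and it assumes the endpoint still honours these media types, which is not guaranteed for an archived service.

```python
from urllib.request import Request, urlopen

RECORD_URL = "https://scigraph.springernature.com/pub.10.1007/s11280-011-0124-6"

# Media types taken from the curl examples above.
FORMATS = {
    "json-ld": "application/ld+json",
    "n-triples": "application/n-triples",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def build_request(fmt: str, url: str = RECORD_URL) -> Request:
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return Request(url, headers={"Accept": FORMATS[fmt]})


def fetch(fmt: str, url: str = RECORD_URL, timeout: float = 30.0) -> str:
    """Send the request and return the body decoded as UTF-8 (network access required)."""
    with urlopen(build_request(fmt, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch("turtle")` corresponds to the Turtle curl command; keeping request construction separate from sending makes the header logic easy to check offline.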


 

This table lists all metadata directly associated with this object as RDF triples.

134 triples, 21 predicates, 40 URIs, 19 literals, 7 blank nodes

Subject Predicate Object
1 sg:pub.10.1007/s11280-011-0124-6 schema:about anzsrc-for:08
2 anzsrc-for:0801
3 schema:author Na2628cbfb8024e739e9fd9aa692f4be6
4 schema:citation sg:pub.10.1007/3-540-36618-0_6
5 sg:pub.10.1007/s11280-007-0021-1
6 https://doi.org/10.1109/cbmi.2009.36
7 https://doi.org/10.1109/icassp.2008.4517838
8 https://doi.org/10.1109/icdar.1995.602059
9 https://doi.org/10.1109/tkde.2005.138
10 https://doi.org/10.1145/1027527.1027747
11 https://doi.org/10.1145/1141753.1141777
12 https://doi.org/10.1145/1282280.1282289
13 https://doi.org/10.1145/1600193.1600209
14 https://doi.org/10.1145/361219.361220
15 https://doi.org/10.1145/956750.956785
16 https://doi.org/10.1145/988672.988700
17 schema:datePublished 2011-10
18 schema:datePublishedReg 2011-10-01
19 schema:description We present in this paper a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting up a page into blocks has several advantages in terms of page design, indexing and querying such as (i) blocks of a page most similar to a query may be returned instead of the page as a whole (ii) the importance of a block can be taken into account, as well as (iii) the permeability of the blocks to neighbor blocks: a block b is said to be permeable to a block b′ in the same page if b′ content (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described including: the transformation of web pages into blocks hierarchies, the definition of a dedicated language to express indexing rules and the storage of indexed blocks into an XML repository. The model is assessed on a dataset of electronic news, and a dataset drawn from web pages of the ImagEval campaign where it improves by 16% the mean average precision of the baseline.
20 schema:genre research_article
21 schema:inLanguage en
22 schema:isAccessibleForFree false
23 schema:isPartOf N60cbcf76d2b944c7b296ff777e73068b
24 Ncaeb235a136846248af7b2a7899f26fe
25 sg:journal.1136663
26 schema:name Indexing and querying segmented web pages: the BlockWeb Model
27 schema:pagination 623-649
28 schema:productId N3fd2646a84fa423dbbf795bd853a929e
29 N56143b52bcf2435eb3c395ecfc3428cf
30 N95b7310962784aa4aacf11c0c064e0f4
31 schema:sameAs https://app.dimensions.ai/details/publication/pub.1009855802
32 https://doi.org/10.1007/s11280-011-0124-6
33 schema:sdDatePublished 2019-04-11T00:18
34 schema:sdLicense https://scigraph.springernature.com/explorer/license/
35 schema:sdPublisher N49c2cdeb39d540f2b889daabe9c723d3
36 schema:url http://link.springer.com/10.1007%2Fs11280-011-0124-6
37 sgo:license sg:explorer/license/
38 sgo:sdDataset articles
39 rdf:type schema:ScholarlyArticle
40 N3fd2646a84fa423dbbf795bd853a929e schema:name dimensions_id
41 schema:value pub.1009855802
42 rdf:type schema:PropertyValue
43 N49c2cdeb39d540f2b889daabe9c723d3 schema:name Springer Nature - SN SciGraph project
44 rdf:type schema:Organization
45 N56143b52bcf2435eb3c395ecfc3428cf schema:name readcube_id
46 schema:value 2ef3d16ec7b72638fee6463ef216d7727a021a66494fb97a25ac6ddfbfcef3e0
47 rdf:type schema:PropertyValue
48 N5d1c99969dbe474ca2cfb8abc212ba98 rdf:first sg:person.016173501771.43
49 rdf:rest rdf:nil
50 N60cbcf76d2b944c7b296ff777e73068b schema:issueNumber 5-6
51 rdf:type schema:PublicationIssue
52 N95b7310962784aa4aacf11c0c064e0f4 schema:name doi
53 schema:value 10.1007/s11280-011-0124-6
54 rdf:type schema:PropertyValue
55 Na2628cbfb8024e739e9fd9aa692f4be6 rdf:first sg:person.011735523635.44
56 rdf:rest Nae760e0be5a8471ab213d0ceaf7bddcb
57 Nae760e0be5a8471ab213d0ceaf7bddcb rdf:first sg:person.010714732035.74
58 rdf:rest Naffbb64eb3c04680a1e3101c318b7e6a
59 Naffbb64eb3c04680a1e3101c318b7e6a rdf:first sg:person.016622300103.82
60 rdf:rest Nc358b2d2b04146819644648be7aa29b9
61 Nc358b2d2b04146819644648be7aa29b9 rdf:first sg:person.015627601511.39
62 rdf:rest N5d1c99969dbe474ca2cfb8abc212ba98
63 Ncaeb235a136846248af7b2a7899f26fe schema:volumeNumber 14
64 rdf:type schema:PublicationVolume
65 anzsrc-for:08 schema:inDefinedTermSet anzsrc-for:
66 schema:name Information and Computing Sciences
67 rdf:type schema:DefinedTerm
68 anzsrc-for:0801 schema:inDefinedTermSet anzsrc-for:
69 schema:name Artificial Intelligence and Image Processing
70 rdf:type schema:DefinedTerm
71 sg:journal.1136663 schema:issn 1386-145X
72 1573-1413
73 schema:name World Wide Web
74 rdf:type schema:Periodical
75 sg:person.010714732035.74 schema:affiliation https://www.grid.ac/institutes/grid.462878.7
76 schema:familyName Faessel
77 schema:givenName Nicolas
78 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.010714732035.74
79 rdf:type schema:Person
80 sg:person.011735523635.44 schema:affiliation https://www.grid.ac/institutes/grid.462878.7
81 schema:familyName Bruno
82 schema:givenName Emmanuel
83 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.011735523635.44
84 rdf:type schema:Person
85 sg:person.015627601511.39 schema:affiliation https://www.grid.ac/institutes/grid.462878.7
86 schema:familyName Le Maitre
87 schema:givenName Jacques
88 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015627601511.39
89 rdf:type schema:Person
90 sg:person.016173501771.43 schema:affiliation https://www.grid.ac/institutes/grid.36823.3c
91 schema:familyName Scholl
92 schema:givenName Michel
93 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016173501771.43
94 rdf:type schema:Person
95 sg:person.016622300103.82 schema:affiliation https://www.grid.ac/institutes/grid.462878.7
96 schema:familyName Glotin
97 schema:givenName Hervé
98 schema:sameAs https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016622300103.82
99 rdf:type schema:Person
100 sg:pub.10.1007/3-540-36618-0_6 schema:sameAs https://app.dimensions.ai/details/publication/pub.1008624834
101 https://doi.org/10.1007/3-540-36618-0_6
102 rdf:type schema:CreativeWork
103 sg:pub.10.1007/s11280-007-0021-1 schema:sameAs https://app.dimensions.ai/details/publication/pub.1029731426
104 https://doi.org/10.1007/s11280-007-0021-1
105 rdf:type schema:CreativeWork
106 https://doi.org/10.1109/cbmi.2009.36 schema:sameAs https://app.dimensions.ai/details/publication/pub.1093930639
107 rdf:type schema:CreativeWork
108 https://doi.org/10.1109/icassp.2008.4517838 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095351518
109 rdf:type schema:CreativeWork
110 https://doi.org/10.1109/icdar.1995.602059 schema:sameAs https://app.dimensions.ai/details/publication/pub.1095715659
111 rdf:type schema:CreativeWork
112 https://doi.org/10.1109/tkde.2005.138 schema:sameAs https://app.dimensions.ai/details/publication/pub.1061661387
113 rdf:type schema:CreativeWork
114 https://doi.org/10.1145/1027527.1027747 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018537760
115 rdf:type schema:CreativeWork
116 https://doi.org/10.1145/1141753.1141777 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016797502
117 rdf:type schema:CreativeWork
118 https://doi.org/10.1145/1282280.1282289 schema:sameAs https://app.dimensions.ai/details/publication/pub.1018248675
119 rdf:type schema:CreativeWork
120 https://doi.org/10.1145/1600193.1600209 schema:sameAs https://app.dimensions.ai/details/publication/pub.1041648376
121 rdf:type schema:CreativeWork
122 https://doi.org/10.1145/361219.361220 schema:sameAs https://app.dimensions.ai/details/publication/pub.1004270480
123 rdf:type schema:CreativeWork
124 https://doi.org/10.1145/956750.956785 schema:sameAs https://app.dimensions.ai/details/publication/pub.1016384532
125 rdf:type schema:CreativeWork
126 https://doi.org/10.1145/988672.988700 schema:sameAs https://app.dimensions.ai/details/publication/pub.1032277700
127 rdf:type schema:CreativeWork
128 https://www.grid.ac/institutes/grid.36823.3c schema:alternateName Conservatoire National des Arts et Métiers
129 schema:name Cedric/Wisdom, CNAM, 292 Rue St Martin, 75141, Paris Cedex 03, France
130 rdf:type schema:Organization
131 https://www.grid.ac/institutes/grid.462878.7 schema:alternateName Laboratoire des Sciences de l'Information et des Systèmes
132 schema:name LSIS, Université Paul Cézanne, Avenue Escadrille Normandie-Niemen, 13397, Marseille Cedex 20, France
133 LSIS, Université du Sud Toulon-Var, BP 20132, 83957, La Garde Cedex, France
134 rdf:type schema:Organization
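In the table, the author order is not a plain list of rows: `schema:author` (row 3) points to the blank node `Na2628…`, and each list node carries `rdf:first` (one author) and `rdf:rest` (the next node), ending at `rdf:nil` (rows 48 to 62). Because rows are sorted by subject, the last author's node (`N5d1c…`, Scholl) happens to be listed first, so the order must be recovered by following the links. The Python sketch below does this on triples copied from the table, with compact names kept as plain strings; RDF libraries such as rdflib provide a collection helper for the same task.

```python
# (subject, predicate, object) triples copied from the table above:
# row 3, rows 48-62, and the schema:familyName rows.
TRIPLES = [
    ("sg:pub.10.1007/s11280-011-0124-6", "schema:author", "Na2628cbfb8024e739e9fd9aa692f4be6"),
    ("N5d1c99969dbe474ca2cfb8abc212ba98", "rdf:first", "sg:person.016173501771.43"),
    ("N5d1c99969dbe474ca2cfb8abc212ba98", "rdf:rest", "rdf:nil"),
    ("Na2628cbfb8024e739e9fd9aa692f4be6", "rdf:first", "sg:person.011735523635.44"),
    ("Na2628cbfb8024e739e9fd9aa692f4be6", "rdf:rest", "Nae760e0be5a8471ab213d0ceaf7bddcb"),
    ("Nae760e0be5a8471ab213d0ceaf7bddcb", "rdf:first", "sg:person.010714732035.74"),
    ("Nae760e0be5a8471ab213d0ceaf7bddcb", "rdf:rest", "Naffbb64eb3c04680a1e3101c318b7e6a"),
    ("Naffbb64eb3c04680a1e3101c318b7e6a", "rdf:first", "sg:person.016622300103.82"),
    ("Naffbb64eb3c04680a1e3101c318b7e6a", "rdf:rest", "Nc358b2d2b04146819644648be7aa29b9"),
    ("Nc358b2d2b04146819644648be7aa29b9", "rdf:first", "sg:person.015627601511.39"),
    ("Nc358b2d2b04146819644648be7aa29b9", "rdf:rest", "N5d1c99969dbe474ca2cfb8abc212ba98"),
    ("sg:person.010714732035.74", "schema:familyName", "Faessel"),
    ("sg:person.011735523635.44", "schema:familyName", "Bruno"),
    ("sg:person.015627601511.39", "schema:familyName", "Le Maitre"),
    ("sg:person.016173501771.43", "schema:familyName", "Scholl"),
    ("sg:person.016622300103.82", "schema:familyName", "Glotin"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def read_rdf_list(triples, head):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    items, node, seen = [], head, set()
    while node != "rdf:nil":
        if node in seen:
            raise ValueError("cyclic RDF list")
        seen.add(node)
        items.append(objects(triples, node, "rdf:first")[0])
        node = objects(triples, node, "rdf:rest")[0]
    return items


head = objects(TRIPLES, "sg:pub.10.1007/s11280-011-0124-6", "schema:author")[0]
authors = [objects(TRIPLES, person, "schema:familyName")[0]
           for person in read_rdf_list(TRIPLES, head)]
print(authors)  # ['Bruno', 'Faessel', 'Glotin', 'Le Maitre', 'Scholl']
```

The recovered order matches the author list in the JSON-LD record and at the top of the page.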
 






...