Ontology type: schema:ScholarlyArticle Open Access: True
2022-04-18
AUTHORSBingqing Lyu, Lu Qin, Xuemin Lin, Ying Zhang, Zhengping Qian, Jingren Zhou
ABSTRACTMaximum biclique search, which finds the biclique with the maximum number of edges in a bipartite graph, is a fundamental problem with a wide spectrum of applications in different domains, such as E-Commerce, social analysis, web services, and bioinformatics. Unfortunately, due to the difficulty of the problem in graph theory, no practical solution has been proposed to solve the issue in large-scale real-world datasets. Existing techniques for maximum clique search on a general graph cannot be applied because the search objective of maximum biclique search is two-dimensional, i.e., we have to consider the size of both parts of the biclique simultaneously. In this paper, we divide the problem into several subproblems each of which is specified using two parameters. These subproblems are derived in a progressive manner, and in each subproblem, we can restrict the search in a very small part of the original bipartite graph. We prove that a logarithmic number of subproblems is enough to guarantee the algorithm correctness. To minimize the computational cost, we show how to reduce significantly the bipartite graph size for each subproblem while preserving the maximum biclique satisfying certain constraints by exploring the properties of one-hop and two-hop neighbors for each vertex. Furthermore, we study the diversified top-k biclique search problem which aims to find k maximal bicliques that cover the most edges in total. The basic idea is to repeatedly find the maximum biclique in the bipartite graph and remove it from the bipartite graph k times. We design an efficient algorithm that considers to share the computation cost among the k results, based on the idea of deriving the same subproblems of different results. We further propose two optimizations to accelerate the computation by pruning the search space with size constraint and refining the candidates in a lazy manner. We use several real datasets from various application domains, one of which contains over 300 million vertices and 1.3 billion edges, to demonstrate the high efficiency and scalability of our proposed solution. It is reported that 50% improvement on recall can be achieved after applying our method in Alibaba Group to identify the fraudulent transactions in their e-commerce networks. This further demonstrates the usefulness of our techniques in practice. More... »
PAGES1-25
http://scigraph.springernature.com/pub.10.1007/s00778-021-00681-6
DOIhttp://dx.doi.org/10.1007/s00778-021-00681-6
DIMENSIONShttps://app.dimensions.ai/details/publication/pub.1147188010
JSON-LD is the canonical representation for SciGraph data.
TIP: You can open this SciGraph record using an external JSON-LD service: JSON-LD Playground Google SDTT
[
{
"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json",
"about": [
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/08",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Information and Computing Sciences",
"type": "DefinedTerm"
},
{
"id": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/0802",
"inDefinedTermSet": "http://purl.org/au-research/vocabulary/anzsrc-for/2008/",
"name": "Computation Theory and Mathematics",
"type": "DefinedTerm"
}
],
"author": [
{
"affiliation": {
"alternateName": "Alibaba Group, Hangzhou, China",
"id": "http://www.grid.ac/institutes/grid.481558.5",
"name": [
"Alibaba Group, Hangzhou, China"
],
"type": "Organization"
},
"familyName": "Lyu",
"givenName": "Bingqing",
"id": "sg:person.015377124275.09",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015377124275.09"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "The University of Technology Sydney, Ultimo, Australia",
"id": "http://www.grid.ac/institutes/grid.117476.2",
"name": [
"The University of Technology Sydney, Ultimo, Australia"
],
"type": "Organization"
},
"familyName": "Qin",
"givenName": "Lu",
"id": "sg:person.016134045533.77",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016134045533.77"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "The University of New South Wales, Sydney, Australia",
"id": "http://www.grid.ac/institutes/grid.1005.4",
"name": [
"The University of New South Wales, Sydney, Australia"
],
"type": "Organization"
},
"familyName": "Lin",
"givenName": "Xuemin",
"id": "sg:person.07565515413.36",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.07565515413.36"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "The University of Technology Sydney, Ultimo, Australia",
"id": "http://www.grid.ac/institutes/grid.117476.2",
"name": [
"The University of Technology Sydney, Ultimo, Australia"
],
"type": "Organization"
},
"familyName": "Zhang",
"givenName": "Ying",
"id": "sg:person.013335275373.17",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.013335275373.17"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Alibaba Group, Hangzhou, China",
"id": "http://www.grid.ac/institutes/grid.481558.5",
"name": [
"Alibaba Group, Hangzhou, China"
],
"type": "Organization"
},
"familyName": "Qian",
"givenName": "Zhengping",
"id": "sg:person.015150534463.19",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.015150534463.19"
],
"type": "Person"
},
{
"affiliation": {
"alternateName": "Alibaba Group, Hangzhou, China",
"id": "http://www.grid.ac/institutes/grid.481558.5",
"name": [
"Alibaba Group, Hangzhou, China"
],
"type": "Organization"
},
"familyName": "Zhou",
"givenName": "Jingren",
"id": "sg:person.016056427127.67",
"sameAs": [
"https://app.dimensions.ai/discover/publication?and_facet_researcher=ur.016056427127.67"
],
"type": "Person"
}
],
"citation": [
{
"id": "sg:pub.10.1007/978-3-642-11440-3_18",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1015765121",
"https://doi.org/10.1007/978-3-642-11440-3_18"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/11564126_18",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1031573650",
"https://doi.org/10.1007/11564126_18"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-07046-9_16",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1005452112",
"https://doi.org/10.1007/978-3-319-07046-9_16"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s00778-015-0408-z",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1011616176",
"https://doi.org/10.1007/s00778-015-0408-z"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-642-37401-2_21",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1039356519",
"https://doi.org/10.1007/978-3-642-37401-2_21"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-540-27810-8_23",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1005823615",
"https://doi.org/10.1007/978-3-540-27810-8_23"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-39817-4_21",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1032990988",
"https://doi.org/10.1007/978-3-319-39817-4_21"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s10898-006-9039-7",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1031441758",
"https://doi.org/10.1007/s10898-006-9039-7"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1186/1471-2105-15-110",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1020521704",
"https://doi.org/10.1186/1471-2105-15-110"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/11823728_42",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1038116783",
"https://doi.org/10.1007/11823728_42"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s00778-019-00579-4",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1122119449",
"https://doi.org/10.1007/s00778-019-00579-4"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/3-540-45066-1_22",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1002026040",
"https://doi.org/10.1007/3-540-45066-1_22"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/978-3-319-31204-0_10",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1013911586",
"https://doi.org/10.1007/978-3-319-31204-0_10"
],
"type": "CreativeWork"
},
{
"id": "sg:pub.10.1007/s10898-013-0075-9",
"sameAs": [
"https://app.dimensions.ai/details/publication/pub.1044530102",
"https://doi.org/10.1007/s10898-013-0075-9"
],
"type": "CreativeWork"
}
],
"datePublished": "2022-04-18",
"datePublishedReg": "2022-04-18",
"description": "Maximum biclique search, which finds the biclique with the maximum number of edges in a bipartite graph, is a fundamental problem with a wide spectrum of applications in different domains, such as E-Commerce, social analysis, web services, and bioinformatics. Unfortunately, due to the difficulty of the problem in graph theory, no practical solution has been proposed to solve the issue in large-scale real-world datasets. Existing techniques for maximum clique search on a general graph cannot be applied because the search objective of maximum biclique search is two-dimensional, i.e., we have to consider the size of both parts of the biclique simultaneously. In this paper, we divide the problem into several subproblems each of which is specified using two parameters. These subproblems are derived in a progressive manner, and in each subproblem, we can restrict the search in a very small part of the original bipartite graph. We prove that a logarithmic number of subproblems is enough to guarantee the algorithm correctness. To minimize the computational cost, we show how to reduce significantly the bipartite graph size for each subproblem while preserving the maximum biclique satisfying certain constraints by exploring the properties of one-hop and two-hop neighbors for each vertex. Furthermore, we study the diversified top-k biclique search problem which aims to find k maximal bicliques that cover the most edges in total. The basic idea is to repeatedly find the maximum biclique in the bipartite graph and remove it from the bipartite graph k times. We design an efficient algorithm that considers to share the computation cost among the k results, based on the idea of deriving the same subproblems of different results. We further propose two optimizations to accelerate the computation by pruning the search space with size constraint and refining the candidates in a lazy manner. We use several real datasets from various application domains, one of which contains over 300 million vertices and 1.3 billion edges, to demonstrate the high efficiency and scalability of our proposed solution. It is reported that 50% improvement on recall can be achieved after applying our method in Alibaba Group to identify the fraudulent transactions in their e-commerce networks. This further demonstrates the usefulness of our techniques in practice.",
"genre": "article",
"id": "sg:pub.10.1007/s00778-021-00681-6",
"inLanguage": "en",
"isAccessibleForFree": true,
"isFundedItemOf": [
{
"id": "sg:grant.9783350",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.5127645",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.7074633",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.8676493",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.6809780",
"type": "MonetaryGrant"
},
{
"id": "sg:grant.6712150",
"type": "MonetaryGrant"
}
],
"isPartOf": [
{
"id": "sg:journal.1044889",
"issn": [
"1066-8888",
"0949-877X"
],
"name": "The VLDB Journal",
"publisher": "Springer Nature",
"type": "Periodical"
}
],
"keywords": [
"large-scale real-world datasets",
"bipartite graphs",
"maximum biclique",
"e-commerce network",
"real-world datasets",
"original bipartite graph",
"two-hop neighbors",
"maximum clique search",
"web services",
"fraudulent transactions",
"application domains",
"search problem",
"computation cost",
"lazy manner",
"search objective",
"graph size",
"algorithm correctness",
"e-commerce",
"search space",
"Alibaba Group",
"maximal bicliques",
"real datasets",
"clique search",
"efficient algorithm",
"graph theory",
"same subproblem",
"one-hop",
"computational cost",
"general graphs",
"logarithmic number",
"different domains",
"subproblems",
"bicliques",
"fundamental problem",
"graph",
"certain constraints",
"basic idea",
"practical solution",
"dataset",
"size constraints",
"k times",
"progressive manner",
"vertices",
"search",
"maximum number",
"problem",
"scalability",
"constraints",
"most edges",
"algorithm",
"correctness",
"transactions",
"network",
"solution",
"computation",
"bioinformatics",
"cost",
"edge",
"neighbors",
"domain",
"services",
"idea",
"small part",
"optimization",
"technique",
"theory",
"recall",
"space",
"different results",
"high efficiency",
"applications",
"K results",
"parameters",
"issues",
"number",
"manner",
"efficiency",
"wide spectrum",
"social analysis",
"properties",
"top",
"usefulness",
"results",
"part",
"method",
"size",
"spectra",
"difficulties",
"improvement",
"time",
"objective",
"analysis",
"scale",
"practice",
"candidates",
"group",
"total",
"paper"
],
"name": "Maximum and top-k diversified biclique search at scale",
"pagination": "1-25",
"productId": [
{
"name": "dimensions_id",
"type": "PropertyValue",
"value": [
"pub.1147188010"
]
},
{
"name": "doi",
"type": "PropertyValue",
"value": [
"10.1007/s00778-021-00681-6"
]
}
],
"sameAs": [
"https://doi.org/10.1007/s00778-021-00681-6",
"https://app.dimensions.ai/details/publication/pub.1147188010"
],
"sdDataset": "articles",
"sdDatePublished": "2022-06-01T22:25",
"sdLicense": "https://scigraph.springernature.com/explorer/license/",
"sdPublisher": {
"name": "Springer Nature - SN SciGraph project",
"type": "Organization"
},
"sdSource": "s3://com-springernature-scigraph/baseset/20220601/entities/gbq_results/article/article_934.jsonl",
"type": "ScholarlyArticle",
"url": "https://doi.org/10.1007/s00778-021-00681-6"
}
]
Download the RDF metadata as: json-ld nt turtle xml License info
JSON-LD is a popular format for linked data which is fully compatible with JSON.
curl -H 'Accept: application/ld+json' 'https://scigraph.springernature.com/pub.10.1007/s00778-021-00681-6'
N-Triples is a line-based linked data format ideal for batch operations.
curl -H 'Accept: application/n-triples' 'https://scigraph.springernature.com/pub.10.1007/s00778-021-00681-6'
Turtle is a human-readable linked data format.
curl -H 'Accept: text/turtle' 'https://scigraph.springernature.com/pub.10.1007/s00778-021-00681-6'
RDF/XML is a standard XML format for linked data.
curl -H 'Accept: application/rdf+xml' 'https://scigraph.springernature.com/pub.10.1007/s00778-021-00681-6'
This table displays all metadata directly associated to this object as RDF triples.
259 TRIPLES
22 PREDICATES
135 URIs
113 LITERALS
4 BLANK NODES