Efficient continuous multi-query processing over data streams

Ζερβάκης, Ελευθέριος

dc.contributor.advisor	Τρυφωνόπουλος, Χρήστος
dc.contributor.author	Ζερβάκης, Ελευθέριος
dc.date.accessioned	2024-08-27T10:36:59Z
dc.date.issued	2019-07-10
dc.identifier.uri	https://amitos.library.uop.gr/xmlui/handle/123456789/8181
dc.identifier.uri	http://dx.doi.org/10.26263/amitos-1683
dc.description	Doctoral dissertation 9	el
dc.description.abstract	In the modern digital era, the creation and availability of new information have increased exponentially. A plethora of information sources, such as news delivery sites, knowledge bases, and social networks, constantly make new content available at an overwhelming pace. To assist users in coping with the vast amount of newly generated information, the Information Filtering (IF) paradigm was introduced. IF applications aim at assisting users in information discovery and enable users to cope with the information avalanche and the cognitive overload associated with it. In an IF scenario, users or services express their information needs (implicitly or explicitly) through appropriate interfaces, tools, and languages and submit profiles (or continuous queries) to a system or service. In this way, users create subscriptions that are continuously matched (by the system or service) against a stream of newly published content, and generate notifications whenever new items that match users’ information needs are published. The filtering problem is of high importance and needs to be solved efficiently, since servers are expected to handle millions of queries and high rates of incoming content. In our work, we examine the research problem of developing efficient and effective algorithms that are able to capture the nature of information streams in the form of continuous multi-query answering. To this end, we choose to explore solutions in the domains of textual information filtering, ontology publish/subscribe systems, and evolving graph stream environments. Finally, we design, implement, and present a fully functional information filtering system that showcases the usefulness of the IF paradigm and provides the basis for developers to build added-value IF services in a number of different information domains.
First, we examine the information filtering paradigm in the context of textual information filtering, employing the Boolean data model. In this setup, clients subscribe to a server with continuous queries that express their information needs and get notified every time appropriate information is published. To perform this task efficiently, servers employ indexing schemes that support fast matching of the incoming information against the query database. However, state-of-the-art indexing schemes are sensitive to the query insertion order and cannot adapt to an evolving query workload, degrading the filtering performance over time. In this line of work, we present an adaptive trie-based algorithm that outperforms current methods by relying on query statistics to reorganize the query database. In our research, we explore query database reorganization techniques and demonstrate that the nature of the constructed tries, rather than their compactness, is the determining factor for efficient filtering performance. Our algorithm does not depend on the order of insertion of queries in the database, manages to cluster queries even when clustering possibilities are limited, and achieves a filtering time improvement of two orders of magnitude over its state-of-the-art competitors. Finally, we demonstrate that our solution is easily extensible to multi-core machines by providing an implementation in a multi-core environment. In the continuation of our work, we investigate publish/subscribe ontology systems; we envision a publish/subscribe ontology system that is able to index large numbers of expressive continuous queries and filter them against RDF data that arrive in a streaming fashion. To this end, we propose a SPARQL extension that supports the creation of full-text continuous queries and propose a family of main-memory query indexing algorithms which perform matching with low complexity and minimal filtering time.
We experimentally compare our approach against a state-of-the-art competitor (extended to handle indexing of full-text queries) on both structural and full-text tasks using real-world data. The experimental results demonstrate that our approach yields two orders of magnitude faster performance than the competitor in all types of filtering tasks. Subsequently, in our research we study the domain of evolving graphs, which have a wide range of applications involving social networks, knowledge bases, and biological interactions. The evolution of a graph in such scenarios can yield important insights about the nature and activities of the underlying network, which can then be utilized for applications such as news dissemination, network monitoring, and content curation. Capturing the continuous evolution of a graph can be achieved by long-standing sub-graph queries. Although for many applications this can only be achieved by a set of queries, state-of-the-art approaches focus on a single-query scenario. Therefore, in this line of work, we introduce the notion of continuous multi-query processing over graph streams and discuss its application to a number of use cases. To this end, we developed a novel algorithmic solution for efficient multi-query evaluation against a stream of graph updates and experimentally demonstrated its applicability. Our results against three baseline approaches and the graph database Neo4j, using real-world and synthetic datasets, confirm an improvement of two orders of magnitude for the proposed solution. Finally, we conclude our research with the design and development of a fully fledged textual information filtering system called Ping. The Ping IF system is a fully functional content-based information filtering system aiming (i) to showcase the realizability of information filtering and (ii) to explore and test the suitability of the existing technological arsenal for information filtering tasks.
The proposed system is based entirely on open-source tools and components, is customizable enough to be adapted for different textual information filtering tasks, and puts emphasis on user profile expressivity, intuitive UIs, and timely information delivery. To assess the customizability of Ping, we deployed it in two distinct information filtering scenarios, while to assess its performance we designed and conducted a series of experiments for both scenarios.	el
dc.format.extent	255 p.	el
dc.language.iso	en	el
dc.publisher	University of the Peloponnese	el
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Greece	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/gr/	*
dc.title	Efficient continuous multi-query processing over data streams	el
dc.title.alternative	Αποτελεσματική συνεχής επεξεργασία πολλαπλών ερωτημάτων μέσω ροών δεδομένων	el
dc.type	Doctoral dissertation	el
dc.contributor.committee	Σκιαδόπουλος, Σπύρος
dc.contributor.committee	Βασιλάκης, Κώστας
dc.contributor.department	Department of Informatics and Telecommunications	el
dc.contributor.faculty	School of Economics, Management and Informatics	el
dcterms.embargoTerms	3 years	el
dcterms.embargoLiftDate	2027-08-27T10:36:59Z

Files in this item

Name: zervakis-thesis-v11.pdf
Size: 3.602 MB
Format: PDF

This item appears in the following Collection(s)

Department of Informatics and Telecommunications (Doctoral Dissertations)
