Several indexing techniques have been proposed to efficiently answer top-k spatial-textual queries in the last decade. However, all of these approaches focus on answering one query at a time. In contrast, how to design efficient algorithms that can exploit similarities between incoming queries to improve performance has received little attention. In this paper, we study a series of efficient approaches to batch process multiple top-k spatial-textual queries concurrently. We carefully design a variety of indexing structures for the problem space by exploring the effect of prioritizing spatial and textual properties on system performance. Specifically, we present an efficient traversal method, SF-SEP over an existing space-prioritized index structure. Then, we propose a new space-prioritized index structure, the MIR-Tree to support a filter-and-refine based technique, SF-GRP. To support the processing of text-intensive data, we propose an augmented, inverted indexing structure that can easily be added into existing text search engine architectures, and a novel traversal method for batch processing of the queries. In all of these approaches, the goal is to improve the overall performance by sharing the I/O costs of similar queries. Finally, we demonstrate significant I/O savings in our algorithms over traditional approaches by extensive experiments on three real datasets, and compare how properties of different datasets affect the performance. Many applications in streaming, micro-batching of continuous queries, and privacy-aware search can benefit from this line of work.
Data analytics has a significantly increasing impact on tackling various societal challenges. In this paper, we investigate how information from several heterogeneous and publicly available sources can be used to discover insights and make predictions about the spatial distribution of crime in large urban environments, thus contributing to the safety of citizens. A series of important research questions is addressed. First, we examine how useful different types of data are for the task of crime levels prediction, focusing especially on how prediction accuracy can be improved by combining data from multiple information sources. Then, we look into individual features, aiming to identify and quantify the most important factors. Finally, we drill down to different crime types, elaborating on how the prediction accuracy and the importance of individual features vary across them. Our analysis involves 6 different datasets, from which more than 3,000 features are extracted, filtered, and used to learn models for predicting crime rates across 11 different crime categories. Our results indicate that combining data from multiple information sources can significantly improve prediction accuracy. They also highlight which features affect prediction accuracy the most, as well as for which particular crime categories the predictions are more accurate.
In this paper we present a search problem in which mobile agents are searching for static resources. Each agent wants to obtain exactly one resource. Both agents and resources are spatially located on a road network and the movement of the agents is constrained to the road network. This problem applies to various transportation applications including: vehicles (agents) searching for parking (resources) and taxicabs (agents) searching for clients to pick up (resources). In this work, we design search algorithms for such scenarios. We model the problem in different scenarios that vary based on the level of information that is available to the agents. These scenarios vary from: scenarios in which agents have complete information about other agents and resources, to scenarios in which agents only have access to a fraction of the data about the availability of resources (uncertain data). We also propose pricing schemes that incentivize vehicles to search for resources in a way that benefits the system and the environment. Our proposed algorithms were tested in a simulation environment that uses real-world data. We were able to attain up to 40% improvements over other approaches that were tested against our algorithms.