Several indexing techniques have been proposed to efficiently answer top-k spatial-textual queries in the last decade. However, all of these approaches focus on answering one query at a time. In contrast, how to design efficient algorithms that can exploit similarities between incoming queries to improve performance has received little attention. In this paper, we study a series of efficient approaches to batch process multiple top-k spatial-textual queries concurrently. We carefully design a variety of indexing structures for the problem space by exploring the effect of prioritizing spatial and textual properties on system performance. Specifically, we present an efficient traversal method, SF-SEP over an existing space-prioritized index structure. Then, we propose a new space-prioritized index structure, the MIR-Tree to support a filter-and-refine based technique, SF-GRP. To support the processing of text-intensive data, we propose an augmented, inverted indexing structure that can easily be added into existing text search engine architectures, and a novel traversal method for batch processing of the queries. In all of these approaches, the goal is to improve the overall performance by sharing the I/O costs of similar queries. Finally, we demonstrate significant I/O savings in our algorithms over traditional approaches by extensive experiments on three real datasets, and compare how properties of different datasets affect the performance. Many applications in streaming, micro-batching of continuous queries, and privacy-aware search can benefit from this line of work.
We propose a novel indexing and querying method for trajectories constrained in a road network. We aim to provide efficient algorithms for various types of spatio-temporal queries that involve routing in road networks such as 1) finding moving objects that have traveled along a given path during a given time interval, 2) extracting all paths traveled after a given spatio-temporal context, 3) enumerating all paths between two locations traveled during a certain time interval. Unlike the existing methods in spatial database research, we employ indexing techniques and algorithms from string processing. This idea is based on the fact that we can represent spatial paths as strings, because trajectories in a network are represented as sequences of road segment IDs. The proposed SNT-index introduces two novel concepts to trajectory indexing. The first is FM-index, a compact in-memory data structure for pattern matching. The second is an inverse suffix array, which allows the FM-index to be integrated with the temporal information stored in a forest of B+-trees. Thanks to these concepts, we can reduce the number of B+-tree accesses required by the query processing algorithms to a constant number, something that cannot be achieved with existing methods. Although an FM-index is essentially a static index, we also propose a practical method of appending new data to the index. Finally, experiments show that our method can process the target queries for more than one million trajectories in a few tens of milliseconds, which is significantly faster than the baseline algorithms without string algorithms.
Path planning for walking characters in complicated virtual environments is a fundamental task in simulations and games. In this paper, we present an improved definition of the Explicit Corridor Map (ECM), a navigation mesh that allows efficient path planning and crowd simulation for disk-shaped characters of any radius. The ECM is a medial axis (MA) annotated with nearest-obstacle information. For a 2D environment with n obstacle vertices, the ECM has size O(n) and can be computed in O(n log n) time. In 3D spaces, characters are constrained to walkable surfaces which form the walkable environment (WE). This can be converted to a multi-layered environment (MLE) in which multiple planar layers are connected by line segment connections. Typical real-world examples are multi-storey buildings, train stations, and sports stadiums. We define the medial axis and the ECM for multi-layered environments, based on projected distances on the ground plane. For an MLE with n obstacle points and k connections, the MA has size O(n). We present an improved algorithm that constructs the MA and ECM in O(n log n log k) time. Our implementations show that the ECM can be computed efficiently for large 2D and multi-layered environments, and that it can be used to compute paths within milliseconds. This enables simulations of large virtual crowds of heterogeneous characters in real-time.
Data analytics has a significantly increasing impact on tackling various societal challenges. In this paper, we investigate how information from several heterogeneous and publicly available sources can be used to discover insights and make predictions about the spatial distribution of crime in large urban environments, thus contributing to the safety of citizens. A series of important research questions is addressed. First, we examine how useful different types of data are for the task of crime levels prediction, focusing especially on how prediction accuracy can be improved by combining data from multiple information sources. Then, we look into individual features, aiming to identify and quantify the most important factors. Finally, we drill down to different crime types, elaborating on how the prediction accuracy and the importance of individual features vary across them. Our analysis involves 6 different datasets, from which more than 3,000 features are extracted, filtered, and used to learn models for predicting crime rates across 11 different crime categories. Our results indicate that combining data from multiple information sources can significantly improve prediction accuracy. They also highlight which features affect prediction accuracy the most, as well as for which particular crime categories the predictions are more accurate.
In this paper we present a search problem in which mobile agents are searching for static resources. Each agent wants to obtain exactly one resource. Both agents and resources are spatially located on a road network and the movement of the agents is constrained to the road network. This problem applies to various transportation applications including: vehicles (agents) searching for parking (resources) and taxicabs (agents) searching for clients to pick up (resources). In this work, we design search algorithms for such scenarios. We model the problem in different scenarios that vary based on the level of information that is available to the agents. These scenarios vary from: scenarios in which agents have complete information about other agents and resources, to scenarios in which agents only have access to a fraction of the data about the availability of resources (uncertain data). We also propose pricing schemes that incentivize vehicles to search for resources in a way that benefits the system and the environment. Our proposed algorithms were tested in a simulation environment that uses real-world data. We were able to attain up to 40% improvements over other approaches that were tested against our algorithms.
This study proposes a method to find intersections at which many cars deviate from optimal routes using global positioning system (GPS) tracking data. Such deviations indicate that on-road navigation aids, such as car navigation systems (CNS) and road signage, are not readily available. For continuous improvement of on-road navigation, developing a method to automatically find intersections with high deviation rates is an important task. If all destinations in a drive are known, deviations can be enumerated as a difference in the optimal route that goes through all the destinations and the actual GPS data; however, such a set of destinations are not always queried to CNSs and the optimal route is unknown. To estimate deviation rates without knowing such destinations, we divides the GPS tracks into sets of sub-sequences, and consider the optimal route from the start to the end of sub-sequence. An approximate deviation rate is estimated by enumerating differences in the optimal route and the actual GPS data over all sub-sequences. We applied the proposed method to 3,843 GPS tracks collected from visitor drivers in Kyoto city. Thresholding the estimated deviation rate yielded 39 intersections from 14,543 candidates. The results showed a certain level of correlation between the obtained deviations and rerouting locations from actual CNS data. We also found several intersections where faulty route suggestions are provided by CNSs.