ACM Transactions on Spatial Algorithms and Systems (TSAS)

Latest Articles

Efficient Computation of the Optimal Accessible Location for a Group of Mobile Agents

Nowadays, people can access location-based services (LBSs) as a group via mobile devices to plan their daily activities with friends and relatives. In...

Geography and Routing in the Internet

The Internet is a network of networks consisting of tens of thousands of Autonomous Systems (ASes). These ASes connect to each other in different forms to enable the “global” Internet communication. In this study, we investigate the geographical characteristics of the visible Internet as well as examine the relation between...

2D Vector Map Fragile Watermarking with Region Location

Locating the original region of tampered features is a challenging task for existing 2D vector map fragile watermarking methods. This article presents...


ACM Transactions on Spatial Algorithms and Systems (TSAS) is a scholarly journal that publishes high-quality papers on all aspects of spatial algorithms and systems and closely related disciplines. The journal welcomes articles on any of these topics or closely related disciplines in the context of various computing architectures, including parallel and distributed networks of computers, multiprocessing computers, and new mobile devices and sensors.

Call for papers: ACM TSAS has issued a call for papers for a special issue on Urban Mobility: Algorithms and Systems. Please use ManuscriptCentral to submit articles, check the status of articles, and carry out reviewing tasks.

Forthcoming Articles

Editorial: Special Issue on the Best Papers from the 2017 ACM SIGSPATIAL Conference

Personalized Travel Time Prediction using a Small Number of Probe Vehicles

Predicting the travel time of a path is an important task in route planning and navigation applications. As more GPS floating-car data has been collected to monitor urban traffic, GPS trajectories of floating cars have frequently been used to predict path travel time. However, most trajectory-based methods rely on deploying GPS devices and collecting real-time data on a large taxi fleet, which can be expensive and unreliable in smaller cities. This work deals with the problem of predicting path travel time when only a small number of GPS floating cars are available. We developed an algorithm that learns local congestion patterns of a compact set of frequently shared paths from historical data. Given a travel time prediction query, we identify the current congestion patterns around the query path from recent trajectories, then infer its travel time in the near future. Driver identities are also used in predicting personalized travel time. Experimental results using 10-25 taxis in urban areas of Shenzhen, China show that personal prediction has on average 3.4 minutes of error on trips of 10-65 minutes. This result improves on the baseline approach of using purely historical trajectories by 16.8% on average, over four regions with varying degrees of path regularity. It also outperforms a state-of-the-art travel time prediction method that uses both historical and real-time trajectories.

Dictionary Compression in Point Cloud Data Management

Nowadays, massive amounts of point cloud data can be collected thanks to advances in data acquisition and processing technologies such as dense image matching and airborne LiDAR scanning. With the increase in volume and precision, point cloud data offers a useful source of information for natural resource management, urban planning, self-driving cars and more. At the same time, the scale at which point cloud data is produced introduces management challenges: it is important to achieve efficiency both in terms of querying performance and space requirements. Traditional file-based solutions to point cloud management offer space efficiency but cannot scale to such massive data or provide the declarative power of a DBMS. In this paper, we propose a time- and space-efficient solution to storing and managing point cloud data in a main-memory column-store DBMS. Our solution, Space-Filling Curve Dictionary-Based Compression (SFC-DBC), employs dictionary-based compression in the spatial data management domain and enhances it with indexing capabilities by using space-filling curves. SFC-DBC does so by constructing the space-filling curve over a compressed, artificially introduced dictionary space. Consequently, SFC-DBC significantly optimizes query execution, and yet does not require additional storage resources compared to traditional dictionary-based compression. With respect to space-filling curve-based approaches, it minimizes the storage footprint and increases resilience to skew. As a proof of concept, we develop and evaluate our approach as a research prototype in the context of SAP HANA. SFC-DBC outperforms other dictionary-based compression schemes by up to 61% in terms of space and up to 9.4x in terms of query performance.
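The core idea is to build the space-filling curve over compact dictionary indices rather than raw coordinates. A minimal sketch of that combination, using a Morton (Z-order) key, with hypothetical helper names and not the SAP HANA implementation:

```python
def interleave_bits(x: int, y: int, z: int, bits: int = 10) -> int:
    """Interleave the bits of three small integers into one Morton (Z-order) key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((z >> i) & 1) << (3 * i + 2)
    return key

def sfc_dictionary_encode(points):
    """Dictionary-encode each coordinate axis, then key every point by a Morton
    code computed over its (compact) dictionary indices rather than raw values."""
    xs, ys, zs = (sorted({p[d] for p in points}) for d in range(3))
    dx = {v: i for i, v in enumerate(xs)}
    dy = {v: i for i, v in enumerate(ys)}
    dz = {v: i for i, v in enumerate(zs)}
    # Points are stored as dictionary indices; sorting by the Morton key over
    # those indices yields a spatial ordering usable as a clustered index.
    encoded = sorted((interleave_bits(dx[x], dy[y], dz[z]), (dx[x], dy[y], dz[z]))
                     for x, y, z in points)
    return (xs, ys, zs), encoded
```

Because the curve is built over the dictionary space, the key stays small even when the raw coordinates are wide floating-point values.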

Regrets in Routing Networks: Measuring the Impact of Routing Apps on Traffic

The impact of the recent increase in the usage of routing apps on road traffic remains uncertain to this day. For the first time, the article introduces a criterion to evaluate the distance between an observed state of traffic and the user equilibrium of the traffic assignment: the average marginal regret. The average marginal regret provides a quantitative measure of the impact of routing apps on traffic using only link flows, link travel times and travel demand. In non-atomic routing games (or static traffic assignment models), the average marginal regret is a measure of how selfishly drivers behave. Unlike the price of anarchy, the average marginal regret in the routing game can be computed in polynomial time without any knowledge of user equilibrium or socially optimal states of traffic. First, on a small example, the article demonstrates that the average marginal regret is more appropriate than comparing flows, travel times, or total cost (as in the literature) for defining proximity between an observed state of traffic and a user equilibrium state of traffic. Then, the article provides a simulation-based assessment of the impact of routing apps on traffic, using a routing game approach and models we develop for app usage. Experiments on two different models of app usage and three networks (including the whole Los Angeles network with more than 50,000 nodes) demonstrate that the average marginal regret decreases as app usage increases. Finally, using a toy example, the article concludes that app usage can become the new Braess paradox.
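The flow-weighted regret computation described above can be sketched directly; this is an illustrative reading of the definition, not the authors' code:

```python
def average_marginal_regret(demands):
    """demands: one entry per origin-destination pair, each a list of
    (path_flow, path_travel_time) for the paths carrying that demand.
    A path's marginal regret is its travel time minus the fastest available
    travel time for its OD pair; the average is flow-weighted. At user
    equilibrium every used path is fastest, so the value is zero."""
    total_flow = 0.0
    total_regret = 0.0
    for paths in demands:
        best = min(t for f, t in paths)  # fastest option for this OD pair
        for f, t in paths:
            total_flow += f
            total_regret += f * (t - best)
    return total_regret / total_flow
```

Note that only observed flows and travel times enter the computation, which is why no equilibrium or social-optimum state needs to be solved for.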

Crowd Replication: Sensing-Assisted Quantification of Human Behavior in Public Spaces

Public spaces serve an important role in contemporary cities, providing a medium for passive and active engagement and a place for discovery and reflection. A central challenge for public space design is to evaluate whether a given space indeed promotes the desired activities. In this paper, we contribute by developing crowd replication as a novel sensor-assisted method for quantifying human behavior within public spaces. The idea in crowd replication is to task a researcher with replicating the behavior of people using a space while being instrumented with a mobile device that captures a sensor trace of the replicated behaviors. Through mathematical modeling, behavioral indicators extracted from the replicated trajectories can be extrapolated to represent a larger population. We develop the overall method, and propose a novel pedestrian sensing solution for capturing high-quality information about human behavior within public spaces. We validate our approach through a case study carried out within a representative example of a metropolitan-scale public space. Our results show that crowd-replicated data closely mirrors the real human dynamics in public spaces and provides a useful building block for human-scale analytics. We also demonstrate that crowd replication can reduce overall data collection efforts while producing high quality and unbiased data about the behaviors and activities of people within the space. Finally, we demonstrate that our pedestrian sensing solution enables crowd replication to cost-effectively capture fine-grained and detailed indicators of liveliness, extent of social interaction, and other factors congruent with the common understandings about the qualities of public spaces.

A Data-Driven Framework for Long-Range Aircraft Conflict Detection and Resolution

At the present time, there is no mechanism for Air Navigation Service Providers to probe new flight plans filed by the Airlines Operation Centers against the existing approved flight plans to see if they are likely to cause conflicts or bring sector traffic densities beyond control. In the current Air Traffic Control operations, aircraft conflicts and sector traffic densities are resolved tactically, increasing workload and leading to potential safety risks and loss of capacity and efficiency. We propose a novel Data-Driven Framework to address a long-range aircraft conflict detection and resolution (CDR) problem. Given a set of predicted trajectories, the framework declares a conflict when a protected zone of an aircraft on its trajectory is infringed upon by another aircraft. The framework resolves the conflict by prescribing an alternative solution that is optimized by perturbing at least one of the trajectories involved in the conflict. To achieve this, the framework learns from descriptive patterns of historical trajectories and pertinent weather observations and builds a Hidden Markov Model. Using a variant of the Viterbi algorithm, the framework avoids the airspace volume in which the conflict is detected and generates a new optimal trajectory. The key concept upon which the framework is built is the assumption that the airspace is nothing more than a horizontally and vertically concatenated set of spatio-temporal data cubes where each cube is considered as an atomic unit. We evaluate our framework using real trajectory datasets with pertinent weather observations from two continents and demonstrate its effectiveness.

Flood Risk Analysis on Terrains

An important problem in terrain analysis is modeling how water flows across a terrain and creates floods by filling up depressions. In this paper we study the flooding query problem: Preprocess a given terrain T, represented as a triangulated xy-monotone surface with n vertices, into a data structure so that for a query rain region R and a query point q on T, one can quickly determine how much rain has to fall in R so that q is flooded. Available terrain data is often subject to uncertainty which must be incorporated into the terrain analysis. By representing the uncertainty in the terrain data explicitly, we can develop methods for flood risk analysis that properly incorporate terrain uncertainty when reporting what areas are at risk of flooding. We present two results. First, we present an O(n log n)-time algorithm for preprocessing T with a linear-size data structure that can answer a flooding query in O(|R| + k log n) time, where |R| is the number of vertices in R, k is the number of minima of the terrain at which rain is falling, and n is the number of vertices of the terrain. Next, we extend this data structure to handle "uncertain" terrains, using a standard Monte Carlo method. Given a probability distribution on terrain data, our data structure returns the probability of a query point being flooded if a specified amount of rain falls on a query region.

Understanding Metropolitan Crowd Mobility via Mobile Cellular Accessing Data

Understanding crowd mobility in a metropolitan area is extremely valuable for city planners and decision makers. However, crowd mobility is a relatively new area of research and has significant technical challenges: lack of large-scale fine-grained data, difficulties in large-scale trajectory processing, and issues with spatial resolution. In this paper, we propose a novel approach for analyzing crowd mobility at a "city block" level. We first propose algorithms to detect homes, workplaces and stay regions from individual user trajectories. Next, we propose a method for analyzing commute patterns and spatial correlation at a city block level. Using mobile cellular accessing trace data collected from users in Shanghai, we discover commute patterns and spatial correlation rules as well as a hidden structure of the city based on crowd mobility analysis. Our proposed methods thereby contribute to the understanding of human mobility in a large metropolitan area.

Fast Computation of Clustered Many-To-Many Shortest Paths and Its Application to Map Matching

We examine the problem of computing the shortest paths in a transportation network from a set of geographically clustered source nodes to a set of target nodes. Such many-to-many shortest path computations are an essential and compute-intensive part of many emerging applications that involve map matching of imprecise spatial trajectories. Existing solutions to this problem mostly conform to the obvious approach of performing a single-source shortest path computation for each source node. We present an alternative approach that exploits the clustered nature of the source nodes. Specifically, we rely on the observation that paths originating from a cluster of nodes typically exit the source region's boundary through a small number of nodes. Evaluations on a real road network show that the proposed solution is several times faster than the conventional approach when the source nodes are densely clustered. We also demonstrate that the presented technique is capable of substantially accelerating the map matching of coarse-grained location trajectories.
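The boundary-node observation can be sketched as follows; the function names and the routing-through-exit-nodes decomposition are an illustrative assumption about the approach, not the paper's implementation:

```python
import heapq

def dijkstra(graph, source):
    """Standard Dijkstra over an adjacency dict {node: [(neighbor, weight), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def clustered_many_to_many(graph, sources, targets, boundary):
    """Route every source-target distance through a small set of boundary
    (exit) nodes: dist(s, t) = min over b of dist(s, b) + dist(b, t).
    This is exact only when every shortest s-t path leaves the cluster
    through one of the given boundary nodes."""
    from_boundary = {b: dijkstra(graph, b) for b in boundary}  # one search per exit node
    to_boundary = {s: dijkstra(graph, s) for s in sources}     # local searches in practice
    return {
        (s, t): min(to_boundary[s].get(b, float("inf")) +
                    from_boundary[b].get(t, float("inf")) for b in boundary)
        for s in sources for t in targets
    }
```

With few boundary nodes, the expensive network-wide searches are shared across all sources in the cluster instead of repeated per source.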

Imbalance in Mobility-on-Demand Systems: A Stochastic Model and Distributed Control Approach

The control of large-scale mobility-on-demand systems is an emerging topic that has been considered from system-theoretic, transportation-scientific and algorithmic points of view. Existing formulations model mobility-on-demand systems in a queuing-theoretic, network-flow-based or continuous kinematic framework. In this work, we model a mobility-on-demand system as a stochastic differential equation, which represents a generalization of previous approaches. Based on the model, we define system imbalance as the difference of the stochastic processes of service request arrival and vehicle arrival. We formally derive the first moment of the system imbalance for an imbalance control strategy that consists of a feedforward control approach (reference trajectory) and an additional feedback component. A distributed feedback control policy is defined that averages the imbalance across the system and therefore aims at a uniform quality-of-service distribution. Finally, we verify our results in a high-fidelity and large-scale agent-based simulation of a hypothetical mobility-on-demand system.

Accurate Detection of Road Network Anomaly by Understanding Crowd's Driving Strategies from Human Mobility

Thousands of roads change every day as cities develop. Detecting road closures and changes to traffic rules is essential for dynamic route planning and navigation services. In this paper, we propose a driving behavior modeling method for accurately and effectively detecting road network anomalies. In the first step, we detect anomalous areas by using the deviation between drivers' actual and expected behaviors. To discover the cause of anomalies, we explore drivers' short-term destinations and find the crucial link pairs in anomalous areas through a novel optimized link entanglement search (LES) algorithm, namely, the Select LES (SELES) algorithm. Finally, we analyze the crowd's driving patterns to further explain the road network anomalies. Experiments on a very large GPS dataset demonstrate that the proposed approach outperforms existing methods in terms of both accuracy and effectiveness.

Distributed Spatial and Spatio-Temporal Join on Apache Spark

Effective processing of extremely large volumes of spatial data has led many organizations to employ distributed processing frameworks. Apache Spark is one such open-source framework that is enjoying widespread adoption. Within this data space, it is important to note that most of the observational data has a temporal component, or timestamp. In order to perform advanced analytics and gain insights, the temporal component becomes as important as the spatial and attribute components. In this paper, we detail several variants of a spatial join operation that address spatial, temporal, and attribute-based joins. Our spatial join technique differs from other approaches in that it combines spatial, temporal, and attribute predicates in the join operator. In addition, our spatio-temporal join algorithm and implementation differ from others in that they run in a commercial off-the-shelf (COTS) application. The users of this functionality are assumed to be GIS analysts with little if any knowledge of the implementation details of spatio-temporal joins or distributed processing. They are comfortable using simple tools that do not provide the ability to tweak the configuration of the algorithm or processing environment. The spatio-temporal join algorithm behind the tool must always succeed, regardless of input data parameters (e.g., the data can be highly irregularly distributed, contain large numbers of coincident points, or be extremely large). These factors combine to place additional requirements on the algorithm that are uncommon in the traditional research environment. Our spatio-temporal join algorithm was shipped as part of GeoAnalytics Server.

From Rocks to Pebbles: Smoothing Spatiotemporal Data Streams in an Overlay of Sensors

Spatiotemporal streams are prone to data quality issues such as missing, duplicated and delayed data, which arise when data-generating sensors malfunction, data transmissions experience problems, or data are processed improperly. However, many important real-time applications rely on the continuous availability of stream values, e.g., to monitor traffic flow, resource usage, weather phenomena, etc. Other non-real-time applications that support continuous or offline historical analytics also require high-quality data to avoid producing misleading output such as false positives, erroneous conclusions and decisions. In this paper, we study the problem of smoothing streams produced by an overlay of sensors. We present nonparametric (data-driven, distribution-free) statistical methods to provide an uninterrupted stream of high-quality spatiotemporal data to real-time applications, even when the raw stream suffers data quality issues, like noise or missing values. Our novel family of robust methods computes most likely values (MLVs) that can be used as proxies for data of questionable quality. The methods make use of a partition of the monitored area into cells to compute MLVs based on historical data and the deviation from normalcy in neighboring spatial cells, in a way that outperforms standard regression or interpolation. Our methods use incremental computation for efficiency, and they differ in how the deviations are normalized, e.g., with respect to zeroth-order, first-order and second-order moments. We use three real data sets to run a suite of experiments, and empirically demonstrate the superiority of the method that uses normalization with respect to variability.
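As a rough illustration of the deviation-from-normalcy idea, a zeroth-order sketch with hypothetical inputs (not the paper's MLV estimator, which also handles higher-order moments and incremental updates):

```python
def most_likely_value(cell_hist_mean, neighbor_hist, neighbor_current):
    """Impute a most-likely value (MLV) for a cell with a missing or noisy
    reading: take the cell's own historical mean and shift it by the average
    deviation from normalcy currently observed in neighboring cells.
    neighbor_hist / neighbor_current: dicts keyed by neighbor cell id,
    giving historical means and current readings respectively."""
    deviations = [neighbor_current[c] - neighbor_hist[c] for c in neighbor_current]
    shift = sum(deviations) / len(deviations) if deviations else 0.0
    return cell_hist_mean + shift
```

The intuition: if neighboring cells are all running 5 units above their historical norms, the faulty cell probably is too, which a pure historical mean or plain interpolation would miss.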

Efficient Scheduling of Generalized Group Trips in Road Networks

In this paper, we introduce generalized group trip scheduling (GGTS) queries that enable friends and families to perform activities at different points of interest (POIs) such as a shopping center, a restaurant and a pharmacy with the minimum total travel distance. Trip planning and scheduling for groups, an important class of location-based services (LBSs), has recently received attention from researchers. However, both group trip planning (GTP) and group trip scheduling (GTS) queries have restrictions: a GTP query assumes that all group members visit all required POIs together, whereas a GTS query requires that each POI is visited by a single group member. A GGTS query is more general and allows any number of group members to visit a POI together. We propose an efficient algorithm to evaluate the exact answers for GGTS queries in road networks. Since finding the answer to a GGTS query is an NP-hard problem, to reduce the processing overhead for a large group size, a large number of required POI types, or a large POI dataset, we propose two heuristic solutions, the trip scheduling heuristic (TSH) and the search space refinement heuristic (SRH), for processing GGTS queries. Extensive experiments with real datasets show that our optimal algorithm is preferable for small parameter settings and that the heuristic solutions reduce the processing overhead significantly at the cost of a slight loss of accuracy.

To Buy or Not to Buy: Computing Value of Spatiotemporal Information

Location data from mobile devices is a sensitive yet valuable commodity for location-based services and advertising. We investigate the intrinsic value of location data in the context of strong privacy, where location information is only available from end users via purchase. We present an algorithm to compute the expected value of location data from a user, without access to the specific coordinates of the location data point. We use decision-theoretic techniques to provide a principled way for a potential buyer to make purchasing decisions about private user location data. We illustrate our approach in three scenarios: the delivery of targeted ads specific to a user's home location, the estimation of traffic speed, and location prediction. In all three cases, the methodology leads to quantifiably better purchasing decisions than competing methods.

Congestion-Aware Ride-Sharing

In its current form, ride-sharing is responsible for a small fraction of traffic load compared to other transportation modes, especially private vehicles. As its benefits become more evident, and as obstacles that may hinder larger-scale adoption, e.g., the lack of liability legislation, are overcome, ride-sharing will become a more common mode of transportation. In particular, autonomous vehicles (AVs) are showing their proficiency on the roads, which may also catalyze ride-sharing ubiquity. For example, while an AV owner is at work, they may find it appealing to offer their AV as a service or rent it to Uber so that the vehicle serves others' transportation requests. Furthermore, this disruptive technology is backed by companies like Google (Waymo), Tesla and Uber. Therefore, ride-sharing will soon become a source of traffic congestion itself. In this paper, we present an efficient congestion-aware ride-sharing algorithm which, instead of finding optimal travel plans based only on traffic load generated by other means of transportation, computes optimal travel plans for thousands of ride-sharing requests within a time interval. Note that in this problem, an optimal travel plan for a group of requests may affect an already computed travel plan for another concurrent group of requests; therefore, plans cannot be isolated from each other.

Stepping Stone Graph: A Graph for Finding Movement Corridors using Sparse Trajectories

Many real-world applications require identifying crowd movements, such as finding movement corridors in cities and the most popular paths. When one is not given full user trajectories but rather sporadic location data, such as location-based social network data, extracting movement-related information becomes difficult. Rather than processing all points in a data set for each query, a clever approach is to construct a graph based on user locations and query this graph to answer questions such as shortest paths, most popular paths and movement corridors. A popular choice is the shortest path graph. However, the shortest path graph can be inefficient and ineffective for analysing movement data, as it calculates the graph edges considering the shortest paths over all the points in a data set. The resulting edge sets are therefore usually very restrictive and unsuitable for movement analysis, because of the graph's global view of the data set. We propose the stepping stone graph, which computes edges by considering point pairs rather than all points, focusing on possible local movements and making it both efficient and effective for location-based social network data. We demonstrate its uses by applying it in the location-based social network domain and comparing it with the shortest path graph. We also compare its properties to a range of other graphs and demonstrate how the stepping stone graph relates to the Gabriel graph, the relative neighbourhood graph and the Delaunay triangulation.
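Since the abstract relates the stepping stone graph to the Gabriel graph, here is a minimal sketch of the Gabriel graph's pairwise edge test (illustrative background, not the proposed stepping stone graph itself):

```python
from itertools import combinations

def gabriel_edges(points):
    """Gabriel graph: points p and q are connected iff no other point lies
    strictly inside the circle whose diameter is the segment pq. The test is
    local to each pair, unlike shortest-path-based edge definitions."""
    edges = []
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        cx, cy = (p[0] + q[0]) / 2, (p[1] + q[1]) / 2          # circle center
        r2 = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / 4.0   # squared radius
        if all((s[0] - cx) ** 2 + (s[1] - cy) ** 2 >= r2
               for k, s in enumerate(points) if k != i and k != j):
            edges.append((i, j))
    return edges
```

A pairwise test like this keeps the construction local, which is the same spirit in which the stepping stone graph considers point pairs rather than all points at once.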

An Algorithm for Road Closure Detection from Vehicle Probe Data

We developed an algorithm for automatically detecting road closures by monitoring vehicle probe data. The algorithm applies to a large class of roads and, in the implementation presented, was optimized for lower-volume roads. It is suitable for batch as well as real-time applications, the latter being the most valuable for guaranteeing a continuously up-to-date traffic product. The algorithm compares the likelihood that every road segment meeting certain requirements is closed or open, and triggers an alert whenever the likelihood of the observed probe activity is too small given a historical model. We implemented the algorithm and tested it on 12 metro areas in Western Europe. After optimizing parameters for performance on lower-volume roads, we obtained a precision of 92% on those roads, and of 80% overall.
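A toy version of the closed-vs-open likelihood comparison might look like this; the Poisson model, the rates and the decision threshold are invented for illustration and are not the paper's calibrated values:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def closure_alert(observed_probes: int, historical_rate: float,
                  closed_rate: float = 0.05, threshold: float = 20.0) -> bool:
    """Flag a segment as likely closed when the observed probe count is far
    more likely under a near-zero 'closed' rate than under the historical
    'open' rate (a simple likelihood-ratio test)."""
    open_like = poisson_pmf(observed_probes, historical_rate)
    closed_like = poisson_pmf(observed_probes, closed_rate)
    return closed_like > threshold * open_like
```

On lower-volume roads the historical rate is small, so the two hypotheses are harder to separate, which is presumably why the parameters had to be tuned for that road class.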

Personalized Visited-POI Assignment to Individual Raw GPS Trajectories

Knowledge discovery from GPS trajectory data is an important topic in several areas including data mining, human behavior analysis, and user modeling. This paper proposes a task that assigns personalized visited POIs. Its goal is to estimate fine-grained and pre-defined locations (i.e., points of interest (POIs)) that a user actually visits and assign the visited location information to the corresponding span of the user's (personal) GPS trajectories. We also introduce a novel algorithm to solve this assignment task. First, we exhaustively extract stay points as candidates for significant locations using a variant of a conventional stay-point extraction method. Then we select true significant locations and simultaneously assign visited POIs to them by considering various aspects, which we formulate as an integer linear program. Experimental results on an actual user dataset show that our method achieves higher accuracy in the visited-POI assignment task than various cascaded combinations of conventional methods.
