Understanding movement behaviors is critical for urban mobility and transport problems, including robust path matching, behavior analysis, and anomaly detection. We investigate a graph-based, probabilistic method for matching the behaviors of entities operating on networks embedded in a geographic context (e.g., road networks) under different types of uncertainty. Our method uses a decay function that allows network topology and the attribute information associated with that topology (geographic or otherwise) to guide the generalization of activity patterns and the model learning process. This allows the system to recognize when two routes within a network are similar, even when those routes share little explicit path information. We demonstrate the method's robust ability to distinguish between fundamentally different behaviors, even when data is both incomplete and subject to noise. The results show good performance when matching behaviors on synthetic networks of different sizes and attributes, as well as on a real-world road network; we also examine situations in which observed entity behavior is noisy, as well as situations in which observed behaviors differ from learned models as a result of systemic noise in the underlying network. Finally, our approach provides a robust method for detecting anomalous activity patterns on the network.
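The decay-based generalization can be illustrated with a toy similarity score: each node of one route contributes a weight that decays exponentially with its network distance to the other route, so overlapping routes score near 1 while distant routes decay toward 0. This is a minimal sketch of the idea, not the paper's actual model; the graph encoding, decay constant, and scoring rule are illustrative assumptions.

```python
import heapq
from math import exp

def multi_source_dist(adj, sources):
    # Shortest network distance from any node in `sources` to all nodes (Dijkstra).
    dist = {s: 0.0 for s in sources}
    pq = [(0.0, s) for s in sources]
    heapq.heapify(pq)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def route_similarity(adj, route_a, route_b, decay=0.5):
    # Each node of route_b contributes exp(-decay * distance to route_a):
    # identical routes score 1.0, far-apart routes decay toward 0.
    dist = multi_source_dist(adj, set(route_a))
    scores = [exp(-decay * dist.get(v, float("inf"))) for v in route_b]
    return sum(scores) / len(scores)
```

On a path graph a-b-c-d, the score for two identical routes is 1.0, while routes sharing no nodes still receive partial credit that shrinks with network distance.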
The impact of the recent increase in usage of routing apps on road traffic remains uncertain to this day. For the first time, this article introduces a criterion to evaluate the distance between an observed state of traffic and the user equilibrium of the traffic assignment: the average marginal regret. The average marginal regret provides a quantitative measure of the impact of routing apps on traffic using only link flows, link travel times, and travel demand. In non-atomic routing games (or static traffic assignment models), the average marginal regret measures how selfishly drivers behave. Unlike the price of anarchy, the average marginal regret in the routing game can be computed in polynomial time without any knowledge of the user equilibrium and socially optimal states of traffic. First, on a small example, the article demonstrates that the average marginal regret is more appropriate than comparing flows, travel times, or total cost (as in the literature) for defining proximity between an observed state of traffic and a user equilibrium state of traffic. Then, the article provides a simulation-based assessment of the impact of routing apps on traffic using a routing game approach and models we develop for app usage. Experiments on two different models of app usage and three networks (including the whole LA network with more than 50,000 nodes) demonstrate that the average marginal regret decreases as app usage increases. Finally, using a toy example, the article concludes that app usage can give rise to a new Braess paradox.
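One plausible reading of the criterion: for each origin-destination pair, a driver's marginal regret is the gap between the observed travel time and the best-response (shortest-path) travel time under the current link travel times, and the average marginal regret is the flow-weighted mean of these gaps. The sketch below is an illustration of that reading, not the article's implementation; the input encodings (adjacency lists keyed by node, observed per-OD travel times) are assumptions.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, current_link_travel_time), ...]}
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def average_marginal_regret(adj, od_flows, od_times):
    # od_flows: {(o, d): demand flow}; od_times: {(o, d): observed travel time}.
    total_regret, total_flow = 0.0, 0.0
    for (o, d), flow in od_flows.items():
        best = dijkstra(adj, o)[d]  # best response under current link times
        regret = max(od_times[(o, d)] - best, 0.0)
        total_regret += flow * regret
        total_flow += flow
    return total_regret / total_flow
```

At a user equilibrium no driver can improve by switching paths, so every regret, and hence the average, is zero; each Dijkstra call is polynomial, matching the claimed complexity.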
Public spaces serve an important role in contemporary cities, providing a medium for passive and active engagement and a place for discovery and reflection. A central challenge for public space design is to evaluate whether a given space indeed promotes the desired activities. In this paper, we contribute by developing crowd replication as a novel sensor-assisted method for quantifying human behavior within public spaces. The idea in crowd replication is to task a researcher with replicating the behavior of people using a space while being instrumented with a mobile device that captures a sensor trace of the replicated behaviors. Through mathematical modeling, behavioral indicators extracted from the replicated trajectories can be extrapolated to represent a larger population. We develop the overall method, and propose a novel pedestrian sensing solution for capturing high-quality information about human behavior within public spaces. We validate our approach through a case study carried out within a representative example of a metropolitan-scale public space. Our results show that crowd-replicated data closely mirrors the real human dynamics in public spaces and provides a useful building block for human-scale analytics. We also demonstrate that crowd replication can reduce overall data collection efforts while producing high quality and unbiased data about the behaviors and activities of people within the space. Finally, we demonstrate that our pedestrian sensing solution enables crowd replication to cost-effectively capture fine-grained and detailed indicators of liveliness, extent of social interaction, and other factors congruent with the common understandings about the qualities of public spaces.
At the present time, there is no mechanism for Air Navigation Service Providers to probe new flight plans filed by the Airlines Operation Centers against the existing approved flight plans to see if they are likely to cause conflicts or bring sector traffic densities beyond control. In the current Air Traffic Control operations, aircraft conflicts and sector traffic densities are resolved tactically, increasing workload and leading to potential safety risks and loss of capacity and efficiency. We propose a novel Data-Driven Framework to address a long-range aircraft conflict detection and resolution (CDR) problem. Given a set of predicted trajectories, the framework declares a conflict when a protected zone of an aircraft on its trajectory is infringed upon by another aircraft. The framework resolves the conflict by prescribing an alternative solution that is optimized by perturbing at least one of the trajectories involved in the conflict. To achieve this, the framework learns from descriptive patterns of historical trajectories and pertinent weather observations and builds a Hidden Markov Model. Using a variant of the Viterbi algorithm, the framework avoids the airspace volume in which the conflict is detected and generates a new optimal trajectory. The key concept upon which the framework is built is the assumption that the airspace is nothing more than a horizontally and vertically concatenated set of spatio-temporal data cubes where each cube is considered as an atomic unit. We evaluate our framework using real trajectory datasets with pertinent weather observations from two continents and demonstrate its effectiveness.
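Once the airspace is discretized into data cubes, rerouting reduces to a highest-probability state-sequence problem over cube transitions. The standard Viterbi recursion below illustrates the core computation; the paper's variant additionally avoids the conflicted cube, which can be emulated by zeroing the transition probabilities into that cube. All states, observations, and probabilities here are toy values, not quantities learned from historical trajectories or weather data.

```python
def viterbi(states, start_p, trans_p, emit_p, obs):
    # states: candidate airspace cubes; obs: sequence of observed symbols.
    # V[t][s] = (probability of the best path ending in s at time t, that path).
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for t in range(1, len(obs)):
        cur = {}
        for s in states:
            prob, path = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]],
                 V[t - 1][p][1] + [s])
                for p in states)
            cur[s] = (prob, path)
        V.append(cur)
    return max(V[-1].values())  # (best probability, best cube sequence)
```

Setting `trans_p[p][blocked] = 0.0` for a conflicted cube guarantees the returned sequence detours around it, which is the essence of the CDR resolution step.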
We present LOOM (Line-Ordering Optimized Maps), a fully automatic generator of geographically accurate transit maps. The input to LOOM is data about the lines of a given transit network, namely, for each line, the sequence of stations it serves and the geographical course the vehicles of this line take. We parse this data from GTFS, the prevailing standard for public transit data. LOOM proceeds in four stages: (1) construct a so-called line graph, where edges correspond to segments of the network with the same set of lines following the same course; (2) compute an optimal partial ordering of the lines by a process we call graph untangling; (3) construct an ILP that yields a line ordering for each edge that minimizes the total number of line crossings and line separations; (4) based on the line graph and the computed line ordering, draw the map. Since our maps respect the geography of the transit network, they can be used as overlays in typical map services. Previous work either did not take the geographical course of the lines into account or was concerned with schematic maps without optimizing line crossings or separations. We evaluate LOOM on six real-world transit networks, with line-ordering search-space sizes up to 2*10^267. Using graph untangling and a sophisticated ILP formulation, we can compute optimal line orderings in a fraction of a second for all networks. This enables real-time updates and interactive use in map editors. We also compare our automatically computed maps to manually designed maps.
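Stage (3) minimizes line crossings between consecutive segments. For two adjacent edges of the line graph, the crossings induced by a pair of orderings are exactly the shared-line pairs that appear in opposite order on the two edges. The brute-force counter below illustrates that objective only; LOOM itself encodes it in an ILP rather than enumerating pairs.

```python
def crossings(order_a, order_b):
    # order_a, order_b: top-to-bottom line orderings on two adjacent edges.
    # A crossing is a pair of shared lines appearing in opposite order.
    shared = [l for l in order_a if l in order_b]
    pos = {l: order_b.index(l) for l in shared}
    count = 0
    for i in range(len(shared)):
        for j in range(i + 1, len(shared)):
            if pos[shared[i]] > pos[shared[j]]:
                count += 1
    return count
```

Summing this count over all adjacent edge pairs (plus a separation term) gives the kind of objective the ILP minimizes.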
Understanding crowd mobility in a metropolitan area is extremely valuable for city planners and decision makers. However, crowd mobility is a relatively new area of research and poses significant technical challenges: lack of large-scale fine-grained data, difficulties in large-scale trajectory processing, and issues with spatial resolution. In this paper, we propose a novel approach for analyzing crowd mobility at the ``city block'' level. We first propose algorithms to detect homes, workplaces, and stay regions from individual user trajectories. Next, we propose a method for analyzing commute patterns and spatial correlation at the city block level. Using mobile cellular access trace data collected from users in Shanghai, we discover commute patterns and spatial correlation rules, as well as a hidden structure of the city, based on crowd mobility analysis. Together, our proposed methods contribute to the understanding of human mobility in a large metropolitan area.
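A common heuristic for the home and workplace detection step is to take a user's most frequent stay location during night hours as home and during daytime office hours as the workplace. The sketch below illustrates that heuristic only; the paper's actual detection algorithms and the input encoding (stay records as cell id plus hour of day) are simplifying assumptions.

```python
from collections import Counter

def detect_home_work(stays):
    # stays: list of (cell_id, hour_of_day) stay records for one user.
    # Home: most frequent night-time cell; work: most frequent office-hours cell.
    night = Counter(c for c, h in stays if h >= 22 or h < 6)
    day = Counter(c for c, h in stays if 9 <= h < 17)
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work
```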
We examine the problem of computing the shortest paths in a transportation network from a set of geographically clustered source nodes to a set of target nodes. Such many-to-many shortest path computations are an essential and compute-intensive part of many emerging applications that involve map matching of imprecise spatial trajectories. Existing solutions to this problem mostly conform to the obvious approach of performing a single-source shortest path computation for each source node. We present an alternative approach that exploits the clustered nature of the source nodes. Specifically, we rely on the observation that paths originating from a cluster of nodes typically exit the source region's boundary through a small number of nodes. Evaluations on a real road network show that the proposed solution is several times faster than the conventional approach when the source nodes are densely clustered. We also demonstrate that the presented technique is capable of substantially accelerating the map matching of coarse-grained location trajectories.
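The exit-node observation can be sketched as follows: precompute each source's distance to the few boundary exit nodes, run one full shortest-path search per exit instead of one per source, and combine the two. This is an illustrative reading of the idea, with full Dijkstra searches standing in for the cluster-restricted searches a real implementation would use for the source-to-exit distances.

```python
import heapq

def dijkstra(adj, src):
    # adj: {node: [(neighbor, weight), ...]}
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def clustered_many_to_many(adj, sources, exits, targets):
    # Distances from each source to the exit nodes (in practice this search
    # would be confined to the small cluster region).
    to_exit = {s: dijkstra(adj, s) for s in sources}
    # One full search per exit node instead of one per source.
    from_exit = {e: dijkstra(adj, e) for e in exits}
    return {
        (s, t): min(to_exit[s].get(e, float("inf")) +
                    from_exit[e].get(t, float("inf")) for e in exits)
        for s in sources for t in targets}
```

With many densely clustered sources and few exits, the number of full searches drops from |sources| to |exits|, which is the source of the reported speedup.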
The control of large-scale mobility-on-demand systems is an emerging topic that has been considered from system-theoretic, transportation-science, and algorithmic points of view. Existing formulations model mobility-on-demand systems in a queuing-theoretic, network-flow-based, or continuous kinematic framework. In this work, we model a mobility-on-demand system as a stochastic differential equation, which represents a generalization of previous approaches. Based on the model, we define system imbalance as the difference between the stochastic processes of service request arrivals and vehicle arrivals. We formally derive the first moment of the system imbalance for an imbalance control strategy that consists of a feedforward control approach (reference trajectory) and an additional feedback component. A distributed feedback control policy is defined that averages the imbalance across the system and therefore aims at a uniform quality-of-service distribution. Finally, we verify our results in a high-fidelity, large-scale agent-based simulation of a hypothetical mobility-on-demand system.
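The effect of the averaging feedback policy can be illustrated with a toy discrete-time simulation: stochastic arrivals perturb each station's imbalance, and a proportional feedback term pulls every station toward the system-wide average, which with unit gain equalizes imbalance across stations. The dynamics, noise model, and gain below are illustrative assumptions, not the paper's stochastic differential equation model.

```python
import random

def simulate_imbalance(stations, steps, rebalance_gain, seed=0):
    # Imbalance per station = requests arrived - vehicles arrived (toy model).
    rng = random.Random(seed)
    imbalance = [0.0] * stations
    for _ in range(steps):
        # Stochastic request/vehicle arrivals perturb each station.
        for i in range(stations):
            imbalance[i] += rng.gauss(0, 1)
        avg = sum(imbalance) / stations
        # Distributed proportional feedback toward the system-wide average,
        # aiming at a uniform quality-of-service distribution.
        imbalance = [x - rebalance_gain * (x - avg) for x in imbalance]
    return imbalance
```

With `rebalance_gain = 1.0` every station ends each step at the common average; smaller gains trade rebalancing effort against residual nonuniformity.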
An important problem in terrain analysis is modeling how water flows across a terrain and creates floods by filling up depressions. In this paper we study a number of flood-risk related problems: given a terrain Σ with n vertices, a rain distribution R, and a volume of rain ψ, determine which portions of Σ are flooded. We develop efficient algorithms for flood-risk analysis under the multiflow-direction (MFD) model, in which water at a point can flow along multiple downslope edges and which more accurately represents flooding events. We present three main results: First, we present an O(n log n)-time algorithm to answer a terrain-flood query: if it rains a volume ψ according to a rain distribution R, determine which regions of Σ will be flooded. Second, we present an algorithm for preprocessing Σ, containing m sinks, into a data structure of size O(nm) for answering point-flood queries: given a volume ψ, a distribution R, and a point q ∈ Σ, determine whether q will be flooded. A point-flood query can be answered in O(|R|k + k^2) time, where k is the number of maximal depressions in Σ containing q and |R| is the number of vertices in R with positive rainfall. Finally, we present algorithms for answering a flood-time query: given a rain distribution R and a point q ∈ Σ, determine the volume of rain that must fall before q is flooded. We implemented our terrain-flooding algorithm and tested its efficacy and efficiency on real terrains.
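Under the MFD model, water arriving at a vertex is split among all downslope neighbors, a common convention being in proportion to the elevation drop. The sketch below routes rain down a small terrain graph in decreasing elevation order and accumulates whatever reaches local minima; it illustrates the flow model only and omits the depression-filling and spill-tracking machinery the paper's algorithms require.

```python
def mfd_flow(elev, adj, rain):
    # elev: {v: elevation}; adj: {v: [neighbors]}; rain: {v: rain volume}.
    inflow = {v: rain.get(v, 0.0) for v in elev}
    for v in sorted(elev, key=elev.get, reverse=True):  # highest vertex first
        down = [(u, elev[v] - elev[u]) for u in adj.get(v, []) if elev[u] < elev[v]]
        if not down:
            continue  # local minimum (sink): water accumulates here
        total_drop = sum(drop for _, drop in down)
        # MFD: split the inflow among downslope edges, weighted by elevation drop.
        for u, drop in down:
            inflow[u] += inflow[v] * drop / total_drop
        inflow[v] = 0.0  # all water passed downslope
    return inflow  # remaining volume at sinks / depression bottoms
```

Processing vertices in decreasing elevation order ensures each vertex's inflow is complete before it is distributed, so total volume is conserved down to the sinks.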
Thousands of roads change with a city's development every day. Detecting road closures and changes in traffic rules is essential for dynamic route planning and navigation services. In this paper, we propose a driving-behavior-modeling-based method for accurately and effectively detecting road anomalies. In the first step, we detect the areas of anomalies by using the deviation between drivers' actual and expected behaviors. To discover the cause of the anomalies, we explore the drivers' short-term destinations and find the crucial link pairs in anomalous areas through a novel optimized link entanglement search (LES) algorithm, namely, the Select LES (SELES) algorithm. Finally, we analyze the crowd's driving patterns to further explain the road network anomalies. Experiments on a very large GPS dataset demonstrate that the proposed approach outperforms existing methods in terms of both accuracy and effectiveness.
Effective processing of extremely large volumes of spatial data has led many organizations to employ distributed processing frameworks. Apache Spark is one such open-source framework that is enjoying widespread adoption. Within this data space, it is important to note that most observational data has a temporal component, or timestamp. In order to perform advanced analytics and gain insights, the temporal component becomes as important as the spatial and attribute components. In this paper, we detail several variants of a spatial join operation that address spatial, temporal, and attribute-based joins. Our spatial join technique differs from other approaches in that it combines spatial, temporal, and attribute predicates in the join operator. In addition, our spatio-temporal join algorithm and implementation differ from others in that they run in a commercial off-the-shelf (COTS) application. The users of this functionality are assumed to be GIS analysts with little, if any, knowledge of the implementation details of spatio-temporal joins or distributed processing. They are comfortable using simple tools that do not provide the ability to tweak the configuration of the algorithm or processing environment. The spatio-temporal join algorithm behind the tool must always succeed, regardless of input data characteristics (e.g., the data can be highly irregularly distributed, contain large numbers of coincident points, or be extremely large). These factors combine to place additional requirements on the algorithm that are uncommon in the traditional research environment. Our spatio-temporal join algorithm was shipped as part of the GeoAnalytics Server.
Spatiotemporal streams are prone to data quality issues such as missing, duplicated, and delayed data, which arise when data-generating sensors malfunction, data transmissions experience problems, or data are processed improperly. However, many important real-time applications rely on the continuous availability of stream values, e.g., to monitor traffic flow, resource usage, weather phenomena, etc. Other, non-real-time applications that support continuous or offline historical analytics also require high-quality data to avoid producing misleading output such as false positives and erroneous conclusions and decisions. In this paper, we study the problem of smoothing streams produced by an overlay of sensors. We present nonparametric (data-driven, distribution-free) statistical methods to provide an uninterrupted stream of high-quality spatiotemporal data to real-time applications, even when the raw stream suffers data quality issues such as noise or missing values. Our novel family of robust methods computes most likely values (MLVs) that can be used as proxies for data of questionable quality. The methods make use of a partition of the monitored area into cells to compute MLVs based on historical data and the deviation from normalcy in neighboring spatial cells, in a way that outperforms standard regression or interpolation. Our methods use incremental computation for efficiency, and they differ in how the deviations are normalized, e.g., with respect to zeroth-order, first-order, and second-order moments. We use three real data sets to run a suite of experiments and empirically demonstrate the superiority of the method that uses normalization with respect to variability.
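One simple instance of the MLV idea: estimate a questionable cell reading from the cell's historical mean, shifted by the average standardized deviation from normalcy currently observed in neighboring cells. The sketch below is an illustration in the spirit of the moment-based normalizations described above; the paper's actual estimators and their incremental computation are more elaborate, and the input encoding is an assumption.

```python
from statistics import mean, pstdev

def most_likely_value(cell, neighbors, hist, current):
    # hist: {cell: [historical values]}; current: {cell: reading or None}.
    mu = mean(hist[cell])
    sigma = pstdev(hist[cell]) or 1.0  # guard against zero variability
    # Standardized deviation from normalcy in neighboring cells.
    zs = []
    for n in neighbors:
        if current.get(n) is None:
            continue  # neighbor also missing: skip it
        n_sigma = pstdev(hist[n]) or 1.0
        zs.append((current[n] - mean(hist[n])) / n_sigma)
    z = mean(zs) if zs else 0.0
    # Shift the cell's historical mean by the neighborhood's deviation.
    return mu + z * sigma
```

If the neighbors read normally, the MLV falls back to the cell's historical mean; if they all run high (e.g., a region-wide congestion event), the proxy value is raised accordingly.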
In this paper, we introduce generalized group trip scheduling (GGTS) queries that enable friends and families to perform activities at different points of interest (POIs), such as a shopping center, a restaurant, and a pharmacy, with the minimum total travel distance. Trip planning and scheduling for groups, an important class of location-based services (LBSs), has recently received attention from researchers. However, both group trip planning (GTP) and group trip scheduling (GTS) queries have restrictions: a GTP query assumes that all group members visit all required POIs together, whereas a GTS query requires that each POI is visited by a single group member. A GGTS query is more general and allows any number of group members to visit a POI together. We propose an efficient algorithm to evaluate the exact answers for GGTS queries in road networks. Since finding the answer for a GGTS query is an NP-hard problem, to reduce the processing overhead for a large group size, a large number of required POI types, or a large POI dataset, we propose two heuristic solutions, the trip scheduling heuristic (TSH) and the search space refinement heuristic (SRH), for processing GGTS queries. Extensive experiments with real datasets show that our optimal algorithm is preferable for small parameter settings and that the heuristic solutions reduce the processing overhead significantly at the cost of a slight loss in accuracy.
Spatial regression models are widely used in numerous areas, including detecting and predicting traffic volume, air pollution, and housing prices. Unlike conventional regression models, which commonly assume that observations are independent and identically distributed, spatial regression accounts for spatial dependency among observations in different spatial locations. Such spatial dependency is typically predefined or heuristically established, but it is not fine-tuned to the context of a specific application and is thus suboptimal for the prediction task. Until now, little attention has been paid to automatically optimizing spatial dependency in relation to a prediction task. Several serious challenges arise: 1) the necessity and complexity of modeling spatial topological constraints; 2) incomplete prior spatial knowledge; and 3) the difficulty of optimizing under spatial topological constraints. To address these challenges, this paper proposes a novel convex framework that jointly learns the prediction mapping and the spatial dependency based on spatial topological constraints. Two different scenarios are addressed. First, when the spatial connectivity is known a priori (e.g., via contiguity or a network), we propose a first model, named Spatial-Autoregressive Dependency Learning I (SADL-I), to learn the strength of connectivity. However, when such connections are unknown or incomplete, our second model, named Spatial-Autoregressive Dependency Learning II (SADL-II), is proposed to automatically learn the spatial connectivity and dependency based on spatial topological constraints. The proposed models are convex and are iteratively optimized by our new, effective algorithms until convergence to a globally optimal solution. Extensive experimentation using several real-world datasets demonstrates the outstanding performance of the proposed models.
Location data from mobile devices is a sensitive yet valuable commodity for location-based services and advertising. We investigate the intrinsic value of location data in the context of strong privacy, where location information is only available from end users via purchase. We present an algorithm to compute the expected value of location data from a user, without access to the specific coordinates of the location data point. We use decision-theoretic techniques to provide a principled way for a potential buyer to make purchasing decisions about private user location data. We illustrate our approach in three scenarios: the delivery of targeted ads specific to a user's home location, the estimation of traffic speed, and location prediction. In all three cases, the methodology leads to quantifiably better purchasing decisions than competing methods.
There are many real-world applications that require identifying crowd movements, such as identifying movement corridors in cities and the most popular paths. If one is not given user trajectories but rather sporadic location data, such as location-based social network data, finding movement-related information becomes difficult. Rather than processing all points in a dataset for each query, a clever approach is to construct a graph based on user locations and query this graph to answer questions such as shortest paths, most popular paths, and movement corridors. A popular graph is the shortest path graph. However, the shortest path graph can be inefficient and ineffective for analysing movement data, as it calculates the graph edges considering the shortest paths over all the points in a dataset. Therefore, edge sets resulting from shortest path graphs are usually very restrictive and, because of their global view of the dataset, not suitable for movement analysis. We propose the stepping stone graph, which calculates edges by considering point pairs rather than all points and thus focuses on possible local movements, making it both efficient and effective for location-based social network data. We demonstrate its uses by applying it in the location-based social network domain and comparing it with the shortest path graph. We also compare its properties to a range of other graphs and demonstrate how the stepping stone graph relates to the Gabriel graph, the relative neighbourhood graph, and the Delaunay triangulation.
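For reference, the Gabriel graph mentioned above keeps an edge between two points exactly when no third point lies inside the circle having that edge as its diameter. A brute-force construction (shown for comparison with the stepping stone graph, not as the paper's algorithm):

```python
def gabriel_graph(points):
    # points: list of (x, y). Edge (i, j) survives iff no other point lies
    # strictly inside the circle whose diameter is segment i-j.
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2          # circle center
            r2 = ((x1 - x2) ** 2 + (y1 - y2) ** 2) / 4     # squared radius
            if all((px - cx) ** 2 + (py - cy) ** 2 >= r2
                   for k, (px, py) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
    return edges
```

This check is O(n^3); since the Gabriel graph is a subgraph of the Delaunay triangulation, it can also be built in O(n log n), which is one reason these proximity graphs are natural baselines for movement analysis.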
We developed an algorithm for automatically detecting road closures by monitoring vehicle probe data. The algorithm applies to a large class of roads and in the implementation presented was optimized for lower-volume roads. It is suitable for batch as well as real-time applications, the latter class being the most valuable in order to guarantee a continuously up-to-date traffic product. The algorithm compares the likelihood that every road segment meeting certain requirements is closed or open, and triggers an alert whenever the likelihood of the observed probe activity is too small given a historical model. We implemented the algorithm, and tested it on 12 metro areas in Western Europe. After optimizing parameters for performance on lower-volume roads, we obtained a precision of 92% on those roads, and of 80% overall.
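The likelihood comparison can be sketched with a simple Poisson model: historical data gives an expected probe count for a segment and time window, and an alert fires when the observed count is improbably low for an open road. The paper's method compares open and closed hypotheses and tunes parameters per road class; the one-sided test below, with illustrative rates and threshold, is a deliberate simplification.

```python
from math import exp

def poisson_cdf(k, lam):
    # P(X <= k) for X ~ Poisson(lam), accumulated term by term.
    term, total = exp(-lam), exp(-lam)
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total

def closure_alert(observed_probes, hourly_rate, hours, alpha=0.01):
    # Alert when observing this few probes is too unlikely for an open road,
    # given the historically expected probe rate for this segment and window.
    lam = hourly_rate * hours
    return poisson_cdf(observed_probes, lam) < alpha
```

On lower-volume roads the expected count is small, so the window must be long enough for an absence of probes to be statistically meaningful, which is why the paper optimizes parameters separately for that road class.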
Knowledge discovery from GPS trajectory data is an important topic in several areas, including data mining, human behavior analysis, and user modeling. This paper proposes the task of personalized visited-POI assignment. Its goal is to estimate fine-grained, pre-defined locations (i.e., points of interest (POIs)) that a user actually visits and to assign the visited-location information to the corresponding spans of the user's (personal) GPS trajectories. We also introduce a novel algorithm to solve this assignment task. First, we exhaustively extract stay points as candidates for significant locations using a variant of a conventional stay-point extraction method. Then we select the true significant locations and simultaneously assign visited POIs to them by considering various aspects, which we formulate as an integer linear program. Experimental results on an actual user dataset show that our method achieves higher accuracy in the visited-POI assignment task than various cascaded procedures of conventional methods.
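The conventional stay-point extraction scheme scans the trajectory for maximal spans in which every point stays within a distance threshold of the span's first point for at least a time threshold, and emits the span's centroid as a candidate. A minimal sketch of that baseline (the paper uses a variant of it; the thresholds and the planar metric coordinates are assumptions):

```python
from math import hypot

def stay_points(traj, dist_thresh, time_thresh):
    # traj: time-ordered list of (x, y, t), coordinates in meters, t in seconds.
    # Returns (centroid_x, centroid_y, arrive_t, leave_t) per detected stay.
    stays, i, n = [], 0, len(traj)
    while i < n:
        j = i + 1
        # Extend the span while points remain near the span's anchor point.
        while j < n and hypot(traj[j][0] - traj[i][0],
                              traj[j][1] - traj[i][1]) <= dist_thresh:
            j += 1
        if traj[j - 1][2] - traj[i][2] >= time_thresh:
            xs = [p[0] for p in traj[i:j]]
            ys = [p[1] for p in traj[i:j]]
            stays.append((sum(xs) / len(xs), sum(ys) / len(ys),
                          traj[i][2], traj[j - 1][2]))
            i = j  # continue after the stay
        else:
            i += 1
    return stays
```

Each extracted centroid and time span then becomes one candidate in the ILP, which jointly decides which candidates are true significant locations and which POI each one corresponds to.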