Web Mining on Political Homoplily in the Congressional Twitter Community

Abstract:

Homophily is the social phenomenon that people who are similar in some aspect interact at a significantly higher rate. We conducted web mining based on Twitter's application programming interfaces to investigate political homophily in the congressional Twitter community. We examined the structures of public Twitter communications among active members of the US congress. Our findings showed strong evidence of homophily with respect to party affiliation and a significant higher rate of communication for members in the same party. In an entirely distinct differentiation, we discovered moderate evidence of inverse homophily with respect to seniority and a significant higher rate of communication between members at different seniority levels.

Managing Routing Information for Optimal Vehicle Refueling in Transportation Networks

Abstract:

Managing routing information is an important aspect of point-to-point delivery by motor carriers. In this paper, we describe the data structures and algorithms needed for implementing a system that can effectively maintain the routing and refueling information dynamically to minimize the fuel cost of point-to-point delivery by motor vehicles. Given a transportation network of n vertices, the system can maintain and update the critical routing information using quadratic space of n and cubic time of n. Given any starting point and an initial fuel level, the system can then on the fly determine an optimal refueling plan in quadratic time of n based on the established routing information.

Multi-Objective Constrained Vehicle Refueling Planning: Complexity and Polynomial-Time Approximation Schemes

Abstract:

For point-to-point direct delivery over the transportation network, timely delivery of commodity and reduction of total fuel cost are both important objectives to consider. Since fuel prices can vary significantly over a broad region, often there is a tradeoff between fuel cost and travel time. A short path may not be economical in terms of fuel cost while routing through regions with lower fuel prices may take more time. In this paper, we consider multi-objective constrained vehicle refueling planning in two settings: (i) minimizing the fuel cost given an upper bound on travel time or (ii) minimizing travel time given an upper bound on the fuel cost. We prove that the computational task is NP-Complete even when the fuel price is fixed or when the amount of fuel consumption and the travel time are linearly dependent. We then show fully polynomial-time approximation schemes for refueling planning in these two situations.

Multi-objective Vehicle Refueling Planning Using Mixed Integer Programming

Abstract:

For point-to-point direct delivery over the transportation network, timely delivery of commodity and reduction of total fuel cost are both important objectives to consider. Since fuel prices can vary significantly over a broad region, often there is a tradeoff between fuel cost and travel time. A short path may not be economical in terms of fuel cost while routing through areas with lower fuel prices may take more time. In this paper, we address multi-objective refueling optimization problems in the context of two priority models regarding fuel cost and travel time. Unlike the shortest path problem, optimal refueling paths may not be simple paths, which complicates the setup of mixed integer programs. We first start with arbitrage-free vehicle refueling planning that restricts refueling paths to simple paths in the network only. We then show how we can augment the mixed integer formulation for vehicle refueling planning without the arbitrage-free assumption.

Vehicle Refueling Planning for Point-to-Point Delivery by Motor Carriers

Abstract:

Managing the fuel cost is a critical task for the transportation industry. For point-to-point delivery of commodity over the transportation network, the motor carrier needs a vehicle refueling plan that specifies (i) the path from the starting point to the destination, (ii) the intermediate stopping points for refueling along the path, and (iii) the amount of fuel to refill at each stopping point. Using the shortest path through the network may not end in the best result since fuel prices along the path may be more expensive than those along a longer path. An optimal vehicle refueling plan needs to under various operational constraints. We show that we can search for optimal vehicle refueling plans minimizing the total fuel cost by examining paths satisfying an extremal transition property and depict the resulting time complexity under several operational constraints.

Data Mining for Student Retention Management

ABSTRACT

We conduct a data mining project to generate predictive models for student retention management on campus. Given new records of incoming students, these predictive models can produce short accurate prediction lists identifying students who tend to need the support from the student retention program most. The project is a component in our artificial intelligence class. Students in the class get involved in the entire process of modeling and problem solving using machine learning algorithms. We examine the quality of the predictive models generated by the machine learning algorithms. The results show that some of the machine learning algorithms are able to establish effective predictive models from the existing student retention data.

Coordinating Time-Constrained Multi-Agent Resource Sharing with Fault Detection

Abstract:

Sharing common resources in a distributed multi-agent environment requires coordination to avoid faulty system states. The statuses of resources such as personnel, equipments, and environmental factors at a point in time determine the system state at that time. When an agent takes an action at any time point within a scheduled time interval, it becomes a state-transition event occurring at that time. For each event, the underlying state transition relation can be compactly encoded as causal rules, which describe how statuses of resources and environmental factors may change in different ways based on preconditions before the event occurs. The central coordinator needs to check in advance whether any of the possible event sequences consistent with a proposed schedule may end in faulty system states. This fault detection task is NP-complete even for a polynomial-size state space in general. In this paper, we investigate the computational complexity of the fault detection task when agents fairly constrain the maximal length of time intervals. We develop a decomposition algorithm to divide the fault detection task over all events into subtasks involving subsets of the events with overlapping time intervals. For each subtask, only a subspace with reduced dimensionality is involved instead of the whole original state space. When the maximal length of time intervals is constrained below a fair threshold, we prove that with probability approaching one as the size of the problem instance grows the algorithm can accomplish the fault detection task in polynomial time even if the original underlying state space is exponential in size.

Checking Multi-agent Schedules with Temporal and Causal Information

Abstract:

Time management in a distributed multi-agent environment requires agents to progressively collaborate and negotiate before reaching a final feasible schedule of future events. In this process, it is important for individual agents to check whether a prototype schedule can meet their requirements and are free from undesirable effects in all possible event sequences. We present a modeling framework for encoding events with causal and temporal information. We show that the schedule validation task under this framework is NP-complete when the uncertainty in event ordering is very high. We develop a search algorithm for reasoning about possible consequences over a given set of events and show that the algorithm can effectively improve the computation efficiency by exploiting event-chain structure embedded in the time-interval information, which ends in tractable polynomial-time performance for events with moderate uncertainty in event ordering.

Data Mining for Managing Stock Keeping Units

Abstract:

Stock keeping units (SKUs) are compact identifiers representing billable products in the inventory for sale. Merchants often assign SKUs to similar products by transforming the text descriptions of the products following some implicit SKU encoding scheme. In the transformation process, the text description of a product is divided into character blocks, some blocks are skipped, and the remaining are abbreviated and reassembled into the SKU in a new order. In this paper, we depict a data-mining framework for (i) automatically extracting SKU encoding schemes from the stock keeping units and text descriptions of existing products, (ii) suggesting a list of most likely SKUs given the text description of a new product, and (iii) inferring a list of most likely text descriptions given the SKU of a product with missing text description. These are important and challenging tasks originated from the operational needs of e-commerce application domains where SKU encoding schemes are implicit and not well documented. We have built a prototype system for conducting tests on real-world data, and the empirical results confirm the effectiveness of the approach.

Finding Optimal Refueling Policies in Transportation Networks

Abstract:

We study the combinatorial properties of optimal refueling policies, which specify the transportation paths and the refueling operations along the paths to minimize the total transportation costs between vertices. The insight into the structure of optimal refueling policies leads to an elegant reduction of the problem of finding optimal refueling policies into the classical shortest path problem, which ends in simple and more efficient algorithms for finding all-pairs optimal refueling policies in transportation networks.

Finding Optimal Refueling Policies: A Dynamic Programming Approach

Abstract:

Given the initial fuel level and the required final fuel level, a valid vehicle refueling policy specifies a path in a transportation network and the refueling decisions for passing through the vertices on the path to reach a destination vertex from a starting vertex in the network while maintaining the fuel level between the upper and lower limits of fuel level allowed in the entire process. A valid refueling policy is optimal if it minimizes the total fuel cost to reach the destination vertex from the starting vertex. We show that finding optimal refueling policies is NP-hard in general. We devise a pseudo-polynomial-time dynamic-programming algorithm for finding optimal refueling policies. This algorithm is similar to the Floyd-Warshall algorithm for solving the shortest path problem. However, unlike the shortest path problem, the path used by an optimal refueling policy may contain loops and we incorporate loops in the dynamic-programming computation.

A Linear-Time Algorithm for Finding Optimal Vehicle Refueling Policies

Abstract:

We explore a fixed-route vehicle refueling problem as a special case of the inventory-capacitated lot-sizing problem, and present a linear-time greedy algorithm for finding optimal refueling policies.

Reasoning about Discrete Event Sources

Abstract:

We investigate the modelling of workflows, plans, and other event-generating processes as discrete event sources and reason about the possibility of having event sequences ending in undesirable states. In previous research, the problem is shown to be NP-Complete even if the number of events to occur is fixed in advance. In this paper, we consider possible events sequences of indefinite length and show that many interesting cases of such reasoning task are solvable in polynomial time.

An Empirical Exploration of Hidden Markov Models: From Spelling Recognition to Speech Recognition

Abstract:

Hidden Markov models play a critical role in the modelling and problem solving of important AI tasks such as speech recognition and natural language processing. However, the students often have difficulty in understanding the essence and applications of Hidden Markov models in the context of a cursory introductory coverage of the subject. In this paper, we describe an empirical approach to explore the subject of the Hidden Markov models. This approach focuses on a series of incremental developments of Hidden Markov models for automatic spelling recognition. The process of programming and experiments with these models cultivates the actual modelling and problem-solving capacity, and guides the students to a better understanding of the application of similar Hidden Markov models used in speech recognition.