Causal Reinforcement Learning


When expert observational data contain unobserved confounding variables, it is well known that the causal effects of these confounders cannot be estimated without bias, regardless of the sample size. In this research, we rely on rigorous principles of causal inference to develop new machine learning methods that enable learner agents to learn optimal control policies from expert data in the presence of unobserved confounders. Applications include robotics and healthcare. Research supported by NSF and AFOSR.
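As a minimal illustration of why more data does not remove confounding bias, the sketch below simulates a hypothetical expert whose action and outcome both depend on a hidden binary variable. All numbers (the true effect of 1.0, the confounder strength of 2.0, the action probabilities 0.9 and 0.1) are assumptions chosen for the example, and the estimator is the naive difference of observed means, not a method from this research.

```python
import random

TRUE_EFFECT = 1.0  # assumed causal effect of the action on the outcome


def naive_effect_estimate(n, seed=0):
    """Difference of observed mean outcomes between actions 1 and 0."""
    rng = random.Random(seed)
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        # Hidden confounder: seen by the expert, never recorded in the data.
        u = 1 if rng.random() < 0.5 else 0
        # The expert picks action 1 far more often when u == 1.
        p_action = 0.9 if u == 1 else 0.1
        a = 1 if rng.random() < p_action else 0
        # The outcome depends on both the action and the hidden confounder.
        y = TRUE_EFFECT * a + 2.0 * u + rng.gauss(0.0, 1.0)
        sums[a] += y
        counts[a] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


for n in (1_000, 10_000, 100_000):
    print(n, round(naive_effect_estimate(n), 2))
```

Under these assumptions the naive estimate approaches 1.0 + 2.0 * (0.9 - 0.1) = 2.6 as the sample grows, rather than the true effect of 1.0, so the gap is a bias and not sampling noise.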

Selected Publications:

Y. Shen and M. M. Zavlanos, "Risk-Averse Multi-Armed Bandits with Unobserved Confounders," under review.

C. Liu, Y. Zhang, Y. Shen, and M. M. Zavlanos, "Learning without Knowing: Unobserved Context in Continuous Transfer Reinforcement Learning," Proc. 3rd Conference on Learning for Dynamics and Control (L4DC), ser. Proc. of Machine Learning Research, vol. 144, pp. 791-802, Jun. 2021.

Distributed Optimization and Learning


Distributed optimization algorithms are iterative methods that decompose an optimization or learning problem into smaller, more manageable subproblems that are solved in parallel by a group of agents or processors. In this research, we develop new distributed optimization and learning algorithms that can handle uncertainty and non-stationary environments, and analyze their convergence and complexity properties. Applications include robotics and healthcare. Research supported by NSF and AFOSR.
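The decomposition idea can be sketched with textbook decentralized gradient descent, which is a standard scheme and not one of the algorithms developed in this research. In this hypothetical setup, each agent i on a ring holds a private cost 0.5 * (x - a_i)^2, so the global minimizer is the mean of the a_i. Agents only average with their two neighbors and take a local gradient step; the ring topology, mixing weights, and step-size schedule are all assumptions made for the example.

```python
def decentralized_gradient_descent(targets, iters=2000):
    """Each agent i minimizes 0.5 * (x - targets[i])**2 using only neighbor info."""
    n = len(targets)
    x = [0.0] * n  # one local estimate per agent
    for k in range(iters):
        step = 1.0 / (k + 1)  # diminishing step size
        # Consensus step: average with the two ring neighbors.
        mixed = [
            0.5 * x[i] + 0.25 * x[(i - 1) % n] + 0.25 * x[(i + 1) % n]
            for i in range(n)
        ]
        # Local step: gradient of agent i's own cost at its current estimate.
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x


local_targets = [1.0, 2.0, 3.0, 6.0]  # private data; the mean is 3.0
estimates = decentralized_gradient_descent(local_targets)
print([round(e, 3) for e in estimates])
```

No agent ever sees another agent's target, yet all local estimates approach the global minimizer, here the mean 3.0. The diminishing step size is what drives the residual disagreement between agents toward zero.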

Selected Publications:


Y. Zhang and M. M. Zavlanos, "Cooperative Multi-Agent Reinforcement Learning with Partial Observations," under review. 


Y. Zhang, R. Ravier, V. Tarokh, and M. M. Zavlanos, "Distributed Online Convex Optimization with Improved Dynamic Regret," under review.

Black-Box Optimization and Learning


Zeroth-order (or derivative-free) optimization methods enable the optimization of black-box models that are available only in the form of input-output data and are common in simulation-based optimization, training of deep neural networks, and reinforcement learning. In the absence of input-output models, exact first- or second-order information (gradient or Hessian) is unavailable and cannot be used for optimization. Therefore, zeroth-order methods rely on input-output data to obtain approximations of the gradients that can be used as descent directions. In this research, we develop new zeroth-order algorithms for distributed and non-stationary optimization and learning problems with reduced variance and improved complexity. Applications include robotics and healthcare. Research supported by NSF and AFOSR.
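To make the gradient-approximation step concrete, the sketch below uses the classical two-point random-direction estimator. It is a generic textbook construction, not the residual-feedback oracle from the publications listed here. The function `black_box`, the step size, and the smoothing radius `delta` are hypothetical choices for illustration.

```python
import random


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from two function evaluations only."""
    # Random Gaussian search direction.
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Finite-difference slope along u, used to scale the direction.
    slope = (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [slope * ui for ui in u]


def zeroth_order_descent(f, x0, step=0.02, delta=1e-3, iters=2000, seed=0):
    """Gradient descent driven by zeroth-order gradient estimates."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        g = two_point_gradient(f, x, delta, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def black_box(x):
    # Hypothetical black box; the optimizer only sees its output values.
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


x_star = zeroth_order_descent(black_box, [0.0, 0.0])
print([round(v, 3) for v in x_star])
```

The optimizer queries `black_box` twice per iteration and never touches an analytic derivative. One-point schemes, which query only once per iteration, trade this extra evaluation for higher estimator variance.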

Selected Publications:


Y. Zhang, Y. Zhou, K. Ji, and M. M. Zavlanos, "Boosting One-Point Derivative-Free Online Optimization via Residual Feedback," under review. 


Y. Zhang, Y. Zhou, K. Ji, and M. M. Zavlanos, "A New One-Point Residual-Feedback Oracle for Black-Box Learning and Control," Automatica, accepted.

Safe Learning for Control


Recent progress in the field of machine learning has given rise to a new family of neural network controllers for robotic systems that can significantly simplify the overall design process. As such control schemes become more common in real-world applications, the ability to train neural networks with safety considerations becomes a necessity. In this research, we develop new safe learning methods for robot navigation problems and study the tradeoff between data density, computational complexity, and safety guarantees. Research supported by AFOSR.

Selected Publications:


P. Vlantis, Y. Zhou, Y. Zhang, and M. M. Zavlanos, "Failing with Grace: Learning Neural Network Controllers that are Boundedly Unsafe," under review.


S. Sun, Y. Zhang, X. Luo, P. Vlantis, M. Pajic, and M. M. Zavlanos, "Formal Verification of Stochastic Systems with ReLU Neural Network Controllers," under review.

Optimal Control Synthesis for High-Level Robot Tasks


The basic motion planning problem consists of generating robot trajectories that reach a given goal region from an initial configuration while avoiding obstacles. More recently, a new class of planning approaches has been developed that can handle a richer class of tasks than classical point-to-point navigation and can capture temporal and Boolean requirements. Such tasks include, e.g., sequencing or coverage, data gathering, intermittent communication, or persistent surveillance, and can be captured using formal languages, such as Linear Temporal Logic (LTL). In this research, we develop optimal control synthesis methods that scale to large numbers of robots and can handle known or unknown uncertainty in the workspace properties, the robot actions, and the task outcomes. Research supported by AFOSR and ONR.

Selected Publications:


X. Luo and M. M. Zavlanos, "Temporal Logic Task Allocation in Heterogeneous Multi-Robot Systems," under review.


Y. Kantaros and M. M. Zavlanos, "STyLuS*: A Temporal Logic Optimal Control Synthesis Algorithm for Large-Scale Multi-Robot Systems," International Journal of Robotics Research, vol. 39, no. 7, pp. 812-836, Jun. 2020.