I am a Researcher in the Adaptive Systems and Interaction group at MSR AI. My research lies at the intersection of human and machine intelligence. I am currently excited about two main directions in this realm: Human-AI Collaboration for enhancing human capabilities in solving complex tasks, and Troubleshooting and Failure Analysis for AI/ML systems for improving and accelerating the software development lifecycle of intelligent systems. I am also involved in research initiatives and projects that study the societal impact of artificial intelligence as well as quality-of-service aspects of AI, including interpretability, transparency, accountability, and fairness. This is a recent research podcast on my current interests.

If you are a PhD student looking for an internship position on these topics, send me an email. The Adaptive Systems and Interaction group is a fun bunch of excellent and diverse researchers.

Prior to joining MSR AI, I completed my PhD at ETH Zurich (Switzerland) in 2016, in the Systems Group, mentored by Prof. Donald Kossmann and Prof. Andreas Krause. My doctoral thesis focused on building cost- and quality-aware models for integrating crowdsourcing into the process of building machine learning algorithms and systems. In 2011, I completed my master's studies in computer science in a double-degree MSc program at RWTH Aachen University (Germany) and the University of Trento (Italy) as an Erasmus Mundus scholar. I also hold a Diploma in Informatics from the University of Tirana (Albania), where I graduated in 2007.

Coming soon - Our paper titled SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions was accepted at CVPR 2020. Ramprasaath R. Selvaraju led this work during his internship at MSR, bringing to life important ideas on improving VQA model reasoning by enforcing sub-question consistency.

Coming soon - New paper on Characterizing Search-Engine Traffic to Internet Research Agency Web Properties accepted at WebConf 2020. The work, led by Alex Spangher, presents a thorough analysis of the impact of IRA-related posts and ads on web search activity.

February 2020 - Our group presented a new paper on Metareasoning in Modular Software Systems at AAAI 2020. The work, led by Aditya Modi and Debadeepta Dey, optimizes integrative AI systems on the fly, using reinforcement learning with rich contextual representations. Read our blog post summarizing the work and vision.

February 2020 - Together with Dan Weld, Adam Fourney, and Saleema Amershi, we organized a tutorial on Guidelines for Human-AI Interaction at AAAI 2020 in New York, February 8th 2020.

October 2019 - Our group presented two papers at HCOMP 2019.
1. Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance - Led by Gagan Bansal. The work studies the properties of machine learning models that make them better collaborators.

2. What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring - Led by Andi Peng as part of her residency project. The work makes a step forward in understanding how representation criteria affect human decision making in hiring.

May 2019 - The Error Terrain Analysis tool received the best demo award at the DebugML workshop organized at ICLR 2019. This is ongoing work with great collaborators at Microsoft: Rick Barraza, Russell Eames, Yan Esteve Balducci, Josh Hinds, Scott Hoogerwerf, Eric Horvitz, Ece Kamar, Jacquelyn Krones, Josh Lovejoy, Parham Mohadjer, and Ben Noah.

May 2019 - Our paper on Software Engineering for Machine Learning received the best paper award in the Software Engineering in Practice track at ICSE 2019.

May 2019 - Our paper on Guidelines for Human-AI Interaction received a best paper honorable mention award at CHI 2019.

What are the properties of a good AI collaborator? How can we optimize ML models for collaboration?

Machine learning models are currently optimized to maximize performance on given benchmarks and test datasets. When a model is used by a human either to accomplish a complex task or to make a high-stakes decision (e.g., medical diagnosis or recidivism prediction), team performance depends not only on model performance but also on how well humans understand when to trust the AI, so that they know when to override its decisions. In this project, we instead aim to optimize for joint human-model performance.

AI-advised Human Decision Making

Our first steps towards this goal have been to study how models should be updated so that they do not violate the trust that users have built over the course of their interaction with the system. By incorporating into the loss function a backward-compatibility term with respect to the previous model, and therefore to the previous user experience, we minimize update disruption for the whole team.
Human-AI teams undergoing a model update
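As a rough illustration of the idea, one can penalize errors on examples that the previous model handled correctly, since these are exactly the new, trust-breaking mistakes. The sketch below is a minimal, hypothetical formulation with illustrative names and a simplified penalty, not the exact objective from our paper:

```python
import math

def compatibility_loss(y_true, p_new, y_old_pred, lam=0.5):
    """Cross-entropy plus a penalty on new errors the old model did not make.

    y_true: true binary labels (0/1); p_new: new model's P(y=1);
    y_old_pred: old model's hard predictions; lam: compatibility weight.
    Hypothetical sketch, not the formulation from the paper.
    """
    total = 0.0
    for y, p, old in zip(y_true, p_new, y_old_pred):
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        ce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        # Weight the loss up only where the previous model was right:
        # mistakes there are the disruptive, trust-breaking ones.
        total += ce * (1 + lam * (old == y))
    return total / len(y_true)
```

Setting lam to zero recovers the plain cross-entropy; larger values trade a little raw accuracy for fewer newly introduced errors.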

Within the same context, we recently also studied further properties of a model's error boundary (i.e., when does the model make an error?), such as parsimony and stochasticity, to understand their impact on human decision making. The parsimony of an error boundary expresses how simply one can describe when the model errs or succeeds (e.g., how many feature rules are needed?). Stochasticity instead expresses how clean that description would be. Both are related to the learnability of the error boundary, as a function of the data representation, from a human perspective. Based on our study, more parsimonious and less stochastic error boundaries are easier to learn, which opens up a new opportunity in machine learning optimization: training and deploying models that are easier to work with. To facilitate studies in this field we developed the CAJA platform, which supports parameterized user studies.
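To make these two notions concrete, here is a minimal, hypothetical sketch that searches for the single feature rule that best describes an error boundary. With parsimony fixed at one rule, the residual mismatch rate of the best rule plays the role of stochasticity (all names and the data shape are illustrative assumptions):

```python
def describe_error_boundary(examples):
    """Find the single feature rule best predicting when the model errs.

    examples: list of (features_dict, erred) pairs, where erred is a bool.
    Returns (feature, value, rate): the rule "model errs iff feature == value"
    and the fraction of examples that rule mis-describes. A rate of 0 means
    a perfectly parsimonious, deterministic (non-stochastic) error boundary.
    Hypothetical sketch for illustration only.
    """
    feats = examples[0][0].keys()
    best = None
    for f, v in {(f, ex[0][f]) for ex in examples for f in feats}:
        # How often does the rule "errs iff f == v" disagree with reality?
        mismatches = sum((ex[0][f] == v) != ex[1] for ex in examples)
        rate = mismatches / len(examples)
        if best is None or rate < best[2]:
            best = (f, v, rate)
    return best
```

A real study would of course consider multi-rule descriptions (trading parsimony against stochasticity), but the one-rule case already shows how the two properties interact.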
Collaborators: Gagan Bansal (University of Washington), Ece Kamar (Microsoft Research), Dan Weld (University of Washington), Walter Lasecki (University of Michigan), Eric Horvitz (Microsoft Research)

How can we better understand failures of an AI system?

Building reliable AI requires a deep understanding of potential system failures. The focus of this project is to build tools that help engineers accelerate development and improvement cycles by assisting them in debugging and troubleshooting. For example, Pandora is a set of hybrid human-machine methods and tools for describing and explaining system failures. It provides engineers with descriptive performance reports that correlate input conditions with errors, guiding them towards discovering hidden conditions of failure.
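The flavor of such a descriptive report can be sketched in a few lines: group evaluated instances by an input condition and rank the groups by error rate, surfacing the conditions under which the system fails most. The function names and data shapes below are assumptions for illustration, not Pandora's actual API:

```python
from collections import defaultdict

def error_report(instances, condition):
    """Rank groups of inputs by their error rate, highest first.

    instances: list of (input_dict, is_error) pairs from an evaluation run;
    condition: function mapping an input_dict to a group key
               (e.g., a lighting condition, a query type).
    Hypothetical sketch of a descriptive failure report.
    """
    totals, errors = defaultdict(int), defaultdict(int)
    for x, is_err in instances:
        key = condition(x)
        totals[key] += 1
        errors[key] += is_err
    rates = {k: errors[k] / totals[k] for k in totals}
    return sorted(rates.items(), key=lambda kv: -kv[1])
```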

Based on Pandora, we recently built the Error Terrain Analysis Tool as a collaboration between Microsoft Research AI and Microsoft Cognition. The vision of this collaboration is to build tools that help engineers accelerate development iterations by identifying errors faster, more systematically, and more rigorously.

Pandora workflow for error analysis

In the same vein, this project has also explored ideas for troubleshooting techniques with humans in the loop. Diagnosing and fixing a complex AI system is a challenging task. Often, errors are propagated, suppressed, or even amplified down the computation pipeline. We propose a troubleshooting methodology that generates counterfactual improved states of system components using crowd intelligence. These states, which would have been too expensive or infeasible to generate otherwise, are then integrated into the system execution to create insights about which component fixes are the most efficient given the current system architecture.
Troubleshooting integrative AI systems with humans in the loop
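The core mechanism can be sketched as follows: substitute a crowd-generated improved output for one component at a time, re-run the pipeline, and measure the end-to-end gain of that counterfactual fix. This is a hypothetical, minimal model of a pipeline, not our actual system:

```python
def component_fix_value(pipeline, inputs, gold, improved_states):
    """Estimate the end-to-end value of fixing each component.

    pipeline: ordered list of (name, fn) stages, each fn mapping state
              to state; inputs/gold: evaluation examples and targets;
    improved_states: name -> fn producing the crowd-corrected output
              for that component (the counterfactual improved state).
    Returns name -> accuracy gain over the unmodified pipeline.
    Hypothetical sketch for illustration only.
    """
    def run(fix=None):
        correct = 0
        for x, y in zip(inputs, gold):
            state = x
            for name, fn in pipeline:
                # Counterfactually substitute the improved component.
                state = improved_states[name](state) if name == fix else fn(state)
            correct += state == y
        return correct / len(inputs)

    baseline = run()
    return {name: run(fix=name) - baseline for name, _ in pipeline}
```

Components whose counterfactual fix yields the largest end-to-end gain are the most efficient targets for engineering effort.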

Collaborators: Ece Kamar (Microsoft Research), Lydia Manikonda (Arizona State University), Eric Horvitz (Microsoft Research), Donald Kossmann

Quality assurance is one of the most important challenges in crowdsourcing. Assigning tasks to several workers to increase quality through redundant answers can be expensive when the answers come from homogeneous sources. In this project, we look at crowd access optimization techniques that can be applied either while building training models with crowdsourced data or while applying such models to make crowdsourced predictions.

In the context of crowdsourced predictions, our work argues that optimization needs to be aware of the diversity and correlation of information within groups of individuals so that crowdsourcing redundancy can be adequately planned beforehand. Based on this intuition, we introduce the Access Path Model (APM), a novel crowd model that leverages the notion of access paths as alternative ways of retrieving information. The access path configuration can be based on various criteria depending on the task: (i) workers' demographics (e.g., profession, group of interest, age); (ii) the source of information or the tool used to find the answer (e.g., phone call vs. web page, Bing vs. Google); (iii) task design (e.g., time of completion, user interface); or (iv) task decomposition (e.g., partial answers, features). APM aggregates answers while ensuring high quality and meaningful confidence scores. Moreover, we devise a greedy optimization algorithm for this model that finds a provably good approximate plan for accessing the crowd.

The Access Path Model applied to medical questions and answers
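To illustrate why diversity matters, here is a minimal, hypothetical greedy planner. Answers within one access path are correlated, so same-path redundancy saturates and the greedy plan naturally spreads the budget across paths. The correlation model and all names below are simplifying assumptions for illustration, not the exact APM formulation:

```python
def greedy_crowd_plan(paths, budget):
    """Greedily allocate a crowd budget across correlated access paths.

    paths: name -> (cost_per_answer, answer_variance, within_path_correlation).
    At each step, buy one more answer on the path with the largest marginal
    precision (inverse variance) gain per unit cost.
    Hypothetical sketch of a greedy crowd-access optimizer.
    """
    def precision(p, n):
        # Inverse variance of the mean of n equally correlated answers:
        # var * (rho + (1 - rho) / n). It saturates at 1 / (rho * var)
        # as n grows, so same-path redundancy has diminishing returns.
        if n == 0:
            return 0.0
        cost, var, rho = paths[p]
        return 1.0 / (var * (rho + (1.0 - rho) / n))

    counts = {p: 0 for p in paths}
    spent = 0.0
    while True:
        best, best_gain = None, 0.0
        for p, (cost, _, _) in paths.items():
            if spent + cost > budget:
                continue
            gain = (precision(p, counts[p] + 1) - precision(p, counts[p])) / cost
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:
            break
        counts[best] += 1
        spent += paths[best][0]
    return counts
```

With uncorrelated paths the planner concentrates on the single best precision-per-cost path; once within-path correlation is modeled, the marginal value of repeated same-path answers drops and the plan diversifies.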

In addition, we have devised the B-LEAFS algorithm for building machine learning models with crowdsourced input features under budget constraints. The main challenge we addressed with this algorithm is the natural exploration-exploitation trade-off in crowdsourcing between the redundancy of noisy features and the number of observed examples.
Collaborators: Anja Gruenheid, Adish Singla, Erfan Zamanian, Andreas Krause, Donald Kossmann

The goal of this project is to develop a set of novel techniques for integrating human input into a database system in order to process some of the impossible queries that Google and Oracle cannot answer today, and to address some of the notoriously hard database research problems in a very different way than has been done in the past. Specifically, CrowdDB extends a relational database system to process both conventional and crowdsourced data. For this purpose, we designed and implemented various algorithms for quality management and query processing. Moreover, the project has focused on implementing crowdsourced query operators for entity resolution, joins, comparisons, and sorting.

CrowdDB Architecture

Collaborators: Anja Gruenheid, Donald Kossmann, Lynn Aders, Erfan Zamanian


SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions. Ramprasaath R. Selvaraju, Purva Tendulkar, Devi Parikh, Eric Horvitz, Marco Ribeiro, Besmira Nushi, Ece Kamar; CVPR 2020. pdf

Characterizing Search-Engine Traffic to Internet Research Agency Web Properties. Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, Eric Horvitz; WebConf 2020. pdf

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations. Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz; AAAI 2020. pdf

Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance. Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, Eric Horvitz; HCOMP 2019. pdf

What You See Is What You Get? The Impact of Representation Criteria on Human Bias in Hiring. Andi Peng, Besmira Nushi, Emre Kiciman, Kori Inkpen, Siddharth Suri, Ece Kamar; HCOMP 2019. pdf

Software Engineering for Machine Learning: A Case Study. Saleema Amershi, Andrew Begel, Christian Bird, Rob DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, Thomas Zimmermann; ICSE 2019. pdf

Guidelines for Human-AI Interaction. Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, Eric Horvitz; CHI 2019. pdf

Updates in Human-AI Teams: Understanding and Addressing the Performance/Compatibility Tradeoff. Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S Weld, Walter S Lasecki, Eric Horvitz; AAAI 2019. pdf

Overcoming Blind Spots in the RealWorld: Leveraging Complementary Abilities for Joint Execution. Ramya Ramakrishnan, Ece Kamar, Besmira Nushi, Debadeepta Dey, Julie Shah, Eric Horvitz; AAAI 2019. pdf

Towards Accountable AI: Hybrid Human-Machine Analyses for Characterizing System Failure. Besmira Nushi, Ece Kamar, Eric Horvitz; HCOMP 2018. pdf

Analysis of Strategy and Spread of Russia-sponsored Content in the US in 2017. Alexander Spangher, Gireeja Ranade, Besmira Nushi, Adam Fourney, Eric Horvitz; arXiv 2018. pdf

On Human Intellect and Machine Failures: Troubleshooting Integrative Machine Learning Systems. Besmira Nushi, Ece Kamar, Eric Horvitz, Donald Kossmann; AAAI 2017. pdf

Quality Control and Optimization for Hybrid Crowd-Machine Learning Systems. Besmira Nushi; ETH PhD Thesis 2016. pdf

Learning and Feature Selection under Budget Constraints in Crowdsourcing. Besmira Nushi, Adish Singla, Andreas Krause, Donald Kossmann; HCOMP 2016. pdf

Fault-Tolerant Entity Resolution with the Crowd. Anja Gruenheid, Besmira Nushi, Tim Kraska, Wolfgang Gatterbauer, Donald Kossmann; arXiv 2016. full technical report

Crowd Access Path Optimization: Diversity Matters. Besmira Nushi, Adish Singla, Anja Gruenheid, Erfan Zamanian, Andreas Krause, Donald Kossmann; HCOMP 2015. pdf

CrowdSTAR: A Social Task Routing Framework for Online Communities. Besmira Nushi, Omar Alonso, Martin Hentschel, and Vasileios Kandylas; ICWE 2015. pdf full technical report

When is A = B? Anja Gruenheid, Donald Kossmann, Besmira Nushi, Yuri Gurevich; EATCS Bulletin 111 (2013). pdf

Uncertain time-series similarity: Return to the basics. Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, and Themis Palpanas; Proceedings of the VLDB Endowment 5, no. 11 (2012): 1662-1673. pdf

Similarity matching for uncertain time series: analytical and experimental comparison. Michele Dallachiesa, Besmira Nushi, Katsiaryna Mirylenka, and Themis Palpanas. Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Querying and Mining Uncertain Spatio-Temporal Data, pp. 8-15. ACM, 2011. pdf

Microsoft Research AI (MSR AI) is a new organization that brings together the breadth of talent across Microsoft Research to pursue game-changing advances in artificial intelligence. The new research and development initiative combines advances in machine learning with innovations in language and dialog, human computer interaction, and computer vision to solve some of the toughest challenges in AI. A key focus for this initiative is to probe the foundational principles of intelligence, including efforts to unravel the mysteries of human intellect, and use this knowledge to develop a more general, flexible artificial intelligence. MSR AI pursues use of machine intelligence in new ways to empower people and organizations, including systems that deliver new experiences and capabilities that help people be more efficient, engaged and productive.


Microsoft building 99 (3121)
14820 NE 36th St, Redmond, WA 98052, USA