If you are a first-class student who is motivated to pursue a PhD and would like to discuss potential research opportunities, feel free to contact us. At present, we are working on several areas related to AI, machine learning, natural language processing, computer vision and pattern recognition.
Some of our PhD projects are:
Contact: A/Prof. Richard Dazeley
Agents interacting with the environment may be exposed to potential risks. In reinforcement learning, one alternative to acting directly in the real environment is to use a model of the environment: something that imitates or mimics the behaviour of the real environment. Models are used for planning, since the action to perform can be decided by considering possible future situations before they actually occur. One way to model the environment is through contextual affordances, where cognitive agents favour specific actions to be performed with specific objects in specific contexts.
In this project, we will use real-world and simulated robots interacting with their environments. Within a reinforcement learning task, the agent will use contextual affordances, an extension of Gibson's classic affordance concept, to establish safety rules that anticipate potential risks.
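As a minimal sketch only, the Python snippet below shows one way a contextual-affordance model could act as a safety filter in front of an RL policy. The contexts, objects, actions and predicted effects are hypothetical placeholders, not the project's actual model, and in practice the mapping would be learned from environment interactions rather than hand-written.

```python
# A minimal sketch: (context, object, action) -> predicted effect.
# All entries below are hypothetical examples.
AFFORDANCES = {
    ("cup_is_full", "cup", "tip"): "spill",
    ("cup_is_full", "cup", "carry"): "moved",
    ("cup_is_empty", "cup", "tip"): "nothing",
}

UNSAFE_EFFECTS = {"spill", "collision"}

def safe_actions(context: str, obj: str, candidate_actions: list[str]) -> list[str]:
    """Filter the RL agent's candidate actions using the predicted effects."""
    allowed = []
    for action in candidate_actions:
        effect = AFFORDANCES.get((context, obj, action), "unknown")
        if effect not in UNSAFE_EFFECTS:
            allowed.append(action)
    return allowed

print(safe_actions("cup_is_full", "cup", ["tip", "carry"]))  # -> ['carry']
```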
Contact: A/Prof. Richard Dazeley
Two of the primary aims in explainable artificial intelligence (XAI) are to improve trust and understanding in human users. However, most XAI approaches focus on providing an understanding of an AI’s decision in the hope that this will improve trust. Trust, though, also requires the agent to show that it understands and considers people’s needs. This requires the agent to be able both to illustrate empathy and to alter its future behaviour to match that understanding.
The aim of this project is to utilise knowledge engineering and emotion detection approaches to model external actors’ desires. Using multiobjective reinforcement learning, these desires can be used as constraints on an agent’s primary objective, allowing it to optimise its performance while minimising its impact on other actors. When an outcome of the agent’s behaviour results in a policy that exceeds the user’s model, the agent can generate an apology and add new rules to the constraint model to ensure future compliance.
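A minimal sketch of the general idea, assuming a second "disturbance" objective derived from the modelled desires of other actors: the agent optimises its primary objective subject to a disturbance budget, and tightens that budget (with an apology) when its executed behaviour exceeds the user's tolerance. The values, action names and update rule are illustrative only.

```python
# Per-action estimates for two objectives: the primary task objective and the
# modelled disturbance to other actors. All values are hypothetical.
Q = {
    "vacuum_now":   {"task": 10.0, "disturbance": 4.0},
    "vacuum_later": {"task":  7.0, "disturbance": 0.5},
    "do_nothing":   {"task":  0.0, "disturbance": 0.0},
}

disturbance_budget = 1.0  # derived from the model of the user's desires

def select_action(q, budget):
    """Optimise the primary objective among actions that respect the constraint."""
    feasible = {a: v for a, v in q.items() if v["disturbance"] <= budget}
    pool = feasible if feasible else q  # fall back if no action is feasible
    return max(pool, key=lambda a: pool[a]["task"])

def apologise_and_tighten(observed_disturbance, budget):
    """If execution exceeded the user's tolerance, apologise and tighten the
    constraint so future behaviour complies (toy update rule)."""
    if observed_disturbance > budget:
        print("I'm sorry, that disturbed you more than intended; I'll avoid it in future.")
        budget *= 0.5
    return budget

print(select_action(Q, disturbance_budget))  # -> 'vacuum_later'
```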
Contact: A/Prof. Richard Dazeley
Imagine a robot tasked with removing rubbish from a room. The robot receives a reward based on how much rubbish is in the room: the less rubbish, the greater the reward. In this simple task the robot will learn to remove all items classed as rubbish. However, what if someone enters the room and drops new rubbish? The agent will now receive a negative reward because of that person’s actions. The robot, wishing to maximise reward, may learn not only to remove rubbish, but also to prevent people from dropping rubbish in the first place.
In this project you will investigate Multiple Object Tracking (MOT) approaches to identify external agents’ interactions with objects and use this information to dynamically identify responsibility for environmental changes. This information will be incorporated into a multiobjective reinforcement learning agent that uses impact minimisation, dynamically correcting the impact potential function.
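The sketch below is illustrative only: it assumes MOT output in the form of per-actor trajectories and attributes an environmental change to the nearest tracked actor at the time of the change, excluding such changes from the robot's impact penalty. The attribution rule, data layout and numbers are hypothetical.

```python
import math

def attribute_change(change_pos, change_time, tracks, radius=1.0):
    """Return the track id of the nearest tracked actor at the time of the change,
    or None if no actor was within `radius` (i.e. the robot is responsible)."""
    best_id, best_dist = None, radius
    for track_id, trajectory in tracks.items():      # trajectory: {t: (x, y)}
        pos = trajectory.get(change_time)
        if pos is None:
            continue
        dist = math.dist(pos, change_pos)
        if dist <= best_dist:
            best_id, best_dist = track_id, dist
    return best_id

def impact_penalty(changes, tracks):
    """Sum impact only over changes the robot itself is responsible for."""
    return sum(c["magnitude"] for c in changes
               if attribute_change(c["pos"], c["t"], tracks) is None)

# Toy data: a person (track 7) drops rubbish at t=3 near (2, 2).
tracks = {7: {3: (2.1, 2.0)}}
changes = [{"pos": (2.0, 2.0), "t": 3, "magnitude": 5.0},   # the person's doing
           {"pos": (9.0, 0.0), "t": 4, "magnitude": 1.0}]   # the robot's doing
print(impact_penalty(changes, tracks))  # -> 1.0
```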
Contact: A/Prof. Richard Dazeley
Reinforcement Learning (RL) observes an environment and determines an action that leads the agent towards its goal. The majority of deep RL systems use raw state input such as video images, audio, or other continuous data to represent state information. However, when an RL agent is expected to explain its behaviour it must rely on methods such as saliency maps to identify features of relevance, which have been shown to be a poor communication technique for explanations. Furthermore, the agent’s longer-term intentionality is even more difficult to articulate. One alternative approach, Programmatically Interpretable RL (PIRL), raises the possibility of representing state through a programmable structure and using this structure to generate basic explainable functionality.
In this project we will build upon the idea of PIRL by developing an approach that represents the environmental state through an abstracted model using fuzzy rules, where the fuzzy sets are generated through environment interactions. These fuzzy rules will provide a structured ontology for interpreting the state, while the rule inference process will provide the agent’s intentionality-based reasoning. The combination of these components will allow an agent to apply traditional RL learning while also being capable of providing both perception- and goal-driven explanations of its behaviour.
This project will be conducted in two stages. The first stage will develop a Fuzzy RL framework capable of learning in environments traditionally handled by deep learning. This agent will then be used to generate explanations that improve human observers’ mental model of its behaviour. This will involve a quantitative and qualitative study of people predicting the agent’s future behaviour after being trained on past cases.
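As an illustration of why a fuzzy state abstraction lends itself to explanation, the sketch below fuzzifies a single continuous observation and uses one toy rule both to select an action and to phrase a reason. The membership functions, rule and wording are hypothetical, not the project's framework.

```python
def triangular(x, a, b, c):
    """Standard triangular membership function."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_distance(d):
    """Map a continuous distance reading onto interpretable fuzzy labels."""
    return {
        "near":   triangular(d, -1.0, 0.0, 2.0),
        "medium": triangular(d, 1.0, 3.0, 5.0),
        "far":    triangular(d, 4.0, 8.0, 12.0),
    }

def act_and_explain(distance):
    """Pick an action from a toy rule base and return a human-readable reason."""
    memberships = fuzzify_distance(distance)
    label = max(memberships, key=memberships.get)
    action = {"near": "slow_down", "medium": "maintain_speed", "far": "speed_up"}[label]
    reason = f"the obstacle is {label} (membership {memberships[label]:.2f})"
    return action, f"I chose to {action} because {reason}."

print(act_and_explain(1.2))  # -> ('slow_down', 'I chose to slow_down because ...')
```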
Contact: A/Prof. Richard Dazeley
Modelling and representing decisions made by a team in a complex environment is very challenging. Examples include decisions made by health professionals in hospitals and by army commanders on battlefields. These decisions can be critical and have to be made quickly, considering the current context and the information available. Understanding the rationale, or story, behind decisions made previously is important for understanding the context of future decision making. Knowledge representation, argumentation modelling and narrative theory can be useful for generating narratives for such complex decisions.
This PhD project aims to develop a framework to generate narratives for complex decisions made by a team. The plan is to extend the idea of Generic/Actual Argument Modelling (GAAM) to create a story for a decision that has been made. In GAAM, knowledge is represented as a tree structure called a ‘Generic Argument Structure’ (GAS). It captures context variables, relevant data with reasons for relevance, inferences with reasons, and claims. Each argument is an instantiation of the GAS. The two-layered abstraction separating generic and actual argumentation provides the flexibility to consider different opinions. Because GAAM captures context, data (setting) and claim (resolution), it can be used to generate a story based on narrative theory. The research may be based on case studies from domains such as defence, health and/or law. A candidate with some background in knowledge engineering, programming, text mining and machine learning will be preferred.
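The sketch below is only a toy data structure in the spirit of the generic/actual separation described above, together with one naive way an actual argument might be rendered as a setting-and-resolution sentence. The field names and the hospital example are hypothetical and do not reproduce the full GAS representation.

```python
from dataclasses import dataclass

@dataclass
class GenericArgument:
    context_variables: list[str]        # what frames the decision
    data_items: dict[str, str]          # data item -> reason for relevance
    inference: str                      # inference procedure with its reason
    claim_variable: str                 # what is being decided

@dataclass
class ActualArgument:
    generic: GenericArgument
    context: dict[str, str]             # bindings for the context variables
    data: dict[str, str]                # bindings for the data items
    claim: str                          # the resolution reached

def to_narrative(arg: ActualArgument) -> str:
    """Render the setting (context + data) and resolution (claim) as one sentence."""
    setting = ", ".join(f"{k} was {v}" for k, v in {**arg.context, **arg.data}.items())
    return (f"Given that {setting}, and reasoning that {arg.generic.inference}, "
            f"the team decided: {arg.claim}.")

generic = GenericArgument(
    context_variables=["location"],
    data_items={"patient_condition": "determines urgency"},
    inference="the most urgent case is treated first",
    claim_variable="treatment_order",
)
actual = ActualArgument(generic, {"location": "the emergency ward"},
                        {"patient_condition": "critical"}, "treat this patient first")
print(to_narrative(actual))
```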
Contact: Dr Wei Luo
Australia runs a $13.2-billion-per-annum Pharmaceutical Benefits Scheme (PBS). With new interventions emerging every year, the government has to decide which interventions are cost-effective and should be covered by the PBS. Such decisions are difficult to make and rely on high-quality evidence. This project aims to develop innovative AI-based approaches for accurate and reliable evidence collection for health-economic policy decision-making.
This project will i) assess the feasibility and potential of existing AI techniques in predicting health-economic measures; ii) develop new AI methods to improve current utility mapping practice; and iii) apply the new methods to a number of chronic diseases/conditions (e.g. cancer, mental illness, pulmonary diseases).
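For readers unfamiliar with utility mapping, the toy sketch below frames it as a regression from a disease-specific score to a generic health-utility value in [0, 1]. The data and the linear model are made up purely to show the shape of the prediction task, not the methods this project would develop.

```python
import numpy as np

# Hypothetical training data: disease-specific symptom scores (x) paired with
# elicited utility values (y) from a small reference study.
x = np.array([10, 20, 30, 40, 50, 60], dtype=float)
y = np.array([0.95, 0.90, 0.80, 0.72, 0.60, 0.55])

# Ordinary least-squares fit of a linear mapping: utility ≈ a * score + b.
a, b = np.polyfit(x, y, deg=1)

def predict_utility(score: float) -> float:
    """Map a new patient's score to a utility, clipped to the valid range."""
    return float(np.clip(a * score + b, 0.0, 1.0))

print(round(predict_utility(35), 3))
```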
Contact: Dr Duc Thanh Nguyen
Scene understanding is a fundamental topic in Computer Vision with a wide spectrum of applications in many research fields such as robotics and virtual reality. The project will develop novel Computer Vision and Machine Learning models to address current challenges in 3D scene understanding from large-scale, real-world data. In the project, contemporary computational models and technologies in Computer Vision and Machine Learning, such as mobile-based real-time 3D reconstruction, big data processing, and deep learning, will be advanced. The outcomes of the project will be applied to robots performing real-time navigation and to mobile-based virtual reality systems.
This project aims to develop Computer Vision and Machine Learning models to solve the following problems:
- High-quality and real-time 3D reconstruction
- 3D-2D reasoning
- Semantic scene segmentation
- 3D object recognition
- Scene modelling
Contact: Dr Wei Luo
Work-related accidents can lead to significant financial costs, injuries or even deaths. Many employers are turning to computer vision solutions to track and ensure safety compliance. This project will develop new video surveillance technologies for oil and gas production facilities.
The project will involve the following three aims: 1) modelling the behavioural patterns of work-related incidents; 2) developing new computer vision models for inferring human behavioural intention; and 3) developing a video surveillance system to provide early warning of risk-inducing activities.
Contact: A/Prof. Richard Dazeley
Researchers have long understood that an AI-based system’s ability to explain its decisions is critical to human acceptance, understanding and trust. Recently, with the growth of machine-learning-based systems, there has been a significant increase in work in this domain. Explaining the behaviour of goal-driven agents, however, is currently mostly limited to local decisions rather than explaining the intentionality and temporal nature of the decision. Intentionality, though, is limited in single-objective domains. Reward decomposition can provide some degree of justification around action preferences, but is limited by the correlation of reward signals.
In this project we will use a multiobjective framework to extract action preferences that allow a comparison of the possible actions against each of the objectives. This will be combined with our approach to transition probability prediction to explain to the user that the selected behaviour increases the likelihood of achieving a particular objective relative to other actions that lead to alternative and unwanted outcomes. For instance, this will allow us to provide explanations such as:
I did X instead of Y because X will still allow me (with some probability) to achieve my primary objective but is unlikely (with some probability) to cause Z (some undesirable outcome).
The natural extension of this will provide both counterfactual and contrastive explanations. This study will show to what degree human users can develop a mental model from such explanations, allowing them to accurately predict the agent’s behaviour in future environments.
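As a minimal sketch only, the snippet below shows how an explanation of the form above could be composed from per-action predictions. The action names, probabilities and the undesirable outcome are hypothetical placeholders, and the predictions stand in for the per-objective values and transition probabilities the project would actually estimate.

```python
# Hypothetical predictions per action: probability of achieving the primary
# objective and probability of causing the undesirable outcome Z.
predictions = {
    "take_corridor":   {"primary": 0.85, "undesirable": 0.05},
    "cut_through_lab": {"primary": 0.90, "undesirable": 0.60},
}

def contrastive_explanation(chosen, rejected, preds,
                            undesirable="disturbing the experiment"):
    """Fill the 'X instead of Y because ...' template from the predictions."""
    c, r = preds[chosen], preds[rejected]
    return (f"I did {chosen} instead of {rejected} because {chosen} will still allow me "
            f"(with probability {c['primary']:.0%}) to achieve my primary objective "
            f"but is unlikely ({c['undesirable']:.0%}) to cause {undesirable}, "
            f"whereas {rejected} would ({r['undesirable']:.0%}).")

print(contrastive_explanation("take_corridor", "cut_through_lab", predictions))
```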