ARDC Research Link Australia

ORCID Profile
0000-0002-8089-9962

Current Organisation
Deakin University

Publications

Publication

GEFA: Early Fusion Approach in Drug-Target Affinity Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 03-2022

DOI: 10.1109/TCBB.2021.3094217

Publication

An efficient camera-based surveillance for fall detection of elderly people

Publisher: IEEE

Date: 06-2014

DOI: 10.1109/ICIEA.2014.6931308

Publication

From Deep Learning to Deep Reasoning

Publisher: ACM

Date: 14-08-2021

DOI: 10.1145/3447548.3470803

Publication

Dynamic Language Binding in Relational Visual Reasoning

Publisher: International Joint Conferences on Artificial Intelligence Organization

Date: 07-2020

DOI: 10.24963/IJCAI.2020/114

Abstract: We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains, with applications in visual question answering. Current models commonly assume that object predicates pre-exist and stay static, passive to the reasoning process. Relaxing this assumption, we propose that these dynamic predicates expand across domain borders to include pair-wise visual-linguistic object binding. In our method, these contextualized object links are actively found within each recurrent reasoning step without relying on external predicative priors. Through co-attention, these dynamic structures reflect the conditional dual-domain object dependency given the evolving context of the reasoning. The discovered dynamic graphs facilitate multi-step knowledge combination and refinement that iteratively deduces the compact representation of the final answer. We demonstrate the model's effectiveness on image question answering, where it shows favorable performance on major VQA datasets. Our method outperforms other methods on sophisticated question-answering tasks that involve multiple object relations. The graph structure also effectively assists training, so the network learns efficiently compared to other reasoning models.

Publication

Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question Answering

Publisher: International Joint Conferences on Artificial Intelligence Organization

Date: 08-2021

DOI: 10.24963/IJCAI.2021/88

Abstract: Video Question Answering (Video QA) is a powerful testbed for developing new AI capabilities. This task necessitates learning to reason about objects, relations, and events across visual and linguistic domains in space-time. High-level reasoning demands lifting from associative visual pattern recognition to symbol-like manipulation over objects, their behavior and their interactions. Toward this goal, we propose an object-oriented reasoning approach in which video is abstracted as a dynamic stream of interacting objects. At each stage of the video event flow, these objects interact with each other. Their interactions are reasoned about with respect to the query and under the overall context of the video. This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture, called Hierarchical Object-oriented Spatio-Temporal Reasoning (HOSTR) networks. This neural model maintains the objects' consistent lifelines in the form of a hierarchically nested spatio-temporal graph. Within this graph, the dynamic interactive object-oriented representations are built up along the video sequence, hierarchically abstracted in a bottom-up manner, and converge toward the key information for the correct answer. The method is evaluated on multiple major Video QA datasets and establishes new state-of-the-art results on these tasks. Analysis of the model's behavior indicates that object-oriented reasoning is a reliable, interpretable and efficient approach to Video QA.

Publication

Neural Reasoning, Fast and Slow, for Video Question Answering

Publisher: IEEE

Date: 07-2020

DOI: 10.1109/IJCNN48605.2020.9207580

Publication

Hierarchical Conditional Relation Networks for Multimodal Video Question Answering

Publisher: Springer Science and Business Media LLC

Date: 27-08-2021

DOI: 10.1007/S11263-021-01514-3

Publication

Hierarchical Conditional Relation Networks for Video Question Answering

Publisher: IEEE

Date: 06-2020

DOI: 10.1109/CVPR42600.2020.00999

Publication

Deep Learning Based Multi-modal Addressee Recognition in Visual Scenes with Utterances

Publisher: International Joint Conferences on Artificial Intelligence Organization

Date: 07-2018

DOI: 10.24963/IJCAI.2018/214

Abstract: With the widespread use of intelligent systems such as smart speakers, addressee recognition has become a concern in human-computer interaction. More and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and in hospitals. Previous studies typically focused only on pre-specified tasks with limited conversational situations, such as controlling smart homes. We therefore created a mock dataset called Addressee Recognition in Visual Scenes with Utterances (ARVSU). It contains a vast body of image variations in visual scenes, with an annotated utterance and a corresponding addressee for each scenario. We also propose a multi-modal deep-learning-based model that takes different human cues into account, specifically eye gazes and transcripts of an utterance corpus. The model predicts the conversational addressee from a specific speaker's view in various real-life conversational scenarios. To the best of our knowledge, we are the first to introduce an end-to-end deep learning model that combines vision and utterance transcripts for addressee recognition. Our study suggests that future addressee recognition can reach the ability to understand human intention in many previously unexplored social situations. Our multi-modal dataset is a first step in promoting research in this field.

Related Organisations

Organisation

Deakin University

Location: Australia

Organisation

Tokyo Institute Of Technology

Location: Japan

Organisation

Hanoi University Of Science And Technology

Location: Viet Nam

Related Funding Activities

No related grants have been discovered for Thao Minh Le.

Thao Minh Le

Researcher