ORCID Profile
0000-0002-0512-880X
Current Organisation
Zhejiang University
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Pattern Recognition and Data Mining | Artificial Intelligence and Image Processing | Database Management | Computer Vision | Artificial Intelligence and Image Processing not elsewhere classified | Information Systems | Neural, Evolutionary and Fuzzy Computation | Multimedia Programming
Information Processing Services (incl. Data Entry and Capture) | Electronic Information Storage and Retrieval Services | Film and Video Services (excl. Animation and Computer Generated Imagery) | Media Services not elsewhere classified | Application Tools and System Utilities | Expanding Knowledge in the Information and Computing Sciences | Health Policy Evaluation
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2010
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2017
Publisher: Association for Computing Machinery (ACM)
Date: 05-07-2020
DOI: 10.1145/3390891
Abstract: In Visual Dialog, an agent has to parse temporal context in the dialog history and spatial context in the image to hold a meaningful dialog with humans. For example, to answer “what is the man on her left wearing?” the agent needs to (1) analyze the temporal context in the dialog history to infer who is being referred to as “her,” (2) parse the image to attend to “her,” and (3) uncover the spatial context to shift the attention to “her left” and check the apparel of the man. In this article, we use a dialog network to memorize the temporal context and an attention processor to parse the spatial context. Since the question and the image are usually very complex, which makes it difficult to ground the question with a single glimpse, the attention processor attends to the image multiple times to better collect visual information. In the Visual Dialog task, the generative decoder (G) is trained under the word-by-word paradigm, which suffers from the lack of sentence-level training. To ameliorate the problem, we propose to reinforce G at the sentence level using the discriminative model (D), which aims to select the right answer from a few candidates. Experimental results on the VisDial dataset demonstrate the effectiveness of our approach.
Publisher: ACM
Date: 04-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Springer Science and Business Media LLC
Date: 15-06-2018
Publisher: Springer Science and Business Media LLC
Date: 03-02-2007
Publisher: ACM
Date: 26-10-2008
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: IEEE
Date: 05-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2008
Publisher: IEEE
Date: 06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2016
Publisher: IEEE
Date: 07-2017
Publisher: Elsevier BV
Date: 02-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2018
Publisher: IEEE
Date: 06-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2021
Publisher: Springer Science and Business Media LLC
Date: 20-12-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2019
Publisher: IEEE
Date: 07-2017
Publisher: ACM
Date: 28-11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2023
Publisher: Elsevier BV
Date: 05-2013
Publisher: IEEE
Date: 06-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Elsevier BV
Date: 03-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2018
Publisher: ACM
Date: 19-10-2009
Publisher: IEEE
Date: 07-2017
Publisher: Elsevier BV
Date: 06-2010
Publisher: Elsevier BV
Date: 2016
Publisher: ACM
Date: 28-11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2016
Publisher: IEEE
Date: 06-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2014
Publisher: Association for Computing Machinery (ACM)
Date: 29-10-2018
DOI: 10.1145/3230709
Abstract: Learning from very few samples is a challenge for machine learning tasks, such as text and image classification. Performance on such tasks can be enhanced via transfer of helpful knowledge from related domains, which is referred to as transfer learning. In previous transfer learning works, instance transfer learning algorithms mostly focus on selecting source domain instances similar to the target domain instances for transfer. However, the selected instances usually do not directly contribute to the learning performance in the target domain. Hypothesis transfer learning algorithms focus on model-parameter-level transfer. They treat the source hypotheses as well trained and transfer their knowledge in terms of parameters to learn the target hypothesis. Such algorithms directly optimize the target hypothesis by the observable performance improvements. However, they fail to consider that instances contributing to the source hypotheses may be harmful for the target hypothesis, as analyzed in instance transfer learning. To relieve the aforementioned problems, we propose a novel transfer learning algorithm that follows an analogical strategy. In particular, the proposed algorithm first learns a revised source hypothesis with only the instances contributing to the target hypothesis. Then, it transfers both the revised source hypothesis and the target hypothesis (trained with only a few samples) to learn an analogical hypothesis. We denote our algorithm Analogical Transfer Learning. Extensive experiments on one synthetic dataset and three real-world benchmark datasets demonstrate the superior performance of the proposed algorithm.
Publisher: Association for Computing Machinery (ACM)
Date: 13-12-2017
DOI: 10.1145/3152116
Abstract: Cloud-assisted video streaming has emerged as a new paradigm to optimize multimedia content distribution over the Internet. This article investigates the problem of streaming cloud-assisted real-time video to multiple destinations (e.g., cloud video conferencing, multi-player cloud gaming, etc.) over lossy communication networks. User diversity and network dynamics result in delay differences among multiple destinations. This research proposes the Differentiated cloud-Assisted VIdeo Streaming (DAVIS) framework, which proactively leverages such delay differences in video coding and transmission optimization. First, we analytically formulate the optimization problem of joint coding and transmission to maximize received video quality. Second, we develop a quality optimization framework that integrates video representation selection and FEC (Forward Error Correction) packet interleaving. DAVIS is able to effectively perform differentiated quality optimization for multiple destinations by taking advantage of the delay differences in the cloud-assisted video streaming system. We conduct the performance evaluation through extensive experiments with Amazon EC2 instances and the Exata emulation platform. Evaluation results show that DAVIS outperforms the reference cloud-assisted streaming solutions in video quality and delay performance.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 28-11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2014
Publisher: Springer International Publishing
Date: 2018
Publisher: MIT Press - Journals
Date: 04-2017
DOI: 10.1162/NECO_A_00937
Abstract: Robust principal component analysis (PCA) is one of the most important dimension-reduction techniques for handling high-dimensional data with outliers. However, most of the existing robust PCA presupposes that the mean of the data is zero and incorrectly utilizes the average of data as the optimal mean of robust PCA. In fact, this assumption holds only for the squared [Formula: see text]-norm-based traditional PCA. In this letter, we equivalently reformulate the objective of conventional PCA and learn the optimal projection directions by maximizing the sum of projected difference between each pair of instances based on [Formula: see text]-norm. The proposed method is robust to outliers and also invariant to rotation. More important, the reformulated objective not only automatically avoids the calculation of optimal mean and makes the assumption of centered data unnecessary, but also theoretically connects to the minimization of reconstruction error. To solve the proposed nonsmooth problem, we exploit an efficient optimization algorithm to soften the contributions from outliers by reweighting each data point iteratively. We theoretically analyze the convergence and computational complexity of the proposed algorithm. Extensive experimental results on several benchmark data sets illustrate the effectiveness and superiority of the proposed method.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: IEEE
Date: 04-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2010
Publisher: ACM
Date: 26-10-2023
Publisher: Springer International Publishing
Date: 2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2017
Publisher: ACM
Date: 19-10-2009
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Association for Computing Machinery (ACM)
Date: 25-02-2023
DOI: 10.1145/3569584
Abstract: It is crucial to sample a small portion of relevant frames for efficient video classification. The existing methods mainly develop hand-designed sampling strategies or learn sequential selection policies. However, there are two challenges to be solved. First, hand-designed sampling strategies are intrinsically non-adaptive to different video backbones. Second, sequential frame selection policies ignore temporal relations among all video frames. The sequential selection process also hinders the application of these video samplers in speed-critical systems. In this article, we propose a differentiable parallel video sampling network (PSN) to tackle the aforementioned challenges. First, we optimize the video sampler with a differentiable surrogate loss, allowing the sampler to be learned dynamically in cooperation with the video classification model. Our sampler considers the feedback from all frames jointly, eliminating the learning difficulties of sequential decision making. The learning process is fully gradient-based, making the sampler efficient to learn. Our video sampler can assess a set of frames swiftly and determine the importance of each frame in parallel. Second, we propose to model the inter-relation among contextual frames, which encourages the sampler to select frames based on a comprehensive inspection of the entire video. We observe that a simple context relation mining instantiation significantly improves the classification performance. The experimental results on three standard video recognition benchmarks demonstrate the efficacy and efficiency of our framework.
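The parallel sampling idea in the abstract above can be sketched in a few lines: score every frame in one matrix product instead of running a sequential policy, then keep the top-k. The scorer `w`, the feature shapes, and the hard top-k at inference time are illustrative assumptions, not the paper's exact PSN; during training the paper replaces the hard selection with a differentiable surrogate loss so gradients reach the sampler.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def parallel_sample(frame_feats, w, k):
    """Score all frames jointly (one pass, no sequential decisions) and
    return the indices of the top-k frames plus the soft importance weights.
    `w` stands in for a learned scorer; names are illustrative."""
    scores = frame_feats @ w              # one score per frame, computed in parallel
    probs = softmax(scores)               # soft importance over all frames
    keep = np.argsort(-scores)[:k]        # hard top-k selection at inference time
    return np.sort(keep), probs
```

The key property the abstract highlights is that all frames are assessed at once, so the sampler's cost is a single matrix product rather than a per-frame decision loop.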
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2019
Publisher: IEEE
Date: 06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2020
Publisher: IEEE
Date: 10-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2013
Publisher: Elsevier BV
Date: 07-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2018
Publisher: Elsevier BV
Date: 03-2021
Publisher: IEEE
Date: 12-2013
Publisher: ACM
Date: 28-11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2020
Publisher: IEEE
Date: 12-2013
Publisher: Springer Science and Business Media LLC
Date: 09-07-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: IEEE
Date: 05-2017
Publisher: Inderscience Publishers
Date: 2010
Publisher: Springer Science and Business Media LLC
Date: 13-07-2017
Publisher: Elsevier BV
Date: 08-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Elsevier BV
Date: 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2019
Publisher: Springer Science and Business Media LLC
Date: 2015
Publisher: Association for Computing Machinery (ACM)
Date: 20-07-2016
DOI: 10.1145/2910585
Abstract: Principal component analysis (PCA) has been widely applied to dimensionality reduction and data pre-processing for different applications in engineering, biology, social science, and the like. Classical PCA and its variants seek linear projections of the original variables to obtain low-dimensional feature representations with maximal variance. One limitation is that the results of PCA are difficult to interpret. Besides, classical PCA is vulnerable to certain noisy data. In this paper, we propose a Convex Sparse Principal Component Analysis (CSPCA) algorithm and apply it to feature learning. First, we show that PCA can be formulated as a low-rank regression optimization problem. Based on this discussion, ℓ2,1-norm minimization is incorporated into the objective function to make the regression coefficients sparse and thereby robust to outliers. Also, based on the sparse model used in CSPCA, an optimal weight is assigned to each of the original features, which in turn provides the output with good interpretability. With the output of our CSPCA, we can effectively analyze the importance of each feature under the PCA criteria. Our new objective function is convex, and we propose an iterative algorithm to optimize it. We apply the CSPCA algorithm to feature selection and conduct extensive experiments on seven benchmark datasets. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art unsupervised feature selection algorithms.
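The low-rank-regression view of PCA with an ℓ2,1 penalty, as described in the abstract above, can be sketched as below. The solver (iteratively reweighted least squares, a standard device for ℓ2,1 minimization), the regularization strength, and all names are illustrative assumptions, not the authors' exact CSPCA algorithm; the point is that row norms of the sparse coefficient matrix yield per-feature importance scores.

```python
import numpy as np

def l21_row_norms(W):
    # per-row l2 norms: each row's contribution to the l2,1 norm,
    # interpreted as the importance of one original feature
    return np.sqrt((W ** 2).sum(axis=1))

def cspca_feature_scores(X, k=2, lam=0.1, n_iter=50):
    """Toy sketch of the CSPCA idea: regress the data onto its top-k
    principal subspace with an l2,1 penalty on the coefficient matrix W,
    then rank features by the row norms of W."""
    X = X - X.mean(axis=0)
    # top-k principal directions define the regression target
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Y = X @ Vt[:k].T                       # projected data, shape (n, k)
    d = X.shape[1]
    D = np.eye(d)                          # reweighting matrix for the l2,1 term
    for _ in range(n_iter):
        W = np.linalg.solve(X.T @ X + lam * D, X.T @ Y)
        rn = np.maximum(l21_row_norms(W), 1e-8)
        D = np.diag(1.0 / (2.0 * rn))      # standard IRLS update for l2,1
    return l21_row_norms(W)
```

Features with larger scores matter more under the PCA criterion; a near-zero row means the penalty has effectively pruned that feature.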
Publisher: Springer Science and Business Media LLC
Date: 19-03-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Association for Computing Machinery (ACM)
Date: 13-12-2017
DOI: 10.1145/3159171
Abstract: In this article, we revisit two popular convolutional neural networks in person re-identification (re-ID): verification and identification models. The two models have their respective advantages and limitations due to different loss functions. Here, we shed light on how to combine the two models to learn more discriminative pedestrian descriptors. Specifically, we propose a Siamese network that simultaneously computes the identification loss and verification loss. Given a pair of training images, the network predicts the identities of the two input images and whether they belong to the same identity. Our network learns a discriminative embedding and a similarity measurement at the same time, thus making full use of the re-ID annotations. Our method can be easily applied to different pretrained networks. Albeit simple, the learned embedding improves the state-of-the-art performance on two public person re-ID benchmarks. Further, we show that our architecture can also be applied to image retrieval. The code is available at ayumi/2016_person_re-ID.
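The combined objective described above (two identification losses on the individual images plus one verification loss on the pair) can be sketched numerically; the logit shapes, the unit weighting of the three terms, and the function names are illustrative assumptions, not the paper's network:

```python
import numpy as np

def cross_entropy(logits, target):
    # softmax cross-entropy for a single example (numerically stable)
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

def siamese_re_id_loss(id_logits_a, id_logits_b, verif_logits, id_a, id_b):
    """Sketch of the combined Siamese objective: classify each image's
    identity (identification loss) and classify the pair as same/different
    (verification loss). The verification target is derived from the two
    identity labels."""
    same = 1 if id_a == id_b else 0
    return (cross_entropy(id_logits_a, id_a)
            + cross_entropy(id_logits_b, id_b)
            + cross_entropy(verif_logits, same))
```

Training against both terms at once is what lets the embedding and the similarity measure share supervision from the same annotations.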
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2016
Publisher: Elsevier BV
Date: 10-2012
Publisher: Springer Berlin Heidelberg
Date: 2006
Publisher: ACM
Date: 19-10-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2022
Publisher: ACM
Date: 21-10-2013
Publisher: Elsevier BV
Date: 08-2010
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2019
Publisher: Springer Nature Switzerland
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2015
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 22-06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2014
Publisher: Springer Science and Business Media LLC
Date: 23-06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2018
Publisher: ACM
Date: 05-06-2012
Publisher: ACM
Date: 04-08-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Springer Berlin Heidelberg
Date: 2005
DOI: 10.1007/11581772_87
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2023
Publisher: IEEE
Date: 12-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2014
Publisher: IEEE
Date: 07-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: IEEE
Date: 06-2014
DOI: 10.1109/CVPR.2014.20
Publisher: Association for Computing Machinery (ACM)
Date: 28-02-2021
DOI: 10.1145/3418214
Abstract: Quick response (QR) codes are usually scanned in different environments, so they must be robust to variations in illumination, scale, coverage, and camera angles. Aesthetic QR codes improve the visual quality, but subtle changes in their appearance may cause scanning failure. In this article, a new method to generate scanning-robust aesthetic QR codes is proposed, based on a module-based scanning probability estimation model that can effectively balance the tradeoff between visual quality and scanning robustness. Our method locally adjusts the luminance of each module by estimating the probability of successful sampling. The approach adopts a hierarchical, coarse-to-fine strategy to enhance the visual quality of aesthetic QR codes, sequentially generating the following three codes: a binary aesthetic QR code, a grayscale aesthetic QR code, and the final color aesthetic QR code. Our approach can also be used to create QR codes with different visual styles by adjusting some initialization parameters. User surveys and decoding experiments were used to evaluate our method against state-of-the-art algorithms, indicating that the proposed approach has excellent performance in terms of both visual quality and scanning robustness.
Publisher: ACM
Date: 29-10-2012
Publisher: IEEE
Date: 06-2011
Publisher: ACM
Date: 26-10-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2020
Publisher: ACM
Date: 22-06-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2014
DOI: 10.1109/MMUL.2014.43
Publisher: ACM
Date: 26-10-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: IEEE
Date: 07-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Image Information and Television Engineers
Date: 2016
DOI: 10.3169/MTA.4.227
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2012
Publisher: Public Library of Science (PLoS)
Date: 25-02-2021
DOI: 10.1371/JOURNAL.PBIO.3001091
Abstract: The recent emergence of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the underlying cause of Coronavirus Disease 2019 (COVID-19), has led to a worldwide pandemic causing substantial morbidity, mortality, and economic devastation. In response, many laboratories have redirected attention to SARS-CoV-2, meaning there is an urgent need for tools that can be used in laboratories unaccustomed to working with coronaviruses. Here we report a range of tools for SARS-CoV-2 research. First, we describe a facile single plasmid SARS-CoV-2 reverse genetics system that is simple to genetically manipulate and can be used to rescue infectious virus through transient transfection (without in vitro transcription or additional expression plasmids). The rescue system is accompanied by our panel of SARS-CoV-2 antibodies (against nearly every viral protein), SARS-CoV-2 clinical isolates, and SARS-CoV-2 permissive cell lines, which are all openly available to the scientific community. Using these tools, we demonstrate here that the controversial ORF10 protein is expressed in infected cells. Furthermore, we show that the promising repurposed antiviral activity of apilimod is dependent on TMPRSS2 expression. Altogether, our SARS-CoV-2 toolkit, which can be directly accessed via our website at mrcppu-covid.bio/, constitutes a resource with considerable potential to advance COVID-19 vaccine design, drug testing, and discovery science.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: ACM
Date: 13-10-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2020
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2013
Publisher: Springer New York
Date: 03-08-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2017
Publisher: ACM
Date: 26-10-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2014
DOI: 10.1109/TKDE.2013.65
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2014
Publisher: Springer Science and Business Media LLC
Date: 02-03-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2020
Publisher: Springer Science and Business Media LLC
Date: 16-04-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: ACM
Date: 26-10-2023
Publisher: IEEE
Date: 10-2017
DOI: 10.1109/ICCV.2017.86
Publisher: ACM
Date: 13-10-2015
Publisher: Springer Science and Business Media LLC
Date: 13-11-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: ACM
Date: 03-11-2014
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 03-11-2014
Publisher: Springer Science and Business Media LLC
Date: 27-10-2017
Publisher: Springer International Publishing
Date: 2014
Publisher: Elsevier BV
Date: 03-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Elsevier BV
Date: 2014
Publisher: Springer International Publishing
Date: 2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Elsevier BV
Date: 2016
Publisher: Springer Science and Business Media LLC
Date: 06-01-2021
Publisher: Springer International Publishing
Date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Springer International Publishing
Date: 2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2021
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2017
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2021
Publisher: Association for Computing Machinery (ACM)
Date: 10-10-2018
DOI: 10.1145/3243316
Abstract: The superiority of deeply learned pedestrian representations has been reported in very recent literature of person re-identification (re-ID). In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initialization model trained on the irrelevant labeled dataset. Since the clustering results can be very noisy, we add a selection operation between the clustering and fine-tuning. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples located near cluster centroids in the feature space. As the model becomes stronger, in subsequent iterations, more images are adaptively selected as CNN training samples. Progressively, pedestrian clustering and the CNN model are improved simultaneously until algorithm convergence. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve the re-ID accuracy. Our code has been released at ehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
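The clustering-plus-selection step that PUL iterates can be sketched as follows; the minimal k-means routine and the fixed distance threshold are illustrative stand-ins for the paper's selection criterion (in the full method, the CNN is fine-tuned on the selected samples and the loop repeats with a progressively larger selection):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    # minimal k-means for the pedestrian clustering step
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def select_reliable(X, labels, centers, radius):
    """PUL-style selection: keep only samples close to their assigned
    cluster centroid, since cluster labels near the centroid are more
    trustworthy. `radius` is an illustrative fixed threshold."""
    d = np.linalg.norm(X - centers[labels], axis=1)
    return d < radius
```

Growing `radius` (or an equivalent per-iteration criterion) over iterations is what gives the method its self-paced character: easy, reliable samples first, harder ones as the model improves.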
Publisher: IEEE
Date: 06-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2021
Publisher: Association for Computing Machinery (ACM)
Date: 22-05-2020
DOI: 10.1145/3383184
Abstract: Matching images and sentences demands a fine understanding of both modalities. In this article, we propose a new system to discriminatively embed the image and text into a shared visual-textual space. In this field, most existing works apply the ranking loss to pull the positive image/text pairs close and push the negative pairs apart from each other. However, directly deploying the ranking loss on heterogeneous features (i.e., text and image features) is less effective, because it is hard to find appropriate triplets at the beginning. So naively using the ranking loss may prevent the network from learning the inter-modal relationship. To address this problem, we propose the instance loss, which explicitly considers the intra-modal data distribution. It is based on an unsupervised assumption that each image/text group can be viewed as a class, so the network can learn fine granularity from every image/text group. The experiment shows that the instance loss offers better weight initialization for the ranking loss, so that more discriminative embeddings can be learned. Besides, existing works usually apply off-the-shelf features, i.e., word2vec and fixed visual features. As a minor contribution, this article constructs an end-to-end dual-path convolutional network to learn the image and text representations. End-to-end learning allows the system to directly learn from the data and fully utilize the supervision. On two generic retrieval datasets (Flickr30k and MSCOCO), experiments demonstrate that our method yields competitive accuracy compared to state-of-the-art methods. Moreover, in language-based person retrieval, we improve the state of the art by a large margin. The code has been made publicly available.
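The two losses discussed above can be sketched with plain arrays: the instance loss treats each image/text group as its own class under softmax cross-entropy, while the ranking loss pushes matched pairs above in-batch negatives by a margin. The shapes, the shared classifier `W`, and the margin value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def instance_loss(embeddings, W):
    """Instance-loss sketch: sample i should be classified as class i,
    i.e., each image/text group is viewed as its own class. W is a
    (dim x n_classes) classifier shared across modalities."""
    logits = embeddings @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(embeddings)
    return -log_probs[np.arange(n), np.arange(n)].mean()

def ranking_loss(img, txt, margin=0.2):
    # bidirectional triplet-style ranking loss over in-batch negatives;
    # sim[i, j] is the similarity of image i and sentence j
    sim = img @ txt.T
    pos = np.diag(sim)
    cost_i2t = np.maximum(0, margin + sim - pos[:, None])  # image -> text
    cost_t2i = np.maximum(0, margin + sim - pos[None, :])  # text -> image
    np.fill_diagonal(cost_i2t, 0)
    np.fill_diagonal(cost_t2i, 0)
    return cost_i2t.mean() + cost_t2i.mean()
```

In the paper's training regime the instance loss would be applied first (or jointly) to give the ranking loss a usable initialization in the shared space.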
Publisher: ACM
Date: 04-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2023
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2008
Publisher: ACM
Date: 29-10-2012
Publisher: ACM
Date: 03-07-2014
Publisher: Springer International Publishing
Date: 2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: ACM
Date: 22-06-2013
Publisher: IEEE
Date: 06-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2016
Publisher: No publisher found
Date: 2014
Publisher: ACM
Date: 22-10-2013
Publisher: Association for Computing Machinery (ACM)
Date: 07-02-2019
DOI: 10.1145/3300939
Abstract: Performing direct matching among different modalities (like image and text) can benefit many tasks in computer vision, multimedia, information retrieval, and information fusion. Most existing works focus on class-level image-text matching, called cross-modal retrieval, which attempts to propose a uniform model for matching images with all types of texts, for example, tags, sentences, and articles (long texts). Although cross-modal retrieval alleviates the heterogeneous gap between visual and textual information, it can provide only a rough correspondence between the two modalities. In this article, we propose a more precise image-text embedding method, image-sentence matching, which can provide heterogeneous matching at the instance level. The key issue for image-text embedding is how to make the distributions of the two modalities consistent in the embedding space. To address this problem, some previous works on the cross-modal retrieval task have attempted to pull their distributions close by employing adversarial learning. However, the effectiveness of adversarial learning on image-sentence matching has not been proved, and there is still no effective method. Inspired by previous works, we propose to learn a modality-invariant image-text embedding for image-sentence matching by involving adversarial learning. On top of the triplet-loss-based baseline, we design a modality classification network with an adversarial loss, which classifies an embedding into either the image or text modality. In addition, the multi-stage training procedure is carefully designed so that the proposed network not only imposes the image-text similarity constraints by ground-truth labels, but also enforces the image and text embedding distributions to be similar by adversarial learning. Experiments on two public datasets (Flickr30k and MSCOCO) demonstrate that our method yields stable accuracy improvement over the baseline model and that our results compare favorably to the state-of-the-art methods.
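The adversarial component described in this abstract can be sketched as follows: a modality classifier is trained to separate image embeddings from text embeddings, while the embedding network receives the negated loss so the two distributions become indistinguishable. This is a minimal NumPy sketch with a hypothetical linear classifier, not the paper's network:

```python
import numpy as np

def modality_adversarial_loss(img_emb, txt_emb, w, b):
    """Adversarial modality loss sketch.

    A binary classifier (here just a linear layer w, b) tries to tell
    image embeddings (label 1) from text embeddings (label 0); the
    embedding network is trained on the negated loss, i.e. it tries to
    make the modalities indistinguishable.
    """
    emb = np.vstack([img_emb, txt_emb])                         # (2N, D)
    labels = np.concatenate([np.ones(len(img_emb)), np.zeros(len(txt_emb))])
    p = 1.0 / (1.0 + np.exp(-(emb @ w + b)))                    # sigmoid modality score
    eps = 1e-12                                                 # avoid log(0)
    bce = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    # the classifier minimises bce; the embedder minimises -bce
    return bce, -bce

rng = np.random.default_rng(1)
img = rng.normal(loc=0.5, size=(16, 8))   # toy "image" embeddings
txt = rng.normal(loc=0.0, size=(16, 8))   # toy "text" embeddings
w, b = rng.normal(size=8) * 0.1, 0.0
bce, adv = modality_adversarial_loss(img, txt, w, b)
```

In practice this adversarial term is combined with the triplet loss, which is what the multi-stage training schedule in the abstract balances.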
Publisher: IEEE
Date: 07-2017
Publisher: ACM
Date: 22-10-2013
Publisher: Association for Computing Machinery (ACM)
Date: 05-2013
Abstract: Recent years have witnessed a great explosion of user-generated videos on the Web. In order to achieve effective and efficient video search, it is critical for modern video search engines to associate videos with semantic keywords automatically. Most of the existing video tagging methods can hardly achieve reliable performance due to the deficiency of training data. It is noticed that abundant well-tagged data are available in other relevant types of media (e.g., images). In this article, we propose a novel video tagging framework, termed Cross-Media Tag Transfer (CMTT), which utilizes the abundance of well-tagged images to facilitate video tagging. Specifically, we build a “cross-media tunnel” to transfer knowledge from images to videos. To this end, an optimal kernel space, in which the distribution distance between images and videos is minimized, is found to tackle the domain-shift problem. A novel cross-media video tagging model is proposed to infer tags by exploring the intrinsic local structures of both labeled and unlabeled data, and to learn reliable video classifiers. An efficient algorithm is designed to optimize the proposed model in an iterative and alternating way. Extensive experiments illustrate the superiority of our proposal compared to the state-of-the-art algorithms.
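Minimizing a distribution distance between image and video features in a kernel space, as the cross-media tunnel above does, can be illustrated with the standard Maximum Mean Discrepancy (MMD). This is a generic NumPy sketch of that distance, not the paper's exact objective:

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel.

    A standard way to measure the distribution distance between image
    features X and video features Y in a kernel space; a cross-media
    method would choose the kernel/space so this distance is small,
    easing the domain shift between the two media types.
    """
    def k(A, B):
        # pairwise squared Euclidean distances, then RBF kernel
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))              # toy image features
Y = rng.normal(loc=2.0, size=(32, 5))     # toy video features, shifted domain
same = rbf_mmd2(X, X)                     # identical samples: distance is 0
diff = rbf_mmd2(X, Y)                     # shifted domain: positive distance
```

Identical feature sets give a distance of zero, while the domain-shifted video features give a clearly positive distance, which is the quantity a cross-media tunnel would drive down.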
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2016
DOI: 10.1109/MMUL.2016.42
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2020
Start Date: 05-2013
End Date: 12-2016
Amount: $375,000.00
Funder: Australian Research Council
Start Date: 05-2020
End Date: 03-2025
Amount: $486,000.00
Funder: Australian Research Council
Start Date: 07-2016
End Date: 07-2020
Amount: $520,000.00
Funder: Australian Research Council
Start Date: 2018
End Date: 12-2021
Amount: $392,884.00
Funder: Australian Research Council
Start Date: 2015
End Date: 12-2018
Amount: $494,300.00
Funder: Australian Research Council