ARDC Research Link Australia

Publication

Broken External Links on Stack Overflow

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2022

DOI: 10.1109/TSE.2021.3086494

Publication

VT-Revolution: Interactive Programming Video Tutorial Authoring and Watching System

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2019

DOI: 10.1109/TSE.2018.2802916

Publication

What Permissions Should This Android App Request?

Publisher: IEEE

Date: 11-2016

DOI: 10.1109/SATE.2016.13

Publication

Fusing multi-abstraction vector space models for concern localization

Publisher: Springer Science and Business Media LLC

Date: 27-12-2017

DOI: 10.1007/S10664-017-9585-2

Publication

Practical and effective sandboxing for Linux containers

Publisher: Springer Science and Business Media LLC

Date: 04-07-2019

DOI: 10.1007/S10664-019-09737-2

Publication

Automating Change-Level Self-Admitted Technical Debt Determination

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2019

DOI: 10.1109/TSE.2018.2831232

Publication

Which Packages Would be Affected by This Bug Report?

Publisher: IEEE

Date: 10-2017

DOI: 10.1109/ISSRE.2017.24

Publication

Automated Configuration Bug Report Prediction Using Text Mining

Publisher: IEEE

Date: 07-2014

DOI: 10.1109/COMPSAC.2014.17

Publication

Assessing the Alignment between the Information Needs of Developers and the Documentation of Programming Languages: A Case Study on Rust

Publisher: Association for Computing Machinery (ACM)

Date: 04-04-2023

DOI: 10.1145/3546945

Abstract: Programming language documentation refers to the set of technical documents that provide application developers with a description of the high-level concepts of a language (e.g., manuals, tutorials, and API references). Such documentation is essential to support application developers in effectively using a programming language. One of the challenges faced by documenters (i.e., personnel who design and produce documentation for a programming language) is to ensure that documentation has relevant information that aligns with the concrete needs of developers, defined as the missing knowledge that developers acquire via voluntary search. In this article, we present an automated approach to support documenters in evaluating the differences and similarities between the concrete information needs of developers and the current state of documentation (a problem that we refer to as the topical alignment of programming language documentation). Our approach leverages semi-supervised topic modelling that uses domain knowledge to guide the derivation of topics. We initially train a baseline topic model from a set of Rust-related Q&A posts. We then use this baseline model to determine the distribution of topic probabilities of each document of the official Rust documentation. Afterwards, we assess the similarities and differences between the topics of the Q&A posts and the official documentation. Our results show a relatively high level of topical alignment in Rust documentation. Still, information about specific topics is scarce in both the Q&A websites and the documentation, particularly topics related to programming niches such as network, game, and database development. For other topics (e.g., topics related to language features such as structs, patterns and matching, and the foreign function interface), information is only available on Q&A websites and is lacking in the official documentation.
Finally, we discuss implications for programming language documenters, particularly how to leverage our approach to prioritize topics that should be added to the documentation.

Publication

VT-revolution: interactive programming tutorials made possible

Publisher: ACM

Date: 26-10-2018

DOI: 10.1145/3236024.3264587

Publication

Automatic, high accuracy prediction of reopened bugs

Publisher: Springer Science and Business Media LLC

Date: 18-09-2014

DOI: 10.1007/S10515-014-0162-2

Publication

An Empirical Study of Bugs in Software Build Systems

Publisher: IEEE

Date: 07-2013

DOI: 10.1109/QSIC.2013.60

Publication

Vulnerability Detection by Learning from Syntax-Based Execution Paths of Code

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TSE.2023.3286586

Publication

Wireframe-based UI Design Search through Image Autoencoder

Publisher: Association for Computing Machinery (ACM)

Date: 16-06-2020

DOI: 10.1145/3391613

Abstract: UI design is an integral part of software development. For many developers who do not have much UI design experience, exposing them to a large database of real-application UI designs can help them quickly build up a realistic understanding of the design space for a software feature and get design inspiration from existing applications. However, existing keyword-based, image-similarity-based, and component-matching-based methods cannot reliably find, in a large database, relevant high-fidelity UI designs that are similar to the UI wireframe the developers sketch, in the face of the great variations in UI designs. In this article, we propose a deep-learning-based UI design search engine to fill this gap. The key innovation of our search engine is to train a wireframe image autoencoder using a large database of real-application UI designs, without the need for labeling relevant UI designs. We implement our approach for Android UI design search, and conduct extensive experiments with artificially created relevant UI designs and human evaluation of UI design search results. Our experiments confirm the superior performance of our search engine over existing image-similarity or component-matching-based methods and demonstrate the usefulness of our search engine in real-world UI design tasks.

Publication

What design topics do developers discuss?

Publisher: ACM

Date: 28-05-2018

DOI: 10.1145/3196321.3196357

Publication

Software Internationalization and Localization: An Industrial Experience

Publisher: IEEE

Date: 07-2013

DOI: 10.1109/ICECCS.2013.40

Publication

Revisiting the Identification of the Co-evolution of Production and Test Code

Publisher: Association for Computing Machinery (ACM)

Date: 30-09-2023

DOI: 10.1145/3607183

Abstract: Many software processes advocate that the test code should co-evolve with the production code. Prior work usually studies such co-evolution based on production-test co-evolution samples mined from software repositories. A production-test co-evolution sample refers to a pair of a test code change and a production code change where the test code change triggers or is triggered by the production code change. The quality of the mined samples is critical to the reliability of research conclusions. Existing studies mined production-test co-evolution samples based on the following assumption: if a test class and its associated production class change together in one commit, or a test class changes immediately after the changes of the associated production class within a short time interval, this change pair should be a production-test co-evolution sample. However, the validity of this assumption has never been investigated. To fill this gap, we present an empirical study, investigating the reasons for test code updates occurring after the associated production code changes, and revealing the pervasive existence of noise in the production-test co-evolution samples identified by existing works based on the aforementioned assumption. We define a taxonomy of such noise, including 6 categories (i.e., adaptive maintenance, perfective maintenance, corrective maintenance, indirectly related production code update, indirectly related test code update, and other reasons). Guided by the empirical findings, we propose CHOSEN (an identifiCation metHod Of production-teSt co-EvolutioN) based on a two-stage strategy. CHOSEN takes a test code change and its associated production code change as input, aiming to determine whether the production-test change pair is a production-test co-evolution sample. Such identified samples are the basis of, or are useful for, various downstream tasks. We conduct a series of experiments to evaluate our method.
Results show that: 1) CHOSEN achieves an AUC of 0.931 and an F1-score of 0.928, significantly outperforming existing identification methods. 2) CHOSEN can help researchers and practitioners draw more accurate conclusions in studies related to the co-evolution of production and test code. For the task of Just-In-Time (JIT) obsolete test code detection, which can help detect whether a piece of test code should be updated when developers modify the production code, the test set constructed by CHOSEN can help measure the detection method's performance more accurately, with an average error of only 0.76% compared with the ground truth. In addition, the dataset constructed by CHOSEN can be used to train a better obsolete test code detection model, with average improvements in accuracy, precision, recall, and F1-score of 12.00%, 17.35%, 8.75%, and 13.50%, respectively.
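The AUC reported above can be read as the probability that a randomly chosen true co-evolution sample receives a higher score than a randomly chosen noisy pair. The sketch below computes AUC in that pairwise (Mann-Whitney) form, assuming hypothetical classifier scores; it is an illustration of the metric, not CHOSEN's evaluation code.

```python
from itertools import product


def pairwise_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win,
    which matches the usual Mann-Whitney formulation of ROC AUC.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: true co-evolution pairs vs. noisy pairs
true_pairs = [0.9, 0.8, 0.4]
noisy_pairs = [0.7, 0.3]
print(pairwise_auc(true_pairs, noisy_pairs))  # 5 of 6 pairs ranked correctly, about 0.833
```

The quadratic pair loop keeps the definition visible; for large evaluation sets a rank-based computation gives the same value more efficiently.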

Publication

Automating aggregation for Software Quality modeling

Publisher: IEEE

Date: 09-2017

DOI: 10.1109/ICSME.2017.30

Publication

Build Predictor: More Accurate Missed Dependency Prediction in Build Configuration Files

Publisher: IEEE

Date: 07-2014

DOI: 10.1109/COMPSAC.2014.12

Publication

Inference of development activities from interaction with uninstrumented applications

Publisher: Springer Science and Business Media LLC

Date: 16-09-2017

DOI: 10.1007/S10664-017-9547-8

Publication

Who Will Leave the Company?: A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report

Publisher: IEEE

Date: 05-2017

DOI: 10.1109/MSR.2017.58

Publication

Bug Characteristics in Blockchain Systems: A Large-Scale Empirical Study

Publisher: IEEE

Date: 05-2017

DOI: 10.1109/MSR.2017.59

Publication

“Automated Debugging Considered Harmful” Considered Harmful: A User Study Revisiting the Usefulness of Spectra-Based Fault Localization Techniques with Professionals Using Real Bugs from Large Systems

Publisher: IEEE

Date: 10-2016

DOI: 10.1109/ICSME.2016.67

Publication

1+1>2: Programming Know-What and Know-How Knowledge Fusion, Semantic Enrichment and Coherent Application

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2022

DOI: 10.1109/TSC.2022.3207273

Publication

TopicAns: Topic-informed Architecture for Answer Recommendation on Technical Q&A Site

Publisher: Association for Computing Machinery (ACM)

Date: 11-07-2024

DOI: 10.1145/3607189

Publication

Characterizing and identifying reverted commits

Publisher: Springer Science and Business Media LLC

Date: 02-03-2019

DOI: 10.1007/S10664-019-09688-8

Publication

An Empirical Study of Bugs in Software Build System

Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)

Date: 2014

DOI: 10.1587/TRANSINF.E97.D.1769

Publication

Context-Aware Neural Fault Localization

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2023

DOI: 10.1109/TSE.2023.3279125

Publication

SATD detector

Publisher: ACM

Date: 27-05-2018

DOI: 10.1145/3183440.3183478

Publication

Evaluating defect prediction approaches using a massive set of metrics

Publisher: ACM

Date: 13-04-2015

DOI: 10.1145/2695664.2695959

Publication

Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges

Publisher: Springer Science and Business Media LLC

Date: 25-08-2021

DOI: 10.1007/S10664-021-10018-0

Publication

An Empirical Study of Classifier Combination for Cross-Project Defect Prediction

Publisher: IEEE

Date: 07-2015

DOI: 10.1109/COMPSAC.2015.58

Publication

ActivitySpace: A remembrance framework to support interapplication information needs

Publisher: IEEE

Date: 11-2016

DOI: 10.1109/ASE.2015.90

Publication

Correlating Automated and Human Evaluation of Code Documentation Generation Quality

Publisher: Association for Computing Machinery (ACM)

Date: 28-07-2022

DOI: 10.1145/3502853

Abstract: Automatic code documentation generation has been a crucial task in the field of software engineering. It not only relieves developers from writing code documentation but also helps them to understand programs better. Specifically, deep-learning-based techniques that leverage large-scale source code corpora have been widely used in code documentation generation. These works tend to use automatic metrics (such as BLEU, METEOR, ROUGE, CIDEr, and SPICE) to evaluate different models. These metrics compare generated documentation to reference texts by measuring the overlapping words. Unfortunately, there is no evidence demonstrating the correlation between these metrics and human judgment. We conduct experiments on two popular code documentation generation tasks, code comment generation and commit message generation, to investigate the presence or absence of correlations between these metrics and human judgments. For each task, we replicate three state-of-the-art approaches, and the generated documentation is evaluated automatically in terms of BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. We also ask 24 participants to rate the generated documentation considering three aspects (i.e., language, content, and effectiveness). Each participant is given Java methods or commit diffs along with the target documentation to be rated. The results show that the ranking of generated documentation from automatic metrics is different from that evaluated by human annotators. Thus, these automatic metrics are not reliable enough to replace human evaluation for code documentation generation tasks. In addition, METEOR shows the strongest correlation to human evaluation metrics (a moderate Pearson correlation of r ≈ 0.7). However, it is still much lower than the correlation observed between different annotators (a high Pearson correlation of r ≈ 0.8) and the correlations reported in the literature for other tasks (e.g., Neural Machine Translation [39]).
Our study points to the need to develop specialized automated evaluation metrics that can correlate more closely to human evaluation metrics for code generation tasks.
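The comparison above rests on the Pearson correlation between an automatic metric and human ratings. A minimal sketch of that coefficient, using only the standard library; the per-sample scores are hypothetical placeholders, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical per-sample scores: an automatic metric vs. the mean human rating
metric_scores = [0.21, 0.35, 0.18, 0.50, 0.42]
human_ratings = [2.0, 3.5, 2.5, 4.0, 3.0]
print(round(pearson_r(metric_scores, human_ratings), 3))
```

The same function applied to two annotators' ratings gives the inter-annotator correlation used as the reference point in the abstract.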

Publication

It Takes Two to Tango: Deleted Stack Overflow Question Prediction with Text and Meta Features

Publisher: IEEE

Date: 06-2016

DOI: 10.1109/COMPSAC.2016.145

Publication

Automated Bug Report Field Reassignment and Refinement Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 09-2016

DOI: 10.1109/TR.2015.2484074

Publication

Neural Network-based Detection of Self-Admitted Technical Debt

Publisher: Association for Computing Machinery (ACM)

Date: 29-07-2019

DOI: 10.1145/3324916

Abstract: Technical debt is a metaphor to reflect the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD), a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (which represent n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a big challenge to the accuracy of pattern-based or traditional text-mining-based SATD detection, especially for cross-project deployment. Furthermore, although traditional text-mining-based methods outperform pattern-based methods in prediction accuracy, the text features they use are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model's prediction results, we exploit the computational structure of CNNs to identify key phrases and patterns in code comments that are most relevant to SATD. We have conducted an extensive set of experiments with 62,566 code comments from 10 open-source projects and a user study with 150 comments of another three projects. Our evaluation confirms the effectiveness of different aspects of our approach and its superior performance, generalizability, adaptability, and explainability over current state-of-the-art traditional text-mining-based methods for SATD classification.

Publication

Recommending frequently encountered bugs

Publisher: ACM

Date: 28-05-2018

DOI: 10.1145/3196321.3196348

Publication

Predictive Comment Updating With Heuristics and AST-Path-Based Neural Learning: A Two-Phase Approach

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 04-2023

DOI: 10.1109/TSE.2022.3185458

Publication

Detecting similar repositories on GitHub

Publisher: IEEE

Date: 02-2017

DOI: 10.1109/SANER.2017.7884605

Publication

Automatic Defect Categorization Based on Fault Triggering Conditions

Publisher: IEEE

Date: 08-2014

DOI: 10.1109/ICECCS.2014.14

Publication

Enhancing developer recommendation with supplementary information via mining historical commits

Publisher: Elsevier BV

Date: 12-2017

DOI: 10.1016/J.JSS.2017.09.021

Publication

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

Publisher: IEEE

Date: 12-2015

DOI: 10.1109/APSEC.2015.38

Publication

Who should make decision on this pull request? Analyzing time-decaying relationships and file similarities for integrator prediction

Publisher: Elsevier BV

Date: 08-2019

DOI: 10.1016/J.JSS.2019.04.055

Publication

An Empirical Study of Bug Fixing Rate

Publisher: IEEE

Date: 07-2015

DOI: 10.1109/COMPSAC.2015.57

Publication

Practitioners' expectations on automated fault localization

Publisher: ACM

Date: 18-07-2016

DOI: 10.1145/2931037.2931051

Publication

TagCombine: Recommending Tags to Contents in Software Information Sites

Publisher: Springer Science and Business Media LLC

Date: 09-2015

DOI: 10.1007/S11390-015-1578-2

Publication

Towards more accurate content categorization of API discussions

Publisher: ACM

Date: 02-06-2014

DOI: 10.1145/2597008.2597142

Publication

Experience report: An industrial experience report on test outsourcing practices

Publisher: IEEE

Date: 11-2015

DOI: 10.1109/ISSRE.2015.7381830

Publication

Chatbot4QR: Interactive Query Refinement for Technical Question Retrieval

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 04-2022

DOI: 10.1109/TSE.2020.3016006

Publication

Cross-Project Change-Proneness Prediction

Publisher: IEEE

Date: 07-2018

DOI: 10.1109/COMPSAC.2018.00017

Publication

The Impact of Mislabeled Changes by SZZ on Just-in-Time Defect Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2021

DOI: 10.1109/TSE.2019.2929761

Publication

psc2code

Publisher: Association for Computing Machinery (ACM)

Date: 06-2020

DOI: 10.1145/3392093

Abstract: Programming screencasts have become a pervasive resource on the Internet, which help developers learn new programming technologies or skills. The source code in programming screencasts is important and valuable information for developers. But the streaming nature of programming screencasts (i.e., a sequence of screen-captured images) limits the ways that developers can interact with the source code in the screencasts. Many studies use the Optical Character Recognition (OCR) technique to convert screen images (also referred to as video frames) into textual content, which can then be indexed and searched easily. However, noisy screen images significantly affect the quality of source code extracted by OCR, for example, no-code frames (e.g., PowerPoint slides, web pages of API specification), non-code regions (e.g., Package Explorer view, Console view), and noisy code regions with code in completion suggestion popups. Furthermore, due to the characteristics of code (e.g., long compound identifiers like ItemListener), even professional OCR tools cannot extract source code without errors from screen images. The noisy OCRed source code will negatively affect the downstream applications, such as the effective search and navigation of the source code content in programming screencasts. In this article, we propose an approach named psc2code to denoise the process of extracting source code from programming screencasts. First, psc2code leverages Convolutional Neural Network (CNN)-based image classification to remove non-code and noisy-code frames. Then, psc2code performs edge detection and clustering-based image segmentation to detect sub-windows in a code frame, and based on the detected sub-windows, it identifies and crops the screen region that is most likely to be a code editor.
Finally, psc2code calls the API of a professional OCR tool to extract source code from the cropped code regions and leverages the OCRed cross-frame information in the programming screencast and the statistical language model of a large corpus of source code to correct errors in the OCRed source code. We conduct an experiment on 1,142 programming screencasts from YouTube. We find that our CNN-based image classification technique can effectively remove the non-code and noisy-code frames, achieving an F1-score of 0.95 on the valid code frames. We also find that psc2code can significantly improve the quality of the OCRed source code by truly correcting about half of the incorrectly OCRed words. Based on the source code denoised by psc2code, we implement two applications: (1) a programming screencast search engine and (2) an interaction-enhanced programming screencast watching tool. Based on the source code extracted from the 1,142 collected programming screencasts, our experiments show that our programming screencast search engine achieves precision@5, 10, and 20 of 0.93, 0.81, and 0.63, respectively. We also conduct a user study of our interaction-enhanced programming screencast watching tool with 10 participants. This user study shows that our interaction-enhanced watching tool can help participants learn the knowledge in the programming video more efficiently and effectively.
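The search-engine results above are reported as precision@k. A minimal sketch of that metric for a single query, assuming results are identified by hypothetical video IDs; a reported figure would typically average this value over all evaluation queries.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant.

    Divides by k even if fewer than k results were returned, so a short
    result list is penalised rather than rewarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for item in top_k if item in relevant_ids)
    return hits / k


# Hypothetical ranked search results for one query and its relevant set
ranked = ["v3", "v7", "v1", "v9", "v2"]
relevant = {"v3", "v1", "v2"}
print(precision_at_k(ranked, relevant, 5))  # 0.6
```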

Publication

Code Structure–Guided Transformer for Source Code Summarization

Publisher: Association for Computing Machinery (ACM)

Date: 31-01-2023

DOI: 10.1145/3522674

Abstract: Code summaries help developers comprehend programs and reduce the time needed to infer program functionalities during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models for generating accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer is under-explored in this task domain. In this article, we propose a novel approach named SG-Trans to incorporate code structural properties into the Transformer. Specifically, we inject the local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., data flow graph) into the self-attention module of the Transformer as inductive bias. To further capture the hierarchical characteristics of code, the local information and global structure are distributed across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over the state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans still improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on two benchmark datasets, respectively.

Publication

Smart Contract Security: A Practitioners' Perspective

Publisher: IEEE

Date: 05-2021

DOI: 10.1109/ICSE43902.2021.00127

Publication

Combining Word Embedding with Information Retrieval to Recommend Similar Bug Reports

Publisher: IEEE

Date: 10-2016

DOI: 10.1109/ISSRE.2016.33

Publication

Cross-language bug localization

Publisher: ACM

Date: 02-06-2014

DOI: 10.1145/2597008.2597788

Publication

Early prediction of merged code changes to prioritize reviewing tasks

Publisher: Springer Science and Business Media LLC

Date: 19-03-2018

DOI: 10.1007/S10664-018-9602-0

Publication

ELBlocker: Predicting blocking bugs with ensemble imbalance learning

Publisher: Elsevier BV

Date: 05-2015

DOI: 10.1016/J.INFSOF.2014.12.006

Publication

Tag recommendation in software information sites

Publisher: IEEE

Date: 05-2013

DOI: 10.1109/MSR.2013.6624040

Publication

Automatic, highly accurate app permission recommendation

Publisher: Springer Science and Business Media LLC

Date: 19-03-2019

DOI: 10.1007/S10515-019-00254-6

Publication

Semantic-enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse

Publisher: Association for Computing Machinery (ACM)

Date: 22-05-2023

DOI: 10.1145/3597206

Abstract: Programmers who work on smart contract development often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses function clones as a bridge to gradually deduce the known knowledge into the problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and CFG annotation. This provides a systematic and coherent approach to knowledge deduction. We then structure all the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io, and confirm the high accuracy of our KG construction steps. In our experiments, our code KG improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30-380 seconds, vulnerability determination accuracy by 20%-33%, and vulnerability fixing accuracy by 24%-40% for novice developers who identified and fixed vulnerable smart contract functions.
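The recommendation results above are partly reported as NDCG. A minimal sketch of NDCG@k with the common log2 discount, assuming hypothetical graded relevance labels for a ranked list of recommended functions. As a simplifying assumption, the ideal ranking here is built only from the returned items; a full evaluation would compute it from every relevant item for the query.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: the item at rank i (0-based) is divided by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_at_k(relevances, k):
    """NDCG@k for one ranked list of graded relevance labels.

    Simplification: the ideal ordering is the returned labels sorted
    in descending order, not all relevant items in the collection.
    """
    actual = dcg(relevances[:k])
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels (3 = exact match, 0 = irrelevant) in ranked order
print(round(ndcg_at_k([1, 3, 0, 2], 4), 3))
```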

Publication

Automated Android application permission recommendation

Publisher: Springer Science and Business Media LLC

Date: 28-07-2017

DOI: 10.1007/S11432-016-9072-3

Publication

Characterising deprecated Android APIs

Publisher: ACM

Date: 28-05-2018

DOI: 10.1145/3196398.3196419

Publication

Characterizing malicious Android apps by mining topic-specific data flow signatures

Publisher: Elsevier BV

Date: 10-2017

DOI: 10.1016/J.INFSOF.2017.04.007

Publication

Constructing a System Knowledge Graph of User Tasks and Failures from Bug Reports to Support Soap Opera Testing

Publisher: ACM

Date: 10-10-2022

DOI: 10.1145/3551349.3556967

Publication

What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts

Publisher: Springer Science and Business Media LLC

Date: 09-2016

DOI: 10.1007/S11390-016-1672-0

Publication

Automated prediction of bug report priority using multi-factor analysis

Publisher: Springer Science and Business Media LLC

Date: 03-08-2014

DOI: 10.1007/S10664-014-9331-Y

Publication

High-Impact Bug Report Identification with Imbalanced Learning Strategies

Publisher: Springer Science and Business Media LLC

Date: 2017

DOI: 10.1007/S11390-017-1713-3

Publication

Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 03-2022

DOI: 10.1109/TSE.2020.3001739

Publication

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

Publisher: IEEE

Date: 03-2013

DOI: 10.1109/CSMR.2013.43

Publication

How Does Visualisation Help App Practitioners Analyse Android Apps?

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2023

DOI: 10.1109/TDSC.2022.3178181

Publication

AnswerBot: Automated generation of answer summary to developers' technical questions

Publisher: IEEE

Date: 10-2017

DOI: 10.1109/ASE.2017.8115681

Publication

Instance-ranking: A new perspective to consider the instance dependency for classification

Publisher: Springer Berlin Heidelberg

Date: 2013

DOI: 10.1007/978-3-642-36778-6_10

Publication

XSearch: a domain-specific cross-language relevant question retrieval tool

Publisher: ACM

Date: 21-08-2017

DOI: 10.1145/3106237.3122820

Publication

Why Do Smart Contracts Self-Destruct? Investigating the Selfdestruct Function on Ethereum

Publisher: Association for Computing Machinery (ACM)

Date: 24-12-2021

DOI: 10.1145/3488245

Abstract: The selfdestruct function is provided by Ethereum smart contracts to destroy a contract on the blockchain system. However, it is a double-edged sword for developers. On the one hand, using the selfdestruct function enables developers to remove smart contracts (SCs) from Ethereum and transfer Ether when emergency situations happen, e.g., being attacked. On the other hand, this function can increase development complexity and open an attack vector for attackers. To better understand the reasons why SC developers include or exclude the selfdestruct function in their contracts, we conducted an online survey to collect feedback from them and summarize the key reasons. Their feedback shows that 66.67% of the developers will deploy an updated contract to Ethereum after destructing the old contract. Based on this information, we propose a method to find the self-destructed contracts (also called predecessor contracts) and their updated versions (successor contracts) by computing code similarity. By analyzing the differences between the predecessor contracts and their successor contracts, we found five reasons that led to the death of the contracts; two of them (i.e., Unmatched ERC20 Token and Limits of Permission) might affect the life span of contracts. We developed a tool named LifeScope to detect these problems. LifeScope reports 0 false positives or negatives in detecting Unmatched ERC20 Token. In terms of Limits of Permission, LifeScope achieves an F-measure of 77.89% and an AUC of 0.8673 on average. Based on the feedback of developers who exclude selfdestruct functions, we propose suggestions to help developers better use selfdestruct functions in Ethereum smart contracts.

Publication

Smart Contract Development: Challenges and Opportunities

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2021

DOI: 10.1109/TSE.2019.2942301

Publication

Personalized project recommendation on GitHub

Publisher: Springer Science and Business Media LLC

Date: 20-04-2018

DOI: 10.1007/S11432-017-9419-X

Publication

Inferring Links between Concerns and Methods with Multi-abstraction Vector Space Model

Publisher: IEEE

Date: 10-2016

DOI: 10.1109/ICSME.2016.51

Publication

Checking Smart Contracts With Structural Code Embedding

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2021

DOI: 10.1109/TSE.2020.2971482

Publication

What Do Programmers Discuss About Blockchain? A Case Study on the Use of Balanced LDA and the Reference Architecture of a Domain to Capture Online Discussions About Blockchain Platforms Across Stack E

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2021

DOI: 10.1109/TSE.2019.2921343

Publication

Just-In-Time Obsolete Comment Detection and Update

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TSE.2021.3138909

Publication

File-Level Defect Prediction: Unsupervised vs. Supervised Models

Publisher: IEEE

Date: 11-2017

DOI: 10.1109/ESEM.2017.48

Publication

Just-In-Time Defect Identification and Localization: A Two-Phase Framework

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2022

DOI: 10.1109/TSE.2020.2978819

Publication

Technical Q&A Site Answer Recommendation via Question Boosting

Publisher: Association for Computing Machinery (ACM)

Date: 31-12-2020

DOI: 10.1145/3412845

Abstract: Software developers heavily use online question-and-answer platforms to seek help with their technical problems. However, a major problem with these technical Q&A sites is “answer hungriness,” i.e., a large number of questions remain unanswered or unresolved, and users have to wait a long time or painstakingly go through the provided answers, which vary in quality. To alleviate this time-consuming problem, we propose DeepAns, a novel neural network–based approach to identify the most relevant answer among a set of answer candidates. Our approach follows a three-stage process: question boosting, label establishment, and answer recommendation. Given a post, we first generate a clarifying question as a way of question boosting. We then automatically establish the positive, neutral+, neutral-, and negative training samples via label establishment. For answer recommendation, we sort answer candidates by the matching scores calculated by our neural network–based model. To evaluate our proposed model, we conducted a large-scale evaluation on four datasets collected from real-world technical Q&A sites (i.e., Ask Ubuntu, Super User, Stack Overflow Python, and Stack Overflow Java). Our experimental results show that our approach significantly outperforms several state-of-the-art baselines in automatic evaluation. We also conducted a user study with 50 solved/unanswered/unresolved questions. The user-study results demonstrate that our approach is effective in solving the answer-hungry problem by recommending the most relevant answers from historical archives.

Publication

Domain-specific cross-language relevant question retrieval

Publisher: Springer Science and Business Media LLC

Date: 04-11-2017

DOI: 10.1007/S10664-017-9568-3

Publication

Identifying self-admitted technical debt in open source projects using text mining

Publisher: Springer Science and Business Media LLC

Date: 09-05-2017

DOI: 10.1007/S10664-017-9522-4

Publication

API Usage Recommendation Via Multi-View Heterogeneous Graph Representation Learning

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2023

DOI: 10.1109/TSE.2023.3252259

Publication

Learning to aggregate: An automated aggregation method for software quality model

Publisher: IEEE

Date: 05-2017

DOI: 10.1109/ICSE-C.2017.139

Publication

Automating Intention Mining

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2020

DOI: 10.1109/TSE.2018.2876340

Publication

A Survey on Deep Learning for Software Engineering

Publisher: Association for Computing Machinery (ACM)

Date: 31-01-2022

DOI: 10.1145/3505243

Abstract: In 2006, Geoffrey Hinton proposed the concept of training “Deep Neural Networks (DNNs)” and an improved model training method to break the bottleneck of neural network development. More recently, the introduction of AlphaGo in 2016 demonstrated the powerful learning ability of deep learning and its enormous potential. Deep learning has been increasingly used to develop state-of-the-art software engineering (SE) research tools due to its ability to boost performance for various SE tasks. Many factors, e.g., deep learning model selection, internal structure differences, and model optimization techniques, may have an impact on the performance of DNNs applied in SE. Few works to date focus on summarizing, classifying, and analyzing the application of deep learning techniques in SE. To fill this gap, we performed a survey to analyze the relevant studies published since 2006. We first provide an example to illustrate how deep learning techniques are used in SE. We then conduct a background analysis (BA) of primary studies and present four research questions to describe the trend of DNNs used in SE (BA), summarize and classify different deep learning techniques (RQ1), and analyze the data processing, including data collection, data classification, data pre-processing, and data representation (RQ2). In RQ3, we depict a range of key research topics using DNNs and investigate the relationships between DL-based model adoption and multiple factors (i.e., DL architectures, task types, problem types, and data types). We also summarize commonly used datasets for different SE tasks. In RQ4, we summarize the widely used optimization algorithms and provide important evaluation metrics for different problem types, including regression, classification, recommendation, and generation.
Based on our findings, we present a set of current challenges remaining to be investigated and outline a proposed research road map highlighting key opportunities for future work.

Publication

Just-In-Time Defect Prediction on JavaScript Projects: A Replication Study

Publisher: Association for Computing Machinery (ACM)

Date: 22-08-2022

DOI: 10.1145/3508479

Abstract: Change-level defect prediction is widely referred to as just-in-time (JIT) defect prediction since it identifies a defect-inducing change at check-in time, and researchers have proposed many approaches based on language-independent change-level features. These approaches can be divided into two types, supervised and unsupervised, and their effectiveness has been verified on Java or C++ projects. However, whether language-independent change-level features can effectively identify the defects of JavaScript projects is still unknown. Additionally, many studies have confirmed that supervised approaches outperform unsupervised approaches on Java or C++ projects when considering inspection effort. However, whether supervised JIT defect prediction approaches still perform best on JavaScript projects is unknown. Lastly, previously proposed change-level features are programming language–independent; whether programming language–specific change-level features can further improve the performance of JIT approaches in identifying defect-prone changes is also unknown. To address these gaps in knowledge, in this article, we collect and label the top-20 most starred JavaScript projects on GitHub. JavaScript is an extremely popular and widely used programming language in industry.
We propose five JavaScript-specific change-level features, conduct a large-scale empirical study (involving a total of 176,902 changes), and find that (1) supervised JIT defect prediction approaches (i.e., CBS+) still statistically significantly outperform unsupervised approaches on JavaScript projects when considering inspection effort; (2) JavaScript-specific change-level features can further improve the performance of approaches built with language-independent features in identifying defect-prone changes; (3) the change-level features in the dimensions of size (i.e., LT), diffusion (i.e., NF), and JavaScript-specific features (i.e., SO and TC) are the most important for indicating the defect-proneness of a change on JavaScript projects; and (4) project-related features (i.e., Stars, Branches, Def Ratio, Changes, Files, Defective, and Forks) are highly associated with the probability that a change is defect-prone on JavaScript projects.

Publication

Why and how developers fork what from whom in GitHub

Publisher: Springer Science and Business Media LLC

Date: 31-05-2016

DOI: 10.1007/S10664-016-9436-6

Publication

Perceptions, Expectations, and Challenges in Defect Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 11-2020

DOI: 10.1109/TSE.2018.2877678

Publication

Combining Software Metrics and Text Features for Vulnerable File Prediction

Publisher: IEEE

Date: 12-2015

DOI: 10.1109/ICECCS.2015.15

Publication

deGraphCS: Embedding Variable-based Flow Graph for Neural Code Search

Publisher: Association for Computing Machinery (ACM)

Date: 30-03-2023

DOI: 10.1145/3546066

Abstract: With the rapid increase in public code repositories, developers have a strong desire to retrieve precise code snippets using natural language. Although existing deep learning-based approaches provide end-to-end solutions (i.e., they accept natural language queries and show related code fragments), code search accuracy in large-scale repositories remains low because of how code is represented (e.g., AST) and modeled (e.g., directly fusing features in the attention stage). In this paper, we propose a novel learnable deep Graph for Code Search (called deGraphCS) that transforms source code into variable-based flow graphs based on an intermediate representation technique, which can model code semantics more precisely than directly processing the code as text or using the syntax tree representation. Furthermore, we propose a graph optimization mechanism to refine the code representation and apply an improved gated graph neural network to model variable-based flow graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale dataset from GitHub containing 41,152 code snippets written in the C language and reproduce several typical deep code search methods for comparison. The experimental results show that deGraphCS achieves state-of-the-art performance and accurately retrieves code snippets that satisfy users' needs.

Publication

Fusion fault localizers

Publisher: ACM

Date: 15-09-2014

DOI: 10.1145/2642937.2642983

Publication

Characterizing Common and Domain-Specific Package Bugs: A Case Study on Ubuntu

Publisher: IEEE

Date: 07-2018

DOI: 10.1109/COMPSAC.2018.00065

Publication

Extracting and analyzing time-series HCI data from screen-captured task videos

Publisher: Springer Science and Business Media LLC

Date: 19-01-2017

DOI: 10.1007/S10664-015-9417-1

Publication

Web APIs: Features, Issues, and Expectations – A Large-Scale Empirical Study of Web APIs From Two Publicly Accessible Registries Using Stack Overflow and a User Survey

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 02-2023

DOI: 10.1109/TSE.2022.3154769

Publication

Mining Sandboxes for Linux Containers

Publisher: IEEE

Date: 03-2017

DOI: 10.1109/ICST.2017.16

Publication

Locating Latent Design Information in Developer Discussions: A Study on Pull Requests

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2021

DOI: 10.1109/TSE.2019.2924006

Publication

A Bayesian Network nearest k-labels method for Multi-label classification

Publisher: AICIT

Date: 31-05-2012

DOI: 10.4156/AISS.VOL4.ISSUE8.4

Publication

Measuring Program Comprehension: A Large-Scale Field Study with Professionals

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2018

DOI: 10.1109/TSE.2017.2734091

Publication

API method recommendation without worrying about the task-API knowledge gap

Publisher: ACM

Date: 03-09-2018

DOI: 10.1145/3238147.3238191

Publication

What do developers search for on the web?

Publisher: Springer Science and Business Media LLC

Date: 09-04-2017

DOI: 10.1007/S10664-017-9514-4

Publication

Information credibility on twitter in emergency situation

Publisher: Springer Berlin Heidelberg

Date: 2012

DOI: 10.1007/978-3-642-30428-6_4

Publication

Predictive Models in Software Engineering: Challenges and Opportunities

Publisher: Association for Computing Machinery (ACM)

Date: 09-04-2022

DOI: 10.1145/3503509

Abstract: Predictive models are one of the most important techniques that are widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and that present well-performed studies in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This article is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results. Based on our findings, we also propose a set of current challenges that still need to be addressed in future work and provide a proposed research road map for these opportunities.

Publication

Accessibility in Software Practice: A Practitioner’s Perspective

Publisher: Association for Computing Machinery (ACM)

Date: 28-07-2022

DOI: 10.1145/3503508

Abstract: Being able to access software in daily life is vital for everyone, and thus accessibility is a fundamental challenge for software development. However, given the number of accessibility issues reported by users, e.g., in app reviews, it is not clear whether accessibility is widely integrated into current software projects and how software projects address accessibility issues. In this article, we report a study of the critical challenges and benefits of incorporating accessibility into software development and design. We applied a mixed qualitative and quantitative approach, gathering data from 15 interviews and 365 survey respondents from 26 countries across five continents, to understand how practitioners perceive accessibility development and design in practice. We derived 44 statements grouped into eight topics on accessibility from practitioners’ viewpoints and different software development stages. Our statistical analysis reveals substantial gaps between groups when they reviewed the summarized statements, e.g., between practitioners with direct vs. indirect accessibility-relevant work experience. These gaps might hinder the quality of accessibility development and design, and we use our findings to establish a set of guidelines to help practitioners be aware of accessibility challenges and benefit factors. We suggest that development teams treat accessibility as a first-class consideration throughout the software development process, and we also propose some remedies to resolve the gaps between groups and highlight key future research directions for incorporating accessibility into software design and development.

Publication

How Practitioners Perceive Automated Bug Report Management Techniques

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 08-2020

DOI: 10.1109/TSE.2018.2870414

Publication

A systematic mapping study of quality assessment models for software products

Publisher: IEEE

Date: 11-2017

DOI: 10.1109/SATE.2017.16

Publication

Domain-specific cross-language relevant question retrieval

Publisher: ACM

Date: 14-05-2016

DOI: 10.1145/2901739.2901746

Publication

Neural-machine-translation-based commit message generation: how far are we?

Publisher: ACM

Date: 03-09-2018

DOI: 10.1145/3238147.3238190

Publication

How android app developers manage power consumption?

Publisher: ACM

Date: 14-05-2016

DOI: 10.1145/2901739.2901748

Publication

Collective Personalized Change Classification With Multiobjective Search

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 12-2016

DOI: 10.1109/TR.2016.2588139

Publication

Scalable relevant project recommendation on GitHub

Publisher: ACM

Date: 23-09-2017

DOI: 10.1145/3131704.3131706

Publication

Improving defect prediction with deep forest

Publisher: Elsevier BV

Date: 10-2019

DOI: 10.1016/J.INFSOF.2019.07.003

Publication

Data Quality Matters: A Case Study on Data Label Correctness for Security Bug Report Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2022

DOI: 10.1109/TSE.2021.3063727

Publication

Automated Identification of High Impact Bug Reports Leveraging Imbalanced Learning Strategies

Publisher: IEEE

Date: 06-2016

DOI: 10.1109/COMPSAC.2016.67

Publication

Multi-Factor Duplicate Question Detection in Stack Overflow

Publisher: Springer Science and Business Media LLC

Date: 09-2015

DOI: 10.1007/S11390-015-1576-4

Publication

RW.KNN: A proposed random walk KNN algorithm for multi-label classification

Publisher: ACM

Date: 28-10-2011

DOI: 10.1145/2065003.2065022

Publication

Generating Question Titles for Stack Overflow from Mined Code Snippets

Publisher: Association for Computing Machinery (ACM)

Date: 26-09-2020

DOI: 10.1145/3401026

Abstract: Stack Overflow has been heavily used by software developers as a popular way to seek programming-related information from peers via the internet. The Stack Overflow community recommends that users provide the related code snippet when creating a question to help others better understand it and offer their help. Previous studies have shown that a significant number of these questions are of low quality and not attractive to other potential experts in Stack Overflow. These poorly asked questions are less likely to receive useful answers and hinder the overall knowledge generation and sharing process. One reason for low-quality questions in SO is that many developers may not be able to clarify and summarize the key problems behind their code snippets, due to a lack of knowledge and terminology related to the problem and/or poor writing skills. In this study, we therefore propose an approach to assist developers in writing high-quality questions by automatically generating question titles for a code snippet using a deep sequence-to-sequence learning approach. Our approach is fully data-driven and uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-words problem, and a coverage mechanism to eliminate the word-repetition problem. We evaluate our approach on Stack Overflow datasets covering a variety of programming languages (e.g., Python, Java, JavaScript, C#, and SQL), and our experimental results show that our approach significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets to help other researchers verify their ideas and to inspire follow-up work.

Publication

A Survey on Adaptive Random Testing

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2021

DOI: 10.1109/TSE.2019.2942921

Publication

An exploratory study on the repeatedly shared external links on Stack Overflow

Publisher: Springer Science and Business Media LLC

Date: 03-11-2021

DOI: 10.1007/S10664-021-10028-Y

Publication

Combined classifier for cross-project defect prediction: an extended empirical study

Publisher: Springer Science and Business Media LLC

Date: 15-02-2018

DOI: 10.1007/S11704-017-6015-Y

Publication

Modular Tree Network for Source Code Representation Learning

Publisher: Association for Computing Machinery (ACM)

Date: 26-09-2020

DOI: 10.1145/3409331

Abstract: Learning representations of source code is a foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural information of programs. Although abstract syntax tree (AST)-based neural models can handle the tree structure in source code, they cannot capture the richness of different types of substructures in programs. In this article, we propose a modular tree network that dynamically composes different neural network units into tree structures based on the input AST. Unlike previous tree-structured neural network models, a modular tree network can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance compared with state-of-the-art approaches in both tasks, showing the advantage of leveraging more elaborate structural information from source code.

Publication

Ranking in co-effecting multi-object/link types networks

Publisher: IEEE

Date: 11-2011

DOI: 10.1109/ICTAI.2011.84

Publication

Chaff from the Wheat: Characterizing and Determining Valid Bug Reports

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 05-2020

DOI: 10.1109/TSE.2018.2864217

Publication

An effective change recommendation approach for supplementary bug fixes

Publisher: Springer Science and Business Media LLC

Date: 26-08-2016

DOI: 10.1007/S10515-016-0204-Z

Publication

TLEL: A two-layer ensemble learning approach for just-in-time defect prediction

Publisher: Elsevier BV

Date: 07-2017

DOI: 10.1016/J.INFSOF.2017.03.007

Publication

Personality and Project Success: Insights from a Large-Scale Study with Professionals

Publisher: IEEE

Date: 09-2017

DOI: 10.1109/ICSME.2017.50

Publication

Deep code comment generation

Publisher: ACM

Date: 28-05-2018

DOI: 10.1145/3196321.3196334

Publication

An empirical study of bugs in build process

Publisher: ACM

Date: 24-03-2014

DOI: 10.1145/2554850.2555142

Publication

An empirical study of bug report field reassignment

Publisher: IEEE

Date: 02-2014

DOI: 10.1109/CSMR-WCRE.2014.6747167

Publication

Multi-Granularity Detector for Vulnerability Fixes

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2023

DOI: 10.1109/TSE.2023.3281275

Publication

Customer satisfaction feedback in an IT outsourcing company

Publisher: ACM

Date: 27-04-2015

DOI: 10.1145/2745802.2745834

Publication

HYDRA: Massively Compositional Model for Cross-Project Defect Prediction

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 10-2016

DOI: 10.1109/TSE.2016.2543218

Publication

Towards more accurate multi-label software behavior learning

Publisher: IEEE

Date: 02-2014

DOI: 10.1109/CSMR-WCRE.2014.6747163

Publication

Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction

Publisher: Springer Science and Business Media LLC

Date: 27-10-2018

DOI: 10.1007/S10664-018-9661-2

Publication

Bug Report Enrichment with Application of Automated Fixer Recommendation

Publisher: IEEE

Date: 05-2017

DOI: 10.1109/ICPC.2017.28

Publication

Defining Smart Contract Defects on Ethereum

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 2022

DOI: 10.1109/TSE.2020.2989002

Publication

Cross-project build co-change prediction

Publisher: IEEE

Date: 03-2015

DOI: 10.1109/SANER.2015.7081841

Publication

An Empirical Study of Release Note Production and Usage in Practice

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 06-2022

DOI: 10.1109/TSE.2020.3038881

Publication

Dual analysis for recommending developers to resolve bugs

Publisher: Wiley

Date: 03-2015

DOI: 10.1002/SMR.1706

Publication

DefectChecker: Automated Smart Contract Defect Detection by Analyzing EVM Bytecode

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 07-2022

DOI: 10.1109/TSE.2021.3054928

Publication

Improving Automated Bug Triaging with Specialized Topic Model

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 03-2017

DOI: 10.1109/TSE.2016.2576454

Publication

Who should review this change?: Putting text and file location analyses together for more accurate recommendations

Publisher: IEEE

Date: 09-2015

DOI: 10.1109/ICSM.2015.7332472

Publication

Deep Learning for Just-in-Time Defect Prediction

Publisher: IEEE

Date: 08-2015

DOI: 10.1109/QRS.2015.14

Publication

Accurate developer recommendation for bug resolution

Publisher: IEEE

Date: 10-2013

DOI: 10.1109/WCRE.2013.6671282

Publication

Diversity maximization speedup for localizing faults in single-fault and multi-fault programs

Publisher: Springer Science and Business Media LLC

Date: 06-09-2014

DOI: 10.1007/S10515-014-0165-Z

Publication

A Large Scale Study of Long-Time Contributor Prediction for GitHub Projects

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Date: 06-2021

DOI: 10.1109/TSE.2019.2918536

Publication

Combining collaborative filtering and topic modeling for more accurate android mobile app library recommendation

Publisher: ACM

Date: 23-09-2017

DOI: 10.1145/3131704.3131721

Publication

Message from the SoftwareMining 2015 chairs

Publisher: IEEE

Date: 11-2016

DOI: 10.1109/ASEW.2015.35

Publication

A two-phase transfer learning model for cross-project defect prediction

Publisher: Elsevier BV

Date: 03-2019

DOI: 10.1016/J.INFSOF.2018.11.005

Publication

Revisiting the Correlation between Alerts and Software Defects: A Case Study on MyFaces, Camel, and CXF

Publisher: IEEE

Date: 07-2017

DOI: 10.1109/COMPSAC.2017.201

Publication

Predicting Crashing Releases of Mobile Applications

Publisher: ACM

Date: 08-09-2016

DOI: 10.1145/2961111.2962606

Publication

Build system analysis with link prediction

Publisher: ACM

Date: 24-03-2014

DOI: 10.1145/2554850.2555134

Publication

Why is my code change abandoned?

Publisher: Elsevier BV

Date: 06-2019

DOI: 10.1016/J.INFSOF.2019.02.007

Publication

BOAT: an experimental platform for researchers to comparatively and reproducibly evaluate bug localization techniques

Publisher: ACM

Date: 31-05-2014

DOI: 10.1145/2591062.2591066

Publication

Condensing Class Diagrams With Minimal Manual Labeling Cost

Publisher: IEEE

Date: 06-2016

DOI: 10.1109/COMPSAC.2016.83

Xin Xia

Researcher

Publications

Broken External Links on Stack Overflow

VT-Revolution: Interactive Programming Video Tutorial Authoring and Watching System

What Permissions Should This Android App Request?

Fusing multi-abstraction vector space models for concern localization

Practical and effective sandboxing for Linux containers

Automating Change-Level Self-Admitted Technical Debt Determination

Which Packages Would be Affected by This Bug Report?

Automated Configuration Bug Report Prediction Using Text Mining

Assessing the Alignment between the Information Needs of Developers and the Documentation of Programming Languages: A Case Study on Rust

VT-revolution: interactive programming tutorials made possible

Automatic, high accuracy prediction of reopened bugs

An Empirical Study of Bugs in Software Build Systems

Vulnerability Detection by Learning from Syntax-Based Execution Paths of Code

Wireframe-based UI Design Search through Image Autoencoder

What design topics do developers discuss?

Software Internationalization and Localization: An Industrial Experience

Revisiting the Identification of the Co-evolution of Production and Test Code

Automating aggregation for Software Quality modeling

Build Predictor: More Accurate Missed Dependency Prediction in Build Configuration Files

Inference of development activities from interaction with uninstrumented applications

Who Will Leave the Company?: A Large-Scale Industry Study of Developer Turnover by Mining Monthly Work Report

Bug Characteristics in Blockchain Systems: A Large-Scale Empirical Study

“Automated Debugging Considered Harmful” Considered Harmful: A User Study Revisiting the Usefulness of Spectra-Based Fault Localization Techniques with Professionals Using Real Bugs from Large Systems

1+1>2: Programming Know-What and Know-How Knowledge Fusion, Semantic Enrichment and Coherent Application

TopicAns: Topic-informed Architecture for Answer Recommendation on Technical Q&A Site

Characterizing and identifying reverted commits

An Empirical Study of Bugs in Software Build System

Context-Aware Neural Fault Localization

SATD detector

Evaluating defect prediction approaches using a massive set of metrics

Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges

An Empirical Study of Classifier Combination for Cross-Project Defect Prediction

ActivitySpace: A remembrance framework to support interapplication information needs

Correlating Automated and Human Evaluation of Code Documentation Generation Quality

It Takes Two to Tango: Deleted Stack Overflow Question Prediction with Text and Meta Features

Automated Bug Report Field Reassignment and Refinement Prediction

Neural Network-based Detection of Self-Admitted Technical Debt

Recommending frequently encountered bugs

Predictive Comment Updating With Heuristics and AST-Path-Based Neural Learning: A Two-Phase Approach

Detecting similar repositories on GitHub

Automatic Defect Categorization Based on Fault Triggering Conditions

Enhancing developer recommendation with supplementary information via mining historical commits

EFSPredictor: Predicting Configuration Bugs with Ensemble Feature Selection

Who should make decision on this pull request? Analyzing time-decaying relationships and file similarities for integrator prediction

An Empirical Study of Bug Fixing Rate

Practitioners' expectations on automated fault localization

TagCombine: Recommending Tags to Contents in Software Information Sites

Towards more accurate content categorization of API discussions

Experience report: An industrial experience report on test outsourcing practices

Chatbot4QR: Interactive Query Refinement for Technical Question Retrieval

Cross-Project Change-Proneness Prediction

The Impact of Mislabeled Changes by SZZ on Just-in-Time Defect Prediction

psc2code

Code Structure–Guided Transformer for Source Code Summarization

Smart Contract Security: A Practitioners' Perspective

Combining Word Embedding with Information Retrieval to Recommend Similar Bug Reports

Cross-language bug localization

Early prediction of merged code changes to prioritize reviewing tasks

ELBlocker: Predicting blocking bugs with ensemble imbalance learning

Tag recommendation in software information sites

Automatic, highly accurate app permission recommendation

Semantic-enriched Code Knowledge Graph to Reveal Unknowns in Smart Contract Code Reuse

Automated Android application permission recommendation

Characterising deprecated Android APIs

Characterizing malicious Android apps by mining topic-specific data flow signatures

Constructing a System Knowledge Graph of User Tasks and Failures from Bug Reports to Support Soap Opera Testing

What Security Questions Do Developers Ask? A Large-Scale Study of Stack Overflow Posts

Automated prediction of bug report priority using multi-factor analysis

High-Impact Bug Report Identification with Imbalanced Learning Strategies

Revisiting Supervised and Unsupervised Methods for Effort-Aware Cross-Project Defect Prediction

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

How Does Visualisation Help App Practitioners Analyse Android Apps?