ORCID Profile
0000-0002-6302-3256
Current Organisation
Huawei Technologies Co Ltd
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Software Engineering | Computer Software | Mobile Technologies
Expanding Knowledge in the Information and Computing Sciences | Application Software Packages (excl. Computer Games) | Expanding Knowledge in Technology | Application Tools and System Utilities
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2019
Publisher: IEEE
Date: 11-2016
DOI: 10.1109/SATE.2016.13
Publisher: Springer Science and Business Media LLC
Date: 27-12-2017
Publisher: Springer Science and Business Media LLC
Date: 04-07-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2019
Publisher: IEEE
Date: 10-2017
Publisher: IEEE
Date: 07-2014
Publisher: Association for Computing Machinery (ACM)
Date: 04-04-2023
DOI: 10.1145/3546945
Abstract: Programming language documentation refers to the set of technical documents that provide application developers with a description of the high-level concepts of a language (e.g., manuals, tutorials, and API references). Such documentation is essential to support application developers in effectively using a programming language. One of the challenges faced by documenters (i.e., personnel who design and produce documentation for a programming language) is to ensure that documentation has relevant information that aligns with the concrete needs of developers, defined as the missing knowledge that developers acquire via voluntary search. In this article, we present an automated approach to support documenters in evaluating the differences and similarities between the concrete information needs of developers and the current state of documentation (a problem that we refer to as the topical alignment of programming language documentation). Our approach leverages semi-supervised topic modelling that uses domain knowledge to guide the derivation of topics. We initially train a baseline topic model from a set of Rust-related Q&A posts. We then use this baseline model to determine the distribution of topic probabilities of each document of the official Rust documentation. Afterwards, we assess the similarities and differences between the topics of the Q&A posts and the official documentation. Our results show a relatively high level of topical alignment in Rust documentation. Still, information about specific topics is scarce in both the Q&A websites and the documentation, particularly topics related to programming niches such as network, game, and database development. For other topics (e.g., topics related to language features such as structs, patterns and matching, and foreign function interface), information is only available on Q&A websites and is lacking in the official documentation.
Finally, we discuss implications for programming language documenters, particularly how to leverage our approach to prioritize topics that should be added to the documentation.
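As a rough illustration of the comparison step, the sketch below assumes every document (a Q&A post or a documentation page) has already been assigned a topic-probability vector by the baseline topic model. It averages those vectors per corpus and flags topics that are much more prevalent in Q&A posts than in the documentation. The averaging, the `ratio` threshold, and the function names are simplifying assumptions for illustration, not the article's alignment measure.

```python
from statistics import mean


def topic_prevalence(doc_topic_probs: list[list[float]]) -> list[float]:
    """Mean probability of each topic across a corpus (one row per document)."""
    n_topics = len(doc_topic_probs[0])
    return [mean(row[k] for row in doc_topic_probs) for k in range(n_topics)]


def under_documented_topics(qa_probs: list[list[float]],
                            doc_probs: list[list[float]],
                            ratio: float = 2.0) -> list[int]:
    """Indices of topics at least `ratio` times more prevalent in Q&A than in docs.

    The threshold is a hypothetical choice; any per-topic gap measure could be used.
    """
    qa = topic_prevalence(qa_probs)
    docs = topic_prevalence(doc_probs)
    return [k for k, (q, d) in enumerate(zip(qa, docs)) if q > 0 and q >= ratio * d]
```

For example, with two made-up Q&A rows `[[0.6, 0.3, 0.1], [0.4, 0.5, 0.1]]` and two documentation rows `[[0.8, 0.1, 0.1], [0.7, 0.1, 0.2]]`, only topic 1 is flagged: its mean probability is 0.4 in the Q&A posts but 0.1 in the documentation.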
Publisher: ACM
Date: 26-10-2018
Publisher: Springer Science and Business Media LLC
Date: 18-09-2014
Publisher: IEEE
Date: 07-2013
DOI: 10.1109/QSIC.2013.60
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: Association for Computing Machinery (ACM)
Date: 16-06-2020
DOI: 10.1145/3391613
Abstract: UI design is an integral part of software development. For many developers who do not have much UI design experience, exposing them to a large database of real-application UI designs can help them quickly build up a realistic understanding of the design space for a software feature and get design inspiration from existing applications. However, existing keyword-based, image-similarity-based, and component-matching-based methods cannot reliably find relevant high-fidelity UI designs in a large database that are similar to the UI wireframe that developers sketch, in the face of the great variations in UI designs. In this article, we propose a deep-learning-based UI design search engine to fill this gap. The key innovation of our search engine is to train a wireframe image autoencoder using a large database of real-application UI designs, without the need for labeling relevant UI designs. We implement our approach for Android UI design search, and conduct extensive experiments with artificially created relevant UI designs and human evaluation of UI design search results. Our experiments confirm the superior performance of our search engine over existing image-similarity or component-matching-based methods and demonstrate the usefulness of our search engine in real-world UI design tasks.
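Once an encoder maps wireframes to vectors, the search step can be pictured as nearest-neighbour ranking. The sketch below assumes the embeddings already exist (the vectors and design names are made up) and ranks designs by cosine similarity; the article's actual similarity measure and indexing structure are not described here, so this is only an illustration of embedding-based retrieval.

```python
from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 3) -> list[str]:
    """Return the names of the top_k designs whose embeddings are closest to the query."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

With a hypothetical index `{"login": [1.0, 0.0], "list": [0.0, 1.0], "form": [0.9, 0.1]}`, the query `[1.0, 0.0]` returns `["login", "form"]` for `top_k=2`. A linear scan like this is fine for a sketch; a large design database would normally use an approximate nearest-neighbour index instead.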
Publisher: ACM
Date: 28-05-2018
Publisher: IEEE
Date: 07-2013
Publisher: Association for Computing Machinery (ACM)
Date: 30-09-2023
DOI: 10.1145/3607183
Abstract: Many software processes advocate that test code should co-evolve with production code. Prior work usually studies such co-evolution based on production-test co-evolution samples mined from software repositories. A production-test co-evolution sample refers to a pair of a test code change and a production code change where the test code change triggers or is triggered by the production code change. The quality of the mined samples is critical to the reliability of research conclusions. Existing studies mined production-test co-evolution samples based on the following assumption: if a test class and its associated production class change together in one commit, or a test class changes within a short time interval after the changes of its associated production class, this change pair should be a production-test co-evolution sample. However, the validity of this assumption has never been investigated. To fill this gap, we present an empirical study, investigating the reasons for test code updates occurring after the associated production code changes, and revealing the pervasive existence of noise in the production-test co-evolution samples identified by existing works based on the aforementioned assumption. We define a taxonomy of such noise, including 6 categories (i.e., adaptive maintenance, perfective maintenance, corrective maintenance, indirectly related production code update, indirectly related test code update, and other reasons). Guided by the empirical findings, we propose CHOSEN (an identifiCation metHod Of production-teSt co-EvolutioN) based on a two-stage strategy. CHOSEN takes a test code change and its associated production code change as input, aiming to determine whether the production-test change pair is a production-test co-evolution sample. The identified samples are the basis of, or useful for, various downstream tasks. We conduct a series of experiments to evaluate our method.
Results show that: (1) CHOSEN achieves an AUC of 0.931 and an F1-score of 0.928, significantly outperforming existing identification methods. (2) CHOSEN can help researchers and practitioners draw more accurate conclusions in studies related to the co-evolution of production and test code. For the task of Just-In-Time (JIT) obsolete test code detection, which can help detect whether a piece of test code should be updated when developers modify the production code, the test set constructed by CHOSEN can help measure the detection method’s performance more accurately, with an average error of only 0.76% compared with the ground truth. In addition, the dataset constructed by CHOSEN can be used to train a better obsolete test code detection model, with average improvements in accuracy, precision, recall, and F1-score of 12.00%, 17.35%, 8.75%, and 13.50%, respectively.
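To make the questioned assumption concrete, the sketch below encodes the naive mining rule described in the abstract: an associated production/test pair counts as co-evolution if both change in one commit, or if the test changes shortly after the production class. The one-day window and the `Foo.java` / `FooTest.java` naming convention used for "associated" are hypothetical choices; this is the heuristic CHOSEN is meant to improve on, not CHOSEN itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath


@dataclass
class Change:
    commit: str          # commit hash
    timestamp: datetime  # commit time
    path: str            # path of the changed file


def associated(prod_path: str, test_path: str) -> bool:
    """Hypothetical convention: Foo.java is tested by FooTest.java."""
    return PurePosixPath(test_path).stem == PurePosixPath(prod_path).stem + "Test"


def naive_coevolution_pair(prod: Change, test: Change,
                           window: timedelta = timedelta(days=1)) -> bool:
    """Naive rule: same commit, or test change within `window` after the production change."""
    if not associated(prod.path, test.path):
        return False
    if prod.commit == test.commit:
        return True
    delta = test.timestamp - prod.timestamp
    return timedelta(0) < delta <= window
```

Every pair this rule accepts would be labelled a co-evolution sample, which is exactly where the noise categories listed above (e.g., indirectly related test code updates) slip in.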
Publisher: IEEE
Date: 09-2017
Publisher: IEEE
Date: 07-2014
Publisher: Springer Science and Business Media LLC
Date: 16-09-2017
Publisher: IEEE
Date: 05-2017
DOI: 10.1109/MSR.2017.58
Publisher: IEEE
Date: 05-2017
DOI: 10.1109/MSR.2017.59
Publisher: IEEE
Date: 10-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Association for Computing Machinery (ACM)
Date: 11-07-2024
DOI: 10.1145/3607189
Publisher: Springer Science and Business Media LLC
Date: 02-03-2019
Publisher: Institute of Electronics, Information and Communications Engineers (IEICE)
Date: 2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2023
Publisher: ACM
Date: 13-04-2015
Publisher: Springer Science and Business Media LLC
Date: 25-08-2021
Publisher: IEEE
Date: 07-2015
Publisher: IEEE
Date: 11-2016
DOI: 10.1109/ASE.2015.90
Publisher: Association for Computing Machinery (ACM)
Date: 28-07-2022
DOI: 10.1145/3502853
Abstract: Automatic code documentation generation has been a crucial task in the field of software engineering. It not only relieves developers from writing code documentation but also helps them understand programs better. Specifically, deep-learning-based techniques that leverage large-scale source code corpora have been widely used in code documentation generation. These works tend to use automatic metrics (such as BLEU, METEOR, ROUGE, CIDEr, and SPICE) to evaluate different models. These metrics compare generated documentation to reference texts by measuring word overlap. Unfortunately, there is no evidence demonstrating the correlation between these metrics and human judgment. We conduct experiments on two popular code documentation generation tasks, code comment generation and commit message generation, to investigate whether these metrics correlate with human judgments. For each task, we replicate three state-of-the-art approaches, and the generated documentation is evaluated automatically in terms of BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. We also ask 24 participants to rate the generated documentation on three aspects (i.e., language, content, and effectiveness). Each participant is given Java methods or commit diffs along with the target documentation to be rated. The results show that the ranking of generated documentation by automatic metrics differs from the ranking by human annotators. Thus, these automatic metrics are not reliable enough to replace human evaluation for code documentation generation tasks. In addition, METEOR shows the strongest correlation with human evaluation metrics (a moderate Pearson correlation, r ≈ 0.7). However, this is still much lower than the correlation observed between different annotators (a high Pearson correlation, r ≈ 0.8) and the correlations reported in the literature for other tasks (e.g., Neural Machine Translation [39]).
Our study points to the need to develop specialized automated evaluation metrics that can correlate more closely to human evaluation metrics for code generation tasks.
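The correlation analysis pairs, for the same set of generated outputs, an automatic metric's scores with human ratings. A minimal Pearson correlation sketch follows; the input lists would be hypothetical per-output scores, and the article may aggregate ratings differently before correlating.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two items")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient and could replace the hand-written version.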
Publisher: IEEE
Date: 06-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 09-2016
Publisher: Association for Computing Machinery (ACM)
Date: 29-07-2019
DOI: 10.1145/3324916
Abstract: Technical debt is a metaphor that reflects the tradeoff software engineers make between short-term benefits and long-term stability. Self-admitted technical debt (SATD), a variant of technical debt, has been proposed to identify debt that is intentionally introduced during software development, e.g., temporary fixes and workarounds. Previous studies have leveraged human-summarized patterns (n-gram phrases that can be used to identify SATD) or text-mining techniques to detect SATD in source code comments. However, several characteristics of SATD features in code comments, such as vocabulary diversity, project uniqueness, length, and semantic variations, pose a big challenge to the accuracy of pattern-based or traditional text-mining-based SATD detection, especially for cross-project deployment. Furthermore, although the traditional text-mining-based method outperforms the pattern-based method in prediction accuracy, the text features it uses are less intuitive than human-summarized patterns, which makes the prediction results hard to explain. To improve the accuracy of SATD prediction, especially for cross-project prediction, we propose a Convolutional Neural Network (CNN)-based approach for classifying code comments as SATD or non-SATD. To improve the explainability of our model’s prediction results, we exploit the computational structure of CNNs to identify key phrases and patterns in code comments that are most relevant to SATD. We have conducted an extensive set of experiments with 62,566 code comments from 10 open-source projects and a user study with 150 comments from another three projects. Our evaluation confirms the effectiveness of different aspects of our approach and its superior performance, generalizability, adaptability, and explainability over current state-of-the-art traditional text-mining-based methods for SATD classification.
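For contrast with the CNN approach, the pattern-based baseline mentioned in the abstract can be sketched as keyword matching over comments. The pattern list below is a hypothetical sample chosen for illustration, not the published list of human-summarized patterns; whole-word matching keeps "hack" from firing on "hackathon".

```python
import re

# Hypothetical example patterns; not the human-summarized list from prior studies.
SATD_PATTERNS = ["todo", "fixme", "hack", "workaround", "temporary"]


def is_satd(comment: str, patterns: list[str] = SATD_PATTERNS) -> bool:
    """Flag a comment as SATD if any pattern occurs as a whole word (case-insensitive)."""
    text = comment.lower()
    return any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in patterns)
```

A fixed list like this misses project-specific wording and paraphrases, which is the vocabulary-diversity and cross-project problem that motivates a learned classifier.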
Publisher: ACM
Date: 28-05-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2023
Publisher: IEEE
Date: 02-2017
Publisher: IEEE
Date: 08-2014
Publisher: Elsevier BV
Date: 12-2017
Publisher: IEEE
Date: 12-2015
Publisher: Elsevier BV
Date: 08-2019
Publisher: IEEE
Date: 07-2015
Publisher: ACM
Date: 18-07-2016
Publisher: Springer Science and Business Media LLC
Date: 09-2015
Publisher: ACM
Date: 02-06-2014
Publisher: IEEE
Date: 11-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2022
Publisher: IEEE
Date: 07-2018
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2021
Publisher: Association for Computing Machinery (ACM)
Date: 06-2020
DOI: 10.1145/3392093
Abstract: Programming screencasts have become a pervasive resource on the Internet, helping developers learn new programming technologies or skills. The source code in programming screencasts is important and valuable information for developers. But the streaming nature of programming screencasts (i.e., a sequence of screen-captured images) limits the ways that developers can interact with the source code in the screencasts. Many studies use the Optical Character Recognition (OCR) technique to convert screen images (also referred to as video frames) into textual content, which can then be indexed and searched easily. However, noisy screen images significantly affect the quality of source code extracted by OCR, for example, no-code frames (e.g., PowerPoint slides, web pages of API specifications), non-code regions (e.g., Package Explorer view, Console view), and noisy code regions with code in completion suggestion popups. Furthermore, due to code characteristics (e.g., long compound identifiers like ItemListener), even professional OCR tools cannot extract source code from screen images without errors. The noisy OCRed source code negatively affects downstream applications, such as the effective search and navigation of the source code content in programming screencasts. In this article, we propose an approach named psc2code to denoise the process of extracting source code from programming screencasts. First, psc2code leverages Convolutional Neural Network (CNN)-based image classification to remove non-code and noisy-code frames. Then, psc2code performs edge detection and clustering-based image segmentation to detect sub-windows in a code frame, and based on the detected sub-windows, it identifies and crops the screen region that is most likely to be a code editor.
Finally, psc2code calls the API of a professional OCR tool to extract source code from the cropped code regions and leverages the OCRed cross-frame information in the programming screencast and the statistical language model of a large corpus of source code to correct errors in the OCRed source code. We conduct an experiment on 1,142 programming screencasts from YouTube. We find that our CNN-based image classification technique can effectively remove the non-code and noisy-code frames, achieving an F1-score of 0.95 on the valid code frames. We also find that psc2code can significantly improve the quality of the OCRed source code, correctly fixing about half of the incorrectly OCRed words. Based on the source code denoised by psc2code, we implement two applications: (1) a programming screencast search engine and (2) an interaction-enhanced programming screencast watching tool. Based on the source code extracted from the 1,142 collected programming screencasts, our experiments show that our programming screencast search engine achieves precision@5, @10, and @20 of 0.93, 0.81, and 0.63, respectively. We also conduct a user study of our interaction-enhanced programming screencast watching tool with 10 participants. This user study shows that our interaction-enhanced watching tool can help participants learn the knowledge in the programming video more efficiently and effectively.
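precision@k, used to evaluate the screencast search engine, is the share of relevant items among the first k results. A minimal sketch, assuming the ranked result list and the set of relevant screencast IDs are given (the IDs are hypothetical); averaging over many queries, as an evaluation would, is left out.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant.

    The denominator is fixed at k, so fewer than k results count as misses.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_ids[:k] if doc in relevant_ids)
    return hits / k
```

For the ranking `["v1", "v2", "v3", "v4", "v5"]` with relevant set `{"v1", "v3", "v9"}`, precision@5 is 0.4 and precision@2 is 0.5.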
Publisher: Association for Computing Machinery (ACM)
Date: 31-01-2023
DOI: 10.1145/3522674
Abstract: Code summaries help developers comprehend programs and reduce the time needed to infer program functionality during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models for generating accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer is under-explored in this task domain. In this article, we propose a novel approach named SG-Trans to incorporate code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., dataflow graph) into the self-attention module of the Transformer as inductive bias. To further capture the hierarchical characteristics of code, the local information and global structure are designed to be distributed in the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on two benchmark datasets, respectively.
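One common way to inject local structure into self-attention is a mask that restricts which tokens may attend to each other. The sketch below assumes a hypothetical "same statement" rule and applies it as a masked softmax over one row of attention scores; it illustrates the general idea of structure as inductive bias and is not necessarily how SG-Trans encodes it.

```python
from math import exp


def statement_mask(statement_ids: list[int]) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j (same statement)."""
    return [[a == b for b in statement_ids] for a in statement_ids]


def masked_softmax(scores: list[float], allowed: list[bool]) -> list[float]:
    """Softmax over allowed positions only; disallowed positions get weight 0."""
    # Each token shares a statement with itself, so at least one position is allowed.
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]
```

For three tokens where the first two belong to statement 0 and the third to statement 1, the first token's raw scores `[1.0, 1.0, 5.0]` become weights `[0.5, 0.5, 0.0]`: the high score on the out-of-statement token is ignored.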
Publisher: IEEE
Date: 05-2021
Publisher: IEEE
Date: 10-2016
Publisher: ACM
Date: 02-06-2014
Publisher: Springer Science and Business Media LLC
Date: 19-03-2018
Publisher: Elsevier BV
Date: 05-2015
Publisher: IEEE
Date: 05-2013
Publisher: Springer Science and Business Media LLC
Date: 19-03-2019
Publisher: Association for Computing Machinery (ACM)
Date: 22-05-2023
DOI: 10.1145/3597206
Abstract: Programmers who develop smart contracts often encounter challenges in reusing code from repositories. This is due to the presence of two unknowns that can lead to non-functional and functional failures. These unknowns are implicit collaborations between functions and subtle differences among similar functions. Current code mining methods can extract syntax and semantic knowledge (known knowledge), but they cannot uncover these unknowns due to a significant gap between the known and the unknown. To address this issue, we formulate knowledge acquisition as a knowledge deduction task and propose an analytic flow that uses function clones as a bridge to gradually deduce the known knowledge into problem-solving knowledge that can reveal the unknowns. This flow comprises five methods: clone detection, co-occurrence probability calculation, function usage frequency accumulation, description propagation, and CFG annotation, providing a systematic and coherent approach to knowledge deduction. We then structure all the knowledge into a semantic-enriched code Knowledge Graph (KG) and integrate this KG into two software engineering tasks: code recommendation and crowd-scaled coding practice checking. As a proof of concept, we apply our approach to 5,140 smart contract files available on Etherscan.io and confirm the high accuracy of our KG construction steps. In our experiments, our code KG improved code recommendation accuracy by 6% to 45%, increased diversity by 61% to 102%, and enhanced NDCG by 1% to 21%. Furthermore, compared to traditional analysis tools and the debugging-with-the-crowd method, our KG improved time efficiency by 30-380 seconds, vulnerability determination accuracy by 20%-33%, and vulnerability fixing accuracy by 24%-40% for novice developers who identified and fixed vulnerable smart contract functions.
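The abstract lists co-occurrence probability calculation as one step but does not define it. A plausible reading, assumed here, is the conditional probability that function g appears in a file given that function f does. The sketch counts this over files represented as sets of (hypothetical) function names.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_prob(files: list[set[str]]) -> dict[tuple[str, str], float]:
    """Estimate P(g | f): fraction of files containing f that also contain g."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for funcs in files:
        fs = set(funcs)
        single.update(fs)                   # files containing each function
        pair.update(permutations(fs, 2))    # ordered (f, g) pairs in the same file
    return {(f, g): n / single[f] for (f, g), n in pair.items()}
```

For the files `{"a", "b"}`, `{"a", "c"}`, `{"a", "b"}`, this gives P(b | a) = 2/3 and P(a | b) = 1.0; the asymmetry is what makes such probabilities useful for suggesting functions that usually accompany one another.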
Publisher: Springer Science and Business Media LLC
Date: 28-07-2017
Publisher: ACM
Date: 28-05-2018
Publisher: Elsevier BV
Date: 10-2017
Publisher: ACM
Date: 10-10-2022
Publisher: Springer Science and Business Media LLC
Date: 09-2016
Publisher: Springer Science and Business Media LLC
Date: 03-08-2014
Publisher: Springer Science and Business Media LLC
Date: 2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 03-2022
Publisher: IEEE
Date: 03-2013
DOI: 10.1109/CSMR.2013.43
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2023
Publisher: IEEE
Date: 10-2017
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: ACM
Date: 21-08-2017
Publisher: Association for Computing Machinery (ACM)
Date: 24-12-2021
DOI: 10.1145/3488245
Abstract: The selfdestruct function is provided by Ethereum smart contracts to destroy a contract on the blockchain system. However, it is a double-edged sword for developers. On the one hand, using the selfdestruct function enables developers to remove smart contracts (SCs) from Ethereum and transfer Ether when emergency situations happen, e.g., being attacked. On the other hand, this function can increase development complexity and open an attack vector for attackers. To better understand why SC developers include or exclude the selfdestruct function in their contracts, we conducted an online survey to collect feedback from them and summarized the key reasons. Their feedback shows that 66.67% of the developers will deploy an updated contract to Ethereum after destructing the old contract. Based on this information, we propose a method to find the self-destructed contracts (also called predecessor contracts) and their updated versions (successor contracts) by computing code similarity. By analyzing the differences between the predecessor contracts and their successor contracts, we found five reasons that led to the death of the contracts; two of them (i.e., Unmatched ERC20 Token and Limits of Permission) might affect the life span of contracts. We developed a tool named LifeScope to detect these problems. LifeScope reports 0 false positives or negatives in detecting Unmatched ERC20 Token. In terms of Limits of Permission, LifeScope achieves an average F-measure of 77.89% and an average AUC of 0.8673. Based on the feedback of developers who exclude selfdestruct functions, we propose suggestions to help developers better use selfdestruct functions in Ethereum smart contracts.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2021
Publisher: Springer Science and Business Media LLC
Date: 20-04-2018
Publisher: IEEE
Date: 10-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2021
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: IEEE
Date: 11-2017
DOI: 10.1109/ESEM.2017.48
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: Association for Computing Machinery (ACM)
Date: 31-12-2020
DOI: 10.1145/3412845
Abstract: Software developers have heavily used online question-and-answer platforms to seek help solving their technical problems. However, a major problem with these technical Q&A sites is “answer hungriness,” i.e., a large number of questions remain unanswered or unresolved, and users have to wait for a long time or painstakingly go through the provided answers of various levels of quality. To alleviate this time-consuming problem, we propose DeepAns, a novel neural network–based approach to identify the most relevant answer among a set of answer candidates. Our approach follows a three-stage process: question boosting, label establishment, and answer recommendation. Given a post, we first generate a clarifying question as a way of question boosting. We automatically establish the positive, neutral+, neutral-, and negative training samples via label establishment. When it comes to answer recommendation, we sort answer candidates by the matching scores calculated by our neural network–based model. To evaluate the performance of our proposed model, we conducted a large-scale evaluation on four datasets collected from real-world technical Q&A sites (i.e., Ask Ubuntu, Super User, Stack Overflow Python, and Stack Overflow Java). Our experimental results show that our approach significantly outperforms several state-of-the-art baselines in automatic evaluation. We also conducted a user study with 50 solved/unanswered/unresolved questions. The user-study results demonstrate that our approach is effective in solving the answer-hungry problem by recommending the most relevant answers from historical archives.
Publisher: Springer Science and Business Media LLC
Date: 04-11-2017
Publisher: Springer Science and Business Media LLC
Date: 09-05-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2023
Publisher: IEEE
Date: 05-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2020
Publisher: Association for Computing Machinery (ACM)
Date: 31-01-2022
DOI: 10.1145/3505243
Abstract: In 2006, Geoffrey Hinton proposed the concept of training “Deep Neural Networks (DNNs)” and an improved model training method to break the bottleneck of neural network development. More recently, the introduction of AlphaGo in 2016 demonstrated the powerful learning ability of deep learning and its enormous potential. Deep learning has been increasingly used to develop state-of-the-art software engineering (SE) research tools due to its ability to boost performance for various SE tasks. Many factors, e.g., deep learning model selection, internal structure differences, and model optimization techniques, may have an impact on the performance of DNNs applied in SE. Few works to date focus on summarizing, classifying, and analyzing the application of deep learning techniques in SE. To fill this gap, we performed a survey to analyze the relevant studies published since 2006. We first provide an example to illustrate how deep learning techniques are used in SE. We then conduct a background analysis (BA) of primary studies to describe the trend of DNNs used in SE, and present four research questions: we summarize and classify different deep learning techniques (RQ1) and analyze the data processing, including data collection, data classification, data pre-processing, and data representation (RQ2). In RQ3, we depict a range of key research topics using DNNs and investigate the relationships between DL-based model adoption and multiple factors (i.e., DL architectures, task types, problem types, and data types). We also summarize commonly used datasets for different SE tasks. In RQ4, we summarize the widely used optimization algorithms and provide important evaluation metrics for different problem types, including regression, classification, recommendation, and generation.
Based on our findings, we present a set of current challenges remaining to be investigated and outline a proposed research road map highlighting key opportunities for future work.
Publisher: Association for Computing Machinery (ACM)
Date: 22-08-2022
DOI: 10.1145/3508479
Abstract: Change-level defect prediction is widely referred to as just-in-time (JIT) defect prediction since it identifies a defect-inducing change at check-in time, and researchers have proposed many approaches based on language-independent change-level features. These approaches can be divided into two types, supervised and unsupervised, and their effectiveness has been verified on Java or C++ projects. However, whether the language-independent change-level features can effectively identify the defects of JavaScript projects is still unknown. Additionally, many studies have confirmed that supervised approaches outperform unsupervised approaches on Java or C++ projects when considering inspection effort. However, whether supervised JIT defect prediction approaches still perform best on JavaScript projects is unknown. Lastly, the previously proposed change-level features are programming language–independent, and whether programming language–specific change-level features can further improve the performance of JIT approaches in identifying defect-prone changes is also unknown. To address this gap in knowledge, in this article, we collect and label the top-20 most starred JavaScript projects on GitHub. JavaScript is an extremely popular and widely used programming language in the industry.
We propose five JavaScript-specific change-level features and conduct a large-scale empirical study (involving a total of 176,902 changes), finding that (1) supervised JIT defect prediction approaches (i.e., CBS+) still statistically significantly outperform unsupervised approaches on JavaScript projects when considering inspection effort; (2) JavaScript-specific change-level features can further improve the performance of approaches built with language-independent features in identifying defect-prone changes; (3) the change-level features in the dimensions of size (i.e., LT), diffusion (i.e., NF), and JavaScript-specific (i.e., SO and TC) are the most important features for indicating the defect-proneness of a change on JavaScript projects; and (4) project-related features (i.e., Stars, Branches, Def Ratio, Changes, Files, Defective, and Forks) have a high association with the probability of a change being defect-prone on JavaScript projects.
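"Considering inspection effort" is often operationalized in the JIT literature as the recall of defect-inducing changes found when inspecting changes in ranked order until a fixed share of the total effort is spent. The sketch below assumes churn (lines added plus deleted) as the effort proxy and a 20% budget; both are common conventions assumed here, not necessarily the article's exact setup, and the scores would come from any supervised or unsupervised model.

```python
def recall_at_effort(changes: list[tuple[int, bool]], scores: list[float],
                     effort_ratio: float = 0.2) -> float:
    """Recall of defective changes found within effort_ratio of total churn.

    changes: (churn, is_defective) per change; scores: model risk score per change.
    Changes are inspected in descending score order until the next one would
    exceed the effort budget.
    """
    total_churn = sum(churn for churn, _ in changes)
    total_defects = sum(1 for _, defective in changes if defective)
    budget = effort_ratio * total_churn
    order = sorted(range(len(changes)), key=lambda i: scores[i], reverse=True)
    spent, found = 0, 0
    for i in order:
        churn, defective = changes[i]
        if spent + churn > budget:
            break
        spent += churn
        found += int(defective)
    return found / total_defects if total_defects else 0.0
```

With hypothetical changes `[(10, True), (50, False), (20, True), (20, False)]` and scores `[0.9, 0.8, 0.7, 0.1]`, the 20-line budget covers only the first change, so recall is 0.5. Stopping at the first change that exceeds the budget is one convention; some studies instead skip it and keep filling the budget.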
Publisher: Springer Science and Business Media LLC
Date: 31-05-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 11-2020
Publisher: IEEE
Date: 12-2015
Publisher: Association for Computing Machinery (ACM)
Date: 30-03-2023
DOI: 10.1145/3546066
Abstract: With the rapid increase of public code repositories, developers have a strong desire to retrieve precise code snippets using natural language. Although existing deep learning-based approaches provide end-to-end solutions (i.e., accept natural language as queries and show related code fragments), the accuracy of code search in large-scale repositories is still low because of the code representation (e.g., AST) and modeling (e.g., directly fusing features in the attention stage). In this paper, we propose a novel learnable deep Graph for Code Search (called deGraphCS) to transfer source code into variable-based flow graphs based on an intermediate representation technique, which can model code semantics more precisely than directly processing the code as text or using the syntax tree representation. Furthermore, we propose a graph optimization mechanism to refine the code representation and apply an improved gated graph neural network to model variable-based flow graphs. To evaluate the effectiveness of deGraphCS, we collect a large-scale dataset from GitHub containing 41,152 code snippets written in the C language and reproduce several typical deep code search methods for comparison. The experimental results show that deGraphCS can achieve state-of-the-art performance and accurately retrieve code snippets satisfying the needs of the users.
Publisher: IEEE
Date: 07-2018
Publisher: Springer Science and Business Media LLC
Date: 19-01-2017
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 02-2023
Publisher: IEEE
Date: 03-2017
DOI: 10.1109/ICST.2017.16
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2021
Publisher: AICIT
Date: 31-05-2012
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2018
Publisher: ACM
Date: 03-09-2018
Publisher: Springer Science and Business Media LLC
Date: 09-04-2017
Publisher: Springer Berlin Heidelberg
Date: 2012
Publisher: Association for Computing Machinery (ACM)
Date: 09-04-2022
DOI: 10.1145/3503509
Abstract: Predictive models are among the most important techniques in software engineering and are widely applied across many of its areas. A large number of primary studies have applied predictive models and reported strong results in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This article is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application areas, and analyze research results. Based on our findings, we also identify a set of current challenges that still need to be addressed in future work and propose a research road map for these opportunities.
Publisher: Association for Computing Machinery (ACM)
Date: 28-07-2022
DOI: 10.1145/3503508
Abstract: Being able to access software in daily life is vital for everyone, and thus accessibility is a fundamental challenge for software development. However, given the number of accessibility issues reported by users (e.g., in app reviews), it is not clear whether accessibility is widely integrated into current software projects or how software projects address accessibility issues. In this article, we report a study of the critical challenges and benefits of incorporating accessibility into software development and design. We applied a mixed qualitative and quantitative approach, gathering data from 15 interviews and 365 survey respondents from 26 countries across five continents, to understand how practitioners perceive accessibility development and design in practice. We derived 44 statements, grouped into eight topics, on accessibility from practitioners' viewpoints and across different software development stages. Our statistical analysis reveals substantial gaps between groups when they reviewed the summarized statements, e.g., between practitioners with direct and those with indirect accessibility-relevant work experience. These gaps might hinder the quality of accessibility development and design, and we use our findings to establish a set of guidelines to help practitioners become aware of accessibility challenges and benefit factors. We suggest that development teams treat accessibility as a first-class consideration throughout the software development process, and we also propose some remedies to resolve the gaps between groups and highlight key future research directions for incorporating accessibility into software design and development.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 08-2020
Publisher: IEEE
Date: 11-2017
DOI: 10.1109/SATE.2017.16
Publisher: ACM
Date: 14-05-2016
Publisher: ACM
Date: 03-09-2018
Publisher: ACM
Date: 14-05-2016
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 12-2016
Publisher: ACM
Date: 23-09-2017
Publisher: Elsevier BV
Date: 10-2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2022
Publisher: IEEE
Date: 06-2016
Publisher: Springer Science and Business Media LLC
Date: 09-2015
Publisher: ACM
Date: 28-10-2011
Publisher: Association for Computing Machinery (ACM)
Date: 26-09-2020
DOI: 10.1145/3401026
Abstract: Stack Overflow has been heavily used by software developers as a popular way to seek programming-related information from peers via the internet. The Stack Overflow community recommends that users provide the related code snippet when creating a question, to help others better understand it and offer help. Previous studies have shown that a significant number of these questions are of low quality and unattractive to other potential experts on Stack Overflow. Such poorly asked questions are less likely to receive useful answers and hinder the overall process of knowledge generation and sharing. One reason for low-quality questions on Stack Overflow is that many developers may be unable to clarify and summarize the key problems behind their code snippets, owing to a lack of knowledge and terminology related to the problem and/or poor writing skills. In this study, we therefore propose an approach to assist developers in writing high-quality questions by automatically generating question titles for a code snippet using a deep sequence-to-sequence learning approach. Our approach is fully data-driven and uses an attention mechanism to perform better content selection, a copy mechanism to handle the rare-word problem, and a coverage mechanism to eliminate the word-repetition problem. We evaluate our approach on Stack Overflow datasets covering a variety of programming languages (e.g., Python, Java, JavaScript, C#, and SQL), and our experimental results show that our approach significantly outperforms several state-of-the-art baselines in both automatic and human evaluation. We have released our code and datasets to help other researchers verify their ideas and to inspire follow-up work.
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2021
Publisher: Springer Science and Business Media LLC
Date: 03-11-2021
Publisher: Springer Science and Business Media LLC
Date: 15-02-2018
Publisher: Association for Computing Machinery (ACM)
Date: 26-09-2020
DOI: 10.1145/3409331
Abstract: Learning representations of source code is a foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural information of programs. Although abstract syntax tree (AST)-based neural models can handle the tree structure of source code, they cannot capture the richness of the different types of substructure in programs. In this article, we propose a modular tree network that dynamically composes different neural network units into tree structures based on the input AST. Unlike previous tree-structured neural network models, a modular tree network can capture the semantic differences between types of AST substructures. We evaluate our model on two tasks: program classification and code clone detection. Our model achieves the best performance compared with state-of-the-art approaches on both tasks, showing the advantage of leveraging more elaborate structural information from source code.
Publisher: IEEE
Date: 11-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 05-2020
Publisher: Springer Science and Business Media LLC
Date: 26-08-2016
Publisher: Elsevier BV
Date: 07-2017
Publisher: IEEE
Date: 09-2017
Publisher: ACM
Date: 28-05-2018
Publisher: ACM
Date: 24-03-2014
Publisher: IEEE
Date: 02-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2202
Publisher: ACM
Date: 27-04-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2016
Publisher: IEEE
Date: 02-2014
Publisher: Springer Science and Business Media LLC
Date: 27-10-2018
Publisher: IEEE
Date: 05-2017
DOI: 10.1109/ICPC.2017.28
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2022
Publisher: IEEE
Date: 03-2015
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2022
Publisher: Wiley
Date: 03-2015
DOI: 10.1002/SMR.1706
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 07-2022
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 04-2094
Publisher: IEEE
Date: 09-2015
Publisher: IEEE
Date: 08-2015
DOI: 10.1109/QRS.2015.14
Publisher: IEEE
Date: 10-2013
Publisher: Springer Science and Business Media LLC
Date: 06-09-2014
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 06-2021
Publisher: ACM
Date: 23-09-2017
Publisher: IEEE
Date: 11-2016
DOI: 10.1109/ASEW.2015.35
Publisher: Elsevier BV
Date: 03-2019
Publisher: IEEE
Date: 07-2017
Publisher: ACM
Date: 08-09-2016
Publisher: ACM
Date: 24-03-2014
Publisher: Elsevier BV
Date: 06-2019
Publisher: ACM
Date: 31-05-2014
Publisher: IEEE
Date: 06-2016
Start Date: 2020
End Date: 05-2021
Amount: $413,665.00
Funder: Australian Research Council
Start Date: 05-2020
End Date: 12-2023
Amount: $390,000.00
Funder: Australian Research Council