ORCID Profile
0000-0001-7648-285X
Current Organisation
Bond University
Publisher: Springer Netherlands
Date: 2014
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Science and Business Media LLC
Date: 14-08-2021
DOI: 10.1186/S40537-021-00499-7
Abstract: This article proposes a new parallel performance model for different workloads of Spark Big Data applications running on Hadoop clusters. The proposed model can predict the runtime for generic workloads as a function of the number of executors, without necessarily knowing how the algorithms were implemented. For a certain problem size, it is shown that a model based on serial boundaries for a 2D arrangement of executors can fit the empirical data for various workloads. The empirical data was obtained from a real Hadoop cluster, using Spark and HiBench. The workloads used in this work included WordCount, SVM, Kmeans, PageRank and Graph (Nweight). A particular runtime pattern emerged when adding more executors to run a job. For some workloads, the runtime was longer with more executors added. This phenomenon is predicted by the new model of parallelisation. The resulting equation from the model explains certain performance patterns that fit neither Amdahl’s law predictions nor Gustafson’s equation. The results show that the proposed model achieved the best fit with all workloads and most of the data sizes, using the R-squared metric for the accuracy of the fitting of empirical data. The proposed model has advantages over machine learning models due to its simplicity, requiring a smaller number of experiments to fit the data. This is very useful to practitioners in the area of Big Data because they can predict the runtime of specific applications by analysing the logs. In this work, the model is limited to changes in the number of executors for a fixed problem size.
Publisher: Frontiers Media SA
Date: 21-05-2019
Publisher: Elsevier BV
Date: 12-2021
Publisher: IEEE
Date: 2008
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: MDPI AG
Date: 05-11-2021
DOI: 10.3390/BDCC5040065
Abstract: Big data frameworks play a vital role in storing, processing, and analysing large datasets. Apache Spark has been established as one of the most popular big data engines for its efficiency and reliability. However, one of the significant problems of the Spark system is performance prediction. Spark has more than 150 configurable parameters, and configuring so many parameters is a challenging task when determining the suitable parameters for the system. In this paper, we propose two distinct parallelisation models for performance prediction. Our insight is that each node in a Hadoop cluster can communicate with identical nodes, and a certain function of the non-parallelisable runtime can be estimated accordingly. Both models use simple equations that allow us to predict the runtime when the size of the job and the number of executables are known. The proposed models were evaluated on five HiBench workloads: Kmeans, PageRank, Graph (NWeight), SVM, and WordCount. The workloads’ empirical data were fitted with one of the two models meeting the accuracy requirements. Finally, the experimental findings show that the model can be a handy and helpful tool for scheduling and planning system deployment.
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2008
Publisher: Inderscience Publishers
Date: 2009
Publisher: IEEE
Date: 11-2015
Publisher: Springer International Publishing
Date: 2015
Publisher: IEEE
Date: 07-2018
Publisher: Springer Berlin Heidelberg
Date: 2013
Publisher: Springer International Publishing
Date: 09-07-2201
Publisher: IEEE
Date: 16-12-2020
Publisher: SPIE-Intl Soc Optical Eng
Date: 07-08-2018
Publisher: Elsevier BV
Date: 05-1997
Publisher: Elsevier BV
Date: 06-2013
Publisher: Springer Science and Business Media LLC
Date: 15-08-2023
DOI: 10.1007/S11423-023-10277-2
Abstract: An important course in the computer science discipline is ‘Data Structures and Algorithms’ (DSA). The coursework lays emphasis on experiential learning for building students’ programming and algorithmic reasoning abilities. Teachers set up a repertoire of formative programming exercises to engage students with different programmatic scenarios to build their know-what, know-how and know-why competencies. Automated assessment tools can assist teachers in inspecting, marking, and grading programming exercises and also support them in providing students with formative feedback in real time. This article describes the design of a bespoke automarker that was integrated into the DSA coursework and therefore served as an instructional tool. Activity theory has provided the pedagogical lens to examine how the automarker-mediated instructional strategy enabled self-reflection and assisted students in their formative learning journey. Learner experiences gathered from 39 students enrolled in the DSA course show that the automarker facilitated practice-based learning to advance students’ know-what, know-why and know-how skills. This study contributes to both curricula and pedagogic practice by showcasing the integration of an automated assessment strategy with programming-related coursework to inform future teaching and assessment practice.
Publisher: Springer Science and Business Media LLC
Date: 17-06-2011
Publisher: IEEE
Date: 04-2005
Publisher: IEEE
Date: 11-2013
Publisher: IEEE
Date: 06-2018
Publisher: IGI Global
Date: 2013
DOI: 10.4018/978-1-4666-3942-3.CH006
Abstract: This chapter sets out to explore the intricacies behind developing a hybrid system for real-time autonomous robot navigation, with target pursuit and obstacle avoidance behaviour, in a dynamic environment. Three complete systems are described, namely, a cascade of four fuzzy systems, a hybrid fuzzy A* system, and a hybrid fuzzy A* with a Voronoi diagram. A highly reconfigurable integration architecture is presented, allowing for the harmonious interplay between the different component algorithms, with the option of engaging or disengaging from the system. The utilization of both global and local information about the environment is examined, as well as an additional optimal global path-planning layer. Moreover, the chapter illustrates how a fuzzy system design approach can take advantage of symmetry in the input space, cutting down the number of rules and membership functions without sacrificing control precision. The efficiency of all the algorithms is demonstrated by employing them in a simulation of a real-world system: the robot soccer game. Results indicate that the hybrid system can generate smooth, near-shortest paths, as well as near-shortest-safest paths, when all component algorithms are activated. A systematic approach to calibrating the system is also provided.
Publisher: IEEE
Date: 16-12-2020
Publisher: IEEE
Date: 10-2007
Publisher: Springer International Publishing
Date: 2015
Publisher: ACM
Date: 19-11-2014
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: ACM
Date: 16-03-2008
Publisher: Springer Science and Business Media LLC
Date: 12-2020
DOI: 10.1186/S40537-020-00388-5
Abstract: Big Data analytics for storing, processing, and analyzing large-scale datasets has become an essential tool for the industry. The advent of distributed computing frameworks such as Hadoop and Spark offers efficient solutions to analyze vast amounts of data. Due to the availability of its application programming interface (API) and its performance, Spark has become very popular, even more popular than the MapReduce framework. Both these frameworks have more than 150 parameters, and the combination of these parameters has a massive impact on cluster performance. The default system parameters help system administrators deploy their applications without much effort, and they can measure their specific cluster performance with factory-set parameters. However, an open question remains: can new parameter selection improve cluster performance for large datasets? In this regard, this study investigates the most impactful parameters, under resource utilization, input splits, and shuffle, to compare the performance of Hadoop and Spark, using a cluster implemented in our laboratory. We used a trial-and-error approach for tuning these parameters based on a large number of experiments. For the comparative analysis of the frameworks, we selected two workloads: WordCount and TeraSort. Performance is measured on three criteria: execution time, throughput, and speedup. Our experimental results revealed that the performance of both systems depends heavily on input data size and correct parameter selection. The analysis of the results shows that Spark performs better than Hadoop when data sets are small, achieving up to two times speedup in WordCount workloads and up to 14 times in TeraSort workloads when default parameter values are reconfigured.
Publisher: Springer Berlin Heidelberg
Date: 2010
Publisher: IGI Global
Date: 2013
DOI: 10.4018/978-1-4666-4607-0.CH076
Abstract: This chapter sets out to explore the intricacies behind developing a hybrid system for real-time autonomous robot navigation, with target pursuit and obstacle avoidance behaviour, in a dynamic environment. Three complete systems are described, namely, a cascade of four fuzzy systems, a hybrid fuzzy A* system, and a hybrid fuzzy A* with a Voronoi diagram. A highly reconfigurable integration architecture is presented, allowing for the harmonious interplay between the different component algorithms, with the option of engaging or disengaging from the system. The utilization of both global and local information about the environment is examined, as well as an additional optimal global path-planning layer. Moreover, the chapter illustrates how a fuzzy system design approach can take advantage of symmetry in the input space, cutting down the number of rules and membership functions without sacrificing control precision. The efficiency of all the algorithms is demonstrated by employing them in a simulation of a real-world system: the robot soccer game. Results indicate that the hybrid system can generate smooth, near-shortest paths, as well as near-shortest-safest paths, when all component algorithms are activated. A systematic approach to calibrating the system is also provided.
Publisher: IEEE
Date: 05-2009
Publisher: Springer Berlin Heidelberg
Date: 2011
Publisher: IEEE
Date: 05-2008
Publisher: IEEE
Date: 07-2011
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 2023
Publisher: IEEE
Date: 04-2014
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Springer Berlin Heidelberg
Date: 2011
No related grants have been discovered for Andre Barczak.