Empirical Software Engineering and Data Science:

Old Wine in a New Bottle

15th International Advanced School on Empirical Software Engineering (IASESE 2018), co-located with ESEIW 2018

The hype around data science has grown significantly in recent years. Gartner has characterized data science as a domain generating excitement and inflated expectations. In 2017, it was predicted that data science would drive significant change and be widely adopted by academia and industry. In software engineering in particular, data science ranges from applying statistical tests to heavyweight machine learning techniques on data mined from software-related repositories.

Data science for software engineering research is still in an early phase. Empirical software engineering has focused on the unbiased collection and analysis of software engineering data. The interplay between these two fields, and between the activities each advocates in software engineering, has been limited. There are no established guidelines or insights on the role of empirical software engineering relative to data science and to mining software repositories, or on what they have in common and how they differ. The communities have little overlap in participants, even though their contributions address very similar subjects. There is a perception that the empirical software engineering community is constrained by some outdated expectations, while, conversely, data science studies in software engineering often have flaws in empirical design and reasoning.

The goal of this school is to elaborate on the synergies between data science, mining software repositories, and empirical software engineering. The intention is to give participants a broader view of data collection, data visualization, data mining, and data sharing, and of how to use these techniques in successful software projects.

Objectives

In this edition of the International Advanced School on Empirical Software Engineering, we will show how empirical software engineering is (or should be) involved in each data science activity. While we welcome initiatives from our invited speakers, the school will cover the following aspects:

  • Introduction to data science in software engineering.

  • Introduction to relevant empirical methods and instruments and their use to support data science in software engineering.

  • The empirical software engineering process in data science activities such as retrieving and preparing data, as well as data analysis, experimentation, validation, and dissemination of insights.

  • Hands-on data session: introducing a data set and specific empirical methods, then walking through the steps a data scientist takes, to show what differentiates mining software repositories from running a fully fledged empirical study.

  • Industry-academia collaboration: what software engineering can offer industry for developing a data engineering platform, educating data scientists, and shaping the next generation of software engineering research.

The presentations will combine theoretical background, examples of successful projects, and case studies that participants are encouraged to bring to the general discussion. We plan to initiate group work to design and discuss practical use cases for combining empirical software engineering and data science activities in software development. Finally, we plan to organize a panel with experts in the field to discuss new trends in the interplay between empirical software engineering and data science.

As a result of this one-day advanced school, we plan to submit a round-table article with our speakers to IEEE Software.

Who should attend?

Anyone with a basic knowledge of software engineering who is interested in expanding their repertoire of methods for designing studies, collecting data, and analyzing data. Basic concepts of experimental design and software engineering measurement will not be covered.

What will participants take away?

Participants will gain a basic knowledge of new techniques that they may not have been aware of or applied before. They will also gain a general idea of when these techniques are appropriate (and when not), pointers to resources, and guidance on how to get started if they wish to apply the techniques on their own.

Program

07:30-08:00     Registration

08:00-08:15     Welcome, where we will present the objectives of the school, provide a working definition of data science, etc., by Maleknaz Nayebi & Silvia Abrahão

Session I: Foundations of Data Science and Empirical Software Engineering

Where we will provide the basic theoretical concepts of data science in software engineering and relevant empirical methods and instruments to support data science activities.

08:15-09:15     Empirical SE == MSR? by Bram Adams, École Polytechnique de Montréal, Canada

09:15-10:00     The Art and Science of Analysing Software Data by Leandro Minku, University of Birmingham, UK

10:00-10:30     Coffee Break

Session II: Applications of Data Science and Empirical Software Engineering

Where we will provide examples and evidence of successful projects applying data science and empirical methods.

10:30-11:15    Building evidence-based guidelines: the role of emotions in Stack Overflow by Filippo Lanubile, University of Bari, Italy

11:15-12:00     The next generation of Applied Empirical Software Engineering Research by Tony Gorschek, Blekinge Institute of Technology, Sweden

12:00-13:30    Lunch Break

Session III: Hands-on Data & Decision Making

Where we will introduce a data set and specific empirical methods, walk through the steps a data scientist takes, and show what differentiates mining software repositories from running a fully fledged empirical study. Bring your own laptop!

13:30-14:15     (CANCELLED) From mining to planning for mobile apps: let’s mine together! by Maleknaz Nayebi, University of Toronto, Canada

14:15-15:00     What counts is decisions, not numbers! (to be confirmed) by Guenther Ruhe, University of Calgary, Canada

15:00-15:30     Coffee Break

Session IV: Panel

Where we will elaborate on the synergies between data science, mining software repositories, and empirical software engineering.

15:30-16:30     Panel: Empirical Software Engineering and Data Science: Old Wine in a New Bottle?

Panelists: Bram Adams, Tony Gorschek, Filippo Lanubile, Leandro Minku, Guenther Ruhe

Moderators: Maleknaz Nayebi & Silvia Abrahão

16:30-16:50     Wrap-up & future actions

As outlined above, data science for software engineering research is still in an early phase. Its interplay with empirical software engineering (ESE) and mining software repositories (MSR) has been limited, and there are no established guidelines on their respective roles, commonalities, and differences. It is now time to reflect on the lessons we have learned over these years, to revisit the role of ESE and data science in software engineering, and to discuss future directions and opportunities. The panelists will be asked to reflect on four questions:

a) Empirical Software Engineering and Data Science: What are the similarities? What are the differences? How do the two differ in terms of data, models, and tools?

b) What can the MSR and ESE disciplines learn from each other? Do you think the two communities will become more divergent in the near future?

c) From your point of view, what are the main domains where empirical/data science research can play a key role in the coming years (e.g., cyber-physical systems, cybersecurity, Internet of Things, smart cities, big data, mobile computing, self-driving cars, Industry 4.0, etc.)? What role do you foresee for empirical/data science in these domains? How can empirical/data science methods better support research in these domains? Cite one or two.

d) What are, from your perspective, the new and most promising research directions for ESE/Data Science in the short term? (And who should fund such research? How can we convince large IT companies to invest in ESE/Data Science research?)

Organization

Maleknaz Nayebi, University of Toronto, Canada.

Silvia Abrahão, Universitat Politècnica de València, Spain.