From: Erlend Kristiansen Date: Fri, 25 Apr 2014 12:23:12 +0000 (+0200) Subject: Thesis: adding motivation X-Git-Url: http://git.uio.no/git/?a=commitdiff_plain;h=97ae6df14f9a902f0c75927e7f58aa714a08bd99;p=ifi-stolz-refaktor.git Thesis: adding motivation --- diff --git a/thesis/.gitignore b/thesis/.gitignore index a3908b6d..1deee966 100644 --- a/thesis/.gitignore +++ b/thesis/.gitignore @@ -17,3 +17,4 @@ *.pyg *.ist *.bcf +*.lol diff --git a/thesis/master-thesis-erlenkr.tex b/thesis/master-thesis-erlenkr.tex index e86d0419..fd5369a6 100644 --- a/thesis/master-thesis-erlenkr.tex +++ b/thesis/master-thesis-erlenkr.tex @@ -166,6 +166,10 @@ identifies its participators and how they collaborate}, \DefineBibliographyStrings{english}{% bibliography = {References}, } +\newbibmacro{string+doi}[1]{% + \iffieldundef{doi}{#1}{\href{http://dx.doi.org/\thefield{doi}}{#1}}} +\DeclareFieldFormat{title}{\usebibmacro{string+doi}{\mkbibemph{#1}}} +\DeclareFieldFormat[article]{title}{\usebibmacro{string+doi}{\mkbibquote{#1}}} % UML comment in TikZ: % ref: https://tex.stackexchange.com/questions/103688/folded-paper-shape-tikz @@ -225,6 +229,7 @@ identifies its participators and how they collaborate}, \newcolumntype{L}[1]{>{\hsize=#1\hsize\raggedright\arraybackslash}X}% \newcolumntype{R}[1]{>{\hsize=#1\hsize\raggedleft\arraybackslash}X}% + \begin{document} %\pagenumbering{arabic} \mainmatter @@ -241,13 +246,41 @@ Can be done by removing ``draft'' from documentclass.}} \tableofcontents{} \listoffigures{} \listoftables{} +\listoflistings{} %\mainmatter %\setcounter{page}{13} \chapter{Introduction} + \section{Motivation and structure} +For large software projects, complex program source code is an issue. It impacts +the cost of maintenance in a negative way. It often stalls the implementation of +new functionality and other program changes. 
The code may be difficult to +understand, the changes may introduce new bugs that are hard to find and its +complexity can simply keep people from doing code changes for fear of breaking +some dependent piece of code. All these problems are related, and often lead to +a vicious circle that slowly degrades the overall quality of a project. + +More specifically, and in an object-oriented context, a class may depend on a +number of other classes. Sometimes these intimate relationships are appropriate, +and sometimes they are not. Inappropriate \emph{coupling} between classes can +make it difficult to know whether or not a change that is aimed at fixing a +specific problem also alters the behavior of another part of a program. + +One of the tools that are used to fight complexity and coupling in program +source code is \emph{refactoring}. The intention of this master's thesis is +therefore to create an automated composite refactoring that reduces coupling +between classes. The refactoring shall be able to operate automatically in all +phases of a refactoring, from performing analysis to executing changes. It is +also a requirement that it should be able to process large quantities of source +code in a reasonable amount of time. + + +\todoin{Structure. Write later\ldots} + + \section{What is refactoring?} This question is best answered by first defining the concept of a @@ -986,6 +1019,8 @@ tracematch (C c, X x) { \section{The Project} +In this section we look at the work that shall be done for this project, its +building blocks and some of the methodologies used. \subsection{Project description} The aim of this master's project will be to explore the relationship between the @@ -1014,6 +1049,64 @@ as well as executing it over a larger code base, as a case study. To be able to execute the refactoring automatically, I have to make it analyze code to determine the best selections to extract into new methods.
+\subsection{The premises} +Before we can start manipulating source code and writing a tool for doing so, we +need to decide on a programming language for the code we are going to +manipulate. Also, since we do not want to start from scratch by implementing +primitive refactorings ourselves, we need to choose an existing tool that +provides the needed refactorings. In addition to being able to perform changes, we +need a framework for analyzing source code for the language we select. + +\subsubsection{Choosing the target language} +Choosing the programming language of the code that shall be manipulated is not a +very difficult task. We choose to limit the possible +languages to the object-oriented programming languages, since most of the +terminology and literature regarding refactoring comes from the world of +object-oriented programming. In addition, the language must have existing tool +support for refactoring. + +The \name{Java} programming language\footnote{\url{https://www.java.com/}} is +the dominating language when it comes to example code in the literature of +refactoring, and is thus a natural choice. Java is perhaps currently the most +influential programming language in the world, with its \name{Java Virtual +Machine} that runs on all of the most popular architectures and also supports +dozens of other programming languages\footnote{They compile to Java bytecode.}, +with \name{Scala}, \name{Clojure} and \name{Groovy} as the most prominent ones. +Java is currently the language that every other programming language is compared +against. It is also the primary programming language for the author of this +thesis. + +\subsubsection{Choosing the tools} +When choosing a tool for manipulating Java, there are certain criteria that +have to be met. First of all, the tool should have some existing refactoring +support that this thesis can build upon. Secondly, it should provide some kind of +framework for parsing and analyzing Java source code.
Third, it should itself be +open source. This is both because of the need to be able to browse the code for +the existing refactorings that are contained in the tool, and also because open +source projects hold value in themselves. Another important aspect to consider +is that open source projects of a certain size usually have large communities of +people connected to them, who are committed to answering questions regarding the +use and misuse of the products, which to a large degree are made by the community +itself. + +There is a certain class of tools that meet these criteria, namely the class of +\emph{IDEs}\footnote{\emph{Integrated Development Environment}}. These are +programs that are meant to support the whole production cycle of a computer +program, and the most popular IDEs that support Java generally have quite good +refactoring support. + +The main contenders for this thesis are the \name{Eclipse IDE}, with the +\name{Java development tools} (JDT), the \name{IntelliJ IDEA Community Edition} +and the \name{NetBeans IDE} \see{toolSupport}. \name{Eclipse} and +\name{NetBeans} are both free, open source and community driven, while the +\name{IntelliJ IDEA} has an open sourced community edition that is free of +charge, but also offers an \name{Ultimate Edition} with an extended set of +features, at additional cost. All three IDEs support adding plugins to extend +their functionality, and provide tools that can be used to parse and analyze Java source +code. But one of the IDEs stands out as a favorite, and that is the \name{Eclipse +IDE}. This is the most popular\citing{javaReport2011} among them and seems to be +the de facto standard IDE for Java development regardless of platform. + \subsection{The primitive refactorings} The refactorings presented here are the primitive refactorings used in this project.
They are the abstract building blocks used by the \ExtractAndMoveMethod @@ -1183,58 +1276,98 @@ And, assuming the refactoring does in fact improve the quality of source code: usefulness of the refactoring in a software development setting? In what parts of the development process can the refactoring play a role? -\subsection{The premises} -\todoin{Appropriate name?} +\subsection{Methodology} -\subsubsection{Choosing the target language} -Choosing which programming language the code that shall be manipulated shall be -written in, is not a very difficult task. We choose to limit the possible -languages to the object-oriented programming languages, since most of the -terminology and literature regarding refactoring comes from the world of -object-oriented programming. In addition, the language must have existing tool -support for refactoring. +\subsubsection{Evolutionary design} +In the programming work for this project, I have tried to use a design strategy +called evolutionary design, also known as continuous or incremental +design\citing{wiki_continuous_2014}. It is a software design strategy +advocated by the Extreme Programming community. The essence of the strategy is +that you should let the design of your program evolve naturally as your +requirements change. This is seen in contrast with up-front design, where +design decisions are made early in the process. -The \name{Java} programming language\footnote{\url{https://www.java.com/}} is -the dominating language when it comes to example code in the literature of -refactoring, and is thus a natural choice. Java is perhaps, currently the most -influential programming language in the world, with its \name{Java Virtual -Machine} that runs on all of the most popular architectures and also supports -dozens of other programming languages\footnote{They compile to Java bytecode.}, -with \name{Scala}, \name{Clojure} and \name{Groovy} as the most prominent ones.
-Java is currently the language that every other programming language is compared -against. It is also the primary programming language for the author of this -thesis. +The motivation behind evolutionary design is to keep the design of software as +simple as possible. This means not introducing unneeded functionality into a +program. You should defer introducing flexibility into your software until it +is needed to be able to add functionality in a clean way. -\subsubsection{Choosing the tools} -When choosing a tool for manipulating Java, there are certain criteria that -have to be met. First of all, the tool should have some existing refactoring -support that this thesis can build upon. Secondly it should provide some kind of -framework for parsing and analyzing Java source code. Third, it should itself be -open source. This is both because of the need to be able to browse the code for -the existing refactorings that is contained in the tool, and also because open -source projects hold value in them selves. Another important aspect to consider -is that open source projects of a certain size, usually has large communities of -people connected to them, that are committed to answering questions regarding the -use and misuse of the products, that to a large degree is made by the community -itself. +Deferring design decisions implies that the time will eventually come when +decisions have to be made. The flexibility of the design then relies on the +programmer's abilities to perform the necessary refactoring, and \his confidence +in those abilities. From my experience working on this project, I can say that +this confidence is greatly enhanced by having automated tests to rely on +\see{tdd}. -There is a certain class of tools that meet these criteria, namely the class of -\emph{IDEs}\footnote{\emph{Integrated Development Environment}}.
These are -programs that is meant to support the whole production cycle of a computer -program, and the most popular IDEs that support Java, generally have quite good -refactoring support. +The choice of going for evolutionary design developed naturally. As Fowler +points out in his article \tit{Is Design Dead?}, evolutionary design closely +resembles the ``code and fix'' development strategy\citing{fowler_design_2004}, +a strategy that most of us have practiced in school. This was also the case when +I first started this work. I had to learn the inner workings of Eclipse and its +refactoring-related plugins. That meant a lot of fumbling around with code I did +not know, in a trial and error fashion. Eventually I started writing tests for +my code, and my design began to evolve. + +\subsubsection{Test-driven development}\label{tdd} +As mentioned before, the project started out as a classic code and fix +development process. My focus was aimed at getting something to work, rather than +doing so according to best practice. This resulted in a project that got out of +its starting blocks, but it was not accompanied by any tests. Hence it was soon +difficult to make any code changes with the confidence that the program was +still correct afterwards (assuming it was so before changing it). I always knew +that I had to introduce some tests at some point, but this experience accelerated +the process of leading me onto the path of testing. + +I then wrote tests for the core functionality of the plugin, and thus gained +more confidence in the correctness of my code. I could now perform quite drastic +changes without ``wetting my pants''. After this, nearly all of the semantic +changes to the business logic of the project, and additions of new +functionality, were made in a test-driven manner. This means that before +performing any changes, I would define the desired functionality through a set +of tests.
I would then run the tests to check that they were run and that they +did not pass. Then I would make any code changes necessary to make the tests +pass. The definition of how the program is supposed to operate is then captured +by the tests. However, this does not prove the correctness of the analysis +leading to the test definitions. + +\subsubsection{Continuous integration} +\todoin{???} + +\section{Related Work} + +\subsection{Safer refactorings} +\todoin{write} + +\subsection{The compositional paradigm of refactoring} +This paradigm builds upon the observation of Vakilian et +al.\citing{vakilian2012}, that of the many automated refactorings existing in +modern IDEs, the simplest ones are dominating the usage statistics. The report +mainly focuses on \name{Eclipse} as the tool under investigation. + +The paradigm is described almost as the opposite of automated composition of +refactorings \see{compositeRefactorings}. It works by providing the programmer +with easily accessible primitive refactorings. These refactorings shall be +accessed via keyboard shortcuts or quick-assist menus\footnote{Think +quick-assist with Ctrl+1 in \name{Eclipse}} and be promptly executed, as opposed to in the +currently dominating wizard-based refactoring paradigm. They are meant to +stimulate composing smaller refactorings into more complex changes, rather than +doing a large upfront configuration of a wizard-based refactoring, before +previewing and executing it. The compositional paradigm of refactoring is +supposed to give control back to the programmer, by supporting \himher with an +option of performing small rapid changes instead of large changes with a lesser +degree of control. The report authors hope this will lead to fewer unsuccessful +refactorings. It could also lower the bar for understanding the steps of a +larger composite refactoring and thus also help in figuring out what goes wrong +if one should choose to opt in to a wizard-based refactoring.
+ +Vakilian and his associates have performed a survey of the effectiveness of the +compositional paradigm versus the wizard-based one. They claim to have found +evidence that the \emph{compositional paradigm} outperforms the +\emph{wizard-based} one. It does so by reducing automation, which seems +counterintuitive. Therefore they ask the question ``What is an appropriate level +of automation?'', and thus question what they feel is a rush toward more +automation in the software engineering community. -The main contenders for this thesis is the \name{Eclipse IDE}, with the -\name{Java development tools} (JDT), the \name{IntelliJ IDEA Community Edition} -and the \name{NetBeans IDE} \see{toolSupport}. \name{Eclipse} and -\name{NetBeans} are both free, open source and community driven, while the -\name{IntelliJ IDEA} has an open sourced community edition that is free of -charge, but also offer an \name{Ultimate Edition} with an extended set of -features, at additional cost. All three IDEs supports adding plugins to extend -their functionality and tools that can be used to parse and analyze Java source -code. But one of the IDEs stand out as a favorite, and that is the \name{Eclipse -IDE}. This is the most popular\citing{javaReport2011} among them and seems to be -de facto standard IDE for Java development regardless of platform. \chapter{The search-based Extract and Move Method refactoring} @@ -4543,105 +4676,12 @@ while, before they were solved. This is reflected in the ``Test Result Trend'' and ``Code Coverage Trend'' reported by Jenkins. -\chapter{Methodology} - -\section{Evolutionary design} -In the programming work for this project, it have tried to use a design strategy -called evolutionary design, also known as continuous or incremental -design\citing{wiki_continuous_2014}. It is a software design strategy -advocated by the Extreme Programming community.
The essence of the strategy is -that you should let the design of your program evolve naturally as your -requirements change. This is seen in contrast with up-front design, where -design decisions are made early in the process. - -The motivation behind evolutionary design is to keep the design of software as -simple as possible. This means not introducing unneeded functionality into a -program. You should defer introducing flexibility into your software, until it -is needed to be able to add functionality in a clean way. - -Holding up design decisions, implies that the time will eventually come when -decisions have to be made. The flexibility of the design then relies on the -programmer's abilities to perform the necessary refactoring, and \his confidence -in those abilities. From my experience working on this project, I can say that -this confidence is greatly enhanced by having automated tests to rely on -\see{tdd}. - -The choice of going for evolutionary design developed naturally. As Fowler -points out in his article \tit{Is Design Dead?}, evolutionary design much -resembles the ``code and fix'' development strategy\citing{fowler_design_2004}. -A strategy that most of us have practiced in school. This was also the case when -I first started this work. I had to learn the inner workings of Eclipse and its -refactoring-related plugins. That meant a lot of fumbling around with code I did -not know, in a trial and error fashion. Eventually I started writing tests for -my code, and my design began to evolve. - -\section{Test-driven development}\label{tdd} -As mentioned before, the project started out as a classic code and fix -developmen process. My focus was aimed at getting something to work, rather than -doing so according to best practice. This resulted in a project that got out of -its starting blocks, but it was not accompanied by any tests. 
Hence it was soon -difficult to make any code changes with the confidence that the program was -still correct afterwards (assuming it was so before changing it). I always knew -that I had to introduce some tests at one point, but this experience accelerated -the process of leading me onto the path of testing. - -I then wrote tests for the core functionality of the plugin, and thus gained -more confidence in the correctness of my code. I could now perform quite drastic -changes without ``wetting my pants``. After this, nearly all of the semantic -changes done to the business logic of the project, or the addition of new -functionality, was made in a test-driven manner. This means that before -performing any changes, I would define the desired functionality through a set -of tests. I would then run the tests to check that they were run and that they -did not pass. Then I would do any code changes necessary to make the tests -pass. The definition of how the program is supposed to operate is then captured -by the tests. However, this does not prove the correctness of the analysis -leading to the test definitions. - -\section{Continuous integration} -\todoin{???} - \chapter{Conclusions and Future Work} \todoin{Write} \section{Future work} -\chapter{Related Work} - -\section{Safer refactorings} -\todoin{write} - -\section{The compositional paradigm of refactoring} -This paradigm builds upon the observation of Vakilian et -al.\citing{vakilian2012}, that of the many automated refactorings existing in -modern IDEs, the simplest ones are dominating the usage statistics. The report -mainly focuses on \name{Eclipse} as the tool under investigation. - -The paradigm is described almost as the opposite of automated composition of -refactorings \see{compositeRefactorings}. It works by providing the programmer -with easily accessible primitive refactorings. 
These refactorings shall be -accessed via keyboard shortcuts or quick-assist menus\footnote{Think -quick-assist with Ctrl+1 in \name{Eclipse}} and be promptly executed, opposed to in the -currently dominating wizard-based refactoring paradigm. They are meant to -stimulate composing smaller refactorings into more complex changes, rather than -doing a large upfront configuration of a wizard-based refactoring, before -previewing and executing it. The compositional paradigm of refactoring is -supposed to give control back to the programmer, by supporting \himher with an -option of performing small rapid changes instead of large changes with a lesser -degree of control. The report authors hope this will lead to fewer unsuccessful -refactorings. It also could lower the bar for understanding the steps of a -larger composite refactoring and thus also help in figuring out what goes wrong -if one should choose to op in on a wizard-based refactoring. - -Vakilian and his associates have performed a survey of the effectiveness of the -compositional paradigm versus the wizard-based one. They claim to have found -evidence of that the \emph{compositional paradigm} outperforms the -\emph{wizard-based}. It does so by reducing automation, which seem -counterintuitive. Therefore they ask the question ``What is an appropriate level -of automation?'', and thus questions what they feel is a rush toward more -automation in the software engineering community. - - \appendix