]> git.uio.no Git - ifi-stolz-refaktor.git/blame - thesis/master-thesis-erlenkr.tex
Thesis: making it boring
CommitLineData
064227e3 1\documentclass[USenglish,draft]{ifimaster}
571ef294 2\usepackage{import}
7c28933b 3\usepackage[utf8]{inputenc}
9ff90080 4\usepackage[T1]{fontenc,url}
3510e539 5\usepackage{lmodern} % using Latin Modern to be able to use bold typewriter font
9ff90080 6\urlstyle{sf}
571ef294 7\usepackage{babel,textcomp,csquotes,ifimasterforside,varioref,graphicx}
79d5c2a9 8\usepackage[hidelinks]{hyperref}
8b6b22c8 9\usepackage{cleveref}
84fe308b 10\usepackage[style=numeric-comp,backend=bibtex]{biblatex}
12c254af 11\usepackage{amsthm}
064227e3 12\usepackage[obeyDraft]{todonotes}
0d7fbd88
EK
13\usepackage{xspace}
14\usepackage{he-she}
b289552b 15\usepackage{verbatim}
ddcea0b5 16\usepackage{minted}
347ed677 17\usepackage{multicol}
ddcea0b5 18\usemintedstyle{bw}
8fae7b44
EK
19\usepackage{perpage} %the perpage package
20\MakePerPage{footnote} %the perpage package command
9ff90080 21
6c51af15 22\theoremstyle{definition}
12c254af
EK
23\newtheorem*{wordDef}{Definition}
24
ddcea0b5
EK
25\graphicspath{ {./figures/} }
26
8b6b22c8
EK
27\newcommand{\citing}[1]{~\cite{#1}}
28\newcommand{\myref}[1]{\cref{#1} on \cpageref{#1}}
29
12c254af 30\newcommand{\definition}[1]{\begin{wordDef}#1\end{wordDef}}
8b6b22c8
EK
31\newcommand{\see}[1]{(see \myref{#1})}
32\newcommand{\See}[1]{(See \myref{#1}.)}
137e0e7b
EK
33\newcommand{\explanation}[3]{\noindent\textbf{\textit{#1}}\\*\emph{When:}
34#2\\*\emph{How:} #3\\*[-7px]}
4e135659 35
3510e539 36\newcommand{\type}[1]{\texttt{\textbf{#1}}}
f041551b
EK
37\newcommand{\typeref}[1]{\footnote{\type{#1}}}
38\newcommand{\typewithref}[2]{\type{#2}\typeref{#1.#2}}
39\newcommand{\method}[1]{\type{#1}}
40\newcommand{\methodref}[2]{\footnote{\type{#1}\method{\##2()}}}
41\newcommand{\methodwithref}[2]{\method{#2}\footnote{\type{#1}\method{\##2()}}}
3510e539 42\newcommand{\var}[1]{\type{#1}}
9ff90080 43
b5c7bb1b 44\newcommand{\refactoring}[1]{\emph{#1}}
0d7fbd88
EK
45\newcommand{\ExtractMethod}{\refactoring{Extract Method}\xspace}
46\newcommand{\MoveMethod}{\refactoring{Move Method}\xspace}
b5c7bb1b 47
4e135659
EK
48\newcommand\todoin[2][]{\todo[inline, caption={2do}, #1]{
49\begin{minipage}{\textwidth-4pt}#2\end{minipage}}}
50
7c28933b 51\title{Refactoring}
aa1e3779 52\subtitle{An essay}
7c28933b
EK
53\author{Erlend Kristiansen}
54
55\bibliography{bibliography/master-thesis-erlenkr-bibliography}
9ff90080
EK
56
57\begin{document}
531c4132 58\ififorside
9ff90080 59\frontmatter{}
9ff90080
EK
60
61
62\chapter*{Abstract}
064227e3
EK
63\todoin{\textbf{Remove all todos (including list) before delivery/printing!!!
64Can be done by removing ``draft'' from documentclass.}}
889ba93e 65\todoin{Write abstract}
9ff90080
EK
66
67\tableofcontents{}
68\listoffigures{}
69\listoftables{}
70
71\chapter*{Preface}
72
d1adbeef
EK
73The discussions in this report must be seen in the context of object oriented
74programming languages, and Java in particular, since that is the language in
75which most of the examples will be given. Although the techniques discussed
76may be applicable to languages from other paradigms, they will not be the
77subject of this report.
f3a108c3 78
055dca93 79\mainmatter
00aa0588 80
740e1b6c 81\chapter{What is Refactoring?}
7c28933b 82
f3a108c3
EK
83This question is best answered by first defining the concept of a
84\emph{refactoring}, what it is to \emph{refactor}, and then discussing what aspects
a1bafe90 85of programming make people want to refactor their code.
00aa0588 86
740e1b6c 87\section{Defining refactoring}
a1bafe90 88Martin Fowler, in his classic book on refactoring\citing{refactoring}, defines a
00aa0588 89refactoring like this:
ee45c41f 90
00aa0588 91\begin{quote}
aa1e3779
EK
92 \emph{Refactoring} (noun): a change made to the internal
93 structure\footnote{The structure observable by the programmer.} of software to
94 make it easier to understand and cheaper to modify without changing its
95 observable behavior.~\cite[p.~53]{refactoring}
00aa0588 96\end{quote}
ee45c41f 97
a1bafe90 98\noindent This definition assigns additional meaning to the word
ee45c41f
EK
99\emph{refactoring}, beyond the composition of the prefix \emph{re-}, usually
100meaning something like ``again'' or ``anew'', and the word \emph{factoring},
6c51af15
EK
101that can mean to isolate the \emph{factors} of something. Here a \emph{factor}
102would be close to the mathematical definition of something that divides a
103quantity, without leaving a remainder. Fowler is mixing the \emph{motivation}
a1bafe90
EK
104behind refactoring into his definition. Instead it could be more refined, formed
105to only consider the \emph{mechanical} and \emph{behavioral} aspects of
106refactoring. That is to factor the program again, putting it together in a
107different way than before, while preserving the behavior of the program. An
108alternative definition could then be:
6c51af15
EK
109
110\definition{A \emph{refactoring} is a transformation
8fae7b44 111done to a program without altering its external behavior.}
00aa0588 112
ee1d883a
EK
113From this we can conclude that a refactoring primarily changes how the
114\emph{code} of a program is perceived by the \emph{programmer}, and not the
740e1b6c
EK
115\emph{behavior} experienced by any user of the program. Although the logical
116meaning is preserved, such changes could potentially alter the program's
117behavior when it comes to performance gains or penalties. So any logic depending
118on the performance of a program could make the program behave differently after
119a refactoring.
00aa0588 120
137e0e7b 121In the extreme case one could argue that such a thing as \emph{software
472d2dc8
EK
122obfuscation} is refactoring. Software obfuscation is to make source code harder
123to read and analyze, while preserving its semantics. It could be done by composing
124many, more or less randomly chosen, refactorings. Then the question arises
125whether it can be called a \emph{composite refactoring}
126\see{compositeRefactorings} or not. The answer is not obvious. First, there is
127no way to describe \emph{the} mechanics of software obfuscation, because there
128are infinitely many ways to do that. Second, \emph{obfuscation} can be thought
129of as \emph{one operation}: Either the code is obfuscated, or it is not. Third,
130it makes no sense to call software obfuscation \emph{a} refactoring, since it
131holds different meaning to different people. The last point is important, since
132one of the motivations behind defining different refactorings is to build up a
133vocabulary for software professionals to reason about and discuss programs,
134similar to the motivation behind design patterns\citing{designPatterns}. So for
135describing \emph{software obfuscation}, it might be more appropriate to define
136what you do when performing it rather than precisely defining its mechanics in
137terms of other refactorings.
00aa0588 138
740e1b6c 139\section{The etymology of ``refactoring''}
f3a108c3
EK
140It is a little difficult to pinpoint the exact origin of the word
141``refactoring'', as it seems to have evolved as part of a colloquial
142terminology, more than a scientific term. There is no authoritative source for a
143formal definition of it.
144
b5c7bb1b 145According to Martin Fowler\citing{etymology-refactoring}, there may also be more
f3a108c3
EK
146than one origin of the word. The most well-known source, when it comes to the
147origin of \emph{refactoring}, is the Smalltalk\footnote{\emph{Smalltalk},
a1bafe90
EK
148object-oriented, dynamically typed, reflective programming language. See
149\url{http://www.smalltalk.org}} community and their famous \emph{Refactoring
f3a108c3
EK
150Browser}\footnote{\url{http://st-www.cs.illinois.edu/users/brant/Refactory/RefactoringBrowser.html}}
151described in the article \emph{A Refactoring Tool for
b5c7bb1b
EK
152Smalltalk}\citing{refactoringBrowser1997}, published in 1997.
153Allegedly\citing{etymology-refactoring}, the metaphor of factoring programs was
f3a108c3
EK
154also present in the Forth\footnote{\emph{Forth} -- stack-based, extensible
155programming language, without type-checking. See \url{http://www.forth.org}}
156community, and the word ``refactoring'' is mentioned in a book by Leo Brodie,
b5c7bb1b 157called \emph{Thinking Forth}\citing{brodie1984}, first published in
f3a108c3
EK
1581984\footnote{\emph{Thinking Forth} was first published in 1984 by the
159\emph{Forth Interest Group}. Then it was reprinted in 1994 with minor
160typographical corrections, before it was transcribed into an electronic edition
161typeset in \LaTeX\ and published under a Creative Commons licence in 2004. The
162edition cited here is the 2004 edition, but the content should essentially be as
a1bafe90
EK
163in 1984.}. The exact word is printed in only one place~\cite[p.~232]{brodie1984},
164but the term \emph{factoring} is prominent in the book, which also contains a
165whole chapter dedicated to (re)factoring, and how to keep the (Forth) code clean
166and maintainable.
ee45c41f 167
f3a108c3
EK
168\begin{quote}
169 \ldots good factoring technique is perhaps the most important skill for a
4cb06723 170 Forth programmer.~\cite[p.~172]{brodie1984}
f3a108c3 171\end{quote}
ee45c41f
EK
172
173\noindent Brodie also expresses what \emph{factoring} means to him:
174
f3a108c3
EK
175\begin{quote}
176 Factoring means organizing code into useful fragments. To make a fragment
177 useful, you often must separate reusable parts from non-reusable parts. The
178 reusable parts become new definitions. The non-reusable parts become arguments
4cb06723 179 or parameters to the definitions.~\cite[p.~172]{brodie1984}
f3a108c3
EK
180\end{quote}
181
182Fowler claims that the usage of the word \emph{refactoring} did not pass between
183the \emph{Forth} and \emph{Smalltalk} communities, but that it emerged
184independently in each of the communities.
185
740e1b6c 186\section{Motivation -- Why people refactor}
2f6e6dec
EK
187There are many reasons why people want to refactor their programs. They can for
188instance do it to remove duplication, break up long methods or to introduce
189design patterns\citing{designPatterns} into their software systems. The shared
190trait of all these is that people's intentions are to make their programs
191\emph{better}, in some sense. But what aspects of their programs are becoming
192improved?
51a854d4
EK
193
194As already mentioned, people often refactor to get rid of duplication, for instance by moving
a1bafe90 195identical or similar code into methods, maybe pushing methods up or down in
740e1b6c 196their class hierarchies, or making template methods for overlapping
a1bafe90
EK
197algorithms/functionality. It is all about gathering what belongs
198together and putting it all in one place. The resulting code is then easier to
199maintain. When removing the implicit coupling\footnote{When duplicating code,
200the code might not be coupled in other ways than that it is supposed to
201represent the same functionality. So if this functionality is going to change,
202it might need to change in more than one place, thus creating an implicit
203coupling between the multiple pieces of code.} between code snippets, the
137e0e7b 204location of a bug is limited to only one place, and new functionality needs only
a1bafe90
EK
205to be added to this one place, instead of a number of places people might not
206even remember.
207
208A problem you often encounter when programming is that a program contains a lot
209of long and hard-to-grasp methods. It can then help to break the methods into
210smaller ones, using the \ExtractMethod refactoring\citing{refactoring}. Then you
211may discover something about a program that you were not aware of before;
212revealing bugs you did not know about or could not find due to the complex
213structure of your program. \todo{Proof?} Making the methods smaller and giving
214good names to the new ones clarifies the algorithms and enhances the
215\emph{understandability} of the program \see{magic_number_seven}. This makes
216refactoring an excellent method for exploring unknown program code, or code that
217you had forgotten that you wrote.
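
As an illustration of the paragraph above, the following is a minimal sketch of what \refactoring{Extract Method} can do to a small method. The class and method names are hypothetical and chosen only for this example; they are not taken from any of the cited sources. The method \texttt{reportAfter} should behave like \texttt{reportBefore}, but each step now has a name of its own.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example; none of these names come from the cited literature.
class ExtractMethodSketch {

    // Before: one method validates, sums and formats in a single body.
    static String reportBefore(List<Integer> amounts) {
        for (int amount : amounts) {
            if (amount < 0) {
                throw new IllegalArgumentException("negative amount: " + amount);
            }
        }
        int total = 0;
        for (int amount : amounts) {
            total += amount;
        }
        return "Total: " + total;
    }

    // After: the same steps, but each one is an extracted, named method.
    static String reportAfter(List<Integer> amounts) {
        requireNonNegative(amounts);
        return formatTotal(sum(amounts));
    }

    private static void requireNonNegative(List<Integer> amounts) {
        for (int amount : amounts) {
            if (amount < 0) {
                throw new IllegalArgumentException("negative amount: " + amount);
            }
        }
    }

    private static int sum(List<Integer> amounts) {
        int total = 0;
        for (int amount : amounts) {
            total += amount;
        }
        return total;
    }

    private static String formatTotal(int total) {
        return "Total: " + total;
    }

    public static void main(String[] args) {
        List<Integer> amounts = Arrays.asList(3, 4, 5);
        System.out.println(reportBefore(amounts)); // prints Total: 12
        System.out.println(reportAfter(amounts));  // prints Total: 12
    }
}
```

The body of \texttt{reportAfter} now reads almost like a summary of the algorithm, which is the kind of gain in understandability the paragraph describes.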
218
219Most primitive refactorings are simple. Their true power is first revealed when
220they are combined into larger --- higher level --- refactorings, called
221\emph{composite refactorings} \see{compositeRefactorings}. Often the goal of
222such a series of refactorings is a design pattern. Thus the \emph{design} can be
223evolved throughout the lifetime of a program, as opposed to designing up-front.
224It is all about being structured and taking small steps to improve a program's
225design.
226
227Many software design patterns are aimed at lowering the coupling between
228different classes and different layers of logic. One of the most famous is
229perhaps the \emph{Model-View-Controller}\citing{designPatterns} pattern, or
230\emph{MVC} for short. It is aimed at lowering the coupling between the user
231interface and the business logic and data representation of a program. This also
232has the added benefit that the business logic can much more easily be the target of
233automated tests, increasing the productivity in the software development
234process. Refactoring is an important tool on the way to something greater.
0b0567f2
EK
235
236Another effect of refactoring is that with the increased separation of concerns
a1bafe90
EK
237coming out of many refactorings, the \emph{performance} can be improved. When
238profiling programs, the problematic parts are narrowed down to smaller parts of
239the code, which are easier to tune, and optimization can be performed only where
137e0e7b
EK
240needed and in a more effective way.
241
b01d328a
EK
242Last, but not least, one can refactor to \emph{facilitate a program change}, and this
243is probably the best reason to refactor. If one has managed to keep
244one's code clean and tidy, and the code is not bloated with design patterns that
a1bafe90 245are not ever going to be needed, then some refactoring might be needed to
b01d328a
EK
246introduce a design pattern that is appropriate for the change that is going to
247happen.
248
137e0e7b
EK
249Refactoring program code --- with a goal in mind --- can give the code itself
250more value. That is in the form of robustness to bugs, understandability and
a1bafe90
EK
251maintainability. Having robust code is an obvious advantage, but
252understandability and maintainability are both very important aspects of
253software development. By incorporating refactoring in the development process,
254bugs are found faster, new functionality is added more easily and code is easier
255to understand by the next person exposed to it, which might as well be the
256person who wrote it. The consequence of this is that refactoring can increase
257the average productivity of the development process, and thus also add to the
258monetary value of a business in the long run. The perspective on productivity
259and money should also be able to open the eyes of the many nearsighted managers
260that seldom see beyond the next milestone.
137e0e7b 261
b01d328a 262\section{The magical number seven}\label{magic_number_seven}
a1bafe90
EK
263The article \emph{The magical number seven, plus or minus two: some limits on
264our capacity for processing information}\citing{miller1956} by George A.
265Miller, was published in the journal \emph{Psychological Review} in 1956. It
f4cea2d6
EK
266presents evidence that the number of objects a
267human being can hold in working memory is roughly seven, plus or minus two
268objects. This number varies a bit depending on the nature and complexity of the
269objects, but is according to Miller ``\ldots never changing so much as to be
270unrecognizable.''
271
272Miller's article culminates in the section called \emph{Recoding}, a term he
273borrows from communication theory. The central result in this section is that by
274recoding information, the amount of information that a human can
275process at a time is increased. By \emph{recoding}, Miller means to group
276objects together in chunks and give each chunk a new name that it can be
277remembered by. By organizing objects into patterns of ever growing depth, one
278can memorize and process a much larger amount of data than if it were to be
279represented as its basic pieces. This grouping and renaming is analogous to how
280many refactorings work, by grouping pieces of code and giving them a new name.
a1bafe90 281Examples are the fundamental \ExtractMethod and \refactoring{Extract Class}
b5c7bb1b 282refactorings\citing{refactoring}.
f4cea2d6
EK
283
284\begin{quote}
285 \ldots recoding is an extremely powerful weapon for increasing the amount of
4cb06723 286 information that we can deal with.~\cite[p.~95]{miller1956}
f4cea2d6 287\end{quote}
ee45c41f 288
a1bafe90 289An example from the article addresses the problem of memorizing a sequence of
f4cea2d6
EK
290binary digits. Let us say we have the following sequence\footnote{The example
291 presented here is slightly modified (and shortened) from what is presented in
b5c7bb1b 292 the original article\citing{miller1956}, but it is essentially the same.} of
f4cea2d6
EK
29316 binary digits: ``1010001001110011''. Most of us will have a hard time
294memorizing this sequence by only reading it once or twice. Imagine if we instead
295translate it to this sequence: ``A273''. If you have a background in computer
296science, it will be obvious that the latter sequence is the first sequence
297recoded as digits in base 16. Most people should be able to
298memorize this last sequence by only looking at it once.
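
The recoding in this example can be sketched in a few lines of Java. The class and method names below are hypothetical; the sketch simply groups the binary digits into chunks of four and gives each chunk a one-character name, namely its hexadecimal digit.

```java
// Hypothetical sketch of the recoding example; the names are chosen for illustration only.
class RecodingSketch {

    // Recodes a string of binary digits into hexadecimal digits, one digit per chunk of four bits.
    static String recode(String bits) {
        if (bits.length() % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        StringBuilder recoded = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 4) {
            // Parse one chunk of four binary digits into a value between 0 and 15.
            int chunk = Integer.parseInt(bits.substring(i, i + 4), 2);
            // Name the chunk by its upper-case hexadecimal digit.
            recoded.append(Character.toUpperCase(Character.forDigit(chunk, 16)));
        }
        return recoded.toString();
    }

    public static void main(String[] args) {
        System.out.println(recode("1010001001110011")); // prints A273
    }
}
```

Sixteen items to remember become four, which is well within the limit of seven, plus or minus two.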
299
300Another result from the Miller article is that when the amount of information a
301human must interpret increases, it is crucial that the translation from one code
302to another is almost automatic for the subject to be able to remember the
0d7fbd88 303translation, before \heshe is presented with new information to recode. Thus
f4cea2d6
EK
304learning and understanding how to best organize certain kinds of data is
305essential to efficiently handle that kind of data in the future. This is much
a1bafe90
EK
306like when humans learn to read. First they must learn how to recognize letters.
307Then they can learn distinct words, and later read sequences of words that form
308whole sentences. Eventually, most of them will be able to read whole books and
309briefly retell the important parts of their content. This suggests that the use of
310design patterns\citing{designPatterns} is a good idea when reasoning about
311computer programs. With extensive use of design patterns when creating complex
312program structures, one does not always have to read whole classes of code to
313comprehend how they function; it may be sufficient to only see the name of a
314class to almost fully understand its responsibilities.
f4cea2d6
EK
315
316\begin{quote}
317 Our language is tremendously useful for repackaging material into a few chunks
4cb06723 318 rich in information.~\cite[p.~95]{miller1956}
f4cea2d6 319\end{quote}
ee45c41f 320
a1bafe90 321Without further evidence, these results at least indicate that refactoring
f4cea2d6
EK
322source code into smaller units with higher cohesion and, when needed,
323introducing appropriate design patterns, should aid in the cause of creating
324computer programs that are easier to maintain and have code that is easier (and
325better) understood.
326
740e1b6c 327\section{Notable contributions to the refactoring literature}
4e135659 328\todoin{Update with more contributions}
36d99783 329
d21ef41f
EK
330\begin{description}
331 \item[1992] William F. Opdyke submits his doctoral dissertation called
b5c7bb1b 332 \emph{Refactoring Object-Oriented Frameworks}\citing{opdyke1992}. This
d21ef41f
EK
333 work defines a set of refactorings, that are behavior preserving given that
334 their preconditions are met. The dissertation is focused on the automation
335 of refactorings.
336 \item[1999] Martin Fowler et al.: \emph{Refactoring: Improving the Design of
b5c7bb1b
EK
337 Existing Code}\citing{refactoring}. This is maybe the most influential text
338 on refactoring. It bears similarities to Opdyke's thesis\citing{opdyke1992}
d21ef41f
EK
339 in the way that it provides a catalog of refactorings. But Fowler's book is
340 more about the craft of refactoring, as he focuses on establishing a
341 vocabulary for refactoring, together with the mechanics of different
342 refactorings and when to perform them. His methodology is also founded on
36d99783
EK
343 the principles of test-driven development.
344 \item[2005] Joshua Kerievsky: \emph{Refactoring to
345 Patterns}\citing{kerievsky2005}. This book is heavily influenced by Fowler's
a1bafe90 346 \emph{Refactoring}\citing{refactoring} and the ``Gang of Four'' \emph{Design
36d99783
EK
347 Patterns}\citing{designPatterns}. It is building on the refactoring
348 catalog from Fowler's book, but is trying to bridge the gap between
349 \emph{refactoring} and \emph{design patterns} by providing a series of
350 higher-level composite refactorings, that make code evolve toward or away
351 from certain design patterns. The book is trying to build up the reader's
352 intuition around \emph{why} one would want to use a particular design
353 pattern, and not just \emph{how}. The book is encouraging evolutionary
354 design. \See{relationToDesignPatterns}
d21ef41f 355\end{description}
3b7c1d90 356
4e135659
EK
357\section{Tool support}\label{toolSupport}
358
359\subsection{Tool support for Java}
360This section will briefly compare the refactoring support of the three IDEs
361\emph{Eclipse}\footnote{\url{http://www.eclipse.org/}}, \emph{IntelliJ
362IDEA}\footnote{The IDE under comparison is the \emph{Community Edition},
363\url{http://www.jetbrains.com/idea/}} and
364\emph{NetBeans}\footnote{\url{https://netbeans.org/}}. These are the most
365popular Java IDEs\citing{javaReport2011}.
366
367All three IDEs provide support for the most useful refactorings, like the
368different extract, move and rename refactorings. In fact, Java-targeted IDEs are
369known for their good refactoring support, so this did not appear as a big
370surprise.
371
372The IDEs seem to have excellent support for the \ExtractMethod refactoring, so
373at least they have all passed the first refactoring
374rubicon\citing{fowlerRubicon2001,secondRubicon2012}.
375
376Regarding the \MoveMethod refactoring, the \emph{Eclipse} and \emph{IntelliJ}
377IDEs do the job in very similar manners. In most situations they both do a
378satisfactory job by producing the expected outcome. But they do nothing to check
a1bafe90
EK
379that the result does not break the semantics of the program \see{correctness}.
380The \emph{NetBeans} IDE implements this refactoring in a somewhat
381unsophisticated way. For starters, its default destination for the move is
382itself, although it refuses to perform the refactoring if chosen. But the worst
347ed677 383part is that, when moving the method \method{f} of the class \type{C} to the class
8b6b22c8
EK
384\type{X}, it will break the code. The result is shown in
385\myref{lst:moveMethod_NetBeans}.
4e135659 386
347ed677
EK
387\begin{listing}
388\begin{multicols}{2}
4e135659
EK
389\begin{minted}[samepage]{java}
390public class C {
391 private X x;
392 ...
393 public void f() {
394 x.m();
395 x.n();
396 }
397}
398\end{minted}
399
347ed677 400\columnbreak
4e135659
EK
401
402\begin{minted}[samepage]{java}
403public class X {
404 ...
405 public void f(C c) {
406 c.x.m();
407 c.x.n();
408 }
409}
410\end{minted}
347ed677
EK
411\end{multicols}
412\caption{Moving method \method{f} from \type{C} to \type{X}.}
413\label{lst:moveMethod_NetBeans}
414\end{listing}
4e135659
EK
415
416NetBeans will try to make code that calls the methods \method{m} and \method{n}
417of \type{X} by accessing them through \var{c.x}, where \var{c} is a parameter of
a1bafe90
EK
418type \type{C} that is added to the method \method{f} when it is moved. (This is
419seldom the desired outcome of this refactoring, but ironically, this ``feature''
8b6b22c8
EK
420keeps NetBeans from breaking the code in the example from \myref{correctness}.)
421If \var{c.x} for some reason is inaccessible to \type{X}, as in this case, the
422refactoring breaks the code, and it will not compile. NetBeans presents a
423preview of the refactoring outcome, but the preview does not catch it if the IDE
424is about to break the program.
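
For comparison, the following is a hand-written sketch of the outcome one would usually expect from \refactoring{Move Method} in this situation: the method body is moved to \type{X}, where it calls \method{m} and \method{n} directly, and the old method in \type{C} is turned into a simple delegation. This is not the output of any of the IDEs discussed here. The names \type{C}, \type{X}, \method{f}, \method{m} and \method{n} mirror the listing, while the wrapper class and the \texttt{trace} helper are assumptions added only to make the sketch self-contained and observable.

```java
// Hypothetical sketch; C, X, f, m and n mirror the listing, everything else is assumed.
class MoveMethodSketch {

    static class X {
        // Assumed helper state, present only so the calls can be observed.
        private final StringBuilder trace = new StringBuilder();

        void m() { trace.append("m"); }
        void n() { trace.append("n"); }

        // f now lives in X and calls m and n directly, without a parameter of type C.
        void f() {
            m();
            n();
        }

        String trace() { return trace.toString(); }
    }

    static class C {
        private final X x = new X();

        // The old method is kept as a simple delegation to the moved method.
        void f() { x.f(); }

        String trace() { return x.trace(); }
    }

    public static void main(String[] args) {
        C c = new C();
        c.f();
        System.out.println(c.trace()); // prints mn
    }
}
```

With this outcome \type{X} never needs access to the private field \var{x} of \type{C}, so the accessibility problem described above does not arise.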
4778044b
EK
425
426The IDEs under investigation seem to have fairly good support for primitive
427refactorings, but what about more complex ones, such as the \refactoring{Extract
428Class}\citing{refactoring}? The \refactoring{Extract Class} refactoring works by
429creating a class, and then moving members to that class and accessing them from
a1bafe90
EK
430the old class via a reference to the new class. \emph{IntelliJ} handles this in
431a fairly good manner, although, in the case of private methods, it leaves unused
432methods behind. These are methods that delegate to a field with the type of the
4778044b 433new class, but are not used anywhere. \emph{Eclipse} has added (or withdrawn)
a1bafe90
EK
434its own quirk to the Extract Class refactoring, and only allows for
435\emph{fields} to be moved to a new class, \emph{not methods}. This makes it
436effectively only extracting a data structure, and calling it
437\refactoring{Extract Class} is a little misleading. One would often be better
438off with textual extract and paste than using the Extract Class refactoring in
439Eclipse. When it comes to \emph{NetBeans}, it does not even seem to have made an
440attempt at providing this refactoring. (Well, it probably has, but it does not
441show in the IDE.)
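
The mechanics of \refactoring{Extract Class} described above can be sketched as follows. All names are hypothetical and the sketch is written by hand, not produced by any of the IDEs. Both fields and a method are moved to the new class, and the old class reaches them through a reference to it.

```java
// Hypothetical sketch of Extract Class; all names are assumptions made for illustration.
class ExtractClassSketch {

    // The new class: it receives the fields and the method that belong together.
    static class TelephoneNumber {
        private final String areaCode;
        private final String number;

        TelephoneNumber(String areaCode, String number) {
            this.areaCode = areaCode;
            this.number = number;
        }

        // Moved method: it only uses the moved fields.
        String format() {
            return "(" + areaCode + ") " + number;
        }
    }

    // The old class: it keeps a reference to the new class and delegates to it.
    static class Person {
        private final String name;
        private final TelephoneNumber officeTelephone;

        Person(String name, String areaCode, String number) {
            this.name = name;
            this.officeTelephone = new TelephoneNumber(areaCode, number);
        }

        String name() { return name; }

        // Access to the moved members goes via the reference to the new class.
        String officeTelephone() { return officeTelephone.format(); }
    }

    public static void main(String[] args) {
        Person person = new Person("Ada", "022", "855000");
        System.out.println(person.officeTelephone()); // prints (022) 855000
    }
}
```

A tool that only moves the two fields would leave \texttt{format} behind in the old class, which corresponds to the data-structure-only extraction described for Eclipse.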
4778044b
EK
442
443\todoin{Visual Studio (C++/C\#), Smalltalk refactoring browser?,
4e135659 444second refactoring rubicon?}
3b7c1d90 445
36d99783 446\section{The relation to design patterns}\label{relationToDesignPatterns}
4cb06723
EK
447
448\emph{Refactoring} and \emph{design patterns} have at least one thing in common:
449they are both promoted by advocates of \emph{clean code}\citing{cleanCode} as
450fundamental tools on the road to more maintainable and extensible source code.
451
452\begin{quote}
453 Design patterns help you determine how to reorganize a design, and they can
454 reduce the amount of refactoring you need to do
455 later.~\cite[p.~353]{designPatterns}
456\end{quote}
457
a1bafe90
EK
458Although sometimes associated with
459over-engineering\citing{kerievsky2005,refactoring}, design patterns are in
460general assumed to be good for maintainability of source code. That may be
d1adbeef
EK
461because many of them are designed to support the \emph{open/closed principle} of
462object-oriented programming. The principle was first formulated by Bertrand
463Meyer, the creator of the Eiffel programming language, like this: ``Modules
464should be both open and closed.''\citing{meyer1988} It has been popularized,
465with this as a common version:
466
467\begin{quote}
468 Software entities (classes, modules, functions, etc.) should be open for
469 extension, but closed for modification.\footnote{See
470 \url{http://c2.com/cgi/wiki?OpenClosedPrinciple} or
471 \url{https://en.wikipedia.org/wiki/Open/closed_principle}}
472\end{quote}
473
474Maintainability is often thought of as the ability to introduce new
a1bafe90 475functionality without having to change too much of the old code. When
d1adbeef
EK
476refactoring, the motivation is often to facilitate adding new functionality. It
477is about factoring the old code in a way that lets the new functionality
478benefit from the functionality already residing in a software system,
479without having to copy old code into new. Then, the next time someone adds new
a1bafe90
EK
480functionality, it is less likely that the old code has to change. Assuming that
481a design pattern is the best way to get rid of duplication and assist in
482implementing new functionality, it is reasonable to conclude that a design
483pattern often is the target of a series of refactorings. Having a repertoire of
484design patterns can also help in knowing when and how to refactor a program to
485make it reflect certain desired characteristics.
d1adbeef
EK
486
487\begin{quote}
a1bafe90 488 There is a natural relation between patterns and refactorings. Patterns are
d1adbeef
EK
489 where you want to be; refactorings are ways to get there from somewhere
490 else.~\cite[p.~107]{refactoring}
491\end{quote}
492
493This quote is wise in many contexts, but it is not always appropriate to say
a1bafe90
EK
494``Patterns are where you want to be\ldots''. \emph{Sometimes}, patterns are
495where you want to be, but only because it will benefit your design. It is not
496true that one should always try to incorporate as many design patterns as
497possible into a program. It is not like they have intrinsic value. They only add
498value to a system when they support its design. Otherwise, the use of design
499patterns may only lead to a program that is more complex than necessary.
d1adbeef
EK
500
501\begin{quote}
502 The overuse of patterns tends to result from being patterns happy. We are
503 \emph{patterns happy} when we become so enamored of patterns that we simply
504 must use them in our code.~\cite[p.~24]{kerievsky2005}
505\end{quote}
506
507This can easily happen when relying largely on up-front design. Then it is
508natural, in the very beginning, to try to build in all the flexibility that one
509believes will be necessary throughout the lifetime of a software system.
510According to Joshua Kerievsky ``That sounds reasonable --- if you happen to be
511psychic.''~\cite[p.~1]{kerievsky2005} He is advocating what he believes is a
512better approach: To let software continually evolve. To start with a simple
513design that meets today's needs, and tackle future needs by refactoring to
514satisfy them. He believes that this is a more economic approach than investing
515time and money into a design that inevitably is going to change. By relying on
516continuously refactoring a system, its design can be made simpler without
517sacrificing flexibility. To be able to fully rely on this approach, it is of
518utmost importance to have a reliable suite of tests to lean on. \See{testing} This
519makes the design process more natural and less characterized by difficult
520decisions that have to be made before proceeding in the process, and that are
521going to define a project for all of its unforeseeable future.
522
b289552b
EK
523\begin{comment}
524
137e0e7b
EK
525\section{Classification of refactorings}
526% only interesting refactorings
527% with 2 detailed examples? One for structured and one for intra-method?
528% Is replacing Bubblesort with Quick Sort considered a refactoring?
529
530\subsection{Structural refactorings}
531
f65da046 532\subsubsection{Primitive refactorings}
137e0e7b
EK
533
534% Composing Methods
535\explanation{Extract Method}{You have a code fragment that can be grouped
536together.}{Turn the fragment into a method whose name explains the purpose of
537the method.}
538
539\explanation{Inline Method}{A method's body is just as clear as its name.}{Put
540the method's body into the body of its callers and remove the method.}
541
542\explanation{Inline Temp}{You have a temp that is assigned to once with a simple
543expression, and the temp is getting in the way of other refactorings.}{Replace
544all references to that temp with the expression.}
545
546% Moving Features Between Objects
547\explanation{Move Method}{A method is, or will be, using or used by more
548features of another class than the class on which it is defined.}{Create a new
549method with a similar body in the class it uses most. Either turn the old method
550into a simple delegation, or remove it altogether.}
551
552\explanation{Move Field}{A field is, or will be, used by another class more than
553the class on which it is defined}{Create a new field in the target class, and
554change all its users.}
555
556% Organizing Data
557\explanation{Replace Magic Number with Symbolic Constant}{You have a literal
558number with a particular meaning.}{Create a constant, name it after the meaning,
559and replace the number with it.}
560
561\explanation{Encapsulate Field}{There is a public field.}{Make it private and
562provide accessors.}
563
564\explanation{Replace Type Code with Class}{A class has a numeric type code that
8fae7b44 565does not affect its behavior.}{Replace the number with a new class.}
566
567\explanation{Replace Type Code with Subclasses}{You have an immutable type code
8fae7b44 568that affects the behavior of a class.}{Replace the type code with subclasses.}
569
570\explanation{Replace Type Code with State/Strategy}{You have a type code that
8fae7b44 571affects the behavior of a class, but you cannot use subclassing.}{Replace the
572type code with a state object.}
573
574% Simplifying Conditional Expressions
575\explanation{Consolidate Duplicate Conditional Fragments}{The same fragment of
8fae7b44 576code is in all branches of a conditional expression.}{Move it outside of the
577expression.}
578
\explanation{Remove Control Flag}{You have a variable that is acting as a
control flag for a series of boolean expressions.}{Use a break or return
instead.}
582
583\explanation{Replace Nested Conditional with Guard Clauses}{A method has
8fae7b44 584conditional behavior that does not make clear the normal path of
585execution.}{Use guard clauses for all special cases.}
586
8fae7b44 587\explanation{Introduce Null Object}{You have repeated checks for a null
588value.}{Replace the null value with a null object.}
589
590\explanation{Introduce Assertion}{A section of code assumes something about the
591state of the program.}{Make the assumption explicit with an assertion.}
592
593% Making Method Calls Simpler
594\explanation{Rename Method}{The name of a method does not reveal its
595purpose.}{Change the name of the method}
596
597\explanation{Add Parameter}{A method needs more information from its
598caller.}{Add a parameter for an object that can pass on this information.}
599
600\explanation{Remove Parameter}{A parameter is no longer used by the method
601body.}{Remove it.}
602
603%\explanation{Parameterize Method}{Several methods do similar things but with
604%different values contained in the method.}{Create one method that uses a
605%parameter for the different values.}
606
607\explanation{Preserve Whole Object}{You are getting several values from an
608object and passing these values as parameters in a method call.}{Send the whole
609object instead.}
610
611\explanation{Remove Setting Method}{A field should be set at creation time and
612never altered.}{Remove any setting method for that field.}
613
614\explanation{Hide Method}{A method is not used by any other class.}{Make the
615method private.}
616
617\explanation{Replace Constructor with Factory Method}{You want to do more than
618simple construction when you create an object}{Replace the constructor with a
619factory method.}
620
621% Dealing with Generalization
8fae7b44 622\explanation{Pull Up Field}{Two subclasses have the same field.}{Move the field
623to the superclass.}
624
625\explanation{Pull Up Method}{You have methods with identical results on
626subclasses.}{Move them to the superclass.}
627
8fae7b44 628\explanation{Push Down Method}{Behavior on a superclass is relevant only for
629some of its subclasses.}{Move it to those subclasses.}
630
631\explanation{Push Down Field}{A field is used only by some subclasses.}{Move the
632field to those subclasses}
633
634\explanation{Extract Interface}{Several clients use the same subset of a class's
8fae7b44 635interface, or two classes have part of their interfaces in common.}{Extract the
636subset into an interface.}
637
\explanation{Replace Inheritance with Delegation}{A subclass uses only part of a
superclass's interface or does not want to inherit data.}{Create a field for the
superclass, adjust methods to delegate to the superclass, and remove the
subclassing.}
642
643\explanation{Replace Delegation with Inheritance}{You're using delegation and
644are often writing many simple delegations for the entire interface}{Make the
645delegating class a subclass of the delegate.}
646
647\subsubsection{Composite refactorings}
648
649% Composing Methods
650% \explanation{Replace Method with Method Object}{}{}
651
652% Moving Features Between Objects
653\explanation{Extract Class}{You have one class doing work that should be done by
654two}{Create a new class and move the relevant fields and methods from the old
655class into the new class.}
656
657\explanation{Inline Class}{A class isn't doing very much.}{Move all its features
658into another class and delete it.}
659
660\explanation{Hide Delegate}{A client is calling a delegate class of an
661object.}{Create Methods on the server to hide the delegate.}
662
\explanation{Remove Middle Man}{A class is doing too much simple delegation.}{Get
the client to call the delegate directly.}
665
666% Organizing Data
667\explanation{Replace Data Value with Object}{You have a data item that needs
8fae7b44 668additional data or behavior.}{Turn the data item into an object.}
669
670\explanation{Change Value to Reference}{You have a class with many equal
671instances that you want to replace with a single object.}{Turn the object into a
672reference object.}
673
674\explanation{Encapsulate Collection}{A method returns a collection}{Make it
8fae7b44 675return a read-only view and provide add/remove methods.}
676
677% \explanation{Replace Array with Object}{}{}
678
679\explanation{Replace Subclass with Fields}{You have subclasses that vary only in
680methods that return constant data.}{Change the methods to superclass fields and
681eliminate the subclasses.}
682
683% Simplifying Conditional Expressions
\explanation{Decompose Conditional}{You have a complicated conditional
(if-then-else) statement.}{Extract methods from the condition, the then part,
and the else part.}
687
688\explanation{Consolidate Conditional Expression}{You have a sequence of
689conditional tests with the same result.}{Combine them into a single conditional
690expression and extract it.}
691
692\explanation{Replace Conditional with Polymorphism}{You have a conditional that
8fae7b44 693chooses different behavior depending on the type of an object.}{Move each leg
694of the conditional to an overriding method in a subclass. Make the original
695method abstract.}
696
697% Making Method Calls Simpler
698\explanation{Replace Parameter with Method}{An object invokes a method, then
699passes the result as a parameter for a method. The receiver can also invoke this
700method.}{Remove the parameter and let the receiver invoke the method.}
701
702\explanation{Introduce Parameter Object}{You have a group of parameters that
703naturally go together.}{Replace them with an object.}
704
705% Dealing with Generalization
706\explanation{Extract Subclass}{A class has features that are used only in some
707instances.}{Create a subclass for that subset of features.}
708
709\explanation{Extract Superclass}{You have two classes with similar
710features.}{Create a superclass and move the common features to the
711superclass.}
712
713\explanation{Collapse Hierarchy}{A superclass and subclass are not very
714different.}{Merge them together.}
715
716\explanation{Form Template Method}{You have two methods in subclasses that
717perform similar steps in the same order, yet the steps are different.}{Get the
718steps into methods with the same signature, so that the original methods become
719the same. Then you can pull them up.}
720
721
722\subsection{Functional refactorings}
723
724\explanation{Substitute Algorithm}{You want to replace an algorithm with one
725that is clearer.}{Replace the body of the method with the new algorithm.}
00aa0588 726
b289552b 727\end{comment}
728
729\section{The impact on software quality}
730
a1bafe90 731\subsection{What is software quality?}
The term \emph{software quality} has many meanings. It all depends on the
context we put it in. If we look at it with the eyes of a software developer, it
usually means that the software is easily maintainable and testable, or in other
words, that it is \emph{well designed}. This often correlates with the
management scale, where \emph{keeping the schedule} and \emph{customer
satisfaction} are at the center. From the customer's point of view, in addition to
good usability, \emph{performance} and \emph{lack of bugs} are always
appreciated, measurements that are also shared by the software developer. (In
addition, such things as good documentation could be measured, but this is out
of the scope of this document.)
9a55a5bc 742
00aa0588 743\subsection{The impact on performance}
9a55a5bc 744\begin{quote}
  Refactoring certainly will make software go more slowly\footnote{With today's
  compiler optimization techniques and performance tuning of e.g. the Java
  virtual machine, the penalties of object creation and method calls are
  debatable.}, but it also makes the software more amenable to performance
  tuning.~\cite[p.~69]{refactoring}
9a55a5bc 750\end{quote}
751
\noindent There is a common belief that refactoring compromises performance,
because it increases the degree of indirection and because polymorphism is
assumed to be slower than conditionals.
755
In a survey, Demeyer\citing{demeyer2002} disproves this view in the case of
polymorphism. He did an experiment on what he calls ``Transform Self Type
Checks'', where you introduce a new polymorphic method and a new class hierarchy
to get rid of a class' type checking of a ``type attribute''. He uses this kind
of transformation to represent other ways of replacing conditionals with
polymorphism as well. The experiment is performed on the C++ programming
language, with three different compilers and platforms. Demeyer concludes
that, with compiler optimization turned on, polymorphism beats medium to large
sized if-statements and does as well as case-statements. (This is in accordance
with his hypothesis, due to similarities between the way C++ handles
polymorphism and case-statements.)
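
\noindent As an illustration of the kind of transformation discussed here, the
following is a minimal sketch in Java. Demeyer's experiment used C++, so the
sketch says nothing about his measurements. The classes \type{TypeCodedShape},
\type{Shape}, \type{Circle} and \type{Square} are hypothetical and chosen only
for this example. The first class checks its own ``type attribute'' in a
conditional, while the hierarchy below it replaces each branch with an
overriding method.

```java
// Before: one class inspects its own "type attribute" in a conditional.
class TypeCodedShape {
    static final int CIRCLE = 0;
    static final int SQUARE = 1;

    private final int kind;      // the "type attribute"
    private final double size;

    TypeCodedShape(int kind, double size) {
        this.kind = kind;
        this.size = size;
    }

    double area() {
        switch (kind) {
            case CIRCLE: return Math.PI * size * size;
            case SQUARE: return size * size;
            default: throw new IllegalStateException("unknown kind " + kind);
        }
    }
}

// After: each former branch is an overriding method in its own subclass.
abstract class Shape {
    protected final double size;

    Shape(double size) {
        this.size = size;
    }

    abstract double area();
}

class Circle extends Shape {
    Circle(double radius) { super(radius); }

    @Override
    double area() { return Math.PI * size * size; }
}

class Square extends Shape {
    Square(double side) { super(side); }

    @Override
    double area() { return size * size; }
}
```

Both versions compute the same areas. The difference is that the second one
lets the runtime's method dispatch select the branch.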
ee45c41f 767
768\begin{quote}
769 The interesting thing about performance is that if you analyze most programs,
b5c7bb1b 770 you find that they waste most of their time in a small fraction of the
4cb06723 771 code.~\cite[p.~70]{refactoring}
9a55a5bc 772\end{quote}
9a55a5bc 773
\noindent So, although an increased number of method calls could potentially
slow down programs, one should avoid premature optimization and sacrificing good
design, leaving the performance tuning until after profiling\footnote{For an
  example of a Java profiler, check out VisualVM:
  \url{http://visualvm.java.net/}} the software and having isolated the actual
  problem areas.
00aa0588 780
0d7fbd88 781\section{Composite refactorings}\label{compositeRefactorings}
782\todo{motivation, examples, manual vs automated?, what about refactoring in a
783very large code base?}
6065c96c 784Generally, when thinking about refactoring, at the mechanical level, there are
f65da046 785essentially two kinds of refactorings. There are the \emph{primitive}
a1bafe90 786refactorings, and the \emph{composite} refactorings.
6065c96c 787
788\definition{A \emph{primitive refactoring} is a refactoring that cannot be
789expressed in terms of other refactorings.}
f65da046 790
b5c7bb1b 791\noindent Examples are the \refactoring{Pull Up Field} and \refactoring{Pull Up
a1bafe90 792Method} refactorings\citing{refactoring}, that move members up in their class
793hierarchies.
794
795\definition{A \emph{composite refactoring} is a refactoring that can be
796expressed in terms of two or more other refactorings.}
f65da046 797
798\noindent An example of a composite refactoring is the \refactoring{Extract
799Superclass} refactoring\citing{refactoring}. In its simplest form, it is composed
800of the previously described primitive refactorings, in addition to the
801\refactoring{Pull Up Constructor Body} refactoring\citing{refactoring}. It works
802by creating an abstract superclass that the target class(es) inherits from, then
803by applying \refactoring{Pull Up Field}, \refactoring{Pull Up Method} and
804\refactoring{Pull Up Constructor Body} on the members that are to be members of
805the new superclass. For an overview of the \refactoring{Extract Superclass}
8b6b22c8 806refactoring, see \myref{fig:extractSuperclass}.
6065c96c 807
808\begin{figure}[h]
809 \centering
faa9f4f3 810 \includegraphics[angle=270,width=\linewidth]{extractSuperclassItalic.pdf}
811 \caption{The Extract Superclass refactoring}
812 \label{fig:extractSuperclass}
813\end{figure}
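
\noindent In code, the result of such a composition could look like the
following sketch. The classes \type{Party}, \type{Employee} and
\type{Department} are hypothetical. The assumption is that the two concrete
classes both declared a \var{name} field, a \method{getName} method and the
corresponding constructor code before the refactoring, and that these members
were then pulled up into the new abstract superclass.

```java
import java.util.ArrayList;
import java.util.List;

// The new abstract superclass created by Extract Superclass.
abstract class Party {
    protected final String name;          // result of Pull Up Field

    protected Party(String name) {        // result of Pull Up Constructor Body
        this.name = name;
    }

    String getName() {                    // result of Pull Up Method
        return name;
    }
}

class Employee extends Party {
    private final int annualCost;

    Employee(String name, int annualCost) {
        super(name);                      // the shared initialization now lives in Party
        this.annualCost = annualCost;
    }

    int getAnnualCost() {
        return annualCost;
    }
}

class Department extends Party {
    private final List<Employee> staff = new ArrayList<>();

    Department(String name) {
        super(name);
    }

    void add(Employee employee) {
        staff.add(employee);
    }

    int getAnnualCost() {
        int total = 0;
        for (Employee employee : staff) {
            total += employee.getAnnualCost();
        }
        return total;
    }
}
```

Each pulled-up member corresponds to one application of a primitive
refactoring, which is what makes \refactoring{Extract Superclass} composite.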
814
815\section{Manual vs. automated refactorings}
Refactoring is something every programmer does, even if \heshe does not know
the term \emph{refactoring}. Every refinement of source code that does not alter
the program's behavior is a refactoring. For small refactorings, such as
\ExtractMethod, executing it manually is a manageable task, but is still prone
to errors. Getting it right the first time is not easy, considering the method
signature and all the other aspects of the refactoring that have to be in place.
822
Take for instance the renaming of classes, methods and fields. For complex
programs these refactorings are almost impossible to get right manually.
Attacking them with textual search and replace, or even regular expressions,
will fall short on these tasks. It is therefore crucial to have proper tool
support that can perform them automatically: tools that can parse source code
and thus have semantic knowledge about which occurrences of which names belong
to what construct in the program. For even trying to perform one of these
complex tasks manually, one would have to be very confident in the existing test
suite \see{testing}.
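
To see why purely textual approaches fall short, consider the following small
sketch. The class \type{NaiveRename} and its two methods are hypothetical
helpers written for this illustration only. They try to rename the field
\var{size} of one class to \var{length} by working on the source text alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveRename {
    // Purely textual rename: replaces every occurrence of the character sequence.
    static String textual(String source, String oldName, String newName) {
        return source.replace(oldName, newName);
    }

    // Whole-word rename using a regular expression. It no longer damages longer
    // identifiers, but it still cannot tell which declaration a name refers to.
    static String wholeWord(String source, String oldName, String newName) {
        String pattern = "\\b" + Pattern.quote(oldName) + "\\b";
        return source.replaceAll(pattern, Matcher.quoteReplacement(newName));
    }
}
```

Given the source text
\texttt{class A \{ int size; int resize() \{ return size; \} \} class B \{ int size; \}},
the textual variant also rewrites the method name \method{resize}, and the
whole-word variant also renames the unrelated field in class \type{B}. Neither
knows that only the field of \type{A} was meant.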
00aa0588 831
19c4f27d 832\section{Correctness of refactorings}\label{correctness}
For automated refactorings to be truly useful, they must show a high degree of
behavior preservation. This might seem obvious, but there are
examples of refactorings in existing tools that break programs. I will now
present an example of an \ExtractMethod refactoring followed by a \MoveMethod
refactoring that breaks a program in both the \emph{Eclipse} and \emph{IntelliJ}
IDEs\footnote{The NetBeans IDE handles this particular situation without
  altering the program's behavior, mainly because its Move Method refactoring
  implementation is a bit rancid in other ways \see{toolSupport}.}. The
  following piece of code shows the target for the composed refactoring:
842
843\begin{minted}[linenos,samepage]{java}
844public class C {
845 public X x = new X();
ee45c41f 846
847 public void f() {
848 x.m(this);
849 x.n();
850 }
851}
852\end{minted}
853
\noindent The next piece of code shows the destination of the refactoring. Note
that the method \method{m(C c)} of class \type{X} assigns to the field \var{x}
of the argument \var{c} that has type \type{C}:
ee45c41f 857
4e135659 858\begin{minted}[samepage]{java}
859public class X {
860 public void m(C c) {
861 c.x = new X();
862 }
863 public void n() {}
864}
865\end{minted}
866
The refactoring sequence works by extracting lines 5 and 6 from the original
class \type{C} into a method \method{f} with the statements from those lines as
its method body. The method is then moved to the class \type{X}. The result is
shown in the following two pieces of code:
871
4e135659 872\begin{minted}[linenos,samepage]{java}
873public class C {
874 public X x = new X();
875
876 public void f() {
877 x.f(this);
878 }
879}
880\end{minted}
881
4e135659 882\begin{minted}[linenos,samepage]{java}
883public class X {
884 public void m(C c) {
885 c.x = new X();
886 }
887 public void n() {}
888 public void f(C c) {
889 m(c);
890 n();
891 }
892}
893\end{minted}
894
After the refactoring, the method \method{f} of class \type{C} is calling the
method \method{f} of class \type{X}, and the program now behaves differently than
before. (See line 5 of the version of class \type{C} after the refactoring.)
Before the refactoring, the methods \method{m} and \method{n} of class \type{X}
are called on different object instances (see lines 5 and 6 of the original class
\type{C}). After, they are called on the same object, and the statement on line
3 of class \type{X} (the version after the refactoring) no longer has any
effect on which object receives the call to \method{n} in our example.
ddcea0b5 903
904The bug introduced in the previous example is of such a nature\footnote{Caused
905 by aliasing. See \url{https://en.wikipedia.org/wiki/Aliasing_(computing)}}
906 that it is very difficult to spot if the refactored code is not covered by
907 tests. It does not generate compilation errors, and will thus only result in
908 a runtime error or corrupted data, which might be hard to detect.
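
The difference can be made observable with a small amount of instrumentation.
The following sketch puts the original and the refactored method bodies side by
side as \method{fBefore} and \method{fAfter}. These two names and the counter
\var{nCalls} are additions made for this illustration and are not part of the
example above.

```java
class X {
    int nCalls = 0;          // instrumentation: counts calls to n() on this object

    void m(C c) {
        c.x = new X();       // re-points the field of the caller
    }

    void n() {
        nCalls++;
    }

    // The method produced by Extract Method followed by Move Method.
    void f(C c) {
        m(c);
        n();                 // implicitly this.n(): the receiver is the old object
    }
}

class C {
    X x = new X();

    void fBefore() {         // the original method body
        x.m(this);
        x.n();               // the field is read again: the receiver is the new object
    }

    void fAfter() {          // the method body after the refactoring
        x.f(this);
    }
}
```

With \method{fBefore}, the call to \method{n} is counted on the object created
inside \method{m}. With \method{fAfter}, it is counted on the object that the
field referred to before the call. Both versions compile without warnings.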
909
910\section{Refactoring and testing}\label{testing}
911\begin{quote}
912 If you want to refactor, the essential precondition is having solid
913 tests.\citing{refactoring}
914\end{quote}
915
When refactoring, there are roughly two kinds of errors that can be made. There
are errors that make the code unable to compile, and there are the silent
errors, only popping up at runtime. Compile-time errors are the nice ones. They
flash up at the moment they are made (at least when using an IDE), and are
usually easy to fix. The other kind of error is the dangerous one. It is the
kind of error introduced in the example of \myref{correctness}. It is an error
that may sneak into your code without you noticing. For discovering those kinds
of errors when refactoring, it is essential to have good test coverage. It is
not a way to \emph{prove} that the code is correct, but it is a way to make you
confident that it \emph{probably} works as desired. In the context of test
driven development, the tests are even a way to define how the program is
supposed to work. It is then, by definition, working if the tests are passing.
928
929If the test coverage for a code base is perfect, then it should, theoretically,
930be risk-free to perform refactorings on it. This is why tests and refactoring
931are such a great match.
932
933\section{Software metrics}
d1adbeef 934\todoin{Is this the appropriate place to have this section?}
935
936%\part{The project}
937%\chapter{Planning the project}
938%\part{Conclusion}
939%\chapter{Results}
940
b0e80574 941
942
943\chapter{\ldots}
4e135659 944\todoin{write}
3b7c1d90 945\section{The problem statement}
946\section{Choosing the target language}
Choosing which programming language to use as the target for manipulation is not
a very difficult task. The language has to be an object-oriented programming
language, and it must have existing tool support for refactoring. The
\emph{Java} programming language\footnote{\url{https://www.java.com/}} is the
dominating language when it comes to examples in the literature of refactoring,
and is thus a natural choice. Java is perhaps currently the most influential
programming language in the world, with its \emph{Java Virtual Machine} that
runs on all of the most popular architectures and also supports\footnote{They
compile to Java bytecode.} dozens of other programming languages, with
\emph{Scala}, \emph{Clojure} and \emph{Groovy} as the most prominent ones. Java
is currently the language that every other programming language is compared
against. It is also the primary language of the author of this thesis.
959
960\section{Choosing the tools}
When choosing a tool for manipulating Java, there are certain criteria that
have to be met. First of all, the tool should have some existing refactoring
support that this thesis can build upon. Secondly, it should provide some kind of
framework for parsing and analyzing Java source code. Third, it should itself be
open source. This is both because of the need to be able to browse the code for
the existing refactorings that are contained in the tool, and also because open
source projects hold value in themselves. Another important aspect to consider
is that open source projects of a certain size usually have large communities of
people connected to them, who are committed to answering questions regarding the
use and misuse of the products, which to a large degree are made by the community
itself.
972
There is a certain class of tools that meet these criteria, namely the class of
\emph{IDEs}\footnote{\emph{Integrated Development Environment}}. These are
programs that are meant to support the whole production cycle of a computer
program, and the most popular IDEs that support Java generally have quite good
refactoring support.
978
The main contenders for this thesis are the \emph{Eclipse IDE}, with the
\emph{Java development tools} (JDT), the \emph{IntelliJ IDEA Community Edition}
and the \emph{NetBeans IDE}. \See{toolSupport} Eclipse and NetBeans are both
free, open source and community driven, while the IntelliJ IDEA has an open
sourced community edition that is free of charge, but also offers an
\emph{Ultimate Edition} with an extended set of features, at additional cost.
All three IDEs support adding plugins to extend their functionality, and provide
tools that can be used to parse and analyze Java source code. But one of the IDEs
stands out as a favorite, and that is the \emph{Eclipse IDE}. This is the most
popular\citing{javaReport2011} among them and seems to be the de facto standard
IDE for Java development regardless of platform.
4e135659 990
3b7c1d90 991
992\chapter{Refactorings in Eclipse JDT: Design, Shortcomings and Wishful
993Thinking}\label{ch:jdt_refactorings}
994
This chapter will deal with some of the design behind refactoring support in
Eclipse, and the JDT specifically. This is followed by a section about
shortcomings of the refactoring API in terms of composition of refactorings. The
chapter will be concluded with a section describing some of the ways the
implementation of refactorings in the JDT could have worked to facilitate
composition of refactorings.
055dca93 1001
b0e80574 1002\section{Design}
f041551b 1003The refactoring world of Eclipse can in general be separated into two parts: The
b289552b 1004language independent part and the part written for a specific programming
1005language -- the language that is the target of the supported refactorings.
1006\todo{What about the language specific part?}
1007
1008\subsection{The Language Toolkit}
The Language Toolkit, or LTK for short, is the framework that is used to
implement refactorings in Eclipse. It is language independent and provides the
abstractions of a refactoring and the change it generates, in the form of the
classes \typewithref{org.eclipse.ltk.core.refactoring}{Refactoring} and
\typewithref{org.eclipse.ltk.core.refactoring}{Change}. (There are also parts of
the LTK that are concerned with user interaction, but they will not be discussed
here, since they are of little value to us and our use of the framework.)
1016
1017\subsubsection{The Refactoring Class}
The abstract class \type{Refactoring} is the core of the LTK framework. Every
refactoring that is going to be supported by the LTK has to end up creating an
instance of one of its subclasses. The main responsibilities of subclasses of
\type{Refactoring} are to implement template methods for condition checking
(\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkInitialConditions}
and
\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkFinalConditions}),
in addition to the
\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{createChange}
method that creates and returns an instance of the \type{Change} class.

If the refactoring is to allow others to participate in it when it is
executed, it has to be a processor-based
refactoring\typeref{org.eclipse.ltk.core.refactoring.participants.ProcessorBasedRefactoring}.
It then delegates to its given
\typewithref{org.eclipse.ltk.core.refactoring.participants}{RefactoringProcessor}
for condition checking and change creation.
1035
1036\subsubsection{The Change Class}
This class is the base class for objects that are responsible for performing the
actual workspace transformations in a refactoring. The main responsibilities of
its subclasses are to implement the
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{perform} and
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{isValid} methods. The
\method{isValid} method verifies that the change object is valid and thus can be
executed by calling its \method{perform} method. The \method{perform} method
performs the desired change and returns an undo change that can be executed to
reverse the effect of the transformation done by its originating change object.
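
The idea that performing a change yields its own undo change can be sketched
without any Eclipse dependencies. The classes \type{MiniChange} and
\type{ReplaceLine} below are hypothetical stand-ins made for this
illustration and are not LTK classes. The real \method{perform} and
\method{isValid} methods additionally take a progress monitor, and
\method{isValid} reports a status object rather than a boolean.

```java
import java.util.List;

// Simplified model of a change: NOT the LTK Change class.
abstract class MiniChange {
    // Tells whether the change can still be applied to its target.
    abstract boolean isValid();

    // Applies the change and returns the change that undoes it.
    abstract MiniChange perform();
}

// Replaces one line in a document that is modeled as a list of lines.
class ReplaceLine extends MiniChange {
    private final List<String> lines;
    private final int index;
    private final String replacement;

    ReplaceLine(List<String> lines, int index, String replacement) {
        this.lines = lines;
        this.index = index;
        this.replacement = replacement;
    }

    @Override
    boolean isValid() {
        return index >= 0 && index < lines.size();
    }

    @Override
    MiniChange perform() {
        if (!isValid()) {
            throw new IllegalStateException("change is no longer valid");
        }
        String previous = lines.set(index, replacement);
        // The undo change restores the text that was just overwritten.
        return new ReplaceLine(lines, index, previous);
    }
}
```

Performing the returned object restores the original line, and in turn returns
a change that would redo the replacement.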
1046
61420ef7 1047\subsubsection{Executing a Refactoring}\label{executing_refactoring}
1048The life cycle of a refactoring generally follows two steps after creation:
1049condition checking and change creation. By letting the refactoring object be
1050handled by a
1051\typewithref{org.eclipse.ltk.core.refactoring}{CheckConditionsOperation} that
1052in turn is handled by a
1053\typewithref{org.eclipse.ltk.core.refactoring}{CreateChangeOperation}, it is
1054assured that the change creation process is managed in a proper manner.
1055
The actual execution of a change object has to follow a detailed life cycle.
This life cycle is honored if the \type{CreateChangeOperation} is handled by a
\typewithref{org.eclipse.ltk.core.refactoring}{PerformChangeOperation}. If
an undo manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} is also set
for the \type{PerformChangeOperation}, the undo change is added to the undo
history.
055dca93 1062
b0e80574 1063\section{Shortcomings}
80663734 1064This section is introduced naturally with a conclusion: The JDT refactoring
1065implementation does not facilitate composition of refactorings.
1066\todo{refine}This section will try to explain why, and also identify other
1067shortcomings of both the usability and the readability of the JDT refactoring
1068source code.
1069
1070I will begin at the end and work my way toward the composition part of this
1071section.
1072
5837a41f 1073\subsection{Absence of Generics in Eclipse Source Code}
This section does not only concern the JDT refactoring API, but also large
quantities of the Eclipse source code. The code shows a striking absence of the
Java language feature of generics. It is hard to read a class' interface when
methods return objects or take parameters of raw types such as \type{List} or
\type{Map}. This sometimes results in having to read a lot of source code to
understand what is going on, instead of relying on the available interfaces. In
addition, it results in a lot of ugly code, making the use of typecasting more
of a rule than an exception.
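
The readability problem is easy to reproduce in isolation. The following sketch
is not taken from the Eclipse source code, and the class
\type{RawVersusGeneric} and its methods are hypothetical. It contrasts a
method returning a raw \type{List} with one returning a parameterized list.

```java
import java.util.ArrayList;
import java.util.List;

class RawVersusGeneric {
    // Raw type: the signature does not reveal what the list contains.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawNames() {
        List names = new ArrayList();
        names.add("Extract Method");
        return names;
    }

    // Parameterized type: the element type is part of the interface.
    static List<String> typedNames() {
        List<String> names = new ArrayList<>();
        names.add("Move Method");
        return names;
    }

    static int combinedLength() {
        // The caller must know, or find out, that the elements are strings.
        String fromRaw = (String) rawNames().get(0);
        // Here the compiler knows the element type, so no cast is needed.
        String fromTyped = typedNames().get(0);
        return fromRaw.length() + fromTyped.length();
    }
}
```

With the raw variant, a wrong guess about the element type only shows up as a
\type{ClassCastException} at runtime.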
1082
1083\subsection{Composite Refactorings Will Not Appear as Atomic Actions}
1084
1085\subsubsection{Missing Flexibility from JDT Refactorings}
The JDT refactorings are not made with composition of refactorings in mind. When
a JDT refactoring is executed, it assumes that all conditions for it to be
applied successfully can be found by reading source files that have been
persisted to disk. They can only operate on the actual source material, and not
(in-memory) copies thereof. This constitutes a major disadvantage when trying to
compose refactorings, since if an exception occurs in the middle of a sequence of
refactorings, it can leave the project in a state where the composite
refactoring was executed only partly. It makes it hard to discard the changes
done without monitoring and consulting the undo manager, an approach that is not
bulletproof.
1096
1097\subsubsection{Broken Undo History}
1098When designing a composed refactoring that is to be performed as a sequence of
1099refactorings, you would like it to appear as a single change to the workspace.
1100This implies that you would also like to be able to undo all the changes done by
1101the refactoring in a single step. This is not the way it appears when a sequence
1102of JDT refactorings is executed. It leaves the undo history filled up with
1103individual undo actions corresponding to every single JDT refactoring in the
1104sequence. This problem is not trivial to handle in Eclipse.
1105\See{hacking_undo_history}
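
What is wished for here can be sketched in a few lines of plain Java. The types
\type{Step}, \type{CompositeStep} and \type{Append} are hypothetical and only
model the desired behavior. They do not show how Eclipse or the JDT work. A
composite step performs its parts in order, rolls back if one of them fails,
and hands back a single undo step for the whole sequence.

```java
import java.util.ArrayDeque;
import java.util.Deque;

interface Step {
    // Performs the step and returns the step that undoes it.
    Step perform();
}

class CompositeStep implements Step {
    private final Step[] steps;

    CompositeStep(Step... steps) {
        this.steps = steps;
    }

    @Override
    public Step perform() {
        Deque<Step> undos = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                undos.push(step.perform());
            }
        } catch (RuntimeException e) {
            // Roll back the steps that did succeed, most recent first.
            while (!undos.isEmpty()) {
                undos.pop().perform();
            }
            throw e;
        }
        // The deque iterates from the most recently pushed undo to the oldest,
        // which is the order in which the undo steps must run.
        return new CompositeStep(undos.toArray(new Step[0]));
    }
}

// A trivial step for demonstration: appends text to a buffer.
class Append implements Step {
    private final StringBuilder target;
    private final String text;

    Append(StringBuilder target, String text) {
        this.target = target;
        this.text = text;
    }

    @Override
    public Step perform() {
        target.append(text);
        return () -> {
            target.setLength(target.length() - text.length());
            return this;   // undoing the undo means appending again
        };
    }
}
```

An undo history that stores the result of \method{perform} on the composite
receives one entry, no matter how many steps the composite contains.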
1106
1107\section{Wishful Thinking}
80663734 1108
80663734 1109
1110\chapter{Composite Refactorings in Eclipse}
1111
1112\section{A Simple Ad Hoc Model}
1113As pointed out in \myref{ch:jdt_refactorings}, the Eclipse JDT refactoring model
1114is not very well suited for making composite refactorings. Therefore a simple
1115model using changer objects (of type \type{RefaktorChanger}) is used as an
1116abstraction layer on top of the existing Eclipse refactorings.
1117
1118\section{The Extract and Move Method Refactoring}
1119%The Extract and Move Method Refactoring is implemented mainly using these
1120%classes:
1121%\begin{itemize}
1122% \item \type{ExtractAndMoveMethodChanger}
1123% \item \type{ExtractAndMoveMethodPrefixesExtractor}
1124% \item \type{Prefix}
1125% \item \type{PrefixSet}
1126%\end{itemize}
1127
1128\subsection{The Building Blocks}
1129This is a composite refactoring, and hence is built up using several primitive
1130refactorings. These basic building blocks are, as its name implies, the
1131\ExtractMethod refactoring\citing{refactoring} and the \MoveMethod
1132refactoring\citing{refactoring}. In Eclipse, the implementations of these
1133refactorings are found in the classes
1134\typewithref{org.eclipse.jdt.internal.corext.refactoring.code}{ExtractMethodRefactoring}
1135and
1136\typewithref{org.eclipse.jdt.internal.corext.refactoring.structure}{MoveInstanceMethodProcessor},
1137where the last class is designed to be used together with the processor-based
1138\typewithref{org.eclipse.ltk.core.refactoring.participants}{MoveRefactoring}.
1139
1140\subsubsection{The ExtractMethodRefactoring Class}
This class is quite simple in its use. The only parameters it requires for
construction are a compilation
unit\typeref{org.eclipse.jdt.core.ICompilationUnit}, the offset into the source
code where the extraction shall start, and the length of the source to be
extracted. Then you have to set the method name for the new method, together with
the access modifier that shall be used and some not so interesting parameters.
1147
1148\subsubsection{The MoveInstanceMethodProcessor Class}
For the Move Method refactoring, the processor requires slightly more advanced
input than the class for the Extract Method refactoring. For construction it
requires a method handle\typeref{org.eclipse.jdt.core.IMethod} from the Java
Model for the method that is to be moved. Then the target for the move has to be
supplied as the variable binding from a chosen variable declaration. In addition
to this, one has to set some parameters regarding setters/getters and
delegation.

To make a whole refactoring from the processor, one has to construct a
\type{MoveRefactoring} from it.

\subsection{The ExtractAndMoveMethodChanger Class}
The \typewithref{no.uio.ifi.refaktor.changers}{ExtractAndMoveMethodChanger}
class, which is a subclass of the class
\typewithref{no.uio.ifi.refaktor.changers}{RefaktorChanger}, is the class
responsible for composing the \type{ExtractMethodRefactoring} and the
\type{MoveRefactoring}. Its constructor takes a project
handle\typeref{org.eclipse.core.resources.IProject}, the method name for the new
method and a \typewithref{no.uio.ifi.refaktor.utils}{SmartTextSelection}.

A \type{SmartTextSelection} is basically a text
selection\typeref{org.eclipse.jface.text.ITextSelection} object that requires
the underlying document to be provided at creation. This means that its
\methodwithref{no.uio.ifi.refaktor.utils.SmartTextSelection}{getDocument} method
will never return \type{null}.
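
The idea of a selection that cannot exist without its document can be sketched
in plain Java. This is only an illustration under simplifying assumptions:
\texttt{Document} is a hypothetical stand-in for the Eclipse document type, and
\texttt{SmartTextSelectionSketch} is not the actual class, which is based on
\type{ITextSelection}.

```java
import java.util.Objects;

// Hypothetical stand-in for the Eclipse document type, so that the sketch
// compiles without Eclipse on the class path.
interface Document {
    String get();
}

// Simplified illustration; not the actual SmartTextSelection class.
final class SmartTextSelectionSketch {
    private final Document document;
    private final int offset;
    private final int length;

    SmartTextSelectionSketch(Document document, int offset, int length) {
        // Reject a missing document at creation instead of at first use.
        this.document = Objects.requireNonNull(document, "document");
        this.offset = offset;
        this.length = length;
    }

    // Never returns null, because the constructor rejects null documents.
    Document getDocument() {
        return document;
    }

    int getOffset() {
        return offset;
    }

    int getLength() {
        return length;
    }

    // The selected text, read from the underlying document.
    String getText() {
        return document.get().substring(offset, offset + length);
    }
}
```

With this design, code that receives a selection does not need its own null
check before reading the document.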

Before extracting the new method, the possible targets for the move operation
are found with the help of an
\typewithref{no.uio.ifi.refaktor.extractors}{ExtractAndMoveMethodPrefixesExtractor}.
The possible targets are computed from the prefixes that the extractor returns
from its
\methodwithref{no.uio.ifi.refaktor.extractors.ExtractAndMoveMethodPrefixesExtractor}{getSafePrefixes}
method. The changer then chooses the most suitable target by finding the most
frequently occurring prefix among the safe ones. The target is the type of the
first part of the prefix.
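
The selection of a target can be illustrated with a small, self-contained
sketch. It assumes that the safe prefixes are available as dotted strings with
occurrence counts, which is a simplification of the \type{Prefix} and
\type{PrefixSet} classes. The class and method names are hypothetical, and the
tie-breaking rule (the first prefix encountered wins) is an assumption.
Resolving the first part to its type requires bindings and is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; prefixes are modelled as dotted strings.
final class TargetChooserSketch {

    // Returns the prefix with the highest count, or null if the map is empty.
    // On a tie, the prefix that comes first in iteration order is kept
    // (an assumption made for this sketch).
    static String mostFrequentPrefix(Map<String, Integer> safePrefixCounts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : safePrefixCounts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // First segment of a dotted prefix, for instance "a" for "a.b".
    static String firstPart(String prefix) {
        int dot = prefix.indexOf('.');
        return dot < 0 ? prefix : prefix.substring(0, dot);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 1);
        counts.put("b.c", 3);
        String chosen = mostFrequentPrefix(counts);
        System.out.println(chosen + " -> " + firstPart(chosen));
    }
}
```

In the sketch the move target would be the type of the variable named by
\texttt{firstPart}.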

After finding a suitable target, the \type{ExtractAndMoveMethodChanger} first
creates an \type{ExtractMethodRefactoring} and performs it as explained in
\myref{executing_refactoring} about the execution of refactorings. Then it
creates and performs the \type{MoveRefactoring} in the same way, based on the
changes made by the Extract Method refactoring.

\subsection{The ExtractAndMoveMethodPrefixesExtractor Class}
This extractor extracts properties needed for building the Extract and Move
Method refactoring. It searches through the given selection to find safe
prefixes, and those prefixes form a base that can be used to compute possible
targets for the move part of the refactoring. It finds both the candidates, in
the form of prefixes, and the non-candidates, called unfixes. All prefixes (and
unfixes) are represented by a
\typewithref{no.uio.ifi.refaktor.extractors}{Prefix}, and they are collected
into prefix sets\typeref{no.uio.ifi.refaktor.extractors.PrefixSet}.

The prefixes and unfixes are found by property
collectors\typeref{no.uio.ifi.refaktor.extractors.collectors.PropertyCollector}.
A property collector follows the visitor pattern\citing{designPatterns} and is
of the \typewithref{org.eclipse.jdt.core.dom}{ASTVisitor} type. An
\type{ASTVisitor} visits nodes in an abstract syntax tree that forms the Java
document object model. The tree consists of nodes of type
\typewithref{org.eclipse.jdt.core.dom}{ASTNode}.

\subsubsection{The PrefixesCollector}
The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{PrefixesCollector}
is of type \type{PropertyCollector}. It visits expression
statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and creates
prefixes from their expressions in the case of method invocations. The prefixes
found are registered with a prefix set, together with all their sub-prefixes.
\todo{Rewrite in the case of changes to the way prefixes are found}

\subsubsection{The UnfixesCollector}
The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector}
finds unfixes within the selection. An unfix is a name that is assigned to
within the selection. The reason that this cannot be allowed is that, with such
a name as the move target, the result would be an assignment to the \type{this}
keyword, which is not valid in Java.

\subsubsection{Computing Safe Prefixes}
A safe prefix is a prefix that does not enclose an unfix. A prefix encloses an
unfix if the unfix is in the set of its sub-prefixes. As an example,
\texttt{``a.b''} encloses \texttt{``a''}, as does \texttt{``a''}. The safe
prefixes are unified in a \type{PrefixSet} and can be fetched by calling the
\method{getSafePrefixes} method of the
\type{ExtractAndMoveMethodPrefixesExtractor}.
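
The definitions above can be made concrete with a minimal sketch. It models
prefixes and unfixes as dotted strings instead of \type{Prefix} objects, and all
names in it are hypothetical. It only mirrors the rule stated above: a prefix is
safe if none of the unfixes is among its sub-prefixes, where a prefix counts as
a sub-prefix of itself.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper; prefixes and unfixes are modelled as dotted strings.
final class SafePrefixesSketch {

    // All sub-prefixes of a dotted name, including the name itself:
    // "a.b.c" gives a, a.b and a.b.c.
    static Set<String> subPrefixes(String prefix) {
        Set<String> result = new LinkedHashSet<>();
        int dot = prefix.indexOf('.');
        while (dot >= 0) {
            result.add(prefix.substring(0, dot));
            dot = prefix.indexOf('.', dot + 1);
        }
        result.add(prefix);
        return result;
    }

    // A prefix encloses an unfix if the unfix is one of its sub-prefixes.
    static boolean encloses(String prefix, String unfix) {
        return subPrefixes(prefix).contains(unfix);
    }

    // Keeps only the prefixes that do not enclose any unfix.
    static Set<String> safePrefixes(Set<String> prefixes, Set<String> unfixes) {
        Set<String> safe = new LinkedHashSet<>();
        for (String prefix : prefixes) {
            boolean enclosesUnfix = false;
            for (String unfix : unfixes) {
                if (encloses(prefix, unfix)) {
                    enclosesUnfix = true;
                    break;
                }
            }
            if (!enclosesUnfix) {
                safe.add(prefix);
            }
        }
        return safe;
    }
}
```

For the prefixes \texttt{a}, \texttt{a.b} and \texttt{c.d} with the single unfix
\texttt{a}, only \texttt{c.d} is safe in this sketch.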

\subsection{The Prefix Class}
\todo{?}
\subsection{The PrefixSet Class}

\subsection{Hacking the Refactoring Undo
History}\label{hacking_undo_history}
\todo{Where to put this section?}

In an attempt to make multiple subsequent changes to the workspace appear as a
single action (i.e. make the undo changes appear as such), I tried to alter
the undo changes\typeref{org.eclipse.ltk.core.refactoring.Change} in the history
of the refactorings.

My first impulse was to remove the, in this case, last two undo changes from the
undo manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} for the
Eclipse refactorings, and then add them to a composite
change\typeref{org.eclipse.ltk.core.refactoring.CompositeChange} that could be
added back to the manager. The interface of the undo manager does not offer a
way to remove/pop the most recently added undo change, so a possible solution
could be to decorate\citing{designPatterns} the undo manager, to intercept and
collect the undo changes before delegating to the \method{addUndo}
method\methodref{org.eclipse.ltk.core.refactoring.IUndoManager}{addUndo} of the
manager. Instead of giving it the intended undo change, a null change could be
given to prevent it from making any changes if run. Then one could let the
collected undo changes form a composite change to be added to the manager.
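
The decoration idea can be sketched as follows. \texttt{Change} and
\texttt{UndoManager} are hypothetical, heavily reduced stand-ins for the LTK
types, declared only to keep the sketch self-contained, and the remaining class
names are made up as well. The sketch shows the intended interception only. It
is not code that works against the real undo manager.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, reduced stand-ins for the LTK types; not the Eclipse API.
interface Change {
    String getName();
}

interface UndoManager {
    void addUndo(String name, Change undo);
}

// A change that does nothing if run.
final class NullChangeSketch implements Change {
    @Override
    public String getName() {
        return "null change";
    }
}

// Groups several undo changes under one name.
final class CompositeChangeSketch implements Change {
    private final String name;
    private final List<Change> children;

    CompositeChangeSketch(String name, List<Change> children) {
        this.name = name;
        // Copy the list so that later changes to the argument are not visible.
        this.children = Collections.unmodifiableList(new ArrayList<>(children));
    }

    @Override
    public String getName() {
        return name;
    }

    List<Change> getChildren() {
        return children;
    }
}

// Decorator: collects the real undo changes and hands null changes to the
// decorated manager instead.
final class CollectingUndoManager implements UndoManager {
    private final UndoManager delegate;
    private final List<Change> collected = new ArrayList<>();

    CollectingUndoManager(UndoManager delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addUndo(String name, Change undo) {
        collected.add(undo);
        delegate.addUndo(name, new NullChangeSketch());
    }

    // Registers everything collected so far as one composite undo change.
    void addCollectedAsComposite(String name) {
        delegate.addUndo(name, new CompositeChangeSketch(name, collected));
        collected.clear();
    }
}
```

After two intercepted refactorings and one call to
\texttt{addCollectedAsComposite}, the decorated manager in this sketch has
received two null changes followed by one composite change.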

There is a technical challenge with this approach, and it relates to the undo
manager and its concrete implementation
\typewithref{org.eclipse.ltk.internal.core.refactoring}{UndoManager2}.
This implementation is designed in such a way that it is not possible to just
add an undo change; you have to do it in the context of an active
operation\typeref{org.eclipse.core.commands.operations.TriggeredOperations}.
One could imagine that it might be possible to trick the undo manager into
believing that you are doing a real change, by executing a refactoring that
returns a kind of null change, which in turn returns our composite change of
undo refactorings when it is performed.

Apart from the technical problems with this solution, there is a functional
problem: If it all had worked out as planned, this would leave the undo history
in a dirty state, with multiple empty undo operations corresponding to each of
the sequentially executed refactoring operations, followed by a composite undo
change corresponding to an empty change of the workspace for rounding off our
composite refactoring. The solution to this particular problem could be to
intercept the registration of the intermediate changes in the undo manager, and
only register the last empty change.

Unfortunately, not everything works as desired with this solution. The grouping
of the undo changes into the composite change does not make the undo operation
appear as an atomic operation. The undo operation is still split up into
separate undo actions, each corresponding to the change done by its originating
refactoring. In addition, the undo actions have to be performed separately in
all the editors involved. This makes it no solution at all, but a step toward
something worse.

There might be a solution to this problem, but it remains to be found. The
design of the refactoring undo management is partly to blame for this, as it is
too complex to be easily manipulated.


\chapter{Related Work}

\section{The compositional paradigm of refactoring}
This paradigm builds upon the observation of Vakilian et
al.\citing{vakilian2012}, that of the many automated refactorings existing in
modern IDEs, the simplest ones dominate the usage statistics. The report
mainly focuses on \emph{Eclipse} as the tool under investigation.

The paradigm is described almost as the opposite of automated composition of
refactorings \see{compositeRefactorings}. It works by providing the programmer
with easily accessible primitive refactorings. These refactorings shall be
accessed via keyboard shortcuts or quick-assist menus\footnote{Think
quick-assist with Ctrl+1 in Eclipse} and be promptly executed, in contrast to
the currently dominating wizard-based refactoring paradigm. They are meant to
stimulate composing smaller refactorings into more complex changes, rather than
doing a large upfront configuration of a wizard-based refactoring, before
previewing and executing it. The compositional paradigm of refactoring is
supposed to give control back to the programmer, by supporting \himher with an
option of performing small rapid changes instead of large changes with a lesser
degree of control. The report authors hope this will lead to fewer unsuccessful
refactorings. It could also lower the bar for understanding the steps of a
larger composite refactoring and thus also help in figuring out what goes wrong
if one should choose to opt for a wizard-based refactoring.

Vakilian and his associates have performed a survey of the effectiveness of the
compositional paradigm versus the wizard-based one. They claim to have found
evidence that the \emph{compositional paradigm} outperforms the
\emph{wizard-based} one. It does so by reducing automation, which seems
counterintuitive. Therefore they ask the question ``What is an appropriate level
of automation?'', and thus question what they feel is a rush toward more
automation in the software engineering community.


\backmatter{}
\printbibliography
\listoftodos
\end{document}