\documentclass[USenglish,draft]{ifimaster}
\usepackage{import}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc,url}
\usepackage{lmodern} % using Latin Modern to be able to use bold typewriter font
\urlstyle{sf}
\usepackage{babel,textcomp,csquotes,ifimasterforside,varioref,graphicx}
\usepackage[hidelinks]{hyperref}
\usepackage{cleveref}
\usepackage[style=numeric-comp,backend=bibtex]{biblatex}
\usepackage{amsthm}
\usepackage[obeyDraft]{todonotes}
\usepackage{xspace}
\usepackage{he-she}
\usepackage{verbatim}
\usepackage{minted}
\usepackage{multicol}
\usemintedstyle{bw}
\usepackage{perpage} %the perpage package
\MakePerPage{footnote} %the perpage package command

\theoremstyle{definition}
\newtheorem*{wordDef}{Definition}

\graphicspath{ {./figures/} }

\newcommand{\citing}[1]{~\cite{#1}}
\newcommand{\myref}[1]{\cref{#1} on \cpageref{#1}}

\newcommand{\definition}[1]{\begin{wordDef}#1\end{wordDef}}
\newcommand{\see}[1]{(see \myref{#1})}
\newcommand{\See}[1]{(See \myref{#1}.)}
\newcommand{\explanation}[3]{\noindent\textbf{\textit{#1}}\\*\emph{When:}
#2\\*\emph{How:} #3\\*[-7px]}

\newcommand{\type}[1]{\texttt{\textbf{#1}}}
\newcommand{\typeref}[1]{\footnote{\type{#1}}}
\newcommand{\typewithref}[2]{\type{#2}\typeref{#1.#2}}
\newcommand{\method}[1]{\type{#1}}
\newcommand{\methodref}[2]{\footnote{\type{#1}\method{\##2()}}}
\newcommand{\methodwithref}[2]{\method{#2}\footnote{\type{#1}\method{\##2()}}}
\newcommand{\var}[1]{\type{#1}}

\newcommand{\refactoring}[1]{\emph{#1}}
\newcommand{\ExtractMethod}{\refactoring{Extract Method}\xspace}
\newcommand{\MoveMethod}{\refactoring{Move Method}\xspace}

\newcommand\todoin[2][]{\todo[inline, caption={2do}, #1]{
\begin{minipage}{\textwidth-4pt}#2\end{minipage}}}

\title{Refactoring}
\subtitle{An essay}
\author{Erlend Kristiansen}

\bibliography{bibliography/master-thesis-erlenkr-bibliography}

\begin{document}
\ififorside
\frontmatter{}

\chapter*{Abstract}
\todoin{\textbf{Remove all todos (including list) before delivery/printing!!!
Can be done by removing ``draft'' from documentclass.}}
\todoin{Write abstract}

\tableofcontents{}
\listoffigures{}
\listoftables{}

\chapter*{Preface}

The discussions in this report must be seen in the context of object-oriented
programming languages, and Java in particular, since that is the language in
which most of the examples will be given. Although the techniques discussed
may be applicable to languages from other paradigms, they will not be the
subject of this report.

\mainmatter

\chapter{What is Refactoring?}

This question is best answered by first defining the concept of a
\emph{refactoring} and what it means to \emph{refactor}, and then discussing
which aspects of programming make people want to refactor their code.

\section{Defining refactoring}
Martin Fowler, in his classic book on refactoring\citing{refactoring}, defines a
refactoring like this:

\begin{quote}
\emph{Refactoring} (noun): a change made to the internal
structure\footnote{The structure observable by the programmer.} of software to
make it easier to understand and cheaper to modify without changing its
observable behavior.~\cite[p.~53]{refactoring}
\end{quote}

\noindent This definition assigns additional meaning to the word
\emph{refactoring}, beyond the composition of the prefix \emph{re-}, usually
meaning something like ``again'' or ``anew'', and the word \emph{factoring},
which can mean to isolate the \emph{factors} of something. Here a \emph{factor}
would be close to the mathematical definition of something that divides a
quantity without leaving a remainder. Fowler mixes the \emph{motivation}
behind refactoring into his definition. A more refined definition would consider
only the \emph{mechanical} and \emph{behavioral} aspects of refactoring, that
is, to factor the program again, putting it together in a different way than
before, while preserving its behavior. An alternative definition could then be:

\definition{A \emph{refactoring} is a transformation
done to a program without altering its external behavior.}

From this we can conclude that a refactoring primarily changes how the
\emph{code} of a program is perceived by the \emph{programmer}, and not the
\emph{behavior} experienced by any user of the program. Although the logical
meaning is preserved, such changes could still alter the program's performance,
for better or worse. So any logic that depends on the performance of a program,
such as a timeout, could make the program behave differently after a
refactoring.

In the extreme case, one could argue that \emph{software obfuscation} is
refactoring. Software obfuscation means making source code harder to read and
analyze while preserving its semantics. It could be done by composing many,
more or less randomly chosen, refactorings. The question then arises whether it
can be called a \emph{composite refactoring} \see{compositeRefactorings}. The
answer is not obvious. First, there is no way to describe \emph{the} mechanics
of software obfuscation, because there are infinitely many ways to do it.
Second, \emph{obfuscation} can be thought of as \emph{one operation}: either the
code is obfuscated, or it is not. Third, it makes no sense to call software
obfuscation \emph{a} refactoring, since it holds different meanings for
different people. The last point is important, since one of the motivations
behind defining different refactorings is to build up a vocabulary that
software professionals can use to reason about and discuss programs, similar to
the motivation behind design patterns\citing{designPatterns}. So when describing
\emph{software obfuscation}, it might be more appropriate to define what you do
when performing it than to precisely define its mechanics in terms of other
refactorings.

\section{The etymology of ``refactoring''}
It is a little difficult to pinpoint the exact origin of the word
``refactoring'', as it seems to have evolved as part of a colloquial
terminology rather than as a scientific term. There is no authoritative source
for a formal definition of it.

According to Martin Fowler\citing{etymology-refactoring}, there may also be more
than one origin of the word. The best-known source, when it comes to the
origin of \emph{refactoring}, is the Smalltalk\footnote{\emph{Smalltalk} --
object-oriented, dynamically typed, reflective programming language. See
\url{http://www.smalltalk.org}} community and its famous \emph{Refactoring
Browser}\footnote{\url{http://st-www.cs.illinois.edu/users/brant/Refactory/RefactoringBrowser.html}},
described in the article \emph{A Refactoring Tool for
Smalltalk}\citing{refactoringBrowser1997}, published in 1997.
Allegedly\citing{etymology-refactoring}, the metaphor of factoring programs was
also present in the Forth\footnote{\emph{Forth} -- stack-based, extensible
programming language, without type-checking. See \url{http://www.forth.org}}
community, and the word ``refactoring'' is mentioned in a book by Leo Brodie
called \emph{Thinking Forth}\citing{brodie1984}, first published in
1984\footnote{\emph{Thinking Forth} was first published in 1984 by the
\emph{Forth Interest Group}. It was reprinted in 1994 with minor
typographical corrections, before it was transcribed into an electronic edition
typeset in \LaTeX\ and published under a Creative Commons licence in 2004. The
edition cited here is the 2004 edition, but the content should essentially be
the same as in 1984.}. The word itself is printed only once in the
book~\cite[p.~232]{brodie1984}, but the term \emph{factoring} is prominent in
it, and the book also contains a whole chapter dedicated to (re)factoring and
how to keep (Forth) code clean and maintainable.

\begin{quote}
\ldots good factoring technique is perhaps the most important skill for a
Forth programmer.~\cite[p.~172]{brodie1984}
\end{quote}

\noindent Brodie also expresses what \emph{factoring} means to him:

\begin{quote}
Factoring means organizing code into useful fragments. To make a fragment
useful, you often must separate reusable parts from non-reusable parts. The
reusable parts become new definitions. The non-reusable parts become arguments
or parameters to the definitions.~\cite[p.~172]{brodie1984}
\end{quote}

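Brodie's description carries over to Java quite directly. The following is a
minimal sketch of our own, not taken from Brodie; the class \type{Factoring}
and all of its methods are hypothetical. Two loops that differ only in what is
done with each element are factored into one method, \method{sumOf}. The shared
loop becomes the new definition, and the varying part becomes a parameter of
type \type{IntUnaryOperator}.

\begin{minted}{java}
import java.util.function.IntUnaryOperator;

class Factoring {
    // Before factoring: two methods duplicating the same loop.
    static int sumOfSquaresOld(int[] xs) {
        int sum = 0;
        for (int x : xs) sum += x * x;
        return sum;
    }

    static int sumOfCubesOld(int[] xs) {
        int sum = 0;
        for (int x : xs) sum += x * x * x;
        return sum;
    }

    // After factoring: the reusable part (the loop) is a new definition,
    // and the non-reusable part (what to do with each element) is a parameter.
    static int sumOf(int[] xs, IntUnaryOperator f) {
        int sum = 0;
        for (int x : xs) sum += f.applyAsInt(x);
        return sum;
    }

    static int sumOfSquares(int[] xs) {
        return sumOf(xs, x -> x * x);
    }

    static int sumOfCubes(int[] xs) {
        return sumOf(xs, x -> x * x * x);
    }
}
\end{minted}

\noindent The old versions are kept side by side here only to make it easy to
check that the factoring preserved behavior, for example by asserting that
\method{sumOfSquares} and \method{sumOfSquaresOld} agree on the same inputs.
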
Fowler claims that the usage of the word \emph{refactoring} did not pass between
the \emph{Forth} and \emph{Smalltalk} communities, but that it emerged
independently in each of the communities.

\section{Motivation -- Why people refactor}
There are many reasons why people want to refactor their programs. They can, for
instance, do it to remove duplication, break up long methods or introduce
design patterns\citing{designPatterns} into their software systems. What these
reasons have in common is that people intend to make their programs
\emph{better}, in some sense. But which aspects of their programs are being
improved?

As already mentioned, people often refactor to get rid of duplication. They
move identical or similar code into methods, perhaps pushing methods up or down
in their class hierarchies, or they make template methods for overlapping
algorithms and functionality. It is all about gathering what belongs together
and putting it in one place. The resulting code is then easier to maintain.
When the implicit coupling\footnote{Duplicated code might not be coupled in any
other way than that it is supposed to represent the same functionality. So if
this functionality is going to change, it might need to change in more than one
place, thus creating an implicit coupling between the multiple pieces of code.}
between code snippets is removed, a bug can be located in only one place, and
new functionality needs to be added only in this one place, instead of in a
number of places people might not even remember.

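As an illustration of the template methods mentioned above, here is a minimal,
hypothetical sketch; the classes \type{Report}, \type{HtmlReport} and
\type{TextReport} are invented for this example. It shows the result of pulling
an overlapping algorithm up into a superclass. The order of the steps is fixed
once, in \method{render}, and each subclass supplies only the steps that differ.

\begin{minted}{java}
// The template method render() fixes the order of the steps once;
// subclasses override only the parts that vary.
abstract class Report {
    final String render(String title, String body) {
        return header(title) + body + footer();
    }

    abstract String header(String title);

    abstract String footer();
}

class HtmlReport extends Report {
    @Override
    String header(String title) {
        return "<h1>" + title + "</h1><p>";
    }

    @Override
    String footer() {
        return "</p>";
    }
}

class TextReport extends Report {
    @Override
    String header(String title) {
        return title.toUpperCase() + "\n";
    }

    @Override
    String footer() {
        return "\n";
    }
}
\end{minted}

\noindent Before such a refactoring, \method{render} would typically have been
duplicated, with small variations, in both subclasses. Afterwards, a change to
the overall structure of a report is made in one place.
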
A problem often encountered when programming is that a program contains many
long and hard-to-grasp methods. It can then help to break the methods into
smaller ones, using the \ExtractMethod refactoring\citing{refactoring}. Doing so
may reveal something about the program that you were not aware of before, such
as bugs you did not know about or could not find due to its complex
structure. \todo{Proof?} Making the methods smaller and giving the new ones good
names clarifies the algorithms and enhances the \emph{understandability} of the
program \see{magic_number_seven}. This makes refactoring an excellent method for
exploring unfamiliar program code, or code that you had forgotten that you
wrote.

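The following is a minimal sketch of \ExtractMethod in Java. The class
\type{Invoice} and its methods are hypothetical and invented for illustration.
Before the refactoring, \method{summaryBefore} computes the total inline. After
the refactoring, the loop has been extracted into \method{total}, whose name
states what the fragment does. Both versions return the same string, and that
is what makes the change a refactoring rather than a change in behavior.

\begin{minted}{java}
import java.util.List;

class Invoice {
    private final String customer;
    private final List<Integer> amounts; // amounts in cents

    Invoice(String customer, List<Integer> amounts) {
        this.customer = customer;
        this.amounts = amounts;
    }

    // Before: the summation is an anonymous fragment inside the method.
    String summaryBefore() {
        int sum = 0;
        for (int a : amounts) {
            sum += a;
        }
        return "Invoice for " + customer + ": " + sum;
    }

    // After Extract Method: the fragment is named after its purpose.
    String summary() {
        return "Invoice for " + customer + ": " + total();
    }

    int total() {
        int sum = 0;
        for (int a : amounts) {
            sum += a;
        }
        return sum;
    }
}
\end{minted}
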
Most primitive refactorings are simple. Their true power is first revealed when
they are combined into larger, higher-level refactorings, called
\emph{composite refactorings} \see{compositeRefactorings}. Often the goal of
such a series of refactorings is a design pattern. Thus the \emph{design} can
evolve throughout the lifetime of a program, as opposed to being designed
up-front. It is all about being structured and taking small steps to improve a
program's design.

Many software design patterns are aimed at lowering the coupling between
different classes and different layers of logic. One of the most famous is
perhaps the \emph{Model-View-Controller}\citing{designPatterns} pattern, or
\emph{MVC} for short. It is aimed at lowering the coupling between the user
interface on one side and the business logic and data representation of a
program on the other. This has the added benefit that the business logic can
much more easily be the target of automated tests, increasing the productivity
of the software development process. Refactoring is an important tool for
moving a program toward such a design.

Another effect of refactoring is that, with the increased separation of
concerns coming out of many refactorings, \emph{performance} can be improved.
When programs are profiled, the problematic parts are narrowed down to smaller
parts of the code, which are easier to tune, and optimization can be performed
only where needed and in a more effective way.

Last, but not least, and probably the best reason to refactor, is refactoring
to \emph{facilitate a program change}. If one has managed to keep one's code
clean and tidy, and the code is not bloated with design patterns that are never
going to be needed, then some refactoring might be required to introduce a
design pattern that is appropriate for the upcoming change.

Refactoring program code with a goal in mind can give the code itself more
value, in the form of robustness to bugs, understandability and
maintainability. Having robust code is an obvious advantage, but
understandability and maintainability are also very important aspects of
software development. By incorporating refactoring in the development process,
bugs are found faster, new functionality is added more easily, and code is
easier to understand for the next person exposed to it, who might just as well
be the person who wrote it. Consequently, refactoring can increase the average
productivity of the development process, and thus also add to the monetary
value of a business in the long run. This perspective on productivity and money
may also open the eyes of the many nearsighted managers who seldom see beyond
the next milestone.

\section{The magical number seven}\label{magic_number_seven}
The article \emph{The magical number seven, plus or minus two: some limits on
our capacity for processing information}\citing{miller1956} by George A.
Miller was published in the journal \emph{Psychological Review} in 1956. It
presents evidence that the number of objects a human being can hold in working
memory is roughly seven, plus or minus two. This number varies a bit depending
on the nature and complexity of the objects, but is, according to Miller,
``\ldots never changing so much as to be unrecognizable.''

Miller's article culminates in the section called \emph{Recoding}, a term he
borrows from communication theory. The central result in this section is that
by recoding information, the amount of information that a human can process at
a time is increased. By \emph{recoding}, Miller means grouping objects together
in chunks and giving each chunk a new name by which it can be remembered. By
organizing objects into patterns of ever-growing depth, one can memorize and
process a much larger amount of data than if it were represented as its basic
pieces. This grouping and renaming is analogous to how many refactorings work,
by grouping pieces of code and giving them a new name. Examples are the
fundamental \ExtractMethod and \refactoring{Extract Class}
refactorings\citing{refactoring}.

\begin{quote}
\ldots recoding is an extremely powerful weapon for increasing the amount of
information that we can deal with.~\cite[p.~95]{miller1956}
\end{quote}

An example from the article addresses the problem of memorizing a sequence of
binary digits. Let us say we have the following sequence\footnote{The example
presented here is slightly modified (and shortened) from what is presented in
the original article\citing{miller1956}, but it is essentially the same.} of
16 binary digits: ``1010001001110011''. Most of us will have a hard time
memorizing this sequence by reading it only once or twice. Imagine if we instead
translate it to this sequence: ``A273''. To anyone with a background in computer
science, it will be obvious that the latter sequence is the former recoded into
hexadecimal (base 16) digits, each of which stands for a group of four binary
digits. Most people should be able to memorize this last sequence by looking at
it only once.

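The recoding step can be made explicit with a short Java sketch. The class
\type{Recoding} and its method \method{toHex} are hypothetical names. The method
chunks the bits into groups of four and names each chunk with one hexadecimal
digit, which is the kind of grouping and renaming Miller describes.

\begin{minted}{java}
class Recoding {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Recodes a string of binary digits, whose length must be a multiple
    // of four, by naming each chunk of four bits with one hexadecimal digit.
    static String toHex(String bits) {
        if (bits.length() % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 4) {
            int chunk = Integer.parseInt(bits.substring(i, i + 4), 2);
            hex.append(HEX[chunk]);
        }
        return hex.toString();
    }
}
\end{minted}

\noindent Applied to the 16 digits above, \method{toHex} returns
\texttt{A273}, since 1010, 0010, 0111 and 0011 are 10, 2, 7 and 3.
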
Another result from Miller's article is that when the amount of information a
human must interpret increases, the translation from one code to another must be
almost automatic for the subject to be able to remember the translation before
\heshe is presented with new information to recode. Thus, learning and
understanding how best to organize certain kinds of data is essential to
handling that kind of data efficiently in the future. This is much like when
humans learn to read. First they must learn how to recognize letters. Then they
can learn distinct words, and later read sequences of words that form whole
sentences. Eventually, most of them will be able to read whole books and
briefly retell the important parts of their content. This suggests that the use
of design patterns\citing{designPatterns} is a good idea when reasoning about
computer programs. With extensive use of design patterns when creating complex
program structures, one does not always have to read whole classes of code to
comprehend how they function; it may be sufficient to see only the name of a
class to almost fully understand its responsibilities.

\begin{quote}
Our language is tremendously useful for repackaging material into a few chunks
rich in information.~\cite[p.~95]{miller1956}
\end{quote}

Without further evidence, these results at least indicate that refactoring
source code into smaller units with higher cohesion and, when needed,
introducing appropriate design patterns should aid in creating computer
programs that are easier to maintain and have code that is easier (and better)
understood.

\section{Notable contributions to the refactoring literature}
\todoin{Update with more contributions}

\begin{description}
\item[1992] William F. Opdyke submits his doctoral dissertation called
\emph{Refactoring Object-Oriented Frameworks}\citing{opdyke1992}. This
work defines a set of refactorings that are behavior preserving given that
their preconditions are met. The dissertation is focused on the automation
of refactorings.
\item[1999] Martin Fowler et al.: \emph{Refactoring: Improving the Design of
Existing Code}\citing{refactoring}. This is perhaps the most influential text
on refactoring. It bears similarities to Opdyke's thesis\citing{opdyke1992}
in that it provides a catalog of refactorings. But Fowler's book is more
about the craft of refactoring, as he focuses on establishing a vocabulary for
refactoring, together with the mechanics of different refactorings and when
to perform them. His methodology is also founded on the principles of
test-driven development.
\item[2005] Joshua Kerievsky: \emph{Refactoring to
Patterns}\citing{kerievsky2005}. This book is heavily influenced by Fowler's
\emph{Refactoring}\citing{refactoring} and the ``Gang of Four'' \emph{Design
Patterns}\citing{designPatterns}. It builds on the refactoring catalog from
Fowler's book, but tries to bridge the gap between \emph{refactoring} and
\emph{design patterns} by providing a series of higher-level composite
refactorings that make code evolve toward or away from certain design
patterns. The book tries to build up the reader's intuition around \emph{why}
one would want to use a particular design pattern, and not just \emph{how}.
The book encourages evolutionary design. \See{relationToDesignPatterns}
\end{description}

\section{Tool support}\label{toolSupport}

\subsection{Tool support for Java}
This section briefly compares the refactoring support of the three IDEs
\emph{Eclipse}\footnote{\url{http://www.eclipse.org/}}, \emph{IntelliJ
IDEA}\footnote{The IDE under comparison is the \emph{Community Edition},
\url{http://www.jetbrains.com/idea/}} and
\emph{NetBeans}\footnote{\url{https://netbeans.org/}}. These are the most
popular Java IDEs\citing{javaReport2011}.

All three IDEs provide support for the most useful refactorings, like the
different extract, move and rename refactorings. In fact, Java-targeted IDEs are
known for their good refactoring support, so this came as no big surprise.

The IDEs seem to have excellent support for the \ExtractMethod refactoring, so
at least they have all crossed the first refactoring
rubicon\citing{fowlerRubicon2001,secondRubicon2012}.

Regarding the \MoveMethod refactoring, the \emph{Eclipse} and \emph{IntelliJ}
IDEs do the job in very similar ways. In most situations they both do a
satisfying job by producing the expected outcome. But they do nothing to check
that the result does not break the semantics of the program \see{correctness}.
The \emph{NetBeans} IDE implements this refactoring in a somewhat
unsophisticated way. For starters, its default destination for the move is the
method's own class, although it refuses to perform the refactoring if that is
chosen. But worst of all, when moving the method \method{f} of the class
\type{C} to the class \type{X}, it breaks the code. The result is shown in
\myref{lst:moveMethod_NetBeans}.

\begin{listing}
\begin{multicols}{2}
\begin{minted}[samepage]{java}
public class C {
    private X x;
    ...
    public void f() {
        x.m();
        x.n();
    }
}
\end{minted}

\columnbreak

\begin{minted}[samepage]{java}
public class X {
    ...
    public void f(C c) {
        c.x.m();
        c.x.n();
    }
}
\end{minted}
\end{multicols}
\caption{Moving method \method{f} from \type{C} to \type{X}.}
\label{lst:moveMethod_NetBeans}
\end{listing}

NetBeans will try to produce code that calls the methods \method{m} and
\method{n} of \type{X} by accessing them through \var{c.x}, where \var{c} is a
parameter of type \type{C} that is added to the method \method{f} when it is
moved. (This is seldom the desired outcome of this refactoring, but ironically,
this ``feature'' keeps NetBeans from breaking the code in the example from
\myref{correctness}.) If \var{c.x} is for some reason inaccessible to \type{X},
as in this case where \var{x} is private to \type{C}, the refactoring breaks the
code, and it will not compile. NetBeans presents a preview of the refactoring
outcome, but the preview does not warn when the IDE is about to break the
program.

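For comparison, the following is a minimal sketch of one behavior-preserving
outcome for the example in \myref{lst:moveMethod_NetBeans}. It follows the
mechanics Fowler describes for \MoveMethod: the body of \method{f} moves to
\type{X}, where \method{m} and \method{n} are called directly, and \method{f} in
\type{C} becomes a simple delegation. The bodies of \method{m} and \method{n},
the \var{log} field used to observe them, and the constructor of \type{C} are
hypothetical additions that make the sketch runnable.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;

class X {
    // Hypothetical record of calls, so that the effect of f can be observed.
    final List<String> log = new ArrayList<>();

    void m() { log.add("m"); }

    void n() { log.add("n"); }

    // The moved method lives where the features it uses live,
    // so it needs no reference back to C and no access to C's private field.
    void f() {
        m();
        n();
    }
}

class C {
    private final X x;

    C(X x) { this.x = x; }

    // The old method is kept as a simple delegation, so callers of C.f()
    // observe the same behavior as before the move.
    void f() {
        x.f();
    }
}
\end{minted}

\noindent Whether to keep the delegating \method{f} in \type{C} or to remove it
altogether is a design choice. Keeping it means that no callers have to change.
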
The IDEs under investigation seem to have fairly good support for primitive
refactorings, but what about more complex ones, such as
\refactoring{Extract Class}\citing{refactoring}? The \refactoring{Extract Class}
refactoring works by creating a class, then moving members to that class and
accessing them from the old class via a reference to the new class.
\emph{IntelliJ} handles this fairly well, although, in the case of private
methods, it leaves unused methods behind. These are methods that delegate to a
field with the type of the new class, but are not used anywhere.
\emph{Eclipse} has added (or withdrawn) its own quirk to the
\refactoring{Extract Class} refactoring, and only allows \emph{fields} to be
moved to a new class, \emph{not methods}. This effectively makes it extract only
a data structure, and calling it \refactoring{Extract Class} is a little
misleading. One would often be better off cutting and pasting the code by hand
than using the \refactoring{Extract Class} refactoring in Eclipse. As for
\emph{NetBeans}, it does not even seem to have made an attempt at providing this
refactoring. (Well, it probably has, but it does not show in the IDE.)

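To make the mechanics concrete, here is a minimal, hypothetical sketch modeled
loosely on Fowler's person and telephone-number example; the names are our own.
It shows the result of the refactoring. The area code and number, together with
the method that formats them, have been moved out of \type{Person} into a new
class \type{TelephoneNumber`, and \type{Person} accesses them through a
reference to the new class. Unlike Eclipse's variant, the method is moved along
with the fields.

\begin{minted}{java}
// After Extract Class: the phone-related data and behavior live in their own class.
class TelephoneNumber {
    private final String areaCode;
    private final String number;

    TelephoneNumber(String areaCode, String number) {
        this.areaCode = areaCode;
        this.number = number;
    }

    // Moved from Person together with the fields it uses.
    String format() {
        return "(" + areaCode + ") " + number;
    }
}

class Person {
    private final String name;
    // A reference to the extracted class replaces the two former fields.
    private final TelephoneNumber officePhone;

    Person(String name, String areaCode, String number) {
        this.name = name;
        this.officePhone = new TelephoneNumber(areaCode, number);
    }

    String name() {
        return name;
    }

    // Kept on Person as a delegation, so existing callers are unaffected.
    String telephoneNumber() {
        return officePhone.format();
    }
}
\end{minted}
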
\todoin{Visual Studio (C++/C\#), Smalltalk refactoring browser?,
second refactoring rubicon?}

\section{The relation to design patterns}\label{relationToDesignPatterns}

\emph{Refactoring} and \emph{design patterns} have at least one thing in common:
they are both promoted by advocates of \emph{clean code}\citing{cleanCode} as
fundamental tools on the road to more maintainable and extensible source code.

\begin{quote}
Design patterns help you determine how to reorganize a design, and they can
reduce the amount of refactoring you need to do
later.~\cite[p.~353]{designPatterns}
\end{quote}

Although sometimes associated with
over-engineering\citing{kerievsky2005,refactoring}, design patterns are in
general assumed to be good for the maintainability of source code. That may be
because many of them are designed to support the \emph{open/closed principle} of
object-oriented programming. The principle was first formulated by Bertrand
Meyer, the creator of the Eiffel programming language, like this: ``Modules
should be both open and closed.''\citing{meyer1988} It has since been
popularized in the following common version:

\begin{quote}
Software entities (classes, modules, functions, etc.) should be open for
extension, but closed for modification.\footnote{See
\url{http://c2.com/cgi/wiki?OpenClosedPrinciple} or
\url{https://en.wikipedia.org/wiki/Open/closed_principle}}
\end{quote}

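As a small, hypothetical illustration of the principle, consider the following
sketch. The names \type{Shape}, \type{Circle}, \type{Square} and \type{Areas}
are our own. The method \method{totalArea} depends only on the \type{Shape}
interface. A new kind of shape can therefore be added by writing a new class,
which is extension, without editing \method{totalArea}, which would be
modification. The sketch shows a simplified form of the polymorphism that many
design patterns rely on. It does not implement any particular pattern.

\begin{minted}{java}
import java.util.List;

// The abstraction that totalArea depends on.
interface Shape {
    double area();
}

class Circle implements Shape {
    private final double radius;

    Circle(double radius) { this.radius = radius; }

    @Override
    public double area() { return Math.PI * radius * radius; }
}

class Square implements Shape {
    private final double side;

    Square(double side) { this.side = side; }

    @Override
    public double area() { return side * side; }
}

class Areas {
    // Closed for modification: adding a new Shape requires no change here.
    static double totalArea(List<? extends Shape> shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();
        }
        return total;
    }
}
\end{minted}

\noindent Adding, say, a triangle means writing one new class that implements
\type{Shape}. The class \type{Areas} stays untouched.
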
Maintainability is often thought of as the ability to introduce new
functionality without having to change too much of the old code. When
refactoring, the motivation is often to facilitate adding new functionality.
It is about factoring the old code in a way that lets the new functionality
benefit from the functionality already residing in a software system, without
having to copy old code into new. Then, the next time someone adds new
functionality, it is less likely that the old code has to change. Assuming that
a design pattern is the best way to get rid of duplication and assist in
implementing new functionality, it is reasonable to conclude that a design
pattern is often the target of a series of refactorings. Having a repertoire of
design patterns can also help in knowing when and how to refactor a program to
make it reflect certain desired characteristics.

\begin{quote}
There is a natural relation between patterns and refactorings. Patterns are
where you want to be; refactorings are ways to get there from somewhere
else.~\cite[p.~107]{refactoring}
\end{quote}

This quote is wise in many contexts, but it is not always appropriate to say
``Patterns are where you want to be\ldots''. \emph{Sometimes}, patterns are
where you want to be, but only because they will benefit your design. It is not
true that one should always try to incorporate as many design patterns as
possible into a program. Patterns have no intrinsic value. They add value to a
system only when they support its design. Otherwise, the use of design patterns
may only lead to a program that is more complex than necessary.

\begin{quote}
The overuse of patterns tends to result from being patterns happy. We are
\emph{patterns happy} when we become so enamored of patterns that we simply
must use them in our code.~\cite[p.~24]{kerievsky2005}
\end{quote}

This can easily happen when relying largely on up-front design. It is then
natural, in the very beginning, to try to build in all the flexibility that one
believes will be necessary throughout the lifetime of a software system.
According to Joshua Kerievsky, ``That sounds reasonable --- if you happen to be
psychic.''~\cite[p.~1]{kerievsky2005} He advocates what he believes is a better
approach: to let software continually evolve, starting with a simple design
that meets today's needs and tackling future needs by refactoring to satisfy
them. He believes that this is a more economical approach than investing time
and money into a design that is inevitably going to change. By continuously
refactoring a system, its design can be kept simple without sacrificing
flexibility. To be able to fully rely on this approach, it is of utmost
importance to have a reliable suite of tests to lean on. \See{testing}
Continuous refactoring makes the design process more natural and less
characterized by difficult decisions that have to be made before proceeding,
and that will define a project for all of its unforeseeable future.

523 | \begin{comment} | |
524 | ||
525 | \section{Classification of refactorings} | |
526 | % only interesting refactorings | |
527 | % with 2 detailed examples? One for structured and one for intra-method? | |
528 | % Is replacing Bubblesort with Quick Sort considered a refactoring? | |
529 | ||
530 | \subsection{Structural refactorings} | |
531 | ||
532 | \subsubsection{Primitive refactorings} | |
533 | ||
534 | % Composing Methods | |
535 | \explanation{Extract Method}{You have a code fragment that can be grouped | |
536 | together.}{Turn the fragment into a method whose name explains the purpose of | |
537 | the method.} | |
538 | ||
539 | \explanation{Inline Method}{A method's body is just as clear as its name.}{Put | |
540 | the method's body into the body of its callers and remove the method.} | |
541 | ||
542 | \explanation{Inline Temp}{You have a temp that is assigned to once with a simple | |
543 | expression, and the temp is getting in the way of other refactorings.}{Replace | |
all references to that temp with the expression.}
545 | ||
546 | % Moving Features Between Objects | |
547 | \explanation{Move Method}{A method is, or will be, using or used by more | |
548 | features of another class than the class on which it is defined.}{Create a new | |
549 | method with a similar body in the class it uses most. Either turn the old method | |
550 | into a simple delegation, or remove it altogether.} | |
551 | ||
552 | \explanation{Move Field}{A field is, or will be, used by another class more than | |
the class on which it is defined.}{Create a new field in the target class, and
554 | change all its users.} | |
555 | ||
556 | % Organizing Data | |
557 | \explanation{Replace Magic Number with Symbolic Constant}{You have a literal | |
558 | number with a particular meaning.}{Create a constant, name it after the meaning, | |
559 | and replace the number with it.} | |
560 | ||
561 | \explanation{Encapsulate Field}{There is a public field.}{Make it private and | |
562 | provide accessors.} | |
563 | ||
564 | \explanation{Replace Type Code with Class}{A class has a numeric type code that | |
565 | does not affect its behavior.}{Replace the number with a new class.} | |
566 | ||
567 | \explanation{Replace Type Code with Subclasses}{You have an immutable type code | |
568 | that affects the behavior of a class.}{Replace the type code with subclasses.} | |
569 | ||
570 | \explanation{Replace Type Code with State/Strategy}{You have a type code that | |
571 | affects the behavior of a class, but you cannot use subclassing.}{Replace the | |
572 | type code with a state object.} | |
573 | ||
574 | % Simplifying Conditional Expressions | |
575 | \explanation{Consolidate Duplicate Conditional Fragments}{The same fragment of | |
576 | code is in all branches of a conditional expression.}{Move it outside of the | |
577 | expression.} | |
578 | ||
579 | \explanation{Remove Control Flag}{You have a variable that is acting as a | |
control flag for a series of boolean expressions.}{Use a break or return
581 | instead.} | |
582 | ||
583 | \explanation{Replace Nested Conditional with Guard Clauses}{A method has | |
584 | conditional behavior that does not make clear the normal path of | |
585 | execution.}{Use guard clauses for all special cases.} | |
586 | ||
587 | \explanation{Introduce Null Object}{You have repeated checks for a null | |
588 | value.}{Replace the null value with a null object.} | |
589 | ||
590 | \explanation{Introduce Assertion}{A section of code assumes something about the | |
591 | state of the program.}{Make the assumption explicit with an assertion.} | |
592 | ||
593 | % Making Method Calls Simpler | |
594 | \explanation{Rename Method}{The name of a method does not reveal its | |
purpose.}{Change the name of the method.}
596 | ||
597 | \explanation{Add Parameter}{A method needs more information from its | |
598 | caller.}{Add a parameter for an object that can pass on this information.} | |
599 | ||
600 | \explanation{Remove Parameter}{A parameter is no longer used by the method | |
601 | body.}{Remove it.} | |
602 | ||
603 | %\explanation{Parameterize Method}{Several methods do similar things but with | |
604 | %different values contained in the method.}{Create one method that uses a | |
605 | %parameter for the different values.} | |
606 | ||
607 | \explanation{Preserve Whole Object}{You are getting several values from an | |
608 | object and passing these values as parameters in a method call.}{Send the whole | |
609 | object instead.} | |
610 | ||
611 | \explanation{Remove Setting Method}{A field should be set at creation time and | |
612 | never altered.}{Remove any setting method for that field.} | |
613 | ||
614 | \explanation{Hide Method}{A method is not used by any other class.}{Make the | |
615 | method private.} | |
616 | ||
617 | \explanation{Replace Constructor with Factory Method}{You want to do more than | |
simple construction when you create an object.}{Replace the constructor with a
619 | factory method.} | |
620 | ||
621 | % Dealing with Generalization | |
622 | \explanation{Pull Up Field}{Two subclasses have the same field.}{Move the field | |
623 | to the superclass.} | |
624 | ||
625 | \explanation{Pull Up Method}{You have methods with identical results on | |
626 | subclasses.}{Move them to the superclass.} | |
627 | ||
628 | \explanation{Push Down Method}{Behavior on a superclass is relevant only for | |
629 | some of its subclasses.}{Move it to those subclasses.} | |
630 | ||
631 | \explanation{Push Down Field}{A field is used only by some subclasses.}{Move the | |
field to those subclasses.}
633 | ||
634 | \explanation{Extract Interface}{Several clients use the same subset of a class's | |
635 | interface, or two classes have part of their interfaces in common.}{Extract the | |
636 | subset into an interface.} | |
637 | ||
638 | \explanation{Replace Inheritance with Delegation}{A subclass uses only part of a | |
superclass's interface or does not want to inherit data.}{Create a field for the
640 | superclass, adjust methods to delegate to the superclass, and remove the | |
641 | subclassing.} | |
642 | ||
643 | \explanation{Replace Delegation with Inheritance}{You're using delegation and | |
are often writing many simple delegations for the entire interface.}{Make the
645 | delegating class a subclass of the delegate.} | |
646 | ||
647 | \subsubsection{Composite refactorings} | |
648 | ||
649 | % Composing Methods | |
650 | % \explanation{Replace Method with Method Object}{}{} | |
651 | ||
652 | % Moving Features Between Objects | |
653 | \explanation{Extract Class}{You have one class doing work that should be done by | |
two.}{Create a new class and move the relevant fields and methods from the old
655 | class into the new class.} | |
656 | ||
657 | \explanation{Inline Class}{A class isn't doing very much.}{Move all its features | |
658 | into another class and delete it.} | |
659 | ||
660 | \explanation{Hide Delegate}{A client is calling a delegate class of an | |
object.}{Create methods on the server to hide the delegate.}
662 | ||
\explanation{Remove Middle Man}{A class is doing too much simple delegation.}{Get
664 | the client to call the delegate directly.} | |
665 | ||
666 | % Organizing Data | |
667 | \explanation{Replace Data Value with Object}{You have a data item that needs | |
668 | additional data or behavior.}{Turn the data item into an object.} | |
669 | ||
670 | \explanation{Change Value to Reference}{You have a class with many equal | |
671 | instances that you want to replace with a single object.}{Turn the object into a | |
672 | reference object.} | |
673 | ||
\explanation{Encapsulate Collection}{A method returns a collection.}{Make it
675 | return a read-only view and provide add/remove methods.} | |
676 | ||
677 | % \explanation{Replace Array with Object}{}{} | |
678 | ||
679 | \explanation{Replace Subclass with Fields}{You have subclasses that vary only in | |
680 | methods that return constant data.}{Change the methods to superclass fields and | |
681 | eliminate the subclasses.} | |
682 | ||
683 | % Simplifying Conditional Expressions | |
684 | \explanation{Decompose Conditional}{You have a complicated conditional | |
(if-then-else) statement.}{Extract methods from the condition, then part, and
686 | else part.} | |
687 | ||
688 | \explanation{Consolidate Conditional Expression}{You have a sequence of | |
689 | conditional tests with the same result.}{Combine them into a single conditional | |
690 | expression and extract it.} | |
691 | ||
692 | \explanation{Replace Conditional with Polymorphism}{You have a conditional that | |
693 | chooses different behavior depending on the type of an object.}{Move each leg | |
694 | of the conditional to an overriding method in a subclass. Make the original | |
695 | method abstract.} | |
696 | ||
697 | % Making Method Calls Simpler | |
698 | \explanation{Replace Parameter with Method}{An object invokes a method, then | |
699 | passes the result as a parameter for a method. The receiver can also invoke this | |
700 | method.}{Remove the parameter and let the receiver invoke the method.} | |
701 | ||
702 | \explanation{Introduce Parameter Object}{You have a group of parameters that | |
703 | naturally go together.}{Replace them with an object.} | |
704 | ||
705 | % Dealing with Generalization | |
706 | \explanation{Extract Subclass}{A class has features that are used only in some | |
707 | instances.}{Create a subclass for that subset of features.} | |
708 | ||
709 | \explanation{Extract Superclass}{You have two classes with similar | |
710 | features.}{Create a superclass and move the common features to the | |
711 | superclass.} | |
712 | ||
713 | \explanation{Collapse Hierarchy}{A superclass and subclass are not very | |
714 | different.}{Merge them together.} | |
715 | ||
716 | \explanation{Form Template Method}{You have two methods in subclasses that | |
717 | perform similar steps in the same order, yet the steps are different.}{Get the | |
718 | steps into methods with the same signature, so that the original methods become | |
719 | the same. Then you can pull them up.} | |
720 | ||
721 | ||
722 | \subsection{Functional refactorings} | |
723 | ||
724 | \explanation{Substitute Algorithm}{You want to replace an algorithm with one | |
725 | that is clearer.}{Replace the body of the method with the new algorithm.} | |
726 | ||
727 | \end{comment} | |
728 | ||
729 | \section{The impact on software quality} | |
730 | ||
731 | \subsection{What is software quality?} | |
The term \emph{software quality} has many meanings, depending on the context in
which it is used. From a software developer's point of view, it usually means
that the software is easily maintainable and testable, or in other words, that
it is \emph{well designed}. This often correlates with the management
perspective, where \emph{keeping the schedule} and \emph{customer satisfaction}
are at the center. From the customer's point of view, in addition to good
usability, \emph{performance} and \emph{lack of bugs} are always appreciated,
criteria that are also shared by the software developer. (Other qualities, such
as good documentation, could also be measured, but this is out of the scope of
this document.)
742 | ||
743 | \subsection{The impact on performance} | |
744 | \begin{quote} | |
Refactoring certainly will make software go more slowly\footnote{With today's
compiler optimization techniques and performance tuning of, e.g., the Java
virtual machine, the penalties of object creation and method calls are
debatable.}, but it also makes the software more amenable to performance
tuning.~\cite[p.~69]{refactoring}
750 | \end{quote} | |
751 | ||
\noindent There is a common belief that refactoring compromises performance,
because it increases the degree of indirection, and because polymorphism is
believed to be slower than conditionals.
755 | ||
In a study, Demeyer\citing{demeyer2002} disproves this view in the case of
polymorphism. He performed an experiment on what he calls ``Transform Self Type
Checks'', where a new polymorphic method and a new class hierarchy are
introduced to get rid of a class' type checking of a ``type attribute''. He uses
this kind of transformation to represent other ways of replacing conditionals
with polymorphism as well. The experiment was performed in C++, with three
different compilers and platforms. Demeyer concludes that, with compiler
optimization turned on, polymorphism beats middle to large sized if-statements
and does as well as case-statements. The latter is in accordance with his
hypothesis, which is based on the similarities between the way C++ handles
polymorphism and case-statements.
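To make the transformation concrete, the following Java sketch shows a class
that checks its own ``type attribute'' and the polymorphic hierarchy that
replaces it. Demeyer's experiment was done in C++, so this only illustrates the
shape of the transformation; the class names \type{TypedShape}, \type{Shape},
\type{Circle} and \type{Square} are hypothetical and not taken from his study.

\begin{minted}{java}
// Before: the class checks its own "type attribute" to select behavior.
class TypedShape {
    static final int CIRCLE = 0;
    static final int SQUARE = 1;

    private final int type;    // the type attribute that is checked
    private final double size;

    TypedShape(int type, double size) {
        this.type = type;
        this.size = size;
    }

    double area() {
        switch (type) {
            case CIRCLE:
                return Math.PI * size * size;
            case SQUARE:
                return size * size;
            default:
                throw new IllegalStateException("unknown type: " + type);
        }
    }
}

// After: each leg of the conditional is an overriding method in a subclass.
abstract class Shape {
    protected final double size;

    Shape(double size) {
        this.size = size;
    }

    abstract double area();    // selected by dynamic dispatch, not by a switch
}

class Circle extends Shape {
    Circle(double size) { super(size); }

    @Override
    double area() { return Math.PI * size * size; }
}

class Square extends Shape {
    Square(double size) { super(size); }

    @Override
    double area() { return size * size; }
}

class SelfTypeCheckDemo {
    public static void main(String[] args) {
        // Both versions compute the same result.
        System.out.println(new TypedShape(TypedShape.SQUARE, 3).area()); // 9.0
        System.out.println(new Square(3).area());                        // 9.0
    }
}
\end{minted}

\noindent The sketch says nothing about performance by itself; which dispatch
mechanism is faster is exactly what Demeyer measured.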
767 | ||
768 | \begin{quote} | |
769 | The interesting thing about performance is that if you analyze most programs, | |
770 | you find that they waste most of their time in a small fraction of the | |
771 | code.~\cite[p.~70]{refactoring} | |
772 | \end{quote} | |
773 | ||
\noindent So, although an increased number of method calls could potentially
slow down programs, one should avoid premature optimization and not sacrifice
good design, leaving the performance tuning until after profiling\footnote{For
an example of a Java profiler, check out VisualVM:
\url{http://visualvm.java.net/}} the software and isolating the actual problem
areas.
780 | ||
781 | \section{Composite refactorings}\label{compositeRefactorings} | |
782 | \todo{motivation, examples, manual vs automated?, what about refactoring in a | |
783 | very large code base?} | |
At the mechanical level, there are essentially two kinds of refactorings: the
\emph{primitive} refactorings and the \emph{composite} refactorings.
787 | ||
788 | \definition{A \emph{primitive refactoring} is a refactoring that cannot be | |
789 | expressed in terms of other refactorings.} | |
790 | ||
\noindent Examples are the \refactoring{Pull Up Field} and \refactoring{Pull Up
Method} refactorings\citing{refactoring}, which move members up in their class
hierarchies.
794 | ||
795 | \definition{A \emph{composite refactoring} is a refactoring that can be | |
796 | expressed in terms of two or more other refactorings.} | |
797 | ||
\noindent An example of a composite refactoring is the \refactoring{Extract
Superclass} refactoring\citing{refactoring}. In its simplest form, it is composed
of the previously described primitive refactorings, in addition to the
\refactoring{Pull Up Constructor Body} refactoring\citing{refactoring}. It works
by creating an abstract superclass that the target class(es) inherit from, and
then applying \refactoring{Pull Up Field}, \refactoring{Pull Up Method} and
\refactoring{Pull Up Constructor Body} to the members that are to become members
of the new superclass. For an overview of the \refactoring{Extract Superclass}
refactoring, see \myref{fig:extractSuperclass}.
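The following Java sketch shows what \refactoring{Extract Superclass} might
produce for two classes that share a field, a method and part of their
constructor logic. The class names are hypothetical, and the \type{Before}
suffix only exists so that both versions fit in one compilable listing. The
comments mark which primitive refactoring moved each member into the new
superclass \type{Party}.

\begin{minted}{java}
// Before: two classes duplicate a field, a method and a constructor statement.
class EmployeeBefore {
    private final String name;             // duplicated field
    private final int salary;

    EmployeeBefore(String name, int salary) {
        this.name = name;                  // duplicated constructor statement
        this.salary = salary;
    }

    String getName() { return name; }      // duplicated method
    int getSalary() { return salary; }
}

class DepartmentBefore {
    private final String name;
    private final int headCount;

    DepartmentBefore(String name, int headCount) {
        this.name = name;
        this.headCount = headCount;
    }

    String getName() { return name; }
    int getHeadCount() { return headCount; }
}

// After: a new abstract superclass holds the common members.
abstract class Party {
    private final String name;             // Pull Up Field

    protected Party(String name) {         // Pull Up Constructor Body
        this.name = name;
    }

    String getName() { return name; }      // Pull Up Method
}

class Employee extends Party {
    private final int salary;

    Employee(String name, int salary) {
        super(name);                       // the pulled-up statement is now a super call
        this.salary = salary;
    }

    int getSalary() { return salary; }
}

class Department extends Party {
    private final int headCount;

    Department(String name, int headCount) {
        super(name);
        this.headCount = headCount;
    }

    int getHeadCount() { return headCount; }
}
\end{minted}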
807 | ||
808 | \begin{figure}[h] | |
809 | \centering | |
810 | \includegraphics[angle=270,width=\linewidth]{extractSuperclassItalic.pdf} | |
811 | \caption{The Extract Superclass refactoring} | |
812 | \label{fig:extractSuperclass} | |
813 | \end{figure} | |
814 | ||
815 | \section{Manual vs. automated refactorings} | |
Refactoring is something every programmer does, even if \heshe does not know
the term \emph{refactoring}. Every refinement of source code that does not alter
the program's behavior is a refactoring. For small refactorings, such as
\ExtractMethod, executing them manually is a manageable task, but it is still
prone to errors. Getting it right the first time is not easy, considering the
method signature and all the other aspects of the refactoring that have to be in
place.
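The following sketch (hypothetical class and method names) illustrates why even
a small \ExtractMethod needs care: the extracted fragment reads one local
variable and assigns another that is used afterwards, so the new method must
take the former as a parameter and return the latter.

\begin{minted}{java}
import java.util.Arrays;
import java.util.List;

class Invoice {
    // Before: summing and formatting happen in one method.
    static String describeBefore(String customer, List<Integer> amounts) {
        int total = 0;
        for (int amount : amounts) {   // fragment to extract: reads 'amounts'
            total += amount;           // and assigns 'total', which is used below
        }
        return customer + ": " + total;
    }

    // After Extract Method: 'amounts' became a parameter of the new method and
    // 'total' became its return value, because it is read after the fragment.
    static String describeAfter(String customer, List<Integer> amounts) {
        int total = sum(amounts);
        return customer + ": " + total;
    }

    private static int sum(List<Integer> amounts) {
        int total = 0;
        for (int amount : amounts) {
            total += amount;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> amounts = Arrays.asList(10, 20, 12);
        System.out.println(describeBefore("ACME", amounts)); // ACME: 42
        System.out.println(describeAfter("ACME", amounts));  // ACME: 42
    }
}
\end{minted}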
822 | ||
Take for instance the renaming of classes, methods and fields. For complex
programs, these refactorings are almost impossible to get right manually.
Attacking them with textual search and replace, or even regular expressions,
will fall short. It is then crucial to have proper tool support that can perform
them automatically: tools that parse the source code and thus have semantic
knowledge about which occurrences of which names belong to which construct in
the program. Even to attempt one of these complex tasks manually, one would have
to be very confident in the existing test suite \see{testing}.
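The following sketch shows how a word-boundary regular expression falls short
when the goal is to rename the method \method{size} of a hypothetical class
\type{Box}: the expression cannot tell the method apart from a parameter with
the same name, or from the text inside a string literal. Replacing the literal
changes what the program prints, so the result is no longer behavior preserving.

\begin{minted}{java}
class NaiveRename {
    // Source text of a hypothetical class. The goal is to rename the method
    // size() to volume(), but "size" also names a parameter and appears
    // inside a string literal.
    static final String SOURCE =
            "class Box {\n"
            + "  int size() { return 8; }\n"
            + "  void print(int size) { System.out.println(\"size=\" + size); }\n"
            + "}\n";

    // Textual search and replace with a word-boundary regular expression.
    static String renameTextually(String source) {
        return source.replaceAll("\\bsize\\b", "volume");
    }

    // Counts whole-word occurrences of 'word' in 'text'.
    static int count(String text, String word) {
        return text.split("\\b" + word + "\\b", -1).length - 1;
    }

    public static void main(String[] args) {
        String result = renameTextually(SOURCE);
        System.out.print(result);
        // All four occurrences were replaced, including the one inside the
        // string literal, which changes what print() writes.
        System.out.println(count(result, "volume")); // 4
    }
}
\end{minted}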
831 | ||
832 | \section{Correctness of refactorings}\label{correctness} | |
For automated refactorings to be truly useful, they must show a high degree of
behavior preservation. This might seem obvious, but there are examples of
refactorings in existing tools that break programs. I will now present an
example of an \ExtractMethod refactoring followed by a \MoveMethod refactoring
that breaks a program in both the \emph{Eclipse} and \emph{IntelliJ}
IDEs\footnote{The NetBeans IDE handles this particular situation without
altering the program's behavior, mainly because its Move Method refactoring
implementation is limited in other ways \see{toolSupport}.}. The following
piece of code shows the target for the composed refactoring:
842 | ||
\begin{minted}[linenos,samepage]{java}
public class C {
    public X x = new X();

    public void f() {
        x.m(this);
        x.n();
    }
}
\end{minted}
853 | ||
\noindent The next piece of code shows the destination of the refactoring. Note
that the method \method{m(C c)} of class \type{X} assigns to the field \var{x}
of the argument \var{c}, which has type \type{C}:
857 | ||
\begin{minted}[samepage]{java}
public class X {
    public void m(C c) {
        c.x = new X();
    }
    public void n() {}
}
\end{minted}
866 | ||
The refactoring sequence works by extracting lines 5 and 6 from the original
class \type{C} into a method \method{f} with the statements from those lines as
its method body. The method is then moved to the class \type{X}. The result is
shown in the following two pieces of code:
871 | ||
\begin{minted}[linenos,samepage]{java}
public class C {
    public X x = new X();

    public void f() {
        x.f(this);
    }
}
\end{minted}
881 | ||
\begin{minted}[linenos,samepage]{java}
public class X {
    public void m(C c) {
        c.x = new X();
    }
    public void n() {}
    public void f(C c) {
        m(c);
        n();
    }
}
\end{minted}
894 | ||
After the refactoring, the method \method{f} of class \type{C} calls the
method \method{f} of class \type{X}, and the program now behaves differently
than before. (See line 5 of the version of class \type{C} after the
refactoring.) Before the refactoring, the methods \method{m} and \method{n} of
class \type{X} are called on different object instances (see lines 5 and 6 of
the original class \type{C}). After the refactoring, they are called on the
same object, and the assignment on line 3 of class \type{X} (the version after
the refactoring) no longer has any effect on which object \method{n} is called
on.
903 | ||
904 | The bug introduced in the previous example is of such a nature\footnote{Caused | |
905 | by aliasing. See \url{https://en.wikipedia.org/wiki/Aliasing_(computing)}} | |
906 | that it is very difficult to spot if the refactored code is not covered by | |
907 | tests. It does not generate compilation errors, and will thus only result in | |
908 | a runtime error or corrupted data, which might be hard to detect. | |
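To make the behavior change observable, the following self-contained sketch puts
the original and the refactored body of \method{f} side by side in one listing.
The static field \var{nReceiver} and the method names \method{fOriginal} and
\method{fRefactored} are hypothetical instrumentation added for this
demonstration only; apart from that, the classes follow the example above.

\begin{minted}{java}
class X {
    static X nReceiver;                // instrumentation: last object n() ran on

    void m(C c) { c.x = new X(); }

    void n() { nReceiver = this; }

    // Method produced by Extract Method followed by Move Method.
    void f(C c) {
        m(c);
        n();                           // runs on 'this', the old value of c.x
    }
}

class C {
    X x = new X();

    // Original body: the field x is read twice, so n() runs on the object
    // that m() has just stored in x.
    void fOriginal() {
        x.m(this);
        x.n();
    }

    // Refactored body: m() and n() both run on the object that x referred
    // to before the call.
    void fRefactored() {
        x.f(this);
    }
}

class AliasingDemo {
    // True if n() ran on the object that c.x refers to after the call.
    static boolean nCalledOnCurrentX(boolean refactored) {
        C c = new C();
        if (refactored) {
            c.fRefactored();
        } else {
            c.fOriginal();
        }
        return X.nReceiver == c.x;
    }

    public static void main(String[] args) {
        System.out.println(nCalledOnCurrentX(false)); // true
        System.out.println(nCalledOnCurrentX(true));  // false
    }
}
\end{minted}

\noindent Both versions compile without warnings; only a check like
\method{nCalledOnCurrentX} reveals the difference.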
909 | ||
910 | \section{Refactoring and testing}\label{testing} | |
911 | \begin{quote} | |
912 | If you want to refactor, the essential precondition is having solid | |
913 | tests.\citing{refactoring} | |
914 | \end{quote} | |
915 | ||
When refactoring, there are roughly two kinds of errors that can be made. There
are errors that make the code unable to compile, and there are silent errors
that only show up at runtime. Compile-time errors are the nice ones. They show
up the moment they are made (at least when using an IDE), and are usually easy
to fix. The other kind of error is the dangerous one. It is the kind of error
introduced in the example of \myref{correctness}: an error that may sneak into
your code without you noticing. To discover such errors when refactoring, it is
essential to have good test coverage. Tests are not a way to \emph{prove} that
the code is correct, but they are a way to make you confident that it
\emph{probably} works as desired. In the context of test-driven development, the
tests even define how the program is supposed to work. The program is then, by
definition, working if the tests pass.

If the test coverage for a code base were perfect, in the sense that every
behavior that matters is checked by some test, it should, in theory, be
risk-free to perform refactorings on it. This is why tests and refactoring are
such a great match.
932 | ||
933 | \section{Software metrics} | |
934 | \todoin{Is this the appropriate place to have this section?} | |
935 | ||
936 | %\part{The project} | |
937 | %\chapter{Planning the project} | |
938 | %\part{Conclusion} | |
939 | %\chapter{Results} | |
940 | ||
941 | ||
942 | ||
943 | \chapter{\ldots} | |
944 | \todoin{write} | |
945 | \section{The problem statement} | |
946 | \section{Choosing the target language} | |
Choosing which programming language to use as the target for manipulation is not
a very difficult task. The language has to be an object-oriented programming
language, and it must have existing tool support for refactoring. The
\emph{Java} programming language\footnote{\url{https://www.java.com/}} is the
dominant language when it comes to examples in the refactoring literature, and
is thus a natural choice. Java is perhaps currently the most influential
programming language in the world, with its \emph{Java Virtual Machine}, which
runs on all of the most popular architectures and also supports\footnote{They
compile to Java bytecode.} dozens of other programming languages, with
\emph{Scala}, \emph{Clojure} and \emph{Groovy} as the most prominent ones. Java
is currently a language that other programming languages are frequently compared
against. It is also the primary language of the author of this thesis.
959 | ||
960 | \section{Choosing the tools} | |
When choosing a tool for manipulating Java, there are certain criteria that
have to be met. First of all, the tool should have some existing refactoring
support that this thesis can build upon. Second, it should provide some kind of
framework for parsing and analyzing Java source code. Third, it should itself be
open source. This is both because of the need to browse the code of the
existing refactorings contained in the tool, and because open source projects
hold value in themselves. Another important aspect to consider is that open
source projects of a certain size usually have large communities of people
connected to them, who are committed to answering questions regarding the use
and misuse of the products, which to a large degree are made by the community
itself.
972 | ||
There is a certain class of tools that meets these criteria, namely the class of
\emph{IDEs}\footnote{\emph{Integrated Development Environment}}. These are
programs that are meant to support the whole production cycle of a computer
program, and the most popular IDEs that support Java generally have quite good
refactoring support.
978 | ||
The main contenders for this thesis are the \emph{Eclipse IDE}, with the
\emph{Java development tools} (JDT), the \emph{IntelliJ IDEA Community Edition}
and the \emph{NetBeans IDE}. \See{toolSupport} Eclipse and NetBeans are both
free, open source and community driven, while IntelliJ IDEA has an open source
community edition that is free of charge, but also offers an \emph{Ultimate
Edition} with an extended set of features, at additional cost. All three IDEs
support adding plugins to extend their functionality, and offer tools that can
be used to parse and analyze Java source code. But one of the IDEs stands out as
a favorite, and that is the \emph{Eclipse IDE}. It is the most
popular\citing{javaReport2011} among them and seems to be the de facto standard
IDE for Java development, regardless of platform.
990 | ||
991 | ||
992 | \chapter{Refactorings in Eclipse JDT: Design, Shortcomings and Wishful | |
993 | Thinking}\label{ch:jdt_refactorings} | |
994 | ||
This chapter deals with some of the design behind the refactoring support in
Eclipse, and in the JDT in particular. It is followed by a section about
shortcomings of the refactoring API in terms of composition of refactorings. The
chapter concludes with a section describing some of the ways the implementation
of refactorings in the JDT could have worked to facilitate composition of
refactorings.
1001 | ||
1002 | \section{Design} | |
The refactoring world of Eclipse can in general be separated into two parts: the
language independent part, and the part written for a specific programming
language, namely the language that is the target of the supported refactorings.
\todo{What about the language specific part?}
1007 | ||
1008 | \subsection{The Language Toolkit} | |
The Language Toolkit, or LTK for short, is the framework that is used to
implement refactorings in Eclipse. It is language independent and provides the
abstractions of a refactoring and the change it generates, in the form of the
classes \typewithref{org.eclipse.ltk.core.refactoring}{Refactoring} and
\typewithref{org.eclipse.ltk.core.refactoring}{Change}. (There are also parts of
the LTK that are concerned with user interaction, but they will not be discussed
here, since they are of little value to us and our use of the framework.)
1016 | ||
1017 | \subsubsection{The Refactoring Class} | |
The abstract class \type{Refactoring} is the core of the LTK framework. Every
refactoring that is going to be supported by the LTK has to end up creating an
instance of one of its subclasses. The main responsibilities of subclasses of
\type{Refactoring} are to implement template methods for condition checking
(\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkInitialConditions}
and
\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkFinalConditions}),
in addition to the
\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{createChange}
method that creates and returns an instance of the \type{Change} class.
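To illustrate this division of responsibilities, the following Java sketch
models the template methods with deliberately simplified, hypothetical types.
It is \emph{not} the LTK API: the real methods take an \type{IProgressMonitor},
report problems through a \type{RefactoringStatus} and may throw
\type{CoreException}, whereas here a list of error messages stands in for the
status, and the change simply returns the new text of one in-memory document.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the LTK template methods (not the real API).
abstract class ToyRefactoring {
    // Checks that can be made before all parameters of the refactoring are known.
    abstract List<String> checkInitialConditions();

    // Checks that need all parameters, e.g. the new name chosen by the user.
    abstract List<String> checkFinalConditions();

    // Computes the change; nothing is modified until the change is performed.
    abstract ToyChange createChange();
}

interface ToyChange {
    String perform();   // returns the new document text
}

// Renames an identifier in a single in-memory document, using naive textual
// replacement purely to keep the example short.
class ToyRenameRefactoring extends ToyRefactoring {
    private final String document;
    private final String oldName;
    private final String newName;

    ToyRenameRefactoring(String document, String oldName, String newName) {
        this.document = document;
        this.oldName = oldName;
        this.newName = newName;
    }

    @Override
    List<String> checkInitialConditions() {
        List<String> errors = new ArrayList<>();
        if (!document.contains(oldName)) {
            errors.add(oldName + " does not occur in the document");
        }
        return errors;
    }

    @Override
    List<String> checkFinalConditions() {
        List<String> errors = new ArrayList<>();
        if (!newName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            errors.add(newName + " is not a valid identifier");
        }
        if (document.contains(newName)) {
            errors.add(newName + " is already used");
        }
        return errors;
    }

    @Override
    ToyChange createChange() {
        return () -> document.replace(oldName, newName);
    }
}

class ToyDriver {
    // Calls the template methods in order and performs the change only if
    // no check reported an error.
    static String run(ToyRefactoring refactoring) {
        List<String> errors = new ArrayList<>(refactoring.checkInitialConditions());
        errors.addAll(refactoring.checkFinalConditions());
        if (!errors.isEmpty()) {
            throw new IllegalStateException(errors.toString());
        }
        return refactoring.createChange().perform();
    }
}
\end{minted}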
1028 | ||
If the refactoring is to support participation by others when it is executed,
it has to be a processor-based
refactoring\typeref{org.eclipse.ltk.core.refactoring.participants.ProcessorBasedRefactoring}.
It then delegates to its given
\typewithref{org.eclipse.ltk.core.refactoring.participants}{RefactoringProcessor}
for condition checking and change creation.
1035 | ||
1036 | \subsubsection{The Change Class} | |
This class is the base class for objects that are responsible for performing the
actual workspace transformations in a refactoring. The main responsibilities of
its subclasses are to implement the
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{perform} and
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{isValid} methods. The
\method{isValid} method verifies that the change object is valid and thus can be
executed by calling its \method{perform} method. The \method{perform} method
performs the desired change and returns an undo change that can be executed to
reverse the effect of the transformation done by its originating change object.
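The undo contract can be illustrated with a small, hypothetical model that works
on an in-memory \type{StringBuilder} instead of the workspace. Again, this is not
the LTK API (where \method{isValid} returns a \type{RefactoringStatus} and
\method{perform} takes an \type{IProgressMonitor}); it only shows the idea that
performing a change yields the change that reverses it.

\begin{minted}{java}
// Hypothetical, simplified model of the LTK Change contract (not the real API).
abstract class SimpleChange {
    // Whether the change can still be applied to the document.
    abstract boolean isValid(StringBuilder document);

    // Applies the change and returns the change that reverses it.
    abstract SimpleChange perform(StringBuilder document);
}

// Replaces oldText, expected at the given offset, with newText.
class ReplaceChange extends SimpleChange {
    private final int offset;
    private final String oldText;
    private final String newText;

    ReplaceChange(int offset, String oldText, String newText) {
        this.offset = offset;
        this.oldText = oldText;
        this.newText = newText;
    }

    @Override
    boolean isValid(StringBuilder document) {
        int end = offset + oldText.length();
        return offset >= 0 && end <= document.length()
                && document.substring(offset, end).equals(oldText);
    }

    @Override
    SimpleChange perform(StringBuilder document) {
        if (!isValid(document)) {
            throw new IllegalStateException("document no longer matches the change");
        }
        document.replace(offset, offset + oldText.length(), newText);
        // The undo change swaps the roles of the old and the new text.
        return new ReplaceChange(offset, newText, oldText);
    }
}

class ChangeDemo {
    public static void main(String[] args) {
        StringBuilder doc = new StringBuilder("int count = 0;");
        SimpleChange undo = new ReplaceChange(4, "count", "total").perform(doc);
        System.out.println(doc);     // int total = 0;
        undo.perform(doc);
        System.out.println(doc);     // int count = 0;
    }
}
\end{minted}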
1046 | ||
1047 | \subsubsection{Executing a Refactoring}\label{executing_refactoring} | |
The life cycle of a refactoring generally follows two steps after creation:
condition checking and change creation. Letting the refactoring object be
handled by a
\typewithref{org.eclipse.ltk.core.refactoring}{CheckConditionsOperation}, which
in turn is handled by a
\typewithref{org.eclipse.ltk.core.refactoring}{CreateChangeOperation}, ensures
that the change creation process is managed properly.
1055 | ||
The actual execution of a change object has to follow a detailed life cycle.
This life cycle is honored if the \type{CreateChangeOperation} is handled by a
\typewithref{org.eclipse.ltk.core.refactoring}{PerformChangeOperation}. If an
undo manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} is also set
for the \type{PerformChangeOperation}, the undo change is added to the undo
history.
1062 | ||
1063 | \section{Shortcomings} | |
This section opens with its conclusion: the JDT refactoring implementation does
not facilitate composition of refactorings.
\todo{refine}This section will try to explain why, and also identify other
shortcomings of both the usability and the readability of the JDT refactoring
source code.

I will begin at the end and work my way toward the composition part of this
section.

\subsection{Absence of Generics in Eclipse Source Code}
This section concerns not only the JDT refactoring API, but also large parts of
the Eclipse source code. The code shows a striking absence of the Java language
feature of generics. It is hard to read the interface of a class when its
methods return objects of, or take parameters of, raw types such as \type{List}
or \type{Map}. This sometimes results in having to read a lot of source code to
understand what is going on, instead of relying on the available interfaces. In
addition, it results in a lot of ugly code, making typecasts the rule rather
than the exception.
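
To illustrate the difference, consider the following minimal Java sketch. The
class and method names are hypothetical and not taken from Eclipse. The raw
signature of \method{rawNames} says nothing about the element type, so every
caller has to cast, while the generic signature of \method{genericNames}
documents the element type and makes the cast unnecessary.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;

class RawVersusGeneric {
    // Raw return type: the signature does not reveal the element type.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List rawNames() {
        List names = new ArrayList();
        names.add("foo");
        names.add("bar");
        return names;
    }

    // Generic return type: the element type is part of the interface.
    static List<String> genericNames() {
        List<String> names = new ArrayList<>();
        names.add("foo");
        names.add("bar");
        return names;
    }

    static int totalLengthRaw() {
        int total = 0;
        for (Object name : rawNames()) {
            total += ((String) name).length(); // cast needed for every element
        }
        return total;
    }

    static int totalLengthGeneric() {
        int total = 0;
        for (String name : genericNames()) {
            total += name.length(); // no cast needed
        }
        return total;
    }
}
\end{minted}

Both variants compute the same result; the difference lies only in how much the
signature tells the reader and how many casts the caller has to write.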

\subsection{Composite Refactorings Will Not Appear as Atomic Actions}

\subsubsection{Missing Flexibility from JDT Refactorings}
The JDT refactorings are not made with composition of refactorings in mind. When
a JDT refactoring is executed, it assumes that all conditions for it to be
applied successfully can be found by reading source files that have been
persisted to disk. The refactorings can only operate on the actual source
material, and not on (in-memory) copies thereof. This constitutes a major
disadvantage when trying to compose refactorings: If an exception occurs in the
middle of a sequence of refactorings, it can leave the project in a state where
the composite refactoring was only partly executed. This makes it hard to
discard the changes done without monitoring and consulting the undo manager, an
approach that is not bulletproof.
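
The consequence can be made concrete with a small, self-contained Java model.
It is not Eclipse code: \type{SourceFile} is a hypothetical stand-in for a file
persisted to disk, and each step (a \type{UnaryOperator<String>}) stands in for
one refactoring. Applying the steps directly to the file leaves the result of
the first step behind when the second step fails, whereas applying them to an
in-memory copy and writing back only after all steps have succeeded leaves the
file untouched.

\begin{minted}{java}
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a source file persisted to disk.
class SourceFile {
    String text;

    SourceFile(String text) {
        this.text = text;
    }
}

class CompositeApplication {
    // Every step writes straight to the file. If a step throws,
    // the effects of the earlier steps remain.
    static void applyDirectly(SourceFile file, List<UnaryOperator<String>> steps) {
        for (UnaryOperator<String> step : steps) {
            file.text = step.apply(file.text);
        }
    }

    // Every step works on an in-memory copy. The file is only
    // updated once all steps have succeeded.
    static void applyToCopy(SourceFile file, List<UnaryOperator<String>> steps) {
        String copy = file.text;
        for (UnaryOperator<String> step : steps) {
            copy = step.apply(copy);
        }
        file.text = copy;
    }
}
\end{minted}

The second variant is only possible when every step can work on a copy, which
is exactly the flexibility the JDT refactorings lack.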

\subsubsection{Broken Undo History}
When designing a composite refactoring that is to be performed as a sequence of
refactorings, you would like it to appear as a single change to the workspace.
This implies that you would also like to be able to undo all the changes done by
the refactoring in a single step. This is not how it appears when a sequence of
JDT refactorings is executed. It leaves the undo history filled up with
individual undo actions corresponding to every single JDT refactoring in the
sequence. This problem is not trivial to handle in Eclipse.
\See{hacking_undo_history}

\section{Wishful Thinking}


\chapter{Composite Refactorings in Eclipse}

\section{A Simple Ad Hoc Model}
As pointed out in \myref{ch:jdt_refactorings}, the Eclipse JDT refactoring model
is not very well suited for making composite refactorings. Therefore, a simple
model using changer objects (of type \type{RefaktorChanger}) is used as an
abstraction layer on top of the existing Eclipse refactorings.

\section{The Extract and Move Method Refactoring}
%The Extract and Move Method Refactoring is implemented mainly using these
%classes:
%\begin{itemize}
% \item \type{ExtractAndMoveMethodChanger}
% \item \type{ExtractAndMoveMethodPrefixesExtractor}
% \item \type{Prefix}
% \item \type{PrefixSet}
%\end{itemize}

\subsection{The Building Blocks}
This is a composite refactoring, and hence it is built up using several
primitive refactorings. These basic building blocks are, as the name implies,
the \ExtractMethod refactoring\citing{refactoring} and the \MoveMethod
refactoring\citing{refactoring}. In Eclipse, the implementations of these
refactorings are found in the classes
\typewithref{org.eclipse.jdt.internal.corext.refactoring.code}{ExtractMethodRefactoring}
and
\typewithref{org.eclipse.jdt.internal.corext.refactoring.structure}{MoveInstanceMethodProcessor},
where the latter class is designed to be used together with the processor-based
\typewithref{org.eclipse.ltk.core.refactoring.participants}{MoveRefactoring}.

\subsubsection{The ExtractMethodRefactoring Class}
This class is quite simple to use. The only parameters it requires for
construction are a compilation
unit\typeref{org.eclipse.jdt.core.ICompilationUnit}, the offset into the source
code where the extraction shall start, and the length of the source to be
extracted. After that, you have to set the name of the new method, the access
modifier to be used, and a few less interesting parameters.

\subsubsection{The MoveInstanceMethodProcessor Class}
For \MoveMethod, the processor requires somewhat more elaborate input than the
class for \ExtractMethod. For construction, it requires a method
handle\typeref{org.eclipse.jdt.core.IMethod} from the Java Model for the method
that is to be moved. Then the target for the move has to be supplied as the
variable binding from a chosen variable declaration. In addition to this, one
has to set some parameters regarding setters/getters and delegation.

To turn the processor into a complete refactoring, one has to construct a
\type{MoveRefactoring} from it.

\subsection{The ExtractAndMoveMethodChanger Class}
The \typewithref{no.uio.ifi.refaktor.changers}{ExtractAndMoveMethodChanger}
class, which is a subclass of the class
\typewithref{no.uio.ifi.refaktor.changers}{RefaktorChanger}, is responsible for
composing the \type{ExtractMethodRefactoring} and the \type{MoveRefactoring}.
Its constructor takes a project
handle\typeref{org.eclipse.core.resources.IProject}, the name of the new method,
and a \typewithref{no.uio.ifi.refaktor.utils}{SmartTextSelection}.

A \type{SmartTextSelection} is basically a text
selection\typeref{org.eclipse.jface.text.ITextSelection} object that requires
the underlying document to be provided at creation. That is, its
\methodwithref{no.uio.ifi.refaktor.utils.SmartTextSelection}{getDocument} method
never returns \type{null}.
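
The idea of enforcing a non-null document at construction time can be sketched
in plain Java. The class \type{DocumentBackedSelection} below is a hypothetical,
simplified analogue of \type{SmartTextSelection}, not its actual implementation:
the document is modeled as a \type{String} instead of an \type{IDocument}, and
the range check is an extra assumption.

\begin{minted}{java}
import java.util.Objects;

// Hypothetical, simplified analogue of SmartTextSelection.
// The document is a String here, not an org.eclipse.jface.text.IDocument.
class DocumentBackedSelection {
    private final String document;
    private final int offset;
    private final int length;

    DocumentBackedSelection(String document, int offset, int length) {
        // Fail at construction time rather than when the document is used.
        this.document = Objects.requireNonNull(document, "document");
        if (offset < 0 || length < 0 || offset + length > document.length()) {
            throw new IllegalArgumentException("selection outside document");
        }
        this.offset = offset;
        this.length = length;
    }

    String getDocument() {
        return document; // never null, guaranteed by the constructor
    }

    int getOffset() {
        return offset;
    }

    int getLength() {
        return length;
    }

    String getText() {
        return document.substring(offset, offset + length);
    }
}
\end{minted}

Callers of \method{getDocument} can thus skip null checks, since an invalid
selection can never be created in the first place.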

Before extracting the new method, the possible targets for the move operation
are found with the help of an
\typewithref{no.uio.ifi.refaktor.extractors}{ExtractAndMoveMethodPrefixesExtractor}.
The possible targets are computed from the prefixes that the extractor returns
from its
\methodwithref{no.uio.ifi.refaktor.extractors.ExtractAndMoveMethodPrefixesExtractor}{getSafePrefixes}
method. The changer then chooses the most suitable target by finding the most
frequently occurring prefix among the safe ones. The target is the type of the
first part of that prefix.
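
The choice of target can be sketched as follows. This is a hypothetical
simplification: prefixes are modeled as dot-separated strings such as
\texttt{a.b}, the input is assumed to contain one entry per occurrence of a safe
prefix, and ties are broken in favor of the prefix that first reached the highest
count, which may differ from what the changer actually does. Resolving the type
of the first part requires type bindings and is left out.

\begin{minted}{java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TargetChooser {
    // Returns the most frequently occurring prefix, or null if the list is empty.
    // On a tie, the prefix that first reached the highest count is kept.
    static String mostFrequentPrefix(List<String> safePrefixOccurrences) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String prefix : safePrefixOccurrences) {
            int count = counts.merge(prefix, 1, Integer::sum);
            if (count > bestCount) {
                best = prefix;
                bestCount = count;
            }
        }
        return best;
    }

    // The first part of a prefix names the variable whose type becomes
    // the move target, e.g. "a" for "a.b".
    static String firstPart(String prefix) {
        int dot = prefix.indexOf('.');
        return dot < 0 ? prefix : prefix.substring(0, dot);
    }
}
\end{minted}

With the occurrences \texttt{a.b}, \texttt{c} and \texttt{a.b}, the sketch picks
\texttt{a.b}, so the target would be the type of \texttt{a}.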

After finding a suitable target, the \type{ExtractAndMoveMethodChanger} first
creates an \type{ExtractMethodRefactoring} and performs it as explained in
\myref{executing_refactoring}. Then it creates and performs the
\type{MoveRefactoring} in the same way, based on the changes done by the
\ExtractMethod refactoring.

\subsection{The ExtractAndMoveMethodPrefixesExtractor Class}
This extractor extracts the properties needed for building the Extract and Move
Method refactoring. It searches through the given selection to find safe
prefixes, and those prefixes form a base that can be used to compute possible
targets for the move part of the refactoring. It finds both the candidates, in
the form of prefixes, and the non-candidates, called unfixes. All prefixes (and
unfixes) are represented by a
\typewithref{no.uio.ifi.refaktor.extractors}{Prefix}, and they are collected
into prefix sets\typeref{no.uio.ifi.refaktor.extractors.PrefixSet}.

The prefixes and unfixes are found by property
collectors\typeref{no.uio.ifi.refaktor.extractors.collectors.PropertyCollector}.
A property collector follows the visitor pattern\citing{designPatterns} and is
of the \typewithref{org.eclipse.jdt.core.dom}{ASTVisitor} type. An
\type{ASTVisitor} visits nodes in an abstract syntax tree that forms the Java
document object model. The tree consists of nodes of type
\typewithref{org.eclipse.jdt.core.dom}{ASTNode}.
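
The visitor pattern used by the property collectors can be illustrated without
the JDT. The following sketch defines a hypothetical miniature syntax tree with
only two node types, \type{NameNode} and \type{CallNode}, and a collector that
records the receiver of each method invocation. The real collectors instead
extend \type{ASTVisitor}, whose \method{visit} methods return a \type{boolean}
that decides whether child nodes are visited; that detail is omitted here.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature AST standing in for the JDT DOM.
interface Node {
    void accept(Visitor visitor);
}

interface Visitor {
    void visit(NameNode node);

    void visit(CallNode node);
}

class NameNode implements Node {
    final String identifier;

    NameNode(String identifier) {
        this.identifier = identifier;
    }

    public void accept(Visitor visitor) {
        visitor.visit(this); // double dispatch picks visit(NameNode)
    }
}

class CallNode implements Node {
    final String receiver; // e.g. "a.b" in a.b.foo()
    final String name;     // e.g. "foo"

    CallNode(String receiver, String name) {
        this.receiver = receiver;
        this.name = name;
    }

    public void accept(Visitor visitor) {
        visitor.visit(this); // double dispatch picks visit(CallNode)
    }
}

// A property collector: each visit records the property it is interested in.
class ReceiverCollector implements Visitor {
    final List<String> receivers = new ArrayList<>();

    public void visit(NameNode node) {
        // names carry no receiver; nothing to collect
    }

    public void visit(CallNode node) {
        receivers.add(node.receiver);
    }
}
\end{minted}

The collector never inspects node types itself; the \method{accept} methods
route each node to the matching \method{visit} method.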

\subsubsection{The PrefixesCollector}
The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{PrefixesCollector}
is of type \type{PropertyCollector}. It visits expression
statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and, when their
expressions are method invocations, creates prefixes from them. The prefixes
found are registered with a prefix set, together with all their sub-prefixes.
\todo{Rewrite in the case of changes to the way prefixes are found}
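
Registering a prefix together with all its sub-prefixes can be sketched as
below, under the simplifying assumption that a prefix is a dot-separated string
such as \texttt{a.b.c} (the receiver of an invocation like
\texttt{a.b.c.foo()}); the real \type{Prefix} class is not reproduced here, and
a plain \type{Set} stands in for \type{PrefixSet}. In line with the definition of
enclosing used for safe prefixes, a prefix counts as one of its own sub-prefixes.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SubPrefixes {
    // "a.b.c" -> ["a", "a.b", "a.b.c"]: every leading run of dot-separated
    // names, including the prefix itself.
    static List<String> of(String prefix) {
        List<String> result = new ArrayList<>();
        int dot = prefix.indexOf('.');
        while (dot >= 0) {
            result.add(prefix.substring(0, dot));
            dot = prefix.indexOf('.', dot + 1);
        }
        result.add(prefix);
        return result;
    }

    // Registers a prefix and all of its sub-prefixes in a set
    // (a hypothetical stand-in for PrefixSet).
    static void register(String prefix, Set<String> prefixSet) {
        prefixSet.addAll(of(prefix));
    }
}
\end{minted}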

\subsubsection{The UnfixesCollector}
The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector}
finds unfixes within the selection. An unfix is a name that is assigned to
within the selection. Such a name cannot be used as a move target: If the
extracted method were moved to it, the assignment would become an assignment to
the \type{this} keyword, which is not valid in Java.

\subsubsection{Computing Safe Prefixes}
A safe prefix is a prefix that does not enclose an unfix. A prefix encloses an
unfix if the unfix is in the set of its sub-prefixes. As an example,
\texttt{``a.b''} encloses \texttt{``a''}, and so does \texttt{``a''} itself. The
safe prefixes are collected in a \type{PrefixSet} and can be fetched by calling
the \method{getSafePrefixes} method of the
\type{ExtractAndMoveMethodPrefixesExtractor}.
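
Under the simplifying assumption that prefixes and unfixes are dot-separated
strings, the computation of safe prefixes can be sketched as follows. The class
and method names are hypothetical; the sketch only mirrors the definition above,
discarding a prefix if any of its sub-prefixes is an unfix.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SafePrefixes {
    // All leading runs of dot-separated names, including the prefix itself.
    static List<String> subPrefixes(String prefix) {
        List<String> result = new ArrayList<>();
        int dot = prefix.indexOf('.');
        while (dot >= 0) {
            result.add(prefix.substring(0, dot));
            dot = prefix.indexOf('.', dot + 1);
        }
        result.add(prefix);
        return result;
    }

    // A prefix encloses an unfix if the unfix is one of its sub-prefixes.
    static boolean enclosesUnfix(String prefix, Set<String> unfixes) {
        for (String sub : subPrefixes(prefix)) {
            if (unfixes.contains(sub)) {
                return true;
            }
        }
        return false;
    }

    // Keeps only the prefixes that enclose no unfix, in their original order.
    static List<String> safe(List<String> prefixes, Set<String> unfixes) {
        List<String> result = new ArrayList<>();
        for (String prefix : prefixes) {
            if (!enclosesUnfix(prefix, unfixes)) {
                result.add(prefix);
            }
        }
        return result;
    }
}
\end{minted}

Given the prefixes \texttt{a.b}, \texttt{a} and \texttt{c.d} and the single
unfix \texttt{a}, only \texttt{c.d} is safe; with the unfix \texttt{a.b}
instead, both \texttt{a} and \texttt{c.d} remain safe.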

\subsection{The Prefix Class}
\todo{?}
\subsection{The PrefixSet Class}

\subsection{Hacking the Refactoring Undo
History}\label{hacking_undo_history}
\todo{Where to put this section?}

As an attempt to make multiple subsequent changes to the workspace appear as a
single action (i.e., to make their undo changes appear as one), I tried to alter
the undo changes\typeref{org.eclipse.ltk.core.refactoring.Change} in the history
of the refactorings.

My first impulse was to remove the last undo changes (in this case, the last
two) from the undo
manager\typeref{org.eclipse.ltk.core.refactoring.IUndoManager} for the Eclipse
refactorings, and then add them to a composite
change\typeref{org.eclipse.ltk.core.refactoring.CompositeChange} that could be
added back to the manager. The interface of the undo manager does not offer a
way to remove/pop the last added undo change, so a possible solution could be to
decorate\citing{designPatterns} the undo manager, to intercept and collect the
undo changes before delegating to the \method{addUndo}
method\methodref{org.eclipse.ltk.core.refactoring.IUndoManager}{addUndo} of the
manager. Instead of giving it the intended undo change, a null change could be
given to prevent it from making any changes if run. Then one could let the
collected undo changes form a composite change to be added to the manager.
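
The decorator idea can be sketched in self-contained Java. To keep the example
runnable outside Eclipse, \type{UndoChange} and \type{SimpleUndoManager} are
hypothetical, heavily simplified stand-ins for \type{Change} and
\type{IUndoManager}; only the shape of \method{addUndo}, which takes a name and
a change, mirrors the real interface. The decorator passes a placeholder null
change to the wrapped manager for every intercepted undo change, and can later
register the collected changes as one composite.

\begin{minted}{java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for org.eclipse.ltk.core.refactoring.Change.
interface UndoChange {
    String describe();
}

// Hypothetical, simplified stand-in for IUndoManager (only addUndo).
interface SimpleUndoManager {
    void addUndo(String name, UndoChange change);
}

// Groups several undo changes, in the spirit of CompositeChange.
class CompositeUndo implements UndoChange {
    private final List<UndoChange> children;

    CompositeUndo(List<UndoChange> children) {
        this.children = new ArrayList<>(children); // defensive copy
    }

    public String describe() {
        List<String> parts = new ArrayList<>();
        for (UndoChange child : children) {
            parts.add(child.describe());
        }
        return "composite" + parts;
    }
}

// Decorator that intercepts addUndo: the real undo change is collected,
// and a placeholder null change is passed on instead.
class CollectingUndoManager implements SimpleUndoManager {
    static final UndoChange NULL_CHANGE = () -> "null change";

    private final SimpleUndoManager delegate;
    private final List<UndoChange> collected = new ArrayList<>();

    CollectingUndoManager(SimpleUndoManager delegate) {
        this.delegate = delegate;
    }

    @Override
    public void addUndo(String name, UndoChange change) {
        collected.add(change);
        delegate.addUndo(name, NULL_CHANGE);
    }

    // Registers everything collected so far as a single composite undo change.
    void addCollectedAsComposite(String name) {
        delegate.addUndo(name, new CompositeUndo(collected));
        collected.clear();
    }
}
\end{minted}

Note that the wrapped manager still receives one null change per intercepted
refactoring before the composite arrives, which is the dirty undo history
discussed below.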

There is a technical challenge with this approach, and it relates to the undo
manager and its concrete implementation
\typewithref{org.eclipse.ltk.internal.core.refactoring}{UndoManager2}. This
implementation is designed in such a way that it is not possible to just add an
undo change; you have to do it in the context of an active
operation\typeref{org.eclipse.core.commands.operations.TriggeredOperations}.
One could imagine that it might be possible to trick the undo manager into
believing that you are doing a real change, by executing a refactoring that
returns a kind of null change, which, when performed, returns our composite
change of undo changes.

Apart from the technical problems with this solution, there is a functional
problem: If it all had worked out as planned, this would leave the undo history
in a dirty state, with multiple empty undo operations corresponding to each of
the sequentially executed refactoring operations, followed by a composite undo
change corresponding to an empty change of the workspace for rounding off our
composite refactoring. The solution to this particular problem could be to
intercept the registration of the intermediate changes in the undo manager, and
only register the last empty change.

Unfortunately, not everything works as desired with this solution. The grouping
of the undo changes into the composite change does not make the undo operation
appear as an atomic operation. The undo operation is still split up into
separate undo actions, each corresponding to the change done by its originating
refactoring. In addition, the undo actions have to be performed separately in
each of the editors involved. This makes it no solution at all, but rather a
step toward something worse.

There might be a solution to this problem, but it remains to be found. The
design of the refactoring undo management is partly to blame for this, as it is
too complex to be easily manipulated.



\chapter{Related Work}

\section{The Compositional Paradigm of Refactoring}
This paradigm builds upon the observation of Vakilian et
al.\citing{vakilian2012} that, of the many automated refactorings existing in
modern IDEs, the simplest ones dominate the usage statistics. The report mainly
focuses on \emph{Eclipse} as the tool under investigation.

The paradigm is described almost as the opposite of automated composition of
refactorings \see{compositeRefactorings}. It works by providing the programmer
with easily accessible primitive refactorings. These refactorings shall be
accessed via keyboard shortcuts or quick-assist menus\footnote{Think
quick-assist with Ctrl+1 in Eclipse.} and be promptly executed, as opposed to
the currently dominant wizard-based refactoring paradigm. They are meant to
stimulate composing smaller refactorings into more complex changes, rather than
doing a large upfront configuration of a wizard-based refactoring before
previewing and executing it. The compositional paradigm of refactoring is
supposed to give control back to the programmer, by supporting \himher with an
option of performing small rapid changes instead of large changes with a lesser
degree of control. The report authors hope this will lead to fewer unsuccessful
refactorings. It could also lower the bar for understanding the steps of a
larger composite refactoring, and thus also help in figuring out what goes wrong
if one should choose to opt for a wizard-based refactoring.

Vakilian and his associates have performed a survey of the effectiveness of the
compositional paradigm versus the wizard-based one. They claim to have found
evidence that the \emph{compositional paradigm} outperforms the
\emph{wizard-based} one. It does so by reducing automation, which seems
counterintuitive. Therefore they ask the question ``What is an appropriate level
of automation?'', thereby questioning what they feel is a rush toward more
automation in the software engineering community.


\backmatter{}
\printbibliography
\listoftodos
\end{document}