was first published in 1984 by the \name{Forth Interest Group}. Then it was
reprinted in 1994 with minor typographical corrections, before it was
transcribed into an electronic edition typeset in \LaTeX\ and published under a
Creative Commons license in
2004. The edition cited here is the 2004 edition, but the content should
essentially be as in 1984.}. The exact word is only printed in one
place~\cite[p.~232]{brodie2004}, but the term \emph{factoring} is prominent in
\subsection{Reasons for refactoring}
There are many reasons why people want to refactor their programs. They can for
instance do it to remove duplication, break up long methods or to introduce
design patterns into their software systems. The shared trait for all these is
that people's intentions are to make their programs \emph{better}, in some
sense. But what aspects of their programs are being improved?
like when humans learn to read. First they must learn how to recognize letters.
Then they can learn distinct words, and later read sequences of words that form
whole sentences. Eventually, most of them will be able to read whole books and
briefly retell the important parts of its content. This suggests that the use of
design patterns is a good idea when reasoning about computer programs. With
extensive use of design patterns when creating complex program structures, one
does not always have to read whole classes of code to comprehend how they
\begin{description}
\item[1992] William F. Opdyke submits his doctoral dissertation called
\tit{Refactoring Object-Oriented Frameworks}\citing{opdyke1992}. This work
  defines a set of refactorings that are behavior-preserving given that their
preconditions are met. The dissertation is focused on the automation of
refactorings.
\item[1999] Martin Fowler et al.: \tit{Refactoring: Improving the Design of
keeps \name{NetBeans} from breaking the code in the example from
\myref{sec:correctness}.) If \var{c.x} for some reason is inaccessible to
\type{X}, as in this case, the refactoring breaks the code, and it will not
compile. \name{NetBeans} presents a preview of the refactoring outcome, but the
preview does not catch it if the IDE is about to break the program.
The IDEs under investigation seem to have fairly good support for primitive
\subsubsection{The impact on performance}
\begin{quote}
  Refactoring certainly will make software go more slowly\footnote{With today's
compiler optimization techniques and performance tuning of e.g. the Java
virtual machine, the penalties of object creation and method calls are
debatable.}, but it also makes the software more amenable to performance
Then it is crucial to have proper tool support that can perform them
automatically: tools that can parse source code and thus have semantic knowledge
about which occurrences of which names belong to which construct in the program.
To even try to perform one of these complex tasks manually, one would have
to be very confident in the existing test suite \see{testing}.
\subsection{Correctness of refactorings}\label{sec:correctness}
For automated refactorings to be truly useful, they must show a high degree of
particular situation without altering the program's behavior, mainly because
its \refa{Move Method} refactoring implementation is a bit flawed in other ways
\see{toolSupport}.}. The target and the destination for the composed
  refactoring are shown in \myref{lst:correctnessExtractAndMove}. Note that the
method \method{m(C c)} of class \type{X} assigns to the field \var{x} of the
argument \var{c} that has type \type{C}.
\end{quote}
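The constellation described can be sketched as follows; this is a hypothetical minimal reconstruction with assumed member visibility, not the actual listing:

```java
// Minimal sketch: the method m(C c) of class X assigns to the
// field x of its argument c, which has type C.
class C {
    int x;
}

class X {
    void m(C c) {
        c.x = 1; // assignment to the field x of the argument c
    }
}
```

If \var{c.x} is inaccessible to \type{X}, a refactoring that rewrites this assignment naively produces code that does not compile.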
When refactoring, there are roughly three classes of errors that can be made.
The first class of errors is the one that makes the code unable to compile.
These \emph{compile-time} errors are of the nicer kind. They flash up at the
moment they are made (at least when using an IDE), and are usually easy to fix.
The second class is the \emph{runtime} errors. Although these errors take a bit
longer to surface, they usually manifest after some time in an illegal argument
exception, null pointer exception or similar during the program execution.
These kinds of errors are a bit harder to handle, but at least they will show,
eventually. Then there are the \emph{behavior-changing} errors. These errors are
of the worst kind. They do not show up during compilation and they do not turn
on a blinking red light during runtime either. The program can seem to work
automated tests. Manual testing may have its uses, but when refactoring, it is
automated unit testing that dominates. For discovering behavior changes it is
especially important to have tests that cover potential problems, since these
kinds of errors do not reveal themselves.
Unit testing is not a way to \emph{prove} that a program is correct, but it is a
way to make you confident that it \emph{probably} works as desired. In the
If the test coverage for a code base is perfect, then it should, theoretically,
be risk-free to perform refactorings on it. This is why automated tests and
refactoring are such a great match.
\subsubsection{Testing the code from correctness section}
The worst thing that can happen when refactoring is to introduce changes to the
There is a certain class of tools that meet these criteria, namely the class of
\emph{IDEs}\footnote{\emph{Integrated Development Environment}}. These are
programs that are meant to support the whole production cycle of a computer
program, and the most popular IDEs that support Java generally have quite good
refactoring support.
thesis.
\subsubsection{Evolutionary design}
In the programming work for this project, I have tried using a design strategy
called evolutionary design, also known as continuous or incremental
design\citing{wiki_continuous_2014}. It is a software design strategy advocated
by the Extreme Programming community. The essence of the strategy is that you
should let the design of your program evolve naturally as your requirements
change. This is seen in contrast with up-front design, where design decisions
are made early in the process.
The motivation behind evolutionary design is to keep the design of software as
simple as possible. This means not introducing unneeded functionality into a
is presented in \myref{sec:correctness}.
\subsubsection{Project ``Safer Refactorings''}
\tit{Safer Refactorings}\citing{stolzSaferRefactorings} is a proposal for a
master's thesis. The proposer is my supervisor, Volker Stolz from the University
of Oslo.
Vakilian and his associates have performed a survey of the effectiveness of the
compositional paradigm versus the wizard-based one. They claim to have found
evidence that the \emph{compositional paradigm} outperforms the
\emph{wizard-based} one. It does so by reducing automation, which seems
counterintuitive. Therefore they ask the question ``What is an appropriate level
of automation?'', and thus question what they feel is a rush toward more
automation in the software engineering community.
specific portion of text we are about to deal with.
To be able to clearly reason about a text selection done to a portion of text in
a computer file, which consists of pure text, we put up the following
definition:
\definition{A \emph{text selection} in a text file is defined by two
non-negative integers, in addition to a reference to the file itself. The first
\section{Where we look for text selections}
\subsection{Text selections are found in methods}
The text selections we are interested in are those that surround program
statements. Therefore, the place we look for selections that can form candidates
for an execution of the \ExtractAndMoveMethod refactoring, is within the body of
a single method.
\ExtractAndMoveMethod refactoring. The reason for this is that in the cases
where we want to perform the refactoring for a selection within a static method,
the first step is to extract the selection into a new method. Hence this method
also becomes static, since it must be possible to call it from a static context.
It would then be difficult to move the method to another class, make it
non-static and call it through a variable. To avoid these obstacles, we
simply ignore static methods.
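A minimal sketch of the obstacle, with assumed class and method names, and with the extracted method written out by hand:

```java
class Worker {
    boolean worked = false;
    void doWork() { worked = true; }
}

class StaticExample {
    // The candidate selection lies inside a static method ...
    static void s(Worker w) {
        w.doWork();
    }

    // ... so the method extracted from it must also be static, to be
    // callable from the static context of s. Moving it to Worker and
    // making it an instance method called through w would require
    // further, error-prone rewriting.
    static void extracted(Worker w) {
        w.doWork();
    }
}
```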
\end{listing}
Each nesting level of a method body can have many such sequences of statements.
The outermost nesting level has one such sequence, and each branch contains
its own sequence of statements. \Myref{lst:grandExample} has a version of some
code where all such sequences of statements are highlighted for a method body.
To complete our example of possible text selections, I will now list all
\end{description}
\subsubsection{The complexity}\label{sec:complexity}
The number of text selections that need to be analyzed for a body of $n$
statements in total is bounded by $O(n^2)$. A body of statements is here
all the statements in all nesting levels of a sequence of statements. A method
body (or a block) is a body of statements. To prove that the complexity is
bounded by $O(n^2)$, I present a couple of theorems and prove them.
\begin{theorem}
The number of text selections that need to be analyzed for each list of
\end{proof}
Therefore, the complexity for the number of selections that need to be analyzed
for a body of $n$ statements is $O\bigl(\frac{n(n+1)}{2}\bigr) = O(n^2)$.
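The closed form can be cross-checked against a brute-force enumeration of all contiguous selections; this is an illustrative sketch, not part of the analyzer:

```java
class SelectionMath {
    // Number of contiguous sub-sequences (text selections) in a flat
    // list of n statements: n + (n-1) + ... + 1 = n(n+1)/2.
    static long selectionCount(long n) {
        return n * (n + 1) / 2;
    }

    // Brute force: count every pair (start, end) with start <= end.
    static long bruteForce(int n) {
        long count = 0;
        for (int start = 0; start < n; start++)
            for (int end = start; end < n; end++)
                count++;
        return count;
    }
}
```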
\section{Disqualifying a selection}
Situations like this would lead to code that will not compile. Therefore, we
have to avoid them by not allowing selections to contain such double class
instance creations that also contain references to fields.
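A hypothetical shape of such a disqualified selection, with assumed class names:

```java
class Holder {
    final int v;
    Holder(int v) { this.v = v; }
}

class Wrapper {
    final Holder h;
    Wrapper(Holder h) { this.h = h; }
}

class Owner {
    int f = 42;

    Wrapper m() {
        // Double class instance creation where the inner constructor
        // call takes an argument built from a field reference (this.f).
        return new Wrapper(new Holder(this.f));
    }
}
```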
\begin{comment}
\todoin{File Eclipse bug report}
\end{comment}
Performing a move on a method that instantiates a non-static inner class will
break the code if the instantiation is not handled properly. For this reason,
selections that contain instantiations of non-static inner classes are deemed
unsuitable for the \ExtractAndMoveMethod refactoring.
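A minimal sketch of such an instantiation, with assumed names:

```java
class Outer {
    class Inner { }   // non-static member (inner) class

    Inner m() {
        // Shorthand for this.new Inner(): the instantiation implicitly
        // captures the enclosing Outer instance, which is why moving a
        // method containing it breaks the code if not handled properly.
        return new Inner();
    }
}
```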
\subsection{References to enclosing instances of the enclosing class}
return type, then a call to it would also be a valid return point for the
calling method. If its return value is of the void type, then the \ExtractMethod
refactoring will append an empty return statement to the back of the method
call. Therefore, the analysis does not discriminate on either kind of return
statements, with or without a return value.
A throw statement is accepted anywhere a return statement is required. This is
We can start the check for this property by looking at the last statement of a
selection to see if it is a return statement (explicit or implicit) or a throw
statement. If this is the case, then the property holds, assuming the selected
code does not contain any compilation errors. All execution paths within the
selection should end in either this, or another, return or throw statement.
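As an illustration, a selection consisting of the if-statement below satisfies the property, since every execution path through it ends in a return or a throw (a hypothetical sketch):

```java
class Paths {
    static int m(int x) {
        // Selected code: every execution path ends in a return or a throw,
        // so no statement after the selection is reachable.
        if (x > 0) {
            return x;
        } else {
            throw new IllegalArgumentException("x must be positive");
        }
    }
}
```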
\todoin{State somewhere that we assume no compilation errors?}
of it must eventually end in one for the selection to be legal. This means that
all branches of the last statement of every branch must end in a return or
throw. Given this recursive definition, there are only five types of statements
that are guaranteed to end in a return or throw if their child branches do. All
other statements would have to be considered illegal. The first three:
Block-statements, labeled statements and do-statements are all kinds of
fall-through statements that always get their body executed. Do-statements would
not make much sense if written such that they
always end after the first round of execution of their body, but that is not
our concern. The remaining two statements that can end in a return or throw are
if-statements and try-statements.
clauses and finally body.
\subsection{Ambiguous return values}
The problem with ambiguous return values arises when a selection is chosen to be
extracted into a new method, but it needs to return more than one value from
that method.
This problem occurs in two situations. The first situation arises when there is
more than one local variable that is both assigned to within a selection and
also referenced after the selection. The other situation occurs when there is
only one such assignment, but the selection also contains return statements.
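The first situation can be sketched like this (hypothetical names; the selection boundaries are marked with comments):

```java
class Ambiguous {
    static int compute1() { return 1; }
    static int compute2() { return 2; }

    static int m() {
        int a = 0;
        int b = 0;
        // --- selection start ---
        a = compute1();
        b = compute2();
        // --- selection end ---
        // Both a and b are assigned to within the selection and referenced
        // after it; an extracted method could only return one of them.
        return a + b;
    }
}
```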
Therefore we must examine the selection for assignments to local variables that
\section{Finding a move target}
In the analysis needed to perform the \ExtractAndMoveMethod refactoring
automatically, the selection we choose is found among all the selections that
have a possible move target. Therefore, the best possible move target must be
found for all the candidate selections, so that we are able to sort out the
selection that is best suited for the refactoring.
The names we are looking for, we call prefixes. This is because we are not
interested in names that occur in the middle of a dot-separated sequence of
names. We are only interested in names constituting prefixes of other names, and
possibly themselves. The reason for this is that two lexically equal names need
not reference the same variable if they themselves are not referenced via
the same prefix. Consider the two method calls \code{a.x.foo()} and
\section{Unfixes}\label{s:unfixes}
The prefixes that are not valid as move targets are called unfixes.
A name that is assigned to within a selection can be an unfix. The reason for
this is that the result would be an assignment to the \type{this} keyword, which
is not valid in Java \see{eclipse_bug_420726}.
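A minimal sketch of this kind of unfix, with assumed names:

```java
class Callee {
    void work() { }
}

class Caller {
    Callee m(Callee c) {
        // --- selection start ---
        c = new Callee();  // 'c' is assigned to within the selection
        c.work();
        // --- selection end ---
        // If this code were moved to Callee and called through c, the
        // assignment would have to become `this = new Callee()`,
        // which is not valid Java. Hence 'c' is an unfix.
        return c;
    }
}
```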
Prefixes that originate from variable declarations within the same selection are
also considered unfixes. This is because when a method is moved, it needs to be
called through a variable. If this variable is also declared within the method
that is to be moved, this obviously cannot be done.
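A minimal sketch of this second kind of unfix, with assumed names:

```java
class Service {
    boolean called = false;
    void work() { called = true; }
}

class DeclarationUnfix {
    static boolean m() {
        // --- selection start ---
        Service s = new Service(); // declared inside the selection
        s.work();
        // --- selection end ---
        // A method moved to Service would need to be called through s,
        // but s is declared inside the very method that would be moved.
        return s.called;
    }
}
```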
Also considered as unfixes are variable references that are of types that are
not suitable for moving methods to. This can either be because it is not
compilation errors by doing so.
If the type binding for a name is not resolved, it is considered an unfix. The
same applies to types that are only found in compiled code, so they have no
underlying source that is accessible to us. (E.g. the \type{java.lang.String}
class.)
features of Java versions later than Java 7. Java 8 has interfaces with default
implementations of methods.)
Neither are local types allowed. This accounts for both local and anonymous
classes. Anonymous classes are effectively the same as interface types with
respect to unfixes. Local classes could in theory be used as targets, but this
is not possible due to limitations of the way the \refa{Extract and Move Method}
\end{listing}
The last class of names that are considered unfixes are names used in null
tests. These are tests that read like this: if \code{<name>} equals \var{null}
then do something. If we allowed variables used in those kinds of expressions
as targets for moving methods, we would end up with code containing boolean
expressions like \code{this == null}, which would not be meaningful, since
\section{Finding the example selections that have possible targets}
We now pick up the thread from \myref{sec:disqualifyingExample} where we have a
set of text selections that need to be analyzed to find out if some of them are
suitable targets for the \ExtractAndMoveMethod refactoring.
We start by analyzing the text selections for nesting level 2, because these
only found in one statement, and in a prefix with only one segment, so it is
not considered to be a possible move target. The only variable left is
\var{b}. Just as in the selection $(16,17)$, \var{b} is part of the prefix
  \code{b.a}, which has 2 appearances. We have found a new candidate
  $((14,21), \texttt{b.a}, f(2))$.
\end{description}
\subsubsection{The Refactoring Class}
The abstract class \type{Refactoring} is the core of the LTK framework. Every
refactoring that is going to be supported by the LTK has to end up creating an
instance of one of its subclasses. The main responsibilities of subclasses of
\type{Refactoring} are to implement template methods for condition checking
(\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkInitialConditions}
and
\methodwithref{org.eclipse.ltk.core.refactoring.Refactoring}{checkFinalConditions}),
It then delegates to its given
\typewithref{org.eclipse.ltk.core.refactoring.participants}{RefactoringProcessor}
for condition checking and change creation. Participating in a refactoring can
be useful in cases where the changes done to programming source code affect
other related resources in the workspace. This can be names or paths in
configuration files, or maybe one would like to perform additional logging of
changes done in the workspace.
\subsubsection{The Change Class}
This class is the base class for objects that are responsible for performing the
actual workspace transformations in a refactoring. The main responsibilities for
its subclasses are to implement the
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{perform} and
\methodwithref{org.eclipse.ltk.core.refactoring.Change}{isValid} methods. The
\method{isValid} method verifies that the change object is valid and thus can be
\subsubsection{The MoveInstanceMethodProcessor Class}
For the \refa{Move Method}, the processor requires a little more advanced input than
the class for the \refa{Extract Method}. For construction it requires a method
handle\typeref{org.eclipse.jdt.core.IMethod} for the method that is to be moved.
Then the target for the move has to be supplied as the variable binding from a
chosen variable declaration. In addition to this, some parameters have to be set
regarding setters/getters, as well as delegation.
To make the processor a working refactoring, a \type{MoveRefactoring} must be
created with it as a parameter.
\subsection{The ExtractAndMoveMethodChanger}
To find the most suitable target, the analyzer uses a
\typewithref{no.uio.ifi.refaktor.analyze.collectors}{PrefixesCollector} that
collects all the possible candidate targets for the refactoring. All the
non-candidates are found by an
\typewithref{no.uio.ifi.refaktor.analyze.collectors}{UnfixesCollector} that
collects all the targets that will give some kind of error if used. (For
details about the property collectors, see \myref{propertyCollectors}.) All
prefixes (and unfixes) are represented by a
\typewithref{no.uio.ifi.refaktor.extractors}{Prefix}, and they are collected
into sets of prefixes. The safe prefixes are found by subtracting from the set
of candidate prefixes the prefixes that enclose any of the unfixes. A prefix
encloses an unfix if the unfix is in the set of its sub-prefixes. As an
example, \code{``a.b''} encloses \code{``a''}, as does \code{``a''} itself. The
safe prefixes are unified in a \type{PrefixSet}. If a prefix has only one
occurrence, and is a simple expression, it is considered unsuitable as a move
target. This occurs in statements such as \code{``a.foo()''}. For such
statements it bears no meaning to extract and move them; it would only generate
an extra method and a call to it.
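The enclosing relation can be illustrated with a simplified, string-based sketch; the actual implementation operates on \type{Prefix} objects, so this is only an approximation:

```java
class PrefixSketch {
    // A prefix encloses an unfix if the unfix is among its sub-prefixes.
    // For "a.b" the sub-prefixes are "a" and "a.b" itself.
    static boolean encloses(String prefix, String unfix) {
        return prefix.equals(unfix) || prefix.startsWith(unfix + ".");
    }
}
```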
The most suitable target for the refactoring is found by finding the prefix with
the most occurrences. If two prefixes have the same occurrence count, but they
\subsubsection{The SearchBasedExtractAndMoveMethodAnalyzer}
This analyzer is responsible for analyzing all the possible text selections of a
method and then to choose the best result out of the analysis results that the
analyzer considers to be the potential candidates for the Extract and Move
Method refactoring.
Before the analyzer is able to work with the text selections of a method, it
needs to generate them. To do this, it parses the method to obtain a
Unfortunately, not everything works as desired with this solution. The grouping
of the undo changes into the composite change does not make the undo operation
appear as an atomic operation. The undo operation is still split up into
separate undo actions, corresponding to the changes done by their originating
refactorings. In addition, the undo actions have to be performed separately in
all the editors involved. This makes it no solution at all, but a step toward
something worse.
There might be a solution to this problem, but it remains to be found. The
design of the refactoring undo management is partly to be blamed for this, as
it is too complex to be easily manipulated.
\chapter{Analyzing Source Code in Eclipse}
is light-weight, and has only limited possibilities for manipulating source
code. It is typically used as a basis for the Package Explorer in \name{Eclipse}.
The elements of the Java model are only handles to the underlying elements. This
means that the underlying element of a handle does not need to actually exist.
Hence the user of a handle must always check that it exists by calling the
\method{exists} method of the handle.
The handles with descriptions are listed in \myref{tab:javaModel}, while the
hierarchy of the Java Model is shown in \myref{fig:javaModel}.
\begin{table}[htb]
bracket \var{\{} or a \var{nameToken}. It is recognized by the scanner on the
basis of something equivalent to a regular expression. This part of the process
is often implemented with the use of a finite automaton. In fact, it is common
to specify the tokens as regular expressions, which in turn are translated into
a finite automaton for the lexer. This process can be automated.
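As a toy illustration of this idea (hypothetical token classes, not the actual \name{Eclipse} scanner), tokens can be specified as regular expressions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ToyScanner {
    // Toy lexer: token classes specified as regular expressions,
    // here a nameToken and the curly brackets mentioned above.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|[{}]");

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find())
            tokens.add(m.group());
        return tokens;
    }
}
```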
The program component used to translate a stream of tokens into something
Form}\footnote{\url{https://en.wikipedia.org/wiki/Backus-Naur\_Form}}. The
result coming from the parser is in the form of an \emph{Abstract Syntax Tree},
AST for short. It is called \emph{abstract}, because the structure does not
contain all of the tokens produced by the scanner. It only contains logical
constructs, and because it forms a tree, all kinds of parentheses and brackets
are implicit in the structure. It is this AST that is used when performing the
semantic analysis of the code.
So far, the only thing that has been addressed is how the data that is going to
be the basis for our analysis is structured. Another aspect of it is how we are
going to traverse the AST to gather the information we need, so we can conclude
about the properties we are analyzing. It is of course possible to start at the
top of the tree, and manually search through its nodes for the ones we are
looking for, but that is a bit inconvenient. To be able to efficiently utilize
such an approach, we would need to make our own framework for traversing the
The use of the visitor pattern can be appropriate when the hierarchy of elements
is mostly stable, but the family of operations over its elements is constantly
growing. This is clearly the case for the \name{Eclipse} AST, since the
hierarchy for the type \type{ASTNode} is very stable, but the functionality of
its elements is extended every time someone needs to operate on the AST. Another
aspect of the \name{Eclipse} implementation is that it is a public API, and the
visitor pattern is an easy way to provide access to the nodes in the tree.
The version of the visitor pattern implemented for the AST nodes in \name{Eclipse} also
provides an elegant way to traverse the tree. It does so by following the
before it also makes all its children accept the visitor. The children are only
visited if the visit method of their parent returns \var{true}. This pattern
then makes for a prefix traversal of the AST. If postfix traversal is desired,
the visitors also have \method{endVisit} methods for each node type, which are
called after the \method{visit} method for a node. In addition to these visit
methods, there are also the methods \method{preVisit(ASTNode)},
\method{postVisit(ASTNode)} and \method{preVisit2(ASTNode)}. The
\refa{Extract and Move Method} refactoring. It visits expression
statements\typeref{org.eclipse.jdt.core.dom.ExpressionStatement} and creates
prefixes from their expressions in the case of method invocations. The prefixes
found are registered with a prefix set, together with all their sub-prefixes.
\subsection{The UnfixesCollector}\label{unfixes}
The \typewithref{no.uio.ifi.refaktor.extractors.collectors}{UnfixesCollector}
encountered statement with the end offset of the previous statement found.
\section{Checkers}\label{checkers}
The checkers are a range of classes that check that text selections comply
with certain criteria. All checkers operate under the assumption that the code
they check is free from compilation errors. If a
\typewithref{no.uio.ifi.refaktor.analyze.analyzers}{Checker} fails, it throws a
\subsection{The DoubleClassInstanceCreationChecker}
The \type{DoubleClassInstanceCreationChecker} checks that there are no double
class instance creations where the inner constructor call takes an argument that
is built up using field references.
The checker visits all nodes of type \type{ClassInstanceCreation} within a
\subsection{The InstantiationOfNonStaticInnerClassChecker}
The \type{InstantiationOfNonStaticInnerClassChecker} checks that selections
-does not contain instantiations of non-static inner classes. The
+do not contain instantiations of non-static inner classes. The
\type{MoveInstanceMethodProcessor} in \name{Eclipse} does not handle such
instantiations gracefully when moving a method. This problem is also related to
bug\ldots \todoin{File Eclipse bug report}
First, the checker needs to collect some data. Those data are the binding keys
for all simple names that are assigned to within the selection, including
-variable declarations, but excluding fields. The checker also collects whether
-there exists a return statement in the selection or not. No further checks of
-return statements are needed, since, at this point, the selection is already
-checked for illegal return statements \see{returnStatementsChecker}.
+variable declarations, but excluding fields. The checker also finds out whether
+a return statement is found in the selection or not. No further checks of return
+statements are needed, since, at this point, the selection is already checked
+for illegal return statements \see{returnStatementsChecker}.
After the binding keys of the assignees are collected, the checker searches the
part of the enclosing method that is after the selection for references whose
\subsection{The IllegalStatementsChecker}
This checker is designed to check for illegal statements.
-Notice that labels in break and continue statements needs some special
-treatment. Since a label does not have any binding information, we have to
-search upwards in the AST to find the \type{LabeledStatement} that corresponds
-to the label from the break or continue statement, and check that it is
-contained in the selection. If the break or continue statement does not have a
-label attached to it, it is checked that its innermost enclosing loop or switch
-statement (break statements only) also is contained in the selection.
+Notice that labels in break and continue statements need some special treatment.
+Since a label does not have any binding information, we have to search upwards
+in the AST to find the \type{LabeledStatement} that corresponds to the label
+from the break or continue statement, and check that it is contained in the
+selection. If the break or continue statement does not have a label attached to
+it, it is checked that its innermost enclosing loop or switch statement (break
+statements only) also is contained in the selection.
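The upward search for the corresponding \type{LabeledStatement} can be sketched as follows. The tiny \code{Node} model below is purely illustrative and is not Eclipse's AST; in the real checker the walk uses \method{ASTNode.getParent()} and compares label identifiers:

```java
import java.util.Objects;

// JDT-free sketch of the label check: walk upwards from a labeled
// break/continue statement until the enclosing statement carrying a
// matching label is found, then ask whether that statement lies
// inside the selection.
public class LabelCheckSketch {
    static class Node {
        final Node parent;
        final String label;        // non-null only for "labeled statements"
        final boolean inSelection;
        Node(Node parent, String label, boolean inSelection) {
            this.parent = parent;
            this.label = label;
            this.inSelection = inSelection;
        }
    }

    // Returns true if the labeled statement targeted by `label`,
    // searched for upwards from `breakNode`, is inside the selection.
    static boolean labeledTargetInSelection(Node breakNode, String label) {
        for (Node n = breakNode.parent; n != null; n = n.parent) {
            if (Objects.equals(n.label, label)) {
                return n.inSelection;
            }
        }
        return false; // no such labeled statement found
    }

    public static void main(String[] args) {
        Node method = new Node(null, null, false);
        Node outer = new Node(method, "outer", true); // labeled loop, selected
        Node inner = new Node(outer, null, true);
        Node brk = new Node(inner, null, true);       // models "break outer;"
        System.out.println(labeledTargetInSelection(brk, "outer")); // true
    }
}
```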
\todoin{Follow the development in the semantics section\ldots}
\chapter{Technicalities}
\section{Source code organization}
-All the parts of this master's project is under version control with
+All the parts of this master's project are under version control with
\name{Git}\footnote{\url{http://git-scm.com/}}.
-The software written is organized as some \name{Eclipse} plugins. Writing a plugin is
+The software written is organized as a set of \name{Eclipse} plugins. Writing a plugin is
composite refactoring over selected Eclipse projects.
\item[no.uio.ifi.refaktor.releng] \hfill \\
- Contains the rmap, queries and target definitions needed by by Buckminster
- on the Jenkins continuous integration server.
+ Contains the rmap, queries and target definitions needed by Buckminster on
+ the Jenkins continuous integration server.
\end{description}
\item[no.uio.ifi.refaktor.analyze.analyzers] \hfill \\
This package contains source code analyzers. These are usually responsible
for analyzing text selections or running specialized analyzers for different
- kinds of entities. Their structure are often hierarchical. This means that
+ kinds of entities. Their structures are often hierarchical. This means that
-you have an analyzer for text selections, that in turn is utilized by an
+you have an analyzer for text selections, which in turn is utilized by an
analyzer that analyzes all the selections of a method. Then there are
analyzers for analyzing all the methods of a type, all the types of a
\item[no.uio.ifi.refaktor.analyze.collectors] \hfill \\
This package contains the property collectors. Collectors are used to gather
-properties from a text selection. This is mostly properties regarding
+properties from a text selection. These are mostly properties regarding
- referenced names and their occurrences. It is these properties that makes up
+ referenced names and their occurrences. It is these properties that make up
the basis for finding the best candidates for a refactoring.
\end{description}
classes for different caching strategies.
\item[no.uio.ifi.refaktor.utils.nullobjects] \hfill \\
- Contains classes for creating different null objects. Most of the classes is
- used to represent null objects of different handle types. These null objects
- are returned from various utility classes instead of returning a \var{null}
- value when other values are not available.
+ Contains classes for creating different null objects. Most of the classes
+ are used to represent null objects of different handle types. These null
+ objects are returned from various utility classes instead of returning a
+ \var{null} value when other values are not available.
\end{description}
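The null-object idea mentioned for the \code{no.uio.ifi.refaktor.utils.nullobjects} package can be shown with a minimal sketch. The \code{Handle} type and lookup method below are invented for this example, not taken from the plugin; the point is that utility code returns a do-nothing instance instead of \var{null}, so callers need no null checks:

```java
// Minimal null-object sketch: findHandle never returns null, so the
// caller can always invoke exists() safely.
public class NullObjectSketch {
    interface Handle {
        boolean exists();
        String name();
    }

    // The shared null object: answers queries harmlessly.
    static final Handle NULL_HANDLE = new Handle() {
        public boolean exists() { return false; }
        public String name() { return "<none>"; }
    };

    static Handle findHandle(String name) {
        // Real lookup elided; return the null object when nothing is found.
        return "known".equals(name)
            ? new Handle() {
                  public boolean exists() { return true; }
                  public String name() { return "known"; }
              }
            : NULL_HANDLE;
    }

    public static void main(String[] args) {
        // No null check needed; the call is safe either way.
        System.out.println(findHandle("missing").exists()); // false
        System.out.println(findHandle("known").exists());   // true
    }
}
```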
\subsection{Problems with AspectJ}
The Buckminster build worked fine until introducing AspectJ into the project.
-When building projects using AspectJ, there are some additional steps that needs
+When building projects using AspectJ, there are some additional steps that need
to be performed. First of all, the aspects themselves must be compiled. Then the
-aspects needs to be woven with the classes they affect. This demands a process
+aspects need to be woven with the classes they affect. This demands a process
that does multiple passes over the source code.
When using AspectJ with \name{Eclipse}, the specialized compilation and the
-with files that has the extensions \code{*.java} and \code{*.aj}. I tried to
-setup this in the build properties file for the project containing the aspects,
+with files that have the extensions \code{*.java} and \code{*.aj}. I tried to
+set this up in the build properties file for the project containing the aspects,
but to no avail. The project containing the aspects does not seem to be built at
-all, and the projects that depends on it complains that they cannot find certain
+all, and the projects that depend on it complain that they cannot find certain
classes.
I then managed to write an \name{Ant}\footnote{\url{https://ant.apache.org/}}
provided by the \name{Eclipse} installation when building the project. This
obviously creates some problems with maintaining the list of dependencies in the
Ant file, as well as remembering to copy the plugins every time the list of
-dependencies change.
+dependencies changes.
The Ant script described above is run by Jenkins before the Buckminster setup
-and build. When setup like this, the Buckminster build succeeds for the projects
+and build. When set up like this, the Buckminster build succeeds for the projects
-The Java project that is going to be used as the data for the benchmark, must be
+The Java project that is going to be used as the data for the benchmark must be
imported into the JUnit workspace. This is done by the
\typewithref{no.uio.ifi.refaktor.benchmark}{ProjectImporter}. The importer
-require the absolute path to the project description file. It is named
+requires the absolute path to the project description file. This file is named
\code{.project} and is located at the root of the project directory.
The project description is loaded to find the name of the project to be
Aspect-oriented programming is a programming paradigm that is meant to isolate
so-called \emph{cross-cutting concerns} into their own modules. These
-cross-cutting concerns are functionalities that spans over multiple classes, but
+cross-cutting concerns are functionalities that span multiple classes, but
may not belong naturally in any of them. It can be functionality that does not
concern the business logic of an application, and thus may be a burden when
-entangled with parts of the source code it does not really belong. Examples
+entangled with parts of the source code where it does not really belong. Examples
time where it is desired that it starts its data gathering. At any point in time
the statistics aspect can be queried for a snapshot of the current statistics.
-The \type{Statistics} class also include functionality for generating a report
+The \type{Statistics} class also includes functionality for generating a report
of its gathered statistics. The report can be given either as a string or it can
be written to a file.
For the analysis part, there are advices to count the number of text selections
analyzed and the number of methods, types, compilation units and packages
-analyzed. There are also advices that counts for how many of the methods there
-is found a selection that is a candidate for the refactoring, and for how many
-methods there is not.
+analyzed. There are also advices that count the number of methods for which a
+selection that is a candidate for the refactoring is found, and the number of
+methods for which no such selection is found.
-There exists advices for counting both the successful and unsuccessful
-executions of all the refactorings. Both for the \ExtractMethod and \MoveMethod
-refactorings in isolation, as well as for the combination of them.
+There exist advices for counting both the successful and unsuccessful executions
+of all the refactorings, both for the \ExtractMethod and \MoveMethod
+refactorings in isolation and for the combination of them.
\section{Optimizations}
garbage collected. A strong reference is basically the same as a regular Java
-reference. A soft reference has the same guarantees as a week reference when it
+reference. A soft reference has the same guarantees as a weak reference when it
comes to its relation to strong references, but it is not necessarily garbage
-collected whenever there exists no strong references to it. A soft reference
-\emph{may} reside in memory as long as the JVM has enough free memory in the
-heap. A soft reference will therefore usually perform better than a weak
-reference when used for simple caching and similar tasks. The way to use a
-soft/weak reference is to as it for its referent. The return value then has to
-be tested to check that it is not \var{null}. For the basic usage of soft
-references, see \myref{lst:softReferenceExample}. For a more thorough
-explanation of weak references in general, see\citing{weakRef2006}.
+collected if there are no strong references to it. A soft reference \emph{may}
+reside in memory as long as the JVM has enough free memory in the heap. A soft
+reference will therefore usually perform better than a weak reference when used
+for simple caching and similar tasks. The way to use a soft/weak reference is to
+ask it for its referent. The return value then has to be tested to check that it
+is not \var{null}. For the basic usage of soft references, see
+\myref{lst:softReferenceExample}. For a more thorough explanation of weak
+references in general, see\citing{weakRef2006}.
\begin{listing}[h]
\begin{minted}{java}
the refactoring is a variable binding. This variable binding indirectly retains
a whole AST. Since ASTs are large structures, this quickly leads to an
\type{OutOfMemoryError} if trying to analyze a large project without optimizing
-how we store the candidates data. This means that the JVM cannot allocate more
-memory for out benchmark, and it exists disgracefully.
+how we store the candidates' data. This means that the JVM cannot allocate more
+memory for our benchmark, and it exits ungracefully.
A possible solution could be to just allow the JVM to allocate even more memory,
but this is not a dependable solution. The allocated memory could easily
-supersede the physical memory of a machine, and that would make the benchmark go
+exceed the physical memory of the machine, and that would make the benchmark go
really slow.
-Thus, the candidates data must be stored in another format. Therefore, we use
-the \gloss{mementoPattern} to store the variable binding information. This is
-done in a way that makes it possible to retrieve the variable binding at a later
-point. The data that is stored to achieve this, is the key to the original
-variable binding. In addition to the key, we know which method and text
-selection the variable is referenced in, so that we can find it by parsing the
-source code and search for it when it is needed.
+Thus, the candidates' data must be stored in another format. Therefore, we use
+the \gloss{mementoPattern} to store variable binding information. This is done
+in a way that makes it possible to retrieve a variable binding at a later point.
+The data that is stored to achieve this is the key to the original variable
+binding. In addition to the key, we know which method and text selection the
+variable is referenced in, so that we can find it by parsing the source code
+and searching for it when it is needed.
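A minimal sketch of this memento can look as follows. The class and field names, and the example binding-key format, are ours for illustration; the real implementation stores JDT handle information:

```java
// Illustrative sketch of the memento idea: instead of holding a JDT
// variable binding (which retains its whole AST), store only the
// binding key plus enough context to re-resolve the binding later by
// re-parsing the source code.
public class BindingMemento {
    private final String bindingKey;     // key of the original binding
    private final String methodHandle;   // which method references the variable
    private final int selectionOffset;   // where in the source the selection is
    private final int selectionLength;

    public BindingMemento(String bindingKey, String methodHandle,
                          int selectionOffset, int selectionLength) {
        this.bindingKey = bindingKey;
        this.methodHandle = methodHandle;
        this.selectionOffset = selectionOffset;
        this.selectionLength = selectionLength;
    }

    public String bindingKey() { return bindingKey; }
    public String methodHandle() { return methodHandle; }

    // In the real tool, restoring would parse the compilation unit of
    // methodHandle again and look the binding up by its key; here we
    // only show the data that makes that possible.
    @Override
    public String toString() {
        return bindingKey + " in " + methodHandle
            + " [" + selectionOffset + "," + selectionLength + "]";
    }
}
```

Because a memento holds only strings and integers, millions of candidates can be kept in memory without retaining any AST.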
\section{Handling failures}
\todoin{write}
-both uses Eclipse as their platform. The first is our own tool,
+both use Eclipse as their platform. The first is our own tool,
written to be able to run the \ExtractAndMoveMethod refactoring as a batch
-process, analyzing and refactoring many methods after each other. The second is
+process, analyzing and refactoring many methods one after another. The second is
-JUnit, that is used for running the projects own unit tests on the target code
+JUnit, which is used for running the project's own unit tests on the target code
both before and after it is refactored. The last tool that is used is a code
quality management tool, called \name{SonarQube}. It can be used to perform
different tasks for assuring code quality, but we are only going to take
dependencies of nested classes. It is meant to promote the Single
Responsibility Principle. The metric for the rule resembles the CBO metric
-that is described in \myref{sec:CBO}, but is only considering outgoing
+that is described in \myref{sec:CBO}, but only considers outgoing
- dependencies. The max value for the rule is chosen on the background of an
- empirical study by Raed Shatnawi, that concludes that the number 9 is the
+ dependencies. The max value for the rule is chosen on the basis of an
+ empirical study by Raed Shatnawi, which concludes that the number 9 is the
most useful threshold for the CBO metric\citing{shatnawiQuantitative2010}.
This study is also performed on Eclipse source code, so this threshold value
should be particularly well suited for the Eclipse JDT UI case in this
\item[NPath Complexity] is a rule that measures the number of possible
execution paths through a function. The value used is the default value of
- 200, that seems like a recognized threshold for this metric.
+ 200, which seems like a recognized threshold for this metric.
\item[Too many methods] is a rule that measures the number of methods in a
class. The threshold value used is the default value of 10.
\MoveMethod refactorings. Included in the change time is the parsing and
precondition checking done by the refactorings, as well as textual changes done
to files on disk. All this parsing and disk access is time-consuming, and
-constitute a large part of the change time.
+constitutes a large part of the change time.
In comparison, the pure analysis time, used to find suitable candidates, only
-make up for 15\% of the total time consumed. This includes analyzing almost
+makes up 15\% of the total time consumed. This includes analyzing almost
600,000 text selections, while the number of attempted executions of the
-\ExtractAndMoveMethod refactoring are only about 2,500. So the number of
-executed primitive refactorings are approximately 5,000. Assuming the time used
-on miscellaneous tasks are used mostly for parsing source code for the analysis,
-we can say that the time used for analyzing code is at most 25\% of the total
-time. This means that for every primitive refactoring executed, we can analyze
-around 360 text selections. So, with an average of about 21 text selections per
-method, it is reasonable to say that we can analyze over 15 methods in the time
-it takes to perform a primitive refactoring.
+\ExtractAndMoveMethod refactoring is only about 2,500. So the number of executed
+primitive refactorings is approximately 5,000. Assuming the time used on
+miscellaneous tasks is spent mostly on parsing source code for the analysis, we
+can say that the time used for analyzing code is at most 25\% of the total time.
+This means that for every primitive refactoring executed, we can analyze around
+360 text selections. So, with an average of about 21 text selections per method,
+it is reasonable to say that we can analyze over 15 methods in the time it
+takes to perform a primitive refactoring.
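The arithmetic above can be checked with a small back-of-the-envelope computation, using only values stated in the text (600,000 selections, about 5,000 primitive refactorings, 27,667 methods, analysis capped at 25% of the total time):

```java
// Sanity check of the time-budget reasoning: how many selections are
// analyzed in the time it takes to execute one primitive refactoring?
public class TimeBudgetSketch {
    static double selectionsPerPrimitive(double selections, double primitives,
                                         double analysisShare) {
        double changeShare = 1.0 - analysisShare;
        // Selections analyzed per unit time, divided by primitive
        // refactorings executed per unit time.
        return (selections / analysisShare) / (primitives / changeShare);
    }

    public static void main(String[] args) {
        long perPrimitive =
            Math.round(selectionsPerPrimitive(600_000, 5_000, 0.25));
        double perMethod = 600_000.0 / 27_667; // about 21.7 selections/method
        System.out.println(perPrimitive + " selections per primitive");
        System.out.println("methods per primitive: "
            + Math.round(perPrimitive / perMethod)); // about 17, i.e. over 15
    }
}
```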
\subsubsection{Refactoring candidates}
Out of the 27,667 methods that were analyzed, 2,552 methods contained selections
\subsubsection{Executed refactorings}
-2,469 out of 2,552 attempts on executing the \ExtractAndMoveMethod refactoring
+2,469 out of 2,552 attempts at executing the \ExtractAndMoveMethod refactoring
-were successful, giving a success rate of 96.7\%. The failure rate of 3.3\% stem
-from situations where the analysis finds a candidate selection, but the change
-execution fails. This failure could be an exception that was thrown, and the
-refactoring aborts. It could also be the precondition checking for one of the
-primitive refactorings that gives us an error status, meaning that if the
+were successful, giving a success rate of 96.7\%. The failure rate of 3.3\%
+stems from situations where the analysis finds a candidate selection, but the
+change execution fails. This failure could be an exception that was thrown, and
+the refactoring aborts. It could also be the precondition checking for one of
+the primitive refactorings that gives us an error status, meaning that if the
refactoring proceeds, the code will contain compilation errors afterwards,
forcing the composite refactoring to abort. This means that if the
-\ExtractMethod refactoring fails, no attempt is done for the \MoveMethod
+\ExtractMethod refactoring fails, no attempt is made for the \MoveMethod
refactoring. \todo{Redundant information? Put in benchmark chapter?}
-Out of the 2,552 \ExtractMethod refactorings that were attempted executed, 69 of
-them failed. This give a failure rate of 2.7\% for the primitive refactoring. In
-comparison, the \MoveMethod refactoring had a failure rate of 0.6 \% of the
-2,483 attempts on the refactoring.
+Out of the 2,552 \ExtractMethod refactorings that were attempted, 69 failed.
+This gives a failure rate of 2.7\% for the primitive refactoring. In comparison,
+the \MoveMethod refactoring had a failure rate of 0.6\% over its 2,483 attempts.
\subsection{\name{SonarQube} analysis}
by the pre-refactoring analysis. \name{SonarQube} discriminates between
functions (methods) and accessors, so the 1,296 accessors play a part in this
calculation. \name{SonarQube} also has the same definition as our plugin when
-it comes to how a class is defined. Therefore is seems like \name{SonarQube}
+it comes to how a class is defined. Therefore it seems like \name{SonarQube}
-misses 277 classes that our plugin handles. This can explain why the {SonarQube}
+misses 277 classes that our plugin handles. This can explain why the \name{SonarQube}
report differs from our numbers by approximately 2,500 methods,
said to have a slightly positive impact.
The improvement in complexity on the method level is somewhat traded for
-complexity on the class level. The complexity per class metric is worsen by 3\%
-from before to after. The issues for the ``Too many methods'' rule also
+complexity on the class level. The complexity per class metric is worsened by
+3\% from before to after. The issues for the ``Too many methods'' rule also
-increases by 14.5\%. These numbers indicate that the refactoring makes quite a
+increase by 14.5\%. These numbers indicate that the refactoring makes quite a
lot of the classes a little more complex overall. This is the expected outcome,
since the \ExtractAndMoveMethod refactoring introduces almost 2,500 new methods
caused those problems. I then stopped looking for more, since some of the bugs
would take more time to fix than I could justify using on them at this point.
-The only thing that can be said in my defence, is that all the compilation
-errors could have been avoided if the types of situations that causes them were
-properly handled by the primitive refactorings, that again are supported by the
+The only thing that can be said in my defense is that all the compilation
+errors could have been avoided if the types of situations that cause them were
+properly handled by the primitive refactorings, which again are supported by the
Eclipse JDT UI project. All of the four randomly found bugs that I mentioned
-before, are also weaknesses of the \MoveMethod refactoring. If the primitive
+before are also weaknesses of the \MoveMethod refactoring. If the primitive
refactorings had detected the up-coming errors
\subsection{Conclusions}
After automatically analyzing and executing the \ExtractAndMoveMethod
-refactoring for all the methods in the Eclipse JDT UI project, the results does
+refactoring for all the methods in the Eclipse JDT UI project, the results do
not look that promising. For this case, the refactoring seems almost unusable as
-it is now. The error rate and measurements done tells us this.
+it is now. The error rate and the measurements done tell us this.
-The analysis done before the \ExtractAndMoveMethod refactoring, is currently not
+The analysis done before the \ExtractAndMoveMethod refactoring is currently not
complete enough to make the refactoring useful. It introduces too many errors in
-the code, and the code may change it's behavior. It also remains to prove that
+the code, and the code may change its behavior. It also remains to prove that
large scale refactoring with it can decrease coupling between classes. A better
-analysis may prove this, but in its present state, the opposite is the fact. The
+analysis may prove this, but in its present state, the opposite is the case. The
-coupling measurements done by \name{SonarQube} shows this.
+coupling measurements done by \name{SonarQube} show this.
On the bright side, the performance of the refactoring process is not that bad.
It shows that it is possible to make a tool the way we do, if we can make the
tool do anything useful. As long as the analysis phase is not going to involve
-anything that uses to much disk access, a lot of analysis can be done in a
+anything that uses too much disk access, a lot of analysis can be done in a
reasonable amount of time.
The time used on performing the actual changes excludes a trial and error
percentage points worse than for case 1.
\subsection{\name{SonarQube} analysis}
-Results from the \name{SonarQube} analysis is shown in
+Results from the \name{SonarQube} analysis are shown in
\myref{tab:case2ResultsProfile1}.
Not much is to be said about these results. The trends in complexity and
Since the errors were easy to fix manually, I corrected them and ran the unit
tests as planned. Before the refactoring, all tests passed. All tests also
passed after the refactoring, with the six error corrections. Since the
-corrections done is not of a kind that could make the behavior of the program
+corrections done are not of a kind that could make the behavior of the program
change, it is likely that the refactorings done to the
\type{no.uio.ifi.refaktor} project did not change its behavior. This is also
supported by the informal experiment presented next.
This bug surfaces when trying to use the \refa{Move Method} refactoring to move a
method from an anonymous class to another class. This happens both for my
simulation as well as in \name{Eclipse}, through the user interface. It only occurs
-when \name{Eclipse} analyzes the program and finds it necessary to pass an instance of
-the originating class as a parameter to the moved method. I.e. it want to pass a
-\var{this} expression. The execution ends in an
+when \name{Eclipse} analyzes the program and finds it necessary to pass an
+instance of the originating class as a parameter to the moved method. That is,
+it wants to pass a \var{this} expression. The execution ends in an
\typewithref{java.lang}{IllegalArgumentException} in
\typewithref{org.eclipse.jdt.core.dom}{SimpleName} and its
-\method{setIdentifier(String)} method. The simple name is attempted created in
+\method{setIdentifier(String)} method. An attempt to create the simple name is made in