\section{Context and motivation}
The C89 standard provides only a loose definition of undefined
behavior. This permits compiler implementations to abuse the definition
and use it as a means to aggressive optimizations that break the
semantics of the code. Code that works at a lower optimization level
breaks when the optimization level is raised. Furthermore, code that
works with previous versions of a compiler suddenly breaks with newer
versions, because the standard imposes no requirements on undefined
behavior.
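
To make the problem concrete, the sketch below (the
\texttt{will\_overflow} helper and its output are illustrative, not
taken from any particular code base) relies on signed-integer
wraparound, which the standard leaves undefined; many optimizing
compilers fold the comparison to a constant at higher optimization
levels, so the printed result can change between \texttt{-O0} and
\texttt{-O2} or between compiler versions.

\begin{verbatim}
#include <limits.h>
#include <stdio.h>

/* Intended overflow test, but signed overflow is undefined behavior,
 * so the compiler may assume x + 1 > x always holds and fold the
 * comparison to 0. */
static int will_overflow(int x)
{
    return x + 1 < x;
}

int main(void)
{
    /* Often prints 1 at -O0 and 0 at -O2, but neither is guaranteed. */
    printf("%d\n", will_overflow(INT_MAX));
    return 0;
}
\end{verbatim}
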
This has created serious security problems throughout the
years~\cite{wang2012undefined,checks_2008}. A number of initiatives to
solve this problem have been started by different
parties~\cite{google_2015,regehr_2014,wang2013towards}, yet the
problem still persists. The major open-source compiler developer groups
have seized on the loose definition of undefined behavior to justify
dangerous silent code transformations that break the intention of the
programmer.
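
One shape such a silent transformation can take, among the incidents
surveyed in~\cite{wang2012undefined}, is sketched below with a
hypothetical \texttt{struct dev}: because the pointer is dereferenced
before the null check, the compiler may infer that it is non-null and
remove the check as dead code.

\begin{verbatim}
#include <stddef.h>

struct dev { int status; };

int read_status(struct dev *p)
{
    int s = p->status;   /* dereference lets the compiler infer p != NULL */
    if (p == NULL)       /* ... so this guard may be deleted as dead code */
        return -1;
    return s;
}
\end{verbatim}
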
This philosophy is very dangerous in terms of programming expressivity
and intentionality. The compiler has very little context about what the
developer wants to accomplish with a specific piece of code. For
example, the compiler cannot distinguish between a read from an
uninitialized memory region, which might produce unspecified results,
and a read from a memory-mapped device register that cannot be written
in order to initialize it. Nor can it distinguish between an erroneous
floating-point access to an integer variable and a clever method of
computing an arithmetic function~\cite{lomont2003fast}. The general
principle is that the developer has the responsibility to decide what
the code should do; the job of the compiler is to translate the code
into machine-readable instructions and to apply optimizations only when
there is no risk of losing the developer's intent.
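
As an illustration of the latter case, the sketch below follows the
well-known fast inverse square root routine analyzed by Lomont: the bit
pattern of a \texttt{float} is deliberately read through an integer via
pointer casts, which violates the strict-aliasing rule and is formally
undefined behavior, even though the programmer's intent is unambiguous.

\begin{verbatim}
#include <stdint.h>

float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    int32_t i  = *(int32_t *)&x;    /* read the float's bits as an integer
                                       (undefined behavior via aliasing) */
    i = 0x5f3759df - (i >> 1);      /* magic-constant initial guess */
    x = *(float *)&i;               /* reinterpret the bits as a float again */
    x = x * (1.5f - half * x * x);  /* one Newton-Raphson refinement step */
    return x;
}
\end{verbatim}
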
The argument of those who defend this kind of optimization is that C
code containing undefined behavior has no meaning, so the compiler is
free to perform arbitrary transformations on it. In Control Theory
terms, such a system is described by a low degree of controllability
and observability, which is paradoxical given the philosophy described
above, where the compiler forcibly takes over from the developer the
responsibility of generating relevant code. The implication is that no
meaningful engineering can be done in a framework where the processes
inside a compiler cannot be understood and analyzed.

Another argument in defense of aggressive optimizations is that code
generated by these compilers runs faster on artificial benchmarks.
\todo{do some more research here} This does not necessarily hold for
real-life benchmarks, which differ in complexity from the artificial
ones and make use of non-trivial code constructs.
Ertl~\cite{ertl2015every} makes an interesting observation regarding
the performance of optimizations based on undefined behavior: for
certain classes of programs, source-level changes buy greater speedup
factors than UB-based optimizations. While his research in this field
is valuable, the limitation of his work is that he draws conclusions
based on SPECint benchmarks.

The contribution of this work is an analysis of the speedup factors for
real-life programs, such as operating systems, and in particular
OpenBSD. \todo{motivate the choice of openbsd} By doing this we aim to
provide a trade-off analysis between the performance gained by UB-based
optimizations and the risks of applying them.

This paper is structured as follows. \todo{later}