\section{Context and motivation}
The C89 standard provides a loose definition of undefined behavior. This permits compiler implementations to exploit the definition as a means to aggressive optimizations that break the semantics of the code. Code that works at a lower optimization level breaks when the optimization level is raised. Furthermore, code that works with previous versions of a compiler suddenly breaks in newer versions, because the standard imposes no requirements on undefined behavior.

This has created serious security problems throughout the years~\cite{wang2012undefined,checks_2008}. A number of initiatives to solve this problem were started by different parties~\cite{google_2015,regehr_2014,wang2013towards}; however, the problem still persists. The primary open source developer groups have seized on the unsteady definition of undefined behavior to justify dangerous silent code transformations that break the intention of the programmer.

This philosophy is very dangerous in terms of programming expressivity and intentionality. The compiler has very little context about what the developer wants to accomplish with a specific piece of code. For example, the compiler cannot distinguish between a read from an uninitialized memory region, which might produce unspecified results, and a read from a memory-mapped device that cannot be written in order to initialize it. Nor can it distinguish between an erroneous floating-point access to an integer variable and a clever method of computing an arithmetic function~\cite{lomont2003fast}. The general principle is that the developer has the responsibility to decide what the code should do; the job of the compiler is to translate the code into machine-readable instructions and to apply optimizations only when there is no risk of losing developer intentionality.

The argument of those who defend this kind of optimization is that C code containing undefined behavior has no meaning, so the compiler is free to perform various kinds of modifications on it. In Control Theory terms, such a system is described by a low degree of controllability and observability. This is paradoxical in the philosophy described above, where the compiler forcefully takes over from the developer the responsibility of generating relevant code. The implication is that no meaningful engineering can be done in a framework where the processes inside a compiler cannot be understood and analyzed.

Another argument defending the aggressive-optimization view is that code generated by these compilers runs faster on artificial benchmarks. \todo{do some more research here} This does not necessarily hold for real-life benchmarks, which differ in complexity from the artificial ones and make use of non-trivial code constructs. Ertl~\cite{ertl2015every} makes an interesting observation regarding the performance of undefined-behavior-based optimizations: he notes that source-level changes buy greater speedup factors than UB-based optimizations for certain classes of programs. While his research in this field is valuable, the limitation of his work is that he draws conclusions based only on SPECint benchmarks.

The contribution of this work is that we analyze the speedup factors for real-life programs, such as operating systems, in particular OpenBSD. \todo{motivate the choice of openbsd} By doing this we want to provide a tradeoff analysis between the performance gained using UB-based optimizations and the risks of issuing them.

This paper is structured as follows. \todo{later}