\section{Background}
%% https://dl.acm.org/doi/pdf/10.1145/2737924.2737979?casa_token=GNGhq36jtkYAAAAA:0WPAvpuTgdhHooeOdrS3gB8zPfCW4gf0HyEBWv6KJwea8IXpjW6Ja-YA7o7ZJeIg18QN7lGO01c_yQ
%% https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7163211&casa_token=CTb8zMXhG1sAAAAA:CFHcfgeuVw1M3xN0QlT4IA9PQndO_ZbxCj2p5niBJ_zFLtpk8rGTNvy39euK332OE-03Blcbkgtx6Q
This section presents cases of UB-based optimizations found in popular
projects such as Linux and PostgreSQL, and discusses the social
implications of these optimizations by analyzing bug reports discussed
on the GCC mailing list.

Wang et al.~\cite{wang2012undefined} compiled a list of UBBOs that show
the dangers of compilers exploiting the definition of UB when
optimizing code. They present case studies for the following classes of
UB: division by zero, oversized shift, signed integer overflow,
out-of-bounds pointer, null pointer dereference, type-punned pointer
dereference, and uninitialized read. The consequences of these
optimizations range from unexpected code
generation~\cite{chen2014,fermatub} to real-world security
vulnerabilities~\cite{mitreub}.
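
The sketch below illustrates signed integer overflow, one of the classes
listed above. The function names are hypothetical, and whether the first
check is removed depends on the compiler and optimization level: since
signed overflow is undefined, an optimizing compiler may assume that
\texttt{x + 100} never wraps around and fold \texttt{overflows\_naive}
to \texttt{false}, whereas the comparison in \texttt{overflows\_safe}
never overflows and therefore cannot be removed on these grounds.
\begin{lstlisting}[language=C, caption=Signed overflow check that an optimizer may remove]
#include <limits.h>
#include <stdbool.h>

/* Hypothetical overflow check that relies on wrap-around. If x + 100
   overflows, the behavior is undefined, so an optimizing compiler may
   assume it never happens and reduce the expression to false. */
bool overflows_naive(int x)
{
    return x + 100 < x;
}

/* Well-defined alternative: compare against the limit before adding,
   so no overflow can occur for any value of x. */
bool overflows_safe(int x)
{
    return x > INT_MAX - 100;
}
\end{lstlisting}
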

To address these issues, the research community has proposed solutions
that tackle the problem from different angles. One approach is to
improve compilers so that they catch UB either at compile time or at
run time. However, such efforts have not delivered the expected
results.

On the one hand, reporting all UB at compile time is
undecidable~\cite{hathhorn2015defining}. Moreover, such reports are not
useful in some cases. For example, a compiler could flag the store in
Listing~\ref{lst:uur} as potential UB, because nothing inside
\texttt{foo} guarantees that \texttt{a} points to a valid object, even
though every caller may pass a valid pointer. Such false reports arise
because the compiler's internal representation may lack the context
needed to report only the useful information about UB, and because the
compiler cannot infer the programmer's intention when the code triggers
UB. This makes UB-based optimizations paradoxical: the compiler lacks
the context to find and report UB, yet it relies on the assumption that
UB never occurs to justify code transformations~\cite{lee2017taming}.
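
The following sketch illustrates this paradox with a null pointer check
placed after a dereference. The structure and function names are
hypothetical, and whether the check is actually removed depends on the
compiler and its flags. Since dereferencing a null pointer is undefined,
the compiler may infer from the first statement of \texttt{get\_flags}
that \texttt{p} is not null and delete the subsequent check, even though
it could not have reported the potential null dereference at compile
time. GCC, for instance, provides the
\texttt{-fno-delete-null-pointer-checks} flag to disable this kind of
inference.
\begin{lstlisting}[language=C, caption=Null check that an optimizer may treat as dead code]
#include <stddef.h>

/* Hypothetical structure used only for illustration. */
struct config {
    int flags;
};

/* The dereference happens before the null check. Because reading
   p->flags through a null pointer is undefined, a compiler may conclude
   that p is non-null and drop the check below as dead code. */
int get_flags(const struct config *p)
{
    int flags = p->flags;
    if (p == NULL)
        return -1;
    return flags;
}

/* Safe ordering: check first, then dereference. */
int get_flags_safe(const struct config *p)
{
    if (p == NULL)
        return -1;
    return p->flags;
}
\end{lstlisting}
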

On the other hand, catching UB at run time is inherently incomplete. A
run-time checker only observes the executions that actually take place,
so to ensure that no UB is ever triggered it would have to visit every
reachable state of the program. Running the checker for as long as this
requires is undesirable in most cases because it may take too much
time. Examples of such checkers are IOC~\cite{dietz2015understanding},
UBSan~\cite{ubsan}, and compiler flags such as GCC's \texttt{-ftrapv}
and Clang's \texttt{-fcatch-undefined-behavior}.
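
To illustrate why run-time checking is incomplete, the sketch below
mimics by hand the kind of guard that a checker such as UBSan inserts
before a shift. The helper \texttt{shl\_checked} is hypothetical and is
not part of any tool's API. The guard only fires on executions that
actually reach the shift with an amount of 32 or more, so a test run
that never supplies such an amount reports nothing, even though the
unchecked shift would be undefined for those inputs.
\begin{lstlisting}[language=C, caption=Hand-written guard against an oversized shift]
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hand-written guard, similar in spirit to the check a
   sanitizer inserts before a shift. Shifting a 32-bit value by 32 or
   more bits is undefined in C, so the guard rejects such amounts
   instead of executing the shift. */
bool shl_checked(uint32_t x, unsigned n, uint32_t *out)
{
    if (n >= 32)
        return false; /* a sanitizer would emit a report here */
    *out = x << n;
    return true;
}
\end{lstlisting}
With GCC or Clang, a comparable check is inserted automatically when
compiling with \texttt{-fsanitize=shift}, one of the checks enabled by
\texttt{-fsanitize=undefined}; it is subject to the same limitation.
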

Another approach is to compare the unoptimized code with the optimized
code generated by the compiler. However, program equivalence is
undecidable~\cite{sipser1996introduction}. Alternatively, decompilation
might be used to compute the semantic distance between the original C
code and the decompiled optimized assembly code. This could reveal the
transformations introduced by UB-based optimizations, which could then
be removed. However, decompilation is a hard problem in
general~\cite{cifuentes1995decompilation} because of type erasure, that
is, the loss of type information during compilation.
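
Although equivalence is undecidable in general, it can be checked by
brute force when a function's input domain is small and finite. The
sketch below, with hypothetical function names, compares a source-level
function with a version a compiler might produce by folding
\texttt{(x * 2) / 2} into \texttt{x}, over all 256 values of
\texttt{int8\_t}. Because \texttt{x} is promoted to \texttt{int} before
the multiplication, no overflow occurs and the two versions agree on
every input. For an \texttt{int} parameter, the same folding is only
justified by the assumption that \texttt{x * 2} never overflows, so a
checker would have to exclude the inputs that trigger UB, and this kind
of enumeration does not scale beyond small domains.
\begin{lstlisting}[language=C, caption=Brute-force equivalence check over a small input domain]
#include <stdbool.h>
#include <stdint.h>

typedef int (*int8_fn)(int8_t);

/* Hypothetical "unoptimized" function as written in the source. */
int source_version(int8_t x)
{
    return (x * 2) / 2; /* x is promoted to int, so no overflow */
}

/* Hypothetical "optimized" function after the compiler folds the
   multiplication and the division. */
int optimized_version(int8_t x)
{
    return x;
}

/* Brute-force equivalence check over every int8_t input. Returns true
   when f and g agree on all 256 values. */
bool equivalent_on_int8(int8_fn f, int8_fn g)
{
    for (int v = INT8_MIN; v <= INT8_MAX; v++) {
        if (f((int8_t)v) != g((int8_t)v))
            return false;
    }
    return true;
}
\end{lstlisting}
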

Besides compiler improvements, another solution would be to amend the
standard so that it gives the definition of undefined behavior more
robustness. At the moment, state-of-the-art compilers such as GCC and
LLVM take a liberal view of the standard and interpret it in a way that
allows them to perform aggressive and potentially dangerous
optimizations. The opposite view is the constructivist one, in which
compiler implementations construct a robust definition of undefined
behavior, even where the standard imposes no strong requirements. Until
the standard clarifies which approach it will take, implementations and
developers must choose their approach based on the loose definition it
provides.

\begin{lstlisting}[language=C, caption=Code for which a compiler may report false UB,
label={lst:uur}]
/* Well defined as long as every caller passes a valid pointer. */
void foo(int *a, int b) {
    *a = b;
}
\end{lstlisting}
\todo{talk about the discussions in gcc mailing list}