\section{Background}
%% https://dl.acm.org/doi/pdf/10.1145/2737924.2737979?casa_token=GNGhq36jtkYAAAAA:0WPAvpuTgdhHooeOdrS3gB8zPfCW4gf0HyEBWv6KJwea8IXpjW6Ja-YA7o7ZJeIg18QN7lGO01c_yQ
%% https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7163211&casa_token=CTb8zMXhG1sAAAAA:CFHcfgeuVw1M3xN0QlT4IA9PQndO_ZbxCj2p5niBJ_zFLtpk8rGTNvy39euK332OE-03Blcbkgtx6Q
This section presents cases of UB-based optimizations that affect
popular projects such as Linux or PostgreSQL, and discusses the social
implications of these optimizations by analyzing bug reports discussed
on the GCC mailing list.

Wang et al.~\cite{wang2012undefined} compiled a list of UBBO that shows
how dangerous it is for a compiler to exploit the definition of UB when
optimizing. They present case studies for the following classes of UB:
division by zero, oversized shift, signed integer overflow,
out-of-bounds pointer, null pointer dereference, type-punned pointer
dereference, and uninitialized read. The consequences of these
optimizations range from unexpected code
generation~\cite{chen2014,fermatub} to real-life
vulnerabilities~\cite{mitreub}.

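As a hedged illustration of the signed integer overflow class, the
sketch below contrasts an overflow test that relies on wraparound with
one that stays within defined behavior. A compiler may fold the first
test to false, because signed overflow is UB and can be assumed never to
happen. The function names are hypothetical and are not taken from the
cited case studies.

\begin{lstlisting}[language=C, caption={Sketch of an overflow check that may be optimized away (hypothetical names)}]
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound: if x + 100 overflows, the behavior is
 * undefined, so an optimizer may assume the sum never wraps and
 * reduce this test to "return false". */
bool overflows_ub(int x) {
    return x + 100 < x;
}

/* Defined for every int: compares against the limit before adding. */
bool overflows_defined(int x) {
    return x > INT_MAX - 100;
}

/* Adds 100 to x; the precondition keeps the addition inside
 * defined behavior. */
int add_100(int x) {
    assert(!overflows_defined(x));
    return x + 100;
}
\end{lstlisting}
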
To address these issues, the research community proposed solutions that
tackle the problem from different angles. One approach is to extend
compilers so that they catch UB either at compile time or at run time.
However, these efforts have not delivered the expected results.

On one hand, generating reports for all UB at compile time is
undecidable~\cite{hathhorn2015defining}. Moreover, such reports are not
useful in specific cases. Listing~\ref{lst:uur}, for example, could
generate a report for the store through \texttt{a}, although the store
is undefined only when a caller passes an invalid pointer.

Such reports arise because the internal representation of the compiler
may lack the context needed to report only the useful information about
UB, and because the compiler cannot infer the intention of the
programmer when the code triggers UB. In this context, performing
UB-based optimizations is paradoxical: the compiler lacks the context to
find and report UB, yet it relies on UB to justify code
transformations~\cite{lee2017taming}.

On the other hand, catching UB at run time is an incomplete approach. A
run-time checker would need to visit every state of the program in order
to ensure that no UB is triggered. Covering all states that may contain
UB means running the checker for as long as that takes, which is
impractical in most cases. Checkers for this task include
IOC~\cite{dietz2015understanding}, UBSan~\cite{ubsan}, and various
compiler flags such as GCC's \texttt{-ftrapv} and Clang's
\texttt{-fcatch-undefined-behavior}.

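To make the incompleteness of run-time checking concrete, the following
sketch performs an explicit overflow check on a single addition. It
assumes a GCC or Clang toolchain, since
\texttt{\_\_builtin\_add\_overflow} is a compiler builtin rather than
standard C, and the function name \texttt{checked\_add} is hypothetical.
The check fires only for inputs that are actually executed, so inputs
that no run reaches remain unchecked.

\begin{lstlisting}[language=C, caption={Sketch of a run-time overflow check (assumes GCC or Clang builtins)}]
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stores a + b in *out and returns true when the sum is representable.
 * Returns false and leaves *out untouched when the addition would
 * overflow. */
bool checked_add(int a, int b, int *out) {
    int sum;
    assert(out != NULL);
    if (__builtin_add_overflow(a, b, &sum)) {
        return false; /* detected at run time, for this input only */
    }
    *out = sum;
    return true;
}
\end{lstlisting}
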
Another approach is to compare the unoptimized code with the optimized
code generated by the compiler. However, program equivalence is
undecidable~\cite{sipser1996introduction}. Alternatively, decompilation
could be used to compute the semantic distance between the original C
code and the decompiled optimized assembly code. This would make it
possible to spot the UB-based optimizations that were introduced and
remove them afterwards. However, decompilation is a hard problem in
general~\cite{cifuentes1995decompilation} because of type erasure.

Besides compiler improvements, another solution would be to amend the
standard so that the definition of undefined behavior becomes more
robust. At the moment, state-of-the-art compilers such as GCC and LLVM
take a liberal view of the standard and interpret it in a way that
permits various dangerous optimizations. The opposite view is the
constructivist one, in which compiler implementations construct a robust
definition of undefined behavior even though the standard imposes no
strong requirements. Until the standard makes clear which approach it
will take, implementations and developers must choose their approach
based on the loose definition provided in the standard.

\begin{lstlisting}[language=C, caption=Code that may trigger false UB reports,
label={lst:uur}]
void foo(int *a, int b) {
    *a = b;
}
\end{lstlisting}
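Whether the store in Listing~\ref{lst:uur} is undefined depends entirely
on the caller, which is why a per-function report is of limited use. The
sketch below repeats \texttt{foo} and adds two hypothetical callers: one
passes the address of a live object, the other rejects a null pointer
before calling. Both caller names are illustrative assumptions.

\begin{lstlisting}[language=C, caption={Sketch of callers that keep the store defined (hypothetical names)}]
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

void foo(int *a, int b) {
    *a = b; /* undefined only if a is not a valid pointer to an int */
}

/* Well-defined: a points to a live local object. */
int store_local(int b) {
    int x = 0;
    foo(&x, b);
    assert(x == b);
    return x;
}

/* Rejects a null pointer before the store and reports whether it
 * stored. The guard cannot detect dangling or misaligned pointers. */
bool store_if_nonnull(int *a, int b) {
    if (a == NULL) {
        return false;
    }
    foo(a, b);
    return true;
}
\end{lstlisting}
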
\todo{talk about the discussions in gcc mailing list}