add recent work
commit a33673140b (parent f9c900526d)
@ -5,11 +5,9 @@
This section presents UB optimizations in real-life software projects
such as Linux and OpenBSD. After presenting such examples, we provide
an analysis of the risks they introduce and current solutions that try
to tackle the risks, but remain incomplete. Then we introduce the
notion of programmer intentionality and present how it correlates with
UB optimizations.

\subsection{Undefined Behavior Optimizations}
to tackle the risks, but remain incomplete. The final part of this
section presents the situation of UB optimizations as seen by various
academics and C programmers in the field.

Wang et al.~\cite{wang2012undefined} compiled a list of UB optimizations
that show the dangerous effects of using the UB definition when issuing

@ -115,32 +113,65 @@ one, where the compiler implementations construct a robust definition of
undefined behavior, even if the standard imposes no strong requirements.
Until the standard makes it clear what approach it would take in the
future, implementations and developers need to decide their approach
based on the loose definition provided in the standard.
based on the ambiguous definition provided in the standard.

Various academics and C programmers have complained over the years
about this situation. Linus Torvalds~\cite{linusgcc} said in 2016 on the
GCC mailing list (parentheses mine):
\begin{displayquote}
The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.

Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually look
at generated code and do profiling etc. None of those three options
(-fno-strict-overflow, -fno-strict-aliasing and
-fno-delete-null-pointer-checks) have *ever* shown up as issues. But
the incorrect code they generate? It has.
\end{displayquote}

John Regehr wrote in one of his blog posts:
\begin{displayquote}
One suspects that the C standard body simply got used to throwing
behaviors into the “undefined” bucket and got a little carried away.
Actually, since the C99 standard lists 191 different kinds of undefined
behavior, it’s fair to say they got a lot carried away.
\end{displayquote}

DJ Bernstein, one of the most important voices in writing cryptography
code, wrote:
\begin{displayquote}
Pretty much every real-world C
program is "undefined" according to the C "standard", and new compiler
"optimizations" often produce new security holes in the resulting object
code, as illustrated by

https://lwn.net/Articles/342330/
https://kb.isc.org/article/AA-01167

and many other examples. Crypto code isn't magically immune to this
\end{displayquote}

Bernstein's complaint was heard by GCC developers, who tried to
start an initiative for creating a boringcc dialect in GCC. However, the
effort has stalled for lack of people to work on it.

Finally, Dennis Ritchie also commented on the dangerous effects of
noalias, an early source of undefined behavior that was in the end
dropped from the ANSI C standard:
\begin{displayquote}
`Noalias' is much more dangerous; the committee is planting timebombs
that are sure to explode in people's faces. Assigning an ordinary
pointer to a pointer to a `noalias' object is a license for the compiler
to undertake aggressive optimizations that are completely legal by the
committee's rules, but make hash of apparently safe programs. Again,
the problem is most visible in the library; parameters declared `noalias
type *' are especially problematical.
\end{displayquote}

While noalias does not exist in current standards, the ``timebomb''
effects that Ritchie describes can be found in many other undefined
behaviors, as the examples provided in this section show.

\todo{talk about the discussions in GCC mailing list}
\todo{talk about what do you mean by the intention of the programmer}

\subsection{Programmer intentionality}

To issue UB optimizations safely, the compiler would need to know the
programmer's intention in order to generate relevant code, i.e., code
that matches the expectations of the programmer rather than the
ambiguous definition of undefined behavior presented in the standard.
This is a complicated task because intention detection is a hard problem
in psychology~\cite{gollwitzer1993goal}.

Given this problem, the safest thing the compiler can do in this case is
not to reason about intentions in any way. By doing so, the risk
of losing programmer intentionality is eliminated.

For code that is free of undefined behavior this problem is not relevant,
as the compiler is expected to generate code that preserves the
intention of the programmer. Here, the compiler is free to apply any
code transformation that increases the performance of the system while
preserving the semantics of the code.

However, most real-life projects make use of non-trivial code constructs
that trigger undefined behavior in order to help the programmer
communicate various intentions~\cite{yodaiken2021iso,kell2017some}. Code
transformations in this case introduce more unwanted consequences than
expected results.
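
As a hedged illustration of such a construct (our own sketch, not an
example taken from the cited works), consider the common idiom of
detecting signed overflow after the addition has already been performed.
The function names below are hypothetical. The first function expresses
the programmer's intention under wraparound arithmetic, but it relies on
signed overflow, which the standard leaves undefined. The second function
states the same intention without executing any overflowing operation.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical idiom: the programmer intends "did a + b wrap around?"
 * and expects b >= 0. Signed overflow is undefined in ISO C, so a
 * compiler may assume a + b never overflows and simplify the test to
 * b < 0, which never reports an overflow for b >= 0. Shown for contrast
 * only; calling it with overflowing arguments is undefined behavior. */
bool add_overflows_wraparound(int a, int b)
{
    return a + b < a;
}

/* Same intention, free of undefined behavior: compare against the
 * limits before adding, so no overflowing operation is ever executed. */
bool add_overflows_checked(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}
```

With GCC or Clang, the \texttt{-fwrapv} option gives signed overflow
wraparound semantics, under which the first form behaves as the
programmer intended.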
@ -142,6 +142,13 @@ last visited \today},
year={2020}
}

@misc{linusgcc,
title={Re: [isocpp-parallel] Proposal for new memory\_order\_consume
definition},
note={\url{https://gcc.gnu.org/legacy-ml/gcc/2016-02/msg00381.html},
last visited \today},
year={2016}
}

@inproceedings{hathhorn2015defining,
title={Defining the undefinedness of C},

@ -7,6 +7,7 @@
\usepackage{listings}
\usepackage{url,hyperref}
\usepackage{datetime}
\usepackage{csquotes}

\newcommand{\todo}[1]{}
% \renewcommand{\todo}[1]{{\color{red} TODO: {#1}}}
