add recent work

This commit is contained in:
lucic71 2023-01-18 12:31:45 +02:00
parent f9c900526d
commit a33673140b
3 changed files with 70 additions and 31 deletions


@@ -5,11 +5,9 @@
This section presents UB optimizations in real-life software projects
such as Linux and OpenBSD. After we present such examples, we provide
an analysis of the risks they introduce and current solutions that try
to tackle the risks but remain incomplete. Then we introduce the
notion of programmer intentionality and present how it correlates with
UB optimizations.
\subsection{Undefined Behavior Optimizations}
to tackle the risks but remain incomplete. The final part of this
section presents the situation of UB optimizations as seen by various
academics and C programmers in the field.
Wang et al.~\cite{wang2012undefined} compiled a list of UB optimizations
that show the dangerous effects of using the UB definition when issuing
@@ -115,32 +113,65 @@ one, where the compiler implementations construct a robust definition of
undefined behavior, even if the standard imposes no strong requirements.
Until the standard makes it clear what approach it would take in the
future, implementations and developers need to decide their approach
based on the loose definition provided in the standard.
based on the ambiguous definition provided in the standard.
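As a minimal illustration of the kind of optimization at stake (a sketch, not taken from any of the cited projects; the function name is invented), consider a wrap-around overflow check:

\begin{lstlisting}[language=C]
#include <limits.h>

/* The programmer intends to detect signed overflow, but signed
 * overflow is undefined behavior, so the compiler may assume that
 * x + 1 < x can never be true and fold this function to 0. */
int will_overflow(int x) {
    return x + 1 < x;   /* UB when x == INT_MAX */
}
\end{lstlisting}

Under optimization, mainstream compilers commonly rewrite the comparison to a constant, silently removing the very check the programmer wrote.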
Various academics and C programmers have complained throughout the years
about this situation. Linus Torvalds~\cite{linusgcc} said in 2016 on the
GCC mailing list (parentheses mine):
\begin{displayquote}
The fact is, undefined compiler behavior is never a good idea. Not for
serious projects.
Performance doesn't come from occasional small and odd
micro-optimizations. I care about performance a lot, and I actually look
at generated code and do profiling etc. None of those three options
(-fno-strict-overflow, -fno-strict-aliasing and
-fno-delete-null-pointer-checks) have *ever* shown up as issues. But
the incorrect code they generate? It has.
\end{displayquote}
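The \texttt{-fno-delete-null-pointer-checks} option Torvalds mentions guards against patterns like the following (a hypothetical sketch modeled on the Linux TUN driver bug; the struct and function names are invented):

\begin{lstlisting}[language=C]
struct sock { int type; };

/* sk is dereferenced before the NULL test, so the compiler may
 * conclude that sk cannot be NULL and delete the check below. */
int sock_type(struct sock *sk) {
    int t = sk->type;   /* UB if sk == NULL */
    if (!sk)
        return -1;      /* silently removed under optimization */
    return t;
}
\end{lstlisting}

If the kernel maps page zero, the deleted check turns a crash into an exploitable hole, which is exactly the class of bug the flag exists to prevent.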
John Regehr wrote in one of his blog posts:
\begin{displayquote}
One suspects that the C standard body simply got used to throwing
behaviors into the “undefined” bucket and got a little carried away.
Actually, since the C99 standard lists 191 different kinds of undefined
behavior, it's fair to say they got a lot carried away.
\end{displayquote}
DJ Bernstein, one of the most important voices in writing cryptography
code, wrote:
\begin{displayquote}
Pretty much every real-world C
program is "undefined" according to the C "standard", and new compiler
"optimizations" often produce new security holes in the resulting object
code, as illustrated by
\url{https://lwn.net/Articles/342330/}
\url{https://kb.isc.org/article/AA-01167}
and many other examples. Crypto code isn't magically immune to this
\end{displayquote}
Bernstein's complaint was heard by GCC developers, who tried to
start an initiative for creating a boringcc dialect in GCC. However,
the effort stalled because of missing human resources.
Finally, Dennis Ritchie also commented on the dangerous effects of
noalias, an early undefined behavior proposal that was in the end
dropped from the ANSI C standard:
\begin{displayquote}
`Noalias' is much more dangerous; the committee is planting timebombs
that are sure to explode in people's faces. Assigning an ordinary
pointer to a pointer to a `noalias' object is a license for the compiler
to undertake aggressive optimizations that are completely legal by the
committee's rules, but make hash of apparently safe programs. Again,
the problem is most visible in the library; parameters declared `noalias
type *' are especially problematical.
\end{displayquote}
While noalias does not exist in current standards, the "timebomb"
effects that Ritchie describes can be found in many other undefined
behaviors as described in the examples provided in this section.
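The modern descendant of noalias is C99's \texttt{restrict} qualifier, and it can still produce the effect Ritchie warned about. A minimal sketch (the function is invented for illustration):

\begin{lstlisting}[language=C]
/* restrict promises the compiler that a and b never alias, so it
 * may load *b once and reuse the cached value.  A call such as
 * add_twice(&x, &x) breaks that promise and is undefined: the
 * result depends on whether the compiler reloaded *b. */
void add_twice(int *restrict a, const int *restrict b) {
    *a += *b;
    *a += *b;
}
\end{lstlisting}

With distinct pointers the behavior is well defined; with aliasing pointers an apparently safe call site is "made hash of" exactly as Ritchie predicted.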
\todo{talk about the discussions in GCC mailing list}
\todo{talk about what do you mean by the intention of the programmer}
\subsection{Programmer intentionality}
To issue UB optimizations, the compiler is required to have knowledge of
the programmer's intention in order to generate relevant code, i.e.\ code
that matches the expectations of the programmer rather than the
ambiguous definition of undefined behavior presented in the standard.
This is a complicated task because intention detection is a hard problem
in psychology~\cite{gollwitzer1993goal}.
Given this problem, the safest thing the compiler can do in this case is
to not reason about intentions at all. By doing so, it avoids the risk
of discarding the programmer's intention.
For code that is free of undefined behavior this problem is not relevant
as the compiler is expected to generate code that preserves the
intention of the programmer. Here, the compiler is free to do whatever
code transformations that increase the performance of the system and
that preserve the semantics of the code.
However, most real-life projects make use of non-trivial code constructs
that trigger undefined behavior in order to help the programmer
communicate various intentions~\cite{yodaiken2021iso,kell2017some}. Code
transformations in this case introduce more unwanted consequences than
expected results.
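A concrete instance of such an intention-communicating construct (a sketch, not drawn from the cited projects; the function names are invented) is bit-level reinterpretation of a float:

\begin{lstlisting}[language=C]
#include <stdint.h>
#include <string.h>

/* The cast states the programmer's intention directly, but the
 * access violates strict aliasing and is therefore undefined. */
uint32_t float_bits_ub(float f) { return *(uint32_t *)&f; }

/* memcpy expresses the same intention in well-defined C; modern
 * compilers compile it to the same single register move. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
\end{lstlisting}

The two functions encode the same intention, yet only the second gives the compiler a defined construct to preserve; the first leaves it free to transform the code in ways the programmer never meant.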


@@ -142,6 +142,13 @@ last visited \today},
year={2020}
}
@misc{linusgcc,
title={Re: \[isocpp-parallel\] Proposal for new memory\_order\_consume
definition},
note={\url{https://gcc.gnu.org/legacy-ml/gcc/2016-02/msg00381.html},
last visited \today},
year={2016}
}
@inproceedings{hathhorn2015defining,
title={Defining the undefinedness of C},


@@ -7,6 +7,7 @@
\usepackage{listings}
\usepackage{url,hyperref}
\usepackage{datetime}
\usepackage{csquotes}
\newcommand{\todo}[1]{}
% \renewcommand{\todo}[1]{{\color{red} TODO: {#1}}}