Compare commits

...

8 Commits

Author SHA1 Message Date
lucic71 65efce75e2 Merge branch 'master' of tildegit.org:lucic71/dissertation into HEAD 2022-12-17 15:09:50 +02:00
lucic71 7ec3c40e1a finish first draft of intro 2022-12-17 15:08:34 +02:00
lucic71 067376cc06 add first part of intro 2022-12-17 14:03:19 +02:00
lucic71 92e4a5afe7 add initial draft of abstract 2022-12-17 12:39:05 +02:00
lucic71 7c1ec0f85a Add template for tsw paper 2022-12-16 17:08:54 +02:00
lucic71 06f16f4a12 Delete -x flag because tcc does not know about it
At some point I should delete other unused flags too.
2022-11-12 21:00:02 +02:00
lucic71 6712edc242 Add ideas about tcc being dependent on GNU 2022-11-12 20:45:16 +02:00
lucic71 c396cad786 Documented problems with tcc 2022-11-12 20:38:05 +02:00
14 changed files with 5580 additions and 1 deletions

13
TSW/Makefile Normal file

@ -0,0 +1,13 @@
PROJECT=main
build:
	pdflatex ${PROJECT}.tex
	bibtex ${PROJECT}
	pdflatex ${PROJECT}.tex
	pdflatex ${PROJECT}.tex
show:
	evince ${PROJECT}.pdf
clean:
	rm -f *.aux *.blg *.out *.bbl *.log *.toc *.pdf

15
TSW/abstract.tex Normal file

@ -0,0 +1,15 @@
\begin{abstract}
\noindent
The ISO C Standard introduced the notion of undefined behavior as a means
to achieve portability. State-of-the-art compilers such as gcc and LLVM
use it to perform aggressive optimizations that break the semantics of
the code. \todo{or the intention of the programmer} We argue that the
performance impact of undefined behavior based optimizations in operating
systems, such as OpenBSD, is low. Furthermore, they introduce more
unobservable and undocumented effects than performance advantages. To
test our hypothesis we aim to disable the above-mentioned \todo{should i
write UB based optimizations instead?} optimizations and test the
generated system against real-life testbenches \todo{too generic}.
\end{abstract}
\noindent\keywords{compiler, optimization, operating system}\\

53
TSW/bib.bib Normal file

@ -0,0 +1,53 @@
@inproceedings{wang2012undefined,
title={Undefined behavior: what happened to my code?},
author={Wang, Xi and Chen, Haogang and Cheung, Alvin and Jia, Zhihao
and Zeldovich, Nickolai and Kaashoek, M Frans},
booktitle={Proceedings of the Asia-Pacific Workshop on Systems},
pages={1--7},
year={2012}
}
@misc{checks_2008,
title={CERT/CC Vulnerability note vu162289},
url={https://www.kb.cert.org/vuls/id/162289/},
journal={VU162289 - C compilers may silently discard some wraparound
checks},
year={2008},
month={Apr}
}
@inproceedings{wang2013towards,
title={Towards optimization-safe systems: Analyzing the impact of
undefined behavior},
author={Wang, Xi and Zeldovich, Nickolai and Kaashoek, M Frans and
Solar-Lezama, Armando},
booktitle={Proceedings of the Twenty-Fourth ACM Symposium on Operating
Systems Principles},
pages={260--275},
year={2013}
}
@article{lomont2003fast,
title={Fast inverse square root},
author={Lomont, Chris},
journal={Technical Report},
volume={32},
year={2003}
}
@misc{google_2015, title={BORINGCC},
url={https://groups.google.com/g/boring-crypto/c/48qa1kWignU/m/o8GGp2K1DAAJ},
journal={Google}, publisher={Google}, year={2015}, month={Dec}}
@inproceedings{ertl2015every,
title={What every compiler writer should know about programmers or
Optimization based on undefined behaviour hurts performance},
author={Ertl, M Anton},
booktitle={Kolloquium Programmiersprachen und Grundlagen der
Programmierung (KPS 2015)},
year={2015}
}
@misc{regehr_2014, url={https://blog.regehr.org/archives/1180},
journal={Embedded in Academia}, author={Regehr, John}, year={2014},
month={Aug}}

215
TSW/bibliography.bib Normal file

@ -0,0 +1,215 @@
@inproceedings{tanaka,
title={Approaches to making software porting more productive},
author={Tanaka, Toshikiyo and Hakuta, M and Iwata, N and Ohminami, M},
booktitle={Proceedings of the 12th TRON Project international Symposium},
pages={73--85},
year={1995},
organization={IEEE}
}
@article{hakuta,
title={A study of software portability evaluation},
author={Hakuta, Mitsuari and Ohminami, Masato},
journal={Journal of Systems and Software},
volume={38},
number={2},
pages={145--154},
year={1997},
publisher={Elsevier}
}
@inproceedings{kanai,
title={A cost model for software conversion based on program characteristics and a converter effect},
author={Kanai, A and Furuyama, T and Takahashi, M},
booktitle={1992 Proceedings. The Sixteenth Annual International Computer Software and Applications Conference},
pages={63--64},
year={1992},
organization={IEEE Computer Society}
}
@incollection{mooney2004developing,
title={Developing portable software},
author={Mooney, James D},
booktitle={Information Technology},
pages={55--84},
year={2004},
publisher={Springer}
}
@article{capretz,
title={Bringing the human factor to software engineering},
author={Capretz, Luiz Fernando},
journal={IEEE software},
volume={31},
number={2},
pages={104--104},
year={2014},
publisher={IEEE}
}
@article{ejiogu,
title={A simple measure of software complexity},
author={Ejiogu, Lem O},
journal={ACM SIGPLAN Notices},
volume={20},
number={3},
pages={16--31},
year={1985},
publisher={ACM New York, NY, USA}
}
@article{mooney1990strategies,
title={Strategies for supporting application portability},
author={Mooney, James D.},
journal={Computer},
volume={23},
number={11},
pages={59--70},
year={1990},
publisher={IEEE}
}
@inproceedings{cho2011case,
title={Case Study on Installing a Porting Process for Embedded Operating System in a Small Team},
author={Cho, DongSeok and Bae, DooHwan},
booktitle={2011 Fifth International Conference on Secure Software Integration and Reliability Improvement-Companion},
pages={19--25},
year={2011},
organization={IEEE}
}
@misc{porquet2015,
title={Porting {Linux} to a new processor architecture, part 1: The basics},
howpublished="\url{https://lwn.net/Articles/654783/}",
journal={[LWN.net]},
author={Porquet, Joël},
year={2015},
month={Aug}
}
@article{bodenstab1984unix,
title={The {UNIX} system: {UNIX} operating system porting experiences},
author={Bodenstab, DE and Houghton, Thomas F and Kelleman, Keith A and Ronkin, George and Schan, Edward P},
journal={AT\&T Bell Laboratories Technical Journal},
volume={63},
number={8},
pages={1769--1790},
year={1984},
publisher={Nokia Bell Labs}
}
@misc{osdevcrossport,
title={Cross-Porting Software},
howpublished="\url{https://wiki.osdev.org/Cross-Porting\_Software}",
journal={[osdev.org]},
year={2019},
month={Sept}
}
@article{jolitz1990porting,
title={Porting {UNIX} to the 386: A practical approach},
author={Jolitz, William Frederick and Jolitz, Lynne Greer},
journal={Dr. Dobb's Journal},
volume={16},
number={1},
pages={16--46},
year={1990},
publisher={CMP Media, Inc.}
}
@article{frakes1995sixteen,
title={Sixteen questions about software reuse},
author={Frakes, William B and Fox, Christopher J},
journal={Communications of the ACM},
volume={38},
number={6},
pages={75--ff},
year={1995},
publisher={ACM New York, NY, USA}
}
@article{tanenbaum1978guidelines,
title={Guidelines for software portability},
author={Tanenbaum, Andrew S and Klint, Paul and Bohm, Wim},
journal={Software: Practice and Experience},
volume={8},
number={6},
pages={681--698},
year={1978},
publisher={Wiley Online Library}
}
@article{johnson1978unix,
title={{UNIX} time-sharing system: Portability of C programs and the {UNIX} system},
author={Johnson, Steven C and Ritchie, Dennis M},
journal={The Bell System Technical Journal},
volume={57},
number={6},
pages={2021--2048},
year={1978},
publisher={Nokia Bell Labs}
}
@article{xia2017measuring,
title={Measuring program comprehension: A large-scale field study with professionals},
author={Xia, Xin and Bao, Lingfeng and Lo, David and Xing, Zhenchang and Hassan, Ahmed E and Li, Shanping},
journal={IEEE Transactions on Software Engineering},
volume={44},
number={10},
pages={951--976},
year={2017},
publisher={IEEE}
}
@article{morgan1994controlling,
title={Controlling software development costs},
author={Morgan, Malcolm J},
journal={Industrial Management \& Data Systems},
year={1994},
publisher={MCB UP Ltd}
}
@article{boehm2000software,
title={Software development cost estimation approaches—A survey},
author={Boehm, Barry and Abts, Chris and Chulani, Sunita},
journal={Annals of software engineering},
volume={10},
number={1},
pages={177--205},
year={2000},
publisher={Springer}
}
@article{walli1995posix,
title={The POSIX family of standards},
author={Walli, Stephen R},
journal={StandardView},
volume={3},
number={1},
pages={11--17},
year={1995},
publisher={ACM New York, NY, USA}
}
@inproceedings{spencer1992ifdef,
title={\# ifdef considered harmful, or portability experience with C News},
author={Spencer, Henry and Collyer, Geoff},
booktitle={USENIX Summer 1992 Technical Conference},
year={1992}
}
@misc{callahanopenbsd,
title={I ported the new Hare compiler to {OpenBSD}},
howpublished="\url{https://briancallahan.net/blog/20220427.html}",
journal={[Brian Robert Callahan]},
author={Callahan, B. R.},
year={2022},
month={April}
}
@article{lyonunixportability,
author = {Lyon, Tom},
year = {1977},
month = {08},
title = {Inter-{UNIX} Portability}
}

BIN
TSW/images/image.jpeg Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 127 KiB

64
TSW/intro.tex Normal file

@ -0,0 +1,64 @@
\section{Context and motivation}
The C89 standard provides a loose definition of undefined
behavior. This permits compiler implementations to
abuse the definition and use it to justify aggressive optimizations
that break the semantics of the code. Code that works at a lower
optimization level breaks when the optimization level is raised.
Furthermore, code that works with previous versions of a compiler may
suddenly break with newer versions because the standard imposes no
requirements on undefined behavior.
This has created serious security problems over the
years~\cite{wang2012undefined,checks_2008}. Several initiatives to
solve this problem were started by different
parties~\cite{google_2015,regehr_2014,wang2013towards}; however, the
problem persists. The primary open source developer groups have
seized on the unsteady definition of undefined behavior to justify
dangerous silent code transformations that break the intention of the
programmer.
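As a minimal sketch of the kind of check at issue (our own illustration,
not taken from the cited reports; the helper names are hypothetical),
consider an overflow test that relies on signed wraparound next to a
variant that stays within defined behavior. Since signed overflow is
undefined, a compiler may assume that the sum in the first helper never
wraps and remove the test.
\begin{verbatim}
```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: overflow check for b >= 0 that relies on
 * wraparound. If a + b overflows, the behavior is undefined, so a
 * compiler may assume the sum never wraps and fold the test to 0. */
int naive_add_overflows(int a, int b)
{
    return a + b < a;
}

/* Hypothetical helper: the same question asked without undefined
 * behavior. Only subtractions that cannot overflow are performed. */
int safe_add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Exercises both helpers on inputs where neither invokes
 * undefined behavior. */
void check_examples(void)
{
    assert(naive_add_overflows(1, 2) == 0);
    assert(safe_add_overflows(1, 2) == 0);
    assert(safe_add_overflows(INT_MAX, 1) == 1);
    assert(safe_add_overflows(INT_MIN, -1) == 1);
}
```
\end{verbatim}
The first helper is only ever called here with operands that do not
overflow; calling it with overflowing operands is exactly the case whose
outcome depends on the compiler and the optimization level.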
This philosophy is very dangerous in terms of programming expressivity
and intentionality. The compiler has very little context about what the
developer wants to accomplish with a specific piece of code. For example,
the compiler cannot distinguish between a read from an
uninitialized memory region that might produce unspecified results and a
read from a memory-mapped device that cannot be written in order to
initialize it. Nor can it distinguish between an erroneous access to an
integer variable through a floating-point pointer and a smart method of
computing an arithmetic function~\cite{lomont2003fast}. The general
principle is that the developer has the responsibility to decide what the
code should do; the job of the compiler is to translate the code into
machine-readable instructions and to apply optimizations only when there
is no risk of losing developer intentionality.
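To make the second example concrete, the following is a sketch of the
well-known fast inverse square root trick, assuming a 32-bit IEEE~754
float and the commonly quoted magic constant. The helper names are ours.
The original formulation reinterprets the bits through a pointer cast,
which conflicts with the C aliasing rules; the sketch states the same
intent with memcpy, which has defined behavior.
\begin{verbatim}
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time check of the assumption that float is 32 bits wide. */
typedef char float_is_32_bits[sizeof(float) == sizeof(uint32_t) ? 1 : -1];

/* Hypothetical helper: approximates 1/sqrt(x) for positive, finite x.
 * Reading the bits via *(uint32_t *)&x would violate the aliasing
 * rules; memcpy copies the same bits with defined behavior. */
float fast_rsqrt(float x)
{
    uint32_t bits;
    float y;

    memcpy(&bits, &x, sizeof bits);       /* float bits -> integer   */
    bits = 0x5f3759dfu - (bits >> 1);     /* initial guess           */
    memcpy(&y, &bits, sizeof y);          /* integer bits -> float   */

    return y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson step */
}

/* Checks the approximation against two values with known results. */
void check_rsqrt(void)
{
    float r4 = fast_rsqrt(4.0f);  /* exact value: 0.5 */
    float r1 = fast_rsqrt(1.0f);  /* exact value: 1.0 */

    assert(r4 > 0.495f && r4 < 0.505f);
    assert(r1 > 0.99f && r1 < 1.01f);
}
```
\end{verbatim}
The point of the sketch is that the integer manipulation of floating-point
bits is deliberate here, which a compiler cannot tell apart from a mistake
when it is written as a pointer cast.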
The argument of those who defend this kind of optimization is
that C code that contains undefined behavior has no meaning, so the
compiler is free to transform it in arbitrary ways. In Control
Theory terms, such a system has a low degree of
controllability and observability. This is paradoxical given the
philosophy described above, in which the compiler forcefully takes from
the developer the responsibility of generating relevant code. The
implication is that no meaningful engineering can be done in
a framework where the processes inside a compiler cannot be understood
and analyzed.
Another argument in defense of aggressive optimizations is that
code generated by these compilers runs faster on artificial benchmarks.
\todo{do some more research here} This does not necessarily hold for
real-life benchmarks, which differ in complexity from the artificial ones
and make use of non-trivial code constructs. Ertl~\cite{ertl2015every}
makes an interesting observation regarding the performance of undefined
behavior based optimizations. He notes that source-level changes buy
greater speedup factors than UB-based optimizations for certain classes
of programs. While his research in this field is valuable, the
limitation of his work is that he draws conclusions based on SPECint
benchmarks.
The contribution of this work is that we try to analyze the speedup
factors for real-life programs such as operating systems, in particular
OpenBSD. \todo{motivate the choice of openbsd} By doing this we want to
provide a tradeoff analysis between the performance gained using UB-based
optimizations and the risks of issuing them.
This paper is structured as follows. \todo{later}

5090
TSW/johd.bst Normal file

File diff suppressed because it is too large

36
TSW/johd.sty Normal file

@ -0,0 +1,36 @@
\setlength{\paperwidth}{21cm} % A4
\setlength{\paperheight}{29.7cm}% A4
\setlength\topmargin{-0.5cm}
\setlength\oddsidemargin{0cm}
\setlength\textheight{24.7cm}
\setlength\textwidth{16.0cm}
\setlength\columnsep{0.6cm}
\newlength\titlebox
\setlength\titlebox{5cm}
\setlength\headheight{5pt}
\setlength\headsep{0pt}
\pagestyle{plain}
\usepackage{color}
\usepackage[natbibapa]{apacite}
\usepackage{xurl}
\usepackage[colorlinks,citecolor=blue,urlcolor=blue, linkcolor=blue, bookmarks=false,hypertexnames=true]{hyperref}
\usepackage{url}
%\usepackage{libertine}
\usepackage{float}
\usepackage{graphicx}
\usepackage{doi} % hyperlink URLs
\renewcommand{\doi}{DOI:~}
\newcommand\outauthor{
\begin{tabular}[t]{c}
\bf\@author
\end{tabular}}
%Add keyword command
\providecommand{\keywords}[1]
{\small\textbf{Keywords:} #1
}
\providecommand{\authorroles}[1]
{\small\textbf{Author roles:} #1
}

64
TSW/main.tex Normal file

@ -0,0 +1,64 @@
\documentclass{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{johd}
\usepackage{color}
\newcommand{\todo}[1]{}
\renewcommand{\todo}[1]{{\color{red} TODO: {#1}}}
\title{Semantic Preservation of Undefined Behavior Based Optimizations}
\author{Popescu Lucian-Ioan \\
\small CS Department, Politehnica University of Bucharest,
Romania \\
\small lucian.popescu187@gmail.com
}
\date{}
\begin{document}
\maketitle
\input{abstract}
\input{intro}
\section{Dataset description}
Here you can provide, if applicable, information about the dataset(s) whose creation, collection, management, access, processing or analysis have been discussed in this paper, following this schema:
\paragraph{Object name} Typically the name of the file or file set in the repository.
\paragraph{Format names and versions} E.g., ASCII, CSV, Autocad, EPS, JPEG, Excel, SQL, etc.
\paragraph{Creation dates} The start and end dates of when the data was created (YYYY-MM-DD).
\paragraph{Dataset creators} Please list anyone who helped to create the dataset (who may or may not be an author of the data paper), including their roles (using and affiliations).
\paragraph{Language} Languages used in the dataset (i.e., for variable names etc.).
\paragraph{License} The open license under which the data has been deposited (e.g., CC0).
\paragraph{Repository name} The name of the repository to which the data is uploaded. E.g., Figshare, Dataverse, etc.
\paragraph{Publication date} If already known, the date in which the dataset was published in the repository (YYYY-MM-DD).
\section{Method}
Describe the methods used in the study.
\section{Results and discussion}
Describe and discuss the results of the study.
\section{Implications/Applications}
Provide information about the implications of this research and/or how it can be applied.
\section*{Acknowledgements}
Please add any relevant acknowledgements to anyone else that assisted with the project in which the data was created but did not work directly on the data itself.
\section*{Funding Statement}
If the research resulted from funded research please list the funder and grant number here.
\section*{Competing interests}
If any of the authors have any competing interests then these must be declared. If there are no competing interests to declare then the following statement should be present: The author(s) has/have no competing interests to declare.
\bibliographystyle{johd}
\bibliography{bib}
\section*{Supplementary Files (optional)}
Any supplementary/additional files that should link to the main publication must be listed, with a corresponding number, title and option description. Ideally the supplementary files are also cited in the main text.
Note: supplementary files will not be typeset so they must be provided in their final form. They will be assigned a DOI and linked to from the publication.
\end{document}

1
TSW/relwork.tex Normal file

@ -0,0 +1 @@
http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf


@ -0,0 +1,13 @@
# locore0.S
I should have expected problems with the compiler, but not so early. The
first problem I encountered here is that, in 64-bit Intel mode, the
.code32 and .code16 directives were not supported. I don't know why this
support was never added. Maybe because no one tried to use the compiler
for such a task.
However, I patched the compiler and it now accepts the directives in the
.S file. We will see from the generated output whether the machine code
is correct.
Until then I still have to patch the compiler because it does not know
the popfl instruction. I don't even know what it does, but the problem
needs to be solved.

15
Tech/tcc/gnu.md Normal file

@ -0,0 +1,15 @@
# GNU
TCC tries to keep a lot of compatibility with GCC. I
wonder what the reason for that is. I don't think that the author was
dumb enough to think that if the project kisses the ass of GCC it will
stay popular and relevant.
Just grep GCC in /tinycc and you will see how much the project depends
on GNU. Even the problem I solved earlier with .code32 in 64-bit
mode had to do with GAS, the GNU assembler. The makefile is written in
the GNU make dialect, in /tinycc/configure gcc is the default compiler,
etc.
I'm starting to think that maybe tcc is not the best solution for what
I'm trying to do.


@ -77,7 +77,7 @@ CWARNFLAGS+= -Wno-address-of-packed-member -Wno-constant-conversion \
DEBUG?= -g
COPTIMIZE?= -O2
CFLAGS= ${DEBUG} ${CWARNFLAGS} ${CMACHFLAGS} ${COPTIMIZE} ${COPTS} ${PIPE}
-AFLAGS= -D_LOCORE -x assembler-with-cpp ${CWARNFLAGS} ${CMACHFLAGS}
+AFLAGS= -D_LOCORE ${CWARNFLAGS} ${CMACHFLAGS}
LINKFLAGS= -T ld.script -X --warn-common -nopie
HOSTCC?= ${CC}