
Merge branch 'trunk' of into trunk

Callum R. Renwick 3 months ago


@ -4,3 +4,4 @@

sem1/arch/ → yr1/sem1/arch/

sem1/arch/advanced/arithmetic_correctness.tex → yr1/sem1/arch/advanced/arithmetic_correctness.tex

sem1/arch/advanced/ → yr1/sem1/arch/advanced/

sem1/arch/advanced/ → yr1/sem1/arch/advanced/

sem1/arch/advanced/ → yr1/sem1/arch/advanced/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/memory-2.tex → yr1/sem1/arch/memory-2.tex

sem1/arch/ → yr1/sem1/arch/

sem1/arch/mips-asm.tex → yr1/sem1/arch/mips-asm.tex

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/ → yr1/sem1/arch/

sem1/arch/two_s_complement.tex → yr1/sem1/arch/two_s_complement.tex

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/info_retrieval.tex → yr1/sem1/comp/info_retrieval.tex

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/comp/ → yr1/sem1/comp/

sem1/maths/big_o.tex → yr1/sem1/maths/big_o.tex

sem1/maths/combinatorics.tex → yr1/sem1/maths/combinatorics.tex

sem1/maths/functions.tex → yr1/sem1/maths/functions.tex

sem1/maths/graphs.tex → yr1/sem1/maths/graphs.tex

sem1/maths/matrices.tex → yr1/sem1/maths/matrices.tex

sem1/maths/matrices_and_relations.tex → yr1/sem1/maths/matrices_and_relations.tex

sem1/maths/matrix_multiplications.tex → yr1/sem1/maths/matrix_multiplications.tex

sem1/maths/powersets.tex → yr1/sem1/maths/powersets.tex

sem1/maths/probability.tex → yr1/sem1/maths/probability.tex

sem1/maths/ → yr1/sem1/maths/

@ -8,15 +8,15 @@ In reasoning algebra, capital Latin letters (*A*, *B*, *P*, *Q*, *etc.*) are use
Short propositions can be combined together to form longer ones. For example:
* Giant Redwoods are very tall **and** Asuka wears a red plugsuit
* Giant Redwoods are very tall **and** I am about 5 foot 9
* This note was written in raw HTML **or** I have never actually seen *Neon Genesis Evangelion*
* These notes were written in raw HTML **or** these notes are about reasoning
The phrases in **bold**, which connect the propositions together, are called *logical connectives*. There are other logical connectives:
* **If** Asuka wears the test plugsuit **then** she is embarrassed
* **If** you eat too much cake **then** you will get ill
* Asuka does **not** want to wear the test plugsuit
* I do **not** want to come to your birthday party
Combining propositions with logical connectives forms a new proposition.
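The connectives above are usually written with standard symbols. As a sketch, writing $P$ for "Asuka wears the test plugsuit" and $Q$ for "she is embarrassed":

```latex
\begin{align*}
P \land Q &\quad \text{($P$ and $Q$)}\\
P \lor Q  &\quad \text{($P$ or $Q$)}\\
P \to Q   &\quad \text{(if $P$ then $Q$)}\\
\lnot P   &\quad \text{(not $P$)}
\end{align*}
```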

sem1/maths/relations.tex → yr1/sem1/maths/relations.tex

sem1/maths/strings.tex → yr1/sem1/maths/strings.tex

sem1/maths/turing_machines.tex → yr1/sem1/maths/turing_machines.tex

sem2/comp_found/automata.tex → yr1/sem2/comp_found/automata.tex

sem2/comp_found/complexity.tex → yr1/sem2/comp_found/complexity.tex

sem2/comp_found/context-free_grammars.tex → yr1/sem2/comp_found/context-free_grammars.tex

sem2/comp_found/grammars.tex → yr1/sem2/comp_found/grammars.tex

sem2/comp_found/parsing.tex → yr1/sem2/comp_found/parsing.tex

sem2/comp_found/pumping_lemma.tex → yr1/sem2/comp_found/pumping_lemma.tex

sem2/comp_found/push-down_automata.tex → yr1/sem2/comp_found/push-down_automata.tex

sem2/comp_found/sets.tex → yr1/sem2/comp_found/sets.tex

sem2/comp_found/turing_machines.tex → yr1/sem2/comp_found/turing_machines.tex

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/data_structs/ → yr1/sem2/data_structs/

sem2/oop/ → yr1/sem2/oop/

sem2/oop/ → yr1/sem2/oop/

sem2/oop/ → yr1/sem2/oop/

sem2/reqs_eng/ → yr1/sem2/reqs_eng/

sem2/reqs_eng/ → yr1/sem2/reqs_eng/

sem2/reqs_eng/ → yr1/sem2/reqs_eng/

@ -78,3 +78,9 @@ problems.
#### Wireframes
**Wireframes** are prototypes which provide an outline of the functional design of a system.
#### Higher-fidelity prototypes
Then follow higher-fidelity prototypes, which show how more complex designs (colours and complex graphics) will
affect the final system, making it easier to check that the design is consistent and will not impact functionality
(colours being too difficult to see, for example).


@ -0,0 +1,104 @@
# Requirements
Requirements should be:
* Detailed
* Unambiguous
* Complete (that is, not missing anything)
* Understandable (clear)
* Testable (it must be possible to measure whether a requirement has been met)
* Necessary (don't include requirements which aren't required, obviously)
* Attainable (requirements should be possible to fulfill)
* Independent (each requirement should be distinct and separate from other requirements)
* Non-redundant (requirements should not cover things already covered by other requirements)
## Requirements Specifications
A **requirements specification** is a document which records the requirements for a system. In principle
these documents should always be **complete** (documenting *all* the requirements for a system) and
**consistent** (never self-contradictory), although in practice achieving this perfectly is impossible.
A requirements specification usually includes both user and system requirements (see below).
A requirements specification is *not a design document*, meaning that it should set out *what the system
should do* rather than *how it should do it*. That would overreach the scope of requirements.
## Kinds of Requirement
### Functional requirements
These are statements of services that the system should provide, how the system should react to particular
inputs and how the system should behave in particular situations.
### Non-functional requirements
Constraints on the services or functionality offered by the system: for example, timing constraints,
constraints on the development process, standards, *etc.* Non-functional requirements often apply to
the system *as a whole*, rather than individual features or services. They also often create other
requirements by existing (because some non-functional requirements may imply other functional or
non-functional requirements).
The difference between these types of requirement is that non-functional requirements are not needed
by the user directly, but are nevertheless required because they must be met for the system to work properly
or according to regulations.
There are two other kinds of requirement which overlap with functional and non-functional. They are:
### User requirements
Requirements aimed at the end-user. They *make the user the subject* and typically consist of statements in
*natural language*, plus *diagrams* of the services the system provides and its operational constraints.
### System requirements
Requirements aimed at the system designers and maintainers. They are contained within a *structured document*
which sets out *detailed descriptions* of the system's functions, services and operational constraints. These
requirements define what *will and will not be implemented* and so may be part of a contract between a client
and a contractor. They make the *system itself the subject*.
User requirements are likely to be more abstract than system requirements because the user only generally cares
about what functionality they have access to and not the full scope of the system's requirements.
System requirements are written for the stakeholders in the system's *development*, so they are designed to be
more precise, although they shouldn't specify implementation details (this goes beyond the scope of requirements).
They describe exactly what the system should do.
### Domain requirements
Domain requirements are a subtype of non-functional requirements which come with the **application domain** of the
software system. For example, a software system dealing with railway train bookings must follow any applicable
laws or regulations about rail travel, as well as codes of conduct or convention for the rail industry.
# Requirements Engineering
Requirements engineering is about:
* what software can do (that is, what problems software can solve)
* what software *does* do
* communicating these things to people
## Stages of Requirements Engineering
### Requirements Elicitation
Getting the requirements
### Requirement Specification
Writing down the requirements
### Requirement Validation
Checking that all **stakeholders** agree with the requirements
(**Stakeholders** are people who care about the software system.
They include the end users as well as the client organisation if the
software is being developed for another organisation.)
# Stages of Software Design
1. Requirements: what must the software do
2. Design: how shall it do that
3. Implementation: how does it do that
4. Verification: does it, in fact, do that
5. Maintenance: does it still do that / can it now do this

sem2/reqs_eng/ → yr1/sem2/reqs_eng/

@ -24,7 +24,7 @@ different stakeholders are.
A **qualitative** analysis asks open-ended questions about general feelings and experiences
around the domain of the system to gather ideas about what the system should be like. These are
usually followed up by **quantitative** analyses, which use larger sample sizes and gather
quantitative (that is, numerical) data in order to determine whether the ideas from
qualitative studies are widely supported. For example, in a qualitative analysis, a stakeholder
may suggest a certain kind of model for data; a quantitative analysis might follow this up by
@ -36,3 +36,50 @@ motivations and feelings that are behind the answers to questions; qualitative analyses are
necessary to find these details out. So if a quantitative analysis reveals a trend that is
unexpected and for which the motivation is unclear, it can be followed up by a qualitative
analysis to determine what the motivation is that's driving the quantitatively derived data.
### Quantitative Analyses
* Characteristics of population (via sample)
* Few details on attitudes, behaviours and motivations
#### Techniques
* surveys, questionnaires
* mixture of open and closed, subjective and objective questions
##### Appendix: Considerations for surveys
* Asking sensitive questions (respondent may not want to answer)
* Accidentally discriminating by gathering irrelevant data and then making decisions based on it
* Reliability: multiple administrations of the same **quantitative instrument** (that is, survey or similar)
should give similar results
* Validity: questionnaire measures what it purports to measure
* Usability: is it easy to answer & easy for results to be analysed by the reader?
When designing a survey, first consider the purpose of the survey; only then write questions.
Avoid biased, leading and **double-barrelled questions** (questions which ask for two responses but only provide
space for a single response).
It's generally better to put the easiest/least-threatening questions at the beginning of the survey and the
harder ones later.
### Qualitative Analyses
* New insights
* Understand experiences or situations
* Generate ideas and hypotheses
* Doesn't give a statistical picture of trends in the population
#### Techniques
* interviews
* focus groups
* observations
* ethnographies
(An **ethnography** is an in-depth study of how people act, usually undertaken as part of a study
of a culture. It is performed by placing researchers in positions where they can observe all the
normal activities of people in detail. In the context of software engineering, an ethnography might
place a researcher inside a company, who would observe all the actions taken by an employee and may
ask the employee questions about why they do things a certain way, for example.)

sem2/reqs_eng/ → yr1/sem2/reqs_eng/


@ -0,0 +1,9 @@
# Notes
Jose prefers contact over Teams rather than email (faster response). Indeed Teams is the expected way that communication about the module will occur, in both directions.
0 marks for lab work submissions which fail to compile or execute (check your solution builds and runs!). 0 marks also for lab submissions which are detected to be plagiarism.
More information is on the slides about the lab submission policy, but the gist is that it goes pretty late.


@ -0,0 +1,47 @@
# Spring MVC Overview
Where `package` is the package of the Spring MVC app, it's conventional to place **Controller** classes in the package `package.controller`. Similarly, **Model** classes are placed by convention in the package `package.model`. However, these conventions are not required for the Controller and Model classes to be used. **Views** are HTML-based templates written in **JSP (Java Server Pages)**, and *not classes*.
To make a Controller class work, you need to use the `@Controller` annotation on the class. Model classes need no special annotation; they are simply used by reference from other classes.
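As a sketch, assuming a hypothetical base package `com.example.shop`, the conventional layout looks like:

```text
com/example/shop/
├── controller/   (classes annotated with @Controller)
├── model/        (plain data classes, no annotation needed)
└── ...           (JSP views live outside the package, commonly under webapp/WEB-INF/views/)
```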
## Controller class example structure
This main controller maps requests to the path `/greet` to the view `greet.jsp` (assuming the `suffix` for views is configured to `.jsp`).
```java
@Controller
public class MainController {
    @GetMapping("/greet")
    public String greetWorld(Model model) {
        model.addAttribute("name", "World");
        return "greet";
    }
}
```
### Request Mappings
The most general **request mapping** annotation is `@RequestMapping(url)`. `@GetMapping(url)` is effectively short for `@RequestMapping(path = url, method = RequestMethod.GET)`.
For POST requests, we can use `@PostMapping(url)`.
Controller classes are made up of request mapping methods. These methods take information from the request made by the user (which is the user's input) and process it, using it to build a model which is used to communicate with a view.
The information that is taken from the request is passed as arguments to a call to the request mapping method by using the `@PathVariable` and `@RequestParam` annotations.
A `@PathVariable` annotation before a parameter declaration indicates that that parameter takes its value from the segment of the request mapping path marked with `{curly braces}` in the `@RequestMapping()` annotation.
A `@RequestParam` annotation before a parameter declaration indicates that it takes its value from a request *parameter*.
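As a sketch combining `@PathVariable` and `@RequestParam` (the class, path and parameter names here are invented, not from the notes):

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class UserController {
    // A request to /users/42?verbose=true binds id = 42 from the {id}
    // path segment and verbose = true from the ?verbose= request parameter.
    @GetMapping("/users/{id}")
    public String showUser(@PathVariable("id") long id,
                           @RequestParam(name = "verbose", defaultValue = "false") boolean verbose,
                           Model model) {
        model.addAttribute("userId", id);
        model.addAttribute("verbose", verbose);
        return "user";  // rendered as user.jsp if the view suffix is configured to .jsp
    }
}
```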
## Model
Models are *abstract data representations* which serve as forms for data generated or retrieved by Controllers to be transmitted to the views, which use that data to display output (see below).
## Views
Views are designed to manage the output as it is directly displayed to the user. A view takes the data, which it places within the output given to the user, from a model.
In Spring MVC, views are created using Java Server Pages. These take the form of files which consist mostly of HTML, but which can *contain embedded Java statements and expressions*, which allows *data from Java classes to be embedded in output pages*. (The embedded Java is executed and converted to HTML before the page is served to the user. See also: Django template language; Jinja2.)
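For example, a minimal `greet.jsp` for a controller that puts a `name` attribute in the model might look like this (a sketch; it uses the JSP expression language to pull `name` from the model):

```jsp
<html>
  <body>
    <!-- ${name} is replaced with the model attribute "name" before the page is served -->
    <h1>Hello, ${name}!</h1>
  </body>
</html>
```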
## JSP Standard Tag Library


@ -0,0 +1,20 @@
# Model-View-Controller
Under the **model-view-controller (MVC)** pattern:
* The model manages the application's data and core logic
* The view manages the output to the output method that the application uses (which may be drawing directly to a display, or sending out web pages)
* The controller interprets user input and invokes changes on the model and view
## Advantages of MVC
* Simultaneous development: the model, view and controller parts of the system can be developed independently
* High cohesion: logically related program sections are recorded together
* Loose coupling
* Ease of modification
* Multiple views for a model
## Disadvantages of MVC
* Code navigability difficulties
* Multi-artifact inconsistency
* Undermined by inevitable clustering
* Significant learning curve


@ -0,0 +1,46 @@
# Software Architecture and Patterns
## Difficulties in Software Engineering
Software engineering is a relatively young discipline. Despite this, most people have very high expectations of what it can achieve. So software engineers must work very hard to produce systems that are highly complex but intangible (they have no direct physical representation apart from the ones we create), and that rely on other systems (platforms and operating systems, for example) to function at all, unlike physical machines such as cars, which are isolated machines requiring only a road, and not necessarily even that. On top of this, software engineering must react quickly to changes in other areas of engineering. These are the things which make software engineering such a difficult discipline.
### Complexity
Software systems have multiple layers of **complexity**. They usually try to be able to perform many different tasks, so data structures and organisation must be sufficiently complex to allow every application. The more complex a system, the more difficult it is to design, build, test and maintain.
## The Importance of Design
An **architecture-centric** approach to software engineering emphasises the *design* part of the engineering process. This makes it easier to cope with highly complex systems: the system is carefully designed throughout the engineering process to fulfill its goals. One thing this allows for is the selection of **design patterns** which have been used before effectively. An architecture-centric approach makes it easier to select the best design patterns.
## Software Architecture vs. Building Architecture
There are many similarities between software engineering and civil construction engineering:
* Customers' needs must be satisfied
* Different engineers are employed in different specialised roles (for example, requirements engineer or project manager)
* Different stakeholders may have different opinions of the final product
* Plans and progress for system construction are reviewed at intermediate points in the engineering process
But those are fairly surface-level similarities. There are deeper parallels:
* The architecture of the construction is linked to but distinct from the actual kind of building/(software) structure being constructed
* The properties of the construction are induced by the architecture
* The architect of the whole construction has a *distinctive role and character*
* The construction process is less important than the architecture being constructed
* Over time, the architecture of constructions has matured into a distinct discipline
There are of course limitations to this analogy:
* The science of building construction is well-studied and mature. The science of software construction, well...
* Software is fundamentally different (dependent on other systems, intangible, modifiable, needs to be "run" to work) from physical buildings
* Building on the last point: software is much more malleable than physical materials. This means it's easy to make an error which brings down the whole system in a way that would require a serious screw-up in civil engineering, but it's similarly easy to change the system to fix the error in a way that would be impossible in civil engineering.
## What is Architecture?
*"Architecture is a set of principal design decisions about a software system."*
Every system has an architecture, even if it's not an easily-recognised one. Every system has someone or some people who design the architecture. And architecture is *not a phase of development*: the architecting of a system is never 'finished' as such, but metamorphoses and develops throughout and with the development of a system.
## Architecture and Re-use
Re-using designs for software systems which have been used before is very useful because designs that have been effective before are highly likely to be effective again, and moreover, since this is well-known, using common design patterns will increase trust in your software system.


@ -0,0 +1,29 @@
# Software System Requirements
The things a software system needs to do can be captured as a set of requirements in a **requirements specification**. These specifications are used to make decisions about the design and implementation of a system.
## Natural language specifications
**Natural language** specifications are written in a **natural language** such as English. This is easier for people to write and read in general than specifications written in **formal language** -- formal language requirements specifications are not popular because non-technical stakeholders often want to work with the requirements specification, and they can't be expected to learn the relevant formal language to do that.
## Problems with requirements
The problems with requirements in requirements specifications fall into three categories:
* Incompleteness: the requirement doesn't specify everything it needs to specify, leaving things unexplained or unmentioned
* Imprecision: the requirement doesn't provide a clear description of something (for example, describing something as "very small" instead of giving a measurement)
* Ambiguity: the requirement could be interpreted in different ways with different meanings
### Solving problems with requirements
There are different ways one could reduce the ambiguity of requirements. One is simply to improve the practice of writing requirements to reduce ambiguity. Another is to develop systems to catch and fix ambiguous requirements during proofreading. It's also possible to use *restricted subsets* of natural languages, to reduce the possible ambiguity.
## User Stories
**User stories** are a way of communicating requirements developed for use with the *Agile* methodology. They take the form of short descriptions of requirements by stakeholders, like this:
"As a user, I want to create an account so that my page history can be stored."
"As an administrator, I want to delete accounts so that duplicate accounts can be removed from the system."
They include the background of the user, the task they want to perform and (notably differently from traditional requirements specifications) the motivation of the user in question to perform the task. Their short length and the way they include user motivations make user stories good for communication with clients. However, they still suffer from the problem of traditional natural language requirements: being written in natural language makes them likely to be ambiguous or imprecise.


@ -0,0 +1,67 @@
# Software Lifecycle Theories and Software Development Strategies
It is important to consider that some models for the software lifecycle are better for some situations. The choice must be made in light of the context. Relevant factors include:
* Project size (no point wasting work on a tiny program)
* Criticality (how important is the produced software?)
* Expected level of variability in the requirements (the Waterfall model is not effective for fluid requirements)
The choice of lifecycle model has far-reaching effects in the produced software.
The relative cost of fixing an error in the software development process increases exponentially the later in the process the error is found. This means that the most important parts of software development for preventing serious errors are the early stages: requirements elicitation, design and prototyping.
## Waterfall Model
The **Waterfall Model** instructs that the software development process should be entirely linear. The process should start with requirements elicitation and proceed through analysis and design to implementation and testing. The idea is that changes to the requirements should not need to occur after the requirements elicitation phase, design should not need to occur after the design phase, *etc.* This makes the assumption at each stage that there were no mistakes at earlier stages, and also crucially assumes that the stakeholders' requirements will not change part-way through the engineering process.
The origin of the Waterfall Model was in an article written in 1970. The author did not actually think that it was perfect; he was aware that risks existed and did propose variations to reduce the risk of serious failure.
### Appraisal of the Waterfall Model
The Waterfall Model works well when:
* the definition of what you want to achieve is clear
It does not work well when:
* clients don't know what they want
The model could be improved by allowing travel "up the waterfall" to change decisions made at earlier stages. This would still follow the model mostly, just allow a bit of back-tracking.
## Spiral Model
This is an incremental, risk-oriented lifecycle model. It was proposed in a 1988 article.
The Spiral Model suggests a **risk-driven process**. It observes the problems with what it terms a **code-driven process** -- a "code-and-fix" approach, which makes code messy and hampers communication with clients -- and a **document-driven process** -- which is unrealistic, since it is unlikely that a completely developed document will be produced at each stage. A risk-driven process decides how to proceed at each stage based on *what would reduce the risk of failure*.
The Spiral Model works by iterating on very similar processes. In each cycle, as the process proceeds from the early requirements elicitation phases to design, and then from design to implementation, there is a fresh risk analysis which informs each stage. Communicative artefacts such as prototypes are also produced in each iteration. I assume this is to help communicate with clients and probably also forms part of the risk analysis.
### What is risk?
Risk is brought about by high levels of uncertainty. The level of **exposure to risk** is effectively the product of the probability of a failure and the significance or cost of what would be lost if the failure occurred. This formula can be used to choose between two risks.
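In symbols, with a made-up worked instance:

```latex
\text{risk exposure} = P(\text{failure}) \times \text{cost of failure}
```

For example (invented numbers), a 10% chance of a failure that would cost £200,000 gives an exposure of £20,000, which can be compared directly against the exposure of other risks.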
### Appraisal of the Spiral Model
The advantages of the spiral model are:
* risk reduction
* the ability to add functionality part-way through the project
* working software can be produced even early in the process
The disadvantages are:
* risk analysis is hard to do -- you need *specific expertise* and good estimations
* entire process is dependent on risk analysis -- if there's not much risk then there's no point using the Spiral Model
* the development process is complex
### The Cone of Uncertainty
The level of uncertainty about every variable in a software project starts off very great but reduces exponentially as the project proceeds. This also means that risk tends to be lower in the later stages.
# Agile Methodologies
**Agile** methodologies take the idea of iteration to the extreme.
## Scrum process
The Scrum process starts with developing **user stories**, which are very simple statements of general requirements. Some user stories (which are now part of the **product backlog**) are chosen to be implemented during a 2-4 week long **sprint**, which is a period of implementation. Each sprint produces some actually working software, which is then shared with the stakeholders to see what they think of it; requirements and design are revised in light of their feedback before the next sprint.


@ -0,0 +1,21 @@
# Cover Letters and Personal Statements
## Differences
The only real differences between a cover letter and a personal statement are the appearance and the length.
### Cover Letter
* Should be structured like a traditional formal letter
* One page rough size limit
* More scope to describe detail of previous experiences and motivations
* STARS technique (situation, task, action, results, *self-reflection*) to describe your experience
## Things to Include
* Motivations
* Capabilities
Don't just repeat your CV -- include more detail.


@ -0,0 +1,20 @@
# CV Tips
* Clear and concise -- no more than 2 sides of A4
* No long paragraphs
* Logical order (most recent first)
## Core Sections
* Personal details
* Qualifications
* Career goals/outline?
* Personal statement (short)
## Additional Sections
## CV Tailoring
CVs for different job roles should be *different*. They should demonstrate enthusiasm for that particular role and skills and experience relevant to that particular role. You should clearly match your CV to the essential criteria given for a role.


@ -0,0 +1,54 @@
# Entity Relationship Diagrams
## Chen Model
One kind of ER diagram. Entities are represented by rectangles with their names in.
### Types of Entity
* **Weak entity**: double rectangle box. A weak entity is one that can't exist by itself.
* **Associative entity**: an entity used in many-to-many relationships. All its relations must be many-to-many.
### Attributes
Attributes are represented by ovals joined to the rectangles for entities by a line. The **key attribute** (the primary key) is identified by a *double underline under the attribute name*.
**Partial key attributes** are those attributes which form a primary key when combined with other attributes. They are represented by a dashed single underline.
**Multivalued attributes** are attributes which can take several values. For example, if each `Person` has several `name`s, then the attribute is multivalued. (An attribute that is not multi-valued is a **single-valued attribute**.)
**Derived attributes** are those attributes whose values are derived from the value of another or many other attributes of the same entity. For example, if `Product` has `VAT rate` and `Price excluding VAT`, `Price including VAT` is a derived attribute.
**Composite attributes** are those attributes which can be broken down into several other discrete attributes, each with a distinct semantic meaning. For example, if `Person` has an attribute `Address`, that attribute might contain `street name`, `city`, `county`, and so on. That would make `Address` a composite attribute. (An attribute that is not composite is a **simple attribute**.)
### Relationships
In the Chen notation, a relationship is represented by a rhombus containing the relationship's name. There are two kinds of relationship:
* **Strong relationship**: a relationship between entities which can exist independently. Represented by a single-lined rhombus.
* **Weak (identifying) relationship**: a relationship which connects a weak entity to the entity it depends on. Represented by a double-lined rhombus.
#### Optionality of Relationships
* **Optional relationship**
* **Mandatory relationship**
#### Cardinality of Relationships
* **One-to-one relationships**: only one instance of entity either side
* **One-to-many relationships**: one instance of one entity related to several instances of another entity. Where one side of the relationship is many, the line on that side ends with a fork into three lines, to indicate that many instances of that entity are involved in each relationship.
* **Many-to-one relationships**: many instances of one entity related to a single instance of another entity. (The same as one-to-many, just from a different perspective.)
* **Many-to-many relationships**: many instances of one entity related to many instances of another entity.
Many-to-many relationships are impossible to represent directly in database management systems. They are represented as a pair of one-to-many relationships, typically. This requires the use of an entity to represent the relationship. For example, if, in a library system, one `User` can borrow many books, but one `Book` can be borrowed by many `User`s, it would be necessary to create another entity, `UserBook`, to represent the borrowing of a particular book by a particular user. One `User` can then be related to several `UserBook`s and one `Book` can then be related to several `UserBook`s.
#### Participation Constraints
* **Total participation**: every instance of an entity is involved in a relationship. For example, every `Student` has a many-to-one relationship with a `PersonalTutor`. Total participation is represented by a double line for the relationship.
* **Partial participation**: *not* every instance of an entity is involved in a relationship. For example, some `Student`s are in a many-to-one relationship with a `SpecialistMentor`, but not all! Some `Student`s have no `SpecialistMentor`.
## Class Diagram
One kind of ER diagram. Entities are boxes full of text instead of boxes connected to ovals.


# Database Anomalies
## Update Anomaly
If the same information appears more than once in the database, then if one instance of that atom of information is altered, the rest of the instances of that atom must be updated, otherwise the database will *contain contradictions*.
## Insertion Anomaly
If new information becomes available, it may not be possible to insert it until values for every other required field in the same record are also available -- the incomplete row simply cannot be added.
## Deletion Anomaly
Deleting a record can destroy information that happened to be stored only in that record, even though that information *was not logically about that record*.
All these anomalies occur because some fields are **functionally dependent** on one another. You can prevent anomalies by making sure that information which belongs in a separate table really is stored in its own independent table.
# Database Normalisation
**Database normalisation** is a method of organising the information in a database to make sure that every table is properly independent and there is no redundancy.
## First Normal Form
All fields should be **atomic** (contain exactly one piece of information *only*).
Remove columns that repeat -- move them into separate tables with a many-to-one relationship.
## Second Normal Form
First normal form *and*...
Each non-key attribute in a table must be **functionally dependent** on the whole primary key.
(You can skip this normalisation step if the table has no **composite primary key**, as second normal form is then already achieved by first normal form.)
## Third Normal Form
Second normal form *and*...
Every non-key field in a table must be dependent on the primary key of that table, and *on no other field in that table*.
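A sketch of a third-normal-form decomposition, runnable via `sqlite3` (hypothetical table names): `City` depends on `CustomerNo` rather than on the order's own key, so it is moved into its own `Customer` table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Unnormalised: City depends on CustomerNo, not on the key OrderNo.
CREATE TABLE OrderFlat (OrderNo INTEGER PRIMARY KEY,
                        CustomerNo INTEGER, City TEXT);
-- 3NF: every non-key field depends only on the key of its own table.
CREATE TABLE Customer (CustomerNo INTEGER PRIMARY KEY, City TEXT);
CREATE TABLE OrderNorm (OrderNo INTEGER PRIMARY KEY,
                        CustomerNo INTEGER REFERENCES Customer(CustomerNo));
INSERT INTO Customer VALUES (7, 'Leicester');
INSERT INTO OrderNorm VALUES (100, 7), (101, 7);
""")
# The city is stored exactly once, so updating it cannot create contradictions
# (no update anomaly):
con.execute("UPDATE Customer SET City = 'London' WHERE CustomerNo = 7")
cities = {r[0] for r in con.execute(
    "SELECT City FROM OrderNorm JOIN Customer USING (CustomerNo)")}
print(cities)
```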


# Accessing Multiple Tables with SQL
## Subqueries
A **subquery** is used to return data that will be used in the main query.
It is written as an SQL query inside another SQL query. (Query *Inception*.)
They look something like this:
```sql
SELECT columns
FROM table_name
WHERE value IN (SELECT other_columns
                FROM another_table
                WHERE value < 5);
```
Subqueries can go in several places in a main query, but in this module only subqueries in the WHERE clause will be covered.
### Correlated subqueries
For these, each subquery is executed once for every row of the outer query. (These will not be covered in detail in this module.)
## ANY keyword
To compare against every entry in a set of values, you can use the `ANY` keyword. For example:
`SELECT ... FROM ... WHERE some_value > ANY (SELECT ... FROM ... WHERE other_value > 10)`
In the example above, the `ANY` keyword is used in combination with a subquery.
## EXISTS operator
The **`EXISTS`** operator compares against every entry in a set of values, but in a different way to the ANY keyword. The `EXISTS` operator can check if a value *exists at all* within a set of values.
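A minimal runnable sketch of `EXISTS`, reusing the module's `Client`/`Viewing` naming (the sample data is invented): the subquery asks, for each client, whether at least one matching viewing row exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Client (ClientNo TEXT PRIMARY KEY);
CREATE TABLE Viewing (ClientNo TEXT, PropertyNo TEXT);
INSERT INTO Client VALUES ('CR56'), ('CR62');
INSERT INTO Viewing VALUES ('CR56', 'PG4');
""")
# Clients who have made at least one viewing:
rows = [r[0] for r in con.execute("""
    SELECT ClientNo FROM Client c
    WHERE EXISTS (SELECT 1 FROM Viewing v WHERE v.ClientNo = c.ClientNo)
""")]
print(rows)
```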
## Matching Columns
One way to make a query which spans more than one table is by looking at two tables and only SELECTing the output where the primary/foreign keys which relate the tables are equal. For example:
```sql
SELECT DISTINCT Viewing.PropertyNo, Street, City, ViewDate
FROM Viewing, PropertyForRent
WHERE ClientNo = 'CR56'
  AND Viewing.PropertyNo = PropertyForRent.PropertyNo;
```
## Table Aliases
You can change the name that's used to refer to tables within a query, like so:
```sql
SELECT C.Fname, C.Lname, P.Street, V.ViewDate ...
FROM Viewing V, PropertyForRent P, Client C ...
```
Note that the creation of the alias actually occurs further into the query than the first reference to each alias.
## SQL Joins
### Inner Join
Returns all rows where there is a least one match in BOTH tables.
### Left Join
Returns all rows from the left table, and the matched rows from the right table.
### Right Join
Return all rows from the right table, and the matched rows from the left table.
### Full Join (*aka* Full Outer Join)
Returns all rows from both tables.
To indicate which columns an SQL join should match on, you use the keyword ON. For example:
```sql
SELECT * FROM table1
INNER JOIN table2
ON table1.primaryKey = table2.primaryKey;
```
### Natural Join
Like an **inner join**, but doesn't use the ON keyword, and only works where the joined tables have *one or more pairs of identically named columns*. As you may have guessed from the last sentence, the join is made automatically on those identically named columns.


## Components of SQL
* Data definition language (DDL): parts of SQL for creating a database and defining the data it stores and how it stores them. Relevant instructions: `CREATE`, `DROP`, `ALTER`
* Data manipulation language (DML): for accessing the data in a database, including reading from and writing to it. Relevant instructions: `SELECT`, `INSERT`, `UPDATE`
* Data control language (DCL): for administering the database, including controlling access to different parts of the database. Relevant instructions: `GRANT`, `DENY`, `USE`
## History of SQL
Created at IBM in the early 1970s by Donald Chamberlin and Raymond Boyce, originally under the name SEQUEL.


# SQL Data Types, Schema Controls and Integrity Constraints
Multiple users may access a database *at the same time*. This means that one user might go to delete a record at the same time another user makes a read which would otherwise include that record. Which happens first? How do we prevent weird behaviour (for example, a half-deleted record)? We use **integrity constraints**, which are put in place using technology like locks.
## Data Definition Language Integrity Constraints
### `NOT NULL`
A column marked with `NOT NULL` insists that every record in a table has a value for that column.
### `PRIMARY KEY`
This constraint makes a column, or a set of columns, the **primary key** of a table. A primary key consisting of a set of columns is called a **composite primary key** (see relevant notes). The result of creating a primary key is that the database will refuse to perform any operation which breaks *primary key uniqueness*.
### `FOREIGN KEY`
This constraint links two database tables using a **foreign key**. In SQL it looks like this:
```sql
someColumn INT NOT NULL,
FOREIGN KEY (someColumn) REFERENCES OtherTable(primaryKeyColumn);
```
### `UNIQUE`
Requiring that a column be *unique* means that there can be no two records in the table with the same value for that column. (Primary keys are unique by default, but this keyword makes it possible to enforce that other columns are unique.)
## Update and Delete Cascades
A **cascade** is when a change to one database table carries through in its logical implications to other tables. For example, if `Sandwich`es contain `Filling`s, when a `Filling` is deleted, it might make sense to delete every `Sandwich` which contained that `Filling`, since those sandwiches probably can't be made any more.
In SQL this looks something like:
```sql
... FOREIGN KEY (keyName) REFERENCES otherTable(primaryKey) ON DELETE CASCADE ...
```
It is also possible to `ON UPDATE CASCADE` to allow updates to cascade through.
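A runnable sketch of the `Sandwich`/`Filling` cascade (hypothetical schema, run via `sqlite3`; note that SQLite only enforces foreign keys when the pragma below is switched on):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite: FK enforcement is opt-in
con.executescript("""
CREATE TABLE Filling (FillingNo INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sandwich (
    SandwichNo INTEGER PRIMARY KEY,
    FillingNo INTEGER REFERENCES Filling(FillingNo) ON DELETE CASCADE
);
INSERT INTO Filling VALUES (1, 'cheese'), (2, 'ham');
INSERT INTO Sandwich VALUES (10, 1), (11, 2);
""")
# Deleting the cheese filling cascades to the sandwich that used it.
con.execute("DELETE FROM Filling WHERE FillingNo = 1")
remaining = [row[0] for row in con.execute("SELECT SandwichNo FROM Sandwich")]
print(remaining)  # only the ham sandwich survives
```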
## Scalar Functions
Scalar functions are used to convert and manipulate data values. They include `SUBSTRING`, `CONVERT` and `EXTRACT`.
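A quick sketch of a scalar function in action (here `SUBSTR`, run via `sqlite3`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# SUBSTR(string, start, length) -- start is 1-based in SQL.
(snippet,) = con.execute("SELECT SUBSTR('database', 1, 4)").fetchone()
print(snippet)
```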


# SQL Views
An SQL **view** is a *virtual table*. Views are constructed from **base tables**, which are the "real" tables created by `CREATE TABLE` statements. A view is effectively a query, the output of which is used to construct a virtual table containing some relevant subset of the data in other tables. But rather than simply being the collated output of a query, the *content of the view changes dynamically with the data in the base table(s)*. A view effectively contains no data of its own; it only *reflects* the data of the underlying tables.
## Creating a view in SQL
```sql
CREATE VIEW view_name AS
SELECT fields
FROM tables
WHERE condition;
```
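A runnable sketch (invented `Staff` table, via `sqlite3`) showing that a view reflects later changes to its base table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Staff (StaffNo INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER);
INSERT INTO Staff VALUES (1, 'Ann', 30000), (2, 'Ben', 20000);
-- A horizontal view: it restricts which rows are visible.
CREATE VIEW HighEarners AS
SELECT Name FROM Staff WHERE Salary > 25000;
""")
before = [r[0] for r in con.execute("SELECT Name FROM HighEarners ORDER BY Name")]
con.execute("UPDATE Staff SET Salary = 26000 WHERE StaffNo = 2")
after = [r[0] for r in con.execute("SELECT Name FROM HighEarners ORDER BY Name")]
print(before, after)  # the view picks up the update automatically
```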
## Kinds of view
A view which restricts the *columns* which may be accessed is called a **vertical view**.
A view which restricts the *rows* which may be accessed is called a **horizontal view**.
## Views from views
It's possible to create views from other views. Just create the view using a `SELECT` statement which selects from another view, like this:
```sql
CREATE VIEW new_view_name AS
SELECT fields
FROM view_name
WHERE condition;
```
## View Names with Spaces
You can put spaces in view names by surrounding the name with square brackets, like this:
```sql
CREATE VIEW [View With Spaces In Its Name] AS
```
## Updateable Views
Views can *sometimes* be updated, but only under specific conditions:
* You cannot update a view which is based on more than one table
* The view must include the primary key of the table based upon which the view has been created
* The view must not have any fields made up of aggregate functions
* Any views this view is based on must also be updateable
## How are views implemented?
Most DBMSs implement views using **view resolution**: when a query refers to a view, the DBMS translates the stored view query into an actual query on the base tables specified in the view definition. In other words, SQL rewrites the view references back to the underlying base tables.


# CPU Virtualisation
## Contents
* What are the key functions of an OS with respect to processes
* What is a process lifecycle
* What happens during a context switch
The OS has to share **system resources** appropriately between several concurrently-running **processes**. Because programs are written to work independently of one another, the OS must provide each process the illusion that each resource is available for that process's *exclusive use*. The OS should also hide how resources are being used by one process from other processes, to prevent processes snooping on information that should be private. This is achieved through **CPU virtualisation** -- a virtual version of the CPU, based on how much processing power is available for a process to use, is presented to each process instead of the full capability of the CPU.
## Process API
All operating systems must provide ways to perform the following on processes:
* Create
* Destroy
* Wait (for a process to end)
* Suspend
* Resume
* Status check
The OS must also ensure that resource-sharing is efficient -- there's no use using CPU virtualisation if the virtual CPU ends up being incredibly slow.
*Note: suspend/resume gets very complicated*
## Process spawning
Programs are stored on disk. When the computer is commanded to execute a program, the program is *copied to memory* (because the secondary storage is much slower than the main memory). Modern operating systems only load just enough of the program to get it running -- if additional libraries are needed, they are loaded later.
The OS also allocates memory on the stack for the process.
## Process resource sharing
At any given time there are very many processes running on a single operating system. One CPU must be efficiently shared across many simultaneously running programs.
The illusion is given that processes are running non-stop but actually only one instruction can be executed by each CPU at a time.
Processes might be suspended because they are waiting for some resource before continuing execution: secondary storage, the network... In CPU terms, this takes an *eternity*! (CPUs can execute instructions *incredibly* quickly compared to the time it takes for the network or secondary storage to be accessed.) Other processes can run in the meantime. (There are other reasons that processes might be suspended.)
## Process lifecycle
When scheduled, processes go from the `READY` state to the `RUNNING` state. When the operating system decides to give another process a "turn on the CPU", it **deschedules** a process, putting it back in the `READY` state. If a process initiates some **I/O** interaction like trying to access secondary storage or the network, the process enters the `BLOCKED` state. The process stays in the state, **blocked** from running by whatever it's waiting for, until it completes its interaction with I/O. The time this will take is pretty impossible for the OS to predict, so the OS just leaves the process in `BLOCKED` until the transaction is complete, at which point the process is moved back to being `READY`.
### Other process states
* Initial (the process has just started and is being loaded)
* Final state (the process has finished but has not yet been completely deleted -- the process hangs around for just long enough to report on whether it succeeded or failed)
* Processes in the final state are also called "zombie" processes
## Context Switching
The OS must be able to rapidly **schedule** and deschedule processes. If a process is descheduled, when it is rescheduled it must be *as though it never stopped running*.
It is impractical to save the entire contents of the process's available memory when it is descheduled -- the process may have a lot of stuff in memory. And there is no need to save **cached data** -- cache is only there for speed, not for correctness. So the OS only saves the contents of the **registers** at the moment the process is descheduled. (This is made faster by the fact that modern CPUs have hardware instructions for the specific purpose of dumping the information in the registers.)


# Direct Execution
At any given time on a CPU, some code is executing. That code can either be code from the OS or user code (user programs). Since only one instruction can be executing on the CPU at any given time, it is not possible that OS and user code can be executing at the same time. This presents an interesting problem: when user instructions are running, the OS is effectively "asleep" -- it isn't functioning. So what if, for example, the user code puts the CPU into an infinite loop? *It would not be possible for the OS to regain control.* Consequently, *the CPU must provide hardware-level support for allowing the OS to gain control over the system.*
Such features include:
* Support for fast **context switching**
* Support for distinct modes of execution: a privileged **kernel mode** and a non-privileged **user mode**
* Support for a "**trap**" instruction to switch between kernel and user modes
* Support for **timers** and time-based interrupts of running code
# User vs Kernel Modes
In user mode, the CPU cannot *access or modify memory locations which contain OS code or data*, and *cannot talk directly to storage or network devices*. For this to happen, the CPU must switch into kernel mode.
Any time a user program needs to access a resource controlled by the OS, it needs to use a **system call** (see relevant notes in this directory). (*System call* is often abbreviated as **syscall**). The first instruction in a syscall is a trap instruction as described above. This instruction:
* pauses execution of the user program,
* saves the state of the CPU registers,
* switches the CPU into kernel mode,
* and starts running the OS code.
Note that this process is very similar to the process of a context switch.
Once the OS finishes servicing a syscall, it returns to the user program via a **return-from-trap** instruction, which switches from kernel to user mode and returns control of the CPU to the instruction immediately following the trap instruction which made the mode switch in the first place.
## Timer interrupts
We have seen that user code runs in a sort of **"sandbox"**, where it is safely unable to access important OS components. Observe also that we need to share CPUs between different processes, which may belong to different users.
One way to accomplish this is **cooperative scheduling** -- at the point where the user program makes a syscall, thereby handing control to the OS, the OS decides whether to continue running that process or context switch to a different process. But this doesn't work very well in many cases -- again, what if one process runs in an infinite loop without making any syscalls? More common is **uncooperative scheduling**, where the OS takes control when it wants, even if the user process has not made a syscall.
## Syscalls and security
In any syscall, the user passes at least one parameter; for example:
`f = open("myfile", "rw")`
Because the user code controls the passed parameter, this presents a potential security problem. What if the user, instead of passing `"myfile"`, passed `"crucial_system_file"`? The user might be able to exfiltrate information about the activities of other users on the system or change how the system functions, for example.
Unusual filenames can cause the operating system to behave strangely, usually due to obscure bugs in OS code. For example, several years ago, opening a file with the string `$MFT` in its name on Windows 8.1 caused Windows to crash.


# Memory Virtualisation
One of the main functions of an operating system is to share **system resources** between different processes running on the same machine. **Memory virtualisation** refers to how the OS shares the main memory (RAM) between different processes on a machine.
## Memory and Context Switching
When a process has "control" of the CPU, it has control over *all* of the CPU. But when a process is using some of the main memory, it only has access to *a proportion* of the available RAM. This means that changing which processes have access to RAM is a much more involved task than changing which process is running on the CPU at any given time.
## Address Space
Each process has its own **address space**. This address space is *virtual* (not quite the real state of memory) but in theory each process has memory addresses in the interval [0..2^64-1] in a 64-bit computer, or [0..2^32-1] for a 32-bit computer.
Address space is divided into three parts: the **program code**, the **stack** and the **heap**.
### Allocation and Freeing
It's hard work for the OS if it must assume that any process could be using any part of its allocated address space. So the OS provides **memory allocation** and **memory freeing** system calls for programs to use to let the OS know which parts of their address space they are using at a given time.
When using a **memory-managed language** -- a category which includes most programming languages, including Python, Java, JavaScript and C# -- the *language itself* makes allocation and freeing syscalls, so the end-programmer does not have to. On the other hand, if you're programming in C or C++, you will need to make those calls yourself sometimes (especially in C).
### Accessing Elements of Arrays
To make accessing elements of arrays easier, the data underlying an array is stored in *contiguous memory locations*. The address of the whole array is recorded as the address of the first element in the array. Then, if a program needs to access the *n*th element of the array, the element *n-1* memory locations after the address of "the array" (that is, the first element) is accessed. This makes accessing elements of arrays as fast as accessing a single memory location and doing an addition, which is pretty fast. This is an important part of why arrays are useful data structures. (If they weren't fast, they wouldn't be very useful.)
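The address arithmetic above can be checked with Python's `array` module, which stores its elements in a contiguous buffer like a C array (`buffer_info()` and `itemsize` are real `array` APIs; the 1-indexed `n` mirrors the text):

```python
from array import array

a = array('i', [10, 20, 30, 40])   # contiguous buffer of signed ints
base, _ = a.buffer_info()          # address of the first element
n = 3                              # 1-indexed, as in the text
addr_n = base + (n - 1) * a.itemsize
print(addr_n - base)               # byte offset of the 3rd element
```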


# Operating Systems
An operating system is system software that manages computer hardware and software resources, and provides common services for computer programs. [Wikipedia]
Most operating systems are descended either from Bell Labs' UNIX or from Microsoft Windows.
Most of the content of this module is "relevant to both flavours".
# Processes
A **process** is a fundamental abstraction provided by operating systems. A process is a running instance of a program. Processes use **system resources**.


# Scheduling
In order to share processor time between many processes, the OS must stop one process and start another, allotting only a little time to each process. But how does the OS decide which process should be started next after the previous one is stopped? This is determined by the **scheduling algorithm** used by the OS.
Scheduling is a much-studied subject. There are *very* many possible algorithms, and which ones are appropriate depend on the context: what is known about the situation, what the "desirable behaviour" is.
The context discussed in these notes is the scheduling performed by a (relatively) small multi-user system, such as the University of Leicester's `xanthus`.
## Assumptions
First, a note: in the context of scheduling, processes are often called **jobs**. This arises from the use of "job" to mean "particular computational task" in the context of traditional batch-processing machines.
Processes are assumed to have a definite **arrival time** -- when the process is started -- and to need a definite amount of CPU time before the necessary computation for them is complete. Sometimes, it is assumed that the amount of CPU time a process needs is known in advance. In the examples in this module, that is the case. However, in modern multi-processor systems, the CPU time that a process will need is *not* known beforehand. (It is actually useful information to know, so large cloud computing providers often use statistical modelling (AI) to predict how long a process will take.)
## Metrics
We assume that the objectives of scheduling are **efficiency** and **fairness**: efficiency being the idea that jobs should be completed sooner rather than later -- as soon as possible, ideally -- and fairness being the idea that, where jobs are competing, no job should be rushed to the detriment of other jobs. Apart from being murky concepts, these are hard to measure, so we use the following **scheduling metrics**:
* **Completion time**: if `t_arrival` and `t_completion` are the (wall clock) times job `X` arrives and completes, respectively, then `t_completion - t_arrival` is the completion time for `X`. This is also called **turnaround time**.
* **Response time**: the time between the user instructing the system to perform a particular task and that task being begun. I would argue that this is harder to measure, because the level of responsiveness that is required is based on what the user perceives as being a fast response.
## Scheduling Algorithms
### FIFO
**FIFO (first in, first out)** describes scheduling techniques where the next process to be run is always the process with the earliest arrival time. This is very straightforward to implement and to understand, but in general creates a very large average turnaround time. It works okay if jobs are *short*, but if a very long job comes in, every later job will have to wait until the long job is finished before any of them can begin, which is not optimal. Imagine if you couldn't move your mouse because the computer was too busy working out a complicated sum. (This has probably actually happened to you at some point if you have used a very old computer...)
### An Alternative Scheduling Algorithm
If the very long job discussed in the previous paragraph were to be interrupted upon the arrival of the following short jobs, this would reduce the turnaround time of the short jobs hugely, and hence reduce the **average turnaround time** of jobs on the computer. Again, as discussed above, this would also produce better behaviour for an interactive machine by allowing simple jobs to complete even while longer ones continue in the background.
### STCF
**STCF (shortest time to completion first)** is a scheduling algorithm following the eponymous principle: the scheduler keeps track of how much CPU each job has until its completion, and at any given time, schedules the job with the shortest time to completion first. So a process runs until another process arrives which has a shorter time to completion, at which point the OS interrupts the running process and schedules the newly-arrived one. In the "one long process followed by many short processes" example discussed above, STCF can reduce the average turnaround time by as much as half.
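The "one long job, then short jobs" comparison can be made concrete with a toy simulation (the job lengths below are made-up figures for illustration):

```python
# Jobs are (arrival_time, cpu_time_needed) pairs, sorted by arrival.

def fifo_turnaround(jobs):
    """Run each job to completion in arrival order; return average turnaround."""
    t, total = 0, 0
    for arrival, burst in jobs:
        t = max(t, arrival) + burst
        total += t - arrival
    return total / len(jobs)

def stcf_turnaround(jobs):
    """Preemptively run whichever ready job has the shortest time to completion."""
    remaining = {i: burst for i, (_, burst) in enumerate(jobs)}
    done, t, total = 0, 0, 0
    while done < len(jobs):
        ready = [i for i in remaining if jobs[i][0] <= t and remaining[i] > 0]
        if not ready:
            t += 1
            continue
        i = min(ready, key=lambda j: remaining[j])  # shortest time to completion
        remaining[i] -= 1
        t += 1
        if remaining[i] == 0:
            total += t - jobs[i][0]
            done += 1
    return total / len(jobs)

jobs = [(0, 100), (10, 10), (10, 10)]   # one long job, then two short ones
print(fifo_turnaround(jobs), stcf_turnaround(jobs))  # ~103.3 vs 50.0
```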
#### Problem with practical implementation
It is rare that the scheduler knows exactly how much CPU time a process will need to complete (until the process actually completes). This makes STCF impractical to implement in most real-world (as opposed to theoretical) systems.
## Responsiveness
Responsiveness, the other scheduling metric described above, is not necessarily aligned with turnaround time. Consider that two jobs arrive at roughly the same time: one which requires 100s CPU time *and user input*, and another which requires only 50s CPU time and no user input. Under STCF, the 50s process would run first, meaning that it would take almost a minute from the user executing the program which created the 100s process before that process would ask for input! Clearly this isn't tenable in an interactive system.
## More scheduling algorithms
### Round-robin
**Round-robin** is a predictable scheduling algorithm. It picks the first job in its ready queue and runs it for a *fixed amount of time*; this amount of time is called the **time slice** or the **scheduling quantum**. This avoids some of the issues of STCF but still isn't really viable for use in a regularly-used system. Following this algorithm gives much better *responsiveness* scores, but extends the average *turnaround time* a little.
### Multi-level Feedback Queue (MLFQ)
This algorithm was developed (in its original form) by Fernando Corbato in 1962 for an OS called CTSS. Corbato went on to lead the development of MULTICS in 1964-7, a joint project between MIT and Bell Labs. (MULTICS was the inspiration for UNIX.) Corbato won the Turing Award for his work on MULTICS.
A variation of **MLFQ** is used in most Unix systems today, and also in Windows NT (that is to say, every major operating system).
Under MLFQ, jobs are categorised into (roughly) two sets:
* CPU-intensive jobs, which use a lot of CPU time
* IO-intensive jobs, which use relatively less CPU time
MLFQ **prioritises** IO-intensive jobs, in order to reduce the response time in human-computer interaction, and **de-prioritises** CPU-intensive jobs. As suggested above, however, jobs cannot always be neatly put into one of the above categories: jobs may alternate between the two sets, and some jobs will not fit neatly into either category. MLFQ looks at the properties of a running job and treats it accordingly.
MLFQ actually has several queues for scheduling. Each queue records jobs with a different priority level: there will be one queue for high priority, one for medium, *etc.* Every job in a queue has the same priority. Based on their behaviour, the scheduler *moves jobs between queues* when necessary.
The rules the scheduler uses to determine whether a job should be moved between queues are:
* Always pick a job from the highest-priority queue that isn't empty
* Jobs in the top-priority queue are run round-robin
* When a job arrives, it is placed in the highest-priority queue
* Once a job uses up some fixed allotment of CPU time in one queue, its priority is reduced (so it is moved down to a lower-priority queue)
* Any job which has not completed is moved to the highest-priority queue once a fixed time period has elapsed since it started
In this way, the set of queues actually form what is effectively a queue of queues; jobs are waiting to be moved to the highest-priority queue.
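These rules can be sketched as a toy, single-CPU simulation. The two queue levels, 2-tick allotment and boost period below are hypothetical parameters chosen for illustration, and rule three is simplified away (all jobs are present, at top priority, from the start):

```python
from collections import deque

ALLOTMENT, BOOST_PERIOD = 2, 20  # illustrative values

def mlfq(bursts, ticks):
    """bursts: {job_name: CPU ticks needed}. Returns jobs in finishing order."""
    queues = [deque(bursts), deque()]      # index 0 = highest priority
    used = {name: 0 for name in bursts}    # CPU used since last (re)promotion
    remaining = dict(bursts)
    finished = []
    for t in range(ticks):
        if t and t % BOOST_PERIOD == 0:    # rule 5: periodic priority boost
            while queues[1]:
                queues[0].append(queues[1].popleft())
        level = 0 if queues[0] else 1      # rule 1: highest non-empty queue
        if not queues[level]:
            continue
        job = queues[level].popleft()      # rule 2: round-robin within a queue
        remaining[job] -= 1
        used[job] += 1
        if remaining[job] == 0:
            finished.append(job)
        elif level == 0 and used[job] >= ALLOTMENT:
            used[job] = 0                  # rule 4: allotment used up,
            queues[1].append(job)          # so demote the job
        else:
            queues[level].append(job)
    return finished

# A long CPU-bound job and a short job: the short one finishes first,
# because the long job is quickly demoted to the lower-priority queue.
print(mlfq({'long': 10, 'short': 2}, 20))
```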
#### Motivations for MLFQ
In the scenario where a short but IO-intensive process is started after a long-running CPU-intensive process has been running for a while, MLFQ acts pretty much the same as STCF would: the new job gets prioritised at the start, because by this time the long-running process has been demoted to one of the lowest-priority queues, and that should be enough to complete the job; once it's complete, the CPU returns to running the non-interactive CPU-intensive task.
The purpose of the fifth rule (the priority boost after a fixed time period) is manifold: firstly, to ensure that long-running jobs do eventually get a chance to have significant CPU time, even if short jobs are starting regularly; but also, to ensure that processes which have been running a long time do not appear unresponsive if they ask for IO when they have not done so previously.
There are many ways you can vary implementations of MLFQ: how many queues should there be? How much CPU time should a job have at one level before being demoted? When should the priority of jobs be boosted? (Many implementations make the scheduling quantum different for different priority queues; usually it is larger in lower-priority queues to help long-running processes get some significant CPU time.) Variations also change how they implement the fourth rule (priority decay) based on whether a process is IO- or CPU-intensive.


# Process Management
## List of main system calls
* `fork()`
* `wait()`
* `exec()`
* `exit()`
* `getpid()`
* `kill()`
You can access these system calls in other languages through interfaces specific to the language in question. For example, in Python, you can make system calls through the `os` built-in module.
## Process management
Every process has a uniquely identifying number, its **PID (process identifier)**.
### What happens when a process calls `fork()`
When a process calls `fork()`, it creates a new process which is an *almost identical copy of itself*. The new process has its own memory space and registers. The newly-created process starts running immediately after the `fork()` line in the program. The newly-created process is often called the **child process**; the original process is called the **parent process**.
These processes *are distinct* -- they have different PIDs. They are difficult to tell apart because they are running the same program, but they can be distinguished by the value returned by `fork()`: in the parent, `fork()` returns the PID of the *child* process; in the child, `fork()` returns `0`.
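A minimal POSIX-only sketch of this, using Python's `os` module (the exit status `7` is an arbitrary illustrative value):

```python
import os

# fork() returns twice: 0 in the child, the child's PID in the parent.
pid = os.fork()
if pid == 0:
    # Child process: terminate immediately with a recognisable status code.
    os._exit(7)
else:
    # Parent process: wait() for the child and read back its exit status.
    _, status = os.waitpid(pid, 0)
    child_status = os.waitstatus_to_exitcode(status)
    print("child", pid, "exited with status", child_status)
```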
#### Why fork processes?
Forking a program allows for several processes executing the same code to run in parallel, which can allow for faster processing on a multi-core machine. The concept of forking is also essential for programs which run other programs by design, like shells.
#### Shells and forking
Shells are effectively programs which call other programs. They do this using `fork()` calls. For example, if the user executes the command `ls` inside a command-line shell, the shell forks itself, then the fork calls `execv()` (see below) with the path to the `ls` executable as the first argument.
(If you execute `ls > file_list`, the shell first sets the output for `ls` to be directed to `file_list`, then calls `execv()` with `ls` as the first argument.)
### `exec()`
`exec()` is not a single actual system call; every platform provides several variations of it. A common variation is `execv()`: this function takes as arguments
1. the path of an executable to execute and
2. a list of arguments to pass to the executable.
`execv()` *replaces the current process image*: when it's called, the code of the currently running process is discarded and the same process (keeping its PID) immediately begins executing the program named in the call. No new process is created.
## Exiting, waiting and killing
`exit(status)` ends the *calling* process and reports `status` to its parent (in C, this is the value the parent can later retrieve via `wait()`).
`wait()` makes a parent process wait until its child process terminates.
`kill(PID, signal)` sends the signal `signal` to the process with PID `PID`. Assuming the process *catches* the signal, the process will perform an action appropriate to the signal in question, such as suspending, ending, *etc.*


# Terminals
Originally a little screen with a keyboard (and before that, a teletype!).
# Shells
The program that monitors and responds to the terminal's input is known as a shell. Actually, any core user interface is a **shell** -- for example, the default user interface on Windows, or GNOME on Linux, are both shells. Command-line shells usually offer much greater flexibility and control than "graphical" (GUI) shells, but GUI shells are easier to use for new users, and make tasks which are simple and novel much easier than they would be on command-line shells.


@ -0,0 +1,3 @@
# The notion of the user
If any computer user could perform any system call on any process, it could leave the OS in a very chaotic state. For this reason, operating systems use the concept of "users" -- different levels of user have different permissions to affect processes running on the machine.


@ -0,0 +1,25 @@
# Kinds of Dependency
Most task dependencies take the form of one task which **depends** on another: until the depended-upon task is complete, the depending task cannot be begun. For example, building the walls of a building depends upon first having the foundations in place. These are called **finish-start dependencies**.
## Finish-start dependency
One task must not *start* until another task has *finished*.
But not all dependencies follow these rules. Consider growing plants. In the process of growing vegetable plants for eating, you have to water the plants to keep them healthy. This must happen *before* the plants are harvested, but that does not mean that "harvesting the plants" depends on "watering the plants" in the traditional way: rather, it is necessary that the plants *continue to be watered without cease* until "harvesting the plants" is complete. This is called a **finish-to-finish dependency**.
## Finish-to-finish Dependency
One task must not *finish* until another task has *finished*.
There are yet more kinds of dependency:
## Start-to-start Dependency
One task must not *start* until another task has *started*. For example, you can't *start* painting a building wall until you've at least *started* putting up scaffolding.
# Lags
Sometimes there is some necessary waiting time between two tasks. For example, if you paint a wall with one coat of paint, it is necessary to wait for that coat to dry before adding the second coat. This is called a **lag** and it is implemented in a Gantt chart as, well, a gap.
Note that there is a fourth kind of dependency, the start-to-finish dependency -- one task cannot finish until another has started -- but it's very rare.
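Finish-start dependencies are what scheduling tools mostly compute with; a minimal sketch of deriving earliest start times from them (task names and durations are invented for illustration):

```python
# Hypothetical tasks: duration (days) plus the tasks each one
# (finish-start) depends on.
tasks = {
    "foundations": (5, []),
    "walls":       (10, ["foundations"]),
    "roof":        (4, ["walls"]),
}

def earliest_start(name):
    """A task can start once every task it depends on has finished."""
    _, deps = tasks[name]
    return max((earliest_start(d) + tasks[d][0] for d in deps), default=0)

print(earliest_start("roof"))  # prints 15: foundations (5) + walls (10)
```

This only models finish-start dependencies; the other kinds (and lags) would need extra fields per task.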


@ -0,0 +1,65 @@
# Activity 1.1
* Census of people to determine how many kits you need
* Local distribution centers to ensure fair distribution of kits across local areas
# What is a project? (1.2)
Examples: third year project, software engineering project this year, looking for an industrial placement for intercalated year
# Activity 1.3
* Planning
* Timing
# Activity 1.4
## Is a project
* HS2
* Preparing to help your child start university
* Decorating a room
* Planning a road trip across the USA
## Is not a project
* Teaching this module
* Parenting a child
* Visiting a friend in hospital
* Cooking a Sunday roast
* Keeping your house tidy
* Buying a new pencil case
## Differences
* Projects are things you only do once; non-projects can be done over and over again
* Projects require very long-term planning
* ~~Projects have a deadline~~ Projects occur over a fixed period of time
* Projects require gathering resources ahead of time
* Projects consist of several sub-tasks, which usually must be completed in a particular order
### Missed
* Projects have an end goal
* Projects have to be of a certain size
# Summary: Features of a project
* Unique
* Delivers some sort of end goal
* Large-scale
* Permanently achieves something -- not temporary or in need of repetition
## Note: Organisational terminology
* Anything that would be a project but is too small is an **activity**.
* Anything that is not a project for other reasons is part of **operations**.
# What is grading for?
In my opinion grading reinforces capitalist systems of oppression by creating the illusion of fairness and "justice" in a stratified or class society wherein some have much greater access to resources.
# Grading in CO2201
We will *grade our own work* and then ask the module convener (I assume) to approve our grades. (It seems to be implied that the module convener can reject a student's grade if they don't agree with it.)


@ -0,0 +1,75 @@
# Triple Constraints for Project Management
A project aims for four targets:
* (Low) cost
* (Short) time
* Scope
* Quality
This is the triple constraint system. (Yes there are four of them. No I don't like this module at the moment either.)
## Activity 3.2: Which of these did Tottenham Hotspur's stadium not meet?
I think it's cost because apparently it cost loads to produce
## Cost
The total money spent on a project. Including e.g. salaries, consultant costs, materials, office space, software licences...
**Budget** is the amount of money available to complete a project at the beginning.
While planning the project, one can **estimate** the cost of the project to see if it's possible to complete within budget.
## Time
**Elapsed time** is the amount of time spent working on a project.
A **schedule** is the plan for when a project will be completed. If the completion date is later than the scheduled completion date, the project is **late**.
## Scope
The size or extent of what is actually being built. A **specification** specifies the **scope** of a project. For example, for a project of writing a book, the scope might be defined as the number of chapters.
## Quality
**Quality** is whether or not the outcome of the project has the ability to *perform satisfactorily* and/or is *suitable for its intended purpose*.
As an example, for software projects, **software quality** can be measured as the number (or rather the absence) of bugs and the ease of use of the software.
It can be quite difficult to measure the quality of a project.
## Activity 3.3
| Constraint | Target | Outcome | Success? |
| --- | --- | --- | --- |
| Time | Open by August 2018 | April 2019 | No |
| Cost | £400M | £1B | No |
| Scope | Capacity of >60,000 | Capacity of 62,062 | Yes |
| Quality | Premier League standards | It's very very good | Yes |
# Setting targets
* Time constraints are defined as a deadline
* Cost constraints are defined as a budget
* Scope constraints are defined in a specification
* Quality constraints are also defined in a specification
# Project Management is Balancing Constraints
You have to balance the four constraints.
# Success after change of scope
Imagine you have to dig a 1 metre-deep hole in 2 days as a project. (Normally it would take 1.5 days.)
This project can be completed successfully.
Now imagine that the scope is changed: the hole must be 2 metres deep. Can the project still be a success?
## You can't change just one constraint
If you want to try to still dig the hole in 2 days, you would need to have the digger work longer hours, but this would increase the cost. Or you could use a drill instead of a shovel, but again, that would cost more. Or you could move the deadline, but that would increase the elapsed time.
You can't change the scope without changing the time and cost also. Otherwise you will be forced to have the quality suffer.


@ -0,0 +1,59 @@
# Phases
Humans like to break time down into semantic chunks to make time management easier. This is done with projects too. Projects tend to proceed along the following phases:
**Initiation => Planning => Execution => Closure**
## Activity 4.2
Destroy personal data in **Closure** phase
Regularly produce reports on whether the project is on schedule in the **Execution** phase
Decide whether the project should go ahead in **Initiation** phase
## Initiation
"Take something from being an idea or opportunity to being ready to start planning the details."
* Find key stakeholders
* Define purpose and initial constraints
* "Project Initiation Document"
* A decision on whether to go ahead with the project (!)
## Planning
"Create detailed plans for how the project will be successfully completed."
* A lot of documentation (!)
* Project schedule
* Detailed specification of scope
## Execution
"Where the working on tasks to complete the project happens."
* The project team work on tasks.
* Monitor and report progress on tasks
* Make (short-term) decisions to make sure the project succeeds
* Completed work (a stadium, software, *etc.*)
## Closure
"Ensure everything that needs to happen for the project to be complete are done. Create final reports."
* Final reports on project success (according to the **triple constraints**, see appropriate notes)
* Payment or bonuses for project team if the project was successful


@ -0,0 +1,71 @@
# Plan-Driven Project Management
In plan-driven project management, a plan is created and then followed.
It is important to understand that, even if your plan is excellent, it will usually be wrong in many ways. "No plan survives contact with the enemy." So every plan is really just an *educated guess*. It's the project manager's job to try and make reality match the plan, and adapt to changes.
(**Agile** project management does not follow a fixed up-front plan. This kind of project management will be discussed later.)
# Schedules should...
* Describe what tasks should be happening now or later
* Provide an estimate of the project end date
* Help decide whether a project is feasible
* Provide a baseline against which it's possible to assess whether one is on target
* Help calculate the expected cost of the project
Schedules do not:
* Provide an accurate description of what will happen
# Creating a project schedule
1. Create a **Work Breakdown Structure (WBS)**
2. Estimate the length of tasks in the WBS
3. Identify dependencies between tasks in the WBS
4. Create a **Gantt chart** using this data
## White Kitchen Items
Kitchen surface
## Work Breakdown Structure
A **hierarchical** decomposition of tasks involved in completing a project.
The process of 'breaking down' continues until the tasks are "small enough" -- clearly this is a subjective measure.
### WBS terminology
Everything in a WBS is a task.
The top level (no decomposition) is referred to as Level 1. Subsequent levels are referred to as Levels 2, 3, etc. (one, two, etc. levels of decomposition).
### Estimation
We can't know how long a project will take. But we need some idea. So we **estimate** it. An estimate is a rough prediction about how long a task will take.
Estimation is important because the biggest cost on a project (especially software projects) is usually *paying people for their time*.
#### Bottom-up estimation
Smaller tasks are more familiar and easier to estimate the length of time for. This is the foundational idea of **bottom-up estimation**. It works like this:
1. Estimate the times for the tasks at the bottom of the WBS
2. Add up those times to get an estimate of the time for longer tasks
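The two steps above can be sketched with a toy WBS (the tasks and hour figures are made up):

```python
# A toy WBS as nested dicts: leaves hold estimates in hours,
# internal nodes are decomposed into sub-tasks.
wbs = {
    "Build website": {
        "Design": {"Wireframes": 4, "Style guide": 3},
        "Implement": {"Backend": 16, "Frontend": 12},
        "Test": 6,
    }
}

def estimate(node):
    """Bottom-up estimate: a leaf is its own estimate; an internal
    node's estimate is the sum of its sub-tasks' estimates."""
    if isinstance(node, dict):
        return sum(estimate(child) for child in node.values())
    return node

print(estimate(wbs))  # prints 41 (= 4 + 3 + 16 + 12 + 6)
```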
#### Top-down estimation
There is not always time to produce a full WBS just to estimate the *ballpark* cost of a project. Top-down estimation starts with the time for the whole project and then splits up into smaller tasks as necessary. This is easier to do earlier in the project before a lot of analysis has occurred but *requires experience of similar projects* for the estimate to be anything close to accurate.
#### Activity 5.4
A: 40 minutes
B: 40 minutes
C: 30 minutes
D: 10 minutes
E: 1 minute


@ -0,0 +1,62 @@
# Self-Assessment
## Kinds of self-assessment
### Diagnostic: "self-assessment"
Provide feedback to identify ways to improve
Honest, but highlighting *weaknesses*
### Judgemental: "self-evaluation"
To convince someone else of your abilities/achievements
Honest, but highlighting *positives*
(Example: interview, promotion application)
## Kinds of assessment quiz
Health questionnaire for GP registration: diagnostic
Job application: judgemental
Teacher asks "how well do you think you are doing this year?" (casual context): diagnostic
### Quantitative
Easy to compare, rank, sort, average
### Qualitative
Harder to rank but can give more important information like reasoning and crucial details
## How to self-assess
1. Understand whether self-assessment is diagnostic or judgemental
2. Gather information
3. Create an honest, accurate assessment using qualitative methods
4. Support this with evidence (especially for judgemental assessments)
### Useful information for step 2
* Criteria/benchmarks
* Feedback (from experts or peers)
* Data (about you and others)
* Exemplars
Note: be careful about judging yourself negatively against others
### Activity 10.3: Examples of useful information for self assessing programming skill
* Criteria: ability to create different kinds of program and what platforms you can use for that, how long such projects take
* Feedback: feedback from peers, teachers
* Data: statistical data from training tools like CodeWars, maybe statistics from previous work such as programming assignments
* Exemplars: well-written pre-existing programs
## Evidence
Self-assessment is stronger with evidence.


@ -0,0 +1,3 @@
# Source Control
## Activity


@ -0,0 +1,69 @@
# Task Allocation
## Prediction 8.1: Who will do each task?
Allocate Julia (Author) and Axel (Illustrator) to this project
Prefer allocation 2 because:
* More even distribution of work
* Julia shouldn't proofread her own work
* Julia should publish because she's the primary author
Crucially, allocation 2 has better **resource utilisation** because in allocation 1, there are times when some people are doing no work.
Resource utilisation is how *busy* a resource is during a project. We aim to have all resources, including people, at as close to 100% utilisation as possible throughout the project.
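Utilisation is usually expressed as a percentage of available time; a trivial sketch (the 35-hour week is an assumed figure, not from the notes):

```python
def utilisation(busy_hours, available_hours):
    """Percentage of a resource's available time actually spent on tasks."""
    return 100 * busy_hours / available_hours

# A worker with 28 busy hours in a 35-hour week:
print(f"{utilisation(28, 35):.0f}%")  # prints 80%
```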
## Ways to fix **over-utilisation**
### Rescheduling
You can delay tasks in a project until a time when the person who should do the task is available. This costs nothing if the task is simply moved into time that person would otherwise have spent idle, but if it moves a task back too far, it can lengthen the time required for the project.
### Re-allocation
You can change to whom a task is allocated. If the person who will now be doing a task is less skilled at it, this could reduce the *quality* of the project, but it may also save on *time* if it frees someone up to work on a different task.
## Activity 8.2: Fix the schedule
> A developer is over-utilised during week 43. Imagine that the software tester can do some software development, but not very well. For each of the following fixes to the schedule above, identify which of the triple constraints will be most affected (cost, quality, or time).
| Possible solution | Answer | Feedback |
| --- | --- | --- |
| Allocate "create database" to tester | Quality | Correct |
| Move "create database" to after "write web interface" | Time | Correct |
| Hire a database specialist to create the database | Cost | Correct |
Note: it's possible to allocate someone to more than one task at once if they don't spend all their time doing that task. For example, one developer could spend 70% of their time doing one task and 30% doing another. The difficulty with this is that it can be inefficient and hard to monitor.
## Project Costs
There are two types of project costs: **direct** costs and **indirect** costs. On most projects, the biggest direct cost is *workers' salaries*. Other direct costs are the equipment and materials required for the project. Indirect costs are not really covered in this module.
To calculate the total cost of a project, you simply multiply each resource's cost per hour by the amount of time that resource (usually a person in this case) will be needed for, then add these up.
It is possible to allocate more than one person to a task. This is done to reduce the time taken to do the task so the project finishes sooner. However, this may increase the cost.
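The cost calculation above, as a sketch (the names, hours and hourly rates are invented):

```python
# Hypothetical resources: (name, hours needed, hourly rate in pounds).
resources = [
    ("developer A", 80, 40.0),
    ("developer B", 40, 40.0),
    ("tester",      30, 30.0),
]

# Total direct cost: each resource's hours times their rate, summed.
total_cost = sum(hours * rate for _, hours, rate in resources)
print(f"£{total_cost:.2f}")  # prints £5700.00
```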
## Activity 8.3: How has adding multiple workers affected...
### Cost
~~The cost has increased because now another developer's time must be paid for~~
Actually the cost is the same (assuming both developers have the same salary): although another developer must be paid, the time for which each developer needs to be paid has decreased
### Duration
Duration has been reduced
### Scope
The scope remains the same
### Quality
The quality *should* remain the same, assuming the developers can work well together
## Scheduling your final-year project
Although you are technically the only worker, there are tasks you may need to include on a Gantt chart for other people, such as your supervisor.