1. How to write a good report
Tony Field
With material from Simon Peyton Jones’s
“Research Skills”
(
http://research.microsoft.com/~simonpj/papers/giving-a-talk/gi
)
2. Lesson 101: Get your story right
Many projects contain good ideas, but do not distil what
they are.
Figure out what your story is
What’s the problem (motivation)?
What’s the key idea (your solution/approach)?
What’s the likely benefit/tradeoff?
What did you build (deliverable)? Did it work?
What did you conclude (insight)?
Is what you’ve produced usable/useful (utility)?
Hint: Sketch your story on a piece of paper using simple,
clear words. Be 100% explicit.
Remember that your project is an investigation - finding
out that your idea didn’t work is not the same as failing your
project
3. Example story sketch A
The Haskell GHC RTS has a garbage collector The problem
Real-time applications need real-time collectors
Classical real-time collectors rely on read
barriers, which are expensive
The IDEA
We can use GHC’s dynamic dispatch to build a
really cheap barrier by “hijacking” an object’s entry
code
We already pay up front for the dynamic dispatch; The benefit
the extra work to build the barrier is minimal
The payoff is that there is no barrier when the
collector is off, and only then when an object has
yet to be copied The
We’ve built an implementation in GHC vX. It works! deliverable
Via benchmarking, the barrier overhead is found to
The
be small – much smaller than existing schemes
In principle, the idea can be used in any conclusion
implementation based on dynamic dispatch, e.g.
the JVM The utility
4. Example story sketch B
The Haskell GHC RTS has a garbage collector The problem
Real-time applications need real-time collectors
Classical real-time collectors rely on read
barriers, which are expensive
The IDEA
We can use GHC’s dynamic dispatch to build a
really cheap barrier by “hijacking” an object’s entry
code
We already pay up front for the dynamic dispatch; The benefit
the extra work to build the barrier is minimal
The payoff is that there is no barrier when the
collector is off, and only then when an object has
yet to be copied The
We’ve built an implementation in GHC vX. It only deliverable
works with one or two simple examples
For those codes that work the overheads are The
enormous conclusion
More work is needed, but it looks like the idea
The utility
doesn’t pay. But there is an alternative...
5. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
6. The abstract
An extremely condensed summary of the
project, comprising a small number of
sentences
Make minimal references to the
background
Focus on the project content
Hint: write the abstract last
7. Example
Abstract
An technique is described for incorporating Baker's incremental
garbage collection algorithm into the run-time system of the Glasgow
Haskell Compiler (GHC) at very low cost. The technique exploits the
fact that GHC objects are always accessed by jumping to “entry code”
rather than by dereferencing the object explicitly. The idea is to hijack
the entry code pointer when an object is in the transient state of being
evacuated but not scavenged. An attempt to access the object in this
state causes it to “self-scavenge” transparently before executing its
normal entry code. This supports the normally expensive read barrier,
but at very low cost as the barrier is only active when the object is in the
transient state; in particular, there is no barrier overhead at all when the
collector is off. An implementation in v4.01 of GHC is described,
together with performance results for a subset of the “nofib” benchmark
suite. These experiments show that the Baker collector can be
implemented in GHC with very short program pause times and with
negligible overhead on execution time compared with stop-the-world
schemes.
8. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
9. The introduction
Now tell your story
Keep it simple and to the point
Avoid philosophical rambling
Avoid excessive background
Use examples to explain the problem and
your idea
State your contributions and where in the
report they can be found
10. Example introduction
Minimise the background
– cut to the chase
1. Introduction
A significant drawback of automatic garbage collection is the detrimental effect it can have on the responsiveness
of a program. While the system is collecting garbage it is unavailable to user programs for useful work, and this
can cause a program to appear `dead’ for extended periods of time. This problem is caused by the fact that
garbage collection algorithms typically collect all of the system‘s free store in one go, an operation that takes time
at least proportional to the number of live objects.
This “unbounded pause” problem has been exacerbated by the tendency towards larger memories and it has
discouraged the use of garbage-collected programming languages for many interactive or real-time applications. In
the former, for example, the acid test is the responsiveness of the system when performing intensive I/O, e.g. when
tracking a mouse or performing continuous real-time animation.
An algorithm that avoids this problem, at least in the context of copying collection schemes [5], is Baker‘s
incremental algorithm [3] (an excellent general overview is also given in [7]). Baker's scheme interleaves program
execution with small incremental copying phases, causing the program to pause periodically for short periods
rather than stopping completely to perform the copy in one go.
The major problem with the implementation of Baker‘s scheme, at least in software, is the very high cost
associated with the read barrier, which is a check required at each pointer load to ensure that the referenced
object has been copied if it needs to be.
In this report a technique is developed for supporting Baker’s algorithm at very low cost in systems that already
make use of dynamic dispatch. The idea is to use the dispatch mechanism to change the behaviour of an object
depending upon its state. Blah blah…
‘For example’s help to clarify the
problem. Where possible list code.
11. The contributions
The contributions drive the rest of the
report: the report substantiates the
claims you have made
The reader thinks “gosh, if they can
really deliver this, that’s be exciting; I’d
better read on”
12. No “rest of the report is...”
Not: “The rest of the report is structured as
follows. Chapter 2 presents the background.
Chapter 3 ... Finally, Chapter 8
concludes...”.
Instead: use forward references from the
narrative in the introduction.
13. Example contributions
A simple technique is presented that exploits GHC’s Make the
underlying dynamic-dispatch mechanism to support the read
barrier invariant (Chapter 2). The key advantage is that there
contributions
is no barrier overhead associated with object references when clear, e.g. by
either i. the garbage collector is off, or ii. the garbage collector
is on and the object has already been scavenged. a bulleted
The idea is extended to treat stack activation records in a list
similar way, so that stacks, too, can be collected
incrementally (Chapter 3).
A prototype implementation for the Glasgow Haskell
Compiler (GHC) is described in detail, including some of the
key design tradeoffs (Chapter 4).
Using benchmarks from the “nofib” suite, the average
overhead on program execution time is shown to be less than
4% for the benchmarks tested; sub-millisecond average
pause times are also achieved for all benchmarks (Chapter
5).
14. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
15. Background
Make it clear that you're aware of the
important key work in the subject
Provide enough detail to make the report
as self-contained as possible, but no
more
Avoid 'routine' background, e.g. An intro to
the C programming language
Don't quote material that’s irrelevant or
that you haven't read
16. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
17. The details
Say what you did, how and why
(“method”); this is highly project-specific
Justify your decisions regarding design,
tools, techniques etc.
Emphasise the key challenges that you
encountered and how you overcame them
18. Intuition first…
Explain your ideas as if you were speaking to
someone using a whiteboard. Only then go into
the details.
Conveying the intuition is primary, not
secondary
Once your reader has the intuition, they can
follow the details (but not vice versa)
Even if they skip the details, they still take away
something valuable
ALWAYS use examples and diagrams to help
19. Presenting your ideas
Do not recapitulate your personal
journey of discovery. This route may be
soaked with your blood, but that is not
interesting to the reader.
However, “failed attempts” can often add
insight (First Attempt… Second
Attempt…)
Omit obvious implementation details.
Focus instead on those that really
matter.
20. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
21. Evaluation
You must evaluate your successes and failures,
bearing in mind what others have already done
Make sure that each of your claimed
contributions in the introduction is backed up
with evidence
Quantitative evaluation is about performing
experiments and making sense of hard
numerical results
Qualitative evaluation is about making
judgements – it is more “slippery” and
subjective
22. Quantitative evaluation
Detail your experiment(s)
The reader must be able to reproduce your
results
Present results in tables and/or graphs
Discuss the results and provide insight
Explain interesting artefacts. Don’t leave it
to the reader to interpret your results (this
is the hard bit!)
23. Example: Detailing experiments
The experiments were conducted on a 2.8GHz Pentium IV with 1GB RAM, a 512K level-2 cache, a
16KB instruction cache and a 16KB, 4-way set-associative level-1 data cache with 32-byte lines.
The system ran Mandrake Linux 10.1 with a 2.6.15.1 kernel in single-user mode. All results reported
are averages taken over five runs and are expressed as overheads with respect to reference data.
Each application was run in two configurations: one with a heap large enough for the application to
complete without causing a garbage collection (typically around 110Mb), and one with a smaller
initial heap (256kB). It is worth noting that the execution time of the application must be controlled
such that the total allocation does not exceed the larger heap value (which would result in a garbage
collection). The value is constrained to just below the amount of memory in the benchmark machine.
With profiling turned off the difference between the two execution times tells us the total cost of
garbage collection. For the new scheme the application was first executed with profiling turned on in
order to determine the total number of mutator pauses and the heap allocation rate in bytes/s. The
timings taken from runs without profiling were used to determine the mean pause time.
Each application was run with four garbage collection algorithms: stop-and-copy (with all
optimisations turned on), a traditional implementation of Baker with a software read-barrier, and the
two versions of the new scheme, EA and EB. The Baker implementation was set to scavenge at
each block allocation, as per EB. In each case the mark-ratio was 2 for the runs of EA and 2048 for
the runs of EB.
The reader must be able
Make it absolutely clear what you to reproduce your results
did
24. Example: Explaining artefacts
Table 5.3 demonstrates some unusual behaviour. Averaged over all 36 benchmarks the incremental-
B schemes are actually running faster than stop-and-copy, which is unexpected. Similarly, some
applications (Bernoulli, Gamteb, Pic and Symalg) exhibit substantial speed-ups (35-79%) across all
variants of the incremental scheme. There are also significant outliers in the other direction (Paraffins
and scs, for example). Wave4main curiously exhibits extreme speed-ups across all -B variants and
extreme slow-downs across all -A variants.
Because the effects are seen across the board it suggests that it is an artefact of the incremental
nature of the garbage collector, rather than the idiosyncrasies of any particular application.
The extreme behaviours are attributable to two different aspects of incremental collector behaviour.
The extreme speed-ups are an artefact of long-lived data and the fact that stop-and-copy collection
happens instantaneously with respect to mutator execution...
The extreme slow-downs are an artefact of incremental collection cycles that are forced to
completion. The stop-and-copy collector sizes the nursery based on the amount of live data at the
end of the current cycle, while the incremental collector can only use that of the previous cycle...
The artefact
The insight
25. Qualitative evaluation
Try to relate experience, rather than rely on
speculation, e.g.
“Users were able to use the system by just being told
the rules of the game…”
“The use of shadows gave much better depth
perception and made it easier to…”
“The need to cast all numerical values to doubles was
a universally unpopular feature…”
Hint: be quantitative wherever possible
E.g. size of code, time to solve a given problem,
number of mouse clicks…
26. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
27. Conclusions and future work
The conclusion is not a summary of the
project, although you might provide a brief
reminder
Summarise those things that you have learnt
by the end of the project, that may not have
been obvious at the beginning
Use the further work to suggest fixes to your
approach and to highlight new ideas that
warrant a new investigation
28. Example conclusions
8. Conclusions and Future Work
A technique has been described for incorporating Baker’s incremental garbage collection into the GHC run-time. Each
closure creates its own ‘personal’ read-barrier by manipulating its entry code-pointer during execution and garbage
collection. The technique derives its efficiency from the twin facts that these manipulations are very cheap and that they
are performed on a per-closure basis, rather than a per-pointer-load basis, as in most implementations of the read-barrier.
Experiments suggest that the new scheme carries a very low overhead, averaging less than 4% in execution time for the
benchmarks tested, with average pause times being sub-millisecond in all cases. Etc.
We have also seen that the block-based memory allocator of the GHC run-time system provides an excellent level of
granularity at which to perform incremental scavenging. By scavenging on each block allocation we can keep the
scavenger going for longer, at the expense of an increase in pause time. This substantially reduces the overheads in the
proposed scheme and provides an additional mechanism (the block size) by which to control the behaviour of the
garbage collector. Etc.
8.1 Further Work
In principle, the scheme can be applied to any system that supports pure dynamic dispatch. The Java Virtual Machine
(JVM), for example, uses dynamic dispatch to implement virtual method invocations, although other method calls are
supported using more conventional function calling mechanisms. An interesting approach would be to modify the JVM so
that all method calls are implemented by dynamic dispatch so that the technique described here can be used to support
incremental collection with minimal additional overhead. There is then an interesting performance trade-off: is it cheaper
to build a permanent read barrier in the current JVM, or to pay the price of full virtualisation and then get the read barrier
(almost) for free? This is left as an open question that is worthy of future investigation. Etc.
29. Report Structure
Title
Abstract
Introduction
Background
The details
Evaluation/Discussion
Conclusions and further work
Bibliography
(Appendices)
30. Bibliography
Cite your sources properly, e.g.
[4] A. Printezis and A. Garthwaite, “Visualising the Train garbage
collector”, Proceedings of the Third International Symposium on
Memory Management (ISMM’02), ACM SIGPLAN Notices, pp.
100-105, Berlin, June 2002. ACM Press.
[5] The libunwind project,
http://www.hpl.hp.com/research/linux/libunwind/
The reader should be able to locate all
your sources of inspiration directly
32. Submission
Submit two hard copies by the deadline (NO
extensions)
More pages does not imply more marks –
popular binder sizes are 75-100 and 100-130
Print double sided, single-spaced
Do not narrow or expand the margins
or 16pt font
Do not use 6pt font
Where appropriate, use an appendix for: a user guide,
experimental data, traces, detailed proof, etc.
Do not include program listings in the report
Always use a spell checker
(Always use LaTeX!)
33. Visual structure
Give strong visual structure to your report
using
sections and sub-sections
bullets
italics
laid-out code
Find out how to draw beautiful pictures, and
use them
Always refer explicitly to figures in the text
35. Make it engaging
Use the active voice (subject-verb-object) and engage the
reader “We” = you
and the
NO YES reader
It can be seen that... We can see that...
The code must then be The programmer must then
analysed by the programmer analyse the code
It might be thought that this You might think this would be Active verb
would be a type error a type error
“You” = the
reader
36. Use simple, direct language
NO YES
This pattern is exemplified
This is shown in Figure 5.5
pictorially in Figure 5.5
The object under study was
The ball moved sideways
displaced horizontally
Endeavour to ascertain Find out
It could be considered that the
The garbage collector was really
speed of storage reclamation left
slow
something to be desired
37. Avoid “noise words”
NO YES
The code is then simply re-compiled The code is then re-compiled
The actual memory used was never The memory used was never more
more than 256KB than 256KB
In actual fact, and completely contrary
to what might have been expected, it
Surprisingly, the second approach was
turned out that the second approach
much more efficient
proved to be much the more efficient
of the two
However, of course, this would break However, this would break the type
the type system system
38. Some other sources
Research Skills (Simon Peyton Jones) (
http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk
) (Thanks to Simon for providing the template for this talk)
Scientific Communication
http://undergraduate.csse.uwa.edu.au/units/CITS7200/
Advice on Research and Writing
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/user/mleone/web/how-to.html
How to write plain English
http://www.plainenglish.co.uk/howto.pdf