Update manual, CHANGELOG, and TODO

2021-01-04 02:26:40 +05:30 · 2021-01-04 02:26:40 +05:30 · 5b55002d5c
parent 7204050f7f
commit 5b55002d5c
4 changed files with 36 additions and 91 deletions
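
The manual text added in this commit describes detecting whether the last expression in the file was modified, removed, or followed by a new one, by comparing a stored state against the current file. A minimal sketch of that idea follows. It is an assumption-laden illustration, not Chronometrist's implementation: the `sketch-` function names, the plist keys, the extra `:unchanged` and `:reparse` results, and the choice of SHA-1 are all hypothetical, and it works on strings rather than on `chronometrist-file` itself.

```elisp
;; Illustrative sketch only. The `sketch-' names are hypothetical and are
;; not Chronometrist's actual functions or data layout.
(require 'subr-x)                       ; for `string-blank-p'

(defun sketch-last-sexp-start (text)
  "Return the 0-based index in TEXT where its last s-expression starts."
  (with-temp-buffer
    (insert text)
    (with-syntax-table emacs-lisp-mode-syntax-table
      (goto-char (point-max))
      ;; On unbalanced input, fall back to the start of the text.
      (condition-case nil
          (backward-sexp 1)
        (scan-error (goto-char (point-min))))
      (1- (point)))))

(defun sketch-file-state (text)
  "Return a plist describing TEXT for later comparison.
It records where the last s-expression starts and where TEXT ends,
with one hash for everything before that expression and one for it."
  (let ((start (sketch-last-sexp-start text)))
    (list :rest-end  start
          :rest-hash (secure-hash 'sha1 (substring text 0 start))
          :last-end  (length text)
          :last-hash (secure-hash 'sha1 (substring text start)))))

(defun sketch-change-type (old-state new-text)
  "Compare OLD-STATE with NEW-TEXT.
Return :append, :modify, :remove, :unchanged, or :reparse."
  (let ((rest-end (plist-get old-state :rest-end))
        (last-end (plist-get old-state :last-end))
        (len      (length new-text)))
    (cond
     ;; Something before the old last expression changed: parse everything.
     ((or (< len rest-end)
          (not (equal (plist-get old-state :rest-hash)
                      (secure-hash 'sha1 (substring new-text 0 rest-end)))))
      :reparse)
     ;; The old last expression is intact; see whether anything follows it.
     ((and (>= len last-end)
           (equal (plist-get old-state :last-hash)
                  (secure-hash 'sha1
                               (substring new-text rest-end last-end))))
      (if (string-blank-p (substring new-text last-end)) :unchanged :append))
     ;; Only whitespace remains where the last expression used to be.
     ((string-blank-p (substring new-text rest-end)) :remove)
     (t :modify))))

;; Usage: record the state once, then classify later versions of the text.
(let ((state (sketch-file-state "(a)\n(b)\n")))
  (list (sketch-change-type state "(a)\n(b)\n(c)\n")  ; :append
        (sketch-change-type state "(a)\n(b 1)\n")     ; :modify
        (sketch-change-type state "(a)\n")            ; :remove
        (sketch-change-type state "(x)\n(b)\n")))     ; :reparse
```

Hashing only the region before the last expression keeps the comparison cheap for the common end-of-file edits, at the cost of falling back to a full parse for any change made earlier in the file.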
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,7 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Lisp objects being stored as un`read`able strings in `chronometrist-value-history`, resulting in value suggestions not matching user input.
 * `chronometrist-report` no longer calls `delete-other-windows`; use `chronometrist-report-mode-hook` if it is desired.
 * Fixed infinite loop in `chronometrist-report` triggered by non-English locales.
-* Optimization - refresh time after changing chronometrist-file is now near-instant for the two most common situations - an expression being added to the end of the file, or the last expression in the file being changed. This works for changes made by the user as well as changes made by Chronometrist (or other) commands.
+* Optimization - refresh time after changing `chronometrist-file` is now near-instant for the most common situations - an expression being added to the end of the file, and the last expression in the file being changed or removed. This works for changes made by the user as well as changes made by Chronometrist (or other) commands.

 ## [0.5.6] - 2020-12-22
 ### Fixed
--- a/TODO.org
+++ b/TODO.org
@@ -106,7 +106,27 @@
 ** Maybe
 1. Add a new kind of plist - =(:name "NAME" :time "TIME" ...)=
   To record events for which the time interval is not relevant. These won't be shown in =chronometrist= - perhaps in a different buffer.
-* Optimization
+* Optimization [33%]
+Some options and ideas -
+1. [X] Defer (tag/key/value) history generation from file-change-time to prompt-time, and make it per-task instead of all tasks at once
+   * The biggest resource hog is splitting of midnight-spanning intervals, however.
+   * Reduce memory use by allowing user to restrict number of s-expressions read.
+   * Per-task history generation will create problems - e.g. values for a given key for one task won't be suggested for values for the same key in another 🤦
+     + Tags and keys are already task-sensitive; just don't make values task-sensitive.
+2. [X] Compare partial hashes of file to know what has changed - only update memory when necessary.
+3. [ ] In-memory cache - don't store entire file into memory; instead, split midnight-spanning events just for the requested data.
+   * Will increase load time for the first time =chronometrist=, =chronometrist-report=, or =chronometrist-statistics= are run (including forward/backward commands in the latter two)
+     + Can try pre-emptively loading data for the latter two
+   * Will reduce memory used by =chronometrist-events=.
+     + Further reductions can take place, if we automatically discard cache entries past a certain limit. (perhaps excluding data for the current day, week, or month)
+4. [ ] Mix of 2 and 3 - in-memory cache with partial updates
+5. [ ] Split and save midnight-spanning intervals to disk - remove the need for an in-memory version of data with split midnight-spanning intervals.
+   * Least memory use?
+   * Might make the file harder for a user to edit.
+6. [ ] Save timestamps as UNIX epoch time.
+   * Will (probably) greatly speed up time parsing and interval splitting.
+   * Will greatly impede human editing of the file, too. 🤔
+     + An editing UI could help - pretty sure every timestamp edit I've ever made has been for the last interval, or at most an interval in today's data.
 ** Cache
   + Lessons from the parsimonious-reading branch - iterating =read= over the whole file is fast; splitting the events is not.
   + Things we need to read the whole file for - task list, tag/key/value history.
@@ -114,16 +134,13 @@
   + Anything requiring split events will first look in =chronometrist-events=, and if not found, will read from the file and update =chronometrist-events=.
   + When the file changes, use the file byte length and hash strategy described below to know whether to keep the cache.
   + Save cache to a file, so that event splitting is avoided by reading from that.
-*** Thoughts
-    + =chronometrist-key-value-cache= would basically be the entire file, if =chronometrist-history-suggestion-limit= is nil.
-    + history generation for tags/keys/values - which involve the most parsing - doesn't actually need the events to be split at midnights. Why not make that a keyword argument to =chronometrist-sexp-read=, so it's faster for that?
 ** Ideas to make -refresh-file faster
   1. Support multiple files, so we read and process lesser data when one of them changes.
   2. Make file writing async
   3. Don't refresh from file when clocking in.
   4. Only write to the file when Emacs is idle or being killed, and store data in memory (in the events hash table) in the meantime
   5. What if commands both write to the file /and/ add to the hash table, so we don't have to re-read the file and re-populate the table for commands? The expensive reading+parsing could be avoided for commands, and only take place for the user changing the file.
-      * [ ] jonasw - store length and hash of previous file, see if the new file has the same hash until old-length bytes.
+      * [X] jonasw - store length and hash of previous file, see if the new file has the same hash until old-length bytes.
        * Rather than storing and hashing the full length, we could do it until (before) the last s-expression (or last N s-expressions?). That way, we know if the last expression (or last N expressions) have changed.
          * Or even the first expression of the current date. That way, we just re-read the events for today. Because chronometrist-events uses dates as keys, it's easy to work on the basis of dates.
   6. [ ] Don't generate tag/keyword/value history from the entire log, just from the last N days (where N is user-customizable).
--- a/doc/manual.info
+++ b/doc/manual.info
@@ -240,9 +240,11 @@ It was fixed in v0.2.2 by making the watch creation conditional, using @uref{../
@end itemize

@item
-Preserve state
+Preserve hash table state for some commands


+NOTE - this has been replaced with a more general optimization - see next section.
+
 The next one was released in v0.5. Till then, any time the @uref{../elisp/chronometrist-custom.el, @samp{chronometrist-file}} was modified, we'd clear the @uref{../elisp/chronometrist-events.el, @samp{chronometrist-events}} hash table and read data into it again. The reading itself is nearly-instant, even with ~2 years' worth of data @footnote{As indicated by exploratory work in the @samp{parsimonious-reading} branch, where I made a loop to only @samp{read} and collect s-expressions from the file. It was near-instant@dots{}until I added event splitting to it.} (it uses Emacs' @uref{(describe-function 'read), @samp{read}}, after all), but the splitting of @ref{Midnight-spanning events, , midnight-spanning events} is the real performance killer.

 After the optimization@dots{}
@@ -266,69 +268,12 @@ Instead, the aforementioned backend functions modify the relevant variables - @s
 There are still some operations which @uref{../elisp/chronometrist.el, @samp{chronometrist-refresh-file}} runs unconditionally - which is to say there is scope for further optimization, if or when required.

@item
-Checking hashes
+Determine type of change made to file


-@samp{chronometrist-file-change-type}
-@samp{chronometrist-file-hash}
-@samp{chronometrist--file-state}
+Most changes, whether made through user-editing or by Chronometrist commands, happen at the end of the file. We try to detect the kind of change made - whether the last expression was modified, removed, or whether a new expression was added to the end - and make the corresponding change to @samp{chronometrist-events}, instead of doing a full parse again (@samp{chronometrist-events-populate}). The increase in responsiveness has been significant.

-@item
-Other strategies [25%]
-
-
-@enumerate
-@item
-Defer (tag/key/value) history generation from file-change-time to prompt-time, and make it per-task instead of all tasks at once
-@itemize
-@item
-The biggest resource hog is splitting of midnight-spanning intervals, however.
-@item
-Reduce memory use by allowing user to restrict number of s-expressions read.
-@item
-Per-task history generation will create problems - e.g. values for a given key for one task won't be suggested for values for the same key in another 🤦
-@itemize
-@item
-Tags and keys are already task-sensitive; just don't make values task-sensitive.
-@end itemize
-@end itemize
-@item
-Compare partial hashes of file to know what has changed - only update memory when necessary.
-@item
-In-memory cache - don't store entire file into memory; instead, split midnight-spanning events just for the requested data.
-@itemize
-@item
-Will increase load time for the first time @samp{chronometrist}, @samp{chronometrist-report}, or @samp{chronometrist-statistics} are run (including forward/backward commands in the latter two)
-@itemize
-@item
-Can try pre-emptively loading data for the latter two
-@end itemize
-@item
-Will reduce memory use.
-@end itemize
-@item
-Mix of 2 and 3 - in-memory cache with partial updates
-@item
-Split and save midnight-spanning intervals to disk - remove the need for an in-memory version of data with split midnight-spanning intervals.
-@itemize
-@item
-Least memory use?
-@item
-Might make the file harder for a user to edit.
-@end itemize
-@item
-Save timestamps as UNIX epoch time.
-@itemize
-@item
-Will (probably) greatly speed up time parsing and interval splitting.
-@item
-Will greatly impede human editing of the file, too. 🤔
-@itemize
-@item
-An editing UI could help - pretty sure every timestamp edit I've ever made has been for the last interval, or at most an interval in today's data.
-@end itemize
-@end itemize
-@end enumerate
+When @samp{chronometrist-refresh-file} is run by the file system watcher, it uses  @samp{chronometrist-file-hash} to assign indices and a hash to @samp{chronometrist--file-state}. The next time the file changes, @samp{chronometrist-file-change-type} compares this state to the current state of the file to determine the type of change made.
@end enumerate

@node Midnight-spanning events
--- a/doc/manual.org
+++ b/doc/manual.org
@@ -81,7 +81,9 @@ Thus, I have considered various optimization strategies, and so far implemented
 One of the earliest 'optimizations' of great importance turned out to simply be a bug - turns out, if you run an identical call to [[elisp:(describe-function 'file-notify-add-watch)][=file-notify-add-watch=]] twice, you create /two/ file watchers and your callback will be called /twice./ We were creating a file watcher /each time the chronometrist command was run./ 🤦 This was causing humongous slowdowns each time the file changed. 😅
 + It was fixed in v0.2.2 by making the watch creation conditional, using [[file:../elisp/chronometrist-common.el::defvar chronometrist--fs-watch ][=chronometrist--fs-watch=]] to store the watch object.

-**** Preserve state
+**** Preserve hash table state for some commands
+NOTE - this has been replaced with a more general optimization - see next section.
+
 The next one was released in v0.5. Till then, any time the [[file:../elisp/chronometrist-custom.el::defcustom chronometrist-file (][=chronometrist-file=]] was modified, we'd clear the [[file:../elisp/chronometrist-events.el::defvar chronometrist-events (][=chronometrist-events=]] hash table and read data into it again. The reading itself is nearly-instant, even with ~2 years' worth of data [fn:1] (it uses Emacs' [[elisp:(describe-function 'read)][=read=]], after all), but the splitting of [[* Midnight-spanning events][midnight-spanning events]] is the real performance killer.

 After the optimization...
@@ -96,30 +98,11 @@ There are still some operations which [[file:../elisp/chronometrist.el::defun ch

 [fn:1] As indicated by exploratory work in the =parsimonious-reading= branch, where I made a loop to only =read= and collect s-expressions from the file. It was near-instant...until I added event splitting to it.

-**** Checking hashes
-=chronometrist-file-change-type=
-=chronometrist-file-hash=
-=chronometrist--file-state=
+**** Determine type of change made to file
+Most changes, whether made through user-editing or by Chronometrist commands, happen at the end of the file. We try to detect the kind of change made - whether the last expression was modified, removed, or whether a new expression was added to the end - and make the corresponding change to =chronometrist-events=, instead of doing a full parse again (=chronometrist-events-populate=). The increase in responsiveness has been significant.
+
+When =chronometrist-refresh-file= is run by the file system watcher, it uses =chronometrist-file-hash= to assign indices and a hash to =chronometrist--file-state=. The next time the file changes, =chronometrist-file-change-type= compares this state to the current state of the file to determine the type of change made.

-**** Other strategies [25%]
-1. [X] Defer (tag/key/value) history generation from file-change-time to prompt-time, and make it per-task instead of all tasks at once
-   * The biggest resource hog is splitting of midnight-spanning intervals, however.
-   * Reduce memory use by allowing user to restrict number of s-expressions read.
-   * Per-task history generation will create problems - e.g. values for a given key for one task won't be suggested for values for the same key in another 🤦
-     + Tags and keys are already task-sensitive; just don't make values task-sensitive.
-2. [ ] Compare partial hashes of file to know what has changed - only update memory when necessary.
-3. [ ] In-memory cache - don't store entire file into memory; instead, split midnight-spanning events just for the requested data.
-   * Will increase load time for the first time =chronometrist=, =chronometrist-report=, or =chronometrist-statistics= are run (including forward/backward commands in the latter two)
-     + Can try pre-emptively loading data for the latter two
-   * Will reduce memory use.
-4. [ ] Mix of 2 and 3 - in-memory cache with partial updates
-5. [ ] Split and save midnight-spanning intervals to disk - remove the need for an in-memory version of data with split midnight-spanning intervals.
-   * Least memory use?
-   * Might make the file harder for a user to edit.
-6. [ ] Save timestamps as UNIX epoch time.
-   * Will (probably) greatly speed up time parsing and interval splitting.
-   * Will greatly impede human editing of the file, too. 🤔
-     + An editing UI could help - pretty sure every timestamp edit I've ever made has been for the last interval, or at most an interval in today's data.
 ** Midnight-spanning events
 :PROPERTIES:
 :DESCRIPTION: Events starting on one day and ending on another