Allow alternate /home paths #25
Reference: cmccabe/linkulator2#25
Not all systems keep user home directories in /home/username.
SDF/MA keeps them in /meta/[a-z]/username
SDF/Arpa keeps them in /sdf/arpa/[a-z][a-z]/[a-z]/username
We're already using glob() to aggregate all linkulator.data files, and we should be able to feed it a slightly more general pattern (configurable by the user) to catch the files in places other than /home/username/.linkulator
In each of these cases, users on a server would not need to have more than one setting, correct?
For example, the two systems mentioned are both standalone systems. One server has this configured as
/meta/*/*/.linkulator/
and the other as
/sdf/arpa/*/*/*/*/.linkulator/
and that is all that is required?
If this is the case, this could be a variable configurable by whoever performs the installation. Documentation could support this action. It also means that there's no need for per-user configuration for this setting.
Is that all correct?
Yes, those are correct. I realized we have one more step to account for though, and that is extracting the username from the filepath. We do it currently in this line:
file_owner = filename.split("/")[2]
I think we can just change it to the following to accommodate any directory depth:
file_owner = filename.split("/")[-1]
Are there any edge cases where this would break down?
The following will get a list of users and their home directories by looking at the passwd database:
Need to do some investigation on portability, but works on my computer and rawtext club.
Edit: It is available on all Unix versions
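A minimal sketch of that idea, assuming Python's standard pwd module (not necessarily the code originally posted in this comment). The function name and the relative data-file path argument are illustrative only:

```python
import os
import pwd


def find_linkulator_files(data_rel_path=".linkulator/linkulator.data"):
    """Return (username, path) pairs for every passwd entry whose
    home directory contains the given data file.

    data_rel_path is a hypothetical parameter; the default mirrors the
    .linkulator/linkulator.data layout discussed in this issue.
    """
    found = []
    for entry in pwd.getpwall():  # every entry in the passwd database
        candidate = os.path.join(entry.pw_dir, data_rel_path)
        if os.path.isfile(candidate):
            found.append((entry.pw_name, candidate))
    return found
```

This needs no glob pattern at all, but it checks one path per account, so on a host with many accounts it may be slower than a single glob call.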
I think it may be more specific to stick with the [-1] approach because, rather than looking for a list of all usernames on the system, we are trying to extract usernames of linkulator users from the file globbing function.
linkulator_files = glob.glob("/home/*/.linkulator/linkulator.data")
---this one has a full filepath of all linkulator.data files, including usernames.
Later, we do this to extract the usernames associated with each linkulator.data file:
for filename in linkulator_files:
    file_owner = filename.split("/")[2]
But since username will only be element [2] in a home dir scheme like /home/username, we need to generalize it more. I think username will always be one field left of the right side of the split string, so [-1]. Or is it [-2] or even [-3] because of the linkulator directory name and the linkulator.data file name?
asdf, take a look at #27. I think I got it working there. It looks like [-3] was the right spacing from the right side of the split() string.
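To see why [-3] is the right index, here is a small sketch that counts from the right: the last element is linkulator.data, the one before it is .linkulator, and the one before that is the username. The helper name owner_from_path is made up for illustration and is not taken from #27:

```python
def owner_from_path(filename):
    """Extract the username from a path ending in
    <username>/.linkulator/linkulator.data, whatever the depth
    of the home directory."""
    # [-1] is "linkulator.data", [-2] is ".linkulator", [-3] is the username
    return filename.split("/")[-3]


# Standard layout: username is also element [2] from the left
print(owner_from_path("/home/bob/.linkulator/linkulator.data"))
# Deeper layout such as SDF/Arpa: only counting from the right still works
print(owner_from_path("/sdf/arpa/ab/c/alice/.linkulator/linkulator.data"))
```

This only holds while the glob pattern ends in /.linkulator/linkulator.data; a pattern with a different tail would need a different index.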
It is more specific, and probably faster, but requires the administrator make a configuration change to support differing home path conventions. My suggestion would (in theory) work without any special configuration, but iterating over each home directory to check if the data file exists might be slower. I get the impression glob is very efficient.
I'm happy to go with your approach though, and it looks fine as it is. My only suggestions are:
I'll do a PR with these proposals to explain it better.
Edit: See #28 for my proposal
Good points. I had not thought about the outcome of an admin needing to change the configuration. But ok, let's do as you suggested with points 1 and 2.
This should now be complete. Home directory path can be amended in config.py and the process is documented.
This might not be the normal way to handle customisation, but it should be usable for now.
Let me know if any other issues!
Re-opening this issue for one specific topic.
I looked at the home directory structure on grex.org and I'm wondering if even our generalized approach will work with it. Grex's home directories are in a format like /[a-z]/[a-z]/username, where the first letter is on the same level as the other root-level directories.
It looks like this: ls -l /
...so maybe our generalized approach will still work, using /*/*/*/.linkulator/ as the glob path. But I'm wondering if this won't slow it down significantly because it adds a ton of potential directories to the search path.
I'm not actually sure there is a solution here. Maybe we should just test it out?
Ok, I tested on grex using //// as the home dir path, and it totally choked. The asterisks in the first two slots mean that it is searching through a ton of unnecessary and huge directories, so I also tried /[a-z]/[a-z]/ as the path, and it takes about a minute and a half to run:
It's not quite as bad on SDF, but still unacceptably slow (more than 20 seconds):
So unless there is a way to optimize linkulator's search of home directory paths on larger systems, we may just need to accept that it is designed for tiny systems.
But... this gives me an idea for a future enhancement. Coming soon as a new Issue. (See issue #45)
I tested on tilde.team which has 400+ users in the standard /home/username directory structure; on tilde.town which has over 2000 users; and on tilde.club which has about 1800 users. In each of these cases, traversing the /home dir tree was super fast. Of course, this does not mean it would be fast if each user had linkulator.data, but it's hard to test that.
If the glob pattern you've specified is slow, can you try to validate how a similar operation performs in the shell? For example:
Which version of Python is python3 on each of these systems?
Also, what is the actual operating system?
Good questions.
SDF is NetBSD 8.1 with Python 3.6.9
Grex is OpenBSD 6.3 with Python 3.6.4
tilde.team is Ubuntu 18.04 with Python 3.6.9
tilde.town is Ubuntu 19.04 with Python 3.7.3
tilde.club is Fedora 30 with Python 3.7.5
I'll time those operations as soon as I have more time.
The three tildes were pretty much the same, so I just included town (the biggest) here. The results are the same or slower than in Python.
OK, seems to me that the performance is directly related to globbing.
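For comparing patterns on the Python side, a rough timing helper could look like the sketch below. The name time_glob is hypothetical, and the numbers it gives will depend heavily on filesystem caching, so a first run and a repeat run may differ a lot:

```python
import glob
import time


def time_glob(pattern):
    """Run glob.glob(pattern) once and return (match_count, seconds)."""
    start = time.perf_counter()
    matches = glob.glob(pattern)
    elapsed = time.perf_counter() - start
    return len(matches), elapsed


if __name__ == "__main__":
    # Patterns taken from this thread; adjust for the host being tested.
    for pattern in (
        "/home/*/.linkulator/linkulator.data",
        "/[a-z]/[a-z]/*/.linkulator/linkulator.data",
    ):
        count, seconds = time_glob(pattern)
        print(f"{pattern}: {count} files in {seconds:.2f}s")
```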
I wonder if there are other options, like parallel processing, or if we are just at a real system limit.
That would be an interesting challenge. There is a multiprocessing module for Python - https://docs.python.org/3.8/library/multiprocessing.html Maybe we should leave this for future consideration though since our initial usage target is just these small systems.
But also, although SDF has 8 CPUs, Grex has only 1. So it looks like we are at our true system limit on Grex.
So I'll close this one out for now, and we can open another one in the future if we want to tackle parallelization.
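If parallelization is picked up later, one possible shape is to expand the top-level directories first and glob each subtree in its own worker. The sketch below is untested on any of the hosts in this thread. It uses threads from concurrent.futures rather than the multiprocessing module linked above, on the assumption that the work is mostly waiting on the filesystem; the function and parameter names are invented for illustration:

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor


def parallel_glob(top_pattern, rest_pattern, workers=4):
    """Glob rest_pattern under each directory matching top_pattern,
    running one glob per top-level directory in a thread pool."""
    top_dirs = [d for d in glob.glob(top_pattern) if os.path.isdir(d)]

    def glob_under(directory):
        return glob.glob(os.path.join(directory, rest_pattern))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(glob_under, top_dirs)
        # Flatten the per-directory results into one sorted list
        return sorted(path for chunk in chunks for path in chunk)
```

For a Grex-style layout this might be called as parallel_glob("/[a-z]", "[a-z]/*/.linkulator/linkulator.data"). On a single-CPU machine, or when the disk itself is the bottleneck, it may not be any faster than a plain glob.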
Not to beat a dead horse, but I also tested using the 'find' command:
So that IS faster than ls, but still not nearly fast enough.