266 Commits

Author SHA1 Message Date
cheloha 19f38e0b6b msdosfs: don't pass NULL proc pointer to detrunc()
detrunc()'s proc pointer argument may be passed to vinvalbuf(9), which
under certain conditions will pass the given proc pointer to
VOP_FSYNC(9), which always asserts that the given proc pointer is
equal to curproc.

msdosfs_write(), msdosfs_inactive(), createde(), and deextend() all
pass NULL for detrunc()'s proc pointer argument.  I have no idea why.
If these detrunc() calls ever reach VOP_FSYNC(9) the kernel will
panic.

So, for example, any user with write access to an msdosfs partition
can panic the kernel by writing to the partition until they cause
ENOSPC.  That particular panic looks like this:

panic: kernel diagnostic assertion "p == curproc" failed: file "/usr/src/sys/kern/vfs_vops.c", line 305
Stopped at      db_enter+0xa:   popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*500294   8955      0    0x100003          0    1K ksh
db_enter() at db_enter+0xa
panic(ffffffff81f1b0cf) at panic+0xc4
__assert(ffffffff81fa361c,ffffffff81ee8329,131,ffffffff81f7229b) at __assert+0x3b
VOP_FSYNC(fffffd8449a78b30,ffffffffffffffff,1,0) at VOP_FSYNC+0xd6
vinvalbuf(fffffd8449a78b30,3,ffffffffffffffff,0,0,ffffffffffffffff) at vinvalbuf+0xd5
detrunc(ffff80000186f900,1fe,0,ffffffffffffffff,0) at detrunc+0x239
msdosfs_write(ffff800055774b98) at msdosfs_write+0x4a4
VOP_WRITE(fffffd8449a78b30,ffff800055774d10,3,fffffd8370e8d5d0) at VOP_WRITE+0x59
vn_write(fffffd83c723b860,ffff800055774d10,0) at vn_write+0xc0
dofilewritev(ffff8000556ecfc0,1,ffff800055774d10,0,ffff800055774dc0) at dofilewritev+0x14d
sys_write(ffff8000556ecfc0,ffff800055774dd0,ffff800055774dc0) at sys_write+0x6a
syscall(ffff800055774e70) at syscall+0x39b
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7ffffd8bf0, count: 2

This patch tweaks all the detrunc() calls in the aforementioned
msdosfs functions to pass curproc instead of a NULL pointer to
detrunc().  We don't appear to have curproc stashed anywhere in
msdosfs_write() or deextend(), so for those calls we explicitly pass
curproc.

This might have unforeseen consequences.  However,
with this patch I can no longer panic the kernel by filling an msdosfs
partition, which seems like an improvement.

With advice from gnezdo@.

ok gnezdo@
2022-08-23 20:37:16 +00:00
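The failure mode described in 19f38e0b6b can be modelled in userland. The sketch below is not kernel code: `demo_proc`, `demo_curproc`, `demo_fsync_ok()` and `demo_detrunc()` are hypothetical stand-ins for struct proc, curproc, the assertion in VOP_FSYNC(9) and detrunc(). It only illustrates why a forwarded NULL trips a `p == curproc` check while a forwarded curproc does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct proc. */
struct demo_proc {
	int pid;
};

struct demo_proc demo_proc0 = { 8955 };

/* Hypothetical stand-in for curproc: the process making the call. */
struct demo_proc *demo_curproc = &demo_proc0;

/*
 * Models the check in VOP_FSYNC(9): returns 1 if the "p == curproc"
 * assertion would hold, 0 if the kernel would panic.
 */
int
demo_fsync_ok(struct demo_proc *p)
{
	return (p == demo_curproc);
}

/*
 * Models detrunc(): the proc pointer is only forwarded, so whatever
 * the caller passes is what the assertion eventually sees.
 */
int
demo_detrunc(struct demo_proc *p)
{
	return (demo_fsync_ok(p));
}
```

Calling `demo_detrunc(NULL)` models the old callers and fails the check; calling `demo_detrunc(demo_curproc)` models the patched callers and passes it.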
jsg b1b3e58b99 remove msdosfs findwin95()
unused since msdosfs_vfsops.c 1.95
ok miod@ millert@
2022-08-15 01:47:09 +00:00
visa 4b515238de Put more struct vnode fields under splbio().
Buffer cache related struct vnode fields can be accessed in interrupt
context. Be more consistent with the use of splbio().

OK mpi@
2022-08-12 14:30:52 +00:00
visa 5c9fbd3ca3 Remove unused VOP_POLL().
OK mpi@
2022-06-26 05:20:42 +00:00
jsg 0d297f4756 spelling
ok jmc@
2022-01-11 03:13:58 +00:00
jsg a76718dddd make array bounds in unix2dosfn() prototype match function
missed when unix2dosfn() was changed with msdosfs_conv.c rev 1.15 in 2012
2021-12-23 02:12:52 +00:00
visa f1993be39f Add vnode parameter to VOP_STRATEGY()
Pass the device vnode as a parameter to VOP_STRATEGY() to allow calling
the correct vop_strategy callback. Now the vnode is also available
in the callback.

OK mpi@
2021-12-12 09:14:58 +00:00
visa 6ecc0d7f4d Clarify usage of __EV_POLL and __EV_SELECT
Make __EV_POLL specific to kqueue-based poll(2), to remove overlap
with __EV_SELECT that only select(2) uses.

OK millert@ mpi@
2021-12-11 09:28:26 +00:00
kn 5970a93587 Use long filenames by default on FAT filesystems
These days, 8.3 filenames are often a problem: filesystems containing
firmware with long names must not truncate them.  Long filenames are also
a sane default for a file system that is portable between OSes.

Although undocumented in mount_msdos(8), the default for FAT32 has been to
use long filenames ever since its import from NetBSD in 1998.

Previously, mount_msdos would ignore long filenames and default to short
filenames unless a flag was used or long ones were found on the filesystem
prior to mounting it.

Just always mount with support for long filenames (unless `-s' is used).


As various install media use FAT filesystems, adjust the remaining ones to
also pass explicit mount option reflecting the previous default.

OK deraadt
2021-11-13 18:18:59 +00:00
jsg 4c3a06e3a0 correct comment
from Jonathan Kollasch in NetBSD
2021-07-11 04:34:13 +00:00
jsg b66b9ef836 spelling
2021-03-11 13:31:35 +00:00
visa 9b0cf67b26 Refactor klist insertion and removal
Rename klist_{insert,remove}() to klist_{insert,remove}_locked().
These functions assume that the caller has locked the klist. The current
state of locking remains intact because the kernel lock is still used
with all klists.

Add new functions klist_insert() and klist_remove() that lock the klist
internally. This allows some code simplification.

OK mpi@
2020-12-25 12:59:51 +00:00
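The split between a `_locked` variant and a self-locking wrapper in 9b0cf67b26 follows a common pattern, sketched below in userland C. Everything here is hypothetical: `struct demo_klist` is not the kernel's struct klist, and the `locked` flag merely stands in for whatever lock protects the list.

```c
#include <assert.h>

/* Hypothetical list head; `locked` stands in for a real lock. */
struct demo_klist {
	int locked;	/* 1 while the lock is held */
	int count;	/* number of entries on the list */
};

/* Caller must already hold the lock, as with klist_insert_locked(). */
void
demo_klist_insert_locked(struct demo_klist *kl)
{
	assert(kl->locked);
	kl->count++;
}

/* Takes the lock itself, as with the new klist_insert(). */
void
demo_klist_insert(struct demo_klist *kl)
{
	kl->locked = 1;
	demo_klist_insert_locked(kl);
	kl->locked = 0;
}
```

Callers that already hold the lock use the `_locked` variant directly; everyone else calls the wrapper and no longer needs to open-code the lock and unlock steps.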
jsg c34741d6f3 consistently return EINVAL on invalid BPB
reverts changes from msdosfs_vfsops.c rev 1.7

Prompted by a patch from John Carmack to add an error path when exFAT
is detected on mount to give a more helpful error message.
Returning EINVAL in the existing sanity checks will make mount_msdos(8)
print "not an MSDOS filesystem" when attempting to mount exFAT and
matches historic and documented behaviour.

ok kn@
2020-08-10 05:18:46 +00:00
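The idea in c34741d6f3, that every failed sanity check yields the same errno, can be sketched as follows. This is not the code in msdosfs_vfsops.c: `struct demo_bpb` and `demo_check_bpb()` are hypothetical, and the two checks shown (power-of-two sector size between 512 and 4096, power-of-two sectors per cluster) are only examples of the kind of constraint a FAT BPB has to satisfy.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced view of a BIOS parameter block. */
struct demo_bpb {
	uint16_t bytes_per_sec;
	uint8_t  sec_per_clust;
};

/* Returns 1 if v is a non-zero power of two. */
int
demo_ispow2(unsigned int v)
{
	return (v != 0 && (v & (v - 1)) == 0);
}

/*
 * Every failed check returns the same errno, so the caller can map
 * it to a single message.
 */
int
demo_check_bpb(const struct demo_bpb *b)
{
	if (b->bytes_per_sec < 512 || b->bytes_per_sec > 4096 ||
	    !demo_ispow2(b->bytes_per_sec))
		return (EINVAL);
	if (!demo_ispow2(b->sec_per_clust))
		return (EINVAL);
	return (0);
}
```

A block whose fields are all zero fails the first check and is reported as EINVAL, just like any other malformed BPB.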
mpi 1c57bd6bc2 Rename poll-compatibility flag to better reflect what it is.
While here prefix kernel-only EV flags with two underbars.

Suggested by kettenis@, ok visa@
2020-06-11 09:18:43 +00:00
mpi 6e29a94440 Use a new EV_OLDAPI flag to match the behavior of poll(2) and select(2).
Adapt FS kqfilters to always return true when the flag is set and bypass
the polling mechanism of the NFS thread.

While here implement a write filter for NFS.

ok visa@
2020-06-08 08:04:09 +00:00
visa 9c969c9ab4 Abstract the head of knote lists. This allows extending the lists,
for example, with locking assertions.

OK mpi@, anton@
2020-04-07 13:27:50 +00:00
krw c5450bbb77 Kill some dead code that tests bits immediately after setting them.
CID 1452873
2020-03-24 14:03:30 +00:00
mpi db7aa9821c Remove unused "struct proc *" argument from the following functions:
- ufs_chown() & ufs_chmod()
- ufs_reclaim()
- ext2fs_chown() & ext2fs_chmod()
- ntfs_ntget() & ntfs_ntput()
- ntfs_vgetex(), ntfs_ntlookup() & ntfs_ntlookupfile()

While here use `ap->a_p' directly when it is only required to re-enter
the VFS layer, in order to help reduce the loop.

ok visa@
2020-02-27 09:10:31 +00:00
visa b821368957 Replace field f_isfd with field f_flags in struct filterops to allow
adding more filter properties without cluttering the struct.

OK mpi@, anton@
2020-02-20 16:56:51 +00:00
tedu 42f54f89da remove a notyet that remains more not than yet after 25 years. ok krw
2020-01-24 03:49:34 +00:00
claudio 2d6b9e38f3 struct vops is not modified during runtime so use const which moves each
into the read-only data segment.
OK deraadt@ tedu@
2020-01-20 23:21:55 +00:00
visa 94321eb495 Use C99 designated initializers with struct filterops. In addition,
make the structs const so that the data are put in .rodata.

OK mpi@, deraadt@, anton@, bluhm@
2019-12-31 13:48:31 +00:00
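For readers unfamiliar with the style adopted in 94321eb495, here is a minimal sketch of a const ops table built with C99 designated initializers. `struct demo_filterops` and its members are hypothetical and only loosely shaped like a filter ops table; they are not the kernel's struct filterops.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table, loosely shaped like a filter ops struct. */
struct demo_filterops {
	int	flags;
	int	(*attach)(void *);
	void	(*detach)(void *);
	int	(*event)(void *, long);
};

int
demo_event(void *arg, long hint)
{
	(void)arg;
	return (hint != 0);
}

void
demo_detach(void *arg)
{
	(void)arg;
}

/*
 * Members are named, so their order does not matter and omitted
 * members (here: attach) are zero-initialised.  const allows the
 * toolchain to place the table in read-only data.
 */
const struct demo_filterops demo_filtops = {
	.flags	= 1,
	.detach	= demo_detach,
	.event	= demo_event,
};
```

Compared with positional initializers, adding or reordering members in the struct does not silently shift the meaning of existing tables.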
bluhm 41f642fce2 Convert struct vfsops initializer to C99 style.
OK visa@
2019-12-26 13:28:49 +00:00
cheloha 612d413b9d msdosfs: remove timezone support
This support is undocumented, only works if you're using the kernel
timezone, and breaks during a DST shift.  It also favors file systems
managed by a Windows installation: many implementations, like ours, use
UTC by default (think: phones, digital cameras).

No complaints on tech@.

"good riddance" tedu@, "Yep." deraadt@
2019-09-04 14:40:22 +00:00
anton 836f297b39 Allow concurrent reads of the f_offset field of struct file by
serializing both read/write operations using the existing file mutex.
The vnode lock still grants exclusive write access to the offset; the
mutex is only used to make the actual write atomic and prevent any
concurrent reader from observing intermediate values.

ok mpi@ visa@
2019-08-05 08:35:59 +00:00
cheloha a8d7c3beb6 vinvalbuf(9): tsleep(9) -> tsleep_nsec(9); ok millert@
2019-07-25 01:43:20 +00:00
cheloha 570df5c46e getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@
2019-07-19 00:24:31 +00:00
solene 6784024ee2 Revert anton@ changes about read/write unlocking
https://marc.info/?l=openbsd-cvs&m=156277704122293&w=2

ok anton@
2019-07-12 13:56:27 +00:00
anton d038d3d544 Make read/write of the f_offset field belonging to struct file MP-safe;
as part of the effort to unlock the kernel. Instead of relying on the
vnode lock, introduce a dedicated lock per file. Exclusive write access
is granted using the new foffset_enter and foffset_leave API. A
convenience function foffset_get is also available for threads that only
need to read the current offset.

The lock acquisition order in vn_write has been changed to match the one
in vn_read in order to avoid a potential deadlock. This change also gets
rid of a documented race in vn_read().

Inspired by the FreeBSD implementation.

With help and ok mpi@ visa@
2019-07-10 16:43:19 +00:00
anton 6bd4f7ca0e Introduce a dedicated entry point data structure for file locks. This new data
structure allows for better tracking of pending lock operations which is
essential in order to prevent a use-after-free once the underlying vnode is
gone.

Inspired by the lockf implementation in FreeBSD.

ok visa@

Reported-by: syzbot+d5540a236382f50f1dac@syzkaller.appspotmail.com
2019-01-21 18:09:21 +00:00
visa 5ff674a53d Drop redundant "node == parent node" checks from VOP_RMDIR()
implementations. Rely on the VFS layer to do the checking.

OK mpi@, helg@
2018-06-21 14:17:23 +00:00
visa 4dd4d774e8 Make callers of VOP_CREATE(9) and VOP_MKNOD(9) responsible for
unlocking the directory vnode.

OK mpi@, helg@
2018-06-07 13:37:27 +00:00
visa 08107a0b7d Drop unnecessary `p' parameter from vget(9).
OK mpi@
2018-05-27 06:02:14 +00:00
mpi 90aa86e4e3 Implement VFS read clustering for MSDOSFS, take 3.
With sf@, inputs from krw@, tested by many, ok visa@
2018-05-07 14:43:01 +00:00
visa 6e88053469 Remove proc from the parameters of vn_lock(). The parameter is
unnecessary because curproc always does the locking.

OK mpi@
2018-05-02 02:24:55 +00:00
visa 36bb23f12a Clean up the parameters of VOP_LOCK() and VOP_UNLOCK(). It is always
curproc that does the locking or unlocking, so the proc parameter
is pointless and can be dropped.

OK mpi@, deraadt@
2018-04-28 03:13:04 +00:00
visa d78cb2ffda Use RWL_IS_VNODE with locks that are acquired through VOP_LOCK(),
to appease WITNESS. ext2fs and ffs already use the flag. The same
locking pattern appears with other file systems too, so this patch
addresses the remaining cases.

OK mpi@
2018-03-28 16:34:28 +00:00
deraadt 976e983900 Synchronize filesystems to disk when suspending. Each mountpoint's vnodes
are pushed to disk.  Dangling vnodes (unlinked files still in use) and
vnodes undergoing change by long-running syscalls are identified -- and
such filesystems are marked dirty on-disk while we are suspended (in case
power is lost, a fsck will be required).  Filesystems without dangling or
busy vnodes are marked clean, resulting in faster boots following
"battery died" circumstances.
Tested by numerous developers, thanks for the feedback.
2018-02-10 05:24:23 +00:00
guenther 4b1f64dcf5 Stop assuming <sys/file.h> will pull in fcntl.h when _KERNEL is defined.
ok millert@ sthen@
2018-01-02 06:38:45 +00:00
guenther c0cd348992 Don't pull in <sys/file.h> just to get fcntl.h
ok deraadt@ krw@
2017-12-30 23:08:29 +00:00
guenther 98edb555c6 Delete unnecessary <sys/file.h> includes
ok millert@ krw@
2017-12-30 20:46:59 +00:00
deraadt 7efda1a11d In uvm Chuck decided backing store would not be allocated proactively
for blocks re-fetchable from the filesystem.  However at reboot time,
filesystems are unmounted, and since processes lack backing store they
are killed. Since the scheduler is still running, in some cases init is
killed... which drops us to ddb [noted by bluhm].  Solution is to convert
filesystems to read-only [proposed by kettenis]. The tale follows:
sys_reboot() should pass proc * to MD boot() to vfs_shutdown() which
completes current IO with vfs_busy VB_WRITE|VB_WAIT, then calls VFS_MOUNT()
with MNT_UPDATE | MNT_RDONLY, soon teaching us that *fs_mount() calls a
copyin() late... so store the sizes in vfsconflist[] and move the copyin()
to sys_mount()... and notice nfs_mount copyin() is size-variant, so kill
legacy struct nfs_args3.  Next we learn ffs_mount()'s MNT_UPDATE code is
sharp and rusty especially wrt softdep, so fix some bugs and add
~MNT_SOFTDEP to the downgrade.  Some vnodes need a little more help,
so tie them to &dead_vnops.

ffs_mount calling DIOCCACHESYNC is causing a bit of grief still but
this issue is separate and will be dealt with in time.
couple hundred reboots by bluhm and myself, advice from guenther and
others at the hut
2017-12-11 05:27:40 +00:00
sf d7df32c5d9 msdosfs: Add new CLUST_END constant
(forgot to commit fat.h)

Add new CLUST_END and use it as parameter to pcbmap() when searching
for end cluster, instead of explicitly passing 0xffff. This fixes potential
problem for FAT32, where cluster number may be legally bigger than 0xffff.

Also change clusteralloc() so that fillwith is not explicitly passed by caller
anymore (there is no need to use anything other than CLUST_EOFE).

From NetBSD commit by jdolecek@NetBSD.org

ok tb@ mpi@
2017-08-14 22:45:12 +00:00
sf d388903626 msdosfs: Add new CLUST_END constant
Add new CLUST_END and use it as parameter to pcbmap() when searching
for end cluster, instead of explicitly passing 0xffff. This fixes potential
problem for FAT32, where cluster number may be legally bigger than 0xffff.

Also change clusteralloc() so that fillwith is not explicitly passed by caller
anymore (there is no need to use anything other than CLUST_EOFE).

From NetBSD commit by jdolecek@NetBSD.org

ok tb@ mpi@
2017-08-14 22:43:56 +00:00
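Why 0xffff is a poor "search to the end" argument on FAT32 can be shown with a small model. The sketch below is hypothetical: `demo_walk()` only imitates a pcbmap()-style walk over file-relative cluster numbers, and the value of `DEMO_CLUST_END` is an assumption made for illustration, not a copy of the definition in fat.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant lives in fat.h. */
#define DEMO_CLUST_END	0xffffffffU

/*
 * Hypothetical model of a pcbmap()-style search: walk a chain of
 * nclusters clusters looking for file-relative cluster findcn and
 * return the index at which the walk stops.
 */
uint32_t
demo_walk(uint32_t nclusters, uint32_t findcn)
{
	uint32_t cn;

	for (cn = 0; cn + 1 < nclusters; cn++) {
		if (cn == findcn)
			break;		/* found the requested cluster */
	}
	return (cn);
}
```

For a short chain, `demo_walk(10, 0xffff)` stops at the last cluster as intended. For a chain of 0x20000 clusters, `demo_walk(0x20000, 0xffff)` stops early at cluster 0xffff, whereas passing `DEMO_CLUST_END` reaches the real end of the chain.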
sf cc40aa10d7 minor msdosfs tweaks
* add to comments for pcbmap()
* remove useless ";"

ok tb@
2017-08-13 23:36:27 +00:00
sf 659e968ee6 Revert 'Implement VFS read clustering for MSDOSFS' again
This has again caused regressions, this time when reading from msdosfs.

This reverts

        denode.h 1.31
        msdosfs_vnops.c 1.114

Requested by deraadt@
2017-06-13 18:13:18 +00:00
sf a34a8c9c6a msdosfs & ffs: flush cache if updating mount from r/w to r/o
ok deraadt@
2017-05-29 14:07:16 +00:00
sf f4aaab5fd4 Implement VFS read clustering for MSDOSFS
This is the reverted commit by mpi@ from msdosfs_vnops.c 1.105 plus some
additional tweaks to fix some cluster/block number confusion that led
to regressions when seeking past the end of a file.

The original commit message was:

  The logic used in msdosfs_bmap() to loop calling pcbmap() comes from
  FreeBSD and is not really efficient but it is good enough since it is
  only called when generating I/O.

  With this diff I get a 100% improvement when reading big files from a
  crappy USB stick.

  With this and bread_cluster(9) modified to not re-fetch B_CACHED buffers,
  reading large contiguous files with chunk sizes of MAXPHYS is almost as
  fast as physio(9) on the same device.

  For a 'real world' example, when copying music files from a USB stick I
  see a speed jump from 15MB/s on -current to 24MB/s with this diff.

  While here rename some 'lbn' variables into 'cn' to better reflect what
  we're dealing with.

  Tested by Mathieu, with support from deraadt@

ok mpi@
2017-05-29 13:48:12 +00:00
visa 6f84e71dcd Tweak lock inits to make the system runnable with witness(4)
on amd64 and i386.
2017-04-20 14:13:00 +00:00
bluhm ba4520cc08 Rename BIOS parameter block field from bsPBP to bsBPB. This typo
was fixed in FreeBSD in 2002.  No binary change.
From Alexander von Gernler; OK krw@
2016-10-10 00:34:50 +00:00