fsnotify is a backend for filesystem notification. fsnotify does
not provide any userspace interface but does provide the basis
needed for other notification schemes such as dnotify. fsnotify
can be extended to be the backend for inotify or the upcoming
fanotify. fsnotify provides a mechanism for "groups" to register for
some set of filesystem events and to then deliver those events to
those groups for processing.
fsnotify has a number of benefits, the first being actually shrinking the size
of an inode. Before fsnotify to support both dnotify and inotify an inode had
unsigned long i_dnotify_mask; /* Directory notify events */
struct dnotify_struct *i_dnotify; /* for directory notifications */
struct list_head inotify_watches; /* watches on this inode */
struct mutex inotify_mutex; /* protects the watches list
But with fsnotify this same functionallity (and more) is done with just
__u32 i_fsnotify_mask; /* all events for this inode */
struct hlist_head i_fsnotify_mark_entries; /* marks on this inode */
That's right, inotify, dnotify, and fanotify all in 64 bits. We used that
much space just in inotify_watches alone, before this patch set.
fsnotify object lifetime and locking is MUCH better than what we have today.
inotify locking is incredibly complex. See 8f7b0ba1c8 as an example of
what's been busted since inception. inotify needs to know internal semantics
of superblock destruction and unmounting to function. The inode pinning and
vfs contortions are horrible.
no fsnotify implementers do allocation under locks. This means things like
f04b30de3 which (due to an overabundance of caution) changes GFP_KERNEL to
GFP_NOFS can be reverted. There are no longer any allocation rules when using
or implementing your own fsnotify listener.
fsnotify paves the way for fanotify. In brief fanotify is a notification
mechanism that delivers the lisener both an 'event' and an open file descriptor
to the object in question. This means that fanotify is pathname agnostic.
Some on lkml may not care for the original companies or users that pushed for
TALPA, but fanotify was designed with flexibility and input for other users in
mind. The readahead group expressed interest in fanotify as it could be used
to profile disk access on boot without breaking the audit system. The desktop
search groups have also expressed interest in fanotify as it solves a number
of the race conditions and problems present with managing inotify when more
than a limited number of specific files are of interest. fanotify can provide
for a userspace access control system which makes it a clean interface for AV
vendors to hook without trying to do binary patching on the syscall table,
LSM, and everywhere else they do their things today. With this patch series
fanotify can be implemented in less than 1200 lines of easy to review code.
Almost all of which is the socket based user interface.
This patch series builds fsnotify to the point that it can implement
dnotify and inotify_user. Patches exist and will be sent soon after
acceptance to finish the in kernel inotify conversion (audit) and implement
fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
SH4's BUG() seems to confuse the compiler as it is considered to
return; thus, some functions would trigger usage of uninitialized
variables or non-void functions returning void.
Work around by initializing/returning.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up all
kslowd threads. This is not what we want, so we change slow_work_thread() to
use prepare_to_wait_exclusive() instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
[libata] ata_piix: Enable parallel scan
sata_nv: use hardreset only for post-boot probing
[libata] ahci: Restore SB600 SATA controller 64 bit DMA
ata_piix: Remove stale comment
ata_piix: Turn on hotplugging support for older chips
ahci: misc cleanups for EM stuff
[libata] get rid of ATA_MAX_QUEUE loop in ata_qc_complete_multiple() v2
sata_sil: enable 32-bit PIO
sata_sx4: speed up ECC initialization
libata-sff: avoid byte swapping in ata_sff_data_xfer()
[libata] ahci: use less error-prone array initializers
The IBS implemention writes 64 bit register values to the cpu buffer
by writing two 32 values using oprofile_add_data(). This patch
introduces oprofile_add_data64() to write a single 64 bit value to the
buffer.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The IBS code internally uses 32 bit values (a low and a high value) to
represent a 64 bit value. This patch changes this and now 64 bit
values are used instead. 64 bit MSR functions can be used now.
No functional changes.
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch removes struct op_saved_msr and replaces it by an u64
variable. This makes code easier and it is possible to use 64 bit MSR
functions.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Caused by an API update. The return value can be safely ignored, as
there is notthing we can do with it.
Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
block: add request clone interface (v2)
floppy: fix hibernation
ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
fs/bio.c: add missing __user annotation
block: prevent possible io_context->refcount overflow
Add serial number support for virtio_blk, V4a
block: Add missing bounce_pfn stacking and fix comments
Revert "block: Fix bounce limit setting in DM"
cciss: decode unit attention in SCSI error handling code
cciss: Remove no longer needed sendcmd reject processing code
cciss: change SCSI error handling routines to work with interrupts enabled.
cciss: separate error processing and command retrying code in sendcmd_withirq_core()
cciss: factor out fix target status processing code from sendcmd functions
cciss: simplify interface of sendcmd() and sendcmd_withirq()
cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
block: needs to set the residual length of a bidi request
Revert "block: implement blkdev_readpages"
block: Fix bounce limit setting in DM
Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
...
Manually fix conflicts with tracing updates in:
block/blk-sysfs.c
drivers/ide/ide-atapi.c
drivers/ide/ide-cd.c
drivers/ide/ide-floppy.c
drivers/ide/ide-tape.c
include/trace/events/block.h
kernel/trace/blktrace.c
Ftrace on sh handles nop'ing out trace function calls differently than
other architectures. Instead of inserting NOP instructions in place of
the call to the function tracer we branch over the call instructions
and continue executing the main body of the function.
This patch fixes a bug in the implementation of ftrace_modify_code()
where we check that the old value of the code we're about to replace
is an expected one. In the ftrace_make_call() code path
ftrace_modify_code() was comparing the old instruction value with NOP
instructions. The compare was failing because we never actually insert
NOP instructions. It makes sense to just get rid of the NOP
instructions in ftrace_nop and compare the old code with the address
of the function body if we're expecting ftrace to have nop'd out the
function trace call.
Signed-off-by: Matt Fleming <matt@console-pimps.org>
The patch replaces all CTRL_SET_*ACTIVE macros. 64 bit MSR functions
and 64 bit counter values are used now. The code uses bit masks from
<asm/intel_arch_perfmon.h>.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The patch replaces all CTR_OVERFLOWED macros. 64 bit MSR functions and
64 bit counter values are used now. Thus, it will be easier to later
extend the models to use more than 32 bit width counters.
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch introduces op_x86_get_ctrl() to calculate the value of the
performance control register. This is generic code usable for all
models. The event and reserved masks are model specific and stored in
struct op_x86_model_spec. 64 bit MSR functions are used now. The patch
removes many hard to read macros used for ctrl calculation.
The function op_x86_get_ctrl() is common code and the first step to
further merge performance counter implementations for x86 models.
Signed-off-by: Robert Richter <robert.richter@amd.com>
In follow-on patches the setup_ctrs() functions will need data that
describes the model. This patch extends the function argument list to
pass a pointer of the model to these function.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The use of the macros has no effect. The oprofilefs has to be extended
first to support these features.
Signed-off-by: Robert Richter <robert.richter@amd.com>
This patch fixes missing braces around macro parameters. Macro
definitions from intel_arch_perfmon.h are used where possible.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The macros CTRL_READ() and CTRL_WRITE() make the code hard to read and
maintain. This patch replaces them by rdmsr()/wrmsr() functions and
simplifies the code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The macros CTRL_READ() and CTRL_WRITE() make the code hard to read and
maintain. This patch replaces them by rdmsr()/wrmsr() functions and
simplifies the code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The macros CTRL_READ() and CTRL_WRITE() make the code hard to read and
maintain. This patch replaces them by rdmsr()/wrmsr() functions and
simplifies the code.
Signed-off-by: Robert Richter <robert.richter@amd.com>
There are duplicate macro implementations in model specific code. This
patch moves all common macros to op_x86_model.h.
Signed-off-by: Robert Richter <robert.richter@amd.com>
The end brace for a larger for statement was placed inside the #else
part of #ifdef TIMER_VECT1. However, for all current chips, the
define TIMER_VECT1 is always unset, and the error was never triggered.
Move the brace down below the #endif.
Fixes:
http://bugzilla.kernel.org/show_bug.cgi?id=13476
Reported-by: Martin Ettl <ettl.martin@gmx.de>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Mikael Starvik <mikael.starvik@axis.com>
The code didn't test the pointer to the newly allocated
memory, but a parameter sent in as value.
Since the input parameter was most often set, the code
would have used a null pointer if the kmalloc failed.
If the input parameter was not set, the code would
leak the allocated buffer.
http://bugzilla.kernel.org/show_bug.cgi?id=11363
Reported-by: Daniel Marjamäki <danielm77@spray.se>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
This fix triggering the WARN_ON_ONCE(in_irq() || irqs_disabled()); in
local_bh_enable().
Here is no need to grab this lock, this was wrong at all and may
cause a deadlock and access to freed memory, since on a TEI remove
the current listelement can be deleted under us. So this is clearly
a case for list_for_each_entry_safe.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
If we get no interrupts for after 3 resets we need to unregister
the interrupt function, which is already done outside the loop.
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
KVM: Prevent overflow in largepages calculation
KVM: Disable large pages on misaligned memory slots
KVM: Add VT-x machine check support
KVM: VMX: Rename rmode.active to rmode.vm86_active
KVM: Move "exit due to NMI" handling into vmx_complete_interrupts()
KVM: Disable CR8 intercept if tpr patching is active
KVM: Do not migrate pending software interrupts.
KVM: inject NMI after IRET from a previous NMI, not before.
KVM: Always request IRQ/NMI window if an interrupt is pending
KVM: Do not re-execute INTn instruction.
KVM: skip_emulated_instruction() decode instruction if size is not known
KVM: Remove irq_pending bitmap
KVM: Do not allow interrupt injection from userspace if there is a pending event.
KVM: Unprotect a page if #PF happens during NMI injection.
KVM: s390: Verify memory in kvm run
KVM: s390: Sanity check on validity intercept
KVM: s390: Unlink vcpu on destroy - v2
KVM: s390: optimize float int lock: spin_lock_bh --> spin_lock
KVM: s390: use hrtimer for clock wakeup from idle - v2
KVM: s390: Fix memory slot versus run - v3
...
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: remove never-used in6_addr option
cifs: add addr= mount option alias for ip=
[CIFS] Add mention of new mount parm (forceuid) to cifs readme
cifs: make overriding of ownership conditional on new mount options
cifs: fix IPv6 address length check
cifs: clean up set_cifs_acl interfaces
cifs: reorganize get_cifs_acl
[CIFS] Update readme to indicate change to default mount (serverino)
cifs: make serverino the default when mounting
cifs: rename cifs_iget to cifs_root_iget
cifs: make cnvrtDosUnixTm take a little-endian args and an offset
cifs: have cifs_NTtimeToUnix take a little-endian arg
cifs: tighten up default file_mode/dir_mode
cifs: fix artificial limit on reading symlinks
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (49 commits)
ext4: Avoid corrupting the uninitialized bit in the extent during truncate
ext4: Don't treat a truncation of a zero-length file as replace-via-truncate
ext4: fix dx_map_entry to support 256k directory blocks
ext4: truncate the file properly if we fail to copy data from userspace
ext4: Avoid leaking blocks after a block allocation failure
ext4: Change all super.c messages to print the device
ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle()
ext4: super.c whitespace cleanup
jbd2: Fix minor typos in comments in fs/jbd2/journal.c
ext4: Clean up calls to ext4_get_group_desc()
ext4: remove unused function __ext4_write_dirty_metadata
ext2: Fix memory leak in ext2_fill_super() in case of a failed mount
ext3: Fix memory leak in ext3_fill_super() in case of a failed mount
ext4: Fix memory leak in ext4_fill_super() in case of a failed mount
ext4: down i_data_sem only for read when walking tree for fiemap
ext4: Add a comprehensive block validity check to ext4_get_blocks()
ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_state
ext4: Merge ext4_da_get_block_write() into mpage_da_map_blocks()
ext4: Add BUG_ON debugging checks to noalloc_get_block_write()
ext4: Add documentation to the ext4_*get_block* functions
...
* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits)
ide-tape: fix debug call
alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC
ide-dma: don't reset request fields on dma_timeout_retry()
ide: drop rq->data handling from ide_map_sg()
ide-atapi: kill unused fields and callbacks
ide-tape: simplify read/write functions
ide-tape: use byte size instead of sectors on rw issue functions
ide-tape: unify r/w init paths
ide-tape: kill idetape_bh
ide-tape: use standard data transfer mechanism
ide-tape: use single continuous buffer
ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len
ide-tape,floppy: fix failed command completion after request sense
ide-pm: don't abuse rq->data
ide-cd,atapi: use bio for internal commands
ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer
ide-cd: convert to using generic sense request
ide: add helpers for preparing sense requests
ide-cd: don't abuse rq->buffer
ide-atapi: don't abuse rq->buffer
...
Slab is initialized before the console subsystem so use the slab allocator in
vgacon_scrollback_startup().
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Don't hardcode to node zero for early boot IRQ setup memory allocations.
[ penberg@cs.helsinki.fi: minor cleanups ]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Now that kmem_cache_init() happens before console_init(), we should use
kzalloc() and not the bootmem allocator.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>