The KEBA CP500 system FPGA is a PCIe device, which consists of multiple
IP cores. Every IP core has its own auxiliary driver. The cp500 driver
registers an auxiliary device for each device and the corresponding
drivers are loaded by the Linux driver infrastructure.
Currently 3 variants of this device exists. Every variant has its own
PCI device ID, which is used to determine the list of available IP
cores. In this first version only the auxiliary device for the I2C
controller is registered.
Besides the auxiliary device registration some other basic functions of
the FPGA are implemented; e.g, FPGA version sysfs file, keep FPGA
configuration on reset sysfs file, error message for errors on the
internal AXI bus of the FPGA.
Signed-off-by: Gerhard Engleder <eg@keba.com>
Link: https://lore.kernel.org/r/20240630194740.7137-2-gerhard@engleder-embedded.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DDR4 DIMMS implementing SPD Annex L, Revision 1 do not implement SPD byte
14 (Module Temperature Sensor); this byte was only added in revision 2 of
the standard. This only applies to DDR4, not DDR4E or LPDDR4, since those
DDR types were only introduced in revision 3 of the standard.
Use this information to instantiate the jc42 device if the module is a DDR4
following SPD revision 1.0 and a device is detected at the expected thermal
sensor address, even if the Module Temperature Sensor byte suggests that
the thermal sensor is not supported.
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240629173716.20389-2-linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Instantiating a device by calling i2c_new_client_device() assumes that the
device is not already instantiated. If that is not the case, it will return
an error and generate a misleading kernel log message.
i2c i2c-0: Failed to register i2c client jc42 at 0x18 (-16)
This can be reproduced by unloading the ee1004 driver and loading it again.
Avoid this by calling i2c_new_scanned_device() instead, which returns
silently if a device is already instantiated or does not exist.
Fixes: 393bd1000f ("eeprom: ee1004: add support for temperature sensor")
Cc: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20240629173716.20389-1-linux@roeck-us.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently the driver does not register a nvmem provider, which means
that userspace programs cannot access the ee1004 EEPROM through the
standard nvmem sysfs API.
Fix this by replacing the custom sysfs attribute with a standard nvmem
interface, which also takes care of backwards compatibility.
Tested on a Dell Inspiron 3505.
Signed-off-by: Armin Wolf <W_Armin@gmx.de>
Link: https://lore.kernel.org/r/20240625063459.429953-2-W_Armin@gmx.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ata_sas_port_alloc() wrapper mainly exists in order to export the
internal libata function which it wraps. The secondary reason is that
it initializes some ata_port struct members.
However, ata_sas_port_alloc() is only used in a single location,
sas_ata_init(), which already performs some ata_port struct member
initialization, so it does not make sense to spread this initialization
out over two separate locations.
Thus, remove the wrapper and instead export the libata function directly,
and move the libsas specific ata_port initialization to sas_ata_init(),
which already does some ata_port initialization.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240703184418.723066-19-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Currently, the ata_port print_ids are increased indefinitely, even when
there are lower ids available.
E.g. on first boot you will have ata1-ata6 assigned.
After a rmmod + modprobe, you will instead have ata7-ata12 assigned.
Move to use the ida_alloc() API, such that print_ids will get reused.
This means that even after a rmmod + modprobe, the ports will be assigned
print_ids ata1-ata6.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240703184418.723066-18-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
While the assignment of ap->print_id could have been moved to
ata_host_alloc(), let's simply move it to ata_port_alloc().
If you allocate a port, you want to give it a unique name that can be used
for printing.
By moving the ap->print_id assignment to ata_port_alloc(), means that we
can also remove the ap->print_id assignment from ata_sas_port_alloc().
This will allow a LLD to use the ata_port_*() print functions before
ata_host_register() has been called.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240703184418.723066-17-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Commit f31871951b ("libata: separate out ata_host_alloc() and
ata_host_register()") added ata_host_alloc(), where the API allowed
a LLD to overallocate the number of ports supplied to ata_host_alloc(),
as long as the LLD decreased host->n_ports before calling
ata_host_register().
However, this functionally has never ever been used by a single LLD.
Because of the current API design, the assignment of ap->print_id is
deferred until registration time, which is bad, because that means that
the ata_port_*() print functions cannot be used by a LLD until after
registration time, which means that a LLD is forced to use a print
function that is non-port specific, even for a port specific error.
Remove the support for decreasing the number of ports, such that it will
be possible to assign ap->print_id earlier.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240703184418.723066-14-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
The pag in xfs_ag_resv_rmapbt_alloc() is already held when the struct
xfs_btree_cur is initialized in xfs_rmapbt_init_cursor(), so there is no
need to get pag again.
On the other hand, in xfs_rmapbt_free_block(), the similar function
xfs_ag_resv_rmapbt_free() was removed in commit 92a005448f ("xfs: get
rid of unnecessary xfs_perag_{get,put} pairs"), xfs_ag_resv_rmapbt_alloc()
was left because scrub used it, but now scrub has removed it. Therefore,
we could get rid of xfs_ag_resv_rmapbt_alloc() just like the rmap free
block, make the code cleaner.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Jonathan writes:
IIO: 2nd set of new device support, features and cleanup for 6.11
The big one here is we finally have Paul Cercueil's (and others)
DMA buffer support for IIO devices enabling high speed zero
copy transfer of data to and from sensors supported by IIO (and for
example USB). This should aid with upstream support of a range of
higher performance ADCs and DACs.
Two merges from other trees
- spi/spi_devm_optimize used for simplification in ad7944.
- dmaengine/topic_dma_vec to enable the DMABUF series.
One feature with impact outside IIO.
- Richer set of dev_err_probe() like helpers to cover ERR_PTR() cases.
New device support
==================
adi,ad7173
- Add support for AD4111, AD4112, AD4114, AD4115 and ADC4116 pseudo
differential ADCs. Major driver rework was needed to enabled these.
adi,ad7944
- Use devm_spi_optimize_message() to avoid a local devm cleanup
callback. This is the example case from the patch set, others will
follow.
mediatek,mt6359-auxadc
- New driver for this ADC IP found in MT6357, MT6358 and MT6359 PMICs.
st,accel
- Add support for the LIS2DS12 accelerometer
ti,ads1119
- New driver for this 16 bit 2-differential or 4-single ended channel
ADC.
Features
========
dt-bindings
- Introduce new common-mode-channel property to help handle pseudo
differential ADCs where we have something that looks like one side
of differential input, but which is only suited for use with a
slow moving reference.
adi,adf4350
- Support use as a clock provider.
iio-hmwon
- Support reading of labels from IIO devices by their consumers and
use this in the hwmon bridge.
Cleanup and minor fixes
=======================
Treewide
- Use regmap_clear_bits() / regmap_set_bits() to simplify open coded
equivalents.
- Use devm_regulator_get_enable_read_voltage() to replace equivalent
opencoded boilerplate. In some cases enabled complete conversion to
devm handling and removal of explicit remove() callbacks.
- Introduce dev_err_ptr_probe() and other variants and make use of
of them in a couple of examples driver cleanups. Will find use in
many more drivers soon.
adi,ad7192
- Introduce local struct device *dev and use dev_err_probe() to give
more readable code.
adi,adi-axi-adc/dac
- Improved consistency of messages using dev_err_probe()
adi,adis
- Split the trigger handling into cases that needed paging and those that
don't resulting in more readable code.
- Use cleanup.h to simplify error paths via scoped cleanup.
- Add adis specific lock helpers and make use of them in a number of drivers.
adi,ad7192
- Update maintainer (Alisa-Dariana Roman)
adi,ad7606
- dt-binding cleanup.
avago,apds9306
- Add a maintainer entry (Subhajit Ghosh)
linear,ltc2309
- Fix a wrong endian type.
st,stm32-dfsdm
- Fix a missing port property in the dt-binding.
st,sensors
- Relax whoami match failure to a warning print rather than probe failure.
This enables fallback compatibles to existing parts from those that don't
necessarily even exit yet.
* tag 'iio-for-6.11b' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (112 commits)
iio: adc: ad7173: Fix uninitialized symbol is_current_chan
iio: adc: Add support for MediaTek MT6357/8/9 Auxiliary ADC
math.h: Add unsigned 8 bits fractional numbers type
dt-bindings: iio: adc: Add MediaTek MT6359 PMIC AUXADC
iio: common: scmi_iio: convert to dev_err_probe()
iio: backend: make use of dev_err_cast_probe()
iio: temperature: ltc2983: convert to dev_err_probe()
dev_printk: add new dev_err_probe() helpers
iio: xilinx-ams: Add labels
iio: adc: ad7944: use devm_spi_optimize_message()
Documentation: iio: Document high-speed DMABUF based API
iio: buffer-dmaengine: Support new DMABUF based userspace API
iio: buffer-dma: Enable support for DMABUFs
iio: core: Add new DMABUF interface infrastructure
MAINTAINERS: Update AD7192 driver maintainer
iio: adc: ad7192: use devm_regulator_get_enable_read_voltage
iio: st_sensors: relax WhoAmI check in st_sensors_verify_id()
MAINTAINERS: Add AVAGO APDS9306
dt-bindings: iio: adc: adi,ad7606: comment and sort the compatible names
dt-bindings: iio: adc: adi,ad7606: add missing datasheet link
...
A couple copy/paste mistakes in the code that selects steering targets
for OADDRM and INSTANCE0 unintentionally clobbered the steering target
for DSS ranges in some cases.
The OADDRM/INSTANCE0 values were also not assigned as intended, although
that mistake wound up being harmless since the desired values for those
specific ranges were '0' which the kzalloc of the GT structure should
have already taken care of implicitly.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240626210536.1620176-2-matthew.d.roper@intel.com
(cherry picked from commit 4f82ac6102)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
If a queue is already assigned to the hardware, then a newly submitted
job can start straight away without waiting for the tick. However in
this case the devfreq infrastructure isn't notified that the GPU is
busy. By the time the tick happens the job might well have finished and
no time will be accounted for the GPU being busy.
Fix this by recording the GPU as busy directly in queue_run_job() in the
case where there is a CSG assigned and therefore we just ring the
doorbell.
Fixes: de85488138 ("drm/panthor: Add the scheduler logical block")
Signed-off-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240703155646.80928-1-steven.price@arm.com
Merge MD fixes from Song:
"This PR contains various small fixes by Yu Kuai,
Benjamin Marzinski, Christophe JAILLET, and Yang Li."
* tag 'md-6.11-20240704' of git://git.kernel.org/pub/scm/linux/kernel/git/song/md:
md/raid5: recheck if reshape has finished with device_lock held
md: Don't wait for MD_RECOVERY_NEEDED for HOT_REMOVE_DISK ioctl
md-cluster: Constify struct md_cluster_operations
md: Remove unneeded semicolon
md/raid5: fix spares errors about rcu usage
Unlike T2 Macs with Butterfly keyboard, who have their keyboard backlight
on the USB device the T2 Macs with Magic keyboard have their backlight on
the Touchbar backlight device (05ac:8102).
Support for Butterfly keyboards has already been added in
commit 9018eacbe6 ("HID: apple: Add support for keyboard backlight on
certain T2 Macs.") This patch adds support for the Magic keyboards.
Signed-off-by: Orlando Chamberlain <orlandoch.dev@gmail.com>
Co-developed-by: Aditya Garg <gargaditya08@live.com>
Signed-off-by: Aditya Garg <gargaditya08@live.com>
Link: https://patch.msgid.link/E1D444EA-7FD0-42DA-B198-50B0F03298FB@live.com
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
Currently users of the interrupt simulator don't have any way of being
notified about interrupts from the simulated domain being requested or
released. This causes a problem for one of the users - the GPIO
simulator - which is unable to lock the pins as interrupts.
Define a structure containing callbacks to be executed on various
irq_sim-related events (for now: irq request and release) and provide an
extended function for creating simulated interrupt domains that takes it
and a pointer to custom user data (to be passed to said callbacks) as
arguments.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20240624093934.17089-2-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
The AIL pushing code spends a huge amount of time skipping over
items that are already marked as flushing. It is not uncommon to
see hundreds of thousands of items skipped every second due to inode
clustering marking all the inodes in a cluster as flushing when the
first one is flushed.
However, to discover an item is already flushing and should be
skipped we have to call the iop_push() method for it to try to flush
the item. For inodes (where this matters most), we have to first
check that inode is flushable first.
We can optimise this overhead away by tracking whether the log item
is flushing internally. This allows xfsaild_push() to check the log
item directly for flushing state and immediately skip the log item.
Whilst this doesn't remove the CPU cache misses for loading the log
item, it does avoid the overhead of an indirect function call
and the cache misses involved in accessing inode and
backing cluster buffer structures to determine flushing state. When
trying to flush hundreds of thousands of inodes each second, this
CPU overhead saving adds up quickly.
It's so noticeable that the biggest issue with pushing on the AIL on
fast storage becomes the 10ms back-off wait when we hit enough
pinned buffers to break out of the push loop but not enough for the
AIL pushing to be considered stuck. This limits the xfsaild to about
70% total CPU usage, and on fast storage this isn't enough to keep
the storage 100% busy.
The xfsaild will block on IO submission on slow storage and so is
self throttling - it does not need a backoff in the case where we
are really just breaking out of the walk to submit the IO we have
gathered.
Further with no backoff we don't need to gather huge delwri lists to
mitigate the impact of backoffs, so we can submit IO more frequently
and reduce the time log items spend in flushing state by breaking
out of the item push loop once we've gathered enough IO to batch
submission effectively.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The grant heads in the log track the space reserved in the log for
running transactions. They do this by tracking how far ahead of the
tail that the reservation has reached, and the units for doing this
are {cycle,bytes} for the reserve head rather than {cycle,blocks}
which are normal used by LSNs.
This is annoyingly complex because we have to split, crack and
combined these tuples for any calculation we do to determine log
space and targets. This is computationally expensive as well as
difficult to do atomically and locklessly, as well as limiting the
size of the log to 2^32 bytes.
Really, though, all the grant heads are tracking is how much space
is currently available for use in the log. We can track this as a
simply byte count - we just don't care what the actual physical
location in the log the head and tail are at, just how much space we
have remaining before the head and tail overlap.
So, convert the grant heads to track the byte reservations that are
active rather than the current (cycle, offset) tuples. This means an
empty log has zero bytes consumed, and a full log is when the
reservations reach the size of the log minus the space consumed by
the AIL.
This greatly simplifies the accounting and checks for whether there
is space available. We no longer need to crack or combine LSNs to
determine how much space the log has left, nor do we need to look at
the head or tail of the log to determine how close to full we are.
There is, however, a complexity that needs to be handled. We know
how much space is being tracked in the AIL now via log->l_tail_space
and the log tickets track active reservations and return the unused
portions to the grant heads when ungranted. Unfortunately, we don't
track the used portion of the grant, so when we transfer log items
from the CIL to the AIL, the space accounted to the grant heads is
transferred to the log tail space. Hence when we move the AIL head
forwards on item insert, we have to remove that space from the grant
heads.
We also remove the xlog_verify_grant_tail() debug function as it is
no longer useful. The check it performs has been racy since delayed
logging was introduced, but now it is clearly only detecting false
positives so remove it.
The result of this substantially simpler accounting algorithm is an
increase in sustained transaction rate from ~1.3 million
transactions/s to ~1.9 million transactions/s with no increase in
CPU usage. We also remove the 32 bit space limitation on the grant
heads, which will allow us to increase the journal size beyond 2GB
in future.
Note that this renames the sysfs files exposing the log grant space
now that the values are exported in bytes. This allows xfstests
to auto-detect the old or new ABI.
[hch: move xlog_grant_sub_space out of line,
update the xlog_grant_{add,sub}_space prototypes,
rename the sysfs files to allow auto-detection in xfstests]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Because we are going to need them soon. API change only, no logic
changes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Currently we track space used in the log by grant heads.
These store the reserved space as a physical log location and
combine both space reserved for future use with space already used in
the log in a single variable. The amount of space consumed in the
log is then calculated as the distance between the log tail and
the grant head.
The problem with tracking the grant head as a physical location
comes from the fact that it tracks both log cycle count and offset
into the log in bytes in a single 64 bit variable. because the cycle
count on disk is a 32 bit number, this also limits the offset into
the log to 32 bits. ANd because that is in bytes, we are limited to
being able to track only 2GB of log space in the grant head.
Hence to support larger physical logs, we need to track used space
differently in the grant head. We no longer use the grant head for
guiding AIL pushing, so the only thing it is now used for is
determining if we've run out of reservation space via the
calculation in xlog_space_left().
What we really need to do is move the grant heads away from tracking
physical space in the log. The issue here is that space consumed in
the log is not directly tracked by the current mechanism - the
space consumed in the log by grant head reservations gets returned
to the free pool by the tail of the log moving forward. i.e. the
space isn't directly tracked or calculated, but the used grant space
gets "freed" as the physical limits of the log are updated without
actually needing to update the grant heads.
Hence to move away from implicit, zero-update log space tracking we
need to explicitly track the amount of physical space the log
actually consumes separately to the in-memory reservations for
operations that will be committed to the journal. Luckily, we
already track the information we need to calculate this in the AIL
itself.
That is, the space currently consumed by the journal is the maximum
LSN that the AIL has seen minus the current log tail. As we update
both of these items dynamically as the head and tail of the log
moves, we always know exactly how much space the journal consumes.
This means that we also know exactly how much space the currently
active reservations require, and exactly how much free space we have
remaining for new reservations to be made. Most importantly, we know
what these spaces are indepedently of the physical locations of
the head and tail of the log.
Hence by separating out the physical space consumed by the journal,
we can now track reservations in the grant heads purely as a byte
count, and the log can be considered full when the tail space +
reservation space exceeds the size of the log. This means we can use
the full 64 bits of grant head space for reservation space,
completely removing the 32 bit byte count limitation on log size
that they impose.
Hence the first step in this conversion is to track and update the
"log tail space" every time the AIL tail or maximum seen LSN
changes.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The function is called from a single place, and it isn't just
setting the iclog state to XLOG_STATE_CALLBACK - it can mark iclogs
clean, which moves them to states after CALLBACK. Hence the function
is now badly named, and should just be folded into the caller where
the iclog completion logic makes a whole lot more sense.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
The current implementation of xlog_assign_tail_lsn() assumes that
when the AIL is empty, the log tail matches the LSN of the last
written commit record. This is recorded in xlog_state_set_callback()
as log->l_last_sync_lsn when the iclog state changes to
XLOG_STATE_CALLBACK. This change is then immediately followed by
running the callbacks on the iclog which then insert the log items
into the AIL at the "commit lsn" of that checkpoint.
The AIL tracks log items via the start record LSN of the checkpoint,
not the commit record LSN. This is because we can pipeline multiple
checkpoints, and so the start record of checkpoint N+1 can be
written before the commit record of checkpoint N. i.e:
start N commit N
+-------------+------------+----------------+
start N+1 commit N+1
The tail of the log cannot be moved to the LSN of commit N when all
the items of that checkpoint are written back, because then the
start record for N+1 is no longer in the active portion of the log
and recovery will fail/corrupt the filesystem.
Hence when all the log items in checkpoint N are written back, the
tail of the log most now only move as far forwards as the start LSN
of checkpoint N+1.
Hence we cannot use the maximum start record LSN the AIL sees as a
replacement the pointer to the current head of the on-disk log
records. However, we currently only use the l_last_sync_lsn when the
AIL is empty - when there is no start LSN remaining, the tail of the
log moves to the LSN of the last commit record as this is where
recovery needs to start searching for recoverable records. THe next
checkpoint will have a start record LSN that is higher than
l_last_sync_lsn, and so everything still works correctly when new
checkpoints are written to an otherwise empty log.
l_last_sync_lsn is an atomic variable because it is currently
updated when an iclog with callbacks attached moves to the CALLBACK
state. While we hold the icloglock at this point, we don't hold the
AIL lock. When we assign the log tail, we hold the AIL lock, not the
icloglock because we have to look up the AIL. Hence it is an atomic
variable so it's not bound to a specific lock context.
However, the iclog callbacks are only used for CIL checkpoints. We
don't use callbacks with unmount record writes, so the
l_last_sync_lsn variable only gets updated when we are processing
CIL checkpoint callbacks. And those callbacks run under AIL lock
contexts, not icloglock context. The CIL checkpoint already knows
what the LSN of the iclog the commit record was written to (obtained
when written into the iclog before submission) and so we can update
the l_last_sync_lsn under the AIL lock in this callback. No other
iclog callbacks will run until the currently executing one
completes, and hence we can update the l_last_sync_lsn under the AIL
lock safely.
This means l_last_sync_lsn can move to the AIL as the "ail_head_lsn"
and it can be used to replace the atomic l_last_sync_lsn in the
iclog code. This makes tracking the log tail belong entirely to the
AIL, rather than being smeared across log, iclog and AIL state and
locking.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
Whenever we write an iclog, we call xlog_assign_tail_lsn() to update
the current tail before we write it into the iclog header. This
means we have to take the AIL lock on every iclog write just to
check if the tail of the log has moved.
This doesn't avoid races with log tail updates - the log tail could
move immediately after we assign the tail to the iclog header and
hence by the time the iclog reaches stable storage the tail LSN has
moved forward in memory. Hence the log tail LSN in the iclog header
is really just a point in time snapshot of the current state of the
AIL.
With this in mind, if we simply update the in memory log->l_tail_lsn
every time it changes in the AIL, there is no need to update the in
memory value when we are writing it into an iclog - it will already
be up-to-date in memory and checking the AIL again will not change
this. Hence xlog_state_release_iclog() does not need to check the
AIL to update the tail lsn and can just sample it directly without
needing to take the AIL lock.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>