Commit Graph

117948 Commits

Author SHA1 Message Date
Cornelia Huck
d45387d8bc [S390] cio: Update cio_ignore documentation.
Add documentation for the new "purge" cio_ignore parameter.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-10 21:33:48 +02:00
Peter Oberparleiter
ecf5d9ef68 [S390] cio: introduce purge function for /proc/cio_ignore
Allow users to remove blacklisted ccw devices by using the
/proc/cio_ignore interface:

  echo purge > /proc/cio_ignore

will remove all devices which are offline and blacklisted.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-10 21:33:47 +02:00
Peter Oberparleiter
46fbe4e46d [S390] cio: move device unregistration to dedicated work queue
Use dedicated slow path work queue when unregistering a device due to
a user action. This ensures serialialization of other register/
unregister requests.

Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-10 21:33:47 +02:00
Ursula Braun
4bcb3a3718 [S390] qdio: speed up multicast traffic on full HiperSocket queue
If an asynchronous HiperSockets queue runs full, no further packet
can be sent. In this case the next initiative to give transmitted
skbs back to the stack is triggered only by a 10-seconds qdio timer.
This timer has been introduced for low multicast traffic scenarios
to guarantee freeing of skbs in a limited amount of time. For high
HiperSocket multicast traffic scenarios progress checking on the
outbound queue should be enforced by tasklet rescheduling.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-10-10 21:33:46 +02:00
Jan Beulich
fcea94ba07 ACPI: fix FADT parsing
The (1.0 inherited) separate length fields in the FADT are byte granular.
Further, PM1a/b may have distinct lengths (if using the v2 fields was
okay) and may live in distinct address spaces.  acpi_tb_convert_fadt()
should account for all of these conditions.

Apart from these changes I'm puzzled by the fact that, not just for
acpi_gbl_xpm1{a,b}_enable, acpi_hw_low_level_{read,write}() get an
explicit size passed rather than using the size found in the passed GAS.
What happens on a platform that defines PM1{a,b} wider than 16 bits?  Of
course, acpi_hw_low_level_{read,write}() at present are entirely
un-prepared to deal with sizes other than 8, 16, or 32, not to speak of a
non-zero bit_offset or access_width...

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2008-10-10 15:31:48 -04:00
Luis R. Rodriguez
d2a3b222cf ath9k: Fix return code when ath9k_hw_setpower() fails on reset
We were not reporting a status code back ath9k_hw_setpower() failed
during reset so lets correct this.

Reported-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:26:24 -07:00
Luis R. Rodriguez
1cf69cfbe1 ath9k: remove nasty FAIL macro from ath9k_hw_reset()
This is fucking horribe crap code so nuke it. There I cursed too in a commit log.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:25:45 -07:00
Tom Talpey
c055551e97 RPC/RDMA: ensure connection attempt is complete before signalling.
The RPC/RDMA connection logic could return early from reconnection
attempts, leading to additional spurious retries.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:15:06 -04:00
Tom Talpey
08ca0dce1e RPC/RDMA: correct the reconnect timer backoff
The RPC/RDMA code had a constant 5-second reconnect backoff, and
always performed it, even when re-establishing a connection to a
server after the RPC layer closed it due to being idle. Make it
an geometric backoff (up to 30 seconds), and don't delay idle
reconnect.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:15:02 -04:00
Tom Talpey
b3cd8d45a7 RPC/RDMA: optionally emit useful transport info upon connect/disconnect.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:59 -04:00
Tom Talpey
5f37d561e0 RPC/RDMA: reformat a debug printk to keep lines together.
The send marshaling code split a particular dprintk across two
lines, which makes it hard to extract from logfiles.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:42 -04:00
Tom Talpey
5675add36e RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.
Add defensive timeouts to wait_for_completion() calls in RDMA
address resolution, and make them interruptible. Fix the timeout
units to milliseconds (formerly jiffies) and move to private header.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:31 -04:00
Robert Reif
4dd95b63ae leo: disable cursor when leaving graphics mode
On sparc32 debian etch, the cursor is not disabled
when leaving X windows.  This patch disables the
cursor when leaving graphics mode just like the
ffb driver.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:13:22 -07:00
Tom Talpey
1a954051b0 RPC/RDMA: fix connect/reconnect resource leak.
The RPC/RDMA code can leak RDMA connection manager endpoints in
certain error cases on connect. Don't signal unwanted events,
and be certain to destroy any allocated qp.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:13:02 -04:00
Tom Talpey
926449ba66 RPC/RDMA: return a consistent error, when connect fails.
The xprt_connect call path does not expect such errors as ECONNREFUSED
to be returned from failed transport connection attempts, otherwise it
translates them to EIO and signals fatal errors. For example, mount.nfs
prints simply "internal error". Translate all such errors to ENOTCONN
from RPC/RDMA to match sockets behavior.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:12:44 -04:00
Robert Reif
291d5f3039 cg6: disable cursor when leaving graphics mode
On sparc32 debian etch, the cursor is not disabled
when leaving X windows.  This patch disables the
cursor when leaving graphics mode just like the
ffb driver.

Signed-off-by: Robert Reif <reif@earthlink.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:12:41 -07:00
Tom Talpey
9191ca3b38 RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
The RPC/RDMA protocol allows clients and servers to avoid RDMA
operations for data which is purely the result of XDR padding.
On the client, automatically insert the necessary padding for
such server replies, and optionally don't marshal such chunks.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:12:33 -04:00
Tom Talpey
fee08caf94 RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.
RDMA disconnects yield an upcall from the RDMA connection manager,
which can race with rpc transport close, e.g. on ^C of a mount.
Ensure any rdma cm_id and qp are fully destroyed before continuing.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:11:40 -04:00
Patrick McHardy
4d74f8ba1f gre: minor cleanups in netlink interface
- use typeful helpers for IFLA_GRE_LOCAL/IFLA_GRE_REMOTE
- replace magic value by FIELD_SIZEOF
- use MODULE_ALIAS_RTNL_LINK macro

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:11:06 -07:00
Tom Talpey
ad0e9e01da RPC/RDMA: maintain the RPC task bytes-sent statistic.
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:10:40 -04:00
Tom Talpey
575448bd36 RPC/RDMA: suppress retransmit on RPC/RDMA clients.
An RPC/RDMA client cannot retransmit on an unbroken connection,
doing so violates its flow control with the server.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:10:36 -04:00
Patrick McHardy
ba9e64b1c2 gre: fix copy and paste error
The flags are dumped twice, the keys not at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-10 12:10:30 -07:00
Tom Tucker
b334eaabf4 RPC/RDMA: fix connection IRD/ORD setting
This logic sets the connection parameter that configures the local device
and informs the remote peer how many concurrent incoming RDMA_READ
requests are supported. The original logic didn't really do what was
intended for two reasons:

- The max number supported by the device is typically smaller than
any one factor in the calculation used, and

- The field in the connection parameter structure where the value is
stored is a u8 and always overflows for the default settings.

So what really happens is the value requested for responder resources
is the left over 8 bits from the "desired value". If the desired value
happened to be a multiple of 256, the result was zero and it wouldn't
connect at all.

Given the above and the fact that max_requests is almost always larger
than the max responder resources supported by the adapter, this patch
simplifies this logic and simply requests the max supported by the device,
subject to a reasonable limit.

This bug was found by Jim Schutt at Sandia.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:56 -04:00
Tom Talpey
3197d309f5 RPC/RDMA: support FRMR client memory registration.
Configure, detect and use "fastreg" support from IB/iWARP verbs
layer to perform RPC/RDMA memory registration.

Make FRMR the default memreg mode (will fall back if not supported
by the selected RDMA adapter).

This allows full and optimal operation over the cxgb3 adapter, and others.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:34 -04:00
Tom Talpey
bd7ed1d133 RPC/RDMA: check selected memory registration mode at runtime.
At transport creation, check for, and use, any local dma lkey.
Then, check that the selected memory registration mode is in fact
supported by the RDMA adapter selected for the mount. Fall back
to best alternative if not.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:26 -04:00
Tom Talpey
fe9053b30b RPC/RDMA: add data types and new FRMR memory registration enum.
Internal RPC/RDMA structure updates in preparation for FRMR support.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:24 -04:00
Tom Talpey
8d4ba0347c RPC/RDMA: refactor the inline memory registration code.
Refactor the memory registration and deregistration routines.
This saves stack space, makes the code more readable and prepares
to add the new FRMR registration methods.

Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 15:09:16 -04:00
Yevgeny Petrilin
a3cdcbfa8f mlx4_core: Add QP range reservation support
To allow allocating an aligned range of consecutive QP numbers, add an
interface to reserve an aligned range of QP numbers and have the QP
allocation function always take a QP number.

This will be used for RSS support in the mlx4_en Ethernet driver and
also potentially by IPoIB RSS support.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 12:01:37 -07:00
Julien Brunel
6aea938f54 RDMA/ucma: Test ucma_alloc_multicast() return against NULL, not with IS_ERR()
In case of error, the function ucma_alloc_multicast() returns a NULL
pointer, but never returns an ERR pointer.  So after a call to this
function, an IS_ERR test should be replaced by a NULL test.

The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@match bad_is_err_test@
expression x, E;
@@

x = ucma_alloc_multicast(...)
... when != x = E
IS_ERR(x)
// </smpl>

Signed-off-by:  Julien Brunel <brunel@diku.dk>
Signed-off-by:  Julia Lawall <julia@diku.dk>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-10-10 12:00:19 -07:00
Chuck Lever
5e2e7721f0 NFS: fix nfs_parse_ip_address() corner case
Bruce observed that nfs_parse_ip_address() will successfully parse an
IPv6 address that looks like this:

  "::1%"

A scope delimiter is present, but there is no scope ID following it.
This is harmless, as it would simply set the scope ID to zero.  However,
in some cases we would like to flag this as an improperly formed
address.

We are now also careful to reject addresses where garbage follows the
address (up to the length of the string), instead of ignoring the
non-address characters; and where the scope ID is nonsense (not a valid
device name, but also not numeric).  Before, both of these cases would
result in a harmless zero scope ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 14:41:51 -04:00
J. Bruce Fields
456018d791 NFS: Cleanup nfs_set_port
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 14:41:50 -04:00
Linus Torvalds
f6bccf6954 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: skcipher - Use RNG interface instead of get_random_bytes
  crypto: rng - RNG interface and implementation
  crypto: api - Add fips_enable flag
  crypto: skcipher - Move IV generators into their own modules
  crypto: cryptomgr - Test ciphers using ECB
  crypto: api - Use test infrastructure
  crypto: cryptomgr - Add test infrastructure
  crypto: tcrypt - Add alg_test interface
  crypto: tcrypt - Abort and only log if there is an error
  crypto: crc32c - Use Intel CRC32 instruction
  crypto: tcrypt - Avoid using contiguous pages
  crypto: api - Display larval objects properly
  crypto: api - Export crypto_alg_lookup instead of __crypto_alg_lookup
  crypto: Kconfig - Replace leading spaces with tabs
2008-10-10 11:20:42 -07:00
Linus Torvalds
3af73d392c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (29 commits)
  RDMA/nes: Fix slab corruption
  IB/mlx4: Set RLKEY bit for kernel QPs
  RDMA/nes: Correct error_module bit mask
  RDMA/nes: Fix routed RDMA connections
  RDMA/nes: Enhanced PFT management scheme
  RDMA/nes: Handle AE bounds violation
  RDMA/nes: Limit critical error interrupts
  RDMA/nes: Stop spurious MAC interrupts
  RDMA/nes: Correct tso_wqe_length
  RDMA/nes: Fill in firmware version for ethtool
  RDMA/nes: Use ethtool timer value
  RDMA/nes: Correct MAX TSO frags value
  RDMA/nes: Enable MC/UC after changing MTU
  RDMA/nes: Free NIC TX buffers when destroying NIC QP
  RDMA/nes: Fix MDC setting
  RDMA/nes: Add wqm_quanta module option
  RDMA/nes: Module parameter permissions
  RDMA/cxgb3: Set active_mtu in ib_port_attr
  RDMA/nes: Add support for 4-port 1G HP blade card
  RDMA/nes: Make mini_cm_connect() static
  ...
2008-10-10 11:16:33 -07:00
Linus Torvalds
13dd7f876d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: choose better identifiers
  dlm: remove bkl
  dlm: fix address compare
  dlm: fix locking of lockspace list in dlm_scand
  dlm: detect available userspace daemon
  dlm: allow multiple lockspace creates
2008-10-10 11:13:55 -07:00
Linus Torvalds
b0af205afb Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
  dm: detect lost queue
  dm: publish dm_vcalloc
  dm: publish dm_table_unplug_all
  dm: publish dm_get_mapinfo
  dm: export struct dm_dev
  dm crypt: avoid unnecessary wait when splitting bio
  dm crypt: tidy ctx pending
  dm crypt: fix async inc_pending
  dm crypt: move dec_pending on error into write_io_submit
  dm crypt: remove inc_pending from write_io_submit
  dm crypt: tidy write loop pending
  dm crypt: tidy crypt alloc
  dm crypt: tidy inc pending
  dm exception store: use chunk_t for_areas
  dm exception store: introduce area_location function
  dm raid1: kcopyd should stop on error if errors handled
  dm mpath: remove is_active from struct dm_path
  dm mpath: use more error codes

Fixed up trivial conflict in drivers/md/dm-mpath.c manually.
2008-10-10 11:11:47 -07:00
Christoph Hellwig
73f6aa4d44 Fix barrier fail detection in XFS
Currently we disable barriers as soon as we get a buffer in xlog_iodone
that has the XBF_ORDERED flag cleared.  But this can be the case not only
for buffers where the barrier failed, but also the first buffer of a
split log write in case of a log wraparound.  Due to the disabled
barriers we can easily get directory corruption on unclean shutdowns.
So instead of using this check add a new buffer flag for failed barrier
writes.

This is a regression vs 2.6.26 caused by patch to use the right macro
to check for the ORDERED flag, as we previously got true returned for
every buffer.

Thanks to Toei Rei for reporting the bug.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: David Chinner <david@fromorbit.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-10 11:08:07 -07:00
Linus Torvalds
445e1ceda3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Support for I/O barriers
  GFS2: Add UUID to GFS2 sb
  GFS2: high time to take some time over atime
  GFS2: The war on bloat
  GFS2: GFS2 will panic if you misspell any mount options
  GFS2: Direct IO write at end of file error
  GFS2: Use an IS_ERR test rather than a NULL test
  GFS2: Fix race relating to glock min-hold time
  GFS2: Fix & clean up GFS2 rename
  GFS2: rm on multiple nodes causes panic
  GFS2: Fix metafs mounts
  GFS2: Fix debugfs glock file iterator
2008-10-10 11:02:22 -07:00
Linus Torvalds
ef5bef357c Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (37 commits)
  [SCSI] zfcp: fix double dbf id usage
  [SCSI] zfcp: wait on SCSI work to be finished before proceeding with init dev
  [SCSI] zfcp: fix erp list usage without using locks
  [SCSI] zfcp: prevent fc_remote_port_delete calls for unregistered rport
  [SCSI] zfcp: fix deadlock caused by shared work queue tasks
  [SCSI] zfcp: put threshold data in hba trace
  [SCSI] zfcp: Simplify zfcp data structures
  [SCSI] zfcp: Simplify get_adapter_by_busid
  [SCSI] zfcp: remove all typedefs and replace them with standards
  [SCSI] zfcp: attach and release SAN nameserver port on demand
  [SCSI] zfcp: remove unused references, declarations and flags
  [SCSI] zfcp: Update message with input from review
  [SCSI] zfcp: add queue_full sysfs attribute
  [SCSI] scsi_dh: suppress comparison warning
  [SCSI] scsi_dh: add Dell product information into rdac device handler
  [SCSI] qla2xxx: remove the unused SCSI_QLOGIC_FC_FIRMWARE option
  [SCSI] qla2xxx: fix printk format warnings
  [SCSI] qla2xxx: Update version number to 8.02.01-k8.
  [SCSI] qla2xxx: Ignore payload reserved-bits during RSCN processing.
  [SCSI] qla2xxx: Additional residual-count corrections during UNDERRUN handling.
  ...
2008-10-10 10:53:26 -07:00
Linus Torvalds
e26feff647 Merge branch 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.28' of git://git.kernel.dk/linux-2.6-block: (132 commits)
  doc/cdrom: Trvial documentation error, file not present
  block_dev: fix kernel-doc in new functions
  block: add some comments around the bio read-write flags
  block: mark bio_split_pool static
  block: Find bio sector offset given idx and offset
  block: gendisk integrity wrapper
  block: Switch blk_integrity_compare from bdev to gendisk
  block: Fix double put in blk_integrity_unregister
  block: Introduce integrity data ownership flag
  block: revert part of d7533ad0e132f92e75c1b2eb7c26387b25a583c1
  bio.h: Remove unused conditional code
  block: remove end_{queued|dequeued}_request()
  block: change elevator to use __blk_end_request()
  gdrom: change to use __blk_end_request()
  memstick: change to use __blk_end_request()
  virtio_blk: change to use __blk_end_request()
  blktrace: use BLKTRACE_BDEV_SIZE as the name size for setup structure
  block: add lld busy state exporting interface
  block: Fix blk_start_queueing() to not kick a stopped queue
  include blktrace_api.h in headers_install
  ...
2008-10-10 10:52:45 -07:00
Ingo Molnar
725c25819e Merge branches 'core/iommu', 'x86/amd-iommu' and 'x86/iommu' into x86-v28-for-linus-phase3-B
Conflicts:
	arch/x86/kernel/pci-gart_64.c
	include/asm-x86/dma-mapping.h
2008-10-10 19:47:12 +02:00
Ingo Molnar
3dd392a407 Merge branch 'linus' into x86/pat2
Conflicts:
	arch/x86/mm/init_64.c
2008-10-10 19:30:08 +02:00
Suresh Siddha
b27a43c1e9 x86, cpa: make the kernel physical mapping initialization a two pass sequence, fix
Jeremy Fitzhardinge wrote:

> I'd noticed that current tip/master hasn't been booting under Xen, and I
> just got around to bisecting it down to this change.
>
> commit 065ae73c5462d42e9761afb76f2b52965ff45bd6
> Author: Suresh Siddha <suresh.b.siddha@intel.com>
>
>    x86, cpa: make the kernel physical mapping initialization a two pass sequence
>
> This patch is causing Xen to fail various pagetable updates because it
> ends up remapping pagetables to RW, which Xen explicitly prohibits (as
> that would allow guests to make arbitrary changes to pagetables, rather
> than have them mediated by the hypervisor).

Instead of making init a two pass sequence, to satisfy the Intel's TLB
Application note (developer.intel.com/design/processor/applnots/317080.pdf
Section 6 page 26), we preserve the original page permissions
when fragmenting the large mappings and don't touch the existing memory
mapping (which satisfies Xen's requirements).

Only open issue is: on a native linux kernel, we will go back to mapping
the first 0-1GB kernel identity mapping as executable (because of the
static mapping setup in head_64.S). We can fix this in a different
patch if needed.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:21 +02:00
Ingo Molnar
ad2cde16a2 x86, pat: cleanups
clean up recently added code to be more consistent with other x86 code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
28dd033f43 x86: fix pagetable init 64-bit breakage
Fix _end alignment check - can trigger a crash if _end happens to be
on a page boundary.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:20 +02:00
Suresh Siddha
9542ada803 x86: track memtype for RAM in page struct
Track the memtype for RAM pages in page struct instead of using the
memtype list. This avoids the explosion in the number of entries in
memtype list (of the order of 20,000 with AGP) and makes the PAT
tracking simpler.

We are using PG_arch_1 bit in page->flags.

We still use the memtype list for non RAM pages.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:18 +02:00
Suresh Siddha
ad5ca55f6b x86, cpa: srlz cpa(), global flush tlb after splitting big page and before doing cpa
Do a global flush tlb after splitting the large page and before we do the
actual change page attribute in the PTE.

With out this, we violate the TLB application note, which says
    "The TLBs may contain both ordinary and large-page translations for
     a 4-KByte range of linear addresses. This may occur if software
     modifies the paging structures so that the page size used for the
     address range changes. If the two translations differ with respect
     to page frame or attributes (e.g., permissions), processor behavior
     is undefined and may be implementation-specific."

And also serialize cpa() (for !DEBUG_PAGEALLOC which uses large identity
mappings) using cpa_lock. So that we don't allow any other cpu, with stale
large tlb entries change the page attribute in parallel to some other cpu
splitting a large page entry along with changing the attribute.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:17 +02:00
Suresh Siddha
8311eb84bf x86, cpa: remove cpa pool code
Interrupt context no longer splits large page in cpa(). So we can do away
with cpa memory pool code.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:16 +02:00
Suresh Siddha
55121b4369 x86, cpa: no need to check alias for __set_pages_p/__set_pages_np
No alias checking needed for setting present/not-present mapping. Otherwise,
we may need to break large pages for 64-bit kernel text mappings (this adds to
complexity if we want to do this from atomic context especially, for ex:
with CONFIG_DEBUG_PAGEALLOC). Let's keep it simple!

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:15 +02:00
Suresh Siddha
0b8fdcbcd2 x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
Don't use large pages for kernel identity mapping with DEBUG_PAGEALLOC.
This will remove the need to split the large page for the
allocated kernel page in the interrupt context.

This will simplify cpa code(as we don't do the split any more from the
interrupt context). cpa code simplication in the subsequent patches.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:14 +02:00
Suresh Siddha
a2699e477b x86, cpa: make the kernel physical mapping initialization a two pass sequence
In the first pass, kernel physical mapping will be setup using large or
small pages but uses the same PTE attributes as that of the early
PTE attributes setup by early boot code in head_[32|64].S

After flushing TLB's, we go through the second pass, which setups the
direct mapped PTE's with the appropriate attributes (like NX, GLOBAL etc)
which are runtime detectable.

This two pass mechanism conforms to the TLB app note which says:

"Software should not write to a paging-structure entry in a way that would
 change, for any linear address, both the page size and either the page frame
 or attributes."

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: arjan@linux.intel.com
Cc: venkatesh.pallipadi@intel.com
Cc: jeremy@goop.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-10 19:29:13 +02:00