linux

Author	SHA1	Message	Date
Bjorn Helgaas	a1802c4295	PNP: make resource option structures private to PNP subsystem Nothing outside the PNP subsystem should need access to a device's resource options, so this patch moves the option structure declarations to a private header file. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rene Herman <rene.herman@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-07-16 23:27:06 +02:00
Bjorn Helgaas	08c9f262f2	PNP: define PNP-specific IORESOURCE_IO_* flags alongside IRQ, DMA, MEM PNP previously defined PNP_PORT_FLAG_16BITADDR and PNP_PORT_FLAG_FIXED in a private header file, but put those flags in struct resource.flags fields. Better to make them IORESOURCE_IO_* flags like the existing IRQ, DMA, and MEM flags. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rene Herman <rene.herman@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-07-16 23:27:06 +02:00
Bjorn Helgaas	57fd51a8be	PNP: add pnp_possible_config() -- can a device could be configured this way? As part of a heuristic to identify modem devices, 8250_pnp.c checks to see whether a device can be configured at any of the legacy COM port addresses. This patch moves the code that traverses the PNP "possible resource options" from 8250_pnp.c to the PNP subsystem. This encapsulation is important because a future patch will change the implementation of those resource options. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Rene Herman <rene.herman@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com>	2008-07-16 23:27:06 +02:00
Bjorn Helgaas	aee3ad815d	PNP: replace pnp_resource_table with dynamically allocated resources PNP used to have a fixed-size pnp_resource_table for tracking the resources used by a device. This table often overflowed, so we've had to increase the table size, which wastes memory because most devices have very few resources. This patch replaces the table with a linked list of resources where the entries are allocated on demand. This removes messages like these: pnpacpi: exceeded the max number of IO resources 00:01: too many I/O port resources References: http://bugzilla.kernel.org/show_bug.cgi?id=9535 http://bugzilla.kernel.org/show_bug.cgi?id=9740 http://lkml.org/lkml/2007/11/30/110 This patch also changes the way PNP uses the IORESOURCE_UNSET, IORESOURCE_AUTO, and IORESOURCE_DISABLED flags. Prior to this patch, the pnp_resource_table entries used the flags like this: IORESOURCE_UNSET This table entry is unused and available for use. When this flag is set, we shouldn't look at anything else in the resource structure. This flag is set when a resource table entry is initialized. IORESOURCE_AUTO This resource was assigned automatically by pnp_assign_{io,mem,etc}(). This flag is set when a resource table entry is initialized and cleared whenever we discover a resource setting by reading an ISAPNP config register, parsing a PNPBIOS resource data stream, parsing an ACPI _CRS list, or interpreting a sysfs "set" command. Resources marked IORESOURCE_AUTO are reinitialized and marked as IORESOURCE_UNSET by pnp_clean_resource_table() in these cases: - before we attempt to assign resources automatically, - if we fail to assign resources automatically, - after disabling a device IORESOURCE_DISABLED Set by pnp_assign_{io,mem,etc}() when automatic assignment fails. Also set by PNPBIOS and PNPACPI for: - invalid IRQs or GSI registration failures - invalid DMA channels - I/O ports above 0x10000 - mem ranges with negative length After this patch, there is no pnp_resource_table, and the resource list entries use the flags like this: IORESOURCE_UNSET This flag is no longer used in PNP. Instead of keeping IORESOURCE_UNSET entries in the resource list, we remove entries from the list and free them. IORESOURCE_AUTO No change in meaning: it still means the resource was assigned automatically by pnp_assign_{port,mem,etc}(), but these functions now set the bit explicitly. We still "clean" a device's resource list in the same places, but rather than reinitializing IORESOURCE_AUTO entries, we just remove them from the list. Note that IORESOURCE_AUTO entries are always at the end of the list, so removing them doesn't reorder other list entries. This is because non-IORESOURCE_AUTO entries are added by the ISAPNP, PNPBIOS, or PNPACPI "get resources" methods and by the sysfs "set" command. In each of these cases, we completely free the resource list first. IORESOURCE_DISABLED In addition to the cases where we used to set this flag, ISAPNP now adds an IORESOURCE_DISABLED resource when it reads a configuration register with a "disabled" value. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Bjorn Helgaas	20bfdbba72	PNP: make pnp_{port,mem,etc}_start(), et al work for invalid resources Some callers use pnp_port_start() and similar functions without making sure the resource is valid. This patch makes us fall back to returning the initial values if the resource is not valid or not even present. This mostly preserves the previous behavior, where we would just return the initial values set by pnp_init_resource_table(). The original 2.6.25 code didn't range-check the "bar", so it would return garbage if the bar exceeded the table size. This code returns sensible values instead. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	da5e09a1b3	ACPI : Create "idle=nomwait" bootparam "idle=nomwait" disables the use of the MWAIT instruction from both C1 (C1_FFH) and deeper (C2C3_FFH) C-states. When MWAIT is unavailable, the BIOS and OS generally negotiate to use the HALT instruction for C1, and use IO accesses for deeper C-states. This option is useful for power and performance comparisons, and also to work around BIOS bugs where broken MWAIT support is advertised. http://bugzilla.kernel.org/show_bug.cgi?id=10807 http://bugzilla.kernel.org/show_bug.cgi?id=10914 Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhao Yakui	c1e3b377ad	ACPI: Create "idle=halt" bootparam "idle=halt" limits the idle loop to using the halt instruction. No MWAIT, no IO accesses, no C-states deeper than C1. If something is broken in the idle code, "idle=halt" is a less severe workaround than "idle=poll" which disables all power savings. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:05 +02:00
Zhang Rui	71b58cbb0c	ACPI: Enhance /sys/firmware/interrupts to allow enable/disable/clear from user-space Allow users to enable/disable/clear a specific & valid GPE/Fixed Event in user space. This is useful for debugging, especially for some interrupt storm issues. All wakeup GPEs are disabled and they can not be enabled at runtime, and we mark them as invalid. All GPEs that don't have a _Lxx/_Exx method are marked as invalid. All Fixed Events that don't have an event handler are marked as invalid and they can't be enabled until an event handler is registered. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Ling Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	9c9f6d052d	ACPICA: Update version to 20080609 Update version to 20080609. Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	71d993e115	ACPICA: Cleanup debug operand dump mechanism Eliminated unnecessary operands; eliminated use of negative index in loop. Operands now displayed in correct order, not backwards. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	75e5b5fb77	ACPICA: Update disassembler for DMAR table changes Now supports the 2007 intel Virtualization Technology for Directed I/O specification. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	19d0cfe9dd	ACPICA: Update DMAR and SRAT table definitions Synchronized tables with current specifications. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	b25d2a470b	ACPICA: Update version to 20080514 Update version to 20080514 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	4b8ed63167	ACPICA: Add const qualifier for appropriate string constants Mostly MODULE_NAME and printf format strings. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:04 +02:00
Bob Moore	67a119f990	ACPICA: Eliminate acpi_native_uint type v2 No longer needed; replaced mostly with u32, but also acpi_size where a type that changes 32/64 bit on 32/64-bit platforms is required. v2: Fix a cast of a 32-bit int to a pointer in ACPI to avoid a compiler warning. from David Howells Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:03 +02:00
Bob Moore	11f2a61ab4	ACPICA: Fix possible negative array index in acpi_ut_validate_exception Added NULL fields to the exception string arrays to eliminate the -1 subtraction on the SubStatus field. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:03 +02:00
Jan Beulich	6719561f9b	ACPICA: Update tracking macros to reduce code/data size Changed ACPI_MODULE_NAME and ACPI_FUNCTION_NAME to use arrays of strings instead of pointers to static strings. Jan Beulich and Bob Moore. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:03 +02:00
Bob Moore	c91d924e3a	ACPICA: Fix for hang on GPE method invocation Fixes problem where the new method argument count validation mechanism will enter an infinite loop when a GPE method is dispatched. Problem fixed be removing the obsolete code that passes GPE block information to the notify handler via the control method parameter pointer. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:03 +02:00
Bob Moore	f3454ae810	ACPICA: Add argument count checking to control method invocation via acpi_evaluate_object Error if too few arguments, warning if too many. This applies only to external programmatic control method execution, not method-to-method calls within the AML. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:03 +02:00
Rafael J. Wysocki	ebb12db51f	Freezer: Introduce PF_FREEZER_NOSIG The freezer currently attempts to distinguish kernel threads from user space tasks by checking if their mm pointer is unset and it does not send fake signals to kernel threads. However, there are kernel threads, mostly related to networking, that behave like user space tasks and may want to be sent a fake signal to be frozen. Introduce the new process flag PF_FREEZER_NOSIG that will be set by default for all kernel threads and make the freezer only send fake signals to the tasks having PF_FREEZER_NOSIG unset. Provide the set_freezable_with_signal() function to be called by the kernel threads that want to be sent a fake signal for freezing. This patch should not change the freezer's observable behavior. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>	2008-07-16 23:27:03 +02:00
David Brownell	2fe2de5f6c	ACPI PM: acpi_pm_device_sleep_state() cleanup Get rid of a superfluous acpi_pm_device_sleep_state() parameter. The only legitimate value of that parameter must be derived from the first parameter, which is what all the callers already do. (However, this does not address the fact that ACPI still doesn't set up those flags.) Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2008-07-16 23:27:02 +02:00
Vegard Nossum	47c00d2bc2	ACPICA: fix mutex names in debug code. Reorder the mutex names to match the preceding #defines Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:01 +02:00
Bob Moore	e38e8a0743	Make GPE disable more robust Implemented another change for the GPE disable. We now perform a read-change-write of the enable register instead of simply writing out the cached enable mask. This will prevent inadvertent enabling of GPEs if a rogue GPE is received during initialization (before GPE handlers are installed.) http://bugzilla.kernel.org/show_bug.cgi?id=6217 Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:01 +02:00
Mike Travis	706546d023	ACPI: change processors from array to per_cpu variable Change processors from an array sized by NR_CPUS to a per_cpu variable. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com>	2008-07-16 23:27:01 +02:00
Roland McGrath	380fdd7585	x86 ptrace: user-sets-TF nits This closes some arcane holes in single-step handling that can arise only when user programs set TF directly (via popf or sigreturn) and then use vDSO (syscall/sysenter) system call entry. In those entry paths, the clear_TF_reenable case hits and we must check TIF_SINGLESTEP to be sure our bookkeeping stays correct wrt the user's view of TF. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	d4d6715016	x86 ptrace: unify syscall tracing This unifies and cleans up the syscall tracing code on i386 and x86_64. Using a single function for entry and exit tracing on 32-bit made the do_syscall_trace() into some terrible spaghetti. The logic is clear and simple using separate syscall_trace_enter() and syscall_trace_leave() functions as on 64-bit. The unification adds PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support on x86_64, for 32-bit ptrace() callers and for 64-bit ptrace() callers tracing either 32-bit or 64-bit tasks. It behaves just like 32-bit. Changing syscall_trace_enter() to return the syscall number shortens all the assembly paths, while adding the SYSEMU feature in a simple way. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:17 -07:00
Roland McGrath	64f0973319	x86 ptrace: unify TIF_SINGLESTEP This unifies the treatment of TIF_SINGLESTEP on i386 and x86_64. The bit is now excluded from _TIF_WORK_MASK on i386 as it has been on x86_64. This means the do_notify_resume() path using it is never used, so TIF_SINGLESTEP is not cleared on returning to user mode. Both now leave TIF_SINGLESTEP set when returning to user, so that it's already set on an int $0x80 system call entry. This removes the need for testing TF on the system_call path. Doing it this way fixes the regression for PTRACE_SINGLESTEP into a sigreturn syscall, introduced by commit `1e2e99f0e4`. The clear_TF_reenable case that sets TIF_SINGLESTEP can only happen on a non-exception kernel entry, i.e. sysenter/syscall instruction. That will always get to the syscall exit tracing path. Signed-off-by: Roland McGrath <roland@redhat.com>	2008-07-16 12:15:16 -07:00
Elias Oltmanns	3ef5eb424e	IDE: Remove unused code Remove some code which has been made obsolete and hasn't worked properly before anyway. Part of the infrastructure may be reintroduced in a follow up patch to implement a working command aborting facility. Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Cc: "Alan Cox" <alan@lxorguk.ukuu.org.uk> Cc: "Randy Dunlap" <randy.dunlap@oracle.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:48 +02:00
Elias Oltmanns	79e36a9f54	IDE: Fix HDIO_DRIVE_RESET handling Currently, the code path executing an HDIO_DRIVE_RESET ioctl is broken in various ways. Most importantly, it is treated as an out of band request in an illegal way which may very likely lead to system lock ups. Use the drive's request queue to avoid this problem (and fix a locking issue for free along the way). Signed-off-by: Elias Oltmanns <eo@nebensachen.de> Cc: "Alan Cox" <alan@lxorguk.ukuu.org.uk> Cc: "Randy Dunlap" <randy.dunlap@oracle.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:48 +02:00
Bartlomiej Zolnierkiewicz	e6d95bd149	ide: ->port_init_devs -> ->init_dev Change ->port_init_devs method to take 'ide_drive_t ' as an argument instead of 'ide_hwif_t ' and rename it to ->init_dev. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:42 +02:00
Bartlomiej Zolnierkiewicz	c56c5648a3	ide: set hwif->dev in ide_init_port_hw() (take 2) * Add 'parent' field to hw_regs_t for optional parent device pointer (needed by macio PMAC IDE controllers) and set hwif->dev in ide_init_port_hw(). * Update au1xxx-ide.c, sgiioc4.c, pmac.c and setup-pci.c accordingly. v2: * Update scc_pata.c. There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:40 +02:00
Bartlomiej Zolnierkiewicz	63b51c6d1d	ide: make ide_hwifs[] static Move ide_hwifs[] from ide.c to ide-probe.c and make it static. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:40 +02:00
Bartlomiej Zolnierkiewicz	9ad5409375	ide: move PIO blacklist to ide-pio-blacklist.c Move PIO blacklist to ide-pio-blacklist.c. While at it: - fix comment - fix whitespace damage There should be no functional changes caused by this patch. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz	3e153cfb5e	ide: remove no longer used ide_pio_timings[] Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz	c9d6c1a237	ide: move ide_pio_cycle_time() to ide-timings.c All ide_pio_cycle_time() users already select CONFIG_IDE_TIMINGS so move the function from ide-lib.c to ide-timings.c. While at it: - convert ide_pio_cycle_time() to use ide_timing_find_mode() - cleanup ide_pio_cycle_time() a bit There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:39 +02:00
Bartlomiej Zolnierkiewicz	f06ab3402a	ide: convert ide-timing.h to ide-timings.c library (take 2) * Don't include ide-timing.h in cs5535 and sis5513 host drivers (they don't need it currently). * Convert ide-timing.h to ide-timings.c library and add CONFIG_IDE_TIMINGS config option to be selected by host drivers using the library. While at it: - fix ide_timing_find_mode() placement v2: * Add missing EXPORT_SYMBOLs. (Stephen Rothwell <sfr@canb.auug.org.au>) There should be no functional changes caused by this patch. Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:37 +02:00
Bartlomiej Zolnierkiewicz	3be53f3f21	ide: move some bits from ide-timing.h to <linux/ide.h> Move struct ide_timing and IDE_TIMING_* defines to <linux/ide.h> from drivers/ide/ide-timing.h. While at it: - use u8/u16 instead of short for struct ide_timing fields - use enum for IDE_TIMING_* There should be no functional changes caused by this patch. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2008-07-16 20:33:36 +02:00
Ingo Molnar	4bb689eee1	x86: paravirt spinlocks, !CONFIG_SMP build fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	2d9e1e2f58	xen: implement Xen-specific spinlocks The standard ticket spinlocks are very expensive in a virtual environment, because their performance depends on Xen's scheduler giving vcpus time in the order that they're supposed to take the spinlock. This implements a Xen-specific spinlock, which should be much more efficient. The fast-path is essentially the old Linux-x86 locks, using a single lock byte. The locker decrements the byte; if the result is 0, then they have the lock. If the lock is negative, then locker must spin until the lock is positive again. When there's contention, the locker spin for 2^16[] iterations waiting to get the lock. If it fails to get the lock in that time, it adds itself to the contention count in the lock and blocks on a per-cpu event channel. When unlocking the spinlock, the locker looks to see if there's anyone blocked waiting for the lock by checking for a non-zero waiter count. If there's a waiter, it traverses the per-cpu "lock_spinners" variable, which contains which lock each CPU is waiting on. It picks one CPU waiting on the lock and sends it an event to wake it up. This allows efficient fast-path spinlock operation, while allowing spinning vcpus to give up their processor time while waiting for a contended lock. [] 2^16 iterations is threshold at which 98% locks have been taken according to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around". Therefore, we'd expect the lock and unlock slow paths will only be entered 2% of the time. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	8efcbab674	paravirt: introduce a "lock-byte" spinlock implementation Implement a version of the old spinlock algorithm, in which everyone spins waiting for a lock byte. In order to be compatible with the ticket-lock's use of a zero initializer, this uses the convention of '0' for unlocked and '1' for locked. This algorithm is much better than ticket locks in a virtual envionment, because it doesn't interact badly with the vcpu scheduler. If there are multiple vcpus spinning on a lock and the lock is released, the next vcpu to be scheduled will take the lock, rather than cycling around until the next ticketed vcpu gets it. To use this, you must call paravirt_use_bytelocks() very early, before any spinlocks have been taken. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:53 +02:00
Jeremy Fitzhardinge	74d4affde8	x86/paravirt: add hooks for spinlock operations Ticket spinlocks have absolutely ghastly worst-case performance characteristics in a virtual environment. If there is any contention for physical CPUs (ie, there are more runnable vcpus than cpus), then ticket locks can cause the system to end up spending 90+% of its time spinning. The problem is that (v)cpus waiting on a ticket spinlock will be granted access to the lock in strict order they got their tickets. If the hypervisor scheduler doesn't give the vcpus time in that order, they will burn timeslices waiting for the scheduler to give the right vcpu some time. In the worst case it could take O(n^2) vcpu scheduler timeslices for everyone waiting on the lock to get it, not counting new cpus trying to take the lock while the log-jam is sorted out. These hooks allow a paravirt backend to replace the spinlock implementation. At the very least, this could revert the implementation back to the old lock algorithm, which allows the next scheduled vcpu to take the lock, and has basically fairly good performance. It also allows the spinlocks to take advantages of the hypervisor features to make locks more efficient (spin and block, for example). The cost to native execution is an extra direct call when using a spinlock function. There's no overhead if CONFIG_PARAVIRT is turned off. The lock structure is fixed at a single "unsigned int", initialized to zero, but the spinlock implementation can use it as it wishes. Thanks to Thomas Friebel's Xen Summit talk "Preventing Guests from Spinning Around" for pointing out this problem. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Lameter <clameter@linux-foundation.org> Cc: Petr Tesarik <ptesarik@suse.cz> Cc: Virtualization <virtualization@lists.linux-foundation.org> Cc: Xen devel <xen-devel@lists.xensource.com> Cc: Thomas Friebel <thomas.friebel@amd.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:15:52 +02:00
Jeremy Fitzhardinge	6a52e4b1cd	x86_64: further cleanup of 32-bit compat syscall mechanisms AMD only supports "syscall" from 32-bit compat usermode. Intel and Centaur(?) only support "sysenter" from 32-bit compat usermode. Set the X86 feature bits accordingly, and set up the vdso in accordance with those bits. On the offchance we run on in a 64-bit environment which supports neither syscall nor sysenter from 32-bit mode, then fall back to the int $0x80 vdso. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2008-07-16 11:08:27 +02:00
Ingo Molnar	9c8a442044	xen64: fix !HVC_XEN build dependency fix: arch/x86/xen/built-in.o: In function `set_page_prot': enlighten.c:(.text+0x111d): undefined reference to `xen_raw_printk' arch/x86/xen/built-in.o: In function `xen_start_kernel': : undefined reference to `xen_raw_console_write' arch/x86/xen/built-in.o: In function `xen_start_kernel': : undefined reference to `xen_raw_console_write' Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:06:48 +02:00
Jeremy Fitzhardinge	c24481e9da	xen64: save lots of registers The Xen hypercall interface is allowed to trash any or all of the argument registers, so we need to be careful that the kernel state isn't damaged. On 32-bit kernels, the hypercall parameter registers same as a regparm function call, so we've got away without explicit clobbering so far. The 64-bit ABI defines lots of caller-save registers, so save them all for safety. We can trim this set later by re-distributing the responsibility for saving all these registers. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:23 +02:00
Jeremy Fitzhardinge	c05f1cfaba	xen64: implement 64-bit update_descriptor 64-bit hypercall interface can pass a maddr in one argument rather than splitting it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:05:09 +02:00
Eduardo Habkost	45eb0d8898	Xen64: HYPERVISOR_set_segment_base() implementation Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:31 +02:00
Jeremy Fitzhardinge	88459d4c7e	xen64: register callbacks in arch-independent way Use callback_op hypercall to register callbacks in a 32/64-bit independent way (64-bit doesn't need a code segment, but that detail is hidden in XEN_CALLBACK). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:03:01 +02:00
Jeremy Fitzhardinge	ce803e705f	xen64: use arbitrary_virt_to_machine for xen_set_pmd When building initial pagetables in 64-bit kernel the pud/pmd pointer may be in ioremap/fixmap space, so we need to walk the pagetable to look up the physical address. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:01:17 +02:00
Jeremy Fitzhardinge	084a2a4e76	xen64: early mapping setup Set up the initial pagetables to map the kernel mapping into the physical mapping space. This makes __va() usable, since it requires physical mappings. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 11:00:07 +02:00
Jeremy Fitzhardinge	5b09b2876e	x86_64: add workaround for no %gs-based percpu As a stopgap until Mike Travis's x86-64 gs-based percpu patches are ready, provide workaround functions for x86_read/write_percpu for Xen's use. Specifically, this means that we can't really make use of vcpu placement, because we can't use a single gs-based memory access to get to vcpu fields. So disable all that for now. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2008-07-16 10:58:13 +02:00

... 23 24 25 26 27 ...

25711 Commits