Commit Graph

210656 Commits

Author SHA1 Message Date
Joel Becker
5453258d53 ocfs2: Silence gcc warning in ocfs2_write_zero_page().
ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc
doesn't know that.  Set ret=0 just to make gcc happy.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-07-16 13:33:39 -07:00
Robert Jennings
ee2e6114de ibmveth: lost IRQ while closing/opening device leads to service loss
The order of freeing the IRQ and freeing the device in firmware
in ibmveth_close can cause the adapter to become unusable after a
subsequent ibmveth_open.  Only a reboot of the OS will make the
network device usable again. This is seen when cycling the adapter
up and down while there is network activity.

There is a window where an IRQ will be left unserviced (H_EOI will not
be called).  The solution is to make a VIO_IRQ_DISABLE h_call, free the
device with firmware, and then call free_irq.

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-16 13:03:23 -07:00
David S. Miller
0f6142fa96 Merge branch 'vhost-net' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost 2010-07-16 12:41:44 -07:00
Bjorn Helgaas
58c84eda07 PCI: fall back to original BIOS BAR addresses
If we fail to assign resources to a PCI BAR, this patch makes us try the
original address from BIOS rather than leaving it disabled.

Linux tries to make sure all PCI device BARs are inside the upstream
PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
Windows does similar reassignment.

Before this patch, if we could not move a BAR into an aperture, we left
the resource unassigned, i.e., at address zero.  Windows leaves such BARs
at the original BIOS addresses, and this patch makes Linux do the same.

This is a bit ugly because we disable the resource long before we try to
reassign it, so we have to keep track of the BIOS BAR address somewhere.
For lack of a better place, I put it in the struct pci_dev.

I think it would be cleaner to attempt the assignment immediately when the
claim fails, so we could easily remember the original address.  But we
currently claim motherboard resources in the middle, after attempting to
claim PCI resources and before assigning new PCI resources, and changing
that is a fairly big job.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263

Reported-by: Andrew <nitr0@seti.kr.ua>
Tested-by: Andrew <nitr0@seti.kr.ua>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-07-16 11:39:48 -07:00
Linus Torvalds
f469461df6 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  tracing: Add alignment to syscall metadata declarations
  perf: Sync callchains with period based hits
  perf: Resurrect flat callchains
  perf: Version String fix, for fallback if not from git
  perf: Version String fix, using kernel version
2010-07-16 11:26:33 -07:00
John W. Linville
088c87262b mac80211: improve error checking if WEP fails to init
Do this by poisoning the values of wep_tx_tfm and wep_rx_tfm if either
crypto allocation fails.

Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 14:03:42 -04:00
Dan Carpenter
48d5548fc5 orinoco_usb: potential null dereference
Smatch complains that "upriv->read_urb" gets dereferenced before
checking for NULL.  It turns out that it's possible for
"upriv->read_urb" to be NULL so I added checks around the dereferences.

Also I remove an "if (upriv->bap_buf != NULL)" check because
"kfree(NULL) is OK.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 14:03:42 -04:00
Luis R. Rodriguez
9171acc7e0 ath9k_hw: Fix AR9003 MPDU delimeter CRC check for middle subframes
An A-MPDU may contain several subframes each containing its own
CRC for the data. Each subframe also has a respective CRC for the
MPDU length and 4 reserved bits (aka delimeter CRC). AR9003 will
ACK frames that have a valid data CRC but have failed to pass the
CRC for the MPDU length, if and only if the subframe is not the
last subframe in an A-MPDU and if an OFDM phy OFDM reset error has
been caught. Discarding those subframes results in packet loss under
heavy stress conditions, an example being UDP video. Since the
frames are ACK'd by hardware we need to let these frames through
and process them as valid frames.

Cc: Tushit Jain <tushit.jain@atheros.com>
Cc: Kyungwan Nam <kyungwan.nam@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 14:03:42 -04:00
Christian Lamparter
15804e3e9d mac80211: skip HT parsing if HW does not support HT
This patch will also fix the odd freeze which occurred
when minstrel_ht connects to an 802.11n network with
legacy hardware.

Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 14:03:42 -04:00
Luis R. Rodriguez
9edd9520a2 ath9k_htc: make ath9k_htc_tx_aggr_oper() static
This fixes this sparse complaint:

  CHECK   drivers/net/wireless/ath/ath9k/htc_drv_main.c
drivers/net/wireless/ath/ath9k/htc_drv_main.c:441:5:
	warning: symbol 'ath9k_htc_tx_aggr_oper'
		 was not declared. Should it be static?

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 14:03:41 -04:00
Stephen Boyd
9acd56d3f2 rt2x00: Fix lockdep warning in rt2x00lib_probe_dev()
The rt2x00dev->intf_work workqueue is never initialized when a driver is
probed for a non-existent device (in this case rt2500usb). On such a
path we call rt2x00lib_remove_dev() to free any resources initialized
during the probe before we use INIT_WORK to initialize the workqueue.
This causes lockdep to get confused since the lock used in the workqueue
hasn't been initialized yet but is now being acquired during
cancel_work_sync() called by rt2x00lib_remove_dev().

Fix this by initializing the workqueue first before we attempt to probe
the device. This should make lockdep happy and avoid breaking any
assumptions about how the library cleans up after a probe fails.

phy0 -> rt2x00lib_probe_dev: Error - Failed to allocate device.
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
Pid: 2027, comm: modprobe Not tainted 2.6.35-rc5+ #60
Call Trace:
 [<ffffffff8105fe59>] register_lock_class+0x152/0x31f
 [<ffffffff81344a00>] ? usb_control_msg+0xd5/0x111
 [<ffffffff81061bde>] __lock_acquire+0xce/0xcf4
 [<ffffffff8105f6fd>] ? trace_hardirqs_off+0xd/0xf
 [<ffffffff81492aef>] ?  _raw_spin_unlock_irqrestore+0x33/0x41
 [<ffffffff810628d5>] lock_acquire+0xd1/0xf7
 [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
 [<ffffffff8104f06e>] __cancel_work_timer+0xd0/0x17e
 [<ffffffff8104f037>] ? __cancel_work_timer+0x99/0x17e
 [<ffffffff8104f136>] cancel_work_sync+0xb/0xd
 [<ffffffffa0096675>] rt2x00lib_remove_dev+0x25/0xb0 [rt2x00lib]
 [<ffffffffa0096bf7>] rt2x00lib_probe_dev+0x380/0x3ed [rt2x00lib]
 [<ffffffff811d78a7>] ? __raw_spin_lock_init+0x31/0x52
 [<ffffffffa00bbd2c>] ? T.676+0xe/0x10 [rt2x00usb]
 [<ffffffffa00bbe4f>] rt2x00usb_probe+0x121/0x15e [rt2x00usb]
 [<ffffffff813468bd>] usb_probe_interface+0x151/0x19e
 [<ffffffff812ea08e>] driver_probe_device+0xa7/0x136
 [<ffffffff812ea167>] __driver_attach+0x4a/0x66
 [<ffffffff812ea11d>] ? __driver_attach+0x0/0x66
 [<ffffffff812e96ca>] bus_for_each_dev+0x54/0x89
 [<ffffffff812e9efd>] driver_attach+0x19/0x1b
 [<ffffffff812e9b64>] bus_add_driver+0xb4/0x204
 [<ffffffff812ea41b>] driver_register+0x98/0x109
 [<ffffffff813465dd>] usb_register_driver+0xb2/0x173
 [<ffffffffa00ca000>] ? rt2500usb_init+0x0/0x20 [rt2500usb]
 [<ffffffffa00ca01e>] rt2500usb_init+0x1e/0x20 [rt2500usb]
 [<ffffffff81000203>] do_one_initcall+0x6d/0x17a
 [<ffffffff8106cae8>] sys_init_module+0x9c/0x1e0
 [<ffffffff8100296b>] system_call_fastpath+0x16/0x1b

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-07-16 13:57:59 -04:00
Kulikov Vasiliy
1c689cbcf2 arch/tile: check kmalloc() result
If kmalloc() fails exit with -ENOMEM.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
2010-07-16 13:37:14 -04:00
Sage Weil
e979cf5039 ceph: do not include cap/dentry releases in replayed messages
Strip the cap and dentry releases from replayed messages.  They can
cause the shared state to get out of sync because they were generated
(with the request message) earlier, and no longer reflect the current
client state.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-16 10:30:18 -07:00
Sage Weil
01a92f174f ceph: reuse request message when replaying against recovering mds
Replayed rename operations (after an mds failure/recovery) were broken
because the request paths were regenerated from the dentry names, which
get mangled when d_move() is called.

Instead, resend the previous request message when replaying completed
operations.  Just make sure the REPLAY flag is set and the target ino is
filled in.

This fixes problems with workloads doing renames when the MDS restarts,
where the rename operation appears to succeed, but on mds restart then
fails (leading to client confusion, app breakage, etc.).

Signed-off-by: Sage Weil <sage@newdream.net>
2010-07-16 10:30:17 -07:00
Gui Jianfeng
74534341c1 perf symbols: Fix directory descriptor leaking
When I ran "perf kvm ... top", I encountered the following error output.

  Error: perfcounter syscall returned with -1 (Too many open files)

  Fatal: No CONFIG_PERF_EVENTS=y kernel support configured?

Looking into perf, I found perf opens too many directories at
initialization time, but forgets to close them. Here is the fix.

LKML-Reference: <4C230362.5080704@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-16 14:16:47 -03:00
Linus Torvalds
79140bc486 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  GFS2: rename causes kernel Oops
  GFS2: BUG in gfs2_adjust_quota
  GFS2: Fix kernel NULL pointer dereference by dlm_astd
  GFS2: recovery stuck on transaction lock
  GFS2: O_TRUNC not working on stuffed files across cluster
2010-07-16 08:23:10 -07:00
Linus Torvalds
cc10b6ffd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: w90p910_ts - fix call to setup_timer()
  Input: synaptics - fix wrong dimensions check
  Input: i8042 - mark stubs in i8042.h "static inline"
2010-07-16 08:22:40 -07:00
Masami Hiramatsu
8217563359 perf probe: Fix the logic of die_compare_name
Invert the return value of die_compare_name(), because it returns a 'bool'
result which should be expeced true if the die's name is same as compared
string.

LKML-Reference: <4C36EBED.1000006@hitachi.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-16 11:48:34 -03:00
Masami Hiramatsu
6a330a3c8a perf probe: Support comp_dir to find an absolute source path
Gcc generates DW_AT_comp_dir and stores relative source path if building kernel
without O= option. In that case, perf probe --line sometimes doesn't work
without --source option, because it tries to access relative source path.

This adds DW_AT_comp_dir support to perf probe for finding an absolute source
path when no --source option.

LKML-Reference: <4C36EBE7.3060802@hitachi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-16 11:48:09 -03:00
Masami Hiramatsu
7cf0b79e6f perf probe: Fix error message if get_real_path() failed
Perf probe -L shows incorrect error message (Dwarf error) if it fails to find
source file. This can confuse users.

# ./perf probe -s /nowhere -L vfs_read
Debuginfo analysis failed. (-2)
  Error: Failed to show lines. (-2)

With this patch, it shows correct message.

# ./perf probe -s /nowhere -L vfs_read
Failed to find source file. (-2)
  Error: Failed to show lines. (-2)

LKML-Reference: <4C36EBDB.4020308@hitachi.com>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Chase Douglas <chase.douglas@canonical.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-07-16 11:46:34 -03:00
Michael S. Tsirkin
22cb516696 netfilter: correct CHECKSUM header and export it
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-07-16 14:08:20 +02:00
Russell King
e910b63d00 Merge branch 'l7200' into devel
Conflicts:
	arch/arm/configs/lusl7200_defconfig
2010-07-16 11:08:33 +01:00
Russell King
3abe9d33b3 ARM: early_alloc()
Add a common early allocator function, in preparation for switching
over to LMB.  When we do, this function will need to do a little more
than just allocating memory; we need it zero initialized too.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 11:06:42 +01:00
Russell King
71ee7dad9b ARM: OMAP: Convert to use ->reserve method to reserve boot time memory
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 11:06:41 +01:00
Russell King
a1af0fbbba ARM: OMAP: Cleanup OMAP FB SDRAM reservation
The logic in this file is rather convoluted, but essentially:

1. region type 0 is SDRAM
2. referring to the code fragment
                if (set_fbmem_region_type(&rg, OMAPFB_MEMTYPE_SDRAM,
                                          sdram_start, sdram_size) < 0 ||
                    (rg.type != OMAPFB_MEMTYPE_SDRAM))
                        continue;
   - if rg.type is not OMAPFB_MEMTYPE_SDRAM, set_fbmem_region_type()
     returns zero immediately (since rg.type is non-zero), and so we
     'continue'.
   - if rg.type is OMAPFB_MEMTYPE_SDRAM, and rg.paddr is zero,
     we fall through.
   - if rg.type is OMAPFB_MEMTYPE_SDRAM, and the region lies within
     SDRAM, we fall through.
   - if rg.type is OMAPFB_MEMTYPE_SDRAM, and the region is not within
     SDRAM, we 'continue'.
3. check_fbmem_region seems unnecessary.
   - we know rg.type is OMAPFB_MEMTYPE_SDRAM
   - we can check rg.size independently
   - bootmem_reserve() can check for overlapping reservations itself
   - we've already validated that the requested region lies within SDRAM.
4. avoid BUG()ing if the region entry is already set; print an error,
   and mark the configuration invalid - at least we'll continue booting
   so the error message has a chance of being logged/visible via serial
   console.

With these changes in place, it makes the code much easier to understand
and hence easier to convert to LMB.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 11:06:41 +01:00
Russell King
98c672cf1f ARM: Move platform memory reservations out of generic code
Move the platform specific bootmem memory reservations out of
arch/arm/mm/mmu.c into their respective platform files.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 11:06:40 +01:00
Russell King
b65b4781fb ARM: Remove 'node' argument form arch_adjust_zones()
Since we no longer support discontigmem, node is always zero, so
remove this argument.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 10:57:36 +01:00
Russell King
be37030274 ARM: Remove DISCONTIGMEM support
Everything should now be using sparsemem rather than discontigmem, so
remove the code supporting discontigmem from ARM.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 10:57:35 +01:00
Russell King
7961239599 ARM: Precalculate vmalloc_min
Rather than storing the minimum size of the vmalloc area, store the
maximum permitted address of the vmalloc area instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-07-16 10:57:35 +01:00
Eliot Blennerhassett
e2768c0c22 ALSA: asihpi - Avoid useless assignment of returned index values.
Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-07-16 11:34:23 +02:00
Eliot Blennerhassett
604a440a9d ALSA: asihpi - Avoid using c99 uintX types.
Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-07-16 11:33:47 +02:00
Eliot Blennerhassett
8d4bbee77e ALSA: asihpi - HPI version 4.04.01
Signed-off-by: Eliot Blennerhassett <eblennerhassett@audioscience.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2010-07-16 11:31:37 +02:00
Michael S. Tsirkin
95c0ec6a97 vhost: avoid pr_err on condition guest can trigger
Guest can trigger packet truncation by posting
a very short buffer and disabling buffer merging.
Convert pr_err to pr_debug to avoid log from filling
up when this happens.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2010-07-16 12:18:17 +03:00
Christoph Lameter
af537b0a6c slub: Use kmem_cache flags to detect if slab is in debugging mode.
The cacheline with the flags is reachable from the hot paths after the
percpu allocator changes went in. So there is no need anymore to put a
flag into each slab page. Get rid of the SlubDebug flag and use
the flags in kmem_cache instead.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:08 +03:00
Christoph Lameter
f5b801ac38 slub: Allow removal of slab caches during boot
If a slab cache is removed before we have setup sysfs then simply skip over
the sysfs handling.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roland Dreier <rdreier@cisco.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:07 +03:00
Christoph Lameter
d7278bd7d1 slub: Check kasprintf results in kmem_cache_init()
Small allocations may fail during slab bringup which is fatal. Add a BUG_ON()
so that we fail immediately rather than failing later during sysfs
processing.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:07 +03:00
Christoph Lameter
f90ec39014 SLUB: Constants need UL
UL suffix is missing in some constants. Conform to how slab.h uses constants.

Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:07 +03:00
Christoph Lameter
2154a33638 slub: Use a constant for a unspecified node.
kmalloc_node() and friends can be passed a constant -1 to indicate
that no choice was made for the node from which the object needs to
come.

Use NUMA_NO_NODE instead of -1.

CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:13:06 +03:00
Bob Liu
d602dabaeb SLOB: Free objects to their own list
SLOB has alloced smaller objects from their own list in reduce overall external
fragmentation and increase repeatability, free to their own list also.

This is /proc/meminfo result in my test machine:

  without this patch:
  ===
  MemTotal:        1030720 kB
  MemFree:          750012 kB
  Buffers:           15496 kB
  Cached:           160396 kB
  SwapCached:            0 kB
  Active:           105024 kB
  Inactive:         145604 kB
  Active(anon):      74816 kB
  Inactive(anon):     2180 kB
  Active(file):      30208 kB
  Inactive(file):   143424 kB
  Unevictable:          16 kB
  ....

  with this patch:
  ===
  MemTotal:        1030720 kB
  MemFree:          751908 kB
  Buffers:           15492 kB
  Cached:           160280 kB
  SwapCached:            0 kB
  Active:           102720 kB
  Inactive:         146140 kB
  Active(anon):      73168 kB
  Inactive(anon):     2180 kB
  Active(file):      29552 kB
  Inactive(file):   143960 kB
  Unevictable:          16 kB
  ...

The result shows an improvement of 1 MB!

And when I tested it on a embeded system with 64 MB, I found this path is never
called during kernel bootup.

Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
2010-07-16 11:03:20 +03:00
Jiri Slaby
f33ebbe9da unistd: add __NR_prlimit64 syscall numbers
Add __NR_prlimit64 syscall numbers to asm-generic. Add them also to
asm-x86, both 32 and 64-bit.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2010-07-16 09:52:32 +02:00
Jiri Slaby
c022a0acad rlimits: implement prlimit64 syscall
This patch adds the code to support the sys_prlimit64 syscall which
modifies-and-returns the rlim values of a selected process atomically.
The first parameter, pid, being 0 means current process.

Unlike the current implementation, it is a generic interface,
architecture indepentent so that we needn't handle compat stuff
anymore. In the future, after glibc start to use this we can deprecate
sys_setrlimit and sys_getrlimit in favor to clean up the code finally.

It also adds a possibility of changing limits of other processes. We
check the user's permissions to do that and if it succeeds, the new
limits are propagated online. This is good for large scale
applications such as SAP or databases where administrators need to
change limits time by time (e.g. on crashes increase core size). And
it is unacceptable to restart the service.

For safety, all rlim users now either use accessors or doesn't need
them due to
- locking
- the fact a process was just forked and nobody else knows about it
  yet (and nobody can't thus read/write limits)
hence it is safe to modify limits now.

The limitation is that we currently stay at ulong internal
representation. So the rlim64_is_infinity check is used where value is
compared against ULONG_MAX on 32-bit which is the maximum value there.

And since internally the limits are held in struct rlimit, converters
which are used before and after do_prlimit call in sys_prlimit64 are
introduced.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:48 +02:00
Jiri Slaby
b95183453a rlimits: switch more rlimit syscalls to do_prlimit
After we added more generic do_prlimit, switch sys_getrlimit to that.
Also switch compat handling, so we can get rid of ugly __user casts
and avoid setting process' address limit to kernel data and back.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:48 +02:00
Jiri Slaby
5b41535aac rlimits: redo do_setrlimit to more generic do_prlimit
It now allows also reading of limits. I.e. all read and writes will
later use this function.

It takes two parameters, new and old limits which can be both NULL.
If new is non-NULL, the value in it is set to rlimits.
If old is non-NULL, current rlimits are stored there.
If both are non-NULL, old are stored prior to setting the new ones,
atomically.
(Similar to sigaction.)

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:48 +02:00
Jiri Slaby
6a1d5e2c85 rlimits: add rlimit64 structure
Add a platform independent structure for resource limits to use with
a new prlimit64 syscall. This structure is the same which uses glibc
for 64-bit limits.

Also add corresponding infinity which is a 64-bit full of bit-ones.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:47 +02:00
Jiri Slaby
86f162f4c7 rlimits: do security check under task_lock
Do security_task_setrlimit under task_lock. Other tasks may change
limits under our hands while we are checking limits inside the
function. From now on, they can't.

Note that all the security work is done under a spinlock here now.
Security hooks count with that, they are called from interrupt context
(like security_task_kill) and with spinlocks already held (e.g.
capable->security_capable).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Acked-by: James Morris <jmorris@namei.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
2010-07-16 09:48:47 +02:00
Jiri Slaby
1c1e618ddd rlimits: allow setrlimit to non-current tasks
Add locking to allow setrlimit accept task parameter other than
current.

Namely, lock tasklist_lock for read and check whether the task
structure has sighand non-null. Do all the signal processing under
that lock still held.

There are some points:
1) security_task_setrlimit is now called with that lock held. This is
   not new, many security_* functions are called with this lock held
   already so it doesn't harm (all this security_* stuff does almost
   the same).
2) task->sighand->siglock (in update_rlimit_cpu) is nested in
   tasklist_lock. This dependence is already existing.
3) tsk->alloc_lock is nested in tasklist_lock. This is OK too, already
   existing dependence.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2010-07-16 09:48:47 +02:00
Jiri Slaby
7855c35da7 rlimits: split sys_setrlimit
Create do_setrlimit from sys_setrlimit and declare do_setrlimit
in the resource header. This is the first phase to have generic
do_prlimit which allows to be called from read, write and compat
rlimits code.

The new do_setrlimit also accepts a task pointer to change the limits
of. Currently, it cannot be other than current, but this will change
with locking later.

Also pass tsk->group_leader to security_task_setrlimit to check
whether current is allowed to change rlimits of the process and not
its arbitrary thread because it makes more sense given that rlimit are
per process and not per-thread.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:46 +02:00
Oleg Nesterov
eb2d55a32b rlimits: selinux, do rlimits changes under task_lock
When doing an exec, selinux updates rlimits in its code of current
process depending on current max. Make sure max or cur doesn't change
in the meantime by grabbing task_lock which do_prlimit needs for
changing limits too.

While at it, use rlimit helper for accessing CPU rlimit a line below.
To have a volatile access too.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Oleg Nesterov <oleg@redhat.com>
2010-07-16 09:48:46 +02:00
Oleg Nesterov
2fb9d2689a rlimits: make sure ->rlim_max never grows in sys_setrlimit
Mostly preparation for Jiri's changes, but probably makes sense anyway.

sys_setrlimit() checks new_rlim.rlim_max <= old_rlim->rlim_max, but when
it takes task_lock() old_rlim->rlim_max can be already lowered. Move this
check under task_lock().

Currently this is not important, we can only race with our sub-thread,
this means the application is stupid. But when we change the code to allow
the update of !current task's limits, it becomes important to make sure
->rlim_max can be lowered "reliably" even if we race with the application
doing sys_setrlimit().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
2010-07-16 09:48:46 +02:00
Jiri Slaby
5ab46b345e rlimits: add task_struct to update_rlimit_cpu
Add task_struct as a parameter to update_rlimit_cpu to be able to set
rlimit_cpu of different task than current.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
2010-07-16 09:48:45 +02:00