Dave Kleikamp [Thu, 9 Oct 2008 18:21:30 +0000 (13:21 -0500)]
sched_clock: prevent scd->clock from moving backwards
When sched_clock_cpu() couples the clocks between two cpus, it may
increment scd->clock beyond the GTOD tick window that __update_sched_clock()
uses to clamp the clock. A later call to __update_sched_clock() may move
the clock back to scd->tick_gtod + TICK_NSEC, violating the clock's
monotonic property.
This patch ensures that scd->clock will not be set backward.
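The idea can be sketched in user-space C. This is only a simplified model, not the kernel code: the function name, the helper names and the tick length are assumptions made for illustration, and wraparound of the 64-bit counter is ignored.

```c
#include <stdint.h>

/* Illustrative tick length in nanoseconds (assumes HZ=1000); not taken from the patch. */
#define TICK_NSEC_SKETCH 1000000ULL

static inline uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static inline uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Hypothetical model: clamp a raw clock reading into the GTOD tick window,
 * but never below the previously published value prev_clock.
 */
uint64_t clamp_clock_monotonic(uint64_t raw, uint64_t tick_gtod, uint64_t prev_clock)
{
    /* Lower bound: the window start, or the old clock if it is already ahead. */
    uint64_t lo = max_u64(tick_gtod, prev_clock);
    /* Upper bound: the window end, widened if the old clock already passed it. */
    uint64_t hi = max_u64(prev_clock, tick_gtod + TICK_NSEC_SKETCH);

    return min_u64(max_u64(raw, lo), hi);
}
```

With both bounds derived from the previous value, a clock that cross-CPU coupling pushed past the window stays where it is instead of being pulled back to the window's end.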
Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
David John [Fri, 10 Oct 2008 06:12:44 +0000 (11:42 +0530)]
HPET: Remove spurious HPET busy warning message.
On x86 systems with CONFIG_HPET_TIMER enabled, when
the HPET driver (drivers/char/hpet.c) is loaded,
an incorrect busy message is printed when the driver
initializes since the HPET has already been allocated
by the core timer code. Remove the warning message.
Signed-off-by: David John <davidjon@xenontk.org> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kumar Gala [Fri, 10 Oct 2008 03:50:06 +0000 (22:50 -0500)]
[MTD] [NAND] remove dead Kconfig associated with !CONFIG_PPC_MERGE
Removed the Kconfig associated with 'NDFC NanD Flash Controller'.
We can't enable !CONFIG_PPC_MERGE so there is no way to enable
this. Additionally the code needs to get updated for arch/powerpc.
For the time being let's just remove the Kconfig option so we can
actually remove CONFIG_PPC_MERGE.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
[MTD] [NAND] driver extension to support NAND on TQM85xx modules
This patch extends the FSL UPM NAND driver from Anton Vorontsov to
support hardware which does not have the R/B pin of the NAND chip
connected, like the TQM8548 module:
- The OF_GPIO dependency has been removed from the Kconfig option
because GPIO is not needed. The relevant gpio_* functions are then
stubbed out in <linux/gpio.h>.
- It re-introduces the chip-delay property to define an appropriate
maximum delay time (tR) required for read operations. The binding
will be documented in a separate patch.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Anton Vorontsov [Thu, 18 Sep 2008 16:50:26 +0000 (20:50 +0400)]
[MTD] [NAND] fsl_upm: update driver for the new OF bindings
- Get rid of fsl,wait-pattern and fsl,wait-write. I think this isn't
chip-specific, and we should always do waits. I saw one board that
didn't need fsl,wait-pattern, but I assume this was the exception
that proves the rule;
- Get rid of chip-delay. Today there are no users for this, and if
anyone really needs this, they should push the OF bindings beforehand;
- Now flash chips should be child nodes of the FSL UPM NAND controller;
- Implement OF partition parsing.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Jon Tollefson [Thu, 9 Oct 2008 10:18:40 +0000 (10:18 +0000)]
powerpc: Reserve in bootmem lmb reserved regions that cross NUMA nodes
If there are multiple reserved memory blocks via lmb_reserve() that are
contiguous addresses and on different NUMA nodes we are losing track of which
address ranges to reserve in bootmem on which node. I discovered this
when I recently got to try 16GB huge pages on a system with more than 2 nodes.
When scanning the device tree in early boot we call lmb_reserve() with
the addresses of the 16G pages that we find so that the memory doesn't
get used for something else. For example the addresses for the pages
could be 4000000000, 4400000000, 4800000000, 4C00000000, etc - 8 pages,
one on each of eight nodes. In the lmb after all the pages have been
reserved it will look something like the following:
The reserved.region[0x4] contains the 16G pages. In
arch/powerpc/mm/numa.c: do_init_bootmem() we loop through each of the
node numbers looking for the reserved regions that belong to the
particular node. It is not able to identify region 0x4 as being a part
of each of the 8 nodes. It is assuming that a reserved region is only
on a single node.
This patch takes out the reserved region loop from inside
the loop that goes over each node. It looks up the active region containing
the start of the reserved region. If it extends past that active region then
it adjusts the size and gets the next active region containing it.
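The splitting described above can be modelled in plain C. This is a sketch under stated assumptions, not the patch itself: `struct active_range`, `find_range()` and `split_reserved()` are hypothetical names, and the real code calls into bootmem for each piece rather than recording it in arrays.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an active memory range owned by one NUMA node. */
struct active_range {
    uint64_t start;
    uint64_t end;   /* exclusive */
    int nid;
};

/* Find the active range containing addr, or NULL if none does. */
static const struct active_range *
find_range(const struct active_range *r, size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].start && addr < r[i].end)
            return &r[i];
    return NULL;
}

/*
 * Split one reserved region [base, base + size) into per-node pieces.
 * Each piece's node id and size are stored in the output arrays;
 * the return value is the number of pieces recorded.
 */
size_t split_reserved(const struct active_range *r, size_t n,
                      uint64_t base, uint64_t size,
                      int *nids, uint64_t *sizes, size_t max_pieces)
{
    size_t count = 0;

    while (size > 0 && count < max_pieces) {
        const struct active_range *ar = find_range(r, n, base);
        if (!ar)
            break;              /* start address is not in any active range */

        uint64_t chunk = ar->end - base;
        if (chunk > size)
            chunk = size;       /* the region ends inside this range */

        nids[count] = ar->nid;
        sizes[count] = chunk;
        count++;

        base += chunk;          /* continue in the next active range */
        size -= chunk;
    }
    return count;
}
```

Iterating over reserved regions first, and only then looking up the owning node, is what lets a single contiguous region be attributed to several nodes.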
Signed-off-by: Jon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul Mackerras [Wed, 8 Oct 2008 14:03:29 +0000 (14:03 +0000)]
powerpc: Sync RPA note in zImage with kernel's RPA note
Commit 9b09c6d909dfd8de96b99b9b9c808b94b0a71614 ("powerpc: Change the
default link address for pSeries zImage kernels") changed the
real-base value in the CHRP note added by the addnote program from
12MB to 32MB to give more space for Open Firmware to load the zImage.
(The real-base value says where we want OF to position itself in
memory.) However, this change was ineffective on most pSeries
machines, because the RPA note added by addnote has the "ignore me"
flag set to 1. This was intended to tell OF to ignore just the RPA
note, but has the side effect of also making OF ignore the CHRP note
(at least on most pSeries machines).
To solve this we have to set the "ignore me" flag to 0 in the RPA
note. (We can't just omit the RPA note because that is equivalent to
having an RPA note with default values, and the default values are not
what we want.) However, then we have to make sure the values in the
zImage's RPA note match up with the values that the kernel supplies
later in prom_init.c with either the ibm,client-architecture-support
call or the process-elf-header call in prom_send_capabilities().
So this sets the "ignore me" flag in the RPA note in addnote to 0, and
adjusts the RPA note values in addnote.c and in prom_init.c to be
consistent with each other and with the values in ibm_architecture_vec.
However, since the wrapper is independent of the kernel, this doesn't
ensure that the notes will stay consistent. To ensure that, this adds
code to addnote.c so that it can extract the kernel's RPA note from
the kernel binary and put that in the zImage. To that end, we put the
kernel's fake ELF header (which contains the kernel's RPA note) into
its own section, and arrange for wrapper to pull out that section with
objcopy and pass it to addnote, which then extracts the RPA note from
it and transfers it to the zImage.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Grant Likely [Wed, 8 Oct 2008 05:05:29 +0000 (05:05 +0000)]
powerpc/of-bindings: Don't support linux,<modalias> "compatible" values
Compatible property values in the form linux,<modalias> are not documented
anywhere and using them leaks Linux implementation details into the device
tree data (which is bad). Remove support for compatible values of this
form.
If any platforms exist which depended on this code (and I don't know of
any), then they can be fixed up by adding legacy translations to the
lookup table in this file.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Josh Poimboeuf [Tue, 7 Oct 2008 06:10:03 +0000 (06:10 +0000)]
powerpc: Fix error path in kernel_thread function
The powerpc 32-bit and 64-bit kernel_thread functions don't properly
propagate errors being returned by the clone syscall. (In the case of
error, the syscall exit code returns a positive errno in r3 and sets
the CR0[SO] bit.)
This patch fixes that by negating r3 if CR0[SO] is set after the syscall.
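As a rough illustration, the fix-up can be written as a C function. The real change is in powerpc assembly; the function name and the way the SO bit is passed in are assumptions made for this sketch.

```c
/*
 * Hypothetical C model of the fix-up.
 * r3:     raw value left in r3 by the syscall exit path
 * so_set: nonzero if CR0[SO] was set, i.e. the syscall failed
 */
long syscall_result(long r3, int so_set)
{
    /* On failure r3 holds a positive errno; callers expect -errno. */
    return so_set ? -r3 : r3;
}
```

A caller can then test for a negative return value, as it would for other kernel-internal functions.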
Signed-off-by: Josh Poimboeuf <jpoimboe@us.ibm.com> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Roel Kluin [Mon, 6 Oct 2008 22:38:33 +0000 (22:38 +0000)]
powerpc/cell/oprofile: Fix test on overlay_tbl_offset in vma_map
Offset is unsigned and when an address isn't found in the vma map
vma_map_lookup() returns the vma physical address + 0x10000000.
vma_map_lookup() used to return 0xffffffff on a failed lookup, but
a change was made to return the vma physical address + 0x10000000.
There are two callers of vma_map_lookup(): one of them correctly
deals with this new return value, but the other (below) did not.
Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Acked-by: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Vorontsov [Mon, 6 Oct 2008 07:26:54 +0000 (07:26 +0000)]
powerpc: Fix no interrupt handling in pata_of_platform
When no interrupt is specified the pata_of_platform fills the irq_res
resource with -1, which is wrong to do for two reasons:
1. By definition, 'no irq' should be IRQ 0, not some negative integer;
2. pata_platform checks for irq_res.start > 0, but since irq_res.start
is of an unsigned type, the check will be true for `-1'.
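The second point is easy to reproduce in isolation. The sketch below uses a hypothetical typedef and hypothetical helper names; it only demonstrates the C conversion rule, not the driver code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the unsigned type of the resource's start field. */
typedef unsigned long res_start_t;

/* Mirrors the "start > 0" style of check described above. */
bool has_irq(res_start_t start)
{
    return start > 0;
}

/* What an unsigned field holds after being assigned -1: the largest value. */
res_start_t store_minus_one(void)
{
    return (res_start_t)-1;
}
```

Here `has_irq(store_minus_one())` is true, so "no interrupt" encoded as -1 is mistaken for a valid IRQ, whereas 0 is correctly rejected.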
Reported-by: Steven A. Falco <sfalco@harris.com> Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nathan Fontenot [Wed, 1 Oct 2008 09:44:02 +0000 (09:44 +0000)]
powerpc: Oops in pseries_lmb_remove()
Testing hotplug memory remove has revealed that we can oops in
pseries_lmb_remove(). The incorrect shift causes a NULL pointer
dereference in the page_zone() inline routine.
I have only been able to reproduce the oops on kernels with large pages
enabled.
Tested on Power5 and Power6 with and without large pages enabled.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Chien Tung [Fri, 10 Oct 2008 00:41:05 +0000 (17:41 -0700)]
RDMA/nes: Fix slab corruption
Referencing cm_node after it is freed via rem_ref_cm_node() causes a
slab corruption. There is no need to set cm_node->cm_id to NULL in
mini_cm_close().
Signed-off-by: Chien Tung <ctung@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
Alexey Dobriyan [Thu, 9 Oct 2008 23:27:16 +0000 (03:27 +0400)]
proc: remove kernel.maps_protect
After commit 831830b5a2b5d413407adf380ef62fe17d6fcbf2 aka
"restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace"
the sysctl stopped being relevant because that commit moved security checks from
->show time to ->start time (mm_for_maps()).
Kees Cook [Sun, 5 Oct 2008 23:11:58 +0000 (03:11 +0400)]
[PATCH] proc: show personality via /proc/pid/personality
Make process personality flags visible in /proc. Since a process's
personality is potentially sensitive (e.g. READ_IMPLIES_EXEC), make this
file only readable by the process owner.
Lai Jiangshan [Sat, 4 Oct 2008 20:51:15 +0000 (00:51 +0400)]
[PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
lock_task_sighand() makes sure task->sighand is being protected,
so we do not need rcu_read_lock().
[ exec() will get task->sighand->siglock before changing task->sighand! ]
But code using rcu_read_lock() _just_ to protect lock_task_sighand()
only appears in procfs. (And some code in procfs uses lock_task_sighand()
without such redundant protection.)
Other subsystems may put lock_task_sighand() into an rcu_read_lock()
critical region, but these rcu_read_lock() calls are used for protecting
"for_each_process()", "find_task_by_vpid()" etc., not for protecting
lock_task_sighand().
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
[ok from Oleg] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Alexey Dobriyan [Thu, 2 Oct 2008 20:18:52 +0000 (00:18 +0400)]
proc: fix return value of proc_reg_open() in "too late" case
If ->open() wasn't called, returning 0 is misleading and, theoretically,
oopsable:
1) remove_proc_entry clears ->proc_fops, drops lock,
2) ->open "succeeds",
3) ->release oopses, because it assumes ->open was called (single_release()).
Kou Ishizaki [Wed, 8 Oct 2008 23:45:49 +0000 (10:45 +1100)]
powerpc/spufs: add a missing mutex_unlock
A mutex_unlock(&gang->aff_mutex) in spufs_create_context() is missing
in case spufs_context_open() fails. As a result, spu_create syscall
and spu_get_idle() may block.
This patch adds the mutex_unlock.
Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Acked-by: Andre Detsch <adetsch@br.ibm.com>
Eric Dumazet [Thu, 9 Oct 2008 21:51:27 +0000 (14:51 -0700)]
udp: complete port availability checking
While looking at UDP port randomization, I noticed it
was a little bit pessimistic, not looking at the type of sockets
(IPV6/IPV4) and not looking at bound addresses, if any.
We should perform the same tests as when binding to a
specific port.
This permits a cleanup of udp_lib_get_port().
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
I choose to leave the reset related netns comment in place (not
the one that is killed) as I cannot understand its English so
it's a bit hard for me to evaluate its usefulness :-).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
- /* Pass a socket to ip6_dst_lookup either it is for RST
- * Underlying function will use this to retrieve the network
- * namespace
- */
if (!ip6_dst_lookup(ctl_sk, &buff->dst, &fl)) {
if (xfrm_lookup(&buff->dst, &fl, NULL, 0) >= 0) {
ip6_xmit(ctl_sk, buff, &fl, NULL, 0);
TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
- TCP_INC_STATS_BH(net, TCP_MIB_OUTRSTS);
return;
}
}
...which starts to be trivial to combine.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Thu, 9 Oct 2008 21:37:47 +0000 (14:37 -0700)]
tcpv[46]: fix md5 pseudoheader address field ordering
Maybe it's just me but I guess those md5 people made a mess
out of it by having *_md5_hash_* to use daddr, saddr order
instead of the one that is natural (and equal to what csum
functions use). For the segment we're sending, the original
addresses are reversed so buff's saddr == skb's daddr and
vice-versa.
Maybe I can finally proceed with unification of some code
after fixing it first... :-)
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Thu, 9 Oct 2008 21:33:26 +0000 (14:33 -0700)]
sctp: update SNMP statistics when T5 timer expired.
The T5 timer is the timer for the over-all shutdown procedure. If
this timer expires, then shutdown procedure has not completed and we
ABORT the association. We should update SCTP_MIB_ABORTED and
SCTP_MIB_CURRESTAB when aborting.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Thu, 9 Oct 2008 21:33:01 +0000 (14:33 -0700)]
sctp: Fix SNMP number of SCTP_MIB_ABORTED during violation handling.
If ABORT chunks require authentication and a protocol violation
is triggered, we do not tear down the association. Subsequently,
we should not increment SCTP_MIB_ABORTED.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Thu, 9 Oct 2008 21:32:24 +0000 (14:32 -0700)]
sctp: Fix the SNMP number of SCTP_MIB_CURRESTAB
RFC3873 defined SCTP_MIB_CURRESTAB:
sctpCurrEstab OBJECT-TYPE
SYNTAX Gauge32
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The number of associations for which the current state is
either ESTABLISHED, SHUTDOWN-RECEIVED or SHUTDOWN-PENDING."
REFERENCE
"Section 4 in RFC2960 covers the SCTP Association state
diagram."
If the T4 RTO timer expires too many times (timeout), the association will enter
the CLOSED state, so we should decrement SCTP_MIB_CURRESTAB, not increment
it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Graham [Thu, 9 Oct 2008 21:29:26 +0000 (14:29 -0700)]
e1000: don't generate bad checksums for tcp packets with 0 csum
When offloading transmit checksums only, the driver was not
correctly configuring the hardware to handle the case of a zero
checksum. For UDP the correct behavior is to leave it alone, but
for tcp the checksum must be changed from 0x0000 to 0xFFFF. The
hardware takes care of this case but only if it is told the
packet is tcp.
This is the same patch as for e1000e.
Signed-off-by: Dave Graham <david.graham@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Graham [Thu, 9 Oct 2008 21:28:58 +0000 (14:28 -0700)]
e1000e: don't generate bad checksums for tcp packets with 0 csum
When offloading transmit checksums only, the driver was not
correctly configuring the hardware to handle the case of a zero
checksum. For UDP the correct behavior is to leave it alone, but
for tcp the checksum must be changed from 0x0000 to 0xFFFF. The
hardware takes care of this case but only if it is told the
packet is tcp.
Signed-off-by: Dave Graham <david.graham@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 9 Oct 2008 21:04:54 +0000 (14:04 -0700)]
Don't allow splice() to files opened with O_APPEND
This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.
It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example. So
we could make up any semantics we want, including the old ones.
But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).
So disallow O_APPEND entirely for now. I doubt anybody cares, and this
way we have one less gray area to worry about.
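The resulting check is tiny. Below is a user-space sketch of it; the helper name is hypothetical, and the choice of -EINVAL as the error code is an assumption since the message above does not name one.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: refuse a splice destination whose open flags
 * include O_APPEND, and accept everything else.
 */
int check_splice_target(int open_flags)
{
    if (open_flags & O_APPEND)
        return -EINVAL;     /* assumed error code */
    return 0;
}
```

Rejecting the combination outright avoids having to decide what a write at an arbitrary offset should mean for an append-only destination.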
Benjamin Li [Thu, 9 Oct 2008 19:26:41 +0000 (12:26 -0700)]
bnx2: Handle DMA mapping errors.
Before, the driver would not care about the return codes from pci_map_*
functions. This could be potentially dangerous if a mapping failed.
Now, we will check all pci_map_* calls. On the transmit side, we switch
to use the new function skb_dma_map(). On the receive side, we add
pci_dma_mapping_error().
Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 9 Oct 2008 19:21:46 +0000 (12:21 -0700)]
bnx2: Check netif_running() in all ethtool operations.
We need to check netif_running() state in most ethtool operations
and properly handle the !netif_running() state where the chip is
in an uninitialized state or low power state that may not accept
any MMIO.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 9 Oct 2008 19:21:08 +0000 (12:21 -0700)]
bnx2: Add bnx2_shutdown_chip().
This logic is used in bnx2_close() and bnx2_suspend() and
so should be separated out into a separate function.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Benjamin Li <benli@broadcom.com> Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Mackall [Wed, 8 Oct 2008 19:51:57 +0000 (14:51 -0500)]
SLOB: fix bogus ksize calculation fix
This fixes the previous fix, which was completely wrong on closer
inspection. This version has been manually tested with a user-space
test harness and generates sane values. A nearly identical patch has
been boot-tested.
The problem arose from changing how kmalloc/kfree handled alignment
padding without updating ksize to match. This brings it in sync.
Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Herbert Xu [Thu, 9 Oct 2008 19:03:17 +0000 (12:03 -0700)]
inet: Make tunnel RX/TX byte counters more consistent
This patch makes the RX/TX byte counters for IPIP, GRE and SIT more
consistent. Previously we included the external IP headers on the
way out but not when the packet is inbound.
The new scheme is to count payload only in both directions. For
IPIP and SIT this simply means the exclusion of the external IP
header. For GRE this means that we exclude the GRE header as
well.
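The new accounting rule amounts to simple subtraction. The sketch below assumes an outer IPv4 header without options (20 bytes) and a GRE header without optional fields (4 bytes); the function names are hypothetical, and real tunnels may carry larger headers.

```c
#include <stddef.h>

/* Illustrative header sizes: IPv4 without options, GRE without optional fields. */
#define OUTER_IPV4_HLEN 20u
#define GRE_BASE_HLEN    4u

/*
 * Bytes to count for an IPIP or SIT packet, where len is the packet length
 * starting at the outer IP header. Assumes len covers that header.
 */
size_t ipip_counted_bytes(size_t len)
{
    return len - OUTER_IPV4_HLEN;
}

/* For GRE the GRE header is excluded as well. Assumes len covers both headers. */
size_t gre_counted_bytes(size_t len)
{
    return len - OUTER_IPV4_HLEN - GRE_BASE_HLEN;
}
```

Applying the same subtraction on receive and transmit is what makes the two counters comparable.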
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 9 Oct 2008 19:00:17 +0000 (12:00 -0700)]
gre: Add Transparent Ethernet Bridging
This patch adds support for Ethernet over GRE encapsulation.
This is exposed to user-space with a new link type of "gretap"
instead of "gre". It will create an ARPHRD_ETHER device in
lieu of the usual ARPHRD_IPGRE.
Note that to preserve backwards compatibility all Transparent
Ethernet Bridging packets are passed to an ARPHRD_IPGRE tunnel
if its key matches and there is no ARPHRD_ETHER device whose
key matches more closely.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 9 Oct 2008 18:59:55 +0000 (11:59 -0700)]
gre: Add netlink interface
This patch adds a netlink interface that will eventually displace
the existing ioctl interface. It utilises the elegant rtnl_link_ops
mechanism.
This also means that user-space no longer needs to rely on the
tunnel interface being of type GRE to identify GRE tunnels. The
identification can now occur using rtnl_link_ops.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 9 Oct 2008 18:59:32 +0000 (11:59 -0700)]
gre: Move MTU setting out of ipgre_tunnel_bind_dev
This patch moves the dev->mtu setting out of ipgre_tunnel_bind_dev.
This is in preparation for using rtnl_link where we'll need to make
the MTU setting conditional on whether the user has supplied an
MTU. This also requires the move of the ipgre_tunnel_bind_dev
call out of the dev->init function so that we can access the user
parameters later.
This patch also adds a check to prevent setting the MTU below
the minimum of 68.
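A minimal sketch of the conditional MTU choice follows. The function name and parameters are hypothetical, and whether the real code rejects a too-small value or clamps it is not stated above; this sketch rejects it.

```c
#include <errno.h>

/* Minimum MTU named in the commit message. */
#define MIN_MTU_SKETCH 68

/*
 * Hypothetical helper: use the user's MTU if one was supplied and is large
 * enough, otherwise fall back to the value computed from the bound device.
 * Returns 0 and stores the result in *out, or -EINVAL without touching *out.
 */
int choose_mtu(int have_user_mtu, int user_mtu, int computed_mtu, int *out)
{
    if (!have_user_mtu) {
        *out = computed_mtu;
        return 0;
    }
    if (user_mtu < MIN_MTU_SKETCH)
        return -EINVAL;

    *out = user_mtu;
    return 0;
}
```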
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Thu, 9 Oct 2008 18:58:54 +0000 (11:58 -0700)]
gre: Use needed_headroom
Now that we have dev->needed_headroom, we can use it instead of
having a bogus dev->hard_header_len. This also allows us to
include dev->hard_header_len in the MTU computation so that when
we do have a meaningful hard_header_len in future it is included
automatically in figuring out the MTU.
Incidentally, this fixes a bug where we ignored the needed_headroom
field of the underlying device in calculating our own hard_header_len.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Wegener [Sat, 20 Sep 2008 14:50:08 +0000 (16:50 +0200)]
[CPUFREQ] Don't export governors for default governor
We don't need to export the governors for use as the default governor,
because the default governor will be built-in anyway and we can access
the symbol directly.
This also fixes the following sparse warnings:
drivers/cpufreq/cpufreq_conservative.c:578:25: warning: symbol 'cpufreq_gov_conservative' was not declared. Should it be static?
drivers/cpufreq/cpufreq_ondemand.c:582:25: warning: symbol 'cpufreq_gov_ondemand' was not declared. Should it be static?
drivers/cpufreq/cpufreq_performance.c:39:25: warning: symbol 'cpufreq_gov_performance' was not declared. Should it be static?
drivers/cpufreq/cpufreq_powersave.c:38:25: warning: symbol 'cpufreq_gov_powersave' was not declared. Should it be static?
drivers/cpufreq/cpufreq_userspace.c:190:25: warning: symbol 'cpufreq_gov_userspace' was not declared. Should it be static?
Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][6/6] cpufreq: Add idle microaccounting in ondemand governor
Use get_cpu_idle_time_us() to get micro-accounted idle information.
This enables ondemand to get more accurate idle and busy timings
than the jiffy based calculation. As a result, we can decrease
the ondemand safety guard band from 80-10 to 95-3.
This results in more aggressive power savings.
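To make the guard band concrete, here is a small decision function. It assumes that "80-10" and "95-3" mean an up threshold and a down differential, both in percent; the enum and function names are hypothetical and the governor's real logic has more cases.

```c
enum freq_action { FREQ_RAISE, FREQ_HOLD, FREQ_LOWER };

/*
 * Hypothetical model of the guard band:
 *  - above up_threshold the frequency should go up,
 *  - below (up_threshold - down_differential) it may go down,
 *  - in between it is left alone.
 * Assumes down_differential <= up_threshold.
 */
enum freq_action ondemand_decide(unsigned int load_pct,
                                 unsigned int up_threshold,
                                 unsigned int down_differential)
{
    if (load_pct > up_threshold)
        return FREQ_RAISE;
    if (load_pct < up_threshold - down_differential)
        return FREQ_LOWER;
    return FREQ_HOLD;
}
```

Under these assumptions a load of 85% raises the frequency with the 80-10 band but allows lowering it with the 95-3 band, which is where the extra power savings would come from.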
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][5/6] cpufreq: Changes to get_cpu_idle_time_us(), used by ondemand governor
Export get_cpu_idle_time_us() so that it can be used in the ondemand governor.
Last update time can be current time when the CPU is currently non-idle,
accounting for the busy time since last idle.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][4/6] cpufreq_ondemand: Parameterize down differential
Use a parameter for down differential, instead of hardcoded 10%. Follow-on
patch changes the down-differential dynamically, based on whether
we are using idle micro-accounting or not.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][3/6] cpufreq: get_cpu_idle_time() changes in ondemand for idle-microaccounting
Preparatory changes for doing idle micro-accounting in ondemand governor.
get_cpu_idle_time() gets extra parameter and returns idle time and also the
wall time that corresponds to the idle time measurement.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][2/6] cpufreq: Change load calculation in ondemand for software coordination
Change the load calculation algorithm in ondemand to work well with software
coordination of frequency across the dependent cpus.
Multiply individual CPU utilization with the average freq of that logical CPU
during the measurement interval (using getavg call). And find the max CPU
utilization number in terms of CPU freq. That number is then used to
get to the target freq for next sampling interval.
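The calculation can be sketched as follows. The structure and function names are hypothetical, loads are taken as percentages and frequencies in kHz, and the real governor obtains the average frequency through the getavg call rather than from an array.

```c
#include <stddef.h>

/* Hypothetical per-CPU sample for one measurement interval. */
struct cpu_sample {
    unsigned int load_pct;      /* utilization, 0..100 */
    unsigned int avg_freq_khz;  /* average frequency over the interval */
};

/* Largest load * average-frequency product over all CPUs in the policy. */
unsigned long max_load_freq(const struct cpu_sample *s, size_t n)
{
    unsigned long best = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned long lf = (unsigned long)s[i].load_pct * s[i].avg_freq_khz;
        if (lf > best)
            best = lf;
    }
    return best;
}

/*
 * Frequency at which the busiest CPU would run at target_pct utilization.
 * target_pct must be nonzero.
 */
unsigned long target_freq_khz(unsigned long max_lf, unsigned int target_pct)
{
    return max_lf / target_pct;
}
```

Weighting load by frequency keeps a CPU that was 50% busy at 2 GHz from being treated as less demanding than one that was 90% busy at 1 GHz.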
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
[CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()
Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.
A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.
Change since last patch: fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Dave Jones <davej@redhat.com>
Ben Slusky [Mon, 7 Jul 2008 17:16:20 +0000 (13:16 -0400)]
[CPUFREQ] use deferrable delayed work init in conservative governor
Venki Pallipadi made a similar change to the ondemand governor a while
back (in commit 28287033e12463c8ff89f1ea8038783d0360391c). It seems to
work just as well in the conservative governor, leading to fewer wakeups
as reported by powertop.
Signed-off-by: Ben Slusky <sluskyb@paranoiacs.org> Signed-off-by: Dave Jones <davej@redhat.com>
After calling cpufreq_cpu_get, error handling code should call
cpufreq_cpu_put.
The semantic match that finds this problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@r@
expression x,E;
statement S;
position p1,p2,p3;
@@
(
if ((x = cpufreq_cpu_get@p1(...)) == NULL || ...) S
|
x = cpufreq_cpu_get@p1(...)
... when != x
if (x == NULL || ...) S
)
<...
if@p3 (...) { ... when != cpufreq_cpu_put(x)
when != if (x) { ... cpufreq_cpu_put(x); ...}
return@p2 ...;
}
...>
(
return x;
|
return 0;
|
x = E
|
E = x
|
cpufreq_cpu_put(x)
)
@exists@
position r.p1,r.p2,r.p3;
expression x;
int ret != 0;
statement S;
@@
* x = cpufreq_cpu_get@p1(...)
<...
* if@p3 (...)
S
...>
* return@p2 \(NULL\|ret\);
// </smpl>
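The pattern the semantic match looks for can be illustrated with a user-space model. Everything here is hypothetical: `struct policy`, `policy_get()` and `policy_put()` stand in for the cpufreq policy and its get/put calls, and the reference count is a plain integer.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a cpufreq policy. */
struct policy {
    int refcount;
    int usable;
    int value;
};

/* Take a reference; returns NULL if there is nothing to reference. */
static struct policy *policy_get(struct policy *p)
{
    if (!p)
        return NULL;
    p->refcount++;
    return p;
}

/* Drop a reference taken by policy_get(). */
static void policy_put(struct policy *p)
{
    p->refcount--;
}

/* Returns 0 on success, -1 on error; the reference is dropped on every path. */
int read_policy(struct policy *src, int *out)
{
    struct policy *p = policy_get(src);

    if (!p)
        return -1;          /* nothing was taken, nothing to put */

    if (!p->usable) {
        policy_put(p);      /* the error path must release the reference too */
        return -1;
    }

    *out = p->value;
    policy_put(p);
    return 0;
}
```

Leaving out the `policy_put()` in the error branch is the kind of leak the match reports: the count never returns to its starting value.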
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Dave Jones <davej@redhat.com>
Creating a subvolume is in many ways like a normal VFS ->mkdir, and we
really need to play with the VFS topology locking rules. So instead of
just creating the snapshot on disk and then later getting rid of
conflicting aliases, do it correctly from the start. This will become
especially important once we allow for subvolumes anywhere in the tree,
and not just below a hidden root.
Note that snapshots will need the same treatment, but due to the delay
in creating them we can't do it currently. Chris promised to fix that
issue, so I'll wait on that.
Trond Myklebust [Thu, 9 Oct 2008 17:27:55 +0000 (13:27 -0400)]
NFS: Fix attribute updates
This fixes a regression seen when running the Connectathon testsuite
against an ext3 filesystem. The reason was that the inode was constantly
being marked as 'just updated' by the jiffy wraparound test.
This again meant that newer GETATTR calls were failing to pass the
nfs_inode_attrs_need_update() test unless the changes caused a ctime update
on the server, since they were perceived as having been started before the
latest inode update.
Given that nfs_inode_attrs_need_update() already checks for wraparound
of nfsi->last_updated, we can drop the buggy "protection" in
nfs_update_inode().
Also make a slight micro-optimisation of nfs_inode_attrs_need_update(): we
are more often going to see time_after(fattr->time_start, nfsi->last_updated)
be true, rather than seeing an update of ctime/size, so put that test
first to ensure that we optimise away the ctime/size tests.
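For readers unfamiliar with wraparound-safe comparisons, the sketch below shows the usual trick on a 32-bit counter. The function name is hypothetical; the kernel's time_after() applies the same idea to unsigned long jiffies values.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if tick a is later than tick b, treating the 32-bit counter as
 * circular. The unsigned difference is reinterpreted as signed, so the
 * result stays correct as long as the two ticks are less than 2^31 apart.
 */
bool tick_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

A tick of 5 taken just after the counter wrapped is therefore reported as later than 0xFFFFFFF0, which a plain `a > b` comparison would get wrong.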
phylib: two dynamic mii_bus allocation fallout fixes
1. arch/powerpc/platforms/pasemi/gpio_mdio.c also needs to be
converted over to mdiobus_{alloc,free}().
2. drivers/net/phy/fixed.c used to embed a struct mii_bus into its
struct fixed_mdio_bus and then use container_of() to go from the
former to the latter. Since mii bus structures are no longer
embedded, we need to do something like use the mii bus private
pointer to go from mii_bus to fixed_mdio_bus instead.
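The difference between the two lookups can be shown with simplified structures. The names below are hypothetical stand-ins, not the real struct mii_bus or struct fixed_mdio_bus, and the macro is a reduced form of the kernel's container_of().

```c
#include <stddef.h>

/* Simplified stand-in for the bus structure. */
struct bus_sketch {
    void *priv;     /* back-pointer owned by the driver */
};

/* Old layout: the bus is embedded in its owner. */
struct embedded_owner {
    int id;
    struct bus_sketch bus;
};

/* New layout: the bus is allocated separately and only pointed to. */
struct pointer_owner {
    int id;
    struct bus_sketch *bus;
};

/* Reduced container_of(): step back from a member to its enclosing struct. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Old approach: only valid while the bus is embedded in its owner. */
struct embedded_owner *owner_from_embedded(struct bus_sketch *bus)
{
    return container_of_sketch(bus, struct embedded_owner, bus);
}

/* New approach: follow the private pointer set when the bus was allocated. */
struct pointer_owner *owner_from_priv(struct bus_sketch *bus)
{
    return (struct pointer_owner *)bus->priv;
}
```

Once the bus lives in its own allocation, pointer arithmetic from the bus address no longer lands inside the owner, so the private pointer is the only reliable way back.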
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>