Steven Rostedt [Mon, 12 May 2008 19:20:46 +0000 (21:20 +0200)]
ftrace: user run time file reading
This patch creates a file called trace_pipe in the tracing
debug directory. This file is a consumer of the trace buffers.
This means that reads of this file consumes the entries from
the trace buffers so that they will not be read a second time,
as contrast to the static buffers latency_trace and trace.
Reading from the trace_pipe will remove the entries from trace
and latency_trace too.
The advantage that trace_pipe has is that it can record live
traces. It will block when there is nothing in the buffer,
and read the entries as they are entered. An EOF happens when
tracing is disabled (tracing_enabled = 0).
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:46 +0000 (21:20 +0200)]
ftrace: add a buffer for output
Later patches will need to print the same things as the seq output
does. But those outputs will not use the seq utility. This patch
adds a buffer to the iterator, that can be used by either the
seq utility or other output.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:45 +0000 (21:20 +0200)]
ftrace: change buffers to producer consumer
This patch changes the way the CPU trace buffers are handled.
Instead of always starting from the trace page head, the logic
is changed to a producer consumer logic. This allows for the
buffers to be drained while they are alive.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:45 +0000 (21:20 +0200)]
ftrace: reset selftests
The tests may leave stuff in the buffers. This resets the buffers
after each test is run. If a test fails, it does not reset the
buffer to avoid touching a buffer that is corrupted.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:45 +0000 (21:20 +0200)]
ftrace: startup tester on dynamic tracing.
This patch adds a startup self test on dynamic code modification
and filters. The test filters on a specific function, makes sure that
no other function is traced, exectutes the function, then makes sure that
the function is traced.
This patch also fixes a slight bug with the ftrace selftest, where
tracer_enabled was not being set.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:44 +0000 (21:20 +0200)]
ftrace: irqs off smp_processor_id() fix
The irqsoff function tracer did a __get_cpu_var to determine
if it should trace the function or not. The problem is that
__get_cpu_var can preempt between getting the CPU and reading
the cpu variable. This means that the cpu variable that is
being read is not from the cpu being run on.
At worst, this can give a false positive, where we trace the
function when we should not. It will never give a false negative
since we only want to trace when interrupts are disabled
and we never preempt when they are.
This fix adds a check after reading the irq flags to only
trace if the interrupts are actually disabled. It also changes
the reading of the cpu variable to use a raw_smp_processor_id
since we now don't care if we preempt. We still catch that fact.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:44 +0000 (21:20 +0200)]
ftrace: debug smp_processor_id, use notrace preempt disable
The debug smp_processor_id caused a recursive fault in debugging
the irqsoff tracer. The tracer used a smp_processor_id in the
ftrace callback, and this function called preempt_disable which
also is traced. This caused a recursive fault (stack overload).
Since using smp_processor_id without debugging on does not cause
faults with the tracer (even when the tracer is wrong), the
debug version should not cause a system reboot.
This changes the debug_smp_processor_id to use the notrace versions
of preempt_disable and enable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:43 +0000 (21:20 +0200)]
ftrace: convert single large buffer into single pages.
Allocating large buffers for the tracer may fail easily.
This patch converts the buffer from a large ordered allocation
to single pages. It uses the struct page LRU field to link the
pages together.
Later patches may also implement dynamic increasing and decreasing
of the trace buffers.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:43 +0000 (21:20 +0200)]
ftrace: add filter select functions to trace
This patch adds two files to the debugfs system:
/debugfs/tracing/available_filter_functions
and
/debugfs/tracing/set_ftrace_filter
The available_filter_functions lists all functions that has been
recorded by the ftraced that has called the ftrace_record_ip function.
This is to allow users to see what functions have been converted
to nops and can be enabled for tracing.
To enable functions, simply echo the names (whitespace delimited)
into set_ftrace_filter. Simple wildcards are also allowed.
Steven Rostedt [Mon, 12 May 2008 19:20:43 +0000 (21:20 +0200)]
ftrace: use dynamic patching for updating mcount calls
This patch replaces the indirect call to the mcount function
pointer with a direct call that will be patched by the
dynamic ftrace routines.
On boot up, the mcount function calls the ftace_stub function.
When the dynamic ftrace code is initialized, the ftrace_stub
is replaced with a call to the ftrace_record_ip, which records
the instruction pointers of the locations that call it.
Later, the ftraced daemon will call kstop_machine and patch all
the locations to nops.
When a ftrace is enabled, the original calls to mcount will now
be set top call ftrace_caller, which will do a direct call
to the registered ftrace function. This direct call is also patched
when the function that should be called is updated.
All patching is performed by a kstop_machine routine to prevent any
type of race conditions that is associated with modifying code
on the fly.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:43 +0000 (21:20 +0200)]
ftrace: add ftrace_enabled sysctl to disable mcount function
This patch adds back the sysctl ftrace_enabled. This time it is
defaulted to on, if DYNAMIC_FTRACE is configured. When ftrace_enabled
is disabled, the ftrace function is set to the stub return.
If DYNAMIC_FTRACE is also configured, on ftrace_enabled = 0,
the registered ftrace functions will all be set to jmps, but no more
new calls to ftrace recording (used to find the ftrace calling sites)
will be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: dynamic enabling/disabling of function calls
This patch adds a feature to dynamically replace the ftrace code
with the jmps to allow a kernel with ftrace configured to run
as fast as it can without it configured.
The way this works, is on bootup (if ftrace is enabled), a ftrace
function is registered to record the instruction pointer of all
places that call the function.
Later, if there's still any code to patch, a kthread is awoken
(rate limited to at most once a second) that performs a stop_machine,
and replaces all the code that was called with a jmp over the call
to ftrace. It only replaces what was found the previous time. Typically
the system reaches equilibrium quickly after bootup and there's no code
patching needed at all.
e.g.
call ftrace /* 5 bytes */
is replaced with
jmp 3f /* jmp is 2 bytes and we jump 3 forward */
3:
When we want to enable ftrace for function tracing, the IP recording
is removed, and stop_machine is called again to replace all the locations
of that were recorded back to the call of ftrace. When it is disabled,
we replace the code back to the jmp.
Allocation is done by the kthread. If the ftrace recording function is
called, and we don't have any record slots available, then we simply
skip that call. Once a second a new page (if needed) is allocated for
recording new ftrace function calls. A large batch is allocated at
boot up to get most of the calls there.
Because we do this via stop_machine, we don't have to worry about another
CPU executing a ftrace call as we modify it. But we do need to worry
about NMI's so all functions that might be called via nmi must be
annotated with notrace_nmi. When this code is configured in, the NMI code
will not call notrace.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: trace preempt off critical timings
Add preempt off timings. A lot of kernel core code is taken from the RT patch
latency trace that was written by Ingo Molnar.
This adds "preemptoff" and "preemptirqsoff" to /debugfs/tracing/available_tracers
Now instead of just tracing irqs off, preemption off can be selected
to be recorded.
When this is selected, it shares the same files as irqs off timings.
One can either trace preemption off, irqs off, or one or the other off.
By echoing "preemptoff" into /debugfs/tracing/current_tracer, recording
of preempt off only is performed. "irqsoff" will only record the time
irqs are disabled, but "preemptirqsoff" will take the total time irqs
or preemption are disabled. Runtime switching of these options is now
supported by simpling echoing in the appropriate trace name into
/debugfs/tracing/current_tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: trace irq disabled critical timings
This patch adds latency tracing for critical timings
(how long interrupts are disabled for).
"irqsoff" is added to /debugfs/tracing/available_tracers
Note:
tracing_max_latency
also holds the max latency for irqsoff (in usecs).
(default to large number so one must start latency tracing)
tracing_thresh
threshold (in usecs) to always print out if irqs off
is detected to be longer than stated here.
If irq_thresh is non-zero, then max_irq_latency
is ignored.
Here's an example of a trace with ftrace_enabled = 0
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: tracer for scheduler wakeup latency
This patch adds the tracer that tracks the wakeup latency of the
highest priority waking task.
"wakeup" is added to /debugfs/tracing/available_tracers
Also added to /debugfs/tracing
tracing_max_latency
holds the current max latency for the wakeup
wakeup_thresh
if set to other than zero, a log will be recorded
for every wakeup that takes longer than the number
entered in here (usecs for all counters)
(deletes previous trace)
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: function tracer
This is a simple trace that uses the ftrace infrastructure. It is
designed to be fast and small, and easy to use. It is useful to
record things that happen over a very short period of time, and
not to analyze the system in general.
Updates:
available_tracers
"function" is added to this file.
current_tracer
To enable the function tracer:
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
The output of the function_trace file is as follows
Steven Rostedt [Mon, 12 May 2008 19:20:42 +0000 (21:20 +0200)]
ftrace: latency tracer infrastructure
This patch adds the latency tracer infrastructure. This patch
does not add anything that will select and turn it on, but will
be used by later patches.
If it were to be compiled, it would add the following files
to the debugfs:
The root tracing directory:
/debugfs/tracing/
This patch also adds the following files:
available_tracers
list of available tracers. Currently no tracers are
available. Looking into this file only shows
"none" which is used to unregister all tracers.
current_tracer
The trace that is currently active. Empty on start up.
To switch to a tracer simply echo one of the tracers that
are listed in available_tracers:
example: (used with later patches)
echo function > /debugfs/tracing/current_tracer
To disable the tracer:
echo disable > /debugfs/tracing/current_tracer
tracing_enabled
echoing "1" into this file starts the ftrace function tracing
(if sysctl kernel.ftrace_enabled=1)
echoing "0" turns it off.
latency_trace
This file is readonly and holds the result of the trace.
trace
This file outputs a easier to read version of the trace.
iter_ctrl
Controls the way the output of traces look.
So far there's two controls:
echoing in "symonly" will only show the kallsyms variables
without the addresses (if kallsyms was configured)
echoing in "verbose" will change the output to show
a lot more data, but not very easy to understand by
humans.
echoing in "nosymonly" turns off symonly.
echoing in "noverbose" turns off verbose.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ftrace: add basic support for gcc profiler instrumentation
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.
The ftrace routine will then call a registered function if a function
happens to be registered.
[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]
Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ftrace: annotate core code that should not be traced
Mark with "notrace" functions in core code that should not be
traced. The "notrace" attribute will prevent gcc from adding
a call to ftrace on the annotated funtions.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:41 +0000 (21:20 +0200)]
x86: add notrace annotations to vsyscall.
Add the notrace annotations to the vsyscall functions - there we are
not in kernel context yet, so the tracer function cannot (and must not)
be called.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Ingo Molnar [Mon, 12 May 2008 19:20:41 +0000 (21:20 +0200)]
tracing: add notrace to linkage.h
notrace signals that a function should not be traced. Most of the
time this is used by tracers to annotate code that cannot be
traced - it's in a volatile state (such as in user vdso context
or NMI context) or it's in the tracer internals.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:41 +0000 (21:20 +0200)]
ftrace: add preempt_enable/disable notrace macros
The tracer may need to call preempt_enable and disable functions
for time keeping and such. The trace gets ugly when we see these
functions show up for all traces. To make the output cleaner
this patch adds preempt_enable_notrace and preempt_disable_notrace
to be used by tracer (and debugging) functions.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Steven Rostedt [Mon, 12 May 2008 19:20:41 +0000 (21:20 +0200)]
ftrace: make the task state char-string visible to all
The tracer wants to be able to convert the state number
into a user visible character. This patch pulls that conversion
string out the scheduler into the header. This way if it were to
ever change, other parts of the kernel will know.
Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jeff Layton [Thu, 22 May 2008 13:33:34 +0000 (09:33 -0400)]
silently ignore ownership changes unless unix extensions are enabled or we're faking uid changes
CIFS currently allows you to change the ownership of a file, but unless
unix extensions are enabled this change is not passed off to the server.
Have CIFS silently ignore ownership changes that can't be persistently
stored on the server unless the "setuids" option is explicitly
specified.
We could return an error here (-EOPNOTSUPP or something), but this is
how most disk-based windows filesystems on behave on Linux (e.g. VFAT,
NTFS, etc). With cifsacl support and proper Windows to Unix idmapping
support, we may be able to do this more properly in the future.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
Jeff Layton [Thu, 22 May 2008 13:33:34 +0000 (09:33 -0400)]
when creating new inodes, use file_mode/dir_mode exclusively on mount without unix extensions
When CIFS creates a new inode on a mount without unix extensions, it
temporarily assigns the mode that was passed to it in the create/mkdir
call. Eventually, when the inode is revalidated, it changes to have the
file_mode or dir_mode for the mount. This is confusing to users who
expect that the mode shouldn't change this way. It's also problematic
since only the mode is treated this way, not the uid or gid. Suppose you
have a CIFS mount that's mounted with:
uid=0,gid=0,file_mode=0666,dir_mode=0777
...if an unprivileged user comes along and does this on the mount:
mkdir -m 0700 foo
touch foo/bar
...there is a period of time where the touch will fail, since the dir
will initially be owned by root and have mode 0700. If the user waits
long enough, then "foo" will be revalidated and will get the correct
dir_mode permissions.
This patch changes cifs_mkdir and cifs_create to not overwrite the
mode found by the initial cifs_get_inode_info call after the inode is
created on the server. Legacy behavior can be reenabled with the
new "dynperm" mount option.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
Jeff Layton [Thu, 22 May 2008 13:31:40 +0000 (09:31 -0400)]
on non-posix shares, clear write bits in mode when ATTR_READONLY is set
When mounting a share with posix extensions disabled,
cifs_get_inode_info turns off all the write bits in the mode for regular
files if ATTR_READONLY is set. Directories and other inode types,
however, can also have ATTR_READONLY set, but the mode gives no
indication of this.
This patch makes this apply to other inode types besides regular files.
It also cleans up how modes are set in cifs_get_inode_info for both the
"normal" and "dynperm" cases.
Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
Linus Torvalds [Fri, 23 May 2008 18:11:44 +0000 (11:11 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()
MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
IB/mthca: Fix max_sge value returned by query_device
RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
IB/ipath: Fix printk format for ipath_sdma_status
Dave Olson [Fri, 23 May 2008 17:52:59 +0000 (10:52 -0700)]
IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly. The fix is to
kfree the local data and break, rather than falling through. This was
observed with the ipath driver, but could happen with any driver.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.
Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
The for_each_cpu_mask loop is used quite often in the kernel. It
makes use of two functions: first_cpu and next_cpu. This patch
changes for_each_cpu_mask to use only the latter. Because next_cpu
finds the next eligible cpu _after_ the given one, the iteration
variable has to be initialized to -1 and next_cpu has to be
called with this value before the first iteration. An x86_64
defconfig kernel (from sched/latest) is about 2500 bytes smaller
with this patch applied:
text data bss dec hex filename 6222517 917952 749932 7890401 7865e1 vmlinux.orig 6219922 917952 749932 7887806 785bbe vmlinux
The same size reduction is seen for defconfig+MAXSMP
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
clocksource/events: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
infiniband: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
x86: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
commit 2d474871e2fb092eb46a0930aba5442e10eb96cc
Author: Mike Travis <travis@sgi.com>
Date: Mon May 12 21:21:13 2008 +0200
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
net: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
mm: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
core: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
cpufreq: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
acpi: use performance variant for_each_cpu_mask_nr
Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate
Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mike Travis [Mon, 12 May 2008 19:21:13 +0000 (21:21 +0200)]
x86: Add performance variants of cpumask operators
* Increase performance for systems with large count NR_CPUS by limiting
the range of the cpumask operators that loop over the bits in a cpumask_t
variable. This removes a large amount of wasted cpu cycles.
* Add performance variants of the cpumask operators:
int cpus_weight_nr(mask) Same using nr_cpu_ids instead of NR_CPUS
int first_cpu_nr(mask) Number lowest set bit, or nr_cpu_ids
int next_cpu_nr(cpu, mask) Next cpu past 'cpu', or nr_cpu_ids
for_each_cpu_mask_nr(cpu, mask) for-loop cpu over mask using nr_cpu_ids
Note: The alternate operations with the suffix "_nr" are used
to limit the range of the loop to nr_cpu_ids instead of
NR_CPUS when NR_CPUS > 64 for performance reasons.
If NR_CPUS is <= 64 then most assembler bitmask
operators execute faster with a constant range, so
the operator will continue to use NR_CPUS.
Another consideration is that nr_cpu_ids is initialized
to NR_CPUS and isn't lowered until the possible cpus are
discovered (including any disabled cpus). So early uses
will span the entire range of NR_CPUS.
(The net effect is that for systems with 64 or less CPU's there are no
functional changes.)
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Travis [Mon, 12 May 2008 19:21:12 +0000 (21:21 +0200)]
sched: replace MAX_NUMNODES with nr_node_ids in kernel/sched.c
* Replace usages of MAX_NUMNODES with nr_node_ids in kernel/sched.c,
where appropriate. This saves some allocated space as well as many
wasted cycles going through node entries that are non-existent.
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Mon, 12 May 2008 19:21:15 +0000 (21:21 +0200)]
x86: prevent PGE flush from interruption/preemption
CR4 manipulation is not protected against interrupts and preemption,
but KVM uses smp_function_call to manipulate the X86_CR4_VMXE bit
either from the CPU hotplug code or from the kvm_init call.
We need to protect the CR4 manipulation from both interrupts and
preemption.
Original bug report: http://lkml.org/lkml/2008/5/7/48
Bugzilla entry: http://bugzilla.kernel.org/show_bug.cgi?id=10642
This is not a regression from 2.6.25, it's a long standing and hard to
trigger bug.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jesse Barnes [Fri, 23 May 2008 15:40:45 +0000 (08:40 -0700)]
remove debug printk from DRM suspend path
Not sure how this snuck upstream, but it really doesn't belong there. We
don't need a KERN_ERR printk in the suspend path to know what's going on (at
least not anymore).
Linus Torvalds [Fri, 23 May 2008 15:15:12 +0000 (08:15 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] iSeries: Remove unused mail address
[POWERPC] mpic: Fix use of uninitialized variable
[POWERPC] Add kernstart_addr to list of allowed symbols in prom_init
[POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS
[POWERPC] PS3: Fix memory hotplug
Linus Torvalds [Fri, 23 May 2008 15:13:39 +0000 (08:13 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6
* 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6:
[XFS] Fix memory corruption with small buffer reads
[XFS] Fix inode list allocation size in writeback.
[XFS] Don't allow memory reclaim to wait on the filesystem in inode
[XFS] Fix fsync() b0rkage.
[XFS] Include linux/random.h in all builds, not just debug builds.
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
stop_machine: make stop_machine_run more virtualization friendly
doc: add a chapter about trylock functions [Bug 9011]
modules: proper cleanup of kobject without CONFIG_SYSFS
module loading ELF handling: use SELFMAG instead of numeric constant
Harvey Harrison [Thu, 22 May 2008 22:45:08 +0000 (15:45 -0700)]
fbdev: fix integer as NULL pointer warning
drivers/video/aty/atyfb_base.c:3359:26: warning: Using plain integer as NULL pointer
drivers/video/aty/radeon_base.c:2280:32: warning: Using plain integer as NULL pointer
drivers/video/matrox/matroxfb_base.h:203:25: warning: Using plain integer as NULL pointer
drivers/video/matrox/matroxfb_base.h:203:25: warning: Using plain integer as NULL pointer
drivers/video/sis/sis_main.c:5790:44: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Thu, 22 May 2008 22:45:07 +0000 (15:45 -0700)]
scsi: fix integer as NULL pointer warning
drivers/scsi/aha152x.c:3585:60: warning: Using plain integer as NULL pointer
drivers/scsi/aha152x.c:3845:56: warning: Using plain integer as NULL pointer
drivers/scsi/qla1280.c:2814:37: warning: Using plain integer as NULL pointer
drivers/scsi/atp870u.c:750:47: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1281:36: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1293:36: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1301:35: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:447:10: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:457:10: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:479:24: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:483:22: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:1213:23: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:1214:23: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Thu, 22 May 2008 22:45:07 +0000 (15:45 -0700)]
isdn: fix integer as NULL pointer warning
drivers/isdn/hysdn/hycapi.c:465:42: warning: Using plain integer as NULL pointer
drivers/isdn/hysdn/hycapi.c:467:44: warning: Using plain integer as NULL pointer
drivers/isdn/hysdn/hycapi.c:469:42: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Harvey Harrison [Thu, 22 May 2008 22:45:06 +0000 (15:45 -0700)]
acpi: fix integer as NULL pointer warning
drivers/acpi/dispatcher/dsmethod.c:568:50: warning: Using plain integer as NULL pointer
drivers/acpi/executer/exmutex.c:329:30: warning: Using plain integer as NULL pointer
drivers/acpi/executer/exmutex.c:466:31: warning: Using plain integer as NULL pointer
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Circular include dependencies are dangerous since they can result in
inconsistent definitions being provided to other code, especially if
'#ifndef' constructs are used.
Solve these by removing the offending includes, and add additional
includes where necessary.
Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Greg Ungerer [Fri, 23 May 2008 07:31:39 +0000 (08:31 +0100)]
[ARM] 5053/1: define before use of processor_id
For the simple read_cpuid() macro case the variable processor_id has
no definition on use of the macro. Add an extern for it. Move all the
processor ID macros into the #ifndef __ASSEMBLEY__ block.
Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Andi Kleen [Wed, 14 May 2008 23:10:42 +0000 (16:10 -0700)]
x86: use explicit copy in vdso_gettimeofday()
Jeremy's gcc 3.4 seems to be unable to inline a 8 byte memcpy. But the
vdso doesn't support external references. Copy the structure members
of struct timezone explicitely instead.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Tue, 13 May 2008 10:31:00 +0000 (12:31 +0200)]
x86: distangle user disabled TSC from unstable
tsc_enabled is set to 0 from the command line switch "notsc" and from
the mark_tsc_unstable code. Seperate those functionalities and replace
tsc_enable with tsc_disable. This makes also the native_sched_clock()
decision when to use TSC understandable.
Preparatory patch to solve the sched_clock() issue on 32 bit.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Sun, 18 May 2008 17:27:48 +0000 (19:27 +0200)]
x86: fix setup of cyc2ns in tsc_64.c
When the TSC is calibrated against the PIT due to the nonavailability
of PMTIMER/HPET or due to SMI interference then the setup of the per
CPU cyc2ns variables is skipped. This is unlikely to happen but it
would definitely render sched_clock() unusable.
[XFS] Fix memory corruption with small buffer reads
When we have multiple buffers in a single page for a blocksize == pagesize
filesystem we might overwrite the page contents if two callers hit it
shortly after each other. To prevent that we need to keep the page locked
until I/O is completed and the page marked uptodate.
Thanks to Eric Sandeen for triaging this bug and finding a reproducible
testcase and Dave Chinner for additional advice.
This should fix kernel.org bz #10421.
Tested-by: Eric Sandeen <sandeen@sandeen.net>
SGI-PV: 981813
SGI-Modid: xfs-linux-melb:xfs-kern:31173a
Signed-off-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Lachlan McIlroy <lachlan@sgi.com>
Patrick McHardy [Fri, 23 May 2008 07:22:04 +0000 (00:22 -0700)]
vlan: Use bitmask of feature flags instead of seperate feature bits
Herbert Xu points out that the use of seperate feature bits for features
to be propagated to VLAN devices is going to get messy real soon.
Replace the VLAN feature bits by a bitmask of feature flags to be
propagated and restore the old GSO_SHIFT/MASK values.
Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Rothwell [Wed, 21 May 2008 06:24:31 +0000 (16:24 +1000)]
[POWERPC] mpic: Fix use of uninitialized variable
Compiling ppc64_defconfig with gcc 4.3 gives thes warnings:
arch/powerpc/sysdev/mpic.c: In function 'mpic_irq_get_priority':
arch/powerpc/sysdev/mpic.c:1351: warning: 'is_ipi' may be used uninitialized in this function
arch/powerpc/sysdev/mpic.c: In function 'mpic_irq_set_priority':
arch/powerpc/sysdev/mpic.c:1328: warning: 'is_ipi' may be used uninitialized in this function
It turns out that in the cases where is_ipi is uninitialized, another
variable (mpic) will be NULL and it is dereferenced. Protect against
this by returning if mpic is NULL in mpic_irq_set_priority, and removing
mpic_irq_get_priority completely as it has no in tree callers.
This has the nice side effect of making the warning go away.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Ellerman [Tue, 20 May 2008 12:48:03 +0000 (22:48 +1000)]
[POWERPC] Add kernstart_addr to list of allowed symbols in prom_init
Since commit "85xx: Add support for relocatable kernel (and
booting at non-zero)" (37dd2badcfcec35f5e21a0926968d77a404f03c3),
PHYSICAL_START is #defined as kernstart_addr if RELOCATABLE
and FLATMEM is enabled.
PHYSICAL_START is used in prom_init.c and so kernstart_addr
needs to be added to the list of allowed symbols that
prom_init.c can access.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
Geoff Levand [Thu, 15 May 2008 20:09:59 +0000 (06:09 +1000)]
[POWERPC] PS3: Fix memory hotplug
A change was made to walk_memory_resource() in commit 4b119e21d0c66c22e8ca03df05d9de623d0eb50f that added a
check of find_lmb(). Add the coresponding lmb_add()
call to ps3_mm_add_memory() so that that check will
succeed.
This fixes the condition where the PS3 boots up with
only the 128 MiB of boot memory, and doesn't see the
other 128MiB that is available.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Michael Ellerman [Fri, 23 May 2008 04:21:30 +0000 (14:21 +1000)]
[POWERPC] Add debugging trigger to Axon MSI code
This adds some debugging code to the Axon MSI driver. It creates a
file per MSIC in /sys/kernel/debug/powerpc, which allows the user to
trigger a fake MSI interrupt by writing to the file.
This can be used to test some of the MSI generation path. In
particular, that the MSIC recognises a write to the MSI address,
generates an interrupt and writes the MSI packet into the ring buffer.
All the code is inside #ifdef DEBUG so it causes no harm unless it's
enabled.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
Roland McGrath [Sun, 11 May 2008 00:40:47 +0000 (10:40 +1000)]
[POWERPC] Tweak VDSO linker script to avoid upsetting old binutils
This works around bugs in older binutils' objcopy.
The placement of these sections does not really matter,
but it confused the buggy old BFD libraries.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
Anton Vorontsov [Tue, 29 Apr 2008 14:05:24 +0000 (00:05 +1000)]
[POWERPC] of/gpio: Use dynamic base allocation
Since the "gpiolib: dynamic gpio number allocation" patch was recently
merged into Linus' tree (commit 8d0aab2f16c4fa170f32e7a74a52cd0122bbafef),
we can use dynamic GPIO base allocation now.
This, in turn, removes number of gpios per chip constraint.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
David Chinner [Tue, 20 May 2008 01:30:15 +0000 (11:30 +1000)]
[XFS] Fix inode list allocation size in writeback.
We only need to allocate space for the number of inodes in the cluster
when writing back inodes, not every byte in the inode cluster. This
reduces the amount of memory needing to be allocated to 256 bytes instead
of 64k.
David Chinner [Mon, 19 May 2008 06:29:34 +0000 (16:29 +1000)]
[XFS] Don't allow memory reclaim to wait on the filesystem in inode
writeback
If we allow memory reclaim to wait on the pages under writeback in inode
cluster writeback we could deadlock because we are currently holding the
ILOCK on the initial writeback inode which is needed in data I/O
completion to change the file size or do unwritten extent conversion
before the pages are taken out of writeback state.