pilppa.org Git - linux-2.6-omap-h63xx.git/log

]> pilppa.org Git - linux-2.6-omap-h63xx.git/log

projects / linux-2.6-omap-h63xx.git / log

Pekka Paalanen [Tue, 16 Sep 2008 19:06:42 +0000 (22:06 +0300)]

ftrace: inject markers via trace_marker file

Allow a user to inject a marker (TRACE_PRINT entry) into the trace ring
buffer. The related file operations are derived from code by Frédéric
Weisbecker <fweisbec@gmail.com>.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 19:03:56 +0000 (22:03 +0300)]

mmiotrace: remove left-over marker cruft

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 19:02:27 +0000 (22:02 +0300)]

mmiotrace: handle TRACE_PRINT entries

Also make trace_seq_print_cont() non-static, and add a newline if the
seq buffer can't hold all data.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 19:00:34 +0000 (22:00 +0300)]

x86 mmiotrace: implement mmiotrace_printk()

Offer mmiotrace users a function to inject markers from inside the kernel.
This depends on the trace_vprintk() patch.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 18:58:24 +0000 (21:58 +0300)]

ftrace: add trace_vprintk()

trace_vprintk() for easier implementation of tracer specific *_printk
functions. Add check check for no_tracer, and implement
__ftrace_printk() as a wrapper.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 18:56:41 +0000 (21:56 +0300)]

ftrace: move mmiotrace functions out of trace.c

Moves the mmiotrace specific functions from trace.c to
trace_mmiotrace.c. Functions trace_wake_up(), tracing_get_trace_entry(),
and tracing_generic_entry_update() are therefore made available outside
trace.c.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Tue, 16 Sep 2008 18:54:16 +0000 (21:54 +0300)]

x86 mmiotrace: fix a rare memory leak

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Sat, 6 Sep 2008 05:06:04 +0000 (01:06 -0400)]

ftrace: fix unlocking of hash

This must be brown paper bag week for Steven Rostedt!

While working on ftrace for PPC, I discovered that the hash locking done
when CONFIG_FTRACE_MCOUNT_RECORD is not set, is totally incorrect.

With a cut and paste error, I had the hash lock macro to lock for both
hash_lock _and_ hash_unlock!

This bug did not affect x86 since this bug was introduced when
CONFIG_FTRACE_MCOUNT_RECORD was added to x86.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Sat, 6 Sep 2008 05:06:03 +0000 (01:06 -0400)]

ftrace: use ftrace_release for all dynamic ftrace functions

ftrace_release is necessary for all uses of dynamic ftrace and not just
the archs that have CONFIG_FTRACE_MCOUNT_RECORD defined.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Thu, 4 Sep 2008 12:04:51 +0000 (14:04 +0200)]

ftrace: make it depend on DEBUG_KERNEL

make most of the tracers depend on DEBUG_KERNEL - that's their intended
purpose. (most distributions have DEBUG_KERNEL enabled anyway so this is
not a practical limitation - but it simplifies the tracing menu in the
normal case)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Peter Zijlstra [Thu, 4 Sep 2008 08:24:16 +0000 (10:24 +0200)]

ftrace: sched_switch: show the wakee's cpu

While profiling the smp behaviour of the scheduler it was needed to know to
which cpu a task got woken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Peter Zijlstra [Thu, 4 Sep 2008 08:24:14 +0000 (10:24 +0200)]

ftrace: make ftrace_printk usable with the other tracers

Currently ftrace_printk only works with the ftrace tracer, switch it to an
iter_ctrl setting so we can make us of them with other tracers too.

[rostedt@redhat.com: tweak to the disable condition]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Wed, 3 Sep 2008 21:42:51 +0000 (17:42 -0400)]

ftrace: print continue index fix

An item in the trace buffer that is bigger than one entry may be split
up using the TRACE_CONT entry. This makes it a virtual single entry.
The current code increments the iterator index even while traversing
TRACE_CONT entries, making it look like the iterator is further than
it actually is.

This patch adds code to not increment the iterator index while skipping
over TRACE_CONT entries.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Wed, 3 Sep 2008 21:42:50 +0000 (17:42 -0400)]

ftrace: binary and not logical for continue test

Peter Zijlstra provided me with a nice brown paper bag while letting me know
that I was doing a logical AND and not a binary one, making a condition
true more often than it should be.

Luckily, a false true is handled by the calling function and no harm is
done. But this needs to be fixed regardless.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Michael Ellerman [Wed, 20 Aug 2008 23:36:11 +0000 (16:36 -0700)]

ftrace: make output nicely spaced for up to 999 cpus

Currently some of the ftrace output goes skewiff if you have more
than 9 cpus, and some if you have more than 99.

Twiddle with the headers and format strings to make up to 999 cpus
display without causing spacing problems.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Thu, 4 Sep 2008 13:04:37 +0000 (15:04 +0200)]

stack tracer: depends on DEBUG_KERNEL

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 29 Aug 2008 20:51:43 +0000 (16:51 -0400)]

ftrace: stack trace add indexes

This patch adds indexes into the stack that the functions in the
stack dump were found at. As an added bonus, I also added a diff
to show which function is the most notorious consumer of the stack.

The output now looks like this:

# cat /debug/tracing/stack_trace
        Depth   Size      Location    (48 entries)
        -----   ----      --------
  0)     2476     212   blk_recount_segments+0x39/0x59
  1)     2264      12   bio_phys_segments+0x16/0x1d
  2)     2252      20   blk_rq_bio_prep+0x23/0xaf
  3)     2232      12   init_request_from_bio+0x74/0x77
  4)     2220      56   __make_request+0x294/0x331
  5)     2164     136   generic_make_request+0x34f/0x37d
  6)     2028      56   submit_bio+0xe7/0xef
  7)     1972      28   submit_bh+0xd1/0xf0
  8)     1944     112   block_read_full_page+0x299/0x2a9
  9)     1832       8   blkdev_readpage+0x14/0x16
10)     1824      28   read_cache_page_async+0x7e/0x109
11)     1796      16   read_cache_page+0x11/0x49
12)     1780      32   read_dev_sector+0x3c/0x72
13)     1748      48   read_lba+0x4d/0xaa
14)     1700     168   efi_partition+0x85/0x61b
15)     1532      72   rescan_partitions+0x10e/0x266
16)     1460      40   do_open+0x1c7/0x24e
17)     1420     292   __blkdev_get+0x79/0x84
18)     1128      12   blkdev_get+0x12/0x14
19)     1116      20   register_disk+0xd1/0x11e
20)     1096      28   add_disk+0x34/0x90
21)     1068      52   sd_probe+0x2b1/0x366
22)     1016      20   driver_probe_device+0xa5/0x120
23)      996       8   __device_attach+0xd/0xf
24)      988      32   bus_for_each_drv+0x3e/0x68
25)      956      24   device_attach+0x56/0x6c
26)      932      16   bus_attach_device+0x26/0x4d
27)      916      64   device_add+0x380/0x4b4
28)      852      28   scsi_sysfs_add_sdev+0xa1/0x1c9
29)      824     160   scsi_probe_and_add_lun+0x919/0xa2a
30)      664      36   __scsi_add_device+0x88/0xae
31)      628      44   ata_scsi_scan_host+0x9e/0x21c
32)      584      28   ata_host_register+0x1cb/0x1db
33)      556      24   ata_host_activate+0x98/0xb5
34)      532     192   ahci_init_one+0x9bd/0x9e9
35)      340      20   pci_device_probe+0x3e/0x5e
36)      320      20   driver_probe_device+0xa5/0x120
37)      300      20   __driver_attach+0x3f/0x5e
38)      280      36   bus_for_each_dev+0x40/0x62
39)      244      12   driver_attach+0x19/0x1b
40)      232      28   bus_add_driver+0x9c/0x1af
41)      204      28   driver_register+0x76/0xd2
42)      176      20   __pci_register_driver+0x44/0x71
43)      156       8   ahci_init+0x14/0x16
44)      148     100   _stext+0x42/0x122
45)       48      20   kernel_init+0x175/0x1dc
46)       28      28   kernel_thread_helper+0x7/0x10

The first column is simply an index starting from the inner most function
and counting down to the outer most.

The next column is the location that the function was found on the stack.

The next column is the size of the stack for that function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Wed, 27 Aug 2008 17:02:01 +0000 (13:02 -0400)]

ftrace: remove warning of old objcopy and local functions

The warning messages about old objcopy and local functions spam the
user quite drastically. Remove the warning until we can find a nicer
way of tell the user to upgrade their objcopy.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 28 Aug 2008 03:24:15 +0000 (23:24 -0400)]

ftrace: remove direct reference to mcount in trace code

The mcount record method of ftrace scans objdump for references to mcount.
Using mcount as the reference to test if the calls to mcount being replaced
are indeed calls to mcount, this use of mcount was also caught as a
location to change. Using a variable that points to the mcount address
moves this reference into the data section that is not scanned, and
we do not use a false location to try and modify.

The warn on code was what was used to detect this bug.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 28 Aug 2008 03:31:01 +0000 (23:31 -0400)]

ftrace: add stack tracer

This is another tracer using the ftrace infrastructure, that examines
at each function call the size of the stack. If the stack use is greater
than the previous max it is recorded.

You can always see (and set) the max stack size seen. By setting it
to zero will start the recording again. The backtrace is also available.

For example:

# cat /debug/tracing/stack_max_size
1856

# cat /debug/tracing/stack_trace
[<c027764d>] stack_trace_call+0x8f/0x101
[<c021b966>] ftrace_call+0x5/0x8
[<c02553cc>] clocksource_get_next+0x12/0x48
[<c02542a5>] update_wall_time+0x538/0x6d1
[<c0245913>] do_timer+0x23/0xb0
[<c0257657>] tick_do_update_jiffies64+0xd9/0xf1
[<c02576b9>] tick_sched_timer+0x4a/0xad
[<c0250fe6>] __run_hrtimer+0x3e/0x75
[<c02518ed>] hrtimer_interrupt+0xf1/0x154
[<c022c870>] smp_apic_timer_interrupt+0x71/0x84
[<c021b7e9>] apic_timer_interrupt+0x2d/0x34
[<c0238597>] finish_task_switch+0x29/0xa0
[<c05abd13>] schedule+0x765/0x7be
[<c05abfca>] schedule_timeout+0x1b/0x90
[<c05ab4d4>] wait_for_common+0xab/0x101
[<c05ab5ac>] wait_for_completion+0x12/0x14
[<c033cfc3>] blk_execute_rq+0x84/0x99
[<c0402470>] scsi_execute+0xc2/0x105
[<c040250a>] scsi_execute_req+0x57/0x7f
[<c043afe0>] sr_test_unit_ready+0x3e/0x97
[<c043bbd6>] sr_media_change+0x43/0x205
[<c046b59f>] media_changed+0x48/0x77
[<c046b5ff>] cdrom_media_changed+0x31/0x37
[<c043b091>] sr_block_media_changed+0x16/0x18
[<c02b9e69>] check_disk_change+0x1b/0x63
[<c046f4c3>] cdrom_open+0x7a1/0x806
[<c043b148>] sr_block_open+0x78/0x8d
[<c02ba4c0>] do_open+0x90/0x257
[<c02ba869>] blkdev_open+0x2d/0x56
[<c0296a1f>] __dentry_open+0x14d/0x23c
[<c0296b32>] nameidata_to_filp+0x24/0x38
[<c02a1c68>] do_filp_open+0x347/0x626
[<c02967ef>] do_sys_open+0x47/0xbc
[<c02968b0>] sys_open+0x23/0x2b
[<c021aadd>] sysenter_do_call+0x12/0x26

I've tested this on both x86_64 and i386.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Andrew Morton [Wed, 27 Aug 2008 07:08:30 +0000 (09:08 +0200)]

kbuild: ftrace: don't assume that scripts/recordmcount.pl is executable

CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CC      scripts/mod/empty.o
/bin/sh: /usr/src/25/scripts/recordmcount.pl: Permission denied

We shouldn't assume that files have their `x' bits set.  There are various
ways in which file permissions get lost, including use of patch(1).

It might not be correct to assume that perl lives in $PATH?

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Mon, 25 Aug 2008 18:52:11 +0000 (14:52 -0400)]

ftrace: objcopy version test for local symbols

The --globalize-symbols option came out in objcopy version 2.17.
If the kernel is being compiled on a system with a lower version of
objcopy, then we can not use the globalize / localize trick to
link to symbols pointing to local functions.

This patch tests the version of objcopy and will only use the trick
if the version is greater than or equal to 2.17. Otherwise, if an
object has only local functions within a section, it will give a
nice warning and recommend the user to upgrade their objcopy.

Leaving the symbols unrecorded is not that big of a deal, since the
mcount record method changes the actual mcount code to be a simple
"ret" without recording registers or anything.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Mon, 25 Aug 2008 06:12:04 +0000 (08:12 +0200)]

ftrace: clean up macro usage

enclose the argument in parenthesis. (especially since we cast it,
which is a high prio operation)

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Stephen Rothwell [Mon, 25 Aug 2008 03:08:44 +0000 (13:08 +1000)]

ftrace: fix build failure

After disabling FTRACE_MCOUNT_RECORD via a patch, a dormant build
failure surfaced:

kernel/trace/ftrace.c: In function 'ftrace_record_ip':
kernel/trace/ftrace.c:416: error: incompatible type for argument 1 of '_spin_lock_irqsave'
kernel/trace/ftrace.c:433: error: incompatible type for argument 1 of '_spin_lock_irqsave'

Introduced by commit 6dad8e07f4c10b17b038e84d29f3ca41c2e55cd0 ("ftrace:
add necessary locking for ftrace records").

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Wed, 20 Aug 2008 16:55:07 +0000 (12:55 -0400)]

ftrace: x86 use copy to and from user functions

The modification of code is performed either by kstop_machine, before
SMP starts, or on module code before the module is executed. There is
no reason to do the modifications from assembly. The copy to and from
user functions are sufficient and produces cleaner and easier to read
code.

Thanks to Benjamin Herrenschmidt for suggesting the idea.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Wed, 20 Aug 2008 14:07:35 +0000 (10:07 -0400)]

ftrace: handle weak symbol functions

During tests and checks, I've discovered that there were failures to
convert mcount callers into nops. Looking deeper into these failures,
code that was attempted to be changed was not an mcount caller.
The current code only updates if the code being changed is what it expects,
but I still investigate any time there is a failure.

What was happening is that a weak symbol was being used as a reference
for other mcount callers. That weak symbol was also referenced elsewhere
so the offsets were using the strong symbol and not the function symbol
that it was referenced from.

This patch changes the setting up of the mcount_loc section to search
for a global function that is not weak. It will pick a local over a weak
but if only a weak is found in a section, a warning is printed and the
mcount location is not recorded (just to be safe).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 15 Aug 2008 15:40:24 +0000 (11:40 -0400)]

ftrace: update recordmount.pl arch changes

I'm trying to keep all the arch changes in recordmcount.pl in one place.
I moved your code into that area, by adding the flags to the commands
that were passed in.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Jeremy Fitzhardinge [Mon, 18 Aug 2008 22:58:12 +0000 (15:58 -0700)]

ftrace: fix build problem with CONFIG_FTRACE

I'm seeing when I use separate src/build dirs:

make[3]: *** [arch/x86/kernel/time_32.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/irq_32.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/ldt.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory
make[3]: *** [arch/x86/kernel/i8259.o] Error 1
/bin/sh: scripts/recordmcount.pl: No such file or directory

This fixes it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Huang Ying [Mon, 18 Aug 2008 08:24:56 +0000 (16:24 +0800)]

ftrace: fix incorrect comment style of __ftrace_enabled_save()

This patch fixes incorrect comment style of __ftrace_enabled_save().

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Sat, 16 Aug 2008 01:40:05 +0000 (21:40 -0400)]

ftrace: add necessary locking for ftrace records

The new design of pre-recorded mcounts and updating the code outside of
kstop_machine has changed the way the records themselves are protected.

This patch uses the ftrace_lock to protect the records. Note, the lock
still does not need to be taken within calls that are only called via
kstop_machine, since the that code can not run while the spin lock is held.

Also removed the hash_lock needed for the daemon when MCOUNT_RECORD is
configured. Also did a slight cleanup of an unused variable.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Sat, 16 Aug 2008 01:40:04 +0000 (21:40 -0400)]

ftrace: do not init module on ftrace disabled

If one of the self tests of ftrace has disabled the function tracer,
do not run the code to convert the mcount calls in modules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Frédéric Weisbecker [Fri, 15 Aug 2008 19:08:22 +0000 (21:08 +0200)]

ftrace: fix some mistakes in error messages

This patch fixes some mistakes on the tracer in warning messages when
debugfs fails to create tracing files.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: srostedt@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Mon, 9 Jun 2008 18:54:22 +0000 (20:54 +0200)]

ftrace: scripts/recordmcount.pl cross-build hack

hack around:

ld: Relocatable linking with relocations from format elf32-i386 (init/.tmp_gl_calibrate.o) to format elf64-x86-64 (init/.tmp_mx_calibrate.o) i CC arch/x86/mm/extable.o
objcopy: 'init/.tmp_mx_calibrate.o': No such file
rm: cannot remove `init/.tmp_mx_calibrate.o': No such file or directory
ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/extable.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_extable.o) is not supported
mv: cannot stat `arch/x86/mm/.tmp_mx_extable.o': No such file or directory
ld: Relocatable linking with relocations from format elf32-i386 (arch/x86/mm/fault.o) to format elf64-x86-64 (arch/x86/mm/.tmp_mx_fault.o) is not supported

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Fri, 15 Aug 2008 16:22:09 +0000 (18:22 +0200)]

ftrace: ftrace_kill_atomic() build fix

fix:

kernel/built-in.o: In function `ftrace_dump':
(.text+0x2e2ea): undefined reference to `ftrace_kill_atomic'

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Fri, 15 Aug 2008 15:48:02 +0000 (17:48 +0200)]

ftrace: build fix

fix:

In file included from init/main.c:65:
include/linux/ftrace.h:166: error: expected ‘,' or ‘;' before ‘{' token
make[1]: *** [init/main.o] Error 1
make: *** [init/main.o] Error 2

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 31 Jul 2008 02:36:46 +0000 (22:36 -0400)]

ftrace: dump out ftrace buffers to console on panic

At OLS I had a lot of interest to be able to have the ftrace buffers
dumped on panic.  Usually one would expect to uses kexec and examine
the buffers after a new kernel is loaded. But sometimes the resources
do not permit kdump and kexec, so having an option to still see the
sequence of events up to the crash is very advantageous.

This patch adds the option to have the ftrace buffers dumped to the
console in the latency_trace format on a panic. When the option is set,
the default entries per CPU buffer are lowered to 16384, since the writing
to the serial (if that is the console) may take an awful long time
otherwise.

[
Changes since -v1:
  Got alpine to send correctly (as well as spell check working).
  Removed config option.
  Moved the static variables into ftrace_dump itself.
  Gave printk a log level.
]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 1 Aug 2008 20:45:49 +0000 (16:45 -0400)]

ftrace: ftrace_printk doc moved

Based on Randy Dunlap's suggestion, the ftrace_printk kernel-doc belongs
with the ftrace_printk macro that should be used. Not with the
__ftrace_printk internal function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 1 Aug 2008 16:26:41 +0000 (12:26 -0400)]

ftrace: printk formatting infrastructure

This patch adds a feature that can help kernel developers debug their
code using ftrace.

  int ftrace_printk(const char *fmt, ...);

This records into the ftrace buffer using printf formatting. The entry
size in the buffers are still a fixed length. A new type has been added
that allows for more entries to be used for a single recording.

The start of the print is still the same as the other entries.

It returns the number of characters written to the ftrace buffer.

For example:

Having a module with the following code:

static int __init ftrace_print_test(void)
{
        ftrace_printk("jiffies are %ld\n", jiffies);
        return 0;
}

Gives me:

  insmod-5441  3...1 7569us : ftrace_print_test: jiffies are 4296626666

for the latency_trace file and:

          insmod-5441  [03]  1959.370498: ftrace_print_test jiffies are 4296626666

for the trace file.

Note: Only the infrastructure should go into the kernel. It is to help
facilitate debugging for other kernel developers. Calls to ftrace_printk
is not intended to be left in the kernel, and should be frowned upon just
like scattering printks around in the code.

But having this easily at your fingertips helps the debugging go faster
and bugs be solved quicker.

Maybe later on, we can hook this with markers and have their printf format
be sucked into ftrace output.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 1 Aug 2008 16:26:40 +0000 (12:26 -0400)]

ftrace: new continue entry - separate out from trace_entry

Some tracers will need to work with more than one entry. In order to do this
the trace_entry structure was split into two fields. One for the start of
all entries, and one to continue an existing entry.

The trace_entry structure now has a "field" entry that consists of the previous
content of the trace_entry, and a "cont" entry that is just a string buffer
the size of the "field" entry.

Thanks to Andrew Morton for suggesting this idea.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 15 Aug 2008 02:47:19 +0000 (22:47 -0400)]

ftrace: remove old pointers to mcount

When a mcount pointer is recorded into a table, it is used to add or
remove calls to mcount (replacing them with nops). If the code is removed
via removing a module, the pointers still exist. At modifying the code
a check is always made to make sure the code being replaced is the code
expected. In-other-words, the code being replaced is compared to what
it is expected to be before being replaced.

There is a very small chance that the code being replaced just happens
to look like code that calls mcount (very small since the call to mcount
is relative). To remove this chance, this patch adds ftrace_release to
allow module unloading to remove the pointers to mcount within the module.

Another change for init calls is made to not trace calls marked with
__init. The tracing can not be started until after init is done anyway.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 15 Aug 2008 02:47:18 +0000 (22:47 -0400)]

ftrace: move notrace to compiler.h

The notrace define belongs in compiler.h so that it can be used in
init.h

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Fri, 15 Aug 2008 02:47:17 +0000 (22:47 -0400)]

ftrace: do not show freed records in available_filter_functions

Seems that freed records can appear in the available_filter_functions list.
This patch fixes that.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 22:05:05 +0000 (18:05 -0400)]

ftrace: use only 5 byte nops for x86

Mathieu Desnoyers revealed a bug in the original code. The nop that is
used to relpace the mcount caller can be a two part nop. This runs the
risk where a process can be preempted after executing the first nop, but
before the second part of the nop.

The ftrace code calls kstop_machine to keep multiple CPUs from executing
code that is being modified, but it does not protect against a task preempting
in the middle of a two part nop.

If the above preemption happens and the tracer is enabled, after the
kstop_machine runs, all those nops will be calls to the trace function.
If the preempted process that was preempted between the two nops is executed
again, it will execute half of the call to the trace function, and this
might crash the system.

This patch instead uses what both the latest Intel and AMD spec suggests.
That is the P6_NOP5 sequence of "0x0f 0x1f 0x44 0x00 0x00".

Note, some older CPUs and QEMU might fault on this nop, so this nop
is executed with fault handling first. If it detects a fault, it will then
use the code "0x66 0x66 0x66 0x66 0x90". If that faults, it will then
default to a simple "jmp 1f; .byte 0x00 0x00 0x00; 1:". The jmp is
not optimal but will do if the first two can not be executed.

TODO: Examine the cpuid to determine the nop to use.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:12 +0000 (15:45 -0400)]

ftrace: x86 mcount stub

x86 now sets up the mcount locations through the build and no longer
needs to record the ip when the function is executed. This patch changes
the initial mcount to simply return. There's no need to do any other work.
If the ftrace start up test fails, the original mcount will be what everything
will use, so having this as fast as possible is a good thing.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:11 +0000 (15:45 -0400)]

ftrace: enable using mcount recording on x86

Enable the use of the __mcount_loc infrastructure on x86_64 and i386.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:10 +0000 (15:45 -0400)]

ftrace: rebuild everything on change to FTRACE_MCOUNT_RECORD

When enabling or disabling CONFIG_FTRACE_MCOUNT_RECORD, we want a full
kernel compile to handle the adding of the __mcount_loc sections.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:09 +0000 (15:45 -0400)]

ftrace: enable mcount recording for modules

This patch enables the loading of the __mcount_section of modules and
changing all the callers of mcount into nops.

The modification is done before the init_module function is called, so
again, we do not need to use kstop_machine to make these changes.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:08 +0000 (15:45 -0400)]

ftrace: mcount call site on boot nops core

This is the infrastructure to the converting the mcount call sites
recorded by the __mcount_loc section into nops on boot. It also allows
for using these sites to enable tracing as normal. When the __mcount_loc
section is used, the "ftraced" kernel thread is disabled.

This uses the current infrastructure to record the mcount call sites
as well as convert them to nops. The mcount function is kept as a stub
on boot up and not converted to the ftrace_record_ip function. We use the
ftrace_record_ip to only record from the table.

This patch does not handle modules. That comes with a later patch.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Steven Rostedt [Thu, 14 Aug 2008 19:45:07 +0000 (15:45 -0400)]

ftrace: create __mcount_loc section

This patch creates a section in the kernel called "__mcount_loc".
This will hold a list of pointers to the mcount relocation for
each call site of mcount.

For example:

objdump -dr init/main.o
[...]
Disassembly of section .text:

0000000000000000 <do_one_initcall>:
   0:   55                      push   %rbp
[...]
000000000000017b <init_post>:
17b:   55                      push   %rbp
17c:   48 89 e5                mov    %rsp,%rbp
17f:   53                      push   %rbx
180:   48 83 ec 08             sub    $0x8,%rsp
184:   e8 00 00 00 00          callq  189 <init_post+0xe>
                        185: R_X86_64_PC32      mcount+0xfffffffffffffffc
[...]

We will add a section to point to each function call.

   .section __mcount_loc,"a",@progbits
[...]
   .quad .text + 0x185
[...]

The offset to of the mcount call site in init_post is an offset from
the start of the section, and not the start of the function init_post.
The mcount relocation is at the call site 0x185 from the start of the
.text section.

  .text + 0x185  == init_post + 0xa

We need a way to add this __mcount_loc section in a way that we do not
lose the relocations after final link.  The .text section here will
be attached to all other .text sections after final link and the
offsets will be meaningless.  We need to keep track of where these
.text sections are.

To do this, we use the start of the first function in the section.
do_one_initcall.  We can make a tmp.s file with this function as a reference
to the start of the .text section.

   .section __mcount_loc,"a",@progbits
[...]
   .quad do_one_initcall + 0x185
[...]

Then we can compile the tmp.s into a tmp.o

  gcc -c tmp.s -o tmp.o

And link it into back into main.o.

  ld -r main.o tmp.o -o tmp_main.o
  mv tmp_main.o main.o

But we have a problem.  What happens if the first function in a section
is not exported, and is a static function. The linker will not let
the tmp.o use it.  This case exists in main.o as well.

Disassembly of section .init.text:

0000000000000000 <set_reset_devices>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   e8 00 00 00 00          callq  9 <set_reset_devices+0x9>
                        5: R_X86_64_PC32        mcount+0xfffffffffffffffc

The first function in .init.text is a static function.

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices

The lowercase 't' means that set_reset_devices is local and is not exported.
If we simply try to link the tmp.o with the set_reset_devices we end
up with two symbols: one local and one global.

.section __mcount_loc,"a",@progbits
.quad set_reset_devices + 0x10

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices
                 U set_reset_devices

We still have an undefined reference to set_reset_devices, and if we try
to compile the kernel, we will end up with an undefined reference to
set_reset_devices, or even worst, it could be exported someplace else,
and then we will have a reference to the wrong location.

To handle this case, we make an intermediate step using objcopy.
We convert set_reset_devices into a global exported symbol before linking
it with tmp.o and set it back afterwards.

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 T set_reset_devices

00000000000000a8 t __setup_set_reset_devices
000000000000105f t __setup_str_set_reset_devices
0000000000000000 t set_reset_devices

Now we have a section in main.o called __mcount_loc that we can place
somewhere in the kernel using vmlinux.ld.S and access it to convert
all these locations that call mcount into nops before starting SMP
and thus, eliminating the need to do this with kstop_machine.

Note, A well documented perl script (scripts/recordmcount.pl) is used
to do all this in one location.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Tue, 29 Jul 2008 10:36:02 +0000 (12:36 +0200)]

ftrace: mark lapic_wd_event() notrace

it can be called in the NMI path:

[    0.645999] calling  ftrace_dynamic_init+0x0/0xd6
[    0.647521] ------------[ cut here ]------------
[    0.647521] WARNING: at kernel/trace/ftrace.c:348 ftrace_record_ip+0x4e/0x252()
[    0.647521] Modules linked in:
[    0.647521] Pid: 15, comm: kstop1 Not tainted 2.6.27-rc1-tip #22686
[    0.647521]
[    0.647521] Call Trace:
[    0.647521]  <NMI>  [<ffffffff8024593f>] warn_on_slowpath+0x5d/0x84
[    0.647521]  [<ffffffff80220b99>] ? lapic_wd_event+0xb/0x5c
[    0.647521]  [<ffffffff80287b3b>] ftrace_record_ip+0x4e/0x252
[    0.647521]  [<ffffffff80211274>] mcount_call+0x5/0x31
[    0.647521]  [<ffffffff80220b9e>] ? lapic_wd_event+0x10/0x5c
[    0.647521]  [<ffffffff8083f3ec>] nmi_watchdog_tick+0x19d/0x1ad
[    0.647521]  [<ffffffff8083e875>] default_do_nmi+0x75/0x1e3
[    0.647521]  [<ffffffff8083f0b3>] do_nmi+0x5d/0x94
[    0.647521]  [<ffffffff8083e2d2>] nmi+0xa2/0xc2
[    0.647521]  [<ffffffff802b48c3>] ? check_bytes_and_report+0x11/0xcc
[    0.647521]  <<EOE>>  [<ffffffff80211274>] ? mcount_call+0x5/0x31
[    0.647521]  [<ffffffff802b49df>] check_object+0x61/0x1b0
[    0.647521]  [<ffffffff802b502a>] __slab_free+0x169/0x2ae
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80242dbf>] ? __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff802b60cd>] kmem_cache_free+0x85/0xb9
[    0.647521]  [<ffffffff80242dbf>] __cleanup_sighand+0x25/0x27
[    0.647521]  [<ffffffff80247b3d>] release_task+0x256/0x339
[    0.647521]  [<ffffffff802490b4>] do_exit+0x764/0x7ef
[    0.647521]  [<ffffffff8027624c>] __xchg+0x0/0x38
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8027619a>] ? stop_cpu+0x0/0xb2
[    0.647521]  [<ffffffff8025922f>] kthread+0x4e/0x7b
[    0.647521]  [<ffffffff80212979>] child_rip+0xa/0x11
[    0.647521]  [<ffffffff80211c17>] ? restore_args+0x0/0x30
[    0.647521]  [<ffffffff802283a5>] ? native_load_tls+0x14/0x2e
[    0.647521]  [<ffffffff802591e1>] ? kthread+0x0/0x7b
[    0.647521]  [<ffffffff8021296f>] ? child_rip+0x0/0x11
[    0.647521]
[    0.647521] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.672032] initcall ftrace_dynamic_init+0x0/0xd6 returned 0 after 19 msecs

also mark it no-kprobes while at it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Tue, 29 Jul 2008 10:00:59 +0000 (12:00 +0200)]

ftrace: ignore functions that cannot be kprobe-ed

kprobes already has an extensive list of annotations for functions
that should not be instrumented. Add notrace annotations to these
functions as well.

This is particularly useful for functions called by the NMI path.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Thu, 24 Jul 2008 20:37:23 +0000 (16:37 -0400)]

tracepoints: use TABLE_SIZE macro

Steven Rostedt suggested:

| Wouldn't it look nicer to have: (TRACEPOINT_TABLE_SIZE - 1) ?

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Pekka Paalanen [Mon, 21 Jul 2008 15:49:56 +0000 (18:49 +0300)]

x86: fix mmiotrace 8-bit register decoding

When SIL, DIL, BPL or SPL registers were used in MMIO, the datum
was extracted from AH, BH, CH, or DH, which are incorrect.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: "Vegard Nossum" <vegard.nossum@gmail.com>
Cc: "Steven Rostedt" <srostedt@redhat.com>
Cc: proski@gnu.org
Cc: "Pekka Enberg"
<penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Wed, 23 Jul 2008 12:15:22 +0000 (14:15 +0200)]

tracing: clean up tracepoints kconfig structure

do not expose users to CONFIG_TRACEPOINTS - tracers can select it
just fine.

update ftrace to select CONFIG_TRACEPOINTS.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Wed, 23 Jul 2008 11:48:22 +0000 (13:48 +0200)]

sched: clean up tracepoints

make it a bit more structured hence more readable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ingo Molnar [Wed, 23 Jul 2008 11:38:00 +0000 (13:38 +0200)]

tracing: disable tracepoints by default

while it's arguably low overhead, we dont enable new features by default.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:17 +0000 (12:16 -0400)]

ftrace: port to tracepoints

Porting the trace_mark() used by ftrace to tracepoints. (cleanup)

Changelog :
- Change error messages : marker -> tracepoint

[ mingo@elte.hu: conflict resolutions ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:17 +0000 (12:16 -0400)]

tracing, sched: LTTng instrumentation - scheduler

Instrument the scheduler activity (sched_switch, migration, wakeups,
wait for a task, signal delivery) and process/thread
creation/destruction (fork, exit, kthread stop). Actually, kthread
creation is not instrumented in this patch because it is architecture
dependent. It allows to connect tracers such as ftrace which detects
scheduling latencies, good/bad scheduler decisions. Tools like LTTng can
export this scheduler information along with instrumentation of the rest
of the kernel activity to perform post-mortem analysis on the scheduler
activity.

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added. See the "Tracepoints" patch header for
performance result detail.

Changelog :

- Change instrumentation location and parameter to match ftrace
instrumentation, previously done with kernel markers.

[ mingo@elte.hu: conflict resolutions ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]

tracing: tracepoints, samples

Tracepoint example code under samples/.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]

tracing: tracepoints, documentation

Documentation of tracepoint usage.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Mathieu Desnoyers [Fri, 18 Jul 2008 16:16:16 +0000 (12:16 -0400)]

tracing: Kernel Tracepoints

Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.

Taken from the documentation patch :

"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section).  When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).

You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."

Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".

We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.

Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
  tracepoint table. This will make sure not type mismatch happens due to
  connexion of a probe with the wrong type to a tracepoint declared with
  the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.

Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.

Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).

About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.

Quoting Hideo Aoki about Markers :

I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.

While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.

I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.

I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c

I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.

    The parameter of hackbench: every 50 from 50 to 800
    The number of CPUs of the server: 2, 4, and 8

Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.

Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.

* 2 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
      100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
      150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
      200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
      250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
      300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
      350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
      400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
      450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
      500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
      550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
      600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
      650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
      700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
      750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
      800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
--------------------------------------------------------------

* 4 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
      100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
      150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
      200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
      250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
      300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
      350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
      400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
      450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
      500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
      550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
      600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
      650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
      700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
      750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
      800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
--------------------------------------------------------------

* 8 CPUs

Number of | without      | with         | diff     | diff    |
processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
--------------------------------------------------------------
       50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
      100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
      150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
      200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
      250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
      300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
      350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
      400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
      450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
      500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
      550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
      600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
      650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
      700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
      750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
      800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
--------------------------------------------------------------

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

commit | commitdiff | tree

Ron Mercer [Tue, 14 Oct 2008 05:55:59 +0000 (22:55 -0700)]

qlge: Fix page size ifdef test.

This ASIC does support all page sizes. For 4k and 8k page size the TX
control block needs an external scatter gather list. For page sizes
larger than 8k the max frags is satisfied by the original TX control
block.

Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Mark Brown [Mon, 13 Oct 2008 18:16:14 +0000 (19:16 +0100)]

ALSA: ASoC: Hide TLV320AIC26 configuration option for non-OpenFirwmare users

Make the visibility of the tristate conditional on having the OpenFirmware
helper code enabed so that users who can't use it don't see the visible
option. Kconfig ignores dependencies for select so other users are
unaffected.

Thanks to Takashi for the suggestion.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Matthew Ranostay [Mon, 13 Oct 2008 17:22:45 +0000 (13:22 -0400)]

ALSA: hda: fix nid variable warning

Fixed compiler warning with possible uninitialized variable 'nid'.

CC [M] /home/mranostay/git/alsa-driver/pci/hda/patch_sigmatel.o
/home/mranostay/git/alsa-driver/pci/hda/../../alsa-kernel/pci/hda/patch_sigmatel.c: In function
‘stac92xx_parse_auto_config’:
/home/mranostay/git/alsa-driver/pci/hda/../../alsa-kernel/pci/hda/patch_sigmatel.c:2815: warning: ‘nid’ may be used
uninitialized in this function

Signed-off-by: Matthew Ranostay <mranostay@embeddedalley.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Liam Girdwood [Tue, 14 Oct 2008 03:00:15 +0000 (23:00 -0400)]

Input: wm97xx - update email address for Liam Girdwood

This updates the email address for Liam Girdwood as my old address is no
longer valid.

Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

commit | commitdiff | tree

Colin B Macdonald [Sat, 11 Oct 2008 22:16:38 +0000 (18:16 -0400)]

Input: i8042 - add Thinkpad R31 to nomux list

Thinkpad R31 needs i8042 nomux quirk. Stops jittery jumping mouse
and random keyboard input. Fixes kernel bug #11723. Cherry picked
from Ubuntu who have sometimes (on-again-off-again) had a fix in
their patched kernels.

Signed-off-by: Colin B Macdonald <cbm@m.fsf.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>

commit | commitdiff | tree

Alan Cox [Tue, 14 Oct 2008 02:01:08 +0000 (19:01 -0700)]

net: Rationalise email address: Network Specific Parts

Clean up the various different email addresses of mine listed in the code
to a single current and valid address. As Dave says his network merges
for 2.6.28 are now done this seems a good point to send them in where
they won't risk disrupting real changes.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Heiko Carstens [Tue, 14 Oct 2008 01:58:48 +0000 (18:58 -0700)]

dsa: fix compile bug on s390

git commit 45cec1bac0719c904bb5f4405c2937f7e715888c
"dsa: Need to select PHYLIB." causes this build bug on s390:

drivers/built-in.o: In function `phy_stop_interrupts':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:631: undefined reference to `free_irq'
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:646: undefined reference to `enable_irq'
drivers/built-in.o: In function `phy_start_interrupts':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:601: undefined reference to `request_irq'
drivers/built-in.o: In function `phy_interrupt':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:528: undefined reference to `disable_irq_nosync'
drivers/built-in.o: In function `phy_change':
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:674: undefined reference to `enable_irq'
/home/heicarst/linux-2.6/drivers/net/phy/phy.c:692: undefined reference to `disable_irq'

PHYLIB has alread a depend on !S390, however select PHYLIB at DSA overrides
that unfortunately. So add a depend on !S390 to DSA as well.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexey Dobriyan [Tue, 14 Oct 2008 01:54:07 +0000 (18:54 -0700)]

netns: mib6 section fixlet

LD net/ipv6/ipv6.o
WARNING: net/ipv6/ipv6.o(.text+0xd8): Section mismatch in reference from the function inet6_net_init() to the function .init.text:ipv6_init_mibs()

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Roland Dreier [Tue, 14 Oct 2008 01:53:05 +0000 (18:53 -0700)]

enic: Fix Kconfig headline description

I don't think the enic driver has anything to do with Mark Everett
(http://en.wikipedia.org/wiki/A_Man_Called_E). Fix the Kconfig
description.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Martin Langer [Tue, 14 Oct 2008 01:49:38 +0000 (18:49 -0700)]

de2104x: wrong MAC address fix

The de2104x returns sometimes a wrong MAC address. The wrong one is
like the original one, but it comes with an one byte shift. I found
this bug on an older alpha ev5 cpu. More details are available in Gentoo
bugreport #240718.

It seems the hardware is sometimes a little bit too slow for an
immediate access. This patch solves the problem by introducing a small
udelay.

Signed-off-by: Martin Langer <martin-langer@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Alexey Dobriyan [Tue, 14 Oct 2008 01:48:43 +0000 (18:48 -0700)]

s390: claw compile fixlet

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Adrian Bunk [Tue, 14 Oct 2008 01:48:09 +0000 (18:48 -0700)]

net: export genphy_restart_aneg

This patch fixes the following build error caused by
commit ed94493fb38a665cebcf750dfabe8a6dd13e136f
(mv643xx_eth: convert to phylib):

<--  snip  -->

...
  Building modules, stage 2.
  MODPOST 1280 modules
ERROR: "genphy_restart_aneg" [drivers/net/mv643xx_eth.ko] undefined!
...
make[2]: *** [__modpost] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Divy Le Ray [Tue, 14 Oct 2008 01:47:30 +0000 (18:47 -0700)]

cxgb3: extend copyrights to 2008

Update copyright banner to 2008.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Divy Le Ray [Tue, 14 Oct 2008 01:47:02 +0000 (18:47 -0700)]

cxgb3: update driver version

Add a field to the driver versioning info.
Update version to 1.1.0.

Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Randy Dunlap [Tue, 14 Oct 2008 01:46:22 +0000 (18:46 -0700)]

net/phy: add missing kernel-doc

Fix kernel-doc warning, missing description:

Warning(lin2627-g3-kdocfixes//drivers/net/phy/mdio_bus.c:63): No description found for parameter 'd'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ilpo Järvinen [Tue, 14 Oct 2008 01:43:59 +0000 (18:43 -0700)]

pktgen: fix skb leak in case of failure

Seems that skb goes into void unless something magic happened
in pskb_expand_head in case of failure.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Adrian Bunk [Tue, 14 Oct 2008 01:42:55 +0000 (18:42 -0700)]

mISDN/dsp_cmx.c: fix size checks

The checks for ensuring that the array indices are inside the range
were flipped.

Reported-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Andrew Morton [Tue, 14 Oct 2008 01:42:07 +0000 (18:42 -0700)]

misdn: use nonseekable_open()

The driver just sets ->llseek to NULL. It should also clear FMODE_LSEEK to
tell the VFS that seeks are not supported.

Pointed out by Christoph Hellwig.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Kamalesh Babulal [Tue, 14 Oct 2008 01:41:01 +0000 (18:41 -0700)]

net: fix driver build errors due to missing net/ip6_checksum.h include

2.6.27-git2 kernel build fails with allyesconfig on powerpc with
build error

<introduced by commit 01f2e4ead2c51226ed1283ef6a8388ca6f4cff8f>

CC drivers/net/enic/enic_main.o
drivers/net/enic/enic_main.c: In function â\80\98enic_queue_wq_skb_tsoâ\80\99:
drivers/net/enic/enic_main.c:576: error: implicit declaration of function â\80\98csum_ipv6_magicâ\80\99
make[3]: *** [drivers/net/enic/enic_main.o] Error 1

<introduced by commit c4e84bde1d595d857d3c74b49b9c45cc770df792>

drivers/net/qlge/qlge_main.c: In function â\80\98ql_tsoâ\80\99:
drivers/net/qlge/qlge_main.c:1862: error: implicit declaration of function â\80\98csum_ipv6_magicâ\80\99
make[3]: *** [drivers/net/qlge/qlge_main.o] Error 1

<introduced by commit 95252236e73e789dd186ce796a2abc60b3a61ebe>

drivers/net/jme.c: In function â\80\98jme_tx_tsoâ\80\99:
drivers/net/jme.c:1784: error: implicit declaration of function â\80\98csum_ipv6_magicâ\80\99
make[2]: *** [drivers/net/jme.o] Error 1

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Ron Mercer <ron.mercer@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tao Ma [Thu, 9 Oct 2008 15:06:14 +0000 (23:06 +0800)]

ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().

According to Christoph Hellwig's advice, we really don't need
a ->list to handle one xattr's list. Just a map from index to
xattr prefix is enough. And I also refactor the old list method
with the reference from fs/xfs/linux-2.6/xfs_xattr.c and the
xattr list method in btrfs.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Tao Ma [Thu, 9 Oct 2008 15:06:13 +0000 (23:06 +0800)]

ocfs2: Calculate EA hash only by its suffix.

According to Christoph Hellwig's advice, the hash value of EA
is only calculated by its suffix.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 21:52:59 +0000 (14:52 -0700)]

ocfs2: Move trusted and user attribute support into xattr.c

Per Christoph Hellwig's suggestion - don't split these up. It's not like we
gained much by having the two tiny files around.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 21:31:42 +0000 (14:31 -0700)]

ocfs2: Uninline ocfs2_xattr_name_hash()

This is too big to be inlined.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 21:25:16 +0000 (14:25 -0700)]

ocfs2: Don't check for NULL before brelse()

This is pointless as brelse() already does the check.

Signed-off-by: Mark Fasheh

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 19:50:46 +0000 (12:50 -0700)]

ocfs2: use smaller counters in ocfs2_remove_xattr_clusters_from_cache

i and b_len don't really need to be u64's. Xattr extent lengths should be
limited by the VFS, and then the size of our on-disk length field.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 18:09:24 +0000 (11:09 -0700)]

ocfs2: Documentation update for user_xattr / nouser_xattr mount options

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Tue, 7 Oct 2008 18:02:04 +0000 (11:02 -0700)]

ocfs2: make la_debug_mutex static

It can also be moved into ocfs2_la_debug_read().

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Mark Fasheh [Mon, 6 Oct 2008 23:16:08 +0000 (16:16 -0700)]

ocfs2: Remove pointless !!

ocfs2_stack_supports_plocks() doesn't need this to properly return a zero or
one value.

Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Tao Ma [Fri, 19 Sep 2008 14:17:41 +0000 (22:17 +0800)]

ocfs2: Add empty bucket support in xattr.

As Mark mentioned, it may be time-consuming when we remove the
empty xattr bucket, so this patch try to let empty bucket exist
in xattr operation. The modification includes:
1. Remove the functin of bucket and extent record deletion during
   xattr delete.
2. In xattr set:
   1) Don't clean the last entry so that if the bucket is empty,
      the hash value of the bucket is the hash value of the entry
      which is deleted last.
   2) During insert, if we meet with an empty bucket, just use the
      1st entry.
3. In binary search of xattr bucket, use the bucket hash value(which
   stored in the 1st xattr entry) to find the right place.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Tao Ma [Fri, 19 Sep 2008 14:16:34 +0000 (22:16 +0800)]

ocfs2/xattr.c: Fix a bug when inserting xattr.

During the process of xatt insertion, we use binary search
to find the right place and "low" is set to it. But when
there is one xattr which has the same name hash as the inserted
one, low is the wrong value. So set it to the right position.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Sunil Mushran [Fri, 5 Sep 2008 18:29:14 +0000 (11:29 -0700)]

ocfs2: Add xattr mount option in ocfs2_show_options()

Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
the list of all mount options.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Thu, 4 Sep 2008 03:03:41 +0000 (20:03 -0700)]

ocfs2: Switch over to JBD2.

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change.  Most functions are just renamed.  The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem.  It can even interact with JBD-based ocfs2 as long
as the journal is formated for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being.  This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
  ocfs2_truncate_for_delete(). --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Thu, 4 Sep 2008 03:03:40 +0000 (20:03 -0700)]

ocfs2: Add the 'inode64' mount option.

Now that ocfs2 limits inode numbers to 32bits, add a mount option to
disable the limit. This parallels XFS. 64bit systems can handle the
larger inode numbers.

[ Added description of inode64 mount option in ocfs2.txt. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Thu, 4 Sep 2008 03:03:39 +0000 (20:03 -0700)]

ocfs2: Limit inode allocation to 32bits.

ocfs2 inode numbers are block numbers.  For any filesystem with less
than 2^32 blocks, this is not a problem.  However, when ocfs2 starts
using JDB2, it will be able to support filesystems with more than 2^32
blocks.  This would result in inode numbers higher than 2^32.

The problem is that stat(2) can't handle those numbers on 32bit
machines.  The simple solution is to have ocfs2 allocate all inodes
below that boundary.

The suballoc code is changed to honor an optional block limit.  Only the
inode suballocator sets that limit - all other allocations stay unlimited.

The biggest trick is to grow the inode suballocator beneath that limit.
There's no point in allocating block groups that are above the limit,
then rejecting their elements later on.  We want to prevent the inode
allocator from ever having block groups above the limit.  This involves
a little gyration with the local alloc code.  If the local alloc window
is above the limit, it signals the caller to try the global bitmap but
does not disable the local alloc file (which can be used for other
allocations).

[ Minor cleanup - removed an ML_NOTICE comment. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Tao Ma [Fri, 29 Aug 2008 01:00:19 +0000 (09:00 +0800)]

ocfs2: Resolve deadlock in ocfs2_xattr_free_block.

In ocfs2_xattr_free_block, we take a cluster lock on xb_alloc_inode while we
have a transaction open. This will deadlock the downconvert thread, so fix
it.

We can clean up how xattr blocks are removed while here - this patch also
moves the mechanism of releasing xattr block (including both value, xattr
tree and xattr block) into this function.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Tao Ma [Mon, 1 Sep 2008 00:45:18 +0000 (08:45 +0800)]

ocfs2: bug-fix for journal extend in xattr.

In ocfs2_extend_trans, when we can't extend the current
transaction, it will commit current transaction and restart
a new one. So if the previous credits we have allocated aren't
used(the block isn't dirtied before our extend), we will not
have enough credits for any future operation(it will cause jbd
complain and bug out). So check this and re-extend it.

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Fri, 22 Aug 2008 19:46:09 +0000 (12:46 -0700)]

ocfs2: Change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree()

The original get/put_extent_tree() functions held a reference on
et_root_bh.  However, every single caller already has a safe reference,
making the get/put cycle irrelevant.

We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree().  It
no longer gets a reference on et_root_bh.  ocfs2_put_extent_tree() is
removed.  Callers now have a simpler init+use pattern.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Fri, 22 Aug 2008 00:11:10 +0000 (17:11 -0700)]

ocfs2: Comment struct ocfs2_extent_tree_operations.

struct ocfs2_extent_tree_operations provides methods for the different
on-disk btrees in ocfs2. Describing what those methods do is probably a
good idea.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

commit | commitdiff | tree

Joel Becker [Thu, 21 Aug 2008 02:36:33 +0000 (19:36 -0700)]

ocfs2: Make ocfs2_extent_tree the first-class representation of a tree.

We now have three different kinds of extent trees in ocfs2: inode data
(dinode), extended attributes (xattr_tree), and extended attribute
values (xattr_value).  There is a nice abstraction for them,
ocfs2_extent_tree, but it is hidden in alloc.c.  All the calling
functions have to pick amongst a varied API and pass in type bits and
often extraneous pointers.

A better way is to make ocfs2_extent_tree a first-class object.
Everyone converts their object to an ocfs2_extent_tree() via the
ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all
tree calls to alloc.c.

This simplifies a lot of callers, making for readability.  It also
provides an easy way to add additional extent tree types, as they only
need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree()
function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

Unnamed repository; edit this file 'description' to name the repository.

RSS Atom