Overview of Document:
=====================
-This document is intended to give an good overview of how to debug
-Linux for s/390 & z/Architecture it isn't intended as a complete reference & not a
-tutorial on the fundamentals of C & assembly, it dosen't go into
+This document is intended to give a good overview of how to debug
+Linux for s/390 & z/Architecture. It isn't intended as a complete reference & not a
+tutorial on the fundamentals of C & assembly. It doesn't go into
390 IO in any detail. It is intended to complement the documents in the
reference section below & any other worthwhile references you get.
0 0 Reserved ( must be 0 ) otherwise specification exception occurs.
1 1 Program Event Recording 1 PER enabled,
- PER is used to facilititate debugging e.g. single stepping.
+ PER is used to facilitate debugging e.g. single stepping.
2-4 2-4 Reserved ( must be 0 ).
1 1 64 bit
32 1=31 bit addressing mode 0=24 bit addressing mode (for backward
- compatibility ), linux always runs with this bit set to 1
+ compatibility), linux always runs with this bit set to 1
33-64 Instruction address.
33-63 Reserved must be 0
are used by the processor itself for holding such information as exception indications &
entry points for exceptions.
Bytes after 0xc00 hex are used by linux for per processor globals on s/390 & z/Architecture
-( there is a gap on z/Architecure too currently between 0xc00 & 1000 which linux uses ).
+( there is a gap on z/Architecture too currently between 0xc00 & 1000 which linux uses ).
The closest thing to this on traditional architectures is the interrupt
vector table. This is a good thing & does simplify some of the kernel coding
however it means that we now cannot catch stray NULL pointers in the
On 390 our limitations & strengths make us slightly different.
For backward compatibility we are only allowed use 31 bits (2GB)
-of our 32 bit addresses,however, we use entirely separate address
+of our 32 bit addresses, however, we use entirely separate address
spaces for the user & kernel.
This means we can support 2GB of non Extended RAM on s/390, & more
but only mess with 2 segment indices each time we mess with
a PMD.
-3) As z/Architecture supports upto a massive 5-level page table lookup we
+3) As z/Architecture supports up to a massive 5-level page table lookup we
can only use 3 currently on Linux ( as this is all the generic kernel
currently supports ) however this may change in future
this allows us to access ( according to my sums )
defined in linux/include/linux/sched.h
The S390 on initialisation & resuming of a process on a cpu sets
the __LC_KERNEL_STACK variable in the spare prefix area for this cpu
-( which we use for per processor globals).
+(which we use for per-processor globals).
-The kernel stack pointer is intimately tied with the task stucture for
+The kernel stack pointer is intimately tied with the task structure for
each processor as follows.
s/390
}
i.e. just anding the current kernel stack pointer with the mask -8192.
-Thankfully because Linux dosen't have support for nested IO interrupts
+Thankfully because Linux doesn't have support for nested IO interrupts
& our devices have large buffers can survive interrupts being shut for
short amounts of time we don't need a separate stack for interrupts.
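The masking step described above is plain arithmetic and can be tried outside the kernel. The following is a minimal sketch in Python, not kernel code: the stack pointer value is made up for illustration, and the 8192 byte size is simply taken from the mask quoted in the text.

```python
# Illustrative only: mirrors the "and with the mask -8192" step from the text.
KERNEL_STACK_SIZE = 8192  # assumption: implied by the -8192 mask


def stack_base(stack_pointer):
    """Round a kernel stack pointer down to its 8192 byte boundary."""
    return stack_pointer & -KERNEL_STACK_SIZE


example_sp = 0x0012FE58  # hypothetical stack pointer value
print(hex(stack_base(example_sp)))  # prints 0x12e000
```

Any stack pointer inside the same 8192 byte block gives the same result, which is why no separate lookup is needed.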
Overview:
---------
This is the code that gcc produces at the top & the bottom of
-each function, it usually is fairly consistent & similar from
-function to function & if you know its layout you can probalby
+each function. It usually is fairly consistent & similar from
+function to function & if you know its layout you can probably
make some headway in finding the ultimate cause of a problem
after a crash without a source level debugger.
back-chain:
This is a pointer to the stack pointer before entering a
framed functions ( see frameless function ) prologue got by
-deferencing the address of the current stack pointer,
+dereferencing the address of the current stack pointer,
i.e. got by accessing the 32 bit value at the stack pointers
current location.
------
1) The only requirement is that registers which are used
by the callee are saved, e.g. the compiler is perfectly
-capible of using r11 for purposes other than a frame a
+capable of using r11 for purposes other than a
frame pointer if a frame pointer is not needed.
2) In functions with variable arguments e.g. printf the calling procedure
is identical to one without variable arguments & the same number of
1) You can double check whether the files you expect to be included are the ones
that are being included ( e.g. double check that you aren't going to the i386 asm directory ).
2) Check that macro definitions aren't clashing with typedefs,
-3) Check that definitons aren't being used before they are being included.
+3) Check that definitions aren't being used before they are being included.
4) Helps put the line emitting the error under the microscope if it contains macros.
For convenience the Linux kernel's makefile will do preprocessing automatically for you
A source/assembly mixed dump of the kernel can be done with the line
objdump --source vmlinux > vmlinux.lst
-Also if the file isn't compiled -g this will output as much debugging information
-as it can ( e.g. function names ), however, this is very slow as it spends lots
-of time searching for debugging info, the following self explanitory line should be used
-instead if the code isn't compiled -g.
+Also, if the file isn't compiled -g, this will output as much debugging information
+as it can (e.g. function names). This is very slow as it spends lots
+of time searching for debugging info. The following self-explanatory line should be used
+instead if the code isn't compiled -g, as it is much faster:
objdump --disassemble-all --syms vmlinux > vmlinux.lst
-as it is much faster
-As hard drive space is valuble most of us use the following approach.
+As hard drive space is valuable most of us use the following approach.
1) Look at the emitted psw on the console to find the crash address in the kernel.
2) Look at the file System.map ( in the linux directory ) produced when building
the kernel to find the closest address less than the current PSW to find the
6) rm /arch/s390/kernel/signal.o
7) make /arch/s390/kernel/signal.o
8) watch the gcc command line emitted
-9) type it in again or alernatively cut & paste it on the console adding the -g option.
+9) type it in again or alternatively cut & paste it on the console adding the -g option.
10) objdump --source arch/s390/kernel/signal.o > signal.lst
This will output the source & the assembly intermixed, as the snippet below shows
This will unfortunately output addresses which aren't the same
to a file & on the screen.
Q. What use is it ?
-A. You can used it to find out what files a particular program opens.
+A. You can use it to find out what files a particular program opens.
If you wanted to know does ping work but didn't have the source
strace ping -c 1 127.0.0.1
& then look at the man pages for each of the syscalls below,
-( In fact this is sometimes easier than looking at some spagetti
-source which conditionally compiles for several architectures )
-Not everything that it throws out needs to make sense immeadiately
+( In fact this is sometimes easier than looking at some spaghetti
+source which conditionally compiles for several architectures ).
+Not everything that it throws out needs to make sense immediately.
Just looking quickly you can see that it is making up a RAW socket
for the ICMP protocol.
Example 3
---------
-Getting sophistocated
-telnetd crashes on & I don't know why
+Getting sophisticated
+telnetd crashes & I don't know why
+
Steps
-----
1) Replace the following line in /etc/inetd.conf
Performance Debugging
=====================
-gcc is capible of compiling in profiling code just add the -p option
+gcc is capable of compiling in profiling code; just add the -p option
to the CFLAGS, this obviously affects program size & performance.
This can be used by the gprof gnu profiling tool or the
gcov the gnu code coverage tool ( code coverage is a means of testing
-----
Addresses & values in the VM debugger are always hex never decimal
Address ranges are of the format <HexValue1>-<HexValue2> or <HexValue1>.<HexValue2>
-e.g. The address range 0x2000 to 0x3000 can be described described as
-2000-3000 or 2000.1000
+e.g. The address range 0x2000 to 0x3000 can be described as 2000-3000 or 2000.1000
The VM Debugger is case insensitive.
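The two range notations can be compared with a small parser. This is a sketch in Python based only on the example in the text (2000-3000 and 2000.1000 both describing 0x2000 to 0x3000); the function name is hypothetical, and whether the end address is inclusive is not stated here, so the sketch simply returns start and start plus length.

```python
def parse_range(text):
    """Return (start, end) for 'START-END' or 'START.LENGTH', all hex."""
    text = text.strip()
    if "-" in text:
        start, end = text.split("-", 1)
        return int(start, 16), int(end, 16)
    if "." in text:
        start, length = text.split(".", 1)
        start = int(start, 16)
        # Second field is a length, so the end is start + length.
        return start, start + int(length, 16)
    raise ValueError("expected START-END or START.LENGTH: %r" % text)


print([hex(v) for v in parse_range("2000-3000")])
print([hex(v) for v in parse_range("2000.1000")])
```

Hex digits are accepted in either case, matching the case insensitivity of the debugger.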
An alternative way of finding the STD of a currently running process
is to do the following, ( this method is more complex but
-could be quite convient if you aren't updating the kernel much &
+could be quite convenient if you aren't updating the kernel much &
so your kernel structures will stay constant for a reasonable period of
time ).
To find out how many cpus you have
Q CPUS displays all the CPU's available to your virtual machine
To find the cpu that the current cpu VM debugger commands are being directed at do
-Q CPU to change the current cpu cpu VM debugger commands are being directed at do
+Q CPU
+To change the current cpu VM debugger commands are being directed at do
CPU <desired cpu no>
On a SMP guest issue a command to all CPUs try prefixing the command with cpu all.
To issue a command to a particular cpu try cpu <cpu number> e.g.
CPU 01 TR I R 2000.3000
If you are running on a guest with several cpus & you have a IO related problem
-& cannot follow the flow of code but you know it isnt smp related.
+& cannot follow the flow of code but you know it isn't smp related.
from the bash prompt issue
shutdown -h now or halt.
do a Q CPUS to find out how many cpus you have
our 3rd return address is 8001085A
as the 04B52002 looks suspiciously like rubbish it is fair to assume that the kernel entry routines
-for the sake of optimisation dont set up a backchain.
+for the sake of optimisation don't set up a backchain.
now look at System.map to see if the addresses make any sense.
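Looking an address up in System.map by hand can be sketched as follows. This is an illustration in Python under stated assumptions: the symbol names and addresses in the sample are invented, the helper names are hypothetical, and the top bit of the return address is masked off because on 31 bit s/390 it is the addressing mode bit rather than part of the address.

```python
import bisect

# Invented sample in System.map layout: address, type, name.
SAMPLE_MAP = """\
00010000 T hypothetical_start
00010800 T hypothetical_handler
00010900 T hypothetical_next
"""


def parse_system_map(text):
    """Return a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not 'addr type name'
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def nearest_symbol(symbols, address):
    """Find the closest symbol at or below address, with the offset."""
    address &= 0x7FFFFFFF  # drop the 31 bit addressing mode bit
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    base, name = symbols[index]
    return name, address - base


symbols = parse_system_map(SAMPLE_MAP)
print(nearest_symbol(symbols, 0x8001085A))
```

With a real System.map the same lookup gives the function containing each return address and the offset into it.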
Unlike other bus architectures modern 390 systems do their IO using mostly
fibre optics & devices such as tapes & disks can be shared between several mainframes,
-also S390 can support upto 65536 devices while a high end PC based system might be choking
+also S390 can support up to 65536 devices while a high end PC based system might be choking
with around 64. Here is some of the common IO terminology
Subchannel:
-This is the logical number most IO commands use to talk to an IO device there can be upto
+This is the logical number most IO commands use to talk to an IO device. There can be up to
0x10000 (65536) of these in a configuration typically there is a few hundred. Under VM
for simplicity they are allocated contiguously, however on the native hardware they are not
they typically stay consistent between boots provided no new hardware is inserted or removed.
TEST SUBCHANNEL ) we use this as the ID of the device we wish to talk to, the most
important of these instructions are START SUBCHANNEL ( to start IO ), TEST SUBCHANNEL ( to check
whether the IO completed successfully ), & HALT SUBCHANNEL ( to kill IO ), a subchannel
-can have up to 8 channel paths to a device this offers redunancy if one is not available.
+can have up to 8 channel paths to a device; this offers redundancy if one is not available.
Device Number:
also they are made up of a CHPID ( Channel Path ID, the most significant 8 bits )
& another lsb 8 bits. These remain static even if more devices are inserted or removed
from the hardware, there is a 1 to 1 mapping between Subchannels & Device Numbers provided
-devices arent inserted or removed.
+devices aren't inserted or removed.
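The layout of a device number given above (most significant 8 bits, then another 8 bits) can be shown with a short sketch. It is written in Python purely for illustration, follows only the description in this text, and uses a hypothetical function name; 7C14 is used as the sample value.

```python
def split_device_number(devno):
    """Split a 16 bit device number into (upper 8 bits, lower 8 bits)."""
    if not 0 <= devno <= 0xFFFF:
        raise ValueError("device numbers are 16 bit values")
    return devno >> 8, devno & 0xFF


upper, lower = split_device_number(0x7C14)
print("%02X %02X" % (upper, lower))  # prints 7C 14
```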
Channel Control Words:
CCWS are linked lists of instructions initially pointed to by an operation request block (ORB),
concurrently, you check how the IO went on by issuing a TEST SUBCHANNEL at each interrupt,
from which you receive an Interruption response block (IRB). If you get channel & device end
status in the IRB without channel checks etc. your IO probably went okay. If you didn't you
-probably need a doctorto examine the IRB & extended status word etc.
-If an error occurs more sophistocated control units have a facitity known as
+probably need a doctor to examine the IRB & extended status word etc.
+If an error occurs, more sophisticated control units have a facility known as
concurrent sense this means that if an error occurs Extended sense information will
be presented in the Extended status word in the IRB if not you have to issue a
subsequent SENSE CCW command after the test subchannel.
IOP's can use one or more links ( known as channel paths ) to talk to each
IO device. It first checks for path availability & chooses an available one,
then starts ( & sometimes terminates IO ).
-There are two types of channel path ESCON & the Paralell IO interface.
+There are two types of channel path: ESCON & the Parallel IO interface.
IO devices are attached to control units, control units provide the
logic to interface the channel paths & channel path IO protocols to
The 390 IO systems come in 2 flavours the current 390 machines support both
-The Older 360 & 370 Interface,sometimes called the paralell I/O interface,
+The Older 360 & 370 Interface, sometimes called the parallel I/O interface,
sometimes called Bus-and Tag & sometimes Original Equipment Manufacturers
Interface (OEMI).
-This byte wide paralell channel path/bus has parity & data on the "Bus" cable
+This byte wide parallel channel path/bus has parity & data on the "Bus" cable
& control lines on the "Tag" cable. These can operate in byte multiplex mode for
sharing between several slow devices or burst mode & monopolize the channel for the
-whole burst. Upto 256 devices can be addressed on one of these cables. These cables are
+whole burst. Up to 256 devices can be addressed on one of these cables. These cables are
about one inch in diameter. The maximum unextended length supported by these cables is
125 Meters but this can be extended up to 2km with a fibre optic channel extended
such as a 3044. The maximum burst speed supported is 4.5 megabytes per second however
ESCON if fibre optic it is also called FICON
Was introduced by IBM in 1990. Has 2 fibre optic cables & uses either leds or lasers
-for communication at a signaling rate of upto 200 megabits/sec. As 10bits are transferred
+for communication at a signaling rate of up to 200 megabits/sec. As 10 bits are transferred
for every 8 bits info this drops to 160 megabits/sec & to 18.6 Megabytes/sec once
control info & CRC are added. ESCON only operates in burst mode.
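The ESCON figures above follow from simple arithmetic, sketched here in Python. The 200 megabits/sec, 10 for 8 encoding and 18.6 Megabytes/sec values come from the text; the overhead percentage is only derived from those numbers and is not a quoted specification.

```python
SIGNALING_RATE_MBIT = 200.0  # from the text
QUOTED_MBYTE_PER_SEC = 18.6  # from the text


def payload_mbit(signaling_mbit):
    """10 bits are sent on the wire for every 8 bits of information."""
    return signaling_mbit * 8 / 10


def protocol_overhead(signaling_mbit, quoted_mbyte):
    """Fraction lost to control info and CRC, derived from the figures."""
    raw_mbyte = payload_mbit(signaling_mbit) / 8
    return 1 - quoted_mbyte / raw_mbyte


print(payload_mbit(SIGNALING_RATE_MBIT))      # 160.0 megabits/sec
print(payload_mbit(SIGNALING_RATE_MBIT) / 8)  # 20.0 megabytes/sec
print(round(protocol_overhead(SIGNALING_RATE_MBIT, QUOTED_MBYTE_PER_SEC), 2))
```

On these numbers roughly 7 percent of the decoded bandwidth goes to control information and CRC.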
known as XDF ( extended distance facility ). This can be further extended by using an
ESCON director which triples the above mentioned ranges. Unlike Bus & Tag as ESCON is
serial it uses a packet switching architecture the standard Bus & Tag control protocol
-is however present within the packets. Upto 256 devices can be attached to each control
+is however present within the packets. Up to 256 devices can be attached to each control
unit that uses one of these interfaces.
Common 390 Devices include:
DASD's direct access storage devices ( otherwise known as hard disks ).
Tape Drives.
CTC ( Channel to Channel Adapters ),
-ESCON or Paralell Cables used as a very high speed serial link
+ESCON or Parallel Cables used as a very high speed serial link
between 2 machines. We use 2 cables under linux to do a bi-directional serial link.
OSA 7C14 ON OSA 7C14 SUBCHANNEL = 0002
OSA 7C15 ON OSA 7C15 SUBCHANNEL = 0003
-If you have a guest with certain priviliges you may be able to see devices
-which don't belong to you to avoid this do add the option V.
+If you have a guest with certain privileges you may be able to see devices
+which don't belong to you. To avoid this, add the option V.
e.g.
Q V OSA
RECEIVE / LOG TXT A1 ( replace
8)
filel & press F11 to look at it
-You should see someting like.
+You should see something like:
00020942' SSCH B2334000 0048813C CC 0 SCH 0000 DEV 7C08
CPA 000FFDF0 PARM 00E2C9C4 KEY 0 FPI C0 LPM 80
--------
info registers: displays registers other than floating point.
info all-registers: displays floating points as well.
-disassemble: dissassembles
+disassemble: disassembles
e.g.
disassemble without parameters will disassemble the current function
disassemble $pc $pc+10
info breakpoints: shows all current breakpoints
-info stack: shows stack back trace ( if this dosent work too well, I'll show you the
+info stack: shows stack back trace ( if this doesn't work too well, I'll show you the
stacktrace by hand below ).
info locals: displays local variables.
list:
e.g.
list lists current function source
-list 1,10 list first 10 lines of curret file.
+list 1,10 list first 10 lines of current file.
list test.c:1,10
directory:
Adds directories to be searched for source if gdb cannot find the source.
-(note it is a bit sensititive about slashes )
+(note it is a bit sensitive about slashes)
e.g. To add the root of the filesystem to the searchpath do
directory //
Disassembling instructions without debug info
---------------------------------------------
-gdb typically compains if there is a lack of debugging
-symbols in the disassemble command with
-"No function contains specified address." to get around
+gdb typically complains if there is a lack of debugging
+symbols in the disassemble command with
+"No function contains specified address." To get around
this do
x/<number lines to disassemble>xi <address>
e.g.
current working directory.
This is very useful in that a customer can mail a core dump to a technical support department
& the technical support department can reconstruct what happened.
-Provided the have an identical copy of this program with debugging symbols compiled in &
+Provided they have an identical copy of this program with debugging symbols compiled in &
the source base of this build is available.
In short it is far more useful than something like a crash log could ever hope to be.
kill -SIGSEGV <gdb's pid>
or alternatively use killall -SIGSEGV gdb if you have the killall command.
Now look at the core dump.
./gdb ./gdb core
Displays the following
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
/proc/1/mem is the current running processes memory which you
can read & write to like a file.
strace uses this sometimes as it is a bit faster than the
-rather inefficent ptrace interface for peeking at DATA.
+rather inefficient ptrace interface for peeking at DATA.
cat status
+ RELSTATUS=release
+ MACHTYPE=i586-pc-linux-gnu
-perl -d <scriptname> runs the perlscript in a fully intercative debugger
+perl -d <scriptname> runs the perlscript in a fully interactive debugger
<like gdb>.
Type 'h' in the debugger for help.
additional files, Kerntypes which is built using a patch to the
linux kernel sources in the linux root directory & the System.map.
-Kerntypes is an an objectfile whose sole purpose in life
+Kerntypes is an objectfile whose sole purpose in life
is to provide stabs debug info to lcrash, to do this
Kerntypes is built from kerntypes.c which just includes the most commonly
referenced header files used when debugging, lcrash can then read the