Last time we talked about how to get a kernel debug session between a natively running debugger machine (a.k.a. Host) and an OSX VM running under VMware Fusion. It was fun, right?
In fact, the fun parts are yet to come! First, my immediate task is to get an XNU kernel built in a VM and replace the off-the-shelf kernel ( named mach_kernel or kernel or kernel.debug ... ). Then get a kernel debugging session going between that VM and the Host. Building the kernel is fine: I can build the XNU package for 10.10 and 10.9. Running it, with or without a kernel debug configuration, is where I'm really challenged. With the same steps as discussed in part I, I landed on the kernel waiting for a debugger connection, but the IP address and the MAC address are all 0s in their respective formats. Clearly there is some problem!
On the other hand, if I try to run the newly built kernel without the debugger configuration, it hangs, and it does not matter whether it is a RELEASE or a DEBUG build. Here comes KDP, the Kernel Debugging Protocol. FreeBSD has basically the same idea. This protocol runs over UDP. As far as I know, it is not self-contained when it comes to the configuration side of the interfaces. The interface in question here is the Bridged interface of VMware's virtualization feature. This I will tackle later. But what really is the problem? AFAIK, the KDP part initializes a few IP-related fields to 0s, and there are outer layers that do not come with the XNU source ( though some are on opensource.apple.com ) that keep the configuration intact in a full OSX build.
One thing I did not mention in part I is that we need to map the symbols and the source when we want to debug a kext. This is like external driver programming. For that, the first thing to find out is what version of OSX you are running. It can be found from the Apple icon. From a tty, uname -a will also give you a bit more information. Once you know what version you are running, go to the Apple site and download the respective "Kernel Debugging Kit" onto your Host. In most versions of the Kernel Debugging Kit, installing the dmg file is nothing more than mounting the package. Once mounted, you will see a Readme xml file. It has information about how to invoke LLDB, and that's it. You have symbolic debugging. Just start the VM with kernel debugging configured as explained in part I, invoke LLDB, and play with the instructions given in the Readme xml file.
Note that using the Kernel Debugging Kit implies you are still using the off-the-shelf XNU kernel that comes with the OS image. This is a complete build of Mac OSX, not just the XNU kernel code.
Next, we will get to a deeper side of kernel hacking. For now, happy hacking !
Debugging is not exactly an interesting thing for most programmers! And this is one of the reasons why not many people are ready to debug kernel code written by others. But it is an excellent trade to have!
There are a few different ways one can debug kernel code -
-- Passive debugging using message printing.
-- Active debugging under a debugger.
In this part, we will talk about active debugging under a kernel debugger. For quite some time, GDB used to be the debugger of choice, with some modifications. GDB by design is not a kernel debugger in the true sense, so some kind of patching is needed to turn GDB into a kernel debugger. In the Linux and FreeBSD world, this is often called kgdb. In OSX, it is just GDB. Recent moves by Apple made GDB no longer the default debugger. Unless you get a back-level source and compile it under OSX, with perhaps some tweaks, it is difficult to get a recent GDB to work as a kernel debugger under current ( i.e. 2014 - 2015 ) versions of OSX.
So here comes LLDB. It is under the LLVM umbrella. It can be used for live kernel debugging as well as two-machine kernel debugging. Live debugging means single-machine kernel debugging in this context, and two-machine debugging is kernel debugging in the true sense. Though these machines could be VMs !
Here the setup for kernel debugging is a laptop or Mini running OSX natively on Intel hardware, and a VM running on the Fusion virtual machine infrastructure from VMware. In a two-machine setup, the debugger machine is called the Host or Debugger, and the machine being debugged is the Target. Here the native OSX is the Host, and the VM OSX is the Target.
Configuration for VM-based debugging, the current discussion here, is different from two-machine debugging with both machines running OSX natively. You can always look at the Apple developer site as well as the ReadMe file of the respective Kernel Debug Kit for the OSX version you are running. For setting up a VM under Fusion, please follow the instructions from the Fusion package, as well as online blogs.
Assuming you have a VM running, one of the things you need to do is set the networking to Bridged from the Fusion menu. This makes the setup and the communication quite easy. Once it is set, take a look at the ifconfig output in the VM to make sure that the interface holding the IP address on the Host's subnet is indeed "en0"; otherwise you need to add another parameter. You can find the details in the ReadMe file of any of the Kernel Debug Kits from Apple. Usually it is "en0" unless you have multiple interfaces.
Now ping the VM from the host to make sure that they can talk over the Bridged network.
Finally, before you go and change anything further, make sure you take a snapshot of the VM, so that you can always come back to this point of the configuration :-)
On the Host, you will need to make a static ARP entry using the following format ( with the respective IP and MAC address of the VM's interface )
sudo arp -s 10.2.4.20 00:0c:09:f3:f9:e2
There is a configuration file - /Library/Preferences/SystemConfiguration/com.apple.Boot.plist. Take a look at that, and now change that file to the following, under sudo mode -
Configuration is done! Reboot the VM, and it will wait for a debugger connection.
On the Host machine, have a terminal ( i.e. tty ) command line ready, and execute lldb without any arguments.
At the lldb command prompt, just issue the following command -
lldb will break into the VM's kernel; now continue in lldb to let the VM boot. The rest is getting to know how to use LLDB, and that is for another topic !
Examples of extended inline assembly -
void mycfunc (void)
{
    int a = 10, b = 0;
    asm ("movl %1, %%eax; \
          movl %%eax, %0;"
         : "=r" (b)  /* output constraint; operands are numbered in order, so b is %0; the = sign marks it as a target */
         : "r" (a)   /* input constraint; a is now %1; no = sign means it is not an output or target */
         : "%eax"    /* we are clobbering the eax register, so gcc will not keep a live value in it */
        );
}
-- Note that in extended inline assembly we need to refer to registers using %%, instead of %.
-- If we use registers, we should make GCC aware of them, so it preserves any values they hold before we use them.
Compiler toolchains are smart these days, so depending on its analysis, GCC could delete the whole code section or a part of it, or it could move code around for optimization. There may be situations where we want the compiler to stop doing those optimizations and place the code as is ( i.e. in place ). For that, in systems code we almost always use the __volatile__ keyword, as in the following example.
From <asm-i386/atomic.h> of Linux source -
static __inline__ void atomic_inc(atomic_t *v)
{
    __asm__ __volatile__ (
        LOCK "incl %0"
        : "=m" (v->counter)
        : "m" (v->counter));
}
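The kernel snippet above depends on kernel headers (atomic_t, LOCK), so it cannot be compiled on its own. Below is a minimal user-space sketch of the same idea. The names my_atomic_t, my_atomic_inc and my_atomic_read are made up for this illustration and are not from the Linux source. It also uses the "+m" (read-write memory) constraint rather than the "=m"/"m" pair, and on non-x86 targets it falls back to a GCC builtin so the code still builds.

```c
/* Hypothetical stand-in for the kernel's atomic_t; not the real definition. */
typedef struct { volatile int counter; } my_atomic_t;

/* Atomically add 1 to v->counter. */
static void my_atomic_inc(my_atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
    /* __volatile__ keeps the compiler from deleting or moving the statement.
       "+m" says the memory operand is both read and written.
       "cc" says the flags register is modified by incl. */
    __asm__ __volatile__ ("lock; incl %0"
                          : "+m" (v->counter)
                          :
                          : "cc");
#else
    /* Assumption: on other architectures we lean on the GCC/Clang builtin. */
    __atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* Read the current value of the counter. */
static int my_atomic_read(const my_atomic_t *v)
{
    return v->counter;
}
```

Calling my_atomic_inc twice on a counter initialized to 0 leaves my_atomic_read returning 2.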
For the detailed syntax and semantics, please use the GNU Compiler Collection manual. It is available online.
Remember the dialog of this excellent and funny movie !
When working with proprietary code, it is sometimes absolutely necessary to disassemble pieces of code to understand some funky stuff. One example is working with Windows internals. One can't get far without disassembling some parts of the kernel implementation. The Windows debugger is good at that, and the Microsoft team is not afraid to encourage us to do it, provided of course that we know what we are doing!
So for someone not too familiar with the x86 family of code, it is a good starting point, just in case you might need it sometime. I have heard way too many times - "Hey we are engineers, we make things !". But you will hardly ever hear - "Hey we break things too !" Sometimes breaking little stuff gives you confidence, along with frustration and a craving to know more.
Before we go ahead, it is good to mention that Windows also supports other architectures. And in case you are completely new to delving deep into assembly language while being quite familiar with a higher level language like C/C++, the first thing to do is to read - "Just Enough Assembly Language to Get By ( http://www.microsoft.com/msj/0298/hood0298.aspx )". It is a two-part article, and enough to make one curious. It did that for me, though I was not new to assembly language.
On the other hand, you might learn directly by using assembly language to build little applications. So my suggestion is to use both methods.
Here our discussion is a bit different. It is about the GNU assembler ( as, also known as GAS ), and it is a bit more arcane than normal assembly language programming. Specifically, it is about the embedded ( inline ) assembly feature that GCC provides on top of GAS.
First, compatibility-wise, some inline assembly does not quite work from one platform to another. Be particularly watchful if you happen to test some of it on FreeBSD 7.2 or earlier. Remember that we are discussing inline assembly in a C file. GCC, the compiler, does not have any notion of the assembly syntax, so apart from substituting operands it does not try to parse anything inside such a statement; it passes it to the assembler, which knows what to do with it.
Next, we might ask why we need it. There are times when you need access to register-level instructions for various reasons, like kernel programming, interrupt handling, fast performance, etc.
asm("assembly code"); is the general structure of a simple inline assembly statement.
asm("nop") ; asm("cli"); asm("sti"); are some of the simple statements.
In case the name asm conflicts with something in your code, or you compile with -ansi or -std=c99 where asm is not recognized as a keyword, use __asm__ instead.
Extended inline assembly is where this feature really shines ! If you happen to have a look at the Linux or XNU kernel, you will find quite a few examples of its use and power.
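As a small illustration of a simple inline assembly statement, here is a sketch that can be compiled in user space. The function name run_nops is made up for this example. It sticks to nop because cli and sti are privileged instructions and would fault outside the kernel.

```c
/* Execute `count` nop instructions and report how many were issued. */
static int run_nops(int count)
{
    int executed = 0;
    for (int i = 0; i < count; i++) {
#if defined(__i386__) || defined(__x86_64__)
        /* Simple inline assembly: no operands and no constraints.
           The __asm__ spelling works even when asm is not a keyword. */
        __asm__ ("nop");
#endif
        executed++;
    }
    return executed;
}
```

Calling run_nops(5) returns 5; the nop itself has no visible effect, which is exactly the point.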
Form of an extended inline assembly is -
asm ("assembly statement" : "output constraint(s)" : "input constraint(s)" : "clobbered constraint(s)" );
In a simple inline assembly statement, none of the constraints are needed, which tells us that they are optional. So any of the above clauses can be absent in an extended inline assembly statement. Also, within any clause we can have more than one constraint, separated by commas.
The most important clause is the clobber list. GCC does not know or care which registers your statement uses as targets of its operations, so if we want to preserve a consistent machine state we need to tell the compiler which registers are being clobbered. GCC can then avoid keeping live values in those registers, or save them before the inline statement uses them.
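To show a clause with more than one constraint, here is a sketch with one output and two inputs. The function name add_ints is made up for this example, and the plain C fallback is only there so the code builds on non-x86 machines.

```c
/* Return x + y, computed with inline assembly on x86. */
static int add_ints(int x, int y)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"     /* sum = x  */
             "addl %2, %0"         /* sum += y */
             : "=&r" (sum)         /* %0: output; & = written before all inputs are read */
             : "r" (x), "r" (y)    /* %1 and %2: two inputs in one clause */
             : "cc");              /* addl changes the flags */
#else
    sum = x + y;
#endif
    return sum;
}
```

The & modifier (early clobber) matters here: without it GCC would be free to put sum and y in the same register, and the first movl would then overwrite y before it is read.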
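A sketch of the clobber list in action follows. The function name twice is made up for this example. The statement uses eax as a scratch register without naming it as an operand, so eax has to appear in the clobber list; otherwise GCC might be holding v, or some unrelated live value, in eax.

```c
/* Return 2 * v, using eax as a scratch register on x86. */
static int twice(int v)
{
    int out;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"     /* eax = v     */
             "addl %%eax, %%eax\n\t"  /* eax += eax  */
             "movl %%eax, %0"         /* out = eax   */
             : "=r" (out)
             : "r" (v)
             : "eax", "cc");          /* we overwrite eax and the flags */
#else
    out = v + v;
#endif
    return out;
}
```

With eax listed as clobbered, GCC will not assign either operand to eax, so the three instructions cannot trample their own input.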