Channel: Mike Stall's .NET Debugging Blog

Jan is on MSDN TV talking about MDbg

Jan Stranik is on MSDN TV talking about MDbg, the managed-debugging sample written in C#.  See the video here.  Jan wrote most of MDbg, and handled a lot of the hard problems in getting a debugger working in managed code.

He gives a brief overview of managed-debugging and of MDbg's architecture, including how MDbg layers things on ICorDebug.
He also has the following demos:
1. creating an MDbg extension.  (I blogged about that here)
2. using the MDbg wrappers in your own tools. (Kind of like a combination of enumerating all processes, attaching to them, and listing their modules).

Check it out!


Doing Detach with ICorDebug

Detaching a managed-debugger is somewhat complicated at the ICorDebug API level. In a perfectly-friendly API, you could just call "ICorDebugProcess::Detach" and be done with it. With managed-debugging, there are several constraints, listed here with the HRESULTs (in parentheses) that you'll get for violating them:

  1. You can't be doing interop-debugging (CORDBG_E_INTEROP_NOT_SUPPORTED) or Edit-and-Continue (CORDBG_E_DETACH_FAILED_ON_ENC).
  2. The debuggee must be synchronized (CORDBG_E_PROCESS_NOT_SYNCHRONIZED). This means that you must be stopped at a managed callback, or have called ICorDebugProcess::Stop().
  3. The debugger has to "undo" anything it did, such as:
    • finish or abort outstanding func-evals (CORDBG_E_DETACH_FAILED_OUTSTANDING_EVALS).
    • cancel outstanding steppers (CORDBG_E_DETACH_FAILED_OUTSTANDING_STEPPERS).
    • remove outstanding breakpoints (CORDBG_E_DETACH_FAILED_OUTSTANDING_BREAKPOINTS).
  4. The event queue should be drained.
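The constraints above can be read as a pre-detach checklist. Here is a minimal sketch of that checklist as a toy model. The DebuggerState struct and FirstDetachBlocker function are hypothetical names made up for illustration (they are not ICorDebug APIs), the order of the checks is an assumption, and the HRESULTs are returned as names rather than numeric values. Draining the event queue has no HRESULT listed above, so it is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical bookkeeping a debugger might keep; none of these names are ICorDebug APIs.
struct DebuggerState {
    bool interopDebugging  = false;
    bool editAndContinue   = false;
    bool synchronized      = false;  // stopped at a managed callback, or Stop() was called
    int  outstandingEvals  = 0;
    int  activeSteppers    = 0;
    int  activeBreakpoints = 0;
};

// Returns the name of the HRESULT (from the list above) that would block a
// detach, or "S_OK" if none of the listed constraints is violated.
// The order of the checks is an assumption made for this sketch.
std::string FirstDetachBlocker(const DebuggerState& s) {
    if (s.interopDebugging)      return "CORDBG_E_INTEROP_NOT_SUPPORTED";
    if (s.editAndContinue)       return "CORDBG_E_DETACH_FAILED_ON_ENC";
    if (!s.synchronized)         return "CORDBG_E_PROCESS_NOT_SYNCHRONIZED";
    if (s.outstandingEvals > 0)  return "CORDBG_E_DETACH_FAILED_OUTSTANDING_EVALS";
    if (s.activeSteppers > 0)    return "CORDBG_E_DETACH_FAILED_OUTSTANDING_STEPPERS";
    if (s.activeBreakpoints > 0) return "CORDBG_E_DETACH_FAILED_OUTSTANDING_BREAKPOINTS";
    return "S_OK";
}
```

A debugger could run a check like this before calling Detach, clean up whatever it reports, and try again.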


Why?
A lot of this complication is a hold-over from our "ICorDebug is a rocket science API" days. In Whidbey (V2), we tried to make the API friendlier. We actually added almost all of the hresults above for Whidbey.

Some of these restrictions are overly conservative. For example, ICorDebug could remove all steppers and breakpoints itself. Or it could even keep them in and just ignore them if no debugger was attached.
However, some have decent reasons. Dealing with outstanding func-evals is more complex. There's not a clear correct behavior if you try to detach while in a nested-break state. Also, we wanted the debugger to still be attached so that it is available to restore the thread when the func-eval completes. Detaching during EnC is also problematic because the debugger (not ICorDebug) has a lot of responsibility for keeping an EnCed program running (eg, by handling remaps).

What if you rudely kill the debugger?
By default, if you rudely kill the debugger, all its debuggees get killed too. We call the scenario of killing off the debugger in an uncooperative fashion "Rude-Detach". It was definitely not supported in V1.1. For V2, a CLR Host can respond to Rude-detach.

What to expect when you Attach, Async-Break

Don't assume that if you have a thread doing a spin-wait, that you can attach / asynchronously-break ("async-break") and your debugger will immediately stop at the spin-wait.

When you attach to a debuggee or async-break while debugging, the debuggee's threads could be anywhere. Even if you attach in response to a debuggee request (such as an unhandled exception or Debugger.Launch), other threads may be anywhere.

Furthermore, there is no notion of "active thread" in the underlying debug APIs (both ICorDebug and Windows). It's purely a construct in a debugger UI to make it easier for end-users. Debuggers generally track an active thread that defaults to the last thread sending a debug-event, and the user can also explicitly set it, perhaps via a Threads-list window. Now the MDbg APIs do have a notion of active-thread. It's simply extra state in the MDbg wrappers to let extensions get at that piece of the UI.

Attach vs. Async-Break: In terms of stop-states, stopping on Attach is very similar to Async-Break in that they both stop the process asynchronously using similar mechanisms. From this perspective, you can view Attach as establishing the debugger connection and then doing an Async-break. So I'll talk about them both together here. The following breakdown looks at both attach and async-break across both managed and native debugging.

Attach (Native):
For native-debugging, there is a "loader breakpoint", which is a breakpoint event that comes after the load events and serves as an 'attach complete' event for native. This event comes in the normal launch sequence after the load events, and since attach fakes up the same events as launch, it makes sense for it to come in the attach case too.

Attach (Managed):
Managed-debugging has no attach-complete event (MDbg fakes one up to be nice), and so it doesn't have a clear thread to pick.  I would consider this a minor design flaw with managed debugging because you have no way of reliably knowing when the fake events are done and the real events start. (MDbg uses a heuristic). At the same time, the native loader-breakpoint isn't super-reliable either, but it's better than nothing.

Async-break (Native):
An Async Break in native-debugging is done by injecting a tiny thread into the debuggee that just generates a breakpoint event (int3 on x86).  (See kernel32!DebugBreakProcess). This is why the "active thread" in most native debuggers would be this random injected thread. VS tries to be nice and hide this thread. Windbg doesn't have any such sugar and will show you this "async break" thread.

Async-break (Managed):
For managed-debugging, Async-break is done cooperatively by the CLR using similar logic that the GC uses for suspension. For a GC, the thread requesting the GC (usually the thread allocating) pumps the suspension. In V2, our helper-thread pumps the suspension for the debugger. So when the debugger suspension is complete for async-break, there is no managed thread directly responsible (see ICDThread trivia), and so no good candidate for active thread in the managed debugger.

Attaching to a known state:
We have a lot of debugger tests where we want to attach to a known state with a particular "active thread" in the debugger. The way we do this in our tests is to have the debuggee do something like:

    while(!Debugger.IsAttached) { ; };  // loops until a debugger is attached
    Debugger.Break(); // sets the active thread.

And then the debugger can Attach and let the debuggee run to the Debugger.Break(). Then we get a debug event from a known thread and that becomes the active thread.

Trivia about Set-next-Statement (SetIp)

The poor-man's version of Set-Next-Statement (aka, SetIp) is to just forcibly set the instruction pointer register (eip on x86) to the instruction you want to execute next. However, this naive approach has several problems that ICorDebug's SetIp solves: 
- the naive approach doesn't update variable homes. For example, suppose local variable 'x' starts off in ecx and then gets spilled to the stack. If you just shift the eip, you won't update the local variables to be in their new homes. ICorDebug bends over backwards to remap locals from the old variable homes to the new ones when you set-next-statement.
- SetIp must play well with other things in the function, particularly any sort of stack operations like localloc or exception handling. 

Failure cases:
In ICorDebug, if SetIp can't properly handle these things, it fails the operation. Handling them often requires additional information not tracked in optimized code, which is why managed SetIp often fails in optimized code. ICorDebug exposes many SetIp failure codes (all defined in corerror.h):

CORDBG_S_BAD_START_SEQUENCE_POINT
CORDBG_S_BAD_END_SEQUENCE_POINT
CORDBG_S_INSUFFICIENT_INFO_FOR_SET_IP
CORDBG_E_CANT_SET_IP_INTO_FINALLY
CORDBG_E_CANT_SET_IP_OUT_OF_FINALLY
CORDBG_E_CANT_SET_IP_INTO_CATCH
CORDBG_E_SET_IP_NOT_ALLOWED_ON_NONLEAF_FRAME
CORDBG_E_SET_IP_IMPOSSIBLE
CORDBG_E_CANT_SETIP_INTO_OR_OUT_OF_FILTER
CORDBG_E_CANT_SET_IP_OUT_OF_FINALLY_ON_WIN64
CORDBG_E_CANT_SET_IP_OUT_OF_CATCH_ON_WIN64
CORDBG_E_SET_IP_NOT_ALLOWED_ON_EXCEPTION
 

It's another design issue whether we should have exposed so many different error codes. This is probably more detail than the user needs; and some of these are pretty implementation specific. The bright side is that this enables a debugger to provide a detailed error message about why SetIp may not be allowed. We have a similar issue with func-eval error codes.

Doing SetIp at the ICorDebug level:
In ICorDebug, SetIp is exposed off the ICorDebug*Frame interfaces. You can set IP based off either IL or native offsets. ICD exposes two methods: SetIp and CanSetIp. CanSetIp lets a debugger query if SetIp would succeed without actually doing the SetIp. This can be used in UI for the end-user. For example, Visual Studio lets you SetIp by dragging the current line arrow (the yellow thing). It can then change the cursor icon if the target is an invalid SetIp destination. It can also use CanSetIp to determine whether it should include a "Set Next Statement" option in a context menu.

 

The stop count and trivia

ICorDebug maintains a stop-count, and so if you call ICorDebugProcess::Stop() twice in a row, the 1st stop does the real asynchronous-break, and the 2nd stop is basically a nop that just increments a counter. You'll then need to call ICorDebugProcess::Continue() twice. The first Continue() call just decrements the stop-counter; and then the 2nd call does the real continue.

More trivia:
1) A debugger implements Async-break by calling ICDProcess::Stop().  That API is synchronous and will block until the debuggee stops. In fact, that is one of the few synchronous invasive methods in all of ICorDebug. Once it successfully returns, the debuggee is synchronized and you need to call an extra Continue() to resume it. 
2) Dispatching a callback bumps up the stop-counter, and so it takes 1 call to Continue() to resume from a managed event callback. So if you call stop inside a callback, then you'll need to call an extra Continue.
3) You can also call both Stop + Continue on any thread.

Why have a stop-count?
You could call Stop() on thread 1 which could race with ICD's thread dispatching a managed debug event. This scenario is a motivation for the stop-count. If we didn't have the stop-count, then Continuing the Stop() on thread 1 may accidentally continue the debug event too.
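The counting behavior described above can be sketched with a toy model. StopCountModel is a hypothetical class written for illustration; it only mimics the bookkeeping and is not how ICorDebug is actually implemented.

```cpp
#include <cassert>

// Toy model of the stop-count; the class and its members are hypothetical.
class StopCountModel {
public:
    // Models ICorDebugProcess::Stop(): only the 0 -> 1 transition does the
    // real async-break; later calls just bump the counter.
    void Stop() {
        if (count_ == 0) running_ = false;
        ++count_;
    }

    // Models dispatching a managed callback, which also bumps the counter.
    void DispatchCallback() { Stop(); }

    // Models ICorDebugProcess::Continue(): only the 1 -> 0 transition really
    // resumes the debuggee. Returns false if nothing was stopped.
    bool Continue() {
        if (count_ == 0) return false;
        if (--count_ == 0) running_ = true;
        return true;
    }

    bool IsRunning() const { return running_; }
    int  Count() const { return count_; }

private:
    int  count_ = 0;
    bool running_ = true;
};
```

In this model, a Stop() that races with a dispatched callback leaves the count at 2, so continuing the Stop() alone does not also resume past the debug event.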

Stop the debuggee to poke at it

In ICorDebug, most operations are only available when the debuggee is stopped. (This was asked here). Many things will fail with CORDBG_E_PROCESS_NOT_SYNCHRONIZED if you call them when the process is running. The motivation is:
1) Correctness: trying to query a running debuggee is not safe and may produce very unintuitive results.
2) Simplicity: querying a running debuggee introduces race issues that complicate the implementation.

Example: taking a callstack while running
For example, it's not clear what the correct behavior is when taking a callstack of a moving thread, particularly because ICD provides such intimate inspection. Even if we suspend the thread, take the callstack, and then resume, there could still be problems. Imagine if the CLR had code-pitching (which was originally planned) and we pitched a method on the stack in the middle of the debugger taking the stack trace. So even if we let ICD clients take callstacks while running, there's a decent chance that the clients would not handle all the innate corner cases and misuse the information. And we judged there's a higher chance that if the client is indeed querying while the debuggee is running, it's actually by accident (perhaps there's a race in their code). So we made the judgment to fail the query operations. This is an example of rejecting behavior that's only correct 90% of the time.

Backwards compat + the Stop Count:
Unfortunately, we didn't start really enforcing this until V2; and so there were cases where this enforcement was an unacceptable breaking change.  Breakpoints are one example. Visual Studio lets you add breakpoints even while the process is running. This exerted a pressure on ICorDebug to let you add breakpoints (call ICorDebugBreakpoint::Activate) while the process is running. Our approach here is to Stop things under the covers, do the operation, and then resume things. This is feasible because of the stop-count. If the debuggee is already stopped, then the extra stop/go just increments and decrements a counter. If the debuggee is running, then at least the operation now has clear semantics (it's a snapshot).

Categories:
Thus the ICD APIs fall into the following categories:
1) Must-be-synchronized: The debuggee must be synchronized in order to inspect. This is our preferred API type and is true for most inspection APIs.
2) Should-be-synchronized:  This will do a Stop/Go around the API so that it degenerates into the 'Must-be-synchronized' case. This is mainly for backwards compat.
3) Can-be-live: This is for a small set of APIs that don't matter if they're synchronized. For example, read-only APIs like getting the process PID.

If you're looking at the Rotor sources, these correspond to the ATT_* macros you see defined in rspriv.h and present at the top of most ICD public functions.
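Here is a rough sketch of how the three categories could be modeled on top of a stop-count. ToyProcess, ApiKind, and CallResult are hypothetical names invented for this example; this is not the ATT_* macro code from rspriv.h, just an illustration of the idea.

```cpp
#include <cassert>
#include <functional>

// Hypothetical names; a toy illustration of the three API categories.
enum class ApiKind { MustBeSynchronized, ShouldBeSynchronized, CanBeLive };
enum class CallResult { Ok, ProcessNotSynchronized };

class ToyProcess {
public:
    void Stop()     { ++stopCount_; }
    void Continue() { if (stopCount_ > 0) --stopCount_; }
    bool IsSynchronized() const { return stopCount_ > 0; }

    // Runs 'op' according to the rules of the given category.
    CallResult Call(ApiKind kind, const std::function<void()>& op) {
        switch (kind) {
        case ApiKind::MustBeSynchronized:
            // Fail instead of inspecting a running debuggee.
            if (!IsSynchronized()) return CallResult::ProcessNotSynchronized;
            op();
            break;
        case ApiKind::ShouldBeSynchronized:
            // Stop/Go around the operation. If the debuggee is already
            // stopped, this only increments and decrements the counter.
            Stop();
            op();
            Continue();
            break;
        case ApiKind::CanBeLive:
            // Safe regardless of whether the debuggee is stopped.
            op();
            break;
        }
        return CallResult::Ok;
    }

private:
    int stopCount_ = 0;
};
```

The ShouldBeSynchronized branch shows why the stop-count makes the backwards-compat approach cheap: the wrapper never needs to know whether the caller had already stopped the process.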

What about MDbg?
MDbg only lets you do commands while the debuggee is stopped, and so that mostly avoids the issue. This fits naturally with a command-line debugger's UI, and is also consistent with Windbg's UI.

Debugger.Log vs. OutputDebugString

Both Debugger.Log and OutputDebugString have some key similarities:

  1. They both log strings for debugging purposes
  2. Both have thread-affinity. (The debugger can find out which thread logged)
  3. Data is piped through with no additional semantics in the debugger pipeline.

But they have some key differences.

Comparing mscorlib!System.Diagnostics.Debugger.Log with kernel32!OutputDebugString:

Managed?
    Debugger.Log: Managed
    OutputDebugString: Native

Debug events:
    Debugger.Log: Managed debug event, see here for how the debugger retrieves it.
    OutputDebugString: Native debug event. Retrieved via an OUTPUT_DEBUG_STRING_EVENT, whose payload is an OUTPUT_DEBUG_STRING_INFO.

Information that's logged:
    Debugger.Log: (level, category string, message string)
    OutputDebugString: (message string)

Activation:
    Debugger.Log: Can be enabled / disabled (enabled from debugger via ICorDebugProcess::EnableLogMessages, and can be checked via Debugger.IsLogging). Debugger must enable to get the messages.
    OutputDebugString: Always enabled

Misc:
    Debugger.Log: Calls OutputDebugString with category and message strings.

Sniffing outside a debugger:
    Debugger.Log: Must be attached as a managed debugger to get the managed log events. (DbMon can still sniff the OutputDebugStrings)
    OutputDebugString: Events can be sniffed from DbMon even when a debugger is not attached.

If you're writing managed code, you should probably call Debugger.Log() instead of pinvoke out to OutputDebugString.

Managed vs. Native debugging APIs

FxCop has a great rule (UseManagedEquivalentsOfWin32Api) to tell you about managed APIs that exist instead of trying to pinvoke out. 

I'm writing a native debugger in managed code (more on this later), and FxCop was telling me to use the managed debugging APIs instead of pinvoking out to the native ones. This came up in the FxCop forums. This raises an interesting point about the difference between some of the managed vs. native debugging APIs. Managed and Native debugging are (as of .NET 2.0) different services and so have different APIs with subtle distinctions.

Now, odds are that if you're running managed code (which you must be to do a pinvoke in the first place), then you care about managed-debugging and actually want to be calling the managed versions. In that case, Debugger.IsAttached, Debugger.Break(), and Debugger.Log() are really the ones you care about. However, these APIs are still significantly different from their native counterparts. So if you are writing a native-debugger in managed code and really intend to pinvoke out to the native debugging APIs, you can't just replace them with the managed counterparts.
 

DebugActiveProcess != System.Diagnostics.Debugger.Launch
These ones are totally different.
System.Diagnostics.Debugger.Launch will (i) launch a (ii) managed debugger (using registry settings to find it), with the intent of attaching it to the (iii) current process.
kernel32!DebugActiveProcess will make the current process (i) attach as a (ii) native debugger to the (iii) specified process (not the current process).

DebugBreak != System.Diagnostics.Debugger.Break
DebugBreak injects a (i) native breakpoint exception (eg, int3 on x86). If no native debugger is attached, it will use (ii) normal SEH processing, which may trigger an unhandled native exception and native jit-attach.
Debugger.Break injects a (i) managed stop (UserBreak) and if no managed debugger is attached, it will (ii) trigger a jit-attach for a managed debugger.


IsDebuggerPresent != System.Diagnostics.Debugger.IsAttached
See here for more details. In short, kernel32!IsDebuggerPresent tells you if a (i) native debugger is attached (which includes interop-debugging), whereas Debugger.IsAttached tells you if a (i) managed debugger is attached. As of .NET 2.0, this is a significant distinction. This may change in the future (eg, if we built managed debugging on top of native debugging).

So for your standard C# debugging experience, IsDebuggerPresent() will return false while Debugger.IsAttached() will return true.

However, both of these APIs are evil because debuggers aren't supposed to change behavior, so please don't call them unless you really know what you're doing.

OutputDebugString =? System.Diagnostics.Debugger.Log
These are the closest in relationship. See here for more details. If you're writing managed code, you probably want to call Debugger.Log() instead of pinvoke out to OutputDebugString.
 

 

Summary:
1. The managed and native debugging APIs are functionally very different.
2. However, if you're writing in managed code, you probably intend to use the managed methods on Debugger (especially Debugger.Log and Debugger.Break)
3. If you are indeed writing a native debugger in managed code, then the differences between the method sets are important.

 


You can't cast from a pointer to a managed object in the debugger

During native debugging, it's common to cast a raw pointer value to a given type. Eg, in the expression window, do:
    (Foo*) 0x12345;

You may have noticed that you can't do this in managed code. This is a restriction at the ICorDebug (CLR) level. Some of the motivations for this are:

  1. Appdomains: You need to specify which AppDomain "Foo" is in (and that's hard for reasons mentioned here).  
  2. Func-eval: It's dangerous with func-eval. What if you cast garbage to a Foo* and then start func-evalling on it? Managed-code scenarios use a ton more func-eval than native ones, which aggravates this issue.
  3. Overall Safety: The debugging service's inprocess helper-thread has to execute inspection requests in the debuggee process space, so dangerous inspection requests have the potential to corrupt the debuggee.  We wanted to keep inspection "safe". We have some checking (obviously, you could get a corrupted reference with unsafe code), but casting a bad pointer, getting the right appdomain, doing func-evals on it, etc, would really tempt fate and could cause serious corruption in the debuggee. 
  4. Priority:  Although this is useful, we didn't see this as a top feature need for the mainline C#/VB scenarios. For example, this doesn't really come up in source-level non-optimized debugging scenarios. We focused on other features instead (like these).

With the current in-process architecture, we could put enough (potentially expensive) validation in there to make it safe in some cases. For example, doing a full GC walk could validate it's a valid reference object pointer; but value-type pointers would be much harder to validate. And depending on the CLR implementation, it may not even be possible to determine appdomain.

The signature for such a feature would look something like:
   HRESULT ICorDebugProcess3::GetValueFromAddress(ICorDebugType * pType, CORDB_ADDRESS address, ICorDebugValue ** ppOutValue);

We have a few features that are close, but nothing quite like this.
Not having this is definitely not ideal, and I hope we fix it at a future point.

Empty implementation of ICorDebugManagedCallback

I have to implement the ICorDebugManagedCallback interfaces. I wrote up a stub implementation (that just E_NOTIMPLs all the methods). It's pretty tedious, so I'm posting it here for reference and so that I never have to write it again.

As a language design point, it's definitely annoying using Interfaces for callback here because you have to write a handler for every possible method. It makes the advantage of C#'s Events much more obvious - you can just subscribe to the ones you want. That's a great quality for events that are ignorable (the framework can provide an intelligent default implementation and so your app doesn't need to handle them if it doesn't care).

MDbg (our managed wrappers for ICorDebug) converts from interface-based event dispatch to Event subscription dispatch.
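To illustrate the idea of converting interface dispatch into subscription dispatch, here is a small sketch in C++. It is not MDbg's code (MDbg is written in C#), and IToyDebugCallback and EventAdapter are hypothetical names with only three made-up events. The adapter implements every interface method once and forwards each to an optional handler, so clients only subscribe to what they care about.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a callback interface: every method must be implemented.
struct IToyDebugCallback {
    virtual ~IToyDebugCallback() = default;
    virtual void OnBreakpoint(int threadId) = 0;
    virtual void OnLoadModule(const std::string& name) = 0;
    virtual void OnExitProcess() = 0;
};

// Adapter: implements the interface once and exposes each method as an
// optional, subscribable handler.
class EventAdapter : public IToyDebugCallback {
public:
    std::function<void(int)> Breakpoint;
    std::function<void(const std::string&)> LoadModule;
    std::function<void()> ExitProcess;

    int defaultsTaken = 0;  // number of events nobody subscribed to

    void OnBreakpoint(int threadId) override { Raise(Breakpoint, threadId); }
    void OnLoadModule(const std::string& name) override { Raise(LoadModule, name); }
    void OnExitProcess() override { Raise(ExitProcess); }

private:
    // Calls the handler if one is set; otherwise takes the default path,
    // which in this sketch just counts the ignored event.
    template <typename F, typename... Args>
    void Raise(const F& handler, const Args&... args) {
        if (handler) handler(args...);
        else ++defaultsTaken;
    }
};
```

A client that only cares about module loads assigns the LoadModule handler and leaves the rest empty; the adapter's default path covers the other events.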

When the callback is dispatched, the debuggee is stopped, and you need to continue it. In some cases, if the callback returns E_NOTIMPL, ICorDebug will automatically continue for you; but I don't know how consistently it does that.
So in general, the "empty" implementation of an ICorDebug callback has to at least call ICorDebugAppDomain::Continue() and return S_OK.

 

Feel free to copy + paste this at will if you happen to find it useful.


#define COM_METHOD  HRESULT STDMETHODCALLTYPE

class DefaultCallback :
    public ICorDebugManagedCallback,
    public ICorDebugManagedCallback2
{
    // Include your favorite implementation of IUnknown here

    //
    // Implementation of ICorDebugManagedCallback
    //

    COM_METHOD Breakpoint( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugBreakpoint *pBreakpoint)
    {
        return E_NOTIMPL;
    }

    COM_METHOD StepComplete( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugStepper *pStepper,
        CorDebugStepReason reason)
    {
        return E_NOTIMPL;
    }

    COM_METHOD Break( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *thread)
    {
        return E_NOTIMPL;
    }

    COM_METHOD Exception( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        BOOL unhandled)
    {
        return E_NOTIMPL;
    }

    COM_METHOD EvalComplete( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugEval *pEval)
    {
        return E_NOTIMPL;
    }

    COM_METHOD EvalException( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugEval *pEval)
    {
        return E_NOTIMPL;
    }

    COM_METHOD CreateProcess( ICorDebugProcess *pProcess)
    {
        return E_NOTIMPL;
    }

    COM_METHOD ExitProcess( ICorDebugProcess *pProcess)
    {
        return E_NOTIMPL;
    }

    COM_METHOD CreateThread( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *thread)
    {
        return E_NOTIMPL;
    }

    COM_METHOD ExitThread( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *thread)
    {
        return E_NOTIMPL;
    }

    COM_METHOD LoadModule( ICorDebugAppDomain *pAppDomain,
        ICorDebugModule *pModule)
    {
        return E_NOTIMPL;
    }

    COM_METHOD UnloadModule( ICorDebugAppDomain *pAppDomain,
        ICorDebugModule *pModule)
    {
        return E_NOTIMPL;
    }

    COM_METHOD LoadClass( ICorDebugAppDomain *pAppDomain,
        ICorDebugClass *c)
    {
        return E_NOTIMPL;
    }

    COM_METHOD UnloadClass( ICorDebugAppDomain *pAppDomain,
        ICorDebugClass *c)
    {
        return E_NOTIMPL;
    }

    COM_METHOD DebuggerError( ICorDebugProcess *pProcess,
        HRESULT errorHR,
        DWORD errorCode)
    {
        return E_NOTIMPL;
    }

    COM_METHOD LogMessage( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        LONG lLevel,
        WCHAR *pLogSwitchName,
        WCHAR *pMessage)
    {
        return E_NOTIMPL;
    }

    COM_METHOD LogSwitch( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        LONG lLevel,
        ULONG ulReason,
        WCHAR *pLogSwitchName,
        WCHAR *pParentName)
    {
        return E_NOTIMPL;
    }

    COM_METHOD CreateAppDomain( ICorDebugProcess *pProcess,
        ICorDebugAppDomain *pAppDomain)
    {
        return E_NOTIMPL;
    }

    COM_METHOD ExitAppDomain( ICorDebugProcess *pProcess,
        ICorDebugAppDomain *pAppDomain)
    {
        return E_NOTIMPL;
    }

    COM_METHOD LoadAssembly( ICorDebugAppDomain *pAppDomain,
        ICorDebugAssembly *pAssembly)
    {
        return E_NOTIMPL;
    }

    COM_METHOD UnloadAssembly( ICorDebugAppDomain *pAppDomain,
        ICorDebugAssembly *pAssembly)
    {
        return E_NOTIMPL;
    }

    COM_METHOD ControlCTrap( ICorDebugProcess *pProcess)
    {
        return E_NOTIMPL;
    }

    COM_METHOD NameChange( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread)
    {
        return E_NOTIMPL;
    }

    COM_METHOD UpdateModuleSymbols( ICorDebugAppDomain *pAppDomain,
        ICorDebugModule *pModule,
        IStream *pSymbolStream)
    {
        return E_NOTIMPL;
    }

    COM_METHOD EditAndContinueRemap( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugFunction *pFunction,
        BOOL fAccurate)
    {
        return E_NOTIMPL;
    }

    COM_METHOD BreakpointSetError( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugBreakpoint *pBreakpoint,
        DWORD dwError)
    {
        return E_NOTIMPL;
    }

    ///
    /// Implementation of ICorDebugManagedCallback2
    ///

    COM_METHOD FunctionRemapOpportunity( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugFunction *pOldFunction,
        ICorDebugFunction *pNewFunction,
        ULONG32 oldILOffset)
    {
        return E_NOTIMPL;
    }

    COM_METHOD CreateConnection( ICorDebugProcess *pProcess,
        CONNID dwConnectionId,
        WCHAR *pConnName)
    {
        return E_NOTIMPL;
    }

    COM_METHOD ChangeConnection( ICorDebugProcess *pProcess,
        CONNID dwConnectionId )
    {
        return E_NOTIMPL;
    }

    COM_METHOD DestroyConnection( ICorDebugProcess *pProcess,
        CONNID dwConnectionId )
    {
        return E_NOTIMPL;
    }

    COM_METHOD Exception(  ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugFrame *pFrame,
        ULONG32 nOffset,
        CorDebugExceptionCallbackType dwEventType,
        DWORD dwFlags )
    {
        return E_NOTIMPL;
    }

    COM_METHOD ExceptionUnwind(  ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        CorDebugExceptionUnwindCallbackType dwEventType,
        DWORD dwFlags )
    {
        return E_NOTIMPL;
    }

    COM_METHOD FunctionRemapComplete( ICorDebugAppDomain *pAppDomain,
        ICorDebugThread *pThread,
        ICorDebugFunction *pFunction)
    {
        return E_NOTIMPL;
    }

    COM_METHOD MDANotification(
        ICorDebugController * pController,
        ICorDebugThread *pThread,
        ICorDebugMDA * pMDA
        )
    {
        return E_NOTIMPL;
    }
};

 

An example of an API versioning problem.

Here's an example of an API versioning problem.

In general:
Anytime you take two separate concepts and tie them together based off some current implementation assumption, you're going to get trouble when that assumption is broken.

The specific example:
You currently (as of .NET 2.0) can't unload the CLR once it's loaded.

The managed exit process debug event (ICorDebugManagedCallback::ExitProcess) is overloaded to mean two things:
a) when the CLR is unloaded.  This is important because it lets you know when you can get rid of your ICorDebug instance. It also matches the semantics of the CreateProcess event.
b) AND when the actual process exits.  This is important because once the process exits, things happen like the process handle is signaled, file locks are released (so you could recompile the debuggee), etc.

Because you can't unload the CLR, these are always the same.  But let's say we add the ability to unload the CLR. Now how should the managed ExitProcess event behave?
1. We could stick with A and fire the event when the CLR is unloaded. This is probably the "purest" solution, but it would break debuggers depending on B.
2. We could keep ExitProcess with B the same and add a new event like "ClrUnloaded" that fires at A.  This is technically correct.  However, it may require keeping ICorDebug loaded in the debugger once the CLR is no longer in the debuggee. This could become very cumbersome if a process loaded and unloaded runtimes in a loop (the debugger would get cluttered with a bunch of ICorDebugProcess instances waiting for the actual debuggee exit). We'd also need to provide an intelligent path for debugger authors to unload ICorDebug at the ClrUnloaded event.

 

LoadClass events are usually meaningless

ICorDebug notifies a debugger when a managed class is loaded via the LoadClass debug event. For dynamic-modules, this is useful because it tells you that you just baked a new type and so may have new code to bind breakpoints in (see here for debugging ref-emit).

But for everything else, this is useless. LoadClass doesn't really provide much value beyond LoadModule. For example:

  1. LoadClass does not mean that any of the methods in the class are jitted.
  2. LoadClass does not mean that the cctor is run.  (cctors are run lazily)
  3. LoadClass does not mean that the class statics are initialized (those are also initialized lazily)
  4. LoadClass events usually don't even come. A debugger has to explicitly subscribe to them (EnableClassLoadCallbacks) for non-dynamic modules. And they don't even fire for ngen modules. So clearly they can't be doing anything too essential.
  5. LoadClass just gives you an ICorDebugClass back, but you could just as easily get that calling ICorDebugModule::GetClassFromToken at the LoadModule callback.

Basically, LoadClass just means that the CLR "touched" the class. ICorDebugClass is a pretty anemic interface that doesn't let you inspect very much. It's possible that if there were more powerful inspection methods on ICDClass, they might lend meaning to LoadClass. For example, maybe when the CLR fires a LoadClass event, internally it did class layout and could now tell you what the offsets of each field are. But since ICorDebug doesn't let you directly get field offsets, that's a moot point.

Furthermore, since LoadClass didn't have any clear semantics to preserve, there was not a clear path for how it should be extended to handle generics in V2.

The moral of the story: Don't have APIs that don't actually mean anything.

ICorDebugValue vs. System.Object

System.Object represents a managed object within a process. ICorDebugValue is the debugger's representation of a System.Object within the debuggee process.

The key here is that the debugger and debuggee processes are ideally fully isolated from each other. For example

  1. They could both be running different versions of the runtime.
  2. The debugger should not be "tainted" by the debuggee's runtime.
  3. The debugger should never need to load debuggee code into the debugger process. In fact, it may not even be able to. (Eg, imagine if the debugger + debuggee are on different machines or CPUs. Now I know ICD doesn't currently allow this, but we still need to future-proof)
  4. They could be running in different security contexts (debugger could be admin, debugging a low-priv debuggee)

Some common questions:

  1. Can I get a System.Object from an ICDValue? The debuggee and debugger are 2 separate processes. Getting a System.Object from 1 process into another process is done through serialization. So this is really just a serialization problem, with all of the challenges and evils of serialization.  Just because one process happens to be debugging the other doesn't suddenly make serialization easy.
     ICorDebug doesn't provide a built-in way to do this. There kind of was something in V1 (ICorDebugValue::GetManagedCopy), but it didn't work right and we deprecated it in V2.
  2. Why is ICDValue so hard to use? Because with ICDValue, all object-inspection operations are exposed through function calls, whereas with System.Object you have much more natural language features. For example, a field lookup with System.Object is like "p.field".  Pseudo-code for a field lookup with ICDValue would be like:
        ICDValue _p = ... // get value for 'p'
        ICDValue _field = _p.GetFieldValue("field");
    In theory, you could build a wrapper around ICDValue that does things like overload operators to hide the ICDValue function calls behind a more natural syntax. However, that's very language dependent (not all languages support overloading field accessors), and ICDValue is a COM-classic interface that must expose everything through explicit interfaces. 
    The fact that ICDValue is indeed so hard to use motivates folks to want to be able to get a System.Object (which is easy to inspect) from an ICDValue, and then run complex inspection operations on the System.Object. The MDbg (managed) wrappers for ICDValue ought to do a better job here.
  3. Where does function evaluation fit into this? Func-eval is the ability for the debugger to have the debuggee run code. The key is that the debuggee code is still being run in the debuggee process, and so the isolation is preserved.
  4. If this is so hard, then how does VS's immediate window let me do easy inspection of objects in the debuggee? Because VS's Expression Evaluator team is awesome and has done an amazing job of building on top of ICDValue.
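
To make the wrapper idea from question 2 concrete, here is a minimal sketch in Python, used purely as a language-neutral model. FakeValue, get_field_value, get_primitive, and Inspect are all hypothetical names invented for this illustration; none of them are ICorDebug or MDbg APIs. It only shows how call-based inspection could be hidden behind "p.field" syntax in a language that lets you overload attribute access.

```python
class FakeValue:
    """Hypothetical stand-in for an ICDValue: inspection only via method calls."""

    def __init__(self, fields=None, primitive=None):
        self._fields = fields or {}
        self._primitive = primitive

    def get_field_value(self, name):
        # Mirrors the explicit, call-based lookup from the pseudo-code above.
        return self._fields[name]

    def get_primitive(self):
        return self._primitive


class Inspect:
    """Wrapper that turns attribute access into get_field_value calls."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # Only invoked for names not found normally, so _value and unwrap
        # never land here.
        return Inspect(self._value.get_field_value(name))

    def unwrap(self):
        return self._value.get_primitive()


point = FakeValue(fields={"x": FakeValue(primitive=3), "y": FakeValue(primitive=4)})
p = FakeValue(fields={"field": point})

# Reads like "p.field.x" but every step is still a function call underneath.
result = Inspect(p).field.x.unwrap()
```

Note that the inspected value never leaves the model of the debuggee; the wrapper only changes the syntax, which is why it sidesteps the serialization problem from question 1.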


 

LCG + Debuggability, and your feedback


I mentioned earlier that you can debug Reflection.Emit code. Unfortunately, Ref.Emit code can't be unloaded unless you unload the entire appdomain. I wanted to lay out the current landscape, and then get feedback about possible solutions.

In Whidbey, we added Light Weight CodeGen (LCG), which are dynamically generated methods that can be garbage collected. This fixes the unloading problem with Ref.Emit code. Unfortunately, LCG methods aren't really debuggable. You can't currently associate them with source, and they're basically hidden when debugging. In Whidbey, they show up as an “InternalFrame” marker in the callstack, which has no information other than the annotation that there is some dynamic code on the stack. If a debug event (such as an exception or a Debug.break statement) fires in a dynamic method, we will stop, and then it’s nice to have that internal frame on the callstack. But nothing is really debuggable about it in Whidbey.

In other words, with ICorDebug (and thus debuggers like VS that use that):
- You can’t set breakpoints in dynamic methods (LCG)
- You can’t see the IL of dynamic methods
- You can’t see the locals / arguments of dynamic methods
- You can’t map dynamic methods back to source code
- You can’t step into or out of dynamic methods (it’s like native code when managed-only debugging)

Workarounds?
The current workarounds are:
1) Use windbg and SOS. See demo here
2) Use Ref.Emit for code that you want to be debuggable.
3) Use other tricks, like Haibo's IL visualizer.

But ultimately in Whidbey (.NET 2.0), you're forced to choose between debuggability (use Ref.Emit or normal code)  vs. fine-grained unloadability beyond just appdomain unloading (use LCG).

Your feedback?
There are a few ways the CLR can solve this, including (but not limited to):
1. Make LCG debuggable, particularly adding the ability to associate source information with LCG methods, like you do with normal Ref.Emit when you call MarkSequencePoint.
2. Make Ref.Emit code collectable, ala LCG. 

Do folks out there have any preference in solution?

I realize there's also a large dial about what is considered debuggable. Eg, source-level debuggability, vs. assembly level vs. IL level. (I could do a whole other blog entry about that). Any comments on where the dial should be set are welcome too.

Fake attach event ordering


When you attach to a managed debuggee (via ICorDebug::DebugActiveProcess), ICorDebug generates a set of fake events designed to bring the debugger up to the current state. The motivation is that it pumps the debugger just as if the debugger was always attached. Native debugging does the same thing.

The list isn't very well documented, and I've had enough requests to clarify it. This is what you would observe in .NET 1.0, 1.1 and 2.0. I've constructed this mostly from memory (with some double-checking); and the order may change in the future.

Between each event, you must call Continue just as with real live events. The fake events are also split across several callback queues, which are noted via <end callback queue> items in the list. The exact partitioning of callback queues is something that could easily change, so robust debuggers shouldn't rely on it.

Here's the list of fake attach events: (update:1/17/07, forgot UpdateModuleSyms)

  1. a CreateProcess event. This doesn't come until the process has actually loaded managed code.
  2. a CreateAppDomain event for each AppDomain.  Debuggers should call ICorDebugAppDomain::Attach for each appdomain, especially when working with .NET 1.X.
  3. <end callback queue>
  4. a LoadAssembly event for each Assembly in each appdomain. These are generally not very interesting.  These come in the order the assemblies are loaded.
  5. a LoadModule for each module in each assembly.
    - This is an opportunity to request class load events. I recommend not requesting the events (which is the default).
    - These come in the order the modules are created. Visual Studio exposes this ordering via one of the columns in the modules window.
    - The order here is also important because of metadata dependencies. For example, in a multi-module assembly, the manifest module should come first (this is especially important in reflection.emit scenarios).
  6. an UpdateModuleSymbols event if the runtime has symbols for the module. This is most common for a module generated with Ref.emit with symbols. It could also include an in-memory module where the host has the symbols. This delivers an IStream containing the symbols that can then be used with the symbol reader.
  7. <end callback queue>
  8. a LoadClass for each class in each module. These are not particularly useful because they don't really mean anything.
  9. <end callback queue>
  10. CreateThread for each managed thread.
    - These come in the order that the threads were created. This means that the first CreateThread event is for the main thread of the app.
    - Depending on whether the finalizer has executed managed code yet, the total number of events here may appear random. 
  11. A CreateConnection and ChangeConnection event for each connection.
    - Connections are new in .NET 2 and only occur in certain Hosted scenarios.
    - Your debugger will very likely never see them, but I want to mention them for completeness' sake.
  12. In case of a managed just-in-time (JIT) attach (where the debuggee initiates the attach), the event that triggered that attach is dispatched.
    - This is most likely an Unhandled Exception, User Break, or MDA event. 
    - These events are generally shell stopping events. This is why a debugger would naturally stop at a jit-attach; but keep running at a normal attach.
  13. <end callback queue>

Attach is now complete.  
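
The ordering above can be sketched as a small generator over a toy model of the debuggee. This is Python used only as a modeling language; the dict layout, the function name fake_attach_events, and the sample process are assumptions made for the illustration, not ICorDebug structures. Two details are also assumptions: that every LoadAssembly precedes every LoadModule, and that UpdateModuleSymbols directly follows its own module's LoadModule. The "classes" entry stands for classes whose load events were requested.

```python
def fake_attach_events(process, jit_attach_event=None):
    """Yield (event, name) pairs in the breadth-first order listed above."""
    end = ("<end callback queue>", None)
    domains = process["appdomains"]
    assemblies = [a for d in domains for a in d["assemblies"]]
    modules = [m for a in assemblies for m in a["modules"]]

    yield ("CreateProcess", process["name"])
    for d in domains:
        yield ("CreateAppDomain", d["name"])
    yield end

    for a in assemblies:
        yield ("LoadAssembly", a["name"])
    for m in modules:
        yield ("LoadModule", m["name"])
        if m.get("has_symbols"):
            yield ("UpdateModuleSymbols", m["name"])
    yield end

    for m in modules:
        for c in m.get("classes", []):
            yield ("LoadClass", c)
    yield end

    for t in process["threads"]:
        yield ("CreateThread", t)
    for conn in process.get("connections", []):
        yield ("CreateConnection", conn)
        yield ("ChangeConnection", conn)
    if jit_attach_event is not None:
        # Only present for a jit-attach, e.g. an unhandled exception.
        yield (jit_attach_event, None)
    yield end


debuggee = {
    "name": "app.exe",
    "appdomains": [
        {"name": "DefaultDomain", "assemblies": [
            {"name": "mscorlib", "modules": [{"name": "mscorlib.dll"}]},
            {"name": "app", "modules": [{"name": "app.exe", "has_symbols": True}]},
        ]},
    ],
    "threads": ["main"],
}
names = [event for event, _ in fake_attach_events(debuggee)]
```

Since the queue partitioning may change, a consumer of such a stream should key off the event types rather than the position of the end-of-queue markers.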
 

Here are some more caveats:

  1. Note that the ordering is breadth first, not depth first. For example, you get all the appdomain events before you get any assembly events. Originally, the design goal was for partial-process debugging (which the CLR doesn't support) to allow a debugger to only subscribe to debugging subsets of the debuggee.
  2. The attach is not considered complete until all the fake events are drained.  Specifically, the debugger does not have the full state of the debuggee until after attach is finished. Other operations may not work well before attach is complete.
  3. Unfortunately, there's no managed "Attach Complete" event to tell you when that point is. This makes it very difficult to attach, take a snapshot, and then detach, because you don't know when the attach is complete. MDbg does have some built-in heuristics to infer attach complete and generates a fake attach complete event. (As an aside: this is one of the things I like about MDbg. It lets us smooth over some warts in ICorDebug.) This is kind of what the DrainAttach() function does in this callstack snapshot tool. Visual Studio does not stop the IDE after attach, but it prevents you from doing an async-break before the attach is complete.
  4. Native debugging is a little better because that has the "loader breakpoint", which is that first breakpoint (int 3 on x86) that occurs on startup. However, since int3s can occur before the loader breakpoint, there's no simple way to be 100% correct about that either.
  5. There is no good way to distinguish between a fake attach event and a real attach event. For example, if you attach to a pure native app and then it loads the runtime, you'll get real CreateXYZ events. If you attach to an already managed app, you'll get the faked up CreateXYZ events above. You may get the same events in both cases, but in one case the events are real and in the other they are faked. The nice part about this is that the same debug engine can then handle both scenarios. The bad part about this is that it is lying.
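
As an illustration of caveat 3, here is one possible "drain" heuristic, modeled in Python. It is not MDbg's actual heuristic, which isn't described here; the quiet-period rule, the function name drain_attach, and the continue_debuggee callback are all assumptions for this sketch. It keeps continuing past events until nothing arrives for a short window and then declares the attach complete.

```python
import queue


def drain_attach(events, continue_debuggee, quiet_seconds=0.05):
    """Continue past queued events until none arrives for quiet_seconds.

    Returns the events seen. Treating a quiet period as "attach complete"
    is a guess: a slow debuggee can end the drain early, and a real event
    that arrives in time looks exactly like a fake one (see caveat 5).
    """
    seen = []
    while True:
        try:
            event = events.get(timeout=quiet_seconds)
        except queue.Empty:
            return seen
        seen.append(event)
        continue_debuggee()  # every event must be continued, fake or real


pending = queue.Queue()
for name in ["CreateProcess", "CreateAppDomain", "LoadAssembly",
             "LoadModule", "CreateThread"]:
    pending.put(name)

continues = []
drained = drain_attach(pending, lambda: continues.append(1))
```

The weakness is visible in the docstring: any timing-based rule is only an inference, which is exactly why a real "Attach Complete" event would be better.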

 


You don't want to write an interop debugger.


I've had a growing number of people inquire about how to write an interop-debugger with ICorDebug.  My goal here is to discourage you from doing that. (This reminds me of one of my college classes. On day one, the acting-Prof launched into a great sermon "Why you should drop this class now". It turned out to be a great class).

Here are some reasons that you should not try to write an interop debugger:

  1. It's a tough challenge, and Visual Studio already does it for you.
  2. You can actually get a lot of mileage out of managed-only debugging. For example, you can use non-invasive native debugging APIs even when you're managed-only debugging. So you could use CreateToolhelp32Snapshot (or the managed wrappers in System.Diagnostics) to view native threads and modules. You can even load symbols for native modules and take mixed-mode callstacks in a lot of scenarios.
  3. Some things are disabled when interop-debugging, such as Edit-And-Continue. Interop-debugging is also not supported on Win9x, but more importantly it's not supported on 64-bit OSes, including Amd64 (although it will run on x64 under WOW64).
  4. Interop-debugging is very complicated, and consuming ICorDebug's interop-debugging interfaces is very difficult. Writing for interop-debugging is basically the worst of all worlds. For example, you need to worry about Out-Of-Band events. Also, many operations that appear to be simple (such as stepping from managed code into native) are actually very complicated to implement and require the debugger to maintain separate managed and native debugging operations and then stitch them together to give the end-user an illusion of a unified debugging experience. The next few points are special cases of this.
  5. There are a lot of threads.  You have multiple callback threads in the debuggee, non-callback threads (such as a UI thread) in the debuggee, and multiple threads in the debugger generating debug events.
  6. It causes ICorDebug to be reentrant, which results in some evil causality chains. For example, an innocent call to ICorDebugThread::EnumerateChains may block because the helper-thread hit an out-of-band event that needs to be continued by your ICorDebugUnmanagedCallback handler.
  7. You have more states to worry about. Managed-only debugging just has 2 states: Stopped (called Synchronized) + Running. Interop-debugging has more.

(Ok, so 4-7 are all under the same umbrella. That's because I really want to emphasize that it's complicated)

 

Tips for writing an Interop Debugger


I've had a growing number of people inquire about how to write an interop-debugger with ICorDebug. I strongly advise anybody considering writing an interop-debugger to reconsider for reasons listed here. However, for those who cannot be dissuaded, and promise to do something really really cool, here are some key tips:

  1. Interop-debugging means both the managed + native debugging engines coexist, and your debugger stitches them together to try to make it look like a unified debugging operation.
  2. Supply an ICorDebugUnmanagedCallback to ICorDebug::SetUnmanagedHandler. This callback is then invoked to dispatch DEBUG_EVENT structures (which are normally retrieved by WaitForDebugEvent in native debugging).
  3. You're extremely restricted for what you can do during this callback. Specifically, most of the ICorDebug API can't be called. In .NET 2.0, we started enforcing this better by checking and returning CORDBG_E_CANT_CALL_ON_THIS_THREAD.  Never block in this thread. Just update your state and get out.
  4. Most importantly, you can't do an in-band continue on the thread dispatching the ICorDebugUnmanagedCallback. This means the unmanaged callback thread has to get another thread (perhaps the thread that continues for managed events) to make the in-band continue. There's actually a good technical reason for this, although it's not intuitive (witness this forum post).
  5. If the callback is for an Out-Of-Band event, just update internal state and immediately continue it.
  6. For Whidbey, use ICorDebugProcess2::SetUnmanagedBreakpoint and ClearUnmanagedBreakpoint instead of setting the raw int3s yourself.
  7. Don't call Native debug APIs directly. Instead, call the corresponding API on ICorDebug.
     
    Native Debugging API from kernel32.dll | What to use on ICorDebug
    CreateProcess | ICorDebug::CreateProcess with DEBUG_ONLY_THIS_PROCESS flag set. Note that child-process debugging (DEBUG_PROCESS && !DEBUG_ONLY_THIS_PROCESS) is not supported.
    DebugActiveProcess | ICorDebug::DebugActiveProcess, with win32Attach = true.
    WaitForDebugEvent | Dispatched via the ICorDebugUnmanagedCallback callback, which was set via ICorDebug::SetUnmanagedHandler. Pay close attention to in-band vs. out-of-band events.
    ContinueDebugEvent | ICorDebugProcess::Continue. Call ICorDebugProcess::ClearCurrentException to get DBG_CONTINUE. Else this acts as DBG_EXCEPTION_NOT_HANDLED.
    Set/GetThreadContext | ICorDebugProcess::Set/GetThreadContext. Only use when the thread is in native code and stopped at an in-band event.
    Write/ReadProcessMemory | ICorDebugProcess::Write/ReadMemory. Use ICorDebugProcess2::SetUnmanagedBreakpoint and ClearUnmanagedBreakpoint instead of writing int3 for native breakpoints.
    DebugActiveProcessStop | Not supported.
    Suspend/ResumeThread | Use directly, but only when the debuggee is stopped and the thread is in native code.
    DebugBreakProcess | ICorDebugProcess::Stop.
    DebugSetProcessKillOnExit | Not supported.

    This is because in order to make interop-debugging work, ICorDebug needs to intercept the native debugging APIs. Note that the native debugging APIs should only be used in native code. Use managed debugging APIs for operations in managed code. Since threads can cross between managed and native, you may need to figure out where the thread is (via a callstack) to determine which operation to use.

  8. Remember that ICorDebug is for debugging managed code, not native. So the ICorDebug* object hierarchy represents managed constructs, not native ones. For example, native threads don't get an ICorDebugThread object, and native modules don't get an ICorDebugModule object. This is why you'll notice above that all the native-API equivalents are on ICorDebug or ICorDebugProcess.  In contrast, since managed constructs are usually built on native ones, you may see native debug objects for managed elements. For example, a managed thread is built on a native thread, so you'll see a native thread entry for managed threads.
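
The threading rule in tips 3 through 5 can be sketched with a toy model. This is Python standing in for the real COM code; InteropEventPump, on_unmanaged_event, and the continue_fn callback are hypothetical names for this illustration, with continue_fn playing the role of ICorDebugProcess::Continue. Out-of-band events are continued immediately from the callback, while in-band events are only recorded and handed to a different thread. A real debugger would usually stop and let the user decide before continuing an in-band event; the model continues right away just to show which thread does it.

```python
import queue
import threading


class InteropEventPump:
    """Toy model: the unmanaged callback never continues in-band events itself."""

    def __init__(self, continue_fn):
        self._continue = continue_fn
        self._inband = queue.Queue()
        self.log = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_unmanaged_event(self, event, out_of_band):
        # Runs on the callback thread: update state and get out, never block.
        self.log.append((event, out_of_band))
        if out_of_band:
            self._continue(event, True)   # tip 5: continue OOB events immediately
        else:
            self._inband.put(event)       # tip 4: another thread makes the continue

    def _run(self):
        while True:
            event = self._inband.get()
            if event is None:             # sentinel posted by shutdown()
                break
            self._continue(event, False)

    def shutdown(self):
        self._inband.put(None)
        self._worker.join()


continued = []


def record_continue(event, out_of_band):
    continued.append((event, out_of_band, threading.current_thread().name))


callback_thread = threading.current_thread().name
pump = InteropEventPump(record_continue)
pump.on_unmanaged_event("oob-breakpoint", out_of_band=True)
pump.on_unmanaged_event("inband-exception", out_of_band=False)
pump.shutdown()
```

After shutdown, the recorded thread names show that only the out-of-band continue ran on the callback thread.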

And don't forget there's the "Building Development Tools for .Net" forum for further questions.

UpdateModuleSymbols comes on attach


I forgot about UpdateModuleSymbols when I described the fake debug events sent on attach.  This event is sent after LoadModule and is what delivers the symbols for dynamically generated code. This lets you debug code generated with ref.emit when you attach to a process.  Note that if the debuggee only emits symbols when a debugger is attached, then this will appear not to work. (Another reason why the presence of a debugger shouldn't change behavior.) I updated the original post.

This is an example of how simple things become so complicated. At first it sounds simple to say "Attaching will deliver faked events to bring the debugger up to the current state of the debuggee". But when you start drilling in, there are so many corner cases and caveats. I wish simple things could stay simple.

ICorPublish does not cross the 32/64 bit boundary


I mentioned earlier that ICorDebug does not cross the 32/64 boundary. If you want to debug a 32-bit managed app, you need to use a 32-bit version of the ICorDebug interfaces (or Mdbg). If you want to debug a 64-bit managed app, you need a 64-bit-savvy debugger. When you debug a 64-bit managed app in Visual Studio, VS will actually spin up a 64-bit component to do the underlying debugging, and that remotes back to the 32-bit UI.

Anyways, ICorPublish (and the CorPublish wrappers on top of them) have the same restriction. Although the underlying windows process enumeration APIs can view both 32 and 64-bit processes, the 64-bit version of ICorPublish only sees 64-bit managed processes.  A 64-bit ICorPublish will simply ignore 32-bit managed processes, and vice versa for 32-bit ICorPublish.

 

Disclaimer: I'm not saying this is a good design (although it does have some merits). However, a user ran into this and it caused some grief, and so I wanted to issue a public service announcement about it.

 

Brief Demo:

Here's a simple example using Mdbg. MDbg's process enumeration ("pro") command is just a trivial wrapper on ICorPublish.

 

Let's say you have on the machine:
-  a 64-bit process, "la.vshost.exe", with pid=4028, AND
-  a 32-bit process, "excel.exe",  with pid=4720.

In 64-bit  Mdbg, you see:

mdbg> pro
Active processes on current machine:
(PID: 4028) C:\Temp\ListAppDomains\bin\Debug\la.vshost.exe
        (ID: 1) la.vshost.exe
mdbg> at 4720
Error: The operation failed because debuggee and debugger are on incompatible platforms. (Exception from HRESULT: 0x80131C30)

You can see that the 32-bit process (pid 4720) doesn't show in Publish/Mdbg, and if you try to be sneaky and attach to 4720 anyway, ICorDebug gives you a CORDBG_E_UNCOMPATIBLE_PLATFORMS (0x80131c30) error.

 

If you run it in 32-bit Mdbg, you no longer see the 64-bit process. But you do see the 32-bit process, and are able to attach to it.

mdbg> pro
Active processes on current machine:
(PID: 4720) C:\Program Files (x86)\Microsoft Office\Office12\excel.exe
        (ID: 2) TestAddIn.vsto|vstolocal
        (ID: 1) DefaultDomain

mdbg> at 4720
[p#:0, t#:0] mdbg> list mod
Loaded Modules:
:0      mscorlib.dll#0  (no symbols loaded)
:1      Microsoft.VisualStudio.Tools.Office.Runtime.v9.0.dll#0  (no symbols loaded)
:2      System.dll#0  (no symbols loaded)
:3      System.Core.dll#0  (no symbols loaded)
:4      System.AddIn.dll#0  (no symbols loaded)
:5      Microsoft.VisualStudio.Tools.Applications.Hosting.v9.0.dll#0  (no symbols loaded)
...
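
The rule the demo shows can be modeled in a few lines. This is a Python sketch over made-up process records, not a call into ICorPublish; the function names and the dict fields are assumptions for illustration, and the notepad.exe entry is a hypothetical native process added to show that unmanaged processes are never listed either way.

```python
def visible_managed_processes(processes, publisher_bits):
    """Return the managed processes a publisher of the given bitness would list."""
    return [p for p in processes
            if p["managed"] and p["bits"] == publisher_bits]


def can_attach(debugger_bits, process):
    # Mirrors the rule in the text: bitness must match, otherwise the attach
    # fails with an incompatible-platforms error.
    return debugger_bits == process["bits"]


machine = [
    {"pid": 4028, "name": "la.vshost.exe", "bits": 64, "managed": True},
    {"pid": 4720, "name": "excel.exe", "bits": 32, "managed": True},
    {"pid": 1000, "name": "notepad.exe", "bits": 64, "managed": False},
]

seen_by_64 = [p["pid"] for p in visible_managed_processes(machine, 64)]
seen_by_32 = [p["pid"] for p in visible_managed_processes(machine, 32)]
```

A tool that wants a complete list would therefore have to run one enumerator per bitness and merge the results.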

ICorDebugFunction is 1:1 with the IL


In CLR 1.0, there was a simple invariant between IL code blob, and native code blob.  It was either 1:0 if the code wasn't jitted, or 1:1 if it was. Method tokens (module scope, mdMethodDef) were also 1:1 with the IL blobs. 1:1 is a nice relationship. Each side can just point to the other without needing to enumerate through a collection.  In this case, you could do a clean mapping from (module, mdMethodDef) <--> IL code blob <--> native code blob.

In CLR 2.0, things got more complicated:

  1. Generics allow 1 IL function to be instantiated and jitted an arbitrary number of times to native code. For example, void Foo<T>(T t) may be the IL function, and it may be instantiated as Foo<int>, Foo<string>, Foo<bar>, etc.   So that makes the IL:Native potentially 1:n, n >= 0.
  2. Edit-and-continue (EnC) allows providing new IL method bodies for an existing mdMethodDef. (It's important to use the existing methodDef instead of creating a new one so that other references to the method pick up the new body.) Thus you can start out with Foo() version 1, edit it, and then get Foo() version 2. Each version of Foo() has its own IL blob, but the same mdMethodDef.

Generics + EnC are wonderful features, but they break 1:1 relationships.  We wanted to keep the invariant that the tokens (mdMethodDef) stayed 1:1 with the IL, and ICorDebugFunction was 1:1 with the IL.  


Coping with the breaks:

Since EnC generated new IL blobs, that meant it would generate new ICorDebugFunctions, which would be provided to the debugger via new debug events (ICorDebugManagedCallback2::FunctionRemapOpportunity and FunctionRemapComplete).  Hence we added ICorDebugFunction2::GetVersionNumber() so that the debugger could tell which EnC version each function was for.

ICorDebugCode (for native code) stayed 1:1 with the actual native code blobs. Thus the 1:1 relationship between ICorDebugCode (native) and ICorDebugFunction got broken in CLR 2.0. A single ICDFunction could now map to multiple ICDCodes (1:n). It turns out for source-level debugging, the design patterns rarely needed to map from ICorDebugCode back to ICorDebugFunction, so that was a pretty reasonable place to put the break.
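
The resulting relationships can be written down as a small data model. This is a Python sketch, not an ICorDebug API: FunctionTable, add_il, and jit are hypothetical names, and numbering versions from 1 is an assumption made for the example. Each (module, token, version) entry plays the role of one ICorDebugFunction, 1:1 with an IL blob, and each one maps to zero or more native code blobs.

```python
from collections import defaultdict


class FunctionTable:
    """Toy model of the CLR 2.0 relationships described above."""

    def __init__(self):
        # (module, token) -> list of IL blobs, one per EnC version
        self.il_versions = defaultdict(list)
        # (module, token, version) -> list of native code instantiations
        self.native_code = defaultdict(list)

    def add_il(self, module, token, il_blob):
        """Register a new IL body (initial or EnC edit); returns its version."""
        versions = self.il_versions[(module, token)]
        versions.append(il_blob)
        return len(versions)

    def jit(self, module, token, version, instantiation):
        """Record one native code blob for a given IL version."""
        self.native_code[(module, token, version)].append(instantiation)


table = FunctionTable()
foo = 0x06000001  # a made-up mdMethodDef token

v1 = table.add_il("app.exe", foo, b"il-v1")
table.jit("app.exe", foo, v1, "Foo<int>")
table.jit("app.exe", foo, v1, "Foo<string>")

# An EnC edit supplies a new IL body under the same token.
v2 = table.add_il("app.exe", foo, b"il-v2")

native_for_v1 = table.native_code[("app.exe", foo, v1)]
native_for_v2 = table.native_code[("app.exe", foo, v2)]
```

Going from a version to its native code is a direct lookup, while going from a native blob back to its function would need a search or a back-pointer, which matches where the 1:1 break was placed.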
