Mike Stall's .NET Debugging Blog

Debugger.Break()


System.Diagnostics.Debugger.Break() is a BCL method that causes a program to issue a User Breakpoint when run under the debugger.

This translates to a Break() debug event on ICorDebugManagedCallback. (Not to be confused with Breakpoint(), which corresponds to actual breakpoints. Yeah, we could have given them better names...)

The debugger will break at the callsite of the Break(), and not actually inside the call. This is convenient because it means that the caller is on top of the stack, which is what you'd intuitively expect. It also means that Debugger.Break() behaves almost identically to manually setting a breakpoint at that location and hitting it.

There's also an IL 'break' opcode that does the same thing as Debugger.Break(). You can view Debugger.Break() as functional access to the break opcode. In both cases, it ends up as a call into the CLR engine which then does a few things:
- if no debugger is attached, it may cause a managed JIT-attach (similar to Debugger.Launch).
- if a debugger is attached, it will essentially do a step-out of the CLR engine's frames to get back to the call-site. This essentially gets the mscorwks frames off the stack.
- once that step out completes, it sends the user breakpoint event to the debugger.

 

Usages:

This can be useful in a variety of ways.

  1. This is the managed counterpart to kernel32!DebugBreak().  (See more comparison of managed vs. native debug methods)
  2. VB's 'stop' statement is implemented via Debugger.Break().  (JScript's 'debugger' statement could be implemented on this too).
  3. It's useful for assert dialogs or error checks.
  4. It's the strongly preferred way to break into the debugger instantly, rather than throwing an exception. Exceptions may modify your program's actual control flow; Debugger.Break() won't. It just notifies the debugger and has no other side-effects. (See the sketch below.)
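
For example, here's a minimal sketch (the Validate helper is hypothetical) of using Debugger.Break() as a lightweight error check that stops in the debugger without changing control flow:

using System.Diagnostics;

static void Validate(int count)
{
    if (count < 0)
    {
        // Stops in the debugger (if one is attached) right at this call site,
        // without throwing or otherwise altering the program's control flow.
        Debugger.Break();
    }
}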

 

Quick demo:

Run this VB code:

Module Module1
    Sub Main()
        Console.WriteLine("Hi!")
        Stop
        Console.WriteLine("done")
    End Sub
End Module

And you stop in the debugger at the 'stop' statement. In the callstack you see:

>    ConsoleApplication2.exe!ConsoleApplication2.Module1.Main() Line 6 + 0x5 bytes    Basic
     [External Code]   

If you view the code in Reflector as C#, you see the Stop statement is Debugger.Break():
[STAThread]
public static void Main()
{
    Console.WriteLine("Hi!");
    Debugger.Break();
    Console.WriteLine("done");
}

Things that work in Native-debugging that don't work in Interop-debugging.


Interop-debugging (mixed-mode) is managed + native debugging combined.

Well, sort of.

Native and managed debugging have very different paradigms. Native debugging tends to own the whole process, while managed debugging tends to require control of the whole process while only exposing a managed  view to the user. So some functionality restrictions were needed to get the two to coexist peacefully.

Most of these are either: a) consequences of the architecture or b) missing features. They're not little things that could be fixed in a service pack.  I'll ignore issues like bugs, stability, performance impact, etc here.

Here are some features that work in Native debugging that aren't available in Interop-debugging:

(This is not guaranteed to be a complete list.)

  1. Native data-breakpoints. While managed-debugging doesn't support data-breakpoints, native-debugging does. However, interop-debugging does things that interfere with how native-debugging data-breakpoints work.
  2. You can't debug the native portion of the runtime (mscorwks.dll). (See here). This includes native code in some profiler callbacks, as well as assembly-stepping through certain transition stubs.
  3. You can't detach.
  4. Interop-debugging is x86-only. We don't yet have interop-debugging support for all platforms that the CLR runs on (ia64, amd64). Other CLR implementations (Compact FX, Spot) don't have interop-debugging either.
  5. Conservatively, you can't use interop-debugging with AppVerifier. AppVerifier applies transformations that are fine for a native-only debugger but too aggressive for interop-debugging.
  6. Debugging code in certain "privileged" places such as the Vectored Exception Handler or under certain locks like the Loader-lock or the process heap lock may hang. This is just very risky and deadlock prone because the helper thread may take these locks.
  7. The debugger is restricted to ICorDebug::CreateProcess  and so it can't use any fancy CreateProcess* functions in kernel32 like CreateProcessAsUser. 
  8. Interop-debugging can't auto-attach to child debuggee processes. (see here)
  9. No dump-debugging. Managed-dump debugging isn't supported, and interop-debugging for dumps without managed-dump support would add little (if any) value beyond native dump support.

 

Restrictions on managed-debugging:

  1. Can't do managed Edit-and-Continue (EnC). (see here)

 

The bottom line is that interop-debugging is not just a superset of managed or native-debugging.

Breaking changes in ICorDebug from 1.1 to 2.0.


Here are some random notes about specific ICorDebug breaking changes between .NET v1.1 (Everett) and .NET 2.0 (Whidbey). (I came across these as I was cleaning out old documents in preparation for my upcoming move). This would have been more timely 2 years ago, but better late than never.

This can be viewed as the checklist for migrating a managed debugger from v1.1 to v2.0. This is pretty detailed, but I'm not 100% sure it's quite authoritative enough to put my "This should be in MSDN" tag on: it's not necessarily complete, and it's based on notes I had lying around that are several years old. Disclaimers aside:

  1. ICorDebugThread::GetCurrentException return code changed from E_FAIL to S_FALSE when the current thread has no exceptions.
  2. ICorDebugEnum return values changed to be COM compliant.
  3. More aggressive enforcement of object lifetimes. This means calls that used to succeed (though possibly provide wrong information) will now fail with CORDBG_E_OBJECT_NEUTERED.
  4. Client Debuggers must implement ICorDebugManagedCallback2 to debug v2.0 apps.
  5. The timeout to ICorDebugController::Stop is now ignored.
  6. Callstacks may now include ICorDebugInternalFrame instances. So the assumption that any frame could be successfully QIed for either ICorDebugILFrame or ICorDebugNativeFrame is now false.
  7. You may no longer call EnableJITDebugging on Modules whenever you please or even on attach.  The only time this is valid is during the real ModuleLoad callback.  Other calls will return various error codes explaining what you did wrong.  We recommend that people begin using ICDModule2::SetJITCompilerFlags instead.
  8. We now track JIT info all the time, which is not really a “breaking change” but still something that may confuse people if they try to turn it off and it remains on. One place this may break is if you try to use the presence of the "Track Jit Info" flag to tell if code is debuggable or not.
  9. CorDebug is no longer a coclass. It is now instantiated by mscoree!CreateDebuggingInterfaceFromVersion.
  10. Order of LoadClass and UpdateModuleSyms callbacks changed.
  11. Interop: Native Thread-exit debug event is now Out-Of-Band.
  12. Interop: OOB-breakpoints / single-step events are now automatically cleared, and not rewound. Debuggers should use ICorDebugProcess2::SetUnmanagedBreakpoint instead of calling WriteProcessMemory("int 3") themselves.

 

Some areas of intensive bug fixing that could be perceived as behavior changing:

  1. Fixes to handling debugging dynamic modules.
  2. Fixes to AddRef/Release bugs, which were heavily flushed out by MDbg. (These bugs were part of the reason that MDbg can't run on 1.1)

 

See also "What's new in .Net 2.0." for cool new things in 2.0.

Managed Dump debugging support for Visual Studio and ICorDebug


This is the longest I've gone without blogging, but our PDC announcements have stuff way too cool to stay quiet about.

If you saw PDC, you've heard that the CLR Debugging API, ICorDebug, will support dump-debugging. This enables any ICorDebug-based debugger (including Visual Studio and MDbg) to debug dump-files of .NET applications. The coolness goes well beyond that, but dump-debugging is just the easiest feature to describe.

This was not an overnight feature; it required some major architectural changes to be plumbed through the entire system. Specifically, when dump-debugging there's no 'live' debuggee, so you can't rely on a helper-thread running in the debuggee process to service debugging requests; you need a completely different model.

Rick Byers has an excellent description of the ICorDebug re-architecture in CLR 4.0.  He also describes some of the other advancements in the CLR Tools API space. Go read them.

Virtual code execution via IL interpretation


As Soma announced, we just shipped VS2010 Beta1. This includes dump debugging support for managed code and a very cool bonus feature tucked in there that I’ll blog about today.

Dump-debugging (aka post-mortem debugging) is very useful and a long-requested feature for managed code.  The downside is that with a dump-file, you don’t have a live process anymore, and so property-evaluation won’t work. That’s because property evaluation is implemented by hijacking a thread in the debuggee to run the function of interest, commonly a ToString() or property-getter. There’s no live thread to hijack in post-mortem debugging.

We have a mitigation for that in VS2010. In addition to loading the dump file, we can also interpret the IL opcodes of the function and simulate execution to show the results in the debugger.

 

Here, I’ll just blog about the end-user experience and some top-level points. I’ll save the technical drill down for future blogs.

Consider the following sample:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Reflection;

public class Point
{
    int m_x;
    int m_y;

    public Point(int x, int y)
    {
        m_x = x;
        m_y = y;
    }

    public override string ToString()
    {
        return String.Format("({0},{1})", this.X, this.Y);
    }

    public int X
    {
        get { return m_x; }
    }

    public int Y
    {
        get { return m_y; }
    }
}

public class Program
{
    static void Main(string[] args)
    {
        Dictionary<int, string> dict = new Dictionary<int, string>();
        dict[5] = "Five";
        dict[3] = "three";
        Point p = new Point(3, 4);
    }

    public static int Dot(Point p1, Point p2)
    {
        int r2 = p1.X * p2.X + p1.Y * p2.Y;
        return r2;
    }
}

 

Suppose you have a dump-file from a thread stopped at the end of Main() (See newly added menu item “Debug | Save Dump As …”; load dump-file via “File | Open | File …”).

Normally, you could see the locals (dict, p) and their raw fields, but you wouldn’t be able to see the properties or ToString() values. So it would look something like this:

[Screenshot: watch window showing only raw fields, with no property or ToString() values]

But with the interpreter, you can actually simulate execution. With the IL interpreter, here’s what it looks like in the watch window:

[Screenshot: watch window showing property and ToString() values, each marked with a '*']

Which is exactly what you’d expect with live-debugging.  (In one sense, “everything still works like it worked before” is not a gratifying demo…)

The ‘*’ after the values indicates that they came from the interpreter.  Note you still need to ensure that property-evaluation is enabled in “Tools | Options | Debugging”:

[Screenshot: the property-evaluation setting under Tools | Options | Debugging]

How does it work?
The Interpreter gets the raw IL opcodes via ICorDebug and then simulates execution of those opcodes. For example, when you inspect “p.X” in the watch window, the debugger can get the raw IL opcodes:

.method public hidebysig specialname instance int32
        get_X() cil managed
{
  // Code size       12 (0xc)
  .maxstack  1
  .locals init ([0] int32 CS$1$0000)
  IL_0000:  nop
  IL_0001:  ldarg.0
  IL_0002:  ldfld      int32 Point::m_x
  IL_0007:  stloc.0
  IL_0008:  br.s       IL_000a
  IL_000a:  ldloc.0
  IL_000b:  ret
} // end of method Point::get_X

It then translates that ldfld opcode into an ICorDebug field fetch, the same way it would fetch “p.m_x”. The problem gets a lot harder than that (e.g., how does it interpret a newobj instruction?), but that’s the basic idea.
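
To make the idea concrete, here's a toy sketch (not the actual Visual Studio implementation) of a dispatch loop that simulates a small method like get_X by mapping each opcode onto a read-only operation over debuggee data:

using System;
using System.Collections.Generic;

// Toy opcode set covering just the instructions in get_X above.
enum Op { Nop, Ldarg0, Ldfld, Stloc0, Br, Ldloc0, Ret }

class Instr
{
    public Op Op;
    public string FieldName;   // operand for Ldfld
    public int Target;         // operand for Br (index of the target instruction)
}

static class ToyInterpreter
{
    // 'readField' stands in for an ICorDebug-style field fetch against the (dump) debuggee.
    public static object Run(Instr[] il, object arg0, Func<object, string, object> readField)
    {
        var stack = new Stack<object>();
        object local0 = null;
        int pc = 0;
        while (pc < il.Length)
        {
            Instr i = il[pc];
            switch (i.Op)
            {
                case Op.Nop:    pc++; break;
                case Op.Ldarg0: stack.Push(arg0); pc++; break;
                case Op.Ldfld:  stack.Push(readField(stack.Pop(), i.FieldName)); pc++; break;
                case Op.Stloc0: local0 = stack.Pop(); pc++; break;
                case Op.Br:     pc = i.Target; break;
                case Op.Ldloc0: stack.Push(local0); pc++; break;
                case Op.Ret:    return stack.Pop();
            }
        }
        throw new InvalidOperationException("fell off the end of the method");
    }
}

Feeding it the get_X body above (with br targeting the ldloc.0 instruction) and a readField callback that fetches m_x from the dump yields the same value a live property evaluation would.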

 

Other things it can interpret:

The immediate window is also wired up to use the interpreter when dump-debugging. Here are some sample things that work. Again, note the ‘*’ means the results came from the interpreter and the debuggee is not modified.

Simulating new objects:
? new Point(10,12).ToString()
"(10,12)"*

Basic reflection:
? typeof(Point).FullName
"Point"*

Dynamic method invocation:
? typeof(Point).GetMethod("get_X").Invoke(new Point(6,7), null)
0x00000006*

Calling functions, and even mixing debuggee data (the local variable ‘p’) with interpreter generated data (via the ‘new’ expression):
? Dot(p,new Point(10,20))
110*

 

It even works for Visualizers

Notice that it can even load the Visualizer for the Dictionary (dict) and show you the contents as a pretty array view rather than just the raw view of buckets. Visualizers are their own dll,  and we can verify that the dll is not actually loaded into the debuggee. For example, the Dictionary visualizer dll is  Microsoft.VisualStudio.DebuggerVisualizers.dll, but that’s not in the module list:

[Screenshot: the Modules window, with Microsoft.VisualStudio.DebuggerVisualizers.dll absent]

 

That’s because the interpreter has virtualized loading the visualizer dll into its own “virtual interpreter” space and not the actual debuggee process space. That’s important because in a dump file, you can’t load a visualizer dll post-mortem.

 

 

Other issues:

There are lots of other details here that I’m skipping over, like:

  1. The interpreter is definitely not bullet proof. If it sees something it can’t interpret (like a pinvoke or dangerous code), then it simply aborts the interpretation attempt.
  2. The interpreter is recursive, so it can handle functions that call other functions. (Notice that ToString calls get_X.)
  3. How does it deal with side-effecting operations?
  4. How does it handle virtual dispatch call opcodes?
  5. How does it handle ecalls?
  6. How does it handle reflection?

 

Other advantages?

There are other advantages of IL interpretation for function evaluation, mainly that it addresses the “func-eval is evil” problems by essentially degenerating dangerous func-evals to safe field accesses.

  1. It is provably safe because it errs on the side of safety. The interpreter is completely non-invasive (it operates on a dump-file!).
  2. No accidentally executing dangerous code.
  3. Side-effect free func-evals. This is a natural consequence of it being non-invasive.
  4. Bullet proof func-eval abort.
  5. Bullet proof protection against recursive properties that stack-overflow.
  6. It allows func-eval to occur at places previously impossible, such as in dump-files, when the thread is in native code, retail code, or places where there is no thread to hijack.

Closing thoughts

We realize that the interpreter is definitely not perfect. That’s part of why we chose to have it active in dump-files but not replace func-eval in regular execution. For dump-file scenarios, it took something that would have been completely broken and made many things work.

Trigger, Bindings, and Route parameters in AzureJobs


We recently released an alpha of the WebJobs SDK (aka AzureJobs, internally codenamed “SimpleBatch”). In this blog entry, I wanted to explain how Triggers, Bindings, and Route Parameters work in AzureJobs.

A function can be “triggered” by some event such as a new blob, new queue message, or explicit invocation. JobHost (in the Microsoft.WindowsAzure.Jobs.Host nuget package) listens for the triggers and invokes the functions.

The trigger source also provides the “route parameters”, which is an extra name-value pair dictionary that helps with binding. This is very similar to WebAPI / MVC. The trigger event provides the route parameters, and then the parameter can be consumed in other bindings:

1. Via a [BlobOutput] parameter 
2. Via an explicit parameter capture.

 

Trigger on new blobs

Example usage:

This happens when the first attribute is [BlobInput] and the function gets triggered when a new blob is detected that matches the pattern.  

        public static void ApplyWaterMark(
            [BlobInput(@"images-output/{name}")] Stream inputStream,
            string name,
            [BlobOutput(@"images-watermarks/{name}")] Stream outputStream)
        {
            WebImage image = new WebImage(inputStream);
            image.AddTextWatermark(name, fontSize: 20, fontColor: "red");
            image.Save(outputStream);
        }

When does it execute?

The triggering system will compare timestamps for the input blobs to timestamps from the output blobs and only invoke the function if the inputs are newer than the outputs. This simple rule makes the system very easy to explain and prevents the system from endlessly re-executing the same function.

Route parameters:

In this case, the route parameter is {name} from BlobInput, and it flows to the BlobOutput binding. This pattern lends itself nicely for doing blob-2-blob transforms.

The route parameter is also captured via the “name” parameter. If the parameter type is not string, the binder will try to convert via invoking the TryParse method on the parameter type. This provides nice serialization semantics for simple types like int, guid, etc. The binder also looks for a TypeConverter, so you can extend binding to your own custom types.
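
For example, here's a hypothetical custom type (not one of the AzureJobs samples) that could be bound from a route parameter because it exposes a static TryParse:

    // A route value like "42" can bind to a parameter of type OrderId,
    // because the binder finds and invokes this TryParse method.
    public class OrderId
    {
        public int Value { get; private set; }

        public static bool TryParse(string s, out OrderId result)
        {
            int value;
            if (int.TryParse(s, out value))
            {
                result = new OrderId { Value = value };
                return true;
            }
            result = null;
            return false;
        }
    }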

In WebAPI, route parameters are provided by pattern matching against a URL. In AzureJobs, they’re provided by pattern matching against a BlobInput path (which is morally similar to a URL). This case was implemented first and so the name kind of stuck.

 

Trigger on new queue message

Example usage:

This happens when the first attribute is [QueueInput].

        public static void HandleQueue(
            [QueueInput(queueName : "myqueue")] CustomObject obj,
            [BlobInput("input/{Text}.txt")] Stream input,
            int Number,
            [BlobOutput("results/{Text}.txt")] Stream output)
        {
        }
 
    public class CustomObject
    {
        public string Text { get; set; }

        public int Number { get; set; }
    }
 

The function has both a QueueInput and BlobInput, but it triggers when the Queue message is detected and just uses the Blob input as a resource while executing.

When does it execute?

This function executes when a new queue message is found on the specified queue. The JobHost will keep the message invisible until the function returns (which is very handy for long running functions) and it will DeleteMessage for you when the function is done.

Route parameters:

In this case, the route parameters are the simple properties on the Queue parameter type  (so Text and Number). Note that since the queue parameter type (CustomObject) is a complex object, it will get deserialized using JSON.net.  Queue parameter types could also be string or byte[] (in which case they bind to the CloudQueueMessage.AsString and AsBytes).

The usage of parameter binding here may mean your function body doesn’t even need to look at the contents of the queue message.

 

Trigger when explicitly called via JobHost.Call().

Example usage:

You can explicitly invoke a method via JobHost.Call().

            JobHost h = new JobHost();
            var method = typeof(ImageFuncs).GetMethod("ApplyWaterMark");
            h.Call(method, new { name = "fruit.jpg" });

When does it execute?

I expect the list of possible triggers to grow over time, although JobHost.Call() does provide tremendous flexibility since you can have your own external listening mechanism that invokes the functions yourself. E.g., you could simulate a timer trigger by having your own timer callback that does JobHost.Call(), as sketched below.
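
Here's a minimal sketch of that, reusing the ApplyWaterMark method from above (the 5-minute interval is arbitrary):

            JobHost host = new JobHost();
            var method = typeof(ImageFuncs).GetMethod("ApplyWaterMark");

            // Simulate a timer trigger: explicitly invoke the function every 5 minutes.
            var timer = new System.Threading.Timer(
                _ => host.Call(method, new { name = "fruit.jpg" }),
                null,
                TimeSpan.Zero,
                TimeSpan.FromMinutes(5));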

You can use JobHost.Call() to invoke a method that would normally be triggered by BlobInput.

You can also suppress automatic triggering via a “[NoAutomaticTrigger]” attribute on the method. In that case, the function can only be invoked via JobHost.Call().

Route Parameters:

The Call() method takes an anonymous object that provides the route parameters. In this case, it assigned “name” as “fruit.jpg”. That single route parameter allows all 3 parameters of ApplyWaterMark to get bound.

 

Disclaimers

AzureJobs is still in alpha. So some of the rules may get tweaked to improve the experience.

Azure Storage Bindings Part 1 – Blobs


The Azure WebJobs SDK provides model binding between C# BCL types and Azure storage like Blobs, Tables, and Queues.

The SDK has a JobHost object which reflects over the functions in your assembly.  So your main looks like this:

        static void Main()
        {
            string acs = "DefaultEndpointsProtocol=https;AccountName=???;AccountKey=???";
            JobHost host = new JobHost(acs); // From nuget: Microsoft.WindowsAzure.Jobs.Host
            host.RunAndBlock();
        }

The JobHost will reflect over your methods looking for attributes in the Microsoft.WindowsAzure.Jobs namespace and use those attributes to set up triggers (BlobInput, QueueInput) and do bindings. RunAndBlock() will scan for the various triggers and then invoke your function when a trigger is fired. Model binding refers to how the JobHost binds your function's parameters (it's very similar to MVC/WebAPI).

The benefits of model binding:

  1. Convenience. You can pick the type that’s most useful for you to consume and the WebJobs SDK will take care of the glue code. If you’re doing string operations on a blob, you can bind directly to TextReader/TextWriter, rather than writing 10 lines of ceremony to get a TextWriter from a CloudBlob.
  2. Flushing and Closing: The WebJobs SDK will automatically flush and close outstanding outputs.
  3. Unit testability. It’s far easier to unit test and mock BCL types like TextWriter than ICloudBlob.
  4. Diagnostics. Model binding cooperates with the dashboard to give you real-time diagnostics on your parameter usage. See screenshots below.

And if model binding is not sufficient for you, you can always bind to the Storage SDK types directly.

That said, here are the bindings that are currently supported in the Alpha release. 

Binding to BCL types: Stream, TextReader/Writer, String

You can use [BlobInput] and [BlobOutput] attributes to bind blobs to the BCL types Stream, TextReader and String.

See triggering rules for more details, but basically a function runs when a blob matching [BlobInput] is found that is newer than the blobs specified by [BlobOutput]. This means that it’s important for a [BlobInput] function to write some output (even if it’s just a dummy file) so that the JobHost knows that it’s run and doesn’t keep re-triggering it.
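
If a function's real work has no natural blob output, one pattern (just a sketch, not a requirement of the SDK) is to emit a tiny marker blob, so the timestamp comparison sees an output that is newer than the input:

        public static void ProcessLog(
            [BlobInput("logs/{name}.txt")] TextReader input,
            [BlobOutput("logs-processed/{name}.marker")] out string marker)
        {
            // ... do the real work, which produces no natural blob output ...

            // Dummy output: its timestamp is now newer than the input,
            // so the trigger rule won't keep re-running this function.
            marker = DateTime.UtcNow.ToString("o");
        }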

Here’s an example of a blob copy function using each of those types:

        public static void CopyWithStream(
            [BlobInput("container/in/{name}")] Stream input,
            [BlobOutput("container/out1/{name}")] Stream output
            )
        {
            Debug.Assert(input.CanRead && !input.CanWrite);
            Debug.Assert(!output.CanRead && output.CanWrite);

            input.CopyTo(output);
        }

        public static void CopyWithText(
            [BlobInput("container/in/{name}")] TextReader input,
            [BlobOutput("container/out2/{name}")] TextWriter output
            )
        {
            string content = input.ReadToEnd();
            output.Write(content);
        }

        public static void CopyWithString(
            [BlobInput("container/in/{name}")] string input,
            [BlobOutput("container/out3/{name}")] out string output
            )
        {
            output = input;
        }

Some notes:

  1. It’s fine to have multiple functions read from the same input blob. In this case, all functions are reading from any blob that matches “in/{name}” in the container named “container”.
  2. The Streams / TextWriters are automatically flushed when the function returns. 

You can see some more examples for blob usage on the sample site.

Diagnostics!

When you look at the function invocation in the dashboard, you can see usage stats for each parameter. In this case, we see that CopyWithStream() was invoked on blob “in/a.txt”, and read 4 bytes from it, spending 0.076 seconds on IO, and wrote out 4 bytes to blob “out1/a.txt”.

[Screenshot: dashboard invocation page showing per-parameter usage stats for CopyWithStream]

Again, the monitoring here “just works” when using the SDK; you don’t need to include any extra logging packages or do any extra configuration work to enable it.

Binding to Blob Storage SDK types

You can also bind directly to CloudBlob (v1 storage sdk) or ICloudBlob, CloudPageBlob, CloudBlockBlob (v2+ storage sdk). These options are good when you need blob properties not exposed as a stream (such as etags, metadata, etc).

        public static void UseStorageSdk(
           [BlobInput("container/in/{name}")] CloudBlob input,
           [BlobOutput("container/out4/{name}")] CloudBlob output
           )
        {
            // get non-stream properties
            input.FetchAttributes();
            var keys = input.Metadata.AllKeys;

            // do stuff...
        }

The storage types obviously give you full control, but they don’t cooperate with the dashboard and so won’t give you the same monitoring experience as using the BCL types.

You can also bind a parameter to the CloudStorageAccount type from either the 1.7x or 2.x+ Storage SDK.

        public static void Func(CloudStorageAccount account, ...)
        {
            // Now use the azure SDK directly
        }

The SDK is reflecting over the method parameters, so it knows if ‘account’ is 1.7x (Microsoft.WindowsAzure.CloudStorageAccount) or 2x+ (Microsoft.WindowsAzure.Storage.CloudStorageAccount) and can bind to either one. This means that existing applications using 1.7 can start to incorporate the WebJobs SDK.

Azure Storage Bindings Part 2 – Queues


 

I previously described how the Azure Webjobs SDK can bind to Blobs. This entry describes binding to Azure Queues.   (Binding to Service Bus Queues is not yet implemented)

You can see some more examples for queue usage on the sample site. Here are some supported queue bindings in the Alpha:

Queue Input for BCL types (String, Object, Byte[])

A function can have a [QueueInput]  attribute, which means that the function will get invoked when a queue message is available.

The queue name is either specified via the QueueInput constructor parameter or by the name of the local variable (similar to MVC).  So the following are equivalent:

        public static void Queue1([QueueInput] string queuename)
        {
        }

        public static void Queue2([QueueInput("queuename")] string x)
        {
        }

In this case, the functions are triggered when a new message shows up in “queuename”, and the parameter is bound to the message contents (CloudQueueMessage.AsString).

The parameter type can be string or object (both of which bind as CloudQueueMessage.AsString) or byte[] (which binds as CloudQueueMessage.AsBytes).

As an added benefit, the SDK will ensure the queue input message is held until the function returns. It does this by calling UpdateMessage on a background thread while the function is still executing, and then calls DeleteMessage when the function returns.
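
Roughly, the SDK's behavior is equivalent to this sketch against the storage SDK (this is not the WebJobs SDK's actual code, and the timeout values are made up):

        // Keep the message invisible while the user's code runs, then delete it.
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(1));

        using (var renewal = new System.Threading.Timer(
            _ => queue.UpdateMessage(msg, TimeSpan.FromMinutes(1), MessageUpdateFields.Visibility),
            null, TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30)))
        {
            RunUserFunction(msg);   // hypothetical placeholder for invoking the user's function
        }

        queue.DeleteMessage(msg);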

Queue Input for user types

The parameter type can also be your own user-defined POCO type, in which case it is deserialized with JSON.NET.  For example:

    public class Payload
    {
        public int Value { get; set; }
        public string Output { get; set; }
    }
    public static void Queue2([QueueInput("queuename")] Payload x)
    {
    }

That has the same execution semantics as this:

    public static void Queue2([QueueInput("queuename")] string json)
    {
        Payload x = JsonConvert.DeserializeObject<Payload>(json);
        // …
    }

Serialization errors are treated as runtime exceptions which show up in the dashboard.

Of course, a big difference is that using a POCO type means [QueueInput] provides route parameters, which means you can directly use the queue message properties in other parameter bindings, like so:

        public static void Queue2(
            [QueueInput("queuename")] Payload x,
            int Value,
            [BlobOutput("container/{Output}.txt")] Stream output)
        {
        }

With bindings like these, it’s possible you don’t even need to use x in the function body.

Queue Output

A function can enqueue messages via [QueueOutput]. It can enqueue a single message via an out parameter. If the value is not null on return, the message is serialized and queued, similar to the rules used for [QueueInput].

Here’s an example of queuing a message saying “payload” to a queue named testqueue.

        public static void OutputQueue([QueueOutput] out string testqueue)
        {
            testqueue = "payload";
        }

In this case, the function doesn’t have any triggers (no [QueueInput] or [BlobInput]), but could still be invoked directly from the host:

        host.Call(typeof(Program).GetMethod("OutputQueue"));

It can enqueue multiple messages via an IEnumerable<T>.  This function queues 3 messages:

        public static void OutputMultiple([QueueOutput] out IEnumerable<string> testqueue)
        {
            testqueue = new string[] {
                "one",
                "two",
                "three"
            };
        }  

You could of course bind to multiple output queues. For example, this takes an OrderRequest object as input, logs a history to the “history” queue, and may turn around and enqueue suborders for this order.

        public static void ProcessOrders(
            [QueueInput("orders")] OrderRequest input,
            [QueueOutput("history")] out WorkItem output, // log a history
            [QueueOutput("orders")] out IEnumerable<OrderRequest> children
            )
        {

        }

Diagnostics: Navigating between Producers and Consumers

The Azure WebJobs SDK will track who queued a message so you can use the dashboard to see relationships between queue producers and consumers.

For example, consider these functions:

        public static void Producer(
            [QueueOutput("testqueue")] out Payload payload,
            string output,
            int value
            )
        {
            payload = new Payload { Output = output, Value = value };
        }

        public static void Consumer([QueueInput("testqueue")] Payload payload)
        {
        }

And say we invoked Producer explicitly via:

        host.Call(typeof(Program).GetMethod("Producer"), new { output = "Hello", value = 15 });

So the Producer() queues a message which is then consumed by Consumer().

We can see Consumer’s execution in the dashboard:

[Screenshot: dashboard page for the Consumer() invocation]

And we can see that it was executed because “New queue input message on ‘testqueue’ from Producer()”. So we can click on the link to producer to jump to that function instance:

[Screenshot: dashboard page for the Producer() invocation]

And here, we can see that it was executed because of our explicit call to JobHost.Call, and that Consumer() is listed as a child function of Producer(), so you can navigate in both directions.

Note this only works when using non-BCL types. You’ll notice the queue payload has an extra “$AzureJobsParentId” field, which can easily be added to a JSON object (but not to a plain string or byte[] payload).

Binding to CloudMessage SDK Types

You can also bind to the CloudQueue Storage SDK type directly. This provides a backdoor if you need full access to queue APIs that don’t naturally map to model binding.

        [NoAutomaticTrigger]
        public static void Queue1(CloudQueue testqueue)
        {
            var ts = TimeSpan.FromSeconds(400);
            testqueue.AddMessage(new CloudQueueMessage("test"), ts);
        }

It’s still in Alpha, so there are some known issues we’re working through, such as:

  • Alpha 1 does not support binding to CloudQueueMessage directly.
  • Better bindings for outputs.
  • More customization around serialization for non-BCL types.
  • Better user control over the [QueueInput] listening.

Who wrote that blob?


One of my favorite features of the Azure WebJobs SDK is the “Who wrote this blob?” feature. This is a common debugging scenario: you see your output is incorrect (in this case, a blob) and you’re trying to find the culprit that wrote the bad output.

On the main dashboard, there’s a “Search Blobs” button, which lets you enter the blob name.

[Screenshot: the “Search Blobs” page on the dashboard]

Hit “Search” and then it takes you to a permalink for the function instance page that wrote that blob.

[Screenshot: the function instance page for the function that wrote the blob]

Of course, once you’re at the function instance page, you can then see things like that function’s input parameters, console output, and trigger reason for why that function was executed. In this case, the input blob was “in/a.txt”, and you can hit the “lookup” button on that to see who wrote the input. So you’re effectively tracing back up the chain.

And once you find the culprit, you can re-publish the function (if it was a coding error) or re-upload the blob (if it was a bad initial input) and re-execute just the affected blobs.

It’s basically omniscient debugging for Azure with edit-and-continue.

How does it work?

When the WebJobs SDK invokes a function instance, it gives it a unique guid. That guid is used in the permalink to the function instance page. When a blob is written via  [BlobOutput], the blob’s metadata is stamped with that guid.  So the lookup only works for functions written by the WebJobs SDK.
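
For example, you can see the stamp yourself with the storage SDK. A quick sketch (the metadata key name here is an assumption for illustration, not something documented above):

        // Read the function-instance guid that the SDK stamped onto the blob's metadata.
        CloudBlockBlob blob = container.GetBlockBlobReference("out1/a.txt");
        blob.FetchAttributes();

        string writerGuid;
        if (blob.Metadata.TryGetValue("AzureJobsWriter", out writerGuid))   // hypothetical key name
        {
            Console.WriteLine("Written by function instance {0}", writerGuid);
        }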

What about queues?

The WebJobs SDK has a similar lookup for queues. Queue messages written by [QueueOutput] also get stamped with the function instance guid, so you can look up which function instance queued a message.

Azure Storage Bindings Part 3 – Tables


This blog post was made during the early previews of the SDK. Some of the features/APIs have changed. For the latest documentation on the WebJobs SDK, please see this link: http://azure.microsoft.com/en-us/documentation/articles/websites-webjobs-resources

The dictionary bindings were removed from the core product and are being moved into an extension.

====

I previously described how the Azure WebJobs SDK can bind to Blobs and Queues.  This entry describes binding to Tables.

You can use a [Table] attribute from the Microsoft.WindowsAzure.Jobs namespace in the Jobs.Host nuget package.   Functions are not triggered on table changes. However, once a function is called for some other reason, it can bind to a table as a read/write resource for doing its task.

As background, here’s a good tutorial about azure tables and using the v2.x+ azure storage sdk for tables.

The WebJobs SDK currently supports binding a table to an IDictionary, where:

  1. The dictionary key is a Tuple<string,string> that represents the partition key and row key.
  2. The dictionary value is a user POCO type whose properties map to the table properties to be read. Note that the type does not need to derive from TableServiceEntity or any other base class.
  3. Your POCO type’s properties can be strongly typed (not just string), including enum properties or any type with a TryParse() method.
  4. The binding is read/write.

For example, here’s a declaration that binds the ‘dict’ parameter to an Azure Table. The table is treated as a homogenous table where each row has properties Fruit, Duration, and Value.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
{
}

public class OtherStuff
{
    public Fruit Fruit { get; set; }
    public TimeSpan Duration { get; set; }
    public string Value { get; set; }
}

public enum Fruit
{
    Apple,
    Banana,
    Pear,
}

You can also retrieve PartitionKey, RowKey, or TimeStamp properties by including them as properties on your poco.
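
For example, adding these property names to the POCO (a sketch based on the description above) lets you read the keys and timestamp alongside your own columns:

public class OtherStuffWithKeys
{
    // Filled in from the table entity itself.
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTime TimeStamp { get; set; }

    // Same table properties as OtherStuff above.
    public Fruit Fruit { get; set; }
    public TimeSpan Duration { get; set; }
    public string Value { get; set; }
}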

Writing to a table

You can use the dictionary binding to write to a table via the index operator.  Here’s an example of ingressing a file (read via some Parse<> function) into an azure table.

[NoAutomaticTrigger]
public static void Ingress(
    [BlobInput(@"table-uploads\key.csv")] Stream inputStream,
    [Table("convert")] IDictionary<Tuple<string, string>, object> table
    )
{
    IEnumerable<Payload> rows = Parse<Payload>(inputStream);
    foreach (var row in rows)
    {
        var partRowKey = Tuple.Create("const", row.guidkey.ToString());
        table[partRowKey] = row; // azure table write
    }
}

In this case, the IDictionary implementation follows azure table best practices for writing by buffering up the writes by partition key and flushing the batches for you.

Writes default to Upserts.
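
Under the covers, the batching idea is roughly like this sketch against the storage SDK (not the WebJobs SDK's actual code; 'pendingWrites' and 'table' are assumed to already exist):

// Group buffered writes by partition key and flush each group as a batch.
foreach (var group in pendingWrites.GroupBy(e => e.PartitionKey))
{
    var batch = new TableBatchOperation();
    foreach (ITableEntity entity in group.Take(100))   // a single batch allows at most 100 operations
    {
        batch.InsertOrReplace(entity);                 // upsert semantics
    }
    table.ExecuteBatch(batch);
}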

Reading a table entry

You can use the dictionary indexer or TryGetValue to look up a single entity based on partition key and row key.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
{
    // Use IDictionary interface to access an azure table.
    var partRowKey = Tuple.Create("PartitionKeyValue", "RowKeyValue");

    OtherStuff val;
    bool found = dict.TryGetValue(partRowKey, out val);

    OtherStuff val2 = dict[partRowKey]; // lookup via indexer

    // another write example
    dict[partRowKey] = new OtherStuff { Value = "fall", Fruit = Fruit.Apple, Duration = TimeSpan.FromMinutes(5) };
}

Enumerating table entries

You can use foreach() on the table to enumerate the entries. The dictionary<> binding will enumerate the entire table and doesn’t support enumerating a single partition.

public static void TableDict([Table("mytable")] IDictionary<Tuple<string, string>, OtherStuff> dict)
{
    foreach (var kv in dict)
    {
        OtherStuff val = kv.Value;
    }
}

You can also use linq expressions over azure tables, since that just builds on foreach().

Here’s an example of a basic RssAggregator that gets the blog roll from an Azure Table and then writes out a combined RSS feed via [BlobOutput].  The whole sample is available on GitHub, but the interesting code is:

// RSS reader.
// Aggregates to: http://<mystorage>.blob.core.windows.net/blog/output.rss.xml
// Get blog roll from a table.
public static void AggregateRss(
    [Table("blogroll")] IDictionary<Tuple<string, string>, BlogRollEntry> blogroll,
    [BlobOutput(@"blog/output.rss.xml")] out SyndicationFeed output
    )
{
    // get blog roll from an azure table
    var urls = (from kv in blogroll select kv.Value.url).ToArray();

    List<SyndicationItem> items = new List<SyndicationItem>();
    foreach (string url in urls)
    {
        var reader = new XmlTextReader(url);
        var feed = SyndicationFeed.Load(reader);
        items.AddRange(feed.Items.Take(5));
    }
    var sorted = items.OrderBy(item => item.PublishDate);

    output = new SyndicationFeed("Status", "Status from SimpleBatch", null, sorted);
}

BlogRollEntry is just a POCO, with no mandatory base class.

// Format for blog roll in the azure table
public class BlogRollEntry
{
    public string url { get; set; }
}

Here’s the contents of the azure table. So you can see how the POCO maps to the table properties of interest.

[Screenshot: the azure table contents, showing how the url property maps to the POCO]

 

Removing from a table

You can use IDictionary.Remove() to remove from the table.

public static void TableDict([Table(TableNameDict)] IDictionary<Tuple<string, string>, OtherStuff> dict)
{
    var partRowKey = Tuple.Create("PartitionKeyValue", "RowKeyValue");

    // Remove the single entity
    dict.Remove(partRowKey);
}

You can use IDictionary.Clear()  to clear an entire table.

Summary

Here’s a summary of which IDictionary operations map to table operations.

Assume dict is a dictionary table mapping, and partRowKey is a tuple as used above.

 

Operation               Code snippet
Read single entity      value = dict[partRowKey]
                        dict.TryGetValue(partRowKey, out val)
Contains a key          bool found = dict.ContainsKey(partRowKey)
Write single entity     dict[partRowKey] = value
                        dict.Add(partRowKey, value)
Enumerate entire table  foreach (var kv in dict) { }
Remove a single entity  dict.Remove(partRowKey)
Clear all entities      dict.Clear()

 

Other notes

  1. This binding is obviously limited. You can always bind directly to CloudStorageAccount and use the SDK directly if you need more control.
  2. The dictionary adapter does not implement all members of IDictionary<>. For example, in the Alpha 1 release, CopyTo, Contains, Keys, Values, and others aren’t implemented.
  3. We’re looking at more Table bindings in the next update (such as binding directly to CloudTable).
  4. You can see some more examples of table usage on the samples site.

How does [BlobInput] work?


The Azure WebJobs SDK supports running functions when a new blob is added.  IE, you can write code like this:

        public static void CopyWithStream(
            [BlobInput("container/in/{name}")] Stream input,
            [BlobOutput("container/out1/{name}")] Stream output
            )
        {
            Debug.Assert(input.CanRead && !input.CanWrite);
            Debug.Assert(!output.CanRead && output.CanWrite);

            input.CopyTo(output);
        }

See modelbinding to blobs for how we bind the blob to types like Stream.  In this entry, I wanted to explain how we handle the blob listening.  The executive summary is:

  1. The existing blobs in your container are processed immediately on startup.
  2. But once you’re in steady state, [BlobInput] detection (from external sources) can take up to 10 minutes. If you need fast responses, use [QueueInput].
  3. [BlobInput] can be triggered multiple times on the same blob. But the function will only run if the input is newer than the outputs.

More details…

Blob listening is tricky since the Azure Storage APIs don’t provide this directly. WebJobs SDK builds this on top of the existing storage APIs by:

1. Determining the set of containers to listen on by scanning  the [BlobInput] attributes in your program via reflection in the JobHost ctor. This is a  fixed list because while the blob names can have { } expressions, the container names must be constants.  IE, in the above case, the container is named “container”, and then we scan for any blobs in that container that match the name “in/{name}”.

2. When JobHost.RunAndBlock is first called, it will kick off a background scan of the containers. This is naively using CloudBlobContainer.ListBlobs.

    a. For small containers, this is quick and gives a nice instant feel.  
    b. For large containers, the scan can take a long time.

3. For steady state, it will scan the azure storage logs. This provides a highly efficient way of getting notifications for blobs across all containers without polling. Unfortunately, the storage logs are buffered and only updated every 10 minutes, and so that means that the steady state detection for new blobs can have a 5-10 minute lag. For fast response times at scale, our recommendation is to use Queues.

The scanning from #2 and #3 are done in parallel.

4. There is an optimization where any blob written via a [BlobOutput] (as opposed to being written by some external source) will optimistically check for any matching [BlobInputs], without relying on #2 or #3. This lets them chain very quickly. This means that a [QueueInput] can start a chain of blob outputs / inputs, and it can still be very efficient.


Hosting interactive code in the Cloud


Azure WebJobs SDK alpha 2 makes it very easy to host code in the cloud and run it interactively.  You can now invoke your SDK functions directly from the dashboard. Some great uses here:

  1. Provide admin diagnostic commands for your live site.
  2. Easily host code in Azure for testing or benchmarking code within a datacenter.
  3. Sharing your functions so that other trusted folks can call them, without the hassle of writing an MVC front end.
  4. This provides a great live-site debugging tool since you can replay erroneous executions without perturbing the rest of the system.  (Couple that with other SDK debugging features like Who wrote this blob?)

For example, suppose you have a function Writer(). In this case, we’ll just do something silly (take a string and write it out multiple times), but you can imagine doing something more interesting like providing diagnostic functions on your live site (e.g., “GetLogs”, “ClearStaleData”, etc).

using Microsoft.WindowsAzure.Jobs; // From nuget: Microsoft.WindowsAzure.Jobs.Host
using System.IO;

namespace Live
{
    class Program
    {
        static void Main(string[] args)
        {
            var host = new JobHost();

            host.RunAndBlock();
        }

        // Given a string, write it out 'multiply' times. ("A",3) ==> "AAA".
        public static void Writer(string content, int multiply, [BlobOutput("test/output.txt")] TextWriter output)
        {
            for (int i = 0; i < multiply; i++)
            {
                output.Write(content);
            }
        }
    }
}

Once you run the program in Azure, if you go to the dashboard, you’ll see the function show up in the function list (it’s “published” when the JobHost ctor runs):

[Screenshot: the function list in the dashboard]

You can click on that function and see a history of executions and their status:

[Screenshot: execution history for the Writer function]

You can click on the blue “run function” button to invoke the function directly from the dashboard! This lets you fill in the parameters. Notice that the parameters are parsed from strings, so ‘multiply’ is strongly-typed as an integer, and we parse it by invoking int.TryParse.

[Screenshot: the “run function” page with parameter fields for content and multiply]

And of course, the above page has a shareable password-protected permalink, (looks like: https://simplebatch.scm.azurewebsites.net/azurejobs/function/run?functionId=simplebatch3.Live.Live.Program.Writer ) so that you can share out the invocation ability with others.

You can hit run and the function will get invoked! This assumes that your webjob is running: the invoke works by queuing a message that the JobHost.RunAndBlock() call will listen to. (This means that your webjob needs to actually be running somewhere, although if it’s not, the dashboard will warn you about that too.) Run will also take you to the “permalink” for the function instance execution, which is a shareable URL that provides information about the function execution. You can see that the output parameter was written to a blob “test/output.txt”, that it wrote 9 bytes, and the hyperlink will take you to the blob’s contents. It also notes that the execution reason was “Ran from dashboard”.

[Screenshot: the function instance page showing parameters, the BlobOutput result, and the trigger reason]

You can also hit “replay function” on an existing instance to replay the function. This will take you back to the invoke page and a) pre-seed the parameters with values from the execution and b) record a “parent” link back to the permalink of the original execution so you can see what was replayed.

Redis Cache Service on Azure


We just previewed a Redis cache service on Azure.  A good writeup is also on ScottGu’s blog.

This is Redis hosted within Azure as a service. You can create a cache via the portal, and then access it with any standard Redis client.

Some highlights:

  1. Hosting Redis 2.8 on Azure VMs
  2. Accessible via Redis clients from any language. My recommendation for C# is Marc Gravell’s StackExchange.Redis (see the sketch after this list).
  3. Caches expose an SSL endpoint and support the Auth command.
  4. The standard SKU provides a single endpoint that’s backed by a 2-node master/slave cluster to increase availability. The service includes automatic failover detection and forwards requests to the master, so you get the persistence without needing to worry about it.
  5. There’s a Redis session state provider for using Redis from your ASP.NET apps.
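
Here's a minimal connection sketch using StackExchange.Redis (the cache name and access key below are placeholders):

    using StackExchange.Redis;

    // Connect over SSL using the cache's access key.
    ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(
        "mycache.redis.cache.windows.net,ssl=true,password=<access key>");

    IDatabase db = muxer.GetDatabase();
    db.StringSet("greeting", "hello");
    string value = db.StringGet("greeting");   // "hello"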

See the Getting Started Guide for how to jump right in.

Azure Storage Naming Rules


I constantly get burned by naming rules for azure storage. Here’s a collection of the naming rules from MSDN. The Storage client libraries don’t help you with these rules and just give you a 400 if you get them wrong. Fortunately, WebJobs SDK will provide client-side validation and give you more friendly messages.

Here’s a summary of the rules in table form:

Kind             Length   Casing?          Valid chars?
Storage Account  3-24     lowercase        alphanumeric
Blob Name        1-1024   case-sensitive   any URL char
Container Name   3-63     lowercase        alphanumeric and dash
Queue Name       3-63     lowercase        alphanumeric and dash
Table Name       3-63     case-insensitive alphanumeric

You’ll notice blobs, tables, and queues all have different naming rules.

Here are the relevant excerpts from MSDN with more details.

Storage account names

Storage account names are scoped globally (across subscriptions).

Between 3 and 24 characters. Lowercase letters and numbers.

Blobs

From MSDN here.

Blob Names

A blob name can contain any combination of characters, but reserved URL characters must be properly escaped. A blob name must be at least one character long and cannot be more than 1,024 characters long. Blob names are case-sensitive.

Avoid blob names that end with a dot (.), a forward slash (/), or a sequence or combination of the two.

By convention, / is the virtual directory separator. Don’t use \ in a blob name. The client APIs may allow it, but then fail to hash properly, and the signatures will mismatch.

Blob Metadata Names

Metadata for a container or blob resource is stored as name-value pairs associated with the resource. Metadata names must adhere to the naming rules for C# identifiers.

Note that metadata names preserve the case with which they were created, but are case-insensitive when set or read. If two or more metadata headers with the same name are submitted for a resource, the Blob service returns status code 400 (Bad Request).

Container Names

A container name must be a valid DNS name, conforming to the following naming rules:

  1. Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.
  2. Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.
  3. All letters in a container name must be lowercase.
  4. Container names must be from 3 through 63 characters long.

Queues

From MSDN:

Every queue within an account must have a unique name. The queue name must be a valid DNS name.

Queue names must conform to the following rules:

  1. A queue name must start with a letter or number, and can only contain letters, numbers, and the dash (-) character.
  2. The first and last letters in the queue name must be alphanumeric. The dash (-) character cannot be the first or last character. Consecutive dash characters are not permitted in the queue name.
  3. All letters in a queue name must be lowercase.
  4. A queue name must be from 3 through 63 characters long.

Tables

Name of the table

Table names must conform to these rules:

  • Table names must be unique within an account.
  • Table names may contain only alphanumeric characters.
  • Table names cannot begin with a numeric character.
  • Table names are case-insensitive.
  • Table names must be from 3 to 63 characters long.
  • Some table names are reserved, including "tables". Attempting to create a table with a reserved table name returns error code 404 (Bad Request).

These rules are also described by the regular expression "^[A-Za-z][A-Za-z0-9]{2,62}$".
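
For example, a quick client-side check based on that regular expression (a convenience sketch, not an official API):

using System.Text.RegularExpressions;

static bool IsValidTableName(string name)
{
    // Same rule as the regex above: a letter followed by 2 to 62 letters or digits.
    return Regex.IsMatch(name, "^[A-Za-z][A-Za-z0-9]{2,62}$");
}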

Table names preserve the case with which they were created, but are case-insensitive when used.

Valid property names

Property names are case-sensitive strings up to 255 characters in size. Property names should follow naming rules for C# identifiers. (The dash is no longer allowed)

Valid values for PartitionKey and RowKey

The following characters are not allowed in values for the PartitionKey and RowKey properties:

  • The forward slash (/) character
  • The backslash (\) character
  • The number sign (#) character
  • The question mark (?) character
  • Control characters from U+0000 to U+001F, including:
    • The horizontal tab (\t) character
    • The linefeed (\n) character
    • The carriage return (\r) character
  • Control characters from U+007F to U+009F

Webjobs SDK Beta is released


 

We just released the WebJobs SDK Beta! Some highlights:

  • ServiceBus support!
  • Better configuration options. You can pass in an ITypeLocator to specify which types are indexed, and an INameResolver to resolve %key% tokens in the attributes to values.
  • Cleaner model for Triggering (this is a breaking change … I need to go and update my previous blog entries)

There were some breaking changes, including attribute renames and a branding rename.

  • nuget package rename: Microsoft.WindowsAzure.Jobs.Host –> Microsoft.Azure.Jobs
  • nuget package rename: Microsoft.WindowsAzure.Jobs –> Microsoft.Azure.Jobs.Core
  • attribute change: Instead of [BlobInput]/[BlobOutput] attributes, we now have [BlobTrigger] and [Blob]. 
  • attribute change: [QueueInput] / [QueueOutput] have become [QueueTrigger] and [Queue].

The attribute changes make it very clear exactly what’s triggering a function. Functions can have only one trigger. A rough sketch of the renamed attributes is below.
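
As a rough sketch only (I haven't re-verified the exact Beta binding signatures here), the earlier blob-copy example would look something like this with the renamed attributes:

        public static void CopyWithStream(
            [BlobTrigger("container/in/{name}")] Stream input,   // this is what triggers the function
            [Blob("container/out1/{name}")] Stream output)       // plain binding, no triggering
        {
            input.CopyTo(output);
        }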

