Monday, March 12, 2012

High concurrency, memory allocations, and AppDomains

Hello,

Question:

How can I keep my thread alive after an out-of-memory exception? That is, I understand that sometimes a server may be unable to satisfy a memory request, but I'm okay with waiting -- I'm not okay with being terminated (think of the reaction to Oliver asking for some more). I would think that, in general, when an application requests a resource that is currently unavailable but may be available at another time, that application (process/thread/fiber) would be put in a wait queue for the resource. On a high-concurrency system, this could obviously lead to deadlocks; however, in the situation I describe below, I think the killing is overkill.

Discussion & Background:

In my project, I have a SqlFunction, which we'll call "SqlDecimal BigFunction()", that allocates a large chunk of memory (~3MB) and can take anywhere from 20ms to 500ms to complete (on my system, assuming no other processor load). There are also functions used to set control points for BigFunction (implying thread/fiber state -- or, if there is a distinction, transaction state), which we'll call "SqlBoolean SetControlPoint(SqlInt32 x, SqlInt32 y)". The 3MB requirement is constant, regardless of the number of control points. (Incidentally, the actual implementations of these functions are in a referenced assembly.)

In Code:

////////////////////////////////////////////////////////////////////////////////////////////////////

[ThreadStatic] // UNSAFE
static externalAssembly.MyClass myObj;

[SqlFunction]
public static SqlDecimal BigFunction()
{
    if (myObj == null) return new SqlDecimal(-1);

    // DoSomeWork will do something like: byte[] b = new byte[3 * 1024 * 1024];
    return externalAssembly.DoSomeWork(myObj);
}

[SqlFunction]
public static SqlBoolean SetControlPoint(SqlInt32 x, SqlInt32 y)
{
    if (myObj == null) myObj = new externalAssembly.MyClass();

    myObj.SetPoint(x.Value, y.Value);
    return SqlBoolean.True; // because we can't have a 'void' return type
}

////////////////////////////////////////////////////////////////////////////////////////////////////

In low to moderate concurrency (a single hyperthreaded CPU with 20 sessions banging on it in a loop), it *usually* does okay. In a higher-concurrency situation (2 hyperthreaded CPUs with 10 sessions stressing this code and 10 other sessions doing regular T-SQL SELECTs), it runs for a long time but will occasionally throw an out-of-memory exception. (Previously, I was managing my thread state manually with a locked dictionary, an Int32 key, and CreateSession/ReleaseSession calls. When an out-of-memory exception was thrown while the dictionary was locked, I got an AppDomain unload, which is *completely* unacceptable.)

So, I know that sometimes I won't be able to allocate my 3MB (it could be 3KB; it just shows up more readily with a larger allocation request). That doesn't mean my externalAssembly is "misbehaving" or "off in the weeds". It just means the server is loaded right now and can't satisfy my request. One may catch an OutOfMemoryException (perhaps to add additional info about the point of failure), but the thread is already being aborted.

I tried modifying this implementation to use a buffer pool that is allocated on start-up. That worked pretty well (it reduced % Time in GC a bit, too), but it forced my external assembly to be marked UNSAFE rather than just EXTERNAL_ACCESS because of the synchronization methods used to manage the buffer pool. It also doesn't scale, at least not as it sits: it's just a fixed-size buffer pool. With more processors and less peripheral loading, the extra processors would just be waiting for a buffer. Besides that, I thought there was some escalation policy about "waiting too long", but I may be wrong.
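For concreteness, a minimal sketch of the kind of fixed-size, pre-allocated buffer pool described above (the names and sizes are illustrative, not the actual implementation). All buffers are allocated up front, so Acquire never allocates -- but the Monitor synchronization is exactly what forces the UNSAFE permission set in SQLCLR:

```csharp
using System.Collections.Generic;
using System.Threading;

// Fixed-size buffer pool, allocated at start-up. Extra workers block
// in Acquire until a buffer is released -- the "waiting for a buffer"
// behavior mentioned above.
public sealed class BufferPool
{
    private readonly Stack<byte[]> free = new Stack<byte[]>();
    private readonly object gate = new object();

    public BufferPool(int count, int bufferSize)
    {
        for (int i = 0; i < count; i++)
            free.Push(new byte[bufferSize]); // pre-allocate everything up front
    }

    public byte[] Acquire()
    {
        lock (gate)
        {
            while (free.Count == 0)
                Monitor.Wait(gate); // no buffer free: wait, don't allocate
            return free.Pop();
        }
    }

    public void Release(byte[] buffer)
    {
        lock (gate)
        {
            free.Push(buffer);
            Monitor.Pulse(gate); // wake one waiting worker
        }
    }
}
```

Because the pool size is fixed at construction, it is always either too big or too small for the current load, which is the scaling complaint above.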

I would like to eliminate the UNSAFE attribute from the primary assembly -- mainly because it "sounds scary", but more realistically, because it *is* unsafe! Or at least, experience in the field points to synchronization issues as a primary cause of unreliability in systems. Also, the C# lock statement, Mutex, Monitor, etc. call into native code to use the OS for locking. When this happens, SQL doesn't really know what you're waiting for and can't take that into account when scheduling; all it knows is that you're waiting on an OS lock. I thought the hosting API would've allowed the host to optionally implement its own locking primitives, especially a host that runs its own scheduler.

I've looked into constrained execution regions and Chris Brumme's blog entry on hosting. Using them would help ensure some protection, but I think even they do not protect a thread from being aborted in the face of an OutOfMemoryException (or any asynchronous exception); rather, they allow you to safely clean up unmanaged references and ensure state integrity for the appdomain.

At any rate, this is getting a little long winded. If anyone has any feedback, I'd be delighted to hear it.

Thank you.

-Troy

System Info:

SELECT @@version

Microsoft SQL Server 2005 - 9.00.2047.00 (Intel X86) Apr 14 2006 01:12:25 Copyright (c) 1988-2005 Microsoft Corporation Developer Edition on Windows NT 5.1 (Build 2600: Service Pack 2)

Hi Troy,

I don't have an easy answer for you, as your problem is really a general one that all applications that seek to be reliable under extreme conditions face, but there are definitely some things you can try to help the situation. It seems the biggest problem you face is that an OutOfMemoryException causes a ThreadAbortException, which, if it takes place while a lock is held, results in your appdomain being unloaded.

I don't fully understand your scenario, but it seems like an obvious solution is to restrict your memory allocations so that they do not take place when your dictionary is locked. Is there a reason why this wouldn't work for you?

ThreadAborts that don't cause appdomain unloads aren't that bad, as no state is corrupted and you're free to try again later. I think you understand the reason why the CLR can't just queue memory allocations until memory is available: that would inevitably lead to deadlock.

Also, it happens that when you use CLR locking mechanisms, they do actually go through SQL Server rather than straight to the OS. This allows SQL Server to do deadlock detection and better manage scheduling. More information is in this MSDN Mag article: http://msdn.microsoft.com/msdnmag/issues/06/04/Deadlocks/default.aspx

Steven

|||

Hello Steven,

Thanks for your prompt reply (and I apologize for my delayed one).

You're right about not allocating memory while in a lock. When I was still using the dictionary, I was allocating the instance of externalAssembly.MyClass inside the lock. I moved it to just before the lock so that I could make just an assignment inside the lock. There are at least two problems, though, with using the dictionary like that. One is that the dictionary may need to allocate memory to insert a new entry. Any writes to the dictionary *must* occur inside the lock (because it's shared state), possibly causing an OutOfMemoryException to be thrown, which would cause the AppDomain unload. I think one could protect against the AppDomain unload in this case with CERs and reliability contracts, but that can still be tricky to get right. The other problem is that the user of my assembly must be sure to remove the key when they're done. This is error-prone. Any time someone *must* remember something, sooner or later they will forget (it's happened to me enough that I try to avoid placing that burden on myself). It would also leave "MyClass"es stranded in the dictionary on a "non-fatal" thread abort, which would eventually lead to an AppDomain unload. One could always implement a watchdog that periodically cleans up anything that hasn't been touched in a while; however, what triggers the cleanup? (Sounds a bit like the GC, eh?) Besides that, I hate polling (except maybe a judiciously implemented SpinWait, one where you somehow magically know you have an excellent chance of getting your lock in the first few tries).
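The "allocate before the lock" change described above can be sketched like this (SessionStore and the plain object state are stand-ins I've made up for the real dictionary and externalAssembly.MyClass). The risky allocation happens with no lock held, so an OutOfMemoryException there aborts only the current thread; the lock guards nothing but the dictionary write:

```csharp
using System.Collections.Generic;

// Session table guarded by a lock; the expensive allocation is hoisted
// out of the locked region.
public static class SessionStore
{
    private static readonly Dictionary<int, object> sessions = new Dictionary<int, object>();
    private static readonly object gate = new object();

    public static void CreateSession(int key)
    {
        // The risky allocation, performed outside the lock. An OOM here
        // kills only this thread, not the appdomain.
        object state = new object(); // stand-in for new externalAssembly.MyClass()

        lock (gate)
        {
            // Caveat from the discussion above: the insert itself may still
            // allocate (to grow the dictionary's internal arrays), so this
            // narrows the dangerous window without closing it completely.
            sessions[key] = state;
        }
    }

    public static bool ReleaseSession(int key)
    {
        lock (gate) { return sessions.Remove(key); }
    }
}
```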

That's why I'm using a [ThreadStatic] variable now. If "this" thread gets aborted, SQL knows to release the threadstatic resources. There might be some issue around finalizers, here, but my state variable is composed of completely managed constructs.

Thanks for the info about the locks going through SQL server. I thought I had read somewhere that SQL just marks the thread as "in a lock"...but maybe that was just when you're using an unmanaged lock. I see, now, the IHostSyncManager (and suspected there would be something like that).

I read through the article you referenced. It was a pretty decent treatment of deadlock detection/avoidance/resolution and the IHostxxx interfaces. It reminded me of how MUCH time/thought people have invested in trying to come up with better ways to deal with concurrency issues. I think, sometimes, "Why can't you just do a finite number of retries before failing?", or "If I'm waiting for a resource that is unavailable and I'm not holding any critical locks, just mark me as such, allow my context to be paged, and don't forget that I'm there. Get back to me when you can satisfy my request", or "Can't I just tell the host, *before* I start trying to do anything, an estimate of my resource requirements so that it can decide if it's ready to deal with that load right now". There are problems, though, with each of these. The retry thing would help a *small* amount, but one will soon run into a "retry wall" that manifests itself similarly to the original failure. The paging thing could work, taking my resource needs in isolation. But how often does one know *all* that they will need until they get going? And even if you do, that would have to be considered for the entire transaction context. That is, my code isn't the only thing someone wants to run on a particular thread. There are probably some selects going on, maybe some temp tables, or even some writes to tables. Given that all this is going on in a transaction that needs to be able to be rolled back, you can't really stall indefinitely.

So basically, I see that there is no way to reliably handle "OutOf<SomeCriticalResource>Exception" and maintain integrity in a high concurrency situation. Resources are finite while our capacity to consume them isn't.

I did have one other question about the BufferPool I tried using. I don't like it because it's a fixed-size pool, which means it's either too big or too small; it is also the *only* reason my assembly has to be marked unsafe, because of the sync, which maybe doesn't matter anyway because the interface assembly is definitely "UNSAFE" with the [ThreadStatic] variable. I *do* like it because when using it, the % Time in GC sits around 2-5% (when using heap allocation upon request, %GC is anywhere between 5%-60%, averaging around 15-20%). The buffers are an array of structs (value types). I want it to be able to grow/shrink depending on demand, but that starts to run into the original "OutOfMemory unloads the thread" problem. If I'm trying to add another buffer (because I don't want to wait for one to become available), I could get an OutOfMemory. In this case, I really *can* "just wait" for one. In this case, it's more "I was just trying to run on the edge and pushed a smidge too far... okay". All of this is just BEGGING for some kind of "TryAlloc". But with the above discussion of deadlocks and (b)locking, providing such a call would just allow people to try to do their own resource management -- something they can't really do in a SQL context because they don't know what else is going on and can't make an informed decision. Even if they did have all the info, it's still hard to get right (the SQL team has worked VERY hard at it). If there were some way to pre-allocate the set of buffers, but allow them to be revoked when not in use, that would do it. Hmmm... WeakReference, anyone? But you still have the "I might get an OutOfMemory when the WeakReference's target gets recreated" issue -- so, same problem.
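The WeakReference idea floated above might look like the following sketch (my own illustration, not code from the thread): the GC may reclaim the buffer under memory pressure, and the cache recreates it on demand -- which is exactly where the same OutOfMemoryException can reappear:

```csharp
using System;

// Buffer cached through a WeakReference: revocable by the GC when memory
// is tight, recreated on the next request.
public static class WeakBufferCache
{
    private static readonly WeakReference cached = new WeakReference(null);

    public static byte[] GetBuffer(int size)
    {
        byte[] buffer = (byte[])cached.Target; // null if the GC reclaimed it
        if (buffer == null || buffer.Length < size)
        {
            // The re-allocation can itself throw OutOfMemoryException --
            // the "same problem" noted above.
            buffer = new byte[size];
            cached.Target = buffer;
        }
        return buffer;
    }
}
```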

Gosh, I guess I talked myself out of a question. I'll just have to tell the people using my assembly that it can fail occasionally if the server gets really busy; so, they should be prepared to deal with that. But it won't jeopardize the integrity of the system. At least now, when they ask questions, I'll be able to tell them why.

Take care.

-Troy

|||

Troy-

To handle the case where you are handing out a key that must be returned when the client code is done with it, I'd recommend handing out a separate class that implements this return mechanism through IDisposable and its finalizer.

Your client code could then use the using{} construct to return the key even in the case of a ThreadAbortException. If client code does not use the using{} construct (or implement its own finally { yourObject.Dispose(); }), then the finalizer would trigger the return mechanism.

Bear in mind that the finalizer thread does not scale, and you should avoid invoking finalizers whenever possible. This is done through the IDisposable pattern by calling GC.SuppressFinalize(this) when Dispose() is first called. However, this is a good mechanism for the scenario you describe, where even if the client "forgets" to perform an action, it is still handled.
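A minimal sketch of this disposable-handle pattern (SessionHandle and the returnKey callback are hypothetical names standing in for the real return mechanism):

```csharp
using System;

// Handle that returns its key on Dispose; the finalizer is the safety net
// for clients that forget.
public sealed class SessionHandle : IDisposable
{
    private readonly int key;
    private readonly Action<int> returnKey;
    private bool disposed;

    public SessionHandle(int key, Action<int> returnKey)
    {
        this.key = key;
        this.returnKey = returnKey;
    }

    public void Dispose()
    {
        if (disposed) return;
        disposed = true;
        returnKey(key);
        GC.SuppressFinalize(this); // keep the finalizer thread out of it
    }

    // Runs only if the client never called Dispose -- slower and
    // non-deterministic, but the key still gets returned.
    ~SessionHandle()
    {
        if (!disposed) returnKey(key);
    }
}
```

Client code would then write using (var session = GetSession()) { ... }, so the key is returned even when a ThreadAbortException unwinds the block.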

I'd recommend this approach over using ThreadStaticAttribute, and would reserve that for cases that match its description (indicates that the value of a static field is unique for each thread).

Unfortunately, this approach and the use of finalizers does require the UNSAFE permission set bucket.

As for the comments on OOM -- you raise some great points! SQL and the escalation policy were designed not to be performant under OOM, but to handle cases where your application logic is invalidated (such as exceptions thrown while modifying your shared state under a lock).

You are definitely going down the right path to help in the OOM scenario by delaying your use of locks until the write takes place. You should prepare as much as possible so that the work done under the lock is as limited as possible.

In addition to changing your code to delay locks, I'd recommend trying to avoid OOM as much as possible. If you are on an x86 machine, increasing the -g value will help. Please understand the tradeoffs there, as you are restricting the maximum memory available for your database buffers. If your database application is light on storage reads/writes, then this is a good case to use or increase -g's value. If it's possible to use a 64-bit architecture, then I'd certainly recommend that route!

-Jason (MSFT)

|||

Hi Jason,

I don't think I understand what you're saying about passing an object (class) that has the finalizer. I mean, I understand the use of IDisposable and finalizers (I don't want to get into any holy wars about destructors or deterministic finalization). I just didn't think I could pass an object to SQL Server (besides an IEnumerator for a TVF). Looking at the BOL, it says, "SQL Type: sql_variant, CLR Data Type (SQL Server): none, CLR Data Type (.NET Framework): object". Trying to just return object and store it in a variable like this:

DECLARE @sess sql_variant

SET @sess = dbo.GetTestObject()

returns this ArgumentException:

System.ArgumentException: No mapping exists from object type externalAssembly.MyClass to a known managed provider native type.

I agree, though, that the ThreadStatic isn't *exactly* what I need, but I think it does get cleaned up. To clarify, I thought variables marked as ThreadStatic were local to the context in which a thread was running (which is managed by SQL; so, the context would be the duration of a transaction) and not hooked by the ThreadID...So a Thread context does not have ThreadID affinity, just as a particular thread may not have Processor affinity; although, I thought I read SQL does prefer to keep a particular thread bound to a particular logical processor (which gets kinda fuzzy around the dual-core, hyperthreaded processors of late because of cache locality and such). So, the only limitation with the ThreadStatic is that I have to provide a way to Clear out all the points that I've added so that the user can do something else within the same transaction.

As for the -g, my super-helpful client who is "Mr. SQL" (at least to me) wanted to *try* to stay with the default for -g (which is 256MB, I think). If that turns out to be insufficient for a particular workload (which we'd be able to measure by counting OOMs under normal, production load), we could possibly crank it up. I tell you, it would be kinda neat if that threshold had something more like a min and max setting...or rather, a "BorrowingMax" and a "Guaranteed Max". For instance, I could set my guaranteed max to the default -g of 256. That is the memory that is allocated for CLR that the regular buffer pool just *can't* use. Then I could set my "BorrowingMax" to, say, 512...This number indicates what I (me, the SQL Administrator) am willing to allow the CLR to use if it needs it *and* I'm not using it. I know this idea won't work quite as described as SQL likes to use as much memory as it can up to the point where it would have to page to disk; so, having *that* amount of memory be flexible would pretty well screw up the existing allocation strategies/scheduling.

On a side note, I'm trying to think of how to make the call to Dictionary insert safe from OOM exceptions. I mean, when pre-allocating my key and value (outside the lock), if I get an OOM, then *this* thread gets aborted, not the whole AD... but the Dictionary may very well need to grow on an insert. I haven't seen anything like std::vector::reserve(int) for Dictionary. I guess I could just make a new dictionary object outside of the lock with the capacity ctor (tmpDictionary = new Dictionary(sharedDictionary.Count + 1))... but by the time I get into the lock, the capacity might have changed. Besides that, the call to GetEnumerator() probably new's something somewhere (new Dictionary.Enumerator()?). Also, if the dictionary exposed its capacity as a property, so that outside the lock I could do "if(d.Capacity > d.Count + 5)", then I'd stand a better chance of not throwing an OOM on insert. In general, though, if we're running *that* close to the edge, the best bet is to just increase the -g option.
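One way to get the "reserve ahead, assign under the lock" behavior described above is copy-on-write -- my own extrapolation of the idea, not something from the thread. A pre-sized replacement dictionary is built and filled with no lock held, so any OutOfMemoryException aborts only the current thread; inside the lock we only compare and swap a reference, which cannot throw:

```csharp
using System.Collections.Generic;

// Copy-on-write dictionary: published dictionaries are never mutated,
// so readers and copiers need no lock at all.
public static class CowStore
{
    private static Dictionary<int, string> shared = new Dictionary<int, string>();
    private static readonly object gate = new object();

    public static int Count
    {
        get { return shared.Count; } // reading the reference is atomic
    }

    public static void SafeInsert(int key, string value)
    {
        while (true)
        {
            Dictionary<int, string> snapshot = shared;

            // Pre-sized copy (Count + 1): the inserts below never grow the
            // table. All allocation happens here, with no lock held.
            var copy = new Dictionary<int, string>(snapshot.Count + 1);
            foreach (KeyValuePair<int, string> kv in snapshot)
                copy[kv.Key] = kv.Value;
            copy[key] = value;

            lock (gate)
            {
                if (!ReferenceEquals(shared, snapshot))
                    continue; // another writer won; rebuild from the new snapshot

                shared = copy; // plain reference assignment -- cannot throw OOM
                return;
            }
        }
    }
}
```

The tradeoff is copying the whole table per insert, so this only makes sense for small, rarely-written dictionaries like the session table discussed above.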

Grrr... tricky, tweaky stuff... I was afraid I wasn't gonna get to do anything like that anymore. Coming from an embedded-systems background, where you have 96 bytes in one memory bank and 120 bytes in another, and there's a cost for switching between them (so be sure to group all of your bits at the right byte addresses to avoid a bank switch while you're handling an interrupt) -- Windows and VB/C#/.NET seemed more like "eh, you need 20MB? here you go" -- and if it ends up paging, oh well, it will still work.

I don't know...I guess I just dig the tweaky stuff...

Take care.

-Troy
