Monday, March 19, 2012

High CPU Usage SSIS Package

Anyone aware of anything I can look at to improve performance of my SSIS packages? i basically have two master packages that call about 15 child package tasks each. All the package tasks are called sequentially and all they do is just load flat files into temporary tables and execute a simple stored proc. Each package tasks check to see if a file is there and if so it processes it else it just ends. The packages are scheduled to run every minute to check for files. If I watch the CPU during execution i can see the two instances of DTSEXEC.exe running and consuming about 25-30% of the CPU. The box that this runs on is a Quad processor with 4 GB of ram.

Any ideas on how to improve the performance of this? I have tried playing with the transaction level, but that didn't seem to do much performance wise.

Thanks for the help!

I thoroughly recommend you digest this SSSI Performance Whitepaper: http://www.microsoft.com/technet/prodtechnol/sql/2005/ssisperf.mspx. Its a really interesting read and has some great tips on performance tuning.

Understanding how SSIS processes data is half the battle and this paper really helps towards this.

What performance problems are you having? As ever, performance tuning is all about identifying where your bottlenecks are.

-Jamie

|||

Yeah I browsed through that whitepaper and didn't see anything that triggered me for this situation.

Basically my setup is the following...

There is a directory that contains data that an AS/400 sends over via FTP. So for a customer change there is a file called customer.txt. Then there is an SSIS package that checks if that file exists, if so it processes it, if not it just ends (used a precendence constraint). There are probably 30 different files that get put into the FTP directory. So there are two master packages and each one calls about 15 child packages that do just as the package I described above. Most of the time there aren't very many files there so most of the packages fire up, see there isnt' a file there and end. Seems like the bottleneck is in the packages spinning up as once it gets going it seems like it goes pretty fast. I just don't understand how checking for a file can take up that much CPU. Are there other strategies for implementing the kind of architecture we have?

Thanks!

|||

OK, well you're right that there is an overhead to executing packages. Witness: http://blogs.conchango.com/jamiethomson/archive/2005/03/01/1100.aspx

(But caveat those observations with what I've said here: http://blogs.conchango.com/jamiethomson/archive/2005/03/02/1107.aspx)

Basically, if you can reduce the number of packages that have to be executed that would be very beneficial. That would be my strategy anyway. Is there a reason that everything can't happen within 1 package?

Interested to know how you progress with this.

-Jamie

|||

Those are some interesting findings that you have there.

I could implement most of these in one package and skip the child packages. The reason for having them is for versioning and concurrent development. Now that we are pretty much done with developing these I could probably transform this into one package that doesn't use any child packages.

So any thoughts on the best way to implement this into one package then? Wondering if I can have a config file that has all the file names to check for and based on that perform the appropriate transformation which is nothing more than trimming off the spaces, loading it into a temp table and executing a stored proc that update the master table.

Seems kind of ridiculous that loading these child packages takes that much overhead, but I guess I can see that.

|||

Yeah, the overhead of spinning up the packages is only really a problem if its proportion of total package execution time is large. In other words, if you aint doing much, don't try and put everything into seperate packages.

I would recommend using sequence containers to logically seperate your processing in the control-flow. So, where you had a child package before, you now have a sequence container. These can of course run in parallel. And also, only execute those "stuff" for which a file exists - I'm sure you've already gone down that road already anyway.

-Jamie

|||Maybe what I shoudl try first is to put the check if the file exists in the master package and not in the child package. That way it will have to spin up a lot fewer packages.|||

Stark77 wrote:

Maybe what I shoudl try first is to put the check if the file exists in the master package and not in the child package. That way it will have to spin up a lot fewer packages.

Yep, another good idea. Although if you do that your child packages will be (correct me if I'm wrong) only loading up a file and executing a sproc on the data. Such a small amount of work doesn't really justify moving it out into another package.

-Jamie

|||

Ha...finally getting around to doing this. You have any best practices on doing something like this?

Basically here is what I am doing. We get about 30-40 different files in an ftp directory and when they arrive we need to process them. Each file has a different layout. Basically we grab the file, load it to a temp table that matches the file layout and then fire a stored procedure to process the records in the temp table. At the end we delete the file if it was processed successfully. We process the files in a specific order (orderheader, then orderdetail, shipmentheader, shipmentdetail, etc). The only other task we do is to trim all the string variables.

Basically I am wondering the best way to lay this out or if there is a best practice for processing something like this. I thought about looping through all the .txt files in the directory, but that won't process them in the order I want.

Thanks for the help!

|||

I use a table in a management db that contains the files. This way you have utmost control. In addition you can use it to log failures against the files, processing history etc.

You can improve the overhead time a package has by delaying validation on tasks/components. This way the validation only occurs when needed.

|||

Any chance you can script me out your table you use so I can brain storm off of it? Not really following what you are suggesting. How does the package flow out then? Do you have a for each that reads that table? Would be cool to send me over that table and a screenshot of the package or the package itself.

Thanks for the input. I am definitely going to throw an article together once I get this one done.

No comments:

Post a Comment