Hello,
Can anyone point me at a tutorial or sample that shows how to use IS for importing an xml file containing hierarchically arranged records ?
I have a file which contains multiple orders , the orders contain multiple line items.. the file also contains an element with details of the file source etc...
So, I want to make an insert in the FileLog table an then make inserts into the orders table .. then make inserts into the OrderItems table which will have the foreign key from the orders table in the records...
if you get what I mean...
But I have searched hign and low and can't see any info on how to load anything but a very flat xml file structure...
Thanks
Vida.
The XML Source will take a hierarchical XML feed and crack it, giving multiple outputs, one for each level. Have a play.
To handle the FKs you need to enforce order of inserts which cannot be done in a single Data Flow task, so stage the data and use several tasks, one for each successive table with the FK. Raw Files would be the ideal staging area for this, fast and efficient.
|||Darren,
Am I right in thinkin that SQLIS is just not designed to load anything but a flat file containing a simple set of same-format rows?
I just can't seem to find very much info on loading master-detail records , whether in XML or in any ordinary flat file ....
Now I've been given a task of loading a simple text file... first row is a bit of header info.. then the next ten rows are data rows in a simple csv format... last row is a checksum.
20060518,SUPPLIER1,,
EUR,AUD,1.6869,
GBP,AUD,2.4896,
USD,AUD,0.7568,
DKK,AUD,4.4581,
SEK,AUD,5.6246,
CHF,AUD,0.9246,
JPY,AUD,84.5870,
NZD,AUD,1.2293,
NOK,AUD,4.6809,
SGD,AUD,1.2060,
HKD,AUD,5.9217,
AUD,AUD,1.0000,
CAD,AUD,0.8515,
END,2454534,,
And I've got to insert a row in a Header table.. containing the header info.. the data rows have to be individual record inserts into a Data table... and they need to be foreign keyed with the ID of the record in the Header table...
I am looking at the example in the SQLIS website http://www.sqlis.com/default.aspx?54 which shows how to handle different row types... problem there is that my file does not have a row-type field. Its just: first row is header.. the rest are detail.. and the last is checksum....
So I'm just trying to figure out a strategy to do this... what approach should I take ?
BTW: I'm a newbie at SSIS... am reading the Wrox book... but its slow going...
Thanks
|||Multi-format rows can be handled in the same data-flow.When you import them you'll have to interpret the file as simply a file with one long text column in it and then, after you've split the data (see below) parse the proper columns out.
Regarding splitting the data into header and detail records...
If there is a characteristic of each row that determines what type it is then the Conditional Split transform will split it out into the different sections. From the snippet you have posted above it looks like that shouldn't be too difficult (i.e. Are the first 8 characters all digits? If so, its a header record.)
-Jamie
|||Thanks for your advice Jamie..
I suppose I could use the fact that the first eight chars all digits... I was hoping to just be able to say.. this is the First Row in the file.. so do this...
I dunno.. think I'm having some trouble getting my head around the SSIS way of thinking... being a bit too procedural in my approach...
Like.. I have in my head a procedural way that i think this should be done.. but can't quite map it to an SSIS package...
Maybe you could look at that with me?
For instance I feel that the steps I'd need to carry out to load this file are:
1. Read first line and create a FileLoaded rec in the FileLoaded table. Get an ID for this FileLoaded rec. If there is an error inserting the FileLoaded rec then report error (via email and insert of Error rec to LoadErrors table) and exit.
2. for next bunch of lines: insert FileLine rec in File Line table. Use FileLoaded ID as foreign key. If there's any error then report error.. rollback all inserts.. also rollback insert of FileLoaded rec... report error (detailing line of file that problem was encountered at)
3. When END line is reached, commit transaction. Report success (via email and via insert of record in some Logging table).
4. Copy file to archive.
I'm kinda stumped as to how to do this, especially how to FIRST insert a parent (FileLoaded) record BEFORE starting to insert child (FileLine) records. The sample on SQLIS just splits into master-rows and detail-rows.. seems to happen simultaneously.. so would we not have a race condition where child recs with the foreign key could get inserted before the parent recs ... and so cause a ref integrity error?
Also can't see how to start and roll back transactions and exit straight out of a package when some error occurs...
Please bear in mind that I am still learning this SSIS stuff, I have the Wrox book and am working through it.. but, as is always the case, am under pressure to get something done... something which the book just doesn't seem to cover... ie parent-child related data input...
I do really appreciate any time you (or anyone) spends reading my posts and helping...
PJ
|||PJFINTRAX wrote: I suppose I could use the fact that the first eight chars all digits... I was hoping to just be able to say.. this is the First Row in the file.. so do this...
You can do this in a script component if you really want to, I'm not sure its necassary though. This should help: http://blogs.conchango.com/jamiethomson/archive/2005/07/27/1877.aspx
PJFINTRAX wrote: I dunno.. think I'm having some trouble getting my head around the SSIS way of thinking... being a bit too procedural in my approach...
Like.. I have in my head a procedural way that i think this should be done.. but can't quite map it to an SSIS package...
Maybe you could look at that with me?
For instance I feel that the steps I'd need to carry out to load this file are:
1. Read first line and create a FileLoaded rec in the FileLoaded table. Get an ID for this FileLoaded rec.
Fairly easy to do as I think we've already covered. Once you insert the record you can get the ID that you just created using an Execute SQL Task.
PJFINTRAX wrote: If there is an error inserting the FileLoaded rec then report error (via email and insert of Error rec to LoadErrors table)
SendMail Task will do the emailing.
OnError precedence constraint or the OnError eventhandler can do the error reporting. Personally I would use the OnError eventhandler because you get more information about the error.
PJFINTRAX wrote: and exit.
OnSuccess/OnError precedence constraints handles this
PJFINTRAX wrote: 2. for next bunch of lines: insert FileLine rec in File Line table. Use FileLoaded ID as foreign key.
FileLoadedID is, I presume, the record you created in the previous step? As stated, you can get hold of this value using an Execute SQL Task.
PJFINTRAX wrote: If there's any error then report error
Already covered I think...
PJFINTRAX wrote: .. rollback all inserts.. also rollback insert of FileLoaded rec...
SSIS supports MSDTC transactions: http://blogs.conchango.com/jamiethomson/archive/2004/12/14/456.aspx
PJFINTRAX wrote: report error (detailing line of file that problem was encountered at)
That will be a bit harder to given that SSIS doesn't natively support row numbering within the pipeline (it would be quite nice if it did). You can create your own numbering as explained here: http://www.sqlis.com/default.aspx?37
PJFINTRAX wrote: 3. When END line is reached, commit transaction.
Transactions again
PJFINTRAX wrote: Report success (via email
SendMail Task
PJFINTRAX wrote: and via insert of record in some Logging table).
Lots of ways of doing this. I would use the OnPostExecute eventhandler!
PJFINTRAX wrote: 4. Copy file to archive.
FileSystem task will do this for you.
PJFINTRAX wrote: I'm kinda stumped as to how to do this, especially how to FIRST insert a parent (FileLoaded) record BEFORE starting to insert child (FileLine) records.
I'm currently working on a blog post about this very thing so look out for that in teh next few days (http://blogs.conchango.com/jamiethomson). Its fairly easy, basically you need 2 data-flows. Insert the FileLoaded record in the first data-flow. Pass the child records to the data-flow using a raw file....and then insert them. Easy-peasy!
PJFINTRAX wrote: The sample on SQLIS just splits into master-rows and detail-rows.. seems to happen simultaneously.. so would we not have a race condition where child recs with the foreign key could get inserted before the parent recs ... and so cause a ref integrity error?
Correct. See my previous comment. Raw files are a big help here. I use raw files all over the place in my implementations - they're incredibly useful.
PJFINTRAX wrote: Also can't see how to start and roll back transactions and exit straight out of a package when some error occurs...
Please bear in mind that I am still learning this SSIS stuff, I have the Wrox book and am working through it.. but, as is always the case, am under pressure to get something done... something which the book just doesn't seem to cover... ie parent-child related data input...
I do really appreciate any time you (or anyone) spends reading my posts and helping...
PJ
Hope this has helped!!!
-Jamie
|||Jamie,
thank you very very much for taking the time to go through my post.
I'll go and read all of those references you have given me and hopefully will be back to report some progress!
Thanks again,
PJ
|||Jamie,
I'm kinda stuck for time on this so have been trying to do a few bits of it... not sure how to use a raw file to pause one dataflow until the other has completed....if you could give a very short description to elighten me a bit further?
But anyway.. I think I have the bit about picking out the first record cracked....
I just pumped the file into a script task that added line numbers to the lines of the file... and then did a conditional split based on the line number... if the line num is 1 then I have one output (for parent rec) and if its not = 1 then I have another ouput (for child records)...
Wish I could paste a picture of the dataflow in here...sigh...
Anyway, the script in the script block transform that adds line numbers is really simple:
Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Static LineCounter As Integer = 0
LineCounter += 1
Me.NumberedRowsBuffer.AddRow()
Me.NumberedRowsBuffer.LineNum = LineCounter
Me.NumberedRowsBuffer.LineData = Row.Column0
End Sub
End Class
So the input file gets pumped through this scirpt.. two fields come out, one is the whole line as input from the file, and the other is a line number...
This then goes into the conditional split...
when the conditional split sends out a parent rec I wanted to insert a rec in the parent table and get back a parent ID....
But I've come up stumped against using an OLE DB Command to insert the parent record in the parent table .. I can't figure out how to get return values out of a stored proc called via an OLE DB Command....
Is this something I'm supposed to be able to do?
Thnks in advance,
PJ
No comments:
Post a Comment