Showing posts with label records. Show all posts

Monday, March 26, 2012

Historical tables, partitioning or what?

I have about 45,000 records in a CSV file, which I use as HTTP request parameters to query a website and store some results in a database. This application runs 24/7, so the database grows quickly. Every insert fires a trigger, which has to look up some old records based on certain criteria and modify the last inserted record. My client is obsessive about performance on this one and suggested moving the old records into another table with exactly the same structure, which would serve only as a historical table (used to generate reports, statistics, etc.), while the original table would store only the latest rows (no more than 45k at a given time, whereas the historical table may grow to millions of records). Is this a good idea? With performance in mind, and given that the trigger has to run as quickly as possible, I might second that idea. Is it good or bad? What do you think?

I read a similar post here that mentioned SQL Server 2005 partitioning. I might as well try that, although I have never used it before.

I think you should use the archive table and partition it with 2005's new capabilities. This will allow you to administer the table(s) easily. Here's a good link on 2005 partitioning: http://msdn2.microsoft.com/en-us/library/ms345146.aspx If you use a separate archive table, you can create good indexes that will really speed up queries. The only drawback is that if you insert new records into it often, it could be slow, so you might want to insert new archive records only during off-peak time if possible.
Tim|||I would caution to keep the absolute minimum of indexes on the archive table. In some situations it may be efficient to create an index just for a report, and then remove the index as soon as the report is complete.|||Thanks for your comments. I read about SQL Server 2005's partitioning feature and find it very interesting (and useful, I think), but the application in question currently runs on the Express Edition, with a possible switch to a higher version but doubtfully the Enterprise one, so partitioning is out of the question, unfortunately. I will hopefully get to use it when I work on some enterprise-level application in the future.
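For what it's worth, on SQL Server 2005 (including Express) the "move old rows to a history table" step can be a single statement via the OUTPUT clause. A minimal sketch, with hypothetical table and column names:

-- move rows older than 30 days into an identically-structured history table
DELETE FROM dbo.Requests
OUTPUT DELETED.* INTO dbo.RequestsHistory
WHERE RequestDate < DATEADD(DAY, -30, GETDATE());

Run on a schedule, this keeps the live table small for the trigger while the history table absorbs the old rows.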

Historical Data Problem

We have a database that adds over 100,000 records per day. This goes back to 2002 and I only need historical data for 6 months. Presently we can only delete 1,000 rows at a time. Is there a faster way of deleting? We seem to continually run out of disk space. Urgent!|||If you just need a small set of data from a much larger set, you are better off inserting the data you need into a new table, dropping the old table, and renaming the new table to the old table's name.|||After deciding to pull only the last 6 months of data out of the large table, I created a statement that pulls each field out and inserts it into a temp table. I get this error message:
'Server: Msg 544, Level 16, State 1, Line 1
Cannot insert explicit value for identity column in table 'INCOMING_TEMP' when IDENTITY_INSERT is set to OFF.'
I'm not sure what this means. Is it that I cannot insert into an ID field?|||Check 'SET IDENTITY_INSERT' in the BOL.|||? Check the BOL???|||Books... On... Line... B.O.L.|||While the forum was down, I did some research on the error message and found a solution: I went into the properties of the new temp table and enabled identity inserts. The update is running as we speak. It's been running about 20 minutes, and that's about right; the date range holds about 1,500,000 records per month. That's why I'm clearing it out and keeping only 6 months of backup historical data. Day 2 of the new project.|||What about your transaction logs and tempdb?

Are you doing this in batches with transactions?

A permanent table I think would have been better
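For anyone hitting the same Msg 544: the property the poster toggled in the GUI is what SET IDENTITY_INSERT does in T-SQL. A minimal sketch, with hypothetical column names (ID being the identity column):

SET IDENTITY_INSERT dbo.INCOMING_TEMP ON;

-- the identity column must appear explicitly in the column list
INSERT INTO dbo.INCOMING_TEMP (ID, EntryDate, Payload)
SELECT ID, EntryDate, Payload
FROM dbo.INCOMING
WHERE EntryDate >= DATEADD(MONTH, -6, GETDATE());

SET IDENTITY_INSERT dbo.INCOMING_TEMP OFF;

Only one table per session can have IDENTITY_INSERT ON at any time.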

Historical Data Problem

We have a database that adds over 100,000 records per day. This goes back to 2002 and I only need historical data for 6 months. Presently we can only delete 1,000 rows at a time. Is there a faster way of deleting? We seem to continually run out of disk space. Urgent!
Can you tell me why you can only delete 1000 rows at a time?
Jeff Duncan
MCDBA, MCSE+I
"Garry Dawkins" <anonymous@.discussions.microsoft.com> wrote in message
news:798401c4311f$5f08b590$a301280a@.phx.gbl...
> We have a database that adds over 100,000 records per
> day. This goes back to 2002 and I only need historical
> data for 6 months. Presently we can only can delete 1000
> row at a time. Is there a faster way of deleting. We
> seem to continuely run out of disk space.
> Urgent!!!!!!!!!!!
|||I also don't understand what you mean by "we can only delete 1000 rows at a time"... what happens when you try to delete 1001?
I recommend halting the inserts briefly, copying the last 6 months of data into a new table (e.g. SELECT INTO), resetting any indexes/primary keys, dropping the old table, and renaming the new table.
Then, set up a job that runs daily (or maybe more often) that deletes any data more than 6 months old (184 days is probably the safest, to account for 31-day months). This way your window will always be roughly six months, and you won't be tripping over yourself trying to trim it down. Automate it as much as possible; it's less taxing on both the system and your hair.
However, if you are running out of disk space now, what happens when your volume increases and you are adding 200k, 500k, 1m rows per day?
Aaron Bertrand
SQL Server MVP
http://www.aspfaq.com/
"Garry Dawkins" <anonymous@.discussions.microsoft.com> wrote in message
news:798401c4311f$5f08b590$a301280a@.phx.gbl...
> We have a database that adds over 100,000 records per
> day. This goes back to 2002 and I only need historical
> data for 6 months. Presently we can only can delete 1000
> row at a time. Is there a faster way of deleting. We
> seem to continuely run out of disk space.
> Urgent!!!!!!!!!!!
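A sketch of Aaron's recipe in T-SQL, with made-up object names (dbo.Incoming with an EntryDate column); index rebuilds and error handling are left out:

-- one-off: keep only the last ~6 months, then swap the tables
SELECT *
INTO dbo.Incoming_New
FROM dbo.Incoming
WHERE EntryDate >= DATEADD(DAY, -184, GETDATE());

-- recreate indexes/primary keys on dbo.Incoming_New here, then:
EXEC sp_rename 'dbo.Incoming', 'Incoming_Old';
EXEC sp_rename 'dbo.Incoming_New', 'Incoming';

-- daily job: trim in small batches so no single DELETE runs long enough to time out
SET ROWCOUNT 1000;
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.Incoming
    WHERE EntryDate < DATEADD(DAY, -184, GETDATE());
    IF @@ROWCOUNT = 0 BREAK;
END
SET ROWCOUNT 0;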
|||Because at present, a timeout error occurs when attempting to delete more than 1,000.
|||I agree with Aaron. Also, what criteria are in your delete command? Is the value you're deleting against indexed?
Jeff Duncan
MCDBA, MCSE+I
"Aaron Bertrand - MVP" <aaron@.TRASHaspfaq.com> wrote in message
news:%238mJPFSMEHA.628@.TK2MSFTNGP11.phx.gbl...
> I also don't understand what you mean by "we can only delete 1000 row at a
> time"... what happens when you try to delete 1001?
> I recommend halting the inserts briefly, copy the last 6 months of data
into
> a new table (e.g. SELECT INTO), reset any indexes/primary keys, drop the
old
> table, rename the new table.
> Then, set up a job that runs daily (or maybe more often) that deletes any
> data more than 6 months old (184 days is probably the safest to account
for
> 31-day months). This way your window will always be roughly six months,
and
> you won't be tripping over yourself trying to trim it down. Automate it
as
> much as possible, it's less taxing on both the system and your hair.
> However, if you are running out of disk space now, what happens when your
> volume increases and you are adding 200k, 500k, 1m rows per day?
> --
> Aaron Bertrand
> SQL Server MVP
> http://www.aspfaq.com/
>
>
> "Garry Dawkins" <anonymous@.discussions.microsoft.com> wrote in message
> news:798401c4311f$5f08b590$a301280a@.phx.gbl...
>


historical data

A general data design question:

We have data which changes every week. We had considered separating historical records and current records into two different tables with the same columns, but thought it might be simpler to have them all together in one table and just add a WeekID int column to indicate which week a row represents (and perhaps an IsCurrent bit column to make querying easier). We have a number of tables like this, holding weekly data, and we'll have to query historical data often, but only back through the last year; we have historical data going back to 1998 or so which we'll rarely if ever look at.

Is the all-in-one-table approach better, or the separation of current and historical data? Will there be a performance hit to organizing data this way? I don't think the extra columns will make querying much more awkward, but is there anything I'm overlooking?

Thanks.|||rottytooth (rottytooth@.gmail.com) writes:

One thing you could consider is partitioned views, with one table
per year. (So each year you add a new table and change the view.)

A partitioned view is constructed as:

SELECT col1, col2, ...
FROM tbl1
UNION ALL
SELECT col1, col2, ...
FROM tbl2
UNION ALL
...

In the tables there needs to be a check constraint on a column which is
part of the PK, and the constraint should define the range for that
table. Ranges must not overlap.

The point is that when you query the view, SQL Server will only access
the tables that are covered by the condition on the partitioning column.
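For example, a yearly split might look like this (table and column names are hypothetical; the essential part is the CHECK constraint on the PK column):

CREATE TABLE dbo.WeeklyData2004 (
    WeekDate datetime NOT NULL
        CHECK (WeekDate >= '20040101' AND WeekDate < '20050101'),
    ItemID   int   NOT NULL,
    Amount   money NOT NULL,
    PRIMARY KEY (WeekDate, ItemID)
);

CREATE TABLE dbo.WeeklyData2005 (
    WeekDate datetime NOT NULL
        CHECK (WeekDate >= '20050101' AND WeekDate < '20060101'),
    ItemID   int   NOT NULL,
    Amount   money NOT NULL,
    PRIMARY KEY (WeekDate, ItemID)
);
GO

CREATE VIEW dbo.WeeklyData
AS
SELECT WeekDate, ItemID, Amount FROM dbo.WeeklyData2004
UNION ALL
SELECT WeekDate, ItemID, Amount FROM dbo.WeeklyData2005;
GO

A query like SELECT ... FROM dbo.WeeklyData WHERE WeekDate >= '20050301' then only touches WeeklyData2005.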

Please see Books Online for further details.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

Friday, March 23, 2012

High volume batch insert

Hi,
I need a SQL program to insert a record into a parent table and then up to 200 child records, using the primary key of the parent record. What's the best way to structure this procedure for performance and reliability? e.g.

INSERT INTO tblExam (dteExam, intModule, ...) VALUES (...)
SET intExamID = @@IDENTITY
' Pseudo-code
For Each Candidate In frmExamRequest
    INSERT INTO tblCandidateExam (intExamID, intCandidateID)
Next

The data is all coming from a form on an ASP page. I could break it into two forms, but for the users a single form with the exam date and a list of candidates they can check/uncheck is much easier.
If I have 2 stored procs to insert an exam instance and candidate instances, the latter proc will be called 200 times over an ADODB Connection. Is it possible to have a single stored proc to do the whole task in a transaction with an unknown number of candidates?
thanks.|||When the user hits "Enter" you could collect all the candidate IDs, put them in a comma-delimited string, and then pass that string to the stored proc that's doing the insert.
The proc can then perform the inserts into tblCandidateExam by looping through the candidate IDs.
I think that's probably the easiest way. Hope that's helpful.
Cheers,
mmm
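It is indeed possible to do the whole thing in one proc. A minimal sketch of the comma-delimited-string approach (the proc name is made up, error handling is omitted, and the naive list parsing is just for illustration; the links in the next reply cover more robust techniques):

CREATE PROCEDURE dbo.InsertExamWithCandidates
    @dteExam datetime,
    @intModule int,
    @CandidateList varchar(8000)   -- e.g. '12,57,304'
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @intExamID int, @Pos int, @ID varchar(10);

    BEGIN TRANSACTION;

    INSERT INTO tblExam (dteExam, intModule) VALUES (@dteExam, @intModule);
    -- SCOPE_IDENTITY() is safer than @@IDENTITY if tblExam ever gets a trigger
    SET @intExamID = SCOPE_IDENTITY();

    -- walk the comma-delimited list, one child insert per ID
    SET @CandidateList = @CandidateList + ',';
    SET @Pos = CHARINDEX(',', @CandidateList);
    WHILE @Pos > 0
    BEGIN
        SET @ID = LEFT(@CandidateList, @Pos - 1);
        IF LEN(@ID) > 0
            INSERT INTO tblCandidateExam (intExamID, intCandidateID)
            VALUES (@intExamID, CAST(@ID AS int));
        SET @CandidateList = SUBSTRING(@CandidateList, @Pos + 1, 8000);
        SET @Pos = CHARINDEX(',', @CandidateList);
    END

    COMMIT TRANSACTION;
END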
"cc900630@.ntu.ac.uk" wrote:

> Hi,
> I need a SQL program to insert a record into a parent table and then up
> to 200 child records, using the primary key of the parent record. Whats
> the best way to structure this procedure for performance and
> reliability ? e.g
>
> INSERT INTO tblExam ( dteExam,intModule,) Values () ...
> SET intExamID= @.@.IDENTITY
> ' Psedo-code
> For each Candidate in frmExamRequest
> INSERT INTO tblCandidateExam ( intExamID, intCandidateID )
> Next
> The data is all coming from a form on an ASP page, I could break it
> into two forms but for the users a single form with exam date, and list
> of candidates they can check/uncheck is much easier.
> If I have 2 stored procs to insert a exam instance & candidate
> instances, the latter proc will be called 200 times over an ADODB
> Connection. Is it possible to have a single stored proc to do the whole
> task in a transaction with unknown number of candidates ?
> thanks.
|||See if one of these helps:
http://www.aspfaq.com/show.asp?id=2248 Arrays & Lists
http://www.sommarskog.se/dynamic_sql.html Dynamic SQL
http://www.users.drew.edu/skass/sql...bleProc.sql.txt List to Table script
Andrew J. Kelly SQL MVP

Wednesday, March 7, 2012

Hierarchical XML import?

Hello,

Can anyone point me to a tutorial or sample that shows how to use IS for importing an XML file containing hierarchically arranged records?

I have a file which contains multiple orders, and the orders contain multiple line items.. the file also contains an element with details of the file source etc...

So, I want to make an insert into the FileLog table and then make inserts into the orders table.. then make inserts into the OrderItems table, whose records will carry the foreign key from the orders table...

if you get what I mean...

But I have searched high and low and can't see any info on how to load anything but a very flat XML file structure...

Thanks

Vida.

The XML Source will take a hierarchical XML feed and crack it, giving multiple outputs, one for each level. Have a play.

To handle the FKs you need to enforce order of inserts which cannot be done in a single Data Flow task, so stage the data and use several tasks, one for each successive table with the FK. Raw Files would be the ideal staging area for this, fast and efficient.

|||

Darren,

Am I right in thinking that SQLIS is just not designed to load anything but a flat file containing a simple set of same-format rows?

I just can't seem to find much info on loading master-detail records, whether in XML or in any ordinary flat file...

Now I've been given the task of loading a simple text file... the first row is a bit of header info.. then the next ten rows are data rows in a simple CSV format... the last row is a checksum.

20060518,SUPPLIER1,,
EUR,AUD,1.6869,
GBP,AUD,2.4896,
USD,AUD,0.7568,
DKK,AUD,4.4581,
SEK,AUD,5.6246,
CHF,AUD,0.9246,
JPY,AUD,84.5870,
NZD,AUD,1.2293,
NOK,AUD,4.6809,
SGD,AUD,1.2060,
HKD,AUD,5.9217,
AUD,AUD,1.0000,
CAD,AUD,0.8515,
END,2454534,,

And I've got to insert a row into a Header table.. containing the header info.. the data rows have to be individual record inserts into a Data table... and they need to be foreign-keyed with the ID of the record in the Header table...

I am looking at the example on the SQLIS website http://www.sqlis.com/default.aspx?54 which shows how to handle different row types... the problem there is that my file does not have a row-type field. It's just: first row is header.. the rest are detail.. and the last is checksum....

So I'm just trying to figure out a strategy for this... what approach should I take?

BTW: I'm a newbie at SSIS... am reading the Wrox book... but it's slow going...

Thanks

|||

Multi-format rows can be handled in the same data-flow. When you import them you'll have to interpret the file as simply a file with one long text column in it and then, after you've split the data (see below), parse the proper columns out.


Regarding splitting the data into header and detail records...

If there is a characteristic of each row that determines what type it is, then the Conditional Split transform will split it out into the different sections. From the snippet you have posted above it looks like that shouldn't be too difficult (i.e. are the first 8 characters all digits? If so, it's a header record).

-Jamie

|||

Thanks for your advice Jamie..

I suppose I could use the fact that the first eight chars are all digits... I was hoping to just be able to say.. this is the First Row in the file.. so do this...

I dunno.. think I'm having some trouble getting my head around the SSIS way of thinking... being a bit too procedural in my approach...

Like.. I have in my head a procedural way that I think this should be done.. but can't quite map it to an SSIS package...

Maybe you could look at that with me?

For instance I feel that the steps I'd need to carry out to load this file are:

1. Read the first line and create a FileLoaded rec in the FileLoaded table. Get an ID for this FileLoaded rec. If there is an error inserting the FileLoaded rec, then report the error (via email and an insert of an Error rec into the LoadErrors table) and exit.

2. For the next bunch of lines: insert a FileLine rec into the FileLine table, using the FileLoaded ID as a foreign key. If there's any error, report it.. roll back all the inserts.. also roll back the insert of the FileLoaded rec... report the error (detailing the line of the file where the problem was encountered).

3. When the END line is reached, commit the transaction. Report success (via email and via an insert of a record into some logging table).

4. Copy the file to archive.

I'm kinda stumped as to how to do this, especially how to FIRST insert a parent (FileLoaded) record BEFORE starting to insert child (FileLine) records. The sample on SQLIS just splits into master-rows and detail-rows.. it seems to happen simultaneously.. so wouldn't we have a race condition where child recs with the foreign key could get inserted before the parent recs... and so cause a ref integrity error?

Also, I can't see how to start and roll back transactions, or exit straight out of a package when some error occurs...

Please bear in mind that I am still learning this SSIS stuff. I have the Wrox book and am working through it.. but, as is always the case, I'm under pressure to get something done... something which the book just doesn't seem to cover... i.e. parent-child related data input...

I do really appreciate any time you (or anyone) spends reading my posts and helping...

PJ

|||

PJFINTRAX wrote:

I suppose I could use the fact that the first eight chars all digits... I was hoping to just be able to say.. this is the First Row in the file.. so do this...

You can do this in a script component if you really want to; I'm not sure it's necessary though. This should help: http://blogs.conchango.com/jamiethomson/archive/2005/07/27/1877.aspx

PJFINTRAX wrote:

I dunno.. think I'm having some trouble getting my head around the SSIS way of thinking... being a bit too procedural in my approach...

Like.. I have in my head a procedural way that i think this should be done.. but can't quite map it to an SSIS package...

Maybe you could look at that with me?

For instance I feel that the steps I'd need to carry out to load this file are:

1. Read first line and create a FileLoaded rec in the FileLoaded table. Get an ID for this FileLoaded rec.

Fairly easy to do as I think we've already covered. Once you insert the record you can get the ID that you just created using an Execute SQL Task.

PJFINTRAX wrote:

If there is an error inserting the FileLoaded rec then report error (via email and insert of Error rec to LoadErrors table)

SendMail Task will do the emailing.

OnError precedence constraint or the OnError eventhandler can do the error reporting. Personally I would use the OnError eventhandler because you get more information about the error.

PJFINTRAX wrote:

and exit.

OnSuccess/OnError precedence constraints handle this

PJFINTRAX wrote:

2. for next bunch of lines: insert FileLine rec in File Line table. Use FileLoaded ID as foreign key.

FileLoadedID is, I presume, the record you created in the previous step? As stated, you can get hold of this value using an Execute SQL Task.

PJFINTRAX wrote:

If there's any error then report error

Already covered I think...

PJFINTRAX wrote:

.. rollback all inserts.. also rollback insert of FileLoaded rec...

SSIS supports MSDTC transactions: http://blogs.conchango.com/jamiethomson/archive/2004/12/14/456.aspx

PJFINTRAX wrote:

report error (detailing line of file that problem was encountered at)

That will be a bit harder to do, given that SSIS doesn't natively support row numbering within the pipeline (it would be quite nice if it did). You can create your own numbering as explained here: http://www.sqlis.com/default.aspx?37

PJFINTRAX wrote:

3. When END line is reached, commit transaction.

Transactions again

PJFINTRAX wrote:

Report success (via email

SendMail Task

PJFINTRAX wrote:

and via insert of record in some Logging table).

Lots of ways of doing this. I would use the OnPostExecute eventhandler!

PJFINTRAX wrote:

4. Copy file to archive.

FileSystem task will do this for you.

PJFINTRAX wrote:

I'm kinda stumped as to how to do this, especially how to FIRST insert a parent (FileLoaded) record BEFORE starting to insert child (FileLine) records.

I'm currently working on a blog post about this very thing, so look out for that in the next few days (http://blogs.conchango.com/jamiethomson). It's fairly easy; basically you need 2 data-flows. Insert the FileLoaded record in the first data-flow, pass the child records to the second data-flow using a raw file.... and then insert them. Easy-peasy!

PJFINTRAX wrote:

The sample on SQLIS just splits into master-rows and detail-rows.. seems to happen simultaneously.. so would we not have a race condition where child recs with the foreign key could get inserted before the parent recs ... and so cause a ref integrity error?

Correct. See my previous comment. Raw files are a big help here. I use raw files all over the place in my implementations - they're incredibly useful.

PJFINTRAX wrote:

Also can't see how to start and roll back transactions and exit straight out of a package when some error occurs...

Please bear in mind that I am still learning this SSIS stuff, I have the Wrox book and am working through it.. but, as is always the case, am under pressure to get something done... something which the book just doesn't seem to cover... ie parent-child related data input...

I do really appreciate any time you (or anyone) spends reading my posts and helping...

PJ

Hope this has helped!!!

-Jamie

|||

Jamie,

thank you very very much for taking the time to go through my post.

I'll go and read all of those references you have given me and hopefully will be back to report some progress!

Thanks again,

PJ

|||

Jamie,
I'm kinda stuck for time on this so I have been trying to do a few bits of it... I'm not sure how to use a raw file to pause one dataflow until the other has completed.... could you give a very short description to enlighten me a bit further?

But anyway.. I think I have the bit about picking out the first record cracked....

I just pumped the file into a script task that added line numbers to the lines of the file... and then did a conditional split based on the line number... if the line num is 1 then I have one output (for the parent rec) and if it's not = 1 then I have another output (for child records)...

Wish I could paste a picture of the dataflow in here...sigh...

Anyway, the script in the script block transform that adds line numbers is really simple:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Static keeps its value across calls, so this counts every row in the file
        Static LineCounter As Integer = 0
        LineCounter += 1
        ' Emit one row to the "NumberedRows" output: the running line
        ' number plus the original, unparsed line from the file
        Me.NumberedRowsBuffer.AddRow()
        Me.NumberedRowsBuffer.LineNum = LineCounter
        Me.NumberedRowsBuffer.LineData = Row.Column0
    End Sub

End Class

So the input file gets pumped through this script.. two fields come out: one is the whole line as input from the file, and the other is a line number...

This then goes into the conditional split...
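For what it's worth, the two split conditions can be plain SSIS expressions over the columns the script produces (LineNum and LineData; the "END" test for the checksum row is an assumption based on the sample file earlier in the thread):

ParentRecord:   LineNum == 1
ChildRecord:    LineNum > 1 && SUBSTRING(LineData, 1, 3) != "END"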

When the conditional split sends out a parent rec, I wanted to insert a rec into the parent table and get back a parent ID....

But I've come up stumped using an OLE DB Command to insert the parent record into the parent table.. I can't figure out how to get return values out of a stored proc called via an OLE DB Command....

Is this something I'm supposed to be able to do?

Thanks in advance,
PJ
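Following up for the archive: an OLE DB Command maps parameters positionally as question marks, so one commonly suggested route is a stored proc with an OUTPUT parameter, called with something like EXEC dbo.InsertFileLoaded ?, ?, ? OUTPUT in the SqlCommand property. A minimal sketch of such a proc (table and column names are hypothetical):

CREATE PROCEDURE dbo.InsertFileLoaded
    @LoadDate datetime,
    @Supplier varchar(50),
    @FileLoadedID int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.FileLoaded (LoadDate, Supplier)
    VALUES (@LoadDate, @Supplier);

    -- SCOPE_IDENTITY() rather than @@IDENTITY, so a trigger can't skew the value
    SET @FileLoadedID = SCOPE_IDENTITY();
END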


Friday, February 24, 2012

Hiding Duplicate Records

Is it possible to output records in a dataset while not showing the duplicates? The dataset that I am using returns records like the following:
title1; problem1; associateddata1
title1; problem1; associateddata2
title2; problem2; associateddata1
Even though the first two rows do not look like duplicates, since they have the same title I only want to show title1 and problem1 once, then show associateddata1 & 2 at the bottom of that group. Then go to another page and show title2, problem2, and associateddata1. I tried using a table and various grouping layouts with no success. Can someone help with this?|||Yes...
Right-click on one of the table cells in the layout and click on "Don't show Duplicates".|||Sorry... I should have mentioned that I tried that. Is it not working because of the associateddata part, which keeps it from looking like an exact duplicate?
"sorcerdon@.gmail.com" wrote:
> Yes...
> Right click on one of the Table cells in the layout and Click on "Dont
> show Duplicates"
>|||Alright I just made a simple Table in AdventureSoft looking like this:
title1 problem1 data1
title1 problem1 data2
title2 problem2 data1
title2 problem2 data2
Do you want the following results:
title1 problem1 data1
data2
title2 problem2 data1
data2
Is this what you want?|||If this is what you want then you need to make 1 group: group by the title field.
Then right-click on the =Fields!title.Value field and the =Fields!problem.Value field and say no duplicates.
Doing these together shows the desired results you see above this post.|||That is exactly what I was looking for. Thanks.
"sorcerdon@.gmail.com" wrote:
> if this is what you want then you need to make 1 group. Group by the
> title field.
> Then rick click on the =Fields!title.Value field and the
> =Fields!problem.Value field and say no duplicates.
> Doing these together shows the desired results you see above this post.
>|||Glad it helped.
Regards,
Sorcerdon|||I hope you are getting a reply on this. I tried your solution and at first thought it worked, but later remembered that I had changed my query, so I only thought it was working. I'm going to attach my query and show you exactly what is coming back and how I'm displaying it on the report, to see if that helps. It seems like you outlined it the same way I'm about to, but just in case. I appreciate any help you can provide, and I apologize in advance for this being so long... just want to make sure I don't miss anything.
I have two records in the Problems table. One of them has two associated data elements and the second has none attached. So the resultset would be similar to the following (with a few columns left off for easy viewing):
Problem Table:
prob1ID; found big problem title 1; location Virginia; mike collins
prob2ID; found big problem title 2; location Maryland; mike collins
Associated Data:
Prob1ID; Data Item 1
Prob1ID; Data Item 2
The query below would return it as:
prob1ID; found big problem title 1; location Virginia; mike collins; data item 1
prob1ID; found big problem title 1; location Virginia; mike collins; data item 2
prob2ID; found big problem title 2; location Maryland; mike collins; NULL
Select t.ProblemID, t.Title, l.LocationName,
    p1.FirstName + ' ' + IsNull(p1.MiddleName + ' ', '') + p1.LastName As Originator,
    p2.FirstName + ' ' + IsNull(p2.MiddleName + ' ', '') + p2.LastName As FixedBy,
    p3.FirstName + ' ' + IsNull(p3.MiddleName + ' ', '') + p3.LastName As Screener,
    p4.FirstName + ' ' + IsNull(p4.MiddleName + ' ', '') + p4.LastName As SbjMtrExpert,
    p5.FirstName + ' ' + IsNull(p5.MiddleName + ' ', '') + p5.LastName As TestDirector,
    ProblemSource, SystemVersion, SystemBuild, Type, Category, UserPriority,
    DateProblemIdentified, SequenceOfEvents, ProblemDescription, Workaround,
    SuggestedFix, TrackingNumber, ad.AssociatedDataIdentifier
From Problems t
Join common..personnel p1 On p1.PersonnelID = t.OriginatorID
Join common..personnel p2 On p2.PersonnelID = t.FixedByID
Join common..personnel p3 On p3.PersonnelID = t.ScreenerID
Join common..personnel p4 On p4.PersonnelID = t.SubjectMatterExpertID
Join common..personnel p5 On p5.PersonnelID = t.TestDirectorID
Join Locations l On t.LocationID = l.LocationID
Left Join AssociatedDataTPRLI a On t.TPRID = a.TPRID
Left Join AssociatedData ad On a.AssociatedDataID = ad.AssociatedDataID
Now that that's done: I am displaying everything the query returns on the report, via a stored procedure. I am using a table and adding many detail rows so I can display the data looking like a form as opposed to just rows. I tried setting hide-duplicates on the title and then on every other field (with the exception of the associated data field), with no luck. Any suggestions as to what I am doing wrong?
"sorcerdon@.gmail.com" wrote:
> Glad it helped.
> Regards,
> Sorcerdon
>|||hi mike
right click the text box, click properties, click on visibility tab. select
Expression radio button and click on the function button. enter this
expression there
=(Previous(Fields!PK_Field1.Value) = Fields!PK_Field1.Value)
replace PK_Field1 with the field name that determines the duplicate value
for you. repeat the same step for all the text boxes that where you want to
suppress the duplicates.
"Mike Collins" wrote:
> I hope you are getting a reply on this. I tried you solution and at first
> thought it worked, but later remembered that I had changed my query. So I
> only thought it was working. I'm going to try attaching my query and showing
> you exactly what is coming back and how I'm displaying it on a report, to see
> if this helps. Course it seemed like you outlined it the same as I'm about
> to, but just in case it helps. I appreciate any help you can provide. I
> apologize in advance for this being so long...just want to make sure I don't
> miss anything.
> I have two records in the Problems table. Of the two records in the problems
> table, one of the records has two associated data elements and the second
> record has none attached. So the resultset would be similar to the following
> (with a few rows left off for easy viewing):
> Problem Table:
> prob1ID; found big problem title 1; location Virginia; mike collins
> prob2ID; found big problem title 2; location Maryland; mike collins
> Associated Data:
> Prob1ID; Data Item 1
> Prob1ID; Data Item 2
> Below query would return it as:
> prob1ID; found big problem title 1; location Virginia; mike collins; data
> item 1
> prob1ID; found big problem title 1; location Virginia; mike collins; data
> item 1
> prob1ID; found big problem title 2; location Maryland; mike collins; data
> item 2
> Select t.ProblemID, t.Title, l.LocationName,
> p1.FirstName + ' ' + IsNull(p1.MiddleName + ' ', '') + p1.LastName As
> Originator,
> p2.FirstName + ' ' + IsNull(p2.MiddleName + ' ', '') + p2.LastName As FixedBy,
> p3.FirstName + ' ' + IsNull(p3.MiddleName + ' ', '') + p3.LastName As
> Screener,
> p4.FirstName + ' ' + IsNull(p4.MiddleName + ' ', '') + p4.LastName As
> SbjMtrExpert,
> p5.FirstName + ' ' + IsNull(p5.MiddleName + ' ', '') + p5.LastName As
> TestDirector,
> ProblemSource, SystemVersion, SystemBuild, Type, Category, UserPriority,
> DateProblemIdentified, SequenceOfEvents, ProblemDescription, Workaround,
> SuggestedFix, TrackingNumber, ad.AssociatedDataIdentifier
> From Problems t
> Join common..personnel p1 On p1.PersonnelID = t.OriginatorID
> Join common..personnel p2 On p2.PersonnelID = t.FixedByID
> Join common..personnel p3 On p3.PersonnelID = t.ScreenerID
> Join common..personnel p4 On p4.PersonnelID = t.SubjectMatterExpertID
> Join common..personnel p5 On p5.PersonnelID = t.TestDirectorID
> Join Locations l On t.LocationID = l.LocationID
> Left Join AssociatedDataTPRLI a On t.TPRID = a.TPRID
> Left Join AssociatedData ad On a.AssociatedDataID = ad.AssociatedDataID
> Now that is done, I am displaying everything you see the query returning on
> the report using a stored procedure. I am using a table and adding many
> detail rows so I can display the data looking like a form as opposed to just
> rows. I tried setting the hide duplicate rows on the title and then on every
> other field (with the exception of the associated data field) with no luck.
> Any suggestions to what I am doing wrong?
> "sorcerdon@.gmail.com" wrote:
> > Glad it helped.
> >
> > Regards,
> >
> > Sorcerdon
> >
> >
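Applied to the query above, that presumably means putting something like this on the title and problem textboxes, with ProblemID as the field that defines a duplicate:

=(Previous(Fields!ProblemID.Value) = Fields!ProblemID.Value)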