Hi Folks,
I'm working on something a little different at present. Given it's a bit out
of the ordinary, and it relates to some of our indexing discussions lately,
I'd appreciate any thoughts/feedback.
It's a SQL Server box that's receiving a stream of real-time data from oil
drilling rigs all over the world. It needs to perform both fast inserts (ie
10,000 per second) and also have good performance on a fixed list of query
types. The essential part of the table data that has me most intrigued looks
like this:
Well (ID of the well involved - varchar(60))
Wellbore (ID of a specific bore at a wellsite - varchar(60))
Curve (ID of a specific set of measures eg: pressures, gas levels, etc.)
Depth (float)
Time (datetime)
DataValue (float)
This data is defined in an external xsd. The hardware/operating system
requirements are basically a non-issue ie: whatever is needed can be
obtained.
The key thing is that about 100 wellbores with up to 100 curves (of sets of
data) can be active each second, so fast insert performance is critical.
Once inserted, the messiest queries involve:
1. All values for a specific curve within a range of depths
2. All values for a specific curve within a range of times
The things I'm pondering are:
1. Whether to have this in one database or to have a database per well.
There are about 1200 wells, but about 100 active concurrently.
2. Whether to store the data in a heap or a clustered index on either depth
or time.
3. Whether to use a non-clustered index for whichever of depth or time
wasn't the clustered one OR whether to store the data twice ie: once by date
and once by time.
4. Whether to find a way to limit the range of depth values (currently
float) to store them instead as a bigint, if the performance of bigint's in
indexes and range queries, etc. is better than floats.
I'd love to hear anyone's thoughts on this. We're about to start
experimenting to see what works.
Regards,
GregGreg
Is it SQL Server 2005?
Well , if it needs perfom fast inserts I think to store the data in a
heap you can get better performance for inserting , however as you said
you need to have good performance on lits of query , so CI will be a good
candidate for range queries
Actually it is really hard to suggest something , you need to test it and
get a decision
"Greg Low [MVP]" <greglow@.lowell.com.au> wrote in message
news:udZGEsj9GHA.3264@.TK2MSFTNGP04.phx.gbl...
> Hi Folks,
> I'm working on something a little different at present. Given it's a bit
> out of the ordinary, and it relates to some of our indexing discussions
> lately, I'd appreciate any thoughts/feedback.
> It's a SQL Server box that's receiving a stream of real-time data from oil
> drilling rigs all over the world. It needs to perform both fast inserts
> (ie 10,000 per second) and also have good performance on a fixed list of
> query types. The essential part of the table data that has me most
> intrigued looks like this:
> Well (ID of the well involved - varchar(60))
> Wellbore (ID of a specific bore at a wellsite - varchar(60))
> Curve (ID of a specific set of measures eg: pressures, gas levels, etc.)
> Depth (float)
> Time (datetime)
> DataValue (float)
> This data is defined in an external xsd. The hardware/operating system
> requirements are basically a non-issue ie: whatever is needed can be
> obtained.
> The key thing is that about 100 wellbores with up to 100 curves (of sets
> of data) can be active each second, so fast insert performance is
> critical. Once inserted, the messiest queries involve:
> 1. All values for a specific curve within a range of depths
> 2. All values for a specific curve within a range of times
> The things I'm pondering are:
> 1. Whether to have this in one database or to have a database per well.
> There are about 1200 wells, but about 100 active concurrently.
> 2. Whether to store the data in a heap or a clustered index on either
> depth or time.
> 3. Whether to use a non-clustered index for whichever of depth or time
> wasn't the clustered one OR whether to store the data twice ie: once by
> date and once by time.
> 4. Whether to find a way to limit the range of depth values (currently
> float) to store them instead as a bigint, if the performance of bigint's
> in indexes and range queries, etc. is better than floats.
> I'd love to hear anyone's thoughts on this. We're about to start
> experimenting to see what works.
> Regards,
> Greg
>|||Hi...
We had a simmilar problem just alitle bit scaled down.
We implemented like:
- use an autoid for PK -> heap, rows are appended in the pages, no other
indexes, no constrains, all collumns allows null.
- use EE version (sql2005) :)
- use snapshot isolation mode for querying the data. (decrease lockings)
- use for querying a second maschine wich is sync-ing the data each n
sec/min. (you can try a log shipping but we've got no time to checkit)
(decreasing lockings, free cpu).
Have phun,
Radu
"Greg Low [MVP]" <greglow@.lowell.com.au> wrote in message
news:udZGEsj9GHA.3264@.TK2MSFTNGP04.phx.gbl...
> Hi Folks,
> I'm working on something a little different at present. Given it's a bit
> out of the ordinary, and it relates to some of our indexing discussions
> lately, I'd appreciate any thoughts/feedback.
> It's a SQL Server box that's receiving a stream of real-time data from oil
> drilling rigs all over the world. It needs to perform both fast inserts
> (ie 10,000 per second) and also have good performance on a fixed list of
> query types. The essential part of the table data that has me most
> intrigued looks like this:
> Well (ID of the well involved - varchar(60))
> Wellbore (ID of a specific bore at a wellsite - varchar(60))
> Curve (ID of a specific set of measures eg: pressures, gas levels, etc.)
> Depth (float)
> Time (datetime)
> DataValue (float)
> This data is defined in an external xsd. The hardware/operating system
> requirements are basically a non-issue ie: whatever is needed can be
> obtained.
> The key thing is that about 100 wellbores with up to 100 curves (of sets
> of data) can be active each second, so fast insert performance is
> critical. Once inserted, the messiest queries involve:
> 1. All values for a specific curve within a range of depths
> 2. All values for a specific curve within a range of times
> The things I'm pondering are:
> 1. Whether to have this in one database or to have a database per well.
> There are about 1200 wells, but about 100 active concurrently.
> 2. Whether to store the data in a heap or a clustered index on either
> depth or time.
> 3. Whether to use a non-clustered index for whichever of depth or time
> wasn't the clustered one OR whether to store the data twice ie: once by
> date and once by time.
> 4. Whether to find a way to limit the range of depth values (currently
> float) to store them instead as a bigint, if the performance of bigint's
> in indexes and range queries, etc. is better than floats.
> I'd love to hear anyone's thoughts on this. We're about to start
> experimenting to see what works.
> Regards,
> Greg
>|||On 23.10.2006 02:34, Greg Low [MVP] wrote:
> Hi Folks,
> I'm working on something a little different at present. Given it's a bit out
> of the ordinary, and it relates to some of our indexing discussions lately,
> I'd appreciate any thoughts/feedback.
> It's a SQL Server box that's receiving a stream of real-time data from oil
> drilling rigs all over the world. It needs to perform both fast inserts (ie
> 10,000 per second) and also have good performance on a fixed list of query
> types. The essential part of the table data that has me most intrigued looks
> like this:
> Well (ID of the well involved - varchar(60))
> Wellbore (ID of a specific bore at a wellsite - varchar(60))
> Curve (ID of a specific set of measures eg: pressures, gas levels, etc.)
> Depth (float)
> Time (datetime)
> DataValue (float)
> This data is defined in an external xsd. The hardware/operating system
> requirements are basically a non-issue ie: whatever is needed can be
> obtained.
> The key thing is that about 100 wellbores with up to 100 curves (of sets of
> data) can be active each second, so fast insert performance is critical.
> Once inserted, the messiest queries involve:
> 1. All values for a specific curve within a range of depths
> 2. All values for a specific curve within a range of times
> The things I'm pondering are:
> 1. Whether to have this in one database or to have a database per well.
> There are about 1200 wells, but about 100 active concurrently.
What about partitioning by wellid? You should at least create a setup
with allows for as much concurrent processing as possible (i.e.
separating data physically on different disks or at least file groups
etc.) to be able to stream your data into the DB as independently as
possible.
> 2. Whether to store the data in a heap or a clustered index on either depth
> or time.
> 3. Whether to use a non-clustered index for whichever of depth or time
> wasn't the clustered one OR whether to store the data twice ie: once by date
> and once by time.
If you need purging of old data and query according to time range then a
CI with timestamp as leading column is certainly a good idea.
> 4. Whether to find a way to limit the range of depth values (currently
> float) to store them instead as a bigint, if the performance of bigint's in
> indexes and range queries, etc. is better than floats.
I would start out with floats if the data is actually float. Only go
for this type of optimization if you actually need it and it is proved
that bigint is faster.
> I'd love to hear anyone's thoughts on this. We're about to start
> experimenting to see what works.
On a more general level I am thinking that Oracle might be more suited
for this scenario because it offers several features that give you
better control. For example, you do not have a single CI on a table -
basically all indexes are ordered (but you can mimic SQL Server's CI
with an IOT). Another point is partitioning of tables, Oracle is much
more flexible there (IIRC you can do only hash partitioning on SQL 2005).
Just my 0.02EUR...
Kind regards
robert
Friday, March 23, 2012
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment