Wednesday, March 7, 2012

Hierarchy - Finding loops in the data

I've spent hours trying to find the answer to this with no luck. One
of our systems tracks connections between objects. If a user enters an
object that is connected to to another object that is connect to its
parent object, we have problems.
We have many diffent type of objects, so I will try to give a simple
example. If we have a meter "Meter A" that is connected to a tank
"Tank A" and that tank is connected to another meter "Meter B" which is
connected to "Tank B". The problem arises when "Tank B" is connected
to "Meter A"
Meter A <-> Tank A <-> MeterB <--> Tank B <-//-> Meter A
As you can see, there is a "loop" in connectivity. The data is store
as hierarchial data as upstream and downstream attributes. Using
hierarchial queries, I've been able to retrieve the data in a
hierarachial representation, but my goal is to determine if there is a
loop in the data. Just to make it clear, its the same as the
Manager/Employee example seen everywhere except the idea is to create a
process to ensure that no manager reports to his employee.
A sample dataset is provide for your much appreciated assistance.
--DST = DownStreamType
--DID = DownStreamID
--UST = UpstreamType
--UID = UpstreamID
DROP TABLE #Conn
CREATE TABLE #Conn (DST int, DSID int, UST int, UPID int)
--Dataset connects with no loop
INSERT INTO #Conn VALUES (1,1,1,2)
INSERT INTO #Conn VALUES (1,2,1,3)
INSERT INTO #Conn VALUES (1,3,2,1)
INSERT INTO #Conn VALUES (2,1,2,2)
INSERT INTO #Conn VALUES (2,2,2,3)
INSERT INTO #Conn VALUES (2,3,3,1)
--Dataset loops (each record is connected to each other, last
--record connects back to first record.
INSERT INTO #Conn VALUES (1,10,1,20)
INSERT INTO #Conn VALUES (1,20,1,30)
INSERT INTO #Conn VALUES (1,30,2,10)
INSERT INTO #Conn VALUES (2,10,2,20)
INSERT INTO #Conn VALUES (2,20,2,30)
INSERT INTO #Conn VALUES (2,30,1,10)
TIAIf you search the web for sql+cycle+detection+graphs, you'll find some
things
that should help.
Your hierarchy representation is a bit different than most, since you
don't store the nodes anywhere, just the edges, so a "dangling" edge
is harder to spot (usually it would have NULL as the parent or child).
Here's one easy-to-implement solution that might work, though it will
probably be less efficient than other solutions for large data sets.
I also haven't taken the time to verify that it's sound, but I think it is:
repeatedly remove dead-end links from the network/table. If you end
up with nothing, there were no loops. If you end up with a sub-network
having no dead ends, there must be a loop involving the remaining
links.
-- start with all non-dead-end links
select DST, DSID, UST, UPID
into #remnants
from #Conn as C1
where exists (
select * from #Conn C2
where C2.UST = C1.DST
and C2.UPID = C1.DSID
)
while @.@.rowcount <> 0
delete from #remnants
where not exists (
select * from #remnants as R2
where R2.UST = #remnants.DST
and R2.UPID = #remnants.DSID
)
select * from #remnants
go
-- Steve Kass
-- Drew University
-- 0D2B1B93-2E53-448C-B441-61B1C5EF1F40
carmaboy@.gmail.com wrote:

>I've spent hours trying to find the answer to this with no luck. One
>of our systems tracks connections between objects. If a user enters an
>object that is connected to to another object that is connect to its
>parent object, we have problems.
>We have many diffent type of objects, so I will try to give a simple
>example. If we have a meter "Meter A" that is connected to a tank
>"Tank A" and that tank is connected to another meter "Meter B" which is
>connected to "Tank B". The problem arises when "Tank B" is connected
>to "Meter A"
>Meter A <-> Tank A <-> MeterB <--> Tank B <-//-> Meter A
>As you can see, there is a "loop" in connectivity. The data is store
>as hierarchial data as upstream and downstream attributes. Using
>hierarchial queries, I've been able to retrieve the data in a
>hierarachial representation, but my goal is to determine if there is a
>loop in the data. Just to make it clear, its the same as the
>Manager/Employee example seen everywhere except the idea is to create a
>process to ensure that no manager reports to his employee.
>A sample dataset is provide for your much appreciated assistance.
>--DST = DownStreamType
>--DID = DownStreamID
>--UST = UpstreamType
>--UID = UpstreamID
>DROP TABLE #Conn
>CREATE TABLE #Conn (DST int, DSID int, UST int, UPID int)
>--Dataset connects with no loop
>INSERT INTO #Conn VALUES (1,1,1,2)
>INSERT INTO #Conn VALUES (1,2,1,3)
>INSERT INTO #Conn VALUES (1,3,2,1)
>INSERT INTO #Conn VALUES (2,1,2,2)
>INSERT INTO #Conn VALUES (2,2,2,3)
>INSERT INTO #Conn VALUES (2,3,3,1)
>--Dataset loops (each record is connected to each other, last
>--record connects back to first record.
>INSERT INTO #Conn VALUES (1,10,1,20)
>INSERT INTO #Conn VALUES (1,20,1,30)
>INSERT INTO #Conn VALUES (1,30,2,10)
>INSERT INTO #Conn VALUES (2,10,2,20)
>INSERT INTO #Conn VALUES (2,20,2,30)
>INSERT INTO #Conn VALUES (2,30,1,10)
>TIA
>
>|||Hi there,
Right now I have to leave for home. Will post u a solution by tomorrow
evening GMT. I have worked a lot with hierarchical tables, u surely will
enjoy the solution. bye
"carmaboy@.gmail.com" wrote:

> I've spent hours trying to find the answer to this with no luck. One
> of our systems tracks connections between objects. If a user enters an
> object that is connected to to another object that is connect to its
> parent object, we have problems.
> We have many diffent type of objects, so I will try to give a simple
> example. If we have a meter "Meter A" that is connected to a tank
> "Tank A" and that tank is connected to another meter "Meter B" which is
> connected to "Tank B". The problem arises when "Tank B" is connected
> to "Meter A"
> Meter A <-> Tank A <-> MeterB <--> Tank B <-//-> Meter A
> As you can see, there is a "loop" in connectivity. The data is store
> as hierarchial data as upstream and downstream attributes. Using
> hierarchial queries, I've been able to retrieve the data in a
> hierarachial representation, but my goal is to determine if there is a
> loop in the data. Just to make it clear, its the same as the
> Manager/Employee example seen everywhere except the idea is to create a
> process to ensure that no manager reports to his employee.
> A sample dataset is provide for your much appreciated assistance.
> --DST = DownStreamType
> --DID = DownStreamID
> --UST = UpstreamType
> --UID = UpstreamID
> DROP TABLE #Conn
> CREATE TABLE #Conn (DST int, DSID int, UST int, UPID int)
> --Dataset connects with no loop
> INSERT INTO #Conn VALUES (1,1,1,2)
> INSERT INTO #Conn VALUES (1,2,1,3)
> INSERT INTO #Conn VALUES (1,3,2,1)
> INSERT INTO #Conn VALUES (2,1,2,2)
> INSERT INTO #Conn VALUES (2,2,2,3)
> INSERT INTO #Conn VALUES (2,3,3,1)
> --Dataset loops (each record is connected to each other, last
> --record connects back to first record.
> INSERT INTO #Conn VALUES (1,10,1,20)
> INSERT INTO #Conn VALUES (1,20,1,30)
> INSERT INTO #Conn VALUES (1,30,2,10)
> INSERT INTO #Conn VALUES (2,10,2,20)
> INSERT INTO #Conn VALUES (2,20,2,30)
> INSERT INTO #Conn VALUES (2,30,1,10)
> TIA
>|||Thanks GetGoing, I look forward to your answer.
Steve, Thanks for the prompt response. Howerver, I think I did not
make myself clear. I tried your solution and the results were not what
I was looking for. The first dataset sample shows that there is a set
of object connected to each other linearly (if you will). The second
set is circular with object (1,10 - first record) as the downstream
object and again (1,10 - last record) as the upstream object. My goal
is to identify either that (1,10) is causing a loop or simply identify
that there is a loop and where. If I can state this in a different way
using the Emp/Man example, hopefully it would help. If you have 5
employees, each one reporting to the next. (Mary reports to Greg who
reports to Tom who reports to Frank who reports to Sam). I'm looking
for errors that would show (Mary reports to Greg who reports to Tom who
reports to Frank who reports to Sam who reports to Greg). Since Sam is
Greg boss 4 levels down, it incorrect/impossible that Sam should report
to Greg. The reason I need this is that some processing occurs based
on this "linear" flow. If there is a looping of the data, the process
gets stuck rather then ends. Thanks again.|||Based on the illustrations in your post I think I see the source of the
problems. The hierarchical chain is missing one vital element: context.
You say that "Meter A" is connected to "Tank A" which in turn is connected
to "MeterB" which connects to "Tank B" which then connects to "Meter A". All
these connections cannot be in the same context. If "Meter A" and "Meter B"
are gauges, then the context could be described as "Pressure", and if "Tank
A" and "Tank B" are two interconnected tanks (containers), then this is a
different context - let's call it "Containment".
If context is respected, then - logically - the situation really is:
1) the "Pressure" context:
MeterA
<-- Tank A
<-- Tank B
Meter B
<-- Tank A
<-- Tank B
2) the "Containment" context:
Tank A
<-- Tank B
or
Tank B
<-- Tank A
or even
CombinedTanks
<-- Tank A
<-- Tank B
"Mixing apples and pears makes cocktails not a trees."
-- Confucius
Maybe you should do some more reading on trees and hierarchies.
ML|||I appreciate the critique ML. Actually, what your describing is one
aspect of its possible context. What these meters and tanks are doing
is measuring liquids, specifically oil and gas. As oil/gas is pumped
from the ground, its piped to a tank. Howerver, the meter is used to
measure the amount of o&g from a specific point in the ground at
specific time intervals where as the tanks are used to measure the
amount of o&g from the ground from many points (i.e meters). My
example is a simple one, but to illustrate its complexity, here is a
real sincerio. From a well, oil and/or gas is produced which flows
though a pipepine to a tank. Meters are attached to each well to
measure the amount of product is retreived. Many wells can be attached
to a single meter as well as a meter can be attached to other meters.
The pipeline continues to the tank where it stores the o&g and is used
to measure how much has been retreived over all. Many tanks can be
attached to each other as well as the tank can be attached to another
meter before going to another tank. The direction of the production
demonstrates the use of downstream and upstream as an object (Meter,
Well, Tank, Equipment). Each item can be a downstream object or
upstream object depending on how they are connected. I hope this
clears things up a little. My appologies if you are in the O&G
industry and already know this. My intention is to make my request
clear. Thanks.|||I'll assume that the graph you represent is a digraph where the relationship
points from the downstream item to the upstream one (d-->u).
The function I provided (not thoroughly tested with this particular
implementation) traverses the graph starting with an input root, and
constructs a path for each node made of all nodes leading to the current
node.
For example, if you start traversing the graph from node 1-1, when you get
to node 3-1, the path will be:
'.1-1.1-2.1-3.2-1.2-2.2-3.3-1.'
Once you have this path available, detecting a cycle is simple--if the
source node's path already contains the target node id, you have a cycle.
Here's an implementation with a UDF with a couple of tests against your
sample data. I really didn't test it thoroughly, so I hope I didn't confuse
the references to dst, dsid, ust, usid.
-- ddl and sample data
create table conn (dst int, dsid int, ust int, usid int);
-- no cycle
insert into conn values (1,1,1,2);
insert into conn values (1,2,1,3);
insert into conn values (1,3,2,1);
insert into conn values (2,1,2,2);
insert into conn values (2,2,2,3);
insert into conn values (2,3,3,1);
-- cycle
insert into conn values (1,10,1,20);
insert into conn values (1,20,1,30);
insert into conn values (1,30,2,10);
insert into conn values (2,10,2,20);
insert into conn values (2,20,2,30);
insert into conn values (2,30,1,10);
go
create function fn_getsubgraph(@.rootdst as int, @.rootdsid as int)
returns @.subgraph table
(
dst int not null,
dsid int not null,
ust int not null,
usid int not null,
lvl int not null,
path varchar(900) not null,
cycle int not null
)
as
begin
declare @.lvl as int;
set @.lvl = 1;
insert into @.subgraph
select dst, dsid, ust, usid, @.lvl,
-- constract path of nodes leading to current node
-- in the form '.t1-id1.t2-id2. ... .tn-idn.'
'.' + cast(dst as varchar(10)) + '-'
+ cast(dsid as varchar(10)) + '.'
+ cast(ust as varchar(10)) + '-'
+ cast(usid as varchar(10)) + '.' as path,
case when dst = ust and dsid = usid then 1 else 0 end as cycle
from conn
where dst = @.rootdst and dsid = @.rootdsid;
while @.@.rowcount > 0
begin
set @.lvl = @.lvl + 1;
-- insert into @.subgraph next level
insert into @.subgraph
select u.dst, u.dsid, u.ust, u.usid, @.lvl,
d.path
+ cast(u.ust as varchar(10)) + '-'
+ cast(u.usid as varchar(10)) + '.' as path,
-- cycle is detected when source path contains target node
case when d.path like '%.' +
cast(u.ust as varchar(10)) + '-'
+ cast(u.usid as varchar(10)) + '.%'
then 1 else 0 end
from @.subgraph as d
join conn as u
on d.lvl = @.lvl - 1 -- filter prev level only
and u.dst = d.ust and u.dsid = d.usid
and d.cycle = 0; -- don't pursue cyclic paths
end
return;
end
go
select * from fn_getsubgraph(1, 1);
dst dsid ust usid lvl path cycle
-- -- -- -- -- -- --
1 1 1 2 1 .1-1.1-2. 0
1 2 1 3 2 .1-1.1-2.1-3. 0
1 3 2 1 3 .1-1.1-2.1-3.2-1. 0
2 1 2 2 4 .1-1.1-2.1-3.2-1.2-2. 0
2 2 2 3 5 .1-1.1-2.1-3.2-1.2-2.2-3. 0
2 3 3 1 6 .1-1.1-2.1-3.2-1.2-2.2-3.3-1. 0
select * from fn_getsubgraph(1, 10);
dst dsid ust usid lvl path cycle
-- -- -- -- -- -- --
1 10 1 20 1 .1-10.1-20. 0
1 20 1 30 2 .1-10.1-20.1-30. 0
1 30 2 10 3 .1-10.1-20.1-30.2-10. 0
2 10 2 20 4 .1-10.1-20.1-30.2-10.2-20. 0
2 20 2 30 5 .1-10.1-20.1-30.2-10.2-20.2-30. 0
2 30 1 10 6 .1-10.1-20.1-30.2-10.2-20.2-30.1-10. 1
BG, SQL Server MVP
www.SolidQualityLearning.com
<carmaboy@.gmail.com> wrote in message
news:1123166940.315534.214460@.g44g2000cwa.googlegroups.com...
> I've spent hours trying to find the answer to this with no luck. One
> of our systems tracks connections between objects. If a user enters an
> object that is connected to to another object that is connect to its
> parent object, we have problems.
> We have many diffent type of objects, so I will try to give a simple
> example. If we have a meter "Meter A" that is connected to a tank
> "Tank A" and that tank is connected to another meter "Meter B" which is
> connected to "Tank B". The problem arises when "Tank B" is connected
> to "Meter A"
> Meter A <-> Tank A <-> MeterB <--> Tank B <-//-> Meter A
> As you can see, there is a "loop" in connectivity. The data is store
> as hierarchial data as upstream and downstream attributes. Using
> hierarchial queries, I've been able to retrieve the data in a
> hierarachial representation, but my goal is to determine if there is a
> loop in the data. Just to make it clear, its the same as the
> Manager/Employee example seen everywhere except the idea is to create a
> process to ensure that no manager reports to his employee.
> A sample dataset is provide for your much appreciated assistance.
> --DST = DownStreamType
> --DID = DownStreamID
> --UST = UpstreamType
> --UID = UpstreamID
> DROP TABLE #Conn
> CREATE TABLE #Conn (DST int, DSID int, UST int, UPID int)
> --Dataset connects with no loop
> INSERT INTO #Conn VALUES (1,1,1,2)
> INSERT INTO #Conn VALUES (1,2,1,3)
> INSERT INTO #Conn VALUES (1,3,2,1)
> INSERT INTO #Conn VALUES (2,1,2,2)
> INSERT INTO #Conn VALUES (2,2,2,3)
> INSERT INTO #Conn VALUES (2,3,3,1)
> --Dataset loops (each record is connected to each other, last
> --record connects back to first record.
> INSERT INTO #Conn VALUES (1,10,1,20)
> INSERT INTO #Conn VALUES (1,20,1,30)
> INSERT INTO #Conn VALUES (1,30,2,10)
> INSERT INTO #Conn VALUES (2,10,2,20)
> INSERT INTO #Conn VALUES (2,20,2,30)
> INSERT INTO #Conn VALUES (2,30,1,10)
> TIA
>|||There are many different ways to present the results of
loop detection. If what I provided is not "what [you were]
looking for", can you post the result set you want?
My code gives this result:
DST DSID UST UPID
-- -- -- --
1 10 1 20
1 20 1 30
1 30 2 10
2 10 2 20
2 20 2 30
2 30 1 10
You say your goal is "is to identify either that (1,10)
is causing a loop or simply identify that there is a loop
and where."
I've given you a list of connections that participate in a loop. It
doesn't make sense to say that (1,10) is the specific cause, since
nothing distinguishes (1,10) from any of the other nodes in the
loop. Loops have no beginning or end, and one could just as well
say that there is a loop that starts at (2,20) and ends at (2,20),
where the loop is
(2, 20); (2,30); (1,10); (1,20); (1,30); (2,10); (2,20)
If you want to know "where" the loop is, how would such a
question be answered, if not by listing the parts of the loop,
since loops have no distinguished location beyond their being
a set of nodes and edges.
You can list the nodes instead of the edges using my example
with this final query:
select DST as T, DSID as ID
from #remnants
union
select UST, UPID
from #remnants
The result would then be
T ID
-- --
1 10
1 20
1 30
2 10
2 20
2 30
Is that any better? It doesn't give as much information about
how the loop connects these nodes, but it is a list of nodes instead
of a list of connections.
SK
carmaboy@.gmail.com wrote:

>Thanks GetGoing, I look forward to your answer.
>Steve, Thanks for the prompt response. Howerver, I think I did not
>make myself clear. I tried your solution and the results were not what
>I was looking for. The first dataset sample shows that there is a set
>of object connected to each other linearly (if you will). The second
>set is circular with object (1,10 - first record) as the downstream
>object and again (1,10 - last record) as the upstream object. My goal
>is to identify either that (1,10) is causing a loop or simply identify
>that there is a loop and where. If I can state this in a different way
>using the Emp/Man example, hopefully it would help. If you have 5
>employees, each one reporting to the next. (Mary reports to Greg who
>reports to Tom who reports to Frank who reports to Sam). I'm looking
>for errors that would show (Mary reports to Greg who reports to Tom who
>reports to Frank who reports to Sam who reports to Greg). Since Sam is
>Greg boss 4 levels down, it incorrect/impossible that Sam should report
>to Greg. The reason I need this is that some processing occurs based
>on this "linear" flow. If there is a looping of the data, the process
>gets stuck rather then ends. Thanks again.
>
>|||Thanks very much for everyones posts. I think I am well on my way to
getting my results.
SK, thanks again. You make a great point about what creates the loop
and how to identify it. I didn't not include that sincerio in my
thought, but have now. Your help is greatly appeciated.|||As far as I can see your model is based on a doubly-linked list. The positiv
e
thing about it is the fact that it's fairly easy to find connected items, ye
t
it can lead to circular references.
I'd suggest designing the model based on a singly-linked list. This way
circular references can be avoided quite easily.
Maybe something like this:
EquipmentPiece : EquipmentInstance : ConnectedToInstance : Direction :
ConnectionType
If redesigning is out of the question, I believe Itzik has a solution that
will help you detect circular references.
ML

No comments:

Post a Comment