Queries used in the experiments of the paper "Lazy vs Eager Query Plans for Tuple-Independent Probabilistic Databases" (ICDE'09)

We analyze which of the 22 TPC-H queries can benefit from our low-level operator for exact confidence computation on tuple-independent probabilistic databases. For each TPC-H query, we consider its largest subquery without aggregations and inequality joins, but with the conf()..group by aggregation in the outermost select clause. A query with conf()..group by returns the distinct answer tuples together with their exact probability.
We consider two flavours of each of these queries: A version with the original (TPC-H) selection attributes, and a version where we drop all the selection attributes (a Boolean version). For each of the two versions, we also consider the case when the existing TPC-H functional dependencies (FDs) are used to derive a better query signature.
For each query we also state the cost for the evaluation of the aggregation conf()..group by in terms of secondary-storage sorting and scans.

The following shorthand notations are used in the query signatures:
Item: lineitem     Part: part     Sup: supplier     PS: partsupp     Nat: nation     Reg: region     Ord: orders     Cust: customer    
In the query signatures we use the names of tables to denote the variable columns that originate from the corresponding tables in the answer to the plain queries without the conf construct. The computation necessary for conf uses these variable columns and does not propagate them in the final results consisting of the distinct tuples with exact confidences.

TPC-H Query No TPC-H Query (without aggregations and theta joins) Query Signature Cost for conf()..group by
1 Non-Boolean Version
select 
	l_returnflag, 
	l_linestatus, 
	conf() 
from 
	lineitem 
where 
	l_shipdate <= date '1998-09-01' 
group by 
	l_returnflag, 
	l_linestatus;
Item* Sorting on [l_returnflag,l_linestatus] plus one scan.
TPC-H FDs do not change the signature.
Boolean Version
select 
	conf()
from 
	lineitem 
where 
	l_shipdate <= date '1998-09-01';
Item* One scan.
TPC-H FDs do not change the signature.
2 Non-Boolean Version
select
	s_acctbal,
	s_name,
	n_name,
	p_partkey,
	p_mfgr,
	s_address,
	s_phone,
	s_comment,
	conf()
from
	part,
	supplier,
	partsupp,
	nation,
	region
where
	p_partkey = ps_partkey
	and s_suppkey = ps_suppkey
	and p_size = 15
	and p_type like '%BRASS'
	and s_nationkey = n_nationkey
	and n_regionkey = r_regionkey
	and r_name = 'EUROPE'
group by
	s_acctbal,
	s_name,
	n_name,
	p_partkey,
	p_mfgr,
	s_address,
	s_phone,
	s_comment;
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}.  
Under TPC-H FDs: Part(Reg(Nat(Sup(PS)*)*)) Sorting on [s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment, Part, Reg, Nat, Sup] plus one scan.
The Boolean version is not hierarchical: PS has no key.    
3 Non-Boolean Version
select
	l_orderkey,
  	o_orderdate,
   	o_shippriority,
	conf()
from
  	customer,
   	orders,
   	lineitem
where
  	c_mktsegment = 'BUILDING'
   	and c_custkey = o_custkey
   	and l_orderkey = o_orderkey
   	and o_orderdate < date '1992-01-10'
   	and l_shipdate > date '1992-01-10'
group by
   	l_orderkey,
   	o_orderdate,
 	o_shippriority;
(Cust*(Ord*(Item)*)*)* Sorting on [l_orderkey, o_orderdate, o_shippriority, Cust, Ord] plus three scans.
Under TPC-H FDs: Cust(Ord(Item)*). Sorting on [l_orderkey, o_orderdate, o_shippriority, Cust, Ord] plus one scan.
Boolean Version
select
	conf()
from
  	customer,
  	orders,
  	lineitem
where
  	c_mktsegment = 'BUILDING'
   	and c_custkey = o_custkey
 	and l_orderkey = o_orderkey
   	and o_orderdate < date '1992-01-10'
 	and l_shipdate > date '1992-01-10';
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}.  
Under TPC-H FDs: (Cust(Ord(Item*))*)* Sorting on [Cust, Ord] plus one scan.
4 Non-Boolean Version
select
	o_orderpriority,
	conf()
from
	orders,
	lineitem
where
	o_orderdate >= date '1993-07-01'
	and o_orderdate < date '1993-10-01'
	and l_orderkey = o_orderkey
	and l_commitdate < l_receiptdate
group by
	o_orderpriority;

(Ord*Item*)* Sorting on [o_orderpriority, Ord] plus two scans.
Under TPC-H FDs: (Ord(Item)*)*. Sorting on [o_orderpriority, Ord] plus one scan.
Boolean Version
select
	conf()
from
	orders,
	lineitem
where
	o_orderdate >= date '1993-07-01'
	and o_orderdate < date '1993-10-01'
	and l_orderkey = o_orderkey
	and l_commitdate < l_receiptdate;
(Ord*Item*)* Sorting on [Ord] plus two scans.
Under TPC-H FDs: (Ord(Item)*)*. Sorting on [Ord] plus one scan.
6 Non-Boolean Version
The same as the Boolean version.
   
Boolean Version

select
	conf()
from
	lineitem
where
	l_shipdate >= '1994-01-01'
	and l_shipdate < '1995-01-01'
	and l_discount >= 0.05 
	and l_discount <= 0.07
	and l_quantity < 24;
Item* One scan.
TPC-H FDs do not change the signature.
7 Non-Boolean Version
select
	s_suppkey,
	n1.n_name,
	conf()	
from
	supplier,
	lineitem,
	orders,
	customer,
	nation n1,
	nation n2
where
	s_suppkey = l_suppkey
	and o_orderkey = l_orderkey
	and c_custkey = o_custkey
	and s_nationkey = n1.n_nationkey
	and c_nationkey = n2.n_nationkey
	and ((n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            or (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY'))
	and l_shipdate between '1995-01-01' and '1996-12-31'
group by 
	s_suppkey,
	n1.n_name;
(Nat1*(Sup*(Nat2*(Cust*(Ord*Item*)*)*)*)*)*
Remark: Self-join on Nation, but the two copies of Nation use distinct tuples to create an answer tuple.
Sorting on [s_suppkey, n1.n_name, Nat1, Sup, Nat2, Cust, Ord] plus six scans.
Under TPC-H FDs: Nat1(Sup(Nat2(Cust(Ord(Item)*)*)*)*) Sorting on [s_suppkey, n1.n_name, Nat1, Sup, Nat2, Cust, Ord] plus one scan.
The Boolean version is not hierarchical.    
10 Non-Boolean Version

select
	c_custkey,
	c_name,
	c_acctbal,
	n_name,
	c_address,
	c_phone,
	c_comment,
	conf()
from
	customer,
	orders,
	lineitem,
	nation
where    
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
 	and o_orderdate >= '1993-01-01' 
 	and o_orderdate < '1993-1-10'
  	and l_returnflag = 'N'
  	and c_nationkey = n_nationkey
group by
	c_custkey,
	c_name,
	c_acctbal,
	n_name,
	c_address,
	c_phone,
	c_comment;
(Nat*(Cust*(Ord*Item*)*)*)* Sorting on [c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, Nat, Cust, Ord] plus four scans.
Under TPC-H FDs: Nat(Cust(Ord(Item)*)*). Sorting on [c_custkey, c_name, c_acctbal, n_name, c_address, c_phone, Nat, Cust, Ord] plus one scan.
Boolean Version
select
	conf()
from
	customer,
	orders,
	lineitem,
	nation
where    
	c_custkey = o_custkey
	and l_orderkey = o_orderkey
 	and o_orderdate >= '1993-01-01' 
 	and o_orderdate < '1993-1-10'
  	and l_returnflag = 'N'
  	and c_nationkey = n_nationkey;
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}.  
Under TPC-H FDs: (Nat(Cust(Ord(Item)*)*)*)* Sorting on [Nat, Cust, Ord] plus one scan.
11 Non-Boolean Version
select
	ps_partkey,
	conf()
from
	partsupp,
	supplier,
	nation
where
	ps_suppkey = s_suppkey
	and s_nationkey = n_nationkey
	and n_name = 'GERMANY'
group by
	ps_partkey;
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}.  
Under TPC-H FDs: (Nat(Sup(PS)*)*)*. Sorting on [ps_partkey, Nat, Sup] plus one scan.
Boolean Version
select
	conf()
from
	partsupp,
	supplier,
	nation
where
	ps_suppkey = s_suppkey
	and s_nationkey = n_nationkey
	and n_name = 'GERMANY';
No hierarchy between subgoals(suppkey)={PS,Sup} and subgoals(nationkey)={Nat,Sup}.  
Under TPC-H FDs: (Nat(Sup(PS)*)*)*. Sorting on [ps_partkey, Nat, Sup] plus one scan.
12 Non-Boolean Version
select
	l_shipmode,
	conf()
from
	orders,
	lineitem
where
	orders.o_orderkey = lineitem.l_orderkey
  	and (l_shipmode = 'MAIL' 
		or l_shipmode = 'SHIP')
   	and l_commitdate < l_receiptdate
   	and l_shipdate < l_commitdate
   	and l_receiptdate >= '1992-01-01'
   	and l_receiptdate < '1999-01-01'
group by
	l_shipmode;

(Ord*(Item)*)* Sorting on [l_shipmode, Ord] plus two scans.
Under TPC-H FDs: (Ord(Item)*)*. Sorting on [l_shipmode, Ord] plus one scan.
Boolean Version
select
	conf()
from
	orders,
	lineitem
where
	orders.o_orderkey = lineitem.l_orderkey
  	and (l_shipmode = 'MAIL' 
		or l_shipmode = 'SHIP')
   	and l_commitdate < l_receiptdate
   	and l_shipdate < l_commitdate
   	and l_receiptdate >= '1992-01-01'
   	and l_receiptdate < '1999-01-01';
(Ord*(Item)*)* Sorting on [l_shipmode, Ord] plus two scans.
Under TPC-H FDs: (Ord(Item)*)*. Sorting on [l_shipmode, Ord] plus one scan.
14 Non-Boolean Version
The same as the Boolean version.
   
Boolean Version
select
    conf()
from
    lineitem,
    part
where
    l_partkey = p_partkey
    and l_shipdate >= date '1995-09-01'
    and l_shipdate < date '1995-10-01';
(Part*Item*)* Sorting on [Part] plus two scans.
Under TPC-H FDs: (Part Item*)* Sorting on [Part] plus one scan.
15 Non-Boolean Version
select
    s_suppkey,
    s_name,
    s_address,
    s_phone,
    conf()
from
    supplier,
    lineitem
where
    s_suppkey = l_suppkey
    and l_shipdate >= date '1991-10-10'
    and l_shipdate < date '1992-01-10'
group by
    s_suppkey,
    s_name,
    s_address,
    s_phone;
(Sup*Item*)* Sorting on [s_suppkey, s_name, s_address, s_phone, Sup] plus two scans.
Under TPC-H FDs: Sup(Item)*. Sorting on [s_suppkey, s_name, s_address, s_phone, Sup] plus one scan.
Boolean Version
select
    conf()
from
    supplier,
    lineitem
where
    s_suppkey = l_suppkey
    and l_shipdate >= date '1991-10-10'
    and l_shipdate < date '1992-01-10';
(Sup*Item*)* Sorting on [Sup] plus two scans.
Under TPC-H FDs: (Sup(Item)*)* Sorting on [Sup] plus one scan.
16 Non-Boolean Version
select
    p_brand,
    p_type,
    p_size,
    conf()
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand = 'Brand#45'
    and p_type like 'MEDIUM POLISHED%'
    and p_size in ( 49, 14, 23, 45, 19, 3, 36, 9 )
group by
    p_brand,
    p_type,
    p_size;
(Part*PS*)* Sorting on [p_brand, p_type, p_size, Part] plus two scans.
Under TPC-H FDs: (Part(PS)*)*. Sorting on [p_brand, p_type, p_size, Part] plus one scan.
Boolean Version
select
    conf()
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45'
    and p_type like 'MEDIUM POLISHED%'
    and p_size in ( 49, 14, 23, 45, 19, 3, 36, 9 );
(Part*PS*)* Sorting on [Part] plus two scans.
Under TPC-H FDs: (Part(PS)*)* Sorting on [Part] plus one scan.
17 Non-Boolean Version
The same as the Boolean version.
   
Boolean Version
select
    conf()
from
    lineitem,
    part
where
    p_partkey = l_partkey
    and p_brand = 'Brand#23'
    and p_container = 'MED BOX';
(Part*Item*)* Sorting on [Part] plus 2 scans.
Under TPC-H FDs: (Part(Item)*)* Sorting on [Part] plus 1 scans.
18 Non-Boolean Version
select
    c_custkey,
    c_name,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    conf()
from
    customer,
    orders,
    lineitem
where
    c_custkey = o_custkey
    and o_orderkey = l_orderkey
    and c_name = 'Customer#000148949'
group by
    c_custkey,
    c_name,
    o_orderkey,
    o_orderdate,
    o_totalprice;
(Cust*(Ord*Item*)*)* Sorting on [c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice, Cust, Ord] plus three scan.
Under TPC-H FDs: (Cust(Ord(Item)*)*)* Sorting on [c_custkey, c_name, o_orderkey, o_orderdate, o_totalprice, Cust, Ord] plus one scan.
Boolean Version
select
    conf()
from
    customer,
    orders,
    lineitem
where
    c_custkey = o_custkey
    and o_orderkey = l_orderkey
    and c_name = 'Customer#000148949';
No hierarchy between subgoals(orderkey)={Ord,Item} and subgoals(custkey)={Ord,Cust}.  
Under TPC-H FDs: (Cust(Ord(Item)*)*)* Sorting on [Cust, Ord] plus one scan.
19 Non-Boolean Version
The same as the Boolean version.

   
Boolean Version
select
    conf()
from
    lineitem,
    part
where
    p_partkey = l_partkey and (
    (       p_brand = 'Brand#12'             
        and p_container in ('SM CASE','SM BOX','SM PACK','SM PKG')
        and l_quantity >= 1 and l_quantity <= 1 + 10
        and p_size between 1 and 5
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON' )
    or (     
            p_brand = 'Brand#23'
        and p_container in ('MED BAG','MED BOX','MED PKG','MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON')
    or (
            p_brand = 'Brand#34'
        and p_container in ('LG CASE','LG BOX','LG PACK','LG PKG')
        and l_quantity >= 20 and l_quantity <= 20 + 10
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'));
(Part*Item*)*
Remark: The three clauses in the disjunction are mutually exclusive.
Sorting on [Part] plus two scans.
Under TPC-H FDs: (Part(Item)*)* Sorting on [Part] plus one scan.
20 Non-Boolean Version
select
    s_suppkey,
    s_name,
    s_address,
    conf()
from
    supplier,
    nation,
    partsupp,
    part
where
    s_suppkey = ps_suppkey
    and p_partkey = ps_partkey
    and p_name like 'forest%'
    and s_nationkey = n_nationkey
    and n_name = 'CANADA'
group by
    s_suppkey,
    s_name,
    s_address;
(Nat*Sup*)* Sorting on [s_suppkey, s_name, s_address, Nat] plus four scans.
Under TPC-H FDs: NatSup Sorting on [s_suppkey, s_name, s_address, Nat] plus one scan.
The Boolean version is not hierarchical.    
21 Non-Boolean Version
select
    s_suppkey,
    conf()
from
    supplier,
    lineitem,
    orders,
    nation
where
    s_suppkey = l_suppkey
    and o_orderkey = l_orderkey
    and o_orderstatus = 'F'
    and l_receiptdate > l_commitdate
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA'
group by
    s_suppkey;
(Nat*(Sup*(Ord*Item*)*)*)* Sorting on [s_suppkey, Nat, Sup, Ord] plus four scans.
Under TPC-H FDs: Nat(Sup(Ord(Item)*)*) Sorting on [s_suppkey, Nat, Sup, Ord] plus one scan.
The Boolean version is not hierarchical.    

Last update: Jul 2009.