Friday, May 24, 2013

No more, no less comparison


One of the queries I finally came up today is worth documenting; well at least I thought so.

Here is the scenario:

An input list of statenos is to be flagged for being same:
If and only if such an EXACT list (no more, no less statenos) is currently pending review.

The highlighted part was the most difficult to implement.

Here are the involved structures:

1.       First is a parent table that is to store merely the flagging instances and a status. ‘NEW’ for yet to be reviewed

CREATE TABLE NYEHMS_DUPLICATE_STATENOS
(
  DUPLICATION_ID     NUMBER,
  FLAGGED_BY         NUMBER,
  FLAGGED_DATE       DATE,
  STATUS_CD          VARCHAR2(10 BYTE),
  REQUEST_COMMENTS   VARCHAR2(3000 BYTE),
  REVIEW_COMMENTS    VARCHAR2(3000 BYTE),
  LAST_UPDATED_BY    NUMBER,
  LAST_UPDATED_DATE  DATE
)

2.       Second is the meat of the matter here, the child table stores a parent id and all the involved statenos in that flagging

CREATE TABLE NYEHMS_DUPLICATE_STATENOS_LIST
(
  DUPLICATION_LIST_ID  NUMBER,
  DUPLICATION_ID       NUMBER,
  ORIGINAL_STATENO     VARCHAR2(40 BYTE),
  CURRENT_STATENO      VARCHAR2(40 BYTE)
)

So the query must loop through the child table field Current_stateno and look for a complete set identified by Duplication_ID that is a one-one match with the input list.

If each member of the input list is found to be in some different sets (different duplication_id) then that must not be counted as a full match. This would be much easier but as stated above that is not what we need.

Here is the query. It makes use of Analytical functions in Oracle among others.

  1 SELECT COUNT (*)
  2     INTO l_exist_count
  3      FROM (  SELECT duplication_id, COUNT (match) match_cnt
  4                FROM (SELECT tbl.COLUMN_VALUE,
  5                             dup_list.current_stateno match,
  6                             dup_list.duplication_id
  7                        FROM    TABLE (l_stateno_num_arr) tbl
  8                             LEFT OUTER JOIN
  9                                (SELECT duplication_id,
  10                                       current_stateno,
  11                                        COUNT (DISTINCT current_stateno)
  12                                          OVER (PARTITION BY duplication_id)
  13                                          list_cnt
  14                                  FROM nyehms_new_flagged_same_vw) dup_list
  15                             ON dup_list.current_stateno = tbl.COLUMN_VALUE
  16                               AND dup_list.list_cnt = l_token_cnt)
  17            GROUP BY duplication_id)
  18     WHERE duplication_id IS NOT NULL AND match_cnt = l_token_cnt;

·         TABLE (l_stateno_num_arr) tbl  represents the input
·         nyehms_new_flagged_same_vw represents the yet to be reviewed existing flagged sets

Dissection of key parts:
1.       Line 7 to 14: First thing we need is a LEFT OUTER JOIN between the input list and the yet to be reviewed (YTBR). Even in the YTBR we are interested in only those sets that have as many distinct statenos as the input list did. (For instance upon merges 2 of the values in current_stateno under one set could end up being just the same stateno). That explains the portion from 9 to 14
2.       Line 4 to 6: We select each input stateno and if a match existed the matched stateno (calling it match) and the set (duplication_id) in which it was found. Being a Left outer, if a match wasn’t found it would just pull NULL for the match and set id fields
3.       Line 3 and 17: This result-set from previous 2 steps is now aggregated to give in which set (duplication_id) how many matches were found, the GROUP BY and COUNT(match)
4.       Line 1, 2 and 18: Finally we now check if any results from step 3 (duplication id and number of matches) exists that had as many matches as the count of our input list (line 18)
 

No comments:

Post a Comment