ELIMINATING CURSORS IN HANA
INTRODUCTION:
Use of cursors in HANA degrades the performance a lot as it performs operation on single record at a time due to which the read and write on table happens more number of times. An alternate to this is to fetch all the records at one stretch and store it in a temporary table type and perform the calculation together. I would like to share a few complex use cases where we eliminated the use of cursors.
USE CASE 1:Reads data by fetching the most relevant record and does update of the most relevant record based on the condition
PROBLEM STATEMENT: We need to select candidates for air Force. There are 2 levels of exams where a single candidate has 2 attempts to clear each level.
Conditions are:
1. There are 3 measure based on which a candidate is tagged ideal for that level
Level 1: Marks >= 70, Age >= 25, weight >= 70
Level 2: Mark >= 90, Age >= 30, weight >= 75
The order of importance of the measures is (in descending order):
- Marks
- Age
- Weight
2. If the candidate has passed level 2 then his job title should be updated as ‘Pilot’ and should not be considered in level 1 selection. And if he has not passed then check if he has passed in level 1, if so then update his job title as ‘Co-Pilot’.
PERFORMANCE IMPROVEMENTS:
Before :
After:
Table :1 MEASURE [row-count :1,00,000 approx.]
ROLL_NO | NAME | WEIGHT | AGE | MARKS | CATEGORY | LEVEL |
2001 | Vignesh | 75 | 30 | 60 | B | 2 |
2002 | Prachi | 75 | 30 | 90 | B | 2 |
1001 | Vignesh | 70 | 25 | 70 | A | 1 |
1001 | Srivatsan | 70 | 25 | 80 | A | 1 |
Table 2 : IDEAL
CATEGORY | LEVEL | WEIGHT | AGE | MARKS |
B | 2 | 75 | 30 | 90 |
A | 1 | 70 | 25 | 70 |
Table 3 : ELIGIBILITY
NAME | ELIGIBLE |
Prachi | Y |
Vignesh | Y |
Srivatsan | Y |
Table 4: JOB_TABLE[row-count :1,00,000 approx.]
NAME | JOB_TITLE |
Prachi | Pilot |
Vignesh | Co-Pilot |
Srivatsan | Co-Pilot |
Bottle necks:
- If the candidate has already been tagged as ‘Pilot’ in category B then he shouldn’t be considered for evaluation in Level 1 which can be done using a NOT IN check but it again degrades the performance.
- The data should come in the sequence with the highest relevant score of the candidate getting updated for that particular level.
------CURSOR ELIMINATED
-- TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 1 AND LEVEL 2
-- TEMPORARY TABLE VARIABLE THAT HAS ALL THE ELIGIBLE CANDIDATES FOR LEVEL 2
-- WE DO A OUTER JOIN TO AVOID SELECTION OF CANDIDATE WHO CLEARED LEVEL 2
-- SELECT ONLY THE RECORDS WHERE 'DEL' FLAG IS NULL SO THAT WE SELECT THE CANDIDATES WHO HAS NOT CLEARED LEVEL 2 BUT ELIGIBLE FOR EVALUATION IN LEVEL 1
CONCLUSION:
- Assign cursor s to table variables and then process the logic of the cursor at once.
- Cartesian product as a result of inner Join can be used to get all the records that are processed through nested loop.
- We can eliminate NOT IN conditions by setting a ‘DEL’ flag in table A and then do a left outer join with table B and then select the only records having flag as NULL which ensures the records are not selected again in table B this method improves the performance drastically when you are performing a NOT IN condition on a large data set.