Background
We feel SDA has great potential in a number of use cases, however we wanted to see how it would perform when combining local and remote data sources, which is key for any practical use. For this particular blog, I'm using both HANA for Local(BW) & Remote(Sidecar) sandboxes that are both on revision 72, both sitting on the same physical rack but on different servers. I may follow on with a Local HANA & a remote Oracle db if I folks are interested and we can get the drivers to work for connecting to an 11g db.
Setting up SDA
Not going to go over the actual setup in this blog, as it's been well documented on saphana and in other blogs; some I've noted below..
Smart Data Access: Connecting multiple SAP HANA... | SAP HANA
Smart Data Access with HADOOP HIVE & IMPALA
To Confirm:
(Local & Remote) are both HANA Rev72
select database_name as remote_db, substr(version,0,7) as remote_db_version from sys.m_database
Demo Objective:
Find the Equipment Status from the "CV_EQUIP_GEN_NEW_STATUS" calculation view on HANA Sidecar HT2 (contains SLT replicated tables from ECC) for a small range of Equipment Numbers from the Equipment Master table "/BI0/QEQUIPMENT" in BS1 (BW on HANA).
Step 1: Add Virtual Table
Navigate to _SYS_BIC (where all activated content resides)
Scroll to find the desired view, right click and Add as Virtual Table.
Create Virtual Table in a local Schema we created for SDA Testing.
Let's execute some queries..
Query 1
Query on BS1 to remote calc view through virtual table with some filters added.
select * from "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" r_cv where r_cv.equnr like '6XR39%' -- equip range and r_cv.datbi = '99991231' -- active records only
What's being Passed to the Remote db? You can check using..
- Highlighting the Select statement and doing a Explain Plan or Visualize Plan, I'd recommend to a Explain Plan minimum just in case you don't get the remote query you expect, e.g. In this case we want to avoid materializing the entire calc view on the remote db and bringing back the entire result set.
- For this demo and what I believe is the most accurate way is to query SQL Cache on the remote database to find the actual sql statement being executed.
There's a lot more fields of interest in m_sql_plan_cache, just selecting a couple of fields for illustration.
select STATEMENT_STRING,TOTAL_CURSOR_DURATION,TOTAL_EXECUTION_TIME,TOTAL_EXECUTION_FETCH_TIME, TOTAL_PREPARATION_TIME,TOTAL_RESULT_RECORD_COUNT from m_sql_plan_cache where upper(statement_string) like '%CV_EQUIP_GEN_NEW_STATUS%' and user_name = 'SMART_BS1' -- User name from the SDA connection details order by last_execution_timestamp desc
I'll execute each statement 3 times so we can average these by dividing by 3.
(Note the equipment range record count is 10,224/3 = 3408)
This is confirmed by BS1 results window
Statement 'select * from "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" r_cv where r_cv.equnr like '6XR39%' -- ...'
successfully executed in 481 ms 263 µs (server processing time: 352 ms 300 µs)
Fetched 3408 row(s) in 2.048 seconds (server processing time: 9 ms 802 µs)
So we can see that the filtering is being successfully passed to the remote query as expected.
Query 2
Query same range, but this time apply filters to local BW equipment master table and join local to remote table in query.
select le.equipment, re.stat from sapbs1."/BI0/QEQUIPMENT" le, "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" re where le.equipment like '6XR39%' and le.equipment = re.equnr -- inner join on equipment and le.dateto = '99991231' -- active records only and re.datbi = '99991231' -- active records only
SQL Plan cache results
Statement String,
SDA was smart enough to apply the equipment range filter to the remote calc view, even though I had applied that filter to the local db table.
Query 3
(2 local tables & remote join to Calc View)
Create a local column table and insert the same equipment numbers range as previously used.
create column table sda_test.temp_equip_list (EQUIPMENT NVARCHAR(18) ); insert into sda_test.temp_equip_list select equipment from sapbs1."/BI0/QEQUIPMENT" where equipment like '6XR39%';
Now include the new column table in the same query.
select lt.equipment, re.stat from sda_test.temp_equip_list lt, sapbs1."/BI0/QEQUIPMENT" le, "SDA_TEST"."Zhana_CV_EQUIP_GEN_NEW_STATUS" re where lt.equipment = le.equipment and le.equipment = re.equnr and le.dateto = '99991231' and re.datbi = '99991231'
I was a bit weary of this one, so let's check the explain plan ahead of executing.
That's not good, the remote query has lost the crucial equipment filter and is now only filtering out the in-active records. I would have expected some remote cache or remote join type in this scenario. Let's let it roll to confirm..
Query hangs and eventually after a couple of cancel attempts, gives a rollback error on the local db.
Checking the remote to check the damage..
Ugghh, not good at all. it bombed after 56million rows, the remote calculation view can return up to 80 million active records. The sql statement string reflects the same as the explain plan above.
Query 4
Let's simplify this a bit, let's just see if having one local table and one remote table helps the remote cache and remote query join kick in. So created a new virtual table called Zhana_EQBS (4 million rows x 12 columns) and join to local sda_test.temp_equip_list (50K rows x 1 column)
select lt.equipment, re.b_werk plant, re.b_lager sloc from sda_test.temp_equip_list lt, "SDA_TEST"."Zhana_EQBS" re where lt.equipment = re.equnr
While the response time is quite impressive, all 4 million rows are still fetched from the remote query
(Note, again executed same query 3 times)
Observations in Summary:
- So based on the results above, we have to be quite selective on how we use SDA as it comes out of the box in rev 72. It has potential and there may be improvements on the way.
- I've been able to create a calculation view based on a virtual table and expose it to BW through an external view, but I haven't been able to create a Attribute or Analytical view based on virtual tables as all virtual tables get IS_COLUMN attribute set to false.
- There's a bug in Studio (again rev 72) where you can't filter on the remote schema/table. It only brings up the local tables, which is quite painful if you have a lot of activated content in _SYS_BIC for example.
- There are parameters available under configuration that may be able to help, so I'm going to open a message and engage SAP to see if we can improve the efficiency on how smart data access is suppose to work. I'm not sure if statistics have a part to play in the query optimization, but again hoping SAP folks can help out here.