
Technical FAQs on Business Rules on HANA


What is Business Rules on HANA?
Business Rules on HANA is an offering in SAP HANA to author, generate and execute rules. Rules are designed as decision tables in SAP HANA Studio; activating a decision table generates a SQLScript procedure, which is the runtime artifact of the decision table, and this procedure is then used in applications to dramatically shorten decision-making time. For more information refer to the blog series – Big-Data Decision-Making made better with Business Rules in SAP HANA

 

 

How is Business Rules on HANA aligned with Business Rules Management Systems?
In Business Rules on HANA, the decision table is the rule artifact used to capture the core decision logic in a tabular structure that is quick to read, clean and understandable. The power lies in the SAP HANA database-based rules engine, which uses in-memory technology for quicker real-time decisions. In line with BRMS concepts, the tool to define and manage rules is SAP HANA Studio, and the runtime environment to invoke rules is the SQLScript procedure generated once the rules are activated in HANA Studio. The lifecycle of the rules is managed and maintained by SAP HANA.



What is the runtime artifact for Business Rules on HANA?
A SQLScript procedure. This procedure is generated when the decision table is activated from SAP HANA Studio and resides at the following location in HANA Studio: <HANA System>/Catalog/_SYS_BIC/Procedures/<package-name>/<decisiontable-name>



Which engine in SAP HANA is used to execute Business Rules?
The calculation engine executes the SQLScript procedure generated from the decision table. The calculation engine is optimized for complex calculations, SQLScript, MDX queries and planning engine operators. All these programming models are translated into a common representation called a "calculation model"; these models are optimized by a rule-based model optimizer, and the optimized model is finally executed by the calculation engine executor.

 


Is pure SQL faster than a Decision Table?
A decision table is generated into a SQLScript procedure, which is the runtime artifact of Business Rules on HANA. This procedure is executed by the calculation engine, which is optimized for high performance. Pure SQL, on the other hand, is optimized and executed by the SQL processor, while the SQLScript procedure is by default executed by the calculation engine. In certain situations the SQL processor can be faster than the calculation engine.



What is the overall architecture of Business Rules on HANA?


Overall.jpg


 

How does Business Rules on HANA scale with an increase in database rows?
Decision table execution time scales roughly linearly with the number of database records. See the blog on performance for how a decision table performs in SELECT and UPDATE SQL statements - Performance Studies of Decision Table in SAP HANA

 

 

What are the use cases for Business Rules on HANA?
Business Rules on HANA can be used in industries where decisions are subject to frequent change: in banking for relationship-based pricing, credit decisioning, score cards and so on; in insurance for new products, claims and agent commission management; or in healthcare for fraud detection. Two use-case studies from the retail industry can be found in the blog - Big-Data Decision-Making made better with Business Rules in SAP HANA

 


When is a Column View generated as part of Decision Table activation?
A decision table can be modeled in several ways. The data foundation can be based on (a) a single physical table, (b) multiple physical tables with a JOIN, (c) table types, or (d) an information model such as an attribute view, analytic view or calculation view. Depending on the type of data foundation, an additional column view or result view is generated together with the basic SQLScript procedure.
For case (b), where a join operator (inner, right outer or left outer) is used in the data foundation, a column view is generated and then used in the final SQLScript procedure. This column view can be found at <HANA System>/Catalog/_SYS_BIC/Column Views/<package-name>/<decisiontable-name>CV
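Since the generated column view is a regular catalog object, it can be queried directly. A minimal sketch, using hypothetical package and decision table names (demo.rules and DISCOUNT_RULES) that are not taken from this blog:

-- Hypothetical names; the view follows the <decisiontable-name>CV convention above.
SELECT * FROM "_SYS_BIC"."demo.rules/DISCOUNT_RULESCV";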

 

 


When is a Result View generated as part of Decision Table activation?
Decision tables that are modeled with ONLY parameters as actions return a table type when activated. The generated SQLScript procedure is a read-only procedure that uses this table type as its OUTPUT. To consume such a decision table, you use the Result View, which is simply a column view that internally consumes the SQLScript procedure. This Result View can be found at -
<HANA System>/Catalog/_SYS_BIC/Column Views/<package-name>/<decisiontable-name>RV
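The Result View can likewise be queried like any other column view; a sketch with the same hypothetical names as above:

-- Hypothetical names; the view follows the <decisiontable-name>RV convention above.
SELECT * FROM "_SYS_BIC"."demo.rules/DISCOUNT_RULESRV";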

 



Why do I not find a Result View even when the Decision Table has parameters as actions?
A Result View is generated only when parameters alone are used as actions. If any other attribute is also used as an action, no Result View is generated.


'Connect by' like functionality using User Defined Table Functions


Problem statement

 

Data looks like this

 

aField | bField
-------|-------
  1    | (null)
  2    |   1
  3    |   1
  4    |   2
  5    |   2
  6    |   2
  7    |   3
  8    |   3
  9    |   3


the output needs to look like this (displaying only aField values):

Level 1 | Level 2 | Level 3
   1    |    2    |    4
   -    |    3    |    5
   -    |    -    |    6
   -    |    -    |    7
   -    |    -    |    8
   -    |    -    |    9

Solution Explained

 

The solution has two parts
  1. Tag each row with the 'level' it is on - we will use a User Defined Table function for this
  2. Use the level to 'pivot' the table on its side - We will use SQL Query with table views

 

Part 1: Tagging each row with a level
  1. Identify the rows whose 'aField' value never occurs in 'bField' (e.g. values 4 to 9 never occur in bField)
  2. Tag the rows identified in step 1 with a level (e.g. 1)
  3. Once tagged, remove those rows from the reckoning (e.g. remove rows where the aField value is between 4 and 9, leaving rows with values 1..3 in aField)
  4. Loop over steps 1 to 3 until bField has only null values remaining (e.g. row 1)
  5. Union all tagged rows and return

 

Output

AFIELD | BFIELD | LEVEL
   1   |   ?    |   3
   2   |   1    |   2
   3   |   1    |   2
   4   |   2    |   1
   5   |   2    |   1
   6   |   2    |   1
   7   |   3    |   1
   8   |   3    |   1
   9   |   3    |   1

 

Part 2: Use the level to 'pivot' the table on its side
  1. We generate one (ad hoc) view per level column in the output, filtered on that level
  2. Each view also uses a window function to generate a row number
  3. Using the row numbers, write an outer join to generate the output

 

Output:

LEVEL1 | LEVEL2 | LEVEL3
   1   |   2    |   4
   ?   |   3    |   5
   ?   |   ?    |   6
   ?   |   ?    |   7
   ?   |   ?    |   8
   ?   |   ?    |   9

 

The code:

 

FUNCTION "************"."***********.rootpkg.procs.dml::connectby" ( )
  RETURNS TABLE(aField tinyint, bField tinyint, level tinyint)
  LANGUAGE SQLSCRIPT
  SQL SECURITY INVOKER AS
BEGIN
  DECLARE Rank tinyint := 0;
  DECLARE level tinyint := 1;
  DECLARE I tinyint;

  -- get the data to process
  bothFields = select aField, bField from "DEV_1X28Q7TC9RQSNZH49YABNQ4JZ"."FLATSTRUCT";

  -- create the output structure
  outputTable = select aField, bField, :level as level from :bothFields where 1 = 2;

  -- check if we need an iteration: count distinct superior (bField) values
  select count(*) into Rank
    from (select distinct bField from :bothFields where bField is not null)
    where bField is not null;

  while :Rank > 0 do
    -- get the lowest level (values not present in the superior field) and tag with the level id
    lowestlevel = select aField, :level as level
      from (select aField from :bothFields
            EXCEPT
            select distinct bField from :bothFields);

    -- get rows that were not in the lowest level
    newAField = select aField from :bothFields
      EXCEPT
      select aField from :lowestlevel;

    -- add tagged levels to the output
    tempOut = CE_JOIN(:lowestlevel, :bothFields, ["AFIELD"], ["AFIELD", "BFIELD", "LEVEL"]);
    outputTable = select aField, bField, :level as level from :tempOut
      UNION ALL
      select aField, bField, level from :outputTable;

    -- remove the rows tagged with the level id from the data set
    bothFields = CE_JOIN(:newAField, :bothFields, ["AFIELD"], ["AFIELD", "BFIELD"]);

    -- increment the level id
    level := :level + 1;

    -- check if we need to do another iteration
    select count(*) into Rank
      from (select distinct bField from :bothFields where bField is not null)
      where bField is not null;
  END WHILE;

  -- attach the top-most level
  outputTable = select aField, bField, :level as level from :bothFields where bField is null
    UNION ALL
    select aField, bField, level from :outputTable;

  return select * from :outputTable;
END;

The SQL Query

select three.afield  as level1, two.AFIELD as level2, one.AFIELD as level3
from
(select afield, ROW_NUMBER() over (order by afield) as rnum from "DEV_1X28Q7TC9RQSNZH49YABNQ4JZ"."i036632sapdev.rootpkg.procs.dml::connectby"()
Where level =1) one
full outer join
(select afield , ROW_NUMBER() over (order by afield) as rnum from "DEV_1X28Q7TC9RQSNZH49YABNQ4JZ"."i036632sapdev.rootpkg.procs.dml::connectby"()
Where level =2) two
on one.rnum = two.rnum
full outer join
(select afield , ROW_NUMBER() over (order by afield) as rnum from "DEV_1X28Q7TC9RQSNZH49YABNQ4JZ"."i036632sapdev.rootpkg.procs.dml::connectby"()
Where level =3) three
on one.rnum = three.rnum

Similar Solutions:

Please have a look at a similar solution: Flattening the hierarchy using SQL code

 

Request:

This is my first blog post; please do leave feedback in the comments.

Tip of the iceberg: Using Hana with Hadoop Hbase


Hana is a  Structured Database

Hbase is Hadoop’s Unstructured Database.


You probably already used Hbase today and didn’t realise it.

Hbase is used by Facebook, Twitter and Linkedin, to name but a few companies.


Broadly speaking a Hbase table has only 3 fixed points:

1) Table name

2) Its key (a single field)

3) Its column families  [A column family is similar to a BW cube dimension; it represents a logical grouping of fields]


Beyond that anything goes. Columns can be added and populated on the fly.  Columns are unique to a record rather than the entire table.


Logically, though, it makes sense to keep it semi-structured.

With consistent structures you can then use Hive (Hadoop SQL) or Impala (Cloudera's real-time SQL) via Smart Data Access to read the data from within HANA. Good luck with that, though, as the underlying dataset grows.


In an earlier blog I demonstrated how real-time tweets could be loaded into HANA & HBASE using HADOOP Flume.

http://scn.sap.com/community/developer-center/hana/blog/2013/08/07/streaming-real-time-data-to-hadoop-and-hana


This wasn’t necessarily the easiest way to load tweets into HANA  but provided the foundation for this blog.

To keep HANA lean and mean I only transferred the most relevant info to HANA.  E.g. to do sentiment analysis.


Rather than discard the rest of the tweet information (e.g. meta-data), I decided to hoard it in Hadoop Hbase for future reference.  The point of Hadoop is that it is a cheap, scalable medium for storing and analyzing Big Data in the years to come.


Much like an iceberg, I've kept the most important data visible in Hana and left the large amount of related data hidden beneath the waves, in HADOOP.


I still want to be able to view the data in Hadoop, so the following is an example HANA XS development that enables me to view the entire iceberg.


Using HANA XS & SAPUI5  I’ve created a simple Summary table of the tweets in HANA:


By selecting a Tweet I can now view the full details of the Tweet which is ONLY stored in Hadoop Hbase:

Note: In this case the 'Detail' is the full JSON string of the original tweet.

With a bit of time I could have easily split this into different SAPUI5 elements



To achieve this I’ve made use of the HBASE REST API known as Stargate:

http://wiki.apache.org/hadoop/Hbase/Stargate

 

Using the Hadoop User interface (HUE)  I can also view the same tweet details:

Note:  This screenshot is taken from a CDH version of Hadoop. 

The Hortonworks distribution doesn’t have HBASE visible from HUE yet, but you can still use the Stargate API.


In both HANA and Hbase tables I used the same key - TWEET ID


In order to bridge the gap between HANA and HBASE  I needed to:

- Create a link to the Hadoop Hbase Stargate API using an .xshttpdest  artifact

- Create HANA server side Javascript to perform a GET on HBASE stargate,  using the common KEY.


External http service:

Hbase.xshttpdest

host = "yyy.yyy.yyy.yyy";       {IP address of the HADOOP Cluster}

port = 20550;                       {Configured Hbase Stargate Port}

description = "Hbase stargate connection";

useSSL = false;

authType = none;

useProxy = false;

proxyHost = "";

proxyPort = 0;

timeout = 0;


HANA Server Side JS to GET and re-format the Stargate response:

HbaseTweetGET.xsjs

//import a library for decoding base64 string

$.import("HanaHbase","base64");

var base64 = $.HanaHbase.base64;

 

 

//create client

var client = new $.net.http.Client();

 

 

// Use HBASE destination defined in Hbase.xshttpdest

var dest = $.net.http.readDestination("HanaHbase", "Hbase");

var hBaseUrl;

 

 

//Next need to build up the url string expected by HBASE Stargate REST service

//to return a single tweet records from the tweet table, by the key

// e.g. /tweets/418677249424904192

 

 

//Currently hard coding the name of the Hbase table 'tweets'

//input is 'key'  of the HBASE table to return a single row

hBaseUrl = '/tweets/';

hBaseUrl += $.request.parameters.get("key") + '/'; // || "/";

 

 

var request = new $.net.http.Request($.net.http.GET, hBaseUrl);

 

 

request.headers.set("Accept", "application/json");

 

 

// send the request and synchronously get the response

client.request(request, dest);

var response = client.getResponse();

 

 

 

 

//get all the cookies and headers from the response

var co = [], he = [];

for(var c in response.cookies) {

    co.push(response.cookies[c]);

}

 

 

for(var c in response.headers) {

     he.push(response.headers[c]);

}

 

 

// get the body of the Response from Hbase

var body = undefined;

if(!response.body)

    body = "";

else

    body = response.body.asString();

 

 

var objBody;

objBody = JSON.parse(body);

 

 

// Hbase returns strings in base64 encoding

// Strings need to be decoded which could be done in XSJS or in the Front End JS

// I've opted to do in XSJS as I am also reformatting the body before sending to front end

if (objBody.Row != undefined ) {

  for (var i=0;i<objBody.Row.length;i++) {

  objBody.Row[i].key = base64.decode(objBody.Row[i].key);

  for (var j=0;j<objBody.Row[i].Cell.length;j++) {

  objBody.Row[i].Cell[j].column = base64.decode(objBody.Row[i].Cell[j].column);

  objBody.Row[i].Cell[j].$ = base64.decode(objBody.Row[i].Cell[j].$);

  }

  }

}

 

// send the response as JSON

$.response.contentType = "application/json";

$.response.setBody(JSON.stringify({"status": response.status, "cookies": co, "headers": he, "body": objBody}));

NOTE:  HBASE Stargate uses BASE64 encoding so I also used a library (base64.xsjslib) to decode the dataset.

I used the following code to create base64.xsjslib:

Algorithm Implementation/Miscellaneous/Base64 - Wikibooks, open books for an open world


The Results of the HbaseTweetGET.xsjs are:


NOTE:  The areas highlighted above denote the data already stored in HANA (the tip of the iceberg).  The rest resides submerged in Hbase.


With a working connection in place it's then pretty straightforward to create a simple SAPUI5 page that enables the summary data to be read from HANA and the details from HBASE:

index.html

<!DOCTYPE html>

<html><head>

    <meta http-equiv='X-UA-Compatible' content='IE=edge' />

    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

    <title>Hana Hbase integration</title>

 

    <script id='sap-ui-bootstrap'

        src="/sap/ui5/1/resources/sap-ui-core.js"

        data-sap-ui-theme='sap_goldreflection'

        data-sap-ui-libs='sap.ui.commons,sap.ui.ux3,sap.ui.table'></script>

 

<script>

 

   /***************************************************

        HANA Output Table

      ***************************************************/

        var oPanel = new sap.ui.commons.Panel().setText('Tweets in Hana');

 

        var oModel = new sap.ui.model.odata.ODataModel("tweets.xsodata", false);

 

        oTableHana = new sap.ui.table.Table("tweetsTable",{tableId: "tableID",

                   visibleRowCount: 4,

                   firstVisibleRow: 3,

                   visibleRowCountMode: sap.ui.table.VisibleRowCountMode.Fixed,

                   rowSelectionChange: onRowSelect,

                   selectionMode: sap.ui.table.SelectionMode.Single,

                   selectionBehavior: sap.ui.table.SelectionBehavior.Row

                    });

 

 

        oTableHana.setTitle("Tweets");

 

        oTableHana.setModel(oModel);

 

        var colGby = new sap.ui.table.Column({label: new sap.ui.commons.Label({text:"Tweet id"}),

                                         template: new sap.ui.commons.TextView().bindProperty("text","ID"),

                                         width: "40px",

                                      sortProperty: "Tweet id",

                                      filterProperty: "Tweet id"

                                             });

     oTableHana.addColumn(colGby);

 

 

     colGby = new sap.ui.table.Column({label: new sap.ui.commons.Label({text:"Created At"}),

                                         template: new sap.ui.commons.TextView().bindProperty("text","CREATEDAT"),

                                         width: "40px",

                                      sortProperty: "CREATEDAT",

                                      filterProperty: "CREATEDAT"

                                             });

     oTableHana.addColumn(colGby);

 

 

     colGby = new sap.ui.table.Column({label: new sap.ui.commons.Label({text:"User Name"}),

                                         template: new sap.ui.commons.TextView().bindProperty("text","USERNAME"),

                                         width: "40px",

                                      sortProperty: "User Name",

                                      filterProperty: "User Name"

                                             });

     oTableHana.addColumn(colGby);

 

 

     colGby = new sap.ui.table.Column({label: new sap.ui.commons.Label({text:"Tweet"}),

                                         template: new sap.ui.commons.TextView().bindProperty("text","CONTENT"),

                                         width: "300px",

                                      sortProperty: "Tweet",

                                      filterProperty: "Tweet"

                                             });

      oTableHana.addColumn(colGby);

 

  //Initially sort the table

       var sort1 = new sap.ui.model.Sorter("ID");

       oTableHana.bindRows("/TWEETS",sort1);

        oTableHana.sort("Tweet id");

 

 

       oPanel.addContent(oTableHana);

 

        oPanel.placeAt("uiArea");

 

 

 

 

   /***************************************************

        HADOOP HBASE Output

      ***************************************************/

 

        var oPanelHbase = new sap.ui.commons.Panel().setText('Tweet Detail in Hbase');

  

  

        //******* new try

        var oModelHbase = new sap.ui.model.json.JSONModel();

        oLayout = new sap.ui.commons.layout.MatrixLayout("mHbaseLayout",{columns: 2, widths : ['5%', '95%' ]} );

        oLayout.setModel(oModelHbase);

        var vText = '';

        var vField = '';

    

    

        // Twitter profile image

       var oImage = new sap.ui.commons.Image("i1");

       oImage.bindProperty("src", "Cell/6/$", function(sValue) {

          return sValue;

        });

       oImage.setTooltip("Tweet Profile Image");

       oImage.setDecorative(false);

    

    

        // Tweet Comment

        vText = 'Tweet';

        vField = 'comment';

      var oTF = new sap.ui.commons.TextArea("HTA-TextArea-"+ vField, {tooltip: vText, editable: false,

      value: '',

      width: '100%', height: '50px',

            wrapping : sap.ui.core.Wrapping.Soft

        });

 

 

        oTF.bindProperty("value", "Cell/3/$", function(sValue) {

          return sValue; // && sValue.toUpperCase();

        });

     

        oLayout.createRow(oImage,oTF);

  

        // Full Tweet JSON String

        vText = 'Tweet JSON';

        vField = 'jsonstr';

        var oLabel_JStr = new sap.ui.commons.Label("HLabel-l"+ vField, {text: vText, labelFor: oTF});

      var oTA_JStr = new sap.ui.commons.TextArea("HTA-TextArea-"+ vField, {tooltip: vText, editable: false,

      value: '',

      width: '100%', height: '300px',

            wrapping : sap.ui.core.Wrapping.Soft

       });

       oTA_JStr.bindProperty("value", "Cell/2/$", function(sValue) {

          return sValue;

       });

  

       var oCell = new sap.ui.commons.layout.MatrixLayoutCell({colSpan : 2 });

       oCell.addContent(oTA_JStr);

       oLayout.createRow(oCell);

   

       //Add Layout to Hbase Panel

       oPanelHbase.addContent(oLayout);

     

     

 

      oPanelHbase.placeAt("uiArea");

 

 

 

 

   /***************************************************

        ON ROW SELECT

      ***************************************************/

  function onRowSelect (oEvent){

     var oContext = oEvent.getParameter("rowContext");

     var TweetID = oContext.getProperty("ID");

 

 

     var HbaseJSON =  "HbaseTweetGET.xsjs?key=" + TweetID;

 

 

                         

      jQuery.ajax({

       url: HbaseJSON,

       method: 'GET',

       dataType: 'json',

       //async: false, // Switch of ASync

       success: setTweet,

       error: function(xhr, textStatus, errorThrown) {return;} });

  }

    

         function setTweet(collection) {

           oModelHbase.setData(collection);

           oLayout.bindContext("/body/Row/0");

         }

    

</script>

 

 

</head>

<body class='sapUiBody'>

    <div id="uiArea"></div>

</body>

</html>



I hope you found this interesting, your comments and suggestions are welcome.


Install R Language and Integrate it With SAP HANA




We have a Chinese version of this blog.


1 Introduction to the R language

         The R language is a GNU project based on the S language and can be treated as an implementation of S. It was first developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is mainly used for statistical computing, graphics and data mining.

        Since SAP HANA SPS05, HANA has been greatly enhanced by integrating its in-memory computing with the statistical capabilities of R. This enables you to use R as a procedure language and call R functions from HANA. The data exchange between SAP HANA and R is very efficient, because both use a column-oriented data representation.  The communication process between SAP HANA and R is shown below:

1.png

      In order to execute R code in SAP HANA, the R code is written as a procedure in the RLANG language and executed by an external Rserve. To support the R operator, the calculation engine inside SAP HANA has been extended: for a given input object it performs the computation and outputs a result table. Unlike local operators, an R operator is processed by an R function; when the calculation engine recognizes an R operator, the R client sends a request to Rserve together with the parameters needed by the program, the R code is executed, and the resulting data frame is sent back to the calculation engine.


      Currently, there are some restrictions:

       1. In RLANG, the input parameters can only be table types, so if you want to pass a scalar value you need to wrap it in a table.

       2. The names of variables in an RLANG procedure cannot contain uppercase letters.

       3. An RLANG procedure must return at least one result, in the form of a data frame.


2  The installation of R

    

     Installation on Windows is very simple: just download the corresponding installation package, double-click the setup program and follow the wizard. The following focuses on installation on the Linux platform. Before installing, please make sure that the related software packages exist:

           xorg-x11-devel: for X window support

          gcc-fortran: build environment

          readline-devel: when using R as a standalone program

          libgfortran46: for SLES 11 SP2


         Then download the R language source package (R-2.15.0 has been tested), decompress it, and run the following commands:

          ./configure --enable-R-shlib

         make

         make install

       If the installation is successful, executing the R command in the shell starts the interactive R interpreter, as shown in the following figure.

 

2.png



     3 Integrating R with SAP HANA

      (1) Install Rserve

            Rserve is a TCP/IP-based server for R. After starting R, execute install.packages("Rserve"); the program prompts you to select a mirror and then downloads and installs the package. Alternatively, you can download Rserve.tar.gz and execute install.packages("/path/to/your/Rserve.tar.gz", repos=NULL), which also installs Rserve.

          After the installation, please edit "/etc/Rserve.conf" and add the following content:


         maxinbuf 10000000

         maxsendbuf 0

          remote enable


       then launch Rserve:

       /usr/local/lib64/R/bin/Rserve --RS-port 30120   --no-save   --RS-encoding utf8


    (2) Configure SAP HANA

        Start SAP HANA Studio, choose the Manage view and then the Configuration tab, navigate to indexserver.ini -> calcEngine, and add the following parameters:

3.png



4 Simple test demo


     The following demo computes the squares of a column of primes:


CREATE ROW TABLE "WEIYY_TEST"."PRIME" ( "NUMBER" INT CS_INT );

insert into "WEIYY_TEST"."PRIME" values(2);

insert into "WEIYY_TEST"."PRIME" values(3);

insert into "WEIYY_TEST"."PRIME" values(5);

insert into "WEIYY_TEST"."PRIME" values(7);

CREATE ROW TABLE "WEIYY_TEST"."PRIME_SQR" ( "NUMBER" INT CS_INT );

CREATE PROCEDURE MY_F(IN input1 PRIME, OUT result PRIME_SQR)

LANGUAGE RLANG AS

BEGIN

      result <- as.data.frame(input1$NUMBER^2);

      names(result) <- c("NUMBER");

END;
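To try the procedure, it can be invoked in the same way as the wrapper calls shown later in this blog series, writing the result into the physical PRIME_SQR table. A minimal sketch (the WITH OVERVIEW call pattern is an assumption based on the standard SQLScript calling convention, not shown in the original blog):

-- Pass the input and output tables by name; WITH OVERVIEW writes the
-- result of the RLANG procedure into the physical PRIME_SQR table.
CALL MY_F("WEIYY_TEST"."PRIME", "WEIYY_TEST"."PRIME_SQR") WITH OVERVIEW;
SELECT * FROM "WEIYY_TEST"."PRIME_SQR";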




Execute this procedure, and the result looks like this:

4.png



5 Tips


  Currently RLANG procedures only support table-type parameters, but you often need to pass a scalar argument. When this happens, you can use "select ... from dummy" to generate a one-row temporary table.



CREATE PROCEDURE WAPPER_WEIBOSOHU(IN keyword NVARCHAR, IN crawltime INTEGER, OUT result WEIBOSOHU_TYPE)

LANGUAGE SQLSCRIPT

AS BEGIN

      inputinfo = select :keyword AS "keyword", :crawltime as "crawltime" from DUMMY;

      CALL fetch_weibosohu(:inputinfo, :result);

END;


In the procedure above, the outer WAPPER_WEIBOSOHU procedure is a SQLScript procedure and fetch_weibosohu is an RLANG procedure; we use "select ... from dummy" to generate a one-row table and pass it to the inner procedure.


[Note: The test case for this article is based on SAP HANA SPS07 revision 70.00]

SAP HANA PAL quick start


We also have a Chinese version of this blog.

 

1 Application scenario

  

   SAP HANA is an in-memory database which keeps data resident in memory for quick access. At the same time, physical disk storage is used for data backup and logging to prevent data loss in case of a power failure. This architecture greatly reduces data access time and is what makes SAP HANA "high speed".

  

      In the traditional data model, the database is just a tool for storing and fetching data. For an application like the one in the figure below, the client obtains the data from the database, calculates the result and finally writes it back to the database. If the data volume is large, the data transmission overhead is large as well, and if the client does not have enough memory the calculation and analysis can also be very slow.

flow.png

 

      With the help of its large memory, SAP HANA provides a solution that moves the data-intensive calculation into the database layer; in this way we can eliminate the overhead of data transmission. A typical framework is as follows:

 

 

hanasys.png

 

        Simple calculations can be done with SQLScript, which provides basic variable definitions and flow-control statements. But for complicated calculation and analysis, such as cluster analysis, SQLScript may not be convenient. For this purpose SAP HANA provides the AFL (Application Function Library), which implements a set of algorithms in C++ and packages them into a library that SQLScript can call. This greatly enriches what SQLScript can do.


2 PAL introduction


   PAL (Predictive Analysis Library) is one of the libraries under the AFL framework. It is mainly used for prediction and analysis and provides many data mining algorithms. The PAL functions fall into the following categories:


     (1) cluster analysis

      (2) classification

      (3) association analysis

      (4) time series analysis

      (5) data preprocessing

      (6) statistical analysis

      (7) Social network analysis.


     For each category there are many specific algorithms, for example the k-means algorithm under the cluster analysis category.

       It is worth mentioning that AFL is a separate package; you need to install it before you can use it.
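One way to check whether the PAL functions are available after installation is to look at the AFL metadata; a small sketch (assumes the standard SYS.AFL_FUNCTIONS monitoring view):

-- Lists the registered PAL functions; an empty result suggests AFL/PAL is not installed.
SELECT AREA_NAME, FUNCTION_NAME FROM SYS.AFL_FUNCTIONS WHERE AREA_NAME = 'AFLPAL';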


3  Basic steps


    To use a PAL function, there are 3 steps.

     (1) Generate the AFL_WRAPPER_GENERATOR and AFL_WRAPPER_ERASER procedures

           Generating these two procedures is simple: the AFL package contains two files named afl_wrapper_generator.sql and afl_wrapper_eraser.sql; copy their content into the SQL console and execute it. After that, you need to assign the permissions.

 

  GRANT EXECUTE ON system.afl_wrapper_generator TO USER1;

  GRANT EXECUTE ON system.afl_wrapper_eraser TO USER1;


    This step needs to be executed only once, the first time you use AFL.


   (2) Generate the algorithm's instance.

          CALL SYSTEM.AFL_WRAPPER_GENERATOR(

              '<procedure_name>',

              '<area_name>',

              '<function_name>', <signature_table>);

Procedure_name: the name of the procedure to generate;

Area_name: usually AFLPAL;

Function_name: the algorithm name;

Signature_table: the signature (arguments) table;


(3) Call the algorithm

CALL <procedure_name>(

<data_input_table> {,…},

<parameter_table>,

<output_table> {,…}) with overview;

Procedure_name: the algorithm instance name;

Data_input_table: the data input table;

Parameter_table: the arguments table;

Output_table: the table for the output;

 

 

 

 

 

4  Demo

 

I will show you the usage with a demo of the DBSCAN algorithm. (Because the AFL_WRAPPER_GENERATOR procedure already exists in my environment, the first step does not need to be executed; here we assume that the schema is named TEST.)

DBSCAN is a clustering algorithm that handles noise well; for more details please see DBSCAN - Wikipedia, the free encyclopedia


 

/* create data table with two attribute */ 
CREATE TYPE PAL_DBSCAN_DATA_T AS TABLE ( ID integer, ATTRIB1 double, ATTRIB2  
double); 
/*table for control arguments*/ 
CREATE TYPE PAL_CONTROL_T AS TABLE( NAME varchar(50), INTARGS integer, DOUBLEARGS  
double, STRINGARGS varchar(100)); 
/*create result table type*/ 
CREATE TYPE PAL_DBSCAN_RESULTS_T AS TABLE( ID integer, RESULT integer); 
/*create parameter table*/ 
CREATE COLUMN TABLE PAL_DBSCAN_PDATA_TBL( "ID" INT, "TYPENAME" VARCHAR(100), "DIRECTION" VARCHAR(100) ); 
/*insert some arguments to the table*/ 
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (1, 'TEST.PAL_DBSCAN_DATA_T', 'in');  
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (2, 'TEST.PAL_CONTROL_T', 'in');  
INSERT INTO PAL_DBSCAN_PDATA_TBL VALUES (3, 'TEST.PAL_DBSCAN_RESULTS_T', 'out');  
/* grant permission on the signature table (schema TEST, as assumed above) */ 
GRANT SELECT ON TEST.PAL_DBSCAN_PDATA_TBL to SYSTEM; 
/*generate the algorithm instance of DBSCAN*/ 
call SYSTEM.afl_wrapper_eraser('PAL_DBSCAN9'); 
call SYSTEM.afl_wrapper_generator('PAL_DBSCAN9', 'AFLPAL', 'DBSCAN', PAL_DBSCAN_PDATA_TBL); 
/* create data table*/ 
CREATE COLUMN TABLE PAL_DBSCAN_DATA_TBL ( ID integer, ATTRIB1 double, ATTRIB2 double); 
/*insert test data*/ 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(1,0.10,0.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(2,0.11,0.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(3,0.10,0.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(4,0.11,0.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(5,0.12,0.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(6,0.11,0.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(7,0.12,0.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(8,0.12,0.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(9,0.13,0.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(10,0.13,0.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(11,0.13,0.14); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(12,0.14,0.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(13,10.10,10.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(14,10.11,10.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(15,10.10,10.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(16,10.11,10.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(17,10.11,10.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(18,10.12,10.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(19,10.12,10.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(20,10.12,10.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(21,10.13,10.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(22,10.13,10.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(23,10.13,10.14); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(24,10.14,10.13); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(25,4.10,4.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(26,7.11,7.10); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(27,-3.10,-3.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(28,16.11,16.11); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(29,20.11,20.12); 
INSERT INTO PAL_DBSCAN_DATA_TBL VALUES(30,15.12,15.11); 
/*create temp table*/ 
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL( NAME varchar(50), INTARGS  
integer, DOUBLEARGS double, STRINGARGS varchar(100));  
/*set input arguments*/ 
/*threads 18*/ 
INSERT INTO #PAL_CONTROL_TBL VALUES('THREAD_NUMBER',18,null,null); 
/*auto set parameter*/ 
INSERT INTO #PAL_CONTROL_TBL VALUES('AUTO_PARAM',null,null,'true'); 
/*Manhattan distance*/ 
INSERT INTO #PAL_CONTROL_TBL VALUES('DISTANCE_METHOD',1,null,null); 
/*result table*/ 
CREATE COLUMN TABLE PAL_DBSCAN_RESULTS_TBL( ID integer, RESULT integer); 
/*call the algorithm*/ 
CALL _SYS_AFL.PAL_DBSCAN9(PAL_DBSCAN_DATA_TBL, "#PAL_CONTROL_TBL",  
PAL_DBSCAN_RESULTS_TBL) with overview; 
/*see the result*/ 
SELECT * FROM PAL_DBSCAN_RESULTS_TBL;  
  

DBSCANresult.png

 

If executed correctly, you will see the result above: the records are clustered into 3 groups, where 0 and 1 are cluster IDs and -1 marks noise points.



5   Conclusion

    This article introduced PAL in SAP HANA with the example of DBSCAN. Many other algorithms follow similar steps: the main work is to prepare the data, define the arguments according to the documentation, and finally call the algorithm.

      From an efficiency perspective, PAL takes advantage of the large memory of SAP HANA. Used properly, it is really fast for analyzing big data!

       [Note: The SAP HANA version used for the test cases here is SAP HANA SPS06]

connect to SAP HANA in Python


We have a Chinese version of this blog


        Python is a simple scripting language that is good at text processing and network programming, especially on Linux, because the various Linux distributions ship with Python. For simple data processing tasks, using Python to access SAP HANA is very convenient. Inside SAP HANA itself, many test tasks are also done with Python.

         On SCN there is a blog, "SAP HANA and Python? Yes Sir!" (http://scn.sap.com/community/developer-center/hana/blog/2012/06/08/sap-hana-and-python-yes-sir), which introduces the use of Python with SAP HANA on Windows. On the Linux platform it is a little different; this article shows how to use Python with SAP HANA on Linux.


environment set up


   The first thing is to download the SAP HANA client for Linux (SAP_HANA_CLIENT). After the download, decompress the package and you will get a file list like this:

  linuxclient.png

   Just execute the command "./hdbinst" and follow the instructions to complete the installation.

   The default installation path is "/usr/sap/hdbclient".

    After the installation, the file list is as follows:

installed.png


   You will see there is a directory named "Python". This is the Python runtime shipped with the SAP HANA client. Before you can use it you need to do some work: copy "dbapi.py", "__init__.py" and "resultrow.py" from the hdbcli directory to the destination /usr/sap/hdbclient/Python/lib/python2.6, and copy the file "pyhdbcli.so" to the same directory.


   Then you can execute "/usr/sap/hdbclient/Python/bin/python"

connect.png



   Generally, as long as "import dbapi" does not complain, the environment is OK. But it is not very elegant to use the Python interpreter shipped inside the SAP HANA client. You may wonder whether you can use your system's Python to connect to SAP HANA instead; having the Linux system Python access HANA directly would be much nicer.


  In fact, as you may have discovered, the key to accessing SAP HANA from Python is those 4 files; the question is where to put them so that the system Python can find them. Once you know this, a new machine does not need the full client installation to access SAP HANA; you just copy these 4 files. So we need to find the library path of the Linux system's Python.

  

  You can execute "whereis python" to locate python (for now my test environment is SUSE Linux 11 x86_64 sp2)


python: /usr/bin/python2.6 /usr/bin/python /usr/lib64/python2.6 /usr/lib64/python /usr/bin/X11/python2.6 /usr/bin/X11/python /usr/local/bin/python /usr/local/bin/python2.7-config /usr/local/bin/python2.7 /usr/local/lib/python2.7 /usr/include/python2.6 /usr/share/man/man1/python.1.gz


   You may need to try a few times to decide which is the correct path.

    On my machine I found that copying the files "dbapi.py", "__init__.py", "resultrow.py" and "pyhdbcli.so" to the directory "/usr/local/lib/python2.7" lets the Linux system's Python access SAP HANA.


   But this depends on the Linux distribution and version, so it may be different on your machine.


2 simple demo


#import the database connection api
import dbapi
#connection arguments
serverAddress = '<your IP>'
serverPort = <port>
userName = '<username>'
passWord = '<password>'
#connect to the HANA database
conn = dbapi.connect(serverAddress, serverPort, userName, passWord)
#query
query = "select idno, name FROM WEIYY_TEST.PTAB1"
cursor = conn.cursor()
try:
    ret = cursor.execute(query)
    ret = cursor.fetchall()
    for row in ret:
        for col in row:
            print col,
        print
except Exception, ex:
    print ex
#insert data
query = "insert into WEIYY_TEST.PTAB1(IDNO,NAME) values('111','hello,world')"
try:
    ret = cursor.execute(query)
except Exception, ex:
    print ex
#close the cursor and connection
cursor.close()
conn.close()

 


     The code above logs in to the database, executes a query and then inserts a new record:

query.png




  then we execute it again:


error.png


It shows the record that was inserted, and in the exception-handling part there is a "unique constraint violated" error.


Now you can see that it is really easy to use Python to connect to SAP HANA; a few lines of code finish the task.


[SAP HANA version of the test cases used here is  SAP HANA SPS7 Revision 70.00]





Timing Task in SAP HANA


We have a Chinese version of this blog.

 

   Assume that you receive a batch of files every few minutes and need to import them into SAP HANA. Traditionally, you would have to write a script to import the files and then set up a scheduled task to execute it. You may wonder whether SAP HANA itself can support such timing tasks.


    Before SAP HANA SPS07 this was not supported. As of SAP HANA SPS07 it is simple, and very cool: as you will see, you can tell SAP HANA to execute timing tasks automatically, with SAP HANA alone and no other components such as Linux "crontab".


   In simple terms, SAP HANA's timing tasks depend on the XS Engine, which you use to set up the schedule. There are two types of task: SQLScript tasks and JavaScript tasks.

 

   Now I will show you with an example.


1  Set up the environment

     First, you need to configure SAP HANA in HANA Studio, as in the following figure:

 

readonly.png


    In the default configuration, the sqlscript_mode parameter in the repository section is 'default'; here I recommend changing it to 'UNSECURE'. If you do not change this, the task may not be able to perform 'insert' or 'update' operations.

studioconfig.png


    In addition, for the schema you will work on, you must assign the related privileges, or you may see the following error when activating the procedure.

activate.png


When this happens, you should grant the privileges on the schema WEIYY to _SYS_REPO. This is because when you commit to the repository and then activate, it is _SYS_REPO that generates the related objects, so _SYS_REPO must have the permission.


grant.png
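A minimal sketch of such a grant, assuming the schema name WEIYY used in this blog (adjust the privilege list to what your job actually needs):

-- Allow _SYS_REPO to generate and run objects against the WEIYY schema.
GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA WEIYY TO _SYS_REPO WITH GRANT OPTION;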


    Finally, you also need to configure the XS Engine: as in the following figure, add a section named scheduler to xsengine.ini and add an enabled parameter with the value true.

 

QQ截图20140306133924.png


2 Timing Task


   A timing task can be either a JavaScript task or a SQLScript task. The schedule information is described in an .xsjob file. In the following example we implement one JavaScript task and one SQLScript task.


    First we create an XS project in SAP HANA Studio and create a .xsapp file. Then we create a jsjobtest.xsjs file; inside it there is a function myjsjob, which is the job we will run. It simply inserts a record, with a timestamp, into the table "TESTTBL".
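The blog does not show the DDL for this table; a minimal sketch of what both jobs below assume (column names taken from the INSERT statements, types are an assumption):

-- Target table for the scheduled jobs.
CREATE COLUMN TABLE WEIYY.TESTTBL (
    T    TIMESTAMP,
    INFO VARCHAR(100)
);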


File: jsjobtest.xsjs

function myjsjob() {
    var sql = "INSERT INTO WEIYY.TESTTBL VALUES (NOW(), 'inserted from javascript job')";
    var conn = $.db.getConnection();
    var pstmt = conn.prepareStatement(sql);
    pstmt.execute();
    conn.commit();
    conn.close();
}

Then we need an .xsjob file to describe when the job will be scheduled.

File: myjsjobdesc.xsjob

{

    "description": "my first javascript job",

    "action": "weiyy.scheduletest:jsjobtest.xsjs::myjsjob",

    "schedules": [

    {

    "description": "run every 5 seconds",

    "xscron": "* * * * * * 0:59/5"

    }

    ]

}


  description: the description of the task.

  action: the file containing the task and the entry-point function.

   schedules: the schedule information of the task. The xscron grammar is similar to that of "crontab"; here we let the task run every 5 seconds.


  Now we add a SQLScript task. It also inserts a record into "TESTTBL", but with slightly different information.


  File: sqljobtest.procedure

CREATE PROCEDURE sqljobtest ( )
       LANGUAGE SQLSCRIPT
       SQL SECURITY INVOKER
       --DEFAULT SCHEMA <schema>
AS
BEGIN
/*****************************
       Write your procedure logic
 *****************************/
insert into WEIYY.TESTTBL(T, INFO) VALUES(NOW(), 'inserted from SQLScript job');
END;

Here we let it run every 10 seconds.

File:mysqljobdesc.xsjob

 

{

    "description": "my first SQLScript job",

    "action": "weiyy.scheduletest::sqljobtest",

    "schedules": [

    {

    "description": "run every 10 seconds",

    "xscron": "* * * * * * 0:59/10"

    }

    ]

}


[Note: the action part is a little different between js task and SQLScript task ]



3  Launch the timing task

   

  After adding the timing tasks, the next step is to launch them. We use the "XS Administration Tool"; the user who maintains the jobs needs the role "sap.hana.xs.admin.roles::JobAdministrator". For a given HANA instance, the web address looks like this:


http://<WebServerHost>:80<SAPHANAinstance>/sap/hana/xs/admin/


   For our JavaScript task, set the "User" and "Locale" parameters and finally check "Active"; the task is then activated.

myjsjobdesc.png


For the SQLScript task, do the same:

mysqljobdesc.png


then go back to SAP HANA Studio


data.png


You can see that both tasks have been executed, and that the JavaScript task has run more often than the SQLScript task (every 5 seconds versus every 10).


   In the "_SYS_XS"."JOBS" table you can see the job definitions.

    In the "_SYS_XS"."JOB_LOG" table you can see the execution log.
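For a quick check from the SQL console (a sketch using the table names given above):

-- Job definitions and their execution history.
SELECT * FROM "_SYS_XS"."JOBS";
SELECT * FROM "_SYS_XS"."JOB_LOG";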

jobs.png


log.png


Using the XS Engine JavaScript API you can also add or delete timing tasks dynamically. For more information, please refer to

http://help.sap.com/hana/SAP_HANA_Developer_Guide_en.pdf


[SAP HANA version of the test cases used here is SAP HANA SPS7 Revision 70.00. ]







In-depth understanding the principles of SAP HANA integrating with R


        We also have a Chinese version of this blog.

 

        Starting with this article, I will look into the principles behind combining HANA with R. After the SAP D-code conference I realized that many related applications use R, especially applications for analysis and prediction, so I want to study the details in depth. These articles assume some experience with R, and ideally you have already used R with SAP HANA. You can find the related documentation at https://help.sap.com/hana/SAP_HANA_R_Integration_Guide_en.pdf, and you can also read my other document at http://scn.sap.com/community/chinese/hana/blog/2014/02/14/r%E8%AF%AD%E8%A8%80%E5%8C%85%E5%AE%89%E8%A3%85%E5%B9%B6%E5%AE%9E%E7%8E%B0%E4%B8%8Ehana%E7%9A%84%E6%95%B4%E5%90%88

        These articles will show the bi-directional data flow with which SAP HANA communicates with R. This should help you design more efficient R procedures on SAP HANA and figure out the causes of problems. You can check the R logs, and you can even integrate R with your own applications over TCP/IP.

 

(1)       Embedded R's execution environment

            SAP HANA can integrate with R because R supports an embedded execution environment: if you have installed the required libraries, you can embed R programs inside C programs.

 

QQ截图20140413175009.png

 

          Under R's installation directory (such as /usr/local/lib64/R) there are some header files providing function prototypes and a dynamic-link library file, libR.so. With these files you can run R programs from C. For example:

 

 

 

#include <stdio.h>
#include "Rembedded.h"   /* header file */
#include "Rdefines.h"

int main() {
    char *argv[] = {
        "REmbeddedPostgres", "--gui=none", "--silent"   /* arguments */
    };
    int argc = sizeof(argv) / sizeof(argv[0]);
    Rf_initEmbeddedR(argc, argv);

    SEXP e;
    SEXP fun;
    SEXP arg;
    int i;

    fun = Rf_findFun(Rf_install("print"), R_GlobalEnv);
    PROTECT(fun);

    arg = NEW_INTEGER(10);
    for (i = 0; i < GET_LENGTH(arg); i++)
        INTEGER_DATA(arg)[i] = i + 1;
    PROTECT(arg);

    e = allocVector(LANGSXP, 2);
    PROTECT(e);
    SETCAR(e, fun);
    SETCAR(CDR(e), arg);

    /* Evaluate the call to the R function. Ignore the return value. */
    eval(e, R_GlobalEnv);

    UNPROTECT(3);
    return 0;
}

       This code mainly uses functions and macros defined in the R kernel. First, it initializes an embedded execution environment by calling Rf_initEmbeddedR(argc, argv). SEXP is a pointer type referring to R's internal data structures (see R's source code under R-2.15.0/src/main). It then builds an integer vector arg with the values 1 to 10 and finally executes the function print() via eval(e, R_GlobalEnv).

            We can compile this code with the command:

gcc embed.c -I/usr/local/lib64/R/include -L/usr/local/lib64/R/lib -lR

            -I: the path of the header files;

            -L: the path of the dynamic-link library;

            -lR: links against libR.so

 

QQ截图20140413185129.png

 

        As we can see, the result is the same as running the code in R itself.

        Because of this, we can build an R server on top of the embedded R execution environment as a TCP/IP server: it accepts requests from a TCP/IP client, executes the R program, and returns the result to the client. This is the primary reason Rserve was developed.

         All of the above is the basis of the combination of SAP HANA and R.

 

(2)     Introduction to Rserve

          Rserve was born in October 2003; the newest version is Rserve 1.7-3, published in 2013. The author is Simon Urbanek (http://simon.urbanek.info/), a researcher at AT&T Labs. We can download the Rserve server and clients from http://www.rforge.net/Rserve/, where more details about Rserve can also be found.

            The server is implemented in C. It accepts requests and data from a client and returns the results after the calculation. Clients are provided in C++, Java and PHP. That said, the C++ interface only provides basic functionality; in the author's own words, "This C++ interface is experimental and does not come in form of a library". It supports only some basic data structures, such as lists, vectors and doubles; for other types you need to implement them yourself, as in "Look at the sources to see how to implement other types if necessary".

             SAP HANA's R client, however, is implemented in C++ and is more sophisticated than the original C++ interface. In theory you can implement a client in any language, as long as it supports TCP/IP.



(3)      Message-oriented communication protocol: QAP1

          QAP1 (quad attributes protocol v1) is the protocol Rserve uses to communicate with clients. Under QAP1, the client sends a message first, containing a specific command and the related data, then waits for the response message from the server. The response message contains a response code and the result data. Each message consists of a 16-byte header and a data portion. The structure of the header is as follows:

             Offset   Type    Meaning
             [0]      (int)   the command / response type
             [4]      (int)   the length of the message (bits 0 to 31)
             [8]      (int)   the offset of the data part
             [12]     (int)   the length of the message (bits 32 to 63)

           The data portion of the message may contain additional parameters, such as DT_INT, DT_STRING or other types; see Rsrv.h for details.

         Here are some of the commands that Rserve supports:

 

command              parameters           | response data
CMD_login            DT_STRING            | -
CMD_voidEval         DT_STRING            | -
CMD_eval             DT_STRING or DT_SEXP | DT_SEXP
CMD_shutdown         [DT_STRING]          | -
CMD_openFile         DT_STRING            | -
CMD_createFile       DT_STRING            | -
CMD_closeFile        -                    | -
CMD_readFile         [DT_INT]             | DT_BYTESTREAM
CMD_writeFile        DT_BYTESTREAM        | -
CMD_removeFile       DT_STRING            | -
CMD_setSEXP          DT_STRING, DT_SEXP   | -
CMD_assignSEXP       DT_STRING, DT_SEXP   | -
CMD_setBufferSize    DT_INT               | -
CMD_setEncoding      DT_STRING            | - (since 0.5-3)
since 0.6:
CMD_ctrlEval         DT_STRING            | -
CMD_ctrlSource       DT_STRING            | -
CMD_ctrlShutdown     -                    | -
since 1.7:
CMD_switch           DT_STRING            | -
CMD_keyReq           DT_STRING            | DT_BYTESTREAM
CMD_secLogin         DT_BYTESTREAM        | -
CMD_OCcall           DT_SEXP              | DT_SEXP

 

 

           The most commonly used command is CMD_eval. It receives a piece of R code, parses and executes it, obtains the result, and sends back the response message.

            Actually, we could run an embedded R program directly inside SAP HANA, which would be simpler and more efficient, but we cannot do this because of the licensing of the open-source software.

            That's it for now; in the following blogs I will introduce the operating mechanism of Rserve and how SAP HANA communicates with it. If you understand the underlying principles, I think it will help you write better R procedures.

 

 




Web Based Hana VDM Explore


When we use Hana Studio to view a complex VDM, we often meet the following challenges:

  • We are very interested in the calculated columns, but we don't know which nodes have calculated columns; we need to open them one by one to check
  • We want to know where a column comes from: whether it was renamed from a to b and then to c, or calculated from several other columns. In Hana Studio, we have to open the nodes one by one, manually check the columns, and remember the relationships.
  • Some calculated columns are very complex, with if/then/else nested many levels deep, which makes the expression very hard to understand
  • One VDM calls other VDMs, and we need to open the related VDMs one by one; we can't jump from one VDM to another easily
  • There is no overall view. For example, we may want to know how several columns map to the related VDMs or tables
  • Sometimes Hana Studio is slow, since we need to connect to the HANA server to see the VDM

 

In order to solve those challenges, I recently developed a web-based Hana VDM Explorer using SAPUI5. (Currently the HANA Web-Based Development Workbench doesn't support displaying a VDM graphically; for details see What´s New? SAP HANA SPS 07 Web-based Development Workbench.)

 

How to use:

Just open this URL in IE9 or Chrome (user name/password: anzeiger/display). It only works inside SAP; if anyone outside SAP is interested, I can provide outside access later: https://ldciuxd.wdf.sap.corp:44329/sap/bc/ui5_ui5/sap/VdmExplore/index.html?sap-client=902

 

It includes 3 VDMs just as a showcase. You can press the 'Load Files...' button to load your own VDM files (multiple files can be loaded at once), and press 'Add Package' or 'Del' to manage the packages.

 

How to save a VDM to your local computer:

There are 3 options:

  1. In Hana Studio, click the 'Repository' tab, select the package, then right-click and choose 'Check Out'; the package will be saved to your local computer
  2. In Hana Studio open the VDM, choose 'Display Repository XML' at the top right, then save it as a local file (see the following screenshot)
  3. Use the Hana Web-Based IDE: click the VDM file to display the XML content, select all, then save it to your computer.

SaveXml.png

 

Main Screen:

main.png

There are three main parts in the main screen (the layout is similar to the Hana Studio main screen):

 

Hana Package Part:

You can add a package by pressing the 'Add Package' button and load VDMs from files by pressing 'Load Files...'. Just click a VDM to show its detail information in the left part.

 

VDM Node Part:

This part shows the VDM as a tree. Click a node to show its detail information in the 'Node Detail Part'.

 

Node Detail Information Part:

This part shows the detail information about one node. For the topmost Semantic node, it shows the dependent tables and VDMs and the input parameters/variables. For other nodes, it shows the calculated columns and normal columns.

 

VDM Semantic Overall Information

Just choose the topmost semantic node and the overall VDM information is shown as in the following picture: the dependent VDMs (just click one to open it), tables and input parameters.

VdmOverall.png

 

Input Parameter Information:

Click 'Show Details' and it will show all the input parameter/variable information, including the name, description, type and mapping.
InputParam.png

 

How to track the column path ?

 

TrackPath.png

 

We often want to know how a column is derived from other nodes.  Just select that column and press 'Track Path'; it will then highlight the nodes that have a relationship with this column. As in the picture above, it marks each such node with a red block and shows the detail in text: renamed from A to B, and so on.   Press 'Clear Track' to clear the tracking marks.

 

 

Show the calculated column in formatted style

In HANA Studio, we need to click each calculated column to see its details. With this tool, all calculated columns are displayed together in a formatted table. Furthermore, you can easily see which nodes contain a calculated column or a filter just by checking the node icon: a C in the top-left corner means the node has calculated columns, and an F in the bottom-left corner means it has a filter expression.

CalculatedColumn.png

 

Display complex expression in friendly way

In HANA Studio, it is very difficult to understand a complex nested if expression. Take the following real example for the net due date: it is nearly 2000 characters long, and understanding the logic is a nightmare.

HanaStudioComplexIf.png

With this tool, life becomes easier. It formats the expression as shown below, so no matter how many levels are nested, it is easy to understand. (We can also easily see that this complex expression is not well optimized...)

ComplexIf.png

 

View the VDM in an overall and compact way

Sometimes we want an overall view of where several columns come from (other VDMs, tables, or nodes). In HANA Studio we have to click through the nodes one by one and remember what we saw. With this tool, just select the columns you are interested in and click 'Advance Analyze'; a new window opens showing the relationships in a tree table.

CompactView.png

 

Unit Test support (coming soon)

When we do unit testing with the SAP HANA unit test framework, there is a lot of tiring work: replacing the dependent VDMs/tables with private VDMs/tables, providing sample data, and comparing the results. Since the tool already has all the information about the VDM, it can provide a GUI to help users perform unit tests more easily.

 

Conclusion:

This tool only supports exploring the VDM in read-only mode, displaying its information in a user-friendly way. Its goal is to help you understand complex VDMs.

I built it recently using SAPUI5, with nearly 5000 lines of source code. If you find any bugs, please drop me a mail.

 

 

Table Distribution impact on Query execution in a HANA Scale out landscape


Background:

We wanted to review our current table distribution in the HANA sidecar and see whether the resident reports/models/queries would be negatively or positively affected if we started “grouping” similar tables on the same node. Many of the tables replicated from ECC are OLTP transactional tables, so even basic reports need 5-10 tables to be joined or merged together. Before spending a lot of time on “where used” type investigation for the logical grouping, we wanted to do some basic analysis of how much overhead is caused by not having all the sources of a report on the same node.

 

 

Our "Test/Perf" Scale out Landscape

 

 

dbversion.PNG

 

Up to now, table distribution was handled by the default round-robin assignment. There is also an option in Studio to redistribute using a wizard-type function, shown here. Please also save the current distribution prior to making any changes.


Execute_Optimize_Distribution.PNG




At this stage it's a good time to look at the reference document linked below; it is a bit outdated regarding Studio links etc., but I don't want to repeat the same material here.

 

Table Distribution in Hana

 

 

I'm going to focus on just one schema for now, the one that all the ECC tables are replicated to, so let's see how our current configuration matches up.


Landscape_Qry_Results1.PNG


From the above, node 02 looks to be carrying the bulk of the tables, so it is not clear how efficiently the default distribution is working. In any case, please also note that node 04 is kept exclusively for failover and isn't allocated any tables currently.

 

SELECT a.host, a.schema_name, a.table_name, a.record_count, a.loaded,
       indexserver_actual_role AS "Server Role",
       'Column Store Tables' AS Component1,
       round(ESTIMATED_MAX_MEMORY_SIZE_IN_TOTAL/(1024*1024*1024), 2) AS "ESTIMATED MEMORY SIZE(GB)",
       round(MEMORY_SIZE_IN_TOTAL/(1024*1024*1024), 2) AS "Currently used MEMORY SIZE(GB)",
       round(MEMORY_SIZE_IN_DELTA/(1024*1024*1024), 2) AS "Currently used Delta Store size(GB)"
FROM M_CS_TABLES a, public.m_landscape_host_configuration b
WHERE a.host = b.host
AND a.schema_name IN ('ECC_REPORTING')
AND ESTIMATED_MAX_MEMORY_SIZE_IN_TOTAL > 0
order by 8 desc

 

Going to focus specifically on 3 of the top 10,

 

Table_Qry_Results_new.PNG

 

An initial problem was that the table distribution on HF1 didn't look anything like the one in Production, so let's execute the following script on Prod to generate the statements needed to sync HF1 to match.

 

select 'ALTER TABLE '||a.schema_name||'.'||a.table_name||' MOVE TO '||     replace(a.host,'PROD_HOST_NAME','PERF_HOST_NAME')||':37003'
from M_CS_TABLES a, public.m_landscape_host_configuration b
WHERE a.host = b.host
AND a.schema_name IN ('ECC_REPORTING')

Syntax for re-distribution:


  ALTER TABLE table_name MOVE TO '<host:port>'


Note: The port is the port of the target index server, not the SQL port.

 

The actual ALTER TABLE statement returns immediately, since during redistribution only the link to the table, not the physical table, is moved. You'll also note the table is fully removed from memory and shows loaded = No, so the next merge or access will take some additional time.
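A quick way to confirm where a table now resides, and whether it is loaded, is to query M_CS_TABLES again (the table name below is just a placeholder for whichever table you moved):

SELECT host, schema_name, table_name, loaded
FROM M_CS_TABLES
WHERE schema_name = 'ECC_REPORTING'
  AND table_name = '<moved_table>';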

 

A simple Calc View for our analysis

 

calcView.PNG

 

JEST being our largest table has already been hash partitioned equally across the 3 nodes on HF1. So let's work with what we have. The tables all basically reside on the 3 different nodes. Will I see any significant performance overhead due to this configuration?


PriorAlter1.PNG


Qry1:

 

Let's just select equipment number for now, which should prune the right hand side of the calc view above and not hit the partitioned JEST table at all.

 

Select top 1000 equnr
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS"
where erdat > '20140301'

Let's see what the VizPlan has to offer; I do expect to see some Network Data Transfer entries based on the current set-up.



qry1_viz_plan2.PNG

Note, there are a few Transfers noted above, however the overhead is relatively small compared to the overall run time.


Qry2:

While we have the table allocation as is, let's add in the partitioned table and see if we get more of these Network Data Transfer entries in the VizPlan. I'll also expand the query with no WHERE clause, but keep the TOP 1000 stop clause. This is a Perf box after all, and I want to see if having to move around much larger volumes of records will increase the time spent on network data transfer.

 

Select top 1000 equnr, stat
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS"

qry2_viz_plan2.PNG

 

From the above, we can see there is an increase in the number of Network Data Transfer steps; however, even with the larger volume, the overhead seems acceptable compared to the overall runtime. We also see that both paths of the calc view are now executed and the JEST partitions are hit in parallel.

 

qry2_viz_plan3.PNG

 

 

Qry3:

 

Let's see if moving EQUI & EQUZ to the same node removes the network data transfer entries.

 

ALTER TABLE ECC_REPORTING.EQUZ MOVE TO 'HF1_HOSTNAMEX02:37003'    ;

Statement 'ALTER TABLE ECC_REPORTING.EQUZ MOVE TO 'hf1hana02:37003''

successfully executed in 259 ms 793 µs  (server processing time: 150 ms 915 µs) - Rows Affected: 0


AfterAlter1.PNG


Note that EQUZ is no longer resident in memory; let's do a LOAD so this won't be an issue when we execute the query.


LOAD ECC_REPORTING.EQUZ ALL  
Started: 2014-05-09 15:19:49  
Statement 'LOAD ECC_REPORTING.EQUZ ALL'  
successfully executed in 1:05.285 minutes  (server processing time: 1:05.181 minutes) - Rows Affected: 0


Let's execute the same Qry1 again and check the VizPlan:

 

Select top 1000 equnr  
from "_SYS_BIC"."abc.mfg.genealogy.models/CV_EQUIP_GEN_NEW_STATUS"  
where erdat > '20140301'

 

qr3_results.PNG

As expected, the Network Data Transfer steps seen in Qry1 VizPlan are now gone.

 

 

Summary:

So I hope this blog is a start for folks who are looking at table distribution: how to check the current configuration and how to identify potential issues. The results themselves are not very dramatic or resounding, but hopefully they may kick off further discussions. There's also room for a follow-up specifically on partitioning and how it can improve or reduce query efficiency. I only used 3 tables in the queries above, so the relatively small impact of having the tables distributed may increase with even larger tables or queries involving a larger number of tables.

So all comments or observations welcome

 

How to securely mask or hide column data using SQL MAP Function in SAP HANA views


The need for column level security at the database layer can be avoided by masking or hiding data at the application level. However, as a DBA, you may prefer to set up a model where

  1. Only DBAs have access to the physical model
  2. Access to data is only exposed through base views for which DBAs securely manage access.  Once the base views are in place and secured for your highly sensitive data, you can have the confidence you need to sleep well at night

 

SAP HANA offers the ability to manage column security by allowing the creation of additional (secured) views to expose more sensitive columns. In the event that this approach does not fit for your project due to specific administrative requirements, I offer here an alternate approach that may be considered.

 

Benefit of this approach

Lower maintenance due to fewer views

 

Overview

The following is intended to be a simple example demonstrating how to hide or mask column data based on a HANA user’s assigned privileges or roles. We will store sensitive employee data (social security number) in a HANA table that is not directly exposed to users. The employee names and SSNs will be exposed to users by a single view that exposes the SSN to some users and not others.

Step 1: Create schema and employee table

CREATE SCHEMA MYSCHEMA;

CREATE COLUMN TABLE "MYSCHEMA"."EMPLOYEE"
       ("FIRST_NAME" NVARCHAR(32),
        "LAST_NAME" NVARCHAR(32),
        "SSN" NVARCHAR(12),
        "EMPLOYEE_ID" INTEGER NOT NULL,
        PRIMARY KEY ("EMPLOYEE_ID"));

insert into "MYSCHEMA"."EMPLOYEE" values ('LOU','JOHNSON','456-78-9123',1);
insert into "MYSCHEMA"."EMPLOYEE" values ('BOB','THOMPSON','345-67-8912',2);
insert into "MYSCHEMA"."EMPLOYEE" values ('CINDY','BENSON','234-56-7891',3);

 

Step 2: Create privilege table

In this example we create a privilege table for users where a one (1) in the HAS_PRIV column indicates that a user has this privilege. So USER1 has the privilege to access social security numbers.

CREATE COLUMN TABLE "MYSCHEMA"."PRIVS"
       ("USER_ID" NVARCHAR(32) NOT NULL,
        "PRIV_NAME" NVARCHAR(32) NOT NULL,
        "HAS_PRIV" TINYINT NOT NULL,
        PRIMARY KEY ("USER_ID"));

insert into "MYSCHEMA"."PRIVS" values ('USER1', 'READ_SSN', 1);
insert into "MYSCHEMA"."PRIVS" values ('USER2', 'READ_SSN', 0);

 

Step 3: Create privilege view

This view uses the SQL MAP function to list the session_user’s granted privileges from the privilege table as columns.

 

Important Note: You must use the system variable session_user instead of current_user. See the explanation at the end of this post for the reason.

 

CREATE VIEW "MYSCHEMA"."V_PRIVS" AS
select user_id,
       MAX(READ_SSN_PRIV) AS READ_SSN_PRIV
from ( select p.user_id,
              MAP(p.PRIV_NAME, 'READ_SSN', MAP(p.HAS_PRIV, 1, p.HAS_PRIV, NULL), NULL) AS READ_SSN_PRIV
       from "MYSCHEMA"."PRIVS" p
       WHERE p.user_id = session_user )
GROUP BY user_id;

 

When I am logged in as USER1, I see the following privileges when I query the view.

pic1.png

Step 4: Create employee view (option 1)

The employee view will use the privilege view and return a null if the session_user does not have the required privilege granted in the privilege table.

CREATE VIEW "MYSCHEMA"."V_EMPLOYEE" AS
select "FIRST_NAME",
       "LAST_NAME",
       "EMPLOYEE_ID",
       MAP(p.READ_SSN_PRIV, 1, e.SSN, NULL) AS SSN
from "MYSCHEMA"."EMPLOYEE" e,
     "MYSCHEMA"."V_PRIVS" p;

 

When I am logged in as USER1, I see the complete SSN data when I query the view.

pic2.png

When I am logged in as USER2, I see nulls for the SSN data when I query the view.

pic3.png

 

Step 5: Create employee view (option 2)

Instead of returning nulls, we could mask the first 5 digits of the SSN and display only the last four digits for users without the required privilege.

 

CREATE VIEW "MYSCHEMA"."V_EMPLOYEE_MASK_SSN" ("FIRST_NAME",
       "LAST_NAME",
       "EMPLOYEE_ID",
       "SSN") AS
select "FIRST_NAME",
       "LAST_NAME",
       "EMPLOYEE_ID",
       MAP(p.READ_SSN_PRIV, 1, e.SSN, 'XXX-XX-' || SUBSTR(e.SSN, 8)) AS SSN
from "MYSCHEMA"."EMPLOYEE" e,
     "MYSCHEMA"."V_PRIVS" p;

 

When I am logged in as USER1, I see the full SSN data when I query the view.

pic4.png

 

When I am logged in as USER2, I see only the last four digits of the SSN data when I query the view.

pic5.png

 

Option: Tie the privileges to assigned roles

You can also create a privilege view using the assigned role of a user by querying the sys.granted_roles table and matching the grantee column to the session_user

CREATE VIEW "MYSCHEMA"."V_PRIVS" AS
select user_id,
       MAX(READ_SSN_PRIV) as READ_SSN_PRIV
from ( select r.grantee as user_id,
              MAP('SSN_ROLE', r.role_name, 1, 0) AS READ_SSN_PRIV
       from sys.granted_roles r
       WHERE r.grantee = session_user )
group by user_id

 

Considerations

  • In this example I show a SQL view. If using a column view you are limited to using a scripted calc view. As such, any dependent column views must also be calc views, as opposed to attribute or analytic views
  • You may have some performance degradation compared to just adding new views for the columns that need to be secured. But depending on your requirements and the size of your data, this should be tolerable. In my example above, I am able to select a single row from the employee view in 23ms where the underlying employee table has 1 million rows
  • This approach only works on tables with primary keys (tables must have a unique identifier)
  • Field names are visible. To completely hide field names, creating additional (secured) views exposing the hidden columns would be the best approach

 

Session_User vs Current_User

Using this approach, you must use the session_user variable, not the current_user variable, to filter access. The current_user variable returned from HANA column views is not the invoker's user ID but that of the definer. Even though the definer's ID is returned as the current_user, HANA secures column views based on the invoker of the view.
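In an ad-hoc SQL console both functions return the same name; the difference only shows up inside objects that execute with definer rights, such as the column views discussed above. A quick way to check what your session reports:

SELECT SESSION_USER, CURRENT_USER FROM DUMMY;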

 

MAP Function

More on the map function in the SAP HANA SQL Script Help Guide.

http://help.sap.com

-> SAP In-Memory Computing -> SAP HANA Platform -> Reference Information -> SAP HANA SQL and System Views Reference

The MAP function is documented in the SQL Functions-> Miscellaneous Functions section.
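As a quick, self-contained illustration of the MAP semantics used in the views above (the literal values here are arbitrary): MAP compares its first argument against each search value and returns the matching result, or the trailing default if nothing matches.

SELECT MAP(2, 1, 'one', 2, 'two', 'other') AS mapped_value FROM DUMMY;  -- returns 'two'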

 

Disclaimer

This post describes a possible approach to consider when securing column data in HANA but your specific implementation should be validated by security experts within your organization.

Developers in the Bay Area: Join SAP experts for a hands-on Developer Day on SAP HANA


I-DeveloperDay-PaloAlto-300dpi.jpg

JUNE 11    |    SAP HANA

 

 

Here is an exclusive opportunity for you to learn and get your hands dirty with SAP HANA - SAP’s column-based in-memory platform for big data, real-time apps - with live support from SAP experts. During the event, the experts will show you how the platform works and how to get started. You’ll get hands-on coding experience developing on top of SAP HANA. You’ll have the opportunity to explore the building blocks needed to create SAP HANA apps such as:

 

  • Creating tables and views using Core Data Services
  • Modeling HANA views
  • Creating SQLScript stored procedures
  • Leveraging services such as oData and server side JavaScript
  • Building UIs using SAPUI5

 

This is a bring-your-own-laptop event that will be supported by SAP HANA development experts who will be on hand to provide live support while you’re building your apps. In the days prior to the event, we will send you a checklist including links to download SAP HANA tools. If you run into problems with the installations, experts will be available from 7 AM to 9 AM the day of the event to help you troubleshoot and install the tools. 

 

The event will take place on Wednesday, June 11 at the SAP Labs campus in Palo Alto (3410 Hillview Ave, Bldg 1, Palo Alto, CA 94304). Our leading experts will be developers and product managers Thomas Jung and Rich Heilman. For more details and for an overview of the agenda, click here.

 

This event is free of charge, but space is limited, so REGISTER NOW!


We look forward to seeing you there!


Access to Hive from HANA - Section 1 Hadoop Installation


To access Hive from HANA, we first need to have Hadoop and Hive installed. The first and second sections introduce the installation of Hadoop and Hive.


1. Download Hadoop and move to directory

Download Hadoop from apache Hadoop mirror: http://hadoop.apache.org/releases.html#Download

In this case, we choose Hadoop-2.2.0.

Unzip the downloaded Hadoop package and put the Hadoop folder in the directory where you want it to be installed.

tar -zxvf  hadoop-2.2.0.tar.gz

Switch to your Hana server user:

su hana_user_name

We need to install Hadoop under the HANA user, because the HANA server needs to communicate with Hadoop as the same user.

If you just want to set up Hadoop without accessing it from HANA, you can simply create a dedicated Hadoop account with “addgroup” and “adduser” (these commands depend on the system; SUSE and Ubuntu seem to have different command lines).

 

2. Check Java

Before we install Hadoop, we should make sure we have Java installed.

Use:

java -version

to check Java, and find the Java path with:

whereis java

Then add the following lines to $HOME/.bashrc to set your Java path:

export JAVA_HOME=/java/path/

export PATH=$PATH:/java/path/


3. SSH passwordless

Install ssh first if you don't have it.

Type the following commands in a console to create a public key and add it to the authorized keys:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

 

4. Add path to Hadoop

Write the following script in $HOME/.bashrc if you want to add the Hadoop path permanently.

Open the .bashrc file by

vi $HOME/.bashrc

Add the following script

export HADOOP_INSTALL=/hadoop/path/

For the hadoop path, I put the Hadoop folder under /usr/local,

so I use /usr/local/hadoop instead of /hadoop/path/ in my case

export PATH=$PATH:$HADOOP_INSTALL/bin

export PATH=$PATH:$HADOOP_INSTALL/sbin

 

5. Hadoop configuration

Find the configuration files core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh in the Hadoop folder. These files are located in $HADOOP_INSTALL/etc/hadoop/ under your Hadoop folder. You may simply rename the template files in that folder if you cannot find the xml files. For example:

cp mapred-site.xml.template mapred-site.xml

Some other tutorials say you can find them under the /conf/ directory; /conf/ is used by older Hadoop versions, but in hadoop-2.2.0 the files are under /etc/hadoop/.

 

Modify the configuration files as follows:

vi core-site.xml

Put the following between the <configuration> tags:

<property>

<name>fs.default.name</name>

<value>hdfs://computer name or IP(localhost would also work):8020</value>

</property>

 

vi hdfs-site.xml

Put the following between the <configuration> tags:

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>file:/namenode/dir</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>file:/datanode/dir</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

 

vi yarn-site.xml

Put the following between the <configuration> tags:

<property>

<name>yarn.resourcemanager.hostname</name>

<value>yourcomputername or IP</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

  </property>

 

vi mapred-site.xml

Put the following between the <configuration> tags:

<property>

<name>mapreduce.framework.name</name>

   <value>yarn</value>

</property>

 

For more information about all these configuration properties, please check:

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml

 

http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

 

http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

 

http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

 

vi hadoop-env.sh

add the following two statements at the end of this file:

export HADOOP_COMMON_LIB_NATIVE_DIR=/hadoop/path/lib/native

export HADOOP_OPTS="-Djava.library.path=/hadoop/path/lib"

 

6. Start Hadoop

       The last thing to do before starting Hadoop is to format your namenode and datanode:

            hadoop namenode -format

            hadoop datanode -format

Finally, you can start Hadoop by calling “start-all.sh”; you can find this file in /hadoop/path/sbin.

To check that Hadoop has started, type:

jps

You should see NameNode, NodeManager, DataNode, SecondaryNameNode and ResourceManager are running.

 

JPS.PNG

Alternatively, you can also check if Hadoop is running by visiting localhost:50070 to check Hadoop file system information


namenode.PNG

and localhost:8088 to check cluster information.

cluster.PNG


 

Some tutorials mention that localhost:50030 contains JobTracker info. However, localhost:50030 does not exist in hadoop-2.2.0, because hadoop-2.2.0 splits the two major functions of the JobTracker, resource management and job life-cycle management, into separate components. Don't worry if localhost:50030 does not work.

Access to Hive from HANA - Section 2 Setup Hive


After installing Hadoop on the machine, we now need to install Hive in this second section.


1. Download Hive

Download Hive-0.13.0 from http://hive.apache.org/downloads.html, unzip it, and put the Hive package together with Hadoop (it is not necessary to put Hive next to Hadoop, but it makes it easier to manage later).


2. Add path to Hive

Add the following statements to $HOME/.bashrc to set the path:

export HIVE_HOME=/hive/path

export PATH=$PATH:$HIVE_HOME/bin

export PATH=$PATH:$HIVE_HOME/lib


3. Create directories on the Hadoop file system

Create the directories on the Hadoop file system for the Hive database:

hadoop fs -mkdir /user/hive/warehouse

hadoop fs -mkdir /temp


4. Config.sh file

Go to hive/bin, find config.sh and add:

export HIVE_CONF_DIR=$HIVE_CONF_DIR

export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH

export HADOOP_INSTALL=/hadoop/path   (the same path as in section 1)

Start Hive by typing “hive” in the console; you will see the Hive CLI and can run queries with HiveQL.
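For example, a few HiveQL statements can be used to smoke-test the installation (the table and column names below are purely illustrative):

CREATE TABLE test_orders (id INT, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
SHOW TABLES;
SELECT COUNT(*) FROM test_orders;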

 

Notice: by default Hive stores its metadata in Derby. You can only access your previous databases when you start Hive from the same location as last time; otherwise you will not be able to see them. Also, Hive will create a metastore_db directory and a log file wherever you start it. To fix this, you can configure the Hive metastore to use MySQL; I will write up that instruction later.

SAP HANA- ADVANCE MODELING FEATURES

  • Hierarchies
  • Restricted & calculated measures
  • Input Parameters
  • Currency conversion
  • Filter operations and variables

 

HIERARCHIES: Hierarchies are used to structure and define the relationships among attributes of attribute views used for business analysis. HANA supports two types of hierarchy.

  • Level Hierarchies are hierarchies that are rigid in nature, where the root and the child nodes can be accessed only in the defined order. For example, organizational structures, and so on.
  • Parent/Child Hierarchies are value hierarchies, that is, hierarchies derived from the value of a node. For example, a Bill of Materials (BOM) contains Assembly and Part hierarchies, and an Employee Master record contains Employee and Manager data. The hierarchy can be explored based on a selected parent; there are also cases where the child can be a parent.

This discussion will help us create a Level Hierarchy or a Parent/Child Hierarchy in order to structure and define relationships between view attributes.

  • In the Hierarchy Type dropdown, select the required option as follows:
    • Level Hierarchy
    • Parent Child Hierarchy
  • In the Node tab page, perform the following based on your selection of hierarchy type:
    • For a Level Hierarchy you can specify the Node Style that determines how the unique node ID is composed. You also add the various levels and assign attributes to each of them, with the Level Type specifying the formatting instructions for the level attributes. Specify Order By to control the ordering of the hierarchy members, and Sort Direction to display them in ascending or descending order.
    • For a Parent Child Hierarchy specify the parent and the child attribute. Also, in the Step Parent node specify where to place the orphan parent-child pair.
  • In the Advanced tab page, specify the other properties of the hierarchy which are common to both hierarchy types as follows:

You can set Aggregate All Nodes to true if there is a value posted on the aggregate node and you want to compute that value while aggregating data.

  • Specify the default member
  • You can select Add a Root Node if you want to create a root node if the hierarchy does not have any
  • Specify how to handle orphan nodes using Orphan Nodes dropdown
  • Select Multiple Parent if the hierarchy needs to support multiple parents for its members

LEVEL HIERARCHIES:

  1. Select the Semantics node.
  2. In the Hierarchies panel, choose Create option.
  3. Enter a name and description for the hierarchy.
  4. In the Hierarchy Type dropdown, select Level Hierarchy.
  5. In the Node tab page do the following:
    1. Select the required value from the Node Style dropdown list. Note Node style determines the composition of a unique node ID. The different values for node styles are as:
      • Level Name - the unique node ID is composed of level name and node name, for example "[Level 2].[B2]".
      • Name Only - the unique node ID is composed of level name, for example "B2".
      • Name Path - the unique node ID is composed of the result node name and the names of all ancestors apart from the (single physical) root node. For example "[A1].[B2]".
    2. Add the required columns as levels from the drop-down list. Note You can select columns from the required table fields in the drop-down list to add to the view.
    3. Select the required Level Type. Note The level type is used to specify formatting instructions for the level attributes. For example, a level of the type LEVEL_TYPE_TIME_MONTHS can indicate that the attributes of the level should have a text format such as "January", and LEVEL_TYPE_REGULAR indicates that a level does not require any special formatting.
    4. To control how the members of the hierarchy are ordered, select the required column in the Order By drop-down list. Note: In MDX client tools, the members will be sorted on this attribute.
    5. To sort the display of the hierarchy members in the ascending or descending order, select the required option from the Sort Direction drop-down list.
  6. In the Advanced tab page do the following:
    1. Select the required value for Aggregate All Nodes. Note: This option indicates that data is posted on aggregate nodes and should be shown in the user interface. For example, suppose you have the members A with value 100, A1 with value 10, and A2 with value 20, where A1 and A2 are children of A. By default the option is set to false, and you will see a value of 30 for A. With the option set to true, the posted value 100 for A is counted as well, giving a result of 130. If you are sure that there is no data posted on aggregate nodes, you should set the option to false; the engine will then calculate the hierarchy faster than when the option is set. Note that this flag is only interpreted by the SAP HANA MDX engine; in the BW OLAP engine the node values are always counted.
    2. Enter a value for the default member.
    3. To specify how to handle the orphan nodes in the hierarchy, select the required option as described below from the dropdown.
      Root Node: Treat them as root nodes.
      Error: Stop processing and show an error.
      Ignore: Ignore them.
      Step Parent: Put them under a step-parent node. Note: This enables you to create a text node and place all the orphaned nodes under this node.
  • Optional Step: If you have selected Step Parent in the Orphan Nodes drop-down, enter a value to create the step-parent node.
  • Select the Add a Root Node check-box if required. Note: If a hierarchy does not have a root node but needs one for a reporting use case, select this option. This will create a root node with the technical name “ALL”.
  • If the level hierarchy needs to support multiple parents for its elements, for example, country 'Turkey' assigned to both regions 'Europe' and 'Asia', select the Multiple Parent check-box.
  • Choose OK.

Let us take a scenario to explain Level Hierarchies. In the eFashion package we created an Attribute View for the Article_Lookup table; below are the table fields:

            • ARTICLE_ID
            • ARTICLE_LABEL
            • CATEGORY
            • SALE_PRICE
            • FAMILY_NAME
            • FAMILY_CODE

Here we can define a hierarchy using the fields CATEGORY & ARTICLE_LABEL.

  • Open AV_AL (Attribute View-Article_Lookup)
  • Select Semantics.
  • Click on PLUS icon on Hierarchy box.
  • Provide the hierarchy definition (Name, Type, Node).
  • In Hierarchy Type, select Level Hierarchy.
  • Choose OK, then Validate & Save and Activate.
  • Activate the Analytic View in which AV_AL is consumed.

AV_HR.jpg
AV_HR1.jpg
CONSUMING THE HIERARCHY USING THE MDX PROVIDER:

  • Open Excel workbook > Data > From Other
    Sources > Select “From Data Connection Wizard”
  • Select Other/Advanced from the Data Connection Wizard.
  • Select “SAP HANA MDX PROVIDER”, NEXT
  • Provide SAP HANA login credential, OK.
  • Select Package and cube (AV_SHOP_FACT), NEXT

AV_HR2.jpg

 

PARENT/CHILD HIERARCHY:

  1. Select the Semantics node.
  2. In the Hierarchies panel, choose Create option .
  3. Enter a name and description for the hierarchy.
  4. In the Hierarchy Type dropdown, choose Parent Child Hierarchy.
  5. In the Node tab page, add the parent and child nodes by selecting the Parent Node and Child Node from the drop-down list. Note: In case you decide to place the orphaned parent-child pairs under a node called Step Parent (configured on the Advanced tab page), you can specify its value in the Step Parent column. The step-parent node can only be one of the columns or calculated columns of the current view. You can specify different step-parent values for each parent-child pair. These values appear as a comma-separated list in the Advanced tab page Step Parent field. In the case of a single parent-child node, you can also specify the value for the step-parent node in the Advanced tab page; the same value appears in the Node tab page.
  6. In the Advanced tab page, do the following and then choose OK:
    1. Select the required value for Aggregate All Nodes. Note: This option indicates that data is posted on aggregate nodes and should be shown in the user interface. For example, suppose you have the members A with value 100, A1 with value 10, and A2 with value 20, where A1 and A2 are children of A. By default the option is set to false, and you will see a value of 30 for A. With the option set to true, the posted value 100 for A is counted as well, giving a result of 130. If you are sure that there is no data posted on aggregate nodes, you should set the option to false; the engine will then calculate the hierarchy faster than when the option is set. Note that this flag is only interpreted by the SAP HANA MDX engine; in the BW OLAP engine the node values are always counted.
    2. Enter a value for the default member.
    3. To specify how to handle the orphan nodes in the hierarchy, select the required option as described below from the dropdown.
      Root Node: Treat them as root nodes.
      Error: Stop processing and show an error.
      Ignore: Ignore them.
      Step Parent: Put them under a step-parent node. Note: This enables you to create a text node and place all the orphaned nodes under this node.
    4. Optional Step: If you have selected Step Parent in the Orphan Nodes dropdown, enter a value to create the step-parent node.
    5. Select the Add Root Node checkbox if required as described below. Note If a hierarchy does not have a root node but needs one for reporting use case, set the option to true. This will create a root node.
    6. If the hierarchy needs to support multiple parents for its elements, for example, country 'Turkey' assigned to both regions 'Europe' and 'Asia', select the Multiple Parent checkbox.

 

Note: The hierarchies belonging to an attribute view are available in an analytic view that reuses the attribute view, in read-only mode. However, the hierarchies belonging to an attribute view are not available in a calculation view that reuses the attribute view.


Let us take a scenario of a parent/child relationship in the ITEM_MASTER table. For item IDs 2 and 3 the parent ID is 1, i.e. items 2 & 3 fall under CONSUMABLES. Similarly, items 5 to 8 fall under the STEEL category. The Parent/Child Hierarchy type can be used to define a hierarchy in this scenario.

 

ITEM_ID   PARENT_ITEM_ID   ITEM_DESCRIPTION
1                          CONSUMABLES
2         1                Cutting Disc 4”
3         1                Grinding Disc 4”
4                          STEEL
5         4                Plate 10mm
6         4                Beam
7         4                Angle
8         4                Channel
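For reference, a hedged SQL sketch of this sample data (the column types are assumptions, not taken from an actual table definition):

CREATE COLUMN TABLE "ITEM_MASTER" (
  "ITEM_ID" INTEGER,
  "PARENT_ITEM_ID" INTEGER,
  "ITEM_DESCRIPTION" NVARCHAR(40));

INSERT INTO "ITEM_MASTER" VALUES (1, NULL, 'CONSUMABLES');
INSERT INTO "ITEM_MASTER" VALUES (2, 1, 'Cutting Disc 4"');
INSERT INTO "ITEM_MASTER" VALUES (3, 1, 'Grinding Disc 4"');
INSERT INTO "ITEM_MASTER" VALUES (4, NULL, 'STEEL');
INSERT INTO "ITEM_MASTER" VALUES (5, 4, 'Plate 10mm');
INSERT INTO "ITEM_MASTER" VALUES (6, 4, 'Beam');
INSERT INTO "ITEM_MASTER" VALUES (7, 4, 'Angle');
INSERT INTO "ITEM_MASTER" VALUES (8, 4, 'Channel');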

 

 

AV_HR4.jpg

RESTRICTED COLUMN:

Restricted Columns are used to filter the value based on the user defined rules on the attribute values.

Restricted Column dialog helps to create a restricted column and filter its value based on the columns that you select in the Restrictions view. In the Column dropdown, you can select a column of type measure for which you want to apply filter. In the Restrictions view, to apply a filter, you need to choose Add Restriction. Select a Column, an Operator, and enter a value. For example, you can create a restricted column to view the Revenue of a particular country, where Revenue is the measure and Country is the attribute having a list of countries. If you have added many restrictions, and do not want to apply all of them but want to retain them, deselect the Include checkbox.

Creating a Restricted Column

You use this procedure to create a restricted column to filter the value based on the user-defined restrictions for the attribute values.

 

For example, to filter the sales of a product based on a category you can create a restricted column Sales based on a measure Total Sales amount, and attribute category where you can restrict the value for category.

 

1.     In the Output panel of the Logical Join, right-click Restricted Columns, and choose New.

 

2.     Enter a name and description for the restricted column.

 

3.     From the Column dropdown, select a measure.

 

4.      In the Restrictions view, choose Add Restriction.

 

1.      In the Column dropdown, select column to define filter.

 

2.      Select the required operator.

 

3.      Enter the filter value.

 

4.      If you want to remove a particular filter on the column, deselect its corresponding Include checkbox.

 

5.      Choose OK.

REST_COL.jpg

The data preview shows the sales amount only for the restricted attribute values, i.e. category "Jewelry" and category "Pants".

REST_COL1.jpg

Let's review the scenario below, where the "Jewelry" category has been excluded.

REST_COL2.jpg

Here you can see the sales amount of all categories except "Jewelry".

REST_COL3.jpg
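In plain SQL terms, a restricted column behaves like a conditionally aggregated measure. A minimal sketch, using illustrative table and column names rather than the actual eFashion model:

SELECT "CATEGORY",
       SUM(CASE WHEN "CATEGORY" IN ('Jewelry', 'Pants')
                THEN "AMOUNT_SOLD" ELSE NULL END) AS "RESTRICTED_SALES"
FROM "MYSCHEMA"."SHOP_FACTS"
GROUP BY "CATEGORY";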

 

CALCULATED COLUMN: Calculated columns are used to derive meaningful information, in the form of new columns, from existing columns.

The Calculated Column dialog helps you derive a calculated column of type attribute or measure based on existing columns, calculated columns, restricted columns, and input parameters. You can write the formula in the Expression panel or assemble it using the available Elements, Operators and Functions.

You can specify how to aggregate row data for a calculated column of type measure using the Calculate Before Aggregation checkbox and the Aggregation Type. If you select Calculate Before Aggregation, the calculation happens as per the specified expression and the results are then aggregated as SUM, MIN, MAX or COUNT. If Calculate Before Aggregation is not selected, the data is not aggregated; it is calculated as per the calculation expression (formula), and the Aggregation is shown as FORMULA. After writing the expression, you can validate it using Validate.

You can also associate a calculated column with Currency or Unit of Measure using the Advanced tab page.

CREATING CALCULATED COLUMN

You use calculated columns to derive some meaningful information, in the form of columns, from existing columns, calculated columns, restricted columns and input parameters. 

 

For example:

  • To derive a postal address based on the existing attributes.
  • To prefix the customer contact number with the country code based on the input parameter country.
  • To write a formula in order to derive values, for example
    if("PRODUCT" = 'ABC', "DISCOUNT" * 0.10, "DISCOUNT")
    which means: if the attribute PRODUCT equals the string 'ABC', then DISCOUNT multiplied by 0.10 is returned; otherwise the original value of the attribute DISCOUNT is used (see the plain SQL sketch after this list).
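For readers more used to plain SQL, the same logic corresponds to a CASE expression. A minimal sketch, assuming a hypothetical table "MYSCHEMA"."SALES" with PRODUCT and DISCOUNT columns:

SELECT CASE WHEN "PRODUCT" = 'ABC'
            THEN "DISCOUNT" * 0.10
            ELSE "DISCOUNT"
       END AS "DISCOUNT_CALC"
FROM "MYSCHEMA"."SALES";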

 

Procedure

       
1.     In the Output panel of the Logical Join, right-click Calculated Columns, and choose New.

2.     Enter a name and description for the calculated column.

3.     Select the data type, and enter length and scale for the calculated column.

4.     Select the Column Type to specify the calculated column as attribute or measure.

5.     In case of measure column type, if you select Calculate Before Aggregation, select the aggregation type.

Note: If you select Calculate Before Aggregation, the calculation happens as per the specified expression and the results are then aggregated as SUM, MIN, MAX or COUNT. If Calculate Before Aggregation is not selected, the data is not aggregated; it is calculated as per the calculation expression (formula), and the Aggregation is shown as FORMULA. If no aggregation is set, the column is considered an attribute.

6.     In the expression editor enter the expression or assemble it using the menus in the below window.

7.     If you want to associate the calculated column with a currency or a unit of measure, select the Advanced tab page and select the required type.

8.     Choose OK.

 

calculated_column.png

 


INPUT PARAMETER:

You use this procedure to provide input for parameters within stored procedures, so that the desired functionality is obtained when the procedure is executed.

In an analytic view you use input parameters as placeholders during currency conversion and in formulas such as calculated columns, where the calculation is based on the input you provide at runtime during data preview. Input parameters are not used for filtering attribute data in an analytic view; that is achieved using variables.

 

In calculation views you can use input parameters during currency conversion, in calculated measures, as input parameters of the script node, and to filter data as well.

You can apply input parameters in analytic and calculation views. If a calculation view is created using an analytic view with input parameters, those input parameters are also available in the calculation view but you cannot edit them.

The following types of input parameters are supported:              

 

 


  • Attribute Value/Column: Use this when the value of a parameter comes from an attribute.
  • Currency (available in Calculation View only): Use this when the value of a parameter is in a currency format, for example, to specify the target currency during currency conversion.
  • Date (available in Calculation View only): Use this when the value of a parameter is in a date format, for example, to specify the date during currency conversion.
  • Static List: Use this when the value of a parameter comes from a user-defined list of values.
  • Derived From Table (available in Analytic View and Graphical Calculation View): Use this when the value of a parameter comes from a table column based on some filter conditions and you need not provide any input at runtime.
  • Empty: Use this when the value of a parameter could be anything of the selected data type.
  • Direct Type (available in Analytic View): To specify an input parameter as currency or date during currency conversion.

 

 

Each type of input parameter can be either mandatory or non-mandatory. For a mandatory input parameter, it is necessary to provide a value at runtime. However, for a non-mandatory input parameter, if you have not specified a value at runtime, the data for the column where the input parameter is used remains blank.

 

Note:You can check whether an input parameter is mandatory or not from the properties of the input parameter in the properties pane.

  • If you want to create a formula to analyze the annual sales of a product in various regions, you can use Year and Region as input parameters.
  • If you want to preview a sales report with data for various countries in their respective currency for a particular date for correct currency conversion, you can use Currency and Date as input parameters.

Procedure

In Analytic View

  1. In the Output panel of the Data Foundation or Logical Join node, right-click the Input Parameters node.
    • Note: You can also create input parameters at the Semantics node level, using the Create Input Parameter option in the Variables/Input Parameters panel.
  2. From the context menu, choose New.
  3. Enter a name and description.
  4. Select the type of input parameter from the Parameter Type drop-down list.
    1. For the Column type of input parameter, you need to select the attribute from the drop-down list. At runtime the value for the input parameter is fetched from the selected attribute's data.
    2. For an input parameter of type Derived from Table, you need to select a table and one of its columns as the Return Column, whose value is used as input for the formula calculation. You can also define conditions to filter the values of the Return Column in the Filters panel. For example, to calculate a discount for specific clients, you can create an input parameter based on the Sales table and the return column Revenue, with a filter set on the Client_ID.
    3. For a Direct Type input parameter, specify the Semantic Type that describes the use of the parameter as a currency or date, for example, to specify the target currency during currency conversion.
  5. If required, select a data type.
  6. Enter length and scale for the input parameter.
  7. Choose OK.

    In Calculation View

     

    1. In the Output panel,right-click the Input Parameters node.
    2. From the context menu, choose New.
    3. Enter a name and description.
    4. Select the type of input parameter from the drop-down list.
      1. For the Attribute Value type of input parameter, you need to select the attribute from the drop-down list. At runtime the value for the input parameter is fetched from the selected attribute's data.
      2. For an input parameter of type Derived from Table, you need to select a table and one of its columns as the Return Column, whose value is used as input for the formula calculation. You can also define conditions to filter the values of the Return Column in the Filters panel. For example, to calculate a discount for specific clients, you can create an input parameter based on the Sales table and the return column Revenue, with a filter set on the Client_ID.
    5. Select a data type.
    6. Enter length and scale for the input parameter.
    7. Choose OK.
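    Once an input parameter has been created and the view activated, a value can be passed at query runtime using the PLACEHOLDER syntax. A minimal sketch (the package, view, and parameter names are illustrative):

    SELECT * FROM "_SYS_BIC"."mypackage/CV_SALES"
           ('PLACEHOLDER' = ('$$IP_TARGET_CURRENCY$$', 'USD'));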

    CURRENCY & UNIT OF MEASURE:

    Measures used in analytic views and calculation views can be defined as an amount or a quantity in the analytical space using currency and unit of measure. You can also perform currency conversion and unit of measure conversion.

    For example, suppose you need to generate a sales report for a region in a particular currency, and you have sales data in database tables in a different currency. You can create an analytic view by selecting the table column containing the sales data in this other currency as a measure, and perform currency conversion. Once you activate the view, you can use it to generate reports.

    Similarly, if you need to convert the unit of a measure from cubic meters to barrels to perform some volume calculation and generate reports, you can convert quantity with unit of measure.

    To simplify the process of conversion, system provides the following:

    • For currency conversion - a list of currencies, and exchange rates based on the tables imported for currency
    • For quantity unit conversion - a list of quantity units and conversion factors based on the tables imported for units.

    Currency conversion is performed based on the source currency, target currency, exchange rate, and date of conversion. You can also select currency from the attribute data used in the view.

    Similarly, quantity unit conversion is performed based on the source unit, target unit, and conversion factor.

    You can also select the target currency or unit of measure at query runtime using input parameters. If you use this approach, you first have to create an input parameter with the desired currency/unit specified, and use that same input parameter as the target in the conversion dialog.
    Note
    Currency conversion is enabled for analytic views and base measures of calculation views.

     

    Prerequisites

    You have imported tables T006, T006D, and T006A for Unit of Measure.

    You have imported TCURC, TCURF, TCURN, TCURR, TCURT, TCURV, TCURW, and TCURX for currency.

     

    Procedure

    1. Select a measure.
    2. In the Properties pane, select Measure Type.
    3. If you want to associate the measure with a currency, perform the following substeps:

                 a. In the Measure Type dropdown list, select the value Amount with Currency.

                 b.In the Currency Dialog,select the required Type as follows:


      Fixed: To select the currency from the currency table TCURC.
      Attribute: To select the currency from one of the attributes used in the view.

                c. Select the required value, and choose OK.

                d. If you want to convert the value to another currency, choose Enable for Conversion.

                            i. To select the source currency, choose Currency.

                            ii. Select the target currency.

                                Note: For currency conversion, in addition to the types Fixed and Attribute, you can select Input Parameter to provide the target currency at runtime. If you select an input parameter for specifying the target currency and deselect the Enable for Conversion checkbox, the target currency field gets cleared, because input parameters can be used only for currency conversion.

                             iii.To specify exchange rate type, in the Exchange Rate Types dialog, select the Type as follows:


      Fixed: To select the exchange rate from the currency table TCURW.
      Input Parameter: To provide the exchange rate at runtime as an input parameter.


                             iv.To specify the date for currency conversion, in the Conversion Date dialog, select the Type as follows:

     


      Fixed: To select the conversion date from the calendar.
      Attribute: To select the conversion date from one of the attributes used in the view.
      Input Parameter: To provide the conversion date at runtime as an input parameter.


                           v.To specify the schema where currency tables are located for conversion, in the Schema for currency conversion, select the required schema.

                           vi. To specify the client for which the conversion rates are to be looked up, select the required option in Client for currency conversion.

     

                  e. From the dropdown list, select the required value that is used to populate the data if the conversion fails:

     


    Fail: In data preview, the system displays an error for the conversion failure.
    Set to NULL: In data preview, the value for the corresponding records is set to NULL.
    Ignore: In data preview, you see the unconverted value for the corresponding records.


    4. If you want to associate a measure with a unit of measure other than currency, perform the following substeps:

        a. Select the value Quantity with Unit of Measure in the Measure Type drop-down list.

        b. In the Quantity Units dialog, select the required Type as follows:

     


      Fixed: To select a unit of measure from the unit tables T006 and T006A.
      Attribute: To select a unit of measure from one of the attributes used in the view.


        c. Select the required value, and choose OK.

    5.Choose OK.

    Note: You can associate Currency or Unit of Measure with a calculated measure, and perform currency conversion for a calculated measure by editing it.

    Currency_Con_Fixed.png


    To Be or Not To Be: HANA Text Analysis CGUL rules has the answer


    To set the context, let's first do things in HANA TA without the CGUL rules.

     

    1. Let's create a small table with texts

     

    So let's create a table which looks something like this:

    TableDefinition.PNG

     

    Now let's create two texts in it:

    insert into "S_JTRND"."TA_TEST" values(1,'EN','TO BE','');

    insert into "S_JTRND"."TA_TEST" values(2,'EN','NOT TO BE','');

     

    So now the table entries look like:

    TableData.PNG

     

    2. Text Analysis Via Dictionary

    Now let's say we want to do text analysis where we say:

    1. if the text is "TO BE" it is to be treated as POSITIVE_CONTEXT
    2. if the text is "NOT TO BE" it is to be treated as NEGATIVE_CONTEXT

     

    Let's create a dictionary holding these two values:

    So in the XSJS project we create an english-Contextdict.hdbtextdict, and its content is as follows (also attached):

    <dictionary xmlns="http://www.sap.com/ta/4.0">

      <entity_category name="POSITIVE_CONTEXT">

        <entity_name standard_form="TO BE">

          <variant name="TO BE" />

        </entity_name>

        </entity_category>

      <entity_category name="NEGATIVE_CONTEXT">

        <entity_name standard_form="NOT TO BE">

          <variant name="NOT TO BE" />

        </entity_name>

        </entity_category>

     

     

    </dictionary>

     

    Now we use the dictionary above to create a configuration file (also attached):

    So, pick content from any .hdbtextconfig and add the path to the above dictionary in it:

      <configuration name="SAP.TextAnalysis.DocumentAnalysis.Extraction.ExtractionAnalyzer.TF" based-on="CommonSettings">

      <property name="Dictionaries" type="string-list">

      <string-list-value>JTRND.TABlog.dictonary::english-Contextdict.hdbtextdict</string-list-value>

        </property>

      </configuration>

     

     

    3. Create Full text index on the Table using this configuration

    CREATE FULLTEXT INDEX "IDX_CONTEXT" ON "S_JTRND"."TA_TEST" ("TEXT")

      LANGUAGE COLUMN "LANG"

      CONFIGURATION 'JTRND.TABlog.cfg::JT_TEST_CFG' ASYNC

      LANGUAGE DETECTION ('en','de')

      PHRASE INDEX RATIO 0.000000

      FUZZY SEARCH INDEX OFF

      SEARCH ONLY OFF

      FAST PREPROCESS OFF

      TEXT MINING OFF

      TEXT ANALYSIS ON;

     

     

    Check the TA results:

    TA_1.PNG
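    The same results can also be queried directly from the $TA_ table that the full text index generates. A minimal sketch, assuming the key column of TA_TEST is named ID:

    SELECT TA_RULE, TA_TOKEN, TA_TYPE
    FROM "S_JTRND"."$TA_IDX_CONTEXT"
    ORDER BY ID, TA_COUNTER;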

     

    Note: for NOT TO BE, we did not get both POSITIVE (for the substring TO BE) and NEGATIVE. Although this is good, it is a fluke: TA took the longest matching string, so for NOT TO BE and its substring TO BE we got only a NEGATIVE, but this could create problems.

     

    Moving on, let's add more to this context: the text NOT-TO BE should also be a possible NEGATIVE_CONTEXT; in fact, NOT followed by TO BE in the same sentence should be a NEGATIVE_CONTEXT.

     

    Without changing anything, let's insert some more values and see how they look:

    insert into "S_JTRND"."TA_TEST" values(3,'EN','NOT-TO BE','');

    insert into "S_JTRND"."TA_TEST" values(4,'EN','NOT, TO BE','');

    insert into "S_JTRND"."TA_TEST" values(5,'EN','NOT, Negates TO BE','');

     

    Check the TA results:

    TA_2.PNG

     

    So you see we now have a problem. We could also have NOT, -, NEG, etc. as possible predecessors of TO BE indicating that it is a NEGATIVE_CONTEXT.

     

    Solution 1: Let's have the synonyms of NOT as one category and TO BE as a "CONTEXT" category, and in post-processing of the TA results check whether the TA_TYPE values CONTEXT and NEGATIVE occur in the same sentence; if so, it is a NEGATIVE_CONTEXT.
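    A hedged SQL sketch of that post-processing idea, assuming the dictionary is reworked into the categories NEGATIVE and CONTEXT and that the key column of TA_TEST is named ID:

    SELECT ID, TA_SENTENCE
    FROM "S_JTRND"."$TA_IDX_CONTEXT"
    WHERE TA_TYPE IN ('NEGATIVE', 'CONTEXT')
    GROUP BY ID, TA_SENTENCE
    HAVING COUNT(DISTINCT TA_TYPE) = 2;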

     

    But wouldn't it be great if the index could do this on its own?

     

    CGUL Rules save the day:

     

    So here we go:

     

    4. CREATE A .rul file

    CONTEXT.rul (also attached), containing the following rule:

    #group NEGATIVE_CONTEXT (scope="sentence") : { <NOT> <>*? <TO> <>*? <BE> }

     

    We need to compile this rule to get a .fsm file and put it on the server under ...lexicon/lang (out of scope for this blog; I have attached the compiled file here).

     

    Now enhance your configuration file with a reference to this .fsm file.

     

    <configuration name="SAP.TextAnalysis.DocumentAnalysis.Extraction.ExtractionAnalyzer.TF" based-on="CommonSettings">

      <property name="Dictionaries" type="string-list">

      <string-list-value>JTRND.TABlog.dictonary::english-Contextdict.hdbtextdict</string-list-value>

        </property>


      <property name="ExtractionRules" type="string-list">

          <string-list-value>CONTEXT.fsm</string-list-value>

        </property>

       

      </configuration>

     

     

    5. Restart the indexserver process so that the newly compiled rule file is picked up by the system.

    indexServerProcessRestart.PNG

     

     

    6. Recreate the index using the same statement as above and check the TA table:

     

    TA_3.PNG

     

    So, as you can see, the highlighted values come from the rule and mark the extracted NEGATIVE_CONTEXT. Below them I kept the dictionary value that wrongly identified a POSITIVE_CONTEXT for comparison; this should ideally not be handled by dictionaries.

     

    So, in this context: To Be or Not To Be: HANA Text Analysis CGUL rules indeed have the answer!!

     

    Hope this helps,

    Bricks and Bats are Welcome

    TIPS AND MY EXPERIENCE- SAP-HANA CERTIFICATION


    I would like to share my SAP HANA certification exam experience and some tips for preparing for the SAP HANA certification.

     

    First, a brief introduction about myself: I started my career as a programmer and then developed an interest in database development. I am a non-SAP professional with hands-on experience in MSSQL and MSSAS. I then started reading about in-memory database technology and was quite impressed with the SAP HANA technology and platform as an in-memory database that is dual in nature (OLTP as well as OLAP), which generated a huge interest in learning HANA. I started my SAP HANA learning journey at the beginning of 2014, and today, on 22nd May 2014, I became an SAP CERTIFIED APPLICATION ASSOCIATE - SAP HANA by passing the certification exam with a score of 97%.

     

    My MSSQL knowledge helped me a lot to understand the SQLScript, procedure, and database development parts of SAP HANA.

    My MSSAS knowledge helped me to understand the data modeling concepts of SAP HANA.

    My MSSIS knowledge helped me to understand the SAP ETL technology (BO Data Services) used with SAP HANA.

     

    EXAM PREPARATION GUIDELINE:

     

    1. The first step is to know the exam course content and topic areas as described on the SAP training site:

     

    Data Provisioning (> 12%): Describe possible scenarios and tools for replicating and loading data into SAP HANA from different data sources (e.g. SAP Landscape Transformation (SLT), SAP Data Services, or Direct Extractor Connection (DXC)).

    Security and Authorization (8% - 12%): Describe the concept of authorization in SAP HANA, and implement a security model using analytic privileges, SQL privileges, pre-defined roles and schemas. Perform basic security and authorization troubleshooting.

    Data modeling - Analytical views (8% - 12%): Implement data models with SAP HANA using analytical views. Advise on the modeling approach and best practices.

    Data modeling - Calculation views (8% - 12%): Implement data models with SAP HANA using calculation views, and advise on the modeling approach and best practices.

    Advanced data modeling (8% - 12%): Apply advanced data modeling techniques, including currency conversion, variables and input parameters. Implement decision automation using business rules.

    Optimization of data models and reporting (8% - 12%): Monitor, investigate and optimize data models and reporting performance on SAP HANA. Advise on the modeling approach and tools to achieve optimum performance. Evaluate the impact of different implementation options such as table joins, aggregation, or filters. Understand the implications of the various reporting tools and connection types on performance.

    Administration of data models (8% - 12%): Administer data models in SAP HANA, including setting information validation rules, managing schemas, and importing/exporting and transporting data models.

    Reporting (< 8%): Provide advice on reporting strategies, and deliver appropriate reporting solutions with SAP HANA. Build reports using various tools, for example Microsoft Excel or SAP BusinessObjects BI tools.

    Data modeling - SQL Script (< 8%): Apply SQL Script to enhance the data models in SAP HANA using AFL, CE functions, and ANSI SQL.

    Data modeling - Attribute views (< 8%): Implement data models with SAP HANA using attribute views, and advise on the modeling approach and best practices.

    Deployment scenarios of SAP HANA (< 8%): Describe the deployment scenarios for SAP HANA and evaluate appropriate system configurations.

    SAP HANA Live & Rapid Deployment Solutions for SAP HANA (< 8%): Describe the value of HANA and identify scenarios for SAP-delivered content for SAP HANA, such as SAP HANA Live and Rapid Deployment Solutions.

     

       2. Analyse which topic areas cover the largest proportion of the course content. As stated above, Data Provisioning and Data Modeling together cover almost 70% of the content, so getting a good hold on these covers 70% of the exam.

     

       3. Now that we know the focus areas of the exam, start collecting and reading content on the above topics. There are comprehensive resources available on the SAP help site which give a broad understanding of SAP HANA data provisioning, modeling and the other topic areas.

              http://help.sap.com/hana_appliance#section5

     

       4. The SAP HANA Academy - YouTube video tutorials are a huge library for gaining further knowledge of the above topics.

     

       5. The openSAP course materials and videos will help a lot in getting a handle on SAP HANA.

     

       6. Refer to my blog posts on data modeling:

       

              Data Modeling in SAP HANA with sample eFashion Database-Part I

              Data Modeling in SAP HANA with sample eFashion Database-Part II

              SAP HANA- ADVANCE MODELING FEATURES

        

       7. Read the content and watch the video tutorials topic by topic. For example, first start with Data Provisioning and go through all the content and videos on this topic. Note down the key concepts, methods and flow of each topic, and focus on graphical representations of the content, which will help you memorize the key concepts easily.

     

       8. Practical experience is quite important, as most of the questions are based directly on the HANA studio UI. SAP provides 30 days of free access to HANA on Amazon cloud services; please refer to SAP HANA One on AWS Marketplace.

     

    QUESTION PATTERN AND TYPE:

     

    1. All questions are multiple choice. Most of the questions are single selection, but some questions have more than one correct answer (multiple select).
    2. There is no negative marking for wrong answers. If a question has more than one right answer, all the right options must be selected for the question to count as correctly answered; otherwise it is treated as wrong, even if one option was selected correctly.
    3. The questions are grouped by the topics described above. For example, the Reporting section will contain 4 questions grouped under the heading Reporting, and all 4 questions will appear in sequence, followed by the questions from the next topic, and so on.
    4. Data Provisioning question patterns cover how the different provisioning tools connect to HANA; for example, one question asked which connection is involved in DXC. Also the types of replication, such as which tools support real-time replication and which tools support ETL, etc.
    5. Modeling question patterns cover the functions of the different views (attribute, analytic and calculation), variables, input parameters, hierarchies, the different joins, CE functions, SQLScript and procedures, etc.
    6. The Reporting section generally covers how SAP HANA talks to the different reporting tools, mainly the connectivity of SAP HANA to reporting tools: how Crystal Reports, WebI, Dashboards, Explorer, Analysis edition for OLAP, Analysis for MS Office, MS Excel etc. can be connected to HANA.
    7. The different types of privileges (object, system, package, SQL), which privilege controls which authorization rule in HANA, user and role assignment, etc.

     

    I hope this blog helps those who have set their mind on going for the HANA certification.

     

    Good Luck.

     

    Mohammad Shafiullah

    Introduction to HANA XS application development (Part 1): BlogProject creation


    Hi all,

     

    This is the first post of a series that will cover the development, from start to finish, of a simple application on the SAP HANA platform, accompanied of course by the underlying theory. To offer a well-rounded presentation of the platform, I have tried to use a variety of HANA capabilities, but of course some things will be missing. Further information can be found in the developer guide.

     

    • Application

      To begin with, let’s see what our application is about. The application that I will present here is called BlogProject and I think that what it will do is pretty obvious considering the name, but I am going to describe it anyway.  So, in the app a user will be able to login/register, search for and view a post, write one or write a comment on a certain post.  Simple as that. I may extend the application, adding a user profile, for a user to be able to see his/her own posts and comments, or some kind of statistics page, for example a geographical distribution of the posts.

     

    • Prerequisites

    1.   You have access to an SAP HANA system. If you don't, then explore your options and choose the right one for you: http://scn.sap.com/community/hana-in-memory/blog/2014/05/21/get-access-to-a-sap-hana-instance

    2.   Add your HANA system to your HANA studio

     

    • Studio explanation

      Let's take a glimpse at the HANA studio. The studio includes different perspectives, each giving a different view.

    (Screenshots: Project Explorer, Repositories and Systems views)

     

    Project Explorer: here we can see all our projects and files included in each.

    Repositories: here we see all our local workspaces

    Systems: in this view we can add and manage our HANA systems and database objects

     

    • Repository Workspace

    First, we have to create a workspace for our project. In the "Repositories" view, right-click -> New Repository Workspace. In the window, choose your user (SYSTEM in this case), give a name to your workspace ("workspace1" in our case, don't ask why…) and choose a location where to save it.

     

    • Project

      After the workspace creation we have to create our project. To do so we have to go to the “Project Explorer”, right click and then “New” -> “Project”. Then find and choose XS Project (SAP HANA -> Application Development).

     

    After the project is created, it is a good practice to create the following folders, in order for our project to be organized: data, model, procedures, services and ui. We will see later what files we create in each folder.

     

    Share and activate the project: To be able to activate our project we have to share it first, adding it to our workspace. To share it, follow this procedure: right-click on your project -> "Team" -> "Share Project" -> choose "SAP HANA Repository" -> choose (or add) your workspace and click "Finish".

     

    We must always activate our project when we make any changes, for them to be committed and shared. To do so, right-click on your project or the file(s) you want to activate -> "Team" -> "Activate".

     

    .xsapp .xsaccess files: These files are necessary for exposing any content via the XSEngine web server. Create these files without a name, just the extension. The .xsapp file does not contain any content. In the .xsaccess file paste the following:

    {

    "exposed":true

    }

     

    • Schema

    For our application to be able to create and access database objects we have to create a schema, which will hold all the database objects we create, such as tables, SQL views and stored procedures. By default we are granted a schema called "SYSTEM", but it is better practice to create a separate schema for our application, so that the database is more organized.

     

    The procedure is very easy. We just have to create, in the "data" folder of our project, a new file with the extension .hdbschema. The definition of the schema is the following:

     

    schema_name="BlogProject";

     

    Or we can open the SQL console from the “Systems” view and execute the following create statement:

     

    CREATE SCHEMA BlogProject [OWNED BY SYSTEM]

     

    • Authorization model

    Authorization is about providing access to specific resources only to certain users. The basic entity of the authorization model is a privilege. Privileges can be granted to a role, which is a group of privileges, or to a user. Additionally, a user can be granted certain roles. The best practice is to assign privileges to roles and roles to users; that way the authorization model is more organized.

     

    First, let's create privileges for our application. We have to create an .xsprivileges file without a name. In the .xsprivileges file paste the following:

    {
      "privileges": [
        {
          "name": "Basic",
          "description": "Basic usage privilege"
        },
        {
          "name": "Admin",
          "description": "Administration privilege"
        }
      ]
    }

     

    Now that we have created the privileges for our application, we must grant them to a role so that we can have access. To create a role we just create a new file with the extension .hdbrole. Inside the file we type the following definition.

     

    role MyBlogProject::model_access {
        application privilege: MyBlogProject::Admin;
    }

     

    Now our role has Admin rights in our application. To assign the role to our user "SYSTEM", go to the "Systems" view -> "Security" -> "Users" -> SYSTEM -> "Granted Roles" and add the role MyBlogProject::model_access we created.
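
    Alternatively, an activated repository role can be granted from the SQL console via the standard _SYS_REPO procedure (using the same role and user as above):

    -- Grant the activated repository role to the SYSTEM user via SQL
    CALL "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('MyBlogProject::model_access', 'SYSTEM');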

     

    Next, for our user and application to be able to access and write to the schema we created, go again to the security tab -> "Roles" -> MyBlogProject::model_access -> object privileges and add our schema "BlogProject". Now that we have added this privilege to our role, all users who have this role will also have the same privileges on the schema.

     

    As you may have noticed, I granted our role certain privileges twice: the first time via the .hdbrole file in the repository and the second via the "Systems" view. We can edit the authorization model in both ways.

     

     

     

    This concludes the first post of the series. Next we will see how to create our data definition (aka persistence model) beyond the creation of the schema illustrated above, creating tables and sequences:

    http://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-2-blogproject-persistence-model

    Introduction to HANA XS application development (Part 2): BlogProject persistence model


    Hi all,

     

    This is the second post of a series that talks about the BlogProject application development with HANA XS. If you missed the first one make sure to have a look:

    http://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-1-blogproject-creation

     

    Given that I will develop the application in parallel with the posts, I will update them when and if new requirements arise. Forgive me if there are any inconsistencies between what I am proposing and what I actually end up doing.

     

    • Application schema

    1.png.jpg

    The above schema represents the data model of the application. Some things that might be confusing are the "subject", "latitude" and "longitude" columns and the POST2POST table. Let's start with the columns. The "subject" column of the POST table is going to hold the subject of a post, which will be derived using the text analysis capabilities of SAP HANA. The "latitude" and "longitude" columns will help us with the geospatial statistics. Lastly, the POST2POST table will record all the link actions between posts, storing, for each link, the post that includes the link and the post that is referenced.

     

    • Tables

    Column or row?

    In most cases the tables we create in HANA are columnar, but you can of course use row tables, depending on your application's needs. Each table type benefits specific processes.

     

    Row stores:

    a) Easier to insert and update

    b) Better if you need the complete row, because reconstructing a full row is one of the most expensive column store operations

    c) Preferred if the table has a small number of rows (e.g. configuration tables).

    d) If your application needs to process only a single record at a time

    Column stores:

    a) Only affected columns have to be read during the selection process of a query.

    b) All the columns can serve as an index

    c) Facilitates higher compression rates

    d) Better if the table holds huge amounts of data that should be aggregated and analyzed

    e) Better if the table has a large number of columns. 

    f) Preferred if calculations are typically executed on single or a few columns only. 

    To sum up, the basic difference is the type of processes for which we use each table type. In OLAP processes it is better to use column stores, because for analysis we query only certain columns and column stores provide much better compression, thus minimizing query time. On the other hand, row stores are better for OLTP processes, facilitating fast inserts and updates.
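
    For illustration, here is a minimal sketch of both storage types in plain SQL (the table names are made up for this example and are not part of the BlogProject data model):

    -- Column table: data that will be aggregated and analyzed
    CREATE COLUMN TABLE "BlogProject"."POST_STATS" (
        "postId" INTEGER,
        "views"  INTEGER
    );

    -- Row table: small, frequently updated configuration data
    CREATE ROW TABLE "BlogProject"."APP_CONFIG" (
        "key"   NVARCHAR(50) PRIMARY KEY,
        "value" NVARCHAR(200)
    );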

     

    Notes and suggestions:

    To enable fast on-the-fly aggregations and ad-hoc reporting, and to benefit from compression mechanisms, it is recommended to store transactional data in column-based tables.

    If you need to join tables, avoid mixing storage types, since using both storage engines in one join reduces performance.

    Attribute, Analytic and Calculation Views are only supported on columnar tables.

    Enabling search is only possible on column tables.

     

    How to create

    There are two ways to create a table:

     

    Via repository

    We can create a table via the repository by creating a new file in the "data" folder with the extension .hdbtable. (If we choose to create a new "Database Table" instead of a plain "File", the repository understands the file extension and we don't need to add it; this applies to all file types.) Then all we have to do is write the table definition in that file.
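
    A minimal sketch of such a definition, assuming the declarative .hdbtable syntax (the column names here are purely illustrative):

    table.schemaName = "BlogProject";
    table.tableType  = COLUMNSTORE;
    table.columns    = [
        {name = "id";       sqlType = INTEGER;  nullable = false;},
        {name = "username"; sqlType = NVARCHAR; length = 50; nullable = false;}
    ];
    table.primaryKey.pkcolumns = ["id"];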

     

    Via catalog

    To create a table via the catalog, simply create a "New Table" in the "Tables" folder of the schema and then add the columns, types etc. in a graphical manner. If we create our tables that way, we will not be able to add a foreign key through the editor; to do so we have to add it with a simple SQL statement.
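
    For example, a foreign key could be added afterwards with a statement along these lines (the table, constraint and column names are assumptions for illustration):

    ALTER TABLE "BlogProject"."POST"
        ADD CONSTRAINT "FK_POST_USER"
        FOREIGN KEY ("userId") REFERENCES "BlogProject"."USER" ("id");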

     

    For our application I have created the 5 tables below via the “Systems” view (Catalog):

    2.png

    3.png

    4.png

    5.png

    6.png
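
    In case the screenshots are not visible, here is a rough SQL sketch of what the POST table could look like; the column names and types are assumptions based on the schema description above, not the exact definition used:

    CREATE COLUMN TABLE "BlogProject"."POST" (
        "id"        INTEGER NOT NULL PRIMARY KEY, -- value supplied by a sequence (see below)
        "userId"    INTEGER NOT NULL,             -- author of the post
        "title"     NVARCHAR(200),
        "content"   NCLOB,
        "subject"   NVARCHAR(100),                -- filled later via text analysis
        "latitude"  DOUBLE,                       -- for the geospatial statistics
        "longitude" DOUBLE
    );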

     

    • Sequences

    HANA does not support "autoincrement" for a column (e.g. an id), so we need to create a sequence to do that, which also provides additional capabilities.

     

    Sequence variables

    A sequence definition includes the variables below, of which only the schema is compulsory; some have default values and the rest are optional:

     

    schema = "";

    increment_by = integer; //the incrementation value (default = 1)

    start_with = integer; //the first value of the sequence (default = -1)

    maxvalue = integer; //the maximum value of the sequence

    nomaxvalue = boolean; //if the sequence has a max value or not (default = false)

    minvalue = integer; // the minimum value of the sequence

    nominvalue = boolean; // if the sequence has a min value or not (default = false)

    cycles = boolean;//if the sequence starts with the minvalue after the maxvalue has been reached, or the opposite

    reset_by = ""; //the query that will be used on server reboot to find the value that the sequence will start with

    public = boolean; //(default = false)

    depends_on_table = ""; //the dependency to a specific table

    depends_on_view = ""; // the dependency to a specific view

     

    How to create

    Via repository

    Create a new file inside the "data" folder of the repository with the extension .hdbsequence. In our application we just want the ID columns to increment by 1, so I used only a few variables. For example, for the USER table:

     

    schema= "BlogProject";

    start_with= 1;

    depends_on_table= "BlogProject::USER";

     

    Via catalog

    Create a “New Sequence” in the “Sequences” folder of the schema and add the values of the sequence in a graphical manner as shown below:

    7.png

    For the BlogProject I created the sequences below. The POST2POST table does not need a sequence because it does not have an ID column, only two columns that are foreign keys referencing the POST table's ID.

     

    8.png

    Note: A single sequence can be used for all tables, but its value increments regardless of the table. For example, if we get the value 5 from a call of the sequence for one table and then call it for another table, we get the value 6 and not the next value we were probably expecting for the specific column of that table. If we want an independent incremental value for each table, then we must create a separate sequence for each.

     

    Calling a sequence

    Sequences are not associated with tables; they can only be restricted to apply to certain tables (with the "depends_on_table" variable). In fact, they are used by applications through SQL statements, which can use CURRVAL to get the current value of the sequence, or NEXTVAL to get the next value.

     

    For example, if we want to insert a new user into our USER table, we execute this statement:

     

    insert into "BlogProject"."USER"
    values ("BlogProject"."MyBlogProject.data::userID".NEXTVAL, 'user', 'user', 'user', 1);

     

    • Accessing database objects

    When we access a table from a file in the repository we write "schema::table_name", and when we call it from the SQL console we write "schema"."table_name". The same rule applies to all database objects.
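
    As a quick illustration using the USER table (the SELECT uses the catalog notation; the commented line shows the repository notation from the .hdbsequence example above):

    -- SQL console (catalog) notation:
    SELECT * FROM "BlogProject"."USER";

    -- Repository notation, e.g. inside an .hdbsequence file:
    -- depends_on_table = "BlogProject::USER";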

     

     

     

     

     

    And now we have come to the end of our persistence model creation. In the next post we will talk about modeling our data using information views and other data-centric application logic enablers (triggers, functions and procedures).

     

    Thanks for reading!

    How to install SAP HANA studio on Mac OS?


    This may be useful for any Mac users who are developing or administering SAP HANA: as you may have noticed, a Mac version of SAP HANA Studio used to be available for download here: https://hanadeveditionsapicl.hana.ondemand.com/hanadevedition/ . Unfortunately, the Mac version recently disappeared and the only versions available for download are the Windows and Linux ones.

     

    Don't worry.

     

    You can easily get the most recent version of HANA Studio running on your Mac OS.

     

    Follow these steps to install SAP HANA Studio on your Mac:

     

    1. Download Eclipse Kepler from http://www.eclipse.org
    2. Unzip it and move it to the Applications folder
    3. Start Eclipse
    4. Help -> Install New Software ...
    5. Click Add... to add a repository, using this URL: https://tools.hana.ondemand.com/kepler
    6. Use this repository to see the list of available software
    7. Pick SAP HANA Tools (or more, depending on your needs)
    8. Finish the installation (you will be asked to restart Eclipse)
    9. After the restart, switch to the SAP HANA perspective and you are ready to start!

     

    How to install HANA studio on Mac - Eclipse Update Site.png

    How to install HANA studio on Mac - Eclipse Update Site 2.png

     

    HANA Studio - Switch to HANA Perspective - Eclipse Kepler.png
