Channel: SCN : Blog List - SAP HANA Developer Center

Simple sql table export in ABAP for HANA


I just read the document Simple csv table export in ABAP for HANA and decided to share my own experience in exporting DB tables to HANA via ABAP.

The idea was to play with HANA and try out its functionality for educational purposes.

 

For the data extraction I wrote a simple ABAP report, which extracts the selected tables with their data and prepares SQL scripts for the import.

Here is the source code (see the attached file).

 

Here is a screenshot of the selection screen.

 

Screen_000.png

You can define a schema name and choose whether a new schema should be created or an existing one used.

Of course, you also define the table names for extraction (structure only or with data).

Additionally, you can choose whether the files should be sent by mail or downloaded directly to a selected folder on your local hard drive.

 

During extraction, the data is split into separate files of 65,535 entries each. The SQL scripts are zipped before sending/downloading.
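To give an idea of the generated output, a script might look roughly like this (a simplified sketch with a shortened column list; the real report derives the table definitions from the ABAP Data Dictionary and the schema name from the selection screen):

CREATE SCHEMA "SRM_EXPORT";
CREATE COLUMN TABLE "SRM_EXPORT"."BBP_PDIGP" ("CLIENT" NVARCHAR(3), "GUID" NVARCHAR(32), "QUANTITY" DECIMAL(13,3));
INSERT INTO "SRM_EXPORT"."BBP_PDIGP" VALUES ('100', '005056A70B6B1EE2A8E1', 10.000);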

 

The next screenshot shows the result of the extraction. I selected some tables that represent purchasing documents in the SRM solution.

Screen_001.png

And after the extraction:

Screen_002.png

For the mass import of the SQL scripts I created a simple cmd script on Windows.

Here is an example for the BBP_PDBEI and BBP_PDIGP tables only.

 

C:
cd "C:\Program Files\sap\hdbclient\"
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_1_BBP_PDBEI.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_2_BBP_PDBEI.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_3_BBP_PDBEI.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_4_BBP_PDBEI.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_5_BBP_PDBEI.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_6_BBP_PDIGP.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_7_BBP_PDIGP.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_8_BBP_PDIGP.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_9_BBP_PDIGP.sql
hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_10_BBP_PDIGP.sql

 

Here I provided the path to the installed HANA DB client and to the files, which were saved in the V:\temp folder.

 

During the import I noticed that only one processor core was used because of the sequential processing.

Screen_003.png

 

So I changed the script for parallel processing:

 

C:
cd "C:\Program Files\sap\hdbclient\"
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_1_BBP_PDBEI.sql
TIMEOUT /T 2
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_2_BBP_PDBEI.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_3_BBP_PDBEI.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_4_BBP_PDBEI.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_5_BBP_PDBEI.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_6_BBP_PDIGP.sql
TIMEOUT /T 2
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_7_BBP_PDIGP.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_8_BBP_PDIGP.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_9_BBP_PDIGP.sql
START "" hdbsql.exe -i 11 -n hanadb -u SYSTEM -p ********** -I V:\temp\hana_script_10_BBP_PDIGP.sql

 

I added a timeout of a couple of seconds after each table creation. The tables must be created first; only then can the parallel import of the content start.

Now it looks much better. All CPU cores are used and the import runs more efficiently...

Screen_004.png

 

And here is the result:

 

Screen_006.png

Looks great...

 

Problems I encountered:

  1. The import is relatively slow. I expected the SQL import to run much faster.
  2. The memory consumption of the ABAP report is very high - maybe you have some advice on how to optimize it.
  3. HANA does not accept field names containing the "/" character. SRM uses some field names (and even tables) like /SAPSRM/*.
  4. ABAP writes negative values as '1-', while HANA needs '-1'.
  5. How do you export cluster tables? I actually did not need this, but it would be interesting to know...

 

P.S.: English is not my native language, and nobody is immune to mistakes and typos. If you have found an error in the text, please let me know.

P.P.S.: If you have some ideas on how to correct/improve the report, please don't hesitate to leave a comment.


SAP HANA Installation in Oracle VirtualBox VM


A very interesting article was brought to my attention about two weeks ago regarding the installation of SAP HANA Platform Edition 1.0 SP05 on a VMware virtual machine (most likely VMware Player).  Being an SAP HANA enthusiast, I decided to undertake the same process using Oracle VirtualBox v4.2.12.  I had been a big fan of VMware Player for a long time, but about 2 years ago I switched to VirtualBox (for reasons that I won't get into right now).

 

So, after finally getting my purchase order approved by my wife, I upgraded my PC to 32GB of RAM and installed SAP HANA PE1.0 SP05 into VirtualBox running SAP SUSE Linux Enterprise Server 11.2.

 

The installation is relatively straightforward, with a couple of minor VirtualBox issues.  The full instructions can be found here, thanks to W. Goslinga:

http://scn.sap.com/community/developer-center/hana/blog/2013/05/08/how-to-install-the-hana-server-software-on-a-virtual-machine

 

The key quirks with VirtualBox were:

  1. You need to enable the CMPXCHG16B instruction after you have created the guest in VirtualBox.  Without the CMPXCHG16B instruction enabled the HANA installation will fail.
  2. VirtualBox with SUSE 11.2 running on my Intel i7 reported the number of CPU sockets as 0.  Thus the HANA hardware check would fail with a divide-by-0 error and terminate the installation regardless of the IDSPISPOPD environment variable.  I manually updated the HanaHwCheck.py Python script and forced the number of sockets to 1.
  3. One last issue: I had the HANA media sitting on a different VM and had NFS-mounted the filesystem to my HANA host.  I had a number of packages that failed to "untar" during the installation until I mounted the NFS share as a "rw,hard,intr" mount.  Obviously the NFS soft mount wasn't playing nice over my internal network.

 

Technical bits:

  1. cd <virtualbox install dir>; VBoxManage setextradata [vmname] VBoxInternal/CPUM/CMPXCHG16B 1
  2. vi <path to HANA media>/DATA_UNITS/HDB_SERVER_LINUX_X86_64/server/HanaHwCheck.py
    • comment out the line > self.HWInfo['CPU Sockets']=len(lines)-1
    • insert line > self.HWInfo['CPU Sockets']=1    (or set to the actual number of sockets you have)
  3. If using an NFS mount, ensure it's set to a "hard" mount, e.g. vi /etc/fstab:
<nfshost>:/software  /software  nfs  rw,hard,intr 0 0

 

 

Below is a screen shot of SAP HANA Studio directly after I finished the installation of HANA.

Studio-Admin.GIF

 

I plan to post a YouTube video of the installation process shortly.  The session will cover the full life cycle from VirtualBox guest creation including network config, through to the completion of the HANA installation.  We'll also install SAP HANA Studio on the physical PC and connect to the HANA backend.  Stay tuned.

Connecting to your hana database from php using odbc.


I had spent quite some time getting a connection to my HANA database from my PHP page. I found a lot of help on the forum, and I just want to take the time to help out anyone new to SAP HANA like myself.

The first thing one needs to know is that PHP is usually 32-bit, so you'll need to install a 32-bit HANA client (to get the 32-bit HANA ODBC drivers) to make ODBC connections from your PHP page. Here's a how-to: http://www.youtube.com/watch?v=au7eziBLAtU . You can check whether the installation went all right by opening ODBC Data Sources (32 bit) - search for "ODBC Data Sources"; it is usually located at "C:\Windows\SysWOW64\odbcad32.exe". If the driver HDBODBC32 is not listed in the Drivers tab, you'll have to add a new data source from the System DSN tab. Once you have HDBODBC32 (the 32-bit driver) you are all set. Also note that PHP's ODBC support is enabled by default, so you probably won't have to modify your php.ini. I'm using XAMPP and did not have to do anything.

 

Here's some working sample code.

<?php

 

$driver  = "HDBODBC32"; // 32 bit odbc drivers that come with the hana client installation.

$servername  = "yourservername.vm.cld.sr:30015"; // Enter your external access server name

$db_name = "HDB";        // This is the default name of your hana instance.
$username= "SYSTEM"; // This is the default username, do provide your username
$password= "manager";  // This is the default password, do provide your own password.
$conn    = odbc_connect("Driver=$driver;ServerNode=$servername;Database=$db_name;", $username, $password, SQL_CUR_USE_ODBC);

 

// Example query string (the variables below come from your own application).

$queryString = 'INSERT INTO "SCHEMA_NAME"."table_name" (SiteID,Date_Time,SensorValue,KVA,PF,ErrorLog) VALUES('.$siteID.',\''.$time.'\','.$sensorValue.','.$kva.','.$pf.',\''.$errorLog.'\' )';

 

// echo $queryString; // for debugging, you can copy and paste the query string into SAP HANA Studio and check the results.

 

// if condition's optional.

if ($conn)

{

     odbc_exec($conn, $queryString); // odbc_exec prepares and executes the sql statement.

}

 

?>

A simple rule to live by when using the SAP HANA IMPORT FROM command - Don't forget the ERROR LOG clause


In life, there are simple rules to live by. For example, Jim Croce told us that "You don't mess around with Jim". In the database world there are similar rules, like "you never issue an UPDATE or DELETE statement without a WHERE clause". In this blog post, I'm hopefully going to convince you to add another rule - "Always include the ERROR LOG clause with your IMPORT FROM command".

 

I'm currently working on a Big Data project where I'm importing the results from a generated data file based on Wikipedia page count data. After doing the import operation, I did a SELECT COUNT(*) on the resulting table and then got to wondering - was I hallucinating, or was I missing almost 2 million rows of data?

 

It turns out that I was missing more than 2 million rows of data - yikes! So what's going on? When I ran the IMPORT FROM command, it reported that it took  45 seconds, affected 0 rows (that alone is a bit disturbing) and that there were no errors. So, again - what's going on?

 

Since the data I'm getting from Wikipedia could be suspect, my first inclination was that it had to be a problem with using a pipe "|" symbol as a delimiter. My original file actually used the 0x01 character as a field delimiter, and I had used the following sed Linux command to change it to the pipe character:

sed "s/\x01/|/g" 000000 > 000000.csv

 

I then used the wc (word count) command to count the number of rows in both files to compare the results.

hana:/wiki-data/year=2013/month=05 # wc --lines 000000*

  4755634 000000

  4755634 000000.csv

  9511268 total

 

As you can see, the line counts were identical. So, I opened up the help topic for the command at http://help.sap.com/hana/html/sql_import_from.html and noticed that there is a clause called "ERROR LOG", so why not give it a try. I went ahead and added the following clause:

ERROR LOG '/wiki-data/import.err'
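For context, the full statement then looked roughly like the following (the target table name is a simplified placeholder; the pipe delimiter and error log path are the ones described in this post):

IMPORT FROM CSV FILE '/wiki-data/year=2013/month=05/000000.csv'
INTO "WIKI"."PAGE_COUNTS"
WITH RECORD DELIMITED BY '\n'
     FIELD DELIMITED BY '|'
     ERROR LOG '/wiki-data/import.err';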

 

After running the IMPORT FROM command again, I got no errors, but what was weird was I also had no import.err file in my /wiki-data directory. This I've seen before, so I issued the following Linux command to make sure the SAP HANA database engine can write data to this directory:

chmod 777 /wiki-data

 

Lo and behold, I ended up with a 55 MB import.err file! It turns out that there were two things preventing the load of all the data. First, one of my column definitions was not large enough to support the longest of the Wikipedia page titles, which was 1023 characters - more than double the VARCHAR(500) that I had defined.  So, I dropped the table and recreated it with a column length of 2000 to be on the safe side. I then came across a new error - a numeric overflow. It turns out I needed to use a BIGINT data type for the number of bytes downloaded per hour for a page. After making that correction, the COUNT(*) finally matched the line count for the three CSV files that I imported.
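For reference, the corrected table definition ended up along these lines (the table and column names here are illustrative, not the exact ones I used):

CREATE COLUMN TABLE "WIKI"."PAGE_COUNTS" (
    PROJECT_CODE  VARCHAR(50),
    PAGE_TITLE    VARCHAR(2000),  -- was VARCHAR(500), too short for the longest page titles
    PAGE_VIEWS    INTEGER,
    BYTES_SERVED  BIGINT          -- INTEGER caused the numeric overflow
);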

 

I was lucky and noticed that the COUNT result didn't seem right and tracked it down, but I'm guessing that most people that use the IMPORT FROM command aren't using the optional ERROR LOG clause. So - back to the new rule.

  1. Create a directory that you will use for your error file and make sure the HANA database engine has rights to it using the chmod 777 <directory name> command.
  2. Just because the IMPORT FROM command reports no errors when running it from SAP HANA Studio doesn't mean there were no errors. Always include the ERROR LOG clause and then check that it produced a zero-byte file. Otherwise, open it up and examine the records.
  3. Tell your friends and colleagues about this rule.

 

So in the spirit of Jim Croce, you can check it out my updated lyrics and sing along:

"You don’t tug on Superman's cape"

"You don’t spit into the wind"

"You don’t pull the mask off that old Lone Ranger"

And you don't forget the ERROR LOG clause for the IMPORT FROM command.

 

Again, data is a precious thing to waste, so please pass this on.

Regards,

Bill Ramos

A peek inside xSync and the HANA XS Engine


icon_128x128.png

 

On Saturday I published a blog about a small app I wrote called xSync - basically an XS Engine app for Mac developers where you can sync a local development folder with your HANA repository. This is for rapid development and to encourage the "bring your own IDE" approach to application development on HANA. Here is a look behind the scenes at how the app works and some of the challenges of the project.

 

  Image.png

As mentioned in my previous blog, I started using the IDE Lightweight editor after upgrading my AWS HANA box last weekend. I enjoyed the experience, but after working with it for nearly a full day I was wanting a little more: syntax highlighting, easy commenting, easy indentation, CSS autocomplete and hints, etc. So I started poking around the editor itself and found that the editor is something called ACE, a pretty nice little open source editor (written in JS). This got me thinking … maybe I could insert text directly into the Lightweight IDE browser text box and submit the form as a save …. hmmm …. not a terrible idea …. I'd just need to scrape the page, find the elements and submit the form via some injected JS. Pretty simple …  I did some digging and found the HTML objects I needed using Firebug, when a lightbulb went off … instead of populating the form via an HTML page, why not rather check the HTTP methods it is calling when doing the actual save, since there must be some integration with HANA directly … which is when I came across the mother lode … a small file called reposervice.xsjs. It seemed that every time I was saving or modifying my objects through the IDE, it was calling this file. After checking out the parameters it was passed, it was very clear that the methods and text were easy to simulate. I fired up REST Client and within a couple of minutes the concept was POC'ed. Pass your file contents as the body with a path parameter and a POST, and you were off to the races.

 

 

Screen Shot 2013-06-09 at 4.39.png

Using Firefox Rest Client to monitor system calls showed each save, create, delete operation was using a small file called reposervice.xsjs, which references the libraries needed for the repository modifications.

 

 

Image2.png

 

The diagram above displays the HTTP call made when saving/creating a file, and how the IDE initially does a HEAD request for the XSRF token, followed by the HTTP PUT.

 

 

The initial HEAD request fetches the CSRF token; then the token, along with the mode, path and activate parameters, is passed to the URL. Provided you are successful, a JSON message with the status is returned. For those of you who are not familiar with Cross-Site Request Forgery, you can read about it here: http://en.wikipedia.org/wiki/Cross-site_request_forgery

 

Once I had this done, I was wondering what the best integration option would be and weighed up a couple of options, such as a simple check-in type procedure, but I wanted something faster, easier and "click free". Being a bit of a highly iterative developer myself, I find it easier to develop "online", which is why I decided it would be best to do a file system watch of a particular folder and save any changes automatically to my HANA instance - similar to a Dropbox type approach.

 

I had my POC working nicely and an integration goal defined, so I set out to start developing the UI/application in Objective-C (Xcode). I had a template type of app from one of my little SAP Note Viewer applications which could act as a foundation. I threw some code together and pulled in some very useful little open source packages as helpers. Within a couple of hours each evening the app was running nicely and doing what I had expected: modify a file or two in a predefined location and it syncs up to XS. Easy.

 

That's generally where development grinds to a halt for me, as I envision feature after feature to build a Mac clone of HANA Studio. Luckily my senses got the better of me, and I worked on a recursive package downloader and the ability to create, rename and delete files and folders, and not a HANA Studio rewrite. Once this was all done, ironing out the bugs was painful. The Cocoa FSEvents (File System Events) stream class on the Mac is not easy to work with and a bear at best. Having to monitor a folder for any modifications, deletes and creates turned into a bit of a logic nightmare. One of the interesting challenges is that if you "delete" a file on the Mac file system, it does not get a "delete" FS event but rather a rename (since it goes to the trash/recycle bin!). This leads to having to do multiple … if exists then … type statements around each file and folder event.

 

The UI is another interesting one. I like apps to look somewhat decent … and I spent a good amount of time working on each of the elements in Adobe Photoshop as usual … (Whenever I do a mobile app development talk I mention that I spend close to 40% of the entire project time in apps like Photoshop on design work! Most are surprised!)

 

If you are interested in incorporating some of these types of features into your own app, I will be posting a copy of the integration classes on GitHub shortly.

 

PLEASE KEEP IN MIND: This is exploratory work with undocumented APIs. I would not recommend using this for production, or for any important work (or your important openSAP homework!). The reason I shared this was to encourage people to look under the hood and understand the hows and whys of how some of these great new tools work.

 

I would be interested to hear if anyone has any interesting use-cases for being able to manipulate both HANA repository and DB artifacts from outside of the Studio? Does anyone have any challenges with the HANA Studio today they would like to see changed?

Real-time sentiment rating of movies on SAP HANA One


I am an intern visiting Palo Alto from SAP's Shanghai office for a month-long project. It's my first trip to the Bay Area so I am soaking up all the sun and all the excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on Rotten Tomatoes and IMDb, but it was hard to pick one. I wanted to get a pulse of the movie before I watched it, not from the critics but from actual moviegoers like me. Also, I wanted one which had high buzz not only in the US but also in China. So I decided, why don't I build one myself; after all, I am in the heart of Silicon Valley.

 

I decided to pick SAP HANA One to power my app, not just because I got the DB & application server in the cloud, but also because the platform supports sentiment analysis for English & Simplified Chinese right out of the box! I used the Rotten Tomatoes API to find newly released movies and the Twitter & Sina Weibo APIs for sentiment for the US & China respectively.

 

Prerequisites

 

Before we start to build the application, we need to get SAP HANA One developer edition and install SAP HANA Studio. You can get the info here:

"Get your own SAP HANA, developer edition on Amazon Web Services" http://scn.sap.com/docs/DOC-28294

 

You can find how to get SAP HANA One developer edition in part 1, 2, 5 and how to install SAP HANA Studio in part 3, 4.

 

Schema

 

I did most of my work in the HANA Studio, which is based on the Eclipse IDE and therefore very familiar to Java and other open-source developers.

1.jpg

First, I created a schema and a full text index for all the movie metadata, including title, rating, running time, release date, synopsis, etc. Then I used JTomato (https://github.com/geeordanoh/JTomato) to populate the table.

 

MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.

2.jpg

Then I used Twitter4J (http://twitter4j.org) to search the movie keywords on Twitter. I found that twitter, given just the keyword, did a good job pulling all combinations of the movie name: fast and furious, fast & furious.

 

TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.

3.jpg

However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.

 

TWEET_ZH: Stores crawled tweets from Tencent Weibo

4.jpg

Next I created a fulltext index and sentiment tables (called VoiceOfCustomer) using the following SQL. Voila! I now have sentiment analysis for all twitter and tencent weibo data!

 

CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('EN') TEXT ANALYSIS ON;


CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('ZH') TEXT ANALYSIS ON;

 

TWEET_I: Used to perform sentiment analysis for the table TWEET

5.jpg

TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH

6.jpg
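To get a quick feel for what the text analysis produced, you can query the generated $TA_ table directly; a minimal check using only the TA_TYPE column (which the stored procedures below also rely on) could be:

SELECT TA_TYPE, COUNT(*) AS NUM
FROM "$TA_TWEET_I"
GROUP BY TA_TYPE
ORDER BY NUM DESC;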

In addition to the tables in SAP HANA and the full text index to perform sentiment analysis, I also wrote stored procedures to wrap complex SQL making it easy for XS (HANA’s application server) to consume.

 

Architecture

 

The final architecture looks like this:

7.jpg

 

Rating


Now, I had to create a formula to quantify rating. I used a very simple formula for this:

 

Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments

 

This score would be helpful to rank movies so I could easily pick the top one.
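For example, using the "Man of Steel" Twitter counts shown further below (9,986 strong positive, 5,903 weak positive, 839 neutral, 2,067 weak negative and 3,757 strong negative tweets), the score works out to (9,986*5 + 5,903*4 + 839*3 + 2,067*2 + 3,757*1) / 22,552 ≈ 3.72, which matches the rating shown on the main page.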

 

Additionally, I showed a distribution of the sentiments, positive vs. negative vs. neutral, so I could better understand how strong or weak people’s opinion was on the movie both in US & in China.

 

XS Application

 

The application should be built on XS Engine to prevent data transfer latency between the database and the web application server so users can access the website directly. The application was built in the following steps:

 

Step 1: Create stored procedures for rating and sentiment analysis

Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:

 

1. Rating

We can use the following SQLs to create the type and the stored procedure:

 

CREATE TYPE MOVIEINFO AS TABLE (
      POSTER NVARCHAR(100),
      TITLE NVARCHAR(100),
      RATING DECIMAL(5, 2),
      NUM INTEGER,
      TITLE_ZH NVARCHAR(100),
      RATING_ZH DECIMAL(5, 2),
      NUM_ZH INTEGER,
      YEAR INTEGER,
      MPAA_RATING NVARCHAR(100),
      RUNTIME NVARCHAR(100),
      CRITICS_CONSENSUS NVARCHAR(2000),
      RELEASE_DATE DATE,
      SYNOPSIS NVARCHAR(2000),
      ID INTEGER
);

 

CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
RESULT =
SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID
FROM MOVIE A

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM
(SELECT
      A.ID,
      C.TA_TYPE,
      COUNT(C.TA_TYPE) AS NUM,
      CASE C.TA_TYPE
            WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
            WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
            WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
            WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
            WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
      END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET B
ON A.ID = B.MOVIEID
LEFT JOIN "$TA_TWEET_I" C
ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
GROUP BY
      A.ID,
      C.TA_TYPE) A
GROUP BY ID) B ON A.ID = B.ID

INNER JOIN

(SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM
(SELECT
      A.ID,
      C.TA_TYPE,
      COUNT(C.TA_TYPE) AS NUM,
      CASE C.TA_TYPE
            WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
            WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
            WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
            WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
            WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
      END AS TOTAL
FROM MOVIE A
LEFT JOIN TWEET_ZH B
ON A.ID = B.MOVIEID
LEFT JOIN "$TA_TWEET_ZH_I" C
ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
GROUP BY
      A.ID,
      C.TA_TYPE) A
GROUP BY ID) C ON A.ID = C.ID
ORDER BY B.RATING DESC;
END;


After creating the type and the stored procedure successfully, we can use the following SQL to test:

 

CALL GETMOVIEINFO(?);

8.jpg

From the columns “RATING” and “RATING_ZH”, we can show the scores on the main page.

 

2. Sentiment analysis

We can use the following SQLs to create the type and the stored procedure:

 

CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);

 

CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
      IF LANG = 'EN' THEN
      RESULT = SELECT 'Strong Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'StrongPositiveSentiment'
            UNION ALL
            SELECT 'Weak Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'WeakPositiveSentiment'
            UNION ALL
            SELECT 'Neutral' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'NeutralSentiment'
            UNION ALL
            SELECT 'Weak Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'WeakNegativeSentiment'
            UNION ALL
            SELECT 'Strong Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
            INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'StrongNegativeSentiment';
      ELSEIF LANG = 'ZH' THEN
      RESULT = SELECT '很好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'StrongPositiveSentiment'
            UNION ALL
            SELECT '' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'WeakPositiveSentiment'
            UNION ALL
            SELECT '一般' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'NeutralSentiment'
            UNION ALL
            SELECT '' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'WeakNegativeSentiment'
            UNION ALL
            SELECT '很差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
            INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
            ON A.ID = B.ID
            WHERE A.TA_TYPE = 'StrongNegativeSentiment';
      END IF;
END;


After creating the type and the stored procedure successfully, we can use the following SQLs to test:

 

CALL GETSENTIMENT(771313125, 'EN', ?);

9.jpg

CALL GETSENTIMENT(771313125, 'ZH', ?);

10.jpg

Step 2: Build the application based on XS Engine

By now, we can access the tables, indexes, data and stored procedures directly from the XS Engine. To build the application, follow these steps:

 

1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.
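For orientation, a minimal version of these descriptor files might look like the following (the privilege name is just an example; adjust everything to your own package):

.xsapp        - an empty file (it only marks the root package of the XS application)
.xsaccess     - {"exposed" : true}
.xsprivileges - {"privileges" : [{"name" : "Basic", "description" : "Basic usage privilege"}]}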


2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO”

 

function createEntry(rs) {

      return {

            "poster" : rs.getNString(1),

            "title" : rs.getNString(2),

            "rating": rs.getDecimal(3),

            "num": rs.getInteger(4),

            "title_zh" : rs.getNString(5),

            "rating_zh": rs.getDecimal(6),

            "num_zh": rs.getInteger(7),

            "year": rs.getInteger(8),

            "mpaa_rating": rs.getNString(9),

            "runtime": rs.getNString(10),

            "critics_consensus": rs.getNString(11),

            "release_date": rs.getDate(12),

            "synopsis": rs.getNString(13),

            "id": rs.getInteger(14)

      };

}

 

try {

      var body = '';

      var list = [];

 

      var query = "{CALL SMARTAPP.GETMOVIEINFO(?)}";

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.execute();

      var rs = pcall.getResultSet();

 

      while (rs.next()) {

            list.push(createEntry(rs));

      }

 

      rs.close();

      pcall.close();

      conn.close();

 

      body = JSON.stringify({

            "entries" : list

      });

 

      $.response.contentType = 'application/json; charset=UTF-8';

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

 

3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”

 

function createEntry(rs) {

      return {

            "sentiment" : rs.getString(1),

            "num" : rs.getInteger(2)

      };

}

 

try {

      var id = parseInt($.request.parameters.get("id"));

      var lang = $.request.parameters.get("lang");

 

      var body = '';

      var list = [];

 

      var query = "{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}";

      $.trace.debug(query);

      var conn = $.db.getConnection();

      var pcall = conn.prepareCall(query);

      pcall.setInteger(1, id);

      pcall.setString(2, lang);

      pcall.execute();

      var rs = pcall.getResultSet();

 

      while (rs.next()) {

            list.push(createEntry(rs));

      }

 

      rs.close();

      pcall.close();

      conn.close();

 

      body = JSON.stringify({

            "entries" : list

      });

 

      $.response.contentType = 'application/json; charset=UTF-8';

      $.response.setBody(body);

      $.response.status = $.net.http.OK;

} catch (e) {

      $.response.status = $.net.http.INTERNAL_SERVER_ERROR;

      $.response.setBody(e.message);

}

 

4. Create index.html and code the HTML part.

 

<!DOCTYPE HTML>
<html>
      <head>
            <meta http-equiv="X-UA-Compatible" content="IE=edge">
            <title>Real-Time Movie Rating</title>
            <script src="/sap/ui5/1/resources/sap-ui-core.js"
                        id="sap-ui-bootstrap"
                        data-sap-ui-libs="sap.ui.commons,sap.ui.ux3,sap.viz"
                        data-sap-ui-theme="sap_goldreflection">
            </script>
            <!-- add sap.ui.table, sap.ui.ux3 and/or other libraries to 'data-sap-ui-libs' if required -->

            <script>
                        sap.ui.localResources("movieui");
                        var view = sap.ui.view({id:"idMovieMatrix1", viewName:"movieui.MovieMatrix", type:sap.ui.core.mvc.ViewType.JS});
                        view.placeAt("content");
            </script>

      </head>
      <body class="sapUiBody" role="application">
            <h1>Real-Time Movie Rating</h1>
            <div id="content"></div>
      </body>
</html>

 

5. Create some views and controllers to use native SAP UI 5 to accelerate building the application.

 

Website

 

The live web app is available at http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html, but I bring the AWS instance down to reduce billing costs. I have captured screenshots and a brief video in case you find the server down.

 

 

Real-time movie rating homepage

The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.

 

11.jpg

I heard a lot of buzz about “Man of Steel” but it is currently ranked No. 7 so I was really curious. “Man of Steel” had a 3.72 rating but “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer I discovered that this was because “20 Feet” had only 351 mentions but “Man of Steel” had more than 20K, meaning that a popular movie may not necessarily be the one with the highest score but could also be one which has the most buzz.         

 

I then created a page with a detailed breakdown of the movie's sentiments for both Twitter and Tencent Weibo. Looks like "Man of Steel" has higher positive sentiment in China compared to the US. Well, not surprising: we like superhero movies and Superman is our favorite.

 

Sentiment / Social Media   Twitter #   Twitter %   Tencent Weibo #   Tencent Weibo %
Strong Positive            9,986       44%         528               34%
Weak Positive              5,903       26%         723               47%
Neutral                    839         4%          12                1%
Weak Negative              2,067       9%          123               8%
Strong Negative            3,757       17%         166               11%

12.jpg

Let's see what the score on Rotten Tomatoes looks like. The critics have given it a meager 56% but 82% of the audience liked it. That number compares well with the 70% positive sentiment rating from my real-time rating app.

13.jpg

"20 Feet from stardom" has 97% rating from critics and 100% from the audience on rottentomatoes. So my real-time rating app was able to successfully identify this hit from social sentiment on twitter. Looks like the movie is a sleeper hit!

15.jpg

14.jpg

 

This application is just a prototype now and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movies based on a user's interests, something like a "Pandora for movies". Hope you enjoyed reading my blog.

Calling XSJS Service using SAP UI5


Hi Everyone,

 

In the blog SAP HANA Extended Application Services (http://scn.sap.com/community/developer-center/hana/blog/2012/11/29/sap-hana-extended-application-services) by Thomas Jung, he showed us lots of things about XS development; one of them was how to create and extend server-side JavaScript, and it was explained beautifully in this video:

http://www.youtube.com/watch?v=ckw_bhagvdU

 

At the end in the above video, Thomas told us about Calling the XSJS Service From the User Interface :

 

Here I would like to show how to create the UI files and then call the XSJS service, step by step.

 

1. We will go to the Project Explorer tab in the SAP HANA Development perspective, then right-click and select Other:

 

1.jpg

 

2. Select SAP UI5 Application Development and then select Application Project:

 

2.jpg

 

3. We will enter the project name, select Desktop for rendering the UI on our desktop, and also select "Create an Initial View" so that the wizard creates a view for us:

 

3.jpg

 

4. Enter the name of the view that we need to create and select JavaScript rendering for our purpose:

 

4.jpg

 

5. We find that the wizard has created three objects for us:

index.html

XSUI.controller.js

XSUI.view.js

In the index.html file we will enter the full path of the source as

src="/sap/ui5/1/resources/sap-ui-core.js"

 

5.jpg

 

6. After that enter the following code in XSUI.controller.js file :

 

sap.ui.controller("xsui.XSUI", {          onLiveChangeV1: function(oEvent,oVal2){                    var aUrl = '../../../Services/Func.xsjs?cmd=multiply'+'&num1='+escape(oEvent.getParameters().liveValue)+'&num2='+escape(oVal2.getValue());                    jQuery.ajax({                              url: aUrl,                              method: 'GET',                              dataType: 'json',                              success: this.onCompleteMultiply,                              error: this.onErrorCall });          },          onLiveChangeV2: function(oEvent,oVal1){                    var aUrl = '../../../services/Func.xsjs?cmd=multiply'+'&num1='+escape(oVal1.getValue())+'&num2='+escape(oEvent.getParameters().liveValue);                    jQuery.ajax({                              url: aUrl,                              method: 'GET',                              dataType: 'json',                              success: this.onCompleteMultiply,                              error: this.onErrorCall });          },          onCompleteMultiply: function(myTxt){                    var oResult = sap.ui.getCore().byId("result");                     if(myTxt==undefined){ oResult.setText(0); }                     else{                       jQuery.sap.require("sap.ui.core.format.NumberFormat");                       var oNumberFormat = sap.ui.core.format.NumberFormat.getIntegerInstance({                          maxFractionDigits: 12,                          minFractionDigits: 0,                          groupingEnabled: true });                       oResult.setText(oNumberFormat.format(myTxt)); }          },          onErrorCall: function(jqXHR, textStatus, errorThrown){                     sap.ui.commons.MessageBox.show(jqXHR.responseText,                                          "ERROR",                                         "Service Call Error" );                     return;           }
});

 

7. After that enter the following code in XSUI.view.js file

 

sap.ui.jsview("xsui.XSUI", {      getControllerName : function() {         return "xsui.XSUI";      },      createContent : function(oController) {                var multiplyPanel = new sap.ui.commons.Panel().setText("XS Service Test - Multiplication");                multiplyPanel.setAreaDesign(sap.ui.commons.enums.AreaDesign.Fill);                multiplyPanel.setBorderDesign(sap.ui.commons.enums.BorderDesign.Box);                  var layoutNew = new sap.ui.commons.layout.MatrixLayout({width:"auto"});                multiplyPanel.addContent(layoutNew);                var oVal1 = new sap.ui.commons.TextField("val1",{tooltip: "Value #1", editable:true});                var oVal2 = new sap.ui.commons.TextField("val2",{tooltip: "Value #2", editable:true});                var oResult = new sap.ui.commons.TextView("result",{tooltip: "Results"});                var oEqual = new sap.ui.commons.TextView("equal",{tooltip: "Equals", text: " = "});                                var oMult = new sap.ui.commons.TextView("mult",{tooltip: "Multiply by", text: " * "});                 //Attach a controller event handler to Value 1 Input Field                      oVal1.attachEvent("liveChange", function(oEvent){                            oController.onLiveChangeV1(oEvent,oVal2); });                  //Attach a controller event handler to Value 2 Input Field                      oVal2.attachEvent("liveChange", function(oEvent){                            oController.onLiveChangeV2(oEvent,oVal1); });                              layoutNew.createRow(oVal1, oMult, oVal2, oEqual, oResult );                   return multiplyPanel;      }


});

 

8. Now we will save all the files and share the project

 

10.jpg

 

9. Now Select SAP HANA Repository:

 

11.jpg

 

10. Inside the repository, select the folder where you would like to share it; I selected the UI5 folder here:

 

12.jpg

 

11. Now we will commit and activate our UI5 project :

 

13.jpg

 

12. As we shared our XSUI project in the UI5 folder in the repository, we can now see it in our Project Explorer as well:

 

15.jpg

 

13. Now, in the Services folder, we will create the Func.xsjs file that we referenced in our controller and view in the XSUI project.

 

16.jpg

 

14. Now enter the following code in Func.xsjs file :

 

function performMultiply()
{
          var body = '';
          var num1 = $.request.getParameter('num1');
          var num2 = $.request.getParameter('num2');
          var answer;
          answer = num1 * num2;
          body = answer.toString();
          $.response.addBody(body);
          $.response.setReturnCode($.net.http.OK);
}

var aCmd = $.request.getParameter('cmd');
switch(aCmd)
{
          case "multiply":
                    performMultiply();
                    break;
          default:
                    $.response.setReturnCode($.net.http.INTERNAL_SERVER_ERROR);
                    $.response.addBody("Invalid Choice: " + aCmd);
}
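Before wiring up the UI, you can also test the service directly in the browser: a URL along the lines of http://ipaddress:8000/<your package path>/Services/Func.xsjs?cmd=multiply&num1=6&num2=7 (adjust the path to your own package) should simply return 42.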

 

15. In the browser, enter the address: http://ipaddress:8000/path/index.html

 

f.jpg

 

As in the above example we have used JavaScript, JSON, AJAX and jQuery, I would also like to tell you some basics about them.

 

First i will start with JavaScript

 

JavaScript is an object-based scripting language. It is not a classical object-oriented language.

• Client-side JavaScript allows an application to place elements on an HTML form and respond to user events such as mouse clicks, form input, and page navigation.

• Server-side JavaScript allows an application to communicate with a relational database, provide continuity of information from one invocation to another of the application, or perform file manipulations on a server.

 

Features:

JavaScript is a very lightweight language.

JavaScript is dynamically typed: values have a data type, but variables do not have a fixed type.

For Example:

In Java or C++, if we define a variable of integer type we will define it as :

int a; // so 'a' cannot hold anything other than integer

 

 

But in JavaScript, we only define:

var a;       // here 'a' can hold a string, a number or anything else
a = "hello"; // now 'a' holds a string
a = 10;      // now 'a' holds a number

 

JavaScript doesn't have keywords like class, private or public, but there are different ways in which we can make an object's members public or private, and we can even use inheritance in JavaScript through prototypes.

To learn more about JavaScript, please visit http://www.w3schools.com/js/ or http://www.javascriptkit.com/

 

SAP HANA has a JavaScript editor that includes the JSLint open-source library, which helps to validate JavaScript code.

For debugging purposes:

We can use the SAP HANA debug perspective or any browser such as Chrome or Mozilla Firefox.

Chrome has a built-in JavaScript debugger, and for Firefox we can download a plugin called Firebug.

There is also a free online tool called jsfiddle that can be used to create, debug and run your JavaScript code along with HTML and CSS.

jsfiddle : http://jsfiddle.net/

 

jsfiddle.jpg

 

Now moving on to JQuery :

 

jQuery is a JavaScript library that simplifies our JavaScript coding, as we don't need to write lots of lengthy code.

In the openSAP course, in the Week 4 Unit 3 example, the jQuery function fadeOut() was used on a button.

 

To learn more about JQuery, visit http://www.w3schools.com/jquery/ or http://learn.jquery.com/

 

Now about JSON :

 

Well, JSON stands for JavaScript Object Notation.

It is a lightweight data interchange format and it is often preferred over XML because XML is harder to parse and doesn't support rich data types (everything is written in the form of a string).

Other benefits of JSON are:

Values are not statically typed and can be assigned dynamically.

Its syntax is written in key/value pairs.

For example => "user" : "name" (key : value)

 

We can use the eval and JSON.parse functions to convert a JSON string into a JavaScript object.

JSON.parse is preferred over eval for the following reason:

When we use eval to parse JSON data, eval executes whatever JavaScript it is given, so malicious input could run harmful code in our application.

For learning JSON visit : http://www.w3schools.com/json/default.asp

 

Finally AJAX :

 

AJAX stands for Asynchronous JavaScript and XML.

It is one of the Web 2.0 standards and is used by web applications to send data to and retrieve data from a server asynchronously, without interfering with the display and behaviour of the existing page.

It is supported natively by browsers, so no plugins are required.

Google Suggest was one of the very first examples of using AJAX.

On most payment sites we see "Please wait! Your order is being processed" - this is done with the help of AJAX.

One of the sites from where we can download those moving GIFs for our web design is : http://ajaxload.info/

For learning AJAX visit : http://www.w3schools.com/ajax/

 

Thank You for reading the Blog.

As this is my very first blog, I would like to get your feedback.

Reduce your HANA AWS cost by utilizing Spot Instances


I am sure many of you have been using the HANA Developer Edition on AWS for openSAP's Software Development on HANA course. As we are into week 6 of the course (trying to spend as much time as possible practicing the examples shared by Thomas Jung), I will share a quick tip to save on AWS costs.

 

In Week 1 Thomas pointed out that by using Spot Requests we can bid for unused capacity for Amazon instances. While working with these spot instances I realised that there is a catch: any work that you do on the spot instance would be lost, as you need to TERMINATE the instance. Why? Because there is no option to STOP a spot instance. Thankfully, there is a way around this to ensure you don't lose hours of work.

 

Pre-requisites:

  1. SAP HANA Developer Edition on AWS
  2. An Amazon Machine Image (referred to as base AMI subsequently) taken at a suitable point of time. This must include all your development artifacts that you would like to be available in your spot instance (to be requested in subsequent steps).

 

Steps:

  1. Create a new Spot Request using your base AMI
  2. When you create the request, ensure you mark all your EBS volumes with "Delete on Termination" = TRUE. This will ensure that after you are done using your HANA instance, these volumes don't lie around in your AWS account adding to the costs.
  3. After your HANA instance is up and running, associate your Elastic IP with the newly created instance. You can then complete the development tasks that you planned. Please save & activate everything to ensure the HANA repository is up to date.
  4. After your tasks are complete and you no longer need the instance, create a new Amazon Machine Image of your spot instance. [Please note that you are not creating an image of your original instance, which is probably not running at this point in time.]
SpotInstances.jpg
  5. The status of your new AMI will change to "available" after successful completion.
  6. Remove the association between the Elastic IP and the Spot Instance.
  7. Now you can terminate your spot instance.

 

Whenever you want to resume your work on HANA, you can create a new Spot Request using the AMI created in Step 4 as your new base AMI. As you keep following these steps week after week, please ensure that you remove older AMIs & snapshots to keep your costs down.

 

If you do not want the hassle of creating a different AMI each time, Mani Sekaran Muthu has a good blog on an alternative approach.

 

http://scn.sap.com/community/developer-center/hana/blog/2013/06/23/preventing-the-deletion-of-ebs-volumes-during-the-termination-of-aws-spot-instances


My HANA Certification (C_HANAIMP_1) Experience


Hello Everyone

 

With this blog, I would like to share a few unique things, my experience and my views on the SAP HANA application certification (associate level, C_HANAIMP_1). I cleared this exam on 20th June 2013 with 2.5 to 3 months of effort. I have 3+ years of experience in SAP BI, which helped me in understanding a few concepts of HANA, but frankly speaking, you can learn HANA without any BI, BO or ABAP background. They just add an advantage in understanding HANA more easily.

 

 

Registration for the Certification Exam :

       

First of all, check the examination centers which are available near you in the Training and Certification Shop.

If there are no examination centers in your country, you can contact Pearson VUE (SAP Testing with Pearson VUE) to take the exam in your country. I took the exam in London, so I just mailed education.uk@sap.com and they handled all the booking for me. You need to carry photo identity proof (and your knowledge!) to the examination hall. Nothing much is required from your end; everything will be taken care of by SAP.

 

How to Prepare for a Certification Exam in 5 Simple Steps

 

Books, Materials & Links :

 

1. Well! The material I prepared for the exam was HA100 & HA300. It is very important to study this material line by line to clear the certification.

 

2. Apart from this material, I practiced on HANA at CloudShare using the 30-day free trial. If you want to get 30 days of CloudShare HANA access, please follow this document: Get 30 days of free access to SAP HANA, developer edition. Later I took HANA on AWS (SAP and Amazon Web Services).

 

3. I registered at openSAP for Introduction to Software Development on HANA. None of the topics (except modeling) from the openSAP course were part of the certification, but this course helped me a lot in understanding the overview and benefits of HANA.

 

4. Regularly visit the HANA forums and read the questions, documents and blogs posted there. You will surely gain great knowledge from these forums.

SAP HANA Developer Center

SAP HANA and In-Memory Business Data Management

 

 

5. Lots of other links which I have followed:

 

My Experience on HANA Certification

     Welcome | SAP HANA

The ultimate set of HANA related links from SAP

 

 

 

During Exam

 

After entering the examination hall, you will be given a user ID and password, linked to your registration, to log in to the system. Apart from the credentials, you need to fill out a sheet with your address, the name to appear on the certificate, mail ID, phone number and so on.

 

Once you log in to the system, read the instructions and begin the exam.

 

There are 3 types of questions:

 

1. One correct answer.

2. Multiple correct answers. In this type of question, it tells you how many correct answers there are and you are allowed to check only that number of options.

 

For example, after the question there will be a NOTE saying that this question has 2 correct answers.

 

3. Drop-down answers.

 

In this type, you need to select the correct answers from a drop-down list. The drop-down list may contain 4 to 5 values.

 

Most of the questions I faced had multiple correct answers (where I was told there were 2 correct answers), 1 question was of the drop-down type, and the remaining were single-correct-answer questions.

 

During the exam, if you have any doubts, questions or problems, you just need to raise your hand and the invigilator will walk up to you.

There are 80 questions and 180 minutes. Read each question twice or thrice; there will be a few tricky questions where you will be tempted to choose the wrong answer.

You are allowed to navigate to previous questions, and if you have any doubt about a question, you can flag it and come back to it later after answering all the questions.

 

Once you do the final submit, the score is displayed. The score is drilled down to show your performance in each area.

 

Hope you all get some information from this blog.

 

Thank You !

Scalar User Defined Functions in SAP HANA


Back in December, I introduced you to Table UDFs in HANA 1.0 SP5.  At that time, I also mentioned that we were working on implementing Scalar UDFs as well.   Today, I am very happy to announce that as of HANA 1.0 SP6 (Rev 60), we now support Scalar UDFs as well.  Scalar UDFs are user-defined functions which accept multiple input parameters and return exactly one scalar value.  These functions allow the developer to encapsulate complex algorithms into manageable, reusable code which can then be nested within the field list of a SELECT statement.  If you have worked with scalar UDFs in other databases, you know how powerful they can be.  Below is an example showing how to create two scalar UDFs and then leverage both within the field list of a SELECT statement.  This is a very simplistic example, and of course the logic could be done by other means; I just wanted to remove any complexity of logic and focus purely on the syntax.

 

CREATE FUNCTION add_surcharge(im_var1 DECIMAL(15,2), im_var2 DECIMAL(15,2))
RETURNS result DECIMAL(15,2)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
 result := :im_var1 + :im_var2;
END;

 

 

CREATE FUNCTION apply_discount(im_var1 DECIMAL(15,2), im_var2 DECIMAL(15,2))
RETURNS result DECIMAL(15,2)
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
 result := :im_var1 - ( :im_var1 * :im_var2 );
END;

 

Once you execute the CREATE statements in the SQL Console, the new objects will show up in the catalog in the “Functions” folder.

  pic1.png

 

As shown below, you can now use the functions in the field list of your SELECT statements.

  pic2.png
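In case the screenshot does not render, a query along these lines shows the idea (the table and column names are made up purely for illustration):

SELECT PRODUCTID,
       GROSSAMOUNT,
       add_surcharge(GROSSAMOUNT, 10.00) AS AMOUNT_WITH_SURCHARGE,
       apply_discount(GROSSAMOUNT, 0.05) AS DISCOUNTED_AMOUNT
FROM PURCHASE_ORDER_ITEMS;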

 

Again, this is a pretty simple example, but I think you can see how powerful a tool scalar UDFs could be to a developer.   Currently, both table and scalar UDFs can only be created via the SQL Console, but rest assured we are working to allow the creation of these artifacts in the HANA repository via an XS Project.

Using Dynamic Filters in SAP HANA


With SAP HANA 1.0 SP6 (Rev 60), we can now leverage the concept of dynamic filters.   There have been several requests for this type of functionality, since SAP does not recommend the use of dynamic SQL (EXEC statement) when developing SQLScript procedures.  We now have a new statement in SQLScript called APPLY_FILTER.  This statement accepts two parameters.  The first parameter is the dataset to which you want to apply the filter.  This dataset can be a database table, database view, HANA attribute or calculation view, or even an intermediate table variable.  The second parameter is, of course, the filter condition itself. This is very similar to the syntax you would use in the WHERE clause of a SELECT statement.   In the following example, I have a SQLScript procedure which simply reads data from the “Products” table and applies a filter which is passed as an input parameter to the procedure.  The result set then shows the filtered dataset.

 

CREATE PROCEDURE get_products_by_filter(
            IN im_filter_string VARCHAR(5000),
            OUT ex_products "SAP_HANA_EPM_DEMO"."sap.hana.democontent.epm.data::products" )
  LANGUAGE SQLSCRIPT
  SQL SECURITY INVOKER
  READS SQL DATA AS
BEGIN

ex_products = APPLY_FILTER("SAP_HANA_EPM_DEMO"."sap.hana.democontent.epm.data::products",
                   :im_filter_string) ;
END;

 

ppic1.png
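For reference, calling the procedure with a filter string could look something like the following sketch; the filter text and column name are assumptions rather than values taken from the screenshot:

CALL get_products_by_filter('"Category" = ''Notebooks''', ?);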

SAP HANA SPS6 - Various New Developer Features


With the recent release of SAP HANA SPS6 (Revision 60), developers working on SAP HANA can expect a wide variety of new and improved features.  In this blog I would like to highlight a few of the most prominent features for developers building SAP HANA native applications.

 

Development Workflow

 

First, we have various improvements to the overall development workflow; most of these improvements are focused on the SAP HANA Studio. For example, SAP HANA Studio in SPS6 brings in Eclipse 4.2, introduces new object search dialogs, better integration of the view modeling tools into the Repository view and Project Explorer, keyboard shortcuts and toolbar icons for check and activate, improved error display, and numerous other small enhancements.

 

One of the enhancements to the workflow actually comes outside of the HANA Studio.  There is a new Application Lifecycle Manager web application which is part of HANA. For developers one of the biggest advantages of this new tool is an application creation wizard.  From this tool, a developer can quickly create a Schema, .xsapp, .xsaccess and even a local developer role which already has the necessary package and schema access rights. 

 

For a quick overview of these various tool improvements and how they streamline the development process, have a look at the following video.

 

 

 

Browser Based IDEs

One of the new features of SAP HANA SPS6 is the inclusion of new browser based development tools for SAP HANA native application development. These tools make it fast and easy to get started creating html, server side JavaScript, and OData services directly within SAP HANA. 

 

No longer are you required to install the SAP HANA Studio and Client if you only need to do some basic development object creation or editing in the SAP HANA Repository. This means you can be coding your first application within seconds of launching a SAP HANA instance.

 

The usage of such browser based development tools is particularly appealing for cloud-based SAP HANA development scenarios, like SAP HANA One.  You only need access to the HTTP/HTTPS ports of the SAP HANA server and avoid the need for any additional client side software installation. A browser pointed at the SAP HANA server is all you need to begin development.

 

 

 

On the openSAP forums, I was recently asked why SAP decided to invest in browser based tools when we already have SAP HANA Studio. I'm going to plagiarize myself and repost the response here:

 

Web IDEs require no installation on the client side.  So if you are using a MacOS device today, you can't use HANA Studio for development tasks as it isn't supported (has no REGI / HANA client). Therefore the Web IDE offers an alternative if your OS doesn't support Studio.

 

Web IDEs run on the HTTP port. Especially in the cloud usage scenario, it’s probably easier to get this port open rather than the various ports used by the Studio. 

 

Web IDEs work on mobile devices. No you probably don't want to develop day to day on a mobile device, but what if you get a support call in the middle of the night?  It’s a great way to quickly check on something. I've used it to fix a demo in the middle of an event from my phone.  

 

Web IDEs are good for system admins that might want to change only an xsaccess file.  Even if they have HANA Studio, they will probably feel more comfortable with the direct edit/save workflow of the Web IDEs rather than the project check out/commit/activate workflow of the HANA Studio.

 

The Web IDEs allow you to delete even the .project file and .settings folders from repository packages. HANA Studio can't do this and commit back to the repository because these are key files to the project itself.  Therefore the Web IDEs can be used to clean up a package and make it ready for complete deletion.

 

Core Data Services (CDS) / HDBDD

Core data services (CDS) is a new infrastructure for defining and consuming semantically rich data models in SAP HANA. Using a data definition language (DDL), a query language (QL), and an expression language (EL), CDS is envisioned to encompass write operations, transaction semantics, constraints, and more.

 

A first step toward this ultimate vision for CDS is the introduction of the hdbdd development object in SPS6. This new development object utilizes the Data Definition Language of CDS to define tables and structures. It can therefore be considered an alternative to hdbtable and hdbstructure. In the following video we explore the creation of several tables within a single hdbdd design time artifact.

 

 

 

Server Side JavaScript Outbound Connectivity

 

Server Side JavaScript (XSJS) was already the cornerstone of creating custom services within HANA to expose data to the outside world. In SPS6, the primary programming model of the HANA Extended Application Services gets outbound connectivity support as well.  Now developers can  call out of the HANA system and consume services via HTTP/HTTPS from other systems. This can be a way of combining real-time data from multiple sources or gathering data to store into SAP HANA.

 

In the following video we show two scenarios - calling to an ABAP system to get additional user details and consuming a Library of Congress image search service.  These are just two simple examples of the power of this new capability.

 

 

 

XSODATA Create/Update/Delete

 

Similar to the role of XSJS in SPS5, XSODATA services are already established as the quickest and easiest way to generate read-only REST services from existing tables and views within HANA. In SPS6, those capabilities are complemented with new support for Create, Update, and Delete operations as well.  No longer are you required to create a custom service in XSJS if you only need to add simple update capabilities to your service. Furthermore, if you need some simple validation or other update logic, it can be coded in a SQLScript exit mechanism from within the generic XSODATA service framework.

 

This video demonstrates these new Create/Update/Delete capabilities, a useful test tool for Chrome called Postman, and even how to use the SQLScript extension technique.

 

 

 

Source Code

 

Here are the various source code segments if you wish to study them further

 

The USER.hdbdd CDS example:

namespace sp6.data;

@Schema: 'SP6DEMO'
context USER {
           type SString : String(40);
           type LString : String(255);

           @Catalog.tableType: #COLUMN
           Entity Details {
               key PERS_NO: String(10);
               FIRSTNAME: SString;
               LASTNAME: SString;
               E_MAIL: LString;
           };
};

 

 

The popUsers.xsjs:

function insertRecord(userDet){
          var conn = $.db.getConnection();
          var pstmt;
          var query =
           'UPSERT "sp6.data::USER.Details" ' +
           'VALUES(?, ?, ?, ?) ' +
           'WHERE PERS_NO   = ? ';
          pstmt = conn.prepareStatement(query);
          pstmt.setString(1, userDet.ITAB.PERS_NO);
          pstmt.setString(2, userDet.ITAB.FIRSTNAME);
          pstmt.setString(3, userDet.ITAB.LASTNAME);
          pstmt.setString(4, userDet.ITAB.E_MAIL);
          pstmt.setString(5, userDet.ITAB.PERS_NO);
          pstmt.executeUpdate();
          pstmt.close();
          conn.commit();
          conn.close();
}


function populateUserDetails(){
          var user = $.request.parameters.get("username");
          var dest = $.net.http.readDestination("sp6.services", "user");
          var client = new $.net.http.Client();
          var req = new $.web.WebRequest($.net.http.GET, user);
          client.request(req, dest);
          var response = client.getResponse();
          var body;
          if(response.body){ body = response.body.asString(); }
          $.response.status = response.status;
          $.response.contentType = "application/json";
          if(response.status === $.net.http.INTERNAL_SERVER_ERROR){
                    var error = JSON.parse(body);
                    $.response.setBody(error.ITAB[0].MESSAGE);
          }
          else{
                    var userDet = JSON.parse(body);
                    insertRecord(userDet);
                    $.response.setBody('User ' + userDet.ITAB.FULLNAME + ' Personel Number: ' + userDet.ITAB.PERS_NO + ' has been saved' );
          }
}
populateUserDetails();

 

The searchImages.xsjs:

function searchImages(){
          var search = $.request.parameters.get("search");
          var index = $.request.parameters.get("index");
          if(index === undefined){
                    index = 0;
          }
          var dest = $.net.http.readDestination("sp6.services", "images");
          var client = new $.net.http.Client();
          var req = new $.web.WebRequest($.net.http.GET, search);
          client.request(req, dest);
          var response = client.getResponse();
          var body;
          if(response.body){ body = response.body.asString(); }
          $.response.status = response.status;
          if(response.status === $.net.http.INTERNAL_SERVER_ERROR){
                    $.response.contentType = "application/json";
                    $.response.setBody('body');
          }
          else{
                    $.response.contentType = "text/html";
                    var searchDet = JSON.parse(body);
                    var outBody =
                              'First Result of ' + searchDet.search.hits + '</br>'+
                              '<img src="' + searchDet.results[index].image.full + '">';
                    $.response.setBody( outBody );
          }
}
searchImages();

 

The user.xsodata:

service namespace "sp6.services" {   "sp6.data::USER.Details" as "Users"      create using "sp6.procedures::usersCreateMethod";
}

 

The usersCreateMethod.procedure:

CREATE PROCEDURE _SYS_BIC.usersCreateMethod(IN row SYSTEM."sp6.data::USER.Details", OUT error tt_error)
         LANGUAGE SQLSCRIPT
         SQL SECURITY INVOKER AS
BEGIN
/*****************************
          Write your procedure logic
*****************************/

declare lv_pers_no string;
declare lv_firstname string;
declare lv_lastname string;
declare lv_e_mail string;

select PERS_NO, FIRSTNAME, LASTNAME, E_MAIL
     into lv_pers_no, lv_firstname,
          lv_lastname, lv_e_mail
     from :row;

if :lv_e_mail = ' ' then
  error = select 400 as http_status_code,
               'invalid email' as error_message,
               'No Way! E-Mail field can not be empty' as detail from dummy;
else
  insert into "sp6.data::USER.Details"
         values (lv_pers_no, lv_firstname,
                 lv_lastname, lv_e_mail);
end if;

END;

 

ABAP Service Implementation:

TRY.
    DATA(lr_writer) = cl_sxml_string_writer=>create(
        type = if_sxml=>co_xt_json ).
    DATA(ls_address) = zcl_user=>get_details( iv_username = to_upper( me->username ) ).
    CALL TRANSFORMATION id SOURCE itab = ls_address
                           RESULT XML lr_writer.
    lv_json = cl_abap_codepage=>convert_from( CAST cl_sxml_string_writer( lr_writer )->get_output( ) ).
  CATCH zcx_user_error INTO DATA(lx_user_error).
    CALL TRANSFORMATION id SOURCE itab = lx_user_error->return
                           RESULT XML lr_writer.
    lv_json = cl_abap_codepage=>convert_from( CAST cl_sxml_string_writer( lr_writer )->get_output( ) ).
    response->set_status( code   = 500
                          reason = CONV #( lx_user_error->return[ 1 ]-message ) ).
ENDTRY.

 

ABAP Class ZCL_USER:

class ZCL_USER definition
  public
  create public .

  public section.

    class-methods GET_DETAILS
      importing
        !IV_USERNAME type BAPIBNAME-BAPIBNAME
      returning
        value(RS_ADDRESS) type BAPIADDR3
      raising
        ZCX_USER_ERROR .
protected section.
private section.
ENDCLASS.

CLASS ZCL_USER IMPLEMENTATION.
  METHOD get_details.
    DATA lt_return TYPE STANDARD TABLE OF bapiret2.

    CALL FUNCTION 'BAPI_USER_GET_DETAIL'
      EXPORTING
        username = iv_username    " User Name
      IMPORTING
        address  = rs_address     " Address Data
      TABLES
        return   = lt_return.     " Return Structure

    IF lt_return IS NOT INITIAL.
      RAISE EXCEPTION TYPE zcx_user_error
        EXPORTING
          return = lt_return.
    ENDIF.
  ENDMETHOD.
ENDCLASS.

Importing Wikipedia Hive data into SAP HANA One


Welcome to part 3 in this series demonstrating how to analyze Wikipedia page hit data using Hadoop with SAP HANA One and Hive. To see how we got here, check out the first blog post in the series at “Using SAP HANA to analyze Wikipedia data – Preparing the Data”. There are numerous ways to get data from AWS Elastic MapReduce (Hadoop) and Hive into SAP HANA One. SAP put together a video on using SAP BusinessObjects Data Services to import data from Hadoop into SAP HANA at http://www.youtube.com/watch?v=ls_MGp8R7Yk. The challenge with this approach is that Data Services is not easily available for AWS SAP HANA One users. Another way to import data from an active Hadoop cluster is through Apache Sqoop via a JDBC connection to your SAP HANA One database. With Sqoop, the operative word here is having an “active” Hadoop cluster. This means you need to keep your AWS EMR cluster active to run Sqoop. If you are going to keep an EMR cluster with nine m1.xlarge EC2 instances running at 48 cents per hour per instance, you are looking at approximately $100 per day to keep the cluster alive. The beauty of using AWS EMR with S3 storage is that you can access the data in the S3 bucket after terminating the EMR cluster.

 

In part 2 of this series “Reducing the amount of working data for SAP HANA to process using Amazon Elastic MapReduce and Hive”, I showed how you can use Hive to come up with a reasonable working set of data for SAP HANA One to process. The result was the creation of three files for each month stored in the “s3://wikipedia-pagecounts-hive-results/” bucket in the directories named “year=2013/month=03”, “year=2013/month=04” and “year=2013/month=05”.

 

In this blog post, I will show how to import the data generated into Hive using just the resources available to you on AWS and SAP HANA One using the components highlighted in the diagram below.

Blog 301.png

Note: This is the first time that I’m actually going to be using SAP HANA One. If you want to follow along, you will need an active SAP HANA One database up and running. You can get started by going to http://www.saphana.com/community/try and following the steps in the “Try SAP HANA One” box. I’m also assuming that you have downloaded SAP HANA Studio to either your local computer or to an EC2 instance on AWS. For instructions on configuring SAP HANA Studio, check out http://www.saphana.com/docs/DOC-2438. I prefer using a Windows Server 2012 EC2 instance that is at least an m1.xlarge instance. The reason that I like the xlarge instances is that they use a high network bandwidth connection that makes data operations within the same AWS data center fast – much faster than using a local machine and having to move data locally and then back to AWS for processing.

 

Let’s get started!

 

Copy the S3 files to your SAP HANA One Linux server

If you think you are going to do lots of work with AWS S3 storage, I recommend you get one of the many available tools that are out there for working with S3 storage. The one I use is the S3 Browser by NetSDK Software. They have a freeware edition available at http://s3browser.com/. NetSDK also has a free trial of TntDrive that allows you to map an S3 bucket as a network drive at http://tntdrive.com/. If you have experience with other S3 browsers, please share your suggestions as a comment to the blog.

 

I’m going to provide instructions on how to download the files using the AWS S3 console.

 

Making a local copy of the Hive files to your computer

First, create a local directory that you will use to download the nine Hive files. I’m going to use C:\wiki-data for this example.

 

Sign into your AWS account and then navigate to Services -> S3. If you followed along, you will navigate to the unique S3 bucket name you created in part 2. For me, I’m using the “s3://wikipedia-pagecounts-hive-results/” bucket that I created in part 2. Then, navigate to the year=2013/month=03/ directory as shown below.

Blog 302.png

Next, select the first file named 000000 and then right click on the file and select the download command.

Blog 303.png

AWS displays a message box to download the file. You need to right click on the Download link and choose the Save link as… command in your browser.

Blog 304.png

Next, navigate to your C:\wiki-data directory and save the file with the name 2013-03-000000 without a file extension. Once the download is complete, click the OK button to dismiss the message box.

Blog 305.png

You then need to repeat this operation for the 000001 and 000002 files in the month=03\ folder. Then, do the same for the three files for the month=04\ and month=05\ directories.

 

NOTE: As a public service and for a limited time, I made the Hive files available for download using the following URLs:


http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000000

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000001

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=03/000002

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000000

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000001

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=04/000002

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000000

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000001

http://wikipedia-pagecounts-hive-results.s3.amazonaws.com/year=2013/month=05/000002

 

Just click on the link to download the file and use the Save As command to save the files into your C:\wiki-data directory.

 

Copy the local files to your SAP HANA One instance

If you haven’t started your SAP HANA One instance, now would be a good time. Navigate to your EC2 console, right click on your SAP HANA One instance and choose the Connect command.

Blog 306.png

For the user name, use root. For the Private key path, enter in the path name for your pem key file. Then, click on the Launch SSH Client button.

 

In the SSH client, create a directory called /wiki-data and then grant rights to the directory so that the HANA instance can access it with the following Linux commands:

hana:/ # mkdir /wiki-data                                                      

hana:/ # chmod 777 wiki-data

 

In the AWS SSH client, go to the Plugins menu and choose the SFTP File Transfer … command.

Blog 307.png

On the left side of the dialog, navigate to your C:\wiki-data directory. Then, on the right side, click on the ChDir button, type in /wiki-data for the path and click OK. The dialog should look like the one shown below.

Blog 308.png

Then, multi-select all of the files on the left side and click the --> button in the middle of the dialog to start the transfer.

 

You should see the following dialog that shows the progress of the copy operation.

Blog 309.png

Again, this goes much faster if you use an EC2 instance on AWS as your client. You can see in the dialog above that I got an upload speed of 509 kb/sec.

Blog 310.png

When using my xlarge EC2 instance with high network bandwidth, you can see the transfer rate was 37 MB/sec! Yes – 72 times faster!

 

Once your copy operation is complete, click the Close button to dismiss the SFTP dialog.

 

NOTE: I decided to copy the data to the Linux system like this for a couple of reasons. The first one is that SAP HANA Studio’s import command supports only a limited set of characters, like comma, colon and semi-colon, as the column delimiter. Second, I want to show how you can use the IMPORT command to load data instead of using the HANA Studio import wizard.

 

Replacing the ^A field delimiters with a pipe "|" delimiter for importing into HANA

The IMPORT FROM command makes it easy to import text-delimited files with the CSV FILE option, but there is a catch. You can’t specify a non-printable character using the FIELD DELIMITED BY clause. Since the Hive output files use the non-printable ^A character, we need a way to transform the character into a printable character that doesn’t conflict with the Wikipedia data. I used the Linux grep command to verify that there are no pipe characters in the Hive files.

 

In order to convert the field delimited values, we are going to use the Linux sed command. Here is what the command looks like for performing the substitution:

 

sed 's/\x01/|/g' 2013-03-000000 > 2013-03-000000.csv


The s/ introduces the expression to replace; Linux represents the Ctrl-A character as \x01. The /|/ part tells sed to replace each match with the "|" character, and the g flag tells sed to replace all instances in the stream rather than just the first one per line. The last two parameters are the input file and, via the > redirection, the output file. If you don’t redirect the output to a file, sed writes the result to standard output; editing the input file in place would require the -i option.

 

To make the replacement, first change directory in the SSH client using the cd command below.

hana:~ # cd /wiki-data

 

You can then check to see that all the files are in place by using the ls command below.

 

hana:/wiki-data # ls -l

total 1798800

-rw-r--r-- 1 root root 256709072 Jun 30 21:34 2013-03-000000

-rw-r--r-- 1 root root 256564106 Jun 30 21:34 2013-03-000001

-rw-r--r-- 1 root root 155604882 Jun 30 21:34 2013-03-000002

-rw-r--r-- 1 root root 256407589 Jun 30 21:34 2013-04-000000

-rw-r--r-- 1 root root 256251290 Jun 30 21:34 2013-04-000001

-rw-r--r-- 1 root root 36089431 Jun 30 21:34 2013-04-000002

-rw-r--r-- 1 root root 256520276 Jun 30 21:34 2013-05-000000

-rw-r--r-- 1 root root 256179168 Jun 30 21:34 2013-05-000001

-rw-r--r-- 1 root root 109761031 Jun 30 21:34 2013-05-000002

 

Now, you are ready to issue the following nine sed commands:

 

sed 's/\x01/|/g' 2013-03-000000 > 2013-03-000000.csv

sed 's/\x01/|/g' 2013-03-000001 > 2013-03-000001.csv

sed 's/\x01/|/g' 2013-03-000002 > 2013-03-000002.csv

sed 's/\x01/|/g' 2013-04-000000 > 2013-04-000000.csv

sed 's/\x01/|/g' 2013-04-000001 > 2013-04-000001.csv

sed 's/\x01/|/g' 2013-04-000002 > 2013-04-000002.csv

sed 's/\x01/|/g' 2013-05-000000 > 2013-05-000000.csv

sed 's/\x01/|/g' 2013-05-000001 > 2013-05-000001.csv

sed 's/\x01/|/g' 2013-05-000002 > 2013-05-000002.csv

 

As a quick verification, you can use the tail command as shown below to see the last few lines of the file.

 

hana:/wiki-data # tail 2013-05-000002.csv

www.wd|Special:AutoLogin|2013|05|20|16|438|782851

zh.mw|zh|2013|05|20|16|13888|207241012

zh|%E6%88%91%E6%84%9B%E4%BD%A0|2013|05|20|16|132|48055

zh|%E7%99%BE%E5%BA%A6|2013|05|20|16|122|1679770

zh|%E9%80%B2%E6%93%8A%E7%9A%84%E5%B7%A8%E4%BA%BA|2013|05|20|16|391|14259683

zh|%E9%87%91%E9%99%B5%E5%8D%81%E4%B8%89%E9%92%97|2013|05|20|16|167|1894824

zh|File:Otto_Hahn_(Nobel).jpg|2013|05|20|16|217|2557464

zh|Special:Search|2013|05|20|16|229|683531

zh|Special:\xE9\x9A\x8F\xE6\x9C\xBA\xE9\xA1\xB5\xE9\x9D\xA2|2013|05|20|16|3099|2031107

zh|Wikipedia:%E9%A6%96%E9%A1%B5|2013|05|20|16|1530|29640480

 

The last step we want to perform is to use the following Linux wc – word count – command to see how many rows to expect for importing into HANA:

 

hana:/wiki-data # wc -l *.csv

   5177628 2013-03-000000.csv

   5109305 2013-03-000001.csv

   3173608 2013-03-000002.csv

   5176167 2013-04-000000.csv

   5080170 2013-04-000001.csv

   1500647 2013-04-000002.csv

   4755634 2013-05-000000.csv

   4868995 2013-05-000001.csv

   2169557 2013-05-000002.csv

  37011711 total


We can use the total number of lines to verify the total number of records imported into HANA.


Importing the delimited files into a staging table

It’s time now to import the delimited text files into a staging table. The reason that I use the term “staging table” is that we will eventually want to enhance our data model so that we have meaningful values for language, project and the date values. The first step is to import the data as is into the staging table to validate the data. In the next blog for this series, I’ll show how to enhance the data model by creating a fact table out of the staging table and adding the dimension tables.

 

I’m going to assume that you have already installed SAP HANA Studio and that you have connected it to your SAP HANA One database instance. If you haven’t done this before, follow the instructions documented in the topic “Configuring HANA One and Installing SAP HANA Studio”.

 

Go ahead and start SAP HANA Studio. Then right click on your system and select the SQL Console command as shown below.

Blog 311.png

This command opens up a new query window that you can use to execute HANA SQL statements.

 

You can copy and paste the following commands into the query window:

 

-- Create the schema for the Wikipedia page hit data and related tables

CREATE SCHEMA "WIKIDATA";

 

-- Create the COLUMN table for the staging data

CREATECOLUMNTABLE"WIKIDATA"."STAGE-PAGEHITS"

("PROJECTCODE"VARCHAR(50),

"PAGENAME"VARCHAR(2000),

"YEAR"VARCHAR(4),

"MONTH"VARCHAR(2),

"DAY"VARCHAR(2),

"HOUR"VARCHAR(2),

"PAGEHITCOUNTFORHOUR"BIGINT

"BYTESDOWNLOADEDFORHOUR"BIGINT

)

 

Press the F8 key to execute the two statements.

 

To import the first file into the staging table, paste the following statement at the end of the window.

 

-- Import the first delimited text file

IMPORT FROM CSV FILE '/wiki-data/2013-03-000000.csv'
INTO "WIKIDATA"."STAGE-PAGEHITS"
WITH RECORD DELIMITED BY '\n'
FIELD DELIMITED BY '|'
ERROR LOG '/wiki-data/import-2013-03-000000.err';

 

Next, select the six lines as shown above and then press the F8 – Execute – command to run just the selected IMPORT statement. NOTE: I recommend that you always use the ERROR LOG clause for IMPORT FROM statements so that you can capture any errors that may have occurred. Check out my blog post titled “A simple rule to live by when using the SAP HANA IMPORT FROM command - Don't forget the ERROR LOG clause” for the full story. We are about to find out why.

 

Go ahead and run the following select statement in the query editor to get the count of imported records.

 

SELECT COUNT(*) FROM "WIKIDATA"."STAGE-PAGEHITS";

 

You should come back with a value of 5,177,628, but the result from the query was 5,177,032! You should ask – what happened? Time to switch over to the SSH client and use the SFTP File Transfer plugin to copy the “/wiki-data/import-2013-03-000000.err” file onto your local computer’s c:\wiki-data directory as shown below.

Blog 312.png

In SAP HANA Studio, use the File | Open File command to open up the .err file.

Blog 313.png

If you look at the results, you will notice that we got multiple errors as shown below.

Blog 314.png

It turns out there are records with no value for the BYTESDOWNLOADEDFORHOUR column. The \N in the example above is the special Hive notation for a NULL value. We have a couple of options to deal with this problem. First, we can simply ignore these records. Second, we could go back to Hive and create two output results: one Hive INSERT statement would add the clause “AND bytesperhour IS NOT NULL” and the other would have “AND bytesperhour IS NULL”. The third option is to use a VARCHAR(50) to import the value as text and later convert it to a BIGINT value for the actual fact table.  I’ll take the third approach as I believe that no record should be left behind.
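As a rough sketch of what that later conversion could look like (this is not part of the original import, and it assumes the Hive NULL marker arrives literally as the text \N):

-- Sketch only: read the text column as BIGINT, treating the Hive NULL marker as 0
SELECT "PAGENAME",
       CASE WHEN "BYTESDOWNLOADEDFORHOUR" IS NULL OR "BYTESDOWNLOADEDFORHOUR" = '\N'
            THEN 0
            ELSE TO_BIGINT("BYTESDOWNLOADEDFORHOUR")
       END AS "BYTESDOWNLOADED"
FROM "WIKIDATA"."STAGE-PAGEHITS";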

 

Go back to the query window in HANA Studio and run the following drop table statement:

Blog 315.png
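The statement shown in the screenshot is simply a drop of the staging table, along these lines:

DROP TABLE "WIKIDATA"."STAGE-PAGEHITS";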

Notice that I placed the command before the CREATE table statement. This is to prevent dropping the table in the event I forget to select the statement to execute in the query window.

 

Now, change the CREATE TABLE statement to use VARCHAR(50) for the BYTESDOWNLOADEDFORHOUR column. Then, select the entire CREATE TABLE statement and press F8.
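With that change, the CREATE TABLE statement looks like this:

CREATE COLUMN TABLE "WIKIDATA"."STAGE-PAGEHITS"
("PROJECTCODE" VARCHAR(50),
 "PAGENAME" VARCHAR(2000),
 "YEAR" VARCHAR(4),
 "MONTH" VARCHAR(2),
 "DAY" VARCHAR(2),
 "HOUR" VARCHAR(2),
 "PAGEHITCOUNTFORHOUR" BIGINT,
 "BYTESDOWNLOADEDFORHOUR" VARCHAR(50)
);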

 

Next, select the IMPORT FROM and the SELECT COUNT(*) commands and press F8.

 

Success! You should now see the correct number of records as shown below.

 

It’s now time to use the following commands to import the remaining 8 files:

-- Import the remaining files

IMPORT FROM CSV FILE '/wiki-data/2013-03-000001.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-03-000001.err';

IMPORT FROM CSV FILE '/wiki-data/2013-03-000002.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-03-000002.err';

IMPORT FROM CSV FILE '/wiki-data/2013-04-000000.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-04-000000.err';

IMPORT FROM CSV FILE '/wiki-data/2013-04-000001.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-04-000001.err';

IMPORT FROM CSV FILE '/wiki-data/2013-04-000002.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-04-000002.err';

IMPORT FROM CSV FILE '/wiki-data/2013-05-000000.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-05-000000.err';

IMPORT FROM CSV FILE '/wiki-data/2013-05-000001.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-05-000001.err';

IMPORT FROM CSV FILE '/wiki-data/2013-05-000002.csv' INTO "WIKIDATA"."STAGE-PAGEHITS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|' ERROR LOG '/wiki-data/import-2013-05-000002.err';

 

SELECT COUNT(*) FROM "WIKIDATA"."STAGE-PAGEHITS";

 

Success again! The total COUNT(*) value was 37,011,711 and this matches the number of lines for all 9 CSV files.

Blog 317.png

You can also see below, that each of the .err files has a zero length indicating no errors.

Blog 318.png

Exploring the data in HANA Studio

The Open Data Preview command in HANA Studio provides a great way to make sure the data is consistent with how you think it should look. To launch the command, go to the navigation pane, expand out the WIKIDATA schema, navigate to Tables, right click on the STAGE-PAGEHITS table and choose the Open Data Preview command as shown below.

Blog 319.png

HANA Studio displays the first 200 rows in the table in no particular order. Here are just a few of the rows you might see.

Blog 320.png

You can use the Analysis tab to do basic charting of data in the table. Let’s say we want to see how many rows there are per month. To try it out, drag the MONTH column into the Label axis region and then drag the DAY column into the Values axis. You should see something like this.

Blog 321.png

Now, to see the page hits per month, click on the little X next to DAY (Count) to remove the field and then drag the PAGEHITCOUNTFORHOUR field into the Values axis.

Blog 322.png
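If you prefer SQL over the Analysis tab, roughly the same aggregation can be produced with a quick query, for example:

SELECT "MONTH", SUM("PAGEHITCOUNTFORHOUR") AS "PAGEHITS"
FROM "WIKIDATA"."STAGE-PAGEHITS"
GROUP BY "MONTH"
ORDER BY "MONTH";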

It’s very interesting that March has almost twice the page hits as April and May.  I'll need to do some digging around in the next blog post to see if this is a data error or just a lot of people wanting to research their favorite sports team in the month of March.

 

At this point, we have a staging table created. In the next blog post for this series, I will show how to enhance the data model to create a page hit fact table and add dimension tables for dates, project codes and language codes.

 

I hope you have enjoyed the series so far.

Debugging SQLScript Procedures with Input Parameters


Back in December, I introduced the new SQLScript debugger in SAP HANA 1.0 SP5.  In that blog, I mentioned the limitation about not having the ability to debug procedures with input parameters.   Today, I’m glad to announce that as of SAP HANA 1.0 SP6(Rev 60), there is no longer this limitation.  Now we have the ability to define the values of the input parameters directly in the debug configuration.   Below is the new debug configuration screen.  You will notice that you can specify the name of the procedure to debug.

 

1.png

 

When you click the “Input Parameters” tab, you can see all input parameters associated with that procedure.  In this case, there are two, IM_VAL1 and IM_VAL2.   Here I can set the values for each input parameter.

 

2.png
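For context, the procedure being debugged here could be as simple as the following sketch; only the parameter names IM_VAL1 and IM_VAL2 come from the screenshot, everything else (procedure name, types, logic) is assumed:

CREATE PROCEDURE debug_demo( IN im_val1 INTEGER, IN im_val2 INTEGER, OUT ex_result INTEGER )
LANGUAGE SQLSCRIPT
SQL SECURITY INVOKER AS
BEGIN
 -- trivial body just to have something to step through
 ex_result := :im_val1 + :im_val2;
END;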

 

Once the debug session is initiated, you can see the input parameter values in the “Variables” tab. You can now debug as normal.  I think you would agree that this is a bit nicer than having to create a wrapper procedure and hard code the parameter values in order to debug your procedures.

 

3.png

 

Check out the video demonstration on the SAP HANA Academy.

SQLScript Procedure Templates in SAP HANA


One of the new features in SAP HANA 1.0 SP6 (Rev 60) is the ability to create procedures based on a procedure template.  Procedure templates allow you to create procedures with a specific interface (input and output parameters) but generic coding that leverages placeholders, or template parameters.  Currently, these placeholders can only be used in a limited set of places; for example, you can create template parameters for a schema value, a column field name, a table or view name, or the name of a procedure.  In order to create a procedure template from the HANA studio, choose “New” then “File”.

 

1.png

 

In the following dialog, enter the name of the procedure template and add the file extension .proceduretemplate.

2.png

The procedure template editor allows you to define the template parameters, as well as the template script. In this example, I am creating a template which simply gets the number of rows from a table.  The table name will be inserted from the template parameter called “table”.  You will notice that I reference this parameter in my code by using angle brackets (< >).  You can give any name to the parameter as long as you reference it with the exact same name wrapped in these brackets. Again, you can only use these parameters in certain situations, like when specifying a schema, column field name, table name, or procedure name.

 

3.png
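In textual form, the body of such a template might look roughly like the sketch below; the <table> placeholder name comes from the description above, while the output parameter name and the rest of the wrapper are assumptions:

-- sketch of a template script body; <table> is replaced per procedure with the configured table name
BEGIN
 ex_rowcount = SELECT COUNT(*) AS "ROWCOUNT" FROM <table>;
END;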

 

Now that you have a procedure template, you can create a procedure based on that template.  You can do this from the new procedure wizard which has been introduced in SP6 as well.  From your project, choose “New”, then “Other”.  In the SAP HANA Development folder, you will see an artifact called SQLScript Procedure. Choose this and click “Next”.

 

4.png

 

Enter the name of the procedure.  There is no need to type the .procedure file extension here. The wizard will add it for you automatically when you navigate out of this field.  Click the “Advanced” button.  Here you can specify the name of the procedure template which you would like to use to create your procedure from.

 

5.png

 

The procedure editor will allow you to define the values for the template parameters. In this example, I am simply specifying the products table.

 

6.png

 

The runtime object which is generated in the _SYS_BIC schema will have the source code from the template and the values for the template parameters inserted accordingly.   If you were to change the template at any point, all procedures created based on this template would be updated and activated automatically.

 

7.png

 

Of course we can call this procedure from the SQL Console and the result set, which is the count from the products table, is shown.

 

8.png

 

So this feature has been introduced to help developers become more efficient and reduce redundancy in their coding, by using templates to create procedures with very similar structure in both the interface and the code itself.  Check out the video demonstration on the SAP HANA Academy.


SQLScript Debugging from External Session


With the introduction of Suite on HANA, ABAP developers will need to branch out a bit more and extend their skills into the database.  We’ve discussed the concept of code push down for quite some time now, and if it hasn’t sunk in yet, let me remind you again with this blog.  ABAP based applications can run faster on HANA without modification as long as the bottleneck is not the application server itself.  If your ABAP program is simply bringing massive amounts of data from HANA into the ABAP application server, and you are doing a LOOP over that data and doing some calculation, then you are not really using the full potential of HANA.  We need to take that data intensive logic and rewrite it in SQLScript and push that logic into the database for faster processing, leveraging the massive parallel processing capabilities of HANA.  So if we are integrating SQLScript procedures into our ABAP applications running on top of HANA, of course we need to be able to debug the procedures in-line with our ABAP applications.  As of HANA 1.0 SP06(Rev 60), we now support the concept of external session debugging. In this blog, I will concentrate on debugging from ABAP, but the external session debugging is not specific to ABAP, and can be used to debug from other sources as well.  

In this example, I have a procedure called GET_PRODUCTS which accepts an input parameter called IM_PRODUCT which is used as a filter value, and returns one export parameter called EX_PRODUCTS.  The code of this procedure is rather simple, it reads the “Products” table based on the “Product Id” filter and returns a list of products.

 

CREATE PROCEDURE get_products ( in im_product nvarchar(20),
                                out ex_products tt_products_list )
       LANGUAGE SQLSCRIPT
       SQL SECURITY INVOKER
       READS SQL DATA AS
BEGIN
/*****************************
       Write your procedure logic
*****************************/

ex_products = select "ProductId", "Category", "Price"
                 from "SAP_HANA_EPM_DEMO"."sap.hana.democontent.epm.data::products"
                       where "ProductId" like :im_product;

END;

 

There are two ways to call a procedure from ABAP.  You can use ADBC (ABAP Database Connectivity) to call your procedure, or use the new “Database Procedure Proxy” method which was introduced in NetWeaver 7.40.  Debugging from ABAP into SQLScript works the same way whether you call the procedure via ADBC or via a proxy. For this example, I have created a Database Procedure Proxy called ZGET_PRODUCTS for this procedure.

 

1.png

I have also created an ABAP program which calls this “Database Procedure Proxy”.   Here I am passing P_PROD which is a PARAMETER and has the product filter value. LT_PRODUCTS is an internal table defined with the same structure as the EX_PRODUCTS exporting parameter.

 

2.png

 

Now that I have all of the components of my application, I can start debugging from my ABAP program straight into the SQLScript procedure in HANA.  In the SQLScript Debug configuration screen, I can set the external session parameters.  This dialog is where you set the HANA server which is sitting under your ABAP system. The “HANA User” value would be the SAP<sid> user id which ABAP uses to talk to the database.  The “Application User” would be the user id which the developer uses to log on to the ABAP system.

 

3.png

 

Once the configuration is complete, click the “Debug” button.  Next, I can set the breakpoints in the procedure and switch to the “Debug” perspective.  Notice there is a debug session running which is simply waiting for ABAP.

 

4.png

 

Next, I return to my ABAP program and set a breakpoint on the CALL DATABASE PROCEDURE statement and run the program.  Once I execute the CALL DATABASE PROCEDURE statement from the debugger, control is then passed to the SQLScript debugger in HANA Studio.  You can see that the input parameter value has been passed to the procedure and you can see this value in the “Variables” tab of the SQLScript debugger. From here I can debug the procedure as normal, and when complete, control is passed back to the ABAP application.

 

5.png

 

Check out the video demonstration on the SAP HANA Academy.

Fix for "Database connection is not available" Error in HANA SPS05 (Rev 56)


Background:

 

I had to upgrade the HANA One (Developer Edition) in AWS from Rev 48 to Rev 56 as part of the openSAP Introduction to Software Development on SAP HANA class.

 

Rev 48 client setup:

 

My Win7, C:\Windows\system32\drivers\etc\hosts file had the entry, aa.bb.cc.dd     imdbhdb, where aa.bb.cc.dd is the Elastic IP of the HANA Instance in AWS. 

 

I added the system in HANA studio using the entries below:

     Hostname: imdbhdb

     Instance Number: 00

     DB User Name: SYSTEM

     Password: <your password>   

 

Rev 56 Error:

 

After the upgrade to Rev 56, the above setup resulted in the Error "Database connection is not available" within a few minutes after adding the system in the HANA Studio.

 

Problem Analysis:

 

I activated the JDBC trace as below for Rev 56 (See http://help.sap.com/hana_platform SAP HANA Developer's Guide):

 

7-5-2013 1-44-09 PM.jpg

 

 

JDBC would initially connect to the HANA server:

 

new Connection 'jdbc:sap://imdbhdb:30015'
locale=en_US
user=SYSTEM
password=***
timeout=0
reconnect=true
validateCertificate=false
encrypt=false
HOSTLIST: [imdbhdb:30015,]

 

After a few minutes, the JDBC connect string would be replaced by the internal IP of the AWS HANA server and the connection would fail: 
new Connection 'jdbc:sap://10.29.1.187:30015'
locale=en_US
user=SYSTEM
password=***
timeout=0
reconnect=true
validateCertificate=false
encrypt=false
HOSTLIST: [10.29.1.187:30015,]
new RTEException: -813 Cannot connect to host 10.29.1.187:30015 [Connection timed out: connect], -813.
whereAmIjava.lang.Throwable
at com.sap.db.util.Tracer.whereAmI(Tracer.java:348)
at com.sap.db.rte.comm.RTEException.<init>(RTEException.java:66)
at com.sap.db.rte.comm.SocketComm.openSocket(SocketComm.java:125)
at com.sap.db.rte.comm.SocketComm.<init>(SocketComm.java:58)
at com.sap.db.rte.comm.SocketComm$1.open(SocketComm.java:42)
at com.sap.db.jdbc.topology.Topology.getSession(Topology.java:145)
at com.sap.db.jdbc.Driver.openByURL(Driver.java:1016)
at com.sap.db.jdbc.Driver.connect(Driver.java:230)
at com.sap.ndb.studio.jdbc.JDBCPlugin$3.run(JDBCPlugin.java:642)
using null
=> FAILED

 

 

 

Rev 48 JDBC Connect:

First three Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,]

Subsequent Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,imdbhdb.sapcoe.sap.com:30015,10.30.128.6:30015,]

Rev 56 JDBC Connect:

First three Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,]

Subsequent Connection attempts:
new Connection 'jdbc:sap://10.29.1.187:30015'
HOSTLIST: [10.29.1.187:30015,]

I looked at some of the SQL commands in the JDBC trace files:
SELECT "HOST","PORT",
"SERVICE_NAME","ACTIVE_STATUS","PROCESS_ID",
"COORDINATOR_TYPE","SQL_PORT" FROM SYS.M_SERVICES

Rev 48 Output

7-5-2013 2-30-05 PM.jpg

Rev 56 Output

7-5-2013 2-31-41 PM.jpg

SELECT "HOST","KEY","VALUE" FROM SYS.M_HOST_INFORMATION WHERE
UPPER("HOST") = 'IMDBHDB' AND (UPPER("KEY") = 'SID' OR UPPER("KEY") = 'SAPSYSTEM'

Rev 48 Output

Rev 56 Output

7-5-2013 2-40-33 PM.jpg

7-5-2013 2-42-06 PM.jpg

SELECT "HOST","KEY","VALUE" FROM SYS.M_HOST_INFORMATION WHERE UPPER("KEY") LIKE 'NET_%'

Rev 48 Output

Rev 56 Output

7-5-2013 2-19-42 PM.jpg

7-5-2013 2-24-45 PM.jpg

 

Workaround:

 

#1) Someone in the openSAP forum suggested using hanaserver in the hosts file entry and when adding the system in HANA studio. While this worked, I did not like this option since hanaserver has no relationship to the actual hostname -- imdbhdb.

 

#2) I have been suggesting to folks -- in the openSAP forum and in the AWS upgrade blog (http://scn.sap.com/docs/DOC-30980) -- to use the AWS Elastic IP for the hostname in HANA studio while adding the system and to NOT add any entry in the hosts file. While this might be the lesser of the two evils, both options had the yellow-icon problem for the sapstartsrv process in the HANA studio. The SAP HANA Systems (Navigator) tab/view would also say "Some services are not started" (with a yellow icon) or "System state cannot be determined" (with a gray icon) for the system.

 

Fix:

 

When you compare the outputs of the SQL commands between Rev 48 and Rev 56 listed above, the only difference is the new key net_publicname added in Rev 56 in the view SYS.M_HOST_INFORMATION.

 

I had to wait until SPS06 documents were published to see what this new key was.

 

"Public host name that should be used by client interfaces. Can contain a host name, FQDN or IP address". (See http://help.sap.com/hana_platform SAP HANA System Views Reference)

 

The public_hostname_resolution parameter is documented in the SAP HANA Administration Guide (http://help.sap.com/hana_platform). This is a new parameter that has obviously been introduced after Rev 48. The values for this parameter are no, ip, name and fqdn, with ip being the default in Rev 56. That is probably why net_publicname was set to the internal IP of the AWS HANA instance. This raises another question as to why this was not set to the public IP (aka AWS Elastic IP) of the AWS instance. The prudent thing at this point seems to be to disable this feature for the aforementioned reason and for backward compatibility.

 

Change this parameter value to no (to disable the feature and use the internal hostname) while you are in HANA Studio (configured using workaround #2 above).
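If you prefer SQL over the Studio configuration editor, a change like this is normally made with an ALTER SYSTEM ALTER CONFIGURATION statement. The sketch below assumes the setting lives in global.ini in a public_hostname_resolution section with the key use_default_route; verify the file, section, and key names for your revision before running it:

-- assumption: global.ini / [public_hostname_resolution] / use_default_route
ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'SYSTEM')
  SET ('public_hostname_resolution', 'use_default_route') = 'no'
  WITH RECONFIGURE;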

 

7-5-2013 3-39-09 PM.jpg

 

7-5-2013 7-51-39 PM.jpg

 

7-5-2013 7-54-42 PM.jpg

 

The changed parameters are stored at the OS level as shown below.

7-5-2013 8-02-14 PM.jpg

 

The original *.ini files are at the OS level as shown below.

7-5-2013 4-53-55 PM.jpg

 

Let us run the SQL command now.

 

SELECT "HOST","KEY","VALUE" FROM SYS.M_HOST_INFORMATION WHERE UPPER("KEY") LIKE 'NET_%'

 

7-5-2013 3-51-41 PM.jpg

Now the net_publicname is set to imdbhdb.

 

Delete the system from the HANA studio, add the AWS Elastic IP-to-host (imdbhdb) mapping in the hosts file, and re-add the system in the HANA studio using the hostname imdbhdb.

 

7-5-2013 4-05-15 PM.jpg

 

Now the sapstartsrv process has the green icon as above.

 

Let us compare the JDBC Connections.

 

Rev 48 JDBC Connect:

First three Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,]

Subsequent Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,imdbhdb.sapcoe.sap.com:30015,10.30.128.6:30015,]

Rev 56 JDBC Connect:

First three Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,]

Subsequent Connection attempts:
new Connection 'jdbc:sap://imdbhdb:30015'
HOSTLIST: [imdbhdb:30015,imdbhdb.sapcoe.sap.com:30015,10.29.1.187:30015,]

 

You are done.

 

[Please note that I have tested this fix only in AWS. I am not sure if Rev 56 systems hosted by other HANA Cloud hosting providers like CloudShare would have the same issue or if the fix would work if they did.]

Enhancing the Wikipedia data model using fact and dimension tables with SAP HANA


Welcome to part 4 in this series demonstrating how to analyze Wikipedia page hit data using Hadoop with SAP HANA One and Hive. In part 3 of this series, “Importing Wikipedia Hive data into SAP HANA One”, I discovered an anomaly in the March 2013 data. I will show how I used HANA Studio’s data preview feature to isolate the problem and then fix our resulting fact table. Next, I will show how to create dimension tables to add “color” to the data model so that we are not looking at coded values in our analysis of the data. The resulting data model will be the foundation for creating an Analytic View that I will cover in part 5. To put this blog into perspective, I will focus on the SAP HANA database and SAP HANA Studio components shown below.

01 Diagram.png                   

Let’s get started!

 

Cleaning data before adding to a fact table

If you have been following along, open up SAP HANA Studio and make the connection to your HANA database. In the last blog post, I showed how to use the Data Preview feature in HANA Studio to look at the number of page hits per month by going to the object browser and navigating to the WIKIDATA schema -> Tables->STAGE-PAGEHITS and right clicking on the table and selecting the Open Data Preview command. Then, click on the Analysis tab. Drag the MONTH column into the Label axis region and then drag the PAGEHITCOUNTFORHOUR column into the Values axis to get a chart as shown below.

02 Odd March.png 

It looked odd that March has such a large number of page hits, so it’s time to dig in a little deeper. I am going to drag the DAY column into the Label axis region as well to look for the anomalies in March.

03 adding day.png 

Clicking on the two abnormally long bars reveals that these correspond to Day=01 in March with a pagecount of 14,830,164,116 and Day=05 in March with a pagecount of 15,257,088,562. I am going to add the HOUR statistics as well by dragging it to the Labels axis. Looks like there are a few hours when something odd occurred.

04 adding hour.png 

To see the records in question, switch to the Raw Data tab. Click on the PAGEHITCOUNTFORHOUR column name twice to sort it in descending order.

05 display data.png

You can see the four records with the zh code with a large number of pagecounts. zh is the code for the Chinese language, and these rows have most likely resulted from a denial of service (DoS) attack on Wikipedia. This would not be the first time – see “Technical glitch causes Wikipedia outage”. In addition, the page names QQ and ICQ: relate to an instant messaging client, so one has to wonder if the attack was made through this client. Another interesting record is the large number of hits on the de.d – 0152 page. It turns out that this page in the German Wiktionary does not exist – could be another DoS attempt.  When it comes time to move these into the fact table, I’ll delete them since they are clearly a problem.

 

Checking for duplicate rows in the staging table

Initially, I thought that the large number of records was due to duplicate rows. This is because when I attempted to load the data into HANA in part three, the error list looked like this:

06 errors importing.png 

Notice the duplicate year, month, day and hour for the first two errors. These could add up. So, I came up with the following query to look at the number of duplicate records in the month of March.

SELECT "PROJECTCODE", "PAGENAME", SUM("PAGEHITCOUNTFORHOUR"), COUNT("PAGENAME"), "YEAR", "MONTH", "DAY", "HOUR"
FROM "WIKIDATA"."STAGE-PAGEHITS"
WHERE "MONTH" = '03'
GROUP BY "PROJECTCODE", "PAGENAME", "YEAR", "MONTH", "DAY", "HOUR"
HAVING (COUNT("PROJECTCODE")>1) and (COUNT("PAGENAME")>1) and (COUNT("YEAR")>1)
   and (COUNT("MONTH")>1) and (COUNT("DAY")>1) and (COUNT("HOUR")>1)
ORDER BY 3 DESC;

By using a combination of the GROUP BY clause on the fields that I want to compare for duplicates and the HAVING clause looking for a COUNT(field) > 1 on each of the fields, we can identify duplicate records as shown below.

07 check dups.png 

As you can see, there are some relatively large values that could skew the results. The problem is, I don’t know if these are by design or just a bug with the way Wikipedia posts the data.

 

Creating the Fact Table

Without going into a lot of detail, the fact table contains the specific data we want to track (measurements, metrics and facts of a business process) and foreign keys that form relationships to dimension tables that contain descriptive attributes. To learn more about how fact and dimension tables work together, check out the Data Warehouse topic on Wikipedia.

 

NOTE: For data warehouse experts, please show a little mercy when looking at the resulting design. I have intentionally bypassed creating surrogate keys in favor of using natural keys to keep things simple. Given the repetitive nature of the data, I would expect even better compression if I separated out the PAGENAME into a dimension table and used a surrogate key in the fact table.

 

In order to create the fact table from the staging table, we will need to do the following high-level tasks:

  1. Create the fact table
  2. Develop a SELECT statement that splits the language and Wikimedia project codes contained in the PROJECTCODE and concatenates the Year, Month and Day values to create a natural key column used to form a relationship to a date dimension table and inserts the data into the fact table
  3. Remove the dirty records from the fact table. NOTE: Normally I would delete the dirty records from the staging table, but I wanted to save time in preparing this article.

Create the fact table

It turns out that SAP HANA COLUMN tables are ideal for fact tables. This is because much of the data is repeated, and HANA compresses the repeated values with several different encoding methods. Use the following statement to create the fact table:

CREATECOLUMNTABLE"WIKIDATA"."PAGEHITSFACTTABLE"

("PAGENAME"VARCHAR(2000),

"YEAR"VARCHAR(4),

"MONTH"VARCHAR(2),

"DAY"VARCHAR(2),

"HOUR"VARCHAR(2),

"PAGEHITCOUNTFORHOUR"BIGINT

"BYTESDOWNLOADEDFORHOUR"VARCHAR(50),

"EVENTDATE"VARCHAR(8),

"LANGCODE"VARCHAR(50),

"PROJCODE"VARCHAR(50)

);

 

Create computed columns

Next, I need to split the PROJECTCODE into two separate values. If the column has a '.' (period), then the value has a combination of a language value and wiki-project value, otherwise it’s the language code for the main Wikipedia project. I will use the code 'wp' for Wikipedia values in the fact table as a lookup value to the projects dimension table. Here is a test query to make sure I got the logic right.

 

SELECT DISTINCT "PROJECTCODE",
Case When ("PROJECTCODE" like '%.%')
     Then LOCATE("PROJECTCODE", '.') - 1
     Else 0
End AS "POSITION",
Case When ("PROJECTCODE" like '%.%')
     Then Substring("PROJECTCODE", 1, LOCATE("PROJECTCODE", '.') - 1)
     Else "PROJECTCODE"
End AS LANGCODE,
Case When ("PROJECTCODE" like '%.%')
     Then Substring("PROJECTCODE", LOCATE("PROJECTCODE", '.') + 1)
     Else 'wp'
End AS PROJCODE
FROM "WIKIDATA"."STAGE-PAGEHITS"
WHERE Month = '03' AND Day = '01' AND Hour = '00'
LIMIT 100;

 

The first Case When clause uses the like operator ("PROJECTCODE" like '%.%') to see if there is a period in the column value. If this evaluates to true, then I use the locate function LOCATE("PROJECTCODE", '.') to find the position of the period. By subtracting 1 from that value, I end up with the length of the language value. If the period is not found in the value, then I return a 0. Finally, I alias the resulting column value as POSITION. NOTE: I will not use this debug value in the fact table query.

 

The second Case When clause performs the same ("PROJECTCODE" like '%.%') check, but this time uses the substring function Substring("PROJECTCODE", 1, LOCATE("PROJECTCODE", '.') - 1) to extract the language code. If there was no period, the Else clause Else "PROJECTCODE" returns the column value, which is just the language code.

 

The third Case When clause extracts the project code located just after the period using the substring expression Substring("PROJECTCODE", LOCATE("PROJECTCODE", '.') + 1). If the period is not in the value, the Else clause returns the value 'wp', denoting the Wikipedia project.

 

NOTE: I added a WHERE clause to limit the data that HANA has to process. Here is the result:

08 splitting the projectcode.png

 

 

To concatenate the text from the YEAR, MONTH, and DAY columns into a new column called EVENTDATE, I will use the concatenation operator ||, as in "YEAR"||"MONTH"||"DAY" AS EVENTDATE. The EVENTDATE column will be used to look up dates in the date dimension table I’ll create later on.

 

Load the data into the fact table

The following SELECT … INTO statement copies the data from the staging table into the fact table with the project and language values split out and with the new EVENTDATE column.

SELECT"PAGENAME","YEAR","MONTH","DAY","HOUR",

       "PAGEHITCOUNTFORHOUR", "BYTESDOWNLOADEDFORHOUR",

       "YEAR"||"MONTH"||"DAY"AS EVENTDATE,

CaseWhen ("PROJECTCODE"like'%.%')

     ThenSubstring("PROJECTCODE", 1, LOCATE("PROJECTCODE",'.')-1)

     Else"PROJECTCODE"

EndAS LANGCODE,

CaseWhen ("PROJECTCODE"like'%.%')

     ThenSubstring("PROJECTCODE", LOCATE("PROJECTCODE",'.')+1)

     Else'wp'

EndAS PROJCODE

FROM"WIKIDATA"."STAGE-PAGEHITS"

INTO"WIKIDATA"."PAGEHITSFACTTABLE";


Exclude bad records

I have noted the page hit counts from the outliers shown in the preceding section Exploring March Data. Before I ever perform a DELETE statement, I like to test the WHERE clause out. The query below should display the five suspect records that I will delete from the fact table.

SELECT *FROM"WIKIDATA"."PAGEHITSFACTTABLE"

WHERE"PAGEHITCOUNTFORHOUR" > 53000000;

 

09 records to delete.png

Now that I am happy with the results, the following DELETE statement removes the five records.

DELETEFROM"WIKIDATA"."PAGEHITSFACTTABLE"

WHERE"PAGEHITCOUNTFORHOUR" > 53000000;

HANA Studio confirms the execution of the statement with 5 affected rows!

10 deleted records.png
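As an optional double-check, re-running the filter as a count should now return 0:

SELECT COUNT(*) FROM "WIKIDATA"."PAGEHITSFACTTABLE"
WHERE "PAGEHITCOUNTFORHOUR" > 53000000;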

Now that we have our fact table, let us create the dimension tables.

 

Create the Wikipedia dimension tables

Time to create the dimension tables. I am going to work with three of them: DIMPROJECT, DIMLANGUAGE, and the M_TIME_DIMENSION table that the HANA Studio feature generates in the _SYS_BI schema. I will first create two CSV files and import them into the respective tables. I'll then show how to generate the M_TIME_DIMENSION table.

 

Creating the .csv file for the project dimension table

Let’s start with the projects. The “Page view statistics for Wikimedia projects” page contains the list of project abbreviations that includes the following:

wikibooks: ".b"

wiktionary: ".d"

wikimedia: ".m"

wikipedia mobile: ".mw"

wikinews: ".n"

wikiquote: ".q"

wikisource: ".s"

wikiversity: ".v"

mediawiki: ".w" 

 

The challenge is that when I browsed the data with the distinct values feature of HANA Studio, I found a few more codes than those listed.

11 distinct values.png

After digging around the “WikiMedia Projects” page and looking at the page names used in the missing values, I created a .CSV file with the unique project codes as follows:

wikibooks,b

wiktionary,d

wikimedia,m

wikipedia mobile,mw

wikinews,n

wikiquote,q

wikisource,s

wikiversity,v

mediawiki,w

wikivoyage,voy

wikimedia foundation,f

wikipedia,wp

wikimedia labs,labs

org,org

us,us

wikidata,wd

 

You can download the CSV file at http://wikipedia-proj-lang-codes.s3.amazonaws.com/UniqueProjectCodes.csv. You should save this file into your Downloads folder.

 

Creating the .csv file for the language dimension table

Wikimedia maintains a list of the languages at http://meta.wikimedia.org/wiki/Template:List_of_language_names_ordered_by_code. Trying to create this list of over 300 languages can be a bit tricky, so I decided to go with Microsoft Excel to import the data. To save time, I used the new Microsoft Power Query for Excel add-in that works with either Excel 2010 or 2013. You can download the add-in from here.  To save space in the blog, I put together a short video that shows the steps.

 


 

You can download the CSV file at http://wikipedia-proj-lang-codes.s3.amazonaws.com/UniqueLanguageCodes.csv. You should save this file into your Downloads folder.

 

Creating the two dimension tables in HANA

To create the two dimension tables for projects and languages, execute the following commands:

CREATECOLUMNTABLE"WIKIDATA"."DIMPROJECT"

("ProjectTitle"VARCHAR(50) NOTNULL,

"ProjectCode"VARCHAR(10) NOTNULL

);

 

CREATECOLUMNTABLE"WIKIDATA"."DIMLANGUAGE"

("LanguageCode"VARCHAR(25) NOTNULL,

"Language"VARCHAR(50) NOTNULL

);

 

You have a couple of options with smaller comma-separated files. You can upload the CSV files using the SFTP plug-in, as we did back in the "Importing Wikipedia Hive data into SAP HANA One" blog post, and then issue the following two IMPORT statements.

IMPORT FROM CSV FILE '/wiki-data/UniqueProjectCodes.csv'
INTO "WIKIDATA"."DIMPROJECT"
WITH RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ',';

IMPORT FROM CSV FILE '/wiki-data/UniqueLanguageCodes.csv'
INTO "WIKIDATA"."DIMLANGUAGE"
WITH RECORD DELIMITED BY '\n'
FIELD DELIMITED BY ',';

 

Or, you can use the File | Import… command in SAP HANA Studio, guided by the blog post "Export and Import feature in HANA Studio".

Without going into detail, here is how you would import the UniqueProjectCodes.csv file using HANA Studio assuming that you downloaded it from the S3 bucket at http://wikipedia-proj-lang-codes.s3.amazonaws.com/UniqueProjectCodes.csv.

  • Choose the File | Import… command
  • Type the word data into the Select an import source search box to locate Data from Local File and click Next >.
  • Select the HANA database you are working with and click Next >.
  • Click Browse and select the UniqueProjectCodes.csv file and click OK.
  • Select the Existing option under Target Table and then click the Select Table button.
  • Type DIMPROJECT in the Select Table dialog, select WIKIDATA.DIMPROJECT, and click OK.
  • Click Next to go to the Manage Table Definition and Data Mappings step.
  • Drag COLUMN_0 in the Source File list over to the ProjectTitle row in the Target Table list.
  • Drag COLUMN_1 in the Source File list over to the ProjectCode row in the Target Table list.

12 import col map.png

  • Click Finish to perform the import operation.

Do the same steps to load the UniqueLanguageCodes.csv file into the DIMLANGUAGE table.
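To confirm that both imports worked, a quick row count should show the 16 project codes from the first file and the 300+ language codes from the second:

SELECT COUNT(*) FROM "WIKIDATA"."DIMPROJECT";
SELECT COUNT(*) FROM "WIKIDATA"."DIMLANGUAGE";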

 

Create the Date dimension table for name lookups of date values

HANA Studio has a feature to generate a date dimension table. You first need to be in the Modeler perspective. Go to the Window menu and choose Open Perspective > Modeler. If you have never launched the Modeler perspective, choose the Other… command and then select Modeler from the list as shown below.

13 open perspective.png

You should then see the Welcome to Modeler page. Then, click on the Generate Time Data… command in the Data section.

14 modeler quick start.png

In the Generate Time Data pop-up, select Calendar type as Gregorian, enter 2007 and 2014 for year range, select Day for Granularity and Monday for First day of the week. Then click on Generate.

15 gen date table.png 

HANA Studio creates the M_TIME_DIMENSION table in the _SYS_BI schema. To see the structure of the table, enter in the following command into a SQL Console window.

SELECT * FROM"_SYS_BI"."M_TIME_DIMENSION"LIMIT 50;

16 display date table.png

I will use the DATE_SAP column as the join column to the EVENTDATE column in the PAGEHITSFACTTABLE table when building out the Analytic View in the next blog post.
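Although the actual join will be modeled in the Analytic View, a quick ad-hoc query (a sketch, assuming the EVENTDATE strings match DATE_SAP's YYYYMMDD format) shows how the two columns line up:

SELECT f."EVENTDATE", d."DAY_OF_WEEK_INT",
       SUM(f."PAGEHITCOUNTFORHOUR") AS TOTALHITS
FROM "WIKIDATA"."PAGEHITSFACTTABLE" f
JOIN "_SYS_BI"."M_TIME_DIMENSION" d
  ON f."EVENTDATE" = d."DATE_SAP"
GROUP BY f."EVENTDATE", d."DAY_OF_WEEK_INT"
LIMIT 10;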

I want to enhance the M_TIME_DIMENSION table by adding the English day of the week to correspond to the DAY_OF_WEEK_INT value where 0 = Monday and so on. Also, I want to add the English month based on the MONTH_INT value where 1 = January and so on. Here is how I added the columns.

 

ALTERTABLE"_SYS_BI"."M_TIME_DIMENSION"

ADD (ENGLISH_DAY_OF_WEEK VARCHAR(10) NULL);

UPDATE"_SYS_BI"."M_TIME_DIMENSION"

SET ENGLISH_DAY_OF_WEEK=

      CASE DAY_OF_WEEK_INT

      WHEN 0 THEN'Monday'

      WHEN 1 THEN'Tuesday'

      WHEN 2 THEN'Wednesday'

      WHEN 3 THEN'Thursday'

      WHEN 4 THEN'Friday'

      WHEN 5 THEN'Saturday'

      WHEN 6 THEN'Sunday'

      END

WHERE ENGLISH_DAY_OF_WEEK ISNULL;

Note: I used ENGLISH_ as the prefix so that I could include other languages in the future for the day of the week. For example, for German days I would create a column called GERMAN_DAY_OF_WEEK and add it to my Analytic View so that German users would see the correct day-of-week value, with entries such as:

      WHEN 0 THEN 'Montag'
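A minimal sketch of that German variant, following the same pattern as the English column:

ALTER TABLE "_SYS_BI"."M_TIME_DIMENSION"
ADD (GERMAN_DAY_OF_WEEK VARCHAR(10) NULL);

UPDATE "_SYS_BI"."M_TIME_DIMENSION"
SET GERMAN_DAY_OF_WEEK =
      CASE DAY_OF_WEEK_INT
      WHEN 0 THEN 'Montag'
      WHEN 1 THEN 'Dienstag'
      WHEN 2 THEN 'Mittwoch'
      WHEN 3 THEN 'Donnerstag'
      WHEN 4 THEN 'Freitag'
      WHEN 5 THEN 'Samstag'
      WHEN 6 THEN 'Sonntag'
      END
WHERE GERMAN_DAY_OF_WEEK IS NULL;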

We can do a similar set of statements for the ENGLISH_MONTH column.

ALTERTABLE"_SYS_BI"."M_TIME_DIMENSION"

ADD (ENGLISH_MONTH VARCHAR(10) NULL);

UPDATE"_SYS_BI"."M_TIME_DIMENSION"

SET ENGLISH_MONTH=

      CASE MONTH_INT

            WHEN 1 THEN'January'

            WHEN 2 THEN'February'

            WHEN 3 THEN'March'

            WHEN 4 THEN'April'

            WHEN 5 THEN'May'

            WHEN 6 THEN'June'

            WHEN 7 THEN'July'

            WHEN 8 THEN'August'

            WHEN 9 THEN'September'

            WHEN 10 THEN'October'

            WHEN 11 THEN'November'

            WHEN 12 THEN'December'

      END

WHERE ENGLISH_MONTH ISNULL;

To see what the table looks like, you can issue the following query:

SELECT DATE_SAP, YEAR, QUARTER, MONTH_INT, ENGLISH_MONTH, WEEK, DAY_OF_WEEK_INT, ENGLISH_DAY_OF_WEEK
FROM "_SYS_BI"."M_TIME_DIMENSION" LIMIT 50;

17 modified date table.png 

Now we have our fact table and dimension tables ready in HANA. In the next blog post in this series, I will create an Analytic View from these tables that SAP Lumira needs to do detailed analysis on the Wikipedia data.

Let R Embrace Data Visualization in HANA Studio


Introduction

As we know, HANA supports using R as the language for writing stored procedures.

R is a very powerful language for statistical analysis, and one of its major features is plotting charts for data visualization. There are many R functions for data visualization.


Well, the question now is: can we use these data visualization features in SQL procedures written in R, and display the charts when we execute the procedure in HANA Studio?

The answer is: Yes!!!

blog_04.png

Be patient: we need to do some preparation on the machine that has HANA Studio installed. Otherwise we will only see errors when executing the procedure.

 

Here is a typical scenario: HANA is installed on a powerful machine, which we'll call the 'HANA box', and Rserve is installed on another machine to execute the R scripts, the 'R box'. The end user then works on a third machine, such as a laptop with HANA Studio installed, to execute the procedure written in R. We call this machine the 'client machine'. Here is the setup:

blog_01.png

The magic here is that we use the X Window System on the client machine to transfer the graphical data through the X Window protocol:

blog_05.png

I’ll show you how to achieve this goal step by step. 

 

Prerequisites

Naturally, R should be installed with graphics support enabled. This also means that the R box should have the X Window System installed.

 

Step 1: Install X Window System on the Client Machine

If your client machine is running Linux, you need to do nothing at all, because the Linux GUI already uses X Window. If your client machine is running Windows, you need an X Window implementation for Windows; installing Cygwin is one way to get it. During the installation process, make sure X11 is also installed.

You can refer to this guide on how to install X in Cygwin.

 

After the X Window System is installed in Cygwin, you can start X from the Cygwin console:

$ startxwin

 

Step 2: Use SSH to Log in to the R Box

Suppose we started the Rserve process with the user 'ruser'. The next step is to log in to the R box from the client machine as 'ruser'. We need to emphasize this because if you log in to the R box with a different user account, the channel does not work.

Then comes a key step: when logging on to the R box, you must ensure that 'X11 forwarding' is enabled. In PuTTY, you can configure it here:

blog_02.png

If you use the command line to log in using SSH, you can add the ‘-Y’ option:

$ ssh -Y <user_name>@<host_name>

 

That's it. You can test it right now: after you start X Window on the client machine and log in to the R box, run 'xclock' in the SSH console. If everything is OK, you will see the clock on your client machine. Surprise, right?

blog_03.png

 

Step 3: Ready to run the code!

OK, we are ready to run the procedure now. Here we'll do a simple demo of the data visualization features in an R script; it simply plots the input data frame.

Here is the code:

DROP TYPE DUMMY_INPUT_T;
CREATE TYPE DUMMY_INPUT_T AS TABLE (
    AAA INTEGER,
    BBB DOUBLE
);

DROP TYPE DUMMY_OUTPUT_T;
CREATE TYPE DUMMY_OUTPUT_T AS TABLE (
    AAA INTEGER,
    BBB DOUBLE
);

DROP TABLE DUMMY_INPUT;
CREATE TABLE DUMMY_INPUT (
    AAA INTEGER,
    BBB DOUBLE
);

DROP TABLE DUMMY_OUTPUT;
CREATE TABLE DUMMY_OUTPUT (
    AAA INTEGER,
    BBB DOUBLE
);

DROP PROCEDURE DUMMY_PROC_R;
CREATE PROCEDURE DUMMY_PROC_R (IN input1 DUMMY_INPUT_T, OUT result DUMMY_OUTPUT_T)
LANGUAGE RLANG AS
BEGIN
    # pass the input rows through to the output table
    result <- input1
    # plot the input data frame; the chart opens in an X11 window
    plot(input1)
    # keep the procedure alive until the graphics window is closed
    Sys.sleep(1)
    while (!is.null(dev.list())) {
        Sys.sleep(1);
    }
END;

TRUNCATE TABLE DUMMY_INPUT;
INSERT INTO DUMMY_INPUT VALUES (1, 1.1);
INSERT INTO DUMMY_INPUT VALUES (2, 2.2);
INSERT INTO DUMMY_INPUT VALUES (3, 3.3);

TRUNCATE TABLE DUMMY_OUTPUT;
CALL DUMMY_PROC_R(DUMMY_INPUT, DUMMY_OUTPUT) WITH OVERVIEW;
SELECT * FROM DUMMY_OUTPUT;

 

When you execute this in HANA Studio now, you can see the chart pop up right on the machine running HANA Studio!

 

Let's look deeper into the R script.

First, we can see that we use the 'plot()' function. This draws the input data frame point by point as a point chart.

Furthermore, you may notice that a small code snippet is added at the end of the R script:

Sys.sleep(1)
while (!is.null(dev.list())) {
    Sys.sleep(1);
}

 

This code snippet ensures that the graphics window is not closed immediately after the script runs to the end; the procedure keeps sleeping until you close the plot window, that is, until dev.list() returns NULL.

 

Summary

By using the X Window System, we can use the data visualization features in SQL procedures written in R. The chart can be displayed on the client machine running HANA Studio.

This feature is really useful for data analysts who want to understand their data more clearly when using HANA and R. It expands the scope of R integration in HANA.

How to set up Excel 2007 on HANA SP05 with Windows 7 64-bit


1. Download the 32-bit HANA client from http://scn.sap.com/community/developer-center/hana. The 64-bit client is needed for HANA Studio; it is fine to install both of them on a 64-bit Windows OS.

SAP HANA Client Developer Edition Win86 32bit (appr. 42,2 MB)

SAP HANA Client Developer Edition Win86 64bit (appr. 83,2 MB)

 

2. Install the 32-bit HANA client to C:\Program Files (x86)\SAP\hdbclient

 

3. Go to C:\Windows\SysWOW64 and run odbcad32.exe -> add a data source using HDBODBC32 -> the server port is 3<instance number>15, for example 30015 for instance 00.

 

7-19-2013 5-19-41 PM.jpg

 

image003.png

4. Double-check with the ODBC administration tool in Control Panel

 

image005.png

 

5. Connect to HANA and retrieve the tables. Open Excel 2007 -> Data ->

 

image007.png

image009.png

6. Connect via the MDX provider: Data -> Data Connection Wizard ->

 

image011.png

 

image013.png

 

image015.png
