I am an intern visiting Palo Alto from SAP’s Shanghai office for a month-long project. It’s my first trip to the Bay Area, so I am soaking up all the sun and excitement here. Last weekend, I found myself wanting to watch a movie. I searched the internet and found all the new releases listed on Rotten Tomatoes and IMDb, but it was hard to pick one. I wanted to get the pulse of a movie before watching it, not from the critics but from actual moviegoers like me. I also wanted one with high buzz not only in the US but also in China. So I decided: why not build an app myself? After all, I am in the heart of Silicon Valley.
I picked SAP HANA One to power my app, not just because I get the database and application server in the cloud but also because the platform supports sentiment analysis for English and Simplified Chinese right out of the box! I used the Rotten Tomatoes API to find newly released movies, and the Twitter and Sina Weibo APIs for sentiment in the US and China respectively.
Prerequisites
Before we start building the application, we need to get SAP HANA One, developer edition, and install SAP HANA Studio. You can find the instructions here:
"Get your own SAP HANA, developer edition on Amazon Web Services" http://scn.sap.com/docs/DOC-28294
Parts 1, 2 and 5 explain how to get SAP HANA One, developer edition; parts 3 and 4 explain how to install SAP HANA Studio.
Schema
I did most of my work in HANA Studio, which is based on the Eclipse IDE and is therefore very familiar to Java and other open-source developers.
![1.jpg]()
First, I created a schema and full text index for all the movie metadata, including title, rating, running time, release date, synopsis, etc. Then I used JTomato (https://github.com/geeordanoh/JTomato) to populate the table.
MOVIE: Stores movie metadata, including the title, rating, runtime, release date, etc.
![2.jpg]()
Then I used Twitter4J (http://twitter4j.org) to search for the movie keywords on Twitter. I found that Twitter, given just the keyword, did a good job of pulling all variations of the movie name, such as "fast and furious" and "fast & furious".
TWEET: Stores crawled tweets from Twitter, including ID, time, location, content, etc.
![3.jpg]()
However, I ran into problems while crawling Sina Weibo because they have a strict process for usage of their API. So I decided to use Tencent Weibo instead.
TWEET_ZH: Stores crawled tweets from Tencent Weibo
![4.jpg]()
Next I created a full text index and sentiment tables (using the VoiceOfCustomer configuration) for each tweet table with the following SQL. Voilà! I now have sentiment analysis for all the Twitter and Tencent Weibo data!
CREATE FULLTEXT INDEX TWEET_I ON TWEET (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('EN') TEXT ANALYSIS ON;
CREATE FULLTEXT INDEX TWEET_ZH_I ON TWEET_ZH (CONTENT) CONFIGURATION 'EXTRACTION_CORE_VOICEOFCUSTOMER' ASYNC FLUSH EVERY 1 MINUTES LANGUAGE DETECTION ('ZH') TEXT ANALYSIS ON;
TWEET_I: Used to perform sentiment analysis for the table TWEET
![5.jpg]()
TWEET_ZH_I: Used to perform sentiment analysis for the table TWEET_ZH
![6.jpg]()
In addition to the tables in SAP HANA and the full text indexes that perform sentiment analysis, I also wrote stored procedures to wrap the complex SQL, making it easy for XS (HANA’s application server) to consume.
Architecture
The final architecture looks like this:
![7.jpg]()
Rating
Now, I had to create a formula to quantify rating. I used a very simple formula for this:
Score = (# of strong positive sentiment * 5 + # of weak positive sentiment * 4 + # of neutral sentiment * 3 + # of weak negative sentiment * 2 + # of strong negative sentiment *1) / # of total sentiments
This score would be helpful to rank movies so I could easily pick the top one.
Additionally, I showed the distribution of the sentiments (positive vs. negative vs. neutral) so I could better understand how strong or weak people’s opinions of the movie were in both the US and China.
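To make the formula concrete, here is a minimal JavaScript sketch of the same calculation. The function names `score` and `distribution` are illustrative and not part of the app; the counts are the Twitter figures for “Man of Steel” from the breakdown table later in this post.

```javascript
// Weights from the rating formula: strong positive = 5 ... strong negative = 1.
const WEIGHTS = {
  StrongPositiveSentiment: 5,
  WeakPositiveSentiment: 4,
  NeutralSentiment: 3,
  WeakNegativeSentiment: 2,
  StrongNegativeSentiment: 1
};

// Weighted average of the sentiment counts; 0 when there are no sentiments.
function score(counts) {
  let total = 0;
  let weighted = 0;
  for (const type of Object.keys(WEIGHTS)) {
    const n = counts[type] || 0;
    total += n;
    weighted += n * WEIGHTS[type];
  }
  return total === 0 ? 0 : weighted / total;
}

// Share of each sentiment class, rounded to a whole percent.
function distribution(counts) {
  const total = Object.keys(WEIGHTS).reduce((sum, t) => sum + (counts[t] || 0), 0);
  const shares = {};
  for (const type of Object.keys(WEIGHTS)) {
    shares[type] = total === 0 ? 0 : Math.round(((counts[type] || 0) / total) * 100);
  }
  return shares;
}

// Twitter counts for "Man of Steel", taken from the table in this post.
const manOfSteelTwitter = {
  StrongPositiveSentiment: 9986,
  WeakPositiveSentiment: 5903,
  NeutralSentiment: 839,
  WeakNegativeSentiment: 2067,
  StrongNegativeSentiment: 3757
};

console.log(score(manOfSteelTwitter).toFixed(2)); // "3.72"
console.log(distribution(manOfSteelTwitter));
```

The weighted sum is 83,950 over 22,552 sentiments, which gives the 3.72 rating shown on the homepage.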
XS Application
The application is built on the XS Engine, which runs inside SAP HANA. This avoids data transfer latency between the database and a separate web application server, and users can access the website directly. The application was built in the following steps:
Step 1: Create stored procedures for rating and sentiment analysis
Currently, there are two stored procedures in the app. One is for rating and the other is for sentiment analysis:
1. Rating
We can use the following SQL statements to create the type and the stored procedure:
CREATE TYPE MOVIEINFO AS TABLE (
  POSTER NVARCHAR(100),
  TITLE NVARCHAR(100),
  RATING DECIMAL(5, 2),
  NUM INTEGER,
  TITLE_ZH NVARCHAR(100),
  RATING_ZH DECIMAL(5, 2),
  NUM_ZH INTEGER,
  YEAR INTEGER,
  MPAA_RATING NVARCHAR(100),
  RUNTIME NVARCHAR(100),
  CRITICS_CONSENSUS NVARCHAR(2000),
  RELEASE_DATE DATE,
  SYNOPSIS NVARCHAR(2000),
  ID INTEGER
);
CREATE PROCEDURE GETMOVIEINFO(OUT RESULT MOVIEINFO) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  RESULT =
    SELECT A.POSTER, A.TITLE, B.RATING, B.NUM, A.TITLE_ZH, C.RATING_ZH, C.NUM_ZH, A.YEAR, A.MPAA_RATING, A.RUNTIME, A.CRITICS_CONSENSUS, A.RELEASE_DATE, A.SYNOPSIS, A.ID
    FROM MOVIE A
    -- B: weighted rating and sentiment count per movie from Twitter
    INNER JOIN
      (SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING, SUM(NUM) AS NUM FROM
        (SELECT
           A.ID,
           C.TA_TYPE,
           COUNT(C.TA_TYPE) AS NUM,
           CASE C.TA_TYPE
             WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
             WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
             WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
             WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
             WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
           END AS TOTAL
         FROM MOVIE A
         LEFT JOIN TWEET B
           ON A.ID = B.MOVIEID
         LEFT JOIN "$TA_TWEET_I" C
           ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
         GROUP BY
           A.ID,
           C.TA_TYPE) A
       GROUP BY ID) B ON A.ID = B.ID
    -- C: weighted rating and sentiment count per movie from Tencent Weibo
    INNER JOIN
      (SELECT ID, CASE SUM(NUM) WHEN 0 THEN 0 ELSE TO_DECIMAL(SUM(TOTAL) / SUM(NUM), 5, 2) END AS RATING_ZH, SUM(NUM) AS NUM_ZH FROM
        (SELECT
           A.ID,
           C.TA_TYPE,
           COUNT(C.TA_TYPE) AS NUM,
           CASE C.TA_TYPE
             WHEN 'StrongPositiveSentiment' THEN COUNT(C.TA_TYPE) * 5
             WHEN 'WeakPositiveSentiment' THEN COUNT(C.TA_TYPE) * 4
             WHEN 'NeutralSentiment' THEN COUNT(C.TA_TYPE) * 3
             WHEN 'WeakNegativeSentiment' THEN COUNT(C.TA_TYPE) * 2
             WHEN 'StrongNegativeSentiment' THEN COUNT(C.TA_TYPE) * 1
           END AS TOTAL
         FROM MOVIE A
         LEFT JOIN TWEET_ZH B
           ON A.ID = B.MOVIEID
         LEFT JOIN "$TA_TWEET_ZH_I" C
           ON B.ID = C.ID AND C.TA_TYPE IN ('StrongPositiveSentiment', 'WeakPositiveSentiment', 'NeutralSentiment', 'WeakNegativeSentiment', 'StrongNegativeSentiment')
         GROUP BY
           A.ID,
           C.TA_TYPE) A
       GROUP BY ID) C ON A.ID = C.ID
    ORDER BY B.RATING DESC
  ;
END;
After creating the type and the stored procedure successfully, we can use the following SQL to test:
CALL GETMOVIEINFO(?);
![8.jpg]()
The columns “RATING” and “RATING_ZH” provide the scores shown on the main page.
2. Sentiment analysis
We can use the following SQL statements to create the type and the stored procedure:
CREATE TYPE SENTIMENT AS TABLE (SENTIMENT NVARCHAR(100), NUM INTEGER);
CREATE PROCEDURE GETSENTIMENT(IN ID INTEGER, IN LANG VARCHAR(2), OUT RESULT SENTIMENT) LANGUAGE SQLSCRIPT READS SQL DATA AS
BEGIN
  -- One row per sentiment class, counted over the tweets of the given movie
  IF LANG = 'EN' THEN
    RESULT = SELECT 'Strong Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
      INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'StrongPositiveSentiment'
      UNION ALL
      SELECT 'Weak Positive' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
      INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'WeakPositiveSentiment'
      UNION ALL
      SELECT 'Neutral' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
      INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'NeutralSentiment'
      UNION ALL
      SELECT 'Weak Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
      INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'WeakNegativeSentiment'
      UNION ALL
      SELECT 'Strong Negative' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_I" A
      INNER JOIN (SELECT ID FROM TWEET WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'StrongNegativeSentiment';
  ELSEIF LANG = 'ZH' THEN
    RESULT = SELECT '很好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
      INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'StrongPositiveSentiment'
      UNION ALL
      SELECT '好' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
      INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'WeakPositiveSentiment'
      UNION ALL
      SELECT '一般' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
      INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'NeutralSentiment'
      UNION ALL
      SELECT '差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
      INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'WeakNegativeSentiment'
      UNION ALL
      SELECT '很差' AS SENTIMENT, COUNT(*) AS NUM FROM "$TA_TWEET_ZH_I" A
      INNER JOIN (SELECT ID FROM TWEET_ZH WHERE MOVIEID = :ID) B
      ON A.ID = B.ID
      WHERE A.TA_TYPE = 'StrongNegativeSentiment';
  END IF;
END;
After creating the type and the stored procedure successfully, we can use the following SQL statements to test:
CALL GETSENTIMENT(771313125, 'EN', ?);
![9.jpg]()
CALL GETSENTIMENT(771313125, 'ZH', ?);
![10.jpg]()
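GETSENTIMENT returns one row per sentiment class, labeled in English or Chinese depending on LANG. The same mapping can be mirrored in JavaScript, for example to mock the service while building the UI. The sketch below is only an illustration: `SENTIMENT_LABELS` and `toSentimentRows` are hypothetical names, and returning an empty list for an unsupported language is an assumption of this sketch, not behavior taken from the procedure.

```javascript
// Label per TA_TYPE, in the same order as the UNION ALL branches of GETSENTIMENT.
const SENTIMENT_LABELS = {
  EN: [
    ["StrongPositiveSentiment", "Strong Positive"],
    ["WeakPositiveSentiment", "Weak Positive"],
    ["NeutralSentiment", "Neutral"],
    ["WeakNegativeSentiment", "Weak Negative"],
    ["StrongNegativeSentiment", "Strong Negative"]
  ],
  ZH: [
    ["StrongPositiveSentiment", "很好"],
    ["WeakPositiveSentiment", "好"],
    ["NeutralSentiment", "一般"],
    ["WeakNegativeSentiment", "差"],
    ["StrongNegativeSentiment", "很差"]
  ]
};

// Builds rows shaped like the procedure output: { sentiment, num }.
// countsByType maps a TA_TYPE to its count; missing classes count as 0.
function toSentimentRows(countsByType, lang) {
  const labels = SENTIMENT_LABELS[lang];
  if (!labels) {
    return []; // assumption: unsupported languages yield no rows
  }
  return labels.map(([type, label]) => ({
    sentiment: label,
    num: countsByType[type] || 0
  }));
}

// Example with two of the Tencent Weibo counts reported in this post.
console.log(toSentimentRows({ StrongPositiveSentiment: 528, WeakPositiveSentiment: 723 }, "ZH"));
```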
Step 2: Build the application based on XS Engine
At this point, the tables, indexes, data and stored procedures are all accessible directly from the XS Engine. To build the application, follow these steps:
1. Create .xsaccess, .xsapp and .xsprivileges to do the access control.
2. Create getMovies.xsjs to call the stored procedure “GETMOVIEINFO”
// Maps one row of the GETMOVIEINFO result set to a JSON-friendly object
function createEntry(rs) {
    return {
        "poster" : rs.getNString(1),
        "title" : rs.getNString(2),
        "rating": rs.getDecimal(3),
        "num": rs.getInteger(4),
        "title_zh" : rs.getNString(5),
        "rating_zh": rs.getDecimal(6),
        "num_zh": rs.getInteger(7),
        "year": rs.getInteger(8),
        "mpaa_rating": rs.getNString(9),
        "runtime": rs.getNString(10),
        "critics_consensus": rs.getNString(11),
        "release_date": rs.getDate(12),
        "synopsis": rs.getNString(13),
        "id": rs.getInteger(14)
    };
}
try {
    var body = '';
    var list = [];
    var query = "{CALL SMARTAPP.GETMOVIEINFO(?)}";
    $.trace.debug(query);
    var conn = $.db.getConnection();
    var pcall = conn.prepareCall(query);
    pcall.execute();
    var rs = pcall.getResultSet();
    while (rs.next()) {
        list.push(createEntry(rs));
    }
    rs.close();
    pcall.close();
    conn.close();
    // Return all movies as {"entries": [...]}
    body = JSON.stringify({
        "entries" : list
    });
    $.response.contentType = 'application/json; charset=UTF-8';
    $.response.setBody(body);
    $.response.status = $.net.http.OK;
} catch (e) {
    $.response.status = $.net.http.INTERNAL_SERVER_ERROR;
    $.response.setBody(e.message);
}
3. Create getSentiment.xsjs to call the stored procedure “GETSENTIMENT”
// Maps one row of the GETSENTIMENT result set to a JSON-friendly object
function createEntry(rs) {
    return {
        "sentiment" : rs.getString(1),
        "num" : rs.getInteger(2)
    };
}
try {
    // Request parameters: movie ID and language ('EN' or 'ZH')
    var id = parseInt($.request.parameters.get("id"), 10);
    var lang = $.request.parameters.get("lang");
    var body = '';
    var list = [];
    var query = "{CALL SMARTAPP.GETSENTIMENT(?, ?, ?)}";
    $.trace.debug(query);
    var conn = $.db.getConnection();
    var pcall = conn.prepareCall(query);
    pcall.setInteger(1, id);
    pcall.setString(2, lang);
    pcall.execute();
    var rs = pcall.getResultSet();
    while (rs.next()) {
        list.push(createEntry(rs));
    }
    rs.close();
    pcall.close();
    conn.close();
    body = JSON.stringify({
        "entries" : list
    });
    $.response.contentType = 'application/json; charset=UTF-8';
    $.response.setBody(body);
    $.response.status = $.net.http.OK;
} catch (e) {
    $.response.status = $.net.http.INTERNAL_SERVER_ERROR;
    $.response.setBody(e.message);
}
4. Create index.html and code the HTML part.
<!DOCTYPE HTML>
<html>
<head>
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>Real-Time Movie Rating</title>
    <script src="/sap/ui5/1/resources/sap-ui-core.js"
        id="sap-ui-bootstrap"
        data-sap-ui-libs="sap.ui.commons,sap.ui.ux3,sap.viz"
        data-sap-ui-theme="sap_goldreflection">
    </script>
    <!-- add sap.ui.table,sap.ui.ux3 and/or other libraries to 'data-sap-ui-libs' if required -->
    <script>
        sap.ui.localResources("movieui");
        var view = sap.ui.view({id:"idMovieMatrix1", viewName:"movieui.MovieMatrix", type:sap.ui.core.mvc.ViewType.JS});
        view.placeAt("content");
    </script>
</head>
<body class="sapUiBody" role="application">
    <h1>Real-Time Movie Rating</h1>
    <div id="content"></div>
</body>
</html>
5. Create views and controllers with native SAPUI5 controls to speed up building the application.
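The controllers themselves are not shown in this post. As a rough sketch of how one might consume getSentiment.xsjs, the helpers below build the request URL and turn the `{"entries": [...]}` response into rows for a chart. The names `buildSentimentUrl` and `parseSentimentResponse`, the relative service path, and the `label`/`value` row shape are all assumptions made for illustration.

```javascript
// Hypothetical helper: builds the URL for getSentiment.xsjs with its two
// query parameters, "id" (movie ID) and "lang" ('EN' or 'ZH').
function buildSentimentUrl(baseUrl, id, lang) {
  const movieId = Number(id);
  if (!Number.isInteger(movieId) || movieId <= 0) {
    throw new Error("id must be a positive integer");
  }
  if (lang !== "EN" && lang !== "ZH") {
    throw new Error("lang must be 'EN' or 'ZH'");
  }
  return baseUrl + "?id=" + encodeURIComponent(movieId) + "&lang=" + encodeURIComponent(lang);
}

// Hypothetical helper: parses the JSON body returned by the service and
// maps each entry to a { label, value } row for a chart control.
function parseSentimentResponse(body) {
  const payload = JSON.parse(body);
  const entries = Array.isArray(payload.entries) ? payload.entries : [];
  return entries.map((entry) => ({ label: entry.sentiment, value: entry.num }));
}

// Example using the movie ID from the test calls above.
console.log(buildSentimentUrl("getSentiment.xsjs", 771313125, "EN"));
console.log(parseSentimentResponse('{"entries":[{"sentiment":"Neutral","num":839}]}'));
```

Validating the parameters on the client keeps obviously bad requests (a missing ID, an unsupported language) from ever reaching the stored procedure.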
Website
The live web app is available at (http://107.20.137.184:8000/workshop/sessionx/00/ui/MovieUI/WebContent/index.html), but I shut down the AWS instance from time to time to reduce the billing cost. I have captured screenshots and a brief video in case you find the server down.
Real-time movie rating homepage
The following screenshot is the app’s main page. For each movie, there are two scores: the upper score is from Twitter and the lower score from Tencent Weibo.
![11.jpg]()
I had heard a lot of buzz about “Man of Steel”, but it was ranked only No. 7, so I was really curious. “Man of Steel” had a 3.72 rating while “20 Feet from Stardom” had a 4.54 rating. Interesting! Looking closer, I discovered that “20 Feet” had only 351 mentions while “Man of Steel” had more than 20K. So a popular movie is not necessarily the one with the highest score; it can also be the one with the most buzz.
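A small sketch makes the difference between the two rankings visible. It only sorts the two movies quoted above; `rankBy` is an illustrative helper, and since the post gives the “Man of Steel” mention count only as "more than 20K", 20000 is used here as a placeholder lower bound.

```javascript
// Figures quoted in the post. 20000 is a placeholder for "more than 20K".
const movies = [
  { title: "Man of Steel", rating: 3.72, mentions: 20000 },
  { title: "20 Feet from Stardom", rating: 4.54, mentions: 351 }
];

// Returns a new array sorted by the given numeric field, highest first.
function rankBy(list, field) {
  return list.slice().sort((a, b) => b[field] - a[field]);
}

const byRating = rankBy(movies, "rating");
const byBuzz = rankBy(movies, "mentions");

console.log(byRating[0].title); // "20 Feet from Stardom"
console.log(byBuzz[0].title);   // "Man of Steel"
```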
I then created a page with a detailed breakdown of each movie’s sentiments for both Twitter and Tencent Weibo. It looks like “Man of Steel” has a higher positive sentiment in China than in the US. Not surprising: we like superhero movies, and Superman is our favorite.
| Sentiment | Twitter (#) | Twitter (%) | Tencent Weibo (#) | Tencent Weibo (%) |
| --- | --- | --- | --- | --- |
| Strong Positive | 9,986 | 44% | 528 | 34% |
| Weak Positive | 5,903 | 26% | 723 | 47% |
| Neutral | 839 | 4% | 12 | 1% |
| Weak Negative | 2,067 | 9% | 123 | 8% |
| Strong Negative | 3,757 | 17% | 166 | 11% |
![12.jpg]()
Let's see what the score on Rotten Tomatoes looks like. The critics gave it a meager 56%, but 82% of the audience liked it. That number compares well with the 70% positive sentiment (strong plus weak positive on Twitter) from my real-time rating app.
![13.jpg]()
“20 Feet from Stardom” has a 97% rating from critics and 100% from the audience on Rotten Tomatoes. So my real-time rating app was able to identify this hit from social sentiment on Twitter. Looks like the movie is a sleeper hit!
![15.jpg]()
![14.jpg]()
This application is just a prototype for now, and I hope to make more enhancements to the drill-down page. For the next version, I want to use the predictive libraries in SAP HANA to create a recommendation engine for movies based on a user’s interests, something like a “Pandora for movies”. I hope you enjoyed reading my blog.