
SAP LT REPLICATION-PART 1


SAP LANDSCAPE TRANSFORMATION REPLICATION:

slt.png

  • SAP LT is a trigger-based, real-time replication method.
  • SAP LT Replication Server for SAP HANA leverages proven SLO (System Landscape Optimization) technology. SLT Replication Server is the ideal solution for all HANA customers who need real-time or schedule-based replication from SAP and non-SAP sources.
  • SLT Replication Server can be installed as a standalone server or can run on any SAP system with a SAP NetWeaver 7.02 ABAP stack (Kernel 7.02EXT).
  • The add-on DMIS_2010_1_700 with SP5-7, NW 7.02 and SAP Kernel 7.20EXT must be installed for SLT replication with SPS04 for SAP HANA 1.0.
  • The SLT replication system is connected to the source SAP ECC system by an RFC (Remote Function Call) connection and to the target HANA system by a DB connection. For a non-SAP source system, SLT connects via a DB connection.
  • The SAP source system contains the application tables, logging tables, triggers and the read module. The SLT server contains the control module (structure mapping & transformation) and the write module. For a non-SAP source system, the read module resides in the SLT system.
  • When replication starts, logging tables and triggers are created in the source system and data is replicated to the target system via the SLT Replication Server; any subsequent change in the source system is automatically replicated to the target system.

 

 

SLT CONFIGURATION:


Configuration steps for SAP LT Replication Server

  • Define a schema for each source system
  • Define connection to source system
  • Define DB connection into SAP HANA
  • Define replication frequency (real-time; frequency for scheduled replication)
  • Define maximum number of background jobs for data replication

 

STEP 1: Open the Configuration and Monitoring Dashboard using transaction code "LTR". Click the New button to start the SLT configuration.

SLT1.png

STEP 2: Provide a name for the SLT configuration. Please note that a schema with the same name will be created in HANA after the configuration settings are completed.

SLT2.png

STEP 3: Specify the source system details. If the source system is SAP (ECC), select RFC Connection. Select DB Connection for a non-SAP source system.

SLT3.png

STEP 4: Specify the target system details. The target HANA system is connected through a DB connection. Provide the HANA system connection parameters. Administrator user privileges are required.

SLT4.png

When the popup to create a new configuration is closed by pressing the OK button, the following actions are performed automatically:

 

  • Configuration settings are saved on the LT Replication Server
  • A new user and schema are created on the HANA system with the defined target schema name (not performed if an existing schema is reused).
  • Replication control tables (RS_* tables) are created in the target schema:

          RS_LOG_FILES, RS_MESSAGES, RS_ORDER, RS_ORDER_EXT, RS_SCHEMA_MAP, RS_STATUS

  • User roles for the target schema are created:
    • <target_schema>_DATA_PROV -> Role to manage data provisioning
    • <target_schema>_POWER_USER -> Contains all SQL privileges of the target schema
    • <target_schema>_USER_ADMIN -> Role to execute authority
  • Procedures to grant (RS_GRANT_ACCESS) and revoke (RS_REVOKE_ACCESS) access are created in the target schema
  • Replication of tables DD02L (stores the table list), DD02T (stores the table short descriptions) and DD08L (R/3 DD: relationship definitions) is started automatically. Once those tables are replicated, the HANA studio knows which tables are available in the source system (see the example query after this list).
  • SYS_REPL and table RS_REPLICATION_COMPONENTS are created (if they don't already exist from a previous configuration)
  • The replication is registered in table RS_REPLICATION_COMPONENTS
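
Once DD02L and DD02T have been replicated, the list of available source tables can be checked directly in the target schema. A minimal sketch, where the target schema name MY_SLT_SCHEMA is an assumption:

-- Hedged example: list source tables and their descriptions from the
-- replicated ABAP Dictionary tables (schema name is an assumption)
SELECT t.TABNAME, t.DDTEXT
  FROM "MY_SLT_SCHEMA"."DD02L" AS l
  JOIN "MY_SLT_SCHEMA"."DD02T" AS t
    ON t.TABNAME = l.TABNAME
 WHERE t.DDLANGUAGE = 'E'
 ORDER BY t.TABNAME;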

SLT4a.png

SAP LT REPLICATION OPTIONS:

SLT8a.png

Use transaction code LTRC to open the LT Replication Server Cockpit. The SLT options can be used from the SLT Replication Server as well as from HANA data provisioning. Select a table and a replication option from the LT Replication Server to start/replicate/stop/suspend/resume replication.

SLT5.png

The HANA data provisioning options can be seen in the Quick launch view.

SLT7.png

Select table to start replication.

SLT8.png




SAP HANA Smart Data Access(1): A brief introduction to SDA


We have a Chinese version (”SAP HANA Smart Data Access(一)——初识SDA”) of this blog.


Introduction

      In SAP HANA application scenarios, it is common to analyze and process data located in other systems. Usually, customers replicate the data from those systems into SAP HANA and then do the analysis and processing there. However, data replication not only costs time and memory, it usually also requires deploying an additional replication system, which is not always easy. SDA, short for Smart Data Access, provides customers a new way to access the data in remote data sources.

What is SDA

SDA is a new method for SAP HANA to access data stored in remote data sources. With the help of SDA, SAP HANA can create so-called "virtual tables" that map to tables located in remote data sources, and then access the data directly through these virtual tables. A virtual table can be manipulated by SAP HANA just like an ordinary table, which means operations such as select, update, insert and delete are all available for it. Besides, joins between local tables and virtual tables are supported. When such a join is executed, the SAP HANA optimizer sends the relevant operations to the remote data source for processing, and the result set is sent back to SAP HANA for further processing.

SDA was introduced in SAP HANA SPS06. At that time, the data sources supported by SAP HANA SDA included SAP HANA, SAP Sybase ASE, SAP Sybase IQ, Teradata Database and Apache Hadoop, and only read operations were permitted on virtual tables. In SAP HANA SPS07, both the supported data sources and the supported operations were extended: Microsoft SQL Server and Oracle were added to the list of supported data sources, and write operations are now permitted. The comparison of SDA in SPS06 and SPS07 is as below:

 

SPS06

  • Supported data sources: SAP HANA; SAP Sybase ASE 15.7 ESD#4; SAP Sybase IQ version 15.4 ESD#3 and 16.0; Teradata Database version 13.0; Intel Distribution for Apache Hadoop version 2.3
  • Supported operations for virtual tables: select

SPS07

  • Supported data sources: all data sources supported in SPS06; Oracle Database 12c; Microsoft SQL Server version 11 (SQL Server 2012)
  • Supported operations for virtual tables: select, insert, update, delete

Note: the data sources officially supported by SAP HANA are limited to the specific versions above; other versions are not guaranteed to work well.

Creating Data Source

     The first step of accessing a remote data source is to create the remote data source in SAP HANA. The communication between SAP HANA and the remote data source is based on the ODBC protocol. The subsequent blogs of this series will cover how to deploy the remote data source drivers on the SAP HANA server side. Here, let's briefly look at how to create a remote data source in SAP HANA Studio.

      In SAP HANA Studio, there are two ways to create remote data sources: one via the GUI, the other using a SQL statement.

(1) Create remote data source with GUI:

1.png

Firstly, open the folder called "Provisioning". Then right-click "Remote Sources" and select "New Remote Source…":

2.png

Secondly, choose an adapter from the adapter list in the popup dialog, and fill in the corresponding connection and authentication information of the remote data source.

Lastly, press the Run button to create the data source.

(2) Create remote data source with SQL:

CREATE REMOTE SOURCE <src_name>
ADAPTER <adapter_name> [CONFIGURATION FILE 'filename']
CONFIGURATION <connection_info_string>
[opt_credentials_clause]


Example:

CREATE REMOTE SOURCE ORCL_11g_LNX
ADAPTER "odbc"
CONFIGURATION FILE 'property_orcl.ini'
CONFIGURATION 'DSN=oral11g_lnx'
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=OUTLN;password=Aa111111';


In the above SQL statement, <adapter_name> can be one of: ASEODBC, IQODBC, TDODBC, HIVEODBC, ODBC. ASEODBC is for Sybase ASE as the data source, IQODBC is for Sybase IQ, TDODBC is for Teradata Database, and HIVEODBC is for Hadoop; the ODBC adapter is for other generic data sources. <connection_info_string> specifies the connection information for the data source, usually the DSN name. <opt_credentials_clause> specifies the authentication information of the data source. Please note that only the ODBC adapter requires the CONFIGURATION FILE; the functionality of the configuration file is introduced in the next section.
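
For comparison, using one of the specialized adapters might look like the sketch below; no CONFIGURATION FILE is needed, and the source name, DSN and credentials are illustrative assumptions:

CREATE REMOTE SOURCE MY_TERADATA
ADAPTER "tdodbc"                          -- specialized adapter, no CONFIGURATION FILE (assumption: lowercase name as in the "odbc" example)
CONFIGURATION 'DSN=td_dsn'                -- DSN defined in .odbc.ini (assumption)
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=dbc;password=<password>';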

Generic Adapter Framework

With the help of SDA, SAP HANA can communicate with data sources that support the ODBC protocol. However, as discussed above, the data sources supported by SAP HANA SDA are still limited for now. For the officially supported data sources, SAP HANA provides native code to support their operations, but SAP HANA SDA cannot guarantee that other data sources work well. What stops SAP HANA SDA from supporting more ODBC data sources is that some operations and configuration of these data sources cannot be handled through the standard ODBC interface. For example, preparing a transaction for Sybase ASE requires some additional code which is not part of the standard ODBC protocol; since SAP HANA provides such code, this operation is supported for Sybase ASE.

To reduce the impact of this limitation, SAP HANA SDA applies the Generic Adapter Framework to implement the communication with those unsupported ODBC data sources, instead of calling specialized native code for that data source in SAP HANA. With the help of the Generic Adapter Framework, you can customize the features and behavior of a data source via a configuration file. For example, you can specify the supported operations, function mappings and data type mappings of the data source in the configuration file. For convenience, we call this configuration file the "property configuration file" in the rest of this blog.

When creating a data source, SAP HANA SDA uses the Generic Adapter Framework to communicate with the remote data source if the ODBC adapter is chosen. SAP HANA SDA searches for the property configuration file in the folder specified by the environment variable DIR_EXECUTABLE, with the file name given by the CONFIGURATION FILE option. As of SPS07, SAP HANA SDA provides property configuration file templates for MS SQL Server and Oracle, called property_mss.ini and property_orcl.ini; both are located in the folder $DIR_EXECUTABLE/config.

After the data source is created, SAP HANA SDA parses the relevant property configuration file; all the features, function mappings, data type mappings and other properties are linked to the data source and influence the communication between SAP HANA and the data source.

  Part of the content of property_orcl.ini is shown below; it gives an idea of the format and function of a property configuration file:

3.png

Typical process of creating data source

      Creating a remote data source in SAP HANA usually involves the steps below:

  1. Check whether SAP HANA provides a specialized adapter for the data source, such as "ASEODBC", "IQODBC", "TDODBC";
  2. If a specialized adapter is available, just use it to create the data source;
  3. If a specialized adapter is not available, check whether there is a specialized property configuration template file, such as the templates for Oracle and MSSQL;
  4. If a specialized property configuration template exists, you can adapt the property configuration file to your requirements and then create the data source using the modified file. For example, as long as the correctness of the modification is ensured, you can disable unnecessary functions or modify data type or function mappings based on your requirements.
  5. If no specialized property configuration template exists, you have to create a brand new property configuration file from scratch. To create such a file, you must be familiar with the properties of the data source and the driver it uses;
  6. Create the data source in SAP HANA Studio using the specialized adapter or the common adapter (i.e. the ODBC adapter). When using the common adapter, you need to specify the property configuration file for the data source.

Note: when modifying or creating the property configuration file, only the properties that differ from the default values need to be set. A mistake in the property configuration file may result in incorrect behavior or results from the data source.

Creating virtual table

                After the data source has been created in SAP HANA Studio, a "virtual table" mapping to data in the remote data source can be created. Similar to creating the data source, there are two ways to create a virtual table:

(1) Create virtual table with GUI:

4.png

(2) Create virtual table with SQL Statement below:

   CREATE VIRTUAL TABLE <table_name> <remote_location_clause>

Example:

  CREATE VIRTUAL TABLE sda.vt AT "ORCL_11G_WIN"."NULL"."OUTLN"."CANDIDATES";
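
Once created, a virtual table can be queried like any local table, including in joins with local tables. A minimal sketch, where the local table and all column names are assumptions:

SELECT v."NAME", l."SCORE"
  FROM sda.vt AS v                       -- the virtual table created above
  JOIN sda.local_scores AS l             -- hypothetical local HANA table
    ON l."CANDIDATE_ID" = v."ID";        -- join columns are assumptions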

Conclusion

    In this blog, we covered the basics of SAP HANA SDA. In the subsequent blogs of this series, we will further discuss how to deploy SDA data sources on the SAP HANA server, how to access Hadoop with the help of SAP HANA SDA, and so on. Stay tuned.

Reference

1. What’s New SAP HANA Platform Release Notes for SPS07:

   http://help.sap.com/hana/Whats_New_SAP_HANA_Platform_Release_Notes_en.pdf

2. Section 6.1.1 of SAP HANA Administrator Guide

  http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf

SAP HANA Smart Data Access(2): How to install and configure ODBC driver for SDA data source


We have a Chinese version(SAP HANA Smart Data Access(二)——SDA数据源驱动的安装与配置) of this blog.

Introduction

      In the blog "SAP HANA Smart Data Access (1): A brief introduction to SDA", we introduced the architecture of SDA and how to add a remote data source for SDA in SAP HANA Studio. Before adding a remote data source, it is necessary to install and configure the ODBC driver manager and the ODBC driver for the SDA data source on the SAP HANA server side. For different SDA data sources, the process of installing and configuring the ODBC driver is similar. In this article, we take the Oracle data source as an example to show how to install and configure the ODBC driver.

Installation of unixODBC driver manager

      Since SAP HANA SDA communicates with the remote data source using the ODBC protocol, an ODBC driver manager must be installed on the SAP HANA server side. Usually, unixODBC is chosen as the driver manager for SAP HANA SDA. The unixODBC software package can be downloaded from http://www.unixodbc.org/ . Please note that the unixODBC version needs to be 2.3.0 for the SQL Server data source, and 2.3.1 or newer is required for other data sources. The process of installing unixODBC is as follows:

  1. Download the corresponding version of the unixODBC package; the package name is unixODBC-x.x.x.tar.gz, where x stands for the version number.
  2. Log in to the SAP HANA server as the root user and decompress the unixODBC package into a folder of your choice.
  3. Enter the folder from step 2, then execute the commands below in order:

./configure

make

make install

   4. unixODBC should now be installed; you can execute "isql --version" to check whether the installation succeeded.

Installation of ODBC driver for data source

      So far, the data sources supported by SAP HANA SDA include: SAP HANA, SAP Sybase ASE, Teradata Database, Oracle, MS SQL Server and Hadoop. The ODBC drivers for database products such as Sybase ASE and Oracle can be downloaded from the respective vendor's official website. For example, you can download the ODBC driver for Oracle from http://www.oracle.com/technetwork/database/features/instant-client/index-097480.html . As for the Hadoop data source, SAP HANA SDA communicates with it through Hive; more details about the connection between SAP HANA SDA and Hadoop follow in subsequent blogs of this series. The SAP official recommendation for the Hive driver is the HiveODBC driver provided by Simba Technologies, which can be obtained from the Simba website: http://www.simba.com/connectors/apache-hadoop-hive-odbc.

      After downloading the ODBC driver, it can be installed according to the relevant installation guide. Taking Oracle as an example, two zip packages should be downloaded: instantclient-basic-linux.x64-xx.x.x.x.x.zip and instantclient-odbc-linux.x64-xx.x.x.x.x.zip, where x stands for the version number. Then decompress the two packages into the same folder with the unzip command. The default unzipped directory is instantclient_xx_x. If everything goes well, you will find the ODBC driver file for Oracle in the unzipped directory; the file name is libsqora.so.xx.x . With that, the ODBC driver for the Oracle database is installed.

Configuration of ODBC data source

      The ODBC configuration file needs to be created after installing the ODBC driver, and some environment variables should be set accordingly. Let's again take Oracle as an example:

  1. Log in to the SAP HANA server as sidadm, where sid is the ID of the SAP HANA instance.
  2. Enter the home directory of sidadm and create the ODBC configuration file named ".odbc.ini".
  3. Edit .odbc.ini with vim; the content should look like below:

 

[ORCL_DSN]

Driver=/path/to/driver/libsqora.so.xx.x

ServerName=ORCL

 

ORCL_DSN is the name of the ODBC data source, which is used by the ODBC manager to find the connection information. Driver is a keyword used to specify the path of the data source's ODBC driver. For Oracle, ServerName is the name of the Oracle database as defined in the file "tnsnames.ora" located in the home directory of sidadm. For other kinds of databases, "ServerName" is replaced by other keywords, such as "ServerNode" for SAP HANA, or "Server" and "Database" for MS SQL Server.

    4. For the Oracle data source, create a file called "tnsnames.ora" in the home directory of sidadm, then edit it with vim; the content should look like below:

 

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = <host_ip>)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

 

    5. Set some environment variables in the SAP HANA shell script file "hdbenv.sh", which is located in the folder specified by the environment variable "DIR_INSTANCE". Add the commands below to this shell script file:

 

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:your_oracle_driver_dir/instantclient_12_1

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib

export ODBCINI=$HOME/.odbc.ini

export TNS_ADMIN=~/    # For Oracle only

 

Here, please pay particular attention to the "LD_LIBRARY_PATH" variable: all the libraries the ODBC driver depends on are searched, in order, in the folder list specified by "LD_LIBRARY_PATH". If a folder containing a required library is not included in that list, the ODBC manager will report that it cannot find the library file.

     6. Restart SAP HANA, log in to the SAP HANA server as sidadm, and execute the command "isql -v <DSN>" to test the connection to the data source. If the connection succeeds, the configuration is finished. If it fails, analyze the error according to the error message; some tips for error handling are introduced in the next section.

Troubleshooting

   1. Error message:

[08004][unixODBC][Oracle][ODBC][Ora]ORA-12154: TNS:could not resolve the connect identifier specified

Analysis:

This error is raised by the ODBC manager and is easily mistaken for an Oracle error. However, the reason for this error is that the environment variable TNS_ADMIN is not set, or is set incorrectly. TNS_ADMIN tells the driver where the file tnsnames.ora is located. So, if TNS_ADMIN is not set correctly, the ServerName specified in .odbc.ini cannot be resolved, and the error is raised.

Solution:

Set the environment variable TNS_ADMIN to the home directory of sidadm in the hdbenv.sh.

   2. Error message:

[01000][unixODBC][Driver Manager]Can't open lib '/path/to/driver/libsqora.so.12.1' : file not found [ISQL]ERROR: Could not SQLConnect

Analysis:

The error message says the file libsqora.so.12.1 cannot be found, but the file does exist. Using the ldd command to check the dependencies of this file, we get:

1.png

We can see that libclntsh.so.12.1, which libsqora.so.12.1 depends on, cannot be found. Although libclntsh.so.12.1 is in the same folder as libsqora.so.12.1, that folder is not in the folder list specified by LD_LIBRARY_PATH, so it is not searched.

Solution:

      Add the folder which contains libsqora.so.12.1 to LD_LIBRARY_PATH in the hdbenv.sh.

Create remote data source

       After all the configuration is finished, create a remote data source following the procedure introduced in "SAP HANA Smart Data Access(1): A brief introduction to SDA". The tables in the remote data source can be browsed in SAP HANA Studio once the remote data source is created, as shown below:

2.png
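
For reference, the SQL form of this step against the DSN configured above might look as follows; the source name and the Oracle user are assumptions, and the property file is the SPS07 template mentioned in Part 1:

CREATE REMOTE SOURCE ORACLE_SRC
ADAPTER "odbc"
CONFIGURATION FILE 'property_orcl.ini'   -- template shipped under $DIR_EXECUTABLE/config
CONFIGURATION 'DSN=ORCL_DSN'             -- the DSN defined in .odbc.ini above
WITH CREDENTIAL TYPE 'PASSWORD'
USING 'user=<oracle_user>;password=<password>';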

Conclusion

      In this blog, we took Oracle as an example to illustrate how to install and configure the ODBC manager and the ODBC driver of a remote data source, and briefly discussed some errors that may occur during installation and configuration. The installation and configuration of the other data sources supported by SAP HANA SDA is similar; the small differences will be covered in subsequent blogs of this series.

Reference

  1. SAP HANA Smart Data Access(1)——A brief introduction to SDA
  2. Section 6.1.1 of SAP HANA Administrator Guide

        http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf

SAP HANA Smart Data Access(3)—How to access Hadoop data through Hive with SDA


We have a Chinese version(SAP HANA Smart Data Access(三)——如何利用SDA通过Hive访问Hadoop数据) of this blog.

Introduction

      In the previous blog of this series, we talked about how to install and configure the data source drivers for SDA on the SAP HANA server side. As most data sources supported by SAP HANA SDA are databases, the procedure of installation and configuration is similar for them. For the Hadoop data source, however, something is different. As a distributed data processing platform, Hadoop usually stores data in the HDFS file system, or in the NoSQL database HBase, which is itself usually based on HDFS. However, neither HDFS nor HBase supports the ODBC protocol. So we need another member of the Hadoop family to solve this problem: Hive. Hive implements a SQL interface for HDFS and HBase, and a HiveODBC driver is also available. In this blog, we'll talk about how SDA accesses Hadoop data through Hive.

Deploy Hadoop and Hive

      The official Hadoop version supported by SAP HANA SDA is "Intel Distribution for Apache Hadoop version 2.3" (including Apache Hadoop 1.0.3 and Apache Hive 0.9.0). Although there is only one version in the official support list, the experiment in this blog shows that SDA can also access data stored in an ordinary Apache distribution of Hadoop. For this experiment we built a Hadoop cluster with 3 nodes, using Apache Hadoop 1.1.1 and Apache Hive 0.12.0.

      As guides for deploying Hadoop and Hive can easily be found on the internet, we don't discuss that here. After deploying Hadoop and Hive, some data for the experiment needs to be prepared. We use a user information table with the following structure:

Column Name     Data Type
USERID          VARCHAR(20)
GENDER          VARCHAR(6)
AGE             INTEGER
PROFESSION      VARCHAR(20)
SALARY          INTEGER
      Data can be imported from a csv file into the Hive table. First, create a table using the hive shell:

create table users(USERID string, GENDER string, AGE int, PROFESSION string, SALARY int)

row format delimited

fields terminated by '\t';

      Then, import the data from the csv file into the users table:

load data local inpath '/input/file/path'

overwrite into table users;

Here, the data is imported from the local file system; Hive can also import data from HDFS. In this experiment, the number of records in the users table is 1,000,000. After importing, count the record number:

1.png

As shown in the picture above, Hive calls MapReduce to query the data, and it takes 14.6 seconds to count the records of the users table. Afterwards, select the top 10 records of the table:

2.png

As we see, it takes 0.1 second.
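
For reference, the two HiveQL statements behind the screenshots above are essentially the following (run from the hive shell):

select count(*) from users;      -- executed as a MapReduce job (about 14.6 seconds here)
select * from users limit 10;    -- about 0.1 seconds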

Installing and configuring HiveODBC Driver

      As with the drivers for other data sources, installing the HiveODBC driver also requires unixODBC to be installed on the SAP HANA server side. HiveODBC requires unixODBC 2.3.1 or newer. For more details about installing unixODBC, please see reference [2].

      Once unixODBC is installed, the HiveODBC driver can be installed. As introduced in reference [2], we use the HiveODBC driver provided by Simba Technologies. The installation procedure is as follows:

  1. Download the Simba HiveODBC driver package and decompress it to a directory of your choice. Then enter the directory /<DRIVER_INSTALL_DIR>/simba/hiveodbc/lib/64 (use 32 instead of 64 for a 32-bit system) to check the driver file libsimbahiveodbc64.so.
  2. Log in to the SAP HANA server as sidadm.
  3. Execute "HDB stop" to stop SAP HANA.
  4. Copy the file "/<DRIVER_INSTALL_DIR>/simba/hiveodbc/Setup/simba.hiveodbc.ini" to the home directory of sidadm.
  5. Edit ~/.simba.hiveodbc.ini with vim.
  6. If there is a row "DriverManagerEncoding=UTF-32", change it to UTF-16.
  7. Check that ErrorMessagePath = /<DRIVER_INSTALL_DIR>/simba/hiveodbc/ErrorMessages; correct it if it doesn't point to the right path.
  8. Comment out the row ODBCInstLib=libiodbcint.so, and add a new row: ODBCInstLib=libodbcinst.so.
  9. Edit the .odbc.ini file in the home directory of sidadm and add a new DSN for Hive; the default port for Hive is 10000. Here's an example:

[hive1]

Driver=/<DRIVER_INSTALL_DIR>/simba/hiveodbc/lib/64/libsimbahiveodbc64.so

Host=<IP>

Port=10000

   10. Edit the file $HOME/.customer.sh to set some environment variables:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/<DRIVER_INSTALL_DIR>/simba/hiveodbc/lib/64/

export ODBCINI=$HOME/.odbc.ini

   11. Use isql to check whether the SAP HANA server can connect to the remote data source successfully:

isql -v hive1

   12. If the connection succeeds, execute "HDB start" to start SAP HANA.

Create hive data source

      When the installation and configuration of HiveODBC is finished, create the Hive data source in SAP HANA Studio following the steps introduced in reference [1]. Here, you need to choose HIVEODBC as the adapter.

      After the Hive data source is created, you can view the tables in Hive, as shown in the picture below:

3.png

Query Hive virtual table

      Add a new virtual table that maps to the users table in Hive, following the steps introduced in reference [1]. Then count the records of the virtual table:

4.png

As shown above, it takes 14.1 seconds to count the virtual table from SAP HANA Studio, which is close to the time the same count took on the Hive side. This tells us that SAP HANA SDA does not noticeably affect the performance of operations in the remote data source when little data transmission is involved.
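
As a rough sketch, the same check can also be issued from the SAP HANA SQL console against the virtual table; the schema and virtual table names below are assumptions based on the steps in reference [1]:

SELECT COUNT(*) FROM "SDA"."VT_USERS";   -- counts the Hive-backed virtual table (names are assumptions)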

Conclusion

      In this blog, we illustrated how SAP HANA SDA accesses a Hive table stored in Hadoop using a simple example. Hive is a tool that provides a SQL interface for Hadoop. From the experiment results, querying the virtual table from SAP HANA Studio and querying the Hive table on the Hive side are very close in performance when little data transmission is involved.

Reference

  1. SAP HANA Smart Data Access(1)——A brief introduction to SDA
  2. SAP HANA Smart Data Access(2)——How to install and configure the data source driver of SDA
  3. Section 6.1.1 of SAP HANA Administrator Guide: http://help.sap.com/hana/SAP_HANA_Administration_Guide_en.pdf

SAP River(1)—SAP River overview


We have a Chinese version (SAP River(一)——SAP River概述) of this blog.

Introduction

The most common architecture of SAP HANA application is like below:

1.png

Figure 1: Traditional SAP HANA application architecture

   With the traditional architecture shown in Figure 1, the application developer is responsible both for creating the data model at the database level and for implementing the control logic at the XS level. SQL and SQLScript are required to create the data model in the SAP HANA database, while XSJS is needed to implement the control logic at the XS level. Therefore, a developer needs to master at least two technologies to finish an SAP HANA application; sometimes, developing an SAP HANA application needs the cooperation of two or more developers. SAP River is an option to avoid this problem.

   SAP River is a brand new method of developing SAP HANA applications. SAP River consists of a programming language, a programming model and a suite of development tools. With the help of SAP River, the developer can concentrate on the design of the business intent, ignoring how it is implemented and optimized in the SAP HANA database. SAP River only exists at design time: when SAP River objects are activated, all SAP River code is compiled into SQLScript or XSJS code and then handed to the index server or XS engine for execution.

2_en.png

Figure 2: Function model of SAP River

The function model is shown in Figure 2. SAP River integrates all the parts involved in the development of an SAP HANA application, including data modeling, control logic and access control, which enables a developer to build a complete SAP HANA application with a single technology. Firstly, the developer can design the data model for the SAP HANA application with the SAP River language; during compilation, SAP River creates the corresponding database objects for the data model designed in SAP River. For example, an entity in a SAP River program is mapped to a table in the SAP HANA database. Secondly, using the SAP River language, the developer can define methods for data objects, which implement the business logic. Last but not least, SAP River provides the developer a way to design access control for data and methods.

SAP River Language

      As a new generation of SAP HANA application development, SAP River provides a strongly typed, declarative programming language. With this language, even a developer without a computer science background can easily develop an SAP HANA application. In addition, SAP River supports embedding SQLScript and XSJS code in SAP River programs, which is useful for some complicated logic.

      SAP River Language mainly includes:

  1. Application: The application is the largest object in the SAP River language; all other objects must be included in an application. Objects in an application can be exposed to external applications in several ways, such as OData.
  2. Entity: An entity is used to store data in the database. Usually, one entity is mapped to a table in the SAP HANA database. An entity consists of one or more elements; usually, one element is mapped to a column of a table. Each element has its own type, which can be a fundamental type such as integer or string, or a custom type or another entity.
  3. Types: A type defines the size and validity of data. SAP River supports different kinds of types, including fundamental, structured and stream types. Each entity automatically defines a corresponding implicit type, which has the same name as the entity.
  4. Action: An action is similar to a function or method in other programming languages. Usually, the business logic of a SAP River application is defined in actions.
  5. View: With the help of a view, you can create a data stream using a select statement; this data stream is a subset of the target data set, and its data is dynamically extracted from the target data set when the data stream is used.
  6. Role: A role can be created in SAP River code and assigned privileges. In this way, access control is implemented.

3.png

Figure 3. SAP River example program: Hello World

      As a programming language, SAP River also provides some libraries, which contain many common functions. These libraries can be divided into categories according to their functionality:

  1. Math Functions: mathematical calculation functions, such as absolute, ceil, floor, etc.
  2. Integer Functions: process data of integer type, such as toDecimalFloat, toString, etc.
  3. String Functions: deal with strings, such as length, substring, etc.
  4. Date Functions: date functions, such as day, month, etc.
  5. Session Functions: session functions, such as getUserName.
  6. Logging Functions: logging functions, such as log.
  7. Utility Functions: some utility functions, such as parseDecimalFloat, parseInt, etc.
  8. HTTP Calls: the HTTP library is used to send HTTP requests from SAP River code, for example to OData or REST services.

OData Calls

      SAP River can expose data and business logic to the client via the OData protocol, either at the application level or at the namespace level. If exposed at the namespace level, every entity, view and action in that namespace is exposed to the client. If exposed at the application level, only the objects that are tagged for exposure are exposed. Here, let's take the application level as an example to illustrate how to expose data via OData.

4.png

Figure 4. Expose SAP River application via OData

   ① When exposing objects at the application level, you must use the keyword export to specify which objects are to be exposed to the client, such as Employee in TestSyntax here.

   ② There is more than one way to expose a SAP River object; OData is one of them. So it is necessary to add the annotation "@OData" to declare the way of exposure. SAP River will create a corresponding OData service for the application or namespace.

   ③ By default, a SAP River object is private. To let specific users access a certain object, you need to use the keyword "accessible by" to state which role or privileges are required to access the object. If all users are allowed to access it, you can just use "accessible by sap.hana.All".

How to learn SAP River

      SAP HANA has supported SAP River since SPS07, and there are already many materials for studying it. Here are some of them:

  1. SAP River Datasheet
  2. SAP River Help Docs
  3. Videos and PPTs introducing SAP River
  4. Series of videos introducing SAP River in SAP Academy

Conclusion

        In this blog, we talked about the functionality, advantages and structure of SAP River, including its advantages over the traditional development framework, its main features and how to expose data to the client via OData.

Reference

  1. SAP River Language Reference
  2. SAP River Developer Guide

Update HANA table and column descriptions from ABAP Dictionary.


If you are modelling HANA views based on a Suite (like ERP, CRM, SRM) on HANA system, you would probably like to have the table and table field descriptions from the ABAP Dictionary available to you in the HANA Studio Modeler/Editor. Once this ABAP report has been run, the table and column descriptions are shown in the HANA Studio editor and automatically copied to the semantic layer:

 

at_view.PNG


Run the following ABAP report in the background to update the descriptions in the HANA database based on the ABAP Dictionary. It only works on a NW 7.4 (ERP EhP7, BW 7.4 or CRM 7 EhP3) system:

 

*&---------------------------------------------------------------------*
*& Report  Z_HANA_SET_TABLE_COMMENTS
*&
*&---------------------------------------------------------------------*
REPORT z_hana_set_table_comments.
DATA: lt_tables       TYPE TABLE OF dd02t,
      lv_table        TYPE dd02t,
      lt_dfies        TYPE TABLE OF dfies,
      lv_dfies        TYPE dfies,
      lv_fieldcomment TYPE string.

SELECT t~tabname t~ddtext
  INTO CORRESPONDING FIELDS OF TABLE lt_tables
  FROM dd02t AS t
  INNER JOIN dd02l AS l ON l~tabname = t~tabname
  WHERE t~as4local = 'A'
    AND t~ddlanguage = sy-langu
    AND l~tabclass = 'TRANSP'.

LOOP AT lt_tables INTO lv_table.
  TRY.
      NEW cl_sql_statement( )->execute_query(
        ` COMMENT ON TABLE "` && lv_table-tabname && `" is '` && lv_table-ddtext && `' ` ).
      CALL FUNCTION 'DDIF_FIELDINFO_GET'
        EXPORTING
          tabname        = lv_table-tabname
          langu          = sy-langu
        TABLES
          dfies_tab      = lt_dfies
        EXCEPTIONS
          not_found      = 1
          internal_error = 2
          OTHERS         = 3.
      IF sy-subrc = 0.
        LOOP AT lt_dfies INTO lv_dfies.
          TRY.
              lv_fieldcomment = cl_abap_dyn_prg=>escape_quotes( CONV string( lv_dfies-fieldtext ) ).
              NEW cl_sql_statement( )->execute_query(
                ` COMMENT ON COLUMN "` && lv_table-tabname && `"."` && lv_dfies-fieldname && `" IS '` && lv_fieldcomment && `' ` ).
            CATCH cx_sql_exception INTO DATA(oreffield).
              WRITE: / 'Error: ', oreffield->get_text( ).
              COMMIT WORK.
          ENDTRY.
        ENDLOOP.
      ENDIF.
    CATCH cx_sql_exception INTO DATA(oref).
  ENDTRY.
  COMMIT WORK.
ENDLOOP.
WRITE: / 'Table and field comments updated.'.
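
To spot-check the result afterwards, the comments can be read back from the HANA catalog. A minimal sketch, assuming the COMMENTS columns of the SYS.TABLES and SYS.TABLE_COLUMNS system views:

-- Hedged check: list tables and columns that now carry a comment
SELECT TABLE_NAME, COMMENTS
  FROM SYS.TABLES
 WHERE SCHEMA_NAME = CURRENT_SCHEMA
   AND COMMENTS IS NOT NULL;

SELECT TABLE_NAME, COLUMN_NAME, COMMENTS
  FROM SYS.TABLE_COLUMNS
 WHERE SCHEMA_NAME = CURRENT_SCHEMA
   AND COMMENTS IS NOT NULL;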

Reading and Writing to HADOOP HBASE with HANA XS


This is a companion to my earlier blog, where I demonstrated HADOOP HBASE records being read by HANA and presented using SAPUI5.

Reading and Writing to HADOOP HBASE with HANA XS

 


In this blog I expand on this topic further by enabling HANA to read and write to HBASE.

I’ve created a simple application where HBASE is used to capture the change log of a HANA table.


But why, you might say?


HADOOP is designed for BIG Data (e.g. at the petabyte scale); at its core it uses disks for storage. It's still very much a 'do it yourself' technology with very few Enterprise-ready applications built on top yet.

HANA is arguably designed for LARGE Data (e.g. at the terabyte scale) and makes use of the latest in-memory technology. By contrast, it has a huge catalog of SAP software now running on it.


The two technologies are complementary in an Enterprise Data Architecture.


Many modern websites (e.g. Facebook, Twitter, LinkedIn) use a complex combination of HADOOP, traditional RDBMS, custom development and a wide collection of website scripting tools. The aspiration for many of these websites is millions or billions of users.


For Enterprises, who don’t consider IT their core business, then venturing down this path may not be the most straightforward option. HANA is more that just a database, it also provides the platform for building and deploying custom Enterprise Applications for Desktop and Mobile. In this case a far simpler BIG Data Architecture might be to use just HANA & HADOOP. Enterprise applications may be only targeted to 100’s to 10K’s of users. 


In the following example I've built a very simple application on HANA which enables a table to be maintained on an HTML5 webpage. For audit and control purposes I wanted to keep a log of all the changes to column values made by users. I could have implemented this purely in HANA, but to demonstrate the integration potential of HANA and HADOOP HBASE, I have opted to also write the changes to HBASE. On a small scale the benefit of this is negligible, but on a larger scale there may be considerable cost savings in storing low-value data on an open source, disk-based solution such as HADOOP.


The following diagram represents the technical data flow of my example:

 



Now for an example.

The following shows the DOCUMENTS table in HANA, and HBASE Table Log prior to a change:


In HBASE the change log ( for Doc_Name DOC1 & Row 1)  is stored as:



Now  I make a change, e.g. Changing the Free Text from ‘APPLE’ to ‘APPLE CIDER’

Update Successful!  (Changes written to HANA and HBASE)


From SAPUI5 the HBASE Change Log appears as:

Above you can now see the history of the 'Free Text' field



In HBASE the Change log Table appears as:

NOTE: in the HADOOP User Interface (HUE) only the latest change is shown; however, behind the scenes I've defined the HBASE table to store up to 100 changes (versions) of an individual column.


I can also check these in Hbase Stargate directly, though they are BASE64 encoded:


OR by checking the HbaseTableLog xsjs I created, using the GET Method (which decodes):

NOTE: I used POSTMAN here to help format the returned JSON to make it a bit easier to test. Above you'll see the history for the Free_Text field.



The key feature for getting this prototype working is the HANA SPS07 functionality which enables XS JavaScript libraries (xsjslib) to be called on ODATA CRUD events.


E.g. DOCUMENTS.xsodata

service {

  "HADOOP"."HanaHbase::DOCUMENTS"  as "DOCUMENTS"

     update using "HanaHbase:DOCUMENTS_table_exits.xsjslib::update_instead"

     ;

}


NOTE: For the comparison of the before and after record, to determine which field changed, I've made use of the example code provided by Thomas Jung for converting a SQL record set to a JSON object, see http://scn.sap.com/thread/3447784


During the ODATA PUT (UPDATE) I’ve modified both HANA & HBASE with the most recent change.


The HANA table is only setup to store the current value.

The HBASE equivalent table  I’ve defined to keep the most recent 100 changes per COLUMN.


The HBASE Log table was defined as:

create 'HanaTableLog', {NAME => 'table', VERSIONS => 100}


The complete code for this prototype is available on github.

https://github.com/AronMacDonald/HanaHbase

HANA Artifacts Part3.1 (SQLScript Debugger SP7)


Please refer http://scn.sap.com/docs/DOC-48851 for debugging procedure in HANA SP6.

 

Before SPS 07, anyone with the SYS.DEBUG role was authorized to debug everything; this authorization is no longer required in SPS 07.

This now restricts the debugging of procedures owned by other users.

 

With the following object privileges:

GRANT DEBUG ON <procedure> TO <user>

GRANT DEBUG ON SCHEMA <schema> TO <user>

By granting these privileges, a user can be allowed to debug a particular procedure or all procedures in a schema, as in the example below.
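
A concrete example; the user, schema and procedure names below are assumptions:

GRANT DEBUG ON "MYSCHEMA"."GET_MATERIALS" TO DEV_USER;   -- debug a single procedure
GRANT DEBUG ON SCHEMA "MYSCHEMA" TO DEV_USER;            -- debug any procedure in the schema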

 

Let's start debugging now...

 

First create a wrapper procedure and set the breakpoint (open it in the _SYS_BIC schema):

1.png

Open the Debug perspective and configure a new debug configuration.

 

2.png

Now select the Catalog schema and the procedure to debug from the schema by clicking the 'Browse' button on the right.

 

3.png

 

5.png

4.png

 

OUT_MARA has one row selected.


Introduction to HANA XS application development (Part 3): BlogProject data modeling with information views


In this third part of the BlogProject, we will see how to create the data model so that we can manage and analyze our data in a more robust manner and add our application logic on top of the data.

If you missed the previous posts about the creation of an XS project and the persistence model (especially the data schema in the 2nd part) make sure to have a look:

http://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-1-blogproject-creation

http://scn.sap.com/community/developer-center/hana/blog/2014/05/27/introduction-to-hana-xs-application-development-part-2-blogproject-persistence-model

 

The modeling components that HANA provides are SQL views and information views. Of course HANA offers the capabilities of standard SQL views, but the most interesting part of HANA modeling is information views. When we talk about information views, we refer to three types of views: attribute views, analytic views and calculation views. These views are non-materialized, leading to increased agility and lower storage requirements.

 

   info views.png

                            Image 1: Information views architecture


Information Views


Attribute views

Attribute views represent dimensions, BW characteristics or (mostly) master data, and are used to join dimensions or other attribute views, creating logical entities (like Product, Employee, Business Partner) based on multiple objects. However, if the use case requires it, attribute views can be created on transaction data too. Attribute views are re-usable and often shared in analytic and calculation views. Lastly, attribute views include only attribute columns (not measures), plus user-defined calculated columns that apply functions to attribute columns.


Analytic views

Analytic views can be thought of as OLAP views, representing star schemas where one or more fact tables are surrounded by dimensions. Analytic views are used for multidimensional data analysis and cube queries. They are typically defined on at least one fact table that contains transactional data, along with a number of tables or attribute views that play the role of dimensions to the fact table(s). In addition, in this type of view you can define calculated columns based on other columns, or restricted columns, which enable you to filter the data on certain columns. Beyond that, there is the capability of adding input parameters and variables to affect the result of the view. Furthermore, besides attribute columns, an analytic view can contain measures, which can be aggregated.

Calculation views

Calculation views are composite views used on top of analytic and attribute views, as indicated by Image 1, to perform more complex calculations not possible with other views. They are usually used when working with multiple fact tables, or when joins are not sufficient. They can use any combination of any number of tables, attribute and analytic views, using union, join, aggregation and projection operators. A calculation view can be defined in a graphical manner, which is usually the case, producing graphical calculation views, or in a scripted manner with the use of SQLScript, producing scripted calculation views. Lastly, a calculation view allows the definition of attributes and measures, as well as the creation of calculated and restricted columns, input parameters and variables.

When to use each view

Searching the web and the communities, I found the picture below, which captures the whole concept of data analysis in HANA and which component to use depending on the requirements of your application. In this part we will deal only with the information views, leaving the scripting methods for later analysis.

 

                              HANA analyze.png

                                                                      Image 2: Information views usage

Information views and BlogProject

In our use case we will only use attribute views, for joining our master data tables, and calculation views, to provide multiple joins and aggregations. We will not use analytic views because in this particular application the data does not require a star schema representation. In fact, what we want to do here is create collections of attributes, measures, calculations and aggregations that will be exposed to the user interface and provide the data that a user will need. So, if we think about it, what a user wants (in fact what we want our user to want) is to read and write posts, read and write comments, see his/her profile and see country statistics.

  all views.png

Create attribute views

  • AT_COMMENT


First we will create the comment entity that we want to expose to a user. First of all, the user must see the text of the comment, the date and the user that wrote it. If we consider creating a profile screen for the user, then his/her comments will also be shown there, but it would be great for the user to be able to see the corresponding post of a certain comment, too.

So, in the attribute view below we join the tables POST and USER with COMMENT on the columns that play the role of foreign keys. We keep both the ID and the name columns (Username, PostTitle) because it is easier for us to manage IDs, but easier for the user to work with names.

To create this view you have to drag your mouse from the foreign key to the primary key, for example from COMMENT.UserID to USER.ID, and then choose the type of join and cardinality from the properties at the lower right. In this case we have chosen a "referential" join with cardinality 1 (USER, POST) to many (COMMENT).

         AT_COMMENT.png

 

 

 

  • AT_POST


For the central entity, the POST, and following the previous logic, we want to show the user not the UserID but the Username. So, same as before, we drag from the foreign key to the primary key, choose a "referential" join from the properties and a 1 (USER) to many (POST) cardinality. Again we keep the UserID for our own purposes. Then, we have to add the columns we want to the output by clicking on the sphere next to each column.

  AT_POST.png

In this example we extend our attribute view logic by adding a Calculated Column. Let's say we want a column that gives us a clue about a post's popularity, besides Views. To do that we will create a column called LikesDislikesRatio, by right-clicking on the Calculated Columns folder, choosing "New" and following the wizard. The formula is simple: it's the fraction Likes/Dislikes. Let's make the convention that when Dislikes are zero (0) the result will be the number of Likes, and when Likes are zero (0) the result will be zero (0), given that we only care about the most popular posts. The equivalent logic is sketched below.
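
In the modeler this is entered as a calculated-column expression; expressed in plain SQL for clarity, the convention above amounts to the following (the schema and table names are assumptions from the persistence model of Part 2):

SELECT CASE
         WHEN "Dislikes" = 0 THEN "Likes"
         WHEN "Likes"    = 0 THEN 0
         ELSE "Likes" / "Dislikes"
       END AS "LikesDislikesRatio"
  FROM "BlogProject"."POST";   -- hypothetical schema/table names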




Create calculation views

We could keep on creating attribute views, but at some point we will need some aggregations. In our case we want to know how many comments are written for a single post, how many incoming links there are for this post, the number of posts and comments written by a certain user, and how many posts are made at the country level.


  • CALC_POST


Let’s start with the POST. We would like to count the number of comments and inbound links for each post. To do so we follow the procedure shown below.


First, we have to join each COMMENT with the corresponding POST. For the post entity we use the attribute view AT_POST we created previously, which holds the additional information we wanted. To join, after we have created a join (JOIN_1) and added the 2 objects, we drag the mouse from COMMENT.PostID to AT_POST.ID and we choose cardinality 1 (AT_POST) to many (COMMENT). In contrast to the joins we made in the attribute views, we will now choose a Left Outer join, because we want to keep all the rows of AT_POST, even those that don't have comments or links.

  CALC_POST(Join_1).png

As a result we have for each post, the ids of all its comments.


Then, we will create another join (JOIN_2) to join the previous join (JOIN_1) with the POST2POST table. Again, we drag the mouse, choose cardinality 1 to 1 and change the join type to Right Outer, because we want all rows in the result and not only the ones that exist in both tables.

  CALC_POST(Join_2).png

Doing this, we added to each post the ids of the posts that have outbound links.


The next step is to actually create the aggregations. To do this we have to connect our last join (JOIN_2) to the Aggregation operator. In the Aggregation, we will create the two aggregated columns that will hold the number of comments and incoming links for each post. We simply right-click on the Calculated Columns folder, but this time we choose "New counter" and follow the wizard, choosing which column we want to count. We do this twice, creating the two columns "CommentsNo" and "InLinks".

  CALC_POST(Aggregation).png

 

 

  • CALC_COUNTRY


For the Country entity we will follow the same procedure, first joining the country with the user, then the result of the join with the post. In the properties we can see the details of each join.

  CALC_COUNTRY(Join 1).png

CALC_COUNTRY(Join 2).png

 

The result is, for each country, all the user ids and, for each user id, the post ids that the user has written.

Then, in the Aggregation operator, we again create two new counters, counting the number of posts and the number of users for each country.

  CALC_COUNTRY(Aggregation).png

 

 

 

  • CALC_USER


Lastly, we want to create a view that holds all the important information about a user. In addition, we want to know for each user how many posts and comments he/she has written.


Again following a similar procedure, at the beginning we join the USER with the COUNTRY, to replace the countryID with the actual name of the country.

  CALC_USER(Join 1).png

Then, we join the result with the POST table to get the postIDs of each user’s post.

  CALC_USER(Join 2).png

After we have joined JOIN_1 with the POST, we join the resulting JOIN_2 with the COMMENT table to get the commentIDs of each user's comments.

  CALC_USER(Join 3).png

Lastly, we create the two columns (Counters) that will hold the number of posts(PostsNo) and comments(CommentsNo) of each user. The properties tab shows the details.

  CALC_USER(Aggregation).png

 

 

Having done all the previous procedures, we have completed our data model and we are ready to expose it to our UI application. The data will be exposed with the OData protocol, and more specifically via a number of OData services, in the next post. Until then, the activated views can already be queried directly, as sketched below.
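
Once activated, the views are available as column views under the _SYS_BIC schema and can be queried like tables; a rough sketch, where the package path "BlogProject" and the column names are assumptions:

SELECT * FROM "_SYS_BIC"."BlogProject/CALC_POST";

SELECT "Username", "PostsNo", "CommentsNo"
  FROM "_SYS_BIC"."BlogProject/CALC_USER";   -- counter names as defined above; "Username" is an assumption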


Hope you enjoyed!!

Authorization of User Privilege on SAP HANA Modeler Development


Introduction

Modeler views, including attribute views, analytic views and calculation views, play a very important part in SAP HANA. Developing with them not only conveys business knowledge to developers better, but also accelerates application performance.

Many readers already understand the modeler concepts and know how to create modeling views with an administrator user in order to preview and analyze data. In a real development environment, however, this kind of user has too many privileges and its use is strictly limited. So the question is which privileges should be granted to a normal user for modeler development.

In the following sections, I'll show how to set this up step by step.

Create User

First, we create a new user called "REPOUSER". SAP HANA grants the PUBLIC role to every newly created user. Connecting as this user via SAP HANA Studio shows the initial content: SAP HANA assigns a homonymous schema to the user, and the user has all privileges to access and modify this schema.

1.png

Grant privilege to package

Create a new package "repo" with the "SYSTEM" user. When you first open the Content folder, it warns that the "execute on repository_rest" privilege is missing.

3.png

Execute the following SQL command with the "SYSTEM" user:

GRANT EXECUTE ON REPOSITORY_REST TO REPOUSER;

 

Now the folder can be opened but the content is still invisible; execute the following commands with the "SYSTEM" user:

GRANT REPO.READ ON _SYS_REPO."repo" TO REPOUSER;

GRANT REPO.EDIT_NATIVE_OBJECTS ON _SYS_REPO."repo" TO REPOUSER;

GRANT REPO.ACTIVATE_NATIVE_OBJECTS ON _SYS_REPO."repo" TO REPOUSER;

GRANT REPO.MAINTAIN_NATIVE_PACKAGES ON _SYS_REPO."repo" TO REPOUSER;

 

Now, the "REPOUSER" user can access the "repo" package and create/edit modeling views under it.

4.png

Creating modelers

Here I take an analytic view as an example. First, we create an analytic view named "SALES_BY_REGION". Since the tables used reside in the schema "FURNITURE", the user "REPOUSER" needs the select privilege on this schema. Execute the SQL command with the SYSTEM user:

GRANT SELECT ON SCHEMA "FURNITURE" TO REPOUSER;

5.png

6.png

 

After the view is created, it needs validation and activation. In SAP HANA, all existing modeling views are managed by the user "_SYS_REPO", so this user needs the select privilege on all tables used in the views. If an error occurs during validation or activation, grant the select privilege on the related schema to "_SYS_REPO" with this command:

GRANT SELECT ON SCHEMA FURNITURE TO _SYS_REPO WITH GRANT OPTION;

Now the view has been successfully validated and activated, and its data can be previewed. SAP HANA provides two ways of previewing data: executing SQL, and "Data Preview" in SAP HANA Studio.

For the first one, all activated views live in the schema "_SYS_BIC" under the "package_name/view_name" naming format. Grant the select privilege on this schema to "REPOUSER", and the data of "SALES_BY_REGION" can be queried directly.

GRANT SELECT ON SCHEMA _SYS_BIC TO REPOUSER;

7.png
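
For reference, the query behind a preview like the one above is simply a select against the activated column view, following the "package_name/view_name" convention described earlier (run as REPOUSER):

SELECT * FROM "_SYS_BIC"."repo/SALES_BY_REGION";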

If you want to preview data in SAP HANA Studio, some additional privileges should be granted. Just execute these two commands with the "SYSTEM" user, and data can then be previewed in the Studio:

GRANT SELECT ON SCHEMA _SYS_BI TO REPOUSER;

CALL GRANT_ACTIVATED_ANALYTICAL_PRIVILEGE('_SYS_BI_CP_ALL','REPOUSER');

8.png

SAP HANA Life Cycle Management Quick Start


Introduction

SAP HANA Lifecycle Management, HLM for short, is a part of SAP HANA Platform. It provides flexible customization for SAP HANA, including administrative, upgrade and many other features for an easier and rapid maintenance.

Since HLM is not installed by default with SAP HANA, some additional work is needed to install and configure it. This document uses examples to show how to install, configure and use HLM.

Prerequisite

  1. SAPHANALM07_0-10012745.SAR
  2. SAPHOSTAGENT155_155-20005731.SAR

These two installation packages can be downloaded from SMP (Service Marketplace).

Installation

1. Unzip SAR files on the server

sapcar -xvf SAPHANALM07_0-10012745.SAR

sapcar -xvf SAPHOSTAGENT155_155-20005731.SAR

After the extraction, the directory looks like this.

1.png

2. Install hostagent

Enter the SAPHOSTAGENT directory to start installation.

2.png

After the installation, a hostctrl directory is generated in /usr/sap.

3.png

Check whether the hostagent service has been started.

4.png

Note: If the SAP HANA version is SP07 or above, the host agent is installed by default with SAP HANA. The host agent installation steps are listed here for reference in case your SAP HANA system is older.

3. Install HLM

Enter the SAPHANALM directory to start the installation; here we choose instance H70.

5.png

After that, an HLM directory is created in the home directory of the instance: /hana/shared/H70.

6.png

Use HLM

After the above steps, HLM is accessible. Three ways are provided to access HLM:

  • Use SAP HANA Studio
  • Use web browser
  • Use command line

Here we'll use SAP HANA Studio. Please make sure your Internet Explorer version is 9.0 or higher.

First, choose the instance, and log on with “Lifecycle Management” option in right-click menu.

7.png

The picture below shows the main interface of HLM, which includes many features: "Configuration of SLT and SMD" and "Rename of SAP HANA System" in the "Integrate SAP HANA System in Landscape" part; "Update of HLM, SAP HANA server, client and studio" in the "Update SAP HANA System" part; and "Add/Delete Additional Host" and "Manage Additional SAP HANA Components" (such as AFL and SDA) in the "Administrate SAP HANA System" part.

8.png

 

There are two ways of updating the SAP HANA system: Apply Support Package Stack and Apply Single Support Package. The difference is that the former is an overall update of every available component, while the latter updates a single component. In addition, the update can be manual or automatic: an automatic update downloads and installs the packages online, while for a manual update the installation packages are downloaded first and installed later.

9.png

   

An SMP account is needed, since for an automatic update the packages must be downloaded from SMP.

10.png

Note: Open the Preferences dialog in SAP HANA Studio and search for "marketplace". Fill in your SMP account and proxy information and click Save. You then no longer need to enter your SMP credentials when updating automatically.

11.png

After an automatic detection, all available components are listed with their version numbers, so you can update to any of them. Here we take the SAP HANA client as an example.

    12.png

13.png

For a manual upgrade, you first need to download the SAR packages from SMP and put them on the HANA server. HLM then detects the available updates, and you can start the update in the same way as the automatic one.

14.png

 

This blog mainly introduced the installation and use of HLM, which makes SAP HANA much more convenient to administer and use.

Export and Import Modeler Views In SAP HANA


Introduction

The modeler is heavily used in SAP HANA development; usually dozens or even hundreds of modeler views are created in a single project.

Imagine a very large project in which developers do their own work in different environments; in the integration phase, all this developed content must be deployed to the actual production system. We need a way to easily export and import these modeler views across instances. SAP HANA does provide such a feature, which enables rapid migration of modeler views.

This blog describes how to import and export modeler views in SAP HANA.

Creating view

This picture shows some modeler views we created; all of them are based on the tables in the schema "FURNITURE".

0.png

Now all the views have been created and can be accessed directly.

1.png

2.png

Export views

As the picture on the left shows, choose File -> Export, and then choose Developer Mode under SAP HANA Content in the dialog box.

3.png

Select the instance and the modeler views to be exported; you can choose all the modeler views of a package or only certain views by specifying the location.

4.png
After that, a tree-structured directory is generated in the target folder. The format is "instance SID -> package name -> analyticviews/attributeviews/calculationviews/package_name.properties". With that, the export is finished.

5.png

 

Import views

Before importing the views, we must ensure that all the related tables have been migrated to the target instance; then we grant the related privilege to the user.

GRANT SELECT ON SCHEMA FURNITURE TO _SYS_REPO WITH GRANT OPTION;

 

As the following pictures show, the import procedure is very similar to the export.

6.png

After we choose the folder location, a file tree is displayed and we can choose the views to be imported.

7.png

The imported view is displayed with a grey icon, which means it is not active yet. Right-click the view and choose Validate and Activate; it is then available.

9.png

10.png

Introduction of Delivery Unit In SAP HANA


Introduction

Packages are often used in SAP HANA application development for modeler views and XS projects. During development, many packages are used and different objects are created in each of them. Eventually these packages must be migrated to the production server for deployment, so package migration becomes an issue. In the previous blog we provided a solution for migrating modeler views, but it is not suitable for SAP HANA native applications.

SAP HANA uses the "Delivery Unit" to handle the migration of packages between instances. A delivery unit can be regarded as a collection of several packages that are managed centrally. This blog mainly discusses how to create and use a delivery unit.

Access Methods

SAP HANA provides a centralized management platform for delivery units called HANA Application Lifecycle Management. Before accessing it, the user must have the "sap.hana.xs.lm.roles::Administrator" role (a hedged SQL sketch for granting this role follows the two access methods below). You can access the platform in these two ways.

  1. SAP HANA Studio

This is very similar to accessing Lifecycle Management: right-click and choose Transport Management.

1.png

 

2. Access it with URL


http://<host_address>:80<instance_number>/sap/hana/xs/lm/index.html
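As mentioned above, the user needs the sap.hana.xs.lm.roles::Administrator role. A minimal sketch of granting it via the standard repository procedure (REPOUSER is just a placeholder user name):

CALL "_SYS_REPO"."GRANT_ACTIVATED_ROLE"('sap.hana.xs.lm.roles::Administrator','REPOUSER');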

 

Platform Features

The following picture shows the main interface of HANA Application Lifecycle Management. The platform offers many features for delivery units, such as exporting, importing and creating delivery units and searching for packages.

2.png

A vendor ID must be created the first time HANA Application Lifecycle Management is used. Choose Change Vendor under Administration if you want to switch to a new vendor ID.

3.png

Create a DU

4.png

In SAP HANA SP07, the system creates several delivery units during installation, which are displayed under PRODUCTS -> Delivery Units. You can add, delete or modify delivery units there. Creating a delivery unit is very simple.

  1. Fill in the name, version and simple description of DU.

5.png

  2.  Assign packages to this DU.

6.png

Export a DU

The created delivery units and the packages they own are listed, and you can export a DU into a .tgz file through Export Delivery Unit File.

7.png

Import a DU

Import the delivery unit, and all the related packages are created in SAP HANA.

8.png

Manage Studio

SAP HANA Studio also provides administration for DUs. Switch to the Modeler perspective and choose Delivery Unit under Setup.

9.png

  1. Create DU and assign packages to it.

    10.png

11.png

  2.  Export and import a DU by selecting Delivery Unit under SAP HANA Content.

 

12.png

OLTP Performance Tuning of Column Tables In SAP HANA


Introduction

Currently, most enterprise database products are designed for either OLTP or OLAP because of the limitations of their architecture. Since one system cannot deliver good performance for both OLTP and OLAP, enterprises tend to run separate database systems to satisfy their business needs. Data then lives in different databases, the transaction system and the analysis system are separated, and data management becomes rather difficult.

SAP HANA, as a next-generation in-memory database, was designed by SAP from the start to combine OLTP and OLAP.

SAP HANA introduces the concept of the column table. In a traditional row table, records are stored sequentially in the data blocks; in a column table, each column is stored sequentially instead. This architecture makes column aggregations such as SUM and COUNT very fast. On the OLTP side, however, a column table is penalized because the data structure in the blocks has to be adjusted whenever records are changed.

To address the OLTP performance problems of column tables, SAP HANA gives every column table a special data area called the delta area. Every column table therefore has two storage areas: the main area and the delta area. Data in the delta area is stored row-wise, and the data written by OLTP transactions is put into this area first. When the data meets a certain condition, a merge operation moves the data from the delta area into the main area and converts it from row format to column format. This is how SAP HANA supports both OLTP and OLAP in one database.

From the application perspective, there are several further ways to improve OLTP performance on column tables.

Method

Disable Auto Delta Merge

A column table performs a merge operation with the main area when the data in the delta area grows large enough. While the merge runs, the delta area is locked, all OLTP transactions are blocked and data cannot be updated. When a column table is created, automatic delta merge is enabled, which makes merges unpredictable; if frequent transactions occur during a merge, performance drops significantly.

The recommendation is to disable automatic delta merge on the affected tables with the following SQL command.

ALTER TABLE [table_name] DISABLE AUTOMERGE;

The feature can be re-enabled later; after that, the database again monitors the delta area to check whether the merge condition is satisfied.

ALTER TABLE [table_name] ENABLE AUTOMERGE;

Since data in the delta area is stored in row format while the main area uses column format, performance suffers for operations such as table joins: two segments of data in different storage formats have to be processed and the calculator keeps switching between the column engine and the row engine. We therefore recommend a manual merge when the delta area has grown large enough or after a sequence of transactions has been executed.

MERGE DELTA OF [table_name];

 

Multithreads

Most of the time an OLTP transaction is a lightweight database operation that takes only milliseconds, so a single thread gains little performance. When a record of a table is inserted, updated or deleted, only that record is locked; the records do not affect each other, so transactions are not blocked when running in multiple threads. In addition, CPU wait time and database response time shrink, and overall performance improves.

Generally speaking, performance improves as the number of threads increases, but due to the limits of the server and the database itself, the improvement levels off after a certain point and eventually stays flat.

To run multiple threads, you can open several connections through the JDBC or ODBC driver, or open multiple consoles with SAP HANA's hdbsql.

For single-row inserts, we can simply open multiple database connections and run them in parallel. Another case is a bulk insert such as "insert into [table1] select * from [table2]" where table2 holds a large amount of data. Since this single SQL statement uses only one thread, we should split it into several parts by column values, for example according to the distribution of an ID column, and run the resulting statements in parallel on several connections (see the sketch below); this improves throughput.
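A minimal sketch of this split, assuming table1 and table2 are placeholder names and ID is a numeric column (the MOD-based split is my illustration, not from the original post):

-- each statement can be submitted from its own connection/thread
INSERT INTO table1 SELECT * FROM table2 WHERE MOD(ID, 10) = 0;
INSERT INTO table1 SELECT * FROM table2 WHERE MOD(ID, 10) = 1;
-- ... and so on, up to MOD(ID, 10) = 9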

Table Partition

Partitioning applies only to column tables in SAP HANA. It has several advantages:

     1.      Data is distributed across partitions, so multiple threads take effect and each thread processes its partition in parallel.

     2.      A table can be split according to the business scenario. For example, years of historical sales data can be partitioned by month, so the database only searches one partition when analyzing the sales data of a certain month, which reduces the data volume scanned.

     3.      OLTP data is spread across different partitions, which avoids writing frequently to a single data area.

SAP HANA provides three partition types: hash partitioning, range partitioning and round-robin partitioning. Hash and range partitioning are defined on one column or a combination of columns; the difference is that hash partitioning is based on the hash value of the columns, while range partitioning defines value ranges for the columns. Generally speaking, range partitioning is typically used for date-typed columns and hash partitioning for ID columns. Round-robin partitioning distributes the records across partitions randomly, so it is the most uniform partition type. An example of hash partitioning is sketched below.
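A minimal sketch of hash partitioning (the table and columns are illustrative, not from the original post):

-- create a column table split into 4 hash partitions on its ID column
CREATE COLUMN TABLE SALES_ORDERS (
    ID     INTEGER,
    REGION NVARCHAR(20),
    AMOUNT DECIMAL(15,2)
) PARTITION BY HASH (ID) PARTITIONS 4;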

One best practice is to use partitioning and multithreading in combination. Not only OLTP transactions benefit from this; operations like delta merge benefit as well. We can use "MERGE DELTA OF [table_name] PART [partid]" to merge the data of each partition separately.

On the other hand, partitioning has a cost for the database: the data volume and log volume grow because additional partition information must be stored.

JDBC Tuning

SAP HANA provides several data access drivers, such as ODBC, JDBC and MDX, for external applications. MDX is mainly used for modeler views in SAP HANA, while JDBC and ODBC are more often used for OLTP. For the JDBC driver, the following tips help improve OLTP performance.

Disabling Auto Commit

Compose your database operations in one transaction and commit manually.

Commit More Records Once

Every commit has a cost. Reduce the number of commits where conditions permit.

Using Batch Commit

In JDBC, using a batch commit reduces the number of round trips when transferring data between the application and the database.

// Requires java.sql.Connection, DriverManager and Statement
Connection conn = DriverManager.getConnection(connection, username, password);
conn.setAutoCommit(false);                 // commit manually
Statement stmt = conn.createStatement();
stmt.addBatch(sql);                        // queue as many statements as needed
stmt.executeBatch();                       // send the whole batch in one round trip
conn.commit();

SAP HANA Developer Edition v1.8.80


I've just finished the final touches on the latest update, SAP HANA SPS8 revision 80, which is now available! We've kept nearly every change we made in the last edition and added a handful more.

 

From a user perspective you have the "SYSTEM" and "CODEJAMMER" users as before, but since we've already pre-loaded SAP River into the server you also have a new user, "RDLCODER", for trying out the SAP River Developer Language.

 

You'll be able to find this new version in the SAP Cloud Appliance Library for both the Amazon EC2 landscape and the Microsoft Azure platform.

 

We've completely revamped the "landing page" of the server and remove the required login to see that page.

 

Screen Shot 2014-06-03 at 10.36.48 PM.png

Screen Shot 2014-06-03 at 10.37.06 PM.png

Screen Shot 2014-06-03 at 10.37.28 PM.png

Screen Shot 2014-06-03 at 10.37.37 PM.png

 

Now, once you activate and then create your instance in the CAL service, you'll be able to choose the "connect" link and load this page automatically. It then provides you with the user information, server hostname and instance number of your new developer edition, as well as lots of preloaded content, sample applications, links to the Web IDE, Admin center and desktop tools, developer license information, OpenSAP and Native Development workshops (and solutions) and much, much more!

 

You can even create your own SAP River development user with the SQL scripts listed there!

 

We are striving to bring you more of what you want and need in these developer editions, so please keep the feedback coming!

 

To get your own system please follow this link.


Recovery Technologies of SAP HANA-Part 1: System Failure Recovery


Databases protection

As we know, SAP HANA is an in-memory database. How does SAP HANA ensure data consistency and correctness when the system crashes?

To answer this question, we should know that SAP HANA stores data not only in memory but also on disk. This relates to a concept called database protection: preventing the database from all kinds of interference and damage, keeping the data safe and reliable, and recovering rapidly from a crash. Recovery technologies are therefore an important part of database protection. A transaction is a sequence of operations that cannot be split. Take a bank transfer as an example: account A transfers 100 dollars to account B. This consists of two update operations:

  • A=A-100
  • B=B+100

These two operations cannot be split; either both are performed or neither is. A transaction can appear in the log in three states:

  • <Start T> means transaction T has started.
  • <Commit T> means transaction T has finished and all its modifications have been written to the database.
  • <Abort T> means transaction T has been aborted and all its modifications have been undone.

Database failures fall into three types:

  • Transaction failure: an internal failure of a single transaction that does not affect other transactions.
  • Media failure: a hardware failure such as a damaged disk or a full disk.
  • System failure: a soft failure such as a power outage or machine crash. This kind of failure may result in the loss of in-memory data and affects all running transactions.

The goal of system failure recovery is to bring the system back to the state it was in before the failure happened.

Validating SAP HANA system failure recovery

The concepts above apply to the SAP HANA database, so we can run a test to validate how SAP HANA recovers from a system failure.

First, modify the savepoint interval. At each savepoint, the SAP HANA system persists in-memory pages to disk. The interval is 300 s by default; we change it to 3000 s for the test (a hedged configuration sketch follows).
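As a sketch only, assuming the interval is controlled by the savepoint_interval_s parameter in the persistence section of global.ini (verify the parameter name against your revision), the change could be made like this:

ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') SET ('persistence','savepoint_interval_s') = '3000' WITH RECONFIGURE;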

1.png

Open two SQL consoles and change the “auto commit” property to off.

2.png

Run console 1 sql command:

insert into "LOGTEST"."TEST" values(1,'谢谢大家关注HANAGeek,欢迎大家一起来学习SAP HANA知识,分享SAP HANA知识。');

Run console 2 sql command:

insert into "LOGTEST"."TEST" values(2,'谢谢大家关注HANAGeek,欢迎大家一起来学习SAP HANA知识,分享SAP HANA知识。');

commit;

 

Power off the machine running the SAP HANA system. Then restart SAP HANA and check the content of the table.

3.png

We can regard console 1 and console 2 as transaction 1 (T1) and transaction 2 (T2). Because T1 executed a modification but did not commit it, SAP HANA rolled back to the state before T1 began. Because T2 committed before the outage, SAP HANA recovered its change even though no savepoint had been written in the meantime.

Strategies for system failure recovery

 

7.png

If the failure is a media failure, we first need to recover from a copy of the data; the system then completes the recovery using the logs.

Transaction log

The transaction log is used to manage modifications in a database system; it records the details of every modification. We do not need to persist all data when a transaction commits; persisting the transaction log is enough. When the system crashes, its last consistent state can be restored by replaying the transaction log. Hence log records must be written in chronological order.

There are three types of transaction log: undo log, redo log and undo/redo log. SAP HANA uses only two of them: the undo log and the redo log.

There are three kinds of records in a log file:

  • <Start T> marks the beginning of a transaction.
  • <Commit T> / <Abort T> marks the end of a transaction.
  • An update record, which contains:
    1. the identification of the transaction,
    2. the updated object,
    3. the value before the update (undo log), the value after the update (redo log), or both (undo/redo log).

Redo log

An important property of the redo log is that the log records must be written to disk before the updated data is written to the database. The format of a redo log record is <T, x, v>, where T identifies the transaction, x identifies the updated object and v is the value after the update.

As shown below, the operations of transaction T1 are A=A-100 and B=B+100. The left part of the picture shows the steps of T1, the middle part shows the content of the redo log, and the right part shows the initial values of A and B.

6.png

The steps of redo-log recovery are:

  1. Scan the redo log from the beginning and find all transactions that have the identifier <Commit, T>. Put them in a transaction list L.
  2. Scan the records <T, x, v>. If T belongs to L, then:
     • Write(X, v) (assign the new value v to X)
     • Output(X) (write X to the database)
  3. For each T that does not belong to L, write <Abort, T> to the log file.

We do not need to worry about transactions without <Commit, T>, because they definitely did not write data to the database. We do need to redo transactions that have <Commit, T>, because their changes may not have been written to the database yet.

Redo log writing is synchronous with transaction processing. When the SAP HANA system restarts after a crash, it processes the redo log to recover the system. To make log processing efficient, SAP HANA performs savepoints (checkpoints): during a savepoint, the system persists all data that has not been persisted since the last savepoint. Hence only the redo log written since the last savepoint needs to be processed, and the redo log from before the last savepoint can be removed.

Undo log

SAP HANA persists not only the updates of committed transactions but possibly also data of transactions that have not committed. That is why we need an undo log that is persisted on disk. The format of an undo log record is <T, x, v>, where v is the value before the update.

As shown below, the operations of transaction T1 are A=A-100 and B=B+100. The left part of the picture shows the steps of T1 and the middle part shows the content of the undo log.

5.png

The steps of undo-log recovery are:

  1. Scan the undo log and find all transactions that have neither the identifier <Commit, T> nor <Abort, T>. Put them in a transaction list L.
  2. Scan the records <T, x, v>. If T belongs to L, then:

             Write(X, v) (assign the old value v back to X)

             Output(X) (write X to the database)

  3. For each T that belongs to L, write <Abort, T> to the log file.

In the SAP HANA system, the undo log is persisted at savepoint time, which is different from the redo log. Moreover, the undo log is written to the data area, not to the log area. The reason is that after a restart from a crash the system can be restored to the state of the last savepoint. If a transaction committed after the last savepoint, the system restores it using the redo log; if it did not commit, no undo log written after the last savepoint is needed to restore it. So undo log entries created after the last savepoint are not needed. The advantages of this mechanism are:

  • Fewer log records need to be persisted during transaction processing.
  • Disk usage grows more slowly.
  • The database can be restored to a consistent state from the data area.

Save-point

When the database crashes, we would have to scan the whole undo list and redo list to restore it. This approach has problems:

  • Scanning the whole log takes a long time.
  • The redo list becomes very long, so the restore itself also takes a long time.

So SAP HANA performs savepoints regularly:

  1. Stop accepting new transactions.
  2. Write undo records to the data area.
  3. Write modified memory pages to disk.
  4. Write a save-point identifier into the redo log.

The process of save point is shown as below.

4.png

Recovery Technologies of SAP HANA-Part 2: Database Backup


SAP HANA makes use of persistent storage to recover from a system crash. The previous article described how the undo log, redo log and savepoints protect against crashes such as a power failure. But if the persistence devices themselves (such as disks) break down, that mechanism is no longer reliable. To protect against hardware damage, the database administrator needs to back up the database.

The difference between recovery after a restart and recovery from a backup is that recovery from a backup uses external devices. A database backup has two parts, log backup and data backup, which are independent of each other. The backup process has a negligible influence on the performance of the SAP HANA database. There are five points to remember.

  1. Backing up SAP HANA requires the following authorizations (a hedged GRANT sketch follows this list):
     • BACKUP ADMIN: authorization to execute backups
     • CATALOG READ: authorization to collect backup-related catalog information
  2. SAP HANA does not process log backups until the first data backup has finished.
  3. Backup and recovery always apply to the whole database. It is not possible to back up or recover individual objects.
  4. The SAP HANA database can be backed up to the file system or via third-party backup tools.
  5. Shared storage is strongly recommended, as it makes the data area available to all the nodes of a distributed database.
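As a minimal sketch, assuming a dedicated backup user named BACKUP_USER (the user name is a placeholder), the two system privileges can be granted as follows:

GRANT BACKUP ADMIN TO BACKUP_USER;
GRANT CATALOG READ TO BACKUP_USER;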

Log backup

The system can perform regular automatic backups of the redo logs. During a log backup, the payload of the log segments is copied from the log area to service-specific log backups or to a third-party backup server.

A log segment is backed up in the following situations:

  • The log segment is full.
  • The log segment is closed after exceeding the configured time threshold.
  • The database is started.

The system administrator can set the backup mode as shown below.

1.png

There are two kinds of log mode:

  1. Normal (default)

Log segments are automatically backed up if the parameter enable_auto_log_backup is enabled. Log mode normal is recommended to provide support for point-in-time recovery. Automatic log backups can prevent log-full situations from arising.

  2. Overwrite

Log segments are freed by savepoints and no log backups are performed. This can be useful, for example, for test installations that do not need to be backed up or recovered.

The system administrator can also set the log backup interval; the default is 900 seconds. If the log area is damaged, the data updated within this interval cannot be recovered. If the interval is set to 0, the system backs up the log only when a log segment is full or the system restarts. A hedged configuration sketch follows.
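As a sketch only, assuming these settings live in the persistence section of global.ini (verify the parameter names against your revision), they could be set like this:

-- keep the default log mode and a 900-second log backup interval
ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') SET ('persistence','log_mode') = 'normal' WITH RECONFIGURE;
ALTER SYSTEM ALTER CONFIGURATION ('global.ini','SYSTEM') SET ('persistence','log_backup_timeout_s') = '900' WITH RECONFIGURE;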

Data backup

A data area backup includes all the database content: transaction data and administrative data (for example, users, roles, models, and views). Only the actual data is backed up; unused space in the database is not backed up. When a data area backup is performed, the data area is backed up for each of the SAP HANA services running. If SAP HANA is running on multiple hosts, the data backup includes all the service-specific backup parts for all the hosts.

By default, data backups are written to the following destination: $DIR_INSTANCE/backup/data. You can specify a different destination when you perform the backup. Alternatively, you can change the default backup destination using the Backup editor in SAP HANA studio.

There are 3 ways to perform a data backup:

  • SAP HANA studio
  • SQL commands
  • Batch mode

Performing a Data Backup Using SAP HANA Studio

  1. Right-click the system, choose Back Up…, and select the backup type. Other types are available if third-party backup tools are installed.
  2. Specify the backup destination and the backup prefix. The default destination is the path specified on the Configuration tab of the Backup editor.
  3. Choose Next. A summary of the backup settings is displayed.
  4. If all the settings are correct, choose Finish.
  5. The backup starts.

2.png

Performing a Data Backup Using SQL Commands

You can enter SQL commands either by using the SQL console in SAP HANA studio, or by using the command line program hdbsql.

SQL command: BACKUP DATA USING FILE ('<path><prefix>')

Example:

BACKUP DATA USING FILE ('/backup/data/MONDAY/COMPLETE_DATA_BACKUP')

This would create the following files in the directory /backup/data/MONDAY:

  • COMPLETE_DATA_BACKUP_databackup_0_1 (name server topology)
  • COMPLETE_DATA_BACKUP_databackup_1_1 (name server)
  • COMPLETE_DATA_BACKUP_databackup_2_1 (for example, index server)
  • ...

Performing a Data Backup in Batch Mode

You can perform data backups in batch mode at operating system level using the command line tool SAP HANA HDBSQL. HDBSQL enables you to trigger backups through crontab. It is recommended that you set up a batch user for this purpose and that you authorize this user to perform backups with the system privilege BACKUP OPERATOR.

Procedure:

  1. Install the client software by executing the following command:

hdbinst -a client (default location: /usr/sap/hdbclient)

The client software enables access to hdbuserstore.

  2. Create a user key by executing the following command:

/usr/sap/hdbclient/hdbuserstore set <KEY> <host>:3<instance id>15 <user> <password>

  3. In crontab, execute the following command at the desired time:

/usr/sap/hdbclient/hdbsql -U<KEY> "BACKUP DATA USING FILE ('<path><prefix>')"

Recovery Technologies of SAP HANA-Part 3: Database Recovery


It may be necessary to recover the SAP HANA database in the following situations:

  • A disk in the data area is unusable.
  • A disk in the log area is unusable.
  • As a consequence of a logical error, the database needs to be reset to its state at a particular point in time.
  • You want to create a copy of the database.

Data Area is Unusable

If the data area is unusable, and all the data changes after the last complete data backup are still available in the log backups and log area, the data from committed transactions that was in memory at the time of failure can be recovered.

No committed data is lost. For recovery, the data backups, the log backups, and the log area are used. When the data backup has been successfully recovered, the log entries from the log backups and the log area are automatically replayed.

It is also possible to recover the database using an older data backup and log backups. All relevant log backups made after the data backup are needed for the recovery.

Log Area is Unusable

If the log area is unusable, it is only possible to replay the log backups. As a consequence, any changes that were made after the most recent log backup will be lost. In addition, all the transactions that were open during the log backup will be rolled back.

For recovery, the data backups and the log backups are used. When the data backup has been successfully recovered, the log entries from the log backups are automatically replayed. In the Recovery Wizard, you must specify the option Initialize log area to prevent the recovery of entries from the unusable log area. This option initializes the log area, and the old content of log area is lost.

Logical Error – Point in Time Recovery

To reset the database to a particular point in time, you need a data backup from before the point in time to recover to, the subsequent log backups, and the log area.

All changes made after the recovery time will be lost. If you need to perform this recovery, consider recovering the database to a different system.

Recovering the SAP HANA database

To recover a SAP HANA database:

  1. From SAP HANA studio, open the context menu for the database to be recovered. Choose Recovery...

You are requested to confirm that the system can be shut down for the recovery.

1.png

2. Confirm and enter the <SID>adm user and password.

Choose OK. The database is shut down.

3. Specify the recovery type. The following recovery types are available:

Option 1: Recover the database to its most recent state

This option recovers the database to as close as possible to the current time. It uses the following data:

  • The most recent data backup
  • Log backups made since the most recent data backup
  • Log area

Option 2: Recover the database to the following point in time

This recovery option uses the following data:

  • The last data backup available before the specified point in time
  • Log backups made since that data backup
  • Log area

Option 3: Recover the database to a specific data backup

This recovery option uses the following data:

  • The specified data backup

Option 4: Recover the database to the following log position

This recovery type is an advanced option that can be used in exceptional cases if a previous recovery failed. It uses the following data:

  • The most recent data backup available before the specified log position
  • Log backups made since that data backup
  • Log area

2.png

4. Choose Next.

The SAP HANA database uses the backup catalog to identify the location of the backups. You do not need to specify whether File or Backint was used for the backup.

5. If the log backup is needed, specify the location of log backups.

3.png

6. Choose Next.

4.png

7. A summary of the selected recovery options is displayed. To make changes to the recovery configuration, choose Back.

5.png

8. If the settings are correct, choose Finish. The recovery is then started.

6.png

7.png

8.png

Copying a Database Using Backup and Recovery

You can create a homogenous copy of a database by recovering an existing source database backup to a different, but compatible target database. The source database backup consists of data backup files and the log backup files.

A homogenous database copy is a quick way to set up cloned systems for training, testing, and development. For this reason, copying a database can significantly reduce total cost of delivery (TCD).

The following prerequisites must be met:

  • A database backup of the source database must be available.
  • The version of the target database system is the same or higher than the source database.
  • The target system must be configured with sufficient disk space and memory.
  • Ensure that the configuration of the target system is identical to the configuration of the source system.
  • The number of hosts and the number and types of services (for example, index server) on each host must be identical for both system landscapes.
  • Customer-specific configuration changes can be manually applied to the target system.

To create a homogenous copy of a database:

  1. Shut down the target database.
  2. Move or delete available data and log backup files from the target database.
  3. Copy the data backup files and, optionally, the log backup files from the source database to the corresponding directories in the target database.
  4. Start the recovery of the target database system.

In SAP HANA studio, select the target database system, choose Recover… from the context menu, and then proceed through the recovery wizard:

  1. Select the recovery type Recover the database to its most recent state.
  2. Choose Next.
  3. Choose Add to specify the location of log backup files for the target database.
  4. Choose Next.
  5. Choose location of data backup files for the target database.
  6. Select the option Initialize log area.
  7. Choose Next.
  8. Confirm the warning.
  9. Select the option Install new license key and specify the license key file.
  10. Choose Next.
  11. If the recovery options are correct, choose Finish. The recovery is then started.

When the recovery is successfully completed, the database is started.

Copying a Database Using Snapshot

  1. Create a snapshot on the source SAP HANA database (via SQL or HANA Studio):

BACKUP DATA CREATE SNAPSHOT

  2. Shut down the target SAP HANA database and copy the data area from the source database to the target database.
  3. Confirm or abandon the snapshot:

SNAPSHOT BACKUP_ID <backup_id> SUCCESSFUL <external_id> | UNSUCCESSFUL [<string>]

You can use the following statement to query the snapshots:

SELECT * FROM "SYS"."M_BACKUP_CATALOG" WHERE ENTRY_TYPE_NAME = 'data snapshot'

  4. Delete the backup files in the target database.
  5. Execute the following commands in the target database:

hdbnsutil -useSnapshot
hdbnsutil -convertTopology

  6. Start the target database.

Acceleration Technology for Importing Data Files into the SAP HANA Database


This article looks at accelerating CSV file imports into SAP HANA and identifies which factors and methods affect the import speed.

Hardware factors

The import speed of SAP HANA is ultimately bounded by the hardware configuration. No matter what we do at the software level, hardware remains the most important factor. Three hardware factors affect the achievable import speed.

  • Disk type

Since an SAP HANA import always involves writing the transaction log and the delta log, disk write speed matters a lot. Using SSDs for the log area and data area of SAP HANA is recommended.

  • Number of CPU cores

SAP HANA can make full use of multiple cores to import data, so the number of CPU cores determines how fast the import can run.

  • Size of memory

SAP HANA is an in-memory database whose data resides in memory. If memory is not large enough, importing data leads to a memory shortage, and HANA starts unloading other data that has not been used recently, which slows the import down. Besides, while the CSV files are being read, the file system cache grows rapidly; when the cache gets too large, the operating system starts releasing cache space, which also affects the import. Based on my experiments, I recommend keeping the free memory at roughly twice the size of the CSV files.

Importing files factors

On the CSV file side, the following factors affect the import speed:

  • The correct format of importing files

If a CSV file contains records that do not match the table format, the whole batch containing such a record is not imported into the database, which slows the import down.

  • Size of csv file

The CSV file needs to be large enough for SAP HANA to benefit from multi-threaded importing.

SAP HANA factors

In SAP HANA, data is stored not only in memory but also on disk and in log files. To reach the maximum import speed, we need to relax some settings that exist for durability reasons.

Partition

Partitioning a table helps increase the degree of parallelism. In my experiments, hash partitioning was the best partitioning method, and a numeric field was the best choice for the partitioning column.

Auto merge

By default, data imported into a table is first stored in the delta area, and the delta area is then merged into the main area automatically. The merge itself does not load any new data, so we can disable auto merge to ensure the import is not interrupted by merge operations.

Delta log

For column store tables, SAP HANA writes the delta log to disk while importing data, which slows the import down. Since the goal of the import is to get the data into memory, and the delta log only exists to make sure imported data is not lost, we can disable the delta log to speed up the import.

Number of threads

To make full use of multiple cores, we can set the number of threads used for the import. In my experiments, setting the number of threads equal to the number of CPU cores worked best.

Number of tuples in a batch

SAP HANA imports data in batches. We can set the number of tuples in a batch.
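Putting the options above together, a minimal sketch of a tuned CSV import (the file path, schema and table names are placeholders, and both settings should be re-enabled afterwards):

-- keep the import free of merge and delta-log overhead
ALTER TABLE "MYSCHEMA"."SALES" DISABLE AUTOMERGE;
ALTER TABLE "MYSCHEMA"."SALES" DISABLE DELTA LOG;

-- import with one thread per CPU core and an explicit batch size
IMPORT FROM CSV FILE '/tmp/sales.csv' INTO "MYSCHEMA"."SALES" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY ',' THREADS 16 BATCH 10000;

-- merge manually and restore the defaults once the import has finished
MERGE DELTA OF "MYSCHEMA"."SALES";
ALTER TABLE "MYSCHEMA"."SALES" ENABLE DELTA LOG;
ALTER TABLE "MYSCHEMA"."SALES" ENABLE AUTOMERGE;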

Summary

With two different hardware configurations, we measured the following import speeds:

  • CPU: 16 cores, memory: 256 GB, disk type: SSD; importing speed: 100 MB/s
  • CPU: 80 cores, memory: 1 TB, disk type: SSD; importing speed: 308.8 MB/s

 

The flow chart below summarizes how to accelerate the import.

1.jpg

HANA Predictive Analysis - Single Exponential Smoothing


Introduction

Smoothing algorithms are used on time series data either to produce smoothed data that presents the trend of the data or to forecast it for what-if analysis. Time series data consists of sequential observations recorded against a series of dates, times or timestamps. Moving average analysis is a similar kind of time series analysis in which past observations are equally weighted; for certain analyses (price movements or stock market movements), however, the recent past should carry the most weight. Exponential smoothing assigns exponentially decreasing weights over time.

 

Pre requisites:


  1. No missing/null data
  2. Only numeric data can be smoothed

 

Algorithm

Let St be the smoothed value for the t-th time period and x0, x1, ..., xn be the time series; mathematically,

S1 = x0 (there is no smoothed value for the first entry in the time series).

St = α·x(t−1) + (1−α)·S(t−1), where α is the smoothing factor (0 < α < 1, often expressed as a percentage). As α tends to 0, the weight given to the most recent observation tends to 0 and the smoothed value depends mostly on the previous smoothed value.

Let us consider an example of the price trend of a particular material over a range of dates.

 

DAY             PRICE
2-JUNE-2014     100
3-JUNE-2014     95
4-JUNE-2014     110
5-JUNE-2014     110
6-JUNE-2014     98
7-JUNE-2014     Holiday
8-JUNE-2014     Holiday
9-JUNE-2014     105
10-JUNE-2014    118

 

There is no data available for 7th June and 8th June, but the algorithm does not allow null or missing entries in the series. In such cases, if the previous observation is not available, the corresponding smoothed value is used as Xt for those entries.

Now let us convert the input table into a time series and manually apply the smoothing algorithm with a smoothing factor of 50% (0.5). The first DAY is taken as the base date.

 

DAY             Time    PRICE     Smoothed Value (St)
2-JUNE-2014     0       100       -
3-JUNE-2014     1       95        100
4-JUNE-2014     2       110       97.5
5-JUNE-2014     3       110       103.75
6-JUNE-2014     4       98        106.875
7-JUNE-2014     5       Holiday   102.4375
8-JUNE-2014     6       Holiday   102.4375
9-JUNE-2014     7       105       102.4375
10-JUNE-2014    8       118       103.71875

 

Calculation

 

  1. S(0) is not defined (the first entry has no smoothed value).
  2. S(1) = x(0) = 100 (by definition S1 = x0).
  3. S(2) = 0.5 * 95 + 0.5 * 100 = 97.5
  4. S(3) = 0.5 * 110 + 0.5 * 97.5 = 103.75
  5. S(4) = 0.5 * 110 + 0.5 * 103.75 = 106.875
  6. S(5) = 0.5 * 98 + 0.5 * 106.875 = 102.4375
  7. S(6) = 0.5 * 102.4375 + 0.5 * 102.4375 = 102.4375 (the previous observation is missing, so the previous smoothed value is used in its place)
  8. The same process continues for the upcoming entries, and values can be forecast for up to n further time periods.

Excel.JPG

Graph simulated with smoothed data in Microsoft Excel

 

PAL Implementation


(The source code from the SAP HANA PAL document, page 193, is reused to generate the code snippet below.)


-- Note: the table types PAL_SINGLESMOOTH_DATA_T and PAL_SINGLESMOOTH_RESULT_T and the
-- signature table PAL_SINGLESMOOTH_PDATA_TBL are assumed to have been created beforehand,
-- as described in the SAP HANA PAL document (page 193).
CREATE SCHEMA PAL_TRY;
SET SCHEMA PAL_TRY;
-- Drop the existing wrapper procedure, if any
CALL SYSTEM.AFL_WRAPPER_ERASER('SINGLESMOOTH_TEST_PROC');
-- Generate the PAL wrapper procedure
CALL SYSTEM.AFL_WRAPPER_GENERATOR('SINGLESMOOTH_TEST_PROC','AFLPAL','SINGLESMOOTH',PAL_SINGLESMOOTH_PDATA_TBL);
-- Control table for the PAL parameters
CREATE LOCAL TEMPORARY COLUMN TABLE #PAL_CONTROL_TBL ("NAME" VARCHAR(100),
"INTARGS" INT, "DOUBLEARGS" DOUBLE, "STRINGARGS" VARCHAR(100));
-- RAW_DATA_COL: column where the data is available
INSERT INTO #PAL_CONTROL_TBL VALUES ('RAW_DATA_COL',1,NULL,NULL);
-- ALPHA: smoothing factor
INSERT INTO #PAL_CONTROL_TBL VALUES ('ALPHA',NULL,0.5,NULL);
-- FORECAST_NUM: forecast the next 100 values (includes future values)
INSERT INTO #PAL_CONTROL_TBL VALUES ('FORECAST_NUM',100,NULL,NULL);
-- STARTTIME: ID starts from 0
INSERT INTO #PAL_CONTROL_TBL VALUES ('STARTTIME',0,NULL,NULL);
CREATE COLUMN TABLE PAL_SINGLESMOOTH_DATA_TBL LIKE PAL_SINGLESMOOTH_DATA_T;
-- Load the test data (IDs 5 and 6 are missing, matching the two holidays in the example)
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (0,100.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (1,95.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (2,110.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (3,110.5);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (4,98.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (7,105.0);
INSERT INTO PAL_SINGLESMOOTH_DATA_TBL VALUES (8,118.0);
CREATE COLUMN TABLE PAL_SINGLESMOOTH_RESULT_TBL LIKE PAL_SINGLESMOOTH_RESULT_T;
-- Execute the procedure and read the result
CALL _SYS_AFL.SINGLESMOOTH_TEST_PROC(PAL_SINGLESMOOTH_DATA_TBL,"#PAL_CONTROL_TBL",PAL_SINGLESMOOTH_RESULT_TBL) WITH OVERVIEW;
SELECT * FROM PAL_SINGLESMOOTH_RESULT_TBL;

The output table PAL_SINGLESMOOTH_RESULT_TBL will contain the smoothed data, which can be used for analysis or presentation purposes.

 

Regards

Sreehari V Pillai
