Channel: SCN : Blog List - SAP HANA Developer Center

Bread & Butter: 8 Tables with ‘All you can eat’ Data


Hi All,

 

I’m sorry but this isn’t going to be an exciting blog about some of HANA’s cool features.

 

It's a bit boring, but the Bread & Butter of all databases is having tables with data that can be queried and joined.

 

For those of you just starting, or those who are more adventurous and want a large volume of data to test out performance tuning options, the TPC Benchmark™H (TPC-H) might be for you.

http://www.tpc.org/tpch/

 

TPC-H provides 8 tables and a data generation tool that enables you to create as much data as you want.

 

Generate the test data, create the tables, import and you’re away.

 

The 8 tables are all designed to be joined. They simulate Sales Order transactions:

The tables include (using SAP style terminology):

  1.        Customer (Customer Master)
  2.        Supplier (Vendor Master)
  3.        Part (Material Master records)
  4.        Partsupp (Inventory Balances)
  5.        Nation (Country)
  6.        Region
  7.        Orders (Sales Order Header)
  8.        Lineitem (Sales Order line items)

 

NOTE: The tables are NOT in the same format as the corresponding SAP ECC tables, but they offer similar information to what you might find in ECC.

 

The Table Schema is as follows:

[Image 1.png: TPC-H table schema showing the 8 tables and their join keys]

NOTE: SF is the scaling factor used to generate the data. For example, a scaling factor of 1 will generate 6 million records for the lineitem table, SF 2 will generate 12 million, SF 10 will generate 60 million, SF 100 will generate 600 million, etc.

 

As a quick summary you will need to:

  •      Log into Linux with the appropriate authorization
  •      Install a C/C++ compiler if you’ve not already
  •      Download/Extract the TPC-H files
  •      Modify the ‘makefile’  to be relevant for Linux
  •      Compile the data generation tool (DBGEN)
  •      Run DBGEN to create data files based on the scaling factor
  •      Create the tables and import them in HANA studio

 

If you’re still reading then I’ve not scared you off, so now for the technical bit.

 

Log into Linux with the appropriate authorization

Use Putty or an equivalent tool to log into the Linux box running HANA, at the OS level.

For those using an AWS HANA developer box, refer to http://scn.sap.com/docs/DOC-28294

 

Install a C/C++ compiler if you’ve not already

sudo zypper install gcc gcc-c++ gcc-fortran

cd /usr/bin/

sudo ln -s gcc cc

NOTE: The makefile invokes the compiler as 'cc', so we link gcc to cc.

 

Download/Extract the TPC-H files

Create a new directory to store the TPC-H files, ensure there is enough space on the mounted volume, then download and extract the files.

On AWS I have chosen the following:

cd /sap

mkdir tpc-h

cd tpc-h

wget http://www.tpc.org/tpch/spec/tpch_2_15.0.zip

unzip tpch_2_15.0.zip
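Since dbgen writes roughly SF gigabytes of flat files, a quick free-space check on the target volume before generating is worthwhile. A sketch (SF=1 is an assumption; set it to whatever you plan to generate):

```shell
# Rough free-space check: dbgen produces ~SF GB of uncompressed .tbl files.
SF=1
need_kb=$((SF * 1024 * 1024))
free_kb=$(df -Pk . | awk 'NR==2 {print $4}')
if [ "$free_kb" -ge "$need_kb" ]; then
  echo "OK: ${free_kb} KB free, ${need_kb} KB needed"
else
  echo "WARNING: only ${free_kb} KB free, ${need_kb} KB needed"
fi
```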

 

Modify the ‘makefile’  to be relevant for Linux

Use 'vi' or an equivalent editor to modify the supplied 'makefile.suite', which is used to compile the data generator tool.

In the file, ensure the following parameters are set:

DATABASE=SQLSERVER

MACHINE = LINUX

WORKLOAD = TPCH

NOTE: We use SQLSERVER above because the tool doesn't recognise HANA.

 

For an intro to  'vi' you could try  http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html

 

At the command prompt:

cd tpch_2_15.0/dbgen

vi makefile.suite

 

Edit the file and save.
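If you'd rather not use vi, the same four variables can be set non-interactively with sed. A sketch, demonstrated on a minimal stand-in file (run the same sed expressions against the real makefile.suite in tpch_2_15.0/dbgen; the stand-in assumes the variables appear at the start of a line, as in the 2.15 makefile layout):

```shell
# Set the four build variables with sed instead of editing by hand.
# MF points at a throwaway stand-in here; substitute makefile.suite.
MF=$(mktemp)
printf 'CC      =\nDATABASE=\nMACHINE =\nWORKLOAD =\n' > "$MF"
sed -i \
  -e 's/^CC *=.*/CC       = gcc/' \
  -e 's/^DATABASE *=.*/DATABASE = SQLSERVER/' \
  -e 's/^MACHINE *=.*/MACHINE  = LINUX/' \
  -e 's/^WORKLOAD *=.*/WORKLOAD = TPCH/' "$MF"
cat "$MF"
```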

 

Compile the data generation tool (DBGEN)

Now we need to compile the generation tool, to produce the executable file 'dbgen':

make -f makefile.suite

 

The dbgen executable should now appear in the folder.

Check with:

ls -l

 

Run DBGEN to create data files based on the scaling factor

Now you are ready to create 'tbl'  files for each of the 8 tables.

The important parameter of the 'dbgen'  program is the scaling factor.

A scaling factor of 1 will generate ~1 GB of uncompressed test data (e.g. 6 million rows in the lineitem table).

A scaling factor of 2 will generate ~2 GB of uncompressed test data (e.g. 12 million rows in the lineitem table).

A scaling factor of 10 will generate ~10 GB of uncompressed test data (e.g. 60 million rows in the lineitem table).

A scaling factor of 100 will generate ~100 GB of uncompressed test data (e.g. 600 million rows in the lineitem table).

etc.
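Since the lineitem row count scales linearly at 6 million rows per unit of SF, you can predict the volumes up front:

```shell
# Predicted lineitem row counts: 6,000,000 rows per unit of scaling factor.
for SF in 1 2 10 100; do
  rows=$((SF * 6000000))
  echo "SF=$SF -> $rows lineitem rows"
done
```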

 

Execute the tool with one of the following commands with different scaling factors (or your own):

#Scaling factor 1

./dbgen -vf -s 1

 

#Scaling factor 2

./dbgen -vf -s 2

 

#Scaling factor 10

./dbgen -vf -s 10

 

 

[Note: I suggest you try scaling factor 1 first, until you are comfortable with the load process]
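Once dbgen finishes, it's worth a quick sanity check that all eight .tbl files exist and that lineitem.tbl has roughly the expected row count. A sketch, run from the dbgen directory (SF=1 is an assumption; adjust to match your run):

```shell
# Verify the 8 .tbl files were generated; lineitem.tbl should have ~SF * 6M rows.
SF=1
expected=$((SF * 6000000))
for t in nation region supplier part partsupp customer orders lineitem; do
  if [ -f "$t.tbl" ]; then
    echo "$t.tbl: $(wc -l < "$t.tbl") rows"
  else
    echo "$t.tbl: missing"
  fi
done
echo "expected lineitem rows: ~$expected"
```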

 

Create the tables and import them in HANA studio

create schema "TPCH";

set schema "TPCH";

drop table nation;
drop table region;
drop table supplier;
drop table part;
drop table partsupp;
drop table customer;
drop table orders;
drop table lineitem;

create column table nation (
n_nationkey  decimal(3,0) not null,
n_name       char(25) not null,
n_regionkey  decimal(2,0) not null,
n_comment    varchar(152)
);

create column table region (
r_regionkey  decimal(2,0) not null,
r_name       char(25) not null,
r_comment    varchar(152)
);

create column table part (
p_partkey     decimal(10,0) not null,
p_name        varchar(55) not null,
p_mfgr        char(25) not null,
p_brand       char(10) not null,
p_type        varchar(25) not null,
p_size        decimal(2,0) not null,
p_container   char(10) not null,
p_retailprice decimal(6,2) not null,
p_comment     varchar(23) not null
);

create column table supplier (
s_suppkey     decimal(8,0) not null,
s_name        char(25) not null,
s_address     varchar(40) not null,
s_nationkey   decimal(3,0) not null,
s_phone       char(15) not null,
s_acctbal     decimal(7,2) not null,
s_comment     varchar(101) not null
);

create column table partsupp (
ps_partkey     decimal(10,0) not null,
ps_suppkey     decimal(8,0) not null,
ps_availqty    decimal(5,0) not null,
ps_supplycost  decimal(6,2) not null,
ps_comment     varchar(199) not null
);

create column table customer (
c_custkey     decimal(9,0) not null,
c_name        varchar(25) not null,
c_address     varchar(40) not null,
c_nationkey   decimal(3,0) not null,
c_phone       char(15) not null,
c_acctbal     decimal(7,2) not null,
c_mktsegment  char(10) not null,
c_comment     varchar(117) not null
);

create column table orders (
o_orderkey       decimal(12,0) not null,
o_custkey        decimal(9,0) not null,
o_orderstatus    char(1) not null,
o_totalprice     decimal(8,2) not null,
o_orderdate      date not null,
o_orderpriority  char(15) not null,
o_clerk          char(15) not null,
o_shippriority   decimal(1,0) not null,
o_comment        varchar(79) not null
);

create column table lineitem (
l_orderkey    decimal(12,0) not null,
l_partkey     decimal(10,0) not null,
l_suppkey     decimal(8,0) not null,
l_linenumber  decimal(1,0) not null,
l_quantity    decimal(2,0) not null,
l_extendedprice  decimal(8,2) not null,
l_discount    decimal(3,2) not null,
l_tax         decimal(3,2) not null,
l_returnflag  char(1) not null,
l_linestatus  char(1) not null,
l_shipdate    date not null,
l_commitdate  date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode     char(10) not null,
l_comment      varchar(44) not null
);

IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/lineitem.tbl' INTO "TPCH"."LINEITEM" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/customer.tbl' INTO "TPCH"."CUSTOMER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/nation.tbl' INTO "TPCH"."NATION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/orders.tbl' INTO "TPCH"."ORDERS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/part.tbl' INTO "TPCH"."PART" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/partsupp.tbl' INTO "TPCH"."PARTSUPP" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/region.tbl' INTO "TPCH"."REGION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/supplier.tbl' INTO "TPCH"."SUPPLIER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
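Once the imports complete, a quick row-count check in the SQL console confirms everything loaded. The expected counts in the comments are the TPC-H cardinalities, which scale linearly with SF except for nation and region (which are fixed):

```sql
-- Row counts per table; comments show the expected TPC-H cardinalities.
select 'NATION'   as tab, count(*) as cnt from "TPCH"."NATION"   -- 25 (fixed)
union all select 'REGION',   count(*) from "TPCH"."REGION"       -- 5 (fixed)
union all select 'SUPPLIER', count(*) from "TPCH"."SUPPLIER"     -- 10,000 x SF
union all select 'PART',     count(*) from "TPCH"."PART"         -- 200,000 x SF
union all select 'PARTSUPP', count(*) from "TPCH"."PARTSUPP"     -- 800,000 x SF
union all select 'CUSTOMER', count(*) from "TPCH"."CUSTOMER"     -- 150,000 x SF
union all select 'ORDERS',   count(*) from "TPCH"."ORDERS"       -- 1,500,000 x SF
union all select 'LINEITEM', count(*) from "TPCH"."LINEITEM";    -- ~6,000,000 x SF
```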

 

 

 

Congratulations!

You are now ready to start running queries. 

 

Section 2.4 of the TPC-H specification (http://www.tpc.org/tpch/spec/tpch2.15.0.pdf) has plenty of query examples for you to try.

You may need to translate these to HANA SQL or be more adventurous and create them with Attribute, Analytic & Calculation views to optimise performance.
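As a starting point, here is a sketch of Q1 (the pricing summary report) translated for HANA. The only change from the spec's syntax is the date arithmetic: the spec's "date '1998-12-01' - interval '90' day" is written with ADD_DAYS. This is my own translation, so verify the results against the specification before benchmarking:

```sql
-- TPC-H Q1 (pricing summary report), date arithmetic adapted for HANA.
select
  l_returnflag,
  l_linestatus,
  sum(l_quantity)                                       as sum_qty,
  sum(l_extendedprice)                                  as sum_base_price,
  sum(l_extendedprice * (1 - l_discount))               as sum_disc_price,
  sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
  avg(l_quantity)                                       as avg_qty,
  avg(l_extendedprice)                                  as avg_price,
  avg(l_discount)                                       as avg_disc,
  count(*)                                              as count_order
from "TPCH"."LINEITEM"
where l_shipdate <= add_days(to_date('1998-12-01'), -90)
group by l_returnflag, l_linestatus
order by l_returnflag, l_linestatus;
```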

NOTE: For those that successfully translate the queries to HANA SQL then please consider adding your translated SQL as a comment to this blog, to share with others.

 

Have fun benchmarking your results with others.

 

 

------------------------------------------------------------------------------------------

BTW:  I used TPC-H data in my first blog comparing HANA to HADOOP IMPALA.

http://scn.sap.com/community/developer-center/hana/blog/2013/05/30/big-data-analytics-hana-vs-hadoop-impala-on-aws

