Hi All,
I’m sorry but this isn’t going to be an exciting blog about some of HANA’s cool features.
It's a bit boring, but the bread and butter of all databases is having tables of data that can be queried and joined.
For those of you just starting out, or the more adventurous who want a large volume of data to test out performance tuning options, the TPC Benchmark™ H (TPC-H) might be for you.
TPC-H provides 8 tables and a data generation tool that enables you to create as much data as you want.
Generate the test data, create the tables, import and you’re away.
The 8 tables are all designed to be joined; together they simulate sales order transactions.
The tables include (using SAP style terminology):
- Customers (Customer Master)
- Supplier (Vendor Master)
- Part (Material Master records)
- Partsupp (Inventory Balances)
- Nation (Country)
- Region
- Orders (Sales Order Header)
- Lineitem (Sales Order line items)
NOTE: The tables are NOT in the same format as the similar SAP ECC tables, but they offer information similar to what you might find in ECC.
The Table Schema is as follows:
NOTE: SF is the Scaling factor used to generate the data.
For example, a scaling factor of 1 will generate 6 million records for the lineitem table, SF 2 will generate 12 million, SF 10 will generate 60 million, SF 100 will generate 600 million, and so on.
As a quick summary you will need to:
- Log into Linux with the appropriate authorization
- Install a C/C++ compiler if you’ve not already
- Download/Extract the TPC-H files
- Modify the ‘makefile’ to be relevant for Linux
- Compile the data generation tool (DBGEN)
- Run DBGEN to create data files based on the scaling factor
- Create the tables and import them in HANA studio
If you’re still reading then I’ve not scared you off, so now for the technical bit.
Log into Linux with the appropriate authorization
Use PuTTY or an equivalent tool to log into the Linux box running HANA, at the OS level.
For those using an AWS HANA developer box, refer to http://scn.sap.com/docs/DOC-28294
Install a C/C++ compiler if you’ve not already
sudo zypper install gcc gcc-c++ gcc-fortran
cd /usr/bin/
sudo ln -s gcc cc
Download/Extract the TPC-H files
Create a new directory to store the TPC-H files, ensure there is enough space on the mounted folder, then download and extract the files.
On AWS I have chosen the following:
cd /sap
mkdir tpc-h
cd tpc-h
wget http://www.tpc.org/tpch/spec/tpch_2_15.0.zip
unzip tpch_2_15.0.zip
Modify the ‘makefile’ to be relevant for Linux
Use 'vi' or equivalent to modify the supplied 'makefile' which is used to compile the data generator tool.
In the 'makefile' ensure the following parameters are set:
DATABASE = SQLSERVER
MACHINE = LINUX
WORKLOAD = TPCH
NOTE: We use SQLSERVER above because the makefile doesn't recognise HANA as a database option.
For an intro to 'vi' you could try http://heather.cs.ucdavis.edu/~matloff/UnixAndC/Editors/ViIntro.html
At the command prompt:
cd tpch_2_15.0/dbgen
vi makefile.suite
Edit the file and save.
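If you'd rather not edit the file by hand, 'sed' can make the same three changes non-interactively. This is a minimal sketch, assuming the stock makefile.suite layout where each variable appears once at the start of a line; 'patch_makefile' is just an illustrative helper name, and you may need to adjust the patterns if your copy differs.

```shell
# patch_makefile FILE - set the three TPC-H build variables in FILE, in place.
patch_makefile() {
    sed -i \
        -e 's/^DATABASE[[:space:]]*=.*/DATABASE = SQLSERVER/' \
        -e 's/^MACHINE[[:space:]]*=.*/MACHINE = LINUX/' \
        -e 's/^WORKLOAD[[:space:]]*=.*/WORKLOAD = TPCH/' \
        "$1"
}

# Usage, from inside the dbgen directory:
# patch_makefile makefile.suite
```

Afterwards a quick `grep -E '^(DATABASE|MACHINE|WORKLOAD)' makefile.suite` confirms the values took effect.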
Compile the data generation tool (DBGEN)
Now we need to compile the generation tool to produce the executable file 'dbgen':
make -f makefile.suite
The dbgen executable should now appear in the folder.
Check with:
ls -l
Run DBGEN to create data files based on the scaling factor
Now you are ready to create 'tbl' files for each of the 8 tables.
The important parameter of the 'dbgen' program is the scaling factor.
A scaling factor of 1 will generate 1GB of uncompressed test data (e.g. 6 million rows in the 'lineitem' table).
A scaling factor of 2 will generate 2GB of uncompressed test data (e.g. 12 million rows in the 'lineitem' table).
A scaling factor of 10 will generate 10GB of uncompressed test data (e.g. 60 million rows in the 'lineitem' table).
A scaling factor of 100 will generate 100GB of uncompressed test data (e.g. 600 million rows in the 'lineitem' table).
etc.
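Since everything scales linearly, you can estimate the lineitem row count for any scaling factor up front. A quick sketch using the SF 1 baseline of 6 million rows from the table above ('rows_for_sf' is just an illustrative helper):

```shell
# rows_for_sf SF - expected lineitem row count for a given scaling factor.
# SF 1 yields 6,000,000 lineitem rows and roughly 1GB of raw data in
# total; both grow linearly with SF.
rows_for_sf() {
    echo $(( $1 * 6000000 ))
}

rows_for_sf 1    # 6000000
rows_for_sf 100  # 600000000
```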
Execute the tool with one of the following commands with different scaling factors (or your own):
#Scaling factor 1
./dbgen -vf -s 1
#Scaling factor 2
./dbgen -vf -s 2
#Scaling factor 10
./dbgen -vf -s 10
[Note: I suggest you try scaling factor 1 first, until you are comfortable with the load process]
Create the tables and import them in HANA studio
create schema "TPCH";
set schema "TPCH";
drop table nation;
drop table region;
drop table supplier;
drop table part;
drop table partsupp;
drop table customer;
drop table orders;
drop table lineitem;
create column table nation (
n_nationkey decimal(3,0) not null,
n_name char(25) not null,
n_regionkey decimal(2,0) not null,
n_comment varchar(152)
);
create column table region (
r_regionkey decimal(2,0) not null,
r_name char(25) not null,
r_comment varchar(152)
);
create column table part (
p_partkey decimal(10,0) not null,
p_name varchar(55) not null,
p_mfgr char(25) not null,
p_brand char(10) not null,
p_type varchar(25) not null,
p_size decimal(2,0) not null,
p_container char(10) not null,
p_retailprice decimal(6,2) not null,
p_comment varchar(23) not null
);
create column table supplier (
s_suppkey decimal(8,0) not null,
s_name char(25) not null,
s_address varchar(40) not null,
s_nationkey decimal(3,0) not null,
s_phone char(15) not null,
s_acctbal decimal(7,2) not null,
s_comment varchar(101) not null
);
create column table partsupp (
ps_partkey decimal(10,0) not null,
ps_suppkey decimal(8,0) not null,
ps_availqty decimal(5,0) not null,
ps_supplycost decimal(6,2) not null,
ps_comment varchar(199) not null
);
create column table customer (
c_custkey decimal(9,0) not null,
c_name varchar(25) not null,
c_address varchar(40) not null,
c_nationkey decimal(3,0) not null,
c_phone char(15) not null,
c_acctbal decimal(7,2) not null,
c_mktsegment char(10) not null,
c_comment varchar(117) not null
);
create column table orders (
o_orderkey decimal(12,0) not null,
o_custkey decimal(9,0) not null,
o_orderstatus char(1) not null,
o_totalprice decimal(8,2) not null,
o_orderdate date not null,
o_orderpriority char(15) not null,
o_clerk char(15) not null,
o_shippriority decimal(1,0) not null,
o_comment varchar(79) not null
);
create column table lineitem (
l_orderkey decimal(12,0) not null,
l_partkey decimal(10,0) not null,
l_suppkey decimal(8,0) not null,
l_linenumber decimal(1,0) not null,
l_quantity decimal(2,0) not null,
l_extendedprice decimal(8,2) not null,
l_discount decimal(3,2) not null,
l_tax decimal(3,2) not null,
l_returnflag char(1) not null,
l_linestatus char(1) not null,
l_shipdate date not null,
l_commitdate date not null,
l_receiptdate date not null,
l_shipinstruct char(25) not null,
l_shipmode char(10) not null,
l_comment varchar(44) not null
);
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/lineitem.tbl' INTO "TPCH"."LINEITEM" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/customer.tbl' INTO "TPCH"."CUSTOMER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/nation.tbl' INTO "TPCH"."NATION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/orders.tbl' INTO "TPCH"."ORDERS" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/part.tbl' INTO "TPCH"."PART" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/partsupp.tbl' INTO "TPCH"."PARTSUPP" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/region.tbl' INTO "TPCH"."REGION" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
IMPORT FROM CSV FILE '/sap/tpc-h/tpch_2_15.0/dbgen/supplier.tbl' INTO "TPCH"."SUPPLIER" WITH RECORD DELIMITED BY '\n' FIELD DELIMITED BY '|';
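Before moving on, it's worth sanity-checking the loads. Each '.tbl' file holds one record per line, so its line count should match a 'select count(*)' on the corresponding HANA table. A small sketch ('count_tbl_rows' is an illustrative helper, and the directory argument should point at wherever dbgen wrote its output):

```shell
# count_tbl_rows DIR - print the expected row count for each .tbl file
# in DIR. One line in a .tbl file corresponds to one imported record.
count_tbl_rows() {
    for f in "$1"/*.tbl; do
        printf '%s %s\n' "$(basename "$f")" "$(grep -c '' "$f")"
    done
}

# Usage:
# count_tbl_rows /sap/tpc-h/tpch_2_15.0/dbgen
```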
Congratulations!
You are now ready to start running queries.
Section 2.4 of the TPC-H specification (http://www.tpc.org/tpch/spec/tpch2.15.0.pdf) has plenty of example queries for you to try.
You may need to translate these to HANA SQL, or be more adventurous and create them with Attribute, Analytic & Calculation views to optimise performance.
NOTE: For those that successfully translate the queries to HANA SQL then please consider adding your translated SQL as a comment to this blog, to share with others.
Have fun benchmarking your results against others.
------------------------------------------------------------------------------------------
BTW: I used TPC-H data in my first blog comparing HANA to Hadoop Impala.