Channel: SCN : Blog List - SAP HANA Developer Center

Using HADOOP PIG to feed HANA Deltas


I read somewhere recently that some people consider HADOOP a Swiss Army Knife for solving Big Data problems.

 

It certainly has a plethora of tools, at various levels of maturity.

It's amazing how quickly these open-source tools are developing and evolving.

 

If I needed to prepare external data files for HANA, my first thought would be Excel.

As the size of the data and the frequency of loading increased, I might start thinking about SAP DataServices (BODS).

 

There's usually more than one way to crack an egg, though, so my next thought is to consider using HADOOP.

 

The following diagram illustrates just a few of HADOOP's tools:

 


In this blog I will primarily explore the use of PIG, SQOOP and OOZIE to insert delta records into HANA. [ b) & c) ]

 

For more details on using SQOOP & OOZIE with HANA see:

Exporting and Importing DATA to HANA with HADOOP SQOOP

Creating a HANA Workflow using HADOOP Oozie



For a great intro to Hadoop (including PIG), try out the Hortonworks Sandbox and follow some of their useful tutorials (Hadoop Tutorial: How to Process Data with Pig).


I don't want to reinvent the wheel completely, so please do check out the Hortonworks tutorials. They also have videos if you don't want to get your hands dirty.


Below I will briefly cover 3 scenarios:

A) Manually using PIG to reformat a file

B) Using PIG to compare files and generate a DELTA file

C) Use Oozie, Pig & Sqoop to transfer the Delta to HANA



Manually using PIG to reformat a file

1) Load your raw file using the HADOOP User Interface (HUE)

NOTE: PIG can also be used with some compressed file formats.


2) Run a Pig Script to FILTER rows and remove some columns
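
To give a rough idea of what this kind of reformatting does, here is a minimal sketch in plain Python rather than Pig. The raw file from this post is not reproduced here, so the column layout (id, value, status, comment), the sample rows and the filter condition are all assumptions made purely for illustration.

```python
import csv
import io

# Assumed raw input: these columns (id, value, status, comment) and rows
# are illustrative only, they are not the actual file used in the post.
RAW_TEXT = """1,aaaa,OK,first
2,bbbb,BAD,second
3,cccc,OK,third
"""


def reformat(raw_text):
    """Keep only rows whose status is OK and project them to (id, value)."""
    rows = csv.reader(io.StringIO(raw_text))
    # The condition plays the role of a FILTER; picking two fields
    # drops the unwanted columns.
    return [(int(r[0]), r[1]) for r in rows if r[2] == "OK"]


reformatted = reformat(RAW_TEXT)
print(reformatted)
```

In Pig the same idea would typically be a FILTER followed by a FOREACH ... GENERATE that lists only the columns to keep.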



End result


Using PIG to compare 2 files and generate a basic DELTA file


In this example I will load a new file and compare it with the file above. Where the new file contains a new key (ID), I want to generate a new DELTA file containing only the records with new keys.

The new file is:

Note from above that we have previously received the record with ID 3, so the new delta record should only be (4,dddd)


So let's use a PIG script to determine the simple DELTA

If you look closely at the logic, it resembles a Right Outer Join restricted to rows where the key of the Left table is NULL.
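
The same "right outer join where the left key is NULL" idea can be sketched in plain Python (not Pig). Only ID 3 and the record (4,dddd) come from the example above; the remaining sample rows and the function name find_delta are assumptions for illustration. The sketch uses a set lookup as a shortcut for the join, which gives the same result for a simple single-key comparison.

```python
def find_delta(previous_rows, new_rows):
    """Return the rows of new_rows whose ID is not present in previous_rows."""
    previous_ids = {row_id for row_id, _ in previous_rows}
    # Same effect as: previous RIGHT OUTER JOIN new ON id,
    # keeping only the rows where the previous (left) id is NULL.
    return [(row_id, value) for row_id, value in new_rows
            if row_id not in previous_ids]


# Assumed sample data: only ID 3 and (4, dddd) are mentioned in the post.
previous_file = [(1, "aaaa"), (2, "bbbb"), (3, "cccc")]
new_file = [(3, "cccc"), (4, "dddd")]

delta = find_delta(previous_file, new_file)
print(delta)  # [(4, 'dddd')]
```

Note that this only detects new keys. A record whose ID already exists but whose value has changed would not appear in the delta.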


The end result is:



Finally, let's combine this PIG Script with HADOOP OOZIE & SQOOP to schedule and load the DELTA to HANA.


Use Oozie, Pig & Sqoop to transfer the Delta to HANA


Prior to running a new OOZIE workflow, let's check the target table, which I manually loaded with the results of the first simple PIG script.


Now let's create & run an Oozie workflow as follows:


Step 1 - Use a Pig Script to create the Delta File

NOTE: This will execute the same script used earlier.



Step 2 - Use Sqoop to export the Delta File to HANA


Step 3 - Move the New Delta and Overwrite the previous Delta



Now let's execute the workflow and see the results




Now, finally, let's check if it made it to HANA.


SUCCESS 


 

If you give it a try, please do let me know how you get on.


