I think I've read somewhere recently that HADOOP is considered by some as a Swiss Army Knife for solving Big Data problems.
It certainly has a plethora of tools, at various levels of maturity.
The speed at which these open-source tools are developing and evolving is amazing.
If I needed to prepare external data files for HANA, my first thought would be Excel.
As the size of the data and the frequency of loading increased, I might start thinking of SAP DataServices (BODS).
There's usually more than one way to crack an egg though, so my next thought is to consider using HADOOP.
The following diagram illustrates just a few of HADOOP's tools:
In this blog I will primarily explore the use of PIG, SQOOP and OOZIE to insert delta records into HANA. [ b) & c) ]
For more details on using SQOOP & OOZIE with HANA see:
Exporting and Importing DATA to HANA with HADOOP SQOOP
Creating a HANA Workflow using HADOOP Oozie
For a great intro to Hadoop (including PIG), try out the Hortonworks Sandbox and follow some of their useful tutorials (e.g. Hadoop Tutorial: How to Process Data with Pig).
I don't want to reinvent the wheel completely so please do check out the Hortonworks tutorials. They also have videos if you don't want to get your hands dirty.
Below I will briefly cover 3 scenarios:
A) Manually using PIG to reformat a file
B) Using PIG to compare files and generate a DELTA file
C) Use Oozie, Pig & Sqoop to transfer Delta to HANA
Manually using PIG to reformat a file
1) Load your raw file using the Hadoop User Experience interface (HUE)
NOTE: PIG can also be used with some compressed file formats.
2) Run a Pig Script to FILTER and Remove some columns
End result
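For readers who have not used PIG before, the filter-and-project logic of step 2 can be sketched in plain Python. This is only an illustration of the idea, not the PIG script itself: the column names, the sample rows and the filter condition are all assumptions, since the real file is only shown in the screenshots.

```python
import csv
import io

# Hypothetical raw file: the column names and values below are invented
# for illustration and are not taken from the original data.
RAW = """id,name,status,comment
1,aaaa,OK,first
2,bbbb,BAD,second
3,cccc,OK,third
"""


def reformat(raw_text, keep_columns, predicate):
    """Keep rows that satisfy `predicate`, then keep only `keep_columns`."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        tuple(row[col] for col in keep_columns)
        for row in reader
        if predicate(row)
    ]


# Filter on an assumed "status" column, then drop every column except id and name.
rows = reformat(RAW, ["id", "name"], lambda r: r["status"] == "OK")
print(rows)
```

PIG does the same two things (filter the rows, then generate only the wanted columns), but runs them as jobs on the cluster, which is what makes it attractive once the files get large.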
Using PIG to compare 2 files and generate a basic DELTA file
In this example I will load a new file and compare it with the file above. Where the new file contains a new key (ID), I want to generate a DELTA file with only the new key records.
The new file is:
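Note that we have previously received the record with ID 3, so the new delta should contain only (4,dddd).
So let's use a PIG script to determine the simple DELTA.
If you look closely at the logic, it resembles a Right Outer Join where the key of the Left table is NULL: the previous file is on the left, the new file is on the right, and only new-file records without a match in the previous file are kept.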
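The same join logic can be sketched in a few lines of Python. This is a minimal illustration of the pattern rather than the PIG script: the function names are hypothetical, and the sample records are assumptions apart from the facts stated above, namely that ID 3 was already received and (4,dddd) is new.

```python
def right_outer_join(left, right):
    """Pair every record in `right` with its matches in `left` on the first
    field; records in `right` without a match get None on the left side."""
    left_by_key = {}
    for rec in left:
        left_by_key.setdefault(rec[0], []).append(rec)

    joined = []
    for r in right:
        matches = left_by_key.get(r[0])
        if matches:
            joined.extend((l, r) for l in matches)
        else:
            joined.append((None, r))
    return joined


def delta(previous, new):
    """Keep only the new records whose key was not found in the previous file."""
    return [r for l, r in right_outer_join(previous, new) if l is None]


# Assumed sample data; only ID 3 and (4, "dddd") come from the text above.
previous = [(1, "aaaa"), (2, "bbbb"), (3, "cccc")]
new = [(3, "cccc"), (4, "dddd")]
print(delta(previous, new))  # [(4, 'dddd')]
```

Note that this simple DELTA only detects new keys. A record whose key already exists but whose other fields have changed would not be picked up.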
The end result is a DELTA file containing only the new record (4,dddd).
Finally, let's combine this PIG script with HADOOP OOZIE & SQOOP to schedule and load the DELTA to HANA.
Use Oozie, Pig & Sqoop to transfer Delta to HANA
Prior to running a new OOZIE workflow, let's check the target table, which I manually loaded with the results of the first simple PIG script.
Now let's create & run an Oozie workflow as follows:
Step 1 - Use a Pig Script to create Delta File
NOTE: This will execute the same script used earlier.
Step 2 - Use Sqoop to export Delta File to HANA
Step 3 - Move the New Delta and Overwrite the previous Delta
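To give a feel for what Step 2 runs, here is a rough Python sketch that assembles the argument list for a `sqoop export` call against HANA. The helper name, host, port, user, table and HDFS paths are all placeholders I made up for illustration, and the comma delimiter is an assumption about the delta file. The flags themselves are standard Sqoop export options, and `com.sap.db.jdbc.Driver` with a `jdbc:sap://` URL is the HANA JDBC driver.

```python
def sqoop_export_args(host, port, user, password_file, table, export_dir,
                      delimiter=","):
    """Build the argument list for a `sqoop export` of a delimited HDFS
    directory into a HANA table. All values are supplied by the caller."""
    return [
        "sqoop", "export",
        "--connect", f"jdbc:sap://{host}:{port}",
        "--driver", "com.sap.db.jdbc.Driver",
        "--username", user,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,
        "--export-dir", export_dir,          # HDFS directory holding the delta file
        "--input-fields-terminated-by", delimiter,
    ]


# Placeholder values only; replace with your own system details.
args = sqoop_export_args(
    host="hana.example.com",
    port=30015,
    user="HADOOP_USER",
    password_file="/user/hue/hana.password",
    table="HADOOP.DELTA_TARGET",
    export_dir="/user/hue/delta",
)
print(" ".join(args))
```

In the Oozie workflow the same options go into the Sqoop action rather than a shell command, but it can be handy to test them from the command line first.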
Now let's execute the workflow and see the results.
Now finally let's check if it made it to HANA.
SUCCESS
If you give it a try, please do let me know how you get on.