This article is about speeding up the import of CSV files into SAP HANA. It examines which factors and methods affect import speed.

Hardware factors

The maximum import speed of SAP HANA is bounded by the hardware configuration. Whatever we do at the software level, hardware remains the most important factor. Three kinds of hardware factors affect the upper limit of import speed.

Disk type

Because an SAP HANA import always involves writing the transaction log and the delta log, disk write speed matters a great deal. SSDs are recommended for both the log area and the data area of SAP HANA.

Number of CPU cores

SAP HANA can make full use of multiple cores when importing data, so the number of CPU cores largely determines import speed.

Size of memory

SAP HANA is an in-memory database whose data resides in memory. If there is not enough memory, an import will cause a memory shortage, and HANA will then unload other data that has not been used recently, which slows the import down. In addition, while the CSV files are being read, the operating system's file cache grows rapidly. When the cache becomes too large, the operating system reclaims part of it, and this process also affects the import. Based on some experiments, I recommend having free memory of roughly twice the total size of the CSV files.
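The rule of thumb above can be checked before starting an import. The following is only a minimal sketch: the function names are hypothetical, the factor of 2 is the experimental recommendation from this article rather than an official SAP sizing rule, and the free-memory figure has to be read from the HANA host by whatever means is available.

```python
import os


def total_csv_bytes(paths):
    """Sum the on-disk size of the CSV files that are about to be imported."""
    return sum(os.path.getsize(p) for p in paths)


def has_enough_free_memory(csv_bytes, free_bytes, factor=2.0):
    """Return True if free memory is at least `factor` times the CSV size.

    factor=2.0 reflects the article's rule of thumb, not an official limit.
    """
    return free_bytes >= factor * csv_bytes
```

For example, 10 GB of CSV files would call for about 20 GB of free memory under this rule.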

Import file factors

For the CSV files themselves, the following factors affect import speed:

Correct format of the import files

If a CSV file contains data that does not match the format of the table, every batch that contains such data will not be imported into the database. This reduces import speed.
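One way to avoid rejected batches is to check the file before handing it to HANA. The sketch below is an illustration only: `find_bad_rows` is a hypothetical helper, and the list of Python converters is a simplified stand-in for the real table definition (it checks field count and basic parseability, not lengths, precision, or NULL rules).

```python
import csv
import io


def find_bad_rows(csv_text, converters, delimiter=","):
    """Return (row_number, reason) for each row that would not fit the table.

    `converters` holds one callable per table column (e.g. int, float, str).
    """
    bad = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        # A wrong number of fields can never match the table.
        if len(row) != len(converters):
            bad.append((row_no, f"expected {len(converters)} fields, got {len(row)}"))
            continue
        # Report only the first unparseable column of each row.
        for col, (value, convert) in enumerate(zip(row, converters), start=1):
            try:
                convert(value)
            except ValueError:
                bad.append((row_no, f"column {col}: cannot parse {value!r}"))
                break
    return bad
```

Rows reported here can be fixed or moved to a separate file, so that the remaining batches import cleanly.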

Size of the CSV file

The CSV file needs to be large enough for SAP HANA to use multiple threads to import the data.

SAP HANA factors

In SAP HANA, data is stored not only in memory but also on disk and in log files. To reach the maximum import speed, we need to give up some settings that exist to keep the data safe.

Partition

Partitioning a table helps increase the degree of parallelism. In my experiments, hash partitioning was the best partitioning method, and a numeric column was the best choice of partitioning column.
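As an illustration, a hash-partitioned column table can be declared when the table is created. The helper below is a hypothetical sketch that only builds the SQL text. It assumes the `PARTITION BY HASH (...) PARTITIONS n` clause of SAP HANA's `CREATE COLUMN TABLE`, and the table name, column names, and partition count are invented for the example.

```python
def create_partitioned_table_sql(table, columns, partition_column, partitions):
    """Build a CREATE COLUMN TABLE statement with hash partitioning.

    `columns` is a list of (name, sql_type) pairs; all names are illustrative.
    """
    if partition_column not in dict(columns):
        raise ValueError(f"unknown partition column: {partition_column}")
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
    return (
        f'CREATE COLUMN TABLE "{table}" ({cols}) '
        f'PARTITION BY HASH ("{partition_column}") PARTITIONS {partitions}'
    )


print(create_partitioned_table_sql(
    "SALES", [("ID", "INTEGER"), ("NAME", "NVARCHAR(50)")], "ID", 4))
```

Here the numeric `ID` column is used as the partitioning column, in line with the observation above.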

Auto merge

By default, data imported into a table is stored in the delta area, and the delta area is then merged into the main area automatically. The merge itself does not import any data into the database, so we can disable auto merge to make sure that no merge operation runs during the import.
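A possible order of statements is sketched below. The helper is hypothetical and only returns SQL text. It assumes HANA's `ALTER TABLE ... DISABLE AUTOMERGE` / `ENABLE AUTOMERGE` and `MERGE DELTA OF` statements, and the manual merge after the import is my own addition so that the delta area does not stay large.

```python
def automerge_wrapped(table, import_sql):
    """Return the statements that run `import_sql` with auto merge switched off."""
    return [
        f'ALTER TABLE "{table}" DISABLE AUTOMERGE',  # no merges during the import
        import_sql,                                  # the actual import statement
        f'MERGE DELTA OF "{table}"',                 # merge once, after all data is in
        f'ALTER TABLE "{table}" ENABLE AUTOMERGE',   # restore the default behaviour
    ]


for statement in automerge_wrapped("SALES", "-- import statement goes here"):
    print(statement)
```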

Delta log

For the column store, SAP HANA writes a delta log to disk while importing data, which slows the import down. Our aim is to get the data into memory, whereas the delta log exists to make sure that the imported data will not be lost. So we can disable the delta log to improve import speed.
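Because data imported without a delta log is not protected against a crash, it seems sensible to switch the log back on as soon as the import has finished. The sketch below is a hypothetical helper that returns the statements to run before and after the import. It assumes HANA's `ALTER TABLE ... DISABLE DELTA LOG` / `ENABLE DELTA LOG`, and the closing `ALTER SYSTEM SAVEPOINT` is my assumption about how to get the already imported data persisted.

```python
def delta_log_statements(table):
    """Return (before, after) statement lists for an import without delta log."""
    before = [f'ALTER TABLE "{table}" DISABLE DELTA LOG']
    after = [
        f'ALTER TABLE "{table}" ENABLE DELTA LOG',
        "ALTER SYSTEM SAVEPOINT",  # assumption: persist what was imported unlogged
    ]
    return before, after
```

If the system fails between the two steps, the import has to be repeated, which is the price of the extra speed.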

Number of threads

To make full use of multiple cores, we can set the number of threads when importing data. In my experiments, setting the number of threads equal to the number of CPU cores worked best.
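The thread count is given in the import statement itself. The following sketch builds such a statement. `import_csv_sql` is a hypothetical helper, the path and table name are invented, and it assumes the `WITH THREADS ... BATCH ...` options of HANA's `IMPORT FROM CSV FILE`. Note that `os.cpu_count()` reports the cores of the machine running the script, so it is only a sensible default when the script runs on the HANA host; otherwise pass the server's core count explicitly.

```python
import os


def import_csv_sql(csv_path, table, threads=None, batch=10000):
    """Build an IMPORT FROM CSV FILE statement with an explicit thread count."""
    if threads is None:
        # Default to one thread per local CPU core (see the caveat above).
        threads = os.cpu_count() or 1
    return (
        f"IMPORT FROM CSV FILE '{csv_path}' INTO \"{table}\" "
        f"WITH THREADS {threads} BATCH {batch}"
    )


print(import_csv_sql("/data/sales.csv", "SALES", threads=8, batch=20000))
```

The batch size of 10000 is just a placeholder default, not a measured optimum.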