This article looks at how to speed up importing CSV files into SAP HANA. It identifies the factors and methods that affect import speed.
Hardware factors
The hardware configuration sets the upper limit on SAP HANA import speed. Whatever we tune at the software level, hardware remains the most important factor. Three hardware factors affect the maximum import speed:
- Disk type
Importing into SAP HANA always involves writing the transaction log and the delta log, so disk write speed matters. SSDs are recommended for both the log area and the data area of SAP HANA.
- Number of CPU cores
SAP HANA can use multiple cores in parallel to import data, so the number of CPU cores largely determines import speed.
- Size of memory
SAP HANA is an in-memory database, so its data resides in memory. If there is not enough memory, the import will exhaust it. HANA will then unload other data that has not been used recently, and this slows the import. In addition, the operating system's file cache grows rapidly while the CSV files are being read. When the cache becomes too large, the operating system releases part of it, and this also interferes with the import. Based on my experiments, I recommend that the amount of free memory be roughly twice the total size of the CSV files.
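The "free memory of about twice the CSV size" rule of thumb can be checked before starting an import. The following is a minimal sketch. The function names are hypothetical. The free-memory figure is an assumption you supply yourself, for example from `free -b` on the HANA host or from HANA's own memory monitoring. The sketch does not query HANA.

```python
import os

GiB = 1024 ** 3

def total_csv_bytes(paths):
    """Sum the on-disk sizes of the CSV files that will be imported."""
    return sum(os.path.getsize(p) for p in paths)

def has_enough_free_memory(free_memory_bytes, csv_bytes, factor=2.0):
    """Rule of thumb from the text: free memory should be about twice the CSV size."""
    return free_memory_bytes >= factor * csv_bytes

# 100 GiB of CSV data calls for roughly 200 GiB of free memory.
print(has_enough_free_memory(256 * GiB, 100 * GiB))  # True
print(has_enough_free_memory(150 * GiB, 100 * GiB))  # False
```

The factor is a parameter because 2.0 is an empirical recommendation, not a hard limit.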
Import file factors
For the CSV files themselves, the following factors affect import speed:
- Correct format of the import files
If a CSV file contains data that does not match the table's format, the entire batch containing that data is not imported into the database. This reduces the effective import speed.
- Size of the CSV files
Each CSV file needs to be large enough for SAP HANA to import it with multiple threads.
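A malformed row costs the whole batch that contains it, so it can pay to validate files before importing them. The following minimal sketch checks each record's field count and tries to convert each field with a per-column converter. Assumptions: the function names are hypothetical, the converters (`int`, `str`, `float`) are only a rough stand-in for the real column types, and records are assumed to be grouped into consecutive fixed-size batches. HANA's actual batching with multiple threads may differ.

```python
import csv
import io

def find_bad_records(lines, converters, delimiter=","):
    """Return 1-based record numbers whose field count or field values do not fit."""
    bad = []
    for record_no, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if len(row) != len(converters):
            bad.append(record_no)
            continue
        try:
            for value, convert in zip(row, converters):
                convert(value)  # raises ValueError if the value does not fit
        except ValueError:
            bad.append(record_no)
    return bad

def rejected_batches(bad_records, batch_size):
    """0-based indices of the batches that would be lost, assuming consecutive batching."""
    return sorted({(r - 1) // batch_size for r in bad_records})

sample = io.StringIO("1,alice,3.5\n2,bob\nx,carol,1.0\n4,dave,2.25\n")
bad = find_bad_records(sample, [int, str, float])
print(bad)                                  # [2, 3]
print(rejected_batches(bad, batch_size=2))  # [0, 1]
```

In the example, two bad records spread over two batches would discard all four records, including the two valid ones.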
SAP HANA factors
In SAP HANA, data is stored not only in memory but also on disk and in log files. To reach the maximum import speed, we need to give up some settings that exist for data safety.
Partition
Partitioning the table increases the degree of parallelism. In my experiments, hash partitioning worked best, with a numeric column as the partitioning key.
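The following minimal sketch builds a hash-partitioned column table on a numeric key. It uses SAP HANA's `PARTITION BY HASH (...) PARTITIONS n` clause. The table name `SALES`, its columns, and the partition count of 16 are hypothetical values for illustration. The text does not say how many partitions to use. Identifiers are quoted but not otherwise sanitized.

```python
def create_partitioned_table_sql(table, columns, hash_column, partitions):
    """Build a CREATE COLUMN TABLE statement hash-partitioned on one column."""
    if hash_column not in columns:
        raise ValueError(f"unknown partition column: {hash_column}")
    cols = ", ".join(f'"{name}" {sql_type}' for name, sql_type in columns.items())
    return (f'CREATE COLUMN TABLE "{table}" ({cols}) '
            f'PARTITION BY HASH ("{hash_column}") PARTITIONS {partitions}')

# Hypothetical table with a numeric BIGINT key used as the hash column.
print(create_partitioned_table_sql(
    "SALES", {"ID": "BIGINT", "AMOUNT": "DECIMAL(15,2)"}, "ID", 16))
```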
Auto merge
By default, data imported into a table is stored in the delta area, and the delta area is then merged into the main area automatically. This merge does not import any data itself but competes with the import for resources. We can therefore disable auto merge so that no merge operations run during the import.
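The following minimal sketch produces the statements that switch auto merge off before the import and restore it afterwards. It assumes you send these strings through whatever SQL client you use, for example `hdbsql`. Running an explicit `MERGE DELTA` afterwards is my assumption, not something stated above. It moves the imported rows into the main area once the import is done. The table name is hypothetical.

```python
def automerge_statements(table):
    """Statements to run before and after an import with auto merge disabled."""
    before = [f'ALTER TABLE "{table}" DISABLE AUTOMERGE']
    after = [
        # Assumption: merge once explicitly, then hand control back to auto merge.
        f'MERGE DELTA OF "{table}"',
        f'ALTER TABLE "{table}" ENABLE AUTOMERGE',
    ]
    return before, after

before, after = automerge_statements("SALES")
for stmt in before + ["-- run the import here"] + after:
    print(stmt)
```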
Delta log
For column-store tables, SAP HANA writes the delta log to disk while importing data, and this slows the import. Our goal during the import is to get the data into memory. The delta log exists to make sure imported data is not lost. We can therefore disable the delta log to speed up the import.
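The following minimal sketch switches the delta log off for the duration of the import and back on afterwards. The table name is hypothetical. Forcing a savepoint afterwards is my assumption, not something stated above. With the delta log off, the imported rows are not in the log, so they only become durable once a savepoint writes them to the data area.

```python
def delta_log_statements(table):
    """Statements to run around an import with the delta log switched off."""
    before = [f'ALTER TABLE "{table}" DISABLE DELTA LOG']
    after = [
        f'ALTER TABLE "{table}" ENABLE DELTA LOG',
        # Assumption: persist the imported rows now, since they bypassed the delta log.
        'ALTER SYSTEM SAVEPOINT',
    ]
    return before, after

before, after = delta_log_statements("SALES")
for stmt in before + ["-- run the import here"] + after:
    print(stmt)
```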
Number of threads
To make full use of multiple cores, we can set the number of threads used for the import. In my experiments, setting the number of threads equal to the number of CPU cores worked best.
Number of tuples in a batch
SAP HANA imports data in batches, and we can set the number of tuples per batch.
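Both settings are options of HANA's `IMPORT FROM CSV FILE` statement (`THREADS` and `BATCH`). The following minimal sketch builds that statement. The file path, table name, and default batch size of 10000 are hypothetical placeholders. The text does not recommend a batch size. Defaulting the thread count to `os.cpu_count()` only matches the "threads = CPU cores" advice when the sketch runs on the HANA host itself, because the import threads run on the server, not on the client.

```python
import os

def import_csv_sql(path, table, threads=None, batch=10000, field_delimiter=","):
    """Build an IMPORT FROM CSV FILE statement with explicit THREADS and BATCH."""
    if threads is None:
        # Only meaningful on the HANA host: threads = number of CPU cores.
        threads = os.cpu_count() or 1
    return (
        f"IMPORT FROM CSV FILE '{path}' INTO \"{table}\" "
        f"WITH RECORD DELIMITED BY '\\n' "
        f"FIELD DELIMITED BY '{field_delimiter}' "
        f"THREADS {threads} BATCH {batch}"
    )

# Hypothetical values: a 16-core server and a batch of 200000 tuples.
print(import_csv_sql("/data/sales.csv", "SALES", threads=16, batch=200000))
```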
Summary
We measured import speed on two different hardware configurations:
| Hardware configuration | Import speed |
| CPU: 16 cores; memory: 256 GB; disk: SSD | 100 MB/s |
| CPU: 80 cores; memory: 1 TB; disk: SSD | |
Figure: Flow chart for accelerating the import speed.