How to do bulk (multi-row) inserts with JpaRepository?

When I call the saveAll method of my JpaRepository with a long List<Entity> from the service layer, Hibernate's trace logging shows a single SQL statement being issued per entity.

Can I force it to do a bulk (i.e. multi-row) insert without manually fiddling with EntityManager, transactions, etc., or even raw SQL statement strings?

By multi-row insert I do not just mean going from:

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but instead to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

In PROD I am using CockroachDB, and the difference in performance is significant.

Below is a minimal example that reproduces the problem (using H2 for simplicity).


Application code (Kotlin):

package things


import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue


interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa


hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create


spring.jpa.generate-ddl = true
spring.jpa.show-sql = true


spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

Gradle build script (Kotlin DSL):

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile


plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run:

./gradlew bootRun

Trigger the DB inserts:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

You can configure Hibernate to do bulk DML. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think section 2 of the answer could solve your problem:

Enable the batching of DML statements. Enabling batching support results in fewer round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the hibernate properties differently in your application.properties file. They are under the namespace: spring.jpa.properties.*. An example could look like the following:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....
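
For context, batching at the JDBC level means queuing parameter sets with PreparedStatement.addBatch() and sending them together with executeBatch(). The following is a minimal sketch of that pattern, not Hibernate's actual implementation. The class JdbcBatchSketch, its method names, and the proxy-based stand-in statement are made up for illustration so that the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.atomic.AtomicInteger;

class JdbcBatchSketch {

    /** Queues each row with addBatch() and sends the queue every batchSize rows. */
    static void insertInBatches(PreparedStatement ps, int[][] rows, int batchSize)
            throws SQLException {
        int pending = 0;
        for (int[] row : rows) {
            ps.setInt(1, row[0]);
            ps.setInt(2, row[1]);
            ps.addBatch();
            if (++pending == batchSize) {
                ps.executeBatch();
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch(); // send the last, partially filled batch
        }
    }

    /** Counts executeBatch() calls for rowCount rows, using a stand-in statement. */
    static int countRoundTrips(int rowCount, int batchSize) {
        AtomicInteger executeBatchCalls = new AtomicInteger();
        // Assumption for illustration: a dynamic proxy replaces a real statement.
        PreparedStatement fake = (PreparedStatement) Proxy.newProxyInstance(
                JdbcBatchSketch.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, methodArgs) -> {
                    if (method.getName().equals("executeBatch")) {
                        executeBatchCalls.incrementAndGet();
                        return new int[0];
                    }
                    return null; // setInt() and addBatch() return void
                });
        try {
            insertInBatches(fake, new int[rowCount][2], batchSize);
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return executeBatchCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countRoundTrips(14, 10)); // prints 2
        System.out.println(countRoundTrips(14, 1));  // prints 14
    }
}
```

With the 14 entities from the question and a batch size of 10, the sketch makes two executeBatch() calls instead of 14 single executions. Whether a batch ends up as one multi-row INSERT on the wire is up to the JDBC driver.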

To get a bulk insert with Spring Boot and Spring Data JPA you need only two things:

  1. Set the option spring.jpa.properties.hibernate.jdbc.batch_size to the value you need (for example: 20).

  2. Use the saveAll() method of your repository with the list of entities prepared for insertion.

Working example is here.

Regarding the transformation of the insert statement into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

this is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in the JDBC connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

The JDBC driver will then perform this transformation.
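
To make the shape of the rewritten statement concrete, here is a naive string-level sketch that turns a single-row parameterized INSERT into a multi-row one. It is only an illustration of the result; the real driver parses the SQL itself and may group rows differently. The class MultiRowInsertSketch and its rewrite method are hypothetical names.

```java
import java.util.Collections;
import java.util.Locale;

class MultiRowInsertSketch {

    /**
     * Repeats the VALUES tuple of a single-row insert rowCount times.
     * Assumes the statement ends with exactly one "values (...)" clause.
     */
    static String rewrite(String singleRowInsert, int rowCount) {
        if (rowCount < 1) {
            throw new IllegalArgumentException("rowCount must be at least 1");
        }
        String keyword = "values";
        int idx = singleRowInsert.toLowerCase(Locale.ROOT).lastIndexOf(keyword);
        if (idx < 0) {
            throw new IllegalArgumentException("no VALUES clause found");
        }
        String head = singleRowInsert.substring(0, idx + keyword.length());
        String tuple = singleRowInsert.substring(idx + keyword.length()).trim();
        return head + " " + String.join(", ", Collections.nCopies(rowCount, tuple));
    }

    public static void main(String[] args) {
        // prints: insert into thing (value, id) values (?, ?), (?, ?), (?, ?)
        System.out.println(rewrite("insert into thing (value, id) values (?, ?)", 3));
    }
}
```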

Additional information about batching can be found here.

UPDATED

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATED

Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.

The underlying issue is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}
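
Whether persist or merge is chosen depends on entityInformation.isNew(entity). The sketch below paraphrases, from memory, the default check Spring Data applies when the entity neither implements Persistable nor has a version attribute. The class name IsNewSketch is hypothetical and the exact rules are an assumption to verify against your Spring Data version. Under these rules the question's entity, whose primitive id defaults to -1, would not count as new, which would explain the SELECT issued before each insert in the log.

```java
class IsNewSketch {

    /**
     * Simplified paraphrase of the default "is this entity new?" check:
     * object ids are new when null, primitive numeric ids are new when 0.
     */
    static boolean isNew(Object id, Class<?> idType) {
        if (!idType.isPrimitive()) {
            return id == null;
        }
        if (id instanceof Number) {
            return ((Number) id).longValue() == 0L;
        }
        throw new IllegalArgumentException("Unsupported primitive id type " + idType);
    }

    public static void main(String[] args) {
        System.out.println(isNew(null, Long.class)); // true: persist() would be used
        System.out.println(isNew(-1L, long.class));  // false: merge() would be used
        System.out.println(isNew(0L, long.class));   // true: persist() would be used
    }
}
```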

In addition to the batch size property settings, you have to make sure that SimpleJpaRepository calls persist and not merge. There are a few approaches to resolve this. Use an @Id generator that does not query a sequence, for example:

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: UUID? = null // the uuid2 generator produces UUID values, so the id cannot be a Long

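Or force the persistence layer to treat the records as new by having your entity implement Persistable and overriding isNew():

import javax.persistence.*;
import org.springframework.data.domain.Persistable;

@Entity
public class Thing implements Persistable<Long> {
    private int value;

    @Id
    @GeneratedValue
    private Long id;

    // Not persisted; stays true until the entity has been saved or loaded.
    @Transient
    private boolean isNew = true;

    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }

    @Override
    public Long getId() {
        return id;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }
}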

Or override the save(List) and use the entity manager to call persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;

    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager = entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

All the methods mentioned work, but they will be slow, especially if the source of the inserted data lies in another table. First, even with batch_size > 1 the insert operation is executed as multiple SQL queries. Second, if the source data lies in another table, you need to fetch it with additional queries (in the worst case loading all of it into memory) and convert it to static bulk inserts. Third, with a separate persist() call for each entity (even if batching is enabled) you bloat the entity manager's first-level cache with all these entity instances.
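
The third point is commonly mitigated by calling flush() and clear() after every batch_size entities. Below is a minimal sketch of that loop. To keep it runnable without a JPA provider, the interface PersistenceOps is a hypothetical stand-in for the persist, flush and clear methods of EntityManager, and RecordingOps is a fake that only counts calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class FlushClearSketch {

    /** Hypothetical stand-in for the three EntityManager methods used below. */
    interface PersistenceOps {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /** Persists all entities, flushing and clearing after every batchSize entities. */
    static void persistInChunks(PersistenceOps em, List<?> entities, int batchSize) {
        int count = 0;
        for (Object entity : entities) {
            em.persist(entity);
            if (++count % batchSize == 0) {
                em.flush(); // send the queued inserts
                em.clear(); // detach them so the first-level cache stays small
            }
        }
        if (count % batchSize != 0) {
            em.flush(); // handle the last, partially filled chunk
            em.clear();
        }
    }

    /** In-memory fake that records flushes and the peak number of managed entities. */
    static class RecordingOps implements PersistenceOps {
        private final List<Object> managed = new ArrayList<>();
        int flushes = 0;
        int peakManaged = 0;

        public void persist(Object entity) {
            managed.add(entity);
            peakManaged = Math.max(peakManaged, managed.size());
        }

        public void flush() { flushes++; }

        public void clear() { managed.clear(); }
    }

    /** Runs the loop against the fake and returns {flushes, peakManaged}. */
    static int[] simulate(int entityCount, int batchSize) {
        RecordingOps ops = new RecordingOps();
        persistInChunks(ops, Collections.nCopies(entityCount, "entity"), batchSize);
        return new int[] {ops.flushes, ops.peakManaged};
    }

    public static void main(String[] args) {
        int[] result = simulate(14, 10);
        // prints: 2 flushes, peak 10 managed entities
        System.out.println(result[0] + " flushes, peak " + result[1] + " managed entities");
    }
}
```

This keeps memory bounded, but it still issues one insert per entity, so the first two points remain.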

But there's another option for Hibernate. If you use Hibernate as the JPA provider, you can fall back to HQL, which natively supports bulk inserts with a subselect from another table. For example:

// Unwrap the Hibernate Session from the JPA EntityManager
Session session = entityManager.unwrap(Session.class);
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
    .executeUpdate();

Whether this will work depends on your ID generation strategy. If the Entity.id is generated by the database (for example MySQL auto increment), it will be executed successfully. If the Entity.id is generated by your code (especially true for UUID generators), it will fail with "unsupported id generation method" exception.

However, in the latter scenario the problem can be overcome with a custom SQL function. For example, in PostgreSQL I use the uuid-ossp extension, which provides the uuid_generate_v4() function, and I register that function in my custom dialect:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction("uuid_generate_v4",
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

Then I register this class as the Hibernate dialect:

hibernate.dialect=MyPostgresDialect

Finally I can use this function in bulk insert query:

Session session = entityManager.unwrap(Session.class);
session.createQuery("insert into Entity (id, field1, field2) " +
        "select uuid_generate_v4(), [...] from [...]")
    .executeUpdate();

Most importantly, the underlying SQL that Hibernate generates for this operation is just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]

I faced the same issue, but I could not tell from the logged Hibernate queries whether they were batched, because the SQL log does not reflect what is actually sent to the database. To verify that batching takes place, enable statistics generation with spring.jpa.properties.hibernate.generate_statistics=true. You will then see:

[Image: Hibernate session statistics log]

When you add spring.jpa.properties.hibernate.jdbc.batch_size=100, you will start to see differences, such as fewer JDBC statements and more JDBC batches:

[Image: Hibernate session statistics log with batching enabled]