Embulk

What’s Embulk?

Embulk is an open-source bulk data loader that helps transfer data between various databases, storage systems, file formats, and cloud services.

Embulk supports:

  • Automatic guessing of input file formats
  • Parallel & distributed execution to deal with big data sets
  • Transaction control to guarantee all-or-nothing loading
  • Resuming a failed load
  • Plugins released on RubyGems.org

You define a bulk data load by combining input and output plugins:

[Figure: Embulk architecture]

For example, this tutorial describes how to use the file input plugin with the CSV parser plugin and the gzip decoder plugin to read CSV files, and the Elasticsearch output plugin to load the records into Elasticsearch.
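
As a rough illustration of how such a combination is expressed, the sketch below shows what an Embulk configuration file for this pipeline might look like. The file path, column names, host, and index name are hypothetical placeholders, and the option names for the Elasticsearch output are assumptions that may differ between plugin versions, so check the plugin's own documentation before relying on them.

```yaml
# Hypothetical config.yml: gzip-compressed CSV files -> Elasticsearch.
# Paths, columns, host, and index below are illustrative assumptions.
in:
  type: file                      # file input plugin
  path_prefix: /path/to/csv/sample_
  decoders:
  - {type: gzip}                  # gzip decoder plugin
  parser:
    type: csv                     # CSV parser plugin
    charset: UTF-8
    delimiter: ','
    skip_header_lines: 1
    columns:
    - {name: id, type: long}
    - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
    - {name: comment, type: string}
out:
  type: elasticsearch             # Elasticsearch output plugin
  nodes:
  - {host: localhost, port: 9200}
  index: embulk
```

The `in` section chains the decoder and parser onto the file input, while the `out` section names the destination, so swapping either side for a different plugin leaves the other unchanged.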

Documents