embulk

Embulk v0.11 is coming soon

(A Japanese version of this article is at: https://zenn.dev/dmikurube/articles/embulk-v0-11-is-coming-soon-ja)

It has been a long time since we started Embulk v0.10 as a “development series”. 1

We will soon release Embulk v0.11, the next “stable” versions. This article introduces some notes on migration to the v0.11 series. 2

Embulk v0.10.50

First, as of today, Embulk v0.10.50 has been released.

This v0.10.50 is intended to be the final version of the v0.10 series, and is effectively the Release Candidate for v0.11.0. If no problems are found with it, v0.11.0 will be released the same as v0.10.50.

As of publishing this article, we intended to have v0.10.48 as the final version. But, we had a couple of updates after publishing.

  • v0.10.49: Fix for newer Java
  • v0.10.50: Fix for explicit null in CSV quote and escape

So, please try v0.10.50 before v0.11.0 is released. And, if you find any problem with v0.10.50, please report it to our User forum. We plan to release v0.11.0 in June 2023 if no significant problems are found.

The Embulk v0.11 series contains a number of changes that are incompatible with the previous v0.9 series. Many older plugins are no longer considered to work. The major plugins maintained by the Embulk project (https://github.com/embulk) are ready for the v0.11 series, but there would be many plugins that are not yet ready.

Almost all the existing plugins can be ready for v0.11 if the developers of them follow the instructions in this article.

There may be cases where the Embulk core needs to be changed for some plugins to support v0.11. If it turns out before the stable version v0.11.0 is released, we would likely be able to help such cases by changing the Embulk core.

After v0.11.0 is officially released, it could happen that “the part of the Embulk core can no longer be changed”, and there would be no way. Please try v0.10.50 before it happens.

Plugin’s v0.11 support

As mentioned above, many plugins are expected not to work with v0.10.50 (v0.11.0). Most of them should work again if the plugins are updated to support v0.11. Please contact the developer of each plugin.

You can report it in our User forum, but it is up to the developer of each plugin whether to release its v0.11-compatible version or not.

Download

The selfupdate command, used to update Embulk, is now deprecated.

  • The selfupdate command alone to update to the latest version is already disabled.
  • The selfupdate X.Y.Z command to update to a specified version is still effective, but will eventually become obsolete.

Choose an appropriate site for you to download from the following options.

  • Download from GitHub Releases with confirming the version.
    • You can confirm changes in the version.
    • This will be appropriate if you download it manually.
  • Download from a version-specific URL like: https://dl.embulk.org/embulk-X.Y.Z.jar
  • Download the latest version from https://dl.embulk.org/embulk-latest.jar.
    • This is a redirect to https://dl.embulk.org/embulk-X.Y.Z.jar above.
    • This URL still points v0.9.24 as v0.10 is a development series.
    • We will update this URL in a while after v0.11.0 is released.

Jfyi, the file of Embulk v0.10.50 (embulk-0.10.50.jar) is 6.39 MB, while Embulk v0.9.24 (embulk-0.9.24.jar) was 42.4 MB. It is much smaller.

Changes in the command line

You will see the following message once you rename the downloaded file embulk-X.Y.Z.jar to embulk, and run it, in the same way as before.

This is a warning message that you will not be able to run Embulk with a command like: $ embulk run ...

================================== [ NOTICE ] ==================================
 Embulk will not be executable as a single command, such as 'embulk run'.
 It will happen at some point in v0.11.*.

 Get ready for the removal by running Embulk with your own 'java' command line.
 Running Embulk with your own 'java' command line has already been available.

 For instance in Java 1.8 :
  java -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -jar embulk-X.Y.Z.jar run ...
  java -XX:+AggressiveOpts -XX:+TieredCompilation -XX:TieredStopAtLevel=1 -Xverify:none -jar embulk-X.Y.Z.jar guess ...

 See https://github.com/embulk/embulk/issues/1496 for the details.
================================================================================

You can still run Embulk like $ embulk run ... as of v0.10.50 and v0.11.0. You don’t need to change the command line immediately at the same time as trying v0.10.50 or migrating to v0.11. However, a command line like $ embulk run ... will become unavailable during the v0.11 series shortly. Please now start preparing your startup script, or command line starting with java.

Commands like $ embulk run ... will be unsupported so that Embulk will support newer Java, as explained in the next section. The embulk command has been implemented by an embedded shell script. However, it turned out to be hard to support various command lines (options) of varieties of Java versions by varieties of distributors after Java 9 in the shell script. We concluded that it would be impossible to keep the embedded shell script.

Please start preparing your own startup script, or command line, appropriate to your Java version in use.

Support for newer Java

Unfortunately, Embulk supports only Java 8 officially, as of v0.10.50 and v0.11.0. However, we believe that Embulk should work basically with Java 11 and Java 17 at this point. If you are interested, please give it a try.

We plan to expand testing with newer Java versions, along with moving forward with Embulk v0.11, so that we can officially announce that Embulk supports Java 11 and Java 17 at some point in the future.

There are some cases in that plugins are not compatible with newer Java. 3 Such a plugin would need some work.

Embulk System Properties

Embulk System Properties has been added, a mechanism to configure global Embulk settings. They can be configured by a Java properties file embulk.properties, and overridden with a command line option -Xkey=value.

There are several possible locations for embulk.properties (see below). But as a starting point when migrating to v0.10.50 or v0.11, an easy option is to start from ~/.embulk/embulk.properties under ~/.embulk/. ~/.embulk/ is the directory where plugins have been installed even until v0.9.

JRuby

Embulk v0.10.50 and v0.11.0 do not include JRuby. If you wanted to use an Embulk plugin in the Ruby gem format (in almost every case at this time), you need to download JRuby separately, and configure your Embulk System Properties to use it. (See below in particular.)

It may seem annoying to you at a glance. But, please consider it a chance for you to start using an arbitrary JRuby version (at your own risk) as you need, whereas you had not been able to upgrade JRuby embedded in Embulk so far. Some newer JRuby versions may not work well as we do not test with many JRuby versions. But, now, you should have freedom much more improved.

In the long term, however, we plan to gradually shrink our support on (J)Ruby. 4 We will make the Maven-format plugins, which do not require JRuby, easier to use so that they will be the main plugin format in v0.11 and later.

Example: set up with JRuby

Let’s set up with JRuby 9.1.15.0, which is quite old, but the same with Embulk v0.9.

First, download jruby-complete-9.1.15.0.jar from JRuby Files/downloads/9.1.15.0, and place it in an appropriate directory.

Then, specify the location of the jruby-complete-9.1.15.0.jar as a file: URL in ~/.embulk/embulk.properties as follows. (Replace /home/user/ appropriately.)

jruby=file:///home/user/jruby-complete-9.1.15.0.jar

This is the first setup to use Ruby gem-format plugins in Embulk v0.10.50 and v0.11.

embulk.gem

To use Ruby gem-format plugins with JRuby, you also need to install embulk.gem. It must be the same version as the Embulk core.

# If Embulk is v0.10.50, embulk.gem must be 0.10.50, too.
$ java -jar embulk-0.10.50.jar gem install embulk -v 0.10.50

This embulk.gem is completely different from embulk.gem until v0.8. We are considering an idea to yank the old embulk.gem until v0.8 from rubygems.org, as we see some confusion due to this. If you have any thoughts, please leave a comment in GitHub Discussions at:

https://github.com/orgs/embulk/discussions/3

The embulk.gem depends on msgpack.gem. You may need to install it explicitly in some cases.

Bundler and Liquid

You’ll need to install Bundler or Liquid explicitly if you are going to use them.

$ java -jar embulk-0.10.50.jar gem install bundler -v 1.16.0
$ java -jar embulk-0.10.50.jar gem install liquid -v 4.0.0

However, we have not done much testing with them either. The same versions of Bundler 1.16.0 and Liquid 4.0.0 are working now, and new versions would probably work as well. But please try their new versions at your own risk. (Feedback is welcome.)

Embulk home

Embulk v0.10.50 and v0.11.0 have a directory setting called “Embulk home”. This is mostly equivalent to ~/.embulk/ up to v0.9, and still defaults to ~/.embulk/. In the example of Embulk System Properties above, we started from putting embulk.properties in ~/.embulk/.

You can change this “Embulk home” to another directory from ~/.embulk/, and load embulk.properties from there. Embulk home is chosen in the following order of precedence.

  1. Embulk home can be set from an Embulk System Property embulk_home in a command line option -X.
    • It would be an absolute path, or a relative path from the current working directory.
    • java -jar embulk-0.10.50.jar -Xembulk_home=/var/tmp/foo run ... (absolute path)
    • java -jar embulk-0.10.50.jar -Xembulk_home=.embulk2/bar run ... (repative path)
  2. Embulk home can be set from an environment variable EMBULK_HOME.
    • It must be an absolute path.
    • env EMBULK_HOME=/var/tmp/baz java -jar embulk-0.10.50.jar run ... (absolute path)
  3. If neither 1 nor 2, Embulk home is set by searching according to the following conditions.
    • Search from the current directory, moving one by one toward the parent.
    • In each directory, look for a directory “whose name is .embulk/”, and “which contains a regular file named embulk.properties”.
    • If such a directory is found, it is chosen as Embulk home.
    • If the current working directory is under the user’s home directory, the search stops at the home directory.
    • Otherwise, the search continues to the root directory.
  4. If none of 1 to 3 matches, ~/.embulk/ is chosen unconditionally as Embulk home.

embulk.properties directly under Embulk home is loaded as Embulk System Properties, then.

Finally, an Embulk System Property embulk_home is overridden with the absolute path of Embulk home.

Directories to install plugins

Along with setting Embulk home, the directories to install Embulk plugins are determined more explicitly. They are configured in the following order of precedence.

  1. The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from Embulk System Properties gem_home, gem_path, and m2_repo in command line options -X.
    • They would be absolute paths, or relative paths from the current working directory.
    • java -jar embulk-0.10.50.jar -Xgem_home=/var/tmp/bar gem install ... (absolute path)
    • java -jar embulk-0.10.50.jar -Xm2_repo=.m2/repository run ... (relative path)
  2. The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from Embulk System Properties gem_home, gem_path, and m2_repo in the embulk.properties at Embulk home.
    • They would be absolute paths, or relative paths from Embulk home.
  3. The directories to install Ruby gem-format plugins, and the directory to install Maven-format plugins can be set from environment variables GEM_HOME, GEM_PATH, and M2_REPO.
    • They must be absolute paths.
    • env GEM_HOME=/var/tmp/baz java -jar embulk-0.10.50.jar gem ... (absolute path)
  4. If none of 1 to 3 matches, Ruby gem-format plugins are installed in lib/gems under Embulk home, and Maven-format plugins are installed in lib/m2/repository under Embulk home.

Finally, Embulk System Properties gem_home, gem_path, and m2_repo are overridden with the absolute paths of the chosen directories. gem_path would be empty if not explicitly set.

On the other hand, environment variables GEM_HOME, GEM_PATH, and M2_REPO are not overridden. They are simply ignored if 1 or 2 above matches.

Summary

You should be able to run Embulk v0.10.50 with the configurations above. Please try it before v0.11.0 is released in June 2023.

If you see any problem, please report it in our User forum.

Plans for v0.11.0 and later

We initially thought to move forward to v1.0 without adding many new features sooner after v0.11.0 is released. However, we now discuss implementing some requests before v1.0.

We have not decided anything, but we may go into v0.12 or v0.13 before v1.0 for such new features. But anyway, we do not intend to make significant incompatible changes like this time, such as v0.9 → v0.10 → v0.11.

Thank you for your continued support of Embulk.

Minimum things we wanted to do before v1.0

  • Bug fixes based on feedback
  • Support for plugin developers for v0.11
  • Improving support for Maven-format plugins (especially installation)
  • New testing framework for plugins
  • Official support for newer Java
  • Refactoring and reorganization of the code

Maybe some more to do before v1.0

(Note: no any decisions made)

  • Enhancement in reporting errors and else from plugins
  • Support for some Data Lineage-like functionality
  1. For the background reason why it took so long, see “Embulk in TD, and in the future” in Treasure Data Developer Blog, if you are interested. 

  2. Some of the information is duplicated with “For Embulk users: What will change in v0.11 and v1.0?” that was published about two years ago. But, this article in 2023 contains some further changes and notes. 

  3. A typical case is using classes in javax.xml.*, which are removed from the standard API. It is not always easy to determine at a glance since some transitive dependencies may call javax.xml.*

  4. Join us in maintaining Embulk if you think, “We don’t want that! Continue to support JRuby!” :)