Java Client for TD-API

Using the Java client for Treasure Data API, you can:

  • Submit Hive/Trino(Presto) queries to Treasure Data.
  • Check the status of jobs (queries).
  • Retrieve query results.
  • Check the information of databases and tables.

Note that td-client-java 0.8.0 requires Java 1.8 or higher, and td-client-java 0.7.x requires Java 7.

Install

You can download a Jar file (td-client-java-(version)-shade.jar) from here.

For the information about the older versions.

Use the following dependency settings for either Maven or the Standalone Jar file.

Maven
<dependency>
  <groupId>com.treasuredata.client</groupId>
  <artifactId>td-client</artifactId>
  <version>(version)</version>
</dependency>

<!-- If you are not using any SLF4J logger binding, add the following dependency, too. -->
<dependency>
  <groupId>ch.qos.logback</groupId>
  <artifactId>logback-classic</artifactId>
  <version>1.2.3</version>
</dependency>

Standalone Jar
<dependency>
  <groupId>com.treasuredata.client</groupId>
  <artifactId>td-client</artifactId>
  <version>(version)</version>
  <classifier>shade</classifier>
</dependency>

Basic Use

Set API Key

Option 1: Config file

To use td-client-java, you need to set your API key in the $HOME/.td/td.conf file.

[account]
  user = (your TD account e-mail address)
  apikey = <YOUR_API_KEY>
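
To make the file layout concrete, here is a minimal sketch of reading the [account] section shown above. The class TdConfSketch and its method are hypothetical names used only for illustration. This is not the parser td-client-java uses, and it assumes nothing about the format beyond the "[account]" header and "key = value" lines shown in this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of td-client-java.
final class TdConfSketch {
    // Collects the "key = value" pairs that appear under the [account] header.
    static Map<String, String> parseAccountSection(String text) {
        Map<String, String> values = new LinkedHashMap<>();
        boolean inAccount = false;
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
                // A section header: only [account] is of interest here
                inAccount = trimmed.equals("[account]");
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (inAccount && eq > 0) {
                values.put(trimmed.substring(0, eq).trim(),
                           trimmed.substring(eq + 1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String conf = "[account]\n  user = me@example.com\n  apikey = YOUR_API_KEY\n";
        System.out.println(parseAccountSection(conf).get("apikey"));
    }
}
```

A check like this can be handy for confirming that the apikey entry sits under [account] and is not misspelled before handing the file to the client.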

Option 2: Environment variable

You can also set the API key through the TD_API_KEY environment variable. Add the following line to your shell configuration file (.bash_profile, .zprofile, etc.):

export TD_API_KEY=YOUR_API_KEY

For Windows, add the TD_API_KEY environment variable in the user preference panel.

Example Code

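import com.treasuredata.client.*;
import com.treasuredata.client.model.*;
import com.google.common.base.Function;
import org.msgpack.core.MessagePack;
import org.msgpack.core.MessageUnpacker;
import org.msgpack.value.ArrayValue;
import java.io.InputStream;
import java.util.List;
import java.util.zip.GZIPInputStream;
...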

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();
try {

// Retrieve database and table names
List<TDDatabase> databaseNames = client.listDatabases();
for(TDDatabase db : databaseNames) {
   System.out.println("database: " + db.getName());
   for(TDTable table : client.listTables(db.getName())) {
      System.out.println(" table: " + table);
   }
}

// Submit a new Trino (Presto) query (for Hive, use TDJobRequest.newHiveQuery)
String jobId = client.submit(TDJobRequest.newTrinoQuery("sample_datasets", "select count(1) from www_access"));

// Wait until the query finishes
ExponentialBackOff backoff = new ExponentialBackOff();
TDJobSummary job = client.jobStatus(jobId);
while(!job.getStatus().isFinished()) {
  Thread.sleep(backoff.nextWaitTimeMillis());
  job = client.jobStatus(jobId);
}

// Read the detailed job information
TDJob jobInfo = client.jobInfo(jobId);
System.out.println("log:\n" + jobInfo.getCmdOut());
System.out.println("error log:\n" + jobInfo.getStdErr());

// Read the job results in msgpack.gz format
client.jobResult(jobId, TDResultFormat.MESSAGE_PACK_GZ, new Function<InputStream, Object>() {
  @Override
  public Object apply(InputStream input) {
    try {
      MessageUnpacker unpacker = MessagePack.newDefaultUnpacker(new GZIPInputStream(input));
      while(unpacker.hasNext()) {
        // Each row of the query result is an array value (e.g., [1, "name", ...])
        ArrayValue array = unpacker.unpackValue().asArrayValue();
        int id = array.get(0).asIntegerValue().toInt();
      }
      // Nothing to return in this example
      return null;
    }
    catch (java.io.IOException e) {
      // Reading or decompressing the result stream failed
      throw new RuntimeException(e);
    }
  }
});

...

}
finally {
  // Never forget to close the TDClient.
  client.close();
}

Bulk upload

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();

File f = new File("./sess/part01.msgpack.gz");

TDBulkImportSession session = client.createBulkImportSession("session_name", "database_name", "table_name");
client.uploadBulkImportPart(session.getName(), "session_part01", f);

Data Connector Bulk Loading

// Create a new TD client by using configurations in $HOME/.td/td.conf
TDClient client = TDClient.newClient();

client.startBulkLoadSession("session_name");

Advanced Use

Proxy Server

If you need to access the Web through a proxy, add the following configuration to the $HOME/.td/td.conf file:

[account]
  user = (your TD account e-mail address)
  apikey = (your API key)
  td.client.proxy.host = (optional: proxy host name)
  td.client.proxy.port = (optional: proxy port number)
  td.client.proxy.user = (optional: proxy user name)
  td.client.proxy.password = (optional: proxy password)

Configuring TDClient

To configure TDClient, use TDClient.newBuilder():

TDClient client = TDClient
    .newBuilder()
    .setApiKey("(your api key)")
    .setEndpoint("api.ybi.idcfcloud.net")   // For using a non-default endpoint
    .build();

It is also possible to set the configuration with a Properties object:

Properties prop = new Properties();
// Set your own properties
prop.setProperty("td.client.retry.limit", "10");
...

// This overrides the default configuration parameters with the given Properties
TDClient client = TDClient.newBuilder().setProperties(prop).build();

Configuration Parameters

The precedence of the configuration parameters is as follows (highest first):

  1. Properties object passed to TDClient.newBuilder().setProperties(Properties p)
  2. Parameters written in $HOME/.td/td.conf
  3. System properties (passed with -D option when launching JVM)
  4. Environment variable (only for TD_API_KEY parameter)
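
The ordering above can be read as a layered lookup: each source is consulted in turn and the first one that defines a key wins. The sketch below illustrates only that idea. ConfigPrecedenceSketch and resolve are hypothetical names, the sample values are made up, and this is not the library's actual resolution code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a layered lookup; not the library's implementation.
final class ConfigPrecedenceSketch {
    // Returns the value from the first source that defines the key, or null if none does.
    // Sources must be listed from highest to lowest precedence.
    static String resolve(String key, List<Map<String, String>> sourcesInPrecedenceOrder) {
        for (Map<String, String> source : sourcesInPrecedenceOrder) {
            String value = source.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-ins for the first three sources in the list above (values are invented)
        Map<String, String> propertiesObject = new HashMap<>();
        propertiesObject.put("td.client.retry.limit", "10");

        Map<String, String> tdConf = new HashMap<>();
        tdConf.put("td.client.retry.limit", "3");
        tdConf.put("td.client.read-timeout", "30000");

        Map<String, String> systemProperties = new HashMap<>();
        systemProperties.put("td.client.read-timeout", "5000");
        systemProperties.put("td.client.connect-timeout", "1000");

        List<Map<String, String>> sources =
                Arrays.asList(propertiesObject, tdConf, systemProperties);

        System.out.println(resolve("td.client.retry.limit", sources));     // prints 10
        System.out.println(resolve("td.client.read-timeout", sources));    // prints 30000
        System.out.println(resolve("td.client.connect-timeout", sources)); // prints 1000
    }
}
```

Under this reading, a value set on the Properties object hides the same key in td.conf, which in turn hides a -D system property.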

Key                              | Default Value                          | Description
apikey                           |                                        | API key to access Treasure Data. You can also set this via the TD_API_KEY environment variable.
user                             |                                        | Account e-mail address (unnecessary if apikey is set)
password                         |                                        | Account password (unnecessary if apikey is set)
td.client.proxy.host             |                                        | (optional) Proxy host e.g., "myproxy.com"
td.client.proxy.port             |                                        | (optional) Proxy port e.g., "80"
td.client.proxy.user             |                                        | (optional) Proxy user
td.client.proxy.password         |                                        | (optional) Proxy password
td.client.usessl                 | true                                   | (optional) Use SSL encryption
td.client.retry.limit            | 7                                      | (optional) The maximum number of API request retries
td.client.retry.initial-interval | 500                                    | (optional) backoff retry interval = (interval) * (multiplier) ^ (retry count)
td.client.retry.max-interval     | 60000                                  | (optional) max retry interval
td.client.retry.multiplier       | 2.0                                    | (optional) retry interval multiplier
td.client.connect-timeout        | 15000                                  | (optional) connection timeout before reaching the API
td.client.read-timeout           | 60000                                  | (optional) timeout when no data is coming from API
td.client.connection-pool-size   | 64                                     | (optional) Connection pool size
td.client.endpoint               | api.treasuredata.com                   | (optional) TD REST API endpoint name
td.client.port                   | 80 for non-SSL, 443 for SSL connection | (optional) TD API port number
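
To see what the retry parameters imply in practice, the following sketch evaluates the formula from the table, (interval) * (multiplier) ^ (retry count), with the default values. RetryIntervalSketch is a hypothetical helper, not a class from td-client-java. It assumes the retry count starts at 0 and that td.client.retry.max-interval acts as an upper cap on each wait; the client's real schedule may differ in these details.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of td-client-java.
final class RetryIntervalSketch {
    // Wait before a retry: initialInterval * multiplier ^ retryCount,
    // capped at maxInterval (the cap is an assumption about max-interval).
    static long waitMillis(long initialInterval, double multiplier,
                           long maxInterval, int retryCount) {
        double raw = initialInterval * Math.pow(multiplier, retryCount);
        return (long) Math.min(raw, (double) maxInterval);
    }

    // Wait times for retry counts 0 .. retryLimit - 1.
    static List<Long> schedule(long initialInterval, double multiplier,
                               long maxInterval, int retryLimit) {
        List<Long> waits = new ArrayList<>();
        for (int i = 0; i < retryLimit; i++) {
            waits.add(waitMillis(initialInterval, multiplier, maxInterval, i));
        }
        return waits;
    }

    public static void main(String[] args) {
        // Defaults from the table: initial-interval 500, multiplier 2.0,
        // max-interval 60000, retry limit 7
        System.out.println(schedule(500, 2.0, 60000, 7));
        // prints [500, 1000, 2000, 4000, 8000, 16000, 32000]
    }
}
```

With the defaults, none of the seven waits reaches the 60000 cap; raising td.client.retry.limit or the multiplier is what would bring the cap into play.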

Further Reading