Tableau Server on Linux – Part 2: Zookeeper and File Store/TDFS


File Store / TDFS on Linux

The journey continues: in this session we are going to move the famous TDFS (Tableau Distributed File System), along with the Zookeeper service, to one of our favorite operating systems: Linux. The goal is the same: run each and every Tableau Server process on Linux, without needing the Windows OS. And a short disclaimer: this is 100% unsupported by Tableau, and you need valid licenses for your Linux box, otherwise you will violate the EULA.

Previously on “Tableau Server on Linux – Part 1 – Data Engine”

And today we are not just going to install these two services on Linux. No, we'll do a lot more! We'll start transforming our single-node Tableau Server into a cluster without even touching the GUI.

The Basics

TDFS – or, as Tableau calls it, the File Store service – is installed along with the Data Engine and controls the storage of extracts. In highly available environments, the File Store ensures that extracts are synchronized to the other File Store nodes, so they remain available if one node stops running. How does it work in practice? If you refresh a data source, then:

  • Backgrounder receives an extract refresh task
  • It fetches the data and passes it to the tdeserver process under a new unique name
  • tdeserver writes the new local tde file
  • Backgrounder connects to the File Store service and reports the new file
  • File Store puts the file into TDFS
  • The TDFS implementation ensures the file is replicated to all nodes. Node configurations are stored in Zookeeper under the /tdfs directory.

In order to use our tdeserver without copying files back and forth between Tableau Server and our Linux hosts, we need Zookeeper and TDFS – that's it. So, let's configure them.

Zookeeper

First of all, what is Zookeeper? According to Zookeeper's website, it is "a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services." Tableau uses Zookeeper to store cluster information and to track who is doing what and who is available. Most of this functionality is implemented in the Coordination Service. But enough theory, let's jump into practice!

Installing Zookeeper on Linux

Tableau Server 9.0 uses Zookeeper 3.4.6, which is the latest stable release. Zookeeper is written purely in Java, thus its binaries should work on every platform where Java is supported. You can download this version from any Apache mirror.

$ wget http://www.us.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
[..]
$ tar xvzf zookeeper-3.4.6.tar.gz

Installation is done; the Zookeeper distribution is ready to serve in your zookeeper-3.4.6 folder. To have everything up and running we need a Tableau-compatible configuration. It should look like this:

$ cat zookeeper-3.4.6/conf/zoo.cfg
tickTime=2000
initLimit=30
syncLimit=2
snapCount=100000
dataDir=/home/ec2-user/zookeeper-data
clientPort=12000
maxClientCnxns=0
quorumListenOnAllIPs=true

server.1=54.203.245.18:13000:14000
server.2=54.212.254.40:13000:14000

A couple of things: server.1 should be our original Windows Tableau Server, while server.2 is the Linux one. The dataDir should point to Zookeeper's local data directory, which needs to be created with the mkdir ~/zookeeper-data command. Also, you should create a file called myid inside dataDir to tell Zookeeper the local node's id:

echo 2 > ~/zookeeper-data/myid
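Keeping zoo.cfg and myid consistent across nodes is fiddly – the number in myid must match that node's server.N line. A small Python sketch that generates both files with the settings used above (the function name, paths, and IPs are just examples for illustration):

```python
from pathlib import Path


def write_zk_config(conf_dir: Path, data_dir: Path, node_id: int,
                    servers: dict, client_port: int = 12000) -> None:
    """Write a Tableau-compatible zoo.cfg plus the matching myid file.

    `servers` maps node id -> IP; the 13000/14000 quorum ports follow
    the configuration shown in this article.
    """
    conf_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        "tickTime=2000", "initLimit=30", "syncLimit=2", "snapCount=100000",
        f"dataDir={data_dir}",
        f"clientPort={client_port}",
        "maxClientCnxns=0",
        "quorumListenOnAllIPs=true",
        "",
    ]
    lines += [f"server.{i}={ip}:13000:14000" for i, ip in sorted(servers.items())]
    (conf_dir / "zoo.cfg").write_text("\n".join(lines) + "\n")
    # myid must contain only this node's id, matching its server.N entry
    (data_dir / "myid").write_text(f"{node_id}\n")
```

Run once per node with the node's own id and the full server map, and both nodes end up with identical server lists but distinct myid files.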

Good. Now switch to the Windows box and add the server.2 line to Tableau Server's zoo.cfg, located at %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\zookeeper\zoo.cfg. That's it. Restart Tableau Server, then start our own Linux Zookeeper instance with:

$ ./bin/zkServer.sh start
JMX enabled by default
Using config: /home/ec2-user/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

You can quickly check zookeeper.out to verify that everything is okay.

Validating Zookeeper

We have built a Zookeeper cluster and joined it to our Tableau Server – isn't that fantastic? Well, it is. But what's inside? Let's have a look:

$ bin/zkCli.sh -server 127.0.0.1:12000
[zk: 127.0.0.1:12000(CONNECTED) 0] ls /
[configs, tdfs, zookeeper, clusterstate.json, aliases.json, clustercontroller, live_nodes, postgres, overseer, collections, overseer_elect]

Nice, it seems we can access everything locally from Linux. Or maybe not:

[zk: 127.0.0.1:12000(CONNECTED) 1] ls /tdfs
Authentication is not valid : /tdfs

The /tdfs folder is password-protected, so it's time to authenticate ourselves. You can get the password from %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config\filestore.properties as usual:

filestore.zookeeper.username=fszkuser
filestore.zookeeper.password=95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d

Now, let's authenticate and retry reading the /tdfs directory:

[zk: 127.0.0.1:12000(CONNECTED) 2] addauth digest fszkuser:95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d
[zk: 127.0.0.1:12000(CONNECTED) 4] ls /tdfs
[hostslock, totransferperhost, status, clock, totransferperfolder, hosts, transferring]

Everything as expected. Zookeeper: job done.
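For the curious: Zookeeper's digest scheme stores an ACL id of the form user:base64(SHA-1("user:password")); the plaintext pair you pass to addauth is hashed and matched against it. A short sketch that reproduces the digest (mirroring what Zookeeper's DigestAuthenticationProvider does – a toy illustration, not something you need for the setup):

```python
import base64
import hashlib


def zk_digest(user: str, password: str) -> str:
    """Reproduce Zookeeper's digest ACL id: user:base64(sha1("user:password"))."""
    raw = f"{user}:{password}".encode("utf-8")
    digest = base64.b64encode(hashlib.sha1(raw).digest()).decode("ascii")
    return f"{user}:{digest}"


# The id that a getAcl on /tdfs would show for the fszkuser account:
print(zk_digest("fszkuser", "95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d"))
```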

TDFS / File Store

Getting the binaries

And now for something different. Until now we've dealt only with ready-to-use services; now let's move something really Tableau-specific. We should start by moving the Tableau Java packages (jars) to our Linux box. Here is what to copy and how:

  • Create a new folder called tableau-apps. This is where the code will go.
  • Create a folder tableau-apps/bin and copy all jar files from Tableau Server’s bin/ folder recursively. If you’re doing it right, you should end up with repo-jars and repo-migrate-jars subfolders containing jar files as well. You don’t need everything right now, but this is only part two – we will move all the services in the next few weeks, not just TDFS!
  • Create a new folder tableau-apps/lib. Just as with bin, copy all jar files from Tableau Server’s lib/ folder. Here you don’t need recursion; the first level is enough.
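The copy steps above can be sketched as a small helper (the source paths in the example are placeholders for wherever you staged a copy of the Windows bin/ and lib/ folders):

```python
import shutil
from pathlib import Path


def copy_jars(src: Path, dst: Path, recursive: bool = False) -> int:
    """Copy *.jar files from src into dst, preserving subfolders
    (e.g. repo-jars, repo-migrate-jars) when recursive is True.
    Returns the number of jars copied."""
    pattern = "**/*.jar" if recursive else "*.jar"
    count = 0
    for jar in src.glob(pattern):
        target = dst / jar.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(jar, target)
        count += 1
    return count


# Example usage (paths are placeholders, adjust to your staging location):
# copy_jars(Path("/tmp/tabsrv/bin"), Path.home() / "tableau-apps/bin", recursive=True)
# copy_jars(Path("/tmp/tabsrv/lib"), Path.home() / "tableau-apps/lib", recursive=False)
```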

That’s it, binaries are done. How about configuration?

TDFS Configuration – On Linux

Create a new folder called filestore and, inside it, the following three files:

log4j.xml – to see what is going on:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration PUBLIC "-//LOGGER" "log4j.dtd">

<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

    <!-- Appenders -->
    <appender name="file" class="org.apache.log4j.DailyRollingFileAppender">
        <param name="File" value="/home/ec2-user/filestore/filestore.log" />
        <param name="DatePattern" value="'.'yyyy-MM-dd" />
        <param name="encoding" value="UTF-8" />
        <layout class="org.apache.log4j.EnhancedPatternLayout">
            <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss.SSS Z}{UTC} %t %X{siteName} %X{userName} %-5p %X{requestId}: %c - %m%n" />
        </layout>
    </appender>


    <!-- 3rdparty Loggers -->
    <logger name="org.apache">
        <level value="warn" />
    </logger>

    <!-- Root Logger -->
    <root>
        <priority value="info" />
        <appender-ref ref="file" />
    </root>

</log4j:configuration>

connections.properties – this is required so the service knows where to connect:

# 54.203.245.18 - this is our windows box
#Thu May 28 07:12:36 UTC 2015
pgsql.host=54.203.245.18
jdbc.url=jdbc\:postgresql\://54.203.245.18\:8060/workgroup
primary.host=54.203.245.18
pgsql.port=8060
primary.port=8060

And finally the filestore.properties:

coordinationservice.hosts=localhost:12000
coordinationservice.operationretrylimit=5
coordinationservice.operationretrydelay=5000
coordinationservice.operationtimeout=30000
coordinationservice.sessiontimeout=60000
filestore.zookeeper.username=fszkuser
filestore.zookeeper.password=95d2cb4f8464d1560db0f8276b59e4bfe2e6ad5d
filestore.maxmutexretries=5
filestore.hostname=54.212.254.40
filestore.maxentriesinfilestofetch=4
filestore.root=/home/ec2-user/dataengine
filestore.port=9345
filestore.status.port=9346
filestore.transferreportintervalms=30000
filestore.reapholdoffms=7500000
filestore.inusereapholdoffms=86400000
filestore.filetypes=extract
filestore.allfileprocessingholdoffms=300000
filestore.somefileprocessingholdoffms=300000
filestore.reapfailedtransfersholdoffms=3600000
filestore_stale_folder_reap.delay_s=3600
filestore_zookeeper_cleaner.delay_s=60
filestore_missing_folder_fetch.delay_s=60
filestore_scheduled_folder_fetch.delay_s=60
filestore_scheduled_internal_folder_fetch.delay_s=60
filestore_failed_transfers_reap.frequency_s=86400
filestore.maxservertimeoffsetms=900000
worker.hosts=54.203.245.18,54.212.254.40

The Windows server is still 54.203.245.18, while 54.212.254.40 is the Linux node. The filestore.root directory should point to our Data Engine directory (which we created in Part 1). And don't forget to change the fszkuser user's password to your own value.
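Since TDFS whitelists worker hosts, a mismatch between filestore.hostname and worker.hosts is an easy way to lock yourself out. A quick sanity-check sketch over a properties file (the helper names are mine, not Tableau's):

```python
def parse_properties(text: str) -> dict:
    """Minimal Java-properties reader: key=value lines, '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def filestore_host_is_whitelisted(props: dict) -> bool:
    """TDFS only talks to hosts listed in worker.hosts, so this node's
    own filestore.hostname had better appear in that list."""
    workers = [h.strip() for h in props.get("worker.hosts", "").split(",")]
    return props.get("filestore.hostname") in workers
```

Feed it the contents of filestore.properties before starting the service and you catch the whitelist mistake early instead of puzzling over refused connections.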

Linux part is done, switch to windows.

TDFS Configuration – On Windows

In addition to Zookeeper authentication, TDFS blocks all connections that aren't coming from worker nodes. Thus, we should register this node as a worker in the following files:

  • filestore.properties
  • connections.properties
  • connections.yaml
  • backgrounder.properties
  • clustercontroller.properties
  • dataengine/tdeserver_standalone0.yml

Practically, you must:

  1. search and replace the localhost string with the external IP of the server in all of the files listed above
  2. change worker.hosts to worker.hosts=windows_ip,linux_ip in filestore.properties and tdeserver_standalone0.yml, due to whitelisting

You can find these files in %PROGRAMDATA%\Tableau\Tableau Server\data\tabsvc\config.
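Those two steps lend themselves to scripting. A sketch of the search-and-replace step over the files listed above (run it against copies first; the function is illustrative, not a Tableau tool):

```python
from pathlib import Path

# The files that need the localhost -> external-IP swap (from the list above)
CONFIG_FILES = [
    "filestore.properties", "connections.properties", "connections.yaml",
    "backgrounder.properties", "clustercontroller.properties",
    "dataengine/tdeserver_standalone0.yml",
]


def replace_in_files(config_dir: Path, filenames, old: str, new: str) -> int:
    """Replace every occurrence of `old` with `new` in each named file
    under config_dir; returns how many files were actually changed."""
    changed = 0
    for name in filenames:
        path = config_dir / name
        if path.is_file():
            text = path.read_text()
            if old in text:
                path.write_text(text.replace(old, new))
                changed += 1
    return changed


# Example: replace_in_files(Path(r"C:\ProgramData\Tableau\Tableau Server"
#                                r"\data\tabsvc\config"),
#                           CONFIG_FILES, "localhost", "54.203.245.18")
```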

Start TDFS

Config done, let’s start TDFS:

$ java -Dconnections.properties=file:///$PWD/connections.properties -Dconfig.properties=file:///$PWD/filestore.properties -cp ".:../tableau-apps/bin/app-tdfs-filestore-latest-jar.jar:../tableau-apps/bin/repo-jars/*:../tableau-apps/lib/*" com.tableausoftware.tdfs.filestore.app.Main
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/bin/repo-jars/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ec2-user/tableau-apps/lib/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Now, you should see some nice log messages in filestore/filestore.log:

2015-08-04 12:49:24.709 +0000 Thread-2   INFO  : com.tableausoftware.tdfs.filestore.status.StatusService - Starting Status Service on port 9346
2015-08-04 12:49:24.729 +0000 main   INFO  : com.tableausoftware.tdfs.filestore.app.Main - FileStore Server started
2015-08-04 12:49:24.731 +0000 main   INFO  : com.tableausoftware.tdfs.filestore.controller.ControllerService - Registering filestore node with zookeeper...
2015-08-04 12:49:24.841 +0000 main   INFO  : com.tableausoftware.tdfs.filestore.controller.ControllerService - Registered filestore node with zookeeper.

If you are still with me, then you have just accomplished part 2: you have TDFS and Zookeeper on your Linux node, in cluster mode.

Try things out

A typical test case would be an extract refresh. After the refresh completes, we should see the generated TDE file on both Windows and Linux.

Refresh Extract on Server

Now, in backgrounder.log, we can see that the Backgrounder was able to communicate with TDFS:

2015-08-04 13:00:38.547 +0000 (Default,,,) pool-2-thread-1 : INFO  com.tableausoftware.tdfs.common.ExtractsListHelper - Wrote extracts to file C:\ProgramData\Tableau\Tableau Server\data\tabsvc\temp\allValidFolderIds651283455269617809\allValidFolderIds1576681360460061073.tmp
2015-08-04 13:00:38.562 +0000 (Default,,,) pool-2-thread-1 : INFO  com.tableausoftware.model.workgroup.service.FileStoreService - Uploaded allValidFolderIds file to File Store on host 54.203.245.18
2015-08-04 13:00:38.578 +0000 (,,,) backgroundJobRunnerScheduler-1 : INFO  com.tableausoftware.backgrounder.runner.BackgroundJobRunner - Job finished: SUCCESS; name: List Extracts for TDFS Reaping; type :list_extracts_for_tdfs_reaping; notes: null; total time: 1 sec; run time: 0 sec
2015-08-04 13:00:38.578 +0000 (,,,) backgroundJobRunnerScheduler-1 : INFO  com.tableausoftware.backgrounder.runner.BackgroundJobRunner - Running job of type :list_extracts_for_tdfs_propagation; no timeout; priority: 10; id: 19339; args: []
2015-08-04 13:00:38.594 +0000 (Default,,,) pool-2-thread-1 : INFO  com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - Deleted 0 extract_sessions created prior to last DB start time
2015-08-04 13:00:38.609 +0000 (Default,,,) pool-2-thread-1 : INFO  com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - done fetching orphans
2015-08-04 13:00:38.609 +0000 (Default,,,) pool-2-thread-1 : INFO  com.tableausoftware.model.workgroup.workers.ListExtractsForTDFSPropagationWorker - Found 4 recent valid extract records

On Windows:

c:\ProgramData\Tableau\Tableau Server\data\tabsvc\dataengine\extract>dir "a7\c0\{BE3565D0-4390-48E8-89D8-5A254A8FC675}\comments.tde"

08/04/2015  01:00 PM            49,034 comments.tde

On Linux:

$ find dataengine/ -exec ls -l {} \; | grep -i aug | grep com
-rw-rw-r-- 1 ec2-user ec2-user 49034 Aug  4 13:01 dataengine/extract/a7/c0/{BE3565D0-4390-48E8-89D8-5A254A8FC675}/comments.tde
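The byte counts match (49,034 on both sides); to be extra sure the content is identical, you can compare checksums on the two nodes. A minimal helper (the function name is mine; run it on each node and compare the output):

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large extracts never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: file_sha256(Path("dataengine/extract/a7/c0/"
#                           "{BE3565D0-4390-48E8-89D8-5A254A8FC675}/comments.tde"))
```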

Hurray, our file was replicated successfully in our newly built cluster. This is the end, the happy end.


If you have questions or comments, just let me know – and stay tuned to learn about more services running on Linux.
