Introducing TableauFS: File System on Tableau Server Repository
We have so many APIs for Tableau Server but, to be honest, even the simplest things cannot be achieved without an extensive amount of work. Let's see what my colleagues and clients have demanded from me:
- Easy way to move workbooks and data sources across servers (large enterprise environment, between multi-tenant systems)
- Version control workbooks
- Search by workbook/data source contents, like where a particular connection or table is used in published workbooks
- Point in time recovery of workbooks, backup individual projects or workbooks (similar to version control)
- Mass-change workbooks and data sources (we have Tableau Servers with 5,000-10,000 workbooks/data sources)
Let's see how a file system would solve these issues:
- Move workbooks? Just copy file between servers like “scp dir1/file server2:dir1/”
- Version control? Use git with annex or bup
- Search contents? grep or zipgrep on files
- Run tools like TWB Auditor to understand your workbooks’ contents
- Point in time recovery? git or rsync contents to a snapshot aware file system
- Mass change? sed, ruby, python, etc or some twbx editing tool like powertools directly on the server files
You see, this is why you desperately need a file system for your published data. Good news everyone, last week I wrote one…
…and I called it TableauFS (pragmatism over creativity). It's a FUSE-based userspace file system driver built in pure ANSI C (for performance and fun) on top of Tableau's repository server. It lets you mount Tableau servers, with all their data sources and workbooks, directly into the file system. File information and contents are retrieved on access without any local persistence or caching, so when you cat a file it will go to Tableau and retrieve the chunks one by one.
The file system connects directly to the PostgreSQL repository database, using the readonly user's credentials for read-only mode, or tblwgadmin or postgres for read-write access.
Do you want it? Sure you do.
You need five packages in advance to compile it: fuse-devel, postgresql-devel, cmake, make and gcc. To work with workbooks or data sources larger than 2GB you need PostgreSQL version 9.3+; otherwise the file size limit is 2GB.
You can clone the source from https://github.com/tfoldi/fuse-tableaufs.
To build the binaries, just type cmake . && make && make install and you will have everything installed. The executable will be installed as /usr/local/bin/tableaufs, but you can also use the mount command directly.
/usr/src/fuse-tableaufs# cmake . && make && make install
-- Build type:
-- C FLAGS: -g3 -Wall -Wwrite-strings -Wcast-qual -Wpointer-arith -Wconversion -Wcomment -Wcast-align -Wshadow -Wredundant-decls
-- Found PostgreSQL: /usr/lib64/libpq.so
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/src/fuse-tableaufs
Scanning dependencies of target tableaufs
[ 50%] Building C object src/CMakeFiles/tableaufs.dir/tableaufs.c.o
[100%] Building C object src/CMakeFiles/tableaufs.dir/workgroup.c.o
Linking C executable tableaufs
[100%] Built target tableaufs
[100%] Built target tableaufs
Linking C executable CMakeFiles/CMakeRelink.dir/tableaufs
Install the project...
-- Install configuration: ""
-- Installing: /usr/local/bin/tableaufs
Configuring tableau server database
To exploit all features (including read-write mode) you need tblwgadmin or a similar user with superuser privileges, while for read-only access the readonly user is almost enough. Unfortunately, Tableau's readonly user does not have select access on pg_largeobject (as Jonathan Macdonald discovered in this post), so you have to log on as tblwgadmin and issue:
GRANT SELECT ON pg_largeobject TO readonly;
to get the full read-only experience. It will not harm your system (access is still read-only), but it is unsupported.
The steps are:
- Enable the readonly user with the tabadmin dbpass --username readonly <password> command, as documented here
- Check your pgsql admin password in the tabsvc.yml file. The default location is C:\ProgramData\Tableau\Tableau Server\config, but depending on where your ProgramData folder is, this can be different. Please note that the ProgramData folder can be hidden.
- Go to the Tableau Server\9.0\pgsql\bin folder, run the psql -h localhost -p 8060 -U tblwgadmin workgroup command and paste the password from tabsvc.yml
- Execute the GRANT SELECT statement
You can mount it with the standard Unix mount command:
$ sudo mount -t fuse -o "ro,pghost=tableau-srvr,pgport=8060,pguser=readonly,pgpass=fuubar" tableaufs /mnt/tableau-dev
$ mount | grep fuse
tableaufs on /mnt/tableau-dev type fuse.tableaufs (ro,relatime,user_id=0,group_id=0)
“ro” selects read-only mode; the default mode is rw. You can also mount your server directly with the tableaufs command:
$ tableaufs -o "pghost=tableau-srvr,pguser=postgres,pgpass=lolobar" /mnt/tableau-dev
To unmount, simply:
$ umount /mnt/tableau-dev
TableauFS maps the Tableau repository to the following directory structure:
/Sitename/Projectname/Workbook 1.twb[x]
/Sitename/Projectname/Workbook 2.twb[x]
/Sitename/Projectname/Datasource 1.tds[x]
You can go into each directory, list and stat files, and run find without any limitation. Packaged and non-packaged objects have different file extensions: twbx and tdsx are packaged, while twb and tds are plain XML files. You can read, grep, find, search and edit them just like regular files. Whatever you do will be executed on the Tableau server; the FS does not cache or store blocks locally.
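As a small illustration of walking that layout with ordinary tools, here is a minimal sketch that counts published objects per project. The function name count_objects_per_project is made up for this example, and it only assumes the /Sitename/Projectname/ layout described above.

```shell
# Hypothetical helper: print "<count> <site>/<project>" for every project
# directory below ROOT (the TableauFS mount point).
count_objects_per_project() {
    root="$1"
    for project in "$root"/*/*/; do
        [ -d "$project" ] || continue
        # Count only workbook and data source files directly in the project.
        count=$(find "$project" -maxdepth 1 -type f \
            \( -name '*.twb' -o -name '*.twbx' -o -name '*.tds' -o -name '*.tdsx' \) | wc -l)
        rel=${project#"$root"/}
        # $((count)) strips the padding some wc implementations add.
        printf '%s %s\n' "$((count))" "${rel%/}"
    done
}

# Example call against the mount point used earlier:
#   count_objects_per_project /mnt/tableau-dev
```

Every find and stat here turns into repository queries on the server, so on a large installation it may be worth running this per site rather than on the whole mount.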
Search in workbooks
We will explore this topic in detail in some of my forthcoming posts, but let me just note that you can easily search inside XML and zipped XML objects. In the example below I used zipgrep to list all data connections from a packaged workbook. No tabcmd get, no logging on to the Tableau web portal, no REST API. Just plain Unix commands:
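A minimal sketch of such a search across a whole mount, not just one workbook: plain .twb/.tds files are grepped directly, and packaged .twbx/.tdsx files go through zipgrep when it is installed. The function name search_tableau_objects and the example pattern are assumptions made for this illustration.

```shell
# Hypothetical helper: list every object under ROOT whose XML matches PATTERN
# (an extended regular expression).
search_tableau_objects() {
    pattern="$1"
    root="$2"
    # Plain XML objects: grep prints the names of matching files.
    find "$root" -type f \( -name '*.twb' -o -name '*.tds' \) \
        -exec grep -l -E -e "$pattern" {} +
    # Packaged objects are zip archives; zipgrep ships with unzip.
    if command -v zipgrep >/dev/null 2>&1; then
        find "$root" -type f \( -name '*.twbx' -o -name '*.tdsx' \) -print |
        while IFS= read -r pkg; do
            # Any output from zipgrep means at least one archive member matched.
            if [ -n "$(zipgrep "$pattern" "$pkg" 2>/dev/null)" ]; then
                printf '%s\n' "$pkg"
            fi
        done
    fi
}

# Example: which objects mention a given database host?
#   search_tableau_objects 'db1\.example\.com' /mnt/tableau-dev
```

Keep in mind that zipgrep unpacks every member of the archive, including any bundled extracts, so searching many large packaged workbooks will pull a lot of data from the server.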
Editing existing workbooks is also possible, just check out this thread: http://community.tableau.com/message/369406#369406
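For the mass-change case, a minimal sketch of a bulk search-and-replace over plain XML objects on a read-write mount. The function name mass_replace is made up for this example. It stages each change in a temp file outside the mount and then overwrites the existing file in place, on the assumption that overwriting existing files is the safest write pattern for this file system; try it on a test server first.

```shell
# Hypothetical helper: replace OLD with NEW in every .twb and .tds under ROOT.
# OLD is a basic regular expression; neither OLD nor NEW may contain "|",
# and "&" or backslashes in NEW keep their special meaning in sed.
mass_replace() {
    old="$1"; new="$2"; root="$3"
    staging=$(mktemp) || return 1
    find "$root" -type f \( -name '*.twb' -o -name '*.tds' \) -print |
    while IFS= read -r f; do
        # Touch only files that actually contain OLD.
        if grep -q -e "$old" "$f"; then
            sed "s|$old|$new|g" "$f" > "$staging" &&
                cat "$staging" > "$f" &&
                printf 'updated %s\n' "$f"
        fi
    done
    rm -f "$staging"
}

# Example: point every plain-XML object at a new database host.
#   mass_replace 'old-db' 'new-db' /mnt/tableau-dev
```

Packaged .twbx/.tdsx files are deliberately left alone here, since changing them means unpacking, editing and re-zipping the archive.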
Version control & object based point in time recovery
One of the best things about a file system is that you can snapshot or version control its contents. You can expect an extensive post on how to version control and automatically back up all (or selected) Tableau objects, and how to view differences between changes in a human-readable way using only open source tools. In the meantime, just to keep you entertained, here is an example of how to create a new git repository and add all of your Tableau workbooks and data sources to it:
Looks nice? Wait until I show my set of git extensions to manage zip-packaged objects in a git repo.
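A minimal sketch of that workflow as a repeatable function: copy the mounted tree into a working copy outside the mount and commit only when something changed. The function name snapshot_to_git, the committer identity and the example paths are all assumptions made for this illustration.

```shell
# Hypothetical helper: snapshot the tree under SRC (the TableauFS mount)
# into the git repository at REPO, creating the repository if needed.
snapshot_to_git() {
    src="$1"; repo="$2"
    mkdir -p "$repo" || return 1
    [ -d "$repo/.git" ] || git -C "$repo" init -q || return 1
    # Copy the contents of SRC, not the directory itself.
    cp -R "$src"/. "$repo"/ || return 1
    git -C "$repo" add -A || return 1
    # Commit only if the index differs from the last commit.
    if ! git -C "$repo" diff --cached --quiet; then
        git -C "$repo" \
            -c user.name='tableaufs-snapshot' \
            -c user.email='snapshot@example.invalid' \
            commit -q -m "Snapshot $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
    fi
}

# Example (e.g. from cron):
#   snapshot_to_git /mnt/tableau-dev /var/backups/tableau-git
```

Because cp only adds and overwrites, objects deleted on the server stay in the working copy. Swapping cp for rsync with --delete (excluding .git) would track deletions too. Packaged files are stored as binary blobs, so diffs are only readable for the plain twb and tds files.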
I love speed and performance, especially when it matters and in a file system it definitely does. Everything is written in pure ANSI C, using only fuse and postgres client libraries.
[Tableau Samples]# cp Finance.twbx /tmp/
[Tableau Samples]# dd if=/dev/zero of=Finance.twbx bs=64K count=10k
10240+0 records in
10240+0 records out
671088640 bytes (671 MB) copied, 35.7648 s, 18.8 MB/s
[Tableau Samples]# dd if=Finance.twbx of=/dev/null bs=64K count=10k
10240+0 records in
10240+0 records out
671088640 bytes (671 MB) copied, 20.2003 s, 33.2 MB/s
[Tableau Samples]# cp -f /tmp/Finance.twbx .
On my laptop with a virtual Tableau server, the IO throughput is between 15 and 35 MB/sec (roughly 19 MB/sec writing and 33 MB/sec reading in the run above), which is definitely not bad for a network file system.
Do you have a question or a good idea for how to make it better? Drop me a line or ping me on Twitter (@tfoldi).