Reading from Tableau Data Extracts – The complete story
Testing is everything. Without proper unit tests we cannot build anything; this is one of the most important aspects of any software development methodology. When I wrote a Clojure wrapper around the Tableau Data Extract API, I had difficulty writing good unit tests, simply because there is no easy automated way to validate the contents of TDE files (without Tableau Server). How can I check that all the data I wrote to an extract file actually matches the expected values? Well, using only Tableau tools, I cannot.
Luckily, as for everything, I have a solution for this.
Back in 2012, when Tableau released the TDE API, it was write-only and Windows-only. Both limitations were cumbersome. To write a robust ETL process, I need to know the last/max values of different columns. Also, most of my customers run their ETL applications on Unix systems. They demanded a solution from my team. We are consultants, we love money, and where there is demand we always have supply.
The first “native” version
I grew up in Eastern Europe in the era of socialism. We did not have fancy computers at home and, to be honest, we had no money for the latest cutting-edge applications. When we wanted to play a game or use an application, our first reflex was to start SoftICE or another debugger or disassembler and start looking into the registers to catch something. This was decades ago and I am not a rioting teenager anymore, but I still remember the flow: how things need to be disassembled in order to build on top of them.
Back to the original topic. If you have looked into the DataExtract.log file, you have seen what is going on. When you work with tdeserver64 (the database server that handles extract-related operations), it listens on a named pipe or a network socket. Named pipes are the default from Desktop or from the Extract API, while TCP sockets are used in server mode.
My first thought was to solve all the problems in one go: reverse engineer the network protocol and build a Linux library that connects to the tdeserver process on the Windows machine. It seemed like such a great idea that I even started working on the first version on January first, right after my New Year's Eve hangover.
First of all, it is not so easy to sniff data on localhost. Unlike on *nix boxes, the Windows loopback device cannot be put into promiscuous mode. The same applies to named pipes: dumping I/O calls in other processes is pretty inconvenient. The solution was Detours, an API re-routing library that helps you hook into applications and Windows system calls. It allows you to inject your DLLs into any application, so we can safely say that Detours is your best friend, right after dogs.
Once I had everything together, I wrote the first version of the library using plain TCP socket calls. You still needed to run the server on Windows, but the client was platform-independent.
It worked, but I felt that it wasn't the best solution. I was able to read and write most of the data types, but handling value arrays was relatively painful. So I moved on.
Disassemble & Decompile
When you open IDA Pro, you start feeling the power of the dark side. Very addictive. It turned out that I had been wrong the whole time: it is much easier to link directly against the Tableau DLLs than to deconstruct the network communication. I made a nice command-line tool to dump the contents of a TDE file.
I was proud, so I dropped Tableau an email and asked whether I could publish it.
End of story
A few days later I got a friendly reminder citing the license:
You shall not (and shall not allow any third party to): (a) decompile, disassemble, or otherwise reverse engineer the Software or Media Elements or attempt to reconstruct or discover any source code, underlying ideas, algorithms, file formats or programming interfaces of the Software or Media Elements by any means whatsoever
The whole project sat on the shelf until last week, when I used these tools and libraries for automated testing of my TDE library. The tests passed, so it will go back on the shelf and will probably remain there forever.
Do you have a crazy idea and are you looking for a solution? Share it with us!