Python/Tableau Integration | Part 1 - Installing TabPy

Like most people, I used the holiday season to brush up on my Python skills while checking out the latest product updates in the industry. I was really impressed to hear that Tableau had announced the beta release of TabPy, a new API that allows Tableau to evaluate Python code in calculated fields. I’ve been a huge fan of Tableau since I first committed my career to data visualization, because Tableau is fairly easy to grasp as a beginner and it constantly extends its capabilities with APIs that let techies (such as myself) push the envelope on what’s possible.

PythonTableau.png

In late 2015, I presented on R integration within Tableau for the local Twin Cities Tableau User Group and considered it a major step towards making powerful (open-source) analytical libraries available to broader audiences. It made R a lot more accessible and approachable for teams kicking off data science initiatives. Trained as a Java developer, I’m personally a Python fan because I find Python easier to read and write than R, but both are extremely powerful in their own way. I was naturally excited to hear about Tableau’s Python integration and discovered that its setup and usage are similar to R integration. Let’s check out how easy it is to get started!

TabPy is extremely easy to get set up. You will need:

1) PC/Mac running Tableau Desktop 10.1 (admin-level access advised)
2) Python 2.7 (Python.org)
3) TabPy (Github.com)
4) 30 minutes – start to finish (seriously)

TabPy has fantastic installation instructions for all operating systems on its GitHub page. The easiest path is to click the bright neon green “Clone or download” button and choose to download the zipped folder directly to your machine.

Github-Download.png

Once it’s downloaded, follow the linked installation instructions for your operating system to set up TabPy. Each operating system is slightly different, but the general approach is to run the setup script inside the TabPy folder from a terminal window. The installation is obnoxiously easy: likely one or two commands and you’re done. TabPy was built in a Python 2.7 environment (vs. Python 3.x), so it’s important to confirm your version (type “python --version” at the command line to check). Python version issues are addressed here.
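If you’d rather check the version from inside Python itself (for example, to confirm which interpreter a script is actually picking up), here is a minimal sketch. It assumes, as described above, that TabPy needs any Python 2.7.x interpreter; the function name `is_tabpy_compatible` is my own and not part of TabPy.

```python
from __future__ import print_function  # lets this file run under Python 2.7 and 3.x

import sys

# Assumption: per this post, TabPy targets Python 2.7, so any 2.7.x counts as compatible.
REQUIRED = (2, 7)


def is_tabpy_compatible(version_info=None):
    """Return True if the given (or currently running) interpreter is Python 2.7.x."""
    if version_info is None:
        version_info = sys.version_info
    # Only the major and minor numbers matter; the patch level is ignored.
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    print("Running Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("Compatible with TabPy (2.7)?", is_tabpy_compatible())
```

Save it as a file and run it with the same `python` command you plan to use for TabPy; if it reports `False`, that interpreter is not the 2.7 one.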

Python-Version.png

Starting Up TabPy Server Locally

Once you kick off the “setup” script via command line within the downloaded TabPy folder, there will be a few minutes of waiting (and hopefully no admin access issues). What’s happening during that time?

1) TabPy checks whether Anaconda is already installed on your machine (and installs it if not). Anaconda is a great platform for managing Python assets like environments, notebooks, and libraries. Package management can sometimes be a pain when working with Python, so Anaconda is a welcome add-on for TabPy.

2) Within Anaconda, a new environment called ‘Tableau-Python-Server’ is created. Anaconda supports multiple environments, and most end users don’t need to worry about this step; just know that it was created for you.

3) TabPy installs the Python libraries it needs to work (in addition to what the default Anaconda installation provides). It’s a great suite of Python libraries to get started with!

4) Most importantly, TabPy starts up the server and gives you instructions on how to start it in the future. Take note!

TabPy-Running.png

Congratulations! If your terminal/command prompt displays similar results, the TabPy installation was successful and the server is running locally on your machine. Make a note of the directory where Anaconda’s ‘Tableau-Python-Server’ environment is located, along with the startup command printed in the terminal, because you will need that command to start the server in the future.
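If you forget where that environment ended up, conda can tell you: `conda env list --json` prints a JSON object whose `"envs"` list contains every environment directory. The sketch below assumes the `conda` executable is on your PATH and that the environment folder is named exactly ‘Tableau-Python-Server’; the helper names are hypothetical, not part of TabPy or conda.

```python
import json
import os
import subprocess

ENV_NAME = "Tableau-Python-Server"


def find_env_path(env_list_json, name=ENV_NAME):
    """Return the directory of the named conda environment, or None if absent.

    `env_list_json` is the text printed by `conda env list --json`,
    a JSON object with an "envs" list of environment directories.
    """
    envs = json.loads(env_list_json).get("envs", [])
    for path in envs:
        # Named environments live in a folder with the same name as the environment.
        if os.path.basename(os.path.normpath(path)) == name:
            return path
    return None


def conda_env_list_json():
    """Run `conda env list --json` and return its output as text (assumes conda is on PATH)."""
    return subprocess.check_output(["conda", "env", "list", "--json"]).decode("utf-8")
```

Usage: `print(find_env_path(conda_env_list_json()))` prints the environment directory, or `None` if no environment with that name exists.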

The server runs as a background process using a Python web framework called Tornado. Running TabPy starts Tornado and exposes a few REST APIs, built by Tableau, that execute Python code as Tableau Desktop requests it. When you close your terminal window or restart your machine, the Tornado process stops, so it needs to be started each time you want to run Python directly in Tableau Desktop. Most people probably don’t care about what’s happening behind the scenes, but I’m a fan of knowing a little about how it’s all pieced together. Cool stuff!
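To make those REST APIs a little more concrete, here is a sketch of calling TabPy directly, without Tableau. It assumes an `/evaluate` endpoint that accepts a JSON body with a `script` string and a `data` object of named inputs (`_arg1`, `_arg2`, …), which is how I understand TabPy’s documented REST interface; check the docs for your TabPy version. The helper names are hypothetical.

```python
import json

try:  # Python 3
    from urllib.request import Request, urlopen
except ImportError:  # Python 2.7
    from urllib2 import Request, urlopen


def build_evaluate_request(script, host="localhost", port=9004, **args):
    """Build a POST request for TabPy's /evaluate endpoint (assumed request shape).

    Keyword arguments become the script's inputs; Tableau names them
    _arg1, _arg2, ... so this sketch follows the same convention.
    """
    payload = {"script": script, "data": args}
    body = json.dumps(payload).encode("utf-8")
    url = "http://%s:%d/evaluate" % (host, port)
    return Request(url, data=body, headers={"Content-Type": "application/json"})


def evaluate(script, **args):
    """Send the script to a running TabPy server and return the decoded JSON reply."""
    response = urlopen(build_evaluate_request(script, **args), timeout=10)
    return json.loads(response.read().decode("utf-8"))
```

With the server running, `evaluate("return [x * 2 for x in _arg1]", _arg1=[1, 2, 3])` should send the list to TabPy and hand back whatever the script returns. This round trip is the same kind of exchange Tableau Desktop performs behind a Python calculated field.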

The final step is getting Tableau Desktop to communicate with the local process running TabPy. Open Tableau Desktop 10.1 or later and navigate to Help > Settings & Performance > Manage External Service Connection… The TabPy confirmation message in the terminal window mentioned “web service listening on port 9004”, so enter “localhost” as the server and “9004” as the port. Press “Test Connection” and voila!
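If “Test Connection” fails, a quick first check is whether anything is listening on that port at all. This sketch only opens a plain TCP connection to `localhost:9004` (the port from the TabPy message above); it does not confirm that the listener is TabPy, and the function name is hypothetical.

```python
import socket


def tabpy_reachable(host="localhost", port=9004, timeout=2.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that the port is open; it does not verify that the
    process listening there is actually the TabPy server.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True
```

If `tabpy_reachable()` returns `False`, the TabPy process has most likely stopped (for example, after the terminal was closed) and needs to be restarted with the startup command noted earlier.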

Success.png

What’s Next?

Tableau Desktop is ready to evaluate Python code! Check out Part II of the series (coming soon!) on Python/Tableau integration where I dive into creating calculated fields with Python code and visualizing outputs.

Technical Considerations

This walk-through provided detailed instructions for setting up TabPy locally on your computer. The steps for running a remote server that multiple users can access are virtually the same, but the final step requires the server’s IP address (i.e., not ‘localhost’) and the appropriate port number. Keep in mind that communication between Tableau and TabPy is not encrypted. Some organizations may be comfortable with this, since such servers are often located within the organization’s firewall, but be mindful of the potential security risk during data analysis. Additionally, Python (like many programming languages) can compromise server security through write access to the server’s file system, OS command execution, and the ability to download new libraries that may contain malicious code.

I also need to give a shout-out to Bora Beran for his TabPy contributions and troubleshooting advice for the community!