How to pass environment variables in Jupyter Notebook

(Sharing some personal suffering)

One thing that drives me mad is creating several .csv/.txt files on my computer to perform some analysis. I personally prefer to connect directly to an RDBMS (Redshift), get the data in a straightforward way, and store the query inside the Jupyter Notebook.

The main problem with this approach is that many people put their passwords directly inside their notebooks/scripts, and this is very unsafe. (You don't need to take my word for it, check it for yourself.)

I tried to pass the environment variables in the traditional way, using export VARIABLE_NAME=xptoSomeValue, but after starting Jupyter Notebook I got the following error:



KeyError                                  Traceback (most recent call last)
<ipython-input-13-2288aa3f6b7a> in <module>()
      2 import os
----> 4 HOST = os.environ['REDSHIFT_HOST']
      5 PORT = os.environ['REDSHIFT_PORT']
      6 USER = os.environ['REDSHIFT_USER']

/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/UserDict.pyc in __getitem__(self, key)
     38         if hasattr(self.__class__, "__missing__"):
     39             return self.__class__.__missing__(self, key)
---> 40         raise KeyError(key)
     41     def __setitem__(self, key, item): self.data[key] = item
     42     def __delitem__(self, key): del self.data[key]


For some reason, this approach didn't work. As a small workaround, I started passing the environment variables directly when invoking the jupyter notebook command, like this:

env REDSHIFT_HOST='myRedshiftHost' REDSHIFT_USER='flavio.clesio' REDSHIFT_PORT='5439' REDSHIFT_DATA='myDatabase' REDSHIFT_PASS='myVeryHardPass' jupyter notebook
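Inside the notebook, those variables can then be read back with os.environ. Here is a minimal sketch; the redshift_config helper is just an illustrative name of my own, and .get() is used instead of bracket indexing so a missing variable returns None (or a default) rather than raising the KeyError shown above:

```python
import os

def redshift_config():
    """Collect Redshift connection settings from environment variables.

    The variable names match the ones passed on the jupyter notebook
    command line above.
    """
    # os.environ['X'] raises KeyError when X is unset (the error shown
    # earlier); os.environ.get('X') returns None, or a default, instead.
    return {
        'host': os.environ.get('REDSHIFT_HOST'),
        'port': os.environ.get('REDSHIFT_PORT', '5439'),  # Redshift default port
        'user': os.environ.get('REDSHIFT_USER'),
        'password': os.environ.get('REDSHIFT_PASS'),
        'database': os.environ.get('REDSHIFT_DATA'),
    }
```

With this, the credentials live only in the shell session (or your shell profile), never in the notebook file that ends up committed to version control.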

I hope it helps!


Six reasons your boss must send you to Spark Summit Europe 2017

It's almost redundant to say that Apache Spark has become the most prominent open-source big-data cluster-computing framework of the last two years; this technology not only shattered old paradigms of general-purpose distributed data processing, but also built a very vibrant, innovation-driven, and receptive community.

This is my first time at Spark Summit, and for me personally, as a Machine Learning professional, it's a great moment to be part of an event that has grown so dramatically in just two years.

Here in Brazil we don't have a strong tradition of investing in conferences (there are cultural reasons involved that deserve to be broken down in another blog post), but here are the six reasons your boss must send you to Spark Summit Europe 2017:

  1. Accomplish more than the rest: While some of your company's competitors are busy reworking old frameworks, your company can stay focused on solving the real problems that let your business scale, using bleeding-edge technologies.
  2. Stay ahead of the game: You can choose one of these two sentences to put in your résumé: 1) “Worked with Apache Spark, the most prominent open-source cluster-computing framework for Big Data projects”; or 2) “Worked with <<some obsolete framework that needs a couple of million USD to be deployed, has 70% fewer features than Apache Spark, had its last stable version written 9 years ago, and that the whole market is migrating away from>>”. It's up to you.
  3. Connect with Apache Spark experts: At Spark Summit you'll meet the real deal in Apache Spark, not someone with a marketing pitch (no offense) offering difficulties (e.g. a closed-source, buggy platform) in order to sell facilities (e.g. never-ending consulting that drains your entire budget, (buggy) plugins, add-ons, etc.). Some of these Spark experts are Tim Hunter, Tathagata Das, Sue Ann Hong, and Holden Karau, to name a few.
  4. Network that matters: I mean people who share an interest in, and enthusiasm for, the open-source Apache Spark framework and its technology, and headhunters from good companies that understand data plays a strong role in business; not some B.S. artist or pseudo-tech-cloaked seller.
  5. Applied knowledge produces innovation, and innovation produces results: Some cases of Apache Spark being used to innovate and help the business include saving more than US$ 3 million using Apache Spark and Machine Learning, managing a 300 TB data workload with Apache Spark, real-time anomaly detection in production systems, changing the game of digital marketing with Apache Spark, and predicting traffic using weather data.
  6. Opting out will destroy your business and your career: Refusing to acquire and apply new knowledge is the fastest way to destroy your career, stagnating in old methods/processes/platforms and becoming obsolete in a few months. For your company, opting out of innovation and of learning new methods and technologies that could scale the business or enhance productivity is a good way to be out of business in a few years.

To register and learn more about the event, please visit the Spark Summit 2017 website and follow spark_summit on Twitter.


See you at Spark Summit Europe 2017

On October 26, my friend Eiti Kimura and I will give a talk called Preventing leakage and monitoring distributed systems with Machine Learning at Spark Summit Europe 2017, where we'll show our solution for monitoring a highly complex distributed system using Apache Spark as a Machine Learning tool.

We're very excited to share our experience on this journey, and how we solved a complex problem with a simple solution that has saved more than US$ 3 million over the last 19 months.

See you at Spark Summit in Dublin.
