Emacs IPython Notebook and “ESS in the Cloud”

Back in 2009, one of the first advantages that made me play around with Emacs again when I went back to graduate school was Emacs Speaks Statistics (ESS). It allowed me to avoid the pain of using the R console, which was frankly a miserable experience.  The deficiencies of that interface stood in stark contrast to the enormous benefits of a FREE and open source statistical computing environment where each function could be examined and verified.  Data analysis was no longer tied to a machine with a working license server.  I could now work from home or even on my lengthy commute from Coney Island to school in Manhattan.  Three dead hours of my day became my most productive time. Although I do not have the resources to donate to the project, I have cited it in my scientific work and encourage others who have used it to do the same.

R has continued to improve since I started using it, but the Python data community has truly blossomed, albeit from a much lower base. Tools like numpy, scipy, networkx, NLTK, sympy, pandas, rpy2 and particularly IPython have made Python a formidable competitor in the scientific computing space.  In fact, I am not going to give links to the source code because best practice is to install them in a virtual environment using pip.  The best instructions I have found for Linux are here.  I will write a tutorial for Mac and Windows next week.  The reasons IPython has had such a profound impact are human rather than technical.  IPython prints detailed error messages, whereas R prints error messages that are cryptic at best.  Python has a vibrant community with numerous initiatives to reach under-served, under-computed and under-represented groups, including PyLadies and other groups.  R-help is notoriously caustic.

How caustic?

Funny you should ask.  Trey Causey, a PhD student at the University of Washington (where R. Doug Martin used to teach statistics and owned the R predecessor language S+), wrote a blog post asking whether R-help had gotten meaner.  There are 20 comments on his post, and it generated a response article by Columbia’s Andrew Gelman, who has 35,174 citations and an h-index of 63. (That is, 63 papers each cited at least 63 times.)  I took a less sophisticated approach.  I read their posting guide and answers and I vowed to never ask a question.  If I want to be abused like that I will go find a job on a trading desk.  But searching the archives and other sources, I got by.
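
For anyone who has not met the h-index before, here is a minimal sketch of the arithmetic behind it.  The function name and the citation counts are made up for illustration; they are not Gelman's actual record.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # every later paper has fewer citations than its rank
    return h


# Hypothetical citation counts for five papers.
example = [10, 8, 5, 4, 3]
print(h_index(example))
```

An h-index of 63 therefore says nothing about the total citation count, only that there is a solid core of 63 well-cited papers.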

IPython Notebook

I discovered the notebook from this post on R-bloggers back in
November 2012.  The browser was a great way to show work across
various operating systems.  But whoa, did this mean I actually had to
edit in the browser?  Christ! It was like using Word, or Notepad++.
Surely we can do better. Well, I couldn’t, but Takafumi Arakaki could.
He made the IPython Notebook a mode in Emacs: a powerful editor, the
ability to work interactively, and results displayed in the browser
where bosses, students and PIs feel at home.  If you have not set it
up, please read my tutorial.  But at the Data Science for Social Good
Fellowship, I am looking at data that is simply too big for my laptop. I
needed to run IPython remotely on an Amazon EC2 instance but edit the
interactive session locally.  These servers have no windowing software
(X11) and it violates the terms of service to install it.  There were
a few choices.

1.  Run IPython remotely (on the instance) and edit it there in a
non-windowed (terminal) version of Emacs.

2. Run IPython remotely on a public ip over http. (This is a really
bad idea for reasons I will explain.)

3. Run IPython remotely on a public ip over SSL/TLS with a password.
(A somewhat less bad idea.)

4. Run IPython remotely on a port on the remote localhost, 127.0.0.1,
and forward that port to our local localhost (no typo there) via
ssh.  Then we can pick that port when we open the notebook list in
Emacs.  The command is M-x ein:notebooklist-open. ‘M’ here is ‘Meta’,
which on Linux or Windows is mapped to the Alt key and on a Mac is
mapped to the Option or Command key, depending on your Emacs build.

Why everything sucks but the last option.

1. If you edit on the remote machine you are using Emacs inside the
bash shell.  Any extended key-bindings don’t work, including the
bindings for the Emacs IPython Notebook (ein).  Every time I wanted
to execute a cell I had to type M-x
ein:notebook-execute-and-goto-next instead of M-RET.  That sucks!

2. Run IPython remotely on a public ip over http. Whoa, now we have a
completely unsecured process listening on a shared remote computer
that can execute Linux commands. That sucks!

3. Run IPython remotely on a public ip over SSL/TLS with a
password. Ok, so this is what the IPython documentation suggests.
Here is the link and you should also check out this github repo.
The only difference is what they name the key and certificate, but
the first set of instructions did not work for me while the second did.
Choose a good password and it is probably no worse than buying a book
at Amazon.  But it still leaves you editing in the browser. That
sucks!

4. Finally, the fourth option is confusing but it does what we want.
The important thing to understand is that we need to forward the remote
machine’s local port to our localhost using the -L option in ssh.  The best
explanation I could find is here, in the section on Forwarding Local Ports to Remote.

This is hard, but it does not suck!
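
Since the -L argument is the confusing part, here is a minimal sketch that spells out how it is assembled.  The helper name tunnel_command is hypothetical; the host alias and port numbers are the ones used in the instructions below, and -N (no remote command), -f (go to the background) and -L (local port forward) are standard ssh flags.

```python
def tunnel_command(host, local_port, remote_port, remote_bind="127.0.0.1"):
    """Build the ssh argument list for a local port forward.

    The -L value reads local_port:remote_bind:remote_port, where
    remote_bind is resolved on the remote machine, so 127.0.0.1 means
    the server's own localhost, not the laptop's.
    """
    forward = "{0}:{1}:{2}".format(local_port, remote_bind, remote_port)
    return ["ssh", "-N", "-f", "-L", forward, host]


# myServer is the alias defined in .ssh/config; 7000 is the laptop-side
# port and 6000 is the port the notebook listens on remotely.
cmd = tunnel_command("myServer", 7000, 6000)
print(" ".join(cmd))
```

The list form can be handed to subprocess.call if you would rather script the tunnel than type it.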

Instructions

1. Start on your machine.  Set up a .ssh/config file where you define the
host and identity.  Good directions are here. It is worth stating that the local machine is your laptop and the remote is the server you are using.  The result is that establishing an ssh connection should be as easy as:

$ ssh myServer

Here is a sample config file based on my own.  This will not work on your machine.

[Screenshot: sample .ssh/config file, 2013-07-12]

2. Configure ein. See my issue on the ein repo to set your ein:console-args.
I set up a profile locally rather than using sshfs as Takafumi suggested.
The directions are in this repo.  If you are using a virtual environment, the configuration will be in:
~/.config/ipython

3. Make a directory on the remote machine to put your notebook files.  Start
the server normally.

$ ipython notebook --pylab=inline --no-browser --port=6000

[Screenshot: notebook server running on the remote machine]

4. From another terminal on your local machine

$ ssh -N -f -L 7000:127.0.0.1:6000  myServer

[Screenshot: both terminals]

5. Open emacs.  Type M-x ein:notebooklist-open.  When Emacs asks which port say
7000.

Congratulations, you now have an ssh connection to your notebook on a
remote server in local Emacs.  And you know what?  That, my friend, does
not suck at all.

[Screenshot: Emacs editing the remote notebook]
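
If Emacs cannot find the notebook list, it helps to check whether the tunnel is up before blaming ein.  Here is a minimal sketch, assuming the forward from the instructions above is listening on local port 7000; the function name port_is_open is my own.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        # Connection refused, timed out, or host unreachable.
        return False
    conn.close()
    return True


# 7000 is the local end of the ssh tunnel set up in step 4.
print(port_is_open("127.0.0.1", 7000))
```

This only tells you that something is listening on the port, not that it is the notebook server, but it separates ssh problems from ein problems.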

 

[[You can get the language by following this R-link, and Vincent Goulet generously makes Emacs with ESS available through his website for both Mac OS X and Windows.]]

Setting up a virtual environment with IPython, numpy and pandas

Most of the time you read about setting up virtual environments, it is for web development.  But the same benefits hold for analysis and research software.  You want to be able to reproduce results.  It also improves security, because you are not installing unverified libraries with machine-level privileges. This post is a minor modification of the outstanding tutorial I have been using for the last few months.  Since that tutorial is two years old, there is a newer version of Python, and it does not cover IPython, I will repeat the steps here.

First install Pythonbrew and another version of python

I use apt-get in Ubuntu, so type:

$ cd ~

$ sudo apt-get install libsqlite3-dev libbz2-dev libxml2-dev libxslt-dev curl

then get pythonbrew

$ curl -kL http://github.com/utahta/pythonbrew/raw/master/pythonbrew-install | bash

This line downloads the install script and pipes it to bash.  We will need to modify the configuration file for bash.

$ echo "source $HOME/.pythonbrew/etc/bashrc" >> ~/.bashrc

Don’t forget the dot in .bashrc.  Nothing changes until this file is re-read by the shell:

$ source .bashrc

This should complete with no errors.  The next step is to install python 2.7.3.  It is going to take a few minutes to complete.

$ pythonbrew install --verbose 2.7.3

And now we have to tell the system to use this new version of python

$ pythonbrew use 2.7.3

Install virtualenv and virtualenvwrapper

We have to install virtualenv in the system’s python and virtualenvwrapper in the new python.

$ sudo apt-get install python-virtualenv

$ pip install virtualenvwrapper

The first line only needs to be executed once.  It works for the whole system.  The second one needs to be done for each new python environment you create. Make a hidden directory to hold the virtual environments.

$ mkdir ~/.virtualenvs

Add the following three lines at the end of your .bashrc.  You will need to use an editor.

export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=$HOME/.pythonbrew/pythons/Python-2.7.3/bin/python
source $HOME/.pythonbrew/pythons/Python-2.7.3/bin/virtualenvwrapper.sh

Then you have to reload them:

$ source .bashrc

Create the virtual environment

 

To create a virtual environment called ‘no-more-drug-war’, type:

$ mkvirtualenv --no-site-packages no-more-drug-war

Important libraries

So, in order to know what packages we have installed at any time, we install yolk.

$ pip install yolk

Do not type sudo!  To see what is installed at any time:

$ yolk -l

A list of further packages for IPython is available here.  Type these individually; each may take a few minutes to install.

$ pip install pyzmq

$ pip install pygments

$ pip install tornado

$ pip install nose

$ pip install numpy

$ pip install scipy

$ pip install matplotlib

$ pip install pandas
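
Once the installs finish, a quick way to confirm that everything landed in the environment is to try importing each package.  This is only a sketch; missing_packages is a made-up helper, and note that pyzmq is imported under the name zmq.

```python
import importlib


def missing_packages(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names for the packages installed above (pyzmq imports as zmq).
wanted = ["zmq", "pygments", "tornado", "nose",
          "numpy", "scipy", "matplotlib", "pandas"]
print(missing_packages(wanted))
```

An empty list means every package imported; anything listed needs another pip install inside the environment.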

Turning it on and off

Now to get out of your virtual environment without closing the terminal, just type

$ deactivate

To get back in, type:

$ workon no-more-drug-war
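
If you are ever unsure whether workon took effect, you can ask Python itself.  A minimal sketch, assuming either the classic virtualenv (which sets sys.real_prefix) or a newer tool that makes sys.base_prefix differ from sys.prefix; the function name is my own.

```python
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv records the original interpreter in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # Newer environments keep sys.base_prefix pointing at the base install.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(sys.executable)   # path of the interpreter actually being used
print(in_virtualenv())
```

Inside the environment, the interpreter path should sit under ~/.virtualenvs.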

Good luck!

Intro and iPython

So I was able to get this to post to my Measure of Justice. However, I was not able to get it to work here. Since then, to my surprise, I have found myself working less with the visually amazing but temperamental IPython and more with Emacs org-mode.

The ability to toggle between thirty different languages and output to html or LaTeX is pretty overwhelming. This is not to say that I have had no trouble at all. Python sessions were broken for a while. Overall it has been a pleasant experience. If you are interested start with the article in the Journal of Statistical Software. But that is just the advertisement for what it can do. To master the usage you should go to the supplementary materials. You can download both the source code for the paper and the babel library. None of this is behind a pay-wall.

Here are the tricks:

1. The paper uses an initialization file, but you don’t need to do that. I generally just put an elisp block in the paper and execute that.

2. They defined a Journal of Statistical Software class to comply with the journal’s formatting requirements. You will generally just output to LaTeX.

3. Any questions, just reach out to me on Twitter @emisshula

Two important news pieces on cybercrime

1.  The FBI is executing warrants against the WikiLeaks supporters.  I had long suspected that most of these guys were not as tricky (sophisticated) as they thought they were. http://www.mcclatchydc.com/2011/01/27/107589/fbi-serves-40-warrants-in-search.html
2.  Another article on government picking at your electronic data.  Who gets picked for this is a crapshoot.  The problem is not that they are friending people on Facebook to snoop but that there are too many illegal acts they can investigate.  It is time to go after rogue prosecutors; people always go after the cop.  We need to change direction at the top. http://www.huffingtonpost.com/gw-schulz/when-can-cops-gain-access_b_815211.html

Introduction

This is the first issue, so it is worth saying what I would like to cover.  I am interested in increasing labor market participation and public safety.  I believe that is done through lifelong learning and remediation.  I am particularly interested in using mathematics to decide what is a threat and what is not.