Installing kerasR

This is a quick reference for installing kerasR, a slim R wrapper around Keras, starting with the required Python packages.

Python packages

Create a virtualenv:

$ virtualenv pydata --python=/usr/bin/python3
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/brian/pydata/bin/python3
Also creating executable in /home/brian/pydata/bin/python
Installing setuptools, pip, wheel...done.
$ source pydata/bin/activate
(pydata) $

Install keras. This will also install some of the usual prerequisites for data science work in Python (numpy, scipy) as well as Theano. TensorFlow will be installed in the next step.

(pydata) $ pip install keras
Collecting keras
Collecting six (from keras)
Using cached six-1.10.0-py2.py3-none-any.whl
Collecting theano (from keras)
Collecting pyyaml (from keras)
Collecting scipy>=0.14 (from theano->keras)
  Downloading scipy-0.19.0-cp35-cp35m-manylinux1_x86_64.whl (47.9MB)
    100% |████████████████████████████████| 47.9MB 27kB/s
Collecting numpy>=1.9.1 (from theano->keras)
  Downloading numpy-1.13.0-cp35-cp35m-manylinux1_x86_64.whl (16.9MB)
    100% |████████████████████████████████| 16.9MB 66kB/s
Installing collected packages: six, numpy, scipy, theano, pyyaml, keras
Successfully installed keras-2.0.4 numpy-1.13.0 pyyaml-3.12 scipy-0.19.0 six-1.10.0 theano-0.9.0

Install TensorFlow:

(pydata) $ pip install tensorflow
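
To confirm that the packages installed above are visible to the active interpreter, a small check along these lines can be run inside the virtualenv. This is only a sketch: the helper name missing_packages is made up for illustration, and it only tests whether each top-level module can be located, not whether it imports cleanly.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that the current interpreter cannot locate.

    Hypothetical helper for illustration; it does not import the modules,
    it only asks the import system whether they can be found.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the install steps above.
    missing = missing_packages(["keras", "theano", "tensorflow"])
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("keras, theano and tensorflow are all visible to this interpreter")
```

If anything is reported as not found, the virtualenv is probably not active in the current shell.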

kerasR

In R, install the kerasR package:

> install.packages("kerasR")
Installing package into ‘/home/brian/R/x86_64-pc-linux-gnu-library/3.4’
...
** testing if installed package can be loaded
successfully loaded keras
* DONE (kerasR)

This may also install the reticulate package, which gives R an interface to Python objects and methods.

A guide to using kerasR is provided as a vignette.

Troubleshooting

If you get an error message when executing library(kerasR) saying:

> library(kerasR)

keras not available
See reticulate::use_python() to set python path,
then use kerasR::keras_init() to retry

this means kerasR (or more specifically, reticulate) can't find the keras Python package. You need to start R after activating your virtualenv:

$ source pydata/bin/activate
(pydata) $ R
> library(kerasR)
Using TensorFlow backend.
successfully loaded keras
>
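
Before starting R, it can help to confirm that the shell's python really is the virtualenv's interpreter. The following is a rough sketch of such a check, run with the python on your PATH; the function name in_virtualenv is an assumption for illustration, and the test covers both older virtualenv releases (which set sys.real_prefix) and newer ones (which set sys.base_prefix).

```python
import sys


def in_virtualenv():
    """Best-effort guess at whether this interpreter runs inside a virtualenv.

    Hypothetical helper: older virtualenv releases set sys.real_prefix, while
    venv and newer virtualenv make sys.base_prefix differ from sys.prefix.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtualenv:", in_virtualenv())
```

If the interpreter path does not point into the pydata directory, activate the virtualenv and start R again from that shell.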

Philippine Startups Wordcloud

In [2]:
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from string import punctuation

I went to the Kickstart and Ideaspace websites and scraped the descriptions of the startups they funded.

And by scraped, I mean I copied and pasted stuff into a Google Sheets document.

In [3]:
raw = pd.read_csv("../files/Philippine Startups - Sheet1.csv")
In [4]:
descriptions = raw['Long Description']
descriptions.head()
Out[4]:
0    Arthrologic designs and develops a TKA (Total ...
1    ​BluLemons Gaming Studio is an all-Filipino th...
2    Croo enables people to swiftly send informatio...
3    The Company has the opportunity to create the ...
4    Despite current transponder technologies avail...
Name: Long Description, dtype: object
In [5]:
raw_words = word_tokenize(" ".join(descriptions))
In [6]:
stop_words = set(stopwords.words('english') + list(punctuation))

words = [w.lower() for w in raw_words if w.lower() not in stop_words and not w.isdigit() and len(w) > 3]
In [7]:
words[:20]
Out[7]:
['arthrologic',
 'designs',
 'develops',
 'total',
 'knee',
 'arthroplasty',
 'system',
 'simple',
 'evidence-based',
 'utilizing',
 'successful',
 'clinical',
 'data',
 'improve',
 'surgical',
 'skills',
 'easy-to-use',
 'surgeon-friendly',
 'instrumentation',
 'assure']
In [8]:
word_str = " ".join(words)
word_str[:1000]
Out[8]:
'arthrologic designs develops total knee arthroplasty system simple evidence-based utilizing successful clinical data improve surgical skills easy-to-use surgeon-friendly instrumentation assure successful predictable results offer competitive cost provide greater majority patients access technology improve living \u200bthe product asian-fit 2-component total knee arthroplasty system definitive surgical treatment severe end-stage osteoarthritic knees \u200bblulemons gaming studio all-filipino theme mobile gaming studio develop games based filipino culture \u200bthey believe creating mobile games great avenue showcase philippines offer globally vision create games impact filipino youth across cultures croo enables people swiftly send information loved ones need arises without typing anything calling anyone button accessory clicked sent predetermined emergency contacts smartphone application text message contains important information person’s current location nearby landmarks person’s contacts equipped '
In [9]:
with open("../files/phstartupwords.txt","w") as f:
    f.write(word_str)

Lazy Wordcloud Visualization

Paste the contents of the generated file into http://www.wordclouds.com/ and manually remove the words that occur fewer than three times.
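
The rare words could also be dropped in Python before writing the file, instead of by hand on the website. Here is a minimal sketch, assuming a list like the words list built earlier; the function name keep_frequent and the min_count parameter are made up for illustration and are not part of the original notebook.

```python
from collections import Counter


def keep_frequent(words, min_count=3):
    """Keep only the words that occur at least min_count times.

    Hypothetical helper: every occurrence of a frequent word is kept,
    so the relative weights in the word cloud stay the same.
    """
    counts = Counter(words)
    return [w for w in words if counts[w] >= min_count]


if __name__ == "__main__":
    sample = ["games", "mobile", "games", "knee", "games", "mobile"]
    print(keep_frequent(sample))
```

The filtered list can then be joined with " ".join(...) and written to the text file in the same way as word_str.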