Philippine Startups Wordcloud
In [2]:
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from string import punctuation
In [3]:
# Load the startup dataset exported from Google Sheets.
csv_path = "../files/Philippine Startups - Sheet1.csv"
raw = pd.read_csv(csv_path)
In [4]:
# Keep only the long-form description text — this is the corpus
# the wordcloud will be built from.
descriptions = raw.loc[:, 'Long Description']

# Preview a few rows to confirm the column was picked up correctly.
descriptions.head()
Out[4]:
In [5]:
# Tokenize all descriptions as one big document.
# Drop missing descriptions first: pandas represents them as NaN (a float),
# which would make " ".join() raise a TypeError; astype(str) guards any
# remaining non-string values.
raw_words = word_tokenize(" ".join(descriptions.dropna().astype(str)))
In [6]:
# Exclusion set: English stopwords plus every single-character punctuation mark.
stop_words = set(stopwords.words('english') + list(punctuation))

# Lowercase each token exactly once (the original computed w.lower() twice),
# then keep only meaningful words: not a stopword or punctuation mark,
# not purely numeric, and longer than 3 characters.
# Note: lower() changes neither isdigit() nor len(), so behavior is unchanged.
words = [
    lw
    for lw in (w.lower() for w in raw_words)
    if lw not in stop_words and not lw.isdigit() and len(lw) > 3
]
In [7]:
words[:20]  # spot-check the first 20 filtered tokens
Out[7]:
In [8]:
# Collapse the filtered tokens back into one space-separated string,
# ready to be pasted into a wordcloud generator.
word_str = " ".join(words)

# Sanity-check: preview only the first 1000 characters.
preview = word_str[:1000]
preview
Out[8]:
In [9]:
# Persist the cleaned word list for the external wordcloud tool.
# Specify UTF-8 explicitly: the platform-default encoding of open(..., "w")
# could raise UnicodeEncodeError on non-ASCII characters in the descriptions.
with open("../files/phstartupwords.txt", "w", encoding="utf-8") as f:
    f.write(word_str)
Lazy Wordcloud Visualization
Paste the contents of the generated file into http://www.wordclouds.com/, and manually remove the words that occur fewer than 3 times:
Comments
Comments powered by Disqus