Data Community DC



Data Science DC: Tools and Public Data Sets

I think I should mine the Data Science DC website and presentations for tools and data sets to review and reuse.

I did GDELT and the Global Terrorism Database that way, and recently I heard about Julia and a graph data set.

Graphics guru Ed Tufte said recently (also in the Financial Times) that Data-Driven Documents (D3) is the "person or organization in his field whose work will be most important in the future."

So I want to audit some examples of Michael Bostock's visualization work to see what data sets he used and whether I can re-create his visualizations in Spotfire.



Spotfire Dashboard

Research Notes


Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The library, largely written in Julia itself, also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing. In addition, the Julia developer community is contributing external packages through Julia’s built-in package manager at a rapid pace. Julia programs are organized around multiple dispatch: functions are defined and then overloaded for different combinations of argument types, which can also be user-defined. For a more in-depth discussion of the rationale and advantages of Julia over other systems, read the introduction in the online manual.

My Note: Is this like Mathematica? It appears one needs a separate graphics package to use with it.

My Note: Did he present on Julia on July 30th and use a graph database? Yes.


My Note: This looks like what I do in MindTouch!

Getting the Julia Project Data
The data is stored as a Neo4j database that is compressed into a 500MiB
tarball. Unfortunately, GitHub no longer allows me to store these files there,
so it's up on S3. Running the following commands should download
and unpack the data:

     curl -o julia.graph.db.tar.gz
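The unpack step did not survive in these notes. As a hedged sketch, assuming the tarball has already been downloaded to `julia.graph.db.tar.gz` in the current directory, Python's standard `tarfile` module can unpack it. The `unpack` function name and the default destination are illustrative assumptions, not part of the original instructions.

```python
import tarfile
from pathlib import Path


def unpack(archive: str = "julia.graph.db.tar.gz", dest: str = ".") -> list:
    """Extract a gzip-compressed tarball into dest and return the member names."""
    target = Path(dest).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        members = tar.getmembers()
        # Refuse members whose paths would land outside the destination directory.
        for m in members:
            if not (target / m.name).resolve().is_relative_to(target):
                raise ValueError(f"unsafe path in archive: {m.name}")
        tar.extractall(path=target)
    return [m.name for m in members]
```

Calling `unpack()` with no arguments extracts into the current directory; the path check is a basic guard against archives containing `../` entries (it requires Python 3.9 or later for `Path.is_relative_to`).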


A Julia Meta Tutorial



If you are thinking about taking Julia, the hot new mathematical, statistical, and data-oriented programming language, for a test drive, you might need a little bit of help. Here we round up some great posts discussing various aspects of Julia to get you up and running faster.

Why We Created Julia

If only you could always read through the intentions and thoughts of the creators of a language! With Julia you can. Jump over to the post to get the perspectives of four of the original developers, Jeff Bezanson, Stefan Karpinski, Viral Shah, and Alan Edelman.

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.

We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.

We are greedy: we want more.

An IDE for Julia

If you are looking for an IDE for Julia, check out Julia Studio. Even better, Forio, the makers of this IDE, offer a nice series of beginner, intermediate, and advanced tutorials to help you get up and running.

Julia Documentation

By far the most comprehensive and best source of help and information on Julia is the ever-growing Julia Docs, which include a Manual for the language (with a useful getting started guide), details of the Standard Library, and an overview of available packages. Not to be missed are the two sections detailing noteworthy differences from Matlab and from R.

MATLAB, R, and Julia: Languages for Data Analysis

Avi Bryant provides a very nice overview and comparison of Matlab, R, Julia, and Python. Definitely recommended reading if you are considering a new data analysis language.

An R Programmer Looks at Julia

This post is from mid-2012, so a lot has changed with Julia. However, it is an extensive look at the language from an experienced R developer.

There are many aspects of Julia that are quite intriguing to an R programmer. I am interested in programming languages for “Computing with Data”, in John Chambers’ term, or “Technical Computing”, as the authors of Julia classify it. I believe that learning a programming language is somewhat like learning a natural language in that you need to live with it and use it for a while before you feel comfortable with it and with the culture surrounding it.

The State of Statistics in Julia – Late 2012 

Continuing on this theme of statistics and Julia, John Myles White provides a great view of using Julia for statistics which he updated in December of last year.

A Matlab Programmer’s Take on Julia – Mid 2012

A quick and pretty insightful look at Julia from the perspective of a Matlab programmer.

Julia is a new language for numerical computing. It is fast (comparable to C), its syntax is easy to pick up if you already know Matlab, supports parallelism and distributed computing, has a neat and powerful typing system, can call C and Fortran code, and includes a pretty web interface. It also has excellent online documentation. Crucially, and contrary to SciPy, it indexes from 1 instead of 0.

Why I am Not on the Julia Bandwagon Yet

Finally, we leave you, good reader, with a contrarian viewpoint.

Sean Murphy

Senior Scientist and Data Science Consultant at JHU

Sean Patrick Murphy, with degrees in math, electrical engineering, and biomedical engineering and an MBA from Oxford, has served as a senior scientist at Johns Hopkins University for over a decade, advises several startups, and provides learning analytics consulting for EverFi. Previously, he served as the Chief Data Scientist at a series A funded health care analytics firm, and the Director of Research at a boutique graduate educational company. He has also cofounded a big data startup and Data Community DC, a 2,000-member organization of data professionals. Find him on LinkedIn, Twitter, and

Why We Created Julia


In short, because we are greedy.

We are power Matlab users. Some of us are Lisp hackers. Some are Pythonistas, others Rubyists, still others Perl hackers. There are those of us who used Mathematica before we could grow facial hair. There are those who still can’t grow facial hair. We’ve generated more R plots than any sane person should. C is our desert island programming language.

We love all of these languages; they are wonderful and powerful. For the work we do — scientific computing, machine learning, data mining, large-scale linear algebra, distributed and parallel computing — each one is perfect for some aspects of the work and terrible for others. Each one is a trade-off.

We are greedy: we want more.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

(Did we mention it should be as fast as C?)

While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.

We never want to mention types when we don’t feel like it. But when we need polymorphic functions, we want to use generic programming to write an algorithm just once and apply it to an infinite lattice of types; we want to use multiple dispatch to efficiently pick the best method for all of a function’s arguments, from dozens of method definitions, providing common functionality across drastically different types. Despite all this power, we want the language to be simple and clean.

All this doesn’t seem like too much to ask for, does it?

Even though we recognize that we are inexcusably greedy, we still want to have it all. About two and a half years ago, we set out to create the language of our greed. It’s not complete, but it’s time for a 1.0 release: the language we’ve created is called Julia. It already delivers on 90% of our ungracious demands, and now it needs the ungracious demands of others to shape it further. So, if you are also a greedy, unreasonable, demanding programmer, we want you to give it a try.

Download and Install Julia on Various Operating Systems



Current version: release v0.1.2 | beta v0.2-pre

My Note: I downloaded Release v0.1.2 (147 MB)

Release v0.1.2

Windows Archive (.zip) 32-bit 64-bit
Mac OS X Package (.dmg) 64-bit
Source (Git) -b release0.1


Beta v0.2-prerelease

Windows Self-Extracting Archive (.exe) 32-bit 64-bit
Mac OS X Package (.dmg) 64-bit
Linux packages Ubuntu
Source (Git)

If the provided download files do not work for you, please file an issue.

Platform-Specific Instructions

Windows

Julia is available for both 32-bit and 64-bit Windows since XP SP3.

  1. Download the Windows julia.exe installer for your platform. 32-bit julia works on both x86 and x86_64. 64-bit julia will only run on 64-bit Windows (x86_64).
  2. Run the downloaded program to extract julia.
  3. Double-click julia.bat in the unpacked folder to start julia.

The Windows README contains information on dependencies.

Uninstallation is performed by deleting the extracted directory. If you would also like to remove your preferences files, they can be found in your user directory at %APPDATA%/julia; on Windows Vista and later this typically expands to %USERPROFILE%/AppData/Roaming/julia (on Windows XP, %USERPROFILE%/Application Data/julia).


Mac OS X

On Mac, a Julia-version.dmg file is provided, which contains the Julia application. Installation is the same as for any other Mac software: copy the application to your hard drive (anywhere) or run it from the disk image. OS X Lion (10.7) or later is required to use the precompiled binaries. OS X Snow Leopard (10.6) has also been reported to work, but it may not work in all cases; if it does not, you may need to build from source.

Uninstall Julia by deleting the application and the packages directory in ~/.julia. Multiple binaries can coexist without interfering with each other. If you would also like to remove your preferences file, remove ~/.juliarc.jl.


Linux

Instructions will be added here as more Linux distributions start including Julia. If your Linux distribution is not listed here, you should still be able to run Julia by building from source. See the Julia README for more detailed information.

  1. Ubuntu 13.04: apt-get install julia

Uninstallation is platform dependent. If you did a source build, it can be performed by deleting your julia source folder. If you would also like to remove your preferences files, they are ~/.julia and ~/.juliarc.jl.

Beta installation instructions

A PPA (Personal Package Archive) is provided for Ubuntu systems to allow automatic updating to the latest beta version of Julia. To use this PPA and install Julia on Ubuntu 12.04 or later, simply type:

$ sudo add-apt-repository ppa:staticfloat/julianightlies
$ sudo add-apt-repository ppa:staticfloat/julia-deps
$ sudo apt-get update
$ sudo apt-get install julia

Add graphics capabilities to Julia

Graphics in Julia are available through external packages. These packages are under heavy development and take different approaches towards graphics and plotting.


Gaston provides an interface to gnuplot. Gaston also includes detailed documentation and examples in its manual. Add the Gaston package to your Julia installation with the following commands at the Julia prompt:

  1. Pkg.add("Gaston")
  2. using Gaston
  3. Gaston.set_terminal("aqua") #(this may be necessary, if the following reports that your terminal type is unknown)
  4. x=-pi:.001:pi; y=x.*sin(10./x); plot(x,y) #(plot x*sin(10/x))
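Gaston hands the actual drawing to gnuplot. To make the data in step 4 concrete, here is a hedged Python sketch that samples the same curve, y = x*sin(10/x) on [-pi, pi] with step 0.001, and writes it as a two-column text file that gnuplot can plot directly with `plot "xsin.dat" with lines`. This is an illustration of the data, not a description of how Gaston works internally; the function names and the file name are hypothetical.

```python
import math


def xsin_points(step: float = 0.001):
    """Sample y = x*sin(10/x) on [-pi, pi], like the Julia range -pi:.001:pi."""
    n = int(math.floor(2 * math.pi / step)) + 1       # number of samples in the range
    xs = [-math.pi + i * step for i in range(n)]
    # x never lands exactly on 0 for this step, but guard anyway (the limit there is 0).
    ys = [x * math.sin(10 / x) if x != 0 else 0.0 for x in xs]
    return xs, ys


def write_gnuplot_data(path: str, xs, ys) -> None:
    """Write whitespace-separated "x y" columns, a format gnuplot reads by default."""
    with open(path, "w") as f:
        for x, y in zip(xs, ys):
            f.write(f"{x:.6f} {y:.6f}\n")
```

For example, `xs, ys = xsin_points()` followed by `write_gnuplot_data("xsin.dat", xs, ys)` produces the file for gnuplot.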

In order to use Gaston, you will need to install gnuplot and ensure it is accessible from ENV["PATH"] within Julia. Gnuplot is widely used, and binaries are available for all platforms.


Winston provides 2D plotting capabilities for Julia. Add the Winston package to your Julia installation with the following commands at the Julia prompt:

  1. Pkg.add("Winston")
  2. using Winston
  3. plot( cumsum(randn(1000)) ) #(plot a random walk)
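To see what the Winston one-liner in step 3 computes, here is a Python sketch of the same data: a cumulative sum of 1000 standard-normal steps. The function name and the optional seed are illustrative assumptions; plotting is left to whatever library you use, and nothing here is part of Winston.

```python
import random
from itertools import accumulate
from typing import List, Optional


def random_walk(n: int = 1000, seed: Optional[int] = None) -> List[float]:
    """Cumulative sum of n standard-normal steps, like Julia's cumsum(randn(n))."""
    rng = random.Random(seed)                      # a seed makes the walk reproducible
    steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return list(accumulate(steps))                 # running total of the steps
```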

Winston’s interface will be familiar to MATLAB users. See examples and documentation on the Winston homepage.


Gadfly is an implementation of a Wickham-Wilkinson style grammar of graphics in Julia. Add the Gadfly package to your Julia installation with the following commands at the Julia prompt:

  1. Pkg.add("Gadfly")
  2. using Gadfly
  3. draw(SVG("output.svg", 6inch, 3inch), plot([sin, cos], 0, 25)) #(plot a pair of simple functions over a range)

Gadfly’s interface will be familiar to users of R’s ggplot2 package. See examples and documentation on the Gadfly homepage.

Julia Install Files Unzipped


Neo4j: The World's Leading Graph Database


Neo4j is an open-source, high-performance, enterprise-grade NOSQL graph database.

Neo4j is a robust (fully ACID) transactional property graph database. According to Neo Technology, its graph data model makes Neo4j highly agile and very fast; for connected data operations, the vendor claims Neo4j runs a thousand times faster than relational databases.
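In a property graph, nodes and relationships can both carry key/value properties, relationships are typed and directed, and (from Neo4j 2.0) nodes can also carry labels. The following toy, in-memory Python sketch makes that model concrete. The class and method names are hypothetical and are not the Neo4j API; outgoing relationships are kept in a per-node list so that following them does not require scanning every relationship.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


@dataclass
class Node:
    id: int
    labels: Set[str]
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    type: str
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (hypothetical API, not Neo4j's)."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self.outgoing: Dict[int, List[Relationship]] = {}   # adjacency kept per node

    def add_node(self, *labels: str, **props: Any) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(node_id, set(labels), props)
        self.outgoing[node_id] = []
        return node_id

    def relate(self, start: int, rel_type: str, end: int, **props: Any) -> Relationship:
        rel = Relationship(start, end, rel_type, props)
        self.outgoing[start].append(rel)
        return rel

    def neighbors(self, node_id: int, rel_type: str) -> List[Node]:
        """Follow outgoing relationships of one type from a node."""
        return [self.nodes[r.end] for r in self.outgoing[node_id] if r.type == rel_type]
```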

More than 20 of the Global 2000, hundreds of startups and thousands of community members use Neo4j in a wide variety of use cases such as social applications, recommendation engines, fraud detection, resource authorization, network & data center management and much more.

The Definitive Book on Graph Databases:



Data Sets:

Graph Databases



First Edition Now Available!
Free download of O'Reilly's Graph Databases (PDF)

My Note: The current version of Neo4j has two different types of index: named indexes and automatic indexes. The Cypher query examples throughout the Graph Databases book use named indexes.

There’s a problem, however: data created using a Cypher CREATE statement won’t be indexed in a named index. This has led to some confusion for anyone wanting to code along with the examples: if you use the CREATE statements as published, the query examples won’t work.
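The indexing pitfall in this note can be modelled with a toy: a named index is a separate structure that has to be updated explicitly, so nodes created by a plain CREATE never appear in it, and a query that starts from that index finds nothing. The Python below is a hypothetical illustration of that behavior only; it is not Neo4j code, and the class and method names are invented.

```python
from collections import defaultdict
from typing import Dict, List


class ToyStore:
    """Toy model of a node store plus explicitly maintained named indexes."""

    def __init__(self) -> None:
        self.nodes: List[Dict[str, str]] = []
        # index name -> (key, value) -> node ids; only add_to_index writes here.
        self.named = defaultdict(lambda: defaultdict(list))

    def create(self, **props: str) -> int:
        """Like a plain CREATE: stores the node but touches no named index."""
        self.nodes.append(props)
        return len(self.nodes) - 1

    def add_to_index(self, index: str, node_id: int, key: str) -> None:
        """The explicit extra step that a named index requires."""
        self.named[index][(key, self.nodes[node_id][key])].append(node_id)

    def lookup(self, index: str, key: str, value: str) -> List[int]:
        """Roughly analogous to START n=node:index(key='value'): reads only the index."""
        return list(self.named[index][(key, value)])
```

Creating a node and immediately looking it up returns an empty list; only after `add_to_index` does the lookup find it, which mirrors why the book's queries fail against data loaded with plain CREATE statements.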

Read the full blog here:

Graph Databases introduces graphs and graph databases to technology enthusiasts, developers, and database architects.

Graph Databases, published by O’Reilly Media, discusses the problems that are well aligned with graph databases, with examples drawn from practical, real-world use cases. This book also looks at the ecosystem of complementary technologies, highlighting what differentiates graph databases from other database technologies, both relational and NOSQL.

Graph Databases is written by Ian Robinson, Jim Webber, and Emil Eifrém, graph experts and enthusiasts at Neo Technology, creators of Neo4j, the world’s leading graph database.

Table of Contents

1. Introduction
What is a Graph?
A High-Level View of the Graph Space
The Power of Graph Databases
2. Options for Storing Connected Data
Relational Databases Lack Relationships
NOSQL Databases Also Lack Relationships
Graph Databases Embrace Relationships
3. Data Modeling with Graphs
Models and Graphs
The Property Graph Model
Querying Graph: Introduction to Cypher
Comparison of Relational and Graph Modeling
Cross-Domain Models
Common Modeling Pitfalls
Avoiding Anti-Patterns
4. Building a Graph Database Application
Data Modeling
Application Architecture
Capacity Planning
5. Graphs in the Real World
Why Organizations Choose Graph Databases
Common Use Cases
Real-World Examples
6. Graph Database Internals
Native Graph Processing
Native Graph Storage
Programmatic APIs
Nonfunctional Characteristics
7. Predictive Analysis with Graph Theory
Depth- and Breadth-First Search
Path-Finding with Dijkstra’s Algorithm
The A* Algorithm
Graph Theory and Predictive Modeling
Local Bridges
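The path-finding topic in Chapter 7 can be illustrated with a compact Dijkstra's algorithm over a weighted adjacency list. This is a generic textbook version in Python using a binary heap (`heapq`), not code from the book; it assumes non-negative edge weights.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, Optional[List[str]]]:
    """Return (cost, path) of the cheapest path from source to target, or (inf, None)."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue                         # stale entry: node already settled
        visited.add(node)
        if node == target:
            break                            # target settled, its cost is final
        for nbr, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd               # found a cheaper route to nbr
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), None
    path = [target]
    while path[-1] != source:                # walk predecessors back to the source
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

On a small made-up graph where A links to B (7), C (9), and F (14), C links to F (2), and F links to E (9), the cheapest route from A to E is A, C, F, E with cost 20.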

This exclusive early release of Graph Databases, published by O’Reilly Media, is compliments of Neo Technology, creators of Neo4j. Taking advantage of this special offer will get you started with graph databases now — long before the official book’s release.

Graph Database Meetups


October 22, 2013


Date Saved. Speaker To Be Announced.

August 27, 2013


August Meetup - Freebase in Neo4j

Geoff Moes (of fame) and I (Wes) are jointly working on some experimentation involving Freebase's RDF-based data in Neo4j. We've written a simple program (in Scala) that takes Freebase's ~19GB gzipped Turtle RDF file and uses the BatchInserter API to create a Neo4j store. We'll present ideas and issues faced and the path we took to get here (including some rough benchmarks along our path to optimization). I also have a few of the O'Reilly Graph Databases books as prizes!
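The first stage of such an import is streaming a huge gzipped file line by line instead of loading it into memory. As a hedged Python sketch of only that stage (the talk's program was in Scala and fed the BatchInserter API), the following assumes a simplified layout of one `subject predicate object .` statement per line. Real Turtle also allows multi-line statements, `;` and `,` abbreviations, and escape sequences, which this toy parser does not handle; the function name is hypothetical.

```python
import gzip
from typing import Iterator, Tuple


def iter_triples(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from a gzipped, one-triple-per-line file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:                                   # streams; never loads the whole file
            line = line.strip()
            if not line or line.startswith(("#", "@prefix")):
                continue                                 # skip blanks, comments, prefix lines
            if line.endswith("."):
                line = line[:-1].rstrip()                # drop the statement terminator
            parts = line.split(None, 2)                  # object keeps any inner spaces
            if len(parts) == 3:
                yield parts[0], parts[1], parts[2]
```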

Geoff Moes

Thanks everyone for coming out.

I just found this presentation that might be of interest:

Wes Freeman

Here are the slides:

May 14, 2013


What's new in Neo4j 2.0, and a quick Neo4j Internals presentation

I (Wes) will give a talk about what's new and exciting in Neo4j 2.0; mainly Cypher's new stuff, which is where my focus has been. Labels, new index style, new syntax. And a little bit of what might be to come. For those of you who don't know Cypher, I'll do a quick introduction on that as well.

There will also be a quick talk about Neo4j internals. How the file storage works, and how caching works. I've found this knowledge enlightening in understanding and fixing performance issues.

If you want to check out Cypher before the meetup, I recommend: 
and the links at the top take you to different pages:

Wes Freeman

Here are the slides in case anyone wanted them:
Feel free to shoot me any questions.

Max De Marzi

Here is a preview =>

Max De Marzi

http://neographsearch.herokuapp... <= Sneak Peek

Max De Marzi

Hey guys, I'll be in town February 26th and 27th. If you want to spend a whole day on Neo4j, I'm teaching a class on the 26th => Also, If any of you are trying out Neo4j at your company and want a free consultation on the 27th please let me know and I can try to stop by. Also check out my blog => for more Neo4j goodness.

February 26, 2013


Create your own graph search like Facebook with Neo4j and Cypher w/ Max De Marzi

It doesn't matter if you are working on Neo4j in Ruby, Java, Python, Clojure or Javascript. Chances are you'll want to make use of the Neo4j Query language "Cypher".

We'll go through the language, from simple queries to complex, learn a few tricks, see what's new in version 1.9 and see what kind of crazy queries we can make.

We will also hear from some of the Baltimore/Washington based Neo4j projects. Learn what they are up to, and where you can get your feet wet. If you have a project in mind and need help turning it into reality, don't be afraid to ask for help here.

February 26, 2013


Neo4j Tutorial - Washington, D.C.

*** PLEASE NOTE THAT WE DO HAVE SEATS LEFT FOR THIS TRAINING - registration is through Eventbrite, see below ***

This 1-day training course will stretch your imagination with an understanding of how graphs and graph databases work within a real-world data model.

Delivered by a senior and experienced Neo4j Consultant, there will be plenty of opportunity to get guided hands-on experience using Neo4j.

Bring your laptop with an installed Neo4j server to get the most out of the day. At the end of the day you will have a much better, real-world understanding of the technology and what it can deliver.

Please register for this meetup through Eventbrite, where we can also more easily process the online payment:

January 14, 2013


NeoTechnology Tour w/ Peter Neubauer!

This will be the second stop on the Neo4j 2013 World Wide Tour with Pernilla Lindh, our World Wide Community Manager, being on site! She loves spreading graph love, knowledge of NoSQL, Graphs and Neo4j. Swag love is guaranteed. :)

Peter Neubauer, VP of Community [and hardcore Neo4j coder with over 400k lines committed], will give an introductory talk about Neo4j: What is it? How can I use it? What are others doing with it?

For advanced users, if there is sufficient interest, we'll hold an ad-hoc discussion in a smaller meeting room about whatever people want to talk about, hosted by Wes Freeman, Cypher contributor and primary author of the Scala library AnormCypher.

Come have a beer and hear about the coolest new tech on the block right now!

Thanks to Gray and Near Infinity for helping us out with the space.

This is also posted at the JUG meetup:

Peter Neubauer

For super-small datasets, this is a good start for tinkering. For bigger things, try the batch importer, which takes two CSV files. Would that work?
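The batch-import approach splits a graph into two CSV files, one for nodes and one for relationships. The exact column layout the importer expects depends on its version, so the headers below (`id`, `name`, `start`, `end`, `type`) are hypothetical placeholders to be checked against the importer's own documentation. This Python sketch only shows the split itself, using the standard `csv` module.

```python
import csv
from typing import Dict, Iterable, Tuple


def write_import_files(edges: Iterable[Tuple[str, str, str]],
                       nodes_path: str, rels_path: str) -> Dict[str, int]:
    """Split (source, rel_type, target) edges into a nodes file and a relationships file."""
    ids: Dict[str, int] = {}

    def node_id(name: str) -> int:
        # Assign each distinct name a sequential integer id the first time it is seen.
        if name not in ids:
            ids[name] = len(ids)
        return ids[name]

    with open(rels_path, "w", newline="") as rf:
        rels = csv.writer(rf)
        rels.writerow(["start", "end", "type"])       # hypothetical header
        for src, rel_type, dst in edges:
            rels.writerow([node_id(src), node_id(dst), rel_type])

    with open(nodes_path, "w", newline="") as nf:
        nodes = csv.writer(nf)
        nodes.writerow(["id", "name"])                # hypothetical header
        for name, i in ids.items():                   # dicts preserve insertion order
            nodes.writerow([i, name])
    return ids
```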

December 11, 2012


Using Ikanow's Infinit.e with Neo4j

Infinit.e is an unstructured data analysis and decision support tool that uses Hadoop, ElasticSearch, MongoDB, and a range of NLP tools and services to ingest data. It can pull out entities, relationships and sentiments from unstructured data.

Putting that data into Neo4j allows for some interesting graph analysis, as well as the ability to do ad-hoc graph queries with Cypher.


6:30 - Pizza + Networking, etc.

7:00 - Announcements

7:05 - Craig Vitter: Ikanow Infinit.e Intro + Intro to Developer API 

7:30 - Short Break  

7:40 - Wes Freeman: Quick Cypher Review, Demo: Scala app to import Ikanow data to Neo4j, Example Cypher queries against Infinit.e data

Wes Freeman

Hey Max (and other Pythonistas), I knew I'd seen something about using Python with embedded Neo4j--check this out, using JPype:

Wes Freeman

Here are my slides. I'll leave my neo4j instances up for a while in case anyone wants to play with them. (links toward the end of the presentation)...

Wes Freeman

Sorry we made this the same day as the BigData DC meetup--probably a bad choice, considering the audience overlap! I'll try to avoid that for next time, and promise that we'll have better beer than they will. Also, we have some heavy hitters signed up--Geoff wrote a blog post about Max's recent talk at Data Science DC: Looking forward to meeting you guys!

October 16, 2012


Spring Data Neo4j - graph power with spring ease of use

Spring Data Neo4j provides straightforward object persistence into the Neo4j graph database. Conceived by Rod Johnson and Neo Technology CEO Emil Eifrem, it is the founding project of the Spring Data effort. The library leverages a tight integration with the Spring Framework and the Spring Data infrastructure. Besides the easy-to-use object graph mapping, it offers the powerful graph manipulation and query capabilities of Neo4j with a convenient API.

This talk introduces the different aspects of Spring Data Neo4j and shows applications in several example domains. 
During the session we walk through the creation of an engaging sample application that starts with the setup and annotating the domain objects. We see the usage of Neo4jTemplate and the powerful repository abstraction. After deploying the application to a cloud PaaS we execute some interesting query use-cases on the collected data.

Michael Hunger is the project lead of the Spring Data Neo4j project and a senior developer with Neo Technology. He is (co-)author of two books about Spring Data (Neo4j) and others. Michael is interested in many areas of software development, from enjoying programming languages, working actively with open source projects and their community, to good development practices. He is a board member of the JetBrains Academy, a frequent speaker at conferences and an editor for InfoQ.

Wes Freeman

Here are the slides from Michael--thanks for the great presentation!

July 26, 2012


Deep Data - graph queries with Cypher

With a graph database, Big Data gets an extra dimension, expanding into Deep Data. Neo4j's Cypher query language lets you extend your SQL knowledge into graph queries that can plumb those depths.

Let's get together for a guided introduction to graph queries using Cypher. Assuming no prior experience, we'll start with an introduction to graph database concepts, then switch into workshop mode, progressively exploring a graph and elaborating from basic to clever queries.
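A typical graph query asks for everything reachable within a few hops, which Cypher expresses with variable-length patterns such as `-[:KNOWS*1..2]->`. As a hedged illustration of the set of distinct nodes such a pattern reaches (not of how Cypher actually executes it), here is a breadth-first search in Python that maps each node reachable in 1 to `max_depth` hops to its hop count; the function name and adjacency format are assumptions.

```python
from collections import deque
from typing import Dict, List


def within_hops(adj: Dict[str, List[str]], start: str, max_depth: int = 2) -> Dict[str, int]:
    """Breadth-first search: map each node reachable in 1..max_depth hops to its hop count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue                        # do not expand past the hop limit
        for nbr in adj.get(node, []):
            if nbr not in depth:            # BFS reaches each node first by a shortest route
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    del depth[start]                        # report reachable nodes, not the start itself
    return depth
```

With made-up data where ann knows bob and cat, bob knows dan, and cat knows dan and eve, `within_hops(knows, "ann", 2)` returns bob and cat at 1 hop and dan and eve at 2 hops.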

Location is TBD. We'll select a location together, so please vote DC, MD, or VA when you sign up.

Andreas Kollegger

If you're interested, the slides I was going to present are available here:

They use a drastically reduced set of the full week workshop that Neo runs, the source for which is available here:

April 19, 2012


Welcome to the Graph

The social graph, a program's object graph, related products in a catalog, discussions crossing multiple email threads and twitter. Graphs are everywhere. A graph database lets you master every angle in your data.

We'll introduce Neo4j, an open source graph database that you can embed, deploy as a server, or use in the cloud. Starting with graph database 101, you'll leave with an understanding of graph-like thinking, when to use a graph, and what great new magic you'll be able to add to your applications.

How do you get from here to there? With a graph. Join us to learn how.

