DRAFT: This module has unpublished changes.
User-uploaded Content

Name:
Chanda Phelan
Internship Organization:
The National Evolutionary Synthesis Center
Department:
Phylet: tree of life visualization
Internship Position Title:
Summer of Code intern
Company URL:
http://www.nescent.org/
Internship Organization Description:

My internship could be considered to fall under the umbrellas of several organizations. I worked on the Phylet project, an open-source project associated with the National Evolutionary Synthesis Center (NESCent). The internship was organized and funded through the GNOME Foundation's Outreach Program for Women (OPW), which was itself inspired by the Google Summer of Code (GSoC) program.

 

NESCent:

"The National Evolutionary Synthesis Center (NESCent) is a nonprofit science center dedicated to cross-disciplinary research in evolution. NESCent is jointly operated by Duke University, The University of North Carolina at Chapel Hill, and North Carolina State University, and is sponsored by the National Science Foundation. NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries."

 

OPW:

"Outreach Program for Women (OPW) internships were inspired in many ways by Google Summer of Code and by how few women applied for it in the past. This was reflective of a generally low number of women participating in the FOSS development. The GNOME Foundation first started the internships program with one round in 2006, and then resumed the effort in 2010 with rounds organized every half a year. By having a program targeted specifically towards women, we found that we reached talented and passionate participants, who were uncertain about how to start otherwise. We hope this effort will help many women learn how exciting, varied and valuable work on FOSS projects can be and how inclusive the community really is. This program is a welcoming link that will connect you with people working on individual projects in various FOSS organizations and guide you through your first contribution."

 

GSoC:

"Google Summer of Code is a global program that offers post-secondary student developers ages 18 and older stipends to write code for various open source software projects. We have worked with open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 6,000 successful student participants and over 3,000 mentors from over 100 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all."

 

Location (City, State, Country):
[virtual]
Term:
Summer 2013
Specialization(s):
Information Economics for Management
Description of work:

Concept:

I worked on the Phylet project developing a visualization of the complete tree of life, which itself is still under construction. We use data from the Open Tree of Life project, which is also open source and aims to "produce the first online, comprehensive first-draft tree of all 1.8 million named species, accessible to both the public and scientific communities," according to their website.

 

Phylet is meant to be the visual gateway to the tree, and so is informed by the Open Tree of Life's mission. Phylet is certainly not the first tree of life visualization to go online, but there are few other attempts and most of them are geared toward a lay audience, which means that the things that are important to researchers - like conflicts about relationships - are usually not included. One of the primary goals of Phylet is to create a tool that can also be used by scientists studying phylogenetics in their research, because - surprisingly, at least to a person who lives in big data - there is actually no good visualization tool for researchers to draw phylogenetic trees. Current visualization tools were created with networks, not trees, in mind, and so the visualizations they produce lack the ordered direction of a tree.

 

Details:

Despite the number of organizations involved, my project was largely independent. I spoke regularly with three other people involved in the project, but since it is small and open-source, I was the only one working on it full time.

 

The data is served from Neo4j, a graph database platform, and the visuals are drawn with the gorgeous and powerful JavaScript library D3. When I started, the visualization was in the network form described above, and I spent the summer building a local copy of the visualization and transforming it. This was much more complicated than I first assumed: the phylogenetic data is actually a general graph and not a strict tree, meaning a node may have multiple parents. Because it is most intuitively presented in a directed tree shape, though, the data sits in an unhappy space between being a tree and a graph, which D3 cannot easily handle.
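
To illustrate the tree-versus-graph problem, here is a minimal Python sketch. It is not Phylet's actual code, and the taxon names, function names, and the "first parent wins" rule are all assumptions made for the example. It keeps one parent per node so the result is a true tree, sets aside every extra parent link as a "conflict" edge that could be drawn separately, and then nests the tree into the name/children shape that hierarchical layouts such as D3's generally expect.

```python
from collections import defaultdict, deque


def split_tree_and_conflicts(edges, root):
    """Split (parent, child) edges into a spanning tree plus conflict edges.

    Assumption for this sketch: the first parent reached breadth-first
    from the root becomes the tree parent; any other parent of the same
    node is recorded as a conflict edge instead.
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    tree_parent = {root: None}
    conflicts = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in tree_parent:
                tree_parent[child] = node
                queue.append(child)
            elif tree_parent[child] != node:
                conflicts.append((node, child))
    return tree_parent, conflicts


def to_nested(tree_parent, root):
    """Turn a child -> parent map into nested {"name", "children"} dicts."""
    nodes = {name: {"name": name, "children": []} for name in tree_parent}
    for name, parent in tree_parent.items():
        if parent is not None:
            nodes[parent]["children"].append(nodes[name])
    return nodes[root]


# Hypothetical data: taxon C is claimed by both A and B.
edges = [("life", "A"), ("life", "B"), ("A", "C"), ("B", "C")]
tree_parent, conflicts = split_tree_and_conflicts(edges, "life")
nested = to_nested(tree_parent, "life")
print(conflicts)  # [('B', 'C')]
```

The nested structure can be handed to a tree layout, while the conflict edges remain available to overlay as extra links. Which parent should "win" is a research question in its own right, so the breadth-first rule here is only a placeholder.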

Learning Objectives/Goals:

   1)  I want to learn the Phylet infrastructure (Neo4j and particularly the JavaScript library d3) well enough that I can write tutorials in simple, straightforward English that non-programmers can follow; such a thorough understanding will allow me to construct more complex and interactive data visualizations in the future, as it will provide me with a solid understanding of both the front end and back end of such visualizations. A completed, polished tutorial for all the Phylet components will be the benchmark.

    2)  I want to learn about how an open-source program is developed, from alpha stage to release, particularly how collaboration works virtually when team members are spread all over the world; this will allow me to contribute to open-source programs in the future, which is important since many of the best data visualization tools are open-source. The benchmark for this will be a releasable version of the Phylet visualization, including a fully fleshed-out website.

    3)  I want to learn how the coding and visualization skills I have learned and will learn at SI can be utilized in the realm of academic research, which I will accomplish by spending time with the members of Stephen Smith’s lab, who are post-docs working on the Open Tree of Life, the data that powers Phylet; they also use visualizations, though their purpose is more to answer research questions and to present findings to an academic community, so the challenges of visualization are different. The benchmark for this will be my mentor’s assessment of how well I incorporated these concepts into the Phylet visualization.

Impact/Outcomes of Internship:

Deliverables:

The outcome was a mixed bag. I made some real advances in the visualization, and I learned a lot, but my progress did not even kind of live up to the schedule I originally proposed in my internship application. I had no idea what I was getting into, and I knew it, but I was still surprised at how much longer every single step took.

 

The visualization itself ended up as a version that is not entirely ready for a wider audience but is fairly usable as a tool for researchers to create static images, which is exciting. I intend to continue working on the visualization in the future. (It is an open-source project, after all! I wonder how I will react when others start making changes to something that felt like it was just mine the whole summer.) I wish that I could have had a finished product, but the consensus from the others on the Phylet team is that a JavaScript adept could fix the remaining problems fairly easily. I'm like a D3 IKEA - I constructed all the pieces, but someone else has to put in the screws.

 

Learning:

This was my first real experience trying to code a big project, and it was a valuable one. It helped me understand that getting stuck and frustrated is not necessarily a sign of failure, or that I'm not good enough - it's just how coding new things works. Of course, I am still a novice programmer, so perhaps the roadblocks were more frequent than they would have been for someone with more experience. In exchange for the frustration, however, I get to be in a space where anything is theoretically possible and no one else really knows the right answer, either - I think it is a fair trade.

 

Implications:

The Big Data paradigm is just beginning to leak into biology research, and being on the leading edge of that, however briefly, really felt like being on a frontier. Even my half-working Phylet visualization is expanding the toolset of researchers - how awesome is that? The vastness of the space that evolutionary biology works in - both in time and in the size and complexity of the tree of life - feels tailor-made for the tools of big data, but those tools are still underutilized. This is at least partly because it's not common to find someone with both an advanced degree in evolutionary biology and expertise in computer science. I don't have the biology PhD, but wrestling with data? That I can do, and it is exciting that my data-crunching skillset opens doors into scientific research that would otherwise be closed to me. I never thought I would like coding as much as I do, but it is just incredible to be able to make real contributions of consequence. It is addictive enough to keep me going through weeks of pounding my head against the coding wall.

Artifact and Description:

There isn't a working version of Phylet online at the time of this writing, so I have included below a screencast of me demonstrating the visualization.

 

In the screencast, I quickly show the pieces that have to be running to power the visualization and then demonstrate the dynamics of the viz. I give a more detailed description of what's going on in the audio.

 

 
