The National Evolutionary Synthesis Center
Phylet: tree of life visualization
Internship Position Title:|
Summer of Code intern
Internship Organization Description:|
My internship fell under the umbrellas of several organizations. I worked on the Phylet project, an open-source project associated with the National Evolutionary Synthesis Center (NESCent). The internship was organized and funded through the GNOME Foundation's Outreach Program for Women (OPW), which is itself an offshoot of the Google Summer of Code (GSoC) program.
"The National Evolutionary Synthesis Center (NESCent) is a nonprofit science center dedicated to cross-disciplinary research in evolution. NESCent is jointly operated by Duke University, The University of North Carolina at Chapel Hill, and North Carolina State University, and is sponsored by the National Science Foundation. NESCent promotes the synthesis of information, concepts and knowledge to address significant, emerging, or novel questions in evolutionary science and its applications. NESCent achieves this by supporting research and education across disciplinary, institutional, geographic, and demographic boundaries."
"Outreach Program for Women (OPW) internships were inspired in many ways by Google Summer of Code and by how few women applied for it in the past. This was reflective of a generally low number of women participating in the FOSS development. The GNOME Foundation first started the internships program with one round in 2006, and then resumed the effort in 2010 with rounds organized every half a year. By having a program targeted specifically towards women, we found that we reached talented and passionate participants, who were uncertain about how to start otherwise. We hope this effort will help many women learn how exciting, varied and valuable work on FOSS projects can be and how inclusive the community really is. This program is a welcoming link that will connect you with people working on individual projects in various FOSS organizations and guide you through your first contribution."
"Google Summer of Code is a global program that offers post-secondary student developers ages 18 and older stipends to write code for various open source software projects. We have worked with open source, free software, and technology-related groups to identify and fund projects over a three month period. Since its inception in 2005, the program has brought together over 6,000 successful student participants and over 3,000 mentors from over 100 countries worldwide, all for the love of code. Through Google Summer of Code, accepted student applicants are paired with a mentor or mentors from the participating projects, thus gaining exposure to real-world software development scenarios and the opportunity for employment in areas related to their academic pursuits. In turn, the participating projects are able to more easily identify and bring in new developers. Best of all, more source code is created and released for the use and benefit of all."
Location (City, State, Country):|
Information Economics for Management
Description of work:|
I worked on the Phylet project, developing a visualization of the complete tree of life, which is itself still under construction. We use data from the Open Tree of Life project, which is also open source and aims to "produce the first online, comprehensive first-draft tree of all 1.8 million named species, accessible to both the public and scientific communities," according to their website.
Phylet is meant to be the visual gateway to the tree, so it is informed by the Open Tree of Life's mission. Phylet is certainly not the first tree of life visualization to go online, but there have been few other attempts, and most of them are geared toward a lay audience. As a result, the things that matter to researchers, such as conflicts about relationships between groups, are usually left out. One of Phylet's primary goals is to create a tool that scientists studying phylogenetics can also use in their research, because, surprisingly (at least to a person who lives in big data), there is no good visualization tool for researchers to draw phylogenetic trees. Current visualization tools were designed with networks, not trees, in mind, so the visualizations they produce lack the ordered direction of a tree.
Despite the number of organizations involved, my project was largely independent. I spoke regularly with three other people involved in the project, but because Phylet is small and open source, I was the only one working on it full time.
Impact/Outcomes of Internship:|
The outcome was a mixed bag. I made some real advances in the visualization, and I learned a lot, but my progress did not come close to the schedule I originally proposed in my internship application. I had no idea what I was getting into, and I knew it, but it was still a surprise how much longer every single step took.
This was my first real experience trying to code a big project, and it was a valuable one. It helped me understand that getting stuck and frustrated is not necessarily a sign of failure, or a sign that I'm not good enough; it's just how coding new things works. Of course, I am still a novice programmer, so perhaps the roadblocks were more frequent than they would have been for someone with more experience. In exchange for the frustration, however, I get to be in a space where anything is theoretically possible and no one else really knows the right answer either. I think it is a fair trade.
Implications:|
The Big Data paradigm is just beginning to leak into biology research, and being on the leading edge of that, however briefly, really felt like being on a frontier. Even my half-working Phylet visualization is expanding researchers' toolset. How awesome is that? The vastness of the space that evolutionary biology works in (both in time and in the size and complexity of the tree of life) feels tailor-made for the tools of big data, but those tools are still underutilized. This is at least partly because it is uncommon to find someone with both an advanced degree in evolutionary biology and expertise in computer science. I don't have the biology PhD, but wrestling with data? That I can do, and it is exciting that my data-crunching skill set opens doors into scientific research that would otherwise be closed to me. I never thought I would like coding as much as I do, but it is incredible to be able to make real contributions of consequence. It is addictive enough to keep me going through weeks of pounding my head against the coding wall.
Artifact and Description:|
Because there isn't a working version of Phylet online at the time of this writing, I have included a screencast of me demonstrating the visualization below.
In the screencast, I quickly show the pieces that have to be running to power the visualization and then demonstrate its dynamic behavior. The audio gives a more detailed description of what's going on.