So what is this "Biozenformatics," anyway? Well, it's like this: I've used the word/name "Zen" in email addresses, gaming character names, nicknames and online forum names for almost two decades now. So when I went back to school for my doctorate and decided to focus my "computer science" dissertation project on genome sequencing, it seemed fairly natural that the related blog should have the word "zen" in it somewhere.
Wait, what?
Now that I'm almost three-quarters done with my first year of doctoral studies, I thought it might be a good idea to have a place to discuss my research interests and other topics that have relevance to my studies. And the occasional book/TV/movie quote.
Wow, that sounds really pretentious. OK, the truth is I'm really just starting out in the field, and I'm having to teach myself a lot of biology, genetics and some advanced topics in computer science because, unlike M.I.T., Berkeley and the like, Colorado Technical University doesn't have any advanced biological studies. At those other schools there are whole research teams with their own labs to work on these topics together. I'm doing this all on my own.
The Story So Far
My master's project was about grid computing. You remember grid computing, right? That was "The Next Big Thing" before Cloud Computing. I worked with grid computing because before that I was intrigued by cluster computing. You see, I grew up on a farm, and on the farm you frequently have to make do with what is at hand. Sometimes that means using a shovel when a hoe would work better. Very often it meant "repurposing" a piece of equipment - usually with a cutting torch, grinder and arc welder.
How do cluster computing and growing up on a farm relate? Ever since those days I've been a bit obsessed with doing more with less. More computing power with less money. When I first read about Pixar's "render farm" - a cluster of 2000 Sun workstations used to render the graphics of their early movies - I was hooked. I love the idea of taking old computers and getting more "horsepower" out of them.
Of course the other big buzzword these days is "Big Data." Well, for all you non-computer geeks out there, let me let you in on a secret: computer scientists have been doing "big data" for at least a decade under the name "High Performance Computing." As soon as I figured this part out, I knew the general direction of my dissertation.
But why genome sequencing, especially when my school doesn't even have any advanced biological studies? Well, here's something I know about myself: if a project is going to go to completion, it has to be interesting to me, and I spent most of the first 18 years of my life wanting to be a veterinarian. So when I started looking for Big Data topics to study, it seemed like a very natural step that genome sequencing would be at the top of the list.
Where Do We Go From Here?
In a few days I'll post a summary of why genome sequencing is a "big data" topic. For now, suffice it to say that the age of tabletop gene sequencers is beginning, and any one sequencer is capable of putting out terabytes of data a week on dozens of samples, be they animal, vegetable, fungus, or human. Even with all of our amazing "i7 quad-core 64-bit" desktop computers, our ability to generate the data is growing faster than our ability to process it.
Case in point: I was reading a scholarly paper from late 2012 by a research group out of M.I.T. that very proudly proclaimed their software could sequence a human genome from scratch in "only" three and a half weeks. One sample. Three and a half WEEKS!
I think I've found a research avenue that could make a difference. Even a 10% reduction in that processing time would save days per sample. I want to do a little more "proof of concept" work before I publicly reveal that avenue. Hopefully soon I'll have some hard data to post, which may prove surprising to a few people.