What does it take to be an imaging scientist? What is an imaging scientist? This is a question that crosses my mind every so often when someone asks me about my “goals.” Do people think about goals regularly? I don’t. But I’m an intense person, and I pursue things aggressively and (in graduate school) without much fear of this thing called “failure.” (This can be the topic of a separate post: I realized along the way that as long as I work hard at things and don’t give up, even if my path is a little bumpy, I can never really “fail” in the catastrophic sense that would inspire amygdala-driven fear.) I don’t really do things for some ulterior motive – I do things that are fun, and things that make me happy, and if the things I do on a regular basis (that some might label as “work”) didn’t make me feel this way, I wouldn’t do them. I just like making things, and I wouldn’t be writing this now if there wasn’t some inherent joy in it. I would suggest from this observation that not consciously thinking about goals does not imply that they do not exist, because any person on some kind of track (like graduate school) is arguably there to keep doing the things he or she likes to do. So for this kind of personality type, the goals are hard-wired, implied, and sometimes unconscious until they are pulled from the depths of the brain.

Back to this question. When I am asked about goals, since I don’t consciously set any, I frame my answer around continuing to do the things that I like to do. In that scope, I look at my environment and decide that this “thing” I’m doing is probably being an imaging scientist, which is probably a subtype of data analysis. The question is, then, what does it mean to be an imaging scientist, and since in the United States we are obsessed with this concept of “success,” what does it mean to be a GREAT imaging scientist? I have some thoughts.

We can actually frame this in the context of computer games. There are different levels of imagers, from noobs that are just getting started and haven’t even made the connection between pixels and numbers (me circa 2009), to super-experts that can build and do anything that they think of. This latter group, then, is the highest level of the hierarchy and what we can define as “expert” or “successful.” Here comes the fun part! I want to answer this question by defining my own level system, as if we had a computer game with an “imaging scientist” as a class. Actually, we don’t even need to restrict it to “imaging scientist,” we can talk generally about people in “data science.” First, let’s define the buckets of skills that we can evaluate. Each of these buckets has its own dimension, so you could define yourself along each one to come up with your final avatar. Note to self: add this “make your own academic avatar” to a list of fun web-interface projects to make at some point in life!

  • domain expertise
  • programming
  • methods
  • communication

Now I want to talk about the details of these domains. Here I am going to communicate my bias. It might make some people upset, so I’ll preface these thoughts with the understanding that this is my opinion, and you are welcome to disagree. I will put all of these statements in the context of myself, because I hold myself to them to some degree.

 

Domain Expertise

This is the hardest for me. Regardless of the field that I’m in, I will have a hard time being successful if I can’t ask interesting questions. I am terrible at this, because my natural way of thinking is to get excited about methods and tools, and then find an application to shove in. When I was interviewing for graduate school, actually, I had a pretty famous person tell me that “People like me that don’t have burning questions do not belong in PhD programs.” I actually excused myself from the group, found a side hallway, and just cried. In retrospect I realized that the world needs both kinds of people, those who use tools with interesting questions, and those that develop the tools themselves, and more important than bluntly stating that one is “more important” than the other is recognizing the need for those groups to work together. So while I’ve always struggled with the “biological domain” part of “biomedical informatics,” I’m pushing myself to work harder on the question part, and surrounding myself with people that I can learn from. For this domain expertise, we can break it down further into three subcategories: the source of the question, the value of the question, and the impact of answering it:

Source:

  • Level 1: I am incapable of asking questions, perhaps I’m a lemming with no brain?
  • Level 2: I can ask basic questions with much guidance from the literature, peers, or advisors.
  • Level 3: I can ask intermediate questions if I pursue guidance from peers, and push myself to think about “the big picture.”
  • Level 4: I can ask expert, interesting questions based on an intelligent synthesis of the people and sources of knowledge around me.

Value:

  • Level 1: My question has been answered before, but I haven’t done my homework to know that.
  • Level 2: My question does not appear to have been answered before, and this might be the case because of positive publication bias.
  • Level 3: I have a basic skill-set and incentive to search the literature and determine if my question has been answered before.
  • Level 4: I have enough background and domain knowledge to evaluate how likely it is that my question has been answered, and this understanding comes from having identified a gap in a method or process.

Impact:

  • Level 1: My Mom might be proud of me if I wrote a paper about it.
  • Level 2: Answering the question would have some value to a small, select group of people, but the rest of the world would ask “Why would anyone care about that?”
  • Level 3: Answering the question would have a significant impact for the scientific community.
  • Level 4: Answering the question would have a significant impact for not only the scientific community, but also society at large. It’s probably a Nature paper.

 

For the above, I’d say I’m probably at:
Source: 2
Value: 2
Impact: 1

Haha. Like I said, I’m terrible. Let’s continue! This is fun :)

 

Programming

In this day and age, if I am not in a “soft” science and I finish graduate school without programming experience, I did not try hard enough. It could be the case, however, that I finished without programming but was able to do my analysis with graphical software. That seems plausible, but what that really means is that I would then look for a job that has me doing that same limited set of clicking and pointing. If I aspire to do anything other than this limited functionality, I’d have to outsource it, and so why would anyone want to hire me when there are many people that can use the software and understand the underlying black box? Basically, for anyone that works with data, especially highly complex imaging data, I can’t imagine being successful without being able to do things with it. The default understanding of a “scientist” is that you must make discoveries with some level of novelty, and most of the time the novelty means doing much more than clicking and pointing. I will say that there is room in the world for those that are more managerial scientists (e.g., having great ideas and maybe running a lab), and I think those individuals would be even better with some coding ability! For this bucket, I’d say there are three categories: language breadth, language depth, and software development.

Language Breadth:

  • Level 1: I know no languages, and I don’t even work with data. I might write a Word document sometimes, but that’s it. This is where, for example, my parents are. Sorry Mom and Dad!
  • Level 2: I have no coding skills, but I do work with data. I rely on Microsoft Excel and tried to open a data file in Microsoft Word, but it crashed my computer. It takes me inordinately long to do simple things that could be done with one line of sed or awk. What are those things?
  • Level 3: I have used 1-5 languages.
  • Level 4: I have used 6-10 languages.
  • Level 5: I have used more than 10 languages. I may not be a coding master, but I have impressive breadth!

Language Depth:

  • Level 1: If languages are like dark matter, then I’m on the earth, because we just don’t see one another.
  • Level 2: I can write a few basic lines, in say, Matlab, but only when I am absolutely forced to, and I find it painful and hard. Command lines aren’t so bad, but I prefer graphical interfaces.
  • Level 3: I can write basic scripts. The idea of writing a function or anything beyond a few lines is kind of scary.
  • Level 4: I program with some frequency, and can figure out most of what I need. I am very particular about my choice of editor, and the coloring of my syntax.
  • Level 5: I don’t even need to do Google searches. It flows out of my fingers like salami on a slope! I will argue with you for hours on the pros and cons of vim vs emacs, or whether your IDE of choice can trump my love for Eclipse, Sublime Text, or Atom.

Software Development:

  • Level 1: Still don’t know any languages, captain.
  • Level 2: I wrote a paper that had some programming, but the little bits and pieces of my work are scattered everywhere. If you ask me to reproduce anything I will tar and feather you.
  • Level 3: I wrote a paper with some programming, and I have a nice script or two to go with it that can be shared if anyone asks! I just need to find it in my Dropbox…
  • Level 4: I use version control, but it’s mostly just me. I’m not really sure how to collaborate in this GitHub place.
  • Level 5: I work on projects with other people in a version-controlled environment, but mess up from time to time.
  • Level 6: I AM GITHUB MASTA! If you want to know me better as a person, you should really look at my repositories.
  • Level 7: I have actually worked as a developer in a large company.

 

And my critique of myself:
Language Breadth: 4
Language Depth: 4
Software Development: 5

I’m being kind to myself here, because if I had more awareness of how awesome the CS students are here, I’d probably give myself 2’s across the board, just one level up from Mom and Dad!

Methods

A skill-set in methods is highly dependent on the graduate program. A chemist should probably be in the 99th percentile of people that can put on white coats and mix dangerous things, and a data scientist should minimally know the high level methods, and when and how to apply them. In my field, statistics are also very important, along with machine learning and domain-specific processing protocols. My first foray into methods was related to preprocessing (loved it!) and then machine learning (also loved it!). The statistics part, which really comes down to mentally translating a bunch of symbols into mathematical steps, and then understanding the steps to take to convince the peanut gallery that your result isn’t just random chance, is a lot harder. My strategy for understanding methods that are typically communicated with symbols is to put them into a language I do understand – some kind of code. One line of gobbledygook translated into for loops (or matrix multiplication), and being able to tangibly see the matrices I’m working with, makes sense to me. This in effect may just be another data visualization strategy – perhaps I’m visually inclined, and don’t truly understand things until I see them. But statistics overall, as the language of methods, with those awful symbols used to communicate it, is something that I have to work really hard at. I’m convinced that like learning a programming language, reading these symbols comfortably will just “click” one day. For example, I remember years ago when I first saw a line of code in R, it looked like gobbledygook. It just clicked one day, and now I practically dream in R. I don’t see why statistics will be any different, but I do need to keep pushing myself to be exposed to the symbols. For methods, there are two categories: Implementation and Utilization, and Understanding.
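To make the "symbols into for loops" trick concrete, here is a minimal sketch in plain Python (not R, and not anything from my actual analyses). It takes one line of notation, the matrix-vector product y_i = sum over j of A_ij * x_j, and spells it out as two nested loops. The function name matvec and the toy numbers are made up for illustration.

```python
def matvec(A, x):
    """Compute y = A x one entry at a time: y[i] = sum over j of A[i][j] * x[j]."""
    n_rows = len(A)
    n_cols = len(x)
    y = [0.0] * n_rows
    for i in range(n_rows):        # one output entry per row of A
        for j in range(n_cols):    # this inner loop is the "sum over j" symbol
            y[i] += A[i][j] * x[j]
    return y


# A toy 3x2 matrix and a length-2 vector (illustrative values only)
A = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]
x = [10.0, 1.0]

y = matvec(A, x)
print(y)  # [12.0, 34.0, 56.0]
```

Each index in the formula becomes a loop variable and each summation sign becomes an accumulating loop, which is the part that makes the symbols tangible for me. A matrix library would do the same thing in one call.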

Implementation and Utilization

  • Level 1: A method… is that like a recipe in a book?
  • Level 2: I can follow a web tutorial and then try to reproduce it for my data.
  • Level 3: I can bring together different software, packages, and tools to do basic preprocessing, postprocessing, analysis, and statistical inference.
  • Level 4: I don’t start with pipelines: I think of my data, and my goals, and develop and use a series of steps that are best fit to that.
  • Level 5: I not only use current methods intelligently and correctly, I am a bleeding edge developer that is defining the field.

Understanding

  • Level 1: I’m completely new, and I’ve only been exposed to “methods” through reading the New York Times, or listening to NPR.
  • Level 2: I have a high level understanding of methods. This “clustering” bit means we group things that are similar. A regression involves lines?
  • Level 3: I know the methods that are important in my field, but I need lots of Wikipedia before being able to explain anything to anyone.
  • Level 4: I know the big picture of most of the big methods in my field, and I can explain most of them. For the details, I still might need some Wikipedia. I might have even implemented some basic methods on my own.
  • Level 5: I’m a well-established researcher, and I know this stuff so well I confidently use the BEST packages with the RIGHT parameter settings, and teach methods to others.
  • Level 6: I am a methods and statistics master.
  • Level 7: I am one of the methodological powerhouses at Stanford, such as Daphne Koller, Andrew Ng, or Rob Tibshirani. If you aren’t in my lab, I know you want to be.

 

And now for where I fall in the methods buckets:
Implementation and Utilization: 3
Understanding: 4

 

Communication

This is broken into three categories: verbal, written, and visual. I think that many graduate school programs should expect more of us in terms of communication. Let’s start with “verbal.” I don’t think that presenting at a lab meeting, or to a small group a few times a year with scattered “milestone” talks, is representative of what we need to do in the “real world.” Regardless of whether an imaging scientist goes into academia or industry, if he/she cannot stand in front of a large audience with a whiteboard marker and properly communicate a complicated method, it’s just not good enough. We should have regular experience with developing and presenting content. Serving as teaching assistants is a good idea, but this should be enhanced with actual course development, and minimally teaching a handful of full-length lectures.

On the written side of things, they do make us write quite a bit, so that is properly addressed. Keep in mind that academic writing is very different than personal writing or blogging (ahem, this post), and I’m not sure how the two are related. I am totally incapable of being brief about anything, so I lose a lot of points in that department, but on the other hand, I can come up with some pretty fun metaphors, most of which would be totally inappropriate for an academic paper. A well-trained academic should be able to produce many different kinds of writing pretty painlessly and quickly.

Finally, “visual” communication. It was less than a year ago when I realized that I had a visualization of some data result in my head that I wasn’t skilled enough to show in a meaningful way. This is actually the hardest of all of the skills, because it’s most definitely feasible to survive as an imaging scientist using traditional software, and convincing others with confusion matrices and p-values. I found myself, however, not only wanting a visualization to supplement some result, but badly needing it to convince myself that it was meaningful, period.

Verbal Communication:

  • Level 1: I get anxious and mute if you stand me in front of a room of people. I might even fall over.
  • Level 2: I can give a presentation after many hours of practice, and mostly just reading the cards or slides in front of me.
  • Level 3: I can give a pretty good presentation with visual or written aids, but I still need to work on things like volume, eye contact, etc.
  • Level 4: I’m not perfect, but I feel confident in my ability to engage a room of people.
  • Level 5: I am the verbal communication master. I am interesting, engaging, and funny.
  • Level 6: I am Russ Altman.

Written Communication:

  • Level 1: Like, omgwtfbbq! Lolz!
  • Level 2: I am a Twitter masta, 140 characters or less is the name of my game.
  • Level 3: I had to make a resume and bio once, and it was pretty ok.
  • Level 4: I regularly journal, and I’ve written an essay or two for an application.
  • Level 5: I’ve written papers or abstracts, although I’ve never published. I may or may not keep a journal.
  • Level 6: I’ve published, and it sounds pretty ok! I regularly write in either a research or personal journal, and at least a handful of people have read some of my brain dumps.
  • Level 7: I have written grants, many publications, and regularly keep a research or academic journal with a substantial number of readers.
  • Level 8: I am a master of language, with several publications in well-respected journals. I have a way with words, and they come easily and regularly.
  • Level 9: Grant writing and publication defined the early part of my career, and now I’m writing books. I may have a world-class blog, or I’m regularly asked to write chapters or articles for well-known institutions.

Visual Communication

  • Level 1: I’m pretty limited to drawing circles and lines on paper when I need to show something.
  • Level 2: I can make basic plots for papers and such, usually with the function called “plot.” I am also a PowerPoint PRO.
  • Level 3: I can make standard and acceptable plots, but also add some swag by importing into a pixel or vector graphics program!
  • Level 4: I think hard about my data visualizations, and go to network or scientific visualization software outside of the standard to represent my results.
  • Level 5: In addition to the above, I can program my own visualizations using something like shiny in R, d3, or advanced functions in graphical software.
  • Level 6: Anything that I can dream of, I can show, simply and beautifully.
  • Level 7: I am an expert in data visualization, it’s what I do for my job, and I’m better than the 95th percentile.
  • Level 8: I AM Mike Bostock.

 

And to define myself on these three levels:
Verbal Communication: 4
Written Communication: 6
Visual Communication: 5

 

In summary:

What the above comes down to, is that to be a good data scientist you need domain expertise to identify a meaningful problem, ask a well-scoped, specific question to address the problem, and understand what data are needed to answer it. You then need the methodological understanding to find the right steps in your toolbox to answer it, and prove to others that your answer is a good one, and the tangible software or programming skills to be able to implement these steps. Finally, you need the verbal, written, and visual communication skills to be able to convince others of the value of your work.

Now for some fun – for each of the domains above, I can create my “data scientist avatar” progress bars that I might see in a computer game to see how well I’m doing at this “goal achieving” thing.

[Image: graduate_student_progress]

While I didn’t make a web application, you are welcome to play with my code!

Graduate Student Avatar Gist

 

Some flaws with the visualization above:

  • It does not take the number of levels or difficulty into account
  • It doesn’t reflect how I feel about the different domains (or their importance), just how I evaluate myself
  • It does not properly represent progress for each of the higher domains, other than to show that (generally) I think I’m best at programming, worst at domain knowledge.
  • It doesn’t properly summarize “the whole” picture, across domains

Haha, I should probably not show those piddley green bars to any kind of panel that is evaluating me :)
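If you'd rather not dig through the gist, here is a minimal sketch of how bars like these could be computed from the self-ratings in this post. To be clear, this is not the code from the gist. It rests on two assumptions of mine: progress in a category is the score divided by the number of levels in that category, and every category within a domain counts equally. The names RATINGS, bar, and domain_progress are made up for illustration.

```python
# (my score, number of levels) for each category, copied from the
# self-ratings and level lists in this post.
RATINGS = {
    "Domain Expertise": {"Source": (2, 4), "Value": (2, 4), "Impact": (1, 4)},
    "Programming": {
        "Language Breadth": (4, 5),
        "Language Depth": (4, 5),
        "Software Development": (5, 7),
    },
    "Methods": {"Implementation and Utilization": (3, 5), "Understanding": (4, 7)},
    "Communication": {"Verbal": (4, 6), "Written": (6, 9), "Visual": (5, 8)},
}


def bar(fraction, width=20):
    """Draw a text progress bar: '#' for the filled part, '-' for the rest."""
    filled = round(fraction * width)
    return "#" * filled + "-" * (width - filled)


def domain_progress(categories):
    """Average score/levels over a domain's categories (equal weights assumed)."""
    fractions = [score / n_levels for score, n_levels in categories.values()]
    return sum(fractions) / len(fractions)


for domain, categories in RATINGS.items():
    progress = domain_progress(categories)
    print(f"{domain:<18} [{bar(progress)}] {progress:.0%}")
```

Dividing by the number of levels is one way to take the number of levels into account, but it still treats every level as equally hard to reach, so the difficulty flaw listed above applies to this sketch too.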

And I hope no one ever asks me about goals again! If they do, I’m going to direct them here and say “to get more points!”

I also want to point out that these are not necessarily goals, but artificial constructs that can be used to evaluate progress by one person’s (my) standard. If I thought about skills in such a stark format on a regular basis, that would be kind of scary. If you want to go as far as to call these goals, I’d say that thankfully they are convoluted into a framework centered around playing with data and having fun!

What we all come to realize, however, is that it’s impossible to make it to the highest levels of any of these domains. Maybe one person in the world, for a split second, could get a completely green bar, and then lose it in the next blink. There are also bars that are completely missing from this plot because I don’t have the insight that they exist, period. Back to big picture thinking. There are always going to be people out there smarter and better than me at things, and it’s much better to realize that, while there is substantial challenge and novelty by default, as long as we work hard and take on the mindset of learning for the rest of our lives, we will really enjoy what we do, feel challenged, and grow. So perhaps that is what it takes to be a great imaging scientist? I do hope that my stumbling around eventually leads to asymptoting around that: maintaining a level of being utterly, completely happy :O)