“…a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.” – The Economist, 27 February 2010 [emphasis mine]
The above quotation neatly summarizes the role and core competencies of a bioinformatics professional, under the more general title of “data scientist”. Data science usually refers to statistical and database-driven business intelligence (e.g., correlating Pop-Tarts sales with approaching hurricanes), but bioinformatics is clearly data science applied to the molecular biology domain. The skills required for business intelligence and for bioinformatics overlap significantly.
Before realizing that bioinformaticians are specialized data scientists, I was puzzled by the interdependent combination of mathematical thinking (e.g., statistical modeling, exploratory data analysis) and clerical activities (e.g., database maintenance) that is required of many bioinformatics professionals. The fluctuating mix of scientific, engineering, and technician activities that makes up my work days proved hard to explain in a cohesive fashion to my colleagues in the biology lab.
In the past, business and government institutions separated their statisticians (analysts) from their software/database gurus. The analysts would submit a request for information from the databases, and the software folks would reply with a report or data file that they believed fulfilled the request. The analysts would then proceed to evaluate the data with no further interaction with the database gurus.
This of course severely hampered analyst work in two major ways. First, analysts could never be certain that the request was understood in the manner that they intended. Because such requests flowed between departments, a “telephone game” of sorts emerged and confounded the flow of information.
Second, limited communication between the analysts and the database gurus led to differing mental models of how the data was sampled, stored, and used. This in turn led to situations where one group deleted a detail important to the other group without realizing there was a problem.
The emerging profession of data scientist attempts to combine analyst, software engineer, and database guru into one person. Doing so substantially reduces the risks described above. To be fair, it creates new challenges (data scientists become “Jills and Jacks of all trades” who occasionally stumble over the subtle features of the formerly separated knowledge domains) – but this is manageable.
However, I do not believe that the data scientist emerged to directly solve these problems. Instead, I think the role emerged purely because maintaining the department-based functional separation described above reduces an organization’s agility. A data scientist is more likely to spot unique opportunities and warnings than a cluster of business units communicating amongst themselves client/server style.
In this light, it is far easier to explain the complex mix of science, engineering, and technician skills that fills the bioinformatics repertoire; organizational agility mandates the combination.