Data scientist makes peace with web programming

True to my hacker roots, I prefer command line interfaces (CLIs) to graphical user interfaces (GUIs). That sentiment compounds when the GUI is delivered through a web browser. However, I recently—finally—accepted the fact that the web browser is the most important user interface out there, and the only user interface that most scientists will bother with. Moreover, I now believe that the ability to write web applications firmly belongs in any data scientist’s skill set.

Below I outline how I got to this conclusion.

Web applications’ utility-to-cost ratio

From my point of view, browser-based GUIs have a very low utility-to-cost ratio:

Suppose you are writing an application to perform a scientific calculation where a user sets input parameters, runs the calculation, and retrieves numeric results. (These operations describe 99% of all scientific software.) Whether creating a CLI or a GUI, you incur a baseline creation cost associated with writing the mathematical portion of the program (developing the model or simulation, validating input, etc.), plus a baseline maintenance cost of keeping the mathematical portion running smoothly (updating source data, dealing with weird outliers, etc.).
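
To make that shared baseline concrete, here is a minimal sketch of the kind of mathematical core both interface types would wrap; the function names, parameters, and the toy calculation are hypothetical stand-ins for a real model or simulation:

```python
# Hypothetical shared core: validate input parameters, run the model,
# return numeric results. A CLI and a web GUI would both wrap this code.

def validate_params(params):
    """Reject obviously bad input before the calculation runs."""
    if params["iterations"] <= 0:
        raise ValueError("iterations must be a positive integer")
    if params["initial_value"] < 0:
        raise ValueError("initial_value must be non-negative")
    return params

def run_model(params):
    """Stand-in for the actual simulation: simple compound growth."""
    validate_params(params)
    value = params["initial_value"]
    for _ in range(params["iterations"]):
        value *= 1.0 + params["growth_rate"]
    return {"final_value": value}
```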

The creation cost then diverges at the construction of the user interface: CLIs incur minor additional creation cost—you write a few “print” statements and parse the command line. For web-based GUIs, at minimum you have to muck with HTML form processing and configure an HTTP server.
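
A minimal CLI wrapper around a core like the one sketched above shows how little extra code this takes; it assumes the hypothetical run_model() lives in a module named simulation:

```python
# Minimal CLI wrapper: parse the command line, run the calculation, print.

import argparse

from simulation import run_model  # hypothetical module holding the core sketched above

def main():
    parser = argparse.ArgumentParser(description="Run the calculation from the shell.")
    parser.add_argument("--initial-value", type=float, default=1.0)
    parser.add_argument("--growth-rate", type=float, default=0.05)
    parser.add_argument("--iterations", type=int, default=10)
    args = parser.parse_args()

    results = run_model({
        "initial_value": args.initial_value,
        "growth_rate": args.growth_rate,
        "iterations": args.iterations,
    })
    print(results["final_value"])

if __name__ == "__main__":
    main()
```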

Maintenance cost also diverges by user interface type: a CLI tends to run consistently forever once installed on a Linux box. A web-based GUI, by contrast, even when installed on the same Linux box, operates at the mercy of IT infrastructure: any change in firewall settings, or a crash in one of many interdependent servers, can render the application useless. Keeping a web-based GUI running therefore requires more system administration resources than its CLI analogue would.

Modern web development technologies change the utility-to-cost ratio

Lower creation cost

Even up to the end of the last decade, my web applications employed late-1990s techniques (e.g., Perl CGI and JSP) to get their business done. This worked fine for all my scientific computing tasks, but it distracted me from the significant changes in web development that emerged in the mid-2000s.

Simply put, web development is easier now. Using modern frameworks such as Django, one can create database-driven, well-organized, aesthetically pleasing, and robust web applications very quickly. These frameworks encapsulate the most common web programming tasks, leaving the developer free to attend to the business logic of the application. While this work still does not proceed as fast as simply adding “print” statements to a CLI, by reducing the programming tedium, the frameworks have made web development more palatable to data scientists who just want to get on with their craft.
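
To show how little glue code a framework like Django demands, here is a minimal sketch of a view that exposes the same hypothetical core over HTTP; the module name, parameters, and URL route are assumptions for illustration, not a prescription:

```python
# Minimal Django view: read parameters from the query string, run the
# hypothetical core, and return the numeric results as JSON.

from django.http import JsonResponse

from simulation import run_model  # hypothetical module holding the core

def run_calculation(request):
    params = {
        "initial_value": float(request.GET.get("initial_value", 1.0)),
        "growth_rate": float(request.GET.get("growth_rate", 0.05)),
        "iterations": int(request.GET.get("iterations", 10)),
    }
    return JsonResponse(run_model(params))

# urls.py would route a URL to this view, e.g.:
#   from django.urls import path
#   urlpatterns = [path("run/", run_calculation)]
```

With that handful of lines, request parsing, routing, and JSON serialization are handled by the framework, and the scientific code itself stays untouched.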

Increased utility of web applications

Advances in client-side browser scripting, and the Web 2.0 ethos in general, have dramatically increased the utility of modern web applications over their 1990s forebears. Improvements in the CLI world over the same period (the thoroughly awesome IPython, for example) are small by comparison. Modern client-side scripting has improved the interactivity of web applications almost to the point of rivaling that of CLI applications. While I still think a CLI provides greater interactive flexibility than a web app, the gap between them has closed enough to make web apps compelling alternatives for many tasks.

The trend

The recent lowering of web app creation cost and the corresponding increase in web app utility suggest an exponential rise in web apps’ utility-to-cost ratio, as shown in the following (conjectural) graph:

[Figure: conjectural plot of web applications’ utility-to-cost ratio increasing over time]

Beyond application utility: making our work visible

The above calculation assumes the same number of users would exist no matter what the user interface looks like. But seriously: utility for existing users is a nice consideration; the ability to attract new users is more important.

Here we have a common mix of idealistic and practical motives: encouraging use of our software by a large community, and keeping our projects funded.

Most scientists do not want to be UNIX hackers

The fact remains that most scientists do not want to mess with the UNIX command line, so the best CLI program in the world is going to get limited attention in that community. (I honestly thought this would change during the last decade, but the condition persists.) Making scientific computations available through a web interface ensures wider use.

Scientific computing folks need all the “marketing” they can get

While the emergence of Big Data has given our field a major boost, we still need to make our work visible to business leaders and researchers who are uninterested in the UNIX shell. Web applications accomplish this task fabulously: they promote our work to that audience, which in turn helps keep the lights on.

Web programming skills’ firm place in data scientists’ repertoire

Based on the arguments above, I now believe that web programming skills belong in the repertoire of any scientist who writes a significant amount of code.
