Along with the explosive popularity of data science across industries, a new movement has emerged: democratizing data science. This is the idea of bringing the seemingly ethereal practice of data science and machine learning to people in many types of roles through out-of-the-box data science development tools. Although this is an exciting movement, it raises some concerns: users of these tools may not be trained in data science methodologies, or in the process of designing and extracting an experimental cohort from the data and developing an algorithm. This brings up an important question: who exactly is trained to fulfill data science roles?
I’m often asked whether you need a Doctor of Philosophy (Ph.D.) degree to be a data scientist, and the ambiguous and uncomfortable answer is that it depends. The title ‘data scientist’ consists of two words, ‘data’ and ‘scientist.’ Merriam-Webster defines scientist as “a person learned in science and especially natural science; a scientific investigator.” Arguably, the training and curriculum for terminal degrees are designed to train individuals to be scientists. Therefore, do you need a terminal degree in a STEM or other field to be called a data scientist, or any type of scientist? More provocatively, does allowing individuals without such training to carry the title of ‘scientist’ lessen the value of advanced training? And where does a master’s degree (M.S.) fit in?
The answers to these questions are controversial, as evidenced by largely anecdotal and conflicting opinions. I have heard, and perhaps even said, that a Ph.D. degree teaches you a way of thinking. You learn to apply the scientific method to a myriad of problems, whatever the business may throw at you. But Ph.D. training inherently teaches a much broader skill set.
First, let’s articulate the skills data scientists need beyond the often-quoted trifecta of mathematics/statistics, computer science, and domain knowledge, and examine whether Ph.D. degrees are necessary training and education for success in data science roles.
Why a Ph.D. helps you
Individuals who seek and earn terminal degrees in their field of study most likely share inherent strengths that may uniquely position them to excel in a data science role. Individuals with M.S. degrees often get practice in these skills, but perhaps to a much lesser degree. Consider the following, keeping in mind that demonstrating these skills and attributes without being prompted by a specific expectation is exceptionally valuable:
Ability to be self-accountable. Doctoral research is often somewhat solitary because trainees are being taught how to make individual contributions to their scientific discipline. The individual nature of this work fosters self-accountability, increasing the likelihood that the individual will take ownership of and responsibility for driving projects forward. This is a particularly valued skill because it demonstrates leadership and sustains project momentum.
Ability to teach yourself. Taking the initiative to teach yourself something you don’t know in order to perform the task at hand can prove valuable to both the employee and the company. Knowing how and when to leverage resources (e.g., experts in the field, online training, and others) to teach yourself a new skill and apply it to a current or upcoming project demonstrates initiative, willingness to stretch yourself, and dedication to the company. Clearly demonstrating this signals readiness for stretch goals. This skill is perhaps not entirely unique to Ph.D. training, but practicing it is certainly a focus of Ph.D. programs.
Ability to intuit. I believe this is a skill often overlooked, primarily because it’s difficult to quantify. The ability to successfully develop and apply intuition in problem solving, analytical interpretation, programming, and other tasks can increase your efficiency and decrease development time. This skill is exceptionally important when handling vague and ambiguous requirements and projects, which is often the case. Admittedly, the ability to intuit also increases with experience.
Ability to manage and execute long-term, complex projects. Doctoral projects in STEM fields often last several years and frequently experience scope creep, although academicians don’t typically use that term. While this ability can also work against Ph.D. data scientists (see below), the ability to plan and execute a complex project is particularly useful in a data science team. Further, breaking work down into smaller pieces and knowing when to pivot (or fail) helps you succeed in a business environment.
Ability to apply relevant peer-reviewed literature and contribute to it. This goes well beyond maintaining a library of papers. What are you learning from already published work? How are you applying that to your current work? Data scientists with Ph.D. degrees are trained not only to apply peer-reviewed findings correctly and thoroughly, but also to scrutinize the rigor of such publications and know when to dismiss published findings.
Ability to dissect and solve a problem. The cornerstone of advanced scientific training is hypothesis testing, and this skill goes beyond simply knowing the scientific method. It includes designing robust experiments with data and succinctly articulating their limitations, knowing when complex solutions are and are not needed, and correctly applying and interpreting statistical analyses (e.g., don’t overvalue the p-value!), among other things. Ph.D. training is exceptional at honing this skill.
Ability to write scientifically. If you work at a company without a research focus, this may have less value, but experience in scientific writing helps not only with research papers but also with crafting documentation for the business, clients, and/or users of your models. Being able to concisely describe complex work in lay language, and especially to articulate its limitations and biases, is essential in both written and oral communication.
Why a Ph.D. hurts you
Let’s not sugarcoat it: earning a Ph.D. degree can reveal, and possibly even promote, various attributes that may be harmful to success as a data scientist in the private sector. Consider some of the attributes that may be hindrances:
Lack of ability to work in a team. Ph.D. work is very often solitary, and you are trained to think and function independently. This is rarely the case in private-sector data science: you not only need to coordinate with other data scientists, but you also need to work in cross-functional teams with project managers, software developers, account managers, and others. Ph.D. training doesn’t prepare you for this workflow.
Lack of practice in managing and executing short-term, complex projects. As noted above, experience managing long-term projects is one reason a Ph.D. may help you, but it’s a double-edged sword: it may also hurt you, especially at a small start-up company. Companies routinely pay data scientists six-figure salaries and above, and they don’t have much tolerance for working heads-down on a project for months. Unlike a dissertation, company projects are expected to finish in weeks or months, with value delivered along the way. This can be a challenge for people with a Ph.D.
Lack of broad communication skills. In an academic setting, graduate students and postdocs are often surrounded by and work with individuals who have similar backgrounds and training. Outside of academia, however, you must work with teams that have different kinds of training. The consumers of your work might have a hard time understanding box plots or pie charts. Mean values and 95% confidence intervals might be meaningless to them. The consumers of data science models may care much more about interpretability and business context than about reducing false positives. Knowing how to create the right visualizations for varied audiences, such as lay or business individuals, and understanding the consumers of your work often do not come naturally to individuals with Ph.D. degrees.
Lack of experience creating science for consumers. In an academic setting, success is measured by the number of peer-reviewed papers you publish and grants you get funded, but this is not the case for data science groups at companies. The success and value of your work are determined by your customer, most likely the product team, and their requirements can deviate from what would be publishable. Individuals with Ph.D. training have to learn what resonates with the business, clients, and consumers, and be able to translate their knowledge and expertise into usable science products. A Ph.D. program does not guarantee that experience.
Lack of business understanding and acumen. Executives speak a very different language from academicians, although some of it really means the same thing. A lack of understanding of what makes a business successful and what drives value can be deleterious to Ph.D.-trained scientists working in a business setting. Speaking different languages can result in misconceptions and missed expectations about what data science can offer the business. Ph.D.-trained scientists have a learning curve to overcome here, since their training environment is very different from a business setting. Trainees in Ph.D. programs should consider internships or other extracurricular activities to gain business skills.
Now, let’s explore how the data science function varies across companies and industries, and how that may influence a possible Ph.D. requirement for data scientists. How might these disparate data science functions be enhanced or harmed by staffing your data science team with Ph.D.s?
Variability in how companies use data science is a factor
When evaluating whether a Ph.D. degree is necessary to be a data scientist, we must consider the needs of the company and how data science fits into the company’s overall strategy. Some data science teams, like the one at axialHealthcare, are structured to build predictive models and classifiers for external products. Other data science teams are tasked with using artificial intelligence and machine learning methods to improve the business’s internal operational efficiency. Some data science teams are centralized; others are dispersed or integrated within several functional areas of the business. Some data science teams have a research component with peer-reviewed publications as a success measure, especially in the healthcare space, where there is a strong expectation of evidence-based products. Other teams don’t have this expectation from the business. I would argue that if your company has a strong research component, it is critical to staff a significant portion of your data science team with employees who hold Ph.D. degrees.
Ultimately, when businesses are thinking about adding data science to their strategy, executives must take the time to define concisely why the business needs data science, what expectations are realistic for a data science function, and what skills are necessary to ensure its success. The goals of a knowledge-generating data science team are usually to formulate processes for extracting insights and prediction models from data and to create ideas and concepts. These knowledge assets are indeed abstract, so putting them to work in the business is critical to getting value out of the data science function. Defining who is responsible and accountable for enabling this knowledge is critical because, as Peter Drucker wrote in The Effective Executive, “Knowledge is useless to executives until it has been translated into deeds.” Formulating a data science strategy early will tell you whether you need employees with advanced degrees on your team. In my opinion, you likely need a healthy mix of skills and experience. That means employing data scientists both with and without Ph.D. degrees, and ensuring that democratized, black-box data science tools are leveraged smartly, valuably, and rigorously.
Author: Lindsey Morris
Lindsey Morris is the Director of Data Science & Analytics at axialHealthcare in Nashville, TN. In her current position, she leads a team of five data scientists who perform machine learning, statistics, and descriptive analytics on medical and pharmacy claims data. Lindsey and the data science team at axialHealthcare provide data-driven support tools that help physicians and other healthcare providers better manage patients with acute and chronic pain conditions, as well as opioid-consuming patients.
She obtained her B.S. from the University of Tennessee, Knoxville and a Ph.D. from Vanderbilt University, both in Chemical Engineering. After completing a postdoctoral appointment in the Department of Medicine at Vanderbilt in 2015, she entered the field of data science at axialHealthcare. During her time at Vanderbilt, she worked on a drug discovery project to characterize the efficacy of novel type 2 diabetes therapeutics, served as the President of the Postdoctoral Association, was awarded an F32 individual postdoctoral fellowship from the NIH, and competed in the TechVenture Challenge hosted by Life Science Tennessee. Lindsey consistently engages with the local data science community by organizing and presenting at meetups, writing blogs and opinion pieces, reviewing abstracts for local conferences, and designing projects and assisting in career development for the Nashville Software School. Lindsey is a graduate of the Greater Nashville Technology Council’s (NTC) Emerging Leaders in IT (ELITE) program and is the recipient of NTC’s 2019 Data Scientist of the Year award.
Drucker, Peter F. (1967). The Effective Executive. New York, NY: HarperCollins.