Domain Knowledge: A Field Guide For New Data Scientists

How you can develop domain knowledge without on-the-job exposure.

Megan Rashid
UX Collective


When you just have string, a stick and a stone, you have information. If you put those pieces together that information forms a bow and arrow and represents meaning.
credit: @ash.lmb

The case for domain knowledge:

The secret sauce of data science is a pairing of domain knowledge with technical expertise. Data scientists often lean on subject matter experts (SMEs) who have a deep understanding of a particular job, process, department, function, technology, machine, material or type of equipment around which they’re trying to implement a solution. Integrating SMEs’ domain knowledge into the data science process ensures that the designed solution addresses the underlying problem and is fit for purpose for the end user.

In the words of Obi-Wan Kenobi, “many of the truths we cling to depend greatly on our own point of view.” The people closest to the problem know the issues and pain points better than anyone else. The data scientist tries to capture this perspective and reframe it through the lens of data science to identify the best approach. In isolation, the two perspectives offer partial solutions. It’s when we combine user experience with data science that we get innovative.

Can new data scientists develop domain knowledge without on-the-job exposure?

In practice, applying data science is domain-specific. The processes, data, and methods used vary between industries. While SMEs should always remain involved in the process, data scientists with domain knowledge are more efficient at driving innovation and change because they can see things from the point of view of their stakeholders. With more demand for data scientists, companies often look for candidates with relevant background experience who can hit the ground running.

Unfortunately, many new data scientists struggle to develop this skillset through online learning or formal programs. When they enter the job market, they’re often faced with job postings that require domain experience. So begins the endless loop of needing relevant experience without the opportunity to earn it. Data science without context is like learning to ride a bike without ever having pedaled. You understand the need to balance and how to pedal, but taking a bike out on the road is a much different experience. While most domain knowledge is developed from hands-on exposure, new data scientists can still demonstrate specialization and remain competitive in the job market by following these guidelines.

How to Develop Domain Knowledge Without Experience

Dart hitting the bullseye of a dartboard.
Photo by Anastase Maragos on Unsplash

Identify what type of data science you want to do.

New data scientists can gain a competitive advantage by identifying a specialty where they want to focus the job search. Are you an environmental activist? Are you more of a creative who loves following social media trends? Perhaps, you’re interested in social justice and community outreach? Taking stock of your interests is a great way of narrowing down the field. Whether it be geospatial, digital marketing, healthcare, or something else, the demand for data science is so robust that new data scientists can be specific in the industry where they want to sharpen their skillsets.

Specificity is important because you want to demonstrate why you want this data science job and not just any position. As with any industry, the way people talk and think within that niche will vary. When in Rome, do as the Romans do. If you can identify and explain some data science opportunities within that field, you’re demonstrating that you’ll be a data scientist who can translate their business problems into actionable solutions. This is also why people who have domain knowledge in their prior roles and then upskill into data science are so valuable — speed to insight.

Do your homework.

There are three main areas of domain knowledge that new data scientists can research: processes, data and methods.

1. Processes: operational settings, functions and technologies specific to this industry; understanding of customer journey within industry context

2. Data: internal and external datasets typically utilized

3. Methods: typical data science methods implemented and future areas of innovation within industry

In my career thus far, I’ve had to develop domain knowledge in three industries — commercial insurance, healthcare, and marketing. I’ll use the example of commercial insurance to illustrate what domain knowledge I needed in my role because insurance is an industry where not having prior experience is the norm for most roles.

Processes

To understand the processes that you’ll be exposed to, it’s best to start with the customer journey. Google “customer journey in x” and you’ll find countless examples and diagrams of what that might look like. What goods or services are being sold or provided? Who is the customer or client? How do they interact with this business?

I once worked for a small business specialist in the commercial insurance industry that sold coverage to small business owners through an online portal. Before my interview, I went to their portal and began the process as if I were a customer. I researched “customer journeys in insurance” to understand the typical quote, coverage and claims experience. Visiting their website gave me a better understanding of the nuances specific to their customer experience. I took note of the fields they collected in the quote process as well as the options within dropdowns or click lists. These all correspond to data collected behind the scenes. I then repeated this exercise with some of their competitors. This simple exercise helped me understand their appetite, the products they offered and what made them stand out among competitors. In my interview, I was able to reference these observations and demonstrate how I could look at their problems from different angles.

The operational processes are where you’ll see a lot of jargon and nuance specific to that industry. While you won’t be able to understand everything until you’re in your role, you can get a sense of the typical technologies and operational structures you’d find within that organization. For example, all insurance companies will have a CMS or claims management system. I didn’t have experience with their claims management system because that is proprietary software specific to their company. However, I could demonstrate that I knew what a claims management system was and my enthusiasm for learning the intricacies of that dataset. You might get lucky and see these systems or structures listed specifically within the job posting. If you don’t have hands-on experience, which you probably won’t if you’re a new data scientist, then strive for a high-level understanding so you can ask more specific questions in your interview.

Data

It is highly unlikely you’ll have access to data from the company you’re applying to before you get the job. Some organizations, like Spotify or The New York Times, have public APIs you can access without being an employee. If you can access a dataset like this, then include a project in your portfolio specific to that company. However, for most roles, this simply will not be an option.

For most industries, you will find organizations rely on similar internal and external datasets for their key operations. You can generalize the customer journey to understand data availability. For example, my research allowed me to understand that the basic insurance customer journey consisted of three parts: quote, bind and claims. I learned a lot about the types of data collected during the quote phase by interacting with the company’s and competitors’ portals. I also learned through industry publications that bind data would include features around the customer and coverage. My research also highlighted the unique challenges present in claims data: commercial claims are infrequent events, so relatively little data exists.

External datasets could include industry standardization codes, market trends, brand tracking, etc. Standardization codes allow organizations to organize and link internal data to external systems, which may be a part of regulatory reporting and management. In commercial insurance, you’ll see data like SBA, NAICS and SIC. In healthcare, you will come across CMS ICD-10 and CPT/HCPCS codes (here CMS refers to the Centers for Medicare & Medicaid Services, not a claims management system). Industry-standardized data comes with vast documentation that you can become familiar with before interviewing. Taking this initiative also shows that you’re a self-starter who can hit the ground running once hired.
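
As a rough illustration of how a standardization code links internal records to an external reference, the sketch below joins a few made-up policy records to a small NAICS lookup. The policy records, field names and the `attach_industry` helper are hypothetical. The three NAICS codes and titles are real, but a real project would load the full code list from the official NAICS documentation.

```python
# Hypothetical internal records, e.g. rows captured during the quote phase.
policies = [
    {"policy_id": "P-001", "naics": "722511", "annual_premium": 4200.0},
    {"policy_id": "P-002", "naics": "238220", "annual_premium": 6100.0},
    {"policy_id": "P-003", "naics": "000000", "annual_premium": 1500.0},  # deliberately invalid code
]

# Tiny excerpt of an external reference table (NAICS code -> industry title).
naics_titles = {
    "722511": "Full-Service Restaurants",
    "238220": "Plumbing, Heating, and Air-Conditioning Contractors",
    "541511": "Custom Computer Programming Services",
}


def attach_industry(records, lookup):
    """Return copies of records with an 'industry' field (None when the code is unknown)."""
    return [{**record, "industry": lookup.get(record["naics"])} for record in records]


enriched = attach_industry(policies, naics_titles)

# Records whose code did not match the reference table need manual review.
unmatched = [row["policy_id"] for row in enriched if row["industry"] is None]

for row in enriched:
    print(row["policy_id"], row["naics"], row["industry"])
print("Unmatched codes to review:", unmatched)
```

Even a toy join like this surfaces a question worth asking in an interview: how does the organization handle records whose codes are missing or invalid?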

Methods

The ability to identify and apply data solutions in a specific context is the pièce de résistance for any data scientist. You’re being hired for the lens through which you see problems just as much as you are for your technical aptitude. Become familiar with the key areas, analytical methods and open-source technologies utilized in that industry.

Stick with use cases. Industry publications and AI product rollouts are great places to start. What types of analytics align to parts of the customer journey? For insurance, you can divide analytics into several areas and determine what analytical methods can be implemented around those problems.

Examples of insurance analytics areas:

1. Product management: market analysis, product profitability simulation, customer segmentation

2. Underwriting: business process improvement, risk evaluation, business case development

3. Claims: fraud detection, non-payment of claims, claims process improvement

4. Customer: call center improvement, data-driven marketing, customer churn

For fraud detection in claims, you can learn anomaly detection and related Python libraries like PyOD (Python Outlier Detection). For customer churn, you could look into forecasting with Facebook’s Prophet library. Perhaps you could learn PDF scraping to improve operational processes. Be intentional about the types of projects you select to build your portfolio. If building a portfolio isn’t where you’re at right now, then become conversational in these topics to prepare for your interview.
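
To make the fraud detection idea concrete, here is a minimal anomaly detection sketch using only the Python standard library. It is not PyOD. It swaps in a simple modified z-score (based on the median and the median absolute deviation) as a stand-in. The claim amounts, the `flag_outliers` helper and the 3.5 cutoff are illustrative assumptions, not values from a real insurer.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return the indices of values whose modified z-score exceeds the cutoff.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    The cutoff of 3.5 is an assumed rule of thumb; tune it for real data.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - center) / mad > cutoff
    ]


# Hypothetical claim amounts; one is far larger than the rest.
claim_amounts = [1200, 950, 1100, 1300, 1050, 990, 25000, 1150]

suspicious = flag_outliers(claim_amounts)
for i in suspicious:
    print(f"Claim at index {i} looks unusual: {claim_amounts[i]}")
```

A flagged claim is only a candidate for review, not proof of fraud. Deciding what counts as unusual for a given line of business is exactly where domain knowledge comes in.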

Be specific. Be intentional. Be creative.

Good luck out there!

Data Scientist | Using data to help you build a winning sales machine