The Importance of Foundational Data Integrity to Healthcare Analytics

Posted by Tim Huke on April 10, 2018

My last post gave a general overview of the entire healthcare analytics continuum, from foundational data integrity to true prescriptive analytics. At the end of that post I promised to dive into each layer in more detail, so today's post deals with the most important layer in the continuum: foundational data integrity. After all, without high-quality, fully integrated and enriched healthcare data, any analytics exercise will be flawed at best.

The rapid move to digital records in health care over the past few years, prompted by regulatory requirements, insurance company billing procedures and patient expectations for customer support, has led to an onslaught of data. By one estimate, health care records account for 50 petabytes of data. That amount is difficult to conceptualize, but one thing is overwhelmingly clear: data will drive the future of health care.

Data alone, however, is not enough to drive the highest-quality care coordination or to predict future cost. Data quality is paramount. High-quality data provides the foundation for a 360-degree view of each patient and for decisions about patient care; it drives engagement, care management and wellness initiatives; it gives insight into the effectiveness and ROI of those programs; and it supports projections of future plan cost, direction on plan design, and everything in between.

This volume of data presents countless opportunities for data analytics. Care coordination can become more fluid when this information is measured and exchanged. Diagnoses might be better informed because clinicians have a complete picture of the patient. But data sets this large can be unwieldy, and without common data standards, records from disparate systems don't always mesh. So there are real roadblocks on the way to an ideal analytics solution.

At the same time, today's NoSQL, cloud-based analytics platforms, Deerwalk's included, allow for the collection and integration of more types of data than ever before, including eligibility feeds, medical and PBM data, EMR data from onsite clinics, lab, biometric, HRA, workers' compensation, disability, dental, vision, wellness and care management participation data, and wearable device (IoT) data, to name a few. There are more opportunities than ever to connect different data types, further coordinate care, and see a holistic picture of the total health profile of a member and of a group. This is an opportunity, but also a big challenge. With so much data available, how does one collect it, store it, seamlessly stitch it together, validate it, and start to make sense of it in meaningful ways?

Part of the reason health care data is underutilized is the sheer difficulty of collecting it from all the various vendors in the first place. Incomplete or inaccurate data impedes the optimal use of big data analytics, as does the widespread use of disparate systems. It's one thing to collect raw data. It's another to scrub, cleanse, integrate and enrich that data through a tightly controlled QA process, turning the individual component parts into a seamless data set (i.e., one integrated at the member record level across data types) that is accurate, reliable and usable.
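As a rough illustration of what integrating data at the member record level involves, the minimal sketch below links two hypothetical feeds (an eligibility file and a medical claims file) on a standardized member key and sets aside claims that match no eligible member for QA review. The field names (`member_id`, `last_name`, `dob`, `paid`) and the exact-match rule are assumptions for illustration, not Deerwalk's actual logic.

```python
from dataclasses import dataclass, field


@dataclass
class MemberRecord:
    """Integrated view of one member across data types (hypothetical schema)."""
    member_id: str
    eligibility: list = field(default_factory=list)
    claims: list = field(default_factory=list)


def member_key(row):
    """Standardize identifying fields into a match key (assumed matching rule)."""
    return (
        row["member_id"].strip().upper(),
        row["last_name"].strip().upper(),
        row["dob"].strip(),  # assumes dates are already normalized to YYYY-MM-DD
    )


def integrate(eligibility_rows, claim_rows):
    """Attach claims to eligible members; return (members, unmatched claims)."""
    members = {}
    for row in eligibility_rows:
        key = member_key(row)
        record = members.setdefault(key, MemberRecord(member_id=key[0]))
        record.eligibility.append(row)

    unmatched = []
    for row in claim_rows:
        record = members.get(member_key(row))
        if record is None:
            unmatched.append(row)  # no eligible member found: route to QA review
        else:
            record.claims.append(row)
    return members, unmatched


eligibility = [
    {"member_id": "a100 ", "last_name": "Smith", "dob": "1980-02-01", "plan": "PPO"},
]
claims = [
    {"member_id": "A100", "last_name": "SMITH", "dob": "1980-02-01", "paid": 120.0},
    {"member_id": "B200", "last_name": "Jones", "dob": "1975-07-15", "paid": 80.0},
]
members, unmatched = integrate(eligibility, claims)
print(len(members), len(unmatched))  # → 1 1
```

Exact-match keys are the simplest option; a production member index would typically also need to handle typos, name changes and reassigned IDs, which this sketch does not attempt.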

Deerwalk prefers to work directly with vendors and, pending client approval, assumes responsibility for data collection. All currently operationalized data collection methods are SFTP-based, which allows for better security, monitoring and automation, and Deerwalk incorporates its clients into these existing SFTP data acquisition processes. With a client's permission, Deerwalk works with each partner organization to establish a direct SFTP connection that re-routes the ongoing monthly, weekly or even daily data feeds from the existing vendor to Deerwalk.
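One simple monitoring check that recurring SFTP feeds make possible is comparing each feed's most recent file arrival against its expected cadence and flagging feeds that are overdue. The sketch below assumes a hypothetical feed registry (`Feed`, `overdue_feeds`) and a fixed grace period; it does not connect to an SFTP server, and treating "monthly" as 31 days is a simplification.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Simplified maximum interval between files for each cadence (an assumption).
CADENCE_DAYS = {"daily": 1, "weekly": 7, "monthly": 31}


@dataclass
class Feed:
    source: str          # hypothetical vendor feed name
    cadence: str         # "daily", "weekly" or "monthly"
    last_received: date  # date the most recent file landed on the SFTP server


def overdue_feeds(feeds, today, grace_days=2):
    """Return names of feeds whose latest file is older than cadence + grace."""
    late = []
    for feed in feeds:
        allowed = timedelta(days=CADENCE_DAYS[feed.cadence] + grace_days)
        if today - feed.last_received > allowed:
            late.append(feed.source)
    return late


feeds = [
    Feed("pbm_claims", "weekly", date(2018, 4, 2)),
    Feed("eligibility", "monthly", date(2018, 2, 20)),
    Feed("lab_results", "daily", date(2018, 4, 9)),
]
print(overdue_feeds(feeds, today=date(2018, 4, 10)))  # → ['eligibility']
```

A check like this would typically run on a schedule and alert the team responsible for the vendor relationship before a missing file turns into a gap in the warehouse.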

Because data quality is key to this flow of information, SaaS tools can help reduce the cost of ensuring it. Deerwalk's Data Factory, which currently collects over 100,000 files a month from over 800 unique data sources on behalf of our clients, is designed to ensure that your health care organization has a foundation of accurate and reliable data. That data can then be stored in a secure, HIPAA-compliant environment and tied to a robust analytics engine that will drive a competitive edge.

[Figure: Deerwalk Data Factory]

Deerwalk's data management and enhancement solution is a secure, flexible and scalable platform that can receive frequent data submissions from many different submitters in a recurring submission, quality control and warehousing process. Data quality is ensured through a multilevel scrubbing process built into the platform's ETL (extract, transform, load) pipeline.
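The post does not spell out the individual levels of the scrubbing process. As a hedged illustration of what a multilevel check inside an ETL step can look like, the sketch below validates a hypothetical pipe-delimited claims file in three passes: file level (a trailer row count must match), record level (required fields present, valid service date) and cross-record level (no duplicate claim IDs). The file layout, the `TRAILER` convention and the `scrub` function are all assumptions.

```python
import csv
import io
from datetime import datetime

REQUIRED = ("claim_id", "member_id", "service_date", "paid")  # assumed file layout


def scrub(raw_text):
    """Run file-, record- and cross-record-level checks; return (clean_rows, errors)."""
    lines = raw_text.strip().splitlines()
    header_and_rows, trailer = lines[:-1], lines[-1].split("|")
    errors = []

    # Level 1 (file): assumes the last line is a trailer "TRAILER|<data row count>".
    trailer_ok = (
        len(trailer) == 2
        and trailer[0] == "TRAILER"
        and trailer[1].isdigit()
        and int(trailer[1]) == len(header_and_rows) - 1  # exclude the header line
    )
    if not trailer_ok:
        errors.append("file: trailer missing or row count mismatch")

    clean, seen_ids = [], set()
    reader = csv.DictReader(io.StringIO("\n".join(header_and_rows)), delimiter="|")
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Level 2 (record): required fields present and a parseable service date.
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"line {line_no}: missing {', '.join(missing)}")
            continue
        try:
            datetime.strptime(row["service_date"], "%Y-%m-%d")
        except ValueError:
            errors.append(f"line {line_no}: bad service_date {row['service_date']!r}")
            continue
        # Level 3 (cross-record): reject duplicate claim IDs within the file.
        if row["claim_id"] in seen_ids:
            errors.append(f"line {line_no}: duplicate claim_id {row['claim_id']}")
            continue
        seen_ids.add(row["claim_id"])
        clean.append(row)
    return clean, errors


sample = """claim_id|member_id|service_date|paid
C1|A100|2018-03-01|120.00
C2|A100|2018-03-32|80.00
C1|A100|2018-03-01|120.00
C3||2018-03-05|40.00
TRAILER|4"""
clean, errors = scrub(sample)
for message in errors:
    print(message)
```

Rejected rows are reported with their line numbers rather than silently dropped, so they can be returned to the data submitter as part of the quality control loop.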

Examples of Deerwalk's data enhancements:

  • Deerwalk's procedure and diagnosis groupers.
  • Pharmacy information from First Databank.
  • Medical Episode Groupers.
  • MARA Concurrent and Prospective Risk Scores.
  • 60+ distinct service categories.
  • Provider standardization and specialty flagging.
  • Member identity and demographic standardization.
  • Member records and profiles.
  • Chronic condition identification.
  • Proprietary utilization event grouping algorithms covering office, urgent care and emergency room visits, inpatient admissions, high-cost imaging, outpatient surgeries and more.
  • Quality Metrics - member-level historical compliance, care gap and risk index data.
  • Utilization Metrics - claim groupings for days, visits, admissions, MRIs and surgeries.
  • Member Documents - which function in aggregate as a master person index.
  • Member Months.
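Member months are the usual denominator for per-member-per-month (PMPM) cost and utilization rates. The post does not say how Deerwalk counts them; one convention, assumed here, credits a member with one month for each calendar month in which they are enrolled on the first day of the month. The sketch below applies that rule to hypothetical eligibility spans and computes a PMPM paid amount; the function names and figures are illustrative only.

```python
from datetime import date


def months_enrolled(start, end, period_start, period_end):
    """Count months in [period_start, period_end] whose 1st day falls in the span.

    Convention (an assumption): a member earns one member month for each
    calendar month in which they are enrolled on the first day of the month.
    """
    count = 0
    first = date(period_start.year, period_start.month, 1)
    while first <= period_end:
        if first >= period_start and start <= first <= end:
            count += 1
        # Advance to the first day of the next month.
        if first.month == 12:
            first = date(first.year + 1, 1, 1)
        else:
            first = date(first.year, first.month + 1, 1)
    return count


def member_months(spans, period_start, period_end):
    """Sum member months over (member_id, start, end) eligibility spans.

    Assumes a member's spans do not overlap; overlapping spans would be
    double counted and should be merged first.
    """
    return sum(months_enrolled(s, e, period_start, period_end) for _, s, e in spans)


spans = [
    ("A100", date(2017, 1, 1), date(2017, 12, 31)),  # covers all 12 month starts
    ("B200", date(2017, 3, 15), date(2017, 8, 31)),  # covers Apr 1 through Aug 1: 5
]
mm = member_months(spans, date(2017, 1, 1), date(2017, 12, 31))
paid = 51_000.00  # hypothetical total paid claims for 2017
print(mm, paid / mm)  # → 17 3000.0
```

Other conventions (for example, counting a month if the member is enrolled on the 15th, or on any day) give different totals, so the rule should be fixed and documented before PMPM figures are compared across groups.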

Deerwalk's solutions ensure foundational data integrity at scale so that organizations can derive actionable insights from reliable and accurate real-world analytics. 

What do you think? Subscribe to our blog, or follow our Facebook, Twitter or LinkedIn page, to join the conversation and share your view.

