In our initial blog on big data, we posed the question: “Is big data a big boondoggle or a big solution?” Based on our experience helping thousands of healthcare providers use vast amounts of financial and clinical data to improve their revenue cycles and care delivery, we believe that big data can provide big answers. What exactly is big data? A report delivered to the U.S. Congress in August 2012 by the TechAmerica Foundation defines big data as large volumes of high-velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.
Who is using big data today? Industry, healthcare, and retailers. Wal-Mart’s data warehouses, for example, now include some 2.5 petabytes of information (a petabyte of storage is about one million gigabytes), the equivalent of roughly half of all the letters delivered by the U.S. Postal Service in 2010. That volume of data is 160 times larger than all the holdings of the U.S. Library of Congress.
Is healthcare producing huge amounts of big data? U.S. healthcare data alone is truly mind-boggling: it reached 150 exabytes in 2011 (1 exabyte = 1,000 petabytes, or 1 billion gigabytes). Five exabytes of data would contain all the words ever spoken by human beings on earth. At this rate, big data for U.S. healthcare will soon reach zettabyte status (1 zettabyte = 1,000 exabytes), and not long after that, yottabyte status will be achieved (1 yottabyte = 1,000 zettabytes).
I said mind-boggling, didn’t I?
Here’s a specific example: Kaiser Permanente, the California-based health network with more than 9 million members, is estimated to have between 26.5 petabytes and 44 petabytes of patient data under management just from electronic health records (EHRs), including images and annotations. That is roughly the amount of information that would be contained in 4,400 Libraries of Congress.
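For readers who want to check the unit arithmetic quoted above, here is a minimal Python sketch. It assumes decimal (SI) units, matching the "1 petabyte is about one million gigabytes" definition used in this post; the `GB_PER_UNIT` table and the `convert` helper are names made up for illustration only.

```python
# Decimal (SI) storage units, each expressed in gigabytes.
# Assumption: 1 petabyte = 1,000,000 gigabytes, as stated in the post.
GB_PER_UNIT = {
    "gigabyte": 1,
    "petabyte": 10**6,
    "exabyte": 10**9,
    "zettabyte": 10**12,
    "yottabyte": 10**15,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of data from one decimal storage unit to another."""
    return amount * GB_PER_UNIT[from_unit] / GB_PER_UNIT[to_unit]


# 1 exabyte expressed in petabytes and in gigabytes
print(convert(1, "exabyte", "petabyte"))
print(convert(1, "exabyte", "gigabyte"))

# The 150 exabytes cited for U.S. healthcare in 2011, as a fraction of a zettabyte
print(convert(150, "exabyte", "zettabyte"))

# The 2.5 petabytes cited for Wal-Mart, in gigabytes
print(convert(2.5, "petabyte", "gigabyte"))
```

Seen this way, 150 exabytes is still well under a single zettabyte, which gives a sense of how much further the growth described here has to run before the next unit is needed.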
What does healthcare big data entail? Healthcare big data can comprise the following categories or streams of information:
1. Web and social media data, which can also include data from health plan websites, smartphone apps, etc.
2. Machine-to-machine data, collected from sensors, meters, and other devices.
3. Transaction data from healthcare claims and other billing records, available in semi-structured and unstructured formats.
4. Biometric data, including x-rays and other medical images, blood pressures, pulse readings, fingerprints, genetic information, handwriting, retinal scans, etc.
5. Human-generated data — unstructured and semi-structured data such as EHRs, physicians’ notes, email, and paper documents.
In recent years, it has become increasingly apparent that multiple streams of data like these can be leveraged with powerful new collection, aggregation, and analytics technologies and techniques to improve the delivery of healthcare at the individual patient level, as well as at the levels of disease and condition-specific populations. Big data promises to ease the transition to real data-driven healthcare, allowing healthcare professionals to improve the standard of care based on millions of cases, to define needs for subpopulations, to make more personalized decisions for individual patients, and to identify and intervene for population groups at risk for poor outcomes. Stay tuned for more, and …