

Few sectors carry as much data complexity as healthcare, and few have been as resistant to the kind of structural reform that has transformed finance, retail, and logistics. The volume of patient data generated every year is staggering, yet much of it sits inaccessible, fragmented across provider networks, and shielded behind regulations that were written to protect privacy rather than to unlock utility. That tension, between protecting health data and making it useful, is the central problem that Olivia Chen, a data scientist and early-stage B2B technology investor, takes up in her essay "The Future of Personal Health Records: Unlocking the Potential of Healthcare Data."
Published in May 2023 on her Substack newsletter Behind The Things, the piece maps three distinct technical and business strategies for addressing what Chen identifies as a shared failure across clinical research sponsors, healthcare providers, and insurance payers: none of them can get reliable, continuous access to the patient-level data they need to improve outcomes, lower costs, or run efficient studies. Chen's analysis is worth revisiting in 2025, as the regulatory conditions she described have only grown more favorable, and the companies she pointed to as early movers have matured considerably.
Before laying out her three solutions, Chen frames the healthcare data access problem as structural rather than technical. Clinical research sponsors need behavioral and health data from study participants beyond the limited touchpoints of a formal study. Providers need longitudinal records to improve operational efficiency. Payers need data to model and manage healthcare economics. All three groups face the same obstacle: HIPAA's strict requirements around personally identifiable information and protected health information make obtaining patient data without explicit consent difficult under most conventional approaches.
The key detail in Chen's framing is that she treats the regulatory environment as both the constraint and the opening. The 21st Century Cures Act, signed into law in 2016 and phased into enforcement starting in 2020, fundamentally reoriented the legal relationship between patients and their own records. The ONC's Cures Act Final Rule supports seamless and secure access to electronic health information, giving patients access to their health data, spurring innovation, and addressing industry-wide information blocking practices. Critically, it calls on the healthcare industry to adopt standardized application programming interfaces that let individuals securely and easily access structured electronic health information using smartphone applications.
This is the regulatory shift that made Chen's first solution viable.
The most direct path Chen describes is collecting individual health records through APIs, with patients actively consenting to share their data. The logic is clean: if patients hold the legal right to their records under the Cures Act, and EHR systems are now required to expose that data through standardized FHIR-based APIs, then building platforms that help patients consolidate and monetize or share their own records becomes both legally defensible and commercially interesting.
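The consolidation step can be made concrete with a small sketch. It assumes a patient-facing app has already retrieved FHIR R4 "searchset" Bundles from two providers' patient-access APIs (the HTTP calls and authorization are omitted). The sample Bundles, the helper names `hba1c`, `observations`, `observation_key`, and `consolidate`, and the de-duplication rule are hypothetical illustrations, not the method of any company Chen cites.

```python
from typing import Any


def hba1c(when: str, value: float) -> dict[str, Any]:
    """Build a minimal FHIR Observation for a hemoglobin A1c result (illustrative only)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
        "effectiveDateTime": when,
        "valueQuantity": {"value": value, "unit": "%"},
    }


# Hypothetical Bundles from two providers; values are invented, not patient data.
PROVIDER_A = {"resourceType": "Bundle", "type": "searchset",
              "entry": [{"resource": hba1c("2023-01-10", 6.1)}]}
PROVIDER_B = {"resourceType": "Bundle", "type": "searchset",
              "entry": [{"resource": {"resourceType": "Patient", "id": "example"}},
                        {"resource": hba1c("2023-01-10", 6.1)},  # same result, copied across systems
                        {"resource": hba1c("2023-06-02", 5.8)}]}


def observations(bundle: dict[str, Any]) -> list[dict[str, Any]]:
    """Return only the Observation resources contained in a Bundle."""
    return [entry["resource"] for entry in bundle.get("entry", [])
            if entry.get("resource", {}).get("resourceType") == "Observation"]


def observation_key(obs: dict[str, Any]) -> tuple:
    """Treat two Observations as duplicates if code, date, value, and unit all match (an assumed rule)."""
    coding = obs["code"]["coding"][0]
    qty = obs.get("valueQuantity", {})
    return (coding.get("system"), coding.get("code"), obs.get("effectiveDateTime"),
            qty.get("value"), qty.get("unit"))


def consolidate(*bundles: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge Observations from several Bundles into one de-duplicated, date-ordered timeline."""
    seen: dict[tuple, dict[str, Any]] = {}
    for bundle in bundles:
        for obs in observations(bundle):
            seen.setdefault(observation_key(obs), obs)  # keep the first copy of each result
    return sorted(seen.values(), key=lambda o: o.get("effectiveDateTime", ""))


timeline = consolidate(PROVIDER_A, PROVIDER_B)
for obs in timeline:
    print(obs["effectiveDateTime"], obs["valueQuantity"]["value"])
# 2023-01-10 6.1
# 2023-06-02 5.8
```

Sorting on the ISO 8601 date string works because those strings order chronologically; a production pipeline would also need to reconcile differing codes and units across sources.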
Chen notes this approach sidesteps the de-identification challenge entirely. When a patient affirmatively consents to share their full record, that consent authorizes disclosures HIPAA would otherwise restrict, so the data can be used in identified form. The commercial angle is compelling: life science companies and payers are willing to pay for access to real-world, longitudinal health data that clinical trial structures rarely capture. The challenge she flags is more practical than legal: finding patients who are both motivated to participate and have accumulated sufficiently rich records to be useful.
Market forecasts point to continued growth in personal health records software since Chen wrote this piece. According to GMI Research, the PHR software market was valued at USD 9.1 billion in 2023 and is expected to grow at a CAGR of 9.8% from 2024 to 2032, driven by technological advancements, increasing patient awareness, and growing demand for personalized healthcare solutions. Companies including PicnicHealth, Crescendo Health, and Ciitizen, which Chen cites as early operators in this space, have all expanded their platforms since the essay was published, reflecting continued investor and enterprise interest in the consent-driven model.
The second strategy Chen outlines takes a different approach to the consent question. Rather than requiring patients to opt in and share their complete records, de-identified data solutions strip personally identifying information before the data is sold or shared, sidestepping the HIPAA barrier by taking the patient's identity out of the dataset entirely.
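The core mechanic of this strategy is removing or coarsening identifying fields before data leaves the source. The sketch below is loosely modeled on a few HIPAA Safe Harbor rules: dropping direct identifiers, keeping only the year of dates, truncating ZIP codes to three digits, and collapsing ages over 89 into one bucket. The field names, the sample record, and the helpers `age_on` and `deidentify` are hypothetical, and the sketch covers only a handful of the 18 Safe Harbor identifier categories, so it illustrates the idea rather than serving as a compliance tool.

```python
from datetime import date

# Hypothetical schema: only a few of the Safe Harbor identifier categories are handled here.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email", "street_address"}


def age_on(birth_date: date, as_of: date) -> int:
    """Whole years elapsed between birth_date and as_of."""
    had_birthday = (as_of.month, as_of.day) >= (birth_date.month, birth_date.day)
    return as_of.year - birth_date.year - (0 if had_birthday else 1)


def deidentify(record: dict, as_of: date) -> dict:
    """Return a copy of record with direct identifiers dropped and quasi-identifiers coarsened."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "birth_date" in out:
        age = age_on(out.pop("birth_date"), as_of)
        # Ages over 89 are collapsed into a single "90+" bucket.
        out["age"] = "90+" if age > 89 else age
    if "zip" in out:
        # Keep the first three digits only. Safe Harbor also requires "000" for sparsely
        # populated three-digit areas; that population lookup is not modeled here.
        out["zip3"] = out.pop("zip")[:3]
    if "visit_date" in out:
        # Keep the year; month and day are dropped.
        out["visit_year"] = out.pop("visit_date").year
    return out


record = {"name": "Jane Doe", "mrn": "A-1001", "birth_date": date(1931, 5, 2),
          "zip": "94110", "visit_date": date(2023, 3, 14), "diagnosis_code": "E11.9"}
print(deidentify(record, as_of=date(2023, 6, 1)))
# {'diagnosis_code': 'E11.9', 'age': '90+', 'zip3': '941', 'visit_year': 2023}
```

The clinical content (here an ICD-10-CM diagnosis code) survives, which is what makes the output useful for population health modeling, while the fields that point back to a specific person are gone or blurred.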
Chen describes the commercial logic here as targeting a smaller but still substantial market. Payers and providers are natural buyers of de-identified aggregate data for population health modeling and operational planning. Life science companies can use it for epidemiological research, market sizing, and drug development, even if it cannot substitute for the richer longitudinal records that consent-based collection produces.
The technical enablers she points to are worth noting. Blockchain-based approaches and tokenized health data had been gaining traction as mechanisms for maintaining verifiable data provenance without preserving identity, allowing a researcher to know that a record is genuine and traceable without knowing who generated it. The tradeoff Chen identifies is that this approach demands more technical sophistication from startups and may face less commercial demand from life science partners, who tend to prefer identified or re-identifiable data for research purposes.
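The provenance idea can be sketched with ordinary cryptographic hashing. This is a simplified stand-in, not a blockchain: a keyed hash (HMAC-SHA256) turns normalized identifying fields into a stable token, so records about the same person from different sources can be linked without carrying the identity, and a hash-chained, append-only log of record fingerprints lets a buyer check that a record is unaltered and was registered at the source. The secret key, the chosen fields, and the `ProvenanceLedger` class are hypothetical; real tokenization services add key management and formal re-identification risk review.

```python
import hashlib
import hmac
import json

# Hypothetical key; in practice it would be held by the tokenization service, not the data buyer.
SECRET_KEY = b"demo-key-held-by-tokenization-service"
GENESIS = "0" * 64


def patient_token(first: str, last: str, birth_date: str) -> str:
    """Keyed hash of normalized identifiers: stable for linkage, not reversible without the key."""
    normalized = f"{first.strip().lower()}|{last.strip().lower()}|{birth_date}"
    return hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()


def fingerprint(record: dict) -> str:
    """SHA-256 of a canonical JSON encoding, so key order does not change the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class ProvenanceLedger:
    """Append-only log where each entry's hash covers the previous entry (a toy hash chain)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record_hash: str) -> None:
        prev = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        entry_hash = hashlib.sha256((prev + record_hash).encode()).hexdigest()
        self.entries.append({"record_hash": record_hash, "prev": prev, "entry_hash": entry_hash})

    def contains(self, record_hash: str) -> bool:
        return any(e["record_hash"] == record_hash for e in self.entries)

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks every hash after it."""
        prev = GENESIS
        for e in self.entries:
            expected = hashlib.sha256((prev + e["record_hash"]).encode()).hexdigest()
            if e["prev"] != prev or e["entry_hash"] != expected:
                return False
            prev = e["entry_hash"]
        return True


token = patient_token("Jane", "Doe", "1931-05-02")
shared = {"token": token, "diagnosis_code": "E11.9", "year": 2023}
ledger = ProvenanceLedger()
ledger.append(fingerprint(shared))
print(ledger.contains(fingerprint(shared)), ledger.verify())  # True True
```

Using a keyed hash rather than a plain one matters: without the key, someone holding a list of names and birth dates cannot simply hash them to find matching tokens.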
Companies she named, including Datavent and Briya, sit within a broader ecosystem of health data intermediaries that has continued to expand. Larger players are moving into adjacent health data infrastructure as well: in April 2023, Microsoft and Epic announced a strategic collaboration to integrate generative AI into healthcare by combining Azure OpenAI Service with Epic's electronic health record software, with a focus on AI-powered tools inside the EHR to increase productivity and enhance patient care. That deal is not a de-identified data play as such, but the involvement of major enterprise software companies signals that the infrastructure for managing and monetizing health data at scale is no longer a niche concern.
The most speculative of Chen's three strategies, and arguably the most forward-looking, involves generating synthetic health data through AI. The premise is that if machine learning models can study enough real patient records, they can produce statistically realistic synthetic datasets that replicate the patterns of actual populations without containing information about any real individual.
The appeal for healthcare IT is clear: synthetic data eliminates privacy concerns almost entirely, can be generated on demand at scale, and can be tuned to reflect specific demographic profiles or disease populations that real-world data might underrepresent. Chen observes that this makes synthetic data particularly attractive for improving hospital efficiency models and health economics analysis, even if it is less suited to clinical research, which requires verifiable real-world events.
Chen's caution is appropriately measured. Synthetic data generated from an algorithm carries inherent quality questions: the data is only as good as the model that produced it, and the model is only as good as the real-world training data it consumed. If the underlying records carry biases, underrepresentation, or documentation errors, the synthetic outputs reproduce, and can amplify, those problems while making them harder to see.
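A deliberately naive generator makes both the premise and the caution concrete. The sketch below is our own illustration, not MDClone's or any vendor's method: it learns subgroup frequencies and a per-subgroup Gaussian for one lab value from a small hypothetical "real" cohort, then samples new rows. The group labels, values, and cohort sizes are invented. No synthetic row corresponds to a real patient, yet a subgroup that makes up 10 percent of the training data comes out at roughly 10 percent of the synthetic data, and its distribution rests on only ten real records.

```python
import random
import statistics
from collections import Counter


def fit(rows: list[dict]) -> tuple[dict, dict]:
    """Learn P(group) and a Gaussian (mean, stdev) of `value` within each group."""
    counts = Counter(r["group"] for r in rows)
    total = sum(counts.values())
    weights = {g: c / total for g, c in counts.items()}
    params = {}
    for g in counts:
        values = [r["value"] for r in rows if r["group"] == g]
        # statistics.stdev needs at least two rows per group.
        params[g] = (statistics.mean(values), statistics.stdev(values))
    return weights, params


def sample(model: tuple[dict, dict], n: int, seed: int = 0) -> list[dict]:
    """Draw n synthetic rows: pick a group by learned frequency, then a value from its Gaussian."""
    weights, params = model
    rng = random.Random(seed)
    groups = list(weights)
    rows = []
    for g in rng.choices(groups, weights=[weights[g] for g in groups], k=n):
        mu, sigma = params[g]
        rows.append({"group": g, "value": rng.gauss(mu, sigma)})
    return rows


# Hypothetical "real" cohort in which group B is underrepresented (10 of 100 patients).
rng = random.Random(42)
real = ([{"group": "A", "value": rng.gauss(5.5, 0.4)} for _ in range(90)]
        + [{"group": "B", "value": rng.gauss(7.0, 0.6)} for _ in range(10)])

synthetic = sample(fit(real), n=10_000, seed=1)
share_b = sum(r["group"] == "B" for r in synthetic) / len(synthetic)
print(f"group B share: real 0.100, synthetic {share_b:.3f}")
```

Nothing in the synthetic output flags that group B's statistics were learned from ten records; scaling the sample to 10,000 rows makes the subgroup look well covered when it is not.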
That caution has proved well-founded. Healthcare regulators have moved carefully on synthetic data validation standards, and the FDA has only recently begun issuing guidance on its use in drug development submissions. The companies pushing hardest in this space, including MDClone, which Chen references, have tended to position their tools as supplements to real-world data rather than substitutes for it.
What ties Chen's three-part framework together is an argument that the technical barriers to healthcare data access have been falling faster than the organizational and cultural barriers. The Cures Act created a legal framework. FHIR standardization created a technical framework. The commercial incentives from life science companies and payers have always been present. What has lagged is patient awareness, startup execution, and the trust infrastructure needed to make consent-based models work at scale.
The 21st Century Cures Act, passed in 2016 and enforced starting in 2020, established a simple but revolutionary principle: patients have the right to access their electronic health information without obstruction. For the strategies Chen describes, that principle is the load-bearing structure. Without it, the consent-based API model collapses back into a HIPAA negotiation, and the de-identified and synthetic data approaches lose the upstream data supply they depend on.
The broader market trajectory supports her underlying thesis. The PHR software market is forecast to reach approximately USD 25.92 billion by 2035, driven by government initiatives promoting PHR adoption, the proliferation of mobile health devices, and the integration of artificial intelligence and machine learning into PHR software for advanced analytics and personalized health insights.
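A quick compound-growth check shows how the two market figures relate. It rests on two assumptions of our own: that GMI Research's 9.8% rate compounds annually from the USD 9.1 billion 2023 base through 2032 (nine periods), and that the USD 25.92 billion 2035 forecast, whose source the text does not name, uses a comparable market definition. The helpers `project` and `implied_cagr` are illustrative.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` forward at a constant annual `rate` for `years` periods."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Constant annual growth rate that turns `start` into `end` over `years` periods."""
    return (end / start) ** (1 / years) - 1


print(round(project(9.1, 0.098, 2032 - 2023), 1))       # 21.1  (USD billions, 2032)
print(round(implied_cagr(9.1, 25.92, 2035 - 2023), 3))  # 0.091 (about 9.1% a year)
```

Under those assumptions, 9.8% growth from the 2023 base gives roughly USD 21.1 billion in 2032, and moving from USD 9.1 billion to USD 25.92 billion over twelve years implies about 9.1% a year, so the two forecasts are broadly consistent with each other.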
Data scientist Olivia Chen published this analysis when many of these dynamics were at an early stage. The companies she identified as interesting were largely pre-scale, and the regulatory shifts she described were newly in force. What makes the essay worth engaging with now is that her framework was structural rather than trend-dependent, focused on underlying incentives and constraints rather than specific product features. Those constraints have loosened on schedule. The question her analysis raises, which of the three approaches will ultimately capture the most value, remains open, and the competition among consent-based platforms, de-identified data intermediaries, and synthetic data providers will continue to play out.