On July 10th, 2019, the White House Office of Management and Budget (OMB) issued a Request for Information (RFI) to identify “needs for additional access to (or improvements in the quality of) Federal data and models that would improve U.S. artificial intelligence (AI) research and development (R&D) and testing efforts.” Under the February 11, 2019 White House Executive Order on Maintaining American Leadership in Artificial Intelligence, the overarching goal of the OMB’s RFI was to “confront fundamental challenges, novel ideas for human and AI collaboration, and the creation of a more trustworthy AI.”
From transportation and agriculture to manufacturing and healthcare, over 200,000 datasets have already been made available on www.data.gov. Depending on the type of R&D application, different datasets and models may be required to (i) accelerate advances in AI, (ii) enhance AI application explainability, and (iii) ensure more innovative, trustworthy, and inclusive AI applications. Michael Kratsios, Deputy Assistant to the President for Technology Policy, summarized the effort by stating that the “RFI represents yet another step forward in the American AI Initiative to accelerate our leadership and empower our innovators and the American people.”
Intel has long recognized that the U.S. has a great opportunity with AI to increase its industrial competitiveness, improve the population’s quality of life, and maintain its leadership stance on the world stage. Through its National AI Research and Development Strategic Plan, the National Science and Technology Council has signaled its willingness to act to maintain U.S. leadership on AI. Intel stands ready to work closely with all interested stakeholders to develop viable pathways toward the development and adoption of responsible data stewardship to unleash the power of AI. Thus, along with responding to the OMB’s RFI with recommendations regarding the key gaps in data and model availability that are slowing progress in AI R&D and testing, we identified those Federal datasets that agencies should earmark as “most important,” outlined herein.
Quality Improvements to Accessible Data and Models to Improve AI R&D and Testing
In response to the OMB’s query regarding identification of Federal datasets that agencies should earmark as “most important,” Intel recommends that health, healthcare, and life sciences datasets be triaged as priority datasets. Specifically, this data will foster the use of complex algorithms and models to emulate human cognition in the analysis of complicated medical data; it will cultivate the ability of algorithms to approximate conclusions without direct human input. As the availability of data increases, so does the potential to provide better services and more effective therapies and treatments. Agencies’ focus on the responsible liberation of health, healthcare, and life sciences data “can help physicians and researchers prevent disease, speed recovery and save lives, by unlocking complex and varied datasets to develop new insights.”
From medical imaging, clinical systems, and lab/life science to radiology, telehealth, and electronic health records, Intel is investing in health, healthcare, and life science-related businesses with the goal of using AI to accelerate research and development in this market sector. Intel encourages the U.S. government to earmark the 3477 datasets as “most important,” ensuring that both access to, and quality improvements of, these healthcare-related datasets are set as priorities for all agencies. For example, healthcare-related data can further be used for “scientific research, statistics, development and innovation activities, steering and supervision of authorities, planning and reporting duties by authorities, teaching and knowledge management.” Nonetheless, in accordance with the globally recognized principle of data minimization, agencies focusing on the accessibility of health care and life sciences data should consider the following, in order of priority:
- Aggregate data
- Anonymized personal data
- Personal data that does not contain identifiers that would directly identify a person
- Pseudonymized personal data
- Exceptionally, data containing personal identifiers in well-justified situations
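One way to read the ordering above is as a preference rule: a data request is served from the least identifying tier that still meets the research need. The Python sketch below illustrates that reading only. The `AccessTier` names and the `least_identifying_tier` helper are hypothetical and are not drawn from the RFI or from any agency system.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Hypothetical tiers mirroring the priority list (lower value = preferred)."""
    AGGREGATE = 1
    ANONYMIZED = 2
    NO_DIRECT_IDENTIFIERS = 3
    PSEUDONYMIZED = 4
    IDENTIFIED = 5  # exceptional; needs a well-justified reason


def least_identifying_tier(sufficient_tiers, justification=None):
    """Return the most preferred tier among those judged adequate.

    `sufficient_tiers` is an assumed input: the set of tiers a reviewer
    has decided would satisfy the research question.
    """
    if not sufficient_tiers:
        raise ValueError("no tier satisfies the request")
    choice = min(sufficient_tiers)
    if choice is AccessTier.IDENTIFIED and not justification:
        raise PermissionError("identified data requires a documented justification")
    return choice
```

Under this sketch, a request that could be met by either anonymized or pseudonymized data would be routed to the anonymized tier, and identified data would be released only when no other tier suffices and a justification is recorded.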
Increased access to data that can predict health outcomes through the use of analytical capabilities such as AI creates remarkable potential to solve some of society’s most vexing health problems. However, this same access to data and increased analytical ability demonstrate the need for robust privacy and security controls. “Gaining the value of these technology advances, while still protecting privacy/security is a challenge that needs increased attention.”
As federal agencies prioritize healthcare data in their efforts to improve AI-related research and development and testing, Intel additionally encourages further study of, and action on, the following policy recommendations:
Encourage Better Interoperability between Patient Electronic Health Records (EHRs)
Transitioning between the care of different health care providers and searching for clinical trials in which to participate should be more effective. The current system makes the transfer or sharing of EHRs and other pertinent health and patient data difficult. Each health system and provider maintains its own system of logging and maintaining EHRs, which creates complications when patients want their records used for research purposes or need to transfer their records, especially for an urgent health matter. The current model is not conducive to obtaining important health data quickly: it does not properly encourage the aggregation of data for more effective creation of AI tools to aid the efficiency of care and promote more effective health research. Hence:
- The government should pursue incentives for EHR systems, and the implementations by different providers and payers, to allow for access to a more diverse and greater volume of data.
- Increasingly, data does not need to move to allow for this interoperability. The use of encryption can allow AI tools to analyze data in a federated network of encrypted databases, as mentioned above in the section referencing SMFL.
- Alice Borrelli, former Global Director of Health Policy at Intel, recommended that the government use its convening power to bring together the different EHR vendors to encourage the creation of a centralized patient access portal.
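The point that data does not need to move can be illustrated with a minimal federated-aggregation sketch in Python: each provider computes a small summary locally, and only those summaries are combined. The `local_summary` and `federated_mean` helpers and the sample heart-rate values are hypothetical. The sketch deliberately omits the encryption layer described in the bullet above, so it shows only the data-flow pattern, not a secure implementation.

```python
def local_summary(records):
    """Runs inside each provider's system; raw records never leave it."""
    return {"total": sum(records), "count": len(records)}


def federated_mean(summaries):
    """Combines per-site summaries into a network-wide mean."""
    total = sum(s["total"] for s in summaries)
    count = sum(s["count"] for s in summaries)
    if count == 0:
        raise ValueError("no records reported by any site")
    return total / count


# Hypothetical resting heart-rate readings held by two separate providers.
site_a = [62, 70, 68]
site_b = [75, 65]

# Only the two small summary dictionaries cross organizational boundaries.
summaries = [local_summary(site_a), local_summary(site_b)]
print(federated_mean(summaries))
```

In a real deployment the summaries themselves would also need protection, for example through encryption or secure aggregation, since even aggregate values can leak information about small populations.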
Provide Better Guidance for Institutional Review Boards (IRBs)
The value of traditional health data decreases as the quality of non-traditional health data increases. In our increasingly digital world, apps and devices like Fitbits are constantly collecting valuable logs of health data from millions of people. Rather than going to the doctor to determine information about your heart rate and blood pressure, an app can measure this information on an ongoing basis, often providing increasingly accurate and granular datasets. Consumer-facing companies are producing healthcare data equivalent to that of medical practitioners, but historical and practical barriers keep researchers from maximizing the impact of this data.
Kathryn Marchesini, Chief Privacy Officer at the Office of the National Coordinator for Health Information Technology, described how HIPAA does allow for the aggregation of traditional and non-traditional data for research purposes. However, many IRBs default to requiring patient consent, due to a lack of comfort with risk-based reviews to determine whether the clinical care data can be combined with this non-traditional data and then used for healthcare research. The federal government could provide more guidance and tools to IRBs to simplify the process for gaining access, so they do not default to requiring patient consent.
Encourage Implementation of NIST’s Cybersecurity Framework by Health Care Providers
Information security is often cited as a reason not to allow access to health data. Entities that store health care data need more detailed guidance on how they can allow AI tools to use the data while still providing robust security. The NIST Framework provides a method to encourage the right analysis. Many sectors of the economy have implemented NIST’s voluntary risk management framework, which gives organizations a structure for assessing risk and understanding how to Identify, Protect, Detect, Respond, and Recover. The Framework also lets individual sectors create their own profiles on how best to manage cybersecurity risk in their particular industry.
The health sector was one of the last targeted industries to create a profile, and Health and Human Services has now collaborated with the Department of Homeland Security to create an implementation guide. The guide will help health care organizations prioritize what focus areas will best protect data, while still allowing for the innovative use of data. Government should now look to create incentives to encourage individual healthcare companies to use the guidance, profile, and underlying Framework to understand how best to improve data security while also promoting effective use of that data. Ari Schwartz, the Executive Director of the Center for Cybersecurity Policy and Law, recommended that the government focus on encouraging health care data management companies to use their vendor contracts as a mechanism to promote use of the Framework.
Modify Payment Structures to Optimize for Investment in Health Data Availability
Historically, economic incentives encouraged health care providers to minimize access to the data from the clinical care of their patients. This focus on minimizing access also reduces the extent to which the data can be effectively used by AI tools. More analysis is necessary on how individual doctors can be paid to architect their systems to allow for federated access to data for better patient care and research. Duke University School of Law Professor Barak Richman recommended that government take an active role in restructuring its payments so that the payment model extends beyond individualized patient care and encourages greater investment in the innovative use of data. As the health IT space continues to evolve, these types of policy considerations are needed to enable the benefits of using AI tools in the healthcare sector, while protecting consumers and their data from misuse.
Summarizing the Above
Boiling this down to the basic recommendations, here is a summary of Intel’s recommendations regarding the identification of Federal datasets agencies should earmark as most important:
- Health, Healthcare, & Life Sciences Focus – Intel recommends that health, healthcare, and life sciences datasets be triaged as priority datasets.
- Interoperability – Intel recommends that government pursue incentives for EHR systems, and the implementations by different providers and payers, to allow for access to a more diverse and greater volume of data.
- Guidance for Institutional Review Boards (IRBs) – Intel recommends that the federal government provide more guidance and tools to IRBs to simplify the process for gaining access, so they do not default to requiring patient consent.
- NIST’s Cybersecurity Framework – Intel recommends that the federal government encourage entities that store health care data to look to this framework for detailed guidance on how they can allow for use of responsibly liberated data by AI tools, while still providing robust security.
- Payment Structures to Optimize for Investment in Health Data Availability – Intel recommends that the federal government and its agencies invest in the additional analysis necessary as to how individual doctors can be paid to architect their systems to allow for the federated access to data for better patient care and research.
Romao, Mario, “How Finland is Pioneering the Use of Health Data for Secondary Purposes”
Ibid.
Ibid.
Hoffman, David, “How to Use AI in Healthcare while Protecting Privacy”
Ibid.