Recent Blog Posts

Capitalizing on Non-Volatile Memory Technologies

I recently gave a talk at SNIA’s annual Storage Developer Conference (SDC), filling in for a colleague who was unable to travel to the event. The talk, “Big Data Analytics on Object Storage—Hadoop Over Ceph Object Storage with SSD Cache,” highlighted work Intel is doing in the open source community to enable end users to fully realize the benefits of non-volatile memory technologies employed in object storage platforms [1].

 

In this post, I will walk through some of the key themes from the SNIA talk, with one caveat: This discussion is inherently technical. It is meant for people who enjoy diving down into the details of storage architectures.

 

Let’s begin with the basics. Object storage systems, such as Amazon Web Services’ Simple Storage Service (S3) or the Hadoop Distributed File System (HDFS), consist of two components: an access model that adheres to a well-specified interface and a non-POSIX-compliant file system.
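To make that split concrete, here is a minimal sketch, in Python, of a well-specified object interface layered over a flat, non-POSIX backend. The class and method names are purely illustrative; this is not the S3 or HDFS API.

```python
# Minimal sketch of an object-storage access model layered over a
# flat (non-POSIX) backend. All names are illustrative, not a real API.

class FlatBackend:
    """Non-POSIX backend: no directories, no rename, no partial update;
    just whole objects addressed by key."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def delete(self, key):
        self._objects.pop(key, None)


class ObjectStoreAPI:
    """Well-specified access model (S3-like verbs) over the backend."""
    def __init__(self, backend):
        self._backend = backend

    def put_object(self, bucket, name, data):
        self._backend.put(f"{bucket}/{name}", data)

    def get_object(self, bucket, name):
        return self._backend.get(f"{bucket}/{name}")

    def delete_object(self, bucket, name):
        self._backend.delete(f"{bucket}/{name}")


store = ObjectStoreAPI(FlatBackend())
store.put_object("logs", "2015-12-04.json", b'{"event": "example"}')
print(store.get_object("logs", "2015-12-04.json"))
```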

 

In the context of this latter component, storage tiering has emerged as a required capability. For the purpose of this discussion, I will refer to two tiers: hot and warm. The hot tier services all end-user-initiated I/O requests, while the warm tier is responsible for maximizing storage efficiency. This data center storage design is becoming increasingly popular.

 

Storage services in a data center are concerned with the placement of data on the increasingly diverse storage media options available to them. Such data placement is an economic decision based on the performance and capacity requirements of an organization’s workloads. On the dollar-per-unit-of-performance vector, there is often a significant advantage to placing data in DRAM or on a storage medium such as 3D XPoint that offers near-DRAM speed. On the dollar-per-unit-of-capacity vector, there is great motivation to place infrequently accessed data on the least expensive media, typically rotational disk drives, though 3D NAND is increasingly an option.
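As a back-of-the-envelope illustration of that economic decision, the sketch below compares dollar-per-gigabyte and dollar-per-IOPS figures for several media types. The numbers are hypothetical, not actual prices or performance specs; the selection logic is the point.

```python
# Illustrative (made-up) price/performance figures for a few media types.
# Placement is driven by $-per-unit-performance for hot data and by
# $-per-unit-capacity for warm data.

media = {
    #             $/GB              device IOPS       device cost
    "DRAM":      {"usd_per_gb": 8.00, "iops": 5_000_000, "usd": 500},
    "3D XPoint": {"usd_per_gb": 2.00, "iops":   500_000, "usd": 600},
    "3D NAND":   {"usd_per_gb": 0.30, "iops":   100_000, "usd": 300},
    "HDD":       {"usd_per_gb": 0.04, "iops":       200, "usd": 100},
}

for name, m in media.items():
    usd_per_iops = m["usd"] / m["iops"]
    print(f"{name:10s}  $/GB = {m['usd_per_gb']:5.2f}   $/IOPS = {usd_per_iops:.6f}")

# Hot data (performance-bound): minimize $/IOPS -> DRAM or 3D XPoint.
# Warm data (capacity-bound):   minimize $/GB   -> 3D NAND or rotational disk.
```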

 

Given this diversity of media (DRAM, 3D XPoint, NAND, and rotational disk), the practice of placing frequently accessed, so-called “hot” data on higher-performing media while moving less frequently accessed, “warm” data to less expensive media is increasingly important. How is data classified in terms of its access frequency? How are data placement actions carried out based on this information? We’ll look more closely at the “how” in a subsequent post. The focus of this discussion is on data classification, specifically within the context of distributed block and file storage systems deployed in a data center.
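One simple way to think about such classification is a per-object access counter over a sliding time window, with a threshold separating “hot” from “warm.” The sketch below is illustrative only; the window length and threshold are arbitrary knobs, not parameters taken from any of the systems discussed here.

```python
# A minimal sketch of access-frequency classification: count accesses per
# object over a recent window and label objects "hot" or "warm" against a
# threshold. Window and threshold values are illustrative.

import time
from collections import defaultdict, deque

class HeatClassifier:
    def __init__(self, window_seconds=300, hot_threshold=10):
        self.window = window_seconds
        self.hot_threshold = hot_threshold
        self.accesses = defaultdict(deque)   # object id -> access timestamps

    def record_access(self, obj_id, now=None):
        now = now if now is not None else time.time()
        q = self.accesses[obj_id]
        q.append(now)
        while q and now - q[0] > self.window:   # expire samples outside the window
            q.popleft()

    def classify(self, obj_id):
        return "hot" if len(self.accesses[obj_id]) >= self.hot_threshold else "warm"

clf = HeatClassifier(window_seconds=60, hot_threshold=3)
for _ in range(5):
    clf.record_access("objA")
clf.record_access("objB")
print(clf.classify("objA"), clf.classify("objB"))   # -> hot warm
```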

 

Consider Google applications such as F1, a distributed relational database system built to support the AdWords business, and Megastore, a storage system developed to support online services such as Google App Engine and Compute Engine [2,3]. These applications are built on top of Spanner and BigTable, respectively. In turn, Spanner and BigTable store their data, B-tree-like files and write-ahead logs, in Colossus and the Google File System, respectively [4]. In the case of the Colossus File System (CFS), Google has written about using “Janus” to partition CFS’s flash storage tier for workloads that benefit from the higher-performing media [5]. The focus of that work is on characterizing workloads and differentiating them by a “cacheability” metric that measures cache hit rates. More recently, companies such as Cloud Physics and Coho Data have published papers along similar lines, focusing on characterizations that efficiently produce a cache “miss ratio curve” (MRC) [6-8]. Like Google, the goal is to keep “hot” data on higher-performing media while moving less frequently accessed data to lower-cost media.
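To ground the MRC idea, here is a toy construction of an LRU miss ratio curve from a reference trace using exact stack distances. The published approaches (SHARDS, counter stacks [6-8]) approximate this far more efficiently at scale; this sketch only shows what the curve measures.

```python
# Toy LRU miss ratio curve (MRC) built from Mattson stack distances.
# A reference at stack distance d is a hit in an LRU cache of size c iff d <= c.

import math

def stack_distances(trace):
    stack, dists = [], []
    for obj in trace:
        if obj in stack:
            d = stack.index(obj) + 1          # depth from most-recently-used
            stack.remove(obj)
        else:
            d = math.inf                      # cold miss at any cache size
        stack.insert(0, obj)
        dists.append(d)
    return dists

def miss_ratio_curve(trace, max_cache_size):
    dists = stack_distances(trace)
    total = len(dists)
    return {c: sum(1 for d in dists if d > c) / total
            for c in range(1, max_cache_size + 1)}

trace = ["a", "b", "a", "c", "b", "a", "d", "a", "b", "c"]
for size, miss in miss_ratio_curve(trace, 4).items():
    print(f"cache size {size}: miss ratio {miss:.2f}")
```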

 

What feature of the data-center-wide storage architecture enables such characterization? In both Google’s and Coho’s approaches, the distinction between servicing incoming end-user I/O requests for storage services and accessing backend storage is fundamental. In Google’s case, applications such as F1 and Megastore layer indirectly on top of the distributed storage platform; the Janus abstraction, however, is transparently interposed between such applications and the file system. Similarly, Coho presents a storage service, such as NFS or HDFS mount points, to end-user applications via a network virtualization layer [9,10]. This approach allows a processing pipeline to be inserted between incoming end-user application I/O requests and Coho’s distributed storage backend.

 

One can imagine incoming I/O operation requests—such as create, delete, open, close, read, write, snapshot, record, and append—encoded in a well-specified form. Distinguishing between, or classifying, incoming operations with regard to workload, operation type, and so on becomes analogous to request logging in an HTTP/web farm [11]. And as in such web farms, read/access requests are readily directed toward caching facilities while write/mutation operations can be appended to a log.
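A minimal sketch of that dispatch idea follows: classify each well-specified request by operation type, serve reads from a cache when possible, and append mutations to a log for the backend to apply later. All class and operation names here are illustrative, not drawn from Coho’s or Google’s implementations.

```python
# Illustrative request router: reads go to a cache, mutations go to a log.

class RequestRouter:
    READ_OPS = {"open", "read", "getattr"}
    WRITE_OPS = {"create", "write", "append", "delete", "snapshot"}

    def __init__(self, cache, backend, log):
        self.cache, self.backend, self.log = cache, backend, log

    def handle(self, request):
        op, key = request["op"], request["key"]
        if op in self.READ_OPS:
            if key in self.cache:                      # hot path: cache hit
                return self.cache[key]
            data = self.backend.get(key)               # miss: fetch and fill
            self.cache[key] = data
            return data
        if op in self.WRITE_OPS:
            self.log.append(request)                   # durable, append-only
            return "queued"
        raise ValueError(f"unknown operation: {op}")


class WarmTier:
    """Stand-in for the backend object store."""
    def get(self, key):
        return f"<{key} fetched from warm tier>"


router = RequestRouter(cache={}, backend=WarmTier(), log=[])
print(router.handle({"op": "read", "key": "obj1"}))    # miss, then cached
print(router.handle({"op": "read", "key": "obj1"}))    # hit
print(router.handle({"op": "write", "key": "obj1", "data": b"x"}))
```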

 

In other words, from the end user’s perspective, storage is just another data-center-resident distributed service, one of many running over shared infrastructure.

 

And what about the requisite data management features of the storage platform? While the end user interacts with a storage service, the backend storage platform is no longer burdened with this type of processing. It is now free to focus on maximizing storage efficiency and providing stewardship over the life of an organization’s ever-growing stream of data.

 

 

References

 

  1. Zhou et al., “Big Data Analytics on Object Storage – Hadoop Over Ceph Object Storage with SSD Cache,” 2015.
  2. Shute et al., “F1: A Distributed SQL Database That Scales,” 2013. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41344.pdf
  3. Baker et al., “Megastore: Providing Scalable, Highly Available Storage for Interactive Services,” 2011. http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper32.pdf
  4. Corbett et al., “Spanner: Google’s Globally-Distributed Database,” 2012 (see Section 2.1, “Spanserver Software Stack,” for a discussion of how Spanner uses Colossus). http://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
  5. Albrecht et al., “Janus: Optimal Flash Provisioning for Cloud Storage Workloads,” 2013.
  6. Waldspurger et al., “Efficient MRC Construction with SHARDS,” 2015. http://cdn1.virtualirfan.com/wp-content/uploads/2013/12/shards-cloudphysics-fast15.pdf
  7. Warfield, “Some experiences building a medium sized storage system,” 2015 (see specifically slide 21, “Efficient workload characterization”).
  8. Wires et al., “Characterizing Storage Workloads with Counter Stacks,” 2014.
  9. Cully et al., “Strata: High-Performance Scalable Storage on Virtualized Non-volatile Memory,” 2014.
  10. Warfield et al., “Converging Enterprise Storage and Business Intelligence: A Reference Architecture for Data-Centric Infrastructure,” 2015.
  11. Chen et al., “Client-aware Cloud Storage,” 2014. http://www.csc.lsu.edu/~fchen/publications/papers/msst14-cacs.pdf

Read more >

Wearables, Cultural Changes, and What’s Next

Read Part I: Transforming Healthcare with Patient-Generated Data

Read Part II: How Wearables are Impacting Healthcare

Read Part III: Challenges of User-Generated Data

Read Part IV: Wearables for Tracking Employee Movement

 

This blog series has been about how wearables have become more than a passing trend and are truly changing the way people and organizations think about managing health. I hear from many companies and customers who want to understand how the wearables market is impacting patient care as well as some of the changes taking place with providers, insurers, and employers. So far, I’ve shared some of their questions and my responses. The final question in this series is:

 

What kinds of organizational and cultural changes are driven by patient-generated data?

 

There is definitely a cultural shift, and adoption and enthusiasm vary from clinician to clinician. It is still early days. Some clinicians are championing patient-generated data while others aren’t buying into its significance.

Where I hope Intel can play a role both near term and going forward is with predictive analytics and using streaming wearable data to help inform predictive models and make them more accurate. As I mentioned in earlier posts, we want to make it easier for health systems and clinicians to adopt a data-driven approach, enabling better allocation of limited resources and, ultimately, improving patient outcomes.

 

I am most excited about the ability to monitor patients and members continuously rather than periodically, moving from episodic to real time. That’s the game changer. And it’s enabled by technologies that combine very low power consumption, a very small form factor or package, and the ability to send sensor data (either directly or via the ubiquitous smartphone) to the cloud or to backend information systems.

 

As wearables become more pervasive, it will be exciting to see the industry move beyond consumer-based wearable devices developed for fitness purposes to devices with more sophisticated sensing capabilities targeted at healthcare use cases. I believe these devices will have a significant impact on reducing costs and improving outcomes by monitoring conditions and patients 24×7.

 

What questions about wearables do you have?

Read more >

My Life at Intel: China Software Engineering Director Xiaomei Zhou Shares Her Story

My Intel Journey I joined Intel in 1995 as a Senior Software Engineer at the company’s Arizona site. I then moved on to Oregon, and then California—all with Intel. For the past four years, I’ve been working for Intel from Shanghai. In my … Read more >

The post My Life at Intel: China Software Engineering Director Xiaomei Zhou Shares Her Story appeared first on Jobs@Intel Blog.

Read more >

My Life at Intel: Quality Assurance Sr. Manager Arati Sankhe Shares Her Story

My Intel Journey I joined Intel Security Group, formerly McAfee, in late 2004 as Quality Assurance (QA) Lead. I was given opportunities to aid in development efforts, lead QA teams for multiple products, work with product managers, and manage deployment. … Read more >

The post My Life at Intel: Quality Assurance Sr. Manager Arati Sankhe Shares Her Story appeared first on Jobs@Intel Blog.

Read more >

Intel IoT Ignition Labs Spark Global Internet of Things Collaborations

Intel’s IoT Ignition Labs sparked innovation in rapid succession across the globe this year, enabling the fast creation of collaborative, scalable, standardized solutions. From Dublin to Dubai, the labs provide hands-on opportunities to experience the latest IoT technologies, access to … Read more >

The post Intel IoT Ignition Labs Spark Global Internet of Things Collaborations appeared first on IoT@Intel.

Read more >

The Greater Manchester Health Devolution Plan: Using Technology to Drive Partnerships between Health and Social Care to Improve Health Outcomes

As much of the developed world faces the burden of escalating healthcare costs, ageing populations, and increased incidence of chronic diseases, countries and localities are experimenting with innovative approaches to address these challenges.

 

In a recent visit to Greater Manchester, England, I learned of a recently launched initiative that aims to use technology to create a civic partnership between the NHS and social care providers. The principal goals of the initiative are to 1) improve health outcomes, 2) reduce healthcare disparities, and 3) reduce income inequality. While many nations and municipalities are working toward these goals, Greater Manchester’s approach is multi-pronged and systematic, and will use technology in a way that integrates social services into healthcare delivery.

 

Using Technology for Data Storage, Analysis, and Interoperability

Greater Manchester’s plan calls for the creation of “Connected Health Cities” (CHCs), which will be powered by health innovation centers. These centers will assemble the data from multiple sources. What is especially novel about these CHCs is that the collection, management, and analysis of both health and social care data will happen at a scale that until now has been impossible. Shared protocols for data analysis across the CHCs will allow for timelier and more powerful research studies, and ultimately better informed decision-making.

 

Interoperability and integration of myriad data sources will be a vital component for realizing these goals. Clinical, community, and patient-generated data will be used to inform decision-making not just among clinical providers, but also among public health policy makers, planners, social care providers, and researchers. Finally, technology will also support continuous evaluation of the program, and use of actionable measures to drive decision-making.

 

Decision Making at the Local Level

Another unique feature of this plan is that the UK will provide Greater Manchester with 6 billion pounds to use at its own discretion. Granting control of the budget to local municipalities will allow decision making to happen at the lowest possible level.

 

The Greater Manchester Board will set strategies and priorities, but local boards will devise plans that will be tailored depending on the needs of the local environment. The budget will be used not just for healthcare, but also for social programs and public health activities. The technology and data will help to bridge health and social programs at this local level.

 

Tackling Priority Care Pathways

Greater Manchester will focus initially on optimizing four “care pathways.” One pathway will use support tools for self-care to reduce hospital admissions for patients with chronic conditions. Another will support schizophrenia patients by linking self-reported symptoms to responses from community psychiatric nurse visits. Studying the effects of the program on specific pathways will allow for evaluation and iteration to help ensure the program’s success.

 

Public and Patient Engagement

Involvement from the public and from patients is a central pillar of this devolution plan. A panel of 10 patient and public representatives will have a say in how the data is used. The program leaders believe such involvement will ensure sustainability and transparency of these CHCs, and ensure that citizens’ needs are met in this civic partnership.

 

Insights for other Countries and Cities

I am eager to observe the implementation and early results that come from this Greater Manchester initiative, as it is a test bed for the devolution of health and social care. The potential for this initiative to accelerate and scale innovation to reduce disparities and improve population health is exciting. Data from multiple streams will be used in ways that haven’t been possible before, and in a manner tailored to the local environment. With a growing body of research highlighting the impact that social determinants (e.g., housing, social services and support, access to care) have on healthcare, using technology and data to tackle these issues could help other nations and cities in their efforts to improve population health.

 

Contact Jennifer Esposito on LinkedIn

 

Read more >

5 Questions for Dr. Robert C. Green, Brigham and Women’s Hospital and Harvard Medical School

Dr. Robert Green is director of the Genomes2People Research Program, associate director for Research for Partners Personalized Medicine and associate professor of Medicine at Brigham and Women’s Hospital, a Harvard Medical School teaching hospital in Boston, Massachusetts. We recently sat down with him to discuss his research on personal genomics and where that research is headed in the future.

 

Intel: How has gene sequencing evolved over the last decade?


Green: Genetic sequencing is tremendously exciting and important for the future of medicine. We have all this technology that allows us to sequence, align, and call these variants so much better than we could before. The real question is, how do you integrate that into the practice of medicine? Our research at Brigham and Women’s Hospital is zeroed in on finding out how we can implement genomic information, particularly sequencing, into the practice of medicine for adults or even for newborns.

 

Intel: Are patients interested in consumer genetic testing? If so, why?


Green: Patients want to know if they are at increased risk for particular diseases. They want to know if there are drugs they should or shouldn’t take. Some of them want to know if they are carrying recessive carrier traits that would put them at risk for having a child with a recessive disease if their partner is also carrying a copy of a mutation. A lot of people are really curious about the diseases they already have.

 

Our research helps us understand why people want services like consumer genetic testing. In part, it’s perhaps to predict future illness, but it’s also to explain what they already have or what’s running in their family.

 

Intel: What kind of impact can genomic sequencing have on patients?


Green: Genome sequencing isn’t going to be relevant to everyone’s disease. It isn’t going to be relevant to everyone’s medication. But it is going to be relevant to people with rare diseases and those on some medications. It’s going to be relevant to a lot of people who have cancer. And it’s highly relevant to people who are planning their family and want to do preconception testing to avoid recessive conditions.

 

When it comes to individuals who are dealing with sepsis or bad infection, we’re going to not only sequence those patients, but also sequence the microbiomes of the bacteria that are infecting them.

 

One of the directions for sequencing is to have a file or a database of genomic information a patient can call upon to use when they face a situation where they need a new medication, are going to have a family, or have a mysterious illness. That information will be there. It can be brought up in an electronic health record at the point of care in a decision support manner.

 

Intel: What’s the biggest hurdle right now?


Green: One of the great challenges for the practice of medicine is to keep up with, educate ourselves about, and use research effectively in our practices.

With initiatives to accelerate the implementation of genomics, like the Intel All in One Day goal, it’s going to be important that clinicians and those who are in training—fellows, residents, medical students—become much more familiar with genomics than they are today.

 

Intel: What’s your vision for the future of genomic research application?


Green: I think five years from now, about 20 percent of the population is going to walk into their doctor’s office with their own genome, and doctors are going to have to decide how they want to use this information. I can envision that about 10 years from now everyone will use their genome as a daily part of their medical care.

Read more >

Intel IoT Ignition Lab Spotlights Turkish Internet of Things Innovations

Sitting at the crossroads of historic innovations, Istanbul welcomed another step forward in recent years with the opening of its first Intel IoT Ignition Lab. The Internet of Things (IoT) lab came about thanks to a fruitful collaboration among Intel … Read more >

The post Intel IoT Ignition Lab Spotlights Turkish Internet of Things Innovations appeared first on IoT@Intel.

Read more >

My Life at Intel: Malaysia and Singapore MNC Lead Karen Chow Shares Her Story

My Journey I was born and bred in Kuala Lumpur. I started my first job in Malaysia’s IT industry with a local IT solution distribution house as a product manager in 1996. During my nine years there, I worked with two U.S.-based … Read more >

The post My Life at Intel: Malaysia and Singapore MNC Lead Karen Chow Shares Her Story appeared first on Jobs@Intel Blog.

Read more >

Big Data Webinar Series: 3. Big Data in Clinical Medicine: Bringing the Benefits of Genome-Aware Medicine to Cancer Patients – Available On-Demand

The third and final webinar in Frost & Sullivan’s series on Big Data in Healthcare took place recently and is now available to view on-demand. The webinar is a must-watch for healthcare leaders and features some very practical advice on how to get the most value from the data your organization holds.

 

Greg Caressi, Sr. VP of Transformational Health at Frost & Sullivan, opened with an overview of how providers are leveraging big data platforms to drive health, concluding with ‘the real bottleneck is in turning data into knowledge’.

 

David Delaney, MD, Chief Medical Officer at SAP Healthcare, took the conversation forward by sharing his thoughts on how HANA® Healthcare In-Memory simplifies and accelerates data analysis for healthcare providers. Intel and SAP have a long-standing relationship and have worked together on SAP HANA® since 2005.

 

Finally, Kevin Fitzpatrick, CEO of ASCO CancerLinQ, gave a fantastic presentation on shaping the future of cancer through rapid learning. A brilliant insight into really important work for the cancer community.

 

Listen to this webinar now on-demand and don’t forget that if you missed the first two webinars you can also view them today using the links below:

 

  • Watch Webinar 3: Big Data in Clinical Medicine: Bringing the Benefits of Genome-Aware Medicine to Cancer Patients
  • Watch Webinar 2: Predictive Healthcare Analytics in a Big Data World: Use Cases from the Field
  • Watch Webinar 1: Future of Healthcare is Today: Leveraging Big Data

Read more >

Driving America’s Intelligent Transportation Future

Intel is highly encouraged by today’s enactment of the most innovative and forward-thinking highway bill in history, the Fixing America’s Surface Transportation Act (FAST Act). With this milestone legislation, Congress explicitly recognizes the increasing importance of new transportation technologies to the U.S. economy and global competitiveness … Read more >

The post Driving America’s Intelligent Transportation Future appeared first on Policy@Intel.

Read more >