Recent Blog Posts

How to printf inside (aborted) Intel® Transactional Synchronization Extensions (Intel TSX) transactions

One of the most popular ad-hoc functional debugging techniques is to use the printf or fprintf functions to display the state of variables. However, if these functions are used inside an Intel® TSX transaction, they can cause transaction aborts. The reason is that flushing the print output buffer involves an operating system call and an I/O operation: operations that cannot be rolled back by Intel® TSX. That means the (f)printf output from a transaction may be lost: the attempt to flush the I/O buffer inside the transaction aborts it, and the machine state, including the buffered output, is rolled back. If the flush happens after a committed transaction, the printf output won’t be lost.[1]

In general, any transaction abort handler needs to use a fall-back synchronization mechanism that does not involve Intel TSX. It should therefore be possible to observe the problem being debugged in the fall-back path, where printf works as expected. However, what can you do if, for some reason, the problem is not reproducible in the fall-back execution? So far I haven’t had this problem, but if you do, please consider the trick shown below.

The idea is to use Intel® Processor Trace (Intel® PT) technology. This records an instruction trace that includes execution inside transactions, but it doesn’t record any data values. To overcome this limitation I map a data value into a code execution path that I can then see in the captured trace.

The basic building block of the method is a function that “prints” a char (1 byte) into the processor instruction trace. It takes the character (in the %rdi register) and jumps into a special code block at the offset given by the character’s value. The code block contains 256 1-byte “ret” instructions (return from the function).

tsx_print_char:
    movzbq %dil,%rdi # zero-extend the lowest 8-bits of rdi (input character)
                     # to 64-bits in rdi
    leaq tsx_print_char_0(%rip), %r8 # get the address of ret table below into r8
    addq %rdi, %r8  # add rdi to r8
    jmp *%r8        # jump to r8
    tsx_print_char_0:
    ret # 256x ret
    ret
    ret
    ret
    ret
    ret
    ret
    ...
    ret
    ret
    ret
    ret

The decoder of the processor trace dump can then read the records indicating control transfer from the jmp *%r8 instruction offset to a position in the ‘ret’ table (tsx_print_char_0 label) and interpret them as characters. Using Linux perf this can be done as follows ($7 and $11 are fields in the perf Intel PT dump that contain source and destination of an execution control transfer, 3 is the length of the “jmp *%r8” instruction encoding):

perf script | egrep 'tsx_print_char.*tsx_print_char_0' | awk '{ printf("%c", (strtonum("0x"$11)-strtonum("0x"$7)-3))} '
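The arithmetic in the awk one-liner can be written out in C++ to make the decoding step explicit. This is only a sketch: the names decode_trace_char and kJmpLen are hypothetical, and it assumes the source and destination addresses have already been extracted from the perf dump as hex strings without a "0x" prefix, as in the awk fields.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Length in bytes of the "jmp *%r8" encoding, as stated in the text.
constexpr std::uint64_t kJmpLen = 3;

// Hypothetical helper: recover the printed character from one branch record.
// src_hex is the address of the "jmp *%r8" instruction, dst_hex the address
// of the ret that was reached; both are hex strings without a "0x" prefix.
inline char decode_trace_char(const std::string& src_hex,
                              const std::string& dst_hex)
{
    const std::uint64_t src = std::stoull(src_hex, nullptr, 16);
    const std::uint64_t dst = std::stoull(dst_hex, nullptr, 16);
    // The ret table starts right after the jmp, so the table offset is
    // dst - (src + kJmpLen), and that offset is the character value.
    const std::uint64_t offset = dst - src - kJmpLen;
    assert(offset < 256 && "branch target lies outside the 256-entry ret table");
    return static_cast<char>(offset);
}
```

For example, a branch from address 400b10 to 400b5b has an offset of 0x4b - 3 = 72 and decodes to 'H'.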

Using the tsx_print_char primitive one can build a function that is compatible with the normal printf:

#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>

extern "C" void tsx_print_char(char c);

#define TSX_PRINTF_BUF_LEN (1024)

int tsx_printf(const char* format, ...)
{
    va_list list;
    char str[TSX_PRINTF_BUF_LEN];
    va_start(list, format);
    int ret = vsnprintf(str, TSX_PRINTF_BUF_LEN, format, list);
    va_end(list);
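    if (ret < 0) return ret; // formatting error: nothing to print
    // vsnprintf stores at most TSX_PRINTF_BUF_LEN-1 characters plus the terminating NUL
    ret = (ret < TSX_PRINTF_BUF_LEN)? ret : TSX_PRINTF_BUF_LEN - 1;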
    for(int i = 0; i < ret; ++i)
    {
        tsx_print_char(str[i]);
    }
    return ret;
}
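To check the formatting and truncation behavior on a machine without Intel PT, the assembly routine can be swapped for a stand-in that records characters in a string. The following is a sketch under that assumption: fake_print_char, g_captured, kBufLen and tsx_printf_sketch are hypothetical names used for testing only and are not part of the original source.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for the assembly tsx_print_char: it records the
// character in a string instead of encoding it in the instruction trace.
static std::string g_captured;
static void fake_print_char(char c) { g_captured.push_back(c); }

constexpr int kBufLen = 16; // deliberately small so truncation is easy to see

// Same structure as tsx_printf, but emitting through fake_print_char.
int tsx_printf_sketch(const char* format, ...)
{
    char str[kBufLen];
    va_list list;
    va_start(list, format);
    int ret = std::vsnprintf(str, kBufLen, format, list);
    va_end(list);
    if (ret < 0) return ret;               // formatting error
    if (ret >= kBufLen) ret = kBufLen - 1; // only kBufLen-1 characters were stored
    for (int i = 0; i < ret; ++i)
        fake_print_char(str[i]);
    return ret;
}
```

With this stand-in, tsx_printf_sketch("x=%d", 42) leaves "x=42" in g_captured, and a longer string is cut to the first 15 characters.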

Here is a performance micro-test for the tsx_printf function:

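#include <pthread.h>
#include <sched.h>
#include <immintrin.h>
#include "tbb/spin_rw_mutex.h"

// RDTSCP() is a timestamp-counter helper that is not shown in this listing.
int main(int argn, char * arg[])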
{
   cpu_set_t cpuset;
   CPU_ZERO(&cpuset);
   CPU_SET(2, &cpuset); // thread 2
   pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
   tsx_printf("Some warm up to resolve symbols %llu\n", RDTSCP());
   printf("Some warm up to resolve symbols %llu\n", RDTSCP());
   unsigned long long int totaltx = (argn >=2)?atoi(arg[1]):100;
   int  print_type = (argn >=3)?atoi(arg[2]):0;
   unsigned long long int begin, end, i = 0;
   int tmp;
   tbb::speculative_spin_rw_mutex mutex; // TBB mutex using TSX
   begin = RDTSCP();

   for(; i < totaltx; ++i)
   {
       int status = 0;
       tbb::speculative_spin_rw_mutex::scoped_lock _(mutex);
       if(_xtest()) // print inside transaction
       {
           switch(print_type)
           {
               case 1:
                   printf("Hello from an aborted transaction %llu!\n", i);
                   break;
               case 2:
                   tsx_printf("Hello from an aborted transaction %llu!\n", i);
                   break;
           }
       }
       _mm_stream_si32(&tmp, 0); // causes abort due to incompatible
                                 // memory type access (non-temporal store)
   }
   fflush(stdout);
   end = RDTSCP();

   printf("total     iterations: %llu\n", totaltx);
   printf("cycles per iteration: %llu\n", (end-begin)/totaltx);

   return 0;
}
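The RDTSCP() helper used by the test is not defined in the listing. One plausible definition, which is an assumption rather than the author's code, wraps the __rdtscp compiler intrinsic; the function name read_tsc and the non-x86 fallback are hypothetical and only there so the sketch builds elsewhere.

```cpp
#include <cassert>
#include <chrono>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

// Hypothetical helper: read the time-stamp counter with the RDTSCP instruction.
inline unsigned long long read_tsc()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int aux; // receives IA32_TSC_AUX (typically the CPU id); unused here
    return __rdtscp(&aux);
#else
    // Fallback for non-x86 builds: a monotonic clock tick count, not a TSC.
    return static_cast<unsigned long long>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

#define RDTSCP() read_tsc()
```

Because the test pins the thread to one CPU, successive readings come from the same core's counter.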

Compiling and running the test:

g++ -c tsx_print_char.s -o tsx_print_char.o
g++ -Itbb/build/linux_intel64_gcc_cc4.3_libc2.11.3_kernel3.13.5_release -Ltbb/linux_intel64_gcc_cc4.3_libc2.11.3_kernel3.13.5_release -ltbb -O3 test.cpp tsx_print_char.o -o test.x -pthread
perf record -e intel_pt//u ./test.x 10 2
perf script | egrep 'tsx_print_char.*tsx_print_char_0' | awk '{ printf("%c", (strtonum("0x"$11)-strtonum("0x"$7)-3))} '

The output:

Some warm up to resolve symbols 14970613345637616
Hello from an aborted transaction 0!
Hello from an aborted transaction 1!
Hello from an aborted transaction 2!
Hello from an aborted transaction 3!
Hello from an aborted transaction 4!
Hello from an aborted transaction 5!
Hello from an aborted transaction 6!
Hello from an aborted transaction 7!
Hello from an aborted transaction 8!
Hello from an aborted transaction 9!

The table below shows cycles per iteration when the test runs 10000 iterations on an Intel® Core™ i7-6700 CPU @ 3.40GHz (“Skylake”) with Linux kernel 4.4 and “performance” power governor (“cpupower frequency-set -g performance”). The output of the normal printf (stdout) is redirected to a file to use buffering and improve speed (flushed just once before the end of the test).

printf variant     cycles per iteration   cycles per iteration with Intel PT collection running
no printf          495                    583
buffered printf    697                    848
tsx_printf         1015                   1356

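As one can see, the overhead of tsx_printf is of the same order as that of the normal buffered printf: about 520 versus 200 extra cycles per iteration over the no-printf baseline without Intel PT collection, and about 770 versus 265 with it.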

The source code is available under conditions of the Intel Sample Source Code license.


[1] If the functional issue one wants to debug is the transaction aborts themselves, then all printf output before the debugged abort point will be lost. However, the performance monitoring unit can provide good guidance on the type of abort (TSX-unfriendly instruction, access to a TSX-incompatible memory type) and its location in the code.


Webinar: Introduction to the Intel® IoT Gateway

Join us for the first webinar in the Intel® IoT Commercial Workshop Series. In the opening session, Intel Evangelist Daniel Holmlund will present an Introduction to the Intel® IoT Gateway.

Topics in this session include:

•  How to connect to your Intel® IoT Gateway from your development computer.
•  How to connect to your Intel® Edison board from your development computer using a serial connection.
•  How to connect your Intel® Edison board to your Intel® IoT Gateway’s hotspot.
•  How to deploy an application to your Intel® Edison board using the Intel® XDK.

This webinar will conclude with a Q&A session.      

Date: Thursday, August 11, 2016
Time: 9:00–10:00 a.m. PST
Presenter: Daniel Holmlund, Intel Corporation
Location: This is an online webinar. To join us, please register today.


Advanced Computer Concepts for the (Not So) Common Chef: The Multi-core Industrial Kitchen

This is going to be a short blog as there is no new material to discuss, just a simple extension of the earlier blogs (see Advanced Computer Concepts for the (Not So) Common Chef: The Home Kitchen and Advanced Computer Concepts For The (Not So) Common Chef: Terminology Pt 1) on how the CPU can be viewed as a chef along with her kitchen and recipe book.

Most current processors, such as the Intel® Core 2 generations, are multicore. They can have anywhere from two CPUs/cores to upwards of 60 cores in the Intel® Xeon Phi processor family.

Since you are a bright and perceptive audience, I’m sure you’ve figured out from this article’s title what kitchen analogy applies to a multi-core processor configuration.

A multi-core processor is equivalent to having multiple kitchens in the same building. Now, these kitchens don’t need to be in separate rooms. They only need to have duplicate equipment, chefs, etc. See Figure INDUSTRIAL_KITCHEN. Note that these kitchens can share pantries, meaning memory, but the distance to different pantries may differ depending upon which kitchen is accessing them. (Hint: Can you guess what topic a future blog might have?)

In such an industrial kitchen, several recipes can be prepared and cooked simultaneously by, say, n different chefs. Similarly, in an n-core multi-processor, n programs can execute simultaneously.+

Figure INDUSTRIAL_KITCHEN. A large restaurant with two kitchens and two chefs.


That’s it. Simple enough, right?

NEXT UP: The pipeline! What is the pipeline? It’s the internal workings of a core.

+ This is different from the multi-tasking that modern operating systems, such as Windows*, OS X*, and Linux*, do. Multi-tasking emulates multi-processing by switching quickly between multiple tasks/programs, giving the impression that multiple programs are running simultaneously even though only one is really running at a time.


Intel® Ultimate Coder Challenge For IoT – Team Update: Proximarket

Team Proximarket Goes Live With IoT Retail Tracking

Editor note: This is the first in a series of interviews with leaders of the five international teams competing in the Intel® Ultimate Coder Challenge 2016.

Selected from among 180 worldwide entries, five global teams are competing in the Intel® Ultimate Coder Challenge for IoT, with a common goal – to use IoT to help solve real problems. Tackling challenges from transportation to healthcare, coders are using unique approaches with Intel® edge and gateway devices and common sensors, motors and switches that you would find within our standard Grove* Sensor Kit. Whichever team’s solution is judged to best address the problem in their vertical will be named winner of the challenge.

Team Proximarket – IoT Retail Shopping Carts

Proximarket is designing highly usable industrial network solutions that arrange proximity technology into new configurations. By leveraging existing concepts like Google’s* Physical Web while complementing exciting technology like the Intel® Curie™ module, Intel® Edison module, and the Dell* IoT Gateway, Proximarket is turning existing proximity concepts on their heads by deploying beacon transmitters on consumer-carried shopping carts while mounting receiver circuits on store shelves.

Team Proximarket is a two-person partnership between leader Michael Schloh von Bennewitz and István Szmozsánszky. Based in Germany, this team is looking to change how we shop. Not only can a retail store understand shopping habits, but consumers can benefit from instant feedback of customer loyalty rewards as they pass products on the shelves.

Michael Schloh von Bennewitz is a computer scientist specializing in network engineering, embedded design, and mobile platform development. Responsible for research, development, and maintenance of packages in several community software repositories, he actively contributes to the open source development community. Michael speaks four languages fluently, presents or teaches at technical events every year, and has worked with groups including Cable & Wireless*, Nokia*, Mozilla*, Ubuntu*, ARIN*, Droidcon*, AstriCon*, Minix*, the Mobile World Congress*, and Dockercon*. Michael’s IoT knowledge profits from years of work at telecoms and relationships with industry leaders. He is a developer on the Smartcart project and an Intel innovator with the mandate to promote IoT technology.

Q: What does this project mean to you and your team?

A: This project is the pilot service offering of a startup seeking investment and as such carries business interest and personal interest alike. We’re trying to serve store vendors, helping family-run shops as well as large brick-and-mortar stores compete with Internet warehouse operations. We’re also trying to give consumers tools for shopping wisely and avoiding invasive tracking while maintaining the benefits of customer loyalty programs.

Q: Describe the sensors you’re using and any challenges with them.

A: Our service is proximity based, so we’re programming the nRF51822 Bluetooth* Smart circuit in Intel® Curie™ modules to act as a proximity sensor. We’re also using identity tagging of products on shelves and reading them as they’re placed in shopping carts, so for that we use RFID or more likely NFC RF sensors. Lastly, we sense battery discharge and generator voltage to calculate device life of a Smartcart and alert management when individual carts can’t be used and must be recharged.

Q: What software, programming languages, and cloud services are you using and why?

A: We’re using NodeJS* as much as possible due to the stable nature and selection of NPM libraries, particularly good support for IoT-relevant technology, and rapid prototyping workflow. We’re also using C11 in order to take advantage of the Intel® Curie™ module libraries developed for Zephyr*, which will be our platform of choice when programming Intel® Curie™ module based devices like the Smartcart. We’ll avoid cloud connectivity, but are definitely testing to see how much utility we’ll lose by this. Our cloud platform of choice is Microsoft Azure*.

Q: Share any challenges, funny stories and solutions that you’ve encountered so far.

A: We actually have not received any of the hardware parts needed to implement our design, so this has forced us to reallocate most of our research and development time to searching for sponsors of parts. We have almost half of what is needed to create a Smartcart prototype now, as sponsors come forth and deliver their pledged hardware.

Teamwork while Traveling

We also overestimated the timeliness of the challenge. While it was delayed a few months, our obligations remained constant. This is rather funny, because our summertime obligations consisted of almost perpetual travel. We’ve been trying to work on the design at the same time and at the same place by meeting in random airports or nearby cities where we plan to land.

Regarding solutions, we’re sad to see the Intel® Curie™ modules packaged in Arduino* boards, but happy that they are finally on the market. A solution including the Intel-hosted Zephyr* RTOS is not included in what we received, but we were able to secure a pledge for a partially sponsored JTAG programmer. This device is due to arrive at the end of July and will allow us to finally implement our Smartcart prototype in C11 on Zephyr*. That will be an ultimate solution indeed.

Q: What data are you feeding to the cloud, and how are you using it in your solution?

A: We hope to integrate Smartcart in some niche situations as well as third world countries where little expertise exists relating to cloud logic. We also find problems in explaining data jurisdiction and keeping with data control laws. Lastly, we like the idea of allowing large shops like Costco* to maintain their own internal infrastructure with Red Hat* Origin or OpenStack, rather than dictate which cloud platform they use.

That being said, we are exploring both MQTT and AMQP interfaces to Microsoft Azure*. Microsoft* has acted in good faith lately, reversing their long term abuse of consumer liberty. On top of that, we’re fans of their cloud technology for technical reasons, and they offer a reasonably priced IoT Hub framework which supports the two protocols we like to use, namely MQTT and AMQP.

Challenge Winners Announced in August

The teams have been steadily coding, wiring and assembling their projects. In the final phase, teams will make final presentations, demonstration videos and display the prototypes that were produced. Follow the Ultimate Coder site showing current progress, pictures of their products, and comments from fans and judges. Judging begins in August with the winners announced soon after.


Trending on IoT: Our Most Popular Intel® IoT Developer Stories for July

1.  Intel® XDK User Guide

Create applications for your Internet of Things (IoT) board using the Intel® XDK User Guide and templates.


2.  What the Intel® Curie™ Compute Module Means to Makers

Learn about the Intel® Curie™ module, which is an Intel® Quark™ system on a chip that provides a complete low-power solution for wearable devices and consumer or industrial edge products.


3.  Getting Started With Node-RED* and Arduino 101* with the Grove* Shield

Find out how to get started with Node-RED* and Arduino 101* (branded Genuino 101* outside the U.S.) using the Grove* Shield.


4.  Intel® System Studio IoT Edition User Guide for C/C++

Learn to work with Internet of Things (IoT) projects in C/C++ using the Intel® System Studio IoT Edition.


5.  Intel® System Studio IoT Edition User Guide for Java*

Use this step-by-step guide to work with IoT projects in Java* using the Intel® System Studio IoT Edition.


6.  Connecting an Intel® IoT Gateway to IBM Watson*

Follow this quick guide to add the IoT cloud repository to the Intel® IoT Gateway and add support for IBM Watson*. You can start developing applications for this platform in your preferred programming language.


7.  Sensor to Cloud: Connecting the Intel® NUC and Arduino 101* board to the IBM Watson* IoT Platform

Learn how to use an Intel® NUC device to connect sensors on an Arduino 101* board to the IBM Watson* IoT Platform.


8.  IoT Path-to-Product: Smart Home  

There is value in building smart home solutions using standards-based, open platforms. These guidelines can help you create your own smart home project.


9.  IoT Development Made Easy

Find out about the collaboration between Intel and IBM to provide professional developers with a complete set of development tools for IoT from edge to cloud.


10.  IoT Path-to-Product: How to Build the Smart Home Prototype

Follow the steps in the exercise described in this article to create your own IoT solution prototype.


Our team of software developers, experts, partners and enthusiasts have continuously shared innovative and exciting ideas in the Intel® IoT Developer Zone. This month we’ve compiled a list of our most popular IoT stories to guide you through the latest projects in the IoT space. Miss last month? Go to June IoT to catch up.



Three reasons why developers should care about OpenStack

Why should developers care about OpenStack? Does it really affect their work adding features, debugging code, improving performance, generally pleasing their customers? (Or at least making them less pissed off). Sure, cloud computing is a Big Deal. But isn’t the … Read more >

The post Three reasons why developers should care about OpenStack appeared first on Intel Software and Services.
