Authenticating Rembrandts: CNNs plus Image Entropy identify the real paintings among the copies


According to a Wall Street Journal article published in 2010, new art scholarship and cataloguing techniques developed over the past 30 years have helped art experts conclude that at least half of the more than 1,000 drawings and paintings attributed to Rembrandt were created by other artists. Some of these works may well be forgeries, created to deceive, but Rembrandt taught at least fifty students, and those students began their careers by intentionally copying the all-time master of light and shadow. Their works are not considered forgeries.

A new AI technique developed by husband-and-wife team Steven and Andrea Frank is helping to winnow the real Rembrandts from the stylistic copies and fakes. This new technique replaces human authentication with image tiling, image-entropy analysis, and CNN-based recognition of essential stylistic details encoded into genuine Rembrandt images.

According to a paper titled “Salient Slices: Improved Neural Network Training and Performance with Image Entropy” that the Franks have submitted to the IEEE Transactions on Neural Networks and Learning Systems for review and publication, the total number of works attributed to Rembrandt shrank from an early 20th century estimate of 711 to only 250 by 1989. Much of this winnowing occurred as the result of the Rembrandt Research Project, which was started by the Netherlands Organization for Scientific Research in 1968. Another 90 works have since been restored to the list of authenticated works.

The CNN Opportunity

This low number of authenticated Rembrandt works creates a problem for automated authentication based on convolutional neural networks (CNNs). According to the Franks’ IEEE paper, “One widely cited source estimates that achieving human-level classification performance requires training a neural network with about 5000 labels per class.” Consequently, the 340 or so authenticated Rembrandts fall far short of the data-set size needed for effective CNN training.

Effective CNN training depends heavily on the availability of high-quality data sets. Sometimes, however, the data simply isn’t there. Many other applications, such as identifying unusual and rare medical conditions in histology images, face similar limits on the size of the available training data sets, so this problem is not unique to art authentication.

A second CNN-training challenge is image size. The “C” in CNN stands for “convolutional,” and convolution is a computationally expensive operation. The number of convolutions required (particularly in a CNN’s initial layers) scales linearly with the pixel count of the image being analyzed, so these computations become problematic for large images: training slows down, and large images can exceed a CNN’s memory limits.
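To make that scaling concrete, here’s a back-of-envelope sketch (the layer dimensions are hypothetical, chosen only for illustration) of how the multiply-accumulate count of a single stride-1 convolutional layer grows with image size:

```python
# Back-of-envelope: multiply-accumulate (MAC) count for one conv layer
# scales linearly with the pixel count of the input image.
def conv_layer_macs(height, width, in_channels, out_channels, kernel=3):
    """MACs for a stride-1, 'same'-padded convolutional layer."""
    return height * width * in_channels * out_channels * kernel * kernel

# Hypothetical 3x3 layer with 3 input channels and 32 output channels:
small = conv_layer_macs(200, 200, 3, 32)    # one 200x200 tile
large = conv_layer_macs(4000, 3000, 3, 32)  # a full-resolution scan
print(f"{large / small:.0f}x more MACs")  # 300x: exactly the pixel-count ratio
```

The ratio of the two MAC counts equals the ratio of the pixel counts, which is the linear scaling described above.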

Simplification strategies such as downsampling the images or simplifying the CNN architecture sacrifice the CNN’s ability to classify images based on small, subtle features. When it comes to authenticating a Rembrandt or finding a tumor in a large radiographic image, errors in either direction, false positives or false negatives, can be costly.

Tiling simplifies the problem

Most prior studies of artwork classification have either used a reduced-resolution version of the entire image or experimented with hand-engineered feature extraction to identify characteristics typical of an artist’s work. But art connoisseurs and historians often disagree over the stroke-level and other stylistic features that truly characterize an artist’s work, no accepted methodology exists for identifying such features, and the features themselves can change and evolve over an artist’s career. The Franks therefore set out to develop an automated method for generating and selecting image tiles from a high-resolution image, avoiding the need for artist-specific feature engineering and the inevitable disagreements over subjective stylistic criteria.

The task of developing objective criteria seemed an ideal task for CNNs, which are adaptive and therefore learn their own visual criteria. Image fragments, rather than whole images, have already been successfully employed for CNN-based object-detection and -classification tasks, but these techniques had not yet been used for artwork authentication.

Using image entropy to select tiles

Tiling a large image into smaller subimages is not difficult. However, selecting the right subimages to use for training is a more complex problem. The Franks needed a mathematically objective way to identify candidate tiles for CNN training. They decided to try image entropy.

The entropy of an image represents the degree of randomness (and therefore the amount of information content) in an image’s pixel values, in the same way that the entropy of a message represents the amount of useful, nonredundant information encoded in the message. Image entropy correlates with the increasingly rich feature content captured in the convolutional layers of a CNN. As it happens, the Franks discovered that image entropy does provide a useful criterion for selecting image tiles to use for artwork authentication.

The Franks’ approach divides an artwork image into discrete but partially overlapping tiles, then retains only those tiles whose entropies equal or exceed the entropy calculated for the entire artwork. No subimage can contain as much information as the original artwork, but subimages with comparable information diversity “pack a similar convolutional punch when processed by a CNN.” The retained tiles capture the most visually complex elements of a portrait, including the hands, facial details, and elaborate clothing. These component parts attract a viewer’s information-hungry eyes and, apparently, drive a CNN’s results.
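As a rough illustration of the selection criterion (not the Franks’ actual code; the function names, tile size, and stride here are assumptions), a minimal NumPy sketch can keep only the tiles whose Shannon entropy meets or exceeds that of the whole image:

```python
import numpy as np

def shannon_entropy(img):
    """Shannon entropy (in bits) of an 8-bit grayscale image's histogram."""
    counts = np.bincount(img.ravel(), minlength=256)
    p = counts[counts > 0] / img.size
    return float(-(p * np.log2(p)).sum())

def salient_tiles(img, tile=200, stride=100):
    """Return (row, col) origins of overlapping tiles whose entropy
    equals or exceeds the entropy of the whole image."""
    threshold = shannon_entropy(img)
    kept = []
    for y in range(0, img.shape[0] - tile + 1, stride):
        for x in range(0, img.shape[1] - tile + 1, stride):
            if shannon_entropy(img[y:y + tile, x:x + tile]) >= threshold:
                kept.append((y, x))
    return kept
```

On a synthetic image that is flat on one side and textured on the other, only the textured tiles survive the cut, mirroring how the Franks’ method retains the visually complex regions of a portrait.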

However, image entropy alone does not provide a useful basis for choosing tile size. Larger tiles erode the benefits of tiling, while small tiles miss big features that could be important for classification. Excessively small tiles may fail to capture important stylistic details, even stroke-level elements, while overemphasizing spurious features such as wear marks and paint “craquelure.” Such tiles may have large image entropies, but they’re not helpful for authentication.

Test Results

The Franks tested their algorithm using three CNN architectures and four different tile sizes: 100×100, 200×200, 400×400, and 800×800 pixels. The 800×800-pixel tiles were too large to produce a meaningful analysis, suggesting an upper limit on feature size for artwork classification. With 200×200-pixel tiles, the training set satisfying the image-entropy criterion, drawn from 20 Rembrandt and 15 non-Rembrandt images, comprised nearly 13,000 Rembrandt tiles and 4,000 non-Rembrandt tiles.

The three CNN architectures tested were:

  • A simple network with three convolutional layers and a single dropout regularization layer (probability = 0.5)
  • A model with five convolutional layers and two dropout layers (probabilities 0.2 and 0.3)
  • The VGG16 architecture pre-trained on the ImageNet dataset

The five-layer model achieved the best overall performance.

Based on early successes with CNNs trained with these selected tiles, the Franks put their approach to the test by evaluating three works that have been subject to significant attribution controversy:

  • The Polish Rider, once de-attributed but now accepted to be a genuine Rembrandt work.
  • Portrait of Elisabeth Bas, long attributed to Rembrandt but now generally credited to his pupil Ferdinand Bol.
  • The Man with the Golden Helmet, de-attributed by the Rembrandt Research Project and now assigned to Rembrandt’s circle.

Averaged probabilities from the models trained with 200×200-pixel and 400×400-pixel tiles classify The Polish Rider as a Rembrandt painting and judge the Portrait of Elisabeth Bas as not by Rembrandt. Both of these CNN classifications agree with the consensus of human experts.
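The roll-up from tile-level CNN outputs to a whole-painting verdict can be sketched as a simple average of per-tile probabilities (the probability values below are invented for illustration; the paper’s exact aggregation may differ):

```python
def classify_painting(tile_probs, threshold=0.5):
    """Average per-tile 'Rembrandt' probabilities into one verdict."""
    mean_p = sum(tile_probs) / len(tile_probs)
    return mean_p, mean_p >= threshold

# Hypothetical per-tile outputs from a trained model:
probs = [0.91, 0.72, 0.88, 0.34, 0.95]
mean_p, is_rembrandt = classify_painting(probs)
print(f"mean probability {mean_p:.2f} -> "
      f"{'Rembrandt' if is_rembrandt else 'not Rembrandt'}")
```

Averaging lets one low-scoring tile (a worn or damaged patch, say) be outvoted by the rest of the painting instead of deciding the attribution on its own.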

Both models classify The Man with the Golden Helmet as a Rembrandt with overwhelming (99%) probability, which does not agree with human expert consensus. In their paper, the Franks write:

“Probing more deeply into the scholarship surrounding the Rembrandt Research Project’s deattribution of this painting, we find this justification: ‘In particular the thick application of paint to the helmet in contrast to the conspicuously flat rendering of the face, robe and background, which are placed adjacent to each other without a transition, does not correspond to Rembrandt’s way of working.’

“Given the attention paid by experts to such surface detail, we ran the image through our 100×100 model which, intriguingly, classified the painting as not by Rembrandt – but just barely, with a classification probability of 0.49. Considering the probabilities of each model separately, one almost hears echoes of the experts’ colloquy over the last century, and with the same wistful result. Still, we wonder about the reliability of attributions based on features analyzed at small scales, given the performance differences we observed.”


Hardware Acceleration

The Franks’ authentication methodology, training CNNs to recognize features using tiles drawn from a limited data set, is ripe for acceleration by Intel® FPGAs. Many CNNs are available for instantiation in FPGAs, and the Intel® FPGA Deep Learning Acceleration (DLA) Suite, included as part of the Intel® OpenVINO™ Toolkit, provides the tools and optimized architectures to accelerate inference for a variety of today’s common CNN topologies using Intel® FPGAs and Intel® Programmable Acceleration Cards (PACs). The suite also makes it easy to write software that targets FPGAs for machine-learning inference.

Determining image entropy for large numbers of small image tiles can consume a significant amount of time. The equation for calculating image entropy is:

H = -Σ p_i log2(p_i)

where p_i is the fraction of the image’s pixels taking value i, computed from the image’s histogram.

Finding the available library function too slow, Steven Frank eventually wrote his own image-entropy program in Python to accelerate that part of the technique. The calculation involves a base-2 log, easily handled with a log lookup table, and a series of multiply/accumulate operations. Consequently, it can easily be parallelized and accelerated using the thousands of multiplier/accumulators (MACs) and other programmable hardware resources available in an FPGA.
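A minimal sketch of that lookup-table idea in NumPy (assuming 8-bit grayscale tiles; this is not the Franks’ actual program) precomputes log2 for every possible histogram count, leaving only multiply-accumulate work in the hot path:

```python
import numpy as np

# Precompute a base-2 log lookup table for every possible histogram count
# of a 200x200 tile (40,000 pixels), so the hot path is pure multiply-add.
TILE_PIXELS = 200 * 200
LOG2 = np.zeros(TILE_PIXELS + 1)
LOG2[1:] = np.log2(np.arange(1, TILE_PIXELS + 1))

def tile_entropy(tile):
    """Entropy of an 8-bit tile via the algebraically equivalent form
    H = log2(N) - (1/N) * sum(c_i * log2(c_i)),
    where c_i are histogram counts and N is the pixel count."""
    counts = np.bincount(tile.ravel(), minlength=256)
    n = tile.size
    return np.log2(n) - (counts @ LOG2[counts]) / n
```

Because each histogram count indexes the precomputed table, the per-tile cost reduces to a dot product, exactly the kind of MAC-heavy workload that maps well onto FPGA hardware.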

Acceleration of the Franks’ methodology may have application far beyond the art world, extending into fields as diverse as identifying tumors in large radiology images and the frame-by-frame search for small anomalies in surveillance video streams. While Rembrandt authentication, an activity stretched over more than a century, hardly demands speed, analyzing radiology images and searching real-time video are another matter. These tasks are well suited to acceleration by Intel PACs.


For more information about hardware acceleration based on Intel FPGAs and Intel PACs, visit the Intel® FPGA Acceleration Hub.

Steven Leibson

About Steven Leibson

Steve Leibson is a Senior Content Manager at Intel. He started his career as a system design engineer at HP in the early days of desktop computing, then switched to EDA at Cadnetix, and subsequently became a technical editor for EDN Magazine. He’s served as Editor in Chief of EDN Magazine and Microprocessor Report and was the founding editor of Wind River’s Embedded Developers Journal. He has extensive design and marketing experience in computing, microprocessors, microcontrollers, embedded systems design, design IP, EDA, and programmable logic.