Image description with Microsoft's Cognitive Services and R

A while back, I created the small package Roxford to access Microsoft’s Cognitive Services API in order to easily recognize objects in images. Back then Microsoft called the service “Project Oxford”, hence the name “Roxford”. Since then, Microsoft has extended the API to include image tagging, description and celebrity detection. In this post, I will try to illustrate this functionality and how it is called through the package. To install the package, just follow the guide.

After installing the package, usage is (hopefully) straightforward. Set your API key and call the function, passing either a path to a local image or the URL of a remote image. For remote images, use the functions whose names end in “URL”. The following code shows this for the image description function, getDescriptionResponseURL.

res <- Roxford::getDescriptionResponseURL(url, visionKey, maxCandidates = 4)
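The call above assumes that `url` and `visionKey` already exist in your session. A minimal setup sketch could look like the following; the environment variable name `MS_VISION_KEY` and the image URL are placeholders I made up for illustration, not something the package prescribes.

```r
# Read the API key from an environment variable instead of hard-coding it.
# "MS_VISION_KEY" is a hypothetical name; use whatever you stored the key under.
visionKey <- Sys.getenv("MS_VISION_KEY")
if (!nzchar(visionKey)) {
  message("MS_VISION_KEY is not set; API calls will fail without a key")
}

# Placeholder address; replace it with a publicly reachable image URL.
url <- "https://example.com/some-image.jpg"
```

Keeping the key out of the script makes it easier to share the code without leaking the credential.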

[Figure: the image, titled with the API’s caption text and confidence score, together with the returned tags]

All the text in the chart comes from the API response. The title shows the caption text together with the confidence score attached to it. In addition, the call returns a list of tags associated with the image. Both the tags and the caption look pretty good to me.

Compared to the previous version, the updated API also offers a “tagging” service. While the tags in the image above are pretty extensive, the tagging call returns just one tag for this image.

resTag <- Roxford::getTaggingResponseURL(url, visionKey)
resTag["tags", ]$name
## [1] "person"

resTag["tags", ]$confidence
## [1] "0,999826610088348"
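Note that the confidence is printed as a character string with a decimal comma, presumably a locale effect (that is my assumption, not something the API documents). If you want to compare or threshold the value, it has to be converted first. A small base R sketch, where the helper name `toNumericConfidence` is made up for illustration:

```r
# Convert a confidence string with a decimal comma into a numeric value.
# Strings that already use a decimal point pass through unchanged.
toNumericConfidence <- function(x) {
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

conf <- toNumericConfidence("0,999826610088348")
conf > 0.99
## [1] TRUE
```

With real results you would pass `resTag["tags", ]$confidence` to the helper instead of the literal string.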

Finally, the last extension is the ability to call domain-specific models. In a first step, the API returns a list of the domain models that are available (currently just ‘celebrities’).

##                  name categories
## models    celebrities    people_
## requestId        <NA>       <NA>

In a second step, one can provide the image resource and the model specification (‘celebrities’) to get a classification of the provided image.

[Figure: the image annotated with the text returned by the ‘celebrities’ domain model]

Again, all text in the chart is returned by the API. I wonder if Microsoft plans to open up their backend so that others can plug in their own domain-specific models. I could imagine that there are developers who would love to share their models through a centralized marketplace. Overall I find the API quite convenient to work with. Happy classification :).

Written on December 14, 2016