Visual search engines will be at the center of the next phase of evolution for the search industry, with Pinterest, Google, and Bing all announcing major developments recently.
How do they stack up today, and who looks best placed to offer the best visual search experience?
Historically, the input-output relationship in search has been dominated by text. Even as the outputs have become more varied (video and image results, for example), the inputs have been text-based. This has restricted and shaped the potential of search engines, as they try to extract more contextual meaning from a relatively static data set of keywords.
Visual search engines are redefining the limits of our language, opening up a new avenue of communication between people and computers. If we view language as a fluid system of signs and symbols, rather than a fixed set of spoken or written words, we arrive at a much more compelling and profound picture of the future of search.
Our culture is visual, a fact that visual search engines are all too eager to capitalize on.
Already, specific ecommerce visual search technologies abound: Amazon, Walmart, and ASOS are all in on the act. These companies’ apps turn a user’s smartphone camera into a visual discovery tool, searching for similar items based on whatever is in frame. This is just one use case, however, and the potential for visual search is much greater than just direct ecommerce transactions.
After a lot of trial and error, this technology is coming of age. We are on the cusp of accurate, real-time visual search, which will open a raft of new opportunities for marketers.
Below, we review the progress made by three key players in visual search: Pinterest, Google, and Bing.
Pinterest’s visual search technology is aimed at carving out a position as the go-to place for discovery searches. Their stated aim echoes the opening quote from this article: “To help you find things when you don’t have the words to describe them.”
Rather than tackle Google directly, Pinterest has decided to offer up something subtly different to users – and advertisers. People go to Pinterest to discover new ideas, to create mood boards, to be inspired. Pinterest therefore urges its 200 million users to “search outside the box”, in what could be read as a gentle jibe at Google’s ever-present search bar.
All of this is driven by Pinterest Lens, a sophisticated visual search tool that uses a smartphone camera to scan the physical world, identify objects, and return related results. It is available via the smartphone app, but Pinterest’s visual search functionality can be used on desktop through the Google Chrome extension too.
Pinterest’s vast data set of over 100 billion Pins provides the perfect training material for machine learning applications. As a result, new connections are forged between the physical and digital worlds, using graphics processing units (GPUs) to accelerate the process.
In practice, Pinterest Lens works very well and is getting noticeably better with time. The image detection is impressively accurate and the suggestions for related Pins are relevant.
Below, the same object has been selected for a search using Pinterest and also Samsung visual search:
The differences in the results are telling.
On the left, Pinterest recognizes the object’s shape, its material, its purpose, but also the defining features of the design. This allows for results that go deeper than a direct search for another black mug. Pinterest knows that the less tangible, stylistic details are what really interest its users. As such, we see results for mugs in different colors but of a similar style.
On the right, Samsung’s Bixby assistant recognizes the object, its color, and its purpose. Samsung’s results are powered by Amazon, and they are a lot less inspiring than the options served up by Pinterest. The image is turned into a keyword search for [black coffee mugs], which renders the visual search element a little redundant.
Visual search engines work best when they express something for us that we would struggle to say in words. Pinterest understands and delivers on this promise better than most.
Pinterest visual search: The key facts
- Over 200 million monthly users
- Focuses on the ‘discovery’ phase of search
- Pinterest Lens is the central visual search technology
- Great platform for retailers, with obvious monetization possibilities
- Paid search advertising is a core growth area for the company
- Increasingly effective visual search results, particularly on the deeper level of aesthetics
Google made early waves in visual search with the launch of Google Goggles. This Android app was launched in 2010 and allowed users to search using their smartphone camera. It works well on famous landmarks, for example, but it has not been updated significantly in quite some time.
It seemed unlikely that Google would remain silent on visual search for long, and this year’s I/O developer conference revealed what the search giant has been working on in the background.
Google Lens, which will be available via the Photos app and Google Assistant, will be a significant overhaul of the earlier Google Goggles initiative.
Any similarity in naming to Pinterest’s product may be more than coincidental. Google has stealthily upgraded its image and visual search engines of late, ushering in results that resemble Pinterest’s format:
Google’s ‘similar items’ product was another move to cash in on the discovery phase of search, showcasing related results that might further pique a consumer’s curiosity.
Google Lens will provide the object detection technology to link all of this together in a powerful visual search engine. In its beta format, Lens offers the following categories for visual searches:
Some developers have been given the chance to try an early version of Lens, with many reporting mixed results:
Looks like Google doesn’t recognize its own Home smart hub… (Source: XDA Developers)
These are very early days for Google Lens, so we can expect this technology to improve significantly as it learns from its mistakes and successes.
When it does, Google is uniquely placed to make visual search a powerful tool for users and advertisers alike. The opportunities for online retailers via paid search are self-evident, but there is also huge potential for brick-and-mortar retailers to capitalize on hyper-local searches.
For all its impressive advances, Pinterest does not possess the ecosystem to permeate all aspects of a user’s life in the way Google can. With a new Pixel smartphone in the works, Google can use visual search alongside voice search to unite its software and hardware. For advertisers using DoubleClick to manage their search and display ads, that presents a very appealing prospect.
We should also anticipate that Google will take this visual search technology further in the near future.
Google is set to open its ARCore product up to all developers, which will bring with it endless possibilities for augmented reality. ARCore is a direct rival to Apple’s ARKit and it could provide the key to unlock the full potential of visual search. We should also not rule out another move into the wearables market, potentially through a new version of Google Glass.
Google visual search: The key facts
- Google Goggles launched in 2010 as an early entrant to the visual search market
- Goggles still functions well on some landmarks, but struggles to isolate objects in crowded frames
- Google Lens scheduled to launch later this year (Date TBA) as a complete overhaul of Goggles
- Lens will link visual search to Google search and Google Maps
- Object detection is not perfected, but the product is in BETA
- Google is best placed to create an advertising product around its visual search engine, once the technology increases in accuracy
Microsoft had been very quiet on this front since sunsetting its Bing visual search product in 2012. It never really took off and perhaps the appetite wasn’t quite there yet among a mass public for a visual search engine.
Recently, Bing made an interesting re-entry to the fray with the announcement of a completely revamped visual search engine:
This change of tack has been directed by advances in artificial intelligence that can automatically scan images and isolate items.
The early versions of this search functionality required input from users to draw boxes around certain areas of an image for further inspection. Bing announced recently that this will no longer be needed, as the technology has developed to automate this process.
The layout of visual search results on Bing is eerily similar to Pinterest. If imitation is the sincerest form of flattery, Pinterest should be overwhelmed with flattery by now.
The visual search technology can home in on objects within most images and then suggest further items that may be of interest to the user. This is only available on desktop for the moment, but mobile support will be added soon.
The results are patchy in places, but when an object is detected relevant suggestions are made. In the example below, a search made using an image of a suit leads to topical, shoppable links:
It does not, however, take into account the shirt or tie – the only searchable aspect is the suit.
Things get patchier still for searches made using crowded images. A search for living room decor ideas made using an image will bring up some relevant results, but will not always home in on specific items.
As with all machine learning technologies, this product will continue to improve and for now, Bing is a step ahead of Google in this aspect. Nonetheless, Microsoft lacks the user base and the mobile hardware to launch a real assault on the visual search market in the long run.
Visual search thrives on data; in this regard, both Google and Pinterest have stolen a march on Bing.
Bing visual search: The key facts
- Originally launched in 2009, but removed in 2012 due to lack of uptake
- Relaunched in July 2017, underpinned by AI to identify and analyze objects
- Advertisers can use Bing visual search to place shoppable images
- The technology is in its infancy, but the object recognition is quite accurate
- Desktop only for now, but mobile will follow soon
So, who has the best visual search engine?
For now, Pinterest. With billions of data points and some seasoned image search professionals driving the technology, it provides the smoothest and most accurate experience. It also does something unique by grasping the stylistic features of objects, rather than just their shape or color. As such, it alters the language at our disposal and extends the limits of what is possible in search marketing.
Bing has made massive strides in this arena of late, but it lacks the killer application that would make it stand out enough to draw searchers from Google. Bing visual search is accurate and functional, but does not create connections to related items in the way that Pinterest can.
The launch of Google Lens will surely shake up this market altogether, too. If Google can nail down automated object recognition (which it undoubtedly will), Google Lens could be the product that links traditional search to augmented reality. The resources and the product suite at Google’s disposal make it the likely winner in the long run.
Since the early 2010s, visual search has been offering users a novel alternative to keyword-based search results.
But with the sophistication of visual search tools increasing, and tech giants like Google and Microsoft investing heavily in the space, what commercial opportunities does it offer brands today?
Visual search 101
There are two types of visual search. The first compares metadata keywords for similarities (such as when searching an image database like Shutterstock).
The second is known as ‘content-based image retrieval’. This takes the colour, shape and texture of the image and compares it to a database, displaying entries according to similarity.
From a user perspective, this massively simplifies the process of finding products they like the look of. Instead of trying to find the words to describe the object, users can simply take a photo and see relevant results.
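The two approaches described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular search engine works: the function names are hypothetical, the metadata comparison uses simple keyword overlap, and the content-based comparison uses only a coarse color histogram (shape and texture are left out for brevity).

```python
import math


def keyword_similarity(tags_a, tags_b):
    """Type 1: compare metadata keywords (Jaccard overlap of two tag lists)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def color_histogram(pixels, bins_per_channel=4):
    """Normalised RGB histogram; pixels is an iterable of (r, g, b) in 0-255."""
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins_per_channel-1.
        ri, gi, bi = (min(int(c / step), bins_per_channel - 1) for c in (r, g, b))
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_by_content(query_pixels, database):
    """Type 2: rank database entries by visual similarity to the query image.

    database maps an entry name to its list of (r, g, b) pixels.
    """
    query_hist = color_histogram(query_pixels)
    scored = [(cosine(query_hist, color_histogram(px)), name)
              for name, px in database.items()]
    return [name for _, name in sorted(scored, key=lambda t: t[0], reverse=True)]
```

For example, a photo made up of near-black pixels would rank a dark mug ahead of a red one, without the user typing a single word. A production system would rely on learned features rather than a raw histogram.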
Visual search engines: A (very) brief history
The first product to really make use of this technology was ‘Google Goggles’. Released in 2010, it offered some fairly basic image-recognition capabilities. It could register unique objects like books, barcodes, art and landmarks, and provide additional information about them.
It also had the ability to understand and store text in an image – such as a photo of a business card. However, it couldn’t recognize general instances of objects, like trees, animals or items of clothing.
CamFind took the next step, offering an app where users could take photos of any object and see additional information alongside shopping results. My tests (featuring our beautiful office plant) yielded impressively accurate related images and web results.
More importantly for brands, it offers advertising based on the content of the image. However, despite the early offering, the app has yet to achieve widespread adoption.
A Pinterest-ing development
A newer player in the visual search arena, image-focused platform Pinterest has what CamFind doesn’t – engaged users. In fact, it reached 150m monthly users in 2016, 70m of whom are in the US, with a 60:40 split of women to men.
So what do people use Pinterest for? Ben Silbermann, its CEO and co-founder, summed it up in a recent blog post:
“As a Pinner once said to me, “Pinterest is for yourself, not your selfies”—I love that. Pinterest is more of a personal tool than a social one. People don’t come to see what their friends are doing. (There are lots of other great places out there for that!) Instead, they come to Pinterest to find ideas to try, figure out which ones they love, and learn a little bit about themselves in the process.”
In other words, Pinterest is designed for discovery. Users are there to look for products and ideas, not to socialize, which makes it inherently brand-friendly. In fact, 93% of Pinners said they use Pinterest to plan for purchases, and 87% said they’d bought something because of Pinterest. Adverts are therefore less disruptive in this context than on platforms like Facebook and Twitter, where users are focused on socializing, not searching.
Pinterest took their search functionality to the next level in February 2017 with an update offering users three new features:
Shop the Look allowed users to pick just one part of an image they were interested in to explore – like a hat or a pair of shoes.
Related Ideas gives users the ability to explore a tangent based on a single pin. For example, if I were interested in hideously garish jackets, I might click ‘more’ and see a collection of equally tasteless items.
Pinterest Lens was the heavyweight feature of this release. Linking to the functionality displayed in Shop the Look, it allowed users to take photos on their smartphone and see Pins that looked similar to the object displayed.
In practice, this meant a user might see a chair they were interested in purchasing, take a photo, and find similar styles – in exactly the same way as CamFind.
Pinterest Lens today
What does it mean for ecommerce brands?
Visual search engines have the potential to offer a butter-smooth customer journey – with just a few taps between snapping a picture of something and having it in a basket and checking out. Pinterest took a big step towards that in May this year, announcing they would be connecting their visual search functionality to Promoted Pins – allowing advertisers to get in front of users searching visually by surfacing adverts in the ‘Instant Ideas’ and the ‘More like this’ sections.
For retail brands with established Pinterest strategies like Target, Nordstrom, Walgreens and Lululemon, this is welcome news, as it presents a novel opportunity for brands to connect with users looking to purchase products.
Product images can be featured in visual search results
Nearly 2 million people Pin product-rich pins every day. The platform even offers the ability to include prices and other data on pins, which helps drive further engagement. Furthermore, it has the highest average order value of any major social platform at $50, and caters heavily to users on mobile (orders from mobile devices increased from 67% to 80% between 2013 and 2015).
But while Pinterest may have led the way in terms of visual search, it isn’t alone. Google and Bing have both jumped on the trend with Lens-equivalent products in the last year. Both Google Lens and Bing Visual Search (really, Microsoft? That’s the best you have?) function in an almost identical way to Pinterest Lens. Examples from Bing’s blog post on the product even show it being applied in the same contexts – picking out elements of a domestic scene and displaying shopping results.
One interesting question for ecommerce brands to answer will be how to optimize product images for these kinds of results.
Google Lens, announced at Google’s I/O conference in May to much furore, pitches itself as a tool to help users understand the world. By accessing Google’s vast knowledge base, the app can do things like identify objects, and connect to your WiFi automatically by snapping the code on the box.
Of course, this has a commercial application as well. One of the use cases highlighted by Google CEO Sundar Pichai was photographing a business storefront and having the Google Local result pop up, replete with reviews, menus and contact details.
The key feature here is the ability to connect a picture taken with an action. It doesn’t take too much to imagine how brands might be able to use this functionality in interesting and engaging ways – for example, booking event tickets directly from an advert, as demonstrated at I/O:
Many marketers think we’re on the brink of a revolution when it comes to search. The growing popularity of voice search is arguably an indicator that consumers are moving away from keyword-based search and towards more intuitive methods.
It’s too soon to write off the medium entirely, of course – keywords are still by far the easiest way to access most information. But visual search, along with voice, is certainly a useful addition to the roster of tools we might use to access information on the internet.
Ecommerce brands would be wise to keep close tabs on the progress of visual search tools; those that are prepared will have a significant competitive advantage over those that aren’t.
This post was originally published on our sister site, ClickZ, and has been reproduced here for the enjoyment of our audience on Search Engine Watch.
Visual search is one of the most complex and fiercely contested sectors of our industry. Earlier this month, Bing announced their new visual search mode, hot on the heels of similar developments from Pinterest and Google.
Ours is a culture mediated by images, so it stands to reason that visual search has assumed such importance for the world’s largest technology companies. The pace of progress is certainly quickening, but there is no clear visual search ‘winner’, nor will there be one soon.
The search industry has developed significantly over the past decade, through advances in personalization, natural language processing, and multimedia results. And yet, one could argue that the power of the image remains untapped.
This is not due to a lack of attention or investment. Quite the contrary, in fact. Cracking visual search will require a combination of technological nous, psychological insight, and neuroscientific know-how. This makes it a fascinating area of development, but also one that will not be mastered easily.
Therefore, in this article, we will begin with an outline of the visual search industry and the challenges it poses, before analyzing the recent progress made by Google, Microsoft and Pinterest.
What is visual search?
We all partake in visual search every day. Every time we need to locate our keys among a range of other items, for example, our brains are engaged in a visual search.
We learn to recognize certain targets and we can locate them within a busy landscape with increasing ease over time.
This is a trickier task for a computer, however.
Image search, in which a search engine takes a text-based query and tries to find the best visual match, is subtly distinct from modern visual search. Visual search can take an image as its ‘query’, rather than text. In order to perform an accurate visual search, search engines require much more sophisticated processes than they do for traditional image search.
Typically, as part of this process, deep neural networks are put through their paces in tests like the one below, with the hope that they will mimic the functioning of the human brain in identifying targets:
The decisions (or inherent ‘biases’, as they are known) that allow us to make sense of these patterns are more difficult to integrate into a machine. When processing an image, should a machine prioritize shape, color, or size? How does a person do this? Do we even know for sure, or do we only know the output?
As such, search engines still struggle to process images in the way we expect them to. We simply don’t understand our own biases well enough to be able to reproduce them in another system.
There has been a lot of progress in this field, nonetheless. Google image search has improved drastically in response to text queries, and other options, like TinEye, also allow us to use reverse image search. This is a useful feature, but its limits are self-evident.
For years, Facebook has been able to identify individuals in photos, in the same way a person would immediately recognize a friend’s face. This example is a closer approximation of the holy grail for visual search; however, it still falls short. In this instance, Facebook has set up its networks to search for faces, giving them a clear target.
At its zenith, online visual search allows us to use an image as an input and receive another, related image as an output. This would mean that we could take a picture with a smartphone of a chair, for example, and have the technology return pictures of suitable rugs to accompany the style of the chair.
The typically ‘human’ process in the middle, where we would decipher the component parts of an image and decide what it is about, then conceptualize and categorize related items, is undertaken by deep neural networks. These networks are ‘unsupervised’, meaning that there is no human intervention as they alter their functioning based on feedback signals and work to deliver the desired output.
The result can be mesmerising, as in the below interpretations of an image of Georges Seurat’s ‘A Sunday Afternoon on the Island of La Grande Jatte’ by Google’s neural networks:
This is just one approach to answering a delicate question, however.
There are no right or wrong answers in this field as it stands; simply more or less effective ones in a given context.
We should therefore assess the progress of a few technology giants to observe the significant strides they have made thus far, but also the obstacles left to overcome before visual search is truly mastered.
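The chair-and-rug scenario described earlier in this section can be made concrete with a toy sketch. Everything here is an assumption for illustration: the `complementary_items` function, the catalog layout, and the two-number 'style' vectors are hypothetical stand-ins for the representations a deep neural network would learn from images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def complementary_items(query_style, catalog, target_category, top_k=3):
    """Return names of items in target_category whose style best matches the query.

    catalog is a list of dicts with 'name', 'category' and 'style' keys,
    where 'style' is a hypothetical feature vector describing the item's look.
    """
    candidates = [item for item in catalog if item["category"] == target_category]
    ranked = sorted(candidates,
                    key=lambda item: cosine(query_style, item["style"]),
                    reverse=True)
    return [item["name"] for item in ranked[:top_k]]
```

The point of the design is that the search crosses categories: the query is a chair, but the results are restricted to rugs and ordered by how closely their style matches, rather than by how much they look like the chair itself.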
Bing visual search
In early June at TechCrunch 50, Microsoft announced that it would now allow users to “search by picture.”
This is notable for a number of reasons. First of all, although Bing image search has been present for quite some time, Microsoft actually removed its original visual search product in 2012. People simply hadn’t been using it after its 2009 launch, as it wasn’t accurate enough.
Furthermore, it would be fair to say that Microsoft is running a little behind in this race. Rival search engines and social media platforms have provided visual search functions for some time now.
As a result, it seems reasonable to surmise that Microsoft must have something compelling if they have chosen to re-enter the fray with such a public announcement. While it is not quite revolutionary, the new Bing visual search is still a useful tool that builds significantly on their image search product.
A Bing search for “kitchen decor ideas” which showcases Bing’s new visual search capabilities
What sets Bing visual search apart is the ability to search within images and then expand this out to related objects that might complement the user’s selection.
A user can select specific objects, home in on them, and purchase similar items if they desire. The opportunities for retailers are both obvious and plentiful.
It’s worth mentioning that Pinterest’s visual search has been able to do this for some time. But the important difference between Pinterest’s capability and Bing’s in this regard is that Pinterest can only redirect users to Pins that businesses have made available on Pinterest – and not all of them might be shoppable. Bing, on the other hand, can index a retailer’s website and use visual search to direct the user to it, with no extra effort required on the part of either party.
Powered by Silverlight technology, this should lead to a much more refined approach to searching through images. Microsoft provided the following visualisation of how their query processing system works for this product:
Microsoft combines this system with the structured data it owns to provide a much richer, more informative search experience. Although restricted to a few search categories, such as homeware, travel, and sports, we should expect to see this rolled out to more areas through this year.
The next step will be to automate parts of this process, so that the user no longer needs to draw a box to select objects. It is still some distance from delivering on the promise of perfect, visual search, but these updates should at least see Microsoft eke out a few more sellable searches via Bing.
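To make the 'search within an image' idea more tangible, here is a rough sketch of the cropping step. It is an illustration only and does not reflect Microsoft's implementation: the image is a plain grid of pixel values, and `search_fn` is a hypothetical placeholder for whatever similarity search runs on the selected region.

```python
def crop(image, box):
    """Cut a rectangular region out of an image.

    image is a list of rows, each row a list of pixel values.
    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    height = len(image)
    width = len(image[0]) if image else 0
    # Clamp the box to the image so a sloppy selection still works.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("box does not overlap the image")
    return [row[left:right] for row in image[top:bottom]]


def search_within_image(image, boxes, search_fn):
    """Run search_fn on every selected region and collect the results.

    boxes may be drawn by the user or proposed by an object detector;
    the rest of the pipeline is the same either way.
    """
    return [search_fn(crop(image, box)) for box in boxes]
```

Seen this way, automating the process means replacing the user-drawn `boxes` with ones proposed by a detection model, while the search on each region stays unchanged.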
Google recently announced its Lens product at the 2017 I/O conference in May. The aim of Lens is really to turn your smartphone into a visual search engine.
Take a picture of anything out there and Google will tell you what the object is about, along with any related entities. Point your smartphone at a restaurant, for example, and Google will tell you its name, whether your friends have visited it before, and highlight reviews for the restaurant too.
This is supplemented by Google’s enviable inventory of data, both from its own knowledge graph and the consumer data it holds.
All of this data can fuel and refine Google’s deep neural networks, which are central to the effective functioning of its Lens product.
Google-owned company DeepMind is at the forefront of visual search innovation. As such, DeepMind is also particularly familiar with just how challenging this technology is to master.
The challenge is no longer necessarily in just creating neural networks that can understand an image as effectively as a human. The bigger challenge (known as the ‘black box problem’ in this field) is that the processes involved in arriving at conclusions are so complex, obscured, and multi-faceted that even Google’s engineers struggle to keep track.
This points to a rather poignant paradox at the heart of visual search and, more broadly, the use of deep neural networks. The aim is to mimic the functioning of the human brain; however, we still don’t really understand how the human brain works.
As a result, DeepMind have started to explore new methods. In a fascinating blog post they summarized the findings from a recent paper, within which they applied the inductive reasoning evident in human perception of images.
Drawing on the rich history of cognitive psychology (rich, at least, in comparison with the nascent field of neural networks), scientists were able to apply within their technology the same biases we apply as people when we classify items.
DeepMind use the following prompt to illuminate their thinking:
“A field linguist has gone to visit a culture whose language is entirely different from our own. The linguist is trying to learn some words from a helpful native speaker, when a rabbit scurries by. The native speaker declares “gavagai”, and the linguist is left to infer the meaning of this new word. The linguist is faced with an abundance of possible inferences, including that “gavagai” refers to rabbits, animals, white things, that specific rabbit, or “undetached parts of rabbits”. There is an infinity of possible inferences to be made. How are people able to choose the correct one?”
Experiments in cognitive psychology have shown that we have a ‘shape bias’; that is to say, we give prominence to the fact that this is a rabbit, rather than focusing on its color or its broader classification as an animal. We are aware of all of these factors, but we choose shape as the most important criterion.
“Gavagai” Credit: Misha Shiyanov/Shutterstock
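One simple way to put a number on the shape bias described above is to show a system a probe object alongside two alternatives, one matching the probe's shape and one matching its color, and count how often it picks the shape match. The sketch below is a toy version of that idea under stated assumptions: it is not DeepMind's code, and `shape_bias` and `weighted_chooser` are hypothetical names.

```python
def shape_bias(triads, choose):
    """Fraction of triads in which the shape match is chosen.

    triads is a list of (probe, shape_match, color_match) tuples.
    choose(probe, a, b) must return either a or b.
    """
    if not triads:
        raise ValueError("need at least one triad")
    shape_choices = sum(
        1 for probe, shape_match, color_match in triads
        if choose(probe, shape_match, color_match) is shape_match
    )
    return shape_choices / len(triads)


def weighted_chooser(shape_weight, color_weight):
    """Build a toy classifier that scores candidates by weighted attribute matches."""
    def choose(probe, a, b):
        def score(item):
            return (shape_weight * (item["shape"] == probe["shape"])
                    + color_weight * (item["color"] == probe["color"]))
        # Ties go to the first candidate.
        return a if score(a) >= score(b) else b
    return choose
```

A chooser that weights shape more heavily than color scores 1.0 on this measure, and one that weights color more heavily scores 0.0; a real network would be expected to land somewhere in between.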
DeepMind is one of the most essential components of Google’s development into an ‘AI-first’ company, so we can expect findings like the above to be incorporated into visual search in the near future. When they are, we shouldn’t rule out the launch of Google Glass 2.0 or something similar.
Pinterest aims to establish itself as the go-to search engine when you don’t have the words to describe what you are looking for.
The launch of its Lens product in March this year was a real statement of intent and Pinterest has made a number of senior hires from Google’s image search teams to fuel development.
In combination with its establishment of a paid search product and features like ‘Shop the Look’, there is a growing consensus that Pinterest could become a real marketing contender. Along with Amazon, it should benefit from advertisers’ thirst for more options beyond Google and Facebook.
Pinterest president Tim Kendall noted recently at TechCrunch Disrupt: “We’re starting to be able to segue into differentiation and build things that other people can’t. Or they could build it, but because of the nature of the products, this would make less sense.”
This drives at the heart of the matter. Pinterest users come to the site for something different, which allows Pinterest to build different products for them. While Google fights wars on numerous fronts, Pinterest can focus on improving its visual search offering.
Admittedly, it remains a work in progress, but Pinterest Lens is the most advanced visual search tool available at the moment. Using a smartphone, a Pinner (as the site’s users are known) can take a picture within the app and have it processed with a high degree of accuracy by Pinterest’s technology.
The results are quite effective for items of clothing and homeware, although there is still a long way to go before we use Pinterest as our personal stylist. As a tantalising glimpse of the future, however, Pinterest Lens is a welcome and impressive development.
The next step is to monetize this, which is exactly what Pinterest plans to do. Visual search will become part of its paid advertising package, a fact that will no doubt appeal to retailers keen to move beyond keyword targeting and social media prospecting.
We may still be years from declaring a winner in the battle for visual search supremacy, but it is clear to see that the victor will claim significant spoils.