Google Lens, a tool that utilizes smartphone cameras to recognize objects, is now available to iOS users.
The post Google’s Object Recognition Tool, Google Lens, Now Available on iOS by @MattGSouthern appeared first on Search Engine Journal.
The human brain has evolved to instantly recognize images.
Visual identification is a natural ability made possible through a wonder of nerves, neurons, and synapses. We can look at a picture, and in 13 milliseconds or less, know exactly what we’re seeing.
But creating technology that can understand images as quickly and effectively as the human mind is a huge undertaking.
Visual search therefore requires machine learning tools that can quickly process images, but these tools must also be able to identify specific objects within the image, then generate visually similar results.
Yet thanks to the vast resources at the disposal of companies like Google, visual search is finally becoming viable. How, then, will SEO evolve as visual search develops?
Here’s a more interesting question: how soon until SEO companies have to master visual search optimization?
Visual search isn’t likely to replace text-based search engines altogether. For now, visual search is most useful in the world of sales and retail. However, the future of visual search could still disrupt the SEO industry as we know it.
If you have at least partial vision, you're able to look across a room and identify objects as you see them. For instance, at your desk you can identify your monitor, your keyboard, your pens, and the sandwich you forgot to put in the fridge.
Your mind is able to identify these objects based on visual cues alone. Visual search does the same thing, but with a given image on a computer. However, it’s important to note that visual search is not the same as image search.
Image search is when a user inputs a word into a search engine and the search engine spits out related images. Even then, the search engine isn’t recognizing images, just the structured data associated with the image files.
Visual search uses an image as a query instead of text (reverse image search is a form of visual search). It identifies objects within the image and then searches for images related to those objects. For instance, based on an image of a desk, you’d be able to use visual search to shop for a desk identical or similar to the one in the image.
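One classic way to match an image query against a database is a perceptual hash. The Python sketch below is a simplified illustration of the idea only — not how Google or Pinterest actually implement visual search — with toy 8x8 grayscale grids standing in for real photos.

```python
def average_hash(pixels):
    """Turn an 8x8 grayscale grid into a 64-bit perceptual fingerprint."""
    flat = [p for row in pixels for p in row]
    avg = sum(flat) / len(flat)
    # Each bit records whether a pixel is brighter than the image's average.
    return tuple(1 if p > avg else 0 for p in flat)

def hamming_distance(h1, h2):
    """Count differing bits; fewer differences means more visually similar."""
    return sum(a != b for a, b in zip(h1, h2))

def visual_search(query_pixels, database):
    """Rank database images by similarity to the query image."""
    qh = average_hash(query_pixels)
    return sorted(database,
                  key=lambda item: hamming_distance(qh, average_hash(item["pixels"])))

# Toy 8x8 "photos": two similar desks and one very different chair.
desk = [[255 if (r + c) % 2 == 0 else 0 for c in range(8)] for r in range(8)]
similar_desk = [[240 if (r + c) % 2 == 0 else 10 for c in range(8)] for r in range(8)]
chair = [[0 if (r + c) % 2 == 0 else 255 for c in range(8)] for r in range(8)]

ranked = visual_search(desk, [
    {"name": "chair", "pixels": chair},
    {"name": "similar desk", "pixels": similar_desk},
])
# The similar desk ranks first; the inverted-pattern chair ranks last.
```

Real visual search engines replace the crude brightness hash with learned features, but the ranking step — measure distance, sort — is the same shape of problem.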
While this sounds incredible, the technology surrounding visual search is still limited at best. This is because machine learning must recreate the mind’s image processing before it can effectively produce a viable visual search application. It isn’t enough for the machine to identify an image. It must also be able to recognize a variety of colors, shapes, sizes, and patterns the way the human mind does.
However, it’s difficult to recreate image processing in a machine when we barely understand our own image processing system. It’s for this reason that visual search programming is progressing so slowly.
Today’s engineers have been using machine learning technology to jumpstart the neural networks of visual search engines for improved image processing. One of the most recent examples of these developments is Google Lens.
Google Lens is an app that allows your smartphone to work as a visual search engine. Announced at Google’s 2017 I/O conference, the app works by analyzing the pictures that you take and giving you information about that image.
For instance, by taking a photo of an Abbey Road album your phone can tell you more about the Beatles and when the album came out. By taking a photo of an ice cream shop your phone can tell you its name, deliver reviews, and tell you if your friends have been there.
All of this information stems from Google’s vast stores of data, algorithms, and knowledge graphs, which are then incorporated into the neural networks of the Lens product. However, the complexity of visual search involves more than just an understanding of neural networks.
The mind’s image processing touches on more than just identification. It also draws conclusions that are incredibly complex. And it’s this complexity, known as the “black box problem”, that engineers struggle to recreate in visual search engines.
Rather than waiting on scientists to fully understand the human mind, DeepMind — a Google-owned company — has been taking steps toward programming the visual search engine based on cognitive psychology rather than relying solely on neural networks.
However, Google isn’t the only company with developing visual search technology. Pinterest launched its own Lens product in March 2017 to provide features such as Shop the Look and Pincodes. Those using Pinterest can take a photo of a person or place through the app and then have the photo analyzed for clothing or homeware options for shopping.
What makes Pinterest Lens and Google Lens different is that Pinterest offers more versatile options for users. Google is a search engine for users to gather information. Pinterest is a website and app for shopping, recipes, design ideas, and recreational searching.
Unlike Google, which has to operate on multiple fronts, Pinterest is able to focus solely on the development of its visual search engine. As a result, Pinterest could very well become the leading contender in visual search technology.
Nevertheless, other retailers are beginning to catch on and pick up the pace with their own technology. The fashion retailer ASOS also released a visual search tool on its website in August 2017.
The use of visual search in retail helps reduce what’s been called the Discovery Problem. The Discovery Problem is when shoppers have so many options to choose from on a retailer’s website that they simply stop shopping. Visual search reduces the number of choices and helps shoppers find what they want more effectively.
It’s safe to assume that the future of visual search engines will be retail-dominated. For now, it’s easier to search for information with words.
Users don’t need to take a photo of an Abbey Road album to learn more about the Beatles when they can use just as many keystrokes to type ‘Abbey Road’ into a search engine. However, users do need to take a photo of a specific pair of sneakers to convey to a search engine exactly what they’re looking to buy.
Searching for a pair of red shoes using Pinterest Lens
As a result, visual search engines are convenient, but they’re not ultimately necessary for every industry to succeed. Services, for instance, may be more likely to rely on textual search engines, whereas sales may be more likely to rely on visual search engines.
That being said, with 69% of young consumers showing an interest in making purchases based on visual-oriented searches alone, the future of visual search engines is most likely to be a shopper’s paradise in the right retailer’s hands.
Search engines are already capable of indexing images and videos and ranking them accordingly. Video SEO and image SEO have been around for years, ever since video and image content became popular with websites like YouTube and Facebook.
Yet despite this surge in video and image content, SEO still meets the needs of those looking to rank higher on search engines. Factors such as creating SEO-friendly alt text, image sitemaps, SEO-friendly image titles, and original image content can put your website’s images a step above the competition.
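One of the factors above, the image sitemap, is easy to automate. The hedged Python sketch below builds a minimal sitemap using the sitemap protocol's image extension namespace; the example.com URLs are hypothetical stand-ins for a real site.

```python
# A minimal image-sitemap generator using the sitemap protocol's
# image extension. All URLs here are hypothetical examples.
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def build_image_sitemap(pages):
    """pages: list of (page_url, [image_urls]) tuples."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for img in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = img
    return ET.tostring(urlset, encoding="unicode")

xml_out = build_image_sitemap([
    ("https://example.com/red-shoes", ["https://example.com/img/red-shoes.jpg"]),
])
```

Listing every image a page uses, under the page's own `<url>` entry, gives crawlers an explicit map of image content they might otherwise miss.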
However, the see-snap-buy behavior of visual search can make image SEO more of a challenge. This is because the user no longer has to type, but can instead take a photo of a product and then search for the product on a retailer’s website.
Currently, SEO functions alongside visual search via alt tagging, image optimization, schema markup, and metadata. Schema markup and metadata are especially important for SEO in visual search. This is because, with such minimal text used in the future of visual search, this data may be one of the only sources of textual information for search engines to crawl.
Meticulously cataloging images with microdata may be tedious, but the enhanced description that microdata provides when paired with an optimized image should help that image rank higher in visual search.
Metadata is just as important. In both text-based searches and visual-based searches, metadata strengthens the marketer’s ability to drive online traffic to their website and products. Metadata sits out of sight in the HTML of web pages and in image files, and it’s what search engines use to find relevant information.
Marking up your images with relevant metadata is essential for image SEO
For this reason, to optimize for image search, it’s essential to use metadata for your website’s images and not just the website itself.
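To make this concrete, here is a minimal, hypothetical sketch of generating schema.org product markup for an image in Python. The product name, price, and image URL are made up; the point is that this structured data supplies the textual context a visual search engine would otherwise lack.

```python
# Build a schema.org Product JSON-LD snippet for an image.
# All product details and URLs here are hypothetical examples.
import json

def product_image_jsonld(name, image_url, price, currency="USD"):
    """Return a JSON-LD string describing a product and its image."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "image": image_url,
        "offers": {
            "@type": "Offer",
            "price": str(price),
            "priceCurrency": currency,
        },
    }, indent=2)

snippet = product_image_jsonld(
    "Red canvas sneakers",
    "https://example.com/img/sneakers.jpg",
    49.99,
)
```

Embedded in a page inside a `script type="application/ld+json"` tag, a snippet like this ties the image file to a name, a price, and a product type that crawlers can read directly.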
Both microdata and metadata will continue to play an important role in the SEO industry even as visual search engines develop and revolutionize the online experience. However, additional existing SEO techniques will need to advance and improve to adapt to the future of visual search.
To assume visual search engines are unlikely to change the future of the SEO industry is to be short-sighted. Yet it’s just as unlikely that text-based search will be made obsolete and replaced by a world of visual-based technology.
However, just because text-based search engines won’t be going anywhere doesn’t mean they won’t be made to share the spotlight. As visual search engines develop and improve, they’ll likely become just as popular and used as text-based engines. It’s for this reason that existing SEO techniques will need to be fine-tuned for the industry to remain up-to-date and relevant.
But how can SEO stay relevant as see-snap-buy behavior spreads beyond retail websites to most places online? As mentioned before, SEO companies can still utilize image-based SEO techniques to keep up with visual search engines.
Like text-based search engines, visual search relies on algorithms to match content for online users. The SEO industry can use this to its advantage and focus on structured data and optimization to make images easier to process for visual applications.
Additional techniques already mentioned, such as SEO-friendly alt text, image sitemaps, schema markup, and original image content, can also help improve image indexing by visual search engines.
Visual search engines are bound to revolutionize the retail industry and the way we use technology. However, text-based search engines will continue to have an established place in industries that are better suited to them.
The future of SEO is undoubtedly set for rapid change. The only question is which existing strategies will be reinforced in the visual search revolution and which will be outdated.
Visual search engines will be at the center of the next phase of evolution for the search industry, with Pinterest, Google, and Bing all announcing major developments recently.
How do they stack up today, and who looks best placed to offer the best visual search experience?
Historically, the input-output relationship in search has been dominated by text. Even as the outputs have become more varied (video and image results, for example), the inputs have been text-based. This has restricted and shaped the potential of search engines, as they try to extract more contextual meaning from a relatively static data set of keywords.
Visual search engines are redefining the limits of our language, opening up a new avenue of communication between people and computers. If we view language as a fluid system of signs and symbols, rather than a fixed set of spoken or written words, we arrive at a much more compelling and profound picture of the future of search.
Our culture is visual, a fact that visual search engines are all too eager to capitalize on.
Already, specific ecommerce visual search technologies abound: Amazon, Walmart, and ASOS are all in on the act. These companies’ apps turn a user’s smartphone camera into a visual discovery tool, searching for similar items based on whatever is in frame. This is just one use case, however, and the potential for visual search is much greater than just direct ecommerce transactions.
After a lot of trial and error, this technology is coming of age. We are on the cusp of accurate, real-time visual search, which will open a raft of new opportunities for marketers.
Below, we review the progress made by three key players in visual search: Pinterest, Google, and Bing.
Pinterest’s visual search technology is aimed at carving out a position as the go-to place for discovery searches. Their stated aim echoes the opening quote from this article: “To help you find things when you don’t have the words to describe them.”
Rather than tackle Google directly, Pinterest has decided to offer up something subtly different to users – and advertisers. People go to Pinterest to discover new ideas, to create mood boards, to be inspired. Pinterest therefore urges its 200 million users to “search outside the box”, in what could be deciphered as a gentle jibe at Google’s ever-present search bar.
All of this is driven by Pinterest Lens, a sophisticated visual search tool that uses a smartphone camera to scan the physical world, identify objects, and return related results. It is available via the smartphone app, but Pinterest’s visual search functionality can be used on desktop through the Google Chrome extension too.
Pinterest’s vast data set of over 100 billion Pins provides the perfect training material for machine learning applications. As a result, new connections are forged between the physical and digital worlds, using graphics processing units (GPUs) to accelerate the process.
In practice, Pinterest Lens works very well and is getting noticeably better with time. The image detection is impressively accurate and the suggestions for related Pins are relevant.
Below, the same object has been selected for a search using Pinterest and also Samsung visual search:
The differences in the results are telling.
On the left, Pinterest recognizes the object’s shape, its material, its purpose, but also the defining features of the design. This allows for results that go deeper than a direct search for another black mug. Pinterest knows that the less tangible, stylistic details are what really interest its users. As such, we see results for mugs in different colors, but that are of a similar style.
On the right, Samsung’s Bixby assistant recognizes the object, its color, and its purpose. Samsung’s results are powered by Amazon, and they are a lot less inspiring than the options served up by Pinterest. The image is turned into a keyword search for [black coffee mugs], which renders the visual search element a little redundant.
Visual search engines work best when they express something for us that we would struggle to say in words. Pinterest understands and delivers on this promise better than most.
Google made early waves in visual search with the launch of Google Goggles. This Android app was launched in 2010 and allowed users to search using their smartphone camera. It works well on famous landmarks, for example, but it has not been updated significantly in quite some time.
It seemed unlikely that Google would remain silent on visual search for long, and this year’s I/O developer conference revealed what the search giant has been working on in the background.
Google Lens, which will be available via the Photos app and Google Assistant, will be a significant overhaul of the earlier Google Goggles initiative.
Any nomenclative similarities to Pinterest’s product may be more than coincidental. Google has stealthily upgraded its image and visual search engines of late, ushering in results that resemble Pinterest’s format:
Google’s ‘similar items’ product was another move to cash in on the discovery phase of search, showcasing related results that might further pique a consumer’s curiosity.
Google Lens will provide the object detection technology to link all of this together in a powerful visual search engine. In its beta format, Lens offers several categories of visual searches.
Some developers have been given the chance to try an early version of Lens, with many reporting mixed results:
Looks like Google doesn’t recognize its own Home smart hub… (Source: XDA Developers)
These are very early days for Google Lens, so we can expect this technology to improve significantly as it learns from its mistakes and successes.
When it does, Google is uniquely placed to make visual search a powerful tool for users and advertisers alike. The opportunities for online retailers via paid search are self-evident, but there is also huge potential for brick-and-mortar retailers to capitalize on hyper-local searches.
For all its impressive advances, Pinterest does not possess the ecosystem to permeate all aspects of a user’s life in the way Google can. With a new Pixel smartphone in the works, Google can use visual search alongside voice search to unite its software and hardware. For advertisers using DoubleClick to manage their search and display ads, that presents a very appealing prospect.
We should also anticipate that Google will take this visual search technology further in the near future.
Google is set to open its ARCore product up to all developers, which will bring with it endless possibilities for augmented reality. ARCore is a direct rival to Apple’s ARKit and it could provide the key to unlock the full potential of visual search. We should also not rule out another move into the wearables market, potentially through a new version of Google Glass.
Microsoft had been very quiet on this front since sunsetting its Bing visual search product in 2012. It never really took off; perhaps the appetite for a visual search engine wasn’t quite there yet among the mass public.
Recently, Bing made an interesting re-entry to the fray with the announcement of a completely revamped visual search engine:
This change of tack has been directed by advances in artificial intelligence that can automatically scan images and isolate items.
The early versions of this search functionality required input from users to draw boxes around certain areas of an image for further inspection. Bing announced recently that this will no longer be needed, as the technology has developed to automate this process.
The layout of visual search results on Bing is eerily similar to Pinterest. If imitation is the sincerest form of flattery, Pinterest should be overwhelmed with flattery by now.
The visual search technology can home in on objects within most images and then suggest further items that may be of interest to the user. This is only available on desktop for the moment, but mobile support will be added soon.
The results are patchy in places, but when an object is detected relevant suggestions are made. In the example below, a search made using an image of a suit leads to topical, shoppable links:
It does not, however, take into account the shirt or tie – the only searchable aspect is the suit.
Things get patchier still for searches made using crowded images. A search for living room decor ideas made using an image will bring up some relevant results, but will not always home in on specific items.
As with all machine learning technologies, this product will continue to improve and for now, Bing is a step ahead of Google in this aspect. Nonetheless, Microsoft lacks the user base and the mobile hardware to launch a real assault on the visual search market in the long run.
Visual search thrives on data; in this regard, both Google and Pinterest have stolen a march on Bing.
For now, Pinterest. With billions of data points and some seasoned image search professionals driving the technology, it provides the smoothest and most accurate experience. It also does something unique by grasping the stylistic features of objects, rather than just their shape or color. As such, it alters the language at our disposal and extends the limits of what is possible in search marketing.
Bing has made massive strides in this arena of late, but it lacks the killer application that would make it stand out enough to draw searchers from Google. Bing visual search is accurate and functional, but does not create connections to related items in the way that Pinterest can.
The launch of Google Lens will surely shake up this market altogether, too. If Google can nail down automated object recognition (which it undoubtedly will), Google Lens could be the product that links traditional search to augmented reality. The resources and the product suite at Google’s disposal make it the likely winner in the long run.
Since the early 2010s, visual search has been offering users a novel alternative to keyword-based search results.
But with the sophistication of visual search tools increasing, and tech giants like Google and Microsoft investing heavily in the space, what commercial opportunities does it offer brands today?
There are two types of visual search. The first compares metadata keywords for similarities (such as when searching an image database like Shutterstock).
The second is known as ‘content-based image retrieval’. This takes the colour, shape and texture of the image and compares it to a database, displaying entries according to similarity.
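The content-based approach can be illustrated with a toy sketch, assuming images reduced to lists of RGB tuples: each image becomes a coarse colour histogram, and the database is ranked by histogram intersection. Real systems also weigh shape and texture, which this deliberately ignores.

```python
# Toy content-based image retrieval: compare images by colour histogram.
# "Images" here are just lists of (r, g, b) tuples for simplicity.
from collections import Counter

def colour_histogram(pixels, bucket=64):
    """Quantize RGB pixels into coarse colour buckets and count them."""
    return Counter((r // bucket, g // bucket, b // bucket) for r, g, b in pixels)

def similarity(h1, h2):
    """Histogram intersection: higher means more similar colours."""
    return sum(min(h1[k], h2[k]) for k in h1)

def retrieve(query_pixels, database):
    """Rank database images by colour similarity to the query."""
    qh = colour_histogram(query_pixels)
    return sorted(database,
                  key=lambda item: similarity(qh, colour_histogram(item["pixels"])),
                  reverse=True)

# Toy data: a red dress query against a tiny catalogue.
red_dress = [(220, 30, 40)] * 10
similar_red = [(200, 40, 50)] * 10
blue_jeans = [(30, 60, 200)] * 10

results = retrieve(red_dress, [
    {"name": "blue jeans", "pixels": blue_jeans},
    {"name": "similar red dress", "pixels": similar_red},
])
# The similar red dress outranks the blue jeans.
```

Coarse bucketing is what lets two slightly different shades of red count as the same colour — exactly the tolerance a "find something like this" search needs.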
From a user perspective, this massively simplifies the process of finding products they like the look of. Instead of trying to find the words to describe the object, users can simply take a photo and see relevant results.
The first product to really make use of this technology was ‘Google Goggles’. Released in 2010, it offered some fairly basic image-recognition capabilities. It could register unique objects like books, barcodes, art and landmarks, and provide additional information about them.
It also had the ability to understand and store text in an image – such as a photo of a business card. However, it couldn’t recognize general instances of objects, like trees, animals or items of clothing.
CamFind took the next step, offering an app where users could take photos of any object and see additional information alongside shopping results. My tests (featuring our beautiful office plant) yielded impressively accurate related images and web results.
More importantly for brands, it offers advertising based on the content of the image. However, despite the early offering, the app has yet to achieve widespread adoption.
A newer player in the visual search arena, image-focused platform Pinterest has what CamFind doesn’t – engaged users. In fact, it reached 150m monthly users in 2016, 70m of whom are in the US, with a 60:40 split of women to men.
So what do people use Pinterest for? Ben Silbermann, its CEO and co-founder, summed it up in a recent blog post:
“As a Pinner once said to me, “Pinterest is for yourself, not your selfies”—I love that. Pinterest is more of a personal tool than a social one. People don’t come to see what their friends are doing. (There are lots of other great places out there for that!) Instead, they come to Pinterest to find ideas to try, figure out which ones they love, and learn a little bit about themselves in the process.”
In other words, Pinterest is designed for discovery. Users are there to look for products and ideas, not to socialize – which makes it inherently brand-friendly. In fact, 93% of Pinners said they use Pinterest to plan for purchases, and 87% said they’d bought something because of Pinterest. Adverts are therefore less disruptive in this context than on platforms like Facebook and Twitter, where users are focused on socializing, not searching.
Pinterest took their search functionality to the next level in February 2017 with an update offering users three new features:
Shop the Look allowed users to pick just one part of an image they were interested in to explore – like a hat or a pair of shoes.
Related Ideas gives users the ability to explore a tangent based on a single pin. For example, if I were interested in hideously garish jackets, I might click ‘more’ and see a collection of equally tasteless items.
Pinterest Lens was the heavyweight feature of this release. Linking to the functionality displayed in Shop the Look, it allowed users to take photos on their smartphone and see Pins that looked similar to the object displayed.
In practice, this meant a user might see a chair they were interested in purchasing, take a photo, and find similar styles – in exactly the same way as CamFind.
Pinterest Lens today
Visual search engines have the potential to offer a butter-smooth customer journey – with just a few taps between snapping a picture of something and having it in a basket and checking out. Pinterest took a big step towards that in May this year, announcing they would be connecting their visual search functionality to Promoted Pins – allowing advertisers to get in front of users searching visually by surfacing adverts in the ‘Instant Ideas’ and the ‘More like this’ sections.
For retail brands with established Pinterest strategies like Target, Nordstrom, Walgreens and Lululemon, this is welcome news, as it presents a novel opportunity for brands to connect with users looking to purchase products.
Product images can be featured in visual search results
Nearly 2 million people Pin product-rich pins every day. The platform even offers the ability to include prices and other data on pins, which helps drive further engagement. Furthermore, it has the highest average order value of any major social platform at $50, and caters heavily to users on mobile (orders from mobile devices increased from 67% to 80% between 2013 and 2015).
But while Pinterest may have led the way in terms of visual search, it isn’t alone. Google and Bing have both jumped on the trend with Lens-equivalent products in the last year. Both Google Lens and Bing Visual Search (really, Microsoft? That’s the best you have?) function in an almost identical way to Pinterest Lens. Examples from Bing’s blog post on the product even show it being applied in the same contexts – picking out elements of a domestic scene and displaying shopping results.
One interesting question for ecommerce brands to answer will be how to optimize product images for these kinds of results.
Google Lens, announced at Google’s I/O conference in May to much furore, pitches itself as a tool to help users understand the world. By accessing Google’s vast knowledge base, the app can do things like identify objects, and connect to your WiFi automatically by snapping the code on the box.
Of course, this has a commercial application as well. One of the use cases highlighted by Google CEO Sundar Pichai was photographing a business storefront and having the Google Local result pop up, replete with reviews, menus and contact details.
The key feature here is the ability to connect a picture with an action. It doesn’t take too much to imagine how brands might be able to use this functionality in interesting and engaging ways – for example, booking event tickets directly from an advert, as demonstrated at I/O:
Many marketers think we’re on the brink of a revolution when it comes to search. The growing popularity of voice search is arguably an indicator that consumers are moving away from keyword-based search and towards more intuitive methods.
It’s too soon to write off the medium entirely, of course – keywords are still by far the easiest way to access most information. But visual search and voice are certainly useful additions to the roster of tools we might use to access information on the internet.
Ecommerce brands would be wise to keep close tabs on the progress of visual search tools; those that are prepared will have a significant competitive advantage over those that aren’t.
This post was originally published on our sister site, ClickZ, and has been reproduced here for the enjoyment of our audience on Search Engine Watch.
Visual search is one of the most complex and fiercely contested sectors of our industry. Earlier this month, Bing announced their new visual search mode, hot on the heels of similar developments from Pinterest and Google.
Ours is a culture mediated by images, so it stands to reason that visual search has assumed such importance for the world’s largest technology companies. The pace of progress is certainly quickening; but there is no clear visual search ‘winner’ and nor will there be one soon.
The search industry has developed significantly over the past decade, through advances in personalization, natural language processing, and multimedia results. And yet, one could argue that the power of the image remains untapped.
This is not due to a lack of attention or investment. Quite the contrary, in fact. Cracking visual search will require a combination of technological nous, psychological insight, and neuroscientific know-how. This makes it a fascinating area of development, but also one that will not be mastered easily.
Therefore, in this article, we will begin with an outline of the visual search industry and the challenges it poses, before analyzing the recent progress made by Google, Microsoft and Pinterest.
We all partake in visual search every day. Every time we need to locate our keys among a range of other items, for example, our brains are engaged in a visual search.
We learn to recognize certain targets and we can locate them within a busy landscape with increasing ease over time.
This is a trickier task for a computer, however.
Image search, in which a search engine takes a text-based query and tries to find the best visual match, is subtly distinct from modern visual search. Visual search can take an image as its ‘query’, rather than text. In order to perform an accurate visual search, search engines require much more sophisticated processes than they do for traditional image search.
Typically, as part of this process, deep neural networks are put through their paces in tests like the one below, with the hope that they will mimic the functioning of the human brain in identifying targets:
The decisions (or inherent ‘biases’, as they are known) that allow us to make sense of these patterns are more difficult to integrate into a machine. When processing an image, should a machine prioritize shape, color, or size? How does a person do this? Do we even know for sure, or do we only know the output?
As such, search engines still struggle to process images in the way we expect them to. We simply don’t understand our own biases well enough to be able to reproduce them in another system.
There has been a lot of progress in this field, nonetheless. Google image search has improved drastically in response to text queries, and other options, like TinEye, also allow us to use reverse image search. This is a useful feature, but its limits are self-evident.
For years, Facebook has been able to identify individuals in photos, in the same way a person would immediately recognize a friend’s face. This example is a closer approximation of the holy grail for visual search; however, it still falls short. In this instance, Facebook has set up its networks to search for faces, giving them a clear target.
At its zenith, online visual search allows us to use an image as an input and receive another, related image as an output. This would mean that we could take a picture with a smartphone of a chair, for example, and have the technology return pictures of suitable rugs to accompany the style of the chair.
The typically ‘human’ process in the middle, where we would decipher the component parts of an image and decide what it is about, then conceptualize and categorize related items, is undertaken by deep neural networks. These networks are ‘unsupervised’, meaning that there is no human intervention as they alter their functioning based on feedback signals and work to deliver the desired output.
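Once a network has mapped images into feature vectors ("embeddings"), the retrieval step itself is just a nearest-neighbour lookup in that vector space. The sketch below illustrates that step with made-up vectors; in practice the vectors come from the trained network and have hundreds of dimensions.

```python
# Rank catalogue items by embedding similarity to a query image.
# The three-dimensional vectors here are invented for illustration;
# a real network would produce much higher-dimensional embeddings.
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query_vec, catalogue):
    """Rank catalogue items by similarity to the query embedding."""
    return sorted(catalogue,
                  key=lambda item: cosine_similarity(query_vec, item["vec"]),
                  reverse=True)

# Hypothetical embeddings: a photographed chair and two catalogue items.
chair_photo = [0.9, 0.1, 0.3]
catalogue = [
    {"name": "matching rug", "vec": [0.8, 0.2, 0.35]},
    {"name": "toaster", "vec": [0.1, 0.9, 0.0]},
]
```

Because related items end up near each other in the embedding space, the chair-to-rug leap the text describes falls out of a simple distance comparison.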
The result can be mesmerising, as in the below interpretations of Georges Seurat's 'A Sunday Afternoon on the Island of La Grande Jatte' by Google's neural networks:
This is just one approach to answering a delicate question, however.
There are no right or wrong answers in this field as it stands; simply more or less effective ones in a given context.
We should therefore assess the progress of a few technology giants to observe the significant strides they have made thus far, but also the obstacles left to overcome before visual search is truly mastered.
In early June at TechCrunch 50, Microsoft announced that it would now allow users to “search by picture.”
This is notable for several reasons. Although Bing image search has existed for quite some time, Microsoft actually retired its original visual search product in 2012; adoption had been low since its 2009 launch, largely because the results weren't accurate enough.
Furthermore, it would be fair to say that Microsoft is running a little behind in this race. Rival search engines and social media platforms have provided visual search functions for some time now.
As a result, it seems reasonable to surmise that Microsoft must have something compelling if they have chosen to re-enter the fray with such a public announcement. While it is not quite revolutionary, the new Bing visual search is still a useful tool that builds significantly on their image search product.
A Bing search for “kitchen decor ideas” which showcases Bing’s new visual search capabilities
What sets Bing visual search apart is the ability to search within images and then expand this out to related objects that might complement the user’s selection.
A user can select specific objects, home in on them, and purchase similar items if they wish. The opportunities for retailers are both obvious and plentiful.
It’s worth mentioning that Pinterest’s visual search has been able to do this for some time. But the important difference between Pinterest’s capability and Bing’s in this regard is that Pinterest can only redirect users to Pins that businesses have made available on Pinterest – and not all of them might be shoppable. Bing, on the other hand, can index a retailer’s website and use visual search to direct the user to it, with no extra effort required on the part of either party.
This should lead to a much more refined approach to searching through images. Microsoft provided the following visualisation of how the query processing system behind this product works:
Microsoft combines this system with the structured data it owns to provide a much richer, more informative search experience. The feature is restricted to a few search categories for now, such as homeware, travel, and sports, but we should expect it to roll out to more areas throughout the year.
The next step will be to automate parts of this process, so that the user no longer needs to draw a box to select objects. It is still some distance from delivering on the promise of perfect, visual search, but these updates should at least see Microsoft eke out a few more sellable searches via Bing.
Google recently announced its Lens product at the 2017 I/O conference in May. The aim of Lens is really to turn your smartphone into a visual search engine.
Take a picture of almost anything and Google will tell you what the object is, along with any related entities. Point your smartphone at a restaurant, for example, and Google will tell you its name, show whether your friends have visited it before, and highlight its reviews.
This is supplemented by Google's enviable inventory of data, drawn both from its knowledge graph and from the consumer data it holds.
All of this data can fuel and refine Google’s deep neural networks, which are central to the effective functioning of its Lens product.
Google-owned company DeepMind is at the forefront of visual search innovation. As such, DeepMind is also particularly familiar with just how challenging this technology is to master.
The challenge is no longer necessarily in just creating neural networks that can understand an image as effectively as a human. The bigger challenge (known as the ‘black box problem’ in this field) is that the processes involved in arriving at conclusions are so complex, obscured, and multi-faceted that even Google’s engineers struggle to keep track.
This points to a rather poignant paradox at the heart of visual search and, more broadly, the use of deep neural networks. The aim is to mimic the functioning of the human brain; however, we still don’t really understand how the human brain works.
As a result, DeepMind have started to explore new methods. In a fascinating blog post, they summarized the findings of a recent paper in which they applied the inductive biases evident in human perception of images.
Drawing on the rich history of cognitive psychology (rich, at least, in comparison with the nascent field of neural networks), scientists were able to build into their technology the same biases people apply when classifying items.
DeepMind use the following prompt to illuminate their thinking:
“A field linguist has gone to visit a culture whose language is entirely different from our own. The linguist is trying to learn some words from a helpful native speaker, when a rabbit scurries by. The native speaker declares “gavagai”, and the linguist is left to infer the meaning of this new word. The linguist is faced with an abundance of possible inferences, including that “gavagai” refers to rabbits, animals, white things, that specific rabbit, or “undetached parts of rabbits”. There is an infinity of possible inferences to be made. How are people able to choose the correct one?”
Experiments in cognitive psychology have shown that we have a ‘shape bias’; that is to say, we give prominence to the fact that this is a rabbit, rather than focusing on its color or its broader classification as an animal. We are aware of all of these factors, but we choose shape as the most important criterion.
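To make the idea concrete, here is a toy sketch (not DeepMind's actual method) of how an explicit shape bias changes a classifier's answer. A nearest-neighbour classifier compares a novel object to known animals using a weighted distance; the feature values, animal names, and weights are all invented for illustration. When shape is up-weighted, a rabbit-shaped black animal is labelled a rabbit; when colour dominates, the same object is matched to the black crow.

```python
import numpy as np

# Each known animal gets two hand-made feature vectors: shape and colour.
# (Values are invented; a real system would learn such features from data.)
known = {
    "rabbit": {"shape": np.array([1.0, 0.0]),   # rabbit-shaped
               "colour": np.array([1.0, 1.0])}, # white
    "crow":   {"shape": np.array([0.0, 1.0]),   # bird-shaped
               "colour": np.array([0.0, 0.0])}, # black
}

def classify(obj, shape_weight):
    """Nearest neighbour under a weighted distance; shape_weight sets the bias."""
    def distance(ref):
        d_shape = np.linalg.norm(obj["shape"] - ref["shape"])
        d_colour = np.linalg.norm(obj["colour"] - ref["colour"])
        return shape_weight * d_shape + (1 - shape_weight) * d_colour
    return min(known, key=lambda name: distance(known[name]))

# The novel "gavagai": rabbit-shaped but black.
gavagai = {"shape": np.array([1.0, 0.0]), "colour": np.array([0.0, 0.0])}

print(classify(gavagai, shape_weight=0.9))  # shape-biased → rabbit
print(classify(gavagai, shape_weight=0.1))  # colour-biased → crow
```

The single `shape_weight` parameter plays the role of the inductive bias: it encodes, before any example is seen, which cue the system should trust most when the evidence is ambiguous.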
DeepMind is one of the most essential components of Google's development into an 'AI-first' company, so we can expect findings like these to be incorporated into visual search in the near future. When that happens, we shouldn't rule out the launch of a Google Glass 2.0 or something similar.
Pinterest aims to establish itself as the go-to search engine when you don’t have the words to describe what you are looking for.
The launch of its Lens product in March this year was a real statement of intent and Pinterest has made a number of senior hires from Google’s image search teams to fuel development.
In combination with its establishment of a paid search product and features like ‘Shop the Look’, there is a growing consensus that Pinterest could become a real marketing contender. Along with Amazon, it should benefit from advertisers’ thirst for more options beyond Google and Facebook.
Pinterest president Tim Kendall noted recently at TechCrunch Disrupt: “We’re starting to be able to segue into differentiation and build things that other people can’t. Or they could build it, but because of the nature of the products, this would make less sense.”
This drives at the heart of the matter. Pinterest users come to the site for something different, which allows Pinterest to build different products for them. While Google fights a war on numerous fronts, Pinterest can focus on perfecting its visual search offering.
Admittedly, it remains a work in progress, but Pinterest Lens is the most advanced visual search tool available at the moment. Using a smartphone, a Pinner (as the site’s users are known) can take a picture within the app and have it processed with a high degree of accuracy by Pinterest’s technology.
The results are quite effective for items of clothing and homeware, although there is still a long way to go before we use Pinterest as our personal stylist. As a tantalising glimpse of the future, however, Pinterest Lens is a welcome and impressive development.
The next step is to monetize this, which is exactly what Pinterest plans to do. Visual search will become part of its paid advertising package, a fact that will no doubt appeal to retailers keen to move beyond keyword targeting and social media prospecting.
We may still be years from declaring a winner in the battle for visual search supremacy, but it is clear to see that the victor will claim significant spoils.