SPTAG: an open source machine learning algorithm from Microsoft


Today's search engines have evolved considerably: they no longer just return pages in response to one or more keywords, but also try to answer questions, suggest context, and more. Users can even search using other items, such as images.

Of course, keeping up with users' search preferences is nothing new: it has been a struggle since the early days of web search.

But these ever-changing needs are becoming easier to meet, thanks to advances in artificial intelligence, including those developed by the Bing research team and by researchers at Microsoft Research.

"Artificial intelligence is making the products we work with more and more natural," said Rangan Majumder, Group Program Manager for Microsoft's Bing Research and Artificial Intelligence team.

Using vectors for a better search

Bing's machine learning algorithms are used to create vectors, which are essentially numerical representations of a word, an image pixel, or some other data point. A vector helps capture what a piece of data actually means, whether it is text on a web page, an image, a sound, or a video.

Once a number has been assigned to a given piece of data, the vectors can be organized, or mapped, so that close numbers sit near each other to represent similarity. These nearby results are then shown to users, improving search results.
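To make "close numbers placed near each other" concrete, here is a minimal sketch of two common ways to measure how similar two vectors are. The article does not say which measure Bing uses; cosine similarity and Euclidean distance are standard choices assumed here for illustration, and the three-dimensional `embeddings` are made-up values, not real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance: smaller means the two items are more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 3-dimensional vectors; real embedding models use hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    for other in ("kitten", "car"):
        sim = cosine_similarity(embeddings["cat"], embeddings[other])
        dist = euclidean_distance(embeddings["cat"], embeddings[other])
        print(f"cat vs {other}: cosine={sim:.3f}, distance={dist:.3f}")
```

With these toy values, "cat" scores far closer to "kitten" than to "car" under both measures, which is the property a vector index relies on.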

Microsoft began focusing on the technology behind Bing's vector search when the company's engineers started noticing unusual patterns in how users searched.

"By analyzing our logs, the team found that search queries were getting longer and longer," Majumder said.

This suggested that users were asking more questions, over-explaining because earlier keyword searches had not given them satisfactory results, or "trying to act like computers" when describing abstract things.

For Bing search, vectorization has been extended to more than 150 billion pieces of indexed data, improving on traditional keyword matching.

These include single words, characters, web page snippets, full queries, and other media. When a user performs a search, Bing can analyze the indexed vectors and return the best match.
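The simplest way to "analyze the indexed vectors and return the best match" is an exact scan that compares the query vector with every stored vector. The sketch below assumes squared Euclidean distance and a tiny in-memory dictionary; the item names and values are hypothetical. Its cost grows linearly with the size of the index, which is presumably why, at the scale of billions of vectors, an approximate nearest neighbor method is used instead of a full scan.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; ranks items the same as plain distance, without a sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Compare the query against every indexed vector and keep the k closest."""
    scored = ((item_id, squared_l2(query, vec)) for item_id, vec in index.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical index mixing different kinds of items, as the article describes.
index = {
    "query:cute kittens": [0.9, 0.1, 0.0],
    "snippet:cat care": [0.7, 0.2, 0.1],
    "image:red car": [0.1, 0.9, 0.3],
    "word:engine": [0.0, 0.8, 0.6],
}

if __name__ == "__main__":
    query_vector = [0.85, 0.15, 0.05]  # assumed output of some embedding model
    for item_id, dist in exact_top_k(query_vector, index, k=2):
        print(item_id, round(dist, 4))
```

`heapq.nsmallest` keeps only k results in memory while streaming over the index, but every vector is still visited once per query.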

Vector mapping is also refined with deep learning techniques so that it keeps improving.

The models take into account inputs such as the clicks end users make after a search, to better understand the meaning of that search.

Space Partition Tree and Graph was released as open source

Specifically, Microsoft uses an algorithm called Space Partition Tree and Graph (SPTAG). An input query is converted into a vector, and SPTAG is used to quickly find its approximate "nearest neighbors," that is, the vectors most similar to the input.
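The article does not describe SPTAG's internals, but the name hints at the general idea: a space partition tree narrows the search to a small region of the vector space, and a neighborhood graph is then walked greedily toward closer points. The toy sketch below illustrates that tree-plus-graph pattern under simplifying assumptions: a single median-split tree and a brute-force k-nearest-neighbor graph. The real SPTAG library is written in C++ and builds KD-trees or balanced k-means trees together with a relative neighborhood graph; every function name here (`build_tree`, `build_knn_graph`, `search`) is hypothetical and is not SPTAG's API.

```python
import heapq
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(ids: List[int], data: List[Vector], leaf_size: int):
    """Space partition: split at the median of the widest-spread dimension."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    spreads = []
    for d in range(len(data[0])):
        vals = [data[i][d] for i in ids]
        mean = sum(vals) / len(vals)
        spreads.append(sum((v - mean) ** 2 for v in vals))
    dim = spreads.index(max(spreads))
    ordered = sorted(ids, key=lambda i: data[i][dim])
    mid = len(ordered) // 2
    return ("node", dim, data[ordered[mid]][dim],
            build_tree(ordered[:mid], data, leaf_size),
            build_tree(ordered[mid:], data, leaf_size))


def find_leaf(tree, q: Vector) -> List[int]:
    """Walk down the tree to the leaf region the query falls into."""
    while tree[0] == "node":
        _, dim, split, left, right = tree
        tree = left if q[dim] < split else right
    return tree[1]


def build_knn_graph(data: List[Vector], k: int) -> Dict[int, List[int]]:
    """Brute-force k-nearest-neighbor graph (fine for a toy; SPTAG builds an RNG)."""
    graph = {}
    for i, v in enumerate(data):
        others = (j for j in range(len(data)) if j != i)
        graph[i] = heapq.nsmallest(k, others, key=lambda j: sq_dist(v, data[j]))
    return graph


def search(q: Vector, data, tree, graph, k: int, beam: int = 16) -> List[int]:
    """Seed from the tree leaf, then greedily expand graph neighbors (best-first)."""
    seeds = find_leaf(tree, q)
    visited = set(seeds)
    candidates = [(sq_dist(q, data[i]), i) for i in seeds]  # min-heap to expand
    heapq.heapify(candidates)
    best = [(-d, i) for d, i in candidates]  # max-heap of the `beam` best so far
    heapq.heapify(best)
    while len(best) > beam:
        heapq.heappop(best)
    while candidates:
        d, i = heapq.heappop(candidates)
        if len(best) >= beam and d > -best[0][0]:
            break  # nearest unexpanded point is worse than every kept result
        for j in graph[i]:
            if j in visited:
                continue
            visited.add(j)
            dj = sq_dist(q, data[j])
            if len(best) < beam or dj < -best[0][0]:
                heapq.heappush(candidates, (dj, j))
                heapq.heappush(best, (-dj, j))
                if len(best) > beam:
                    heapq.heappop(best)
    return [i for _, i in sorted((-nd, i) for nd, i in best)][:k]


if __name__ == "__main__":
    rng = random.Random(42)
    points = [[rng.random() for _ in range(8)] for _ in range(400)]
    tree = build_tree(list(range(len(points))), points, leaf_size=16)
    graph = build_knn_graph(points, k=10)
    query = [rng.random() for _ in range(8)]
    approx = search(query, points, tree, graph, k=5, beam=32)
    exact = sorted(range(len(points)), key=lambda i: sq_dist(query, points[i]))[:5]
    print("approximate:", approx)
    print("exact:      ", exact)
```

The trade-off is the one the article alludes to: the search touches only the seed leaf plus the graph neighbors it expands, rather than every vector, at the price of occasionally missing a true nearest neighbor. Raising `beam` visits more points and usually improves recall.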

"Microsoft uses vector search for its own Bing search engine, a technology that helps Bing better understand the intent behind billions of web searches and find the most relevant result from billions of web pages."

Microsoft has made SPTAG available to everyone as an open source project on GitHub, describing it as "one of the most advanced and best-suited artificial intelligence tools to meet the ever-changing search needs of users."

On Wednesday, the company also published usage samples and an accompanying video for these tools through Microsoft's AI Lab.

The Bing team has said it hopes the open source offering will be used by large companies or applications, for example to identify a spoken language from an audio snippet, or in image-heavy services such as an application that lets users match data and searches.

SPTAG source code

