Find the best tools to build your startup.

DiffBot

DiffBot is a product aimed at developers. It offers APIs that parse web content and extract important features from it using machine learning algorithms.

DiffBot is extremely accurate and can extract features such as comments, keywords, icons, title, author, images, primary content, publisher, tags, language, and more. These features are very useful for building a search index or developing an API.

DiffBot can also generate keywords for images using computer vision, concatenate article content that spans multiple pages, and more.
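As a rough illustration of the extraction workflow, here is a minimal Python sketch that requests an article extraction and reads back a few of the fields listed above. It assumes Diffbot's v3 Article endpoint (`https://api.diffbot.com/v3/article`), which takes `token` and `url` query parameters and returns JSON containing an `objects` list. The helper names `build_article_request`, `summarize_article`, and `fetch_article` are hypothetical, and the endpoint and field names should be checked against Diffbot's current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for Diffbot's v3 Article API; verify against the current docs.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return the GET URL asking the (assumed) Article API to extract page_url."""
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


def summarize_article(response: dict) -> dict:
    """Pull a few common fields out of an Article API JSON response.

    Assumes the response has an "objects" list whose first entry holds
    "title", "author", "text" and a "tags" list of {"label": ...} dicts.
    Returns an empty dict if no objects were extracted.
    """
    objects = response.get("objects") or []
    if not objects:
        return {}
    article = objects[0]
    return {
        "title": article.get("title"),
        "author": article.get("author"),
        "text": article.get("text", ""),
        # Keep only tags that actually carry a human-readable label.
        "keywords": [tag["label"] for tag in article.get("tags", []) if "label" in tag],
    }


def fetch_article(token: str, page_url: str) -> dict:
    """Perform the network call and summarize it; requires a valid Diffbot token."""
    with urlopen(build_article_request(token, page_url), timeout=30) as resp:
        return summarize_article(json.load(resp))
```

For example, `fetch_article("YOUR_TOKEN", "https://example.com/some-post")["keywords"]` would return the tag labels for that page, which covers the "find keywords in a document" use case. Keeping the parsing separate from the network call lets the field handling be tested against a saved response without spending API calls.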




Description from the DiffBot website:

People miss things; Diffbot doesn't. Our Automatic APIs retrieve every possible piece of data from a web page. Crawlbot automatically finds every important page on any site.

Wikipedia summary

Diffbot is a developer of machine learning and computer vision algorithms and public APIs for extracting data from web pages / web scraping. The company was founded in 2008 at Stanford University and was the first company funded by StartX (then Stanford Student Enterprises), Stanford's on-campus venture capital fund.
The company has gained interest from its application of computer vision technology to web pages, wherein it visually parses a web page for important elements and returns them in a structured format. In 2015 Diffbot announced it was working on its version of an automated "Knowledge Graph" by crawling the web and using its automatic web page extraction to build a large database of structured web data.
The company's products allow software developers to analyze web home pages and article pages, and extract the "important information" while ignoring elements deemed not core to the primary content.
In August 2012 the company released its Page Classifier API, which automatically categorizes web pages into specific "page types". As part of this, Diffbot analyzed 750,000 web pages shared on the social media service Twitter and revealed that photos, followed by articles and videos, are the predominant web media shared on the social network.
The company raised $2 million in funding in May 2012 from investors including Andy Bechtolsheim and Sky Dayton.
Diffbot's customers include Adobe, AOL, Cisco, DuckDuckGo, eBay, Instapaper, Microsoft, Onswipe and Springpad.




Tagged: machine learning, api, data extraction, tagging, keywording, keywords

Use DiffBot to...

find keywords in a document


Related products...

AlchemyAPI

Embedly
