Freebase and Common Crawl

Freebase – [freebase.com]

What is Freebase?
Freebase is an open, Creative Commons licensed repository of structured data about almost 20 million entities.

An entity is a single person, place, or thing. Freebase connects entities together as a graph.

Ways to use Freebase:

* Use Freebase’s Ids to uniquely identify entities anywhere on the web
* Query Freebase’s data using MQL
* Build applications using our API or Acre, our hosted development platform
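
As a rough illustration of the MQL bullet above: an MQL query is a JSON template in which filled-in values act as constraints and null values mark the fields to be returned. The sketch below only builds and serializes such a query and does not contact any service. The helper name `build_mql_query` is hypothetical, and the type and property names (`/people/person`, `date_of_birth`) are assumptions about the schema rather than something stated in this post.

```python
import json


def build_mql_query(name, type_id="/people/person"):
    """Return an MQL-style query template for entities with the given name.

    Hypothetical helper: the type id and the date_of_birth property are
    assumed schema names, used here only for illustration.
    """
    # Filled-in values constrain the match; None (JSON null) asks for the
    # value to be returned. Wrapping the template in a list asks for every
    # match rather than a single one.
    return [{
        "name": name,
        "type": type_id,
        "id": None,
        "date_of_birth": None,
    }]


query = build_mql_query("Henry Ford")
print(json.dumps(query, indent=2))
```

Asking for `id` back is what would let a caller tell apart several entities that share the name "Henry Ford".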

Freebase is also a community of thousands of data-lovers, working together to improve Freebase’s data. Learn how to contribute, join our mailing list, or find out more on our community page.

About Freebase’s data
Freebase has information about approximately 20 million Topics or Entities at the time of writing. Each one has a unique Id, which can help distinguish multiple entities which have similar names, such as Henry Ford the industrialist vs Henry Ford the footballer.

Most of our topics are associated with one or more types (such as people, places, books, films, etc.) and may have additional properties like “date of birth” for a person or latitude and longitude for a location. These types and properties and related concepts are called Schema.
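
The topic, type, and property model described above might be pictured with a small in-memory structure like the following. This is only a sketch of the idea, not Freebase's actual storage format: the `Topic` class, the `find_by_name` helper, the `/example/...` Ids, and the `/example/footballer` type are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A single entity: a unique Id, a display name, types, and properties."""
    id: str
    name: str
    types: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


# Hypothetical Ids and types; real Freebase Ids look different.
topics = {
    "/example/henry_ford_industrialist": Topic(
        id="/example/henry_ford_industrialist",
        name="Henry Ford",
        types={"/people/person"},
        properties={"date_of_birth": "1863-07-30"},
    ),
    "/example/henry_ford_footballer": Topic(
        id="/example/henry_ford_footballer",
        name="Henry Ford",
        types={"/people/person", "/example/footballer"},
    ),
}


def find_by_name(topics, name):
    """Return the sorted Ids of every topic with exactly this name."""
    return sorted(t.id for t in topics.values() if t.name == name)


# Two topics share a name, so only the Id identifies one unambiguously.
print(find_by_name(topics, "Henry Ford"))
```

Keying the collection by Id rather than by name is the point of the example: a name lookup can return several candidates, while an Id lookup returns at most one.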

Anyone can contribute data to Freebase, and you can also build your own schema in a Base if Freebase does not yet have schema for a subject you’re interested in.

For more information see Freebase data.

A Free Database of the Entire Web May Spawn the Next Google – [technologyreview.com]

A nonprofit called Common Crawl is now using its own Web crawler and making a giant copy of the Web that it makes accessible to anyone. The organization offers up over five billion Web pages, available for free so that researchers and entrepreneurs can try things otherwise possible only for those with access to resources on the scale of Google’s.

Common Crawl – [commoncrawl.org]

Common Crawl Wiki – [commoncrawl.atlassian.net]
