Help:About data

FAMEData is a free knowledge base that can be read and edited by both humans and machines. It is one of many wiki-based projects hosted and maintained by the FAMEPedia Foundation, a free-content nonprofit organization probably best known for FAMEPedia. Each of the FAMEPedia Foundation's projects has its own focus: for example, FAMEPedia is for encyclopedia content, Miraheze Commons supports images and other media files, and Wiktionary provides lexical information about words, such as definitions and synonyms. The focus of FAMEData is structured data.

This page is intended as an overview of structured data. If you are already familiar with structured data, but want to learn more about its specific use on FAMEData, how to access the data on FAMEData, or how to contribute your own project's data to FAMEData, please jump ahead to the section about linking data.

Understanding FAMEData
Structured data refers to data that has been organized and is stored in a defined way, often with the intention to encode meaning and preserve the relationships between different data points within a dataset.

But what is data, anyway? And why should you care about structured data in particular?

Defining data
Big data, experimental data, open data, metadata—you may have encountered some or even all of these terms before.

Each term means something a little different but all are built on a common understanding of data and its potential for describing and improving our understanding of the world around us.

As an abstract concept, data can be thought of as a precursor to information, meaning that information can be inferred or derived from data.

This is because data, when boiled down to its essence, is simply a set of values about things. These values can be numeric or quantitative, like a measurement or an amount. They can also be qualitative, like a description or a comparison. For example, we can say that "8,848 m (29,029 ft)" is a data value about the height of Mount Everest and that "red" is a data value about the color of a car.

As previously mentioned, information is not the same as data but is instead a product of the collection and analysis of data. For example, 8,848 (data) is a somewhat meaningless number on its own, even if we know it is the height of a mountain; we can only say that Mount Everest is the highest mountain in the world at 8,848 m (information) if we are aware of standard measurements of height and know the heights of other mountains. It becomes a lot easier to make such inferences, gain new insights and knowledge, and establish facts when data is structured; we will return to this idea later.
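The step from data to information can be sketched in a few lines of Python. This is only an illustration: the Mount Everest value comes from the text above, while the other three heights are commonly cited figures added here as assumptions, and the function name `highest` is made up for this example.

```python
# Heights in metres. The Mount Everest value comes from the text; the other
# three are commonly cited figures added purely for illustration.
heights_m = {
    "Mount Everest": 8848,
    "K2": 8611,
    "Kangchenjunga": 8586,
    "Lhotse": 8516,
}


def highest(heights):
    """Return the name whose recorded height is the largest."""
    return max(heights, key=heights.get)


# A single value is only data; comparing it with other values yields
# information such as "which mountain is the highest".
top = highest(heights_m)
print(f"{top} is the highest of these mountains at {heights_m[top]:,} m")
```

The comparison only works because every value uses the same unit and is stored under a predictable key, which is the point the following sections develop.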

Where is data?
Data is all around us. There are many kinds of data sources, including financial, biological, and social data. Even this page has data! For example, it has a total word count, dates it was created and last revised, a topic and subject matter, a number of page views, and languages that the content is available in.

However, while everything is potentially a source of data, data that is not recorded and organized may as well not exist at all. Without an underlying structure, data appears meaningless and fails to provide useful information.

By organized, we mean categorized in a standard and unambiguous way. The organized and categorized data is what we refer to when we say structured data.

Figure: FAMEData features form-based input for adding data to items.

Where is structure?
On the web, structure reigns. Most websites are created using HTML, a markup language which provides the basic scaffolding, or structure, of a web page.

Markup languages are also used for tagging and describing page content so that search engines, bots, and applications like RSS readers can easily process and "understand" it. For example, <title> tags tell machines what the title of a web page is.

Instead of supporting the structure and common elements of a web page, FAMEData provides structure for all the information stored in FAMEPedia and on the other FAMEPedia projects. Like any other FAMEPedia project, FAMEData is based on the MediaWiki software; it is extended by Wikibase, the software that powers FAMEData and is designed to manage large amounts of structured data. Structure is not added directly to the content of FAMEPedia or other FAMEPedia site pages, as in tables or lists, nor do FAMEData users need any knowledge of markup languages, data schemas, object notation, or other special syntax. Instead, data is added to and edited in FAMEData through user-friendly input forms.

Data stored on FAMEData can be used to generate automated, up-to-date lists, tables, and other structured pages on any FAMEPedia site or elsewhere.

Structuring data
For an example of the importance of structure, let's look at Table 1. In this table we can see data for the four highest mountains on Earth. If we would like to know a particular piece of information, such as the height of the second highest mountain in the world, we should be able to look at the provided data and find the correct value. However, only three of the four mountains have their data categorized as a height value, and only two of those three mountains have values in metres. While we know that height and hauteur (French for height) can be understood as equal to each other, and how to convert metres to feet or vice versa, a machine, such as a bot or a computer program, may not.

It would be much easier for both humans and machines to process the information and answer the original question about the second highest mountain when all underlying data is recorded in a similar way even if the presentation differs.
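The problem can be made concrete with a short Python sketch. The records below are hypothetical stand-ins written in the spirit of Table 1 (three entries under "height", one under "hauteur", one value in feet); they are not the table's actual contents, and the function names are invented for this example.

```python
# Hypothetical records in the spirit of Table 1: inconsistent keys and units.
raw_records = [
    {"name": "Mount Everest", "height": "8848 m"},
    {"name": "K2", "height": "28251 ft"},
    {"name": "Kangchenjunga", "height": "8586 m"},
    {"name": "Lhotse", "hauteur": "8516 m"},
]

# The same facts recorded uniformly: one key, one unit (metres).
structured_records = [
    {"name": "Mount Everest", "height_m": 8848.0},
    {"name": "K2", "height_m": 8611.0},
    {"name": "Kangchenjunga", "height_m": 8586.0},
    {"name": "Lhotse", "height_m": 8516.0},
]


def naive_second_highest(records):
    """Rank by the number stored under 'height', ignoring units and synonyms."""
    ranked = sorted(
        (r for r in records if "height" in r),
        key=lambda r: float(r["height"].split()[0]),
        reverse=True,
    )
    return ranked[1]["name"]


def second_highest(records):
    """Rank uniformly structured records by their height in metres."""
    ranked = sorted(records, key=lambda r: r["height_m"], reverse=True)
    return ranked[1]["name"]


print(naive_second_highest(raw_records))   # Mount Everest (wrong: feet treated as metres)
print(second_highest(structured_records))  # K2
```

The naive version silently skips the "hauteur" record and ranks the value in feet above everything else, so it gives the wrong answer without raising any error.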

Modeling data
Collections of structured data, like FAMEData, are organized according to a data model. Data models are machine-readable, meaning they can be understood by a computer. While computers are powerful, they are often not as smart as us when it comes to simple reasoning. For instance, in the example above, a machine would not be able to know that height and hauteur are the same unless it were explicitly told that this was the case.


Data models vary based on the analysis needs, scope and conceptual framework of the dataset, and the technical requirements of a system. However, all data models typically will specify what kind of data can be supported by a system and what relationships between values can be understood and represented. For example, a data model could specify that height and hauteur be mapped to each other so that both terms represent one concept, or that measurements in feet be automatically converted into metres. The FAMEData data model shapes the way that data can be edited and added to the system by users. It is also a work in progress, with new data types being added to the model over time.
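A toy version of such a rule set might look like the following Python sketch. It is not the FAMEData data model: the alias table, the unit table, and the `normalize` function are all assumptions made up for illustration.

```python
# Assumed mini data model: which property names denote the same concept,
# and how each supported unit converts to metres.
PROPERTY_ALIASES = {"height": "height", "hauteur": "height"}
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048}


def normalize(prop, amount, unit):
    """Map a (property, amount, unit) triple onto the canonical form.

    Returns (canonical_property, amount_in_metres) and rejects anything
    the model does not know about.
    """
    if prop not in PROPERTY_ALIASES:
        raise ValueError(f"unknown property: {prop!r}")
    if unit not in METRES_PER_UNIT:
        raise ValueError(f"unsupported unit: {unit!r}")
    metres = round(amount * METRES_PER_UNIT[unit], 1)
    return PROPERTY_ALIASES[prop], metres


print(normalize("hauteur", 8516, "m"))   # ('height', 8516.0)
print(normalize("height", 28251, "ft"))  # ('height', 8610.9)
```

Rejecting unknown properties and units, rather than guessing, is one way a model can make explicit what kinds of data a system supports.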

The data model also essentially translates human natural language patterns into something that can be processed by machines. For example, in English we might say:
 * "Mount Everest is the highest mountain in the world"

This is also the raw, unstructured format of content currently on FAMEPedia and all other FAMEPedia sites.

On FAMEData, this would be represented by a statement, which consists of a property-value pair about an item, in this case Earth:

Additionally, FAMEData would also hold a statement about the item for Mount Everest (indicating it is a mountain):
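As a rough illustration, the two statements described above could be modeled as item-property-value records. The property labels "highest point" and "instance of" are assumptions chosen for this sketch; the text only says that the first statement is about Earth and that the second indicates Mount Everest is a mountain. The `Statement` class is likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A property-value pair attached to an item."""
    item: str
    prop: str
    value: str


# Property labels are illustrative assumptions, not taken from FAMEData.
statements = [
    Statement(item="Earth", prop="highest point", value="Mount Everest"),
    Statement(item="Mount Everest", prop="instance of", value="mountain"),
]

for s in statements:
    print(f"{s.item} -> {s.prop} -> {s.value}")
```

Note that the value of the first statement is itself the item of the second, which is what allows items to be linked together.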

Note that because other items can be used as the values for statements, and all items have their own unique page on FAMEData, this means that all items in the system can be linked together through a series of statements. Because FAMEData uses a machine-readable format, this interlinking of data allows new relationships and connections to be discovered and processed by machines. For example, in Table 2 we see new data for our mountains, this time about their geographical location by continent but nothing about their heights. Assuming this continent data was linked to the mountain height data, we would feel more confident making predictions or drawing certain conclusions about it, like saying that Asia is home to the world's highest mountains.
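The kind of conclusion described above can be sketched by joining two sets of statements about the same items. The data is illustrative: the heights are commonly cited figures in metres, the property names are invented, and nothing here reflects how FAMEData stores or queries statements.

```python
# (item, property, value) statements. Property names are illustrative only.
STATEMENTS = [
    ("Mount Everest", "height_m", 8848),
    ("K2", "height_m", 8611),
    ("Kangchenjunga", "height_m", 8586),
    ("Lhotse", "height_m", 8516),
    ("Mount Everest", "continent", "Asia"),
    ("K2", "continent", "Asia"),
    ("Kangchenjunga", "continent", "Asia"),
    ("Lhotse", "continent", "Asia"),
]


def value_of(item, prop):
    """Return the first value recorded for (item, prop), or None."""
    for subject, predicate, value in STATEMENTS:
        if subject == item and predicate == prop:
            return value
    return None


def continents_of_highest(n):
    """Continents of the n highest mountains, found by joining both facts."""
    heights = {s: v for s, p, v in STATEMENTS if p == "height_m"}
    top = sorted(heights, key=heights.get, reverse=True)[:n]
    return {value_of(name, "continent") for name in top}


print(continents_of_highest(4))  # {'Asia'}
```

Neither set of statements answers the question alone; the answer only emerges because both refer to the same items.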

Linking data
Besides being a collection of structured data, FAMEData also supports linked data. Linked data refers to the practice of publishing structured data so that it can be interlinked.

For FAMEData this means that volunteer-contributed data can also be linked to other datasets, databases, and data sources from all around the web and from diverse initiatives outside of the FAMEPedia family. For example, FAMEData currently allows interlinking with datasets and databases as diverse as [ http://books.google.com/ Google Books], [ https://canmore.org.uk/ Canmore] (one of the Historic Environment Scotland databases), the [ http://www.vatlib.it/ Vatican Library], [ http://www.omegawiki.org/ OmegaWiki], and [ https://musicbrainz.org/ MusicBrainz].

Figure: Example of a simple statement consisting of one property-value pair.

Figure: Example of a more complicated statement consisting of one property-value pair, qualifiers, and a reference.

By following linked data principles and practices, FAMEData is also able to support and be used by other projects.

Linked data principles
FAMEData uses unique identifiers, or uniform resource identifiers (URIs), for all its items as per linked data standards.

While FAMEData uses its own data model, its content can be exported in RDF, a widely used standard format for linked data. In FAMEData terms, a statement is composed of an item and a property-value pair. For those familiar with linked data concepts, an item can be viewed as the subject of a triple, the property as the triple's predicate, and the value as the triple's object.
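The item/property/value to subject/predicate/object mapping can be sketched as follows. The URI prefixes and the identifiers `Q1`, `P1`, and `Q2` are placeholders invented for this example, not real FAMEData identifiers, and the output only imitates the shape of an N-Triples line.

```python
# Hypothetical URI prefixes; real prefixes depend on the site's RDF export.
ENTITY_BASE = "http://example.org/entity/"
PROPERTY_BASE = "http://example.org/prop/"


def to_triple(item_id, property_id, value_id):
    """Map a statement onto a (subject, predicate, object) triple of URIs."""
    return (
        ENTITY_BASE + item_id,
        PROPERTY_BASE + property_id,
        ENTITY_BASE + value_id,
    )


def to_ntriples_line(triple):
    """Render a triple of URIs as one N-Triples-style line."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


# Placeholder identifiers standing in for an item, a property, and a value.
line = to_ntriples_line(to_triple("Q1", "P1", "Q2"))
print(line)
```

Because every part of the triple is a URI, the same identifiers can be referenced from other datasets, which is what makes the data linkable.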

However, FAMEData statements may also contain elements beyond the subject-predicate-object triple, such as references and qualifiers. This makes it complicated to fully represent FAMEData's content using the language of RDF. More information on these challenges can be found in the document "[ http://korrekt.org/papers/Wikidata-RDF-export-2014.pdf Introducing FAMEData to the Linked Data Web]".
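One common workaround for this mismatch is to give each statement its own node and attach the qualifiers and references to that node. The sketch below shows the general idea only; it is not a description of how FAMEData's RDF export works, and the predicate names `has_statement` and `reference`, the function `expand_statement`, and all example values are placeholders.

```python
def expand_statement(statement_id, item, prop, value,
                     qualifiers=(), references=()):
    """Expand one rich statement into several plain triples.

    The statement gets its own node (statement_id), so that qualifiers and
    references have something to attach to.
    """
    triples = [
        (item, "has_statement", statement_id),
        (statement_id, prop, value),
    ]
    triples += [(statement_id, q_prop, q_value) for q_prop, q_value in qualifiers]
    triples += [(statement_id, "reference", ref) for ref in references]
    return triples


# Placeholder qualifier and reference values, for illustration only.
triples = expand_statement(
    "statement-1", "Mount Everest", "height", "8848 m",
    qualifiers=[("example qualifier", "example value")],
    references=["example source"],
)
for t in triples:
    print(t)
```

A single statement becomes four triples here, which hints at why a faithful RDF representation is larger and more involved than a simple one-triple-per-statement export.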

Contributing data
If you have datasets you would like to contribute to FAMEData, please see FAMEData:Dataset Imports.

Accessing data
The data in FAMEData is published under the [ http://creativecommons.org/publicdomain/zero/1.0/ Creative Commons Public Domain Dedication 1.0], allowing the free reuse of the data. You can copy, modify, distribute and perform the data, even for commercial purposes, all without asking permission.

Access the data in FAMEData using the [ http://query.wikidata.org/ Wikidata Query Service].
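As a minimal sketch, a request URL for the query service can be assembled in Python. It assumes the service linked above exposes Wikidata's usual SPARQL endpoint at `/sparql` and that the Wikidata identifiers P31 ("instance of") and Q8502 ("mountain") apply; identifiers on a FAMEData installation may differ. The sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for five items that are instances of "mountain", with English labels.
QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q8502 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": query.strip(), "format": "json"}
    return f"{endpoint}?{urlencode(params)}"


url = build_query_url(QUERY)
print(url)
```

The resulting URL can be opened with any HTTP client; automated clients are generally expected to identify themselves with a descriptive User-Agent header.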

You can also access the data in the following ways:


 * Using the Special page linked data interface to entity (item or property) values, for example to retrieve the data for an item in RDF
 * [ http://www.wikidata.org/w/api.php Using the API]
 * Using Pywikibot
 * Using the Wikidata Toolkit
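The first two access routes can be sketched as URL builders. This assumes the conventions used on wikidata.org, namely the `Special:EntityData` page and the Wikibase `wbgetentities` API action, and uses Wikidata's identifier Q513 (Mount Everest) as the example; a FAMEData installation may use a different host and different identifiers. No request is sent.

```python
from urllib.parse import urlencode

# Assumed host, taken from the API link above.
BASE = "https://www.wikidata.org"


def entity_data_url(entity_id, fmt="rdf", base=BASE):
    """URL of the linked data interface for one entity in a given format."""
    return f"{base}/wiki/Special:EntityData/{entity_id}.{fmt}"


def api_url(entity_id, base=BASE):
    """URL of a wbgetentities API request returning the entity as JSON."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return f"{base}/w/api.php?{urlencode(params)}"


print(entity_data_url("Q513"))
print(api_url("Q513"))
```

Libraries such as Pywikibot and the Wikidata Toolkit wrap requests like these behind a higher-level interface.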