
Big Graph Data Panel at ISWC2012

November 15, 2012

Although not in a conventional blog post style, I would like to share my notes from the (may I say, fantastic) Big Graph Data Panel at ISWC2012 in Boston, MA, USA. The panel was moderated by Frank van Harmelen and composed of Michael Stonebraker, Tim Berners-Lee, John Giannandrea and Bryan Thompson.

Disclaimers: Although I tried to be faithful to what I heard at the panel, please do not take the attributed sentences as word-by-word quotes. A certain, unmeasured amount of interpretation and paraphrasing went into them. There is also a bias towards the topics (or rhetoric) that piqued my interest, and unfortunately some missing attribution (or misattribution), due to the speed at which I had to type the notes while people were speaking. 🙂 I still hope the content is useful for your understanding of the topics.

 

What is Big Data? What do Semantic Web and Big Data offer to each other? 

Stonebraker: Big Data encompasses three types of data problems.
– volume: you have too much data. For example, all of the Web.
– speed: data comes at you too fast. For example, query logs, sensor data, etc.
– variety: you have too many sources of data. For example, a pharma company with thousands of spreadsheets each by individual researchers, no common language (German, English, Portuguese), different writing styles, vocabulary, etc.

In other words, as put by Deborah McGuinness in her tweet: Big Volume or Big Velocity or Big Variety. These “Three ‘Vs’ of Big Data” were originally posited by Gartner’s Doug Laney in a 2001 research report.

Stonebraker: Big Data is only a problem if your data needs grow faster than memory gets cheaper.

Giannandrea: First thing is to understand what a Graph Database even is.

Berners-Lee: everything can be structured in a graph. Saying graph structured data is like saying “data data”.

Do we even need graph databases? Don’t relational database systems already solve everything?

Stonebraker: “major DB vendors are 30 years old obsolete systems that are not good at anything”.

Stonebraker: how to handle graph problems at scale is still an unsolved problem, where “at scale” means that the data does not fit in whatever aggregate memory you have.

Paraphrasing Deborah McGuinness’s quoting of Stonebraker: Let the benchmark wars begin. If winners are 10x better, then you survive (if they are only 2x better, then the giant companies will take you over).

Would anybody with Big Graph Data problems use SPARQL? Or even SQL? Or must it be MapReduce?

Stonebraker: About SPARQL, don’t get hung up on your query language. In the Hadoop world, everybody is moving to Hive. Hence, all SQL vendors are starting to write Hive2SQL translators.

Stonebraker: About MapReduce, it is not the final answer. Google wrote MapReduce 7 years ago. It is good at embarrassingly parallel tasks. Joins are not embarrassingly parallel.
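
To make that last point a bit more concrete, here is a minimal Python sketch of my own (it was not shown at the panel). The function names and the toy data are hypothetical. The first function handles each record in isolation, so the input could be split across workers without any coordination. The join, in contrast, has to group records from both inputs by key before it can emit anything, which is the shuffle step that makes it harder to parallelize.

```python
from collections import defaultdict


def map_word_lengths(records):
    # Embarrassingly parallel: each record is processed on its own,
    # so the input can be split across workers with no coordination.
    return [(record, len(record)) for record in records]


def reduce_side_join(left, right):
    # A join needs a shuffle first: all pairs that share a key must be
    # brought together before any joined row can be produced.
    buckets = defaultdict(lambda: ([], []))
    for key, value in left:
        buckets[key][0].append(value)
    for key, value in right:
        buckets[key][1].append(value)
    # Emit one row per matching (left, right) combination for each key.
    return sorted(
        (key, left_value, right_value)
        for key, (left_values, right_values) in buckets.items()
        for left_value in left_values
        for right_value in right_values
    )


# Hypothetical toy graph: node labels and "knows" edges keyed by source node.
people = [("p1", "Alice"), ("p2", "Bob")]
knows = [("p1", "p2"), ("p2", "p1"), ("p1", "p3")]
joined = reduce_side_join(people, knows)
```

In a real cluster the `buckets` dictionary would be spread over many machines, and moving the data to the right bucket is where much of the cost goes.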

What about Open Data? Is there any incentive for it?

Van Harmelen: “standard anecdote: the incentive for opening up your data is that if you get successful, your servers burn down.”

Stonebraker: The biggest problem is trying to put together, after the fact, stuff that was not designed to be put together.

Stonebraker: deduplicating fuzzy data is one of the killer problems.

Stonebraker: “there is tremendous value in curating the data.”

Sheth: explicit, named relationships make deduplication easier.

What about the original Semantic Web idea, of querying a distributed graph on the Web?

Stonebraker: “query response time in the distributed way is as slow as the slowest provider. People centralize to speed it up.”
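
As a rough illustration of that remark (again my own sketch, not something presented at the panel), the snippet below simulates a federated query with Python's `asyncio`. The provider names and delays are made up. All providers are queried concurrently, but the combined answer is only available once the slowest one has responded.

```python
import asyncio


async def query_provider(name, delay_s):
    # Stand-in for a remote endpoint: sleep for the provider's latency.
    await asyncio.sleep(delay_s)
    return name


async def federated_query(providers):
    loop = asyncio.get_running_loop()
    start = loop.time()
    # Fan out to every provider at once and wait for all of the answers.
    results = await asyncio.gather(
        *(query_provider(name, delay) for name, delay in providers.items())
    )
    return results, loop.time() - start


# Hypothetical latencies in seconds.
providers = {"fast": 0.01, "medium": 0.05, "slow": 0.2}
results, elapsed = asyncio.run(federated_query(providers))
```

Here `elapsed` comes out close to the 0.2 seconds of the slowest provider rather than the 0.01 seconds of the fastest, which is the incentive to centralize.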

Attendee: “centralizing only makes sense if you know what you want to do. Putting data out on the Web enables people to find it out.”

Someone (I think Thompson?): “Semantic Web research needs to find out what is the right bit of ‘well curated data/process/schema/query’ to add on top of big data.”

My question to the panel: What will Big Graph Data look like in 2022? Solved or beginning? Volume? In silos or global?

Giannandrea: in 2022 we will understand data better.
Thompson: we will still not have semantic interoperability across systems by 2022.
Stonebraker: big data will be a bigger and bigger problem, at least for the next decade.
Berners-Lee: a battle of small vocabularies for interoperability is coming in the next few years.

 

Participate! If you would like to continue discussing the future of Big Graph Data, you should join the mailing lists of the FP7 BIG Project. The Working Groups on Data Analysis and Data Storage are particularly relevant to this discussion. Check them out!

 
