
Log4J doesn’t understand your command-line pointer to

I had to answer this question more than once, so it’s probably worth posting in case somebody else out there is looking for the answer.

Are you getting this error message when trying to run a class that uses Log4J?

log4j:WARN No appenders could be found for logger (ldif.local.Ldif$).
log4j:WARN Please initialize the log4j system properly.

Are you specifying a pointer to your file like this?

java -Xmx2G -Xms256M -Dlog4j.configuration=file:../resources/ [MainClass]

The part that most people forget is the “file:” prefix before the path to the configuration file. Check that once more and try again. 🙂

If you are using IntelliJ IDEA, the place to set that is under “Run / Edit Configurations / (choose your class under Applications) / VM options”.
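On the command line, a relative “file:” path is resolved against whatever directory you happen to launch java from, so it can help to expand it to an absolute path first. Here is a minimal sketch of that idea. The helper name log4j_config_flag and the file name in the usage comment are placeholders of mine, not anything Log4J ships with.

```shell
#!/bin/sh
# Hypothetical helper: print a -Dlog4j.configuration flag whose value is an
# absolute "file:" URL, built from a (possibly relative) path to a config file.
log4j_config_flag() {
    config_path=$1
    # Resolve the directory part to an absolute path; fail if it does not exist.
    config_dir=$(cd "$(dirname "$config_path")" && pwd) || return 1
    printf '%s\n' "-Dlog4j.configuration=file:${config_dir}/$(basename "$config_path")"
}

# Usage (placeholder file name, substitute your own configuration file):
#   java $(log4j_config_flag ../resources/my-log4j.properties) MainClass
```

The unquoted command substitution in the usage line assumes the path contains no spaces.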

Cisco VPN Error 442 Unable to Enable Virtual Adapter

Wasted about 2 hours of my time with this, so I’d like to spare you the same frustration.

  1. Go to “Start > Run”
  2. Type “cmd” to open the command prompt
  3. Paste the line below to stop the “Internet Connection Sharing (ICS)” service (saving you the usual multi-click hunt through Windows)
    sc stop SharedAccess
  4. Now open Cisco VPN and it should work.


Record number of student applications for Google Summer of Code, and 48 for DBpedia Spotlight

This year Google’s Summer of Code has “had a record 6,685 student proposals from 4,258 students submitted to this year’s 180 participating mentoring organizations.” DBpedia Spotlight is participating this year for the first time. We were thrilled to watch and interact with prospective students as 48 applications gradually showed up in the system, culminating with an almost chaotic number of applications on the last day.


Chart based on a spreadsheet kindly shared by Olly.

Excluding proposals that we considered spam (incomplete, copy+paste, unrelated, etc.), about 30 applications remained, including “ok”, “good”, “very good” and several “amazing” ones! Google will announce later today how many students we will get, based on their available budget. But I can already tell you that it will be a tough choice for us!

The most popular ideas were Topical Classification and Hadoop-based Indexing, followed by Internationalization, Spotting and Disambiguation. Surprisingly, we got far fewer “obvious combination” proposals than we expected (that is, proposals combining ideas that we thought would be perfect matches). Many students really put time into understanding the system and our ideas, and we were generally quite impressed with everybody’s interest and energy. Some students also went the extra mile and combined their ideas with ours. Overall, a really exciting summer is already rising above the horizon for us!

DBpedia Spotlight has been selected for Google Summer of Code.

The Google Summer of Code (GSoC) is a global program that offers student developers (BSc, MSc, PhD) stipends to write code for open source software projects. It has had thousands of participants since the first edition in 2005, connecting prospective students with mentors from open source communities such as Debian, KDE, GNOME, the Apache Software Foundation and Mozilla.

For the students, it is a great chance to get real-world software development experience. For the open source communities, it is a chance to expand their development community. For everybody else, more source code is created and released for the benefit of all!

We are thrilled to announce that our open source project DBpedia Spotlight has been selected for the Google Summer of Code 2012.

We are now seeking students interested in working with us to enhance operational aspects of DBpedia Spotlight, as well as to engage in research activities in collaboration with our team. If you are an energetic developer, passionate about open source and interested in areas related to DBpedia Spotlight, please get in touch with us!

We have shared a number of project ideas to get you started. To apply, visit:

If you would like to see DBpedia Spotlight in action, helping you to explore available projects within GSoC 2012, please visit our demonstration page at:

Git’s coolest merge ever for research collaborations

The coolest research groups in the world maintain a git repo where all their research papers, code and documentation are stored and versioned. Some groups keep these repos private, at least while a paper is being written. The problem is: cool research groups collaborate with other cool research groups. Two groups would like to write a paper together and share private access to that paper, but not to each other’s other papers. One way to do that is to use a service like BitBucket, which offers free private repos. Both groups then have access to this fresh BitBucket repo for the writing period, and when they are done, each can merge the changes into its internal private repo. This kind of merge, preserving the change history and everything, is known as “the coolest merge ever”. I found out about it on StackOverflow, and I’ve done it already for two papers. Very cute!

See the recipe below:

cd ~/CoolResearchGroup/PrivateGitRepo/                             # this is where you keep your clone of the group's repo
git remote add other ~/BitBucket/PrivateGitRepoForCollaboration    # this is where you keep your clone of the shared collaboration repo
git fetch other
git checkout -b PrivateGitRepoForCollaboration other/master
mkdir PrivateGitRepoForCollaboration
git mv *.tex PrivateGitRepoForCollaboration           # move the files that you need
git commit -m "Move paper into its own directory"     # record the move on the temporary branch
git checkout master
git merge PrivateGitRepoForCollaboration              # adds the new branch to master (Git 2.9+ needs --allow-unrelated-histories)
git commit                                            # only needed if the merge stopped on conflicts
git remote rm other                                   # don't need the temporary remote anymore
git push                                              # if you have a remote, that is
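
If you would rather try this on something disposable first, here is a sketch that runs the whole sequence on two throwaway repos in a temporary directory. It includes the commit after the move and the merge into master. Everything in it (repo names, file names, the "demo" identity) is made up for illustration, and it assumes Git 2.9 or newer, where merging two repos with no common history needs --allow-unrelated-histories.

```shell
#!/bin/sh
set -e

# Work in a sandbox with a throwaway identity, so no real repo or config is touched.
sandbox=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# Stand-in for the shared collaboration repo, containing one paper.
git init -q "$sandbox/collab"
cd "$sandbox/collab"
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
echo "draft" > paper.tex
git add paper.tex
git commit -q -m "Draft paper"

# Stand-in for the group's own private repo.
git init -q "$sandbox/group"
cd "$sandbox/group"
git symbolic-ref HEAD refs/heads/master
echo "notes" > README
git add README
git commit -q -m "Group repo"

# The recipe itself, run from inside the group's repo.
git remote add other "$sandbox/collab"
git fetch -q other
git checkout -q -b collab-import other/master
mkdir collab
git mv paper.tex collab/
git commit -q -m "Move paper into collab/"
git checkout -q master
git merge -q --allow-unrelated-histories -m "Merge collaboration repo" collab-import
git remote rm other

# Both histories should now show up in one log.
git log --oneline
```

After it finishes, the log of the group repo lists the "Draft paper" commit from the collaboration repo next to the group's own commits, which is the whole point of the exercise.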

DBpedia Spotlight’s anniversary

Today marks one year since we first released DBpedia Spotlight (v0.1). A small but committed team, 12 months later, and we’re still alive and kickin’! Our ambitious proposition to link text to *any* of the 3.5M things of 320 different classes in DBpedia has proven both challenging and a lot of fun. In our paper at I-SEMANTICS 2011 we showed that our simple and generic approach works quite reasonably, and that our focus on staying flexible pays off in the end. This perception has been echoed by other projects that have used DBpedia Spotlight throughout the year. We were excited to have counted 1.9M requests from 21/Nov/2011 to 08/Dec/2011, and managed to keep 98.64% uptime from Aug/2011 up to now, with 73 little availability incidents. 🙂 (By the way, thanks to Pingdom for tracking our availability for free.)

So much support from the community has made us both proud and agitated, as we would like to roll out more and more enhancements to support all of the fine people that have relied on DBpedia Spotlight in their work.

After our I-Semantics evaluation, we participated in TAC-KBP 2011, which evaluates systems on their ability to link entities of type Person, Location and Organization to a reference Knowledge Base (KB). Their KB is much smaller than DBpedia, and the task is much more focused as there are only three target entity types. We participated with DBpedia Spotlight as-is, and obtained surprisingly good results for a generic approach (around median), comparable to many old-timers in the field. Such a focused task allows for much more specialization, and we hope to participate again this year with a system specialized for that task.

We have also been working on developing new “spotting” techniques that go beyond our lexicon-based approach. We have experimented with Named Entity Recognition and Keyphrase Extraction, among others. These results will be reported at LREC’2012 later this year. During that time we will also release some of the datasets that we created for DBpedia Spotlight, and give a talk on how they can be used to build or enhance NLP tools.

Next steps for DBpedia Spotlight include the release of internationalized versions, implementation of new disambiguation techniques and general enhancements (esp. performance of the ICF disambiguator).

Thanks to all of you who have supported, used or recommended our tool, and keep sending your comments our way! Special thanks to Jo Daiber, welcomed as a new committer, and to Rohana Rajapakse and Iavor Ielev for their contributions in 2011.

Happy Birthday DBpedia Spotlight!!!

LOD2 Indian Summer School Project Ideas

I will describe below a few very simple ideas that could be done in a few hours of dedicated work. They mainly focus on how one could use DBpedia Spotlight in order to enhance content and enrich user experience. They are presented in increasing complexity, and they cater to varied skill sets.

1) Links to complementary information. Implement a “More about this” function, where users can find explanatory information about concepts or entities that they do not know in a piece of news or educational material. While reading the text, a user can use the mouse to select some text in the page, getting a popup with DBpedia Spotlight suggestions for what that selected text can mean. Suggested background: JavaScript, jQuery, JSON, CSS

2) Faceted browsing of blog posts. Many current CMSs and blogging systems (e.g. Drupal, WordPress) already allow people to manually add tags to their blog posts. These tags then appear on the right-hand side of the screen, and readers can click on them in order to obtain more entries annotated with the same tag. Similarly, one could call DBpedia Spotlight to “tag” posts with DBpedia entities, which then could be shown on the right-hand side of the screen for people to get more posts with the same entities. Moreover, the types (e.g. people, organization, sport, etc.) of the entities could also be used as tags, maybe in a different “tag box” allowing more coarse-grained filtering of posts. Suggested background: JavaScript, jQuery, JSON, CSS, WordPress / Drupal

3) Rich snippets on Google. Search engines have started to display info from annotations. DBpedia Spotlight can generate annotations, so you could implement a “result formatter” in DBpedia Spotlight to generate Microdata annotations, and show a preview of how Google would interpret them using the Google Webmaster Tools. Suggested background: Java, Scala, XML, Maven2

4) More expressive filtering of information streams. Twarql allows users to use tweet annotations in order to filter tweets based on more than just keywords. DBpedia Spotlight could be used in Twarql’s pipeline to extract DBpedia entities and enable expressive filtering of tweets to avoid information overload. Suggested background: Java, RDF, SPARQL
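
Ideas 1, 2 and 4 all start from the same step: send some text to a DBpedia Spotlight server and read back the annotations. Here is a rough sketch of that call using curl. The endpoint URL is an assumption on my part (point SPOTLIGHT_ENDPOINT at whichever server you are actually using), and spotlight_annotate is just a helper name I made up.

```shell
#!/bin/sh
# Assumed endpoint; override SPOTLIGHT_ENDPOINT to point at your own server.
SPOTLIGHT_ENDPOINT=${SPOTLIGHT_ENDPOINT:-https://api.dbpedia-spotlight.org/en/annotate}

# Hypothetical helper: ask the annotate service for JSON annotations.
#   $1: the text to annotate
#   $2: optional confidence threshold (defaults to 0.5, an arbitrary choice)
spotlight_annotate() {
    curl --silent --get "$SPOTLIGHT_ENDPOINT" \
        --header 'Accept: application/json' \
        --data-urlencode "text=$1" \
        --data-urlencode "confidence=${2:-0.5}"
}

# Usage:
#   spotlight_annotate "Berlin is the capital of Germany" 0.4
```

From there, the JavaScript ideas would make the same request from the browser, and the Twarql idea would feed tweet text through it before filtering.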

These are initial ideas that could be expanded with other cooler (but more time consuming) ideas that I have. So if you adopt one of these and want to expand, feel free to shoot me an e-mail. If you need tutoring, I am available on Skype until Wednesday night, and in person on Thursday and Friday.

Happy coding!
