Looking for sponsors for my academic project: Ontocrawler

haphan Member
edited May 2012 in General

Hi,

Ontocrawler is part of my MSc thesis, which was completed in 2011. The thesis surveys the landscape of Semantic Web documents/ontologies, particularly OWL documents. To conduct the research systematically and efficiently, I wrote Ontocrawler to collect semantic documents from all over the web, and the results were very fruitful.

It has now been almost a year, and I want to see how the semantic world is moving. My plan is to rewrite Ontocrawler and build a continuous crawling mechanism and an indexing system on top of the collected documents.

Initial results
I was able to collect 200k semantic documents. Of those 200k files, ~25k are valid documents. For further analysis regarding my research, please drop me a PM or email if you're interested.

You can visit this link to see the heartbeat monitor of Ontocrawler:
http://ontocrawler.com/

Documents retrieved:
http://haphan.co.uk/ontocrawler/snapshot-new.tar.bz2

Technical aspects of the project
I initially implemented the crawler in PHP-CLI, but it showed its limitations: PHP cannot use multiple threads and is very slow at crawling. I have built a prototype crawler in node.js and it looks promising.

Now I am looking for someone willing to sponsor my project with a VPS or DS. The crawler won't use much CPU, but I am concerned about the bandwidth. When I last conducted the research in the school's lab, it used about 2TB/month. Location is not important to me, but most of the websites that host semantic documents are in the UK/EU.

Future plan
Since this is purely an academic project, I have no intention of commercializing or making money from it. My plan is to write a paper from the results I collect. I will list the names and websites of sponsors on my project website and fully acknowledge them in my paper.

I look forward to hearing from you, especially from those with a background in what I am doing.

Thank you very much.

Ha Phan

Comments

  • Asad Member
    edited May 2012

    I hate semantics.

  • Erkan Member
    edited May 2012

    Question:
    If this is still an ongoing academic project,
    why don't you use the infrastructure from The University of Manchester? (They surely have very good resources.)
    Or are you no longer at the university?

  • haphan Member

    @Erkan said: If this is still an ongoing academic project, why don't you use the infrastructure from The University of Manchester? (They surely have very good resources.) Or are you no longer at the university?

    UoM indeed has excellent network infrastructure. However, the thesis is completed and I am no longer at the university. I just want to continue my research.

  • Damian Member

    I'm interested in this project.

    What will you do with the collected data, after the paper is written?

  • haphan Member
    edited May 2012

    @Damian said: What will you do with the collected data, after the paper is written?

    Firstly, the data can be a valuable sample set for semantic reasoners. The institution I studied at currently maintains 2 state-of-the-art reasoners (FaCT++ and HermiT). A real-world ontology set helps them tweak and improve the reasoners. For now there are still some ontologies that take ages to entail.

    Secondly, I can provide a public repository of the collected data for public use.

  • William Member

    2TB? I'll give you 10TB if Austria is fine as a location :)

  • joodle

    How much RAM do you need?

  • haphan Member
    edited May 2012

    @liam said: Do you live near Manchester?

    I used to :)

    @joodle said: How much RAM do you need?

    At least 1GB, with swap if possible. I have a document validator module written in Java, and it uses loads of RAM.

    @William said: 2TB? I'll give you 10TB if Austria is fine as a location :)

    Thanks, I'll PM you!
