We propose OntoGain, a method for ontology learning from multi-word concept terms extracted from plain text. OntoGain follows an ontology learning process defined by distinct processing layers. Building upon plain term extraction, a concept hierarchy is formed by clustering the extracted concepts. The derived term taxonomy is then enriched with non-taxonomic relations. Several different state-of-the-art methods have been examined for implementing each layer. OntoGain is based upon multi-word term concepts, as multi-word or compound terms are vested with more solid and distinctive semantics than plain single-word terms. We opted for a hierarchical clustering method and the Formal Concept Analysis (FCA) algorithm for building the term taxonomy. Furthermore, an association rule algorithm is applied for revealing non-taxonomic relations. A method that tries to find the most appropriate generalization level between a relation's concepts is also implemented. To show proof of concept, a system prototype is implemented. OntoGain allows transformation of the derived ontology into OWL using the Jena Semantic Web Framework. OntoGain is applied to two separate data sources, a medical and a computer corpus, and its results are compared with similar results obtained by Text2Onto, a state-of-the-art ontology learning method. The analysis of results indicates that OntoGain performs better than Text2Onto in terms of precision (it extracts more correct concepts) while being more selective (it extracts fewer but more reasonable concepts).

Hi, we can extract the PDF files using Apache Tika:

    import org.apache.http.HttpEntity;
    import org.apache.http.HttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.metadata.TikaCoreProperties;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.sax.BodyContentHandler;
    // httpGet, input, metadata and map are not declared in the post; they are assumed to exist already.
    DefaultHttpClient httpclient = new DefaultHttpClient();
    HttpResponse response = httpclient.execute(httpGet);
    HttpEntity entity = response.getEntity();
    // Let Tika detect the document type and parse the stream into handler and metadata.
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler();
    ParseContext parseContext = new ParseContext();
    parser.parse(input, handler, metadata, parseContext);
    // Collect the extracted metadata and the HTTP status code.
    map.put("title", metadata.get(TikaCoreProperties.TITLE));
    map.put("pageCount", metadata.get("xmpTPg:NPages"));
    map.put("status_code", response.getStatusLine().getStatusCode() + "");
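The abstract mentions that an association rule algorithm is applied to reveal non-taxonomic relations. As a rough illustration of that idea, and not OntoGain's actual implementation, the sketch below computes the two standard association-rule measures, support and confidence, over a few made-up sets of co-occurring terms. The class name, the method names and the example terms are all assumptions introduced here for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch only: names and data are illustrative assumptions, not taken from OntoGain.
class AssociationRuleSketch {

    // Support: fraction of transactions that contain every item of the itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream()
                .filter(t -> t.containsAll(itemset))
                .count();
        return (double) hits / transactions.size();
    }

    // Confidence of the rule antecedent -> consequent:
    // support(antecedent and consequent together) / support(antecedent).
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent,
                             Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        double antecedentSupport = support(transactions, antecedent);
        return antecedentSupport == 0.0
                ? 0.0
                : support(transactions, both) / antecedentSupport;
    }

    public static void main(String[] args) {
        // Each set holds terms that co-occur in one sentence (made-up data).
        List<Set<String>> transactions = List.of(
                Set.of("aspirin", "treat", "headache"),
                Set.of("aspirin", "treat", "fever"),
                Set.of("aspirin", "cause", "bleeding"),
                Set.of("insulin", "treat", "diabetes"));

        Set<String> antecedent = Set.of("aspirin");
        Set<String> consequent = Set.of("treat");

        System.out.printf("support(aspirin) = %.2f%n",
                support(transactions, antecedent));
        System.out.printf("confidence(aspirin -> treat) = %.2f%n",
                confidence(transactions, antecedent, consequent));
    }
}
```

With this toy data, "aspirin" appears in three of the four sets (support 3/4) and co-occurs with "treat" in two of them, so the rule aspirin -> treat has confidence 2/3. A real pipeline would presumably keep only rules above chosen support and confidence thresholds; the abstract does not state which thresholds or which transaction format OntoGain uses.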