Results

**1 - 2** of **2**

### Nano language and distribution of article title terms according to power laws

2015

Abstract. Scientometric evaluation of nanoscience/nanotechnology requires complex search strategies and lengthy queries that retrieve massive amounts of information. To offer some insight based on the most frequently occurring terms, our research focused on a limited amount of data collected on uniform principles. The prefix nano occurs in many different compound words, which offers a possibility for such an assessment. The aim is to identify the scatter of nanoconcepts among and within journals, as well as more generally in the Web of Science (WOS). Ten principal journals were identified, along with all unique nanoterms in article titles. Such terms occur on average in half of all titles. Terms were thoroughly investigated and mapped by lemmatization or stemming to the appropriate roots (nanoconcepts). The scatter of concepts follows the characteristics of power laws, especially Zipf's law, exhibiting a clear inversely proportional relationship between rank and frequency. The same three nanoconcepts are the most frequent in as many as seven journals. Two concepts occupy the first and second rank in six journals. The same six concepts are the most frequent in the ten journals as well as in the full WOS database, representing almost two thirds of all nanotitled articles in both instances. Subject categories do not play a decisive role. Frequency falls progressively, quickly producing a long tail of rare concepts. The drop is almost linear on the log scale. The existence of hundreds of different closed-form compound nanoterms has consequences for retrieval with Internet search engines (e.g. Google Scholar) that do not permit truncation.
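
The rank-frequency relationship described in this abstract can be illustrated with a small sketch. It is not the authors' procedure: the helper names `rank_frequency` and `loglog_slope` and the toy term list are assumptions made for illustration only. Under Zipf's law, frequency is roughly proportional to 1/rank, so a least-squares line through log(frequency) against log(rank) should have a slope near -1.

```python
import math
from collections import Counter


def rank_frequency(terms):
    """Return (rank, frequency) pairs, most frequent term first."""
    counts = Counter(terms)
    freqs = sorted(counts.values(), reverse=True)
    return [(rank, freq) for rank, freq in enumerate(freqs, start=1)]


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data (not from the paper): concept k occurs round(120 / k) times.
toy_terms = [f"concept{k}" for k in range(1, 21) for _ in range(round(120 / k))]

pairs = rank_frequency(toy_terms)
slope = loglog_slope(pairs)
print(f"log-log slope: {slope:.2f}")
```

With real title data, the input would be the list of nanoconcepts after lemmatization or stemming; a slope close to -1 would be consistent with the "almost linear on the log scale" drop the abstract reports.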

### Fast Data Mining with Sparse Chemical Graph Fingerprints by Estimating the Probability of Unique Patterns

Abstract. The aim of this work is to introduce a modification of chemical graph fingerprints for data mining. The algorithm reduces the number of features by taking into account the probability of producing a unique feature at a specific search depth. We observed the probability of generating a non-unique feature as a function of a search parameter (which leads to a power-law growth of features) and modeled it by a sigmoid function. This function was integrated into a fingerprinting routine to reduce the features according to their probability. The predictive performance was convincing, with a considerable speedup for the training of a linear support vector machine for sparse instances.
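
The idea of pruning features with a sigmoid-modeled probability can be sketched roughly as follows. The abstract does not give the fitted function or the pruning rule, so everything here is an assumption: the parameter names `midpoint` and `steepness`, their values, the threshold `min_prob`, the decreasing direction of the curve, and the toy patterns are all hypothetical.

```python
import math


def sigmoid(x, midpoint, steepness):
    """Logistic function centered at `midpoint`."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def prob_non_unique(depth, midpoint=4.0, steepness=1.5):
    """Assumed model: deeper patterns are less likely to be shared (non-unique)."""
    return 1.0 - sigmoid(depth, midpoint, steepness)


def prune_features(features, min_prob=0.5):
    """Keep (pattern, depth) pairs whose modeled non-unique probability is at least min_prob."""
    return [(p, d) for p, d in features if prob_non_unique(d) >= min_prob]


# Hypothetical path-like patterns paired with the search depth that produced them.
features = [("C", 1), ("C-C", 2), ("C-C-O", 3), ("C-C-O-C-N", 5), ("C-C-O-C-N-C", 6)]

kept = prune_features(features)
print(kept)
```

In this simplified form the rule reduces to a depth cutoff; a probabilistic variant would keep each feature with probability `prob_non_unique(depth)` instead. Either way, fewer features mean sparser instances, which is where the reported training speedup would come from.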