CiteSeerX

Results 1 - 10 of 86

Overview of TweetLID: Tweet Language Identification at SEPLN 2014 / Introducción a TweetLID: Tarea Compartida sobre Identificación de Idioma de Tuits en SEPLN 2014

by Arkaitz Zubiaga, Iñaki San Vicente, Pablo Gamallo, José Ramom Pichel, Nora Aranberri, Aitzol Ezeiza
"... Abstract: This article presents an overview of the TweetLID shared task and workshop, organized alongside SEPLN 2014. It briefly summarizes the data collection and annotation process, the development and evaluation of the shared task, and, finally, the results obtained by the participants ..."
Cited by 1 (0 self)

TweetSafa: Tweet language identification / TweetSafa: Identificación del lenguaje de tweets

by Iosu Mendizabal, Jeroni Car, Daniel Horowitz
"... Abstract: This article describes the methodology used in the task proposed at SEPLN 14 for tweet language identification (TweetLID), as explained in (Iñaki San Vicente, 2014). The system consists of tweet preprocessing, the creation of dictionaries from N-grams, and ..."
two language recognition algorithms. Keywords: language recognition, tweet language. Abstract: This paper describes the methodology used for the SEPLN 14 shared task of tweet language identification (TweetLID), as explained on (Iñaki San Vicente, 2014). The system consists

Tweets language identification using feature weighting / Identificación de idioma en tweets mediante pesado de términos

by Juglar Díaz Zamora, Adrian Fonseca Bruzón
"... Abstract: This paper describes a language detection method presented at the Twitter Language Identification Workshop (TweetLID-2014). The proposed method represents tweets by means of character trigrams weighted according to their relevance to each language. For the weighting of the t ..."
Cited by 1 (0 self)
... Finally, we analyze the results obtained. Keywords: tweets, language identification, feature weighting. Abstract: This paper describes the language identification method presented in Twitter Language Identification Workshop (TweetLID-2014). The proposed method represents tweets

Language Identification for Creating Language-Specific Twitter Collections

by Shane Bergsma, Paul McNamee
"... Social media services such as Twitter offer an immense volume of real-world linguistic data. We explore the use of Twitter to obtain authentic user-generated text in low-resource languages such as Nepali, Urdu, and Ukrainian. Automatic language identification (LID) can be used to extract language-specific data from Twitter, but it is unclear how well LID performs on short, informal texts in low-resource languages. We address this question by annotating and releasing a large collection of tweets in nine languages, focusing on confusable languages using the Cyrillic, Arabic, and Devanagari scripts ..."
Cited by 22 (5 self)

Language variety identification in Spanish tweets

by Wolfgang Maier
"... We study the problem of language variant identification, approximated by the problem of labeling tweets from Spanish speaking countries by the country from which they were posted. While this task is closely related to “pure” language identification, it comes with additional complications. We bui ..."
Cited by 1 (0 self)

Language Identification of Tweets Using LZW Compression

by Brian O. Bush
"... Language identification plays a major role in natural language processing applications and is commonly used as a pre-processing phase. Typical techniques employ n-gram models and require text parsing and cleanup such as punctuation removal, white-space normalization, etc. This paper investigate ..."

Nirmal: Automatic identification of software relevant tweets leveraging language model

by Abhishek Sharma, Yuan Tian, David Lo - in Software Analysis, Evolution and Reengineering (SANER), 2015 IEEE 22nd International Conference on. IEEE, 2015
"... Abstract—Twitter is one of the most widely used social media platforms today. It enables users to share and view short 140-character messages called “tweets”. About 284 million active users generate close to 500 million tweets per day. Such rapid generation of user generated content in large magnitu ..."
Cited by 1 (1 self)
with noise, we propose a novel approach named NIRMAL, which automatically identifies software relevant tweets from a collection or stream of tweets. Our approach is based on language modeling which learns a statistical model based on a training corpus (i.e., set of documents). We make use of a subset

Detecting the Gender of a Tweet Sender

by Thomas Oshiobughie Ugheoke (Regina, Saskatchewan)
"... ABSTRACT Social media has been in existence for over two decades and, increasingly, people are using it to communicate, connect, share content, and socialize across the globe. Given the huge amount of data generated by the great popularity of social media sites, opportunities have emerged for resea ..."
and Google+), but such information could be useful for targeting a specific audience for advertising, for personalizing content, and for legal investigation. These procedures, known as authorship identification, provide veritable information about the tweet author, for example, the gender of the author

Goal-Directed Elaboration of Requirements for a Meeting Scheduler

by Axel van Lamsweerde, Robert Darimont, 1995
"... Recently a number of requirements engineering languages and methods have flourished that do not only address what questions but also why, who and when questions. The objective of this paper is twofold: (i) to assess the strengths and weaknesses of one of these methodologies on a non-trivial benchmar ..."
Cited by 139 (10 self)

Analysis of Named Entity Recognition and Linking for Tweets

by Leon Derczynski, Diana Maynard, Marieke van Erp, Genevieve Gorrell, Johann Petrak, Kalina Bontcheva
"... Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new ..."