[PDF] Web Corpus Construction - eBooks Review


Web Corpus Construction


Author : Roland Schäfer
language : en
Publisher: Morgan & Claypool Publishers
Release Date : 2013-07-01

Category : Computers


The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. This approach has several advantages: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) Corpora compiled from the WWW may exceed the size of language resources offered elsewhere by several orders of magnitude. (iv) The data is locally available to users, who can post-process it linguistically and query it with the tools they prefer. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups, including boilerplate removal and the removal of duplicated content. Linguistic processing, and the problems caused for it by the different kinds of noise in web corpora, are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora). For additional material, please visit the companion website: sites.morganclaypool.com/wcc
Table of Contents: Preface / Acknowledgments / Web Corpora / Data Collection / Post-Processing / Linguistic Processing / Corpus Evaluation and Comparison / Bibliography / Authors' Biographies



Corpus Linguistics And The Web


Author : Marianne Hundt
language : en
Publisher: Rodopi
Release Date : 2007

Category : Computers


Using the Web as Corpus is one of the recent challenges for corpus linguistics. This volume presents a current state-of-the-art discussion of the topic. The articles address practical problems, such as suitable linguistic search tools for accessing the WWW and the question of register variation, or probe into methods for culling data from the web. The book also offers a wide range of case studies, covering morphology, syntax, and lexis, as well as synchronic and diachronic variation in English. These case studies make use of the two approaches to the WWW in corpus linguistics: web-as-corpus and web-for-corpus-building. They demonstrate that web data can provide useful additional evidence for a broad range of research questions.



Building And Exploring Web Corpora Wac3 2007


Author : Cédrick Fairon
language : en
Publisher: Presses univ. de Louvain
Release Date : 2007

Category : Language Arts & Disciplines


WAC: More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and the use of crawled Web data for NLP purposes.

CLEANEVAL: Any use of Web data requires that it be cleaned in order to get rid of unwanted material, for example HTML markup, navigation bars, and advertisements. To date there has been no sharing of resources or expertise in this particular domain, and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.



The Web As Corpus


Author : Maristella Gatto
language : en
Publisher:
Release Date : 2014

Category : Computational linguistics




Qualitative Researching With Text Image And Sound


Author : Paul Atkinson
language : en
Publisher: SAGE
Release Date : 2000-06-22

Category : Social Science


`This excellent text will introduce advanced students - and remind senior researchers - of the availability of a broad range of techniques available for the systematic analysis of social data that is not numeric. It makes the key point that neither quantitative nor qualitative methods are interpretive and at the same time demonstrates once and for all that neither a constructivist perspective nor a qualitative approach needs to imply abandonment of rigor. That the chapters are written by different authors makes possible a depth of expertise within each that is unusually strong' - Susanna Hornig Priest, Texas A&M University; Author of `Doing Media Research' Qualitative Researching with Text, Image and Sound off



Developing Linguistic Corpora


Author : Martin Wynne
language : en
Publisher: Oxbow Books Limited
Release Date : 2005

Category : Language Arts & Disciplines


A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.



The Mihi Est Construction


Author : Mihaela Ilioaia
language : en
Publisher: Walter de Gruyter GmbH & Co KG
Release Date : 2023-12-18

Category : Foreign Language Study


This book examines the Romanian mihi est construction (Mi-e foame/frică, me.dat = is hunger/fear ‘I am hungry/afraid’). While the construction disappeared from all other Romance languages, where it was replaced by a habeo structure, the mihi est pattern is the most common way of expressing psychological or physiological states in Romanian. By means of synchronic and diachronic corpus studies, the book investigates the status of the core arguments of the mihi est structure, i.e. the dative experiencer and the nominative state noun, as well as its evolution over the centuries. The data analysis reveals that the dative experiencer behaves syntactically like a nominative subject, whereas the state noun shows predicate behavior. As for the evolution of the mihi est structure, the analysis shows a certain tendency toward innovation, since in present-day Romanian it can coerce nouns from other semantic fields into the construction’s psychological or physiological interpretation. Could this be another unique trait of Romanian, one that causes it to seemingly go against the tendency of most Romance languages toward canonical marking of core arguments?



Building And Using Comparable Corpora For Multilingual Natural Language Processing


Author : Serge Sharoff
language : en
Publisher: Springer Nature
Release Date : 2023-08-23

Category : Computers


This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.



Ai 2008 Advances In Artificial Intelligence


Author : Wayne Wobcke
language : en
Publisher: Springer Science & Business Media
Release Date : 2008-11-13

Category : Computers


This book constitutes the refereed proceedings of the 21st Australasian Joint Conference on Artificial Intelligence, AI 2008, held in Auckland, New Zealand, in December 2008. The 42 revised full papers and 21 revised short papers presented together with one invited lecture were carefully reviewed and selected from 143 submissions. The papers are organized in topical sections on knowledge representation, constraints, planning, grammar and language processing, statistical learning, machine learning, data mining, knowledge discovery, soft computing, vision and image processing, and AI applications.



Building And Using Comparable Corpora


Author : Serge Sharoff
language : en
Publisher: Springer Science & Business Media
Release Date : 2013-12-13

Category : Computers


The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining), this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. This volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.