---
title: "UDPipe Natural Language Processing - Universe"
author: "Jan Wijffels"
date: "`r Sys.Date()`"
output:
  html_vignette:
    fig_caption: false
    toc: false
vignette: >
  %\VignetteIndexEntry{UDPipe Natural Language Processing - Universe}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE, cache=FALSE}
options(width = 1000)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, comment = NA, eval = TRUE)
```


## udpipe universe

The udpipe package is loosely coupled with other NLP packages by the same author. Loosely coupled means that none of the packages have hard dependencies of one another making it easy to install and maintain and allowing you to use only the packages and tools that you want.

Hereby a small list of loosely coupled packages by the same author which contain functions and documentation where the udpipe package is used as a preprocessing step.

- **BTM**: Biterm Topic Modelling: available at https://CRAN.R-project.org/package=BTM 
- **crfsuite**: Build named entity recognition models using conditional random fields: https://CRAN.R-project.org/package=crfsuite
- **nametagger**: Build named entity recognition models using markov models: https://CRAN.R-project.org/package=nametagger
- **torch.ner**: Named Entity Recognition using torch
- **word2vec**: Training and applying the word2vec algorithm: https://CRAN.R-project.org/package=word2vec
- **doc2vec**: Building paragraph vector models also known as doc2vec https://CRAN.R-project.org/package=doc2vec
- **ruimtehol**: Text embedding techniques using Starspace: https://CRAN.R-project.org/package=ruimtehol
- **topicmodels.etm**: Topic Modelling in Embedding Spaces: https://CRAN.R-project.org/package=topicmodels.etm
- **brown**: Brown word clustering on texts
- **sentencepiece**: Byte Pair Encoding and Unigram tokenisation using sentencepiece: https://CRAN.R-project.org/package=sentencepiece
- **tokenizers.bpe**: Byte Pair Encoding tokenisation using YouTokenToMe: https://CRAN.R-project.org/package=tokenizers.bpe
- **text.alignment**: Find text similarities using Smith-Waterman: https://CRAN.R-project.org/package=text.alignment 
- **textplot**: Visualise complex relations in texts: https://CRAN.R-project.org/package=textplot
- **udpipe**: Tokenization, Parts of Speech Tagging, Lemmatization, Dependency Parsing, Keyword detection and NLP processing: https://CRAN.R-project.org/package=udpipe