Changes in version 0.8.16 (2026-01-30)                 

  - Fix on the declaration of persistent_unordered_map for C++20

                 Changes in version 0.8.15 (2025-11-28)                 

  - Drop C++11 from Makevars

                 Changes in version 0.8.14 (2025-11-26)                 

  - Add comment section in Authors@R and put aut instead of ctb for
    udpipe.cpp part

                 Changes in version 0.8.13 (2025-11-26)                 

  - fix load of misaligned address and UBSan messages reported by CRAN

                 Changes in version 0.8.12 (2025-09-04)                 

  - avoid warning: overlapping comparisons always evaluate to true in
    parse_int
      - replacing: !(str.str[0] >= '0' || str.str[0] <= '9') with
        (str.str[0] < '0' || str.str[0] > '9')
  - fix some R CMD check NOTEs about the use of itemize in the
    documentation
  - fix a vignette index entry NOTE
  - fix of an URL in the documentation

                 Changes in version 0.8.11 (2023-01-06)                 

  - replace move with std::move to fix R CMD check warning on recent
    versions of clang compilers

                 Changes in version 0.8.10 (2022-11-10)                 

  - use snprintf instead of sprintf to handle the R CMD check
    deprecating note on M1mac
  - reduction of timings of the examples of document_term_matrix,
    document_term_frequencies, document_term_frequencies_statistics,
    cooccurrence, dtm_bind, keywords_collocation

                 Changes in version 0.8.9 (2022-03-24)                  

  - fix R CMD check message on Fedora clang infrastructure:
    rcpp_udpipe.cpp:243:8: warning: use of bitwise '&' with boolean
    operands

                 Changes in version 0.8.8 (2021-12-02)                  

  - dtm_svd_similarity, fix to make sure that if provided a dtm with
    features which are all missing/zero, the scoring still works as
    expected instead of removing features which contain no data
    whatsoever. So that dtm_svd_similarity can be used alongside
    embeddings of R package word2vec which might contain words which are
    not in the dtm. See the example in ?dtm_svd_similarity
  - added txt_grepl
  - dtm_align now uses NCOL to see if y is a vector instead of a
    data.frame

                        Changes in version 0.8.7                        

  - txt_count now always returns an integer, even if in the border case
    where a character vector of length 0 is supplied

                 Changes in version 0.8.6 (2021-06-01)                  

  - Downloading models to paths containing non-ASCII characters now
    works (issue #95)
  - strsplit.data.frame gains ... which are passed on to strsplit (e.g.
    to use fixed=TRUE for speeding up)
  - read_connlu is now using fixed=TRUE when splitting by newline symbol
    (for speeding up parsing with function udpipe)
  - Added txt_paste
  - Added txt_context
  - Use html_vignette instead of html_document in the vignettes in order
    to reduce package size

                 Changes in version 0.8.5 (2020-12-10)                  

  - Added document_term_matrix.default, document_term_matrix.integer and
    document_term_matrix.numeric
  - Added groups argument to dtm_colsums and dtm_rowsums
  - Added dtm_align
  - Added dtm_sample
  - Added document_term_matrix.matrix
  - dtm_cbind and dtm_rbind allow to pass more than 2 sparse matrices
  - cbind_morphological gains argument which to specify which
    morphological features to extract
  - txt_count now returns NA when NA is provided instead of an error
  - txt_contains now returns NA when NA is provided instead of FALSE,
    unless value is set to TRUE
  - txt_collapse now also works if provided a list of character vectors
  - paste.data.frame now works as well if a data.table is passed instead
    of a data.frame
  - txt_recode gains an extra argument na.rm

                Changes in version 0.8.4-1 (2020-10-12)                 

  - Fixing the Solaris compilation issue in
    ufal::udpipe::multiword_splitter::append_token

                 Changes in version 0.8.4 (2020-10-10)                  

  - Update to UDPipe 1.2.1 (28 Sep 2018)
      - this adds segment_size and learning_rate_final parameters to
        tokenizer training
      - correctly set SpaceAfter for last token when normalizing spaces.
  - Default of udpipe_download_model is now changed, downloads now
    models built on Universal Dependencies 2.5 instead of the models
    build on Universal Dependencies 2.4
  - Added txt_count
  - Added txt_overlap
  - Added dtm_conform
  - Added dtm_chisq
  - Added dtm_svd_similarity
  - Added as_fasttext
  - Added unlist_tokens
  - txt_recode_ngram now also works gracefully in case ngram is set to 1
    although the intention is not to use it when ngram is set to 1
  - Experimental changes regarding cbind_dependencies which might change
    in a subsequent release.
      - cbind_dependencies now has been implementend for type 'child'.
      - cbind_dependencies now allows to add row numbers of the parent
        or children where the token is linked to using the dependency
        parsing output.
  - Experimental and unfinished work on allowing to easily query
    dependency relations

                 Changes in version 0.8.3 (2019-07-05)                  

  - Default of udpipe_download_model is now changed, downloads now
    models built on Universal Dependencies 2.4 instead of the models
    build on Universal Dependencies 2.3
  - also allow strsplit.data.frame to work if the data argument is a
    data.table
  - in case the model loaded with udpipe_load_model is a nil pointer
    (most likely due to users which restarted their R sessions without
    knowing), try reloading the model file in udpipe_annotate
  - fix issue in udpipe_reconstruct giving wrong values in start/end
    positions of the token in case someone had as well SpacesBefore as
    SpacesAfter for a token. For users prior to version 0.8.3 you can
    easily circumvent this issue by removing leading/trailing white
    space in your text by using trimws on your text before using
    udpipe::udpipe.
  - document_term_matrix now gains argument weight allowing to select
    another column to put into the matrix cells
  - add txt_contains

                 Changes in version 0.8.2 (2019-05-29)                  

  - udpipe::udpipe now gains 2 arguments: parallel.cores and
    parallel.chunksize in order to annotate in parallel over your CPU
    cores.
  - document_term_matrix.data.frame now preserves order of the documents
    (issue #44)
  - dtm_remove_lowfreq, dtm_remove_tfidf, dtm_remove_terms gain extra
    argument remove_emptydocs explicitely add drop=FALSE to internal
    dtm_... calls
  - add dtm_remove_sparseterms (issue #44)
  - make sure downloading model fails gracefully if github internet
    resource is not available on CRAN machines
  - udpipe_download_model now also returns
    download_failed/download_message indicating if the download failed
    due to internet connectivity issues

                 Changes in version 0.8.1 (2019-02-15)                  

  - Allow to pass on a .udpipe filename in udpipe_download_model
  - Update documentation on keywords_collocation
  - Added strsplit.data.frame and paste.data.frame

                  Changes in version 0.8 (2018-12-09)                   

  - Default of udpipe_download_model is now changed, downloads now
    models built on Universal Dependencies 2.3 instead of the models
    build on Universal Dependencies 2.0
  - Incorporate models from Universal Dependencies 2.3 released
    on 2018-11-15
  - Incorporate models from conll18 shared task baseline built on
    Universal Dependencies 2.2
  - In case someone uses document_term_frequencies.character incorrectly
    with double document identifiers, make sure this is handled
  - txt_recode now returns x if the length of x is 0
  - added txt_sentiment
  - added txt_previousgram

                  Changes in version 0.7 (2018-09-10)                   

  - Allow to reconstruct the original text + allow to add a start/end
    field in as.data.frame (useful but undocumented feature). Set up
    mainly to be used with the crfsuite R package
  - Added txt_tagsequence
  - Added 1 general function called udpipe which does annotation of data
    in TIF format.
  - Add option in udpipe_download_model to download the model only it
    does not exist on disk
  - Loaded model are put into an environment such that users of the
    function udpipe do not need to care about loading

                 Changes in version 0.6.1 (2018-07-30)                  

  - src/udpipe.cpp: at the request of CRAN: remove dynamic execution
    specification which g++-7 and later complain about by removing the
    throw statements
  - add ctb role to authors Milan and Jana in DESCRIPTION

                  Changes in version 0.6 (2018-05-14)                   

  - Added cbind_morphological and cbind_dependencies
  - Allow to show progress in udpipe_annotate
  - txt_nextgram now does not paste NA's together in case someone would
    use it with missing text data
  - Add example on only doing pos tagging and dependency parsing and
    excluding tokenisation
  - Fix gcc8 message: warning: 'char* strncpy(char*, const char*,
    size_t)' specified bound 15 equals destination size
    [-Wstringop-truncation]

                  Changes in version 0.5 (2018-03-12)                   

  - Added txt_recode_ngram for recoding tokens with compound multi-word
    expressions
  - Fix to make sure as.data.frame.udpipe_connlu also works with
    data.table version 1.9.6. Fixes issue #16
  - Allow keywords_rake to use in group a character vector of column
    names
  - Added a vignette on the use of the package to do topic modelling
    using the POS tags and multi-word expressions
  - Add example of correlation analysis in vignette on 'Basic Analytical
    Use Cases'
  - dtm_remove_lowfreq to uses minfreq as lower bound

                  Changes in version 0.4 (2018-02-07)                   

  - Fix R CMD check on clang-UBSAN: UndefinedBehaviorSanitizer (runtime
    error: reference binding to misaligned address)
  - Add more documentation on required UTF-8 encoding
  - Add as_conllu
  - Add as_word2vec
  - Add as.data.table.udpipe_conllu for convenience
  - Add keywords_rake and keywords_collocation
  - Exported also keywords_collocation and keywords_phrases
  - Add document_term_frequencies_statistics
  - Add boilerplate functions dtm_rowsums and dtm_colsums
  - Make output of keywords_collocation, keywords_rake and
    keywords_phrases consistent
  - Allow cooccurrence.data.frame to provide a vector of groups
  - Added another vignette

                  Changes in version 0.3 (2018-01-15)                   

  - Add docusaurus site
  - udpipe_download_model gains and extra argument called
    udpipe_model_repo to allow to download models mainly released under
    CC-BY-SA from https://github.com/bnosac/udpipe.models.ud
  - Add udpipe_accuracy
  - Add dtm_rbind and dtm_cbind
  - Add udpipe_read_conllu to simplify creating wordvectors
  - Allow to provide several fields in document_term_frequencies to
    easily allow to include bigrams/trigrams/... for topic modelling
    purposes e.g. alongside the textrank package or alongside
    collocation
  - Adding Serbian + Afrikaans
  - Fixing UBSAN messages (misaligned addresses)
  - If user has R version < 3.3.0, use own startsWith function instead
    of base::startsWith

                 Changes in version 0.2.2 (2017-12-07)                  

  - Another stab at fixing the Solaris compilation issue in
    ufal::udpipe::multiword_splitter::append_token

                 Changes in version 0.2.1 (2017-12-06)                  

  - Added phrases to extract POS sequences more easily like noun
    phrases, verb phrases or any sequence of parts of speech tags and
    their corresponding words
  - Fix issue in txt_nextgram if n was larger than the number of
    elements in x
  - Fix heap-use-after-free address sanitiser issue
  - Fix runtime error: null pointer passed as argument 1, which is
    declared to never be null (e.g. udpipe.cpp: 3338)
  - Another stab at the Solaris compilation issue

                  Changes in version 0.2 (2017-11-13)                   

  - Added data preparation elements for standard text mining flows
    namely: cooccurrence collocation document_term_frequencies
    document_term_matrix dtm_tfidf dtm_remove_terms dtm_remove_lowfreq
    dtm_remove_tfidf dtm_reverse dtm_cor txt_collapse txt_sample
    txt_show txt_highlight txt_recode txt_previous txt_next txt_nextgram
    unique_identifier
  - Added predict.LDA_VEM and predict.LDA_Gibbs
  - Renamed dataset annotation_params to udpipe_annotation_params
  - Added example datasets called brussels_listings, brussels_reviews,
    brussels_reviews_anno
  - Use path.expand on conll-u files which are used for training
  - udpipe_download_model now downloads from
    https://raw.githubusercontent.com/jwijffels/udpipe.models.ud.2.0/master
    instead of
    https://github.com/jwijffels/udpipe.models.ud.2.0/raw/master

                        Changes in version 0.1.2                        

  - Remove logic of UDPIPE_PROCESS_LOG (using Rcpp::Rout instead). This
    fixes issue detected with valgrind about ofstream

                 Changes in version 0.1.1 (2017-09-13)                  

  - Fix issue on Solaris builds at CRAN, namely: error: expected
    primary-expression before ‘enum’
  - Use ufal::udpipe namespace directly
  - Documentation fixes

                         Changes in version 0.1                         

  - Initial release based on UDPipe commit
    a2ebb99d243546f64c95d0faf36882bb1d67a670
  - Allow to do annotation (tokenisation, POS tagging, Lemmatisation,
    Dependency parsing)
  - Allow to build your own UDPipe model based on data in CONLL-U format
  - Convert the output of udpipe_annotate to a data.frame
  - Allow to download models from
    https://github.com/jwijffels/udpipe.models.ud.2.0
  - Add vignettes