DaCy#
DaCy is a Danish text processing pipeline built using SpaCy. At the time of writing, it has achieved State-of-the-Art performance on part-of-speech (POS) tagging, named-entity recognition (NER) and Dependency parsing for Danish.
This website contains the documentation for DaCy as well as an introduction to how to get started using DaCy for your project.
📰 News#
1.2.0 (04/11/21)
Removed DaNLP dependency, now DaNLP models is downloaded directly from Huggingface’s model hub which is faster and more stable 🌟.
Removed the readability module, we instead recommend you use the more extensive textdescriptives package developed by [HLasse](https://github.com/HLasse) and I for extracting readability and other text metrics.
Added support for the configuring the default the model location with the environmental variable ‘DACY_CACHE_DIR’ thanks to a PR by dhpullack 🙏.
1.1.0 (23/07/21)
DaCy in now available on the Huggingface model hub 🤗 . Including detailed performance descriptions of biases and robustness.
It also got a brand new online demo - try it out!
And more, including documentation update and prettier prints.
1.0.0 (09/07/21)
DaCy version 1.0.0 releases as the first version to pypi! 📦
Including a series of augmenters with a few specifically designed for Danish
Code for behavioural tests of NLP pipelines
And new tutorials for both 📖
A new beautiful hand-drawn logo 🤩
A behavioural test for biases and robustness in Danish NLP pipelines 🧐
DaCy is now officially supported by the Centre for Humanities Computing at Aarhus University
The first paper on DaCy; check it out as a preprint and code for reproducing it here! 🌟
0.4.1 (03/06/21)
DaCy now has a stunningly looking documentation site 🌟
0.3.1 (01/06/21)
DaCy’s tests now cover 99% of its codebase 🎉
DaCy’s test suite is now being applied for all major operating systems instead of just Linux 👩💻
0.2.2 (25/05/21)
The new Danish Model Senda was added to DaCy
0.2.1 (30/03/21)
DaCy now includes a small model for efficient processing based on the Danish Ælæctra 🏃
0.1.1 (24/03/21)
DaCy includes a wrapped version of major Danish sentiment analysis software including the models by DaNLP, as well as code for wrapping any sequence classification model into its pipeline 🤩
Tutorials is added to introduce the above functionality
0.0.1 (25/02/21)
DaCy launches with a medium-sized and a large language model obtaining state-of-the-art on Named entity recognition, part-of-speech tagging and dependency parsing for Danish 🇩🇰
Contents#
The documentation is organized in three parts:
Getting Started contains the installation instructions, guides, and tutorials on how to use DaCy.
Performance contains a series of performance metrics and comparisons of DaCy and other Danish NLP pipelines.
Package References contains the documentation of each public class and function.
Getting Started
Performance
Package References