DaCy#

DaCy is a Danish text processing pipeline built on spaCy. At the time of writing, it achieves state-of-the-art performance on part-of-speech (POS) tagging, named-entity recognition (NER) and dependency parsing for Danish.

This website contains the documentation for DaCy as well as an introduction to how to get started using DaCy for your project.

📰 News#

  • 1.2.0 (04/11/21)

    • Removed the DaNLP dependency. DaNLP models are now downloaded directly from Hugging Face’s model hub, which is faster and more stable 🌟.

    • Removed the readability module. We instead recommend the more extensive textdescriptives package, developed by [HLasse](https://github.com/HLasse) and me, for extracting readability and other text metrics.

    • Added support for configuring the default model location with the environment variable ‘DACY_CACHE_DIR’, thanks to a PR by dhpullack 🙏.

  • 1.1.0 (23/07/21)

    • DaCy is now available on the Hugging Face model hub 🤗, including detailed performance descriptions covering biases and robustness.

    • It also got a brand new online demo - try it out!

    • And more, including documentation update and prettier prints.

  • 1.0.0 (09/07/21)

    • DaCy version 1.0.0 is released as the first version on PyPI! 📦

      • Including a series of augmenters with a few specifically designed for Danish

      • Code for behavioural tests of NLP pipelines

      • And new tutorials for both 📖

    • A new beautiful hand-drawn logo 🤩

    • A behavioural test for biases and robustness in Danish NLP pipelines 🧐

    • DaCy is now officially supported by the Centre for Humanities Computing at Aarhus University

    • The first paper on DaCy is out as a preprint, together with code for reproducing it! 🌟

  • 0.4.1 (03/06/21)

    • DaCy now has a stunning documentation site 🌟

  • 0.3.1 (01/06/21)

    • DaCy’s tests now cover 99% of its codebase 🎉

    • DaCy’s test suite now runs on all major operating systems instead of just Linux 👩‍💻

  • 0.2.2 (25/05/21)

    • The new Danish model Senda was added to DaCy

  • 0.2.1 (30/03/21)

    • DaCy now includes a small model for efficient processing based on the Danish Ælæctra 🏃

  • 0.1.1 (24/03/21)

    • DaCy includes wrapped versions of major Danish sentiment analysis software, including the models by DaNLP, as well as code for wrapping any sequence classification model into its pipeline 🤩

    • Tutorials were added to introduce the above functionality

  • 0.0.1 (25/02/21)

    • DaCy launches with a medium-sized and a large language model achieving state-of-the-art performance on named-entity recognition, part-of-speech tagging and dependency parsing for Danish 🇩🇰
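
The 1.2.0 entry above mentions the ‘DACY_CACHE_DIR’ environment variable for changing the default model location. A minimal sketch of how it might be set from Python follows. The helper name `set_dacy_cache_dir` and the example path are hypothetical and not part of DaCy, and the sketch assumes the variable has to be in place before DaCy is imported.

```python
import os
from pathlib import Path


def set_dacy_cache_dir(path):
    """Point DACY_CACHE_DIR at `path`, creating the directory if needed.

    Hypothetical helper for illustration; it is not part of DaCy.
    """
    cache_dir = Path(path).expanduser().resolve()
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Only the variable name comes from the release notes. When DaCy
    # reads it is an assumption, so set it before importing dacy.
    os.environ["DACY_CACHE_DIR"] = str(cache_dir)
    return cache_dir


# Example usage (the path is a made-up location):
# set_dacy_cache_dir("~/dacy_models")
# import dacy  # imported only after the variable is set
```

Setting the variable in the shell before starting Python should have the same effect and avoids any dependence on import order.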

Contents#

The documentation is organized in three parts:

  • Getting Started contains the installation instructions, guides, and tutorials on how to use DaCy.

  • Performance contains a series of performance metrics and comparisons of DaCy and other Danish NLP pipelines.

  • Package References contains the documentation of each public class and function.