Using DaCy#

To use the model you first have to download either the small, medium or large model. To see a list of all available models:

import dacy
for model in dacy.models():
   print(model)
# da_dacy_small_tft-0.0.0
# da_dacy_medium_tft-0.0.0
# da_dacy_large_tft-0.0.0

Note

The name of the indicated language (da), framework (dacy), model size (e.g. small), model type (tft),and model version (0.0.0)

From here we can now download a model using:

nlp = dacy.load("da_dacy_medium_tft-0.0.0")
# or equivalently
nlp = dacy.load("medium")

Which will download the model to the .dacy directory in your home directory. If the model is already downloaded the model will be loaded. To download the model to a specific directory:

# Just download
dacy.download_model("da_dacy_medium_tft-0.0.0", your_save_path)
# Download and load
nlp = dacy.load("da_dacy_medium_tft-0.0.0", your_save_path)

Using this we can now apply DaCy to text with conventional SpaCy syntax:

doc = nlp("DaCy er en hurtig og effektiv pipeline til dansk sprogprocessering bygget i SpaCy.")

See also

DaCy is built using SpaCy, hence you will be able to find a lot of the required documentation for using the pipeline in their very well written documentation on their website

Tagging named entities#

A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. DaCy can recognize organizations, persons, and location, as well as other miscellaneous entities.

for entity in doc.ents:
   print(entity, ":", entity.label_)
# DaCy : ORG
# dansk : MISC
# SpaCy : ORG

We can also plot these using:

from spacy import displacy
displacy.render(doc, style="ent")

See also

For more on named entity recognition see SpaCy’s documentation.

Named entity recognition using DaCy

Tagging parts-of-speech#

print("Token POS-tag")
for token in doc:
   print(f"{token}: {token.pos_}")
# Token POS-tag
# DaCy:              PROPN
# er:                AUX
# en:                DET
# hurtig:            ADJ
# og:                CCONJ
# effektiv:          ADJ
# pipeline:          NOUN
# til:               ADP
# dansk:             ADJ
# sprogprocessering: NOUN
# bygget:            VERB
# i:                 ADP
# SpaCy:             PROPN
# .:                 PUNCT

See also

For more on Part-of-speech tagging see SpaCy’s documentation.

Dependency parsing#

DaCy features a fast and accurate syntactic dependency parser. In DaCy this dependency parsing is also used for sentence segmentation and detecting noun chunks.

You can see the dependency tree using:

doc = nlp("DaCy er en effektiv pipeline til dansk fritekst.")

from spacy import displacy
displacy.render(doc)
Dependency parsing using DaCy

See also

For more on dependency parsing with DaCy, especially on how to navigate the tree, see SpaCy’s documentation.