Augmenters
dacy.augmenters.character
spaCy augmenters for character-level augmentation.
- dacy.augmenters.character.create_char_random_augmenter(doc_level: float, char_level: float, keyboard: Union[str, dacy.augmenters.keyboard.Keyboard] = 'QWERTY_EN') -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates an augmenter that replaces a character with a random character from the keyboard.
- Parameters
doc_level (float) – probability to augment document.
char_level (float) – probability to augment character, if document is augmented.
keyboard (str, Keyboard, optional) – A Keyboard object or a string naming a default keyboard from which replacement characters are sampled. Possible string values: “QWERTY_EN” (English QWERTY keyboard) and “QWERTY_DA” (Danish QWERTY keyboard). Defaults to “QWERTY_EN”.
- Returns
The augmenter function.
- Return type
Callable[[Language, Example], Iterator[Example]]
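The two probabilities act in sequence: doc_level decides whether a document is touched at all, and char_level is then applied to each character of an augmented document. The following is a minimal standard-library sketch of that sampling scheme on a plain string, not dacy's implementation. The names random_char_augment and KEYS are hypothetical, and KEYS only stands in for the characters of a keyboard layout.

```python
import random

# Hypothetical stand-in for the characters of a keyboard layout.
KEYS = "abcdefghijklmnopqrstuvwxyz"


def random_char_augment(text, doc_level, char_level, keys=KEYS, rng=None):
    """Replace characters of `text` with random characters drawn from `keys`."""
    rng = rng or random.Random()
    # Document-level gate: the text is left untouched with probability 1 - doc_level.
    if rng.random() >= doc_level:
        return text
    # Character-level gate: each character is replaced with probability char_level.
    return "".join(
        rng.choice(keys) if rng.random() < char_level else ch for ch in text
    )
```

With doc_level=0.0 the text is always returned unchanged, whatever the value of char_level.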
- dacy.augmenters.character.create_char_replace_augmenter(doc_level: float, char_level: float, replacement: dict) -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates an augmenter that replaces a character with a replacement taken from a user-supplied dictionary.
- Parameters
doc_level (float) – probability to augment document.
char_level (float) – probability to augment character, if document is augmented.
replacement (dict) – A dictionary mapping each character to its potential replacement, e.g. {“æ”: “ae”}.
- Returns
The augmenter function.
- Return type
Callable[[Language, Example], Iterator[Example]]
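As an illustration of dictionary-driven replacement, here is a small sketch on a plain string. It is not the library's code: replace_chars is a hypothetical helper, and it assumes each key maps to a single replacement string, as in the {“æ”: “ae”} example above.

```python
import random


def replace_chars(text, replacement, char_level, rng=None):
    """Replace characters that appear as keys in `replacement`."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        # Only characters listed in the dictionary are candidates for replacement.
        if ch in replacement and rng.random() < char_level:
            out.append(replacement[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Calling replace_chars("æble", {"æ": "ae"}, char_level=1.0) gives "aeble".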
- dacy.augmenters.character.create_char_swap_augmenter(doc_level: float, char_level: float) -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates an augmenter that swaps two characters in a token.
- Parameters
doc_level (float) – probability to augment document.
char_level (float) – probability to augment character, if document is augmented.
- Returns
The augmenter function.
- Return type
Callable[[Language, Example], Iterator[Example]]
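A character swap can be sketched on a single token as follows. This is an assumption-laden illustration rather than dacy's implementation: swap_chars is a hypothetical helper, and it assumes that the two swapped characters are adjacent, which the description above does not state.

```python
import random


def swap_chars(token, char_level, rng=None):
    """Swap neighbouring characters of `token` with probability `char_level`."""
    rng = rng or random.Random()
    chars = list(token)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < char_level:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so a character moves at most once
        else:
            i += 1
    return "".join(chars)
```

The result always contains the same characters as the input, only in a different order.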
- dacy.augmenters.character.create_keyboard_augmenter(doc_level: float, char_level: float, distance=1, keyboard: Union[str, dacy.augmenters.keyboard.Keyboard] = 'QWERTY_EN') -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates a document level augmenter using plausible typos based on keyboard distance.
- Parameters
doc_level (float) – probability to augment document.
char_level (float) – probability to augment character, if document is augmented.
distance (int, optional) – keyboard distance. Defaults to 1.
keyboard (str, Keyboard, optional) – A Keyboard object or a string naming a default keyboard. Possible string values: “QWERTY_EN” (English QWERTY keyboard) and “QWERTY_DA” (Danish QWERTY keyboard). Defaults to “QWERTY_EN”.
- Returns
The augmenter function.
- Return type
Callable[[Language, Example], Iterator[Example]]
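The idea of a plausible typo is that a character is replaced only by a key that lies close to it on the keyboard. The sketch below illustrates this with a hand-written neighbour table; NEIGHBOURS and keyboard_typos are hypothetical names, the table covers only three keys, and the real augmenter derives neighbours from a Keyboard object instead.

```python
import random

# Hypothetical, hand-written neighbour table for a few QWERTY keys.
NEIGHBOURS = {"a": "qwsz", "s": "awedxz", "d": "serfcx"}


def keyboard_typos(text, char_level, neighbours=NEIGHBOURS, rng=None):
    """Replace characters with a random neighbouring key."""
    rng = rng or random.Random()
    return "".join(
        rng.choice(neighbours[ch])
        if ch in neighbours and rng.random() < char_level
        else ch  # characters without known neighbours are kept as they are
        for ch in text
    )
```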
- dacy.augmenters.character.create_spacing_augmenter(doc_level: float, spacing_level: float) -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates an augmenter that removes spacing.
- Parameters
doc_level (float) – probability to augment document.
spacing_level (float) – probability to remove spacing, if document is augmented.
- Returns
The augmenter function.
- Return type
Callable[[Language, Example], Iterator[Example]]
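Spacing removal can be pictured as dropping each space independently with probability spacing_level. A minimal sketch on a plain string, with remove_spacing as a hypothetical helper; the real augmenter works on spaCy Example objects and also has to keep the annotations aligned, which this sketch ignores.

```python
import random


def remove_spacing(text, spacing_level, rng=None):
    """Drop each space in `text` with probability `spacing_level`."""
    rng = rng or random.Random()
    return "".join(
        "" if ch == " " and rng.random() < spacing_level else ch for ch in text
    )
```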
dacy.augmenters.danish
Danish-specific spaCy augmenters.
- dacy.augmenters.danish.create_æøå_augmenter(doc_level: float, char_level: float) -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Augments æøå into their spelling variants ae, oe, aa.
- Parameters
doc_level (float) – probability to augment document.
char_level (float) – probability to augment character, if document is augmented.
- Returns
The desired augmenter.
- Return type
Callable[[Language, Example], Iterator[Example]]
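The spelling variants correspond to a fixed character mapping. The sketch below applies that mapping deterministically to a whole string, which is roughly what doc_level=1 and char_level=1 would amount to. It is an illustration only: AEOEAA and spell_out are hypothetical names, and only lower-case letters are handled.

```python
# Mapping of the Danish letters to their spelling variants (lower case only).
AEOEAA = str.maketrans({"æ": "ae", "ø": "oe", "å": "aa"})


def spell_out(text):
    """Rewrite every æ, ø and å in `text` as ae, oe and aa."""
    return text.translate(AEOEAA)
```

For example, spell_out("blåbærgrød") gives "blaabaergroed".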
dacy.augmenters.person
Augmentation functions for spaCy that augment person (PER) entities.
- dacy.augmenters.person.augment_entity(entities: List[List[str]], ent_dict: Dict[str, List[str]], patterns: List[str], patterns_prob: Optional[List[float]], force_pattern_size: bool, keep_name: bool, prob: float) -> List[List[str]]
Augment entities. For each entity to augment, randomly sample a pattern and apply the transformation to the entity. See create_pers_augmenter for a description of the parameters.
Examples
>>> entities = [["Lasse", "Hansen"], ["Kenneth", "Christian", "Enevoldsen"]]
>>> ent_dict = {"first_name" : ["John", "Ole"], "last_name" : ["Eriksen"]}
>>> patterns = ["fn,ln", "abbpunct,ln"]
>>> augment_entity(entities, ent_dict, patterns, None, force_pattern_size=False, keep_name=True, prob=1)
[['L.', 'Hansen'], ['K.', 'Christian', 'Enevoldsen']]
>>> augment_entity(entities, ent_dict, patterns, None, force_pattern_size=True, keep_name=True, prob=1)
[['Lasse', 'Hansen'], ['K.', 'Christian']]
>>> augment_entity(entities, ent_dict, patterns, None, force_pattern_size=True, keep_name=False, prob=1)
[['Ole', 'Eriksen'], ['J.', 'Eriksen']]
>>> augment_entity(entities, ent_dict, patterns, None, force_pattern_size=False, keep_name=False, prob=1)
[['O.', 'Eriksen'], ['John', 'Eriksen', 'Enevoldsen']]
- Returns
Augmented names
- Return type
List[List[str]]
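How a single pattern acts on a single name when keep_name=True can be sketched as below. The behaviour is inferred from the examples above and is not taken from the library's source; apply_pattern is a hypothetical helper, and it does not cover sampling new names (keep_name=False) or names shorter than the pattern.

```python
def apply_pattern(name, pattern, force_pattern_size=False):
    """Apply a comma-separated pattern to a tokenised name, keeping the original tokens."""
    ops = pattern.split(",")
    out = []
    for token, op in zip(name, ops):
        if op == "abb":
            out.append(token[0])  # Lasse -> L
        elif op == "abbpunct":
            out.append(token[0] + ".")  # Lasse -> L.
        else:
            out.append(token)  # "fn" / "ln": keep the token as it is
    if not force_pattern_size:
        # Keep the tokens that the pattern did not cover.
        out.extend(name[len(ops):])
    return out
```

With force_pattern_size=True the name is cut to the length of the pattern, which is why ["Kenneth", "Christian", "Enevoldsen"] becomes ['K.', 'Christian'] under "abbpunct,ln".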
- dacy.augmenters.person.create_pers_augmenter(ent_dict: Dict[str, List[str]], patterns: List[str], force_pattern_size: bool, keep_name: bool, patterns_prob: Optional[List[float]] = None, prob: float = 1) -> Callable[[spacy.language.Language, spacy.training.example.Example], Iterator[spacy.training.example.Example]]
Creates a person augmenter.
- Parameters
ent_dict (Dict[str, List[str]]) – A dictionary with keys “first_name” and “last_name”. Values should be a list of names to sample from.
patterns (List[str]) – The patterns to replace names with. Each pattern is a string of comma-separated options. If more than one pattern is given, one is chosen at random, optionally weighted by patterns_prob. Options: “fn” = first name; “ln” = last name; “abb” = abbreviate to first character (e.g. Lasse -> L); “abbpunct” = abbreviate to first character including punctuation (e.g. Lasse -> L.). Patterns can be arbitrarily combined, e.g. [“fn,ln”, “abbpunct,ln,ln,ln”].
force_pattern_size (bool) – Whether to force entities to have the same format/length as the pattern. Defaults to False.
keep_name (bool) – Whether to use the current name or sample from ent_dict. I.e., if True, will only augment if the pattern is “abb” or “abbpunct”, if False, will sample new names from ent_dict. Defaults to True.
patterns_prob (List[float], optional) – Weights used when sampling a pattern. Defaults to None (equal weights).
prob (float, optional) – which proportion of entities to augment. Defaults to 1.
- Returns
The augmenter
- Return type
Callable[[Language, Example], Iterator[Example]]
>>> from dacy.dataset import danish_names
>>> name_dict = danish_names()
>>> pers_aug = create_pers_augmenter(name_dict, patterns=["fn,ln","abbpunct,ln"], force_pattern_size=True, keep_name=False)
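Choosing one pattern at random, optionally weighted by patterns_prob, can be sketched with the standard library. This only illustrates weighted sampling and is not claimed to be how the library does it; sample_pattern is a hypothetical helper.

```python
import random


def sample_pattern(patterns, patterns_prob=None, rng=None):
    """Pick one pattern; `patterns_prob=None` gives every pattern the same weight."""
    rng = rng or random.Random()
    return rng.choices(patterns, weights=patterns_prob, k=1)[0]
```

With patterns=["fn,ln", "abbpunct,ln"] and patterns_prob=[0.0, 1.0], the second pattern is always selected.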
dacy.augmenters.keyboard
Functions for character augmentation based on keyboard layout.
- class dacy.augmenters.keyboard.Keyboard(*, keyboard_array: Dict[str, List[List[str]]], shift_distance: int = 3)
Bases: pydantic.main.BaseModel
A Pydantic dataclass object for constructing a Keyboard setup.
- Parameters
keyboard_array (Dict[str, List[List[str]]]) – An array corresponding to a keyboard. This should include two keys, “default” and “shifted”, containing arrays of the non-shifted and shifted keys respectively.
shift_distance (int) – The distance given by the shift operator. Defaults to 3.
- Returns
A Keyboard object.
- Return type
Keyboard
- coordinate(key: str) -> Tuple[int, int]
Gets the coordinate of a key.
- Parameters
key (str) – keyboard key
- Returns
The coordinate of the key on the keyboard.
- Return type
Tuple[int, int]
- euclidian_distance(key_a: str, key_b: str) -> int
Returns the Euclidean distance between two keys.
- Parameters
key_a (str) – keyboard key
key_b (str) – keyboard key
- Returns
The Euclidean distance between two keyboard keys.
- Return type
int
- get_neighboors(key: str, distance: int = 1) -> Set[int]
Gets the neighbours of a key within a specified distance.
- Parameters
key (str) – A keyboard key
distance (int, optional) – The Euclidean distance to the neighbours. Defaults to 1.
- Returns
The neighbours of a key within a specified distance.
- Return type
Set[int]
- is_shifted(key: str) -> bool
Checks whether the key is shifted.
- Parameters
key (str) – keyboard key
- Returns
A boolean indicating whether the key is shifted.
- Return type
bool
- keyboard_array: Dict[str, List[List[str]]]
- shift_distance: int
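The coordinate, distance and neighbour lookups of the Keyboard class can be pictured with a plain grid of keys. The sketch below is a standalone illustration under stated assumptions, not the Keyboard class itself: ROWS is a hypothetical three-row layout in the shape of the “default” layer, the helper names are invented, and the shifted layer and shift_distance are left out entirely.

```python
import math

# Hypothetical three-row layout in the shape of the "default" layer.
ROWS = [list("qwertyuiop"), list("asdfghjkl"), list("zxcvbnm")]


def coordinate(key, rows=ROWS):
    """Return the (row, column) position of `key`."""
    for r, row in enumerate(rows):
        if key in row:
            return r, row.index(key)
    raise ValueError(f"{key!r} is not on the keyboard")


def distance(key_a, key_b, rows=ROWS):
    """Euclidean distance between the grid positions of two keys."""
    return math.dist(coordinate(key_a, rows), coordinate(key_b, rows))


def neighbours(key, max_distance=1, rows=ROWS):
    """All other keys whose distance to `key` is at most `max_distance`."""
    return {
        k
        for row in rows
        for k in row
        if k != key and distance(key, k, rows) <= max_distance
    }
```

In this grid, "s" sits at (1, 1), so its neighbours at distance 1 are the keys directly above, below, left and right of it: "w", "x", "a" and "d".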