narrata.compression.digits
Digit-level tokenization utilities.
References: [LLMTime] Gruver et al., "Large Language Models Are Zero-Shot Time Series Forecasters", NeurIPS 2023. arXiv:2310.07820
digit_tokenize(text, add_note=True)
Split each digit into its own token, with digits separated by spaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text` | `str` | Input text. | required |
| `add_note` | `bool` | Prefix output with a short marker when digit splitting is applied. | `True` |
Returns:
| Type | Description |
|---|---|
| `str` | Digit-tokenized text. |
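
To make the behaviour described above concrete, here is a minimal sketch of what a digit splitter of this kind might look like. It is not the library's implementation: the name `digit_tokenize_sketch`, the marker text in `NOTE`, and the choice to leave non-digit characters (including decimal points) untouched are all assumptions made for illustration.

```python
import re

# Hypothetical marker text; the actual note used by the library is not
# documented on this page.
NOTE = "[digits split] "


def digit_tokenize_sketch(text: str, add_note: bool = True) -> str:
    """Illustrative stand-in for digit_tokenize (assumed behaviour)."""
    # Insert a space at every position that sits between two ASCII digits,
    # so "123" becomes "1 2 3". Other characters are left as they are.
    out = re.sub(r"(?<=[0-9])(?=[0-9])", " ", text)
    # Assumption: the note is added only when the text actually changed.
    if add_note and out != text:
        return NOTE + out
    return out


print(digit_tokenize_sketch("sales rose from 120 to 345", add_note=False))
```

Under these assumptions, runs of digits are broken up so that a tokenizer is more likely to see one digit per token, which is the motivation given in the LLMTime reference; text without multi-digit runs passes through unchanged and receives no marker.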