Measurement Expression Annotator
Annotate numbers and measurement expressions in text. This pipeline recognises many types of measurement, including length, temperature, time and speed, and calculates their normalised values in SI units. These annotations are ideal for indexing with Mímir, which lets a query expressed in one unit match results expressed in another.
As a side-effect this pipeline also annotates tokens and sentences.
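To illustrate why normalisation makes cross-unit search possible, here is a minimal Python sketch of the idea: convert both the query and the annotated measurement to a value in the SI unit for their dimension, then compare. The conversion table and the `normalise` and `matches` functions are hypothetical illustrations, not the pipeline's or Mímir's actual code. Only linear units are covered; temperature scales such as Celsius or Fahrenheit would also need an offset.

```python
import math

# Hypothetical conversion table: unit -> (dimension, factor to the SI unit).
# Linear units only; affine scales such as Celsius would also need an offset.
TO_SI = {
    "mile": ("length", 1609.344),
    "kilometre": ("length", 1000.0),
    "metre": ("length", 1.0),
    "pound": ("mass", 0.45359237),
    "gram": ("mass", 0.001),
    "kilogram": ("mass", 1.0),
}


def normalise(value, unit):
    """Return (dimension, value expressed in the SI unit for that dimension)."""
    dimension, factor = TO_SI[unit]
    return dimension, value * factor


def matches(query, found, rel_tol=1e-9):
    """True if two (value, unit) pairs denote the same quantity."""
    q_dim, q_si = normalise(*query)
    f_dim, f_si = normalise(*found)
    # Different dimensions (e.g. length vs mass) can never match.
    return q_dim == f_dim and math.isclose(q_si, f_si, rel_tol=rel_tol)


print(matches((1, "mile"), (1.609344, "kilometre")))  # True
print(matches((1, "mile"), (1, "kilogram")))           # False
```

A real index would more likely store the SI value as a numeric field and answer range queries; the tolerance used here is an arbitrary choice for the sketch.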
Default annotations
:Measurement | Measurement expressions, with the following features:
  type | "scalar" for single measurements, or "interval" for intervals (e.g. "1 to 5 pounds")
  unit | The unit of the measurement (gram, mile, ...)
  value | The numeric value of the measurement quantity as specified in the text
  normalizedUnit | The "normalized" SI unit for the measurement (kilogram, metre, etc.)
  normalizedValue | The equivalent value of the measurement in the normalized unit. For interval measurements this feature is replaced by "normalizedMinValue" and "normalizedMaxValue", giving the end-points of the interval.
  dimension | The physical dimension of the measurement (speed, volume, area, time, etc.)
Additional annotations available if selected
:Sentence | Sentences detected by the sentence splitter
:Token | The individual tokens of the text
:Ratio | Expressions denoting a ratio rather than a simple measurement, typically percentages but also expressions like "300 parts per million"
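The difference between scalar and interval measurements shows up in the normalised features. The Python sketch below builds them for both cases using the documented feature names (`type`, `normalizedUnit`, `normalizedValue`, `normalizedMinValue`, `normalizedMaxValue`). The helper `normalised_features` and its `factor` parameter are hypothetical; it assumes a linear unit, so it would not handle temperatures, and it does not attempt to reproduce the other features.

```python
def normalised_features(values, factor, normalized_unit):
    """Hypothetical helper: build the normalised features of a Measurement.

    values: a 1-tuple for a scalar measurement, or (low, high) for an interval.
    factor: multiplier from the written unit to the SI unit (linear units only).
    """
    if len(values) == 1:
        return {
            "type": "scalar",
            "normalizedUnit": normalized_unit,
            "normalizedValue": values[0] * factor,
        }
    low, high = values  # raises ValueError for more than two values
    # An interval carries its two end-points instead of a single normalizedValue.
    return {
        "type": "interval",
        "normalizedUnit": normalized_unit,
        "normalizedMinValue": min(low, high) * factor,
        "normalizedMaxValue": max(low, high) * factor,
    }


# "1 to 5 pounds", using 1 lb = 0.45359237 kg
print(normalised_features((1, 5), 0.45359237, "kilogram"))
```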
Use this pipeline
You can process up to 1,200 documents per day free of charge using the REST API, at an average rate of 2 documents/sec. Higher quotas are available for research users by arrangement; contact us for details.
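At 2 documents/sec, the full daily allowance of 1,200 documents takes about 600 seconds (10 minutes) of calls. If you drive the REST API from a script, a client-side limiter helps you stay within both figures. The Python sketch below assumes that 2 documents/sec is treated as a ceiling and that the daily count starts fresh each day; the `RateLimiter` class is hypothetical and only paces calls, so your own request code goes after `acquire()`.

```python
import time


class RateLimiter:
    """Hypothetical client-side pacer for a per-second rate and a daily quota.

    The daily count is not reset automatically; create a new limiter each day.
    """

    def __init__(self, per_second=2.0, per_day=1200,
                 clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / per_second  # minimum gap between calls, seconds
        self.remaining = per_day
        self.clock = clock
        self.sleep = sleep
        self.next_time = None  # earliest time the next call may start

    def acquire(self):
        """Block until the next call is allowed, then consume one unit of quota."""
        if self.remaining <= 0:
            raise RuntimeError("daily quota exhausted")
        now = self.clock()
        if self.next_time is not None and now < self.next_time:
            self.sleep(self.next_time - now)
            now = self.next_time
        self.next_time = now + self.interval
        self.remaining -= 1


limiter = RateLimiter()
for doc in ["first document", "second document"]:
    limiter.acquire()
    # send `doc` to the pipeline's API endpoint here
```

The `clock` and `sleep` parameters make the limiter easy to test without real waiting.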
The API endpoint for this pipeline is:
You can process any amount of data with this pipeline on a pay-as-you-go basis, for £0.80 per hour. This can be data you upload yourself, data you collected from Twitter, or the results of a previous job.