An array of TextCollectionAttachment objects to be labeled.
The video attachment should have content that is a link. Supported media types are listed on the MDN Web Docs.
Customers can pass Markdown as the string content when creating a job in TextCollection. The Markdown syntax also supports HTML tags.
To protect the TextCollection platform, all HTML tags inside the Markdown are sanitized with the HTML-sanitize JavaScript package, which removes every tag except the permitted tags listed below.
Any tag that is not on the permitted list is stripped from the string during sanitization, so only approved tags are presented to the tasker.
Category | Permitted tags |
---|---|
Content sectioning | address, article, aside, footer, header, h1, h2, h3, h4, h5, h6, hgroup, main, nav, section |
Inline text semantics | a, abbr, b, bdi, bdo, br, cite, code, data, dfn, em, i, kbd, mark, q, rb, rp, rt, rtc, ruby, s, samp, small, span, strong, sub, sup, time, u, var |
Table content | caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr |
Additional tags | img, iframe |
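The allowlist behavior described above can be illustrated with a minimal Python sketch. This is not the platform's sanitizer: `ALLOWED_TAGS` is a hypothetical subset of the permitted tags, attributes are dropped for brevity, and the text inside removed tags is kept.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical subset of the permitted tags; extend it from the list above.
ALLOWED_TAGS = {"a", "b", "code", "em", "h1", "span", "strong", "table", "td", "tr"}


class AllowlistSanitizer(HTMLParser):
    """Keeps allowlisted tags (without attributes) and drops all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so it cannot be interpreted as markup.
        self.parts.append(escape(data, quote=False))


def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(sanitize("<strong>Read</strong> the <blink>whole</blink> text"))
```

Running a check like this locally before submitting content may help predict which tags survive, although the platform's own sanitizer remains the source of truth.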
Parameter | Type | Description |
---|---|---|
type* | string | One of pdf, image, text, video, website, or audio. |
content* | string | Content or link to relevant file. |
forms | array | Array of field_id strings from FormField. If this value is set, only show the corresponding attachment if one of the referenced form fields is active. |
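As a rough illustration of the parameters in the table above, the following Python sketch checks an attachment object before it is sent. The helper name `validate_attachment` is hypothetical, and the checks only mirror the table; the API may enforce additional rules.

```python
# Attachment types listed in the parameter table.
ATTACHMENT_TYPES = {"pdf", "image", "text", "video", "website", "audio"}


def validate_attachment(attachment):
    """Return a list of problems found in one attachment dict (empty if none)."""
    errors = []
    if attachment.get("type") not in ATTACHMENT_TYPES:
        errors.append("type must be one of " + ", ".join(sorted(ATTACHMENT_TYPES)))
    content = attachment.get("content")
    if not isinstance(content, str) or not content:
        errors.append("content must be a non-empty string")
    forms = attachment.get("forms")
    if forms is not None:
        if not isinstance(forms, list) or not all(isinstance(f, str) for f in forms):
            errors.append("forms must be an array of field_id strings")
    return errors


# Example: a text attachment shown only while the form "form_query" is active.
print(validate_attachment({"type": "text", "content": "**Hello**", "forms": ["form_query"]}))
```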
UnitField objects define simple components for data collection.
There are situations where a field should only appear if certain options are chosen for other fields. In these situations, the conditions—the dependent questions and matching sets of options—can be specified.
The conditions property is an array of objects, each of which defines one set of conditions under which the field is shown. The operators AND ( { } ), OR ( [ ] ), and NOT ( not ) are supported, so you can specify an arbitrary combination of fields and choices. Within a set, each key is the field_id of a field this one depends on, and each value lists the choices that satisfy the condition.
See the example below for sample conditions. Currently, conditions can only depend on fields of type CategoryField. The syntax is accepted for other field types, but it may cause problems or undefined behavior.
type string required
One of text, boolean, number, datetime, category, select, or time_range.
field_id string required
A unique identifier for the field, which should not change among tasks within a project.
title string required
Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. Must not be an empty string.
description string
A brief description about what the response should be. This may change among tasks within a project.
hint string
Longer explanation of why the field exists and how it should be used. Renders as a tooltip.
required boolean
Determines whether or not a response for this field is required. The default is false.
min_responses_required integer
The minimum number of separate annotations allowed for this field. Must be larger than 0. The default is 1.
max_responses_required integer
The maximum number of separate annotations allowed for this field. Must be larger than or equal to min_responses_required, with an upper bound of 100. The default is 1.
conditions array of objects
A set of conditions which must be satisfied for this field to be shown. Default is undefined.
Additional Fields objects
See the Text Field, Boolean Field, Number Field, Datetime Field, and Category Field sections.
Example
// Example of UnitField objects with conditions
{
  type: "category",
  field_id: "occlusion",
  title: "Is there occlusion in the image?",
  choices: [
    {label: 'None', value: '0'},
    {label: 'A little', value: '1'},
    {label: 'A lot', value: '2'},
  ],
  conditions: [{}],
},
{
  type: "category",
  field_id: "occlusion_detail",
  title: "What is the cause of the occlusion?",
  choices: [
    {label: 'Rain', value: 'rain'},
    {label: 'Shadow', value: 'shadow'},
  ],
  conditions: [{
    occlusion: ['1', '2'], // show if 1 or 2 are selected
    // equivalently {not: [[], ['0']]}
    // equivalently [{not: []}, {not: ['0']}]
    // equivalently [['1'],['2']]
  }],
},
{
  type: "text",
  field_id: "a_lot_of_shadow",
  title: "Please describe why there is so much shadow.",
  conditions: [{
    // show if 2 and shadow are selected in their respective fields
    occlusion: ['2'],
    occlusion_detail: ['shadow'],
  }],
},
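To make the AND/OR reading of the example above concrete, here is a minimal Python sketch of how such conditions might be evaluated. It is an assumption-laden illustration, not the platform's logic: the function name `is_visible` is hypothetical, only the simple form (a field_id mapped to a flat list of choice values) is handled, and the `not` operator and nested arrays are deliberately left out.

```python
def is_visible(conditions, selections):
    """Decide whether a field would be shown.

    conditions: list of condition sets, each a dict of field_id -> allowed values.
    selections: dict of field_id -> list of values the tasker has selected.
    Assumption: a field with no conditions is always shown.
    """
    if not conditions:
        return True
    return any(                      # any one condition set is enough
        all(                         # every field in the set must match (AND)
            any(value in selections.get(field_id, []) for value in allowed)  # OR
            for field_id, allowed in condition_set.items()
        )
        for condition_set in conditions
    )


# Conditions of the "a_lot_of_shadow" field from the example above.
shadow_conditions = [{"occlusion": ["2"], "occlusion_detail": ["shadow"]}]
print(is_visible(shadow_conditions, {"occlusion": ["2"], "occlusion_detail": ["shadow"]}))
```

An empty condition set such as `[{}]` passes trivially in this sketch, which matches the first field in the example always being displayed.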
A text field (type text) is a subclass of UnitField and returns a string response.
max_characters integer
The maximum number of characters allowed in the field.
show_word_counter boolean
Set to true to display a word count in the text field.
show_markdown_preview boolean
Set to true to enable a Markdown preview for the text field.
max_tokens integer
The maximum number of words allowed in a response. For example, `max_tokens = 1000` limits a response to 1,000 words.
min_tokens integer
The minimum number of words required in a response. For example, `min_tokens = 100` requires a response of at least 100 words.
disable_pasting boolean
Set to true to disable pasting into the text field.
Example
{
  "type": "text",
  "field_id": "summary",
  "title": "Summary",
  "min_responses_required": 1,
  "max_responses_required": 3,
  "max_characters": 500,
  "required": true
}
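The length limits of a text field can be checked client-side with a sketch like the one below. It uses the key spellings that appear in the example and inline snippets (`max_characters`, `min_tokens`, `max_tokens`); the helper name `check_text_response` is hypothetical, and counting words by splitting on whitespace is an assumption about how the platform counts them.

```python
def check_text_response(field, text):
    """Return a list of limit violations for one text response (empty if none)."""
    problems = []
    word_count = len(text.split())  # assumption: words are whitespace-separated
    if "max_characters" in field and len(text) > field["max_characters"]:
        problems.append("too many characters")
    if "min_tokens" in field and word_count < field["min_tokens"]:
        problems.append("too few words")
    if "max_tokens" in field and word_count > field["max_tokens"]:
        problems.append("too many words")
    return problems


summary_field = {"type": "text", "field_id": "summary", "max_characters": 500, "min_tokens": 3}
print(check_text_response(summary_field, "Too short"))
```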
A boolean field (type boolean) is a subclass of UnitField and returns a boolean response. It has no additional parameters.
Example
{
  "type": "boolean",
  "field_id": "availability",
  "title": "Item Availability",
  "description": "Choose true if available."
}
A number field (type number) is a subclass of UnitField and returns the annotated number as a string response.
use_slider boolean
Set to true to use a slider instead of textbox.
min float
Sets the minimum value of the slider.
max float
Sets the maximum value of the slider.
step float
Sets the step value of the slider.
prefix string
A string label for the lowest numerical value response.
suffix string
A string label for the greatest numerical value.
mid_label string
A string label for the middle numerical value.
Example
{
  "type": "number",
  "field_id": "item_price",
  "title": "Item Price",
  "description": "Leave empty if not applicable.",
  "required": false,
  "use_slider": true,
  "min": 0,
  "max": 100
}
A datetime field (type datetime) is a subclass of UnitField and returns a DatetimeAnnotation response.
DatetimeSpec: an enum that consists of year, month, day, hour, and minute.
DatetimeAnnotation: an interface that contains optional number fields for year, month, day, hour, and minute.
include array of objects required
An array of DatetimeSpec elements. Must contain at least one element.
Example
{
  "type": "datetime",
  "field_id": "release_date",
  "title": "Date of Product Release",
  "description": "Leave empty if not applicable.",
  "include": ["year", "month", "day"],
  "defaults": {
    "year": 2021,
    "month": 4,
    "day": 13
  }
}
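The relationship between `include` and the resulting annotation can be sketched in Python as follows. The helper `default_annotation` is hypothetical, and the idea that `defaults` (which appears only in the example above) pre-fills the included components is an assumption.

```python
# The DatetimeSpec values named in this section.
DATETIME_SPECS = ("year", "month", "day", "hour", "minute")


def default_annotation(field):
    """Build a DatetimeAnnotation-like dict from a datetime field definition."""
    include = field.get("include", [])
    if not include:
        raise ValueError("include must contain at least one element")
    unknown = [spec for spec in include if spec not in DATETIME_SPECS]
    if unknown:
        raise ValueError(f"unknown DatetimeSpec values: {unknown}")
    defaults = field.get("defaults", {})
    # Only components listed in include are kept; all of them are optional.
    return {spec: defaults[spec] for spec in include if spec in defaults}


release_date = {
    "type": "datetime",
    "field_id": "release_date",
    "include": ["year", "month", "day"],
    "defaults": {"year": 2021, "month": 4, "day": 13},
}
print(default_annotation(release_date))
```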
A category field (type category) is a subclass of UnitField and returns an array of the selected CategoryChoiceValue elements in its response. CategoryChoice elements that have subchoices are used only for navigation; only CategoryChoice elements without subchoices are selectable.
choices array of objects required
An array of CategoryChoice elements to define the relevant choice.
min_choices integer
Minimum number of choices to select.
max_choices integer
Maximum number of choices to select. If this value is greater than 1, the form renders a checkbox. Otherwise, it renders a radio button.
Each CategoryChoice element has the following fields:
label string required
The label of the choice field. This description may change among tasks within a project.
value CategoryChoiceValue
The value of the choice field. Must be a string, number, or boolean.
hint string
The tooltip text shown for this choice.
subchoices array of objects
An array of CategoryChoice elements to define the relevant subchoices.
Example
{
  "type": "category",
  "field_id": "genre",
  "title": "Select all genres that apply.",
  "choices": [
    {
      "label": "Hip-Hop/Rap",
      "value": "hip-hop-rap",
      "hint": "It consists of a stylized rhythmic music that commonly accompanies rapping, a rhythmic and rhyming speech that is chanted.",
      "subchoices": [
        { "label": "Dirty South", "value": "dirty-south" },
        { "label": "Industrial Hip Hop", "value": "industrial-hip-hop" },
        { "label": "Nerdcore", "value": "nerdcore" },
        { "label": "Rap", "value": "rap" }
      ]
    },
    {
      "label": "R&B/Soul",
      "value": "rb-soul",
      "subchoices": [
        { "label": "Disco", "value": "disco" },
        { "label": "Funk", "value": "funk" },
        { "label": "Motown", "value": "motown" }
      ]
    }
  ],
  "min_choices": 1,
  "max_choices": 5
}
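Because only choices without subchoices are selectable, it can be useful to list the values a tasker can actually pick. The following Python sketch walks the choice tree recursively; the function name `selectable_values` is hypothetical.

```python
def selectable_values(choices):
    """Collect the values of all CategoryChoice elements that have no subchoices."""
    values = []
    for choice in choices:
        subchoices = choice.get("subchoices")
        if subchoices:
            # A choice with subchoices is navigation only, so descend into it.
            values.extend(selectable_values(subchoices))
        else:
            values.append(choice["value"])
    return values


genre_choices = [
    {
        "label": "R&B/Soul",
        "value": "rb-soul",
        "subchoices": [
            {"label": "Disco", "value": "disco"},
            {"label": "Funk", "value": "funk"},
        ],
    },
    {"label": "Other", "value": "other"},  # hypothetical top-level leaf choice
]
print(selectable_values(genre_choices))
```

Note that the parent value `rb-soul` never appears in the result, which is consistent with parents being used for navigation only.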
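A time range field (type time_range) is a subclass of UnitField.
default_seconds array of integers required
The default range, given as two values in seconds. Must have length 2, with each value in the range [0, 24 * 60 * 60].
increment_seconds integer
The step between selectable values, in seconds. Must be between 1 and 60 * 60.
default_from_field string
The field_id of another field to take the default from. Must be a valid field_id.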
Example
{
  "type": "time_range",
  "field_id": "hours",
  "title": "Store Hours",
  "default_seconds": [
    28800,
    72000
  ],
  "increment_seconds": 300,
  "max_responses_required": 2,
  "min_responses_required": 0
}
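The numeric limits above are easier to read as clock times. The sketch below validates a default range and an increment, and formats the values as HH:MM. Both helper names are hypothetical, and reading the values as seconds since midnight is an assumption based on the upper bound of 24 * 60 * 60.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def format_seconds(seconds):
    """Format a second count as HH:MM (assumes seconds since midnight)."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours:02d}:{remainder // 60:02d}"


def check_time_range(default_seconds, increment_seconds=None):
    """Return a list of violations of the documented limits (empty if none)."""
    errors = []
    if len(default_seconds) != 2:
        errors.append("the default range must have length 2")
    if any(not 0 <= value <= SECONDS_PER_DAY for value in default_seconds):
        errors.append("values must be in [0, 24 * 60 * 60]")
    if increment_seconds is not None and not 1 <= increment_seconds <= 60 * 60:
        errors.append("increment_seconds must be between 1 and 60 * 60")
    return errors


# Values from the "Store Hours" example: 28800 and 72000 seconds.
print(format_seconds(28800), format_seconds(72000))
print(check_time_range([28800, 72000], 300))
```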
A select field (type select) is a subclass of UnitField.
choices array of objects
An array of selectable options. Not required if choices_from_field is present.
choices_from_field string
The field_id of the field that supplies the choices. Must be a valid field_id.
Example
{
  "type": "select",
  "field_id": "sentiment",
  "title": "Sentiment",
  "description": "Choose a sentiment that best describes this text",
  "required": true,
  "choices_from_field": "Options"
}
RankingField objects let you define a task for ranking task attachments. Returns a list response with the options in ranked order.
title string
A brief description about what the response should be. This may change among tasks within a project.
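hint string
A longer explanation shown to taskers, such as the direction of the ranking.
first_label string
The label for the first position in the ranking, such as "Best".
last_label string
The label for the last position in the ranking, such as "Worst".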
num_items_to_rank integer
The number of options required to rank (can be less than number of attachments).
required boolean
Determines whether or not all num_items_to_rank fields must be filled.
Example
{
  "type": "ranking_order",
  "field_id": "relevance_ranking",
  "title": "Rank titles based on their relevance to the article",
  "hint": "From the most relevant to the least one",
  "first_label": "Best",
  "last_label": "Worst",
  "num_items_to_rank": 3
}
FormField objects let you create multiple mini-forms with different attachments. The child fields of the object populate these mini-forms. Returns a dictionary response of key-value pairs defined by its child fields.
type string required
Must be form.
field_id string required
A unique identifier for the field, which should not change among tasks within a project.
title string required
Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project.
description string
A brief description about what the response should be. This may change among tasks within a project.
fields array of objects required
An array of child UnitField and FieldSet objects. Any FieldSet objects here must have inline set to true.
Example
{
  "type": "form",
  "field_id": "form_query",
  "title": "Query Intention",
  "fields": [
    {
      "type": "text",
      "field_id": "query_intention",
      "title": "Query Intention",
      "hint": "Please investigate the search links."
    }
  ]
}
The response object contains an annotations field. The response object is included in the callback POST request and saved permanently as part of the task object. The annotations object is a dictionary whose keys are the field_id values given in the job parameters and whose values are the corresponding annotations.
Each annotation has the type specified by its corresponding field above. If max_responses_required is greater than 1, the annotation is an array of that type.
Example
{
  "response": {
    "annotations": {
      "category_name": "Soup", //TextField
      "category_items": [ //FieldSet with max_responses_required greater than one
        {
          "item_name": "Tom Yum Chicken Soup", //TextField
          "item_price": "11.79" //NumberField
        },
        {
          "item_name": "Tom Yum Beef Soup", //TextField
          "item_price": "11.79" //NumberField
        }
      ],
      "category_metadata": { //FieldSet
        "gluten_friendly": true, //BooleanField
        "labels": [ //TextField with max_responses_required greater than one
          "Free Range",
          "All Natural"
        ]
      }
    }
  },
  "task_id": "5774cc78b01249ab09f089dd",
  "task": {
    // populated task for convenience
  }
}
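Since an annotation is either a single value or an array depending on max_responses_required, consumer code often normalizes both cases. The Python sketch below does this; the helper name `responses_for` is hypothetical, and it relies on the documented default of 1 for max_responses_required.

```python
def responses_for(field, annotations):
    """Return the responses recorded for one field as a list."""
    value = annotations[field["field_id"]]
    if field.get("max_responses_required", 1) > 1:
        return list(value)  # already an array of responses
    return [value]          # a single response, wrapped for uniform handling


# Trimmed-down annotations in the shape of the example above.
annotations = {
    "category_name": "Soup",
    "labels": ["Free Range", "All Natural"],
}
print(responses_for({"field_id": "category_name"}, annotations))
print(responses_for({"field_id": "labels", "max_responses_required": 2}, annotations))
```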
To save taskers time during annotation, pre-labels can be included in the hypothesis field when creating a TextCollection task.
To add pre-labels to a task, supply them in the hypothesis field of the task payload at task creation. The schema of the hypothesis object must match the schema of the task response.
The hypothesis format is similar to Refonte.AI's task response format. For this task type, the annotations field is required within the hypothesis object. The only difference from the response format is that each field you pre-annotate must include two additional fields: type, the field type (category, select, text, and so on), and field_id, the identifier (field name) used to track the field.
You can find these two fields in your task taxonomy.
Note: The response format for text fields differs from that of other field types. Rather than an array of arrays of strings, the response for a text field is an array containing a single string.
task_payload_with_hypothesis
{
  ...
  "batch": "regular_batch_name",
  "hypothesis": {
    "annotations": {
      "(EXAMPLE) Multiple Choice Question": {
        "type": "category",
        "field_id": "(EXAMPLE) Multiple Choice Question",
        "response": [
          [
            "B"
          ]
        ]
      }
    }
  },
  ...
}
task_taxonomy
{
  "fields": [
    {
      "type": "category",
      "field_id": "(EXAMPLE) Multiple Choice Question",
      "title": "Which option best fits this task?",
      "choices": [
        {
          "label": "A",
          "value": "A"
        },
        {
          "label": "B",
          "value": "B"
        },
        {
          "label": "C",
          "value": "C"
        }
      ],
      "min_choices": 1,
      "max_choices": 1,
      "description": "Select one of the following. "
    }
  ]
}
task_payload_with_hypothesis_text_field
{
  ...
  "hypothesis": {
    "annotations": {
      "Product Description": {
        "type": "text",
        "field_id": "(EXAMPLE) Text Input Field",
        "response": [
          "Dolore in dolor occaecat deserunt ex in qui non amet est."
        ]
      }
    }
  },
  ...
}
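The payloads above can be assembled programmatically. The following Python sketch builds a hypothesis object from a task taxonomy and a set of pre-labels, wrapping text responses in a single array and other responses in an array of arrays, as described in the note. The function name `build_hypothesis` and the shape of the `prelabels` argument are assumptions made for illustration.

```python
def build_hypothesis(taxonomy, prelabels):
    """Build a hypothesis object.

    taxonomy: dict with a "fields" list, as in task_taxonomy.
    prelabels: dict of field_id -> a string (text fields) or a list of values (other fields).
    """
    field_types = {field["field_id"]: field["type"] for field in taxonomy["fields"]}
    annotations = {}
    for field_id, value in prelabels.items():
        field_type = field_types[field_id]  # raises KeyError for unknown fields
        if field_type == "text":
            response = [value]          # array with a single string
        else:
            response = [list(value)]    # array of arrays
        annotations[field_id] = {
            "type": field_type,
            "field_id": field_id,
            "response": response,
        }
    return {"annotations": annotations}


taxonomy = {
    "fields": [
        {"type": "category", "field_id": "(EXAMPLE) Multiple Choice Question"},
        {"type": "text", "field_id": "(EXAMPLE) Text Input Field"},
    ]
}
hypothesis = build_hypothesis(taxonomy, {"(EXAMPLE) Multiple Choice Question": ["B"]})
print(hypothesis)
```

The returned object would go under the `hypothesis` key of the task payload.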
NamedEntityRecognitionLabel objects define the taxonomy of labels to use to annotate spans of text.
name string required
A unique identifier for this label.
display_name string
An alias for this label to display to taskers.
description string
A description of what this label should represent. Displayed to taskers to improve quality.
children array of objects
An array of NamedEntityRecognitionLabel objects to group underneath this label. Specifying this field causes this label itself to no longer be used for labeling text spans.
NamedEntityRecognitionAttribute objects define form fields for individual annotations.
type string
Currently, only select is supported.
options array of objects
List of select option objects.
display_name string
Optional display name.
description string
Optional description.
AttributeSelectOption objects define possible values for select attributes.
value string
The value that will show up in the response if this option is selected.
display_name string
Optional display name if different from the value.
NamedEntityRecognitionRelationshipDefinition objects specify the types of relationship that can exist between two text spans. There are two types of relationships: named and unnamed. A named relationship is helpful when you need to differentiate between several kinds of relationships that could occur between the same two text spans. For example, you may want to distinguish between a "child of" and a "sibling of" relationship when annotating a description of someone's family history.
A task can only specify one type of relationship. Either all the relationships in a task must be named, or all must be unnamed.
name string
A unique identifier for this type of relationship. Required for named relationships; disallowed for unnamed relationships.
display_name string
A description for this relationship to display to taskers. Should be able to be used to construct a short phrase describing the relationship. For example, a relationship between two text spans "A" and "B" with display_name "is parent of" would be rendered to taskers as "A is parent of B". Required for named relationships; disallowed for unnamed relationships.
is_directed boolean
A field indicating whether the directionality of this relationship matters. For example, a "is parent of" relationship would likely be directed, whereas a "is sibling of" relationship would likely not be directed. Optional for named relationships; disallowed for unnamed relationships.
source_label string
A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the source text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.
target_label string
A string referencing the name field of a NamedEntityRecognitionLabel object. If set, mandates that the target text span of this field must be labeled with the corresponding NamedEntityRecognitionLabel, or one of its children. Optional for both named and unnamed relationships.
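The named/unnamed rules above can be checked before a task is created. The Python sketch below is a hypothetical helper (`check_relationship_definitions`) that mirrors only the constraints stated in this section.

```python
def check_relationship_definitions(definitions):
    """Return a list of rule violations (empty if the definitions look consistent)."""
    errors = []
    named_flags = [bool(d.get("name")) for d in definitions]
    # A task must use either named or unnamed relationships, never both.
    if any(named_flags) and not all(named_flags):
        errors.append("named and unnamed relationships cannot be mixed")
    for index, definition in enumerate(definitions):
        if definition.get("name"):
            if not definition.get("display_name"):
                errors.append(f"definition {index}: display_name is required")
        else:
            for key in ("display_name", "is_directed"):
                if key in definition:
                    errors.append(f"definition {index}: {key} is not allowed")
    return errors


family = [
    {"name": "child_of", "display_name": "is child of", "is_directed": True},
    {"name": "sibling_of", "display_name": "is sibling of", "is_directed": False},
]
print(check_relationship_definitions(family))
```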
The response object is included in the callback POST request and saved permanently as part of the task object. A NamedEntityRecognitionResponse object is composed of two arrays: one for the entity annotations and another for the relationships between those entities.
NamedEntityRecognitionAnnotation is the structure for a single entity annotation in the response. It includes the position, content, and unique identifier of the recognized text span, along with its label and any optional attributes.
In tasks with undirected relationships, the source_ref and target_ref fields are interchangeable. In tasks with unnamed relationships, the name field is left blank.
Example
{
  "annotations": [
    {
      "id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "start": 10,
      "end": 17,
      "text": "Alex Wang",
      "label": "person"
    },
    {
      "id": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "start": 22,
      "end": 31,
      "text": "Transform",
      "label": "conference"
    }
  ],
  "relationships": [
    {
      "id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
      "source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      "target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
      "name": "speaker_at"
    }
  ]
}
NamedEntityRecognitionResponse
Field | Type | Description |
---|---|---|
annotations | array of objects | List of NamedEntityRecognitionAnnotation objects. |
relationships | array of objects | List of NamedEntityRecognitionRelationship objects. |
NamedEntityRecognitionAnnotation
Field | Type | Description |
---|---|---|
id | string | Unique identifier. |
start | number | Start index of the text span. |
end | number | End index of the text span. |
text | string | Text of the text span. |
label | string | References the name field of a label in the task params. |
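Relationships refer to annotations by id, so reading a response usually starts with an id lookup. The Python sketch below resolves each relationship to the text of its source and target spans; the function name `describe_relationships` is hypothetical, and the sample data is a shortened version of the example response.

```python
def describe_relationships(response):
    """Return (source text, relationship name, target text) tuples."""
    by_id = {annotation["id"]: annotation for annotation in response["annotations"]}
    described = []
    for relationship in response["relationships"]:
        source = by_id[relationship["source_ref"]]
        target = by_id[relationship["target_ref"]]
        # The name is blank for unnamed relationships.
        described.append((source["text"], relationship.get("name") or "", target["text"]))
    return described


sample_response = {
    "annotations": [
        {"id": "ann-1", "text": "Alex Wang", "label": "person"},
        {"id": "ann-2", "text": "Transform", "label": "conference"},
    ],
    "relationships": [
        {"id": "rel-1", "source_ref": "ann-1", "target_ref": "ann-2", "name": "speaker_at"},
    ],
}
print(describe_relationships(sample_response))
```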