educe.stac.sanity package


educe.stac.sanity.common module

Functionality and report types common to sanity checker

class educe.stac.sanity.common.ContextItem(doc, contexts)


Report item involving EDU contexts

class educe.stac.sanity.common.RelationItem(doc, contexts, rel, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz relation annotations

class educe.stac.sanity.common.SchemaItem(doc, contexts, schema, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz schema annotations

class educe.stac.sanity.common.UnitItem(doc, contexts, unit)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz unit-level annotations


Short code providing a clue what the annotation is


True if the annotation has type ‘default’


True if the annotation is a Glozz relation


True if the annotation is a Glozz schema


True if the annotation is a Glozz unit


Return either

  • “EDU”
  • “relation”
  • or the annotation type
educe.stac.sanity.common.search_for_glozz_relations(inputs, k, pred, endpoint_is_naughty=None)

Return a ReportItem for any glozz relation that satisfies the given predicate.

If endpoint_is_naughty is supplied, note which of the endpoints can be considered naughty

educe.stac.sanity.common.search_for_glozz_schema(inputs, k, pred, member_is_naughty=None)

Search for schema that satisfy a condition

educe.stac.sanity.common.search_glozz_units(inputs, k, pred)

Return an item for every unit-level annotation in the given document that satisfies some predicate

Return type: ReportItem
educe.stac.sanity.common.search_in_glozz_schema(inputs, k, stype, pred, member_is_naughty=None)

Search for schema whose members satisfy a condition. Not to be confused with search_for_glozz_schema

educe.stac.sanity.common.summarise_anno(doc, light=False)

Return a function that returns a short text summary of an annotation

educe.stac.sanity.common.summarise_anno_html(doc, contexts)

Return a function that creates HTML descriptions of an annotation given document and contexts

educe.stac.sanity.html module

Helpers for building HTML

Hint: import the ET for the ET package too

Create and return an HTML br tag under the parent node

educe.stac.sanity.html.elem(parent, tag, text=None, attrib=None, **kwargs)

Create an HTML element under the given parent node, with some text inside of it

educe.stac.sanity.html.span(parent, text=None, attrib=None, **kwargs)

Create and return an HTML span under the given parent node
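
As an illustration of the helper pattern described above, here is a minimal sketch built on the standard library's `xml.etree.ElementTree`. The names `sketch_elem` and `sketch_span` are hypothetical stand-ins that mirror the documented signatures; this is not the module's actual implementation, and the handling of `attrib` and `**kwargs` is an assumption.

```python
import xml.etree.ElementTree as ET


def sketch_elem(parent, tag, text=None, attrib=None, **kwargs):
    """Create a child element under `parent`, optionally with text inside.

    Hypothetical stand-in mirroring the documented `elem` signature.
    """
    # Assumption: extra keyword arguments become element attributes
    node = ET.SubElement(parent, tag, attrib or {}, **kwargs)
    if text is not None:
        node.text = text
    return node


def sketch_span(parent, text=None, attrib=None, **kwargs):
    """Create a span under `parent` by delegating to sketch_elem."""
    return sketch_elem(parent, "span", text, attrib, **kwargs)


body = ET.Element("body")
sketch_span(body, "unit_42", attrib={"class": "annoid"})
html = ET.tostring(body, encoding="unicode")
print(html)
```

Returning the new node lets callers nest further elements under it.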

educe.stac.sanity.main module

Check the corpus for any consistency problems

class educe.stac.sanity.main.SanityChecker(args)

Bases: object

Sanity checker settings and state


True if we are writing to an output directory


Perform sanity checks and write the output

educe.stac.sanity.main.add_element(settings, k, html, descr, mk_path)

Add a link to a report element for a given document, but only if it actually exists


Copy relevant stanford parser outputs from corpus to report


Create the directory beneath a path if it does not exist


Modify args to reflect user-friendly defaults.

Terminates the program if args.corpus is set but does not point to an existing folder; otherwise args.doc must be set and everything else is expected to be empty.

Parameters: args (Namespace) – Arguments of the argparser.


Return the first element or None if there isn’t one
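
One common way to write such a helper is sketched below. The function name `first_or_none` is chosen for illustration only, and accepting any iterable is an assumption.

```python
def first_or_none(items):
    """Return the first element of an iterable, or None if it is empty."""
    # next() with a default avoids raising StopIteration on empty input
    return next(iter(items), None)


print(first_or_none(["a", "b"]))
print(first_or_none([]))
```

Note that a caller cannot distinguish an empty iterable from one whose first element is `None`.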


Draw SVG graphs for each of the documents in the corpus

educe.stac.sanity.main.issues_descr(report, k)

Return a string characterising a report as containing either warnings or errors (helps the user scan the index to figure out what needs clicking on)


Sanity checker CLI entry point

educe.stac.sanity.main.run_checks(inputs, k)

Run sanity checks for a given document


We want to sort file id by order of

  1. doc
  2. subdoc
  3. annotator
  4. stage (unannotated < unit < discourse)

The important bit here is the idea that we should maybe group unit and discourse for 1-3 together
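
The ordering above can be sketched as a sort key. In this sketch, `FileId` is a hypothetical namedtuple standing in for the corpus file identifier, and the stage strings are taken from the list above; the actual field names and stage values in the corpus are assumptions and may differ.

```python
from collections import namedtuple

# Hypothetical stand-in for a corpus file identifier
FileId = namedtuple("FileId", ["doc", "subdoc", "stage", "annotator"])

# Assumed stage names, following unannotated < unit < discourse
STAGE_ORDER = {"unannotated": 0, "unit": 1, "discourse": 2}


def sort_key(k):
    """Order by doc, subdoc, annotator, then stage."""
    # Unknown stages sort after the known ones
    stage_rank = STAGE_ORDER.get(k.stage, len(STAGE_ORDER))
    # Missing subdoc/annotator are treated as empty strings so tuples compare
    return (k.doc, k.subdoc or "", k.annotator or "", stage_rank)


keys = [
    FileId("game1", "01", "discourse", "alice"),
    FileId("game1", "01", "unannotated", None),
    FileId("game1", "01", "unit", "alice"),
]
ordered = sorted(keys, key=sort_key)
for k in ordered:
    print(k.stage, k.annotator)
```

Because the stage comes last in the key, the unit and discourse entries for the same doc, subdoc and annotator end up next to each other.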


Write the report index

module

Reporting component of sanity checker

class, output_dir)

Bases: object

Representation of a report that we would like to generate. Output will be dumped to a directory

anchor_name(k, header)

HTML anchor name for a report section

css = '\n.annoid { font-family: monospace; font-size: small; }\n.feature { font-family: monospace; }\n.snippet { font-style: italic; }\n.indented { margin-left:1em; }\n.hidden { display:none; }\n.naughty { color:red; }\n.spillover { color:red; font-weight: bold; } /* needs help to be visible */\n.missing { color:red; }\n.excess { color:blue; }\n'

Delete the subreport for a given key. This can be used if you want to iterate through lots of different keys, generating reports incrementally and then deleting them to avoid building up memory.

No-op if we don’t have a sub-report for the given key


Write and delete (to save memory)


If we have error-level reports for the given key

javascript = '\nfunction has(xs, x) {\n for (e in xs) {\n if (xs[e] === x) { return true; }\n }\n return false;\n}\n\n\nfunction toggle_hidden(name) {\n var ele = document.getElementById(name);\n var anc = document.getElementById(\'anc_\' + name);\n if (has(ele.classList, "hidden")) {\n ele.classList.remove("hidden");\n anc.innerText = "[hide]";\n } else {\n ele.classList.add("hidden");\n anc.innerText = "[show]";\n }\n}\n'
mk_hidden_with_toggle(parent, anchor)

Attach some javascript and html to the given block-level element that turns it into a hide/show toggle block, starting out in the hidden state


Initialise and cache the subreport for a key, including the subreports for each severity level below it

If already cached, retrieve from cache

classmethod mk_output_path(odir, k, extension='')

Generate a path within a parent directory, given a fileid

report(k, err_type, severity, header, items, noisy=False)

Append bullet points for each item to the appropriate section of the appropriate report in progress


Note that this report has seen at least one error-level severity message

subreport_path(k, extension='.report.html')

Report for a single document

write(k, path)

Write the subreport for a given key to the path. No-op if we don’t have a sub-report for the given key


Bases: object

An individual reportable entry (usually involves a list of annotations), rendered as a block of text in the report


The annotations which this report item is about


Return an HTML element corresponding to the visualisation for this item


If you don’t want to create an HTML visualisation for a report item, you can fall back to just generating lines of text

Return type: [string]

Bases: enum.Enum

Severity of a sanity check error block

error = 2
warning = 1
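
For illustration, an enumeration with these two members can be declared and compared as in the sketch below. The class name `Severity` and the helper `worst` are hypothetical and are not taken from the module.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical name; the member values match those listed above."""
    warning = 1
    error = 2


def worst(severities):
    """Return the most severe level in a non-empty collection."""
    # Plain Enum members do not support <, so compare on .value
    return max(severities, key=lambda s: s.value)


print(worst([Severity.warning, Severity.error, Severity.warning]))
```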


Report item which just consists of lines of text

text()

, anno, bracket=False)

Create and return an HTML span parent node displaying the local annotation id for an annotation item

, k, err_type, severity)

Return a convenience function that generates report entries at a fixed error type and severity level

Return type: (string, [ReportItem]) -> string

, stop=50)

Truncate a string if it’s longer than stop chars
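
A minimal sketch of this kind of truncation follows. The function name `truncate` is hypothetical, and appending `...` to mark the cut is an assumption; the source only states that the string is truncated.

```python
def truncate(text, stop=50):
    """Shorten `text` to at most `stop` characters, marking the cut."""
    if len(text) > stop:
        # Assumption: an ellipsis is appended to show that text was cut
        return text[:stop] + "..."
    return text


print(truncate("short text"))
print(truncate("x" * 60, stop=50))
```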