educe.stac.sanity package

Subpackages

educe.stac.sanity.checks package

Submodules

educe.stac.sanity.common module

Functionality and report types common to the sanity checker

class educe.stac.sanity.common.ContextItem(doc, contexts)

Bases: educe.stac.sanity.report.ReportItem

Report item involving EDU contexts

class educe.stac.sanity.common.RelationItem(doc, contexts, rel, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz relation annotations

annotations()

html()

class educe.stac.sanity.common.SchemaItem(doc, contexts, schema, naughty)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz schema annotations

annotations()

html()

class educe.stac.sanity.common.UnitItem(doc, contexts, unit)

Bases: educe.stac.sanity.common.ContextItem

Errors which involve Glozz unit-level annotations

annotations()

html()

educe.stac.sanity.common.anno_code(anno)

Short code giving a clue as to what the annotation is

educe.stac.sanity.common.is_default(anno)

True if the annotation has type ‘default’

educe.stac.sanity.common.is_glozz_relation(anno)

True if the annotation is a Glozz relation

educe.stac.sanity.common.is_glozz_schema(anno)

True if the annotation is a Glozz schema

educe.stac.sanity.common.is_glozz_unit(anno)

True if the annotation is a Glozz unit

educe.stac.sanity.common.rough_type(anno)

Return either “EDU”, “relation”, or the annotation type

educe.stac.sanity.common.search_for_glozz_relations(inputs, k, pred, endpoint_is_naughty=None)

Return a ReportItem for any Glozz relation that satisfies the given predicate.

If endpoint_is_naughty is supplied, note which of the endpoints can be considered naughty
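
The search helpers in this module share a filter-and-wrap pattern: test each annotation against a predicate and wrap every match in a report item. The sketch below illustrates that pattern for relations under stated assumptions: Relation, Item and search_relations are hypothetical stand-ins (not educe classes), the function takes a plain list instead of inputs and k, and a relation's endpoints are modelled as just its source and target ids.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Relation:
    # Hypothetical stand-in for a Glozz relation annotation
    local_id: str
    source: str
    target: str


@dataclass
class Item:
    # Hypothetical stand-in for RelationItem: the matching relation plus
    # whichever endpoints were judged naughty (possibly none)
    rel: Relation
    naughty: List[str] = field(default_factory=list)


def search_relations(relations: List[Relation],
                     pred: Callable[[Relation], bool],
                     endpoint_is_naughty: Optional[Callable[[str], bool]] = None
                     ) -> List[Item]:
    """Wrap every relation that satisfies pred in an Item"""
    items = []
    for rel in relations:
        if not pred(rel):
            continue
        naughty = []
        if endpoint_is_naughty is not None:
            # Only when a checker is supplied do we note which endpoints are to blame
            naughty = [x for x in (rel.source, rel.target)
                       if endpoint_is_naughty(x)]
        items.append(Item(rel, naughty))
    return items


rels = [Relation("r1", "e1", "e1"), Relation("r2", "e1", "e2")]
# Flag self-loops: relations whose source and target coincide
print([i.rel.local_id for i in search_relations(rels, lambda r: r.source == r.target)])  # ['r1']
```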

educe.stac.sanity.common.search_for_glozz_schema(inputs, k, pred, member_is_naughty=None)

Search for schemas that satisfy a condition

educe.stac.sanity.common.search_glozz_units(inputs, k, pred)

Return an item for every unit-level annotation in the given document that satisfies some predicate

Return type:	`ReportItem`

educe.stac.sanity.common.search_in_glozz_schema(inputs, k, stype, pred, member_is_naughty=None)

Search for schemas whose members satisfy a condition. Not to be confused with search_for_glozz_schema
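
To make the contrast between the two schema searches concrete, here is a minimal sketch. Schema, search_for_schema and search_in_schema are hypothetical stand-ins working on plain lists rather than inputs and k, the type names are made up, and passing the whole member list to pred in search_in_schema is an assumption about how the member condition is checked.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Schema:
    # Hypothetical stand-in for a Glozz schema annotation
    local_id: str
    type: str
    members: List[str]


def search_for_schema(schemas: Sequence[Schema],
                      pred: Callable[[Schema], bool]) -> List[Schema]:
    """Schemas that themselves satisfy pred (cf. search_for_glozz_schema)"""
    return [s for s in schemas if pred(s)]


def search_in_schema(schemas: Sequence[Schema], stype: str,
                     pred: Callable[[List[str]], bool]) -> List[Schema]:
    """Schemas of type stype whose members satisfy pred
    (cf. search_in_glozz_schema; assumes pred sees the whole member list)"""
    return [s for s in schemas if s.type == stype and pred(s.members)]


schemas = [Schema("s1", "CDU", ["e1"]),
           Schema("s2", "CDU", ["e1", "e2"]),
           Schema("s3", "Other", [])]
# Whole-schema test: empty schemas of any type
print([s.local_id for s in search_for_schema(schemas, lambda s: not s.members)])  # ['s3']
# Member test, restricted to one schema type: CDUs with fewer than two members
print([s.local_id for s in search_in_schema(schemas, "CDU", lambda ms: len(ms) < 2)])  # ['s1']
```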

educe.stac.sanity.common.summarise_anno(doc, light=False)

Return a function that returns a short text summary of an annotation

educe.stac.sanity.common.summarise_anno_html(doc, contexts)

Return a function that creates HTML descriptions of an annotation, given a document and its contexts

educe.stac.sanity.html module

Helpers for building HTML

Hint: you can import ET (the ElementTree package) from this module too

educe.stac.sanity.html.br(parent)

Create and return an HTML br tag under the parent node

educe.stac.sanity.html.elem(parent, tag, text=None, attrib=None, **kwargs)

Create an HTML element under the given parent node, with some text inside of it

educe.stac.sanity.html.span(parent, text=None, attrib=None, **kwargs)

Create and return an HTML span under the given parent node
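
These helpers map naturally onto the standard library's ElementTree. The following is a minimal reimplementation sketch using xml.etree.ElementTree.SubElement; it mirrors the documented signatures but is not educe's actual code, which may differ in detail (for instance in which ElementTree implementation it imports).

```python
import xml.etree.ElementTree as ET


def elem(parent, tag, text=None, attrib=None, **kwargs):
    """Create an element under parent, with optional text inside it"""
    node = ET.SubElement(parent, tag, attrib or {}, **kwargs)
    if text is not None:
        node.text = text
    return node


def span(parent, text=None, attrib=None, **kwargs):
    """Create and return a span under parent"""
    return elem(parent, 'span', text=text, attrib=attrib, **kwargs)


def br(parent):
    """Create and return a br under parent"""
    return elem(parent, 'br')


root = ET.Element('div')
# 'class' is a Python keyword, so pass it through attrib rather than kwargs
span(root, 'e42', attrib={'class': 'annoid'})
br(root)
print(ET.tostring(root, encoding='unicode'))
# <div><span class="annoid">e42</span><br /></div>
```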

educe.stac.sanity.main module

Check the corpus for any consistency problems

class educe.stac.sanity.main.SanityChecker(args)

Bases: object

Sanity checker settings and state

output_is_temp()

True if we are writing to an output directory

run()

Perform sanity checks and write the output

educe.stac.sanity.main.add_element(settings, k, html, descr, mk_path)

Add a link to a report element for a given document, but only if it actually exists

educe.stac.sanity.main.copy_parses(settings)

Copy relevant Stanford parser outputs from the corpus to the report

educe.stac.sanity.main.create_dirname(path)

Create the directory beneath a path if it does not exist
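
Reading "the directory beneath a path" as the parent directory of a file path (an assumption), a minimal sketch with os.makedirs could look like this; exist_ok=True makes repeated calls harmless.

```python
import os
import tempfile


def create_dirname(path):
    """Create the parent directory of path if it does not exist yet"""
    dirname = os.path.dirname(path)
    # A bare filename has no directory part, so there is nothing to create
    if dirname:
        os.makedirs(dirname, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'reports', 'doc1', 'index.html')
    create_dirname(target)
    create_dirname(target)  # calling again is harmless
    print(os.path.isdir(os.path.dirname(target)))  # True
```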

educe.stac.sanity.main.easy_settings(args)

Modify args to reflect user-friendly defaults.

Terminates the program if args.corpus is set but does not point to an existing folder; otherwise args.doc must be set and everything else is expected to be empty.

Parameters:	args (Namespace) – Arguments of the argparser.

educe.stac.sanity.report module

Reporting component of the sanity checker

class educe.stac.sanity.report.HtmlReport(anno_files, output_dir)

Bases: object

Representation of a report that we would like to generate. Output will be written to a directory

anchor_name(k, header)

HTML anchor name for a report section

css = '\n.annoid { font-family: monospace; font-size: small; }\n.feature { font-family: monospace; }\n.snippet { font-style: italic; }\n.indented { margin-left:1em; }\n.hidden { display:none; }\n.naughty { color:red; }\n.spillover { color:red; font-weight: bold; } /* needs help to be visible */\n.missing { color:red; }\n.excess { color:blue; }\n'

delete(k)

Delete the subreport for a given key. This can be used if you want to iterate through lots of different keys, generating reports incrementally and then deleting them to avoid building up memory.

No-op if we don’t have a sub-report for the given key

flush_subreport(k)

Write and delete (to save memory)

has_errors(k)

True if we have error-level reports for the given key

javascript = '\nfunction has(xs, x) {\n for (e in xs) {\n if (xs[e] === x) { return true; }\n }\n return false;\n}\n\n\nfunction toggle_hidden(name) {\n var ele = document.getElementById(name);\n var anc = document.getElementById(\'anc_\' + name);\n if (has(ele.classList, "hidden")) {\n ele.classList.remove("hidden");\n anc.innerText = "[hide]";\n } else {\n ele.classList.add("hidden");\n anc.innerText = "[show]";\n }\n}\n'

mk_hidden_with_toggle(parent, anchor)

Attach some JavaScript and HTML to the given block-level element, turning it into a hide/show toggle block that starts out in the hidden state
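
The javascript attribute above finds a block by its id (name) and its toggle link by the id 'anc_' + name, then flips the block's hidden class, which the css attribute maps to display:none. A sketch of the matching HTML side, built with ElementTree, follows; add_toggle_block is a hypothetical helper whose signature differs from mk_hidden_with_toggle(parent, anchor), and the href/onclick wiring is an assumption.

```python
import xml.etree.ElementTree as ET


def add_toggle_block(parent, name):
    """Hypothetical helper: a [show] link plus a block that starts hidden"""
    # toggle_hidden(name) finds the link by id 'anc_' + name and the block
    # by id name, then adds or removes the block's "hidden" CSS class
    link = ET.SubElement(parent, 'a', {
        'id': 'anc_' + name,
        'href': '#',
        'onclick': "toggle_hidden('{}'); return false;".format(name),
    })
    link.text = '[show]'
    return ET.SubElement(parent, 'div', {'id': name, 'class': 'hidden'})


root = ET.Element('div')
block = add_toggle_block(root, 'sec1')
ET.SubElement(block, 'p').text = 'details shown on demand'
print(ET.tostring(root, encoding='unicode'))
```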

mk_or_get_subreport(k)

Initialise and cache the subreport for a key, including the subreports for each severity level below it

If already cached, retrieve from cache

classmethod mk_output_path(odir, k, extension='')

Generate a path within a parent directory, given a fileid

report(k, err_type, severity, header, items, noisy=False)

Append bullet points for each item to the appropriate section of the appropriate report in progress

set_has_errors(k)

Note that this report has seen at least one error-level severity message

subreport_path(k, extension='.report.html')

Report for a single document

write(k, path)

Write the subreport for a given key to the path. No-op if we don’t have a sub-report for the given key
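
Taken together, mk_or_get_subreport, report, set_has_errors, has_errors, write, delete and flush_subreport describe a per-document lifecycle: create a subreport on first use, fill it, write it out, and free it. The in-memory sketch below models that bookkeeping only; MiniReport is hypothetical, severities are plain strings, entries are plain strings rather than HTML, and the error-first write order is an assumption.

```python
class MiniReport:
    """Hypothetical in-memory stand-in for HtmlReport's per-key bookkeeping"""

    # Order in which severity buckets are written out (an assumption)
    SEVERITIES = ('error', 'warning')

    def __init__(self):
        self._subreports = {}     # key -> {severity: [entry, ...]}
        self._has_errors = set()  # keys with at least one error-level entry

    def mk_or_get_subreport(self, k):
        # Build one bucket per severity on first use, reuse it afterwards
        if k not in self._subreports:
            self._subreports[k] = {s: [] for s in self.SEVERITIES}
        return self._subreports[k]

    def set_has_errors(self, k):
        self._has_errors.add(k)

    def has_errors(self, k):
        return k in self._has_errors

    def report(self, k, err_type, severity, header, items):
        # One bullet-style entry per item, filed under its severity
        sub = self.mk_or_get_subreport(k)
        if severity == 'error':
            self.set_has_errors(k)
        for item in items:
            sub[severity].append('{} / {}: {}'.format(err_type, header, item))

    def write(self, k, out):
        # No-op if there is no subreport for k
        sub = self._subreports.get(k)
        if sub is not None:
            for severity in self.SEVERITIES:
                out.extend(sub[severity])

    def delete(self, k):
        # Also a no-op for unknown keys
        self._subreports.pop(k, None)

    def flush_subreport(self, k, out):
        # Write, then delete, so memory stays flat across many documents
        self.write(k, out)
        self.delete(k)


rep = MiniReport()
rep.report('doc1', 'relations', 'warning', 'Self-loops', ['r1'])
rep.report('doc1', 'units', 'error', 'Missing type', ['e3', 'e4'])
lines = []
rep.flush_subreport('doc1', lines)
print(rep.has_errors('doc1'), lines)
```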

class educe.stac.sanity.report.ReportItem

Bases: object

An individual reportable entry (usually involves a list of annotations), rendered as a block of text in the report

annotations()

The annotations which this report item is about

html()

Return an HTML element corresponding to the visualisation for this item

text()

If you don’t want to create an HTML visualisation for a report item, you can fall back to just generating lines of text

Return type:	[string]

class educe.stac.sanity.report.Severity

Bases: enum.Enum

Severity of a sanity check error block

error = 2

warning = 1
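
A sketch of the enum as documented, with the listed values. Because plain Enum members cannot be compared with < or >, ranking severities goes through .value; whether the sanity checker itself ranks them this way is not stated here.

```python
import enum


class Severity(enum.Enum):
    """Severity of a sanity check error block"""
    warning = 1
    error = 2


# Plain Enum members do not support < or >, so rank them by .value
worst = max([Severity.warning, Severity.error], key=lambda s: s.value)
print(worst)             # Severity.error
print(Severity(1).name)  # warning
```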

class educe.stac.sanity.report.SimpleReportItem(lines)

Bases: educe.stac.sanity.report.ReportItem

Report item which just consists of lines of text

text()
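
The ReportItem contract can be sketched as follows: text() is the plain-text fallback, and SimpleReportItem simply returns its stored lines. The default return values in the base class and the render_text helper are hypothetical, chosen to keep the sketch runnable.

```python
class ReportItem:
    """Sketch of the ReportItem interface; the defaults are assumptions"""

    def annotations(self):
        # The annotations this item is about (none by default here)
        return []

    def text(self):
        # Plain-text fallback for items without an HTML visualisation
        return []


class SimpleReportItem(ReportItem):
    """Report item which just consists of lines of text"""

    def __init__(self, lines):
        self.lines = list(lines)

    def text(self):
        return self.lines


def render_text(items):
    # Hypothetical plain-text renderer: one bullet per line of each item
    return '\n'.join('* ' + line for item in items for line in item.text())


print(render_text([SimpleReportItem(['EDU e3 has no type', 'see turn 12'])]))
```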

educe.stac.sanity.report.html_anno_id(parent, anno, bracket=False)

Create and return an HTML span under the parent node, displaying the local annotation ID for an annotation item

educe.stac.sanity.report.mk_microphone(report, k, err_type, severity)

Return a convenience function that generates report entries at a fixed error type and severity level

Return type:	(string, [ReportItem]) -> string
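
mk_microphone is essentially partial application: it fixes k, err_type and severity so each check only supplies a header and its items. A minimal closure-based sketch follows, with a hypothetical RecordingReport standing in for HtmlReport; it simply passes through whatever report() returns rather than guaranteeing the documented string result, and severities are plain strings for brevity.

```python
class RecordingReport:
    """Hypothetical stand-in with the same report() signature as HtmlReport"""

    def __init__(self):
        self.calls = []

    def report(self, k, err_type, severity, header, items, noisy=False):
        self.calls.append((k, err_type, severity, header, list(items), noisy))


def mk_microphone(report, k, err_type, severity):
    """Fix the key, error type and severity once; callers give header and items"""
    def microphone(header, items, noisy=False):
        # Returns whatever report.report returns (None for this stand-in)
        return report.report(k, err_type, severity, header, items, noisy=noisy)
    return microphone


rep = RecordingReport()
squawk = mk_microphone(rep, 'doc1', 'relations', 'error')
squawk('Self-loops', ['r1'])
squawk('Dangling endpoints', ['r2', 'r5'])
print(len(rep.calls))  # 2
```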

educe.stac.sanity.report.snippet(txt, stop=50)

Truncate a string if it is longer than stop characters
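
A one-line sketch of snippet. The documentation does not say how the cut is marked, so the trailing "..." below is an assumption.

```python
def snippet(txt, stop=50):
    """Truncate txt if it is longer than stop characters"""
    # Assumption: the cut is marked with "..."; the real helper may differ
    return txt if len(txt) <= stop else txt[:stop] + '...'


print(snippet('hello'))               # hello
print(snippet('abcdefghij', stop=4))  # abcd...
```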