educe.stac.sanity package
Subpackages
Submodules
educe.stac.sanity.common module
Functionality and report types common to the sanity checker.
-
class educe.stac.sanity.common.ContextItem(doc, contexts)
Bases: educe.stac.sanity.report.ReportItem
Report item involving EDU contexts.
-
class educe.stac.sanity.common.RelationItem(doc, contexts, rel, naughty)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz relation annotations.
-
annotations()
-
html()
-
-
class educe.stac.sanity.common.SchemaItem(doc, contexts, schema, naughty)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz schema annotations.
-
annotations()
-
html()
-
-
class educe.stac.sanity.common.UnitItem(doc, contexts, unit)
Bases: educe.stac.sanity.common.ContextItem
Errors which involve Glozz unit-level annotations.
-
annotations()
-
html()
-
-
educe.stac.sanity.common.anno_code(anno)
Short code giving a clue as to what the annotation is.
-
educe.stac.sanity.common.is_default(anno)
True if the annotation has type ‘default’.
-
educe.stac.sanity.common.is_glozz_relation(anno)
True if the annotation is a Glozz relation.
-
educe.stac.sanity.common.is_glozz_schema(anno)
True if the annotation is a Glozz schema.
-
educe.stac.sanity.common.is_glozz_unit(anno)
True if the annotation is a Glozz unit.
-
educe.stac.sanity.common.rough_type(anno)
Return one of:
- “EDU”
- “relation”
- the annotation type itself
-
educe.stac.sanity.common.search_for_glozz_relations(inputs, k, pred, endpoint_is_naughty=None)
Return a ReportItem for any Glozz relation that satisfies the given predicate. If endpoint_is_naughty is supplied, note which of the endpoints can be considered naughty.
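The predicate-plus-naughtiness pattern can be pictured with a small self-contained sketch. The Relation class and the find_relations name are hypothetical stand-ins for Glozz relations and the real search; the sketch drops the inputs and k arguments and returns (relation, naughty endpoints) pairs instead of ReportItem objects.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a Glozz relation annotation."""
    local_id: str
    type: str
    source: str  # local id of the source unit
    target: str  # local id of the target unit


def find_relations(
    relations: List[Relation],
    pred: Callable[[Relation], bool],
    endpoint_is_naughty: Optional[Callable[[str], bool]] = None,
) -> List[Tuple[Relation, List[str]]]:
    """Pair each relation satisfying pred with its naughty endpoints."""
    found = []
    for rel in relations:
        if not pred(rel):
            continue
        naughty = []
        if endpoint_is_naughty is not None:
            # Only flag endpoints when a naughtiness test was supplied
            naughty = [end for end in (rel.source, rel.target)
                       if endpoint_is_naughty(end)]
        found.append((rel, naughty))
    return found


rels = [Relation('r1', 'Elaboration', 'e1', 'e2'),
        Relation('r2', 'Continuation', 'e2', 'e3')]
hits = find_relations(rels, lambda r: r.type == 'Elaboration',
                      endpoint_is_naughty=lambda end: end == 'e2')
print([(r.local_id, bad) for r, bad in hits])  # [('r1', ['e2'])]
```

Keeping the naughtiness test optional means the same search serves both "flag this relation" and "flag this relation and point at the offending endpoint".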
-
educe.stac.sanity.common.search_for_glozz_schema(inputs, k, pred, member_is_naughty=None)
Search for schemas that satisfy a condition.
-
educe.stac.sanity.common.search_glozz_units(inputs, k, pred)
Return an item for every unit-level annotation in the given document that satisfies some predicate.
Return type: ReportItem
-
educe.stac.sanity.common.search_in_glozz_schema(inputs, k, stype, pred, member_is_naughty=None)
Search for schemas whose members satisfy a condition. Not to be confused with search_for_glozz_schema.
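A rough sketch of searching inside schemas of one type follows. Schema and find_in_schemas are hypothetical stand-ins; in particular, the assumption that pred receives the whole member list is ours, and the type name 'Complex_discourse_unit' is only an example value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Schema:
    """Hypothetical stand-in for a Glozz schema: a type plus member ids."""
    local_id: str
    type: str
    members: List[str] = field(default_factory=list)


def find_in_schemas(schemas, stype, pred, member_is_naughty=None):
    """Return (schema, naughty_members) for each schema of type stype
    whose member list satisfies pred."""
    found = []
    for schema in schemas:
        # Filter on the schema type first, then on its members
        if schema.type != stype or not pred(schema.members):
            continue
        naughty = []
        if member_is_naughty is not None:
            naughty = [m for m in schema.members if member_is_naughty(m)]
        found.append((schema, naughty))
    return found


schemas = [Schema('s1', 'Complex_discourse_unit', ['e1']),
           Schema('s2', 'Complex_discourse_unit', ['e1', 'e2']),
           Schema('s3', 'Other', ['e4'])]
# Flag schemas of the chosen type that have fewer than two members
hits = find_in_schemas(schemas, 'Complex_discourse_unit',
                       lambda members: len(members) < 2,
                       member_is_naughty=lambda m: True)
print([(s.local_id, bad) for s, bad in hits])  # [('s1', ['e1'])]
```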
-
educe.stac.sanity.common.summarise_anno(doc, light=False)
Return a function that returns a short text summary of an annotation.
-
educe.stac.sanity.common.summarise_anno_html(doc, contexts)
Return a function that creates HTML descriptions of an annotation, given a document and contexts.
educe.stac.sanity.html module
Helpers for building HTML.
Hint: import ET from this module as well, to get the ET (ElementTree) package too.
-
educe.stac.sanity.html.br(parent)
Create and return an HTML br tag under the parent node.
-
educe.stac.sanity.html.elem(parent, tag, text=None, attrib=None, **kwargs)
Create an HTML element under the given parent node, with some text inside it.
-
educe.stac.sanity.html.span(parent, text=None, attrib=None, **kwargs)
Create and return an HTML span under the given parent node.
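Helpers with these signatures can be written on top of the standard xml.etree.ElementTree module. The following is a minimal sketch under that assumption, not the educe source; it only shows how attrib and extra keyword arguments can be merged into one element.

```python
import xml.etree.ElementTree as ET


def elem(parent, tag, text=None, attrib=None, **kwargs):
    """Create an element under parent, with optional text inside it."""
    # SubElement merges the attrib dict with any extra keyword attributes
    node = ET.SubElement(parent, tag, attrib or {}, **kwargs)
    if text is not None:
        node.text = text
    return node


def span(parent, text=None, attrib=None, **kwargs):
    """Create and return a span under parent."""
    return elem(parent, 'span', text=text, attrib=attrib, **kwargs)


def br(parent):
    """Create and return a br tag under parent."""
    return elem(parent, 'br')


root = ET.Element('div')
# 'class' is a Python keyword, so it has to go through attrib
span(root, 'e12', attrib={'class': 'annoid'})
br(root)
span(root, 'some text', title='tooltip')
print(ET.tostring(root, encoding='unicode'))
```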
educe.stac.sanity.main module
Check the corpus for any consistency problems.
-
class educe.stac.sanity.main.SanityChecker(args)
Bases: object
Sanity checker settings and state.
-
output_is_temp()
True if we are writing to an output directory.
-
run()
Perform sanity checks and write the output.
-
-
educe.stac.sanity.main.add_element(settings, k, html, descr, mk_path)
Add a link to a report element for a given document, but only if that element actually exists.
-
educe.stac.sanity.main.copy_parses(settings)
Copy the relevant Stanford parser outputs from the corpus to the report.
-
educe.stac.sanity.main.create_dirname(path)
Create the directory beneath a path if it does not exist.
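Reading "the directory beneath a path" as the parent directory of a file path (an assumption), the helper could be sketched with the standard library alone:

```python
import os
import tempfile


def create_dirname(path):
    """Make sure the directory that will hold `path` exists."""
    dirname = os.path.dirname(path)
    if dirname:
        # exist_ok=True makes the call a no-op if the directory is already there
        os.makedirs(dirname, exist_ok=True)


# Usage: create the folders for a report file before writing it
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'doc1', 'report.html')
    create_dirname(target)
    print(os.path.isdir(os.path.dirname(target)))  # True
```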
-
educe.stac.sanity.main.easy_settings(args)
Modify args to reflect user-friendly defaults.
Terminates the program if args.corpus is set but does not point to an existing folder; otherwise args.doc must be set and everything else is expected to be empty.
Parameters: args (Namespace) – Arguments from the argument parser. See also educe.stac.util.args.check_easy_settings()
-
educe.stac.sanity.main.first_or_none(itrs)
Return the first element, or None if there isn’t one.
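One idiomatic way to get this behaviour in Python is next() with a default value; this is a sketch, not necessarily how educe implements it.

```python
def first_or_none(itrs):
    """Return the first element of an iterable, or None if it is empty."""
    # next() with a default returns None instead of raising StopIteration
    return next(iter(itrs), None)


print(first_or_none(['a.report.html', 'b.report.html']))  # a.report.html
print(first_or_none([]))  # None
```

Note that a result of None is ambiguous when the iterable's first element is itself None.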
-
educe.stac.sanity.main.generate_graphs(settings)
Draw SVG graphs for each of the documents in the corpus.
-
educe.stac.sanity.main.issues_descr(report, k)
Return a string characterising a report as containing either warnings or errors (this helps the user scan the index to figure out what needs clicking on).
-
educe.stac.sanity.main.main()
Sanity checker CLI entry point.
-
educe.stac.sanity.main.run_checks(inputs, k)
Run sanity checks for a given document.
-
educe.stac.sanity.main.sanity_check_order(k)
We want to sort file ids by, in order of priority:
- doc
- subdoc
- annotator
- stage (unannotated < unit < discourse)
The important point is that we may want to group the unit and discourse stages together for each combination of 1-3 (doc, subdoc, annotator).
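The ordering above maps naturally onto a tuple sort key. In this sketch the FileId namedtuple, its field names, the stage spellings and the example document name are all hypothetical stand-ins for the corpus file ids.

```python
from collections import namedtuple

# Hypothetical stand-in for a corpus file id
FileId = namedtuple('FileId', ['doc', 'subdoc', 'stage', 'annotator'])

# Stage order as described above; unknown stages sort last
STAGE_RANK = {'unannotated': 0, 'unit': 1, 'discourse': 2}


def sanity_check_order(k):
    """Sort key: doc, subdoc, annotator, then stage."""
    return (k.doc,
            k.subdoc or '',
            k.annotator or '',  # unannotated files may have no annotator
            STAGE_RANK.get(k.stage, len(STAGE_RANK)))


keys = [FileId('pilot01', '01', 'discourse', 'bob'),
        FileId('pilot01', '01', 'unit', 'bob'),
        FileId('pilot01', '01', 'unannotated', None),
        FileId('pilot01', '01', 'unit', 'alice')]
for k in sorted(keys, key=sanity_check_order):
    print(k.annotator, k.stage)
```

Because the annotator comes before the stage in the key, bob's unit and discourse files end up next to each other, which is the grouping the description asks for.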
-
educe.stac.sanity.main.write_index(settings)
Write the report index.
educe.stac.sanity.report module
Reporting component of the sanity checker.
-
class educe.stac.sanity.report.HtmlReport(anno_files, output_dir)
Bases: object
Representation of a report that we would like to generate. Output will be dumped to a directory.
-
anchor_name(k, header)
HTML anchor name for a report section.
-
css = '\n.annoid { font-family: monospace; font-size: small; }\n.feature { font-family: monospace; }\n.snippet { font-style: italic; }\n.indented { margin-left:1em; }\n.hidden { display:none; }\n.naughty { color:red; }\n.spillover { color:red; font-weight: bold; } /* needs help to be visible */\n.missing { color:red; }\n.excess { color:blue; }\n'
-
delete(k)
Delete the subreport for a given key. This can be used if you want to iterate through lots of different keys, generating reports incrementally and then deleting them to avoid accumulating memory.
No-op if we don’t have a sub-report for the given key.
-
flush_subreport(k)
Write the subreport for a given key and then delete it (to save memory).
-
has_errors(k)
True if we have error-level reports for the given key.
-
javascript = '\nfunction has(xs, x) {\n for (e in xs) {\n if (xs[e] === x) { return true; }\n }\n return false;\n}\n\n\nfunction toggle_hidden(name) {\n var ele = document.getElementById(name);\n var anc = document.getElementById(\'anc_\' + name);\n if (has(ele.classList, "hidden")) {\n ele.classList.remove("hidden");\n anc.innerText = "[hide]";\n } else {\n ele.classList.add("hidden");\n anc.innerText = "[show]";\n }\n}\n'
Attach some JavaScript and HTML to the given block-level element that turns it into a hide/show toggle block, starting out in the hidden state.
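The toggle_hidden script above expects two elements: a block with id name, and a link with id 'anc_' + name whose text flips between "[show]" and "[hide]". A sketch of markup that satisfies this, built with ElementTree, is below; mk_toggle_block and the javascript: link are our own assumptions, not the educe method.

```python
import xml.etree.ElementTree as ET


def mk_toggle_block(parent, name, tag='div'):
    """Add a [show] link and a block that starts out hidden (hypothetical helper)."""
    # toggle_hidden(name) finds the link through the id 'anc_' + name
    link = ET.SubElement(parent, 'a', {
        'id': 'anc_' + name,
        'href': "javascript:toggle_hidden('%s')" % name,
    })
    link.text = '[show]'
    # The block is found through id=name; the CSS rule .hidden { display:none; } hides it
    return ET.SubElement(parent, tag, {'id': name, 'class': 'hidden'})


body = ET.Element('body')
block = mk_toggle_block(body, 'sec_units')
ET.SubElement(block, 'p').text = 'Details go here'
print(ET.tostring(body, encoding='unicode'))
```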
-
mk_or_get_subreport(k)
Initialise and cache the subreport for a key, including the subreports for each severity level below it.
If already cached, retrieve it from the cache.
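The cache-then-flush workflow (mk_or_get_subreport, report, flush_subreport and delete) can be sketched with a plain dictionary. MiniReport is a hypothetical, much simplified stand-in: its severity levels are plain strings, and its flush_subreport takes a write callback, unlike the real flush_subreport(k).

```python
from collections import defaultdict


class MiniReport:
    """Hypothetical, simplified stand-in for HtmlReport's subreport cache."""

    LEVELS = ('error', 'warning')  # assumption: one bucket per severity level

    def __init__(self):
        self.subreports = {}

    def mk_or_get_subreport(self, k):
        # Build the subreport for k (one bucket per level) only on first use
        if k not in self.subreports:
            self.subreports[k] = {lvl: defaultdict(list) for lvl in self.LEVELS}
        return self.subreports[k]

    def report(self, k, err_type, severity, header, items):
        # Append the items under their error type at the given severity
        sub = self.mk_or_get_subreport(k)
        sub[severity][err_type].append((header, list(items)))

    def flush_subreport(self, k, write):
        # Hand the subreport to `write`, then drop it so memory does not build up
        if k in self.subreports:
            write(k, self.subreports.pop(k))


rep = MiniReport()
written = []
for doc in ['d1', 'd2']:
    rep.report(doc, 'units', 'warning', 'Missing feature', ['e1'])
    rep.flush_subreport(doc, lambda k, sub: written.append(k))
print(written, rep.subreports)  # ['d1', 'd2'] {}
```

Flushing each document as soon as it is done keeps at most one subreport in memory at a time.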
-
classmethod mk_output_path(odir, k, extension='')
Generate a path within a parent directory, given a file id.
-
report(k, err_type, severity, header, items, noisy=False)
Append bullet points for each item to the appropriate section of the appropriate report in progress.
-
set_has_errors(k)
Note that this report has seen at least one error-level severity message.
-
subreport_path(k, extension='.report.html')
Path to the report for a single document.
-
write(k, path)
Write the subreport for a given key to the path. No-op if we don’t have a sub-report for the given key.
-
-
class
educe.stac.sanity.report.ReportItem¶ Bases:
objectAn individual reportable entry (usually involves a list of annotations), rendered as a block of text in the report
-
annotations()¶ The annotations which this report item is about
-
html()¶ Return an HTML element corresponding to the visualisation for this item
-
text()¶ If you don’t want to create an HTML visualisation for a report item, you can fall back to just generating lines of text
Return type: [string]
-
-
class
educe.stac.sanity.report.Severity¶ Bases:
enum.EnumSeverity of a sanity check error block
-
error= 2¶
-
warning= 1¶
-
-
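Since the levels carry integer values, the most severe level seen for a document can be found by comparing on .value, which is useful for questions like those answered by has_errors and issues_descr. The enum below mirrors the documented values; the worst helper is hypothetical.

```python
import enum


class Severity(enum.Enum):
    """Mirror of the documented levels: warning = 1, error = 2."""
    warning = 1
    error = 2


def worst(severities):
    """Most severe level among those seen, or None if there were none."""
    # Plain Enum members do not support <, so compare on their values
    return max(severities, key=lambda s: s.value, default=None)


seen = [Severity.warning, Severity.error, Severity.warning]
print(worst(seen))  # Severity.error
```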
class
educe.stac.sanity.report.SimpleReportItem(lines)¶ Bases:
educe.stac.sanity.report.ReportItemReport item which just consists of lines of text
-
text()¶
-
-
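The ReportItem interface (annotations, html, text) and its text-only subclass can be mirrored in a few lines. These classes are hypothetical stand-ins; in particular, rendering html() as one div per text line is an assumed fallback, not documented behaviour.

```python
import xml.etree.ElementTree as ET


class ReportItem:
    """Hypothetical stand-in mirroring the documented ReportItem interface."""

    def annotations(self):
        # The annotations this item is about; none in this base sketch
        return []

    def text(self):
        # Plain-text fallback: a list of lines
        return []

    def html(self):
        # Assumed fallback rendering: one <div> per line of text()
        block = ET.Element('div')
        for line in self.text():
            ET.SubElement(block, 'div').text = line
        return block


class SimpleReportItem(ReportItem):
    """Report item which just consists of lines of text."""

    def __init__(self, lines):
        self.lines = list(lines)

    def text(self):
        return self.lines


item = SimpleReportItem(['e3 has no dialogue act', 'see turn 12'])
print([div.text for div in item.html()])
```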
educe.stac.sanity.report.html_anno_id(parent, anno, bracket=False)
Create and return an HTML span under the parent node, displaying the local annotation id for an annotation item.
-
educe.stac.sanity.report.mk_microphone(report, k, err_type, severity)
Return a convenience function that generates report entries at a fixed error type and severity level.
Return type: (string, [ReportItem]) -> string
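The "microphone" is a closure that fixes the first arguments of report(). In this sketch RecordingReport is a hypothetical stand-in that records calls instead of writing HTML, and the severity is a plain string rather than a Severity member.

```python
from functools import partial


class RecordingReport:
    """Hypothetical stand-in that records report() calls instead of writing HTML."""

    def __init__(self):
        self.calls = []

    def report(self, k, err_type, severity, header, items, noisy=False):
        self.calls.append((k, err_type, severity, header, list(items)))
        return header


def mk_microphone(report, k, err_type, severity):
    """Fix k, err_type and severity; the result takes (header, items)."""
    def microphone(header, items):
        return report.report(k, err_type, severity, header, items)
    return microphone


rep = RecordingReport()
warn_units = mk_microphone(rep, 'd1', 'units', 'warning')
warn_units('Missing features', ['e1', 'e2'])
# functools.partial builds an equivalent callable
partial(rep.report, 'd1', 'units', 'warning')('Odd spans', ['e7'])
print(rep.calls)
```

An explicit closure and functools.partial behave the same here; the closure simply documents the (header, items) shape of the returned function.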
-
educe.stac.sanity.report.snippet(txt, stop=50)
Truncate a string if it is longer than stop characters.
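A minimal truncation sketch follows. Marking the cut with '...' is an assumption; the description only says the string is truncated.

```python
def snippet(txt, stop=50):
    """Truncate txt to its first `stop` characters if it is longer than that."""
    if len(txt) <= stop:
        return txt
    # Marking the cut with '...' is an assumption of this sketch
    return txt[:stop] + '...'


print(snippet('short text'))  # short text
print(snippet('x' * 60, stop=10))  # xxxxxxxxxx...
```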