We develop and license a computer system technology called Orion, intended to read and extract knowledge from complex text, turn it into a semantic structure, and later activate that structure.
Turning a large body of text into structure is slow, so it must be done well before it is needed; only questions and specific searches are turned into semantic structure on the fly. These newly created semantic structures are then matched against the existing semantic structure to generate answers or actions.
Corporate knowledge text uses specialised words and concepts, indexed lists, references to other sections or documents, and statements that control whether sections or documents exist at all. A system that handles knowledge in text has to handle all of these things, and their equivalents in a semantic network of nodes and operators are intended to be an exact functional match for the text.
To aid in modelling, the system provides a visual interface to the semantic network. That is fine while you are pottering about in a small area, but useless when there are tens of thousands of words and concepts and a million network elements, so the primary interface between user and machine has to be text; otherwise the user would be overwhelmed.
We license the technology rather than sell implementations of it to different companies because we are interested in pushing the envelope of what is possible with automating the extraction of knowledge from text.
A licensee receives a licence to use the technology in a particular field or domain. That means they need to develop the expertise to support it and expand its usefulness. We will train the licensee’s people, but this is a new and difficult area – the licensee will need to find talented people who can carry the intellectual load until the underlying concepts are better known.
What Makes It Hard
Some text is written by subject experts and is intended to be read by subject experts; medical codes are a good example. Many concepts are embedded in the text, ranging from ones as simple as “adult dosage” to ones as wordy as “multiple-family group adaptive behaviour treatment guidance”, and medical codes cover all of medicine: surgery, radiology, blood chemistry, injections, the latest genetic analysis, durable medical equipment such as wheelchairs and, just now appearing, functioning artificial eyes. The knowledge is in a continual state of flux, with new techniques sitting side by side with methods from the 19th century. The writer (in fact, many writers, each in their own specialty and each with a different style) expects all these concepts to be already present in the reader’s head, so the system has to know them too.
In some areas the writer demands a lot of the reader; the ability to pull apart coordinated compound words is one example. Things like:
– Extra and intracranial surgery (extracranial is a word)
– Single or multichannel receiver (there is a single channel receiver)
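The expansion the examples above require can be sketched in a few lines. This is an illustrative assumption, not the Orion implementation: the prefix list, the function name, and the naive concatenation rule are all invented here, and a real system would validate each candidate expansion against its vocabulary (as the note “extracranial is a word” suggests).

```python
# Hypothetical sketch: expand a coordinated compound such as
# "extra and intracranial" into "extracranial and intracranial".
# PREFIXES and the concatenation rule are assumptions for illustration;
# a real system would check candidates against its known vocabulary.
PREFIXES = ("extra", "intra", "inter", "single", "multi", "sub", "supra")

def expand_compound(phrase: str) -> str:
    for conj in (" and ", " or "):
        if conj not in phrase:
            continue
        left, right = phrase.split(conj, 1)
        head = right.split(" ", 1)[0]          # e.g. "intracranial"
        if left.lower() not in PREFIXES:       # first conjunct must be a bare prefix
            continue
        for p in PREFIXES:
            if head.lower().startswith(p) and head.lower() != p:
                stem = head[len(p):]           # e.g. "cranial"
                return left + stem + conj + right
    return phrase                              # no expansion found

print(expand_compound("extra and intracranial surgery"))
# -> "extracranial and intracranial surgery"
```

The second bullet shows why the vocabulary check matters: naive concatenation turns “single or multichannel receiver” into “singlechannel or multichannel receiver”, whereas the text actually contains “single channel receiver” — only a lexicon can decide how the expanded word should be written.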
Every person has, over decades, built a subconscious reading machine within themselves. An automated system handling knowledge in text needs to be complete, versatile, and more reliable than the human version.