Schematron

Key point: PageSeeder’s in-built, customizable document validation technology.

Schematron is a rule-based validation language with powerful capabilities and configurable error messages. It expresses rules, or constraints, in terms that domain experts or users can understand.

Further information about the language is available on schematron.com.

Schematron in a nutshell

Schematron defines constraints for your content using natural language that is meaningful to users.

It supports two kinds of constraints:

  • Assertions – let you confirm that the document conforms to a particular schema.
  • Reports – let you diagnose and return useful information about the content.

Related assertions and reports can be grouped together in a pattern, which operates as follows:

  1. It finds a specific context (for example, a heading).
  2. It checks the context against a test expression (for example, the text contains fewer than 100 characters).
  3. It reports which assertions have failed and which reports have succeeded alongside some diagnostic information.

The Schematron result is a list of failed assertions and successful reports.

The difference between a failed assertion and a successful report is that only the failed assertion counts towards the total failure count. A document is considered invalid when there is at least one failure.
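As a rough illustration of the steps above, a pattern with one assertion and one report might look like the following sketch. It assumes ISO Schematron with the XSLT 2.0 query binding and a PSML-style heading element. The pattern ID, the report test and the message wording are illustrative only.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical pattern grouping related checks on headings -->
  <sch:pattern id="heading-checks">
    <!-- Step 1: the context is every heading element (assumed PSML name) -->
    <sch:rule context="heading">
      <!-- Step 2: an assertion fails when its test is false,
           and a failed assertion counts as a failure -->
      <sch:assert test="string-length(normalize-space(.)) lt 100">
        Heading must contain fewer than 100 characters.
      </sch:assert>
      <!-- A report is returned when its test is true
           and does not count as a failure -->
      <sch:report test="ends-with(normalize-space(.), '.')">
        Heading ends with a full stop.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The `lt` operator is used instead of `<` so that the test does not need escaping inside the XML attribute.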

Example

Your style guide requires that all your headings use sentence-style capitalization. Conformance means you capitalize the first word but use lower case for all other words, except for proper nouns.

Your Schematron schema must define a rule whose context is “heading” and which includes an assertion that the text matches sentence-style capitalization.

When Schematron processes your document, it does the following:

  1. Finds all the headings in your document.
  2. Checks that each one matches the correct capitalization style.
  3. Returns a list of all the headings that didn’t match as failed assertions.
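A minimal sketch of such a rule is shown below, assuming the XSLT 2.0 query binding (needed for `matches`) and a PSML-style heading element. The test is deliberately naive: it requires a capital first letter and rejects any later word that starts with a capital, so proper nouns and acronyms would be flagged and need extra handling in a real schema.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical pattern ID -->
  <sch:pattern id="heading-capitalization">
    <sch:rule context="heading">
      <!-- Naive check: first character is upper case and
           no later word starts with an upper case letter -->
      <sch:assert test="matches(normalize-space(.), '^\p{Lu}')
                        and not(matches(normalize-space(.), '\s\p{Lu}'))">
        Heading "<sch:value-of select="normalize-space(.)"/>" should use sentence-style capitalization.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Including the heading text in the message with `sch:value-of` makes each failed assertion easier to identify in the results list.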

Schematron in PageSeeder

You can use Schematron to validate any document or URL in PageSeeder.

  • For PSML documents, Schematron validates the full content, edit notes and reverse references.
  • For non-PSML documents, such as images, PDF and Office files, and URLs, Schematron validates the metadata PSML. This includes the metadata properties, edit notes and reverse references.

You can validate a document when viewing or editing it. You can also validate files from the group search page and the documents page as a batch action.

PSML schema and Schematron

PageSeeder uses Schematron in conjunction with the PSML schema to validate documents.

The PSML schema uses a grammar-based language to define constraints which are common to all PSML documents, and enforced by PageSeeder. A document which doesn’t validate against the PSML schema is not considered to be PSML.

Schematron complements the PSML schema to check for system-specific constraints.

You can associate multiple Schematron schemas with each document type, media type or URL type. When viewing a document or URL, PageSeeder automatically validates your content using the default schema.

Result format

After processing a document, Schematron returns a list of failed assertions and successful reports. These are presented as a red cross or a green tick in PageSeeder by default.

To provide more nuanced results, the assertions and reports in your schema can be defined as:

  • errors (for failed assertions),
  • warnings,
  • infos (for reports only),
  • and tips (for reports only).

These levels can also be used to filter the results.

PageSeeder only counts the number of failed assertions to determine whether the document is valid, regardless of which icon is used.
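This page does not say how a level is attached to an assertion or report. The sketch below assumes it is carried by the standard ISO Schematron `role` attribute, and that the heading element has a `level` attribute. Both are assumptions to check against the developer documentation before relying on them.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical pattern ID -->
  <sch:pattern id="heading-levels">
    <sch:rule context="heading">
      <!-- Assumption: role="warning" presents this failed assertion as a warning.
           It is still a failed assertion, so it still counts towards validity. -->
      <sch:assert test="string-length(normalize-space(.)) lt 100" role="warning">
        Heading is longer than recommended.
      </sch:assert>
      <!-- Assumption: role="info" presents this successful report as an info -->
      <sch:report test="@level = '1'" role="info">
        Top-level heading found.
      </sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```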

Diagnostic information

Assertions and reports can be associated with diagnostic information. The most common diagnostic hint is the fragment in which the failed assertion or successful report occurred. PageSeeder uses this to take you to the fragment from the validation results when you click the icon.

In some cases, PageSeeder is able to provide a more precise location based on the context of the error, and highlights the context paragraph or heading.

For developers

In developer view, the validation results include the XPath of the context so that you can precisely locate the source of the failed assertion or successful report in PSML.

IDs for notes and filtering

If an assertion is given an ID in the schema, you can filter the results that match that ID by clicking the icon. You can also add an edit note with that ID as a label by clicking the Add note button.

Since edit notes are part of the content that is validated, you can use them as part of your constraint. In the content of the edit note, you can specify that the rule doesn’t apply in a particular instance, or that it should be altered.

Example

If you have a list of terms to avoid, but you want to allow some flexibility in case the term is unavoidable, you could use the edit note to indicate that the term is allowed in that particular situation.
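A terms-to-avoid assertion with an ID could be sketched as follows. The ID `avoid-term`, the placeholder term "utilize" and the `para` context are illustrative assumptions. The sketch does not show how an edit note grants an exception, because that depends on how notes appear in the PSML being validated.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical pattern ID -->
  <sch:pattern id="terminology">
    <!-- Assumed PSML paragraph element -->
    <sch:rule context="para">
      <!-- The id lets results be filtered and notes labelled with "avoid-term".
           The text is split into lower case words and compared with the term. -->
      <sch:assert id="avoid-term"
                  test="not(tokenize(lower-case(.), '\W+') = 'utilize')">
        Avoid the term "utilize".
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The test uses `tokenize` rather than a word-boundary pattern because XPath 2.0 regular expressions do not support `\b`.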

Using data for validation

Schematron can use data defined in other documents or PageSeeder search results as part of its validation process.

Example

You can define a controlled vocabulary in a PageSeeder document, which Schematron can then use to check the terms in other documents. This flexible approach lets users manage the vocabulary and lets authors modify content so that it conforms to it.
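One way to sketch this is to load the vocabulary with the XSLT `document()` function. Everything specific here is hypothetical: the `vocabulary.psml` location, the assumption that each approved term is a list item in that document, and the use of an inline label named `term` to mark terms in the content. How PageSeeder resolves the document location is covered in the developer documentation rather than on this page.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <!-- Hypothetical source: one approved term per list item -->
  <sch:let name="approved"
           value="document('vocabulary.psml')//item/normalize-space(.)"/>
  <!-- Hypothetical pattern ID -->
  <sch:pattern id="controlled-vocabulary">
    <!-- Assumption: terms are marked up as inline labels named "term" -->
    <sch:rule context="inline[@label = 'term']">
      <!-- True when the term equals at least one approved term -->
      <sch:assert test="normalize-space(.) = $approved">
        "<sch:value-of select="normalize-space(.)"/>" is not in the controlled vocabulary.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

Because the vocabulary lives in its own document, users can update the list of terms without editing the schema.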

Quick fix

Assertions can be associated with a “quick fix”. A quick fix is a transformation that has been preconfigured by a developer to fix a particular issue in your content. By using Schematron to identify an issue and bind it to a quick fix, you can streamline the process of addressing content issues.

When a quick fix is associated with an assertion, you can see the Quickfix button. It opens the quick fix dialog so that you can preview the effect of the quick fix and decide whether to apply it or not.

In some cases, there might be multiple quick fixes available to address the same problem.

Example

You have a rule that requires your headings to use sentence-style capitalization (where only the first word is capitalized). Schematron can report which headings don’t follow the capitalization style, and a quick fix can automatically apply that style.

The Validate all button, available from the template configuration page, uses Schematron to check whether document type configuration files are valid.

See How to create a schema for a document.

For developers: For more information, see Validating documents on the developer’s website.