NAME

Data::Sah::Manual::Developer - Data::Sah developer information

VERSION

This document describes version 0.917 of Data::Sah::Manual::Developer (from Perl distribution Data-Sah), released on 2024-02-16.

OVERVIEW

PERL CODE GENERATION

This section describes how a schema is converted into Perl code.

From each clause, an equivalent Perl expression is generated (except for a few special clauses). The expression returns true or false depending on whether the data passes the clause. For example, in the schema:

    ["int", min=>1, max=>10]

the clause "min=>1" will be translated into something like:

    $data >= 1

and the clause "max=>10" will be translated into something like:

    $data <= 10

For the type itself ("int") we generate a Perl expression for type checking:

    Scalar::Util::Numeric::isint($data)

These Perl expressions are then ordered and combined into a single one. The order follows the priorities specified by the Sah specification, as each clause has its own priority (the lower the number, the higher the priority). The "min" and "max" clauses are "regular" type constraint clauses, so they each have a priority of 50.

There is a special clause "req" (unspecified here, the default is 0) which has a high priority of 3, even higher than the type check. The "req" clause, if given the value 1/true, requires the data to be defined. On the other hand, if "req" is false and the data is undefined, all the other constraint clauses are skipped (so "undef" will pass the schema).

After the ordering, the type constraint expressions are joined using the Perl operator "&&" so that evaluation can short-circuit after the first failure. The final Perl expression becomes:

    (!defined($data) ? 1 :
        (Scalar::Util::Numeric::isint($data) &&
         ($data >= 1) &&
         ($data <= 10))
    )

Default value

The "default" clause is another special clause that has a high priority, evaluated before "req", the type check, and the other constraint clauses.

    ["int", min=>1, max=>10, default=>1]

The "default" clause will be translated into this Perl expression:

    (($data //= 1), 1)

This expression evaluates the operand to the left of the comma operator (assigning the default value to the data), then evaluates the operand to the right of the comma and returns that value. The effect is that the expression always returns true, even when the default value given in the schema is false Perl-wise, like "" or 0.

So the final expression will become:

    (($data //= 1), 1)
        &&
    (!defined($data) ? 1 :
        (Scalar::Util::Numeric::isint($data) &&
         ($data >= 1) &&
         ($data <= 10))
    )

Required value (req=>1)

What if "req" is true?

    ["int*", min=>1, max=>10]
    # a.k.a. ["int", req=>1, min=>1, max=>10]

Then the final expression will become this instead:

    (defined($data) &&
     Scalar::Util::Numeric::isint($data) &&
     ($data >= 1) &&
     ($data <= 10))

And if we add the default value:

    ["int*", min=>1, max=>10, default=>1]

Then the final expression will become this:

    (($data //= 1), 1)
        &&
    (defined($data) &&
     Scalar::Util::Numeric::isint($data) &&
     ($data >= 1) &&
     ($data <= 10))

Validator subroutine

Generating a validator subroutine, then, is only a matter of adding some bits to make a full subroutine. Let's get back to this schema:

    ["int", min=>1, max=>10, default=>1]

The final validator code generated would be something like:

    require Scalar::Util::Numeric;
    my $validator = sub {
        my $data = shift;
        (($data //= 1), 1)
            &&
        (!defined($data) ? 1 :
            (Scalar::Util::Numeric::isint($data) &&
             ($data >= 1) &&
             ($data <= 10))
        )
    };

This is what is returned by Data::Sah's gen_validator() function. This validator returns true when the data is valid, or false otherwise.
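The ordering, joining, and wrapping steps described above can be sketched in a few lines of core Perl. This is only an illustrative sketch, not Data::Sah's actual implementation: the function gen_validator_sketch(), its arguments (type_check, clauses, req, default_src), and the regex-based integer test standing in for Scalar::Util::Numeric::isint() are all assumptions made for this example. Only the priority value 50 for "min" and "max" comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical, simplified generator; NOT Data::Sah's actual implementation.
# Core-Perl integer test, a stand-in for Scalar::Util::Numeric::isint().
my $INT_CHECK = '(!ref($data) && $data =~ /\A[+-]?[0-9]+\z/)';

sub gen_validator_sketch {
    my (%args) = @_;

    # Order clause expressions by priority: the lower the number, the earlier it runs.
    my @clause_exprs = map  { "($_->{expr})" }
                       sort { $a->{prio} <=> $b->{prio} }
                       @{ $args{clauses} };

    # Type check first, then the clauses, joined with && to stop at the first failure.
    my $checks = join ' && ', $args{type_check}, @clause_exprs;

    # req true: data must be defined. req false: undef skips every other check.
    my $body = $args{req}
        ? "(defined(\$data) && $checks)"
        : "(!defined(\$data) ? 1 : ($checks))";

    # The default is assigned before everything else and always yields true.
    $body = "((\$data //= $args{default_src}), 1) && $body"
        if defined $args{default_src};

    # Wrap the combined expression into a full subroutine and compile it.
    my $source = "sub { my \$data = shift; $body }";
    my $code   = eval $source or die "Generated code does not compile: $@";
    return ($code, $source);
}

# Equivalent of ["int", min=>1, max=>10, default=>1]
my ($validator, $source) = gen_validator_sketch(
    type_check  => $INT_CHECK,
    clauses     => [
        { prio => 50, expr => '$data <= 10' },
        { prio => 50, expr => '$data >= 1'  },
    ],
    default_src => '1',
);
```

Here $source holds the generated code as text, which can be printed to compare it with the hand-written expression, and $validator is the compiled subroutine. Passing the default as a source fragment (default_src) keeps the sketch short; a real generator would have to quote arbitrary values safely.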

Let's test it:

    $validator->("x");   # false (fails the type check, isint())
    $validator->(-1);    # false (fails the min clause, $data >= 1)
    $validator->(20);    # false (fails the max clause, $data <= 10)
    $validator->(5);     # true
    $validator->(undef); # true (because there is the default value of 1)

String-returning validator

The above is fine if all you want is a validator that returns true/false (bool). What if you instead want an error message on failure? gen_validator() supports this: if you pass the option return_type => "str_errmsg" you will get such a validator:

    $validator = gen_validator(["int", min=>1, max=>10, default=>1],
                               {return_type=>"str_errmsg"});

To do this, each Perl expression needs to be able to set an error message:

    require Scalar::Util::Numeric;
    my $validator = sub {
        my $data = shift;
        my $err_data;
        (($data //= 1), 1)
            &&
        (!defined($data) ? 1 :
            (Scalar::Util::Numeric::isint($data) ? 1 : (($err_data //= "Not integer"),0))
                &&
            ($data >= 1  ? 1 : (($err_data //= "Must be at least 1"),0))
                &&
            ($data <= 10 ? 1 : (($err_data //= "Must be at most 10"),0))
        );
        $err_data //= "";
        $err_data;
    };

So each constraint expression still returns either true or false, as in the boolean validator case, but before the expression returns 0 it sets $err_data. After the whole expression is evaluated, $err_data is returned.

Other possible values for the "return_type" are:

Or-logic

Normally all clauses in a clause set must return true for the validation to succeed ("and-logic"). However, other logics are possible: only N clauses need to succeed, at most N clauses must succeed, or a combination of the two. When only one clause needs to succeed, this is called "or-logic". Example schema for a password policy:

    ["str*", {
        clause => [
            [min_len => 10],
            [match   => qr/\W/],
            [match   => qr/[A-Z][0-9]|[0-9][A-Z]/i],
        ],
        "clause.op" => "or",
    }]

The above schema says that a password needs to be at least 10 characters long, or contain a symbol (non-word character), or contain both letters and numbers. This will be translated into something like this:

    (defined $data) &&
    (!ref($data)) &&   # type check for str
    (do {
        my $_sahv_ok  = 0;
        my $_sahv_nok = 0;
        (length($data) >= 10 ? ++$_sahv_ok : ++$_sahv_nok) &&
        ($data =~ qr/\W/ ? ++$_sahv_ok : ++$_sahv_nok) &&
        ($data =~ qr/[A-Z][0-9]|[0-9][A-Z]/i ? ++$_sahv_ok : ++$_sahv_nok) &&
        $_sahv_ok >= 1;
    })

XXX shortcut after $_sahv_ok becomes 1?

HUMAN TEXT GENERATION

This section explains how a Sah schema is converted into human description text, e.g. "[int => div_by=>3]" into "integer, divisible by 3". This human text is used for error messages or for documentation.

You should read the previous section about code generation first, since text generation is basically the same: it's just another "compilation" process. The difference is that instead of generating Perl code, as the "perl" compiler (Data::Sah::Compiler::perl) does, the "human" compiler (Data::Sah::Compiler::human) generates text as the result.

As in generating code, when generating text we visit the type handler and then the clause handler for each clause. Each of these handlers usually calls add_ccl() to add a "compiled clause"; the compiled clauses are then joined together to create the final result. The type handler usually adds a "noun" compiled clause.

For example, for schema "["float", min=>1, max=>10]", the type handler for float (method "handle_type" in Data::Sah::Compiler::human::TH::float, where TH is short for type handler) will add this compiled clause:

    {
        type => 'noun',
        text => ['decimal number', 'decimal numbers'],
        xlt  => 1,
    }

The "xlt=>1" signifies that the text has been translated (note that the human compiler supports producing human text in languages other than English).

Next, the clause handler for clause "min" (method "clause_min" in Data::Sah::Compiler::human::TH::float) will add this compiled clause:

    {
        type => 'clause',
        fmt  => '%(modal_verb)s be at least %s',
    }

Now, instead of "text" we have "fmt". This will be converted into "text" by add_ccl() using sprintfn (see Text::sprintfn). The positional arguments (like %s) are fed from the clause value (in this case, 1), while the named arguments (like "%(modal_verb)s") are supplied by add_ccl(). Since "xlt" is not set to true, the format string needs to be translated first. add_ccl() will find a suitable translation (see "Translation") and then call sprintfn() to finally get "text". The final result of this compiled clause is:

    {
        type => 'clause',
        text => 'must be at least 1',
        xlt  => 1,
    }

For the last clause "max", we'll similarly get a compiled clause:

    {
        type => 'clause',
        fmt  => '%(modal_verb)s be at most %s',
    }

which will become:

    {
        type => 'clause',
        text => 'must be at most 10',
        xlt  => 1,
    }

Finally, all the compiled clauses are simply joined and the compilation result is:

    "decimal number, must be at least 1, must be at most 10"

Formats

TBD

Handling CLAUSE.op and CLAUSE.err_level

Consider this schema:

    [int => 'div_by&' => [3, 5]]

which is a shortcut for:

    [int => 'div_by'=>[3, 5], 'div_by.op'=>'and']

This is a clause with multiple values.

These are the compiled clauses that will be added during generation:

    {type=>'noun', text=>['integer','integers'], xlt=>1}

and:

    {type=>'clause', fmt=>'%(modal_verb)s divisible by %s'}

which will become:

    {type=>'clause', text=>'must be divisible by 3 and 5', xlt=>1}

In other words, the clause "fmt" is the same but the arguments supplied to it are formatted to contain the multiple values.

Another example: for "[int => 'div_by&'=>[2,3,5]]", the clause will generate this final compiled clause:

    {type=>'clause', text=>'must be divisible by all of [2,3,5]', xlt=>1}

For "[int => 'div_by|'=>[2,3,5]]" (which is a shortcut for "[int => 'div_by'=>[2,3,5], 'div_by.op'=>'or']") the final compiled clause will be:

    {type=>'clause', text=>'must be divisible by one of [2,3,5]', xlt=>1}

For "[int => '!div_by'=>3]" (which is a shortcut for "[int => 'div_by'=>3, 'div_by.op'=>'not']") the final compiled clause will be:

    {type=>'clause', text=>'must not be divisible by 3', xlt=>1}

that is, the value for the "modal_verb" named argument supplied by add_ccl() is changed from the default "must" to "must not".

For "[int => 'div_by'=>3, 'div_by.err_level'=>'warn']", the final compiled clause will be:

    {type=>'clause', text=>'should be divisible by 3', xlt=>1}

that is, the value for the "modal_verb" named argument supplied by add_ccl() is changed from the default "must" to "should".

Not all clauses can use multiple clause values in their arguments. For example, in "[int => mod=>[3, 1]]", the compiled clause for the "mod" clause will be:

    {type=>'clause', fmt=>'%(modal_verb)s leave a remainder of %2$s when divided by %1$s', vals=>[3, 1]}

(Note: the "vals" key supplies the positional arguments for "sprintfn" when you want something other than the default clause value. In this case we want to flatten the clause value, because otherwise the positional arguments array would be "[ [3,1] ]". The "%1$s" and "%2$s" are printf syntax for using positional arguments; see "sprintf" in perlfunc.)
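The explicit-index conversions mentioned in the note can be tried with Perl's built-in sprintf(). The following is a small sketch only: the %ccl hash is a hypothetical stand-in for a compiled clause, and the modal verb is already written into the format string because named arguments such as "%(modal_verb)s" are a sprintfn feature that core sprintf() does not provide.

```perl
use strict;
use warnings;

# Hypothetical compiled clause. "must" is filled in by hand here, since
# core sprintf() has no named arguments (that part is what sprintfn adds).
my %ccl = (
    type => 'clause',
    fmt  => 'must leave a remainder of %2$s when divided by %1$s',
    vals => [3, 1],
);

# %1$s and %2$s select arguments by position, so the remainder (second
# value) can appear in the text before the divisor (first value).
my $text = sprintf($ccl{fmt}, @{ $ccl{vals} });

print "$text\n";
```

Flattening matters here: the format needs two separate arguments, which is why "vals" holds the list 3, 1 and not a single nested array reference.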

The final compiled clause will become:

    {type=>'clause', text=>'must leave a remainder of 1 when divided by 3', xlt=>1}

Now what if we have this schema: "[int => 'mod&' => [ [3,1], [5,1] ]]"? If we use the same "fmt" for multiple values, the final compiled clause will become:

    {type=>'clause', text=>'must leave a remainder of [5,1] when divided by [3,1]', xlt=>1}

in which the text doesn't make grammatical sense. In this case, the clause handler needs to add a compiled clause of type "list" instead of type "clause":

    {
        type  => 'list',
        text  => 'all of the following must be true',
        items => [
            {type=>'clause', text=>'must leave a remainder of 1 when divided by 3', xlt=>1},
            {type=>'clause', text=>'must leave a remainder of 1 when divided by 5', xlt=>1},
        ],
        xlt   => 1,
    }

The "list" compiled clause is used to create text with bullet points (which can be inlined into a clause in some cases where possible).

The final compilation result for the last schema will be:

    "integer, all of the following must be true: must leave a remainder of 1 when divided by 3, must leave a remainder of 1 when divided by 5"

Coercion (perl)

Coercion rules for perl are organized modularly in "Data::Sah::Coerce::perl::To_$TARGET_TYPE::From_$SOURCE_TYPE::$DESCRIPTION" modules, where $TARGET_TYPE is the schema being compiled, $SOURCE_TYPE is the source type, and $DESCRIPTION is some extra description. Example:

    Data::Sah::Coerce::perl::To_date::From_float::Epoch

This module contains a rule to convert an integer (which is assumed to be a Unix epoch) into a date. Another example:

    Data::Sah::Coerce::perl::To_date::From_str::ISO8601

This is also a module to coerce a date, this time from (a subset of) ISO8601 strings.

Handling expression

TBD

Translation

TBD

COERCION

In Data::Sah, coercion rules are organized modularly in "Data::Sah::Coerce::$LANG::To_$TARGET_TYPE::From_$SOURCE_TYPE::$DESCRIPTION" modules, where $TARGET_TYPE is the schema being compiled, $SOURCE_TYPE is the source type, and $DESCRIPTION is some extra description.
For language-specific information, see "Coercion (perl)".

Code for coercion is generated by collecting all rules from the coercion handler modules, then combining them and putting the result after setting the default value and before the type check.

HOMEPAGE

Please visit the project's homepage at <https://metacpan.org/release/Data-Sah>.

SOURCE

Source repository is at <https://github.com/perlancar/perl-Data-Sah>.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

    % prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2024, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website <https://rt.cpan.org/Public/Dist/Display.html?Name=Data-Sah>.

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.