Analyzers
AbstractAggregateFunction
AbstractAnalyzer
AnalyzerOptions
dataclass
Container for Analyzer Options.
Source code in tsumugi-python/tsumugi/analyzers.py
ApproxCountDistinct
dataclass
Bases: AbstractAnalyzer
Computes the approximate number of distinct values in a column using HyperLogLogPlusPlus.
ApproxQuantile
dataclass
Bases: AbstractAnalyzer
Computes the approximate quantile of a column.
The allowed relative error compared to the exact quantile can be configured with the relativeError parameter.
ApproxQuantiles
dataclass
Bases: AbstractAnalyzer
Computes the approximate quantiles of a column.
The allowed relative error compared to the exact quantiles can be configured with the relativeError parameter.
ColumnCount
dataclass
Bases: AbstractAnalyzer
Computes the count of columns.
Completeness
dataclass
Bases: AbstractAnalyzer
Completeness is the fraction of non-null values in a column.
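As a rough illustration of the metric itself (not of how tsumugi computes it on Spark), completeness can be sketched in plain Python. The helper name `completeness`, the use of `None` as a stand-in for NULL, and the error on an empty column are assumptions made for this example.

```python
from typing import Optional, Sequence


def completeness(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are not None (None stands in for NULL)."""
    if not values:
        # Assumption: this sketch treats an empty column as an error.
        raise ValueError("completeness is undefined for an empty column")
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


column = ["a", None, "b", "c"]
print(completeness(column))  # 3 of 4 values are non-null -> 0.75
```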
Compliance
dataclass
Bases: AbstractAnalyzer
Compliance measures the fraction of rows that comply with the given column constraint.
For example, if the constraint is "att1>3" and the DataFrame has 5 rows with an att1 value greater than 3 and 10 rows with an att1 value under 3, a DoubleMetric with the value 0.33 (5 of 15 rows) is returned.
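The arithmetic of that example can be reproduced with a small plain-Python sketch. The helper `compliance` and the row representation (a list of dicts) are hypothetical and used only for illustration; the analyzer itself takes a column constraint and runs on a DataFrame.

```python
from typing import Callable, Iterable


def compliance(rows: Iterable[dict], predicate: Callable[[dict], bool]) -> float:
    """Fraction of rows for which the predicate holds."""
    rows = list(rows)
    if not rows:
        # Assumption: this sketch treats an empty input as an error.
        raise ValueError("compliance is undefined for an empty input")
    return sum(1 for row in rows if predicate(row)) / len(rows)


# 5 rows with att1 > 3 and 10 rows with att1 under 3, as in the text.
rows = [{"att1": 5}] * 5 + [{"att1": 1}] * 10
ratio = compliance(rows, lambda row: row["att1"] > 3)
print(round(ratio, 2))  # 5 / 15 -> 0.33
```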
ConstraintBuilder
for_analyzer(analyzer)
should_be_eq_to(value)
Add an assertion that metric == value.
The result of this method depends on the type of the passed value!
should_be_geq_than(value)
Add an assertion that metric >= value.
The result of this method depends on the type of the passed value!
should_be_gt_than(value)
Add an assertion that metric > value.
The result of this method depends on the type of the passed value!
should_be_leq_than(value)
Add an assertion that metric <= value.
The result of this method depends on the type of the passed value!
should_be_lt_than(value)
Add an assertion that metric < value.
The result of this method depends on the type of the passed value!
with_hint(hint)
Set a hint for the constraint.
A hint is helpful when one needs to understand the reason for the constraint or why it failed.
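The fluent style suggested by the method names above can be sketched with a toy stand-in. `ToyConstraintBuilder` is hypothetical and is not tsumugi's class: it takes an analyzer name as a string instead of an analyzer object, and its `evaluate` method is invented here purely to show how an assertion and a hint fit together.

```python
import operator
from typing import Callable, Optional


class ToyConstraintBuilder:
    """Hypothetical stand-in that mimics the method names listed above."""

    def __init__(self) -> None:
        self.analyzer_name: Optional[str] = None
        self.hint: Optional[str] = None
        self._check: Optional[Callable[[float], bool]] = None

    def for_analyzer(self, analyzer_name: str) -> "ToyConstraintBuilder":
        self.analyzer_name = analyzer_name
        return self

    def _assert(self, op: Callable, value: float) -> "ToyConstraintBuilder":
        # Store the comparison; it runs later against a computed metric.
        self._check = lambda metric: op(metric, value)
        return self

    def should_be_eq_to(self, value: float) -> "ToyConstraintBuilder":
        return self._assert(operator.eq, value)

    def should_be_geq_than(self, value: float) -> "ToyConstraintBuilder":
        return self._assert(operator.ge, value)

    def should_be_gt_than(self, value: float) -> "ToyConstraintBuilder":
        return self._assert(operator.gt, value)

    def should_be_leq_than(self, value: float) -> "ToyConstraintBuilder":
        return self._assert(operator.le, value)

    def should_be_lt_than(self, value: float) -> "ToyConstraintBuilder":
        return self._assert(operator.lt, value)

    def with_hint(self, hint: str) -> "ToyConstraintBuilder":
        self.hint = hint
        return self

    def evaluate(self, metric: float) -> str:
        # Invented for this sketch: report the hint when the check fails.
        if self._check is None:
            raise ValueError("no assertion was added")
        if self._check(metric):
            return "ok"
        return f"failed: {self.hint}" if self.hint else "failed"


constraint = (
    ToyConstraintBuilder()
    .for_analyzer("Completeness(user_id)")
    .should_be_geq_than(0.95)
    .with_hint("user_id must be populated for at least 95% of rows")
)
print(constraint.evaluate(0.99))
print(constraint.evaluate(0.80))
```

In this toy version the hint only appears in the failure message, which is the situation the description of with_hint has in mind.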
Correlation
dataclass
Bases: AbstractAnalyzer
Computes the Pearson correlation coefficient between the two given columns.
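For reference, the textbook formula for the coefficient can be written in a few lines of plain Python. This is a sketch of the statistic only; the function name `pearson` and its error handling are assumptions, and the analyzer itself runs on DataFrame columns.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# A perfectly linear relationship gives a coefficient close to 1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```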
CountAggregate
dataclass
Bases: AbstractAggregateFunction
Computes the histogram count aggregation.
CountDistinct
dataclass
Bases: AbstractAnalyzer
Counts the distinct elements in the column(s).
CustomSql
dataclass
Bases: AbstractAnalyzer
Compute the number of rows that match the custom SQL expression.
DataType
dataclass
Bases: AbstractAnalyzer
Data Type Analyzer. Returns the data types of a column.
Distinctness
dataclass
Bases: AbstractAnalyzer
Computes the distinctness of the elements in the column(s).
Distinctness is the fraction of distinct values in the column(s).
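A minimal plain-Python sketch of this definition follows, assuming the fraction is taken over the number of rows. The helper name `distinctness` is hypothetical, and how NULL values are treated is not covered here. For several columns, each row is represented as a tuple.

```python
from typing import Hashable, Sequence


def distinctness(values: Sequence[Hashable]) -> float:
    """Number of distinct values divided by the number of rows."""
    if not values:
        raise ValueError("distinctness is undefined for an empty column")
    return len(set(values)) / len(values)


print(distinctness(["a", "a", "b", "c"]))  # 3 distinct values / 4 rows -> 0.75

# Two columns at once: every row becomes a (col1, col2) tuple.
pairs = [("a", 1), ("a", 1), ("b", 2)]
print(distinctness(pairs))  # 2 distinct pairs out of 3 rows
```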
Entropy
dataclass
Bases: AbstractAnalyzer
Entropy is a measure of the level of information contained in a message.
Given the probability distribution over values in a column, it describes how many bits are required to identify a value.
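The definition above corresponds to Shannon entropy over the empirical value distribution. The sketch below uses base-2 logarithms to match the "bits" wording; the logarithm base actually used by the analyzer is not stated here and may differ, and the helper name `entropy_bits` is an assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy_bits(values: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of the empirical distribution of values."""
    if not values:
        raise ValueError("entropy is undefined for an empty column")
    total = len(values)
    probabilities = [count / total for count in Counter(values).values()]
    # sum of p * log2(1 / p) over all observed values
    return sum(p * math.log2(1 / p) for p in probabilities)


print(entropy_bits(["a", "b", "c", "d"]))  # four equally likely values -> 2.0
print(entropy_bits(["a", "a", "a", "a"]))  # a single value carries no information -> 0.0
```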
ExactQuantile
dataclass
Bases: AbstractAnalyzer
Compute an exact quantile of the given column.
Histogram
dataclass
Bases: AbstractAnalyzer
Histogram is the summary of values in a column of a DataFrame.
It groups the column's values, then calculates for each distinct value the number of rows with that value and the fraction of all rows it accounts for.
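The grouping described above can be sketched with `collections.Counter`. The helper name `histogram` and the output shape, a dict mapping each value to a `(count, ratio)` tuple, are choices made for this illustration and not the analyzer's result type.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def histogram(values: Sequence[Hashable]) -> Dict[Hashable, Tuple[int, float]]:
    """Map each distinct value to (row count, fraction of all rows)."""
    total = len(values)
    return {
        value: (count, count / total)
        for value, count in Counter(values).items()
    }


print(histogram(["red", "blue", "red", "red"]))
# {'red': (3, 0.75), 'blue': (1, 0.25)}
```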
KLLParameters
dataclass
Parameters for KLLSketch.
KLLSketch
dataclass
Bases: AbstractAnalyzer
The KLL Sketch analyzer.
MaxLength
dataclass
Bases: AbstractAnalyzer
MaxLength Analyzer. Get the maximum length of a string column.
Maximum
dataclass
Bases: AbstractAnalyzer
Get the maximum of a numeric column.
Mean
dataclass
Bases: AbstractAnalyzer
Mean Analyzer. Get the mean of a column.
MinLength
dataclass
Bases: AbstractAnalyzer
Get the minimum length of a column.
Minimum
dataclass
Bases: AbstractAnalyzer
Get the minimum of a numeric column.
MutualInformation
dataclass
Bases: AbstractAnalyzer
Describes how much information about one column can be inferred from another column.
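To make "how much information can be inferred" concrete, here is a plain-Python sketch of the standard mutual information formula over empirical frequencies. The function name `mutual_information` and the natural logarithm are assumptions of this example; the base used by the analyzer is not stated here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Sum over (x, y) of p(x, y) * ln(p(x, y) / (p(x) * p(y)))."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((count_x[x] / n) * (count_y[y] / n)))
    return mi


# Identical columns: knowing one fully determines the other.
print(mutual_information(["a", "b", "a", "b"], ["a", "b", "a", "b"]))
# Independent columns: knowing one says nothing about the other.
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

With identical columns the result equals the entropy of the column (here ln 2); with independent columns it is zero.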
PatternMatch
dataclass
Bases: AbstractAnalyzer
PatternMatch is a measure of the fraction of rows that comply with a given column regex constraint.
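The metric can be illustrated with Python's `re` module. The helper name `pattern_match`, the choice of `re.search` (match anywhere unless the pattern is anchored), and counting `None` as a non-match are all assumptions of this sketch; the analyzer's regex dialect and NULL handling may differ.

```python
import re
from typing import Optional, Sequence


def pattern_match(values: Sequence[Optional[str]], pattern: str) -> float:
    """Fraction of rows whose value matches the regular expression."""
    if not values:
        raise ValueError("pattern match is undefined for an empty column")
    regex = re.compile(pattern)
    matched = sum(1 for v in values if v is not None and regex.search(v))
    return matched / len(values)


emails = ["a@example.com", "not-an-email", "b@example.org", None]
print(pattern_match(emails, r"^[^@\s]+@[^@\s]+\.[a-z]+$"))  # 2 of 4 rows -> 0.5
```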
RatioOfSums
dataclass
Bases: AbstractAnalyzer
Compute the ratio of sums between two columns.
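In plain Python the computation amounts to dividing one column's sum by the other's. Which column serves as numerator and which as denominator, the helper name `ratio_of_sums`, and the behavior on a zero denominator are assumptions of this sketch.

```python
from typing import Sequence


def ratio_of_sums(numerator: Sequence[float], denominator: Sequence[float]) -> float:
    """sum(numerator column) / sum(denominator column)."""
    denominator_sum = sum(denominator)
    if denominator_sum == 0:
        # Assumption: this sketch raises instead of returning inf or NaN.
        raise ZeroDivisionError("the denominator column sums to zero")
    return sum(numerator) / denominator_sum


print(ratio_of_sums([10, 20, 30], [20, 40, 60]))  # 60 / 120 -> 0.5
```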
Size
dataclass
Bases: AbstractAnalyzer
Size is the number of rows in a DataFrame.
StandardDeviation
dataclass
Bases: AbstractAnalyzer
Calculates the standard deviation of a column.
Sum
dataclass
Bases: AbstractAnalyzer
Calculates the sum of a column.
SumAggregate
dataclass
Bases: AbstractAggregateFunction
Computes the histogram sum aggregation.
UniqueValueRatio
dataclass
Bases: AbstractAnalyzer
Compute the ratio of unique values for columns.
Uniqueness
dataclass
Bases: AbstractAnalyzer
Compute the uniqueness of the columns.
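Uniqueness and UniqueValueRatio are easy to confuse, so a plain-Python sketch may help. It assumes the definitions commonly used in data-quality tooling such as Deequ, which this page does not spell out: uniqueness is the number of values occurring exactly once divided by the number of rows, and the unique value ratio is the same numerator divided by the number of distinct values. The helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def uniqueness(values: Sequence[Hashable]) -> float:
    """Values occurring exactly once, divided by the number of rows (assumed definition)."""
    counts = Counter(values)
    singletons = sum(1 for count in counts.values() if count == 1)
    return singletons / len(values)


def unique_value_ratio(values: Sequence[Hashable]) -> float:
    """Values occurring exactly once, divided by the number of distinct values (assumed definition)."""
    counts = Counter(values)
    singletons = sum(1 for count in counts.values() if count == 1)
    return singletons / len(counts)


column = ["a", "a", "b", "c"]
print(uniqueness(column))          # "b" and "c" occur once: 2 / 4 rows -> 0.5
print(unique_value_ratio(column))  # 2 / 3 distinct values
```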