FQL: Boosting Developer Productivity with String Functions

FaunaDB 2.6 introduces new capabilities for string and math functions in Fauna Query Language (FQL). String functions are built-in database functions for string manipulations that developers can leverage out of the box in their code. As a result, developers can use these functions within the database, much closer to the data, instead of on the client side. This delivers gains in both developer productivity and application performance.

Design Considerations in FQL String Functions

While traditional databases assume certain defaults, programming languages have evolved to a certain set of standards in how string functions are used by developers. Since one of the goals for FQL is to closely mimic a developer’s programming experience, we aimed to comply with three key best practices.

1. String Encoding

    Strings are UTF-8 encoded in Fauna, so we can include any properly encoded data in them.

    नमस्ते  Здравствуйте Hello - are all valid values and each string function within Fauna should use the same ground rules to calculate the length or position of a specific character.

    From the user’s point of view, the minimal unit of a string is a character. "A" is a character, and so is "€". The user expects the same behavior irrespective of whether a string is composed of single-byte characters, multi-byte characters, or a mixture of both.

    For example:

    SubString('Hello World',3,2) should return 'lo'  and similarly SubString('Здравствуйте',2,2)  should return ра. In essence, the user always interacts with a string character by character irrespective of the characterset that translates to the behavior of calculating lengths using unicode complete characters, or code points when it comes to multi-byte characters.

    In the future, we plan to provide a separate set of string functions that will allow users to manipulate strings as bytes.

    2. Position of Character

    There is a difference between how most databases assign (or calculate) the position of a character and how programming languages treat positions of characters. For most databases, like Oracle, the substring function works like the following: SUBSTR(String, starting_position, string_length), which translates to

    SQL> select substr('Hello World',3,5) from dual;
    SUBST
    -----
    llo W

    A programming language like Python also requires the start_position and the end_position:

    >>> a="Hello World"
    >>> a[3:8]
    'lo Wo'

    As we can see, the output of the above two is different. Python, and other programming languages assign the first character a position of ‘0’, whereas databases treat the first character as 1.

    Since FaunaDB is looking to integrate with most programming languages and make the developer experience as seamlessly as possible, we decided to use Zero as the starting index position.

    3. Sign of the Starting position

    For functions that substring or find a specific character in a given string when the starting position is marked as negative, we should look at the string in the reverse direction. In that case, the last digit will be assigned a position of 0.

    String Functions in FaunaDB 2.6

    In this section, I will highlight a few interesting ones.

    Function Name

    Description

    Example

    CaseFold

    The casefold function returns a normalized string

    test> Casefold("San FrancIsCo")
    > 'san francisco'

    FindStr

    The FindStr function returns the offset position of a string in another string, or -1 if the string is not found.

    test> FindStr("San#Francisco","#")

    > 3

    Note: The index assumes a starting position of 0.

    FindStrRegex

    The FindStrRegex function returns an array of objects which contains the start position, the end position, and the data which was matched. This function supports regular expressions unlike the FindStr function

    test> FindStrRegex("San Francisco","an")

    [ { start: 1, end: 2, data: 'an' },

     { start: 6, end: 7, data: 'an' } ]


    Length

    The length function returns the length of a string.

    test> Length("San Francisco")

    > 13


    NGram

    The NGram function tokenizes the input terms into n-grams of the given sizes.

    Any non-string terms are returned unmodified.

    test> [NGram("Fauna"),NGram("Fauna",3,4)]

    [ [ 'F', 'Fa', 'a', 'au', 'u', 'un', 'n', 'na', 'a' ],

     [ 'Fau', 'Faun', 'aun', 'auna', 'una' ] ]

    SubString

    The SubString function returns a portion of the “value” string beginning at the character “start” position for “length” characters long.

    test> SubString("San Francisco",1,2)

    'an'




    Conclusion

    As you can see from these examples, the availability of these string functions in FQL helps developers write more efficient code closer to the data. Executing functions within the database also ensure global data correctness and improves the overall robustness of your app. A complete list of string functions along with their detailed description, purpose, and examples are available in the FaunaDB documentation.

    We plan to introduce new data processing functions in FQL at a regular cadence. If there are some you’d like to see, please reach me at product@fauna.com. I’d love to talk to you.