Fauna
2.6 introduces new capabilities for string and math functions in Fauna Query Language (FQL). String functions are built-in database functions for string manipulations that developers can leverage out of the box in their code. As a result, developers can use these functions within the database, much closer to the data, instead of on the client side. This delivers gains in both developer productivity and application performance.
Design Considerations in FQL String Functions
While traditional databases assume certain defaults, programming languages have evolved to a certain set of standards in how string functions are used by developers. Since one of the goals for FQL is to closely mimic a developer’s programming experience, we aimed to comply with three key best practices.
1. String Encoding
Strings are UTF-8 encoded in Fauna, so we can include any properly encoded data in them.
नमस्ते Здравствуйте Hello - are all valid values and each string function within Fauna should use the same ground rules to calculate the length or position of a specific character.
From the user’s point of view, the minimal unit of a string is a character. "A" is a character, and so is "€". The user expects the same behavior irrespective of whether a string is composed of single-byte characters, multi-byte characters, or a mixture of both.
For example:
SubString('Hello World',3,2) should return 'lo' and similarly SubString('Здравствуйте',2,2) should return ра. In essence, the user always interacts with a string character by character irrespective of the characterset that translates to the behavior of calculating lengths using unicode complete characters, or code points when it comes to multi-byte characters.
In the future, we plan to provide a separate set of string functions that will allow users to manipulate strings as bytes.
2. Position of Character
There is a difference between how most databases assign (or calculate) the position of a character and how programming languages treat positions of characters. For most databases, like Oracle, the substring function works like the following: SUBSTR(String, starting_position, string_length), which translates to
SQL> select substr('Hello World',3,5) from dual;
SUBST
-----
llo W
A programming language like Python also requires the start_position and the end_position:
>>> a="Hello World"
>>> a[3:8]
'lo Wo'
As we can see, the output of the above two is different. Python, and other programming languages assign the first character a position of ‘0’, whereas databases treat the first character as 1.
Since Fauna is looking to integrate with most programming languages and make the developer experience as seamlessly as possible, we decided to use Zero as the starting index position.
3. Sign of the Starting position
For functions that substring or find a specific character in a given string when the starting position is marked as negative, we should look at the string in the reverse direction. In that case, the last digit will be assigned a position of 0.
String Functions in Fauna 2.6
In this section, I will highlight a few interesting ones.
Function Name | Description | Example |
CaseFold | The casefold function returns a normalized string | test> Casefold("San FrancIsCo") > 'san francisco' |
FindStr | The FindStr function returns the offset position of a string in another string, or -1 if the string is not found. | test> FindStr("San#Francisco","#") > 3 Note: The index assumes a starting position of 0. |
FindStrRegex | The FindStrRegex function returns an array of objects which contains the start position, the end position, and the data which was matched. This function supports regular expressions unlike the FindStr function | test> FindStrRegex("San Francisco","an") [ { start: 1, end: 2, data: 'an' }, { start: 6, end: 7, data: 'an' } ]
|
Length | The length function returns the length of a string. | test> Length("San Francisco") > 13
|
NGram | The NGram function tokenizes the input terms into n-grams of the given sizes. Any non-string terms are returned unmodified. | test> [NGram("Fauna"),NGram("Fauna",3,4)] [ [ 'F', 'Fa', 'a', 'au', 'u', 'un', 'n', 'na', 'a' ], [ 'Fau', 'Faun', 'aun', 'auna', 'una' ] ] |
SubString | The SubString function returns a portion of the “value” string beginning at the character “start” position for “length” characters long. | test> SubString("San Francisco",1,2) 'an'
|
Conclusion
As you can see from these examples, the availability of these string functions in FQL helps developers write more efficient code closer to the data. Executing functions within the database also ensure global data correctness and improves the overall robustness of your app. A complete list of string functions along with their detailed description, purpose, and examples are available in the
Fauna documentation.
We plan to introduce new data processing functions in FQL at a regular cadence. If there are some you’d like to see, please reach me at
product@fauna.com. I’d love to talk to you.