Introducing new capabilities for teams, third-party authentication and real-time database streaming!
Read More ->
Fauna logo
FeaturesPricing
Learn
Customers
Company
Support
Log InSign Up
Fauna logo
FeaturesPricing
Customers
Sign Up
© 2020 Fauna, Inc. All Rights Reserved.
<- Back

FQL: Boosting Developer Productivity with String Functions

Deba Chatterjee|Feb 6th, 2019|

Categories:

Product

FaunaDB 2.6 introduces new capabilities for string and math functions in Fauna Query Language (FQL). String functions are built-in database functions for string manipulations that developers can leverage out of the box in their code. As a result, developers can use these functions within the database, much closer to the data, instead of on the client side. This delivers gains in both developer productivity and application performance.

Design Considerations in FQL String Functions

While traditional databases assume certain defaults, programming languages have evolved to a certain set of standards in how string functions are used by developers. Since one of the goals for FQL is to closely mimic a developer’s programming experience, we aimed to comply with three key best practices.

1. String Encoding

Strings are UTF-8 encoded in Fauna, so we can include any properly encoded data in them.

नमस्ते  Здравствуйте Hello - are all valid values and each string function within Fauna should use the same ground rules to calculate the length or position of a specific character.

From the user’s point of view, the minimal unit of a string is a character. "A" is a character, and so is "€". The user expects the same behavior irrespective of whether a string is composed of single-byte characters, multi-byte characters, or a mixture of both.

For example:

SubString('Hello World',3,2) should return 'lo'  and similarly SubString('Здравствуйте',2,2)  should return ра. In essence, the user always interacts with a string character by character irrespective of the characterset that translates to the behavior of calculating lengths using unicode complete characters, or code points when it comes to multi-byte characters.

In the future, we plan to provide a separate set of string functions that will allow users to manipulate strings as bytes.

2. Position of Character

There is a difference between how most databases assign (or calculate) the position of a character and how programming languages treat positions of characters. For most databases, like Oracle, the substring function works like the following: SUBSTR(String, starting_position, string_length), which translates to

SQL> select substr('Hello World',3,5) from dual;
SUBST
-----
llo W

A programming language like Python also requires the start_position and the end_position:

>>> a="Hello World"
>>> a[3:8]
'lo Wo'

As we can see, the output of the above two is different. Python, and other programming languages assign the first character a position of ‘0’, whereas databases treat the first character as 1.

Since FaunaDB is looking to integrate with most programming languages and make the developer experience as seamlessly as possible, we decided to use Zero as the starting index position.

3. Sign of the Starting position

For functions that substring or find a specific character in a given string when the starting position is marked as negative, we should look at the string in the reverse direction. In that case, the last digit will be assigned a position of 0.

String Functions in FaunaDB 2.6

In this section, I will highlight a few interesting ones.

Function Name

Description

Example

CaseFold

The casefold function returns a normalized string

test> Casefold("San FrancIsCo")
> 'san francisco'

FindStr

The FindStr function returns the offset position of a string in another string, or -1 if the string is not found.

test> FindStr("San#Francisco","#")

> 3

Note: The index assumes a starting position of 0.

FindStrRegex

The FindStrRegex function returns an array of objects which contains the start position, the end position, and the data which was matched. This function supports regular expressions unlike the FindStr function

test> FindStrRegex("San Francisco","an")

[ { start: 1, end: 2, data: 'an' },

 { start: 6, end: 7, data: 'an' } ]


Length

The length function returns the length of a string.

test> Length("San Francisco")

> 13


NGram

The NGram function tokenizes the input terms into n-grams of the given sizes.

Any non-string terms are returned unmodified.

test> [NGram("Fauna"),NGram("Fauna",3,4)]

[ [ 'F', 'Fa', 'a', 'au', 'u', 'un', 'n', 'na', 'a' ],

 [ 'Fau', 'Faun', 'aun', 'auna', 'una' ] ]

SubString

The SubString function returns a portion of the “value” string beginning at the character “start” position for “length” characters long.

test> SubString("San Francisco",1,2)

'an'


Conclusion

As you can see from these examples, the availability of these string functions in FQL helps developers write more efficient code closer to the data. Executing functions within the database also ensure global data correctness and improves the overall robustness of your app. A complete list of string functions along with their detailed description, purpose, and examples are available in the FaunaDB documentation.

We plan to introduce new data processing functions in FQL at a regular cadence. If there are some you’d like to see, please reach me at product@fauna.com. I’d love to talk to you.

If you enjoyed our blog, and want to work on systems and challenges related to globally distributed systems, serverless databases, GraphQL, and Jamstack, Fauna is hiring!

Share this post

TwitterLinkedIn

Subscribe to Fauna blogs & newsletter

Get latest blog posts, development tips & tricks, and latest learning material delivered right to your inbox.