How to Spot Tech Trends Early Using Stack Overflow and GitHub
What’s a proxy for actual tech activity, not just noise?
But “how do I query these sources?” you ask
select concat(datepart(year, CreationDate),'-', datepart(month, CreationDate)) as dt, count(1) num_serverless_mentions from Posts where Title LIKE '%serverless%' group by concat(datepart(year, CreationDate),'-', datepart(month, CreationDate))
- The concat() function turns “CreationDate” into a YYYY-MM date format that Google Sheets can easily turn into a time series graph
- count(1) counts each record in the Posts table that matches our conditions
- Title LIKE '%serverless%' locates all of the different posts with serverless anywhere in the post title
- Replace serverless with any interesting tech trend or language to perform your own analysis 😸
- The group by clause ensures that all posts are counted, in monthly buckets
select concat(cast(year(created_at) as string),'-',cast(month(created_at) as string)) dt, count(1) num_serverless_repos FROM (TABLE_DATE_RANGE([githubarchive:day.], TIMESTAMP('2017-01-01'), TIMESTAMP('2019-06-01') )) where repo.name like '%serverless%' group by 1 order by 1
- As in the SO query, the concat() function turns “created_at” into a YYYY-MM date format that Google Sheets can easily turn into a time series graph.
- FROM (TABLE_DATE_RANGE(githubarchive:day.... requires some explanation. The way this data is stored in BigQuery is in daily tables. To perform a time series query across days, we need to select a range of tables by date. The TABLE_DATE_RANGE function takes three inputs: the table prefix (i.e. githubarchive:day.), and the start and end dates.
- Note that BigQuery limits the date range to 1000 days/tables. So performing this analysis across many years requires a UNION between result tables.
- repo.name like '%serverless%' picks out all of the different posts with "serverless" anywhere in the repo name
- group by 1 and order by 1 tell the query to organize the query by the date, i.e. the first line in the select statement.
If you enjoyed our blog, and want to work on systems and challenges related to globally distributed systems, serverless databases, GraphQL, and Jamstack, Fauna is hiring!
Subscribe to Fauna blogs & newsletter
Get latest blog posts, development tips & tricks, and latest learning material delivered right to your inbox.