
How I Built a Bio-Surveillance Information System Using Joomla! 3, TwitterAPIExchange, YAM…


The brief was to build an automated electronic information system for monitoring, organizing, and visualizing reports of outbreaks of public health concern in targeted geographical regions, by location, time, and condition/event.

It processes data from a number of sources, such as social media, news agencies, and personal blogs. The extracted text is filtered against large keyword datasets to keep what may be of interest. The system is then trained to categorize items as being of bio-surveillance interest (or not) and to prioritize them at different levels.

The system would have a user interface that presents the analyzed data for human interpretation. The results of that interpretation are fed back into the analysis engine to train it or to improve its algorithm.

What I Built

An automated system that crawls various data sources, such as Twitter and news results from the Google search engine, and extracts a blob of text, which it then filters against large keyword datasets.

The system keeps the extracted text that may be of interest, with provision for training it to categorize items as being of bio-surveillance interest (or not) and to prioritize them at different levels.

Public Libraries Used

Joomla! 3:

Provides an MVC framework, user authentication, and a back-end admin out of the box.

TwitterAPIExchange:

https://github.com/J7mbo/twitter-api-php

PHP Wrapper for Twitter API v1.1 calls, located at:

(root)/components/com_project/helpers/TwitterAPIExchange.php

YAML UI framework:

http://www.yaml.de/docs/index.html

Similar in usage to Bootstrap, but much lighter and more cross-platform. In this project, YAML's core files have been compressed, along with some custom CSS, into one file at:

(root)/templates/erp1/css/adm.css

A few of its other files are at:

(root)/media/vnassets/lib/yaml/

jQuery and jQuery UI:

(root)/media/vnassets/lib/jquery-2.1.4.min.js

(root)/media/vnassets/lib/jqui/

Development Approach

First, we held several meetings to fully understand the requirement specifications and make exceptions where necessary. We also looked at the project budget, the deployment site, and external factors.

Then, once everything was clarified, the project was organized around the following six milestones.

Milestone 1: Design and Specification Document

Milestone 2: Development of data crawler

Milestone 3: Machine Learning Interface development

Milestone 4: User interface design and development

Milestone 5: Integration / Refinements

Milestone 6: Training

Considerations

Weighting options:

Several weighting options were considered. We finally implemented a weighting algorithm that estimates the average number of people who engage with (and talk about) posted data, based on the authenticity and readership size of the source.
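The article does not give the actual formula, so the following is only a minimal Python sketch of the idea of scoring a post by the authenticity and readership of its source. The `Source` fields, the 0-to-1 authenticity scale, and the `engagement_rate` default are all assumptions made for illustration, not values from the project.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    readership: int      # followers or estimated readers (assumed input)
    authenticity: float  # 0.0 (unverified) to 1.0 (trusted); assumed scale


def estimated_engagement(source, engagement_rate=0.02):
    """Estimate how many people engage with a post from this source.

    engagement_rate is a hypothetical fraction of readers who react to
    or discuss a post; it is not a figure from the project.
    """
    if not 0.0 <= source.authenticity <= 1.0:
        raise ValueError("authenticity must be between 0 and 1")
    return source.readership * engagement_rate * source.authenticity


def rank_sources(sources):
    """Order sources from highest to lowest estimated engagement."""
    return sorted(sources, key=estimated_engagement, reverse=True)
```

With a score like this, a report from a trusted, widely read news agency outranks the same report from a small unverified blog, which is the behavior the description implies.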

Topics and Keywords:

Topics are items to be tracked online. Keywords are carefully chosen words or partial words that filter each topic to ensure that only references that are relevant to the application are fetched.

For example, the topic “Yellow Fever” combined with the keyword “outbreak” ensures that uses of “Yellow Fever” that refer to traffic wardens, or to anything else outside our context of interest, are not fetched.
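As a rough illustration of how a topic and its keywords can work together, here is a small Python sketch. The `TOPIC_KEYWORDS` mapping and both function names are hypothetical; the project's real keyword datasets and matching rules are not shown in the article.

```python
# Hypothetical topic -> keywords mapping; keywords may be partial words.
TOPIC_KEYWORDS = {
    "yellow fever": ["outbreak", "vaccin", "infect"],
}


def matches_topic(text, topic, keywords):
    """Return True if the text mentions the topic and at least one keyword.

    Keywords are matched as case-insensitive substrings, so a partial
    word such as "vaccin" matches "vaccine" and "vaccination".
    """
    lowered = text.lower()
    if topic not in lowered:
        return False
    return any(keyword in lowered for keyword in keywords)


def filter_posts(posts, topic_keywords=TOPIC_KEYWORDS):
    """Keep (topic, post) pairs for posts that match a topic and a keyword."""
    kept = []
    for post in posts:
        for topic, keywords in topic_keywords.items():
            if matches_topic(post, topic, keywords):
                kept.append((topic, post))
                break  # one matching topic is enough to keep the post
    return kept
```

A post such as "Yellow Fever officers directing traffic downtown" mentions the topic but none of the keywords, so it is dropped, while "Yellow fever outbreak reported in two districts" is kept.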

Challenges

One major challenge was that mobile users often do not enable location on Twitter, and for some users the actual location cannot be determined because Internet Service Providers (ISPs) report the user's location as that of the ISP's headquarters. Because of this, regular expressions were used to determine the location a feed was coming from.
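The article does not say which text the regular expressions were applied to, so this Python sketch assumes they run over the user's free-text profile location and the post text, looking for known place names. The `KNOWN_PLACES` list is purely illustrative (the article does not name the targeted regions), and `guess_location` is a hypothetical helper.

```python
import re

# Illustrative place names only; the targeted regions are not named in the article.
KNOWN_PLACES = ["Lagos", "Abuja", "Kano", "Port Harcourt"]

# Longest names first so a multi-word place wins over a shorter overlapping one.
_PLACE_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(KNOWN_PLACES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def guess_location(text, profile_location=""):
    """Return the first known place found in the profile location or text.

    The profile location is checked first; None is returned if no known
    place name appears in either string.
    """
    for candidate in (profile_location, text):
        match = _PLACE_PATTERN.search(candidate or "")
        if match:
            # Normalize to the canonical spelling from KNOWN_PLACES.
            found = match.group(1).lower()
            return next(p for p in KNOWN_PLACES if p.lower() == found)
    return None
```

Matching against a fixed list of places keeps false positives low, at the cost of missing any location that is not on the list.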

Key Learnings

I learned that a system like this can be implemented on-site using public libraries such as Joomla!, TwitterAPIExchange, the YAML UI framework, and jQuery.

Ordinarily, I would have considered the following:

1. BigQuery: delivers major improvements in speed, cost, and real-time querying compared to other big-data databases.

2. Google’s Natural Language Processing API: to get sufficient quality of meaning and categorization out of our data filters within the short period of this project, by building basic pattern recognition combined with context-aware entity recognition and sentiment analysis.

3. User interface: built in HTML5/CSS3/jQuery to be light, fast, and accessible across desktop, tablet, and most mobile screens, with its back end built on Python/Flask.

Tips and Advice

From my experience on this project, the choice of technology should be informed by clear terms of reference and a clear requirement specification. These determine whether the system should be a cloud-based or an on-site hosted application, and the best technology, tools, and libraries follow from that decision.

The project delivered its phase target and will continue to be developed, with ongoing training and fine-tuning of the system’s algorithm to recognize bio-surveillance messages of interest.


Codementor Tutorials
