How I Built a Bio-Surveillance Information System Using Joomla! 3, TwitterAPIExchange, YAM…

2017-11-13

Build an automated electronic information system for monitoring, organizing, and visualizing reports of outbreaks of public health concern in targeted geographical regions, according to location, time, and condition/event.

It processes data from a number of sources, such as social media, news agencies, and personal blogs, and filters the extracted text against large keyword datasets to keep only items that may be of interest. The system would be trained to classify each item as being of bio-surveillance interest or not, and to prioritize relevant items by level.

The system would have a user interface that presents the analyzed data for human interpretation. Reviewers' judgments are then fed back into the analysis engine to train and improve its algorithm.
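The learning algorithm behind this feedback loop is not described here. As a minimal sketch, assuming a simple multinomial Naive Bayes classifier (a stand-in I chose for illustration, not necessarily what the system used), each reviewer judgment can be recorded as a training example and used immediately for the next prediction. The class name `FeedbackClassifier` and the two labels are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical labels a reviewer can assign in the UI.
LABELS = ("relevant", "irrelevant")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class FeedbackClassifier:
    """Illustrative Naive Bayes classifier retrained from reviewer feedback."""

    def __init__(self) -> None:
        self.doc_counts = Counter()  # documents seen per label
        self.word_counts = {label: Counter() for label in LABELS}
        self.vocab: set[str] = set()

    def learn(self, text: str, label: str) -> None:
        """Record one reviewer judgment (the feedback step)."""
        tokens = tokenize(text)
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        """Return the label with the highest Laplace-smoothed log-probability."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = LABELS[0], float("-inf")
        for label in LABELS:
            # Smoothed prior P(label), so a label with no examples is not -inf.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
            label_total = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                # The extra +1 reserves probability mass for unseen words.
                score += math.log((count + 1) / (label_total + len(self.vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, every item a reviewer marks in the UI would go through `learn()`, so later predictions reflect the accumulated judgments without a separate batch retraining step.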

What I Built

An automated system that crawls through various data sources, such as Twitter and Google's news search, extracts blobs of text, and passes them through filters built from large keyword datasets.

The system keeps the extracted text that may be of interest, and it can be trained to classify items as being of bio-surveillance interest (or not) and to prioritize them by level.
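The number of priority levels and their boundaries are not specified. As a minimal sketch, assuming each item carries a relevance score between 0 and 1 (for example from the classifier) and using three hypothetical bands, prioritization could look like this:

```python
from dataclasses import dataclass

# Hypothetical priority bands, checked from highest threshold to lowest.
PRIORITY_BANDS = [
    (0.8, "high"),
    (0.5, "medium"),
    (0.2, "low"),
]


@dataclass
class Report:
    text: str
    relevance: float  # assumed 0.0-1.0 score produced by the classifier


def priority_level(report: Report) -> str:
    """Return the first band whose threshold the report's relevance meets."""
    for threshold, level in PRIORITY_BANDS:
        if report.relevance >= threshold:
            return level
    return "ignore"


def triage(reports: list[Report]) -> list[tuple[str, Report]]:
    """Attach a priority level to each report, most relevant first."""
    ranked = sorted(reports, key=lambda r: r.relevance, reverse=True)
    return [(priority_level(r), r) for r in ranked]
```

Keeping the bands in a single ordered table means the levels can be retuned as training improves without touching the triage logic.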

Public Libraries Used

Joomla! 3:

Provides an MVC framework, user authentication, and a back-end administration interface out of the box.

TwitterAPIExchange:

https://github.com/J7mbo/twitter-api-php

PHP wrapper for Twitter API v1.1 calls, located at:

(root)/components/com_project/helpers/TwitterAPIExchange.php

YAML UI framework:

http://www.yaml.de/docs/index.html

Similar in usage to Bootstrap, but much lighter and more cross-platform. In this project, YAML's core files have been compressed, together with some custom CSS, into a single file at:

(root)/templates/erp1/css/adm.css

A few of its other files are located at:

(root)/media/vnassets/lib/yaml/

jQuery and jQuery UI:

(root)/media/vnassets/lib/jquery-2.1.4.min.js

(root)/media/vnassets/lib/jqui/

Development Approach

First, we held several meetings to understand the full requirement specifications and agree on exceptions where necessary. We also reviewed the project budget, the deployment site, and external factors that could influence the project.

Once these points were clarified, the work was organized into the following six milestones:

Milestone 1: Design and Specification Document

Milestone 2: Development of data crawler

Milestone 3: Machine Learning Interface development

Milestone 4: User interface design and development

Milestone 5: Integration / Refinements

Milestone 6: Training

Considerations

Weighting options:

We considered several weighting options and finally implemented a weighting algorithm that estimates the average number of people who engage with (and talk about) a post, based on the authenticity and readership size of its source.
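The exact formula is not given. As a minimal sketch, assuming each source has a known readership size and a 0 to 1 authenticity score, and assuming a fixed fraction of readers engage with a typical post (the `ENGAGEMENT_RATE` constant is hypothetical), the estimate could be computed as:

```python
from dataclasses import dataclass

# Hypothetical fraction of a source's audience that engages with a typical post.
ENGAGEMENT_RATE = 0.02


@dataclass
class Source:
    name: str
    readership: int      # followers or subscribers (assumed to be available)
    authenticity: float  # assumed trust score between 0.0 and 1.0


def estimated_engagement(source: Source, rate: float = ENGAGEMENT_RATE) -> float:
    """Estimate how many people engage with a post from this source.

    The audience size is scaled by the assumed engagement rate and then
    discounted by how authentic the source is judged to be.
    """
    if not 0.0 <= source.authenticity <= 1.0:
        raise ValueError("authenticity must be between 0 and 1")
    return source.readership * rate * source.authenticity
```

Under these assumptions, a news agency with 500,000 readers and an authenticity of 0.9 outweighs a personal blog with 2,000 readers and an authenticity of 0.5 (about 9,000 versus 20 estimated engaged people).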

Topics and Keywords:

Topics are the items to be tracked online. Keywords are carefully chosen words or partial words that filter each topic, so that only references relevant to the application are fetched.

For example, combining the topic "Yellow Fever" with keywords such as "outbreak" ensures that "Yellow Fever" references to traffic wardens, or to anything else outside our context of interest, are not fetched.
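A minimal sketch of this topic-plus-keyword filter, assuming keywords are stored as word stems ("partial words") that must start a word in the text. The `TOPICS` table and its stems are hypothetical examples, not the project's actual keyword datasets.

```python
import re

# Hypothetical topic table: each tracked topic maps to keyword stems.
# A stem such as "infect" matches "infected" and "infection".
TOPICS = {
    "Yellow Fever": ["outbreak", "infect", "case", "vaccin"],
    "Cholera": ["outbreak", "diarrh", "death"],
}


def matching_topics(text: str, topics: dict[str, list[str]] = TOPICS) -> list[str]:
    """Return topics that are mentioned AND accompanied by one of their keywords."""
    lowered = text.lower()
    hits = []
    for topic, stems in topics.items():
        if topic.lower() not in lowered:
            continue  # the topic is not mentioned at all
        # A stem must begin a word: r"\binfect" matches "infected" but not "disinfect".
        if any(re.search(r"\b" + re.escape(stem), lowered) for stem in stems):
            hits.append(topic)
    return hits
```

With this table, "Yellow fever outbreak confirmed in two districts" matches, while "Yellow Fever officers blocked the road again" is discarded because none of the topic's keywords appear.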

Challenges

One major challenge was that mobile users often do not enable location on Twitter, and for some users the actual location cannot be determined because their Internet Service Provider (ISP) reports a location based on where the ISP's headquarters is situated. For this reason, we used regular expressions to determine where each feed was coming from.
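The regular expressions themselves are not shown. As a minimal sketch, assuming a gazetteer (list of place names) for the targeted regions, the post text can be searched first and the user's free-text profile location second. The place names in `KNOWN_PLACES` are hypothetical examples only.

```python
import re
from typing import Optional

# Hypothetical gazetteer for the targeted regions; a real deployment
# would load a much larger list.
KNOWN_PLACES = ["Lagos", "Nairobi", "Accra", "Port Harcourt"]

# Map lower-cased names back to their canonical spelling.
CANONICAL = {place.lower(): place for place in KNOWN_PLACES}

# One case-insensitive alternation, longest names first, bounded by word edges.
PLACE_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(p) for p in sorted(KNOWN_PLACES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def infer_location(tweet_text: str, profile_location: Optional[str] = None) -> Optional[str]:
    """Guess where a post comes from when it carries no usable geotag.

    The post text is searched first, then the user's profile location;
    the first known place name found is returned in canonical form.
    """
    for field in (tweet_text, profile_location):
        match = PLACE_PATTERN.search(field or "")
        if match:
            return CANONICAL[match.group(1).lower()]
    return None
```

Searching the post text before the profile favors where the event is reported over where the user says they live, which fits outbreak tracking; the order is a design choice and could be reversed.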

Key Learnings

I learned that a system like this can be implemented on-site using public libraries such as Joomla!, TwitterAPIExchange, the YAML UI framework, and jQuery.

Ordinarily, I would have considered the following:

1. BigQuery: delivers major improvements in speed, cost, and real-time querying compared to other big-data databases.

2. Google's Cloud Natural Language API: to get sufficient quality of meaning extraction and categorization from our data filters within the project's short timeframe, by building basic pattern recognition combined with context-aware entity recognition and sentiment analysis.

3. User interface: built in HTML5/CSS3/jQuery so that it is light, fast, and accessible on desktop, tablet, and most mobile screens, with a back end built on Python/Flask.

Tips and advice

From my experience on this project, the choice of technology should be driven by clear terms of reference and a clear requirements specification. These determine whether the system should be a cloud-based or an on-site hosted application, and the best technologies, tools, and libraries follow from those criteria.

The project met its target for this phase and will continue to be developed while the system's algorithm is trained and fine-tuned to recognize bio-surveillance messages of interest.

Codementor Tutorials
