
grid fault treatment questionnaire



 
Hello! My name is Raissa Medeiros. I am a Ph.D. student at the Universidade Federal
de Campina Grande, Brazil, under the supervision of Prof. Walfredo Cirne and Prof.
Francisco Vilar Brasileiro. We're investigating failures on Grids. Since Grids are
more complex than traditional systems, failures may be tougher to identify and correct.

We're currently trying to understand how people deal with failures in Grids.
To do so, we've devised the following questionnaire. If you are a Grid user,
developer or administrator, please respond. We need your help to capture
the real experience of those who actually use Grids. It won't take more than a
minute. (It's only 6 multiple-choice questions :-))

Of course, answers will be treated anonymously. You can either reply to this
email with your answers, or fill in a web-based form at http://www.dsc.ufcg.edu.br/~raissa/survey/form.html

If you are interested in the results of this research, feel free to contact
me at
raissa@dsc.ufcg.edu.br

1. What are the most frequent kinds of failures you face on Grids?
[   ] hardware failures
[   ] middleware failures
[   ] application failures
[   ] configuration failures (software incompatibility, wrong configuration file, etc.)
[   ] others: _______________________________________________________

2. What mechanisms are used for detecting, correcting, and/or
tolerating faults?
[   ] visualization tools (e.g. Mapcenter)
[   ] monitoring systems (e.g. JAMM)
[   ] fault-tolerant scheduling
[   ] checkpointing-recovery
[   ] application-dependent
[   ] others: _______________________________________________________

3. What are the greatest problems when you need to recover from a failure
scenario?
[   ] to implement the application-specific fault behavior in an automatic way
[   ] to gain authorization to correct the faulty component
[   ] to diagnose the fault
[   ] others: _______________________________________________________

4. To what degree is the user involved during the failure recovery process?
[   ] high (the user needs to be highly involved in this process because
he/she defines exactly what should be done)
[   ] medium (the user needs to be involved, but only marginally)
[   ] low (the user doesn't need to be involved because the system provides
automatic failure recovery)
[   ] others: _______________________________________________________

5. What are the most common user complaints?
[   ] long time to recover from a failure
[   ] complexity of the failure treatment abstractions/mechanisms
[   ] high failure occurrence rate
[   ] others: _______________________________________________________

6. Are there mechanisms for application debugging in your grid
environment?
[   ] yes, good mechanisms that allow me to influence the application
execution (e.g. change a variable value)
[   ] yes, but they only allow me to watch the application execution
[   ] yes, but they don't give me a grid-wide view of my application (i.e.
their scope is limited to a single resource within the grid)
[   ] no


Raissa Medeiros
PhD Student in Computer Science
Universidade Federal de Campina Grande
Campina Grande - PB - Brazil
raissa@dsc.ufcg.edu.br